Latest working kernel version: Unknown. Earliest failing kernel version: Seen with linux-2.6.git on Sept 11, 2008 Distribution: Fedora 9 Hardware Environment: PS3 Software Environment: linux-2.6.git built with ps3_defconfig. Problem Description: On system reboot will very rarely hit the BUG() statement in the EHCI qh_destroy(). Here is the failing code: /* clean qtds first, and know this is not linked */ if (!list_empty (&qh->qtd_list) || qh->qh_next.ptr) { ehci_dbg (ehci, "unused qh not empty!\n"); BUG (); } kernel BUG at drivers/usb/host/ehci-mem.c:74! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=2 PS3 Modules linked in: snd_ps3 snd_pcm snd_page_alloc snd_timer ohci_hcd snd ehci_hcd sg ps3_lpm soundcore usbcore NIP: d000000000488690 LR: d00000000048ba60 CTR: c000000000330e20 REGS: c000000006ccf530 TRAP: 0700 Not tainted (2.6.27-ps3-linux-dev-42899-gc879c74-dirty) MSR: 8000000000028032 <EE,IR,DR> CR: 24008428 XER: 00000000 TASK = c000000006b46cc0[838] 'reboot' THREAD: c000000006ccc000 CPU: 0 <6>GPR00: c000000006c8d140 c000000006ccf7b0 d00000000049b470 c000000006c8d000 <6>GPR04: 0000000000000000 d00000000048e724 0000000000000001 0000000000000001 <6>GPR08: 0000000000000830 c000000006c8d058 0000000000000000 c000000000bb18dc <6>GPR12: d00000000048f0f8 c0000000005a1300 0000000000000000 00000000100d4dbc <6>GPR16: 00000000100ac3cc 00000000100d64dc 00000000100d4d70 00000000100f2dd8 <6>GPR20: 00000000100d4edc 0000000010108ef8 0000000000000000 000000001006b280 <6>GPR24: 0000000000000000 00000000100f4de8 0000000000000000 0000000000000031 <6>GPR28: c000000006aa7d90 c000000006aa7d70 d00000000049ad80 c000000006c8d000 NIP [d000000000488690] .qh_destroy+0x38/0x88 [ehci_hcd] LR [d00000000048ba60] .ehci_mem_cleanup+0x40/0x130 [ehci_hcd] Call Trace: [c000000006ccf7b0] [c00000000007bf4c] .trace_hardirqs_on_caller+0x140/0x190 (unreliable) [c000000006ccf840] [d00000000048ba60] .ehci_mem_cleanup+0x40/0x130 [ehci_hcd] [c000000006ccf8d0] [d00000000048e730] .ehci_stop+0xc4/0x11c [ehci_hcd] [c000000006ccf960] [d0000000002dfaac] .usb_remove_hcd+0xf4/0x168 [usbcore] [c000000006ccf9f0] [d0000000004873ec] .ps3_ehci_remove+0x40/0xd8 [ehci_hcd] [c000000006ccfa90] [c0000000000413b4] .ps3_system_bus_shutdown+0x78/0x98 [c000000006ccfb10] [c00000000020a6f8] .device_shutdown+0x80/0xec [c000000006ccfb90] [c000000000063a54] .kernel_restart_prepare+0x38/0x58 [c000000006ccfc10] [c000000000063ae0] .kernel_restart+0x20/0x6c [c000000006ccfc90] [c000000000065940] .sys_reboot+0x1cc/0x254 [c000000006ccfe30] [c0000000000074dc] syscall_exit+0x0/0x40 Instruction dump: fbe10088 39230058 fba10078 7c7f1b78 f80100a0 e8030058 eba30078 7fa04800 409e0010 e8030050 2fa00000 419e000c <0fe00000> 48000000 e8830068 2fa40000 <4>---[ end trace 7bd2c1403dd01fb7 ]--- Steps to reproduce: Hit on reboot. Difficult to reproduce. -Geoff
hm. Why is this considered a PS3 bug and not a core USB bug?
Reply-To: david-b@pacbell.net On Monday 27 October 2008, bugme-daemon@bugzilla.kernel.org wrote: > hm.
I added a routine ps3_ehci_shutdown(), but I find that it fails in a different way (below). http://marc.info/?l=linux-usb&m=122539106902666 It seems the new failure is due to usb_stor_scan_thread() waking up after ps3_ehci_shutdown() has run. usb_stor_scan_thread() tries to use the device, but ps3_ehci_shutdown() has already freed the system resources. This system has a 'reboot' command in rc.sysinit to continually test system reboot. I moved the reboot command to later in the startup to when usb_stor_scan_thread() is assured to have completed, and this problem is no longer seen. The addition of ps3_ehci_shutdown() fixes a USB crash seen in Yellow Dog Linux 6.1: http://bugzilla.terrasoftsolutions.com/show_bug.cgi?id=317 ps3_ehci_shutdown:260: <- usb_stor_scan_thread:918: woke <6>ps3_hpte_insert:result=-5 va=adc7d4c2d02e0010 pa=400000160000 ix=7668 v=adc7d4c2d001 r=4000001601ac <0>------------[ cut here ]------------ <2>kernel BUG at /home/geoff/projects/cell/ps3-linux-dev/arch/powerpc/platforms/ps3/htab.c:80! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=2 PS3 Modules linked in: usb_storage snd_ps3 snd_pcm snd_page_alloc snd_timer ehci_hcd ohci_hcd ps3_lpm sg snd soundcore usbcore NIP: c00000000003db0c LR: c00000000003db08 CTR: 0000000000000001 REGS: c00000001d047090 TRAP: 0700 Not tainted (2.6.28-rc2-ps3-linux-dev-43160-gd1395e6-dirty) MSR: 8000000000020032 <IR,DR> CR: 22000022 XER: 20000000 TASK = c000000006742b40[862] 'usb-stor-scan' THREAD: c00000001d044000 CPU: 1 <6>GPR00: c00000000003db08 c00000001d047310 c00000000056f720 0000000000000067 <6>GPR04: 0000000000000001 c000000000051cc4 0000000000000000 0000000000000002 <6>GPR08: 0000000000000000 0000000000000000 c000000006742b40 c000000006742b40 <6>GPR12: 8000000000000032 c0000000005a1500 0000000000000000 0000000000000000 <6>GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 <6>GPR20: 0000000000000010 0000000000000000 0000000000000000 adc7d4c2d02e0010 <6>GPR24: 0000000000007668 00000000000001ac c0000000005a21f8 0000400000160000 <6>GPR28: 0000000000000000 00004000001601ac c000000000509c30 0000adc7d4c2d001 NIP [c00000000003db0c] .ps3_hpte_insert+0x144/0x1e8 LR [c00000000003db08] .ps3_hpte_insert+0x140/0x1e8 Call Trace: [c00000001d047310] [c00000000003db08] .ps3_hpte_insert+0x140/0x1e8 (unreliable) [c00000001d047420] [c000000000026e64] .htab_call_hpte_insert1+0x4/0x38 [c00000001d047520] [c000000000025704] .hash_page+0x310/0x33c [c00000001d0475b0] [c0000000000051d8] .do_hash_page+0x58/0x90 --- Exception: 301 at .qh_link_async+0x84/0x168 [ehci_hcd] LR = .ehci_urb_enqueue+0x180/0xe6c [ehci_hcd] [c00000001d0478a0] [c00000001d047940] 0xc00000001d047940 (unreliable) [c00000001d047940] [d000000000632018] .ehci_urb_enqueue+0x180/0xe6c [ehci_hcd] [c00000001d047a60] [d0000000002e445c] .usb_hcd_submit_urb+0xb84/0xca4 [usbcore] [c00000001d047bb0] [d0000000002e4b18] .usb_submit_urb+0x230/0x2a0 [usbcore] [c00000001d047c30] [d0000000009bdee4] .usb_stor_msg_common+0xbc/0x180 [usb_storage] [c00000001d047d10] [d0000000009be048] .usb_stor_control_msg+0xa0/0xd0 [usb_storage] [c00000001d047da0] [d0000000009be0c0] .usb_stor_Bulk_max_lun+0x48/0x70 [usb_storage] [c00000001d047e30] [d0000000009bf6f8] .usb_stor_scan_thread+0x168/0x1a4 [usb_storage] [c00000001d047f00] [c00000000006b8e8] .kthread+0x78/0xc4 [c00000001d047f90] [c00000000001fff4] .kernel_thread+0x54/0x70 Instruction dump: 7c6507b4 2fa50000 419e0030 e89e8088 7ee6bb78 7f67db78 e87e8080 7f08c378 7fe9fb78 7faaeb78 48014a31 60000000 <0fe00000> 48000000 e8810080 38a100a0 <4>---[ end trace ba74a4f46bc00325 ]--- -Geoff
Reply-To: david-b@pacbell.net On Thursday 30 October 2008, bugme-daemon@bugzilla.kernel.org wrote: > It seems the new failure is due to usb_stor_scan_thread() waking up after > ps3_ehci_shutdown() has run.
Reply-To: david-b@pacbell.net On Thursday 30 October 2008, bugme-daemon@bugzilla.kernel.org wrote: > [c00000001d0475b0] [c0000000000051d8] .do_hash_page+0x58/0x90 > --- Exception: 301 at .qh_link_async+0x84/0x168 [ehci_hcd] >
On Thu, 30 Oct 2008 bugme-daemon@bugzilla.kernel.org wrote: > In this case, usb-storage seems NOT to have implemented > appropriate shutdown() behavior. The driver model will > have called its shutdown() before its parents' shutdown > methods -- i.e. before the EHCI shutdown -- and expects > that shutdown to behave correctly. > > (That is: it's a different, valid, non-ehci bug, and > deserving of a separate bug report.) I'm not convinced that usb-storage needs to implement a shutdown procedure. Shouldn't ehci-hcd behave sanely if it gets a request after a shutdown? I'm pretty sure that uhci-hcd does. Besides, usb-storage isn't alone. I bet almost no USB drivers implement shutdown (does usbtest?), especially seeing as how they don't communicate directly with the hardware. And isn't shutdown all about stopping interactions with the hardware? It has to be easier to fix ehci-hcd than to change every USB driver. Alan Stern
Seems like a reboot notifier in usb-storage is enough to kill the kthread running usb_stor_scan_thread. -Geoff
Did this ever get resolved? It looks as though all that was needed was to add the line: ehci_to_hcd(ehci)->state = HC_STATE_HALT; to ehci_silence_controller().
No, it has not been resolved. I submitted a fix (link below), but on review it was found to have (race) problems. I have been working on higher priority tasks since then. I have some other PS3 USB work to do, and plan to re-work this fix when I get to that other USB work. http://git.kernel.org/?p=linux/kernel/git/geoff/ps3-linux-patches.git;a=blob;f=ps3-wip/usb-fix-ps3-ehci-suspend.patch;hb=HEAD -Geoff
Marking as obsolete