Bug 11819

Summary: kernel BUG at ehci-mem.c:74
Product: Platform Specific/Hardware Reporter: Geoff Levand (geoffrey.levand)
Component: PS3Assignee: platform_ps3 (platform_ps3)
Status: CLOSED OBSOLETE    
Severity: normal CC: akpm, alan, dbrownell, Geert.Uytterhoeven, geoffrey.levand, mokuno, stern
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.27 Subsystem:
Regression: No Bisected commit-id:

Description Geoff Levand 2008-10-24 09:17:36 UTC
Latest working kernel version:

Unknown.

Earliest failing kernel version:

Seen with linux-2.6.git on Sept 11, 2008

Distribution:

Fedora 9

Hardware Environment:

PS3

Software Environment:

linux-2.6.git built with ps3_defconfig.

Problem Description:

On system reboot will very rarely hit the BUG() statement in the EHCI qh_destroy().

Here is the failing code:

	/* clean qtds first, and know this is not linked */
	if (!list_empty (&qh->qtd_list) || qh->qh_next.ptr) {
		ehci_dbg (ehci, "unused qh not empty!\n");
		BUG ();
	}

kernel BUG at drivers/usb/host/ehci-mem.c:74!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 PS3
Modules linked in: snd_ps3 snd_pcm snd_page_alloc snd_timer ohci_hcd snd ehci_hcd sg ps3_lpm soundcore usbcore
NIP: d000000000488690 LR: d00000000048ba60 CTR: c000000000330e20
REGS: c000000006ccf530 TRAP: 0700   Not tainted  (2.6.27-ps3-linux-dev-42899-gc879c74-dirty)
MSR: 8000000000028032 <EE,IR,DR>  CR: 24008428  XER: 00000000
TASK = c000000006b46cc0[838] 'reboot' THREAD: c000000006ccc000 CPU: 0
<6>GPR00: c000000006c8d140 c000000006ccf7b0 d00000000049b470 c000000006c8d000
<6>GPR04: 0000000000000000 d00000000048e724 0000000000000001 0000000000000001
<6>GPR08: 0000000000000830 c000000006c8d058 0000000000000000 c000000000bb18dc
<6>GPR12: d00000000048f0f8 c0000000005a1300 0000000000000000 00000000100d4dbc
<6>GPR16: 00000000100ac3cc 00000000100d64dc 00000000100d4d70 00000000100f2dd8
<6>GPR20: 00000000100d4edc 0000000010108ef8 0000000000000000 000000001006b280
<6>GPR24: 0000000000000000 00000000100f4de8 0000000000000000 0000000000000031
<6>GPR28: c000000006aa7d90 c000000006aa7d70 d00000000049ad80 c000000006c8d000
NIP [d000000000488690] .qh_destroy+0x38/0x88 [ehci_hcd]
LR [d00000000048ba60] .ehci_mem_cleanup+0x40/0x130 [ehci_hcd]
Call Trace:
[c000000006ccf7b0] [c00000000007bf4c] .trace_hardirqs_on_caller+0x140/0x190 (unreliable)
[c000000006ccf840] [d00000000048ba60] .ehci_mem_cleanup+0x40/0x130 [ehci_hcd]
[c000000006ccf8d0] [d00000000048e730] .ehci_stop+0xc4/0x11c [ehci_hcd]
[c000000006ccf960] [d0000000002dfaac] .usb_remove_hcd+0xf4/0x168 [usbcore]
[c000000006ccf9f0] [d0000000004873ec] .ps3_ehci_remove+0x40/0xd8 [ehci_hcd]
[c000000006ccfa90] [c0000000000413b4] .ps3_system_bus_shutdown+0x78/0x98
[c000000006ccfb10] [c00000000020a6f8] .device_shutdown+0x80/0xec
[c000000006ccfb90] [c000000000063a54] .kernel_restart_prepare+0x38/0x58
[c000000006ccfc10] [c000000000063ae0] .kernel_restart+0x20/0x6c
[c000000006ccfc90] [c000000000065940] .sys_reboot+0x1cc/0x254
[c000000006ccfe30] [c0000000000074dc] syscall_exit+0x0/0x40
Instruction dump:
fbe10088 39230058 fba10078 7c7f1b78 f80100a0 e8030058 eba30078 7fa04800
409e0010 e8030050 2fa00000 419e000c <0fe00000> 48000000 e8830068 2fa40000
<4>---[ end trace 7bd2c1403dd01fb7 ]---

Steps to reproduce:

Hit on reboot.  Difficult to reproduce.

-Geoff
Comment 1 Andrew Morton 2008-10-27 17:30:44 UTC
hm.  Why is this considered a PS3 bug and not a core USB bug?
Comment 2 Anonymous Emailer 2008-10-27 17:49:51 UTC
Reply-To: david-b@pacbell.net

On Monday 27 October 2008, bugme-daemon@bugzilla.kernel.org wrote:
> hm. 
Comment 3 Geoff Levand 2008-10-30 11:33:00 UTC
I added a routine ps3_ehci_shutdown(), but I find that it fails in a different way (below).

  http://marc.info/?l=linux-usb&m=122539106902666

It seems the new failure is due to usb_stor_scan_thread() waking up after ps3_ehci_shutdown() has run.  usb_stor_scan_thread() tries to use the device, but ps3_ehci_shutdown() has already freed the system resources.

This system has a 'reboot' command in rc.sysinit to continually test system reboot.  I moved the reboot command to later in the startup to when usb_stor_scan_thread() is assured to have completed, and this problem is no longer seen.

The addition of ps3_ehci_shutdown() fixes a USB crash seen in Yellow Dog Linux 6.1:

  http://bugzilla.terrasoftsolutions.com/show_bug.cgi?id=317

ps3_ehci_shutdown:260: <-
usb_stor_scan_thread:918: woke
<6>ps3_hpte_insert:result=-5 va=adc7d4c2d02e0010 pa=400000160000 ix=7668 v=adc7d4c2d001 r=4000001601ac
<0>------------[ cut here ]------------
<2>kernel BUG at /home/geoff/projects/cell/ps3-linux-dev/arch/powerpc/platforms/ps3/htab.c:80!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2 PS3
Modules linked in: usb_storage snd_ps3 snd_pcm snd_page_alloc snd_timer ehci_hcd ohci_hcd ps3_lpm sg snd soundcore usbcore
NIP: c00000000003db0c LR: c00000000003db08 CTR: 0000000000000001
REGS: c00000001d047090 TRAP: 0700   Not tainted  (2.6.28-rc2-ps3-linux-dev-43160-gd1395e6-dirty)
MSR: 8000000000020032 <IR,DR>  CR: 22000022  XER: 20000000
TASK = c000000006742b40[862] 'usb-stor-scan' THREAD: c00000001d044000 CPU: 1
<6>GPR00: c00000000003db08 c00000001d047310 c00000000056f720 0000000000000067
<6>GPR04: 0000000000000001 c000000000051cc4 0000000000000000 0000000000000002
<6>GPR08: 0000000000000000 0000000000000000 c000000006742b40 c000000006742b40
<6>GPR12: 8000000000000032 c0000000005a1500 0000000000000000 0000000000000000
<6>GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
<6>GPR20: 0000000000000010 0000000000000000 0000000000000000 adc7d4c2d02e0010
<6>GPR24: 0000000000007668 00000000000001ac c0000000005a21f8 0000400000160000
<6>GPR28: 0000000000000000 00004000001601ac c000000000509c30 0000adc7d4c2d001
NIP [c00000000003db0c] .ps3_hpte_insert+0x144/0x1e8
LR [c00000000003db08] .ps3_hpte_insert+0x140/0x1e8
Call Trace:
[c00000001d047310] [c00000000003db08] .ps3_hpte_insert+0x140/0x1e8 (unreliable)
[c00000001d047420] [c000000000026e64] .htab_call_hpte_insert1+0x4/0x38
[c00000001d047520] [c000000000025704] .hash_page+0x310/0x33c
[c00000001d0475b0] [c0000000000051d8] .do_hash_page+0x58/0x90
--- Exception: 301 at .qh_link_async+0x84/0x168 [ehci_hcd]
    LR = .ehci_urb_enqueue+0x180/0xe6c [ehci_hcd]
[c00000001d0478a0] [c00000001d047940] 0xc00000001d047940 (unreliable)
[c00000001d047940] [d000000000632018] .ehci_urb_enqueue+0x180/0xe6c [ehci_hcd]
[c00000001d047a60] [d0000000002e445c] .usb_hcd_submit_urb+0xb84/0xca4 [usbcore]
[c00000001d047bb0] [d0000000002e4b18] .usb_submit_urb+0x230/0x2a0 [usbcore]
[c00000001d047c30] [d0000000009bdee4] .usb_stor_msg_common+0xbc/0x180 [usb_storage]
[c00000001d047d10] [d0000000009be048] .usb_stor_control_msg+0xa0/0xd0 [usb_storage]
[c00000001d047da0] [d0000000009be0c0] .usb_stor_Bulk_max_lun+0x48/0x70 [usb_storage]
[c00000001d047e30] [d0000000009bf6f8] .usb_stor_scan_thread+0x168/0x1a4 [usb_storage]
[c00000001d047f00] [c00000000006b8e8] .kthread+0x78/0xc4
[c00000001d047f90] [c00000000001fff4] .kernel_thread+0x54/0x70
Instruction dump:
7c6507b4 2fa50000 419e0030 e89e8088 7ee6bb78 7f67db78 e87e8080 7f08c378
7fe9fb78 7faaeb78 48014a31 60000000 <0fe00000> 48000000 e8810080 38a100a0
<4>---[ end trace ba74a4f46bc00325 ]---

-Geoff
Comment 4 Anonymous Emailer 2008-10-30 12:13:24 UTC
Reply-To: david-b@pacbell.net

On Thursday 30 October 2008, bugme-daemon@bugzilla.kernel.org wrote:
> It seems the new failure is due to usb_stor_scan_thread() waking up after
> ps3_ehci_shutdown() has run. 
Comment 5 Anonymous Emailer 2008-10-30 12:30:25 UTC
Reply-To: david-b@pacbell.net

On Thursday 30 October 2008, bugme-daemon@bugzilla.kernel.org wrote:
> [c00000001d0475b0] [c0000000000051d8] .do_hash_page+0x58/0x90
> --- Exception: 301 at .qh_link_async+0x84/0x168 [ehci_hcd]
> 
Comment 6 Alan Stern 2008-10-30 12:49:13 UTC
On Thu, 30 Oct 2008 bugme-daemon@bugzilla.kernel.org wrote:

> In this case, usb-storage seems NOT to have implemented
> appropriate shutdown() behavior.  The driver model will
> have called its shutdown() before its parents' shutdown
> methods -- i.e. before the EHCI shutdown -- and expects
> that shutdown to behave correctly.
> 
> (That is:  it's a different, valid, non-ehci bug, and
> deserving of a separate bug report.)

I'm not convinced that usb-storage needs to implement a shutdown 
procedure.  Shouldn't ehci-hcd behave sanely if it gets a request after 
a shutdown?  I'm pretty sure that uhci-hcd does.

Besides, usb-storage isn't alone.  I bet almost no USB drivers
implement shutdown (does usbtest?), especially seeing as how they don't
communicate directly with the hardware.  And isn't shutdown all about
stopping interactions with the hardware?

It has to be easier to fix ehci-hcd than to change every USB driver.

Alan Stern
Comment 7 Geoff Levand 2008-10-30 15:09:39 UTC
Seems like a reboot notifier in usb-storage is enough to kill the kthread running usb_stor_scan_thread.

-Geoff
Comment 8 Alan Stern 2009-03-09 12:29:31 UTC
Did this ever get resolved?  It looks as though all that was needed was to add the line:

	ehci_to_hcd(ehci)->state = HC_STATE_HALT;

to ehci_silence_controller().
Comment 9 Geoff Levand 2009-03-09 13:02:55 UTC
No, it has not been resolved.

I submitted a fix (link below), but on review it was found
to have (race) problems.  I have been working on higher
priority tasks since then.  I have some other PS3 USB
work to do, and plan to re-work this fix when I get to
that other USB work.

http://git.kernel.org/?p=linux/kernel/git/geoff/ps3-linux-patches.git;a=blob;f=ps3-wip/usb-fix-ps3-ehci-suspend.patch;hb=HEAD

-Geoff
Comment 10 Alan 2012-05-14 16:44:55 UTC
Marking as obsolete