Bug 6374

Summary: ehci via disconnect and usb hang
Product: Drivers
Reporter: crazy
Component: USB
Assignee: David Brownell (dbrownell)
Status: CLOSED OBSOLETE
Severity: normal
CC: alan, brebs, bunk, evi.linux.dev.kernel.org.bugzilla.ev.01, gentoo, greg, jan, newsgroup, protasnb, service, sputnik0891, stern, vermontkb
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.25-rc3
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 5089, 5325
Attachments: experimental 2.6.17-rc3 patch reworking ehci unlink
dmesg of my system
lspci
[ev] laptop - suspend2-sources-2.6.15-r8_04 - dmesg output
[ev] laptop - suspend2-sources-2.6.15-r8_04 - 01 - lspci -v output
[ev] laptop - suspend2-sources-2.6.15-r8_04 - 01 - cat /proc/cpuinfo output
[ev] laptop - suspend2-sources-2.6.15-r8_04 - kernel config
[ev] laptop - suspend2-sources-2.6.16-r4_01 - dmesg output
[ev] laptop - suspend2-sources-2.6.16-r4_01 - kernel config
[ev] laptop - vanilla-sources-2.6.17_rc3_01 - dmesg output
[ev] laptop - vanilla-sources-2.6.17_rc3_01 - kernel config
Complete 2.6.19.1 kern.log for my VIA EHCI disconnect

Description crazy 2006-04-10 17:17:18 UTC
Most recent kernel where this bug did not occur: ?
Distribution: Debian
Hardware Environment: Via KM400 chipset
Software Environment: 
Problem Description: Eventually any ehci device will disconnect and hang khubd
in D state, in my case a ZD1211 wireless ethernet controller and a Sandisk
memorystick reader, but only if ehci_hcd is loaded. With uhci_hcd only
everything works fine. It looks to be the same as
http://marc.theaimsgroup.com/?l=linux-ide&m=113653806322524&w=2 except it talks
about an IDE throughput problem, which I don't see.

The stack for khubd after hang looks like:

Apr 10 16:56:30 shows kernel: khubd         D 002BD2CA     0  1489      5          3529  1309 (L-TLB)
Apr 10 16:56:30 shows kernel: df386eb0 c0104281 df386ec0 002bd2ca 00000000 dbbbce00 003d14e3 dfc68178
Apr 10 16:56:30 shows kernel:        dfc68050 e0437900 003d14e3 00000000 00000000 00000282 df386ec0 0031def2
Apr 10 16:56:30 shows kernel:        00000010 dec874d0 c028b1ac df386ec0 0031def2 c032c3b8 c032c3b8 0031def2
Apr 10 16:56:30 shows kernel: Call Trace:
Apr 10 16:56:30 shows kernel:  [<c0104281>] do_IRQ+0x48/0x50
Apr 10 16:56:30 shows kernel:  [<c028b1ac>] schedule_timeout+0x76/0x95
Apr 10 16:56:30 shows kernel:  [<c011a87e>] process_timeout+0x0/0x9
Apr 10 16:56:30 shows kernel:  [<e13dd7f1>] ehci_endpoint_disable+0x9f/0x13f
[ehci_hcd]
Apr 10 16:56:30 shows kernel:  [<e1388008>] usb_disable_device+0x65/0x140 [usbcore]
Apr 10 16:56:30 shows kernel:  [<e13834da>] usb_disconnect+0x8e/0x115 [usbcore]
Apr 10 16:56:30 shows kernel:  [<e138494b>] hub_thread+0x535/0xcf7 [usbcore]
Apr 10 16:56:30 shows kernel:  [<c0122c54>] autoremove_wake_function+0x0/0x3a
Apr 10 16:56:30 shows kernel:  [<c0122be5>] kthread+0x80/0xc1
Apr 10 16:56:30 shows kernel:  [<e1384416>] hub_thread+0x0/0xcf7 [usbcore]
Apr 10 16:56:30 shows kernel:  [<c0122bf9>] kthread+0x94/0xc1
Apr 10 16:56:30 shows kernel:  [<c0122b65>] kthread+0x0/0xc1
Apr 10 16:56:30 shows kernel:  [<c0101005>] kernel_thread_helper+0x5/0xb

ehci device lspci -vvxxx:

0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) (prog-if 20
[EHCI])
        Subsystem: Giga-byte Technology GA-7VAX Mainboard
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin D routed to IRQ 3
        Region 0: Memory at ea010000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 06 11 04 31 13 00 10 02 82 20 03 0c 10 20 00 00
10: 00 00 01 ea 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 04 50
30: 00 00 00 00 80 00 00 00 00 00 00 00 03 04 00 00
40: 00 00 03 00 00 00 00 00 80 10 00 09 00 00 00 00
50: 00 5a 00 80 00 00 00 00 04 0b 88 88 53 88 00 00
60: 20 20 01 00 00 00 00 00 01 00 00 00 00 00 00 c0
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 00 c2 ff 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 87 00 00 00 00 00 00 00 00 00

kern.log just before hang:

Apr 10 15:53:04 shows kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0002
Apr 10 15:53:04 shows kernel: ehci_hcd 0000:00:10.3: GetStatus port 1 status 00180b POWER sig=j PEC CSC CONNECT
Apr 10 15:53:04 shows kernel: hub 4-0:1.0: port 1, status 0501, change 0003, 480 Mb/s
Apr 10 15:53:04 shows kernel: usb 4-1: USB disconnect, address 2
Apr 10 15:53:04 shows kernel: usb 4-1: usb_disable_device nuking all URBs
Apr 10 15:53:04 shows kernel: ehci_hcd 0000:00:10.3: shutdown urb def44ec0 pipe c0010280 ep2in-bulk
Apr 10 15:53:06 shows kernel: zd1211: failed reg_urb
Apr 10 15:53:06 shows kernel: zd1211:USB ST Code = -19
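
For anyone comparing their own logs: the text after "status 00180b" in the GetStatus line is the driver's decode of the EHCI port status register. A minimal sketch of that decode follows, assuming the EHCI 1.0 PORTSC bit layout. `decode_portsc` is a hypothetical helper name and only a subset of the flags is covered.

```python
# Sketch only: decode an EHCI PORTSC value into the labels that the
# ehci_hcd debug output prints. Bit positions assume the EHCI 1.0
# PORTSC layout; decode_portsc is a hypothetical helper, not part of
# any kernel tool.

PORTSC_FLAGS = [
    (1 << 12, "POWER"),    # port power
    (1 << 8,  "RESET"),    # port reset in progress
    (1 << 7,  "SUSPEND"),  # port suspended
    (1 << 6,  "RESUME"),   # force port resume
    (1 << 3,  "PEC"),      # port enable/disable change
    (1 << 2,  "PE"),       # port enabled
    (1 << 1,  "CSC"),      # connect status change
    (1 << 0,  "CONNECT"),  # current connect status
]

LINE_STATUS = {0: "se0", 1: "k", 2: "j", 3: "?"}  # bits 11:10


def decode_portsc(value):
    """Return a label string such as 'POWER sig=j PEC CSC CONNECT'."""
    parts = []
    for mask, name in PORTSC_FLAGS:
        if value & mask:
            parts.append(name)
        if mask == 1 << 12:
            # The line state is always reported, right after the POWER slot.
            parts.append("sig=" + LINE_STATUS[(value >> 10) & 3])
    return " ".join(parts)


print(decode_portsc(0x00180B))
```

Applied to the value from the log, this yields "POWER sig=j PEC CSC CONNECT": the connect-change and enable-change bits are set and the port is no longer enabled (PE is clear), while the device is still reported as connected.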


Steps to reproduce:
Load ehci_hcd, plug in a high speed device and wait.
Comment 1 Andrew Morton 2006-04-10 17:30:32 UTC

Begin forwarded message:

Date: Mon, 10 Apr 2006 17:17:22 -0700
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 6374] New: ehci via disconnect and usb hang


http://bugzilla.kernel.org/show_bug.cgi?id=6374

           Summary: ehci via disconnect and usb hang
    Kernel Version: 2.6.16.2
            Status: NEW
          Severity: normal
             Owner: greg@kroah.com
         Submitter: crazy@wi.rr.com



Comment 2 Alan Stern 2006-04-11 07:41:06 UTC
Some code was added to the ehci-hcd driver to try to help with this problem.
Can you try using the current 2.6.17-rc kernel to see if the same thing still
happens?
Comment 3 crazy 2006-04-11 16:11:43 UTC
With 2.6.17-rc1 no change.
Comment 4 David Brownell 2006-05-03 15:22:24 UTC
Created attachment 8020 [details]
experimental 2.6.17-rc3 patch reworking ehci unlink

This may help.	There's some goofy code that could easily hide bugs,
and the patch takes a different -- and, I hope, more obviously
correct -- approach to the "wait for unlink to complete" problem.
Comment 5 crazy 2006-05-04 14:00:00 UTC
Applied patch to 2.6.17-rc3 and there's no change.
Comment 6 Adrian Bunk 2006-05-07 12:11:53 UTC
Since this issue involves the external (although GPL'ed) zd1211 module:

Is it really an ehci_hcd issue, or could there be a bug in the zd1211 driver
causing this issue?
Comment 7 Alan Stern 2006-05-08 07:01:29 UTC
This is definitely a problem with ehci-hcd and the EHCI hardware.  Other people
have seen the same thing, and in at least one case it was shown (by printk) that
a routine in ehci-hcd was starting but never exiting.
Comment 8 Miroslaw Mieszczak 2006-05-09 02:10:02 UTC
I have a similar problem with EHCI on SMP.
After the system has been running for some time, the interrupt counters of the
USB devices (in /proc/interrupts) stop incrementing. The first visible symptom
is that the USB mouse stops moving smoothly: it starts jumping in cycles of
about 1 second, and during that time the USB interrupt counters don't increment.
Once the interrupts have stopped, the system no longer detects new USB devices.
If I remove the ehci_hcd module in this state and load it again, I get
something like this in the log:

May  9 10:51:22 [kernel] ehci_hcd 0000:00:10.4: remove, state 1
May  9 10:51:22 [kernel] usb usb5: USB disconnect, address 1
May  9 10:51:22 [kernel] ehci_hcd 0000:00:10.4: USB bus 5 deregistered
May  9 10:51:22 [kernel] ACPI: PCI interrupt for device 0000:00:10.4 disabled
May  9 10:56:05 [kernel] ACPI: PCI Interrupt 0000:00:10.4[C] -> Link [ALKD] -> GSI 21 (level, low) -> IRQ 58
May  9 10:56:05 [kernel] ehci_hcd 0000:00:10.4: EHCI Host Controller
May  9 10:56:05 [kernel] ehci_hcd 0000:00:10.4: new USB bus registered, assigned bus number 5
May  9 10:56:05 [kernel] ehci_hcd 0000:00:10.4: irq 58, io mem 0xd2206c00
May  9 10:56:05 [kernel] ehci_hcd 0000:00:10.4: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
May  9 10:56:05 [kernel] usb usb5: configuration #1 chosen from 1 choice
May  9 10:56:05 [kernel] hub 5-0:1.0: USB hub found
May  9 10:56:05 [kernel] hub 5-0:1.0: 8 ports detected
May  9 10:56:05 [kernel] usb 5-7: new high speed USB device using ehci_hcd and address 3
May  9 10:56:06 [kernel] ehci_hcd 0000:00:10.4: Unlink after no-IRQ?  Controller is probably using the wrong IRQ.
May  9 10:56:17 [kernel] usb 5-7: device not accepting address 3, error -110
May  9 10:56:17 [kernel] usb 5-7: new high speed USB device using ehci_hcd and address 4
May  9 10:56:28 [kernel] usb 5-7: device not accepting address 4, error -110
May  9 10:56:29 [kernel] usb 5-7: new high speed USB device using ehci_hcd and address 5
May  9 10:56:39 [kernel] usb 5-7: device not accepting address 5, error -110
May  9 10:56:39 [kernel] usb 5-7: new high speed USB device using ehci_hcd and address 6
May  9 10:56:50 [kernel] usb 5-7: device not accepting address 6, error -110
May  9 10:56:50 [kernel] usb 2-1: USB disconnect, address 3
May  9 10:56:50 [kernel] usb 2-1: new low speed USB device using uhci_hcd and address 4
May  9 10:56:53 [kernel] usb 2-1: configuration #1 chosen from 1 choice
May  9 10:56:55 [kernel] input: Microsoft Microsoft Wheel Mouse Optical as /class/input/input3
May  9 10:56:55 [kernel] input: USB HID v1.00 Mouse [Microsoft Microsoft Wheel Mouse Optical] on usb-0000:00:10.1-1



In that case the USB mouse was plugged in the whole time. When the problem
occurred, I plugged in a USB stick, then unloaded the ehci module and loaded it
again.


When the problem occurs, nothing appears in the log or in dmesg.
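
Since nothing is logged when the counters stop, one way to catch the moment is to compare two snapshots of /proc/interrupts. The following is a minimal sketch under the assumption that the file has the usual layout (a header row of CPU names, then one row per IRQ with per-CPU counts followed by a description). `parse_interrupts` and `stalled_irqs` are hypothetical helper names.

```python
# Sketch only: find interrupt lines whose counters did not advance
# between two snapshots of /proc/interrupts. Helper names are
# hypothetical; the file layout is assumed, not guaranteed.

def parse_interrupts(text):
    """Map each IRQ label to (sum of per-CPU counts, description)."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = (sum(counts), " ".join(fields[ncpus:]))
    return totals


def stalled_irqs(before, after, keyword):
    """IRQs whose description mentions keyword and whose count did not move."""
    return [irq for irq, (count, desc) in after.items()
            if keyword in desc and irq in before and before[irq][0] == count]
```

To use it, read /proc/interrupts twice a few seconds apart, pass both texts through `parse_interrupts`, and call `stalled_irqs(first, second, "ehci_hcd")`. An idle bus also produces no interrupts, so a stalled counter is only meaningful while a device is actively transferring data.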
Comment 9 Miroslaw Mieszczak 2006-05-09 02:10:45 UTC
Created attachment 8067 [details]
dmesg of my system
Comment 10 Miroslaw Mieszczak 2006-05-09 02:11:26 UTC
Created attachment 8068 [details]
lspci
Comment 11 Ezequiel Valenzuela 2006-05-10 03:51:36 UTC
I'm also experiencing random/unexpected USB disconnects, even with different
devices. I've seen this behaviour using two similar devices: a Linksys wusb54g
and a Linksys wusb54gp.

I've used different kernel versions, up to a suspend2-gentoo kernel (I know,
it's not vanilla, but still): suspend2-sources-2.6.15-r8.

I still have to try with later versions, even with a vanilla version, to make
sure this still happens, but by the looks of it it seems to be the same bug.

I have an Acer 1355LC laptop (VIA KM400 chipset), and I'm using ehci_hcd to
handle the devices.

As stated in other bug reports, everything seems to be working for a while (in
my case, normally a few hours, even under heavy usage), and then, for no
apparent reason... "usb disconnect".

The devices work on another machine with a kernel from the same package (same
sources), only compiled with some other options, because of the different
hardware. But the USB options are similar.

Please find the "dmesg", "lspci -v" and "cat /proc/cpuinfo" output, as well as
my kernel config, as attachments.

Please note that in the dmesg_01 output, the time between plugging the device
and getting the "usb disconnect" is more than 3 hours. The error messages that
ndiswrapper reports are usual, and I'm getting those on another machine that
doesn't have this problem.
Comment 12 Ezequiel Valenzuela 2006-05-10 03:55:12 UTC
Created attachment 8081 [details]
[ev] laptop - suspend2-sources-2.6.15-r8_04 - dmesg output
Comment 13 Ezequiel Valenzuela 2006-05-10 03:56:24 UTC
Created attachment 8082 [details]
[ev] laptop - suspend2-sources-2.6.15-r8_04 - 01 - lspci -v output
Comment 14 Ezequiel Valenzuela 2006-05-10 03:57:14 UTC
Created attachment 8083 [details]
[ev] laptop - suspend2-sources-2.6.15-r8_04 - 01 - cat /proc/cpuinfo output
Comment 15 Ezequiel Valenzuela 2006-05-10 03:58:35 UTC
Created attachment 8084 [details]
[ev] laptop - suspend2-sources-2.6.15-r8_04 - kernel config
Comment 16 Ezequiel Valenzuela 2006-05-11 00:33:56 UTC
Now tested against 2.6.16-suspend2-r4 (gentoo).
Kernel config and dmesg output attached to this entry.
Comment 17 Ezequiel Valenzuela 2006-05-11 00:36:55 UTC
Created attachment 8089 [details]
[ev] laptop - suspend2-sources-2.6.16-r4_01 - dmesg output
Comment 18 Ezequiel Valenzuela 2006-05-11 00:37:51 UTC
Created attachment 8090 [details]
[ev] laptop - suspend2-sources-2.6.16-r4_01 - kernel config
Comment 19 Ezequiel Valenzuela 2006-05-11 00:42:51 UTC
Sorry, I forgot to mention that the behaviour is still the same: it works for a
few hours, then it "disconnects" itself.

I disabled several things before leaving the machine pinging the router (so that
"heavy usage" issues can be ruled out, considering the machine was doing nothing else):

 * cpufreqd (normally it shouldn't kick in anyway, but...)
 * vmware drivers
 * at BIOS setup: disabled "legacy USB support" (you can see that the kernel
still loaded the uhci driver and successfully recognized the hardware as USB 1.1
compliant, though).

Please note that the device still used ehci_hcd (usb 2.0), and it worked in the
same way as before (that is, apparently successfully until the "usb disconnect").
Comment 20 Ezequiel Valenzuela 2006-05-12 02:46:59 UTC
Hi. More news on this issue.

I've compiled 2.6.17-rc3 from the vanilla sources (no gentoo patches this time).
I've also disabled the vmware drivers, but this time I left the cpufreqd daemon
running.

This time, the device stayed connected until close to the 12 hour mark, although
this may just be circumstantial. The strangest thing is, the device seems to be
working (at least the pings using the usb device still work), although not
"completely". Using the network interface as such seems to be working ok, but:

* any attempts to run lsusb are futile (processes are "hung").

* running programs that try to access the device directly (such as running
"iwlist wlan0 scan") report error messages. This can also be seen in the kernel
log I'm attaching.

So in a way, it looks exactly as before, if you consider that I can't access the
usb device as such, or the usb core with programs like "lsusb".

*But* the network interface is still working. This never happened until now.

I'll try and apply the patch attached to this bug later to test whether it
changes anything. For now, I'm attaching my "dmesg" output and my kernel config.

Hope this helps.
Comment 21 Ezequiel Valenzuela 2006-05-12 02:52:41 UTC
Created attachment 8095 [details]
[ev] laptop - vanilla-sources-2.6.17_rc3_01 - dmesg output
Comment 22 Ezequiel Valenzuela 2006-05-12 02:54:04 UTC
Created attachment 8096 [details]
[ev] laptop - vanilla-sources-2.6.17_rc3_01 - kernel config
Comment 23 Ezequiel Valenzuela 2006-05-16 01:41:07 UTC
I've also tried with vanilla-sources-2.6.16.16 and vanilla-sources-2.6.17_rc4,
but it's still not working.

Note: the device *is* working on a different PC *and* it's also working as a USB
1.1 device on the very same laptop machine.
Comment 24 Jan Richling 2006-08-31 04:06:14 UTC
I have exactly the same problem with an EPIA ME6000 (Via vt8235 southbridge) and
a Cinergy T2 USB DVB-T box using kernel 2.6.17.1. The device works for some time
(I have seen everything from 5 minutes to 5 hours) and then dies with 

usb 1-1: USB disconnect, address 2

It makes no difference if it is connected directly or using a powered hub or
using an unpowered hub. Several people at vdrportal.de report this device
working with similar drivers but other USB 2 hosts (and some have the same
problem with EPIA boards using Via southbridges). Disabling ehci and sticking to
USB 1 solves the issues but makes the device unusable as it needs USB 2 for
operation (in USB 1 mode it acts only as a remote control receiver but for any
period of time).

It seems that (in USB 2 mode) it dies faster if the device is heavily used, but
it also occurs when there is only low traffic.

dmesg etc. does not tell anything that was not reported in the other comments so
I skip this.
Comment 25 David Brownell 2006-09-19 11:27:57 UTC
Well I've had one person confirm that the patch makes _their_ problem  
with VIA EHCI go away, so I've submitted it for merge to 2.6.19 ...   
  
However note that there are multiple problems here.  One is the wedge  
in removing an URB from its queue ... where VIA's EHCI silicon has  
always needed a bug workaround, improved in that patch.  
  
Another problem is as noted in Comment #8 ... it's not just the IAA  
IRQ that stops arriving, it's _all_ IRQs.  Of course #8 indicates an  
IRQ routing problem (notice the "no-irq" message), potentially resolved  
in more current kernels.  (VIA has IRQ routing quirks that have never  
been handled quite right...)  
 
And a third seems to be the spontaneous "disconnect", which I don't 
recall being reported on non-VIA hardware.  It's unclear whether that 
would stop once the first two vanish, but in any case it's clearly the 
hardware which is misbehaving.  (I hate to say it but I've explicitly 
avoided systems with VIA hardware since in my experience it's been so 
flakey...) 
 
I'll ask folk to try reproducing this after 2.6.18 ships and the first 
batch of USB patches goes upstream, with that unlink patch. 
 
Comment 26 Jan Richling 2006-09-19 13:12:28 UTC
This sounds promising! Testing is not that simple (the machine is my video
disk recorder and can therefore be used for testing only in a limited way), but
I want to help and test the patch. Would you test it with the installed
2.6.17.1, or is it better to change to the latest 2.6.18-pre-whatever (which
could imply problems with the other components of the system)? If I understand
you correctly, the patch was developed for 2.6.17-pre, so it could be even
better to stay with 2.6.17.1 for the test, right?

Regarding VIA... after the experience with that EPIA board (which is not really
low cost) and the non-existent help from VIA (they did not even respond to
similar reports in their own forum), I stopped buying VIA-equipped hardware, but
I still have to live with that little thing.
Comment 27 Jan Richling 2006-09-22 11:48:13 UTC
Hi again,

In the meantime 2.6.18 has been released, so I was able to test the patch
against it. It was necessary to add one missing "break;" statement in order to
compile ehci-hcd.c.

It was pretty stable for one hour... then it died again with "USB disconnect",
so this patch unfortunately does not solve that part (the third part) of the
problem. I also tested with 2.6.17.1 with no USB disconnect, but this was
probably due to the short test time (only a few hours... sometimes it takes days
to occur, sometimes minutes).

So I will continue solving the problem by finding another board that fits into
the home made HTPC-box and does not include any VIA components... :-(

Jan
Comment 28 Vermont Rutherfoord 2006-12-28 22:37:13 UTC
I believe I'm experiencing the same bug described here. To reproduce it quickly
and semi-reliably, I generate a ton of incoming traffic with my iPod Nano like so:

dd if=/dev/sda bs=4096 count=276072 | cat | wc -c

A disconnect usually results before 400M is transferred. If any of you have
memory sticks, this might also work for you.

I'd be willing to test any patches for this. Here are the last few kernel
messages I get before dd hangs in state D, along with any new lsusb processes.
They seem similar to the messages which started this thread.

Dec 28 20:45:31 eggnog kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0008
Dec 28 20:45:31 eggnog kernel: ehci_hcd 0000:00:10.3: GetStatus port 3 status 00180b POWER sig=j PEC CSC CONNECT
Dec 28 20:45:31 eggnog kernel: hub 4-0:1.0: port 3, status 0501, change 0003, 480 Mb/s
Dec 28 20:45:31 eggnog kernel: usb 4-3: USB disconnect, address 3
Dec 28 20:45:31 eggnog kernel: usb 4-3: unregistering device
Dec 28 20:45:31 eggnog kernel: usb 4-3: usb_disable_device nuking all URBs
Dec 28 20:45:31 eggnog kernel: ehci_hcd 0000:00:10.3: shutdown urb ec906360 pipe c0008380 ep1in-bulk
Dec 28 20:46:01 eggnog kernel: usb 4-3: usb_sg_cancel, unlink --> -19


Here are my VIA USB controllers as they show up in lspci -v:

0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI])
        Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer: Unknown device a232
        Flags: bus master, medium devsel, latency 32, IRQ 10
        I/O ports at d800 [size=32]                 
        Capabilities: <available only to root>      
                                                    
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) (prog-if 20 [EHCI])
        Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer: Unknown device a232
        Flags: bus master, medium devsel, latency 32, IRQ 5
        Memory at e6000000 (32-bit, non-prefetchable) [size=256]
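
The hang itself (dd and later lsusb processes stuck in state D) can be spotted without waiting for a command to block. A minimal sketch, assuming a Linux /proc in which each /proc/<pid>/stat line begins with "pid (comm) state". The helper names `parse_stat` and `uninterruptible_tasks` are hypothetical.

```python
# Sketch only: list tasks in uninterruptible sleep (state D) by
# reading /proc/<pid>/stat. Helper names are hypothetical.
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    lparen = line.index("(")
    rparen = line.rindex(")")          # comm may itself contain ')'
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc="/proc"):
    """Return (pid, comm) for every task currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue                   # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue                   # task exited while scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_tasks()` right after a disconnect should list khubd, dd, or lsusb if they are wedged. A task can sit in state D briefly during normal I/O, so only entries that persist across repeated calls are interesting.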
Comment 29 Vermont Rutherfoord 2006-12-28 22:46:42 UTC
Created attachment 9959 [details]
Complete 2.6.19.1 kern.log for my VIA EHCI disconnect
Comment 30 Natalie Protasevich 2007-06-13 11:43:18 UTC
Is this still happening with 2.6.22-rc4?
Thanks,
--Natalie
Comment 31 Adrian Bunk 2007-09-19 16:37:21 UTC
Please reopen this bug if it's still present with kernel 2.6.22.
Comment 32 Mighty 2007-09-23 03:39:16 UTC
Please reopen!
The problem remains unsolved up until 2.6.23-rc6.

I've tried nearly all major and minor releases since 2.6.18.
Disabling EHCI brings a few more days of uptime. But the issue shows up
in any case with nearly any system configuration, with and without
SMP/Preempt/ACPI and some other options. There are just minor variations in uptime.

Only disabling EHCI seems to make a statistically relevant difference.

However, quite a few people using the EPIA ME-6000 have reported this issue,
and it would be great to have a fix, as these boards are often used as headless
servers or for HTPC applications.

I can provide any further data and test any configuration.
Comment 33 Natalie Protasevich 2007-09-23 10:02:47 UTC
Can you please attach the boot log, lspci -vv output, any error messages you see in other logs, /proc/interrupts, and your system/chipset details, to make sure this is the same problem?
Thanks.
Comment 34 Alan Stern 2007-09-23 11:37:58 UTC
Please look also at Bug #8692, which includes some test patches for the 2.6.22 kernel.  In particular, try using the patches in comments #18 and #38 of that bug report.
Comment 35 Natalie Protasevich 2007-09-23 17:25:55 UTC
Just so we don't miss another data point: we just closed bug #6708 because the reporter didn't reply, but it showed the same error as the one originally reported here, and the reporter stated that 2.6.15 was free from this bug.
Comment 36 Alan Stern 2007-09-24 07:29:28 UTC
As described in comment #25, there actually are 3 separate (possibly related) bugs mentioned in this bug report:

   (1) Spontaneous disconnects
   (2) EHCI interrupts stop occurring
   (3) Khubd hangs in D state

The patches in Bug #8692 are meant to help with (3).  We can't help with (1) or (2) because we don't know what causes them or how to prevent them.

We also don't know of any kernel in which these bugs don't occur.  Can anybody please try out 2.6.15 (or even earlier versions) to see if they also have these bugs?
Comment 37 Mighty 2007-09-24 08:31:54 UTC
Oha. I applied the patch from #8692, comment #38, and successfully recompiled
the 2.6.23-rc6 kernel. The box has been running stably for one and a half days... with a NEC USB card, as you proposed in another thread.

But that's not really a solution, as the (only :/) PCI slot in the little box
usually carries a DVB-S card.

I will write back when the system is back on my desk and wired up.
Comment 38 Michael Rüttgers 2008-02-27 03:37:15 UTC
I'm having the same USB troubles with my VIA chipset.

The things I've figured out so far:

2.6.25-rc3 -> Spontaneous disconnects, Khubd keeps responding, device is reassigned in companion mode (see log with usb debugging enabled)

2.6.24 -> Spontaneous disconnects, Khubd hangs in D state

2.6.15 -> Spontaneous disconnects, whole system seems to slow down and becomes unresponsive (it's quite hard to get a remote terminal for a sysrq reboot)

< 2.6.15 -> I will try earlier kernel versions and report the results

---

Feb 26 17:37:12 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:37:12 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 00100a POWER sig=se0 PEC CSC
Feb 26 17:37:12 vdr kernel: hub 4-0:1.0: port 2, status 0100, change 0003, 12 Mb/s
Feb 26 17:37:12 vdr kernel: usb 4-2: USB disconnect, address 2
Feb 26 17:37:12 vdr kernel: usb 4-2: unregistering device
Feb 26 17:37:12 vdr kernel: usb 4-2: usb_disable_device nuking all URBs
Feb 26 17:37:12 vdr kernel: ehci_hcd 0000:00:10.3: shutdown urb dde369c0 ep1in-bulk
Feb 26 17:37:12 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:12 vdr kernel: usb 4-2: unregistering interface 4-2:1.0
Feb 26 17:37:12 vdr kernel: ndiswrapper: device wlan1 removed
Feb 26 17:37:12 vdr kernel: usb 4-2:1.0: uevent
Feb 26 17:37:12 vdr kernel: usb 4-2: uevent
Feb 26 17:37:12 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT
Feb 26 17:37:13 vdr kernel: hub 4-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501
Feb 26 17:37:13 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:37:13 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
Feb 26 17:37:13 vdr kernel: usb 4-2: new high speed USB device using ehci_hcd and address 3
Feb 26 17:37:18 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:18 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:23 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:23 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:28 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:28 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:28 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:37:28 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 00100f POWER sig=se0 PEC PE CSC CONNECT
Feb 26 17:37:28 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:37:28 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:37:28 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT
Feb 26 17:37:28 vdr kernel: hub 4-0:1.0: port 2, status 0501, change 0001, 480 Mb/s
Feb 26 17:37:28 vdr kernel: hub 4-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501
Feb 26 17:37:28 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:37:28 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
Feb 26 17:37:28 vdr kernel: usb 4-2: new high speed USB device using ehci_hcd and address 4
Feb 26 17:37:33 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:33 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:38 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:38 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:43 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:43 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:43 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:37:43 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 00100f POWER sig=se0 PEC PE CSC CONNECT
Feb 26 17:37:43 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:37:43 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:37:43 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT
Feb 26 17:37:43 vdr kernel: hub 4-0:1.0: port 2, status 0501, change 0001, 480 Mb/s
Feb 26 17:37:43 vdr kernel: hub 4-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501
Feb 26 17:37:43 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:37:43 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
Feb 26 17:37:43 vdr kernel: usb 4-2: new high speed USB device using ehci_hcd and address 5
Feb 26 17:37:48 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:48 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:53 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:53 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:58 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:37:58 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:37:58 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:37:58 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 00100f POWER sig=se0 PEC PE CSC CONNECT
Feb 26 17:37:58 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:37:58 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:37:58 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT
Feb 26 17:37:58 vdr kernel: hub 4-0:1.0: port 2, status 0501, change 0001, 480 Mb/s
Feb 26 17:37:58 vdr kernel: hub 4-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501
Feb 26 17:37:59 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:37:59 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
Feb 26 17:37:59 vdr kernel: usb 4-2: new high speed USB device using ehci_hcd and address 6
Feb 26 17:38:04 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:38:04 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:38:09 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:38:09 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:38:14 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:38:14 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:38:14 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:38:14 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 00100f POWER sig=se0 PEC PE CSC CONNECT
Feb 26 17:38:14 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:38:14 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:38:14 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT
Feb 26 17:38:14 vdr kernel: hub 4-0:1.0: port 2, status 0501, change 0001, 480 Mb/s
Feb 26 17:38:14 vdr kernel: hub 4-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501
Feb 26 17:38:14 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:38:14 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
Feb 26 17:38:14 vdr kernel: usb 4-2: new high speed USB device using ehci_hcd and address 7
Feb 26 17:38:19 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:38:19 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:38:24 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:38:24 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:38:29 vdr kernel: ehci_hcd 0000:00:10.3: IAA watchdog: status a008 cmd 10069
Feb 26 17:38:29 vdr kernel: usb 4-2: khubd timed out on ep0in len=0/64
Feb 26 17:38:29 vdr kernel: ehci_hcd 0000:00:10.3: port 2 high speed
Feb 26 17:38:29 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 00100f POWER sig=se0 PEC PE CSC CONNECT
Feb 26 17:38:29 vdr kernel: usb usb1: wakeup_rh (auto-start)
Feb 26 17:38:29 vdr kernel: hub 4-0:1.0: state 7 ports 6 chg 0000 evt 0004
Feb 26 17:38:29 vdr kernel: ehci_hcd 0000:00:10.3: GetStatus port 2 status 003802 POWER OWNER sig=j CSC
Feb 26 17:38:29 vdr kernel: hub 4-0:1.0: port 2, status 0100, change 0001, 12 Mb/s
Feb 26 17:38:29 vdr kernel: hub 4-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
Feb 26 17:38:29 vdr kernel: hub 1-0:1.0: state 7 ports 2 chg 0000 evt 0004
Feb 26 17:38:29 vdr kernel: uhci_hcd 0000:00:10.0: port 2 portsc 0093,00
Feb 26 17:38:29 vdr kernel: hub 1-0:1.0: port 2, status 0101, change 0001, 12 Mb/s
Feb 26 17:38:29 vdr kernel: hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x101
Feb 26 17:38:29 vdr kernel: usb 1-2: new full speed USB device using uhci_hcd and address 2
Feb 26 17:38:30 vdr kernel: usb 1-2: not running at top speed; connect to a high speed hub
Feb 26 17:38:30 vdr kernel: usb 1-2: default language 0x0409
Feb 26 17:38:30 vdr kernel: usb 1-2: uevent
Feb 26 17:38:30 vdr kernel: usb 1-2: usb_probe_device
Feb 26 17:38:30 vdr kernel: usb 1-2: configuration #1 chosen from 1 choice
Feb 26 17:38:30 vdr kernel: usb 1-2: adding 1-2:1.0 (config #1, interface 0)
Feb 26 17:38:30 vdr kernel: usb 1-2:1.0: uevent
Feb 26 17:38:30 vdr kernel: ndiswrapper 1-2:1.0: usb_probe_interface
Feb 26 17:38:30 vdr kernel: ndiswrapper 1-2:1.0: usb_probe_interface - got id
Feb 26 17:38:30 vdr kernel: usb 1-2: reset full speed USB device using uhci_hcd and address 2
Feb 26 17:38:30 vdr kernel: ndiswrapper: driver rt73 (Ralink,01/12/2006, 1.00.04.0000) loaded
Feb 26 17:38:31 vdr kernel: wlan0: ethernet device 00:0e:2e:dc:0d:a0 using NDIS driver: rt73, version: 0x0, NDIS version: 0x500, vendor: 'IEEE 802.11g Wireless Card.', 148F:2573.F.conf
Feb 26 17:38:31 vdr kernel: wlan0: encryption modes supported: WEP; TKIP with WPA, WPA2, WPA2PSK; AES/CCMP with WPA, WPA2, WPA2PSK
Feb 26 17:38:31 vdr kernel: drivers/usb/core/inode.c: creating file '002'
Feb 26 17:38:31 vdr kernel: usb 1-2: New USB device found, idVendor=148f, idProduct=2573
Feb 26 17:38:31 vdr kernel: usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Feb 26 17:38:31 vdr kernel: usb 1-2: Product: 802.11 bg WLAN
Feb 26 17:38:31 vdr kernel: usb 1-2: Manufacturer: Ralink