Bug 8690 - Bulk IN transfers hanging randomly on Intel ICHx USB EHCI HCD
Status: REJECTED WILL_NOT_FIX
Product: Drivers
Classification: Unclassified
Component: USB
Platform / OS: All / Linux
Priority / Severity: P1 / high
Assigned To: Greg Kroah-Hartman
Depends on:
Blocks: USB
Reported: 2007-06-29 14:13 UTC by Martin Drab
Modified: 2009-03-23 11:12 UTC (History)

See Also:
Kernel Version: 2.6.22-rc6
Tree: Mainline
Regression: ---


Attachments
/sys/class/usb_host/usb_host5 before transfer and after the hang (1.31 KB, application/octet-stream)
2007-07-01 17:30 UTC, Martin Drab
lsusb -v (23.58 KB, text/plain)
2007-07-01 17:31 UTC, Martin Drab
/proc/bus/usb/devices (4.62 KB, text/plain)
2007-07-01 17:32 UTC, Martin Drab

Description Martin Drab 2007-06-29 14:13:25 UTC
Most recent kernel where this bug did not occur: none (??)
Distribution: FC7
Hardware Environment:

00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX 440 AGP 8x] (rev a2)
02:04.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 61)
02:04.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 61)
02:04.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 63)
02:04.3 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 46)
02:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)

Software Environment:
Problem Description:

When I run continuous High-Speed bulk IN transfers from a device (built around a Philips/NXP ISP1581 USB device controller) connected to an Intel ICHx EHCI HCD (I tried ICH5 and ICH7, and both show the problem), the transfer starts and works for an unpredictable amount of time: sometimes a second (a few tens of MB transferred), sometimes a few minutes (a few GB transferred), but usually not longer. Then the transfer suddenly stops, without any warning or error.

The driver for the device (which I am writing) keeps about 10 to 40 buffers of 32 KB each permanently linked/submitted for bulk IN transfers from the device's bulk IN endpoint, and the number of buffers submitted at any one time never falls below 10 during the transfer. When one buffer transfer completes, another is immediately submitted from within the URB completion handler.
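
To put those queue depths in perspective, the following Python sketch (illustrative only, not part of the original report) computes how much data is queued and roughly how long the queue would last at full speed. The 53,248,000 bytes/s figure is an assumption: it is the USB 2.0 theoretical ceiling for high-speed bulk transfers (13 packets of 512 bytes per 125 µs microframe), not a measurement from this device. `queue_headroom` is a hypothetical name.

```python
URB_SIZE = 32 * 1024            # bytes per buffer, as stated in the report
HS_BULK_MAX = 13 * 512 * 8000   # bytes/s: assumed theoretical high-speed bulk ceiling


def queue_headroom(num_urbs, urb_size=URB_SIZE, rate=HS_BULK_MAX):
    """Return (bytes queued, seconds until the queue drains at `rate`)."""
    queued = num_urbs * urb_size
    return queued, queued / rate


# Evaluate the minimum and maximum queue depths mentioned in the report.
for depth in (10, 40):
    queued, seconds = queue_headroom(depth)
    print(f"{depth} URBs: {queued // 1024} KiB queued, "
          f"~{seconds * 1000:.1f} ms at the assumed bulk ceiling")
```

Under these assumptions, even the minimum depth of 10 buffers leaves several milliseconds of queued work, so a starved queue seems an unlikely explanation for the stall on its own.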

While bulk IN transfers on the ICHx HCD have this problem, isochronous IN transfers from the same device (on a different endpoint) on the same HCDs do not; they work correctly for any length of time.

At first I thought the problem might be either in the device's ISP1581 USB controller or somewhere in my driver. But then I tried different computers with a non-Intel chipset and CPU (though I'm not sure whether the CPU is relevant here), such as AMD on VIA and AMD on nVidia, and there everything works fine, i.e. the transfer doesn't stop. I also inserted a USB add-on card based on a VIA HCD chip into the computer where the transfer failed on the ICH5 chipset (everything else, both hardware and kernel, stayed the same) and connected the device through this card, and then it also worked without a problem.

That makes me think the problem is probably not in the ISP1581 or my driver, but rather in how the HCD driver handles bulk transfers for the ICHx HCD. Unless all the ICHx HCDs are buggy in hardware, which of course I cannot tell either. Are they? Or can there be any other explanation? Please help.

So, does any USB HCD guru here have an idea where the problem might be? I can help with debugging or even fixing, but someone more familiar with the HCD core might have better luck, or at least some hints as to where to look.

Steps to reproduce: Described above.
Comment 1 Andrew Morton 2007-06-29 14:26:28 UTC
Subject: Re: [Bugme-new]  New: Bulk IN transfers hanging randomly
 on Intel ICHx USB EHCI HCD

On Fri, 29 Jun 2007 14:09:23 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8690
> 
>            Summary: Bulk IN transfers hanging randomly on Intel ICHx USB
>                     EHCI HCD
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.22-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: USB
>         AssignedTo: greg@kroah.com
>         ReportedBy: drab@kepler.fjfi.cvut.cz

Comment 2 Alan Stern 2007-06-30 16:08:38 UTC
Try building a kernel with CONFIG_USB_DEBUG.  When the stoppage occurs, look at the files in /sys/class/usb_host/usb_hostN (where N is the bus number of the EHCI controller) and attach them to this bug report.  Also check to see if any error messages appear in the dmesg log.
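
One way to capture those files before and after the stoppage is sketched below in Python. This is a minimal sketch, not part of the original advice: the helper names `snapshot_dir` and `diff_snapshots` are hypothetical, and which files actually appear in the directory depends on the kernel configuration. It reads every regular file directly inside the given directory so that the two states can be compared.

```python
import difflib
from pathlib import Path


def snapshot_dir(directory):
    """Read every regular file directly inside `directory` into a dict.

    Unreadable files are recorded with the error text instead of aborting,
    since some sysfs attributes may refuse reads.
    """
    snapshot = {}
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue  # skip subdirectories and links to directories
        try:
            snapshot[entry.name] = entry.read_text(errors="replace")
        except OSError as exc:
            snapshot[entry.name] = f"<unreadable: {exc}>\n"
    return snapshot


def diff_snapshots(before, after):
    """Return unified-diff lines for every file whose contents changed."""
    lines = []
    for name in sorted(set(before) | set(after)):
        old = before.get(name, "").splitlines(keepends=True)
        new = after.get(name, "").splitlines(keepends=True)
        if old != new:
            lines.extend(difflib.unified_diff(
                old, new, fromfile=f"before/{name}", tofile=f"after/{name}"))
    return lines
```

For example, call `snapshot_dir("/sys/class/usb_host/usb_hostN")` (with N replaced by the bus number) once before starting the transfer and once after the hang, then print `"".join(diff_snapshots(before, after))`.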
Comment 3 Martin Drab 2007-07-01 17:30:34 UTC
Created attachment 11913
/sys/class/usb_host/usb_host5 before transfer and after the hang

This is from yet another computer, but with the same chipset (ICH5) and the same problem. The attachment contains the contents of /sys/class/usb_host/usb_host5 (the ICH5 EHCI controller is on bus 5) before the transfer began and after it got stuck. For the record, the transfer got stuck after about 171 MB had been transferred (or, let's say, delivered to user space). The transfer size for one URB is 32 KB. After the hang, the 10 URBs that were submitted/linked finished with urb->status = -71 (EPROTO), and any URB linked afterwards did not finish at all and had to be unlinked forcefully (via usb_unlink_urb(); I did that only after capturing the files in usb_host5.after, so that they could not be influenced by the unlink itself). The communication was with device 3 on bus 5, endpoint 3 IN. The output of lsusb -v and the contents of /proc/bus/usb/devices are attached below.
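
The status value quoted above can be checked against the errno table. The following Python sketch is an illustration only: `decode_urb_status` is a hypothetical helper, and it assumes a Linux host, where errno 71 is `EPROTO`. Kernel URB status codes are negated errno values, so the sign is flipped before the lookup.

```python
import errno
import os


def decode_urb_status(status):
    """Map a kernel-style URB status (0 or a negated errno) to (name, text)."""
    if status == 0:
        return ("OK", "success")
    code = -status  # URB status is a negated errno value
    name = errno.errorcode.get(code, "UNKNOWN")
    return (name, os.strerror(code))


name, text = decode_urb_status(-71)
print(f"urb->status = -71 decodes to {name} ({text})")
```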
Comment 4 Martin Drab 2007-07-01 17:31:23 UTC
Created attachment 11914
lsusb -v
Comment 5 Martin Drab 2007-07-01 17:32:09 UTC
Created attachment 11915
/proc/bus/usb/devices
Comment 6 Martin Drab 2007-09-05 08:16:15 UTC
Surprisingly enough, on a Dell Latitude D810 notebook with ICH6 this bug has not manifested itself so far (tested many times). Only ICH5 and ICH7 seem to be affected, but not ICH6. Strange.
Comment 7 Martin Drab 2007-09-11 12:48:31 UTC
OK, so the bug does manifest itself on ICH6 as well, only not as often. I managed to make it hang there too. :-(
Comment 8 Alan Stern 2007-09-11 13:16:20 UTC
I don't think there's going to be any way to solve this problem just by looking at data on the software level.  The only way to find out what is going wrong is to use a USB analyzer and see what is actually being sent on the bus.

The contents of the debugging files don't show anything out of the ordinary, and from the point of view of the driver it just looks as though the device is simply NAKing all the transfers.  I know that this doesn't happen with non-Intel controller hardware, but still it might be the case.  Monitoring at the hardware level is needed.

Now, USB bus analyzers aren't cheap.  Perhaps somebody at Intel would be willing to help you by running a test?
