Bug 32432

Summary: USB Disconnects / resets after commit b963801164618e25fbdc0cd452ce49c3628b46c8
Product: Drivers
Component: USB
Reporter: Jools Wills (jools)
Assignee: Greg Kroah-Hartman (greg)
Status: CLOSED CODE_FIX
Severity: normal
Priority: P1
CC: florian, holken, matejken, rjw
Hardware: All
OS: Linux
Kernel Version: 2.6.27 - 2.6.38
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: patch to revert b963801164618e25fbdc0cd452ce49c3628b46c8

Description Jools Wills 2011-04-01 12:54:18 UTC
After commit b963801164618e25fbdc0cd452ce49c3628b46c8 (USB: ehci-hcd unlink speedups) I have instability with an external USB drive on an Intel Mac Mini. The USB device stops working after anywhere from a few minutes to a few days.

Manually reverting the commit fixes it, and the USB device is then completely stable. I spent a long time trying to track the bug down, first trying every USB quirk mode and various kernel configs, and eventually found that reverting this commit fixed it. I am currently running a 2.6.27 kernel with the attached patch. The patch still applies to git HEAD as of 01/04/2011.

The errors looked like

Jun  7 22:56:05 malus kernel: usb 1-1: reset high speed USB device using ehci_hcd and address 2
Jun  7 22:56:16 malus kernel: usb 1-1: reset high speed USB device using ehci_hcd and address 2
Jun  7 22:56:19 malus kernel: usb 1-1: USB disconnect, address 2
Jun  7 22:56:20 malus kernel: sd 4:0:0:0: Device offlined - not ready after error recovery
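
For anyone wanting to check their own logs for the same pattern, here is a minimal Python sketch. It is not part of the original report. It assumes syslog-style lines worded like the excerpt above, with any wrapped lines already joined. The function name count_usb_events and both regular expressions are illustrative assumptions, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns modelled on the log excerpt above (assumption: same wording).
RESET_RE = re.compile(r"usb (\S+): reset (?:high|full|low) speed USB device")
DISCONNECT_RE = re.compile(r"usb (\S+): USB disconnect")


def count_usb_events(lines):
    """Count reset and disconnect messages per USB port (e.g. '1-1')."""
    counts = Counter()
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            counts[(m.group(1), "reset")] += 1
            continue
        m = DISCONNECT_RE.search(line)
        if m:
            counts[(m.group(1), "disconnect")] += 1
    return counts


# The four log lines from the report, with wrapped lines joined.
sample = [
    "Jun  7 22:56:05 malus kernel: usb 1-1: reset high speed USB device using ehci_hcd and address 2",
    "Jun  7 22:56:16 malus kernel: usb 1-1: reset high speed USB device using ehci_hcd and address 2",
    "Jun  7 22:56:19 malus kernel: usb 1-1: USB disconnect, address 2",
    "Jun  7 22:56:20 malus kernel: sd 4:0:0:0: Device offlined - not ready after error recovery",
]

if __name__ == "__main__":
    for (port, event), n in sorted(count_usb_events(sample).items()):
        print(f"usb {port}: {n} {event} event(s)")
```

A run of resets followed by a disconnect on the same port is the signature described in this report; the counts alone do not identify the cause.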

the usb device in question:
root@malus:/usr/local/src# cat /proc/scsi/usb-storage/4 
   Host scsi4: usb-storage
       Vendor: Super Top 
      Product: USB 2.0  IDE DEVICE    
Serial Number: ST  Killer  
     Protocol: Transparent SCSI
    Transport: Bulk
       Quirks: IGNORE_RESIDUE

A quirk is active by default for this device. However, everything was fine prior to that commit, and I am not the only person who has had issues with it, so I wonder whether there is a problem in the code somewhere or whether something could be done.
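
When comparing devices across reports, it can help to read the listing above programmatically. The following is a small sketch under stated assumptions: it treats the text as padded "Key: value" lines, as in the output shown, and the function name parse_usb_storage_info is hypothetical. It makes no claim about the file's format on other kernels.

```python
def parse_usb_storage_info(text):
    """Parse padded 'Key: value' lines into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon
            info[key.strip()] = value.strip()
    return info


# Listing copied from the report (trailing padding removed).
sample = """\
   Host scsi4: usb-storage
       Vendor: Super Top
      Product: USB 2.0  IDE DEVICE
Serial Number: ST  Killer
     Protocol: Transparent SCSI
    Transport: Bulk
       Quirks: IGNORE_RESIDUE
"""

if __name__ == "__main__":
    info = parse_usb_storage_info(sample)
    # The Quirks field may list several flags separated by spaces.
    print(info["Vendor"], "|", info["Quirks"].split())
```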

Other people with issues pointing to that commit

http://lkml.org/lkml/2009/5/22/405 (USB/DVB - Old Technotrend TT-connect S-2400 regression tracked down)
http://kerneltrap.com/mailarchive/linux-usb/2008/12/19/4459054/thread (usb: Fix PS3 EHCI suspend)

hardware on machine

00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller (rev 03)
00:07.0 Performance counters: Intel Corporation Device 27a3 (rev 03)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 02)
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
02:00.0 Ethernet controller: Atheros Communications Inc. AR5001 Wireless Network Adapter (rev 01)
03:03.0 FireWire (IEEE 1394): Agere Systems FW322/323 (rev 61)

Let me know if I can provide other information. The machine in question is a production machine in a hosting facility, so my priority was to get it working, as it is now.
Comment 1 Jools Wills 2011-04-01 12:59:12 UTC
Small correction: I am currently running 2.6.37 with this patch.
Comment 2 Jools Wills 2011-04-01 12:59:58 UTC
Created attachment 52962
patch to revert b963801164618e25fbdc0cd452ce49c3628b46c8
Comment 3 Greg Kroah-Hartman 2011-04-02 20:45:09 UTC
On Fri, Apr 01, 2011 at 12:54:47PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=32432
> 
>            Summary: USB Disconnects / resets after commit
>                     b963801164618e25fbdc0cd452ce49c3628b46c8

Please post this on the linux-usb@vger.kernel.org list so the developers
there can comment on it and revert it if needed.
Comment 4 matejken 2011-04-14 07:40:17 UTC
Original report in Launchpad:

https://bugs.launchpad.net/linux/+bug/349767


Alternative report (most probably a duplicate) with traces created with options:

CONFIG_USB_STORAGE_DEBUG=y
CONFIG_USB=y
CONFIG_USB_DEBUG=y

https://bugs.launchpad.net/linux/+bug/701011
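
To check whether a kernel configuration has the three options listed above enabled before collecting traces, something like the following Python sketch could be used. It is an illustration, not part of the linked reports. It assumes the usual .config text format (NAME=value lines, with disabled options shown as comments), and it counts only "=y" as enabled, matching the options as quoted in this comment. The function names and the sample configuration are hypothetical.

```python
def enabled_options(config_text):
    """Return the set of options set to 'y' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '# CONFIG_X is not set' comments
        name, sep, value = line.partition("=")
        if sep and value == "y":
            enabled.add(name)
    return enabled


def missing_options(config_text, required):
    """List the required options that are not set to 'y'."""
    on = enabled_options(config_text)
    return [opt for opt in required if opt not in on]


# The three options named in this comment.
REQUIRED = ["CONFIG_USB_STORAGE_DEBUG", "CONFIG_USB", "CONFIG_USB_DEBUG"]

# Made-up configuration fragment, for illustration only.
sample_config = """\
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
CONFIG_USB_STORAGE=m
CONFIG_USB_STORAGE_DEBUG=y
"""

if __name__ == "__main__":
    print(missing_options(sample_config, REQUIRED))
```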
Comment 5 Florian Mickler 2011-08-08 14:00:48 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit 004c19682884d4f40000ce1ded53f4a1d0b18206
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Tue Jul 5 12:34:05 2011 -0400

    USB: EHCI: go back to using the system clock for QH unlinks
Comment 6 Greg Kroah-Hartman 2012-02-22 21:13:11 UTC
All USB bugs should be sent to the linux-usb@vger.kernel.org mailing 
list, and not entered into bugzilla.  Please bring this issue up there,
if it is still a problem in the latest kernel release.
Comment 7 Jools Wills 2012-02-23 00:03:48 UTC
You can maybe remove the invalid tag and just mark it as resolved. I believe this was all discussed and fixed for 3.1.

Yeah, as above: fixed.

Is it invalid to report USB bugs to the kernel bug tracker? It's not obvious to users that they should go to the mailing list first.
Comment 8 Florian Mickler 2012-02-23 22:12:03 UTC
No, it is not invalid. But it's a bit difficult to explain.

Every subsystem has different rules for reporting bugs, so it is hard for users to get it right, and they are not expected to.

Above, Greg was just cleaning up orphaned bug entries and was a bit too eager to get them all closed, I think.

I guess he used a script to close all bugs against the USB subsystem? :-)
He probably did not consider that some of these bugs were regressions tracked by the regression tracking team.

From a regression-tracking perspective:

The normal workflow for a subsystem that doesn't want to use the bugzilla would be like this:

1. Someone reports a regression bug against USB to the bugzilla.
2. They get told to report it to the mailing list and post a link to the bug.
3. Sometime later someone scans the regression reports and closes the bugs that have been fixed. (Usually before Rafael posts his regression report.)

OR:

1. Someone reports a regression on a mailing list.
2. Someone (Maciej) sees it and opens a tracking bug on the bugzilla with a reference to the mailing list discussion.
3. Sometime later someone scans the regression reports and closes the bugs that have been fixed.



The difficult and time-consuming part of regression tracking is step 3.

Rafael will probably post some documentation about the regression-tracking workflow later on....
Comment 9 Jools Wills 2012-02-23 22:36:38 UTC
Thanks for the clarification.