Bug 4717

Summary: Large USB Storage transactions hang kernel.
Product: IO/Storage
Reporter: dlamblin (dlamblin)
Component: Other
Assignee: io_other
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: high
CC: stern
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.12-rc5
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: .config
             dmesg after first reboot after first hanging.

Description dlamblin 2005-06-06 06:14:07 UTC
Distribution: Gentoo 2005.0
Hardware Environment: ix86 (PII) Desktop
Software Environment: Gentoo with a genkernel-built 2.6.11-gentoo-r9 or
2.6.12-rc5; see attachments for .config and dmesg
Problem Description:

from: http://bugs.gentoo.org/show_bug.cgi?id=94772
First occurrence was with 2.6.11-gentoo-r9; it recurred with 2.6.12-rc5.
Large transactions on a USB drive (LaCie Hard Drive Extreme Triple 250GB) hang
the system.  This happens both when formatting the full-size partition and when
writing 100-500MB onto a 10GB formatted partition.

Subsequent testing showed that booting with noapic and acpi=off allowed a 512MB
cp to complete.  However, I then started another 512MB cp to this drive and,
when it had about 2 minutes left to go, issued a `shutdown +5 min`.  The system
hung once I received the message that it was shutting down in 3 minutes.  The
cp did not complete; it stopped at 420MB.
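
A quick way to confirm that a test boot really used these options is to look at
/proc/cmdline.  The helper below is only a sketch; the name has_boot_param is
made up for illustration and was not part of the original testing.

```shell
# Hypothetical helper: succeed if boot parameter $1 appears as a whole word
# on the kernel command line.  $2 is optional and overrides /proc/cmdline,
# which makes the check easy to try on a sample string.
has_boot_param() {
    if [ "$#" -ge 2 ]; then
        cmdline=$2
    else
        cmdline=$(cat /proc/cmdline)
    fi
    # Pad with spaces so only complete, space-separated words match.
    case " $cmdline " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}
```

For example, `has_boot_param noapic && has_boot_param acpi=off && echo "both
options present"` prints the message only when both options were passed at boot.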

There are no messages on the console when this happens; system is hardlocked,
and can't be pinged.

Steps to reproduce:
CASE 1: mkfs.xfs /dev/sda1  # where /dev/sda1 is a 250GB partition
CASE 2: (on a 10GB xfs partition) cp a file that's 64MB or more onto the
mounted drive.
Actual Results:  
CASE 1: The drive blinked that it was busy; it kept blinking (without making
any discernible noise) for 8 hours.  Unplugging USB had no effect.  The
console was unresponsive and the network interface wouldn't respond to pings.
There were no messages from the kernel.
CASE 2: The drive blinked that it was busy for about an hour; after the first
15 seconds the network interface stopped responding to pings; no console
messages were printed.

CASE 3: Boot with noapic and acpi=off, then cp a 512MB file to the drive.
Success.
CASE 4: Boot with noapic and acpi=off, then cp a 512MB file to the drive; at
the half-way point, schedule a shutdown 5 or more minutes out.  The system
should hang before either the countdown or the cp completes.

[it can remain hung for a whole weekend]
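
Since the hang leaves no kernel messages, it can help to know how far a write
got before the lockup.  The sketch below is a variant of CASE 2 under stated
assumptions, not the procedure used in this report: it appends fixed-size
chunks of zeros and calls sync after each one, printing a running total.  The
function name chunked_write and the file name testfile are hypothetical.

```shell
# Hypothetical helper: write CHUNKS chunks of CHUNK_MB MiB each to TARGET,
# flushing after every chunk so that each progress line is printed only
# after sync has returned for that chunk.
chunked_write() {
    target=$1            # file on the mounted USB partition
    chunks=$2            # number of chunks to write
    chunk_mb=${3:-16}    # chunk size in MiB (default 16)

    : > "$target" || return 1    # create or truncate the target file
    i=0
    while [ "$i" -lt "$chunks" ]; do
        # dd writes to stdout when no of= is given; append it to the target.
        dd if=/dev/zero bs=1048576 count="$chunk_mb" 2>/dev/null >> "$target" \
            || return 1
        sync
        i=$((i + 1))
        echo "wrote $((i * chunk_mb)) MiB"
    done
}
```

For example, `chunked_write /mnt/usb1/testfile 32 16` writes 512 MiB in 16 MiB
chunks; the last "wrote N MiB" line left on the console marks the last chunk
for which sync returned.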
Comment 1 dlamblin 2005-06-06 06:17:21 UTC
Created attachment 5131 [details]
.config

Originally from Gentoo Boot CD 2005.0 adapted to 2.6.11-gentoo-r9 and
oldconfiged to Linux 2.6.12-rc5
Comment 2 dlamblin 2005-06-06 06:18:17 UTC
Created attachment 5132 [details]
dmesg after first reboot after first hanging.
Comment 3 Alan Stern 2005-07-18 13:09:29 UTC
Does 2.6.13-rc3 work any better?
Comment 4 dlamblin 2005-07-29 15:04:57 UTC
It looks like it might have, since I got no deadlocking:

# time cp l /mnt/usb1/e; ls -l /mnt/usb1/e; time cat l >> /mnt/usb1/e; ls -l /mnt/usb1/e

real    7m51.361s
user    0m0.374s
sys     0m12.518s
-rw-r--r--  1 root root 536870912 Jul 29 17:28 /mnt/usb1/e

real    9m20.248s
user    0m0.351s
sys     0m12.272s
-rw-r--r--  1 root root 1073741824 Jul 29 17:37 /mnt/usb1/e
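
For reference, the two timings above work out to roughly 1 MiB/s each (512 MiB
copied, then another 512 MiB appended).  A small sketch of the arithmetic,
using a hypothetical helper named throughput that takes the byte count plus the
minutes and seconds from the `real` line.  The figures are approximate, because
cp and cat can return before all data has been flushed to the drive.

```shell
# Hypothetical helper: throughput BYTES MINUTES SECONDS
# Prints the average rate in MiB/s, rounded to two decimals.
throughput() {
    # LC_ALL=C keeps "." as the decimal point regardless of locale.
    LC_ALL=C awk -v b="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.2f\n", b / 1048576 / (m * 60 + s) }'
}
```

`throughput 536870912 7 51.361` prints 1.09 for the initial cp, and
`throughput 536870912 9 20.248` prints 0.91 for the append.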
Comment 5 Alan Stern 2005-07-29 18:59:03 UTC
If large I/O transfers continue to work okay, feel free to close this bug.  If
any other problems crop up, let us know.

In case you're interested, 2.6.13 contains a new, different error-recovery
scheme for usb-storage.  Actually it's the scheme used by Windows.  While it's
not inherently superior to the old recovery technique, lots of drives and
USB-IDE converters respond better to it.  No surprise -- you can guess what the
vendors use for testing!

I can't be certain that a failure of error recovery was responsible for the
hangs you experienced.  It might merely have been the trigger for some deeper
problem.
Comment 6 dlamblin 2005-08-04 14:07:57 UTC
Yes, I've worked with the drive for a few days now; the entire 250GB was
formatted as XFS (something that did not work previously), and large files were
copied to and from it.