Bug 4717 - Large USB Storage transactions hang kernel.
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 high
Assignee: io_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-06-06 06:14 UTC by dlamblin
Modified: 2005-08-04 14:07 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.12-rc5
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
.config (37.13 KB, text/plain)
2005-06-06 06:17 UTC, dlamblin
dmesg after first reboot after first hanging. (11.55 KB, text/plain)
2005-06-06 06:18 UTC, dlamblin

Description dlamblin 2005-06-06 06:14:07 UTC
Distribution: Gentoo 2005.0
Hardware Environment: ix86 (PII) Desktop
Software Environment: Gentoo with either a genkernel-built 2.6.11-gentoo-r9 or
2.6.12-rc5. See the attachments for .config and dmesg.
Problem Description:

From: http://bugs.gentoo.org/show_bug.cgi?id=94772
The first occurrence was with 2.6.11-gentoo-r9; it recurred with 2.6.12-rc5.
Large transactions on a USB drive (LaCie Hard Drive Extreme Triple 250 GB) hang
the system. Both formatting the full-size partition and writing 100-500 MB
onto a 10 GB formatted partition trigger the hang.

Subsequent testing showed that booting with noapic and acpi=off allowed a 512 MB
cp to complete. However, when I started another 512 MB cp to this drive and, with
about 2 minutes to go, issued `shutdown +5 min`, the system hung once I received
the message "system shutting down in 3 minutes". The cp did not complete; it
stopped at 420 MB.

There are no messages on the console when this happens; the system is
hard-locked and can't be pinged.

Steps to reproduce:
CASE 1: mkfs.xfs /dev/sda1   # where /dev/sda1 is a 250 GB partition
CASE 2: (on a 10 GB XFS partition) cp a file of 64 MB or more onto the
mounted drive.
Actual Results:
CASE 1: The drive's activity light blinked busy for 8 hours, without the drive
making any discernible noise. Unplugging the USB cable had no effect. The
console was unresponsive, the network interface didn't respond to pings, and
there were no messages from the kernel.
CASE 2: The drive blinked busy for about an hour; after the first 15 seconds
the network interface stopped responding to pings, and no console messages
were printed.

CASE 3: Boot with noapic and acpi=off, then cp a 512 MB file to the drive.
Result: success.
CASE 4: Boot with noapic and acpi=off, then cp a 512 MB file to the drive; about
halfway through, schedule a shutdown 5 or more minutes out. The system should
hang before the countdown or the file copy completes.

[It can remain hung for a whole weekend.]
Comment 1 dlamblin 2005-06-06 06:17:21 UTC
Created attachment 5131 [details]
.config

Originally from the Gentoo 2005.0 boot CD, adapted to 2.6.11-gentoo-r9, and
then updated to Linux 2.6.12-rc5 with `make oldconfig`.
Comment 2 dlamblin 2005-06-06 06:18:17 UTC
Created attachment 5132 [details]
dmesg after first reboot after first hanging.
Comment 3 Alan Stern 2005-07-18 13:09:29 UTC
Does 2.6.13-rc3 work any better?
Comment 4 dlamblin 2005-07-29 15:04:57 UTC
It looks like it might, since I got no deadlocking:

# time cp l /mnt/usb1/e; ls -l /mnt/usb1/e; time cat l >> /mnt/usb1/e; ls -l /mnt/usb1/e

real    7m51.361s
user    0m0.374s
sys     0m12.518s
-rw-r--r--  1 root root 536870912 Jul 29 17:28 /mnt/usb1/e

real    9m20.248s
user    0m0.351s
sys     0m12.272s
-rw-r--r--  1 root root 1073741824 Jul 29 17:37 /mnt/usb1/e
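For anyone re-checking another kernel, here is a minimal bash sketch that repeats the test above: it builds a zero-filled source file with dd, copies it to the drive, appends it once more, syncs, and checks that the target holds exactly two copies. This is not the reporter's exact procedure. The helper name large_write_test and its arguments are hypothetical, the zero-filled contents are an assumption (the contents of the file `l` are unknown), and only the /mnt/usb1 mount point and the 512 MiB size (536870912 bytes) come from the output above.

```shell
#!/usr/bin/env bash
# Sketch of the large-write check from comment 4, not the reporter's exact
# commands. Assumptions: the drive is already mounted (comment 4 used
# /mnt/usb1); large_write_test and its arguments are hypothetical.

# The body runs in a subshell ( ... ) so set -e and the EXIT trap stay local
# and do not leak into an interactive shell that sources this file.
large_write_test() (
    set -e
    target=${1:?usage: large_write_test DIR [SIZE_MB]}
    size_mb=${2:-512}            # comment 4 used a 512 MiB source file

    src=$(mktemp)
    trap 'rm -f "$src"' EXIT     # remove the source file however we exit

    # Source file: size_mb MiB of zeros (zero contents are an assumption).
    dd if=/dev/zero of="$src" bs=1048576 count="$size_mb" 2>/dev/null

    # Same sequence as comment 4: copy once, then append the file again,
    # timing each step.
    time cp "$src" "$target/e"
    time cat "$src" >> "$target/e"

    # Flush dirty pages so the data actually reaches the drive before
    # the test reports success.
    sync

    # The target should now hold exactly two copies of the source.
    expected=$((2 * size_mb * 1048576))
    actual=$(wc -c < "$target/e")
    if [ "$actual" -ne "$expected" ]; then
        echo "size mismatch: got $actual bytes, expected $expected" >&2
        exit 1
    fi
    echo "ok: $actual bytes"
)
```

Usage: source the file, then run `large_write_test /mnt/usb1 512`. The explicit sync matters because cp can return while the data is still only in the page cache; without it, the size check could pass before the USB writes have actually been issued to the drive.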
Comment 5 Alan Stern 2005-07-29 18:59:03 UTC
If large I/O transfers continue to work okay, feel free to close this bug.  If
any other problems crop up, let us know.

In case you're interested, 2.6.13 contains a new, different error-recovery
scheme for usb-storage.  Actually it's the scheme used by Windows.  While it's
not inherently superior to the old recovery technique, lots of drives and
USB-IDE converters respond better to it.  No surprise -- you can guess what the
vendors use for testing!

I can't be certain that a failure of error recovery was responsible for the
hangs you experienced.  It might merely have been the trigger for some deeper
problem.
Comment 6 dlamblin 2005-08-04 14:07:57 UTC
Yes, I've worked with the drive for a few days now; the entire 250 GB was
formatted as XFS (something that did not work previously), and large files
were copied to and from it.
