Bug 3505 - usb-storage since at least 2.6.3: destroys partitions on hdd and stick
Status: REJECTED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: i386 Linux
Importance: P2 blocking
Assignee: David Brownell
URL:
Keywords:
Depends on:
Blocks: USB
Reported: 2004-10-03 10:04 UTC by Thomas Kurt Sch
Modified: 2008-03-04 00:53 UTC

See Also:
Kernel Version: 2.6.3 through at least 2.6.9-rc3
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
My .config for 2.6.13-rc6 (28.53 KB, text/plain)
2005-08-14 11:55 UTC, Thomas Kurt Sch

Description Thomas Kurt Sch 2004-10-03 10:04:59 UTC
Distribution: Red Hat 9
Hardware Environment: HP Omnibook XE 3 GC and others
Software Environment: Red Hat 9 (partly updated) / Aurox 9 / SuSE
Problem Description:
Since at least kernel 2.6.3, partitions on USB devices (HDDs with a USB adapter
or USB sticks) get destroyed when writing to them.

Steps to reproduce:
mkfs.ext3 /dev/sda1 (happens with any filesystem: ext3, ext2, FAT)
mount /dev/sda1 /mnt/usbdisk
cp -Rp linux-2.6.9-rc3 /mnt/usbdisk
cd /mnt/usbdisk
make mrproper
make menuconfig or copy .config 
make

Or just write a lot of files, delete them, and write again.

You will get I/O errors. On ext3 you get an aborted journal.
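
The "write a lot of files and delete and write" variant could be scripted roughly as follows. This is a minimal Python sketch, not the reporter's procedure: the function name churn, the file names, and the sizes are assumptions made for illustration. A read-back right after writing may be served from the page cache, so on a real device the filesystem would need to be remounted before the verification means much.

```python
import hashlib
import os
import tempfile


def churn(directory, rounds=3, files_per_round=20, size=64 * 1024):
    """Write, verify, and delete files; return the paths that read back wrong."""
    bad = []
    for r in range(rounds):
        expected = {}
        for i in range(files_per_round):
            # Hypothetical file naming, chosen only for this sketch.
            path = os.path.join(directory, f"churn-{r}-{i}.bin")
            data = os.urandom(size)
            with open(path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ask the kernel to push data to the device
            expected[path] = hashlib.md5(data).hexdigest()
        for path, digest in expected.items():
            with open(path, "rb") as f:
                if hashlib.md5(f.read()).hexdigest() != digest:
                    bad.append(path)
            os.remove(path)
    return bad


if __name__ == "__main__":
    # Harmless demonstration on a temporary directory. To exercise a USB
    # device, pass the mount point instead (e.g. /mnt/usbdisk as above).
    with tempfile.TemporaryDirectory() as d:
        print(churn(d))
```

An empty list means every file read back intact. I/O errors of the kind described above would surface as OSError exceptions rather than as entries in the list.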
Comment 1 Adrian Bunk 2005-08-05 02:54:05 UTC
Is this problem still present in kernel 2.6.13-rc5?
Comment 2 Thomas Kurt Sch 2005-08-14 09:06:55 UTC
I can't reproduce this bug anymore, not even with 2.6.7.
I have changed my hardware and distribution in the meantime.
I also tried it on my old HP Omnibook XE3, where I had the problem before, but
I have lost my USB 2.0 PCMCIA card. With the built-in USB 1.1 it seems to work
fine. The only difference on this machine is that I'm using a newer Aurox
distribution. Maybe glibc and/or gcc caused the problem?

On the AMD64 I'm using Gentoo 2005.1 and the gentoo kernel 2.6.12-gentoo-r6.
It seems to work fine now.
Comment 3 Thomas Kurt Sch 2005-08-14 11:53:00 UTC
Sorry, I made a mistake in my retests. I just untarred the archive and ran
make menuconfig
and
make

But if I copy an already compiled kernel to the USB disk and do
make mrproper
copy somewhere/.config
make
it ends in compile errors, but not in I/O errors. If I then delete
the kernel source directory, I get errors in the system log and the filesystem is
automatically remounted read-only.

I have retested with 2.6.13-rc6.

So the problem does still exist. Sorry for this.

What can I do to help you to find a solution? I would really like to use my
USB-Disk and Sticks under Kernel 2.6.
Comment 4 Thomas Kurt Sch 2005-08-14 11:55:38 UTC
Created attachment 5636
My .config for 2.6.13-rc6
Comment 5 Matthew Dharm 2005-08-18 13:51:27 UTC
Just to eliminate some obvious possibilities, can you destroy the partition
table and re-create it with fdisk, then format the partition and continue with
the rest of the tests?

This will eliminate the possibility of funkiness in the partition table.

Also, is this a full-speed or high-speed device?  If it is high-speed, rmmod
ehci_hcd may help data integrity (at the expense of speed).  If this fixes it,
it's either a device problem or an EHCI HCD problem.
Comment 6 Victor Moroz 2005-11-08 10:05:34 UTC
I have similar problems with USB 2.0-ATA converters.

lsusb: Bus 001 Device 004: ID 04cf:8818 Myson Century, Inc.

usb 1-3: new high speed USB device using ehci_hcd and address 6
scsi7 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 6
usb-storage: waiting for device to settle before scanning
  Vendor: ST340016  Model: A                 Rev: 3.10
  Type:   Direct-Access                      ANSI SCSI revision: 00
SCSI device sda: 78165360 512-byte hdwr sectors (40021 MB)
sda: assuming drive cache: write through
SCSI device sda: 78165360 512-byte hdwr sectors (40021 MB)
sda: assuming drive cache: write through

badblocks -svw /dev/sda: the first pass (writing/reading) looks OK except for 1-2
sector errors. After badblocks switches to writing the second pattern, the error
rate is 100% (end_request: I/O error, dev sda, sector xxx). After unplugging and
replugging the drive you can start again, with the same result.
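
For readers unfamiliar with what a write-mode badblocks run does, the idea can be sketched in Python: fill the target with a fixed byte pattern, read it back, and repeat with the next pattern. This is only an illustration run against an ordinary file. The function name pattern_test and its parameters are made up for this sketch, and the pattern list is the set badblocks documents for its -w mode. It overwrites its target, so it must never be pointed at anything holding data.

```python
import os
import tempfile

# Byte patterns documented for badblocks' destructive write-mode test.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def pattern_test(path, blocks=256, block_size=512):
    """Return (pattern, block_index) pairs whose read-back did not match."""
    errors = []
    for pattern in PATTERNS:
        expected = bytes([pattern]) * block_size
        with open(path, "wb") as f:
            for _ in range(blocks):
                f.write(expected)
            f.flush()
            os.fsync(f.fileno())  # flush to the device before reading back
        with open(path, "rb") as f:
            for index in range(blocks):
                if f.read(block_size) != expected:
                    errors.append((pattern, index))
    return errors


if __name__ == "__main__":
    # Demonstration on a scratch file only.
    with tempfile.TemporaryDirectory() as d:
        print(pattern_test(os.path.join(d, "scratch.bin")))
```

In the failure described above, the first pattern would mostly pass and the second would fail on every block, although on real hardware the failures show up as I/O errors from the kernel rather than as mismatches.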

Didn't check under USB 1.1 (well, 80 GB at 12 Mbit/sec is not for the faint of
heart).
Comment 7 Victor Moroz 2005-11-08 10:09:00 UTC
Oops, I forgot the kernel version: I tried 2.6.11.12 and 2.6.14.
Comment 8 Adapted 2005-11-08 14:18:27 UTC
Just to pitch in here: I have the same problem in 2.6.12-9, using the pre-built
Ubuntu kernel. This bug seems to occur when you have been transferring a lot of
data. I just got a USB 2.0 SATA-USB external hard drive and I've seen this bug
over and over again in the last few days. If I'm reading, all I get is a device
lockup. If I'm writing, I tend to get this sort of thing in the root directory:

000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.
00000200: 5252 6141 0000 0000 0000 0000 0000 0000  RRaA............
00000210: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000220: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000230: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000240: 5553 4243 b3ba 0d00 0002 0000 0000 0a2a  USBC...........*
00000250: 0017 4a1c 8300 0001 0000 0000 0000 0000  ..J.............
00000260: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000270: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000280: 5252 6141 0000 0000 0000 0000 0000 0000  RRaA............
00000290: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002c0: 5553 4243 b3ba 0d00 0002 0000 0000 0a2a  USBC...........*
000002d0: 0017 4a1c 8300 0001 0000 0000 0000 0000  ..J.............
000002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000300: 5252 6141 0000 0000 0000 0000 0000 0000  RRaA............

I've had the same problems with FAT32 and XFS. I think the device is just
dumping USB traffic to disk for some reason.
Comment 9 Adapted 2005-11-10 12:23:19 UTC
I tried 'rmmod ehci_hcd' and then md5sum on a 100 GB partition. 24 hours later,
it still wasn't done (wow, USB 1.1 is slow!), but the device hadn't locked up either.
This was done with a clean (cfdisk) partition table. High speed access (md5sums
of big files) still crashed the drive.
This should answer Matthew's questions in comment #5.
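
The md5sum read test and the "24 hours later" remark can be put into numbers with a short Python sketch. The helper names md5_of and min_hours are invented for this illustration, and the estimate assumes 100 GB means 100e9 bytes and that the full 12 Mbit/s full-speed signalling rate is available for payload, which real transfers do not reach.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Hash a file or block device in 1 MiB chunks, like md5sum does on one file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def min_hours(size_bytes, mbit_per_s):
    """Lower bound on transfer time, ignoring all protocol overhead."""
    seconds = size_bytes * 8 / (mbit_per_s * 1e6)
    return seconds / 3600


# 100 GB over a 12 Mbit/s link: about 18.5 hours at the very least.
print(round(min_hours(100e9, 12), 1))
```

Since that figure is a floor rather than a realistic throughput, a run that is still going after 24 hours is not surprising. Running md5_of twice over the same file and comparing the results is a cheap way to spot read corruption.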
Comment 10 Matthew Dharm 2005-11-20 18:04:41 UTC
I think this is pretty likely an EHCI HCD problem of some sort.
Comment 11 Greg Kroah-Hartman 2006-02-09 13:53:49 UTC
OK, passing it to David then.
Comment 12 David Brownell 2006-02-09 14:15:37 UTC
Let's see the "lspci -vv" info for the relevant controller. 
Comment 13 Adapted 2006-02-22 16:46:58 UTC
Here's lspci -vv output from the EHCI device:
----------------------
0000:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86) (prog-if 20 [EHCI])
	Subsystem: Asustek Computer, Inc. A7V600 motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, Cache Line Size: 0x10 (64 bytes)
	Interrupt: pin C routed to IRQ 21
	Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=256]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
----------------------
And in case it's useful here's the output from one of the UHCI controllers:
----------------------
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
	Subsystem: Asustek Computer, Inc. A7V600 motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 32, Cache Line Size: 0x08 (32 bytes)
	Interrupt: pin A routed to IRQ 21
	Region 4: I/O ports at 8400 [size=32]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
----------------------
Comment 14 David Brownell 2006-02-22 20:38:50 UTC
One thing to try is the patch at 
 
http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-04-usb/usb-ehci-unlink-tweaks.patch 
 
which changes some code paths that have been problematic on 
VIA hardware for a long time. 
 
Otherwise the only interesting pci-level thing is "MemWINV+" 
which is not something most hardware claims to support.  It 
could be that it's not _correctly_ supported here.  It'd be 
interesting to see what turning that bit off does; you could 
do that with "setpci", or more simply by commenting out the 
call to pci_set_mwi() in the EHCI startup, rebuilding, and 
rebooting after a power cycle. 
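
To make the "MemWINV+" bit concrete: lspci's "Control:" line is a rendering of the 16-bit PCI command register at config offset 0x04, and Memory Write and Invalidate is bit 4 of it. The Python sketch below only computes the values involved, using the Control line from comment 13. It does not touch hardware, and the helper names are invented for illustration.

```python
# lspci prints the low bits of the PCI command register (config offset
# 0x04) in this order, starting at bit 0.
COMMAND_BITS = ["I/O", "Mem", "BusMaster", "SpecCycle", "MemWINV",
                "VGASnoop", "ParErr", "Stepping", "SERR", "FastB2B"]

# "Control:" line of the EHCI controller, copied from comment 13.
CONTROL = ("I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- "
           "ParErr- Stepping- SERR- FastB2B-")


def command_value(control):
    """Rebuild the command register value from lspci's +/- flags."""
    value = 0
    for token in control.split():
        name, state = token[:-1], token[-1]
        if state == "+":
            value |= 1 << COMMAND_BITS.index(name)
    return value


def clear_mwi(value):
    """Return the value with Memory Write and Invalidate (bit 4) disabled."""
    return value & ~(1 << COMMAND_BITS.index("MemWINV"))


before = command_value(CONTROL)
after = clear_mwi(before)
print(f"{before:#06x} -> {after:#06x}")  # prints "0x0017 -> 0x0007"
```

The second value is what would have to be written back to the command register (as root, with setpci) to turn the bit off until the next reset. Whether the driver re-enables it later is a separate question.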
 
Re Comment #8, notice that the pattern (starts with USBC, 128 bytes apart) 
is part of a mass storage CBW (Command Block Wrapper, the request); those are 
31 bytes each.  The RRaA is probably the FAT32 FSInfo sector signature. 
 
The fact that the device _ever_ sees a CBW as data means that 
there's some kind of bug. 
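
The bytes at offset 0x240 in the dump from comment 8 can be decoded to check this reading. The sketch below unpacks them in Python according to the Bulk-Only Transport CBW layout (signature, tag, data length, flags, LUN, command length, 16-byte command block) and then reads the command block as a SCSI WRITE(10). The variable names are chosen for this example, and the hex string is copied from the two dump lines starting at 00000240.

```python
import struct

# Two dump lines from comment 8 (offsets 0x240 and 0x250); a CBW is the
# first 31 bytes.
DUMP = ("5553 4243 b3ba 0d00 0002 0000 0000 0a2a "
        "0017 4a1c 8300 0001 0000 0000 0000 0000")
cbw = bytes.fromhex(DUMP)[:31]

# CBW fields are little-endian: signature, tag, data transfer length,
# flags, LUN, command block length, command block (padded to 16 bytes).
signature, tag, data_len, flags, lun, cb_len, cb = struct.unpack(
    "<4sIIBBB16s", cbw)

# The command block is a SCSI CDB with big-endian fields. For a 10-byte
# READ/WRITE: opcode, flags, LBA, group, transfer length, control.
opcode, _, lba, _, blocks, _ = struct.unpack(">BBIBHB", cb[:cb_len])

direction = "device-to-host" if flags & 0x80 else "host-to-device"
print(signature, hex(tag), data_len, direction, lun, cb_len)
print(hex(opcode), lba, blocks)
```

The fields decode to signature USBC, a 512-byte host-to-device transfer, and opcode 0x2a (WRITE(10)) for one block at LBA 390732931. In other words, this is a well-formed write request that ended up stored as sector data.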
 
One possibility is that it's in usb-storage fault recovery logic, 
where some request isn't properly terminated (aborted?) and so 
the next command comes in while the drive is expecting data. 
It's common that EHCI triggers fault modes that slower drivers 
can't ... both hardware and software (including firmware). 
 
Another is that it's a PCI level bug where for some reason a 
DMA request gets the wrong cacheline.  Or similarly, an EHCI 
level bug that sends "old" data. 
Comment 15 Adrian Bunk 2006-08-19 07:33:19 UTC
Please reopen this bug if:
- it is still present in kernel 2.6.17 and
- you can answer the points raised in Comment #14.
Comment 16 Adapted 2007-06-03 09:25:22 UTC
I stayed away from USB drives because of the corruption issue,
but I just got a new 500 gig drive that my BIOS wouldn't recognize,
so I thought I'd give it another go (until I figured out that it
needed a jumper to drop down to SATA-1).

I created a filesystem, mounted it, and used lftp to download
about 26 gig of data onto it. That seemed to work fine. When I
then started doing md5sum on the files, the whole filesystem
was trashed pretty quickly. I noticed the following in the log:

[ 9601.083861] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9631.289475] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9641.519123] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9667.398810] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9697.604425] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9707.834071] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9715.063927] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9745.281526] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9755.527152] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9794.288764] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9804.518405] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9855.275198] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9865.508829] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9904.266445] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9914.496101] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9950.897014] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9961.130654] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10012.510570] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10022.740203] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10063.898454] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10074.128097] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10121.741292] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10131.982926] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10149.801915] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10180.007532] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10190.237186] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10243.390604] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10253.620242] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10261.477216] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10291.682837] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10301.912477] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10308.339460] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10338.553064] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10348.782713] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10355.413418] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10385.619024] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10395.848669] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10446.094166] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10448.227179] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10478.432792] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10488.662436] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10494.917663] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10525.135260] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10535.364904] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10575.780195] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10586.017833] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10622.231026] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10622.342856] usb 5-1: device descriptor read/64, error -71
[10622.558556] usb 5-1: device descriptor read/64, error -71
[10622.774253] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10622.886094] usb 5-1: device descriptor read/64, error -71
[10623.101794] usb 5-1: device descriptor read/64, error -71
[10623.317490] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10623.724916] usb 5-1: device not accepting address 6, error -71
[10623.836761] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10624.244196] usb 5-1: device not accepting address 6, error -71
[10624.244237] usb 5-1: USB disconnect, address 6
[10624.244515] sd 5:0:0:0: scsi: Device offlined - not ready after error recovery
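
The spacing of the resets in a log like this can be summarised with a few lines of Python. This is a sketch under assumptions: the regular expression is written against the exact message format shown above, and the function names are made up. Only the first four log lines are embedded here; the full log would be read from a file.

```python
import re

# Matches lines such as:
# [ 9601.083861] usb 5-1: reset high speed USB device using ehci_hcd and address 6
RESET = re.compile(r"^\[\s*(\d+\.\d+)\] usb \S+: reset high speed USB device")

SAMPLE = """\
[ 9601.083861] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9631.289475] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9641.519123] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9667.398810] usb 5-1: reset high speed USB device using ehci_hcd and address 6
""".splitlines()


def reset_times(lines):
    """Return the kernel timestamps (seconds) of all reset messages."""
    return [float(m.group(1)) for m in map(RESET.match, lines) if m]


def gaps(times):
    """Return the time between consecutive resets, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


print(gaps(reset_times(SAMPLE)))
```

For the sample lines the gaps are roughly 30.2 s, 10.2 s and 25.9 s. Gaps near 30 s and 10 s recur throughout the log, which could point to commands timing out before each reset, though the log itself does not say so.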

I got exactly the same problem as before, using Ubuntu Feisty's
linux-image-2.6.20-15-generic - I downloaded the source, and it
seems that usb-ehci-unlink-tweaks.patch has already been applied.
I'm guessing that it got confused after one of those many resets.

I did recompile the ehci module without the call to pci_set_mwi,
but the kernel I got wasn't functional, and the version didn't
match that of the active kernel so I couldn't just swap in the
module - this seems to be some Ubuntu problem.

Anyway, I've now got this drive to work internally, and so won't
need the USB adapter for a while again - I certainly don't have
time to waste trashing filesystems on my drives. Still, this is
a pretty drastic bug and it has existed for nearly 2 years so
I'm asking for it to be reopened.
Comment 17 Natalie Protasevich 2007-09-22 19:25:32 UTC
Any update on this problem, please? Has the issue been addressed?
Thanks.
Comment 18 Natalie Protasevich 2008-03-04 00:53:28 UTC
I am closing this bug, since there have been no updates or complaints.
Please reopen if the problem is confirmed with the latest kernel.
