Distribution: Red Hat 9
Hardware Environment: HP Omnibook XE3 GC and others
Software Environment: Red Hat 9 (partly updated) / Aurox 9 / SuSE
Problem Description: Since at least kernel 2.6.3, partitions on USB devices (hard disks behind a USB adapter, or USB sticks) get destroyed when writing to them.

Steps to reproduce:
  mkfs.ext3 /dev/sda1    (happens with any filesystem: ext3, ext2, FAT)
  mount /dev/sda1 /mnt/usbdisk
  cp -Rp linux-2.6.9-rc3 /mnt/usbdisk
  cd /mnt/usbdisk
  make mrproper
  make menuconfig    (or copy a .config)
  make

Alternatively, just write a lot of files, delete them, and write again. You will get I/O errors. On ext3 you get an aborted journal.
Is this problem still present in kernel 2.6.13-rc5?
I can't reproduce this bug anymore, not even with 2.6.7. I have changed my hardware and distribution in the meantime. I also tried it on my old HP Omnibook XE3, where I had the problem before, but I have lost my USB 2.0 PCMCIA card. With the built-in USB 1.1 it seems to work fine. The only difference on this machine is that I'm using a newer Aurox distribution. Maybe glibc and/or gcc caused the problem? On the AMD64 I'm using Gentoo 2005.1 and the Gentoo kernel 2.6.12-gentoo-r6. It seems to work fine now.
Sorry, I made a mistake in my retests. I only untarred the archive and ran make menuconfig and make. But if I copy an already compiled kernel tree to the USB disk and then run make mrproper, copy somewhere/.config, and run make, it ends in compile errors, though not in I/O errors. If I then delete the kernel source directory, I get errors in the system log and the filesystem is automatically mounted read-only. I retested with 2.6.13-rc6, so the problem does still exist. Sorry about that. What can I do to help you find a solution? I would really like to use my USB disk and sticks under kernel 2.6.
Created attachment 5636: My .config for 2.6.13-rc6
Just to eliminate some obvious possibilities, can you destroy the partition table and re-create it with fdisk, then format the partition and continue with the rest of the tests? This will eliminate the possibility of funkiness in the partition table. Also, is this a full-speed or high-speed device? If it is high-speed, rmmod ehci_hcd may help data integrity (at the expense of speed). If this fixes it, it's either a device problem or an EHCI HCD problem.
I have similar problems with USB 2.0-ATA converters.

lsusb:
  Bus 001 Device 004: ID 04cf:8818 Myson Century, Inc.

Kernel log:
  usb 1-3: new high speed USB device using ehci_hcd and address 6
  scsi7 : SCSI emulation for USB Mass Storage devices
  usb-storage: device found at 6
  usb-storage: waiting for device to settle before scanning
    Vendor: ST340016  Model: A  Rev: 3.10
    Type:   Direct-Access  ANSI SCSI revision: 00
  SCSI device sda: 78165360 512-byte hdwr sectors (40021 MB)
  sda: assuming drive cache: write through
  SCSI device sda: 78165360 512-byte hdwr sectors (40021 MB)
  sda: assuming drive cache: write through

Running badblocks -svw /dev/sda: the first pass (writing and reading) looks OK except for 1-2 sector errors. After badblocks switches to the second pattern, writing fails with a 100% error rate (end_request: I/O error, dev sda, sector xxx). After unplugging and replugging the drive you can start again, with the same result. I didn't check under USB 1.1 (well, 80 GB at 12 Mbit/s is not for the faint of heart).
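For anyone who wants to repeat a write/read-back check on a mounted USB filesystem rather than on the raw device, here is a minimal Python sketch of the idea. It writes the four byte patterns that badblocks -w uses (0xaa, 0x55, 0xff, 0x00) to a scratch file and compares what comes back. The function name, block size, and file path are assumptions for illustration, and unlike badblocks this goes through the page cache, so a read may be served from RAM instead of the device unless the medium is remounted or the cache is dropped between passes.

```python
import os
import tempfile

# The four patterns badblocks -w writes by default.
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)

def pattern_test(path, size, block=4096):
    """Write each pattern over `size` bytes of `path`, read it back,
    and return a list of (pattern, offset) for blocks that differ."""
    bad = []
    for pattern in PATTERNS:
        expected = bytes([pattern]) * block
        with open(path, "wb") as f:
            for _ in range(size // block):
                f.write(expected)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
        with open(path, "rb") as f:
            offset = 0
            while True:
                chunk = f.read(block)
                if not chunk:
                    break
                if chunk != expected[:len(chunk)]:
                    bad.append((pattern, offset))
                offset += len(chunk)
    return bad

# Demo on a temporary directory; point the path at the USB mount
# (for example a scratch file under /mnt/usbdisk) for a real test.
with tempfile.TemporaryDirectory() as tmp:
    result = pattern_test(os.path.join(tmp, "pattern.bin"), size=64 * 1024)
print(result)
```

An empty list means every block read back as written; any entries give the pattern and file offset of the first mismatching blocks.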
Oops, missed kernel version. Tried 2.6.11.12 and 2.6.14
Just to pitch in here: I have the same problem in 2.6.12-9, using the pre-built Ubuntu kernel. This bug seems to occur when you have been transferring a lot of data. I just got a USB 2.0 SATA-USB external hard drive and I've seen this bug over and over again in the last few days. If I'm reading, all I get is a device lockup. If I'm writing, I tend to get this sort of thing in the root directory:

000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa  ..............U.
00000200: 5252 6141 0000 0000 0000 0000 0000 0000  RRaA............
00000210: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000220: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000230: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000240: 5553 4243 b3ba 0d00 0002 0000 0000 0a2a  USBC...........*
00000250: 0017 4a1c 8300 0001 0000 0000 0000 0000  ..J.............
00000260: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000270: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000280: 5252 6141 0000 0000 0000 0000 0000 0000  RRaA............
00000290: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002c0: 5553 4243 b3ba 0d00 0002 0000 0000 0a2a  USBC...........*
000002d0: 0017 4a1c 8300 0001 0000 0000 0000 0000  ..J.............
000002e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000002f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000300: 5252 6141 0000 0000 0000 0000 0000 0000  RRaA............

I've had the same problems with FAT32 and XFS. I think the device is just dumping USB traffic to disk for some reason.
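One way to check a raw image for this kind of stray USB traffic is to search it for the 4-byte "USBC" marker that opens every mass-storage command wrapper. The sketch below is only an illustration: the function name is made up, and the sample buffer is synthetic, laid out with the same 128-byte spacing seen in the dump rather than read from a real disk.

```python
# "USBC" is the signature at the start of every Bulk-Only Transport
# Command Block Wrapper (0x43425355 stored little-endian).
CBW_SIGNATURE = b"USBC"

def find_cbw_offsets(data):
    """Return every offset in `data` where the CBW signature starts."""
    offsets = []
    pos = data.find(CBW_SIGNATURE)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(CBW_SIGNATURE, pos + 1)
    return offsets

# Synthetic 256-byte buffer (an assumption for illustration) with the
# signature planted 128 bytes apart, mimicking the spacing in the dump.
sample = bytearray(0x100)
sample[0x40:0x44] = CBW_SIGNATURE
sample[0xC0:0xC4] = CBW_SIGNATURE

hits = find_cbw_offsets(bytes(sample))
print([hex(h) for h in hits])  # ['0x40', '0xc0']
```

To scan a real image, read it in with open(path, "rb").read() (or in chunks for large devices) and pass the bytes to find_cbw_offsets. Any hit inside file data or filesystem metadata is suspicious, since a healthy device should never store the wrapper itself.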
I tried 'rmmod ehci_hcd' and then ran md5sum on a 100 GB partition. 24 hours later, it still wasn't done (wow, USB 1.1 is slow!), but the device hadn't locked up either. This was done with a clean (cfdisk) partition table. High-speed access (md5sums of big files) still crashed the drive. This should answer Matthew's questions in comment #5.
I think this is pretty likely an EHCI HCD problem of some sort.
Ok, passing it to david then...
Let's see the "lspci -vv" info for the relevant controller.
Here's the lspci -vv output from the EHCI device:
----------------------
0000:00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86) (prog-if 20 [EHCI])
        Subsystem: Asustek Computer, Inc. A7V600 motherboard
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 0x10 (64 bytes)
        Interrupt: pin C routed to IRQ 21
        Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
----------------------
And in case it's useful, here's the output from one of the UHCI controllers:
----------------------
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81) (prog-if 00 [UHCI])
        Subsystem: Asustek Computer, Inc. A7V600 motherboard
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size: 0x08 (32 bytes)
        Interrupt: pin A routed to IRQ 21
        Region 4: I/O ports at 8400 [size=32]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
----------------------
One thing to try is the patch at http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-04-usb/usb-ehci-unlink-tweaks.patch which changes some code paths that have been problematic on VIA hardware for a long time.

Otherwise the only interesting PCI-level thing is "MemWINV+", which is not something most hardware claims to support. It could be that it's not _correctly_ supported here. It'd be interesting to see what turning that bit off does; you could do that with "setpci", or more simply by commenting out the call to pci_set_mwi() in the EHCI startup, rebuilding, and rebooting after a power cycle.

Re Comment #8, notice that the pattern (starts with USBC, repeats 128 bytes apart) is part of a mass-storage CBW (Command Block Wrapper, i.e. a request); those are 31 bytes each. The RRaA might be some FAT labeling thing. The fact that the device _ever_ sees a CBW as data means that there's some kind of bug.

One possibility is that it's in the usb-storage fault recovery logic, where some request isn't properly terminated (aborted?) and so the next command comes in while the drive is expecting data. It's common for EHCI to trigger fault modes that slower drivers can't, in both hardware and software (including firmware). Another is that it's a PCI-level bug where for some reason a DMA request gets the wrong cacheline. Or, similarly, an EHCI-level bug that sends "old" data.
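To make the CBW observation concrete, the 31 bytes starting at offset 0x240 of the dump quoted earlier in this thread can be unpacked with the field layout from the USB mass-storage Bulk-Only Transport specification. This is a minimal Python sketch, not anything taken from usb-storage itself; the function name and the returned dictionary keys are made up for illustration, and the CDB is interpreted as a WRITE(10) only because its first byte is 0x2a.

```python
import struct

# Bytes 0x240..0x25e copied from the hex dump earlier in this thread.
raw = bytes.fromhex(
    "55534243b3ba0d000002000000000a2a"  # offsets 0x240-0x24f
    "00174a1c83000001"                  # offsets 0x250-0x257
    "00000000000000"                    # offsets 0x258-0x25e
)

def parse_cbw(buf):
    """Unpack a 31-byte Command Block Wrapper (all fields little-endian)."""
    sig, tag, data_len, flags, lun, cb_len, cdb = struct.unpack(
        "<4sIIBBB16s", buf
    )
    info = {
        "signature": sig,
        "tag": tag,
        "data_len": data_len,
        "direction": "in" if flags & 0x80 else "out",
        "lun": lun & 0x0F,
        "cdb": cdb[:cb_len & 0x1F],
    }
    # SCSI WRITE(10): opcode 0x2a, big-endian LBA in bytes 2-5,
    # big-endian block count in bytes 7-8.
    if info["cdb"] and info["cdb"][0] == 0x2A:
        info["lba"] = struct.unpack(">I", cdb[2:6])[0]
        info["blocks"] = struct.unpack(">H", cdb[7:9])[0]
    return info

cbw = parse_cbw(raw)
print(cbw["signature"], cbw["direction"], cbw["data_len"],
      hex(cbw["lba"]), cbw["blocks"])
```

Read this way, the stray bytes are a host-to-device request to write one 512-byte block at LBA 0x174a1c83, which fits the idea that a write command ended up stored as data.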
Please reopen this bug if:
- it is still present in kernel 2.6.17, and
- you can answer the points raised in Comment #14.
I stayed away from USB drives because of the corruption issue, but I just got a new 500 GB drive that my BIOS wouldn't recognize, so I thought I'd give it another go (until I figured out that it needed a jumper to drop down to SATA-1). I created a filesystem, mounted it, and used lftp to download about 26 GB of data onto it. That seemed to work fine. When I then started running md5sum on the files, the whole filesystem was trashed pretty quickly. I noticed the following in the log:

[ 9601.083861] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9631.289475] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9641.519123] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9667.398810] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9697.604425] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9707.834071] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9715.063927] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9745.281526] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9755.527152] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9794.288764] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9804.518405] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9855.275198] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9865.508829] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9904.266445] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9914.496101] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9950.897014] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9961.130654] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10012.510570] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10022.740203] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10063.898454] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10074.128097] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10121.741292] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10131.982926] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10149.801915] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10180.007532] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10190.237186] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10243.390604] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10253.620242] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10261.477216] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10291.682837] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10301.912477] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10308.339460] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10338.553064] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10348.782713] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10355.413418] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10385.619024] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10395.848669] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10446.094166] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10448.227179] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10478.432792] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10488.662436] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10494.917663] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10525.135260] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10535.364904] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10575.780195] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10586.017833] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10622.231026] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10622.342856] usb 5-1: device descriptor read/64, error -71
[10622.558556] usb 5-1: device descriptor read/64, error -71
[10622.774253] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10622.886094] usb 5-1: device descriptor read/64, error -71
[10623.101794] usb 5-1: device descriptor read/64, error -71
[10623.317490] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10623.724916] usb 5-1: device not accepting address 6, error -71
[10623.836761] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[10624.244196] usb 5-1: device not accepting address 6, error -71
[10624.244237] usb 5-1: USB disconnect, address 6
[10624.244515] sd 5:0:0:0: scsi: Device offlined - not ready after error recovery

I got exactly the same problem as before, using Ubuntu Feisty's linux-image-2.6.20-15-generic. I downloaded the source, and it seems that usb-ehci-unlink-tweaks.patch has already been applied. I'm guessing that it got confused after one of those many resets. I did recompile the ehci module without the call to pci_set_mwi, but the kernel I got wasn't functional, and the version didn't match that of the active kernel, so I couldn't just swap in the module. This seems to be some Ubuntu problem.

Anyway, I've now got this drive to work internally, and so won't need the USB adapter for a while again. I certainly don't have time to waste trashing filesystems on my drives. Still, this is a pretty drastic bug and it has existed for nearly 2 years, so I'm asking for it to be reopened.
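The spacing of the reset messages is easier to see as numbers than as raw timestamps. The following Python sketch pulls the kernel timestamps out of the "reset high speed USB device" lines and prints the gap between consecutive resets. The regular expression and function name are assumptions made for this illustration, and the sample text is just the first four lines of the log above.

```python
import re

# Matches lines such as:
# [ 9601.083861] usb 5-1: reset high speed USB device using ehci_hcd ...
RESET_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] usb \S+: reset high speed USB device"
)

def reset_intervals(log_text):
    """Return the gaps in seconds between consecutive reset messages."""
    times = [float(m.group(1)) for m in RESET_RE.finditer(log_text)]
    return [round(later - earlier, 1)
            for earlier, later in zip(times, times[1:])]

sample = """\
[ 9601.083861] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9631.289475] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9641.519123] usb 5-1: reset high speed USB device using ehci_hcd and address 6
[ 9667.398810] usb 5-1: reset high speed USB device using ehci_hcd and address 6
"""

gaps = reset_intervals(sample)
print(gaps)  # [30.2, 10.2, 25.9]
```

Feeding the whole log through the same function (for example from a saved dmesg file) shows how often gaps of roughly 30 and 10 seconds recur before the final burst of error -71 messages, which on Linux is EPROTO (protocol error).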
Is there any update on this problem? Has the issue been addressed? Thanks.
I am closing this bug, since there have been no updates or complaints. Please reopen it if the problem is confirmed with the latest kernel.