Bug 6817
Summary: | USB errors and "reset high speed USB device" | ||
---|---|---|---|
Product: | Drivers | Reporter: | Hendryk (Hendryk.Pfeiffer) |
Component: | USB | Assignee: | Alan Stern (stern) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | garkein, greg, protasnb, stern |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.17.1 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 5089 | ||
Attachments: |
dmesg when connecting the DVD writer
dmesg when "growisofs -Z /dev/sr0 -R -J <some_file>" kernel 2.6.16.11 connecting the DVD writer kernel 2.6.16.11 "growisofs -Z /dev/sr0 -R -J <some_file>" Slow down transfers to Lacie DVD drive usb device reset when blanking cdrw usb device reset when writing large file to usb hard drive DVD RW error writing DVD |
Description
Hendryk
2006-07-11 06:06:46 UTC
Your second problem (the "cp: failed to preserve ownership" error) is easy to explain. The filesystem on the memory stick is VFAT, which does not have any way to store file ownership information. Hence when you copy a file to the stick, the cp program gets an error when it tries to set the new file's owner. This has nothing to do with USB; you would get the same error if you copied a file to a VFAT-formatted floppy. The DVD problem requires more information. Try building your 2.6.17 kernel with CONFIG_USB_DEBUG and CONFIG_USB_STORAGE_DEBUG both on. Then attach the dmesg logs showing what happens when you turn on (or plug in) the DVD drive and when you try to run the growisofs program. Created attachment 8541 [details]
dmesg when connecting the DVD writer
Created attachment 8542 [details]
dmesg when "growisofs -Z /dev/sr0 -R -J <some_file>"
Thanks a lot so far! First, on the USB memory stick. I do get corrupt data sometimes, but so far I have failed to reproduce it. What I wrote last time wasn't the bug, you are right. Concerning the DVD writer, I have compiled 2.6.17.1 with CONFIG_USB_DEBUG and CONFIG_USB_STORAGE_DEBUG. I have added the dmesg output that I get when I connect the DVD writer and when the growisofs fails as attachments to this bug. David Brownell sent me this email which was not added to this bug description for some reason. I have some comments and include it here: > On Tuesday 11 July 2006 12:32 pm, Andrew Morton wrote: > > > Problem Description: > > Write errors when large files are transferred; > > sometimes no useful error message; > > in some cases "reset high speed USB device"; details see below > > > > I think this is a software problem because: > > * the same LaCie DVD writer works under Windows > > * the same 1GB USB sticks work under Windows > > * various USB 2.0 memory sticks with 128 MB or less work fine > > I'm curious ... how does the throughput compare between Linux and Windows? > For big blocks; say, what "dd bs=64K" would do. (Preferably measuring > read > and write speeds without a filesystem in the way.) > > I don't really have a handle on this and I'm not likely to get one without > some quality time with a high speed analyser ... Linux seems to write the DVDs with the maximum speed of the DVD writer - Windows (at least NERO or how this program is called), doesn't do that, definitely. > But one theory I've had about this class of errors is that Linux may be > driving the device faster than Windows, and thereby hitting races in > those devices (hardware or firmware) that Windows wouldn't trigger. > > > * boot the computer > > * connect the DVD writer > > * do a "growisofs", get the same error as above > > * disconnect the DVD writer and connect it again > > * do another "growisofs" > > * this one works! I get a DVD burned without any errors. > > * further attempts to "growisofs" all fail unless I reboot and follow > the > > procedure from the beginning > > That's most bizarre. This drive has an external power adapter, yes? > Which means it could retain state unless you power it off ... did you > try this experiment both with and without driver power-off? My DVD writer gets its power from the USB cable. So it was a power off-power on. I'll try to figure out whether this happened because Linux unloads and reloads a number of kernel modules when you disconnect/reconnect the device or whether it happened because the device was switched off and on. > If it works without power-off, and fails with it, that strongly > suggests this is a bug in the peripheral. Strongly enough that I'd > suggest taking it up with LaCie... > > > cp: failed to preserve ownership of '/media/usb/<filename>': Operation > not > > permitted. > > Strange. USB doesn't issue EPERM in many places ... it'd be easy to > rule out (or in!) the USB stack as a factor here. > > The key spot is in core/hcd.c::hcd_submit_urb() when urb->reject is set, > and that should probably change ... it's not a permission error in the > security sense intended by EPERM. You might change that to something > less generic like EBADRQC, and see if that symptom changes. If it does, > that's a solid clue ... > > - Dave It definitely looks like there's something funny with the DVD drive itself. First, it took at least 30 seconds to become ready (that might be normal, I don't know). Second, although the drive executed lots of simple commands okay, it choked on a READ command: Jul 13 00:03:17 myhost kernel: usb-storage: Hardware Error: Logical unit communication failure It's not clear exactly what that means, but it doesn't sound good. After executing a bunch of other commands I don't understand, the drive died on the first WRITE command. Again, I don't know why. It appears that this is not a USB problem -- that is, the communications at the USB level were working okay. It may be a hardware problem with the drive, or it may be a problem with the growisofs application. Somebody who knows how DVD drives are supposed to work should like at this. Created attachment 8572 [details]
kernel 2.6.16.11 connecting the DVD writer
Created attachment 8573 [details]
kernel 2.6.16.11 "growisofs -Z /dev/sr0 -R -J <some_file>"
Hi, I can write DVDs with kernel 2.6.16.11 (from www.kernel.org). I have compiled it with CONFIG_USB_DEBUG and CONFIG_USB_STORAGE_DEBUG and attached the dmesg output when I connect the device and when I burn the DVD with growisofs. At 23:09:38, growisofs is called and checks whether the DVD(+RW) is empty or not. Then, growisofs creates an ISO image and does not use the DVD drive for about 25 seconds. At 23:10:45, the actual writing starts. It is in fact just a dd command. I have truncated the (200 MB) log file until shortly before the writing finishes at 23:25:10. Since writing DVDs is possible with 2.6.16.11, I am not convinced this is a hardware problem. Did the kernel people change anything about the timing of the USB data transfer between 2.6.16.11 and 2.6.17.1? With 2.6.16.11 without USB_DEBUG and USB_STORAGE_DEBUG, I can also get the behaviour that I get a "reset high speed USB device" after about 80% of a 4.2 GB DVD have been written. But _with_ USB_DEBUG and USB_STORAGE_DEBUG, this error never happens. Can it be true that something is slower when huge log files are written, but without the log files, the transfer is too fast for the hardware, and I get these errors? If that's the case, is there a way of patching the kernel and slowing down the USB data transfer (or the sequence of USB commands - you guys know this much better than I do)? If you suggest a patch, I would be happy to try and report what happens. Thanks again for your help! Created attachment 8927 [details]
Slow down transfers to Lacie DVD drive
Nothing about the speed of USB transfers has changed between 2.6.16 and 2.6.17.
This patch for 2.6.17 will slightly slow down the data transfers to your drive.
It turns out that the USB-IDE adapters made by Genesys Logic require a 100
microsecond delay at a certain phase of the transfer protocol; perhaps that
delay will help your drive also.
If it doesn't, you could try reducing the max_sectors setting for your drive
(see the explanation in the FAQ at www.linux-usb.org).
Same here. Occasionally I get "reset high speed USB device" during CD/DVD writing, mostly that CD-R is lost then. I also get this reset message very often when reading/writing to my USB attached hard drive. These problems came up after 2.6.16, now I'm using 2.6.18-rc2 (updated to that since the problems appeared in 2.6.17). The enclosure for the DVD writer has an ALI chip. (see also bug #6935) Hi, many thanks to Alan Stern! The patch does the job. With 2.6.17.1 and this patch, I was able to burn a number of DVDs. No problem at all. Great! From what google says when you search for `reset high speed USB device', many people get such strange errors with various sorts of USB devices. It seems there is a lot of hardware out there for which the standard Linux kernel is `too fast'. Maybe it would even be worthwhile to make the slow data transfer the default behaviour (and provide a kernel config for the wizards who know what they are doing and who want it faster). Thaks again! Ok, I tried UNUSUAL_DEV(0x04b4, 0x6830, 0x0000, 0x0000, "Cypress", "Cypress AT2LP RC7", US_SC_DEVICE, US_PR_DEVICE, NULL, US_FL_GO_SLOW ), in unusual_devs.h and it did help for my hard drive enclosure. A quick test showed no more resets while writing large files. Removed and they did show up again. I'm not sure about what to exactly fill in the Vendor/Product Name fields, as /proc/bus/usb/devices shows T: Bus=04 Lev=02 Prnt=02 Port=03 Cnt=04 Dev#= 7 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=04b4 ProdID=6830 Rev= 2.40 S: Product=Cypress AT2LP RC7 S: SerialNumber=DEF10AFD02B9 C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 2mA I: If#= 0 Alt= 0 #EPs= 3 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=81(I) Atr=03(Int.) MxPS= 2 Ivl=256ms whereas `lsusb` shows "Bus 004 Device 007: ID 04b4:6830 Cypress Semiconductor Corp. USB-2.0 IDE Adapter" Now I'll try to see if that will also help with my DVD+R enclosure. Grrrr.... it happend again also with US_FL_GO_SLOW on the hard drive. Building debug kernel right now.... I've also found a way to reproduce that usb resets on the DVD+R. (Erasing a CD-RW instantly gives such resets) Created attachment 9021 [details]
usb device reset when blanking cdrw
Created attachment 9022 [details]
usb device reset when writing large file to usb hard drive
Added two debug dumps of the device resets I get on my enclosures.
Hendryk: I'm glad the patch works for you. I'll submit it for inclusion in the kernel. It's probably too late for the patch to appear in 2.6.19 but it ought to get into 2.6.20. Daniel: If your patch reduces the frequency of those resets then it's a good thing. You shouldn't worry too much about a reset occurring while writing a hard disk; having a few of them now and then doesn't matter so long as the disk remains accessible and all the data eventually gets transferred correctly. The problem with blanking the CD-RW is totally different. The log clearly shows the drive returning an error for some operation (sorry I don't know what the operation is, the log only says that the opcode is 0xad). The application issued the command twice, the drive returned the same error each time, and it got reset after each error. This wasn't a driver problem at all; for whatever reason, the drive wasn't able to carry out the command. Maybe it doesn't like the disc you were using. The information for the Vendor and Product names should be the Vendor and Model fields you see in /proc/scsi/scsi. If those are blank then you can use the USB strings "Cypress" and "AT2LP RC7" (no need for all the extra blanks). Whip the patch into good shape and send it to Phil Dibowitz, according to the instructions at the top of unusual_devs.h. Created attachment 9024 [details]
DVD RW error writing DVD
Alan,
actually I'm not sure whether that patch actually reduces the appereance of
that problem. Currently I would say no, as, at least on my DVD enclosure it
would happen only on about every 5th recordable.
However I haven't yet seen any confirmation from the debug output that confirms
that my unusual devs entry is correctly in place and working.
That CDRW blanking reset I only found by coincidence and thought it would make
a nice quick test, but you're right, it was not the real problem.
I took some time to repeatedly write DVDs to reproduce my original problem:
Looks quite similar to that hard drive one (status code -71) and the bad thing
is that unlike the hard drive, it does not recover. That recordable would be
lost (if it wasn't a RW in this case) and since that already happened a few
times I'm looking for a solution since it costs money every time it happens.
Regarding Vendor/Name for the unusual devs entry: I didn't wanted to take that
of /proc/scsi/scsi since it shows the name of the drive mounted in that
enclosure, not the enclosure itself.
A quick look at the sources and some EHCI documentation led me to the conclusion that the host controller reported a transaction error due to - CRC - Timeout - Bad PID From a colleague, who has some experience in developing USB hardware I got the hint that this may happen if a NAK from the device got corrupted or if the device does not answer at all, which does happen in real life. From what I understand from a quick look at the sources, there is a define EHCI_TUNE_CERR 3 /* 0-3 qtd retries; which lets the host controller retry that URB three times before issuing that error. According to my colleagues it is however not sufficient to rely on these three times as these three times would by far not sufficient to cope with slow devices (which may not answer at all under load for some periods of time) and that addiotional retry handling would be needed on a higher layer. To me the trace looks as if such additional retry handling does not exist in the storage driver. Could you follow my explanations? Maybe you have some sugguestions for additional tests? Is such extension to reenqueue that URB already planned? How can we proceed? Regards, Daniel You're correct that the additional retry handling does not exist in the usb-storage driver. It exists at a higher level, in the SCSI core. Basically usb-storage just resets the device and then reports the error to the SCSI layer, which then issues the retry. However in your case the commands come from an application program instead of the SCSI core, and apparently the program doesn't attempt to retry failed WRITE commands. Understandable, I suppose, since any error when streaming data to a DVD or CD is probably fatal. The interesting thing is that even after repeated resets, your drive did not recover. It reported error status for every single command following the failed WRITE. When a device behaves like that, it really doesn't matter what the various software layers do or don't do -- nothing is going to work. Alan, thank you for the insights. Now, from Sep 15 00:25:32 haktar kernel: usb-storage: usb_stor_bulk_transfer_sglist: xfer 32768 bytes, 8 entries Sep 15 00:25:32 haktar kernel: ehci_hcd 0000:00:1d.7: devpath 1.2 ep2out 3strikes Sep 15 00:25:32 haktar kernel: usb-storage: Status code -71; transferred 9216/32768 I understand that together with that WRITE command, 32k Bytes data are to be transferred. From 9216/32768 I believe to understand that those 32k are transferred in multiple(?) URBs, the one after 9216 Bytes which is failing. (If not URBs, then at least USB data packets?) Now assuming the device does not fully buffer the 32k data but, because of small buffers, transferres that data directly to the DVD writer, which in turn directly burns them, how is that transmission error to be recovered on a SCSI command basis? This may work for random access media like a hard drive, but for DVD writers I see no other chances to recover from that (hopefully temporarily) occurring transmission errors than to try (longer) in retransmitting that failing URB. I know I'm most likely leaving out a lot of details I'm not aware of, but do you understand what I mean? Is this plausible, and is there a way of re-inserting that URB? I understand that the CERR counter can only be set to a max. of 3. 0 would try endless, but then I won't notice the occurrence of that issue at all (only if it would fail endlessly), so that wouldn't help in verifying a probable solution either. If my theory makes sense, I would like to test it to verify if that is a temporarily occuring device unresponsiveness that could be cured by trying just a little longer. What do you think? Regards, Daniel Definitely multiple packets, since the largest high-speed bulk packet size is 512 bytes. The number of URBs is 8. The failure occurred after 9216 bytes had been transferred successfully (18 packets). I don't know if the transmission error would be temporary with your drive. In general when things like this happen, the errors are not temporary. The only way to recover is to reset the device. You can try setting CERR to 0. It might even work. My guess is that you'll get stuck in endless retransmissions. (The SCSI layer will abort the command after 30 seconds or so.) The fact of the matter is, your device isn't obeying the USB protocol. If its buffers are full and it's not ready to accept more data, then it should respond with NAK or NYET. Instead it isn't responding at all. This doesn't bode well for recovery attempts. Ok, 32kB in 8 URBs with 512 Bytes packets, one of them failing. If I would set CERR to zero and if that would "solve" the problem, do I guess right that I wouldn't be able to tell so since the host controller would repeat that on its own without telling me? Otherwise, it would be worth trying just for getting more information about what would help and what not. Does that CERR counter counts for one complete URB of 4k or does it count for every bulk packet of 512 Bytes? If the error is indeed a missing answer, do I assume right that (given a temporary failure) one would have to repeat that single packet with the same sequence bit? Is that possible in the ehci driver? Regards, Daniel I would think it would be worth trying to set CERR to 0 whether it helps or not. If it doesn't help then you've learned something, and if it does help then you're finished! CERR gets reset every time a packet is transmitted successfully (unless you set it to 0, in which case it never gets reset at all). If the failure really is temporary then repeating the same packet with the same data toggle should fix it. And of course that's exactly what CERR=0 would do. All you need to do is change the definition of EHCI_TUNE_CERR in ehci-hcd.c. I'll try for sure. But let me ask again: As this failure doesn't happen every time I burn a DVD, but only every now and then - am I correct in understanding that with CERR == 0 && success, I won't be able to tell wether it acutally helped or I had just luck that the problem didn't occur that time? That's right. However my guess is that you will be able to tell, because the operation will fail persistently. Did 10 full rewrites of a DVD+RW without any aborting WRITE with CERR == 0. Not sure what this means but I'll stay with that CERR on 0 meanwhile. There were still some -71 status results which didn't harm, for example when the application tried to READ CAPACITY while the tray was still open. However I discovered due to a debug message I added, that my unusual devs entry didn't work, as the GO_SLOW bit wasn't set. Any idea what I made wrong? UNUSUAL_DEV(0x3873, 0x5621, 0x0000, 0x0000, "", "USB 2.0 Storage Device", US_SC_DEVICE, US_PR_DEVICE, NULL, US_FL_GO_SLOW ), T: Bus=04 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#= 5 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=3873 ProdID=5621 Rev= 1.05 S: Product=USB 2.0 Storage Device S: SerialNumber=00000000000000004913 C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr= 0mA I: If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms The problem with your unusual_devs entry is the revision limits (the third and fourth numbers on the first line of the entry). You set the minimum and maximum both to 0, but your device's revision number is 1.05 so the entry didn't apply. If you change those two numbers to 0x0105 then it will apply. Thanks, didn't got the meaning of the fields obviously. Redid DVD writer test with EHCI_CERR back to 3, but with GO_SLOW set. Went fine until run 8, in which I got: usb-storage: Command WRITE_10 (10 bytes) usb-storage: 2a 00 00 15 3a d0 00 00 10 00 usb-storage: Bulk Command S 0x43425355 T 0xc1d95 L 32768 F 0 Trg 0 LUN 0 CL 10 usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes usb-storage: Status code 0; transferred 31/31 usb-storage: -- transfer complete usb-storage: Bulk command transfer result=0 usb-storage: usb_stor_bulk_transfer_sglist: xfer 32768 bytes, 2 entries usb-storage: command_abort called usb-storage: usb_stor_stop_transport called usb-storage: -- cancelling sg request usb-storage: Status code -104; transferred 30720/32768 usb-storage: -- transfer cancelled usb-storage: Bulk data transfer result 0x4 usb-storage: Device slow? 100 usb-storage: -- command was aborted usb-storage: storage_pre_reset hub 4-1:1.0: port 2 not reset yet, waiting 10ms usb 4-1.2: reset high speed USB device using ehci_hcd and address 5 Now that does look different from those -71 error I got until now and what makes me wonder the most is that line usb-storage: command_abort called Err... At least I didn't do anything to abort something. k3b (growisofs) tells: :-( unable to WRITE@LBA=153ad0h: Input/output error :-( write failed: Input/output error Hmm. Who was issuing that abort, - and why? The SCSI core aborts a command after a certain amount of time (usually 30 seconds, although growisofs might use a different limit). In this case it means that the drive must have been sending NAK or NYET packets. Any updates on the problem? Could you try the latest kernel please? If it works we can close the bug, or add people from other subsystems. Thanks, --Natalie As for my part, I think replacing the USB cable from the host adapter to the USB hub to which the DVD writer was connected, cured the problem. No stalled DVD writes since then. For my part, as I said in #12 above, the patch worked fine. Meanwhile, I got different hardware and therefore cannot test it anymore. Best wishes, Hendryk Since everybody is happy, I'm closing the bug report. |