Bug 6817

Summary: USB errors and "reset high speed USB device"
Product: Drivers Reporter: Hendryk (Hendryk.Pfeiffer)
Component: USBAssignee: Alan Stern (stern)
Status: CLOSED CODE_FIX    
Severity: high CC: garkein, greg, protasnb, stern
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.17.1 Subsystem:
Regression: --- Bisected commit-id:
Bug Depends on:    
Bug Blocks: 5089    
Attachments: dmesg when connecting the DVD writer
dmesg when "growisofs -Z /dev/sr0 -R -J <some_file>"
kernel 2.6.16.11 connecting the DVD writer
kernel 2.6.16.11 "growisofs -Z /dev/sr0 -R -J <some_file>"
Slow down transfers to Lacie DVD drive
usb device reset when blanking cdrw
usb device reset when writing large file to usb hard drive
DVD RW error writing DVD

Description Hendryk 2006-07-11 06:06:46 UTC
Most recent kernel where this bug did not occur: none
Distribution: Suse 10.1 (but with kernel 2.6.17.1 from www.kernel.org)

Hardware Environment: 
  IBM ThinkPad X41 with 
  LaCie slimline DVD+-RW Dual Layer DVD writer (USB 2.0, purchased July 2005) 
  and some 1GB memory sticks (USB 2.0)

Problem Description:
  Write errors when large files are transferred; 
  sometimes no useful error message;
  in some cases "reset high speed USB device"; details see below

  I think this is a software problem because:
  * the same LaCie DVD writer works under Windows
  * the same 1GB USB sticks work under Windows
  * various USB 2.0 memory sticks with 128 MB or less work fine

  I have google'd for the error message "Reset high speed USB device",
  and it seems that everyone is confused. So I compiled and installed
  the most recent kernel (2.6.17.1), and I can reproduce the errors.

Steps to reproduce:

(A) with the DVD writer:
------------------------

Boot the computer. Connect the DVD writer. /var/log/messages says:

Jun 26 00:29:03 myhost kernel: usb 1-3: new high speed USB device using ehci_hcd
and address 3
Jun 26 00:29:03 myhost kernel: usb 1-3: configuration #1 chosen from 1 choice
Jun 26 00:29:03 myhost kernel: Initializing USB Mass Storage driver...
Jun 26 00:29:03 myhost kernel: scsi2 : SCSI emulation for USB Mass Storage devices
Jun 26 00:29:03 myhost kernel: usb-storage: device found at 3
Jun 26 00:29:03 myhost kernel: usb-storage: waiting for device to settle before
scanning
Jun 26 00:29:03 myhost kernel: usbcore: registered new driver usb-storage
Jun 26 00:29:03 myhost kernel: USB Mass Storage support registered.
Jun 26 00:29:08 myhost kernel:   Vendor: _NEC      Model: DVD+-RW ND-6650A  Rev:
1.23
Jun 26 00:29:08 myhost kernel:   Type:   CD-ROM                             ANSI
SCSI revision: 00
Jun 26 00:29:08 myhost kernel: sr0: scsi3-mmc drive: 24x/24x writer cd/rw
xa/form2 cdda tray
Jun 26 00:29:08 myhost kernel: Uniform CD-ROM driver Revision: 3.20
Jun 26 00:29:08 myhost kernel: sr 2:0:0:0: Attached scsi CD-ROM sr0
Jun 26 00:29:08 myhost kernel: sr 2:0:0:0: Attached scsi generic sg1 type 5
Jun 26 00:29:08 myhost kernel: usb-storage: device scan complete

When I do a "growisofs -Z /dev/sr0 -R -J <some_file>", the step motor of the
DVD drive clicks like it is going to a park position. This happens a few
times, the drive speeds up and slows down a few times, all this sounds a
little unhealthy, and then I get an error before the DVD burning starts:

/dev/sr0: "Current Write Speed" is 4.1x1385KBps.
:-( unable to WRITE@LBA=0h: Input/output error
:-( write failed: Input/output error

/var/log/messages says nothing else but

Jun 26 00:30:53 myhost kernel: usb 1-3: reset high speed USB device using
ehci_hcd and address 3

I know that very similar errors can be caused by the "hald"
deamon. Apparently, this program regularly probes the USB bus for new
devices and somehow interfers with the DVD writing. In my case, the
"hald" was switched off all the time !

I can get the LaCie USB device to work only with kernel 2.6.16.11 (from
www.kernel.org) and only when I follow a strange recipe. It is weird, but
reproducible.

* boot the computer
* connect the DVD writer
* do a "growisofs", get the same error as above
* disconnect the DVD writer and connect it again
* do another "growisofs"
* this one works! I get a DVD burned without any errors.
* further attempts to "growisofs" all fail unless I reboot and follow the
procedure from the beginning

(B) problems with the 1GB memory sticks
---------------------------------------

When I copy a single large file to the USB memory stick with 

  cp <some_file> /media/usb/<filename>

I get (after about 600 MB of copying) the following error:

  cp: failed to preserve ownership of '/media/usb/<filename>': Operation not
permitted.

But there is nothing in /var/log/messages.

I'd be most grateful if someone could help. I am happy to provide any
additional data you require or more verbose error messages (if you
tell me how to activate that feature). 

Finally, some additional info that might help:

The following is a similar LaCie slimline DVD+-RW (purchased 09/2004) which
works fine:

Aug 13 21:42:54 myhost kernel: usb 1-3: new high speed USB device using ehci_hcd
and address 5
Aug 13 21:42:54 myhost kernel: scsi4 : SCSI emulation for USB Mass Storage devices
Aug 13 21:42:54 myhost kernel: usb-storage: device found at 5
Aug 13 21:42:54 myhost kernel: usb-storage: waiting for device to settle before
scanning
Aug 13 21:42:59 myhost kernel:   Vendor: _NEC      Model: DVD+-RW ND-6500A  Rev:
2.22
Aug 13 21:42:59 myhost kernel:   Type:   CD-ROM                             ANSI
SCSI revision: 00
Aug 13 21:42:59 myhost kernel: sr0: scsi3-mmc drive: 24x/24x writer cd/rw
xa/form2 cdda tray
Aug 13 21:42:59 myhost kernel: Attached scsi CD-ROM sr0 at scsi4, channel 0, id
0, lun 0
Aug 13 21:42:59 myhost kernel: Attached scsi generic sg1 at scsi4, channel 0, id
0, lun 0,  type 5
Aug 13 21:42:59 myhost kernel: usb-storage: device scan complete

The following is a 128 MB USB stick that always works:

Jun 28 00:19:05 myhost kernel: usb 2-3: new high speed USB device using ehci_hcd
and address 3
Jun 28 00:19:05 myhost kernel: usb 2-3: configuration #1 chosen from 1 choice
Jun 28 00:19:06 myhost kernel: Initializing USB Mass Storage driver...
Jun 28 00:19:06 myhost kernel: scsi2 : SCSI emulation for USB Mass Storage devices
Jun 28 00:19:06 myhost kernel: usb-storage: device found at 3
Jun 28 00:19:06 myhost kernel: usb-storage: waiting for device to settle before
scanning
Jun 28 00:19:06 myhost kernel: usbcore: registered new driver usb-storage
Jun 28 00:19:06 myhost kernel: USB Mass Storage support registered.
Jun 28 00:19:11 myhost kernel:   Vendor: LEXAR     Model: GEYSER JUMPDRIVE  Rev:
1.00
Jun 28 00:19:11 myhost kernel:   Type:   Direct-Access                      ANSI
SCSI revision: 01 CCS
Jun 28 00:19:11 myhost kernel: SCSI device sdb: 251904 512-byte hdwr sectors
(129 MB)
Jun 28 00:19:11 myhost kernel: sdb: Write Protect is off
Jun 28 00:19:11 myhost kernel: sdb: Mode Sense: 23 00 00 00
Jun 28 00:19:11 myhost kernel: sdb: assuming drive cache: write through
Jun 28 00:19:11 myhost kernel: SCSI device sdb: 251904 512-byte hdwr sectors
(129 MB)
Jun 28 00:19:11 myhost kernel: sdb: Write Protect is off
Jun 28 00:19:11 myhost kernel: sdb: Mode Sense: 23 00 00 00
Jun 28 00:19:11 myhost kernel: sdb: assuming drive cache: write through
Jun 28 00:19:11 myhost kernel:  sdb: sdb1
Jun 28 00:19:11 myhost kernel: sd 2:0:0:0: Attached scsi removable disk sdb
Jun 28 00:19:11 myhost kernel: sd 2:0:0:0: Attached scsi generic sg1 type 0
Jun 28 00:19:11 myhost kernel: usb-storage: device scan complete

And here is one of the 1GB USB memory sticks that doesn't work:

Jun 28 00:19:29 myhost kernel: usb 2-3: new high speed USB device using ehci_hcd
and address 4
Jun 28 00:19:29 myhost kernel: usb 2-3: configuration #1 chosen from 1 choice
Jun 28 00:19:29 myhost kernel: scsi3 : SCSI emulation for USB Mass Storage devices
Jun 28 00:19:29 myhost kernel: usb-storage: device found at 4
Jun 28 00:19:29 myhost kernel: usb-storage: waiting for device to settle before
scanning
Jun 28 00:19:34 myhost kernel:   Vendor: USB 2.0   Model: Flash Disk        Rev:
2.00
Jun 28 00:19:34 myhost kernel:   Type:   Direct-Access                      ANSI
SCSI revision: 02
Jun 28 00:19:34 myhost kernel: SCSI device sdb: 2031616 512-byte hdwr sectors
(1040 MB)
Jun 28 00:19:34 myhost kernel: sdb: Write Protect is off
Jun 28 00:19:34 myhost kernel: sdb: Mode Sense: 0b 00 00 08
Jun 28 00:19:34 myhost kernel: sdb: assuming drive cache: write through
Jun 28 00:19:34 myhost kernel: SCSI device sdb: 2031616 512-byte hdwr sectors
(1040 MB)
Jun 28 00:19:34 myhost kernel: sdb: Write Protect is off
Jun 28 00:19:34 myhost kernel: sdb: Mode Sense: 0b 00 00 08
Jun 28 00:19:34 myhost kernel: sdb: assuming drive cache: write through
Jun 28 00:19:34 myhost kernel:  sdb: sdb1
Jun 28 00:19:34 myhost kernel: sd 3:0:0:0: Attached scsi removable disk sdb
Jun 28 00:19:34 myhost kernel: sd 3:0:0:0: Attached scsi generic sg1 type 0
Jun 28 00:19:34 myhost kernel: usb-storage: device scan complete

For the computer and the LaCie drive that doesn't work, I also include
the following output:

lspci -vv

[...]

00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
USB UHCI #1 (rev 03) (prog-if 00 [UHCI])
        Subsystem: IBM Unknown device 0565
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin A routed to IRQ 177
        Region 4: I/O ports at 1820 [size=32]

00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
USB UHCI #2 (rev 03) (prog-if 00 [UHCI])
        Subsystem: IBM Unknown device 0565
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin B routed to IRQ 217
        Region 4: I/O ports at 1840 [size=32]

00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
USB UHCI #3 (rev 03) (prog-if 00 [UHCI])
        Subsystem: IBM Unknown device 0565
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin C routed to IRQ 225
        Region 4: I/O ports at 1860 [size=32]

00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
USB UHCI #4 (rev 03) (prog-if 00 [UHCI])
        Subsystem: IBM Unknown device 0565
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin D routed to IRQ 201
        Region 4: I/O ports at 1880 [size=32]

00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
USB2 EHCI Controller (rev 03) (prog-if 20 [EHCI])
        Subsystem: IBM Unknown device 0566
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR+ FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin D routed to IRQ 201
        Region 0: Memory at a0040000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Debug port

[...]

lsusb -v

[...]

Bus 001 Device 003: ID 059f:0643 LaCie, Ltd
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x059f LaCie, Ltd
  idProduct          0x0643
  bcdDevice            0.00
  iManufacturer           1 LaCie
  iProduct                2 LaCie DVDRW USB
  iSerial                 3 10000F100A571126
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           32
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0xc0
      Self Powered
    MaxPower                2mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         8 Mass Storage
      bInterfaceSubClass      5 SFF-8070i
      bInterfaceProtocol     80
      iInterface              0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0001
  Self Powered

[...]
Comment 1 Alan Stern 2006-07-11 13:36:50 UTC
Your second problem (the "cp: failed to preserve ownership" error) is easy to
explain.  The filesystem on the memory stick is VFAT, which does not have any
way to store file ownership information.  Hence when you copy a file to the
stick, the cp program gets an error when it tries to set the new file's owner. 
This has nothing to do with USB; you would get the same error if you copied a
file to a VFAT-formatted floppy.

The DVD problem requires more information.  Try building your 2.6.17 kernel with
CONFIG_USB_DEBUG and CONFIG_USB_STORAGE_DEBUG both on.  Then attach the dmesg
logs showing what happens when you turn on (or plug in) the DVD drive and when
you try to run the growisofs program.
Comment 2 Hendryk 2006-07-13 04:38:57 UTC
Created attachment 8541 [details]
dmesg when connecting the DVD writer
Comment 3 Hendryk 2006-07-13 04:40:47 UTC
Created attachment 8542 [details]
dmesg when "growisofs -Z /dev/sr0 -R -J <some_file>"
Comment 4 Hendryk 2006-07-13 04:44:23 UTC
Thanks a lot so far! 

First, on the USB memory stick. I do get corrupt data sometimes, but so far I
have failed to reproduce it. What I wrote last time wasn't the bug, you are right.

Concerning the DVD writer, I have compiled 2.6.17.1 with CONFIG_USB_DEBUG
and CONFIG_USB_STORAGE_DEBUG. I have added the dmesg output that I get when I
connect the DVD writer and when the growisofs fails as attachments to this bug.


Comment 5 Hendryk 2006-07-13 04:57:17 UTC
David Brownell sent me this email which was not added to this bug description
for some reason. I have some comments and include it here:

> On Tuesday 11 July 2006 12:32 pm, Andrew Morton wrote:
> 
> > Problem Description:
> >   Write errors when large files are transferred; 
> >   sometimes no useful error message;
> >   in some cases "reset high speed USB device"; details see below
> > 
> >   I think this is a software problem because:
> >   * the same LaCie DVD writer works under Windows
> >   * the same 1GB USB sticks work under Windows
> >   * various USB 2.0 memory sticks with 128 MB or less work fine
> 
> I'm curious ... how does the throughput compare between Linux and Windows?
> For big blocks; say, what "dd bs=64K" would do.  (Preferably measuring
> read
> and write speeds without a filesystem in the way.)
> 
> I don't really have a handle on this and I'm not likely to get one without
> some quality time with a high speed analyser ...

Linux seems to write the DVDs with the maximum speed of the DVD writer - Windows
(at least NERO or how this program is called), doesn't do that, definitely.

> But one theory I've had about this class of errors is that Linux may be
> driving the device faster than Windows, and thereby hitting races in
> those devices (hardware or firmware) that Windows wouldn't trigger.
> 
> > * boot the computer
> > * connect the DVD writer
> > * do a "growisofs", get the same error as above
> > * disconnect the DVD writer and connect it again
> > * do another "growisofs"
> > * this one works! I get a DVD burned without any errors.
> > * further attempts to "growisofs" all fail unless I reboot and follow
> the
> > procedure from the beginning
> 
> That's most bizarre.  This drive has an external power adapter, yes?
> Which means it could retain state unless you power it off ... did you
> try this experiment both with and without driver power-off?

My DVD writer gets its power from the USB cable. So it was a power off-power on.
 I'll try to figure out whether this happened because Linux unloads and reloads
a number of kernel modules when you disconnect/reconnect the device or whether
it happened because the device was switched off and on.

> If it works without power-off, and fails with it, that strongly
> suggests this is a bug in the peripheral.  Strongly enough that I'd
> suggest taking it up with LaCie...
> 
> >   cp: failed to preserve ownership of '/media/usb/<filename>': Operation
> not
> > permitted.
> 
> Strange.  USB doesn't issue EPERM in many places ... it'd be easy to
> rule out (or in!) the USB stack as a factor here.
> 
> The key spot is in core/hcd.c::hcd_submit_urb() when urb->reject is set,
> and that should probably change ... it's not a permission error in the
> security sense intended by EPERM.  You might change that to something
> less generic like EBADRQC, and see if that symptom changes.  If it does,
> that's a solid clue ...
> 
> - Dave
Comment 6 Alan Stern 2006-07-13 08:06:09 UTC
It definitely looks like there's something funny with the DVD drive itself. 
First, it took at least 30 seconds to become ready (that might be normal, I
don't know).  Second, although the drive executed lots of simple commands okay,
it choked on a READ command:

Jul 13 00:03:17 myhost kernel: usb-storage: Hardware Error: Logical unit
communication failure

It's not clear exactly what that means, but it doesn't sound good.  After
executing a bunch of other commands I don't understand, the drive died on the
first WRITE command.  Again, I don't know why.

It appears that this is not a USB problem -- that is, the communications at the
USB level were working okay.  It may be a hardware problem with the drive, or it
may be a problem with the growisofs application.  Somebody who knows how DVD
drives are supposed to work should like at this.
Comment 7 Hendryk 2006-07-18 03:02:04 UTC
Created attachment 8572 [details]
kernel 2.6.16.11 connecting the DVD writer
Comment 8 Hendryk 2006-07-18 03:03:54 UTC
Created attachment 8573 [details]
kernel 2.6.16.11 "growisofs -Z /dev/sr0 -R -J <some_file>"
Comment 9 Hendryk 2006-07-18 03:27:40 UTC
Hi,

I can write DVDs with kernel 2.6.16.11 (from www.kernel.org). I have compiled it
with CONFIG_USB_DEBUG and CONFIG_USB_STORAGE_DEBUG and attached the dmesg output
when I connect the device and when I burn the DVD with growisofs.

At 23:09:38, growisofs is called and checks whether the DVD(+RW) is empty or
not. Then, growisofs creates an ISO image and does not use the DVD drive for
about 25 seconds. At 23:10:45, the actual writing starts. It is in fact just a
dd command. I have truncated the (200 MB) log file until shortly before the
writing finishes at 23:25:10.

Since writing DVDs is possible with 2.6.16.11, I am not convinced this is a
hardware problem. 

Did the kernel people change anything about the timing of the USB data transfer
between 2.6.16.11 and 2.6.17.1? 

With 2.6.16.11 without USB_DEBUG and USB_STORAGE_DEBUG, I can also get the
behaviour that I get a "reset high speed USB device" after about 80% of a 4.2 GB
DVD have been written. But _with_ USB_DEBUG and USB_STORAGE_DEBUG, this error
never happens. 

Can it be true that something is slower when huge log files are written, but
without the log files, the transfer is too fast for the hardware, and I get
these errors? 

If that's the case, is there a way of patching the kernel and slowing down the
USB data transfer (or the sequence of USB commands - you guys know this much
better than I do)? If you suggest a patch, I would be happy to try and report
what happens. 

Thanks again for your help!
Comment 10 Alan Stern 2006-09-01 07:39:45 UTC
Created attachment 8927 [details]
Slow down transfers to Lacie DVD drive

Nothing about the speed of USB transfers has changed between 2.6.16 and 2.6.17.


This patch for 2.6.17 will slightly slow down the data transfers to your drive.
 It turns out that the USB-IDE adapters made by Genesys Logic require a 100
microsecond delay at a certain phase of the transfer protocol; perhaps that
delay will help your drive also.

If it doesn't, you could try reducing the max_sectors setting for your drive
(see the explanation in the FAQ at www.linux-usb.org).
Comment 11 Daniel 2006-09-13 09:04:55 UTC
Same here.
Occasionally I get

"reset high speed USB device" during CD/DVD writing, mostly that CD-R is lost then.

I also get this reset message very often when reading/writing to my USB attached
hard drive.

These problems came up after 2.6.16, now I'm using 2.6.18-rc2 (updated to that
since the problems appeared in 2.6.17).

The enclosure for the DVD writer has an ALI chip.

(see also bug #6935)
Comment 12 Hendryk 2006-09-14 01:28:50 UTC
Hi,

many thanks to Alan Stern! The patch does the job. With 2.6.17.1 and this patch,
I was able to burn a number of DVDs. No problem at all. Great!

From what google says when you search for `reset high speed USB device', many
people get such strange errors with various sorts of USB devices. It seems there
is a lot of hardware out there for which the standard Linux kernel is `too
fast'. Maybe it would even be worthwhile to make the slow data transfer the
default behaviour (and provide a kernel config for the wizards who know what
they are doing and who want it faster). 

Thaks again!
Comment 13 Daniel 2006-09-14 03:41:29 UTC
Ok, I tried

UNUSUAL_DEV(0x04b4, 0x6830, 0x0000, 0x0000, 
		"Cypress",
		"Cypress AT2LP      RC7",
		US_SC_DEVICE, US_PR_DEVICE, NULL,
		US_FL_GO_SLOW ), 

in unusual_devs.h and it did help for my hard drive enclosure.
A quick test showed no more resets while writing large files. Removed and they
did show up again.
I'm not sure about what to exactly fill in the Vendor/Product Name fields,
as /proc/bus/usb/devices shows 

T:  Bus=04 Lev=02 Prnt=02 Port=03 Cnt=04 Dev#=  7 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=04b4 ProdID=6830 Rev= 2.40
S:  Product=Cypress AT2LP      RC7
S:  SerialNumber=DEF10AFD02B9
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=  2mA
I:  If#= 0 Alt= 0 #EPs= 3 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=81(I) Atr=03(Int.) MxPS=   2 Ivl=256ms

whereas `lsusb` shows

"Bus 004 Device 007: ID 04b4:6830 Cypress Semiconductor Corp. USB-2.0 IDE Adapter"

Now I'll try to see if that will also help with my DVD+R enclosure.



Comment 14 Daniel 2006-09-14 04:30:28 UTC
Grrrr....
it happend again also with US_FL_GO_SLOW on the hard drive.
Building debug kernel right now....

I've also found a way to reproduce that usb resets on the DVD+R.
(Erasing a CD-RW instantly gives such resets)
Comment 15 Daniel 2006-09-14 05:47:15 UTC
Created attachment 9021 [details]
usb device reset when blanking cdrw
Comment 16 Daniel 2006-09-14 05:50:40 UTC
Created attachment 9022 [details]
usb device reset when writing large file to usb hard drive

Added two debug dumps of the device resets I get on my enclosures.
Comment 17 Alan Stern 2006-09-14 12:01:27 UTC
Hendryk: I'm glad the patch works for you.  I'll submit it for inclusion in the
kernel. It's probably too late for the patch to appear in 2.6.19 but it ought to
get into 2.6.20.

Daniel: If your patch reduces the frequency of those resets then it's a good
thing.  You shouldn't worry too much about a reset occurring while writing a
hard disk; having a few of them now and then doesn't matter so long as the disk
remains accessible and all the data eventually gets transferred correctly.

The problem with blanking the CD-RW is totally different.  The log clearly shows
the drive returning an error for some operation (sorry I don't know what the
operation is, the log only says that the opcode is 0xad).  The application
issued the command twice, the drive returned the same error each time, and it
got reset after each error.  This wasn't a driver problem at all; for whatever
reason, the drive wasn't able to carry out the command.  Maybe it doesn't like
the disc you were using.

The information for the Vendor and Product names should be the Vendor and Model
fields you see in /proc/scsi/scsi.  If those are blank then you can use the USB
strings "Cypress" and "AT2LP RC7" (no need for all the extra blanks).  Whip the
patch into good shape and send it to Phil Dibowitz, according to the
instructions at the top of unusual_devs.h.
Comment 18 Daniel 2006-09-14 15:34:33 UTC
Created attachment 9024 [details]
DVD RW error writing DVD

Alan,
actually I'm not sure whether that patch actually reduces the appereance of
that problem. Currently I would say no, as, at least on my DVD enclosure it
would happen only on about every 5th recordable.
However I haven't yet seen any confirmation from the debug output that confirms
that my unusual devs entry is correctly in place and working.

That CDRW blanking reset I only found by coincidence and thought it would make
a nice quick test, but you're right, it was not the real problem.

I took some time to repeatedly write DVDs to reproduce my original problem: 
Looks quite similar to that hard drive one (status code -71) and the bad thing
is that unlike the hard drive, it does not recover. That recordable would be
lost (if it wasn't a RW in this case) and since that already happened a few
times I'm looking for a solution since it costs money every time it happens.

Regarding Vendor/Name for the unusual devs entry: I didn't wanted to take that
of /proc/scsi/scsi since it shows the name of the drive mounted in that
enclosure, not the enclosure itself.
Comment 19 Daniel 2006-09-19 04:13:03 UTC
A quick look at the sources and some EHCI documentation led me to the conclusion
that the host controller reported a transaction error due to

- CRC
- Timeout
- Bad PID

From a colleague, who has some experience in developing USB hardware I got the
hint that this may happen if a NAK from the device got corrupted or if the
device does not answer at all, which does happen in real life.

From what I understand from a quick look at the sources, there is a 

define EHCI_TUNE_CERR          3       /* 0-3 qtd retries;

which lets the host controller retry that URB three times before issuing that
error. According to my colleagues it is however not sufficient to rely on these
three times as these three times would by far not sufficient to cope with slow
devices (which may not answer at all under load for some periods of time) and
that addiotional retry handling would be needed on a higher layer.

To me the trace looks as if such additional retry handling does not exist in the
storage driver.

Could you follow my explanations? Maybe you have some sugguestions for
additional tests?
Is such extension to reenqueue that URB already planned?
How can we proceed?

Regards,
Daniel
Comment 20 Alan Stern 2006-09-19 07:32:22 UTC
You're correct that the additional retry handling does not exist in the
usb-storage driver.  It exists at a higher level, in the SCSI core.

Basically usb-storage just resets the device and then reports the error to the
SCSI layer, which then issues the retry.  However in your case the commands come
from an application program instead of the SCSI core, and apparently the program
doesn't attempt to retry failed WRITE commands.  Understandable, I suppose,
since any error when streaming data to a DVD or CD is probably fatal.

The interesting thing is that even after repeated resets, your drive did not
recover.  It reported error status for every single command following the failed
WRITE.  When a device behaves like that, it really doesn't matter what the
various software layers do or don't do -- nothing is going to work.
Comment 21 Daniel 2006-09-19 14:27:20 UTC
Alan, thank you for the insights.
Now, from

Sep 15 00:25:32 haktar kernel: usb-storage: usb_stor_bulk_transfer_sglist: xfer
32768 bytes, 8 entries
Sep 15 00:25:32 haktar kernel: ehci_hcd 0000:00:1d.7: devpath 1.2 ep2out 3strikes
Sep 15 00:25:32 haktar kernel: usb-storage: Status code -71; transferred 9216/32768

I understand that together with that WRITE command, 32k Bytes data are to be
transferred. From 9216/32768 I believe to understand that those 32k are
transferred in multiple(?) URBs, the one after 9216 Bytes which is failing.
(If not URBs, then at least USB data packets?)

Now assuming the device does not fully buffer the 32k data but, because of small
buffers, transferres that data directly to the DVD writer, which in turn
directly burns them, how is that transmission error to be recovered on a SCSI
command basis?
This may work for random access media like a hard drive, but for DVD writers I
see no other chances to recover from that (hopefully temporarily) occurring
transmission errors than to try (longer) in retransmitting that failing URB.

I know I'm most likely leaving out a lot of details I'm not aware of, but do you
understand what I mean? Is this plausible, and is there a way of re-inserting
that URB?
I understand that the CERR counter can only be set to a max. of 3. 0 would try
endless, but then I won't notice the occurrence of that issue at all (only if it
would fail endlessly), so that wouldn't help in verifying a probable solution
either.

If my theory makes sense, I would like to test it to verify if that is a
temporarily occuring device unresponsiveness that could be cured by trying just
a little longer. 

What do you think?

Regards,
Daniel
Comment 22 Alan Stern 2006-09-19 19:25:03 UTC
Definitely multiple packets, since the largest high-speed bulk packet size is
512 bytes.  The number of URBs is 8.  The failure occurred after 9216 bytes had
been transferred successfully (18 packets).

I don't know if the transmission error would be temporary with your drive.  In
general when things like this happen, the errors are not temporary.  The only
way to recover is to reset the device.

You can try setting CERR to 0.  It might even work.  My guess is that you'll get
stuck in endless retransmissions.  (The SCSI layer will abort the command after
30 seconds or so.)

The fact of the matter is, your device isn't obeying the USB protocol.  If its
buffers are full and it's not ready to accept more data, then it should respond
with NAK or NYET.  Instead it isn't responding at all.  This doesn't bode well
for recovery attempts.
Comment 23 Daniel 2006-09-20 10:04:29 UTC
Ok, 32kB in 8 URBs with 512 Bytes packets, one of them failing.

If I would set CERR to zero and if that would "solve" the problem, do I guess
right that I wouldn't be able to tell so since the host controller would repeat
that on its own without telling me?
Otherwise, it would be worth trying just for getting more information about what
would help and what not.

Does that CERR counter counts for one complete URB of 4k or does it count for
every bulk packet of 512 Bytes?
If the error is indeed a missing answer, do I assume right that (given a
temporary failure) one would have to repeat that single packet with the same
sequence bit?
Is that possible in the ehci driver?

Regards,
Daniel

Comment 24 Alan Stern 2006-09-20 11:12:47 UTC
I would think it would be worth trying to set CERR to 0 whether it helps or not.
 If it doesn't help then you've learned something, and if it does help then
you're finished!

CERR gets reset every time a packet is transmitted successfully (unless you set
it to 0, in which case it never gets reset at all).

If the failure really is temporary then repeating the same packet with the same
data toggle should fix it.  And of course that's exactly what CERR=0 would do. 
All you need to do is change the definition of EHCI_TUNE_CERR in ehci-hcd.c.
Comment 25 Daniel 2006-09-20 11:56:24 UTC
I'll try for sure. But let me ask again:
As this failure doesn't happen every time I burn a DVD, but only every now and
then - am I correct in understanding that with CERR == 0 && success, I won't be
able to tell wether it acutally helped or I had just luck that the problem
didn't occur that time?

Comment 26 Alan Stern 2006-09-20 12:32:30 UTC
That's right.  However my guess is that you will be able to tell, because the
operation will fail persistently.
Comment 27 Daniel 2006-09-20 15:44:59 UTC
Did 10 full rewrites of a DVD+RW without any aborting WRITE with CERR == 0.
Not sure what this means but I'll stay with that CERR on 0 meanwhile.

There were still some -71 status results which didn't harm, for example when the
application tried to READ CAPACITY while the tray was still open.

However I discovered due to a debug message I added, that my unusual devs entry
didn't work, as the GO_SLOW bit wasn't set. Any idea what I made wrong?

UNUSUAL_DEV(0x3873, 0x5621, 0x0000, 0x0000, 
		"",
		"USB 2.0 Storage Device",
		US_SC_DEVICE, US_PR_DEVICE, NULL,
		US_FL_GO_SLOW ), 

T:  Bus=04 Lev=02 Prnt=02 Port=01 Cnt=02 Dev#=  5 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=3873 ProdID=5621 Rev= 1.05
S:  Product=USB 2.0 Storage Device
S:  SerialNumber=00000000000000004913
C:* #Ifs= 1 Cfg#= 1 Atr=c0 MxPwr=  0mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Comment 28 Alan Stern 2006-09-21 09:42:52 UTC
The problem with your unusual_devs entry is the revision limits (the third and
fourth numbers on the first line of the entry).  You set the minimum and maximum
both to 0, but your device's revision number is 1.05 so the entry didn't apply.
 If you change those two numbers to 0x0105 then it will apply.
Comment 29 Daniel 2006-09-21 12:14:20 UTC
Thanks, didn't got the meaning of the fields obviously.
Redid DVD writer test with EHCI_CERR back to 3, but with GO_SLOW set.

Went fine until run 8, in which I got:

usb-storage: Command WRITE_10 (10 bytes)
usb-storage:  2a 00 00 15 3a d0 00 00 10 00
usb-storage: Bulk Command S 0x43425355 T 0xc1d95 L 32768 F 0 Trg 0 LUN 0 CL 10
usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
usb-storage: Status code 0; transferred 31/31
usb-storage: -- transfer complete
usb-storage: Bulk command transfer result=0
usb-storage: usb_stor_bulk_transfer_sglist: xfer 32768 bytes, 2 entries
usb-storage: command_abort called
usb-storage: usb_stor_stop_transport called
usb-storage: -- cancelling sg request
usb-storage: Status code -104; transferred 30720/32768
usb-storage: -- transfer cancelled
usb-storage: Bulk data transfer result 0x4
usb-storage: Device slow? 100
usb-storage: -- command was aborted
usb-storage: storage_pre_reset
hub 4-1:1.0: port 2 not reset yet, waiting 10ms
usb 4-1.2: reset high speed USB device using ehci_hcd and address 5

Now that does look different from those -71 error I got until now and what makes
me wonder the most is that line

usb-storage: command_abort called

Err... At least I didn't do anything to abort something.
k3b (growisofs) tells:

:-( unable to WRITE@LBA=153ad0h: Input/output error
:-( write failed: Input/output error

Hmm. Who was issuing that abort, - and why?
Comment 30 Alan Stern 2006-09-21 13:35:43 UTC
The SCSI core aborts a command after a certain amount of time (usually 30
seconds, although growisofs might use a different limit).  In this case it means
that the drive must have been sending NAK or NYET packets.
Comment 31 Natalie Protasevich 2007-06-13 13:16:11 UTC
Any updates on the problem? Could you try the latest kernel please? If it works we can close the bug, or add people from other subsystems.
Thanks,
--Natalie
Comment 32 Daniel 2007-06-13 14:34:56 UTC
As for my part, I think replacing the USB cable from the host adapter to the USB hub to which the DVD writer was connected, cured the problem.
No stalled DVD writes since then.
Comment 33 Hendryk 2007-06-14 01:10:49 UTC
For my part, as I said in #12 above, the patch worked fine. Meanwhile, I got different hardware and therefore cannot test it anymore. 

Best wishes,

Hendryk
Comment 34 Alan Stern 2007-06-14 06:50:39 UTC
Since everybody is happy, I'm closing the bug report.