Bug 6722 - second device open() (by periodic hal polling loop!) kills ongoing transfer (CD burner underruns, HDD hard resets)
Summary: second device open() (by periodic hal polling loop!) kills ongoing transfer (...
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: SCSI (show other bugs)
Hardware: i386 Linux
: P2 high
Assignee: io_scsi
URL:
Keywords:
Depends on:
Blocks: USB
  Show dependency tree
 
Reported: 2006-06-20 01:17 UTC by Andreas Mohr
Modified: 2008-09-22 16:41 UTC (History)
3 users (show)

See Also:
Kernel Version: 2.6.17-rc6-mm2 (and others)
Subsystem:
Regression: ---
Bisected commit-id:


Attachments

Description Andreas Mohr 2006-06-20 01:17:54 UTC
Bug report filed here since posting on LKML
(http://lkml.org/lkml/2006/6/19/54) yielded no response.


The problem identified below could be the cause of Bug 6194 (USB freeze after a
rather characteristic 100MB transferred - after 2 seconds???)


Most recent kernel where this bug did not occur: unknown
(problematic testing on older kernels due to udev/hotplug changes)
2.6.16ish and 2.6.15ish tested, same problem


Distribution: Debian testing


Hardware Environment:
AMD Athlon @1200 512MB
burner:
  Vendor: HL-DT-ST  Model: DVDRAM GSA-4160B  Rev: A306
  Type:   CD-ROM                             ANSI SCSI revision: 00
via:
Bus 001 Device 005: ID 0402:5621 ALi Corp. USB 2.0 Storage Device
(that's the "HL-DT-ST" external case ID string given above)
on USB 2.0 connector:
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)


Software Environment:
cdrecord 4:2.01+01a03-3 (newest)
hal 0.5.7-2 (newest)
libhal-storage1 0.5.7-1 (newest)
libhal1 0.5.7-1 (newest)


Problem Description:

I'm having severe issues with cdrecord aborting with "buffer underrun"
message, *always*.

# lsmod (USB-related only)
sd_mod                 18560  2
sg                     31772  0
sr_mod                 16164  0
usb_storage            71880  1
scsi_mod              132876  4 sd_mod,sg,sr_mod,usb_storage
uhci_hcd               23948  0
ehci_hcd               29964  0
usbcore               130844  4 usb_storage,uhci_hcd,ehci_hcd

# cat /proc/interrupts
           CPU0
  0:   14492818   IO-APIC-edge     timer
  1:      14298   IO-APIC-edge     i8042
  6:          3   IO-APIC-edge     floppy
  8:          4   IO-APIC-edge     rtc
  9:          0   IO-APIC-fasteoi  acpi
 12:     247193   IO-APIC-edge     i8042
 14:     172079   IO-APIC-edge     ide0
 16:     741075   IO-APIC-fasteoi  radeon@pci:0000:01:00.0
 19:     128123   IO-APIC-fasteoi  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3,
uhci_hcd:usb4
 20:       7235   IO-APIC-fasteoi  au8830, eth1
 21:          2   IO-APIC-fasteoi  ohci1394
 22:      13029   IO-APIC-fasteoi  Aztech AZF3328 (PCI168), au8820
 23:          0   IO-APIC-fasteoi  VIA8233
NMI:          0
LOC:   14494370
ERR:          0
MIS:          0



There's a /usr/lib/hal/hald-addon-storage running for this CD device:

nanosleep({2, 0}, {2, 0})               = 0 <2.000965>
open("/dev/scsi/host2/bus0/target0/lun0/cd",
+O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE) = 4 <0.002021>
ioctl(4, CDROM_DRIVE_STATUS, 0x7fffffff) = 4 <0.001811>
ioctl(4, CDROM_MEDIA_CHANGED, 0x7fffffff) = 0 <0.006341>
ioctl(4, SG_IO, 0xbfdbc0c8)             = 0 <0.002186>
close(4)                                = 0 <0.001087>
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000016>
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0 <0.000015>
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000015>
nanosleep({2, 0},  <unfinished ...>




It has a 2 second polling loop, and this is *exactly* the delay until
cdrecord bombs with an "underrun" message:



Script started on Mo 26 Jun 2006 13:10:04 CEST
# cdrecord -speed=4 -dummy dev=2,0,0 -eject - ^Mv ubuntu-6.06-desktop-i386.iso
cdrecord: No write mode specified.^M
cdrecord: Asuming -tao mode.^M
cdrecord: Future versions of cdrecord may have different drive dependent
+defaults.^M
cdrecord: Continuing in 5 seconds...^M
Cdrecord-Clone 2.01.01a03 (i686-pc-linux-gnu) Copyright (C) 1995-2005 Joerg
+Schilling^M
NOTE: this version of cdrecord is an inofficial (modified) release of cdrecord
      and thus may have bugs that are not present in the original version.^M
      Please send bug reports and support requests to
+<cdrtools@packages.debian.org>.^M
      The original author should not be bothered with problems of this version.
^M
cdrecord: Warning: Running on Linux-2.6.17-rc6-mm2^M
cdrecord: There are unsettled issues with Linux-2.5 and newer.^M
cdrecord: If you have unexpected problems, please try Linux-2.4 or Solaris.^M
TOC Type: 1 = CD-ROM^M
scsidev: '2,0,0'^M
scsibus: 2 target: 0 lun: 0^M
Linux sg driver version: 3.5.33^M
Using libscg version 'ubuntu-0.8ubuntu1'.^M
cdrecord: Warning: using inofficial version of libscg (ubuntu-0.8ubuntu1
+'@(#)scsitransp.c      1.91 04/06/17 Copyright 1988,1995,2000-2004 J.
+Schilling').^M
SCSI buffer size: 64512^M
atapi: 1^M
Device type    : Removable CD-ROM^M
Version        : 0^M
Response Format: 2^M
Capabilities   : ^M
Vendor_info    : 'HL-DT-ST'^M
Identifikation : 'DVDRAM GSA-4160B'^M
Revision       : 'A306'^M
Device seems to be: Generic mmc2 DVD-R/DVD-RW.^M
Current: 0x000A^M
Profile: 0x0012 ^M
Profile: 0x0011 ^M
Profile: 0x0014 ^M
Profile: 0x0013 ^M
Profile: 0x001A ^M
Profile: 0x001B ^M
Profile: 0x002B ^M
Profile: 0x0010 ^M
Profile: 0x0009 ^M
Profile: 0x000A (current)^M
Profile: 0x0008 ^M
Profile: 0x0002 ^M
cdrecord: This version of cdrecord does not include DVD-R/DVD-RW support code.
cdrecord: See /usr/share/doc/cdrecord/README.DVD.Debian for details on DVD
+support.^M
Using generic SCSI-3/mmc   CD-R/CD-RW driver (mmc_cdr).^M
Driver flags   : MMC-3 SWABAUDIO BURNFREE ^M
Supported modes: TAO PACKET SAO SAO/R96P SAO/R96R RAW/R16 RAW/R96P RAW/R96R^M
Drive buf size : 1053696 = 1029 KB^M
Drive DMA Speed: 11837 kB/s 67x CD 8x DVD^M
FIFO size      : 4194304 = 4096 KB^M
Track 01: data   697 MB        ^M
Total size:      801 MB (79:23.98) = 357299 sectors^M
Lout start:      801 MB (79:25/74) = 357299 sectors^M
Current Secsize: 2048^M
ATIP info from disk:^M
  Indicated writing power: 3^M
  Reference speed: 6^M
  Is not unrestricted^M
  Is erasable^M
  Disk sub type: High speed Rewritable (CAV) media (1)^M
  ATIP start of lead in:  -11700 (97:26/00)^M
  ATIP start of lead out: 359849 (79:59/74)^M
  1T speed low:  4 1T speed high: 10^M
  2T speed low:  4 2T speed high:  0 (reserved val  6)^M
  power mult factor: 1 5^M
  recommended erase/write power: 5^M
  A1 values: 24 1A D8^M
  A2 values: 26 B2 4A^M
Disk type:    Phase change^M
Manuf. index: 5^M
Manufacturer: FORNET INTERNATIONAL PTE LTD.^M
Blocks total: 359849 Blocks current: 359849 Blocks remaining: 2550^M
Starting to write CD/DVD at speed 8 in dummy TAO mode for single session.^M
Last chance to quit, starting dummy write in 9
+seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   8 seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   7
+seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   6 seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   5
+seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   4 seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   3
+seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   2 seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   1
+seconds.^H^H^H^H^H^H^H^H^H^H^H^H^H   0 seconds. Operation starts.^M
Waiting for reader process to fill input buffer ... input buffer ready.^M
BURN-Free is OFF.^M
Starting new track at sector: 0^M
^MTrack 01:    0 of  697 MB written.^MTrack 01:    1 of  697 MB written (fifo
+100%) [buf  98%]   2.5x.cdrecord: Input/output error. write_g1: scsi sendcmd:
+no error^M
CDB:  2A 00 00 00 02 2E 00 00 1F 00^M
status: 0x2 (CHECK CONDITION)^M
Sense Bytes: 70 00 05 00 00 00 00 10 2A 30 06 90 21 02 00 00^M
Sense Key: 0x5 Illegal Request, Segment 0^M
Sense Code: 0x21 Qual 0x02 (invalid address for write) Fru 0x0^M
Sense flags: Blk 0 (not valid) ^M
resid: 63488^M
cmd finished after 0.004s timeout 40s^M
^M
write track data: error after 1142784 bytes^M
cdrecord: The current problem looks like a buffer underrun.^M
cdrecord: Try to use 'driveropts=burnfree'.^M
cdrecord: Make sure that you are root, enable DMA and check your HW/OS set up.
Writing  time:   10.479s^M
Average write speed 457.0x.^M
Min drive buffer fill was 98%^M
Fixating...^M
WARNING: Some drives don't like fixation in dummy mode.^M
Fixating time:   33.828s^M
cdrecord: fifo had 82 puts and 19 gets.^M
cdrecord: fifo was 0 times empty and 7 times full, min fill was 90%.^M
#
Script done on Mo 26 Jun 2006 13:11:17 CEST


Killing the SCSI /usr/lib/hal/hald-addon-storage (after killing its hald-runner
supervisor parent) lets (non-simulation) burning of a whole CD proceed
properly...


However running a simple test program as a different non-root user:

#define __USE_LARGEFILE64

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>

/* argh, still not defined... manually! */
# define O_LARGEFILE    0100000


int main () {
        int fd;

        fd = open("/dev/scsi/host2/bus0/target0/lun0/cd",
+O_RDONLY|O_NONBLOCK|O_EXCL|O_LARGEFILE);
        printf("fd is %d\n", fd);
        close(fd);
        return 0;
}

kills concurrent cdrecord burning (by root) on the spot.

Note that this still happens if I omit O_NONBLOCK, even though this yields
me an invalid fd (-1)!.
Any other flag combination (even 0) does the same.
Thus it's clearly caused by open() handling itself.


Perfect DoS, if you ask me (but OTOH on many distro setups there probably
won't be user access permissions for this device by default).

This might still be on the borders of tolerable under normal working
conditions, but the fact that hald does an open() polling loop on it
renders it catastrophic.


Note that I also had severe issues with a chinese "Cenda 201" USB OTG 40GB HDD
device, I got USB bus reset every couple seconds, resulting in an audible
drive head-park reset each time:

May 29 02:05:23 andi kernel: ehci_hcd 0000:00:10.3: qh deb16080 (#00) state 4
May 29 02:05:24 andi kernel: usb 4-2: reset high speed USB device using
+ehci_hcd and address 2
May 29 02:05:56 andi last message repeated 17 times
May 29 02:06:57 andi last message repeated 32 times
May 29 02:07:35 andi last message repeated 20 times
May 29 02:07:37 andi kernel: ehci_hcd 0000:00:10.3: qh deb16200 (#00) state 4
May 29 02:07:37 andi kernel: usb 4-2: reset high speed USB device using
+ehci_hcd and address 2
May 29 02:07:43 andi last message repeated 3 times


Now that I found this I'd *bet* that this was exactly the same
problem there (caused by HAL polling the device file).


I cannot quite believe that such a severe issue hasn't been noticed
and fixed immediately by whichever person out there:
Possibly this is a VIA-only (oh no, not again... ;) problem?
(either USB controller chip or southbridge or so).

0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo
+KT266/A/333]0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo
+KT266/A/333 AGP]
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
+Controller (rev 80)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
+Controller (rev 80)
0000:00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1
+Controller (rev 80)
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc.
+VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
0000:00:11.5 Multimedia audio controller: VIA Technologies, Inc.
+VT8233/A/8235/8237 AC97 Audio Controller (rev 50)


I don't think hal can do much about hald-addon-storage to prevent a simple
open() from distorting a running device operation, so I'd tend to think
that this is a kernel issue (perhaps in the USB host controller drivers).


Thanks,

Andreas Mohr
Comment 1 Greg Kroah-Hartman 2006-08-30 01:03:58 UTC
I don't see how this is a usb issue, but reassigning to the usb-storage people
to see if they have an idea...
Comment 2 Matthew Dharm 2006-08-30 10:11:16 UTC
This is definatly not a USB issue, tho it affects USB more than other interfaces
because error-recovery is very harsh.

For some reason, the SCSI layer is allowing non-cdrecord commands to go to the
drive, and the drive is getting confused.  We've had lots of problems with HAL
sending bogus commands to devices (which cause the device to report an error) in
the past; HAL seems to send these commands without concern for how the device
responds.

The annoying behavior of HAL asside, if the device is open for use by cdrecord,
other things should not be able to cause commands to be sent to the device. 
That's something the SCSI folks need to fix.
Comment 3 Matthew Dharm 2006-08-30 10:11:50 UTC
Greg, please send this back to the SCSI folks.
Comment 4 Greg Kroah-Hartman 2006-08-30 10:29:09 UTC
Ok, this is a scsi issue, not a USB specific thing...
Comment 5 Andreas Mohr 2006-08-30 11:14:43 UTC
Indeed it's probably SCSI, since an important hint towards the end of the LKML
discussion was:

http://lkml.org/lkml/2006/6/21/371

where it's said that it should probably be implemented via a counter in
scsi_device.h

Given the multiple complaints about this issue (this particular discussion
started by myself wasn't the first one about this problem, and of a particularly
nasty kind was the recent cdrom.c PATCH/FIX, http://lkml.org/lkml/2006/8/16/229 ).

I'll try looking into it after I managed to submit two patches on my ToDo list
in case nobody beats me to it.
Comment 6 Natalie Protasevich 2008-03-25 20:15:23 UTC
Any status update on this problem please? Has it been resolved?
Thanks.

Note You need to log in before you can comment on or make changes to this bug.