Bug 3694 - kernel bug when writing to usb-disk
Status: CLOSED PATCH_ALREADY_AVAILABLE
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: i386 Linux
Importance: P2 normal
Assigned To: Matthew Dharm
Depends on:
Blocks: USB
Reported: 2004-11-04 01:15 UTC by simon manlay
Modified: 2007-02-27 10:02 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.8 (debian sarge kernel)
Tree: Mainline
Regression: ---


Attachments
More information (54.68 KB, text/plain)
2005-11-23 03:14 UTC, RaffigeRaffe

Description simon manlay 2004-11-04 01:15:05 UTC
Distribution: debian sarge
Hardware Environment: IBM 400 MHz server, USB 2.0 PCI card
Software Environment:
Problem Description: When backing up to a USB disk (the disk may be defective),
SCSI disconnected the media, but the driver still tried to write and hung.

Here is the trace:

SCSI error : <1 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 15694548
usb 1-1: USB disconnect, address 2
ehci_hcd 0000:00:12.2: qh ce121080 (#0) state 1
SCSI error : <1 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 15694549
FAT: bread(block 3867) in fat_access failed
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 0 lun 0
sd 1:0:0:0: Illegal state transition cancel->offline
Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1643
 [<d084e5f0>] scsi_device_set_state+0xc4/0xcf [scsi_mod]
 [<d084c75d>] scsi_eh_offline_sdevs+0x47/0x60 [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
SCSI error : <1 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 15694550
printk: 308 messages suppressed.
Buffer I/O error on device sdc1, logical block 15694518
lost page write due to I/O error on sdc1


------------[ cut here ]------------
kernel BUG at drivers/block/as-iosched.c:1852!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat ppp_deflate zlib_deflate
bsd_comp ipt_state ipt_REJECT ipt_LOG capability commoncap ppp_async af_packet
crc_ccitt ipv6 ppp_generic slhc ip_conntrack_irc ip_conntrack_ftp ipt_MASQUERADE
iptable_nat ip_conntrack iptable_filter ip_tables usb_storage ide_core ehci_hcd
usbcore 8139too crc32 e100 mii rtc sd_mod ext3 jbd cryptoloop loop blowfish
aic7xxx scsi_mod unix
CPU:    0
EIP:    0060:[<c01dcee4>]    Not tainted
EFLAGS: 00010297   (2.6.8-slt)
EIP is at as_exit+0x22/0x62
eax: cf9d57f4   ebx: cf9d57e0   ecx: 00000000   edx: ce335eb0
esi: ce318eb8   edi: 00000282   ebp: cf9d14b4   esp: ce335ee4
ds: 007b   es: 007b   ss: 0068
Process scsi_eh_1 (pid: 446, threadinfo=ce334000 task=cea7f160)
Stack: ce318e2c c01d5bdf ce318e2c ce318e38 c01d7462 ce318e2c ce2f1c24 ce2f1c00
       d084fd92 ce318e2c ce2f1da8 c02c9a88 c02c9aa0 cf9d14d8 c01d2a51 ce2f1d84
       c018c494 ce2f1da8 ce2f1c08 00000286 cf9d1400 ce2f1c00 d0849d10 ce2f1da8
Call Trace:
 [<c01d5bdf>] elevator_exit+0x12/0x15
 [<c01d7462>] blk_cleanup_queue+0x1f/0x62
 [<d084fd92>] scsi_device_dev_release+0xd8/0xf9 [scsi_mod]
 [<c01d2a51>] device_release+0x14/0x44
 [<c018c494>] kobject_cleanup+0x40/0x65
 [<d0849d10>] __scsi_iterate_devices+0x69/0x73 [scsi_mod]
 [<d084c28e>] scsi_eh_stu+0x11d/0x136 [scsi_mod]
 [<d084ca94>] scsi_eh_ready_devs+0x17/0x5d [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
Code: 0f 0b 3c 07 85 69 26 c0 8d 43 0c 39 43 0c 74 08 0f 0b 3d 07
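The `return code = 0x70000` in the SCSI error lines above is a packed result word. The following is a minimal Python sketch for unpacking it, assuming the classic Linux SCSI midlayer layout (status in the low byte, then the message, host and driver bytes) and the `DID_*` host-byte names from the kernel's SCSI headers; the function and table names are illustrative, not kernel code.

```python
# Host-byte codes as named in the Linux SCSI headers (DID_* constants).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result: int) -> dict:
    """Split a 32-bit SCSI midlayer result word into its four bytes."""
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,          # raw SCSI status byte from the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": host,                     # host adapter / transport status
        "host_name": HOST_BYTE_NAMES.get(host, f"unknown(0x{host:02x})"),
        "driver": (result >> 24) & 0xFF,  # driver byte
    }


if __name__ == "__main__":
    print(decode_scsi_result(0x70000))
```

For 0x70000 every byte is zero except the host byte, 0x07 (`DID_ERROR`): the failure was flagged on the host/transport side rather than by the disk returning a SCSI status. That is consistent with, though not proof of, usb-storage giving up on the USB transfer.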
Comment 1 simon manlay 2004-11-10 03:34:06 UTC
After testing, it appears that the USB disk was defective. With a new disk, no
problem is detected. But the last time, when SCSI went down (because of the
defective disk), the server was half crashed. I think it's bad to crash the
whole system because an external device has problems.
Comment 2 simon manlay 2004-11-22 00:04:13 UTC
A similar problem has occurred again, with a USB disk certified good.
Here is the call trace:

SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494602
SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494603
usb 1-2: USB disconnect, address 10
SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494604
scsi: Device offlined - not ready after error recovery: host 6 channel 0 id 0 lun 0
sd 6:0:0:0: Illegal state transition cancel->offline
Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1643
 [<d084e5f0>] scsi_device_set_state+0xc4/0xcf [scsi_mod]
 [<d084c75d>] scsi_eh_offline_sdevs+0x47/0x60 [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494605
printk: 267 messages suppressed.
Buffer I/O error on device sdc1, logical block 42494573
lost page write due to I/O error on sdc1
------------[ cut here ]------------
kernel BUG at drivers/block/as-iosched.c:1852!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat ppp_deflate zlib_deflate
bsd_comp ipt_state ipt_REJECT ipt_LOG capability commoncap ppp_async af_packet
crc_ccitt ipv6 ppp_generic slhc ip_conntrack_irc ip_conntrack_ftp ipt_MASQUERADE
iptable_nat ip_conntrack iptable_filter ip_tables usb_storage ide_core ehci_hcd
usbcore 8139too crc32 e100 mii rtc sd_mod ext3 jbd cryptoloop loop blowfish
aic7xxx scsi_mod unix
CPU:    0
EIP:    0060:[<c01dcee4>]    Not tainted
EFLAGS: 00010212   (2.6.8-slt)
EIP is at as_exit+0x22/0x62
eax: cafb1774   ebx: cafb1760   ecx: 00000000   edx: c3a09eb0
esi: cec67eb8   edi: 00000282   ebp: cffd28b4   esp: c3a09ee4
ds: 007b   es: 007b   ss: 0068
Process scsi_eh_6 (pid: 16528, threadinfo=c3a08000 task=ca1140f0)
Stack: cec67e2c c01d5bdf cec67e2c cec67e38 c01d7462 cec67e2c ca118c24 ca118c00
       d084fd92 cec67e2c ca118da8 c02c9a88 c02c9aa0 cffd28d8 c01d2a51 ca118d84
       c018c494 ca118da8 ca118c08 00000286 cffd2800 ca118c00 d0849d10 ca118da8
Call Trace:
 [<c01d5bdf>] elevator_exit+0x12/0x15
 [<c01d7462>] blk_cleanup_queue+0x1f/0x62
 [<d084fd92>] scsi_device_dev_release+0xd8/0xf9 [scsi_mod]
 [<c01d2a51>] device_release+0x14/0x44
 [<c018c494>] kobject_cleanup+0x40/0x65
 [<d0849d10>] __scsi_iterate_devices+0x69/0x73 [scsi_mod]
 [<d084c28e>] scsi_eh_stu+0x11d/0x136 [scsi_mod]
 [<d084ca94>] scsi_eh_ready_devs+0x17/0x5d [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
Code: 0f 0b 3c 07 85 69 26 c0 8d 43 0c 39 43 0c 74 08 0f 0b 3d 07
Comment 3 simon manlay 2004-11-22 00:05:44 UTC
cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 5
model name      : Pentium II (Deschutes)
stepping        : 2
cpu MHz         : 398.370
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr
bogomips        : 784.38

Comment 4 RaffigeRaffe 2005-11-23 03:14:06 UTC
Created attachment 6651
More information

Hello,
the problem does not seem to be related to a defective hard disc.
The problem is reproducible on various motherboards, USB 2.0 PCI cards, 2.5"
and 3.5" USB 2.0 hard disc enclosures, various hard discs (Hitachi
Deskstar/Travelstar, Samsung 1SP161..) and all 2.6 kernels (with processor
optimisation or the 'out of the box' Debian Sarge 3.10a distribution).

The problem occurs:
1.  quickly when copying very big files (e.g. a DVD ISO image) from
    a fast IDE/SCSI device to a USB 2.0 hard disc;
2.  after the cache is almost full (on my system: 700 MB, with approx. 3-4 MB
    of RAM free);
3.  NOT (or not that I have seen yet) when copying many small files.
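The reproduction recipe above (one very large sequential copy that fills the page cache) can be scripted. This is a minimal Python sketch, assuming a big source file on a fast local disk and a destination on the mounted USB disk; both paths and the script name `copy_watch.py` are placeholders, and the helper names are hypothetical, not part of any existing tool. It uses plain buffered writes with no intermediate syncs, so dirty data accumulates as in point 2, and it periodically logs `MemFree`, `Cached` and `Dirty` from `/proc/meminfo` to show how full the cache was when errors begin.

```python
import os
import sys


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value kB"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Return parsed memory counters, or {} where /proc/meminfo is unavailable."""
    try:
        with open(path) as f:
            return parse_meminfo(f.read())
    except OSError:
        return {}


def copy_and_watch(src, dst, chunk_size=1 << 20, report_every=256, log=print):
    """Sequentially copy src to dst, logging page-cache counters as it goes.

    Writes are ordinary buffered writes, so dirty pages pile up in the page
    cache and are written back in the background; fsync is called once at
    the end. Returns the number of bytes copied.
    """
    written = 0
    chunks = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(buf)
            written += len(buf)
            chunks += 1
            if chunks % report_every == 0:
                m = read_meminfo()
                log(f"{written >> 20} MiB copied, MemFree={m.get('MemFree')} kB, "
                    f"Cached={m.get('Cached')} kB, Dirty={m.get('Dirty')} kB")
        fout.flush()
        # Errors from background write-back to the destination device may
        # surface here, as an OSError from fsync, rather than from write().
        os.fsync(fout.fileno())
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python copy_watch.py /path/to/big.iso /mnt/usbdisk/big.iso
    total = copy_and_watch(sys.argv[1], sys.argv[2])
    print(f"done: {total} bytes")
```

Deliberately avoiding periodic `fsync` keeps the sketch close to the reported scenario, where a plain copy let the cache fill before the disconnect.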

It looks very much like a timeout or buffer overflow problem (in the USB hard
disc enclosure), either in the path USB EHCI <> usb_storage <> USB hard disc
enclosure (I have seen USB disconnect messages) or in the USB <> SCSI emulation
layer.

Sometimes the system hangs completely; sometimes disconnecting and reconnecting
the USB storage device helps. The latter comes with a change of the sdX device
name, e.g. sda -> sdb the first time, then sdb -> sda again after the second
disconnect/reconnect.
Of course, there are a lot of hanging processes related to ehci, pdflush ...
I can send more kern.log output ... on request.
Comment 5 Adrian Bunk 2006-04-18 07:32:11 UTC
Is this issue still present in kernel 2.6.16.7?
Comment 6 RaffigeRaffe 2006-04-21 04:26:45 UTC
Hello,

> ------- Additional Comments From bunk@stusta.de  2006-04-18 07:32 -------
> Is this issue still present in kernel 2.6.16.7?

Answer:
The latest kernel I tested is 2.6.16 (after xx-rcy). I could still see these
errors, but really seldom.
Some more tests (especially after trying the same thing at a friend's place
and instantly running into this trouble) revealed trouble with hardware
(cables, USB2IDE adapters? external HDs??).
THIS means that changing just and ONLY the USB cable made things work or
made them worse. Using the same external drive and USB cable on another
computer with another USB 2.0 adapter led to trouble, or not.

ANYHOW: I think that if there are problems with data transmission via USB,
the error recovery between the SCSI emulation and the USB driver/layer is
not working that well.
E.G.: Meanwhile I got a quite old SCSI hard disc. During a Linux
installation on it, I saw that the disk reported defective sector(s) and
started a very time-consuming sector reallocation procedure.
BUT: instead of disconnecting and breaking off the installation procedure,
the (native??) SCSI driver waited for the disc to finish.

Question: Could it be that such error recovery between the SCSI and USB
subsystems is not possible (spec limits ...) or simply not implemented?

Anyhow: yesterday I started again with FireWire and hard discs and had the
same problems (either the disc was not recognized at all, or, if found and
mounted, it disconnected several times while reading files of about 2 GB;
2 different discs).

Remark: I'm not that familiar with writing drivers or with the USB and SCSI
specs, so please be patient. I'm trying to help AND NOT ANNOY!!

Many greetings

/RalfS 

Comment 7 Matthew Dharm 2006-05-08 13:39:49 UTC
I would like to clarify the situation.

First, in the latest report, it appears that the problem goes away if you switch
cables.  Is that correct?

Second, this problem only happens with EHCI (not UHCI or OHCI)?

Third, what does the crash message look like with the latest kernel?
Comment 8 RaffigeRaffe 2006-05-11 03:09:28 UTC
> ------- Additional Comments From mdharm-usb@one-eyed-alien.net  2006-05-08 13:39 -------
> I would like to clarify the situation.
> 
> First, in the latest report, it appears that the problem goes away if you switch
> cables.  Is that correct?
> 
Answer: Correct. Also, using the same cable with other hardware (computer,
PCI-to-USB interface card (VIA, ALi chipset) and/or USB hard disc, including
their various USB-to-IDE chips) makes the problem disappear.
 
> Second, this problem only happens with EHCI (not UHCI or OHCI)?
> 
Answer: Well, recently I have only used EHCI because of the speed. BUT: I
never saw a problem in earlier times when I had only USB 1.1 interfaces.
Then again, I probably never copied such an amount of data over 1.1.

> Third, what does the crash message look like with the latest kernel?
> 
Sorry, I haven't downloaded the latest kernel yet. I will do so, but it may
take a little while.

------
I hope I could clarify things. If not, please respond. I will restart the
testing in a more professional way and write up a list of hardware
combinations and results.

Thanks for your patience

/RalfS

Comment 9 Matthew Dharm 2007-02-27 10:02:37 UTC
Error recovery has greatly improved in recent kernels.  If this continues to be
an issue, we can re-open this bug.
