Distribution: Debian sarge
Hardware Environment: IBM 400 MHz server, USB2 PCI
Software Environment:
Problem Description: When backing up to a USB disk (the disk may be defective), SCSI disconnected the media, but the driver still tried to write and hung.

Here is the trace:

SCSI error : <1 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 15694548
usb 1-1: USB disconnect, address 2
ehci_hcd 0000:00:12.2: qh ce121080 (#0) state 1
SCSI error : <1 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 15694549
FAT: bread(block 3867) in fat_access failed
scsi: Device offlined - not ready after error recovery: host 1 channel 0 id 0 lun 0
sd 1:0:0:0: Illegal state transition cancel->offline
Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1643
 [<d084e5f0>] scsi_device_set_state+0xc4/0xcf [scsi_mod]
 [<d084c75d>] scsi_eh_offline_sdevs+0x47/0x60 [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
SCSI error : <1 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 15694550
printk: 308 messages suppressed.
Buffer I/O error on device sdc1, logical block 15694518
lost page write due to I/O error on sdc1
------------[ cut here ]------------
kernel BUG at drivers/block/as-iosched.c:1852!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat ppp_deflate zlib_deflate bsd_comp ipt_state ipt_REJECT ipt_LOG capability commoncap ppp_async af_packet crc_ccitt ipv6 ppp_generic slhc ip_conntrack_irc ip_conntrack_ftp ipt_MASQUERADE iptable_nat ip_conntrack iptable_filter ip_tables usb_storage ide_core ehci_hcd usbcore 8139too crc32 e100 mii rtc sd_mod ext3 jbd cryptoloop loop blowfish aic7xxx scsi_mod unix
CPU: 0
EIP: 0060:[<c01dcee4>] Not tainted
EFLAGS: 00010297 (2.6.8-slt)
EIP is at as_exit+0x22/0x62
eax: cf9d57f4 ebx: cf9d57e0 ecx: 00000000 edx: ce335eb0
esi: ce318eb8 edi: 00000282 ebp: cf9d14b4 esp: ce335ee4
ds: 007b es: 007b ss: 0068
Process scsi_eh_1 (pid: 446, threadinfo=ce334000 task=cea7f160)
Stack: ce318e2c c01d5bdf ce318e2c ce318e38 c01d7462 ce318e2c ce2f1c24 ce2f1c00
       d084fd92 ce318e2c ce2f1da8 c02c9a88 c02c9aa0 cf9d14d8 c01d2a51 ce2f1d84
       c018c494 ce2f1da8 ce2f1c08 00000286 cf9d1400 ce2f1c00 d0849d10 ce2f1da8
Call Trace:
 [<c01d5bdf>] elevator_exit+0x12/0x15
 [<c01d7462>] blk_cleanup_queue+0x1f/0x62
 [<d084fd92>] scsi_device_dev_release+0xd8/0xf9 [scsi_mod]
 [<c01d2a51>] device_release+0x14/0x44
 [<c018c494>] kobject_cleanup+0x40/0x65
 [<d0849d10>] __scsi_iterate_devices+0x69/0x73 [scsi_mod]
 [<d084c28e>] scsi_eh_stu+0x11d/0x136 [scsi_mod]
 [<d084ca94>] scsi_eh_ready_devs+0x17/0x5d [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
Code: 0f 0b 3c 07 85 69 26 c0 8d 43 0c 39 43 0c 74 08 0f 0b 3d 07
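For readers decoding the repeated "return code = 0x70000" lines in the trace, here is a minimal sketch of how the value splits into byte fields. It assumes the usual Linux SCSI result layout (status, message, host and driver bytes, from low to high) and the classic host-byte names from the kernel's include/scsi/scsi.h; the helper name decode_scsi_result is hypothetical and not part of any kernel or library API.

```python
# Host-byte names as defined in the Linux kernel's include/scsi/scsi.h.
# Only the classic codes are listed; anything else is reported as "unknown".
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result word into its four byte fields.

    Hypothetical helper for illustration only. The status field is the
    raw low byte, not the shifted value the old status_byte() macro returns.
    """
    host = (result >> 16) & 0xFF
    return {
        "status": result & 0xFF,          # SCSI status byte from the target
        "msg": (result >> 8) & 0xFF,      # message byte
        "host": host,                     # host byte set by the low-level driver
        "driver": (result >> 24) & 0xFF,  # driver byte set by the mid-layer
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


if __name__ == "__main__":
    print(decode_scsi_result(0x70000))
```

Under that assumption, 0x70000 has host byte 0x07 (DID_ERROR) and zero in the other three fields, which would suggest the failure was reported by the host-side driver (here presumably usb-storage) rather than as a SCSI status returned by the disk itself.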
After testing, it appears that the USB disk was defective. With a new disk, no problem is detected. But the last time, when SCSI went down (because of the defective disk), the server was half crashed. I think it's bad to crash the whole system because an external device has problems.
A similar problem has occurred again, with a USB disk certified good. Here is the call trace:

SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494602
SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494603
usb 1-2: USB disconnect, address 10
SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494604
scsi: Device offlined - not ready after error recovery: host 6 channel 0 id 0 lun 0
sd 6:0:0:0: Illegal state transition cancel->offline
Badness in scsi_device_set_state at drivers/scsi/scsi_lib.c:1643
 [<d084e5f0>] scsi_device_set_state+0xc4/0xcf [scsi_mod]
 [<d084c75d>] scsi_eh_offline_sdevs+0x47/0x60 [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
SCSI error : <6 0 0 0> return code = 0x70000
end_request: I/O error, dev sdc, sector 42494605
printk: 267 messages suppressed.
Buffer I/O error on device sdc1, logical block 42494573
lost page write due to I/O error on sdc1
------------[ cut here ]------------
kernel BUG at drivers/block/as-iosched.c:1852!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: nls_iso8859_1 nls_cp437 vfat fat ppp_deflate zlib_deflate bsd_comp ipt_state ipt_REJECT ipt_LOG capability commoncap ppp_async af_packet crc_ccitt ipv6 ppp_generic slhc ip_conntrack_irc ip_conntrack_ftp ipt_MASQUERADE iptable_nat ip_conntrack iptable_filter ip_tables usb_storage ide_core ehci_hcd usbcore 8139too crc32 e100 mii rtc sd_mod ext3 jbd cryptoloop loop blowfish aic7xxx scsi_mod unix
CPU: 0
EIP: 0060:[<c01dcee4>] Not tainted
EFLAGS: 00010212 (2.6.8-slt)
EIP is at as_exit+0x22/0x62
eax: cafb1774 ebx: cafb1760 ecx: 00000000 edx: c3a09eb0
esi: cec67eb8 edi: 00000282 ebp: cffd28b4 esp: c3a09ee4
ds: 007b es: 007b ss: 0068
Process scsi_eh_6 (pid: 16528, threadinfo=c3a08000 task=ca1140f0)
Stack: cec67e2c c01d5bdf cec67e2c cec67e38 c01d7462 cec67e2c ca118c24 ca118c00
       d084fd92 cec67e2c ca118da8 c02c9a88 c02c9aa0 cffd28d8 c01d2a51 ca118d84
       c018c494 ca118da8 ca118c08 00000286 cffd2800 ca118c00 d0849d10 ca118da8
Call Trace:
 [<c01d5bdf>] elevator_exit+0x12/0x15
 [<c01d7462>] blk_cleanup_queue+0x1f/0x62
 [<d084fd92>] scsi_device_dev_release+0xd8/0xf9 [scsi_mod]
 [<c01d2a51>] device_release+0x14/0x44
 [<c018c494>] kobject_cleanup+0x40/0x65
 [<d0849d10>] __scsi_iterate_devices+0x69/0x73 [scsi_mod]
 [<d084c28e>] scsi_eh_stu+0x11d/0x136 [scsi_mod]
 [<d084ca94>] scsi_eh_ready_devs+0x17/0x5d [scsi_mod]
 [<d084cd3d>] scsi_unjam_host+0x18d/0x1a2 [scsi_mod]
 [<d084ce68>] scsi_error_handler+0x116/0x15a [scsi_mod]
 [<d084cd52>] scsi_error_handler+0x0/0x15a [scsi_mod]
 [<c01041e1>] kernel_thread_helper+0x5/0xb
Code: 0f 0b 3c 07 85 69 26 c0 8d 43 0c 39 43 0c 74 08 0f 0b 3d 07
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 5
model name : Pentium II (Deschutes)
stepping : 2
cpu MHz : 398.370
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr
bogomips : 784.38
Created attachment 6651: More information

Hello,

the problem does not seem to be related to a defective hard disk. It is reproducible on various motherboards, USB 2.0 PCI cards, 2.5" and 3.5" USB 2.0 hard disk enclosures, various hard disks (Hitachi Deskstar/Travelstar, Samsung 1SP161..) and all 2.6 kernels (with processor optimisation or the out-of-the-box Debian Sarge 3.10a distribution).

The problem occurs:
1. quickly when copying very large files (e.g. a DVD ISO image) from a fast IDE/SCSI device to a USB 2.0 hard disk;
2. after the cache is almost full (on my system: 700 MB, with approx. 3-4 MB of RAM free);
3. NOT (or not seen by me yet) when copying many small files.

It looks very much like a timeout or buffer overflow problem (in the USB hard disk enclosure), either in the chain USB EHCI <> usb_storage <> USB hard disk enclosure (I have seen USB disconnect messages) or in the USB <> SCSI emulation layer.

The system sometimes hangs completely; sometimes disconnecting and reconnecting the USB storage device helps. The latter comes with a change of the sdX device name, e.g. sda -> sdb the first time, then sdb -> sda again after the second disconnect/reconnect.

Of course, there are a lot of hanging processes related to ehci, pdflush ... I can send more of kern.log on request.
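The trigger described in item 1 (one very large sequential write to the USB disk) can be approximated without a DVD image. The following is only a sketch under stated assumptions: the function name write_large_file and the mount point used in the usage note are hypothetical, zeros stand in for real file data, and nothing here is claimed to reproduce the crash on any particular hardware.

```python
import os


def write_large_file(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of zeros to path in sequential chunks.

    Hypothetical stress helper: it only generates a large sequential
    write and forces it out of the page cache with fsync at the end.
    Returns the number of bytes written.
    """
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        # Push buffered data to the device so I/O errors surface here
        # instead of later during background writeback.
        f.flush()
        os.fsync(f.fileno())
    return written
```

A run might look like write_large_file("/mnt/usbdisk/stress.bin", 4 * 1024**3), where /mnt/usbdisk is an assumed mount point for the USB disk; watching kern.log during the run would show whether the SCSI errors and the USB disconnect appear.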
Is this issue still present in kernel 2.6.16.7?
Hello,

> ------- Additional Comments From bunk@stusta.de 2006-04-18 07:32 -------
> Is this issue still present in kernel 2.6.16.7?

Answer: The latest kernel I tested with is 2.6.16 (after xx-rcy). I could still see these errors, but really seldom.

Some more tests (especially after trying the same thing at my friend's place and instantly running into this trouble) revealed trouble with the hardware (cables, USB2IDE adapters? external HDs??). This means that changing just and ONLY the USB cable made things work, or made them worse. Using the same external drive and USB cable on another computer with another USB 2.0 adapter led to trouble, or not.

ANYHOW: I think that if there are problems with the transmission of data via USB, the error recovery between the SCSI emulation and the USB driver/layer does not work that well.

E.g.: Meanwhile I got a quite old SCSI hard disk. During Linux installation on that one I saw that the disk reported defective sector(s) and started a very time-consuming sector re-arrangement procedure. BUT: instead of disconnecting and breaking the installation procedure, the (native??) SCSI driver waited for the disk to finish.

Question: Can it be that such error recovery between the SCSI and USB subsystems is either not possible (spec limits ...) or simply not implemented??

Anyhow: Yesterday I started again with FireWire and hard disks and had the same problems (either the disk was not recognized at all or, if found and mounted, it disconnected several times while reading files of about 2 GBytes (2 different disks)).

Remark: I'm not that familiar with writing drivers or with the USB and SCSI specs. So, please be patient - I'm trying to help AND NOT ANNOY!!

Many greetings
/RalfS
I would like to clarify the situation.

First, in the latest report, it appears that the problem goes away if you switch cables. Is that correct?

Second, this problem only happens with EHCI (not UHCI or OHCI)?

Third, what does the crash message look like with the latest kernel?
> ------- Additional Comments From mdharm-usb@one-eyed-alien.net 2006-05-08 13:39 -------
> I would like to clarify the situation.
>
> First, in the latest report, it appears that the problem goes away if you switch
> cables. Is that correct?
>
Answer: Correct ++ using the same cable with other hardware (computer, PCI-to-USB interface card (VIA, ALI chipset) and/or USB hard disk (including their various USB-to-IDE chips)) makes the problem disappear.

> Second, this problem only happens with EHCI (not UHCI or OHCI)?
>
Answer: Well, recently I have only used EHCI because of the speed. BUT: I never saw a problem in former times when I had only USB 1.1 interfaces. In addition, I probably never copied such an amount of data with 1.1.

> Third, what does the crash message look like with the latest kernel?
>
Sorry, I haven't downloaded the latest kernel yet. I will do so, but it may take a little while.

------
I hope that I could clarify things. If not, please respond. I will restart the testing in a more professional way and write a list of hardware combinations and results.

Thanks for your patience
/RalfS
Error recovery has greatly improved in recent kernels. If this continues to be an issue, we can re-open this bug.