Bug 8968
Summary: | One broken USB storage device can hang the entire USB subsystem | ||
---|---|---|---|
Product: | Drivers | Reporter: | Alain Knaff (kernel) |
Component: | USB | Assignee: | Greg Kroah-Hartman (greg) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | akpm, bunk, protasnb, stern |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.22.6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 5089 | ||
Attachments: |
echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo
echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo Dmesg output on a kernel with CONFIG_USB_DEBUG=y echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo without dodgy USB hub /proc/bus/usb/devices before crash /sys/class/usb_host/usb_host6/registers /sys/class/usb_host/usb_host6/async /proc/bus/usb/devices after crash |
Description
Alain Knaff
2007-09-02 14:07:05 UTC
Please make it hang, then do echo t > /proc/sysrq-trigger dmesg -s 1000000 > foo then attach foo to this report, thanks. Created attachment 12674 [details]
echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo
Output of echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo
Alas, not all the output is there. You'll need a larger kernel log buffer. Can you please add "log_buf_len=1M" to the kernel boot command line, reboot and then rerun the sysrq-T thing? Thank. Created attachment 12682 [details]
echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo
Output of echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo
Btw, this time the drive led stayed red... (formerly it was green)
Reply-To: akpm@linux-foundation.org > On Mon, 3 Sep 2007 10:55:26 -0700 (PDT) bugme-daemon@bugzilla.kernel.org > wrote: > > http://bugzilla.kernel.org/show_bug.cgi?id=8968 Either a scsi bug or a usb-storage bug. The interesting bits are below: > [ 411.364000] scsi_eh_6 D F7E21F28 0 2782 2 (L-TLB) > [ 411.364000] f7e21f3c 00000046 00000002 f7e21f28 f7e21f24 00000000 > 00000082 f79ec600 > [ 411.364000] dd5c5a00 0000000a b1510900 00000058 00000000 dd56a10c > c3015b00 00000000 > [ 411.364000] 00000002 00000001 deb0e1c0 00004ec3 00000000 00000082 > 000000ff 00000000 > [ 411.364000] Call Trace: > [ 411.364000] [<c02f11e4>] wait_for_completion+0x94/0xe0 > [ 411.364000] [<c0122bc0>] default_wake_function+0x0/0x10 > [ 411.364000] [<f88fe6d5>] command_abort+0x85/0xb0 [usb_storage] > [ 411.364000] [<f8a51f89>] __scsi_try_to_abort_cmd+0x19/0x20 [scsi_mod] > [ 411.364000] [<f8a53b86>] scsi_error_handler+0x356/0x510 [scsi_mod] > [ 411.364000] [<c01205c0>] complete+0x40/0x60 > [ 411.364000] [<f8a53830>] scsi_error_handler+0x0/0x510 [scsi_mod] > [ 411.364000] [<c013ba12>] kthread+0x42/0x70 > [ 411.364000] [<c013b9d0>] kthread+0x0/0x70 > [ 411.364000] [<c0105487>] kernel_thread_helper+0x7/0x10 > [ 411.364000] ======================= > [ 411.364000] usb-storage D F7E23EA0 0 2783 2 (L-TLB) > [ 411.364000] f7e23eb4 00000046 00000000 f7e23ea0 f79ec600 de7e6a80 > 000031ba 00000000 > [ 411.364000] 00000a00 0000000a b56a6600 00000051 00000000 dd56a62c > c3015b00 00000000 > [ 411.364000] c02f18d8 0001b000 de7e6a80 000031ba dfb89d00 00002000 > 00000010 c0008380 > [ 411.364000] Call Trace: > [ 411.364000] [<c02f18d8>] schedule_timeout+0x78/0xd0 > [ 411.364000] [<c02f11e4>] wait_for_completion+0x94/0xe0 > [ 411.364000] [<c0122bc0>] default_wake_function+0x0/0x10 > [ 411.364000] [<f8887716>] usb_sg_wait+0x126/0x170 [usbcore] > [ 411.364000] [<f88ff2e0>] usb_stor_bulk_transfer_sg+0xb0/0x100 > [usb_storage] > [ 411.364000] [<f88ff11e>] usb_stor_bulk_transfer_buf+0x4e/0x90 > [usb_storage] > [ 411.364000] [<f88ff776>] usb_stor_Bulk_transport+0x126/0x290 > [usb_storage] > [ 411.364000] [<f89007b0>] usb_stor_control_thread+0x0/0x1d0 [usb_storage] > [ 411.364000] [<f88ff904>] usb_stor_invoke_transport+0x24/0x2f0 > [usb_storage] > [ 411.364000] [<c012003f>] __wake_up_locked+0x1f/0x30 > [ 411.364000] [<c02f2820>] __down_interruptible+0xf0/0x120 > [ 411.364000] [<c0122bc0>] default_wake_function+0x0/0x10 > [ 411.364000] [<f89007b0>] usb_stor_control_thread+0x0/0x1d0 [usb_storage] > [ 411.364000] [<f89007b0>] usb_stor_control_thread+0x0/0x1d0 [usb_storage] > [ 411.364000] [<f8900918>] usb_stor_control_thread+0x168/0x1d0 > [usb_storage] > [ 411.364000] [<c01205c0>] complete+0x40/0x60 > [ 411.364000] [<f89007b0>] usb_stor_control_thread+0x0/0x1d0 [usb_storage] > [ 411.364000] [<c013ba12>] kthread+0x42/0x70 > [ 411.364000] [<c013b9d0>] kthread+0x0/0x70 > [ 411.364000] [<c0105487>] kernel_thread_helper+0x7/0x10 > [ 411.364000] ======================= > > ... > > [ 411.364000] rsync D 000031F9 0 8809 8779 (NOTLB) > [ 411.364000] de1a9db0 00000082 00000051 000031f9 c3016a38 00000000 > df93a6d0 c3010000 > [ 411.364000] c010320a 00000009 b56a6600 00000051 00000000 df93a62c > c3015b00 00000000 > [ 411.364000] de1a9df0 00000082 de7e6a80 00000086 de1a9ea8 c012da61 > c0118735 b56a6600 > [ 411.364000] Call Trace: > [ 411.364000] [<c010320a>] __switch_to+0xaa/0x1d0 > [ 411.364000] [<c012da61>] irq_exit+0x51/0x80 > [ 411.364000] [<c0118735>] smp_apic_timer_interrupt+0x55/0x80 > [ 411.364000] [<c02f171d>] io_schedule+0x1d/0x30 > [ 411.364000] [<c015e7c9>] sync_page+0x39/0x50 > [ 411.364000] [<c02f19ad>] __wait_on_bit_lock+0x3d/0x70 > [ 411.364000] [<c015e790>] sync_page+0x0/0x50 > [ 411.364000] [<c015e773>] __lock_page+0x73/0x80 > [ 411.364000] [<c013bd20>] wake_bit_function+0x0/0x60 > [ 411.364000] [<c015f193>] do_generic_mapping_read+0x213/0x5d0 > [ 411.364000] [<c01612ec>] generic_file_aio_read+0xcc/0x1d0 > [ 411.364000] [<c015e520>] file_read_actor+0x0/0xf0 > [ 411.364000] [<c01802f5>] do_sync_read+0xd5/0x120 > [ 411.364000] [<c013bcd0>] autoremove_wake_function+0x0/0x50 > [ 411.364000] [<c0180c2c>] vfs_read+0xbc/0x160 > [ 411.364000] [<c0180220>] do_sync_read+0x0/0x120 > [ 411.364000] [<c0181161>] sys_read+0x41/0x70 > [ 411.364000] [<c01041d2>] sysenter_past_esp+0x6b/0xa9 > > It's entirely possible that the hang (not the original error, just the hang) is caused by a bug in your USB host controller hardware. See for example Bugzilla entry #8692. You need to provide more information about your setup. Start with the output from lspci and the contents of /proc/bus/usb/devices -- indicate which device is causing the problem. Also, you should build a kernel with CONFIG_USB_DEBUG enabled and attach the dmesg log after the error occurs. Created attachment 12686 [details]
Dmesg output on a kernel with CONFIG_USB_DEBUG=y
Dmesg output on a kernel with CONFIG_USB_DEBUG=y
I don't see anything in your dmesg log indicating a hang. But one thing does stand out: The Chesen USB-PS2 converter (attached to a keyboard) constantly gets reset every few seconds. You might want to try plugging that device into the other hub, the one with the audio device, instead. The Chesen USB-PS2 converter is actually an USB2 hub, which also happens to have a PS/2 mouse and PS/2 keyboard connector on the side. I bought it for use as a hub, so my keyboard is actually connected directly to the PC, and not to this hub. At the time of the crash nothing else was (no USB device) was connected to that hub either, the "faulty" Sharkoon enclosure was connected directly to another USB connector. However, although my (PS/2) keyboard was connected directly to the PC, it did act up occasionally during my tests: a single keypress would generate multiple repeated characters. The USB loudspeakers are connected to another USB, which is only 1.1. So, it would be somewhat pointless to connect the fast USB 2.0 hub to the slow one. However, the fast hub is indeed a tad bid dodgy: a number of devices (the loudspeakers, and also the serial cable to connect my mobile phone) do not work correctly on it, but only on the slow hub. Ok, in any case, I disconnected that dodgy hub altogether, and repeated the test. With the hub removed, the crash did indeed take much longer to occur (after 5 Gigs had been successfully copied), but it did occur nevertheless. Attached is the echo t >/proc/sysrq-trigger output after this happened. This time, I used cp rather than rsync. Created attachment 12699 [details]
echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo without dodgy USB hub
echo t > /proc/sysrq-trigger ; dmesg -s 1000000 > foo without dodgy USB hub
Okay. The stack trace (echo t >/proc/sysrq-trigger) doesn't provide any more information than it did before, so you may as well leave it out in the future. The next thing to do is check the contents of /sys/class/usb_host/usb_host6 after the next crash. The "registers" and "async" files are of particular interest; please attach them to this bug report. (And be sure to keep CONFIG_USB_DEBUG enabled!) Also, it wouldn't hurt to attach a copy of /proc/bus/usb/devices (before the crash, but with the drive plugged in). Created attachment 12700 [details]
/proc/bus/usb/devices before crash
/proc/bus/usb/devices before crash
Created attachment 12701 [details]
/sys/class/usb_host/usb_host6/registers
Created attachment 12702 [details]
/sys/class/usb_host/usb_host6/async
Created attachment 12703 [details]
/proc/bus/usb/devices after crash
Any new status on this problem please? Thanks. The problem hasn't re-occurred since a while (since the machine was upgraded to Kubuntu 7.10 , kernel 2.6.22-14-generic if I remember correctly) |