This is possibly a hardware failure, but it should not cause a system crash.

Hardware: a Maxtor MaxLine II Plus connected to an Intel AHCI controller (Asus P5AD2-E Premium / i925XE). Three other HDDs are connected to the same controller, and many more HDDs to other controllers.

Steps to reproduce: read data from the faulty drive while it is in its faulty state. Hardly any data gets read (it is very slow), and a kernel panic occurs after maybe 20 seconds. After a reset the problem persists and the crash is repeatable. After a poweroff and touching the cables, the disk can once again be read and written fine until it fails at a later time. This hasn't happened often enough for me to guess whether the poweroff or replugging the loose SATA cables was what fixed it. I could use the HDD fine for several days until it failed again. None of the other HDDs show any signs of failure, while this particular HDD has failed numerous times.

This is what I gathered of the panic message (numerical data omitted):

handle_IRQ_event
__do_IRQ
__wake_up_common
do_IRQ
common_interrupt
scsi_request_fn
blk_run_queue
scsi_next_command
scsi_end_request
scsi_io_completion
sd_rw_intr
ata_scsi_qc_complete
ata_qc_complete
ahci_eng_timeout
scsi_error_handler
ata_scsi_error
scsi_error_handler
kthread
kthread
kernel_thread_helper
<0>Kernel panic - not syncing. Fatal exception in interrupt
This still happens with 2.6.15. Could someone kick Jeff Garzik? I'd really appreciate a reply. I would post the full error message, but netconsole stops working once init is started (it prints messages fine up to "Freeing unused kernel memory", but nothing after that line).
Got netconsole working (by using /bin/sh as init). Here is the log from when the crash occurs. I'm reading data off the disk, the read stalls, and after a couple of seconds this gets printed:

ata4: handling error/timeout
ata4: port reset, p_is 4000000 is 0 pis 4000000 cmd 4017 tf d0 ss 113 se 400000
ata4: status=0x50 { DriveReady SeekComplete }
sdd: Current: sense key=0x0 ASC=0x0 ASCQ=0x0
Assertion failed! qc->flags & ATA_QCFLAG_ACTIVE,drivers/scsi/libata-core.c,ata_qc_complete,line=3513
------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi.c:295!
invalid operand: 0000 [#1]
Modules linked in:
CPU: 0
EIP: 0060:[<c037dcaf>] Not tainted VLI
EFLAGS: 00010046 (2.6.15-gentoo)
EIP is at scsi_put_command+0x8b/0x95
eax: f7ea7990 ebx: f7ed9b00 ecx: f7ed9b0c edx: f7ed9b0c
esi: f7e9f000 edi: 00000282 ebp: f7ea7800 esp: f7ebdc80
ds: 007b es: 007b ss: 0068
Process scsi_eh_3 (pid: 950, threadinfo=f7ebc000 task=f7e82030)
Stack: f7ea79f8 c026301f f7ed9b00 f7ea7990 f7e71808 f7e71808 c0382684 f7ed9b00
       f73ebd94 f7ed9b00 00000286 c038279d f7ed9b00 00000001 00000000 f73ebd94
       00000000 00000000 f7ed9b00 c0382a8a f7ed9b00 00000001 00000000 00000001
Call Trace:
 [<c026301f>] kobject_get+0x17/0x1e
 [<c0382684>] scsi_next_command+0x2f/0x4f
 [<c038279d>] scsi_end_request+0xc3/0xe7
 [<c0382a8a>] scsi_io_completion+0x137/0x4d5
 [<c03936c9>] sd_rw_intr+0x13d/0x256
 [<c037e378>] scsi_finish_command+0x24/0xa4
 [<c038d9ca>] ata_scsi_qc_complete+0x5b/0xaf
 [<c038af41>] ata_qc_complete+0x3a/0xb4
 [<c038f681>] ahci_interrupt+0xe0/0x20a
 [<c01049dd>] do_IRQ+0x1e/0x24
 [<c010309a>] common_interrupt+0x1a/0x20
 [<c01391b0>] handle_IRQ_event+0x39/0x6d
 [<c0139243>] __do_IRQ+0x5f/0xc0
 [<c011a3db>] __wake_up_common+0x38/0x57
 [<c01049d8>] do_IRQ+0x19/0x24
 [<c010309a>] common_interrupt+0x1a/0x20
 [<c03835a7>] scsi_request_fn+0x1b1/0x2e2
 [<c02565eb>] blk_run_queue+0x3a/0x3c
 [<c038268c>] scsi_next_command+0x37/0x4f
 [<c038279d>] scsi_end_request+0xc3/0xe7
 [<c0382a8a>] scsi_io_completion+0x137/0x4d5
 [<c03936c9>] sd_rw_intr+0x13d/0x256
 [<c038da02>] ata_scsi_qc_complete+0x93/0xaf
 [<c038af41>] ata_qc_complete+0x3a/0xb4
 [<c038f575>] ahci_eng_timeout+0x83/0xae
 [<c0381b54>] scsi_error_handler+0x0/0xa0
 [<c038d175>] ata_scsi_error+0x17/0x2b
 [<c0381bd8>] scsi_error_handler+0x84/0xa0
 [<c038d175>] ata_scsi_error+0x17/0x2b
 [<c0381bd8>] scsi_error_handler+0x84/0xa0
 [<c012fb4c>] kthread+0xb4/0xea
 [<c012fa98>] kthread+0x0/0xea
 [<c0101329>] kernel_thread_helper+0x5/0xb
Code: 5c 24 08 8b 74 24 0c 89 44 24 1c 8b 7c 24 10 8b 6c 24 14 83 c4 18 e9 a8 11 f6 ff 89 43 0c 31 db 89 48 04 89 4e 14 89 51 04 eb b7 <0f> 0b 27 01 f6 27 5e c0 eb 95 57 56 53 83 ec 28 8b 74 24 38 8d
<0>Kernel panic - not syncing: Fatal exception in interrupt
<6>SysRq : Terminate All Tasks
SysRq : Terminate All Tasks
SysRq : Kill All Tasks
SysRq : Emergency Remount R/O
SysRq : Emergency Sync
SysRq : Power Off
SysRq : Power Off
SysRq : Power Off

The system is dead.
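For comparing this trace with the hand-copied one in the first comment, the function names can be pulled out of the log mechanically. The following is only a rough sketch and not a kernel tool: the helper name extract_symbols and the regular expression are my own assumptions about the "[<address>] symbol+0xoff/0xlen" frame format seen in the log above.

```python
import re

# Assumed frame format, taken from the log above:
#   [<c0382684>] scsi_next_command+0x2f/0x4f
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def extract_symbols(oops_text):
    """Return the function names of all call-trace frames, in log order.

    Hypothetical helper for illustration; addresses and offsets are dropped.
    """
    return [match.group(2) for match in FRAME_RE.finditer(oops_text)]


# A few frames copied from the trace above; works on flattened or
# line-broken text alike, because the regex does not depend on newlines.
sample = (
    "[<c038af41>] ata_qc_complete+0x3a/0xb4 "
    "[<c038f575>] ahci_eng_timeout+0x83/0xae "
    "[<c0381b54>] scsi_error_handler+0x0/0xa0"
)

if __name__ == "__main__":
    for name in extract_symbols(sample):
        print(name)
```

Feeding it the whole log instead of the sample gives the complete list of frames, including the stale duplicates that a stack dump of this kind usually contains.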
libata now has full error handling. Please reopen if this is still seen.