Distribution: Debian stable + self-compiled kernel (vanilla 2.6.16) (config as attachment)

Hardware Environment:
motherboard Intel D945something, 2GB RAM, 3x SATA disk
e1000 not used - not working reliably
computer tested with memtest86, burnP6, all OK

server:~# lspci
0000:00:00.0 Host bridge: Intel Corp.: Unknown device 2770 (rev 02)
0000:00:02.0 VGA compatible controller: Intel Corp.: Unknown device 2772 (rev 02)
0000:00:1c.0 PCI bridge: Intel Corp.: Unknown device 27d0 (rev 01)
0000:00:1c.2 PCI bridge: Intel Corp.: Unknown device 27d4 (rev 01)
0000:00:1c.3 PCI bridge: Intel Corp.: Unknown device 27d6 (rev 01)
0000:00:1c.4 PCI bridge: Intel Corp.: Unknown device 27e0 (rev 01)
0000:00:1c.5 PCI bridge: Intel Corp.: Unknown device 27e2 (rev 01)
0000:00:1d.0 USB Controller: Intel Corp.: Unknown device 27c8 (rev 01)
0000:00:1d.1 USB Controller: Intel Corp.: Unknown device 27c9 (rev 01)
0000:00:1d.2 USB Controller: Intel Corp.: Unknown device 27ca (rev 01)
0000:00:1d.3 USB Controller: Intel Corp.: Unknown device 27cb (rev 01)
0000:00:1d.7 USB Controller: Intel Corp.: Unknown device 27cc (rev 01)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev e1)
0000:00:1f.0 ISA bridge: Intel Corp.: Unknown device 27b8 (rev 01)
0000:00:1f.1 IDE interface: Intel Corp.: Unknown device 27df (rev 01)
0000:00:1f.2 0106: Intel Corp.: Unknown device 27c1 (rev 01)
0000:00:1f.3 SMBus: Intel Corp.: Unknown device 27da (rev 01)
0000:01:00.0 Ethernet controller: Intel Corp.: Unknown device 108c (rev 03)
0000:01:00.2 IDE interface: Intel Corp.: Unknown device 108d (rev 03)
0000:01:00.3 Serial controller: Intel Corp.: Unknown device 108f (rev 03)
0000:01:00.4 0c07: Intel Corp.: Unknown device 108e (rev 03)
0000:06:01.0 Ethernet controller: 3Com Corporation 3c905 100BaseTX [Boomerang]

Software Environment: Debian stable, samba, cups, dhcp server

Problem Description:

1) strange crash with filesystem corruption

The server (samba) became inaccessible. An on-site person rebooted it; the boot failed with filesystem errors (on an ext3 partition), a manual fsck was needed, and there was major data loss. (Sorry, I don't have more info on this one.)

2) mdadm --manage .. --add locks computer

After recovering data from backup I noticed a broken sw raid. I tried

mdadm --manage /dev/md1 --add /dev/sdb2

Nothing happened (observed by cat /proc/mdstat), so I tried it again. At this point any program trying to access the disk locked up. I found some error messages on the console.

sw raid configuration (shown in its correct state; at the time of the problem md2 and md3 were broken - bad sdb):

server:~# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb2[0] sda5[1]
      1999936 blocks [2/2] [UU]
md2 : active raid1 sdb3[0] sda6[1]
      3903680 blocks [2/2] [UU]
md3 : active raid1 sdb4[0] sda7[1]
      283241856 blocks [2/2] [UU]
md0 : active raid1 sdb1[0] sda1[1]
      3903680 blocks [2/2] [UU]
unused devices: <none>

Console messages (written to paper):

ata2: handling error/timeout
ata2: port reset, p_is 0 is 0 pis 0 cmd 44 017 f7 d0 ss 113 se 0
ata2_status = 0x50
Assertion failed! qc->err_mask == 0 drivers/scsi/ahci.c, ahci_host_intr line 681
Assertion failed! qc->flags & ATA_QCFLAG_ACTIVE drivers/scsi/libata-core.c, ata_qc_complete, line 3631
ata2: status 0x50
sdb: Current sense key = 0x0
ASC = 0x0 ASCQ = 0x0
Badness in blk_do_ordered at block/ll_rw_blk.c: 550
blk_do_ordered+0x282
elv_next_request+0xd6
scsi_request_fn+0x60
blk_run_queue+0x1f
scsi_run_queue+0xfa
scsi_next_command+0x26
scsi_end_request+0x94
scsi_io_completion+0x193
sd_rw_intr+0x1b1
scsi_finish_command+0x13
ata_scsi_qc_complete+0x171
ahci_interrupt+0xda
handle_IRQ_event+0x20
__do_IRQ+0x53
do_IRQ+0x19
common_interrupt+0x1a
cast6_decrypt+16e
scsi_request_fn+0x232
blk_run_queue+0x1f
scsi_run_queue+0xfa
scsi_next_command+0x26
scsi_end_request+0x94
scsi_io_completion+0x193
scsi_blk_pc_done+0x26
ata_scsi_qc_complete+0x6a
ata_qc_complete+0x171
ahci_eng_timeout+0x6a
scsi_error_handle+0x0
ata_scsi_error+0x12
scsi_error_handle+0x69
kthread-0x80
kthread+0x9a
kthread+00
kernel_thread_helper
Badness in blk_do_ordered at block/ll_rw_blk.c: 550
blk_do_ordered+0x282
elv_next_request+0xd6
scsi_request_fn+0x60
scsi_error_handler+0x0
blk_run_queue+0x1f
scsi_run_queue+0xfa
scsi_error_handler+0x0
scsi_run_???_queues+0x12
scsi_error_handler+0x6c2
scsi_error_handler+0
kthread+0x80
scsi_error_handler+0x0
kthread+0x94
kthread
kernel_thread_helper

Steps to reproduce:
1) can't reproduce
2) mdadm --manage /dev/md1 --add /dev/sdb2 (now solved by a new disk)
Created attachment 8159: config of affected kernel
Looks more like a SATA problem to me -- reassigning. It might be helpful if you have any more details on the way in which sdb was 'bad'.
The disk is on my desk. What should I test?
Petr, have you tested with the latest kernels? Do you still see the problem? Thanks.
Sorry, I don't have this bad disk, so I can't test :-(
This is definitely fixed now. Closing.