Distribution: OpenSUSE 10.1 Hardware Environment: Supermicro H8DCE motherboard (based on nVidia nForce Pro chipset) with two AMD Opterons 246 2x SATA harddrives WD1600JS inside hotplug SATA bays Problem Description: Hello, I observe a problem described bellow. I'm not sure if it's not related more to MD (RAID) then SATA. I'm running two SATA disks in a software (md) RAID1. If I extract a disk from RAID array, I would expect it should by instantly signed by RAID as faulty but a test shows something else. If I extract one HDD from RAID1 the kernel freezes after first request towards filesystem. If I insert the HDD back, kernel runs again and the MD RAID state remains unaffected (output of /proc/mdstat without any failure). After kernel revovery from the frozen state the syslog shows: Jun 8 13:03:48 helpdesk kernel: nv_sata: Secondary device removed Jun 8 13:04:18 helpdesk kernel: ata2: command 0xea timeout, stat 0xd0 host_stat 0x0 Jun 8 13:04:18 helpdesk kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jun 8 13:04:18 helpdesk kernel: ata2: status=0xd0 { Busy } Jun 8 13:04:18 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:04:18 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:04:18 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:04:37 helpdesk kernel: nv_sata: Secondary device added Jun 8 13:05:18 helpdesk kernel: ata2: command 0x35 timeout, stat 0x50 host_stat 0x21 If I extract one HDD as above but wait for long enough time while the kernel is frozen then after two minutes some timeout occures. Then the extracted HDD is kicked out from the RAID and kernel runs again. The syslog shows: Jun 8 13:11:06 helpdesk kernel: nv_sata: Secondary device removed Jun 8 13:11:36 helpdesk kernel: ata2: command 0xea timeout, stat 0xd0 host_stat 0x0 Jun 8 13:11:36 helpdesk kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jun 8 13:11:36 helpdesk kernel: ata2: status=0xd0 { Busy } Jun 8 13:11:36 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:11:36 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:11:36 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:12:36 helpdesk kernel: ata2: command 0x35 timeout, stat 0xd0 host_stat 0x21 Jun 8 13:12:36 helpdesk kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jun 8 13:12:36 helpdesk kernel: ata2: status=0xd0 { Busy } Jun 8 13:12:36 helpdesk kernel: sd 1:0:0:0: SCSI error: return code = 0x8000002 Jun 8 13:12:36 helpdesk kernel: sdb: Current: sense key: Aborted Command Jun 8 13:12:36 helpdesk kernel: Additional sense: Scsi parity error Jun 8 13:13:06 helpdesk kernel: end_request: I/O error, dev sdb, sector 46154502 Jun 8 13:13:06 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:13:06 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:13:06 helpdesk kernel: ATA: abnormal status 0xD0 on port 0xC007 Jun 8 13:13:06 helpdesk kernel: ata2: command 0xea timeout, stat 0xd0 host_stat 0x0 Jun 8 13:13:06 helpdesk kernel: ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00 Jun 8 13:13:06 helpdesk kernel: ata2: status=0xd0 { Busy } Jun 8 13:13:06 helpdesk kernel: raid1: Disk failure on sdb2, disabling device. Jun 8 13:13:06 helpdesk kernel: Operation continuing on 1 devices Jun 8 13:13:06 helpdesk kernel: RAID1 conf printout: Jun 8 13:13:06 helpdesk kernel: --- wd:1 rd:2 Jun 8 13:13:06 helpdesk kernel: disk 0, wo:0, o:1, dev:sda2 Jun 8 13:13:06 helpdesk kernel: disk 1, wo:1, o:0, dev:sdb2 Jun 8 13:13:06 helpdesk kernel: RAID1 conf printout: Jun 8 13:13:06 helpdesk kernel: --- wd:1 rd:2 Jun 8 13:13:06 helpdesk kernel: disk 0, wo:0, o:1, dev:sda2 If I remove the HDD from RAID (by mdadm) before I extract it, nothing bad happens. The syslog shows only: Jun 8 13:24:58 helpdesk kernel: nv_sata: Secondary device removed Jun 8 13:25:07 helpdesk kernel: nv_sata: Secondary device added Thanks Jiri Here are more informations about the system: # uname -a Linux helpdesk 2.6.16.20-smp #1 SMP Wed Jun 7 21:56:29 CEST 2006 x86_64 x86_64 x86_64 GNU/Linux # cat /proc/mdstat Personalities : [raid1] md2 : active raid1 sda3[0] sdb3[1] 133210880 blocks [2/2] [UU] md1 : active raid1 sda2[0] sdb2[1] 20972736 blocks [2/2] [UU] md0 : active raid1 sda1[0] sdb1[1] 2104384 blocks [2/2] [UU] unused devices: <none> # lspci 00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3) 00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3) 00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2) 00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2) 00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3) 00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2) 00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2) 00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) 00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) 00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2) 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3) 00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) 00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control 01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)] 01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE] 80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3) 80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3) 80:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) 80:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3) 80:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3) 80:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) 80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) # dmesg (extract from the boot-up) md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27 libata version 1.20 loaded. sata_nv 0000:00:07.0: version 0.8 ACPI: PCI Interrupt Link [LTID] enabled at IRQ 22 GSI 16 sharing vector 0xD9 and IRQ 16 ACPI: PCI Interrupt 0000:00:07.0[A] -> Link [LTID] -> GSI 22 (level, low) -> IRQ 217 PCI: Setting latency timer of device 0000:00:07.0 to 64 ata1: SATA max UDMA/133 cmd 0xC800 ctl 0xC402 bmdma 0xB800 irq 217 ata2: SATA max UDMA/133 cmd 0xC000 ctl 0xBC02 bmdma 0xB808 irq 217 ata1: SATA link up 3.0 Gbps (SStatus 123) ata1: dev 0 cfg 49:2f00 82:706b 83:7e61 84:4023 85:7069 86:3e41 87:4023 88:407f ata1: dev 0 ATA-7, max UDMA/133, 312581808 sectors: LBA48 nv_sata: Primary device added nv_sata: Primary device removed nv_sata: Secondary device added nv_sata: Secondary device removed ata1: dev 0 configured for UDMA/133 scsi0 : sata_nv ata2: SATA link up 3.0 Gbps (SStatus 123) ata2: dev 0 cfg 49:2f00 82:706b 83:7e61 84:4023 85:7069 86:3e41 87:4023 88:407f ata2: dev 0 ATA-7, max UDMA/133, 312581808 sectors: LBA48 nv_sata: Primary device added nv_sata: Primary device removed nv_sata: Secondary device added nv_sata: Secondary device removed ata2: dev 0 configured for UDMA/133 scsi1 : sata_nv Vendor: ATA Model: WDC WD1600JS-98N Rev: 10.0 Type: Direct-Access ANSI SCSI revision: 05 SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB) sda: Write Protect is off sda: Mode Sense: 00 3a 00 00 SCSI device sda: drive cache: write back sda:<4>nv_sata: Primary device added nv_sata: Primary device removed nv_sata: Secondary device added nv_sata: Secondary device removed sda1 sda2 sda3 sd 0:0:0:0: Attached scsi disk sda Vendor: ATA Model: WDC WD1600JS-98N Rev: 10.0 Type: Direct-Access ANSI SCSI revision: 05 SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB) sdb: Write Protect is off sdb: Mode Sense: 00 3a 00 00 SCSI device sdb: drive cache: write back SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB) sdb: Write Protect is off sdb: Mode Sense: 00 3a 00 00 SCSI device sdb: drive cache: write back sdb:<4>nv_sata: Primary device added nv_sata: Primary device removed nv_sata: Secondary device added nv_sata: Secondary device removed sdb1 sdb2 sdb3 sd 1:0:0:0: Attached scsi disk sdb ACPI: PCI Interrupt Link [LTIE] enabled at IRQ 21 GSI 17 sharing vector 0xE1 and IRQ 17 ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [LTIE] -> GSI 21 (level, low) -> IRQ 225 PCI: Setting latency timer of device 0000:00:08.0 to 64 ata3: SATA max UDMA/133 cmd 0xB400 ctl 0xB002 bmdma 0xA400 irq 225 ata4: SATA max UDMA/133 cmd 0xAC00 ctl 0xA802 bmdma 0xA408 irq 225 sd 0:0:0:0: Attached scsi generic sg0 type 0 sd 1:0:0:0: Attached scsi generic sg1 type 0 ata3: SATA link down (SStatus 0) scsi2 : sata_nv ata4: SATA link down (SStatus 0) scsi3 : sata_nv ACPI: PCI Interrupt Link [LT3D] enabled at IRQ 47 GSI 18 sharing vector 0xE9 and IRQ 18 ACPI: PCI Interrupt 0000:80:07.0[A] -> Link [LT3D] -> GSI 47 (level, low) -> IRQ 233 PCI: Setting latency timer of device 0000:80:07.0 to 64 ata5: SATA max UDMA/133 cmd 0xF800 ctl 0xF402 bmdma 0xE800 irq 233 ata6: SATA max UDMA/133 cmd 0xF000 ctl 0xEC02 bmdma 0xE808 irq 233 ata5: SATA link down (SStatus 0) scsi4 : sata_nv ata6: SATA link down (SStatus 0) scsi5 : sata_nv ACPI: PCI Interrupt Link [LT2E] enabled at IRQ 46 GSI 19 sharing vector 0x32 and IRQ 19 ACPI: PCI Interrupt 0000:80:08.0[A] -> Link [LT2E] -> GSI 46 (level, low) -> IRQ 50 PCI: Setting latency timer of device 0000:80:08.0 to 64 ata7: SATA max UDMA/133 cmd 0xE400 ctl 0xE002 bmdma 0xD400 irq 50 ata8: SATA max UDMA/133 cmd 0xDC00 ctl 0xD802 bmdma 0xD408 irq 50 ata7: SATA link down (SStatus 0) scsi6 : sata_nv ata8: SATA link down (SStatus 0) scsi7 : sata_nv md: raid1 personality registered for level 1 md: md0 stopped. md: bind<sdb1> md: bind<sda1> raid1: raid set md0 active with 2 out of 2 mirrors md: md1 stopped. md: bind<sdb2> md: bind<sda2> raid1: raid set md1 active with 2 out of 2 mirrors Attempting manual resume md: md1 stopped. md: unbind<sda2> md: export_rdev(sda2) md: unbind<sdb2> md: export_rdev(sdb2) md: md0 still in use. md: md0 still in use. md: md0 still in use. md: md1 stopped. md: bind<sdb2> md: bind<sda2> raid1: raid set md1 active with 2 out of 2 mirrors md: md2 stopped. md: bind<sdb3> md: bind<sda3> raid1: raid set md2 active with 2 out of 2 mirrors
Jiri, any updates on this? Have you tested with latest kernels? Thanks.
*** This bug has been marked as a duplicate of bug 8421 ***
Hello Natalie, unfortunately I didn't test it with the latest kernels. My final step was to use a rc version (I'll post the exact number as soon as I will reach the server). With this kernel this problem was not present any more. But remaines the problem that after the disc is inserted back into the server it gets another /dev/sdX number and thus can't be smoothly added to the raid. Jirka bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=6665 > > > protasnb@gmail.com changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |protasnb@gmail.com > > > > > ------- Comment #1 from protasnb@gmail.com 2007-08-08 13:07 ------- > Jiri, any updates on this? Have you tested with latest kernels? > Thanks. > > >