Latest working kernel version: Not known
Earliest failing kernel version: 2.6.23.8
Distribution: Gentoo
Hardware Environment: JMicron AHCI controller in PCIE, Nvidia Nforce4
Software Environment: gcc 4.2.2, binutils 2.18, glibc 2.7

Problem Description:

With AHCI built into the kernel: if the drive is on during boot, the drive is detected, /dev/sdi is created and works fine. If I turn the drive on after boot, the drive throws errors and no /dev/sdi is created (referred to below as "errors"). Turning the drive off and on again doesn't help.

With AHCI built as a module: if the drive is on and hasn't seen errors earlier, then when I modprobe ahci the drive is detected fine, /dev/sdi is created and works. If the drive is off and I modprobe ahci, errors. An rmmod followed by a modprobe doesn't help. If I modprobe ahci and then turn on the drive, errors.

Steps to reproduce: Turn the device on after AHCI has loaded; the errors follow.

The errors seen are:

###################### drive is turned on and modprobe ahci is done #########################
Feb 7 19:13:03 localhost [ 361.513897] ahci 0000:04:00.0: version 3.0
Feb 7 19:13:03 localhost [ 361.514148] ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
Feb 7 19:13:03 localhost [ 361.514155] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb 7 19:13:04 localhost [ 362.513197] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb 7 19:13:04 localhost [ 362.513202] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb 7 19:13:04 localhost [ 362.513206] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb 7 19:13:04 localhost [ 362.513568] scsi7 : ahci
Feb 7 19:13:04 localhost [ 362.514081] scsi8 : ahci
Feb 7 19:13:04 localhost [ 362.514106] ata7: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb 7 19:13:04 localhost [ 362.514110] ata8: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb 7 19:13:05 localhost [ 362.818688] ata7: SATA link down (SStatus 0 SControl 300)
Feb 7 19:13:05 localhost [ 363.427706] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 7 19:13:05 localhost [ 363.428912] ata8.00: ATA-7: ST3750330AS, SD04, max UDMA/133
Feb 7 19:13:05 localhost [ 363.428915] ata8.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Feb 7 19:13:05 localhost [ 363.430484] ata8.00: configured for UDMA/133
Feb 7 19:13:05 localhost [ 363.429554] scsi 8:0:0:0: Direct-Access ATA ST3750330AS SD04 PQ: 0 ANSI: 5
Feb 7 19:13:05 localhost [ 363.429694] sd 8:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb 7 19:13:05 localhost [ 363.429771] sd 8:0:0:0: [sdi] Write Protect is off
Feb 7 19:13:05 localhost [ 363.429774] sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb 7 19:13:05 localhost [ 363.429918] sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 7 19:13:05 localhost [ 363.430030] sd 8:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb 7 19:13:05 localhost [ 363.430105] sd 8:0:0:0: [sdi] Write Protect is off
Feb 7 19:13:05 localhost [ 363.430107] sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb 7 19:13:05 localhost [ 363.430222] sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 7 19:13:05 localhost [ 363.430224] sdi: sdi1 sdi2 sdi3 sdi4 < sdi5 >
Feb 7 19:13:05 localhost [ 363.470136] sd 8:0:0:0: [sdi] Attached SCSI disk
Feb 7 19:13:05 localhost [ 363.470170] sd 8:0:0:0: Attached scsi generic sg8 type 0

############### drive is turned off ######################
Feb 7 19:13:52 localhost [ 410.461251] ata8: exception Emask 0x10 SAct 0x0 SErr 0x990000 action 0xa frozen
Feb 7 19:13:52 localhost [ 410.461256] ata8: irq_stat 0x00400000, PHY RDY changed
Feb 7 19:13:52 localhost [ 410.461259] ata8: SError: { PHYRdyChg 10B8B Dispar LinkSeq }
Feb 7 19:13:52 localhost [ 410.461267] ata8: hard resetting link
Feb 7 19:13:53 localhost [ 411.183002] ata8: SATA link down (SStatus 0 SControl 300)
Feb 7 19:13:53 localhost [ 411.183013] ata8: failed to recover some devices, retrying in 5 secs
Feb 7 19:13:58 localhost [ 416.175868] ata8: hard resetting link
Feb 7 19:13:59 localhost [ 416.480379] ata8: SATA link down (SStatus 0 SControl 300)
Feb 7 19:13:59 localhost [ 416.480388] ata8: failed to recover some devices, retrying in 5 secs
Feb 7 19:14:04 localhost [ 421.473247] ata8: hard resetting link
Feb 7 19:14:04 localhost [ 421.777761] ata8: SATA link down (SStatus 0 SControl 300)
Feb 7 19:14:04 localhost [ 421.777773] ata8.00: disabled
Feb 7 19:14:04 localhost [ 422.277938] ata8: EH complete
Feb 7 19:14:04 localhost [ 422.277949] ata8.00: detaching (SCSI 8:0:0:0)
Feb 7 19:14:04 localhost [ 422.278148] sd 8:0:0:0: [sdi] Synchronizing SCSI cache
Feb 7 19:14:04 localhost [ 422.278172] sd 8:0:0:0: [sdi] Result: hostbyte=0x04 driverbyte=0x00
Feb 7 19:14:04 localhost [ 422.278192] sd 8:0:0:0: [sdi] Stopping disk
Feb 7 19:14:04 localhost [ 422.278197] sd 8:0:0:0: [sdi] START_STOP FAILED
Feb 7 19:14:04 localhost [ 422.278198] sd 8:0:0:0: [sdi] Result: hostbyte=0x04 driverbyte=0x00

############### drive is turned on ######################
Feb 7 19:14:40 localhost [ 458.388903] ata8: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xa frozen
Feb 7 19:14:40 localhost [ 458.388908] ata8: irq_stat 0x00000040, connection status changed
Feb 7 19:14:40 localhost [ 458.388912] ata8: SError: { PHYRdyChg CommWake DevExch }
Feb 7 19:14:40 localhost [ 458.388920] ata8: hard resetting link
Feb 7 19:14:43 localhost [ 461.093772] ata8: classification failed
Feb 7 19:14:43 localhost [ 461.093777] ata8: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:14:50 localhost [ 468.371925] ata8: hard resetting link
Feb 7 19:14:53 localhost [ 471.077524] ata8: classification failed
Feb 7 19:14:53 localhost [ 471.077529] ata8: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:15:00 localhost [ 478.355678] ata8: hard resetting link
Feb 7 19:15:03 localhost [ 481.061277] ata8: classification failed
Feb 7 19:15:03 localhost [ 481.061282] ata8: reset failed (errno=-22), retrying in 33 secs
Feb 7 19:15:35 localhost [ 513.298813] ata8: limiting SATA link speed to 1.5 Gbps
Feb 7 19:15:35 localhost [ 513.298817] ata8: hard resetting link
Feb 7 19:15:38 localhost [ 516.004410] ata8: classfication failed, assuming ATA
Feb 7 19:15:38 localhost [ 516.004421] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 7 19:16:08 localhost [ 545.955668] ata8.00: qc timeout (cmd 0xec)
Feb 7 19:16:08 localhost [ 545.955678] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 7 19:16:08 localhost [ 545.955681] ata8: failed to recover some devices, retrying in 5 secs

############### rmmod ahci is done ######################
Feb 7 19:16:13 localhost [ 550.948540] ata8: EH complete
Feb 7 19:16:13 localhost [ 550.948732] ACPI: PCI interrupt for device 0000:04:00.0 disabled

############### modprobe ahci is done ######################
Feb 7 19:16:40 localhost [ 577.634831] ahci 0000:04:00.0: version 3.0
Feb 7 19:16:40 localhost [ 577.634853] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb 7 19:16:41 localhost [ 578.633470] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb 7 19:16:41 localhost [ 578.633475] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb 7 19:16:41 localhost [ 578.633482] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb 7 19:16:41 localhost [ 578.633714] scsi9 : ahci
Feb 7 19:16:41 localhost [ 578.633791] scsi10 : ahci
Feb 7 19:16:41 localhost [ 578.633816] ata9: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb 7 19:16:41 localhost [ 578.633819] ata10: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb 7 19:16:41 localhost [ 578.939004] ata9: SATA link down (SStatus 0 SControl 300)
Feb 7 19:16:43 localhost [ 581.227269] ata10: classification failed
Feb 7 19:16:43 localhost [ 581.227273] ata10: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:16:53 localhost [ 591.211023] ata10: classification failed
Feb 7 19:16:53 localhost [ 591.211028] ata10: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:17:03 localhost [ 601.194777] ata10: classification failed
Feb 7 19:17:03 localhost [ 601.194781] ata10: reset failed (errno=-22), retrying in 33 secs
Feb 7 19:17:36 localhost [ 633.849639] ata10: limiting SATA link speed to 1.5 Gbps
Feb 7 19:17:38 localhost [ 636.137917] ata10: classfication failed, assuming ATA
Feb 7 19:17:38 localhost [ 636.137928] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 7 19:18:08 localhost [ 666.089175] ata10.00: qc timeout (cmd 0xec)
Feb 7 19:18:08 localhost [ 666.089186] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 7 19:18:08 localhost [ 666.089189] ata10: failed to recover some devices, retrying in 5 secs
Feb 7 19:18:16 localhost [ 673.370326] ata10: classification failed
Feb 7 19:18:16 localhost [ 673.370332] ata10: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:18:26 localhost [ 683.356010] ata10: classification failed
Feb 7 19:18:26 localhost [ 683.356015] ata10: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:18:36 localhost [ 693.337760] ata10: classification failed
Feb 7 19:18:36 localhost [ 693.337765] ata10: reset failed (errno=-22), retrying in 33 secs

############## drive is turned off #####################
Feb 7 19:19:09 localhost [ 726.297135] ata10: SATA link down (SStatus 0 SControl 310)

############### rmmod ahci is done ######################
Feb 7 19:19:20 localhost [ 738.092615] ACPI: PCI interrupt for device 0000:04:00.0 disabled

############### modprobe ahci is done ######################
Feb 7 19:19:29 localhost [ 746.913039] ahci 0000:04:00.0: version 3.0
Feb 7 19:19:29 localhost [ 746.913060] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb 7 19:19:30 localhost [ 747.911889] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb 7 19:19:30 localhost [ 747.911893] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb 7 19:19:30 localhost [ 747.911899] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb 7 19:19:30 localhost [ 747.912262] scsi11 : ahci
Feb 7 19:19:30 localhost [ 747.912366] scsi12 : ahci
Feb 7 19:19:30 localhost [ 747.912391] ata11: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb 7 19:19:30 localhost [ 747.912394] ata12: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb 7 19:19:31 localhost [ 748.217467] ata11: SATA link down (SStatus 0 SControl 300)
Feb 7 19:19:31 localhost [ 748.521972] ata12: SATA link down (SStatus 0 SControl 300)

############## drive is turned on #####################
Feb 7 19:19:41 localhost [ 758.280168] ata12: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xa frozen
Feb 7 19:19:41 localhost [ 758.280173] ata12: irq_stat 0x00000040, connection status changed
Feb 7 19:19:41 localhost [ 758.280176] ata12: SError: { PHYRdyChg CommWake DevExch }
Feb 7 19:19:41 localhost [ 758.280184] ata12: hard resetting link
Feb 7 19:19:43 localhost [ 760.985679] ata12: classification failed
Feb 7 19:19:43 localhost [ 760.985684] ata12: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:19:51 localhost [ 768.263833] ata12: hard resetting link
Feb 7 19:19:53 localhost [ 770.969432] ata12: classification failed
Feb 7 19:19:53 localhost [ 770.969438] ata12: reset failed (errno=-22), retrying in 8 secs
Feb 7 19:20:01 localhost [ 778.247586] ata12: hard resetting link
Feb 7 19:20:03 localhost [ 780.953185] ata12: classification failed
Feb 7 19:20:03 localhost [ 780.953190] ata12: reset failed (errno=-22), retrying in 33 secs
Feb 7 19:20:36 localhost [ 813.190720] ata12: limiting SATA link speed to 1.5 Gbps
Feb 7 19:20:36 localhost [ 813.190725] ata12: hard resetting link
Feb 7 19:20:38 localhost [ 815.896318] ata12: classfication failed, assuming ATA
Feb 7 19:20:38 localhost [ 815.896329] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb 7 19:21:08 localhost [ 845.847580] ata12.00: qc timeout (cmd 0xec)
Feb 7 19:21:08 localhost [ 845.847591] ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 7 19:21:08 localhost [ 845.847594] ata12: failed to recover some devices, retrying in 5 secs

############### rmmod ahci is done ######################
Feb 7 19:21:13 localhost [ 850.840453] ata12: EH complete
Feb 7 19:21:13 localhost [ 850.839736] ACPI: PCI interrupt for device 0000:04:00.0 disabled

############## drive is turned off, on and modprobe ahci is done #####################
Feb 7 19:22:20 localhost [ 917.495408] ahci 0000:04:00.0: version 3.0
Feb 7 19:22:20 localhost [ 917.495430] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb 7 19:22:21 localhost [ 918.494237] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb 7 19:22:21 localhost [ 918.494243] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb 7 19:22:21 localhost [ 918.494250] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb 7 19:22:21 localhost [ 918.494484] scsi13 : ahci
Feb 7 19:22:21 localhost [ 918.494564] scsi14 : ahci
Feb 7 19:22:21 localhost [ 918.494588] ata13: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb 7 19:22:21 localhost [ 918.494591] ata14: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb 7 19:22:21 localhost [ 918.799878] ata13: SATA link down (SStatus 0 SControl 300)
Feb 7 19:22:22 localhost [ 919.407887] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 7 19:22:22 localhost [ 919.409093] ata14.00: ATA-7: ST3750330AS, SD04, max UDMA/133
Feb 7 19:22:22 localhost [ 919.409096] ata14.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Feb 7 19:22:22 localhost [ 919.410647] ata14.00: configured for UDMA/133
Feb 7 19:22:22 localhost [ 919.409520] scsi 14:0:0:0: Direct-Access ATA ST3750330AS SD04 PQ: 0 ANSI: 5
Feb 7 19:22:22 localhost [ 919.409580] sd 14:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb 7 19:22:22 localhost [ 919.409589] sd 14:0:0:0: [sdi] Write Protect is off
Feb 7 19:22:22 localhost [ 919.409591] sd 14:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb 7 19:22:22 localhost [ 919.409602] sd 14:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 7 19:22:22 localhost [ 919.409641] sd 14:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb 7 19:22:22 localhost [ 919.409647] sd 14:0:0:0: [sdi] Write Protect is off
Feb 7 19:22:22 localhost [ 919.409649] sd 14:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb 7 19:22:22 localhost [ 919.409659] sd 14:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 7 19:22:22 localhost [ 919.409661] sdi: sdi1 sdi2 sdi3 sdi4 < sdi5 >
Feb 7 19:22:22 localhost [ 919.451333] sd 14:0:0:0: [sdi] Attached SCSI disk
Feb 7 19:22:22 localhost [ 919.451370] sd 14:0:0:0: Attached scsi generic sg8 type 0
Feb 7 19:22:50 localhost [ 947.116521] kjournald starting. Commit interval 5 seconds
Feb 7 19:22:50 localhost [ 947.117102] EXT3 FS on sdi3, internal journal
Feb 7 19:22:50 localhost [ 947.117107] EXT3-fs: mounted filesystem with ordered data mode.

I could put as much load as I wanted on the drive after this. I have run the SeaTools short and long tests a couple of times and the drive itself is in great health. The drive in the same enclosure behaves fine as far as hotplug is concerned in my laptop which has a jmicron express card esata controller.
# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 RAID bus controller: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 RAID bus controller: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2)
04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
05:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
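For anyone cross-checking the SErr values in the logs above against the flag names libata prints next to them, here is a minimal Python sketch. It assumes the standard SATA SError register bit layout; the helper name `decode_serror` is made up for illustration and is not part of any kernel tool.

```python
# Bit positions of the SATA SError register, keyed to the short names
# that libata prints in its "SError: { ... }" lines.
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the flags set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Values taken from the power-off and power-on events in the log.
    for serr in (0x990000, 0x4050000):
        print(hex(serr), decode_serror(serr))
```

The power-off event decodes to PHY and link-layer noise, while the power-on event carries DevExch, i.e. the controller did notice a device exchange.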
> The drive in the same enclosure behaves fine as far as hotplug is
> concerned in my laptop which has a jmicron express card esata controller.

Sorry for the wrong info. I repeated the experiments on the laptop and I was mistaken: the behavior is the same on both the laptop and the desktop. The laptop runs a 2.6.23 kernel and has the same controller (as reported by lspci), but on the ExpressCard bus.

Please let me know if you want me to run some experiments or need more info.
That's weird. I've never seen any hotplug problem w/ JMBs. Do you have a different hard drive to try? What happens if you do 'echo - - - > /sys/class/scsi_host/hostX/scan' where hostX is the JMB port?
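The manual rescan suggested above can also be scripted. This is only a sketch: the function name `rescan_scsi_host` and the `sysfs_root` parameter are illustrative, the host name passed in (for example `host8`) has to be looked up on the actual machine, and writing to the real sysfs file requires root.

```python
import os


def rescan_scsi_host(host, sysfs_root="/sys/class/scsi_host"):
    """Write the wildcard triple "- - -" (channel, target, LUN) to the host's scan file.

    `host` is a directory name such as "host8". `sysfs_root` is a parameter
    only so the function can be tried against a scratch directory.
    """
    scan_path = os.path.join(sysfs_root, host, "scan")
    with open(scan_path, "w") as scan_file:
        scan_file.write("- - -\n")
    return scan_path
```

Each "-" is a wildcard, so the write asks the SCSI layer to probe every channel, target and LUN on that host.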
Sorry for the delay. Unfortunately, I don't have a spare drive to test. I will post the info you requested when I go home.
I don't have data with a different hard drive, but I tried the eSATA drive on the sata_nv port that I have, and here is the output. What I have done is put one of the internal disks on the JMicron and put the eSATA drive on one of the on-board ports.

Drive hotplug seems to work with sata_nv (on the mobo port), with the small caveat that it keeps printing "rejecting I/O to dead device" every 65 seconds, but the disk is working fine. I can mount the partitions and access data without problems. The only difference I can see wrt AHCI is that the JMicron is a PCIe x1 controller card.

1. Is there a way to suppress the "rejecting I/O..." message?
2. Why is it printing it anyway? 3:0:0:0 is not dead anymore.

############ turned off drive and turned it back on ###############################
Feb 14 21:03:56 localhost [ 550.215868] ata4: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xa frozen
Feb 14 21:03:56 localhost [ 550.215874] ata4: SError: { PHYRdyChg LinkSeq TrStaTrns }
Feb 14 21:03:56 localhost [ 550.215880] ata4: hard resetting link
Feb 14 21:03:56 localhost [ 550.937395] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:03:56 localhost [ 550.937404] ata4: failed to recover some devices, retrying in 5 secs
Feb 14 21:04:01 localhost [ 555.930276] ata4: hard resetting link
Feb 14 21:04:02 localhost [ 556.234776] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:04:02 localhost [ 556.234787] ata4: failed to recover some devices, retrying in 5 secs
Feb 14 21:04:07 localhost [ 561.227652] ata4: hard resetting link
Feb 14 21:04:07 localhost [ 561.532155] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:04:07 localhost [ 561.532168] ata4.00: disabled
Feb 14 21:04:08 localhost [ 562.334854] ata4: soft resetting link
Feb 14 21:04:08 localhost [ 562.334862] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:04:08 localhost [ 562.334872] ata4: EH complete
Feb 14 21:04:08 localhost [ 562.334880] ata4.00: detaching (SCSI 3:0:0:0)
Feb 14 21:04:08 localhost [ 562.335060] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
Feb 14 21:04:08 localhost [ 562.335090] sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 21:04:08 localhost [ 562.335093] sd 3:0:0:0: [sdd] Stopping disk
Feb 14 21:04:08 localhost [ 562.335145] sd 3:0:0:0: [sdd] START_STOP FAILED
Feb 14 21:04:08 localhost [ 562.335147] sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 21:04:22 localhost [ 576.285357] ata4: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xa frozen
Feb 14 21:04:22 localhost [ 576.285363] ata4: SError: { PHYRdyChg CommWake }
Feb 14 21:04:22 localhost [ 576.285370] ata4: hard resetting link
Feb 14 21:04:28 localhost [ 582.198527] ata4: port is slow to respond, please be patient (Status 0x80)
Feb 14 21:04:32 localhost [ 586.269900] ata4: SRST failed (errno=-16)
Feb 14 21:04:32 localhost [ 586.269905] ata4: hard resetting link
Feb 14 21:04:33 localhost [ 587.445994] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 14 21:04:33 localhost [ 587.449433] ata4.00: ATA-7: ST3750330AS, SD04, max UDMA/133
Feb 14 21:04:33 localhost [ 587.449436] ata4.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 0/32)
Feb 14 21:04:33 localhost [ 587.455423] ata4.00: configured for UDMA/133
Feb 14 21:04:33 localhost [ 587.455432] ata4: EH complete
Feb 14 21:04:33 localhost [ 587.455482] scsi 3:0:0:0: Direct-Access ATA ST3750330AS SD04 PQ: 0 ANSI: 5
Feb 14 21:04:33 localhost [ 587.455546] sd 3:0:0:0: [sdj] 1465149168 512-byte hardware sectors (750156 MB)
Feb 14 21:04:33 localhost [ 587.455554] sd 3:0:0:0: [sdj] Write Protect is off
Feb 14 21:04:33 localhost [ 587.455556] sd 3:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Feb 14 21:04:33 localhost [ 587.455566] sd 3:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 14 21:04:33 localhost [ 587.455603] sd 3:0:0:0: [sdj] 1465149168 512-byte hardware sectors (750156 MB)
Feb 14 21:04:33 localhost [ 587.455609] sd 3:0:0:0: [sdj] Write Protect is off
Feb 14 21:04:33 localhost [ 587.455611] sd 3:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Feb 14 21:04:33 localhost [ 587.455620] sd 3:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 14 21:04:33 localhost [ 587.455622] sdj: sdj1 sdj2 sdj3 sdj4 < sdj5 sdj6 >
Feb 14 21:04:33 localhost [ 587.506074] sd 3:0:0:0: [sdj] Attached SCSI disk
Feb 14 21:04:33 localhost [ 587.506108] sd 3:0:0:0: Attached scsi generic sg3 type 0
Feb 14 21:05:10 localhost [ 624.609995] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:05:10 localhost [ 624.610002] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:06:15 localhost [ 689.709946] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:06:15 localhost [ 689.709954] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:07:21 localhost [ 754.834827] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:07:21 localhost [ 754.834835] scsi 3:0:0:0: rejecting I/O to dead device
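The "every 65 seconds" figure can be read straight off the kernel timestamps in the log above. A small Python sketch, assuming syslog lines in the format shown here; the function name `burst_intervals` and the one-second burst threshold are choices made for this illustration only.

```python
import re

# Matches the kernel timestamp in brackets, e.g. "[ 624.609995]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def burst_intervals(log_lines, needle, min_gap=1.0):
    """Return gaps in seconds between bursts of lines containing `needle`.

    Lines that arrive less than `min_gap` seconds after the start of the
    current burst are treated as part of that burst.
    """
    times = []
    for line in log_lines:
        if needle not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            times.append(float(match.group(1)))

    gaps = []
    burst_start = None
    for t in times:
        if burst_start is None:
            burst_start = t
        elif t - burst_start >= min_gap:
            gaps.append(round(t - burst_start, 3))
            burst_start = t
    return gaps
```

A steady gap like this points at something polling the device on a timer rather than at leftover in-flight commands.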
One more observation is that I can't really boot from disks on the JMicron ports, which is bizarre.
rejecting I/O to dead device messages are for in-flight commands for the old 3:0:0:0 which got detached. It doesn't continue, right? It's really bizarre, what do you mean by 'can't boot from disks connected to the JMB'? What happens?
(In reply to comment #7)
> rejecting I/O to dead device messages are for in-flight commands for the old
> 3:0:0:0 which got detached. It doesn't continue, right?

That's where the problem is. The "rejecting ..." messages continue every 65 seconds until I reboot the box. It doesn't matter that 3:0:0:0 is now a valid device, and the old 3:0:0:0 didn't have any I/O going to it (I unmounted and sync'ed).

> It's really bizarre,
> what do you mean by 'can't boot from disks connected to the JMB'? What
> happens?

Grub doesn't see the disk. Only the OS can see it, after loading the AHCI module.
> Grub doesn't see the disk. Only OS can see it after loading AHCI module.

The BIOS sees the disks connected to the JMB as well and allows me to reorder the boot sequence around them, but when it reaches the grub side of things, the drive disappears. It may just be how grub talks with the BIOS.
Earliest failing kernel version: 2.6.24
Distribution: Debian
Hardware Environment: JMicron AHCI controller, ICH8 chipset
Software Environment: gcc 4.2.2, binutils 2.18, glibc 2.7

Problem Description:

Hi, I don't know if this will help, but I have a similar problem when I connect a WD 1 TB My Book to the eSATA port on my Asus G1S. The eSATA port is connected to the JMicron chip. Even if I connect the hard drive during boot I have the same problem.

lspci:
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
07:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)

dmesg when I connect the My Book:
ata4: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xa frozen
ata4: irq_stat 0x00000040, connection status changed
ata4: SError: { DevExch }
ata4: hard resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: ATA-6: WD My Book, 01.01B01, max UDMA/133
ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 1)
ata4.00: configured for UDMA/133
ata4: EH complete
scsi 3:0:0:0: Direct-Access ATA WD My Book 01.0 PQ: 0 ANSI: 5
sd 3:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2
sd 3:0:0:0: [sdb] Attached SCSI disk
sd 3:0:0:0: Attached scsi generic sg1 type 0
ata4.00: exception Emask 0x10 SAct 0x1 SErr 0x780100 action 0x2
ata4.00: irq_stat 0x08000000
ata4: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
ata4.00: cmd 60/b8:00:30:00:00/00:00:00:00:00/40 tag 0 ncq 94208 in
res 40/00:04:30:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4.00: disabled
ata4: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
ata4: irq_stat 0x48000000
sd 3:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 3:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
00 00 00 30
sd 3:0:0:0: [sdb] Add. Sense: No additional sense information
end_request: I/O error, dev sdb, sector 48
Buffer I/O error on device sdb, logical block 6
Buffer I/O error on device sdb, logical block 7
Buffer I/O error on device sdb, logical block 8
Buffer I/O error on device sdb, logical block 9
Buffer I/O error on device sdb, logical block 10
Buffer I/O error on device sdb, logical block 11
Buffer I/O error on device sdb, logical block 12
Buffer I/O error on device sdb, logical block 13
Buffer I/O error on device sdb, logical block 14
Buffer I/O error on device sdb, logical block 15
ata4: EH complete
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
ata4.00: detaching (SCSI 3:0:0:0)
sd 3:0:0:0: [sdb] Synchronizing SCSI cache
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
sd 3:0:0:0: [sdb] Stopping disk
sd 3:0:0:0: [sdb] START_STOP FAILED
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
I forgot to mention that I installed Vista to test whether there was a problem with the drive, but it worked just fine there.
Created attachment 14944
limit-link-to-1.5Gbps.patch

Your problem is probably different. Does this patch help?
I tried another thing when the external disk was on the sata_nv port:

---------------------------------
# scsiadd -r 3 0 0 0
Feb 21 20:57:57 localhost [15852.982493] sd 3:0:0:0: [sdi] Synchronizing SCSI cache
Feb 21 20:57:57 localhost [15852.982904] sd 3:0:0:0: [sdi] Stopping disk
Feb 21 20:57:57 localhost [15852.983738] ata4.00: disabled
Feb 21 20:58:25 localhost [15880.735688] ata4: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xa frozen
Feb 21 20:58:25 localhost [15880.735694] ata4: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Feb 21 20:58:25 localhost [15880.735701] ata4: hard resetting link
Feb 21 20:58:26 localhost [15881.459164] ata4: SATA link down (SStatus 0 SControl 300)
Feb 21 20:58:26 localhost [15881.459178] ata4: EH complete
--------------------------

No more "rejecting IOs...". I wonder why it doesn't do the same thing when I just pull the cable?

Was that patch supposed to be only for David? Or do you think my troubles might have something to do with link speed as well? My external disk works at 1.5Gbps (the jumper is set to 1.5 and the external case only supports 1.5) and the JMB controller is 3Gbps.
Thanks for the patch, I'll test it as soon as I have my eSATA cable back. Would this mean that, if the patch works, it would have the same effect as setting the jumper on the HD to 1.5G? My apologies if I've made this report less clear.
I've tested the patch. Everything works just fine now. But in theory the hard drive supports 3Gb/s. Is there a particular reason 3Gb/s doesn't work? Anyhow, many thanks.
Davidd, I don't know. It's a hardware issue and many external SATA devices seem to have PHY related issues. It's also weird that the device couldn't recover from such failures. :-( A patch to implement the libata.force module parameter is pending, so you'll be able to use "libata.force=4:1.5Gbps" instead of the patch.

Devsk, the message repetition is because somebody is holding the SCSI device and issuing commands every 65 seconds. Because it's holding the device node, the zombie device can't die and just keeps rejecting IO requests being issued to it. fuser on the device node should tell you who's deep into voodoo.
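If fuser isn't installed, the same lookup can be approximated by walking /proc. A rough Python sketch, assuming a Linux /proc layout; `find_holders` is a made-up name, and without root it only sees processes the current user may inspect. It may be worth checking both the sd node and the matching sg node.

```python
import os


def find_holders(device_path, proc_root="/proc"):
    """Return sorted PIDs that have `device_path` open, by scanning <proc_root>/<pid>/fd."""
    target = os.path.realpath(device_path)
    holders = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # The process exited, or we lack permission to inspect it.
            continue
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if link == target:
                holders.add(int(entry))
    return sorted(holders)
```

For example, `find_holders("/dev/sdd")` would list the PIDs keeping the stale node open; the device name here is only an example.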
(In reply to comment #16)
> Devsk, the message repetition is because somebody is holding the SCSI device
> and issuing commands every 65 seconds. Because it's holding the device node,
> the zombie device can't die and just keeps rejecting IO requests being issued
> to it. fuser on the device node should tell you who's deep into voodoo.

So, it is hddtemp: it answers gkrellm's request for the temperature of the disk (every 1 minute) by making SCSI ioctl calls. The question is, why does it go away if I do scsiadd -r? Does this mean that some part of the SCSI subsystem is not really aware that the disk is gone? Does any of this make sense?
Ah... okay. The device can be either disabled or detached. The device is disabled if the device isn't responding but it can't be positively determined that the device is actually gone (PHY offline). In this case, the device node hangs around in killed state. 'scsiadd -r' tells the kernel that the device should be removed and thus the device node goes away. Back to the original problem, have you tried the force 1.5Gbps patch?
ok, forcing to 1.5Gbps works. But I like the other configuration with the removable disk on sata_nv...;-)
Hmmm... The behavior is a bit disturbing in that libata EH can't recover the device automatically, but I don't see anything which the driver can do automatically to work around the problem other than forcing 1.5Gbps. 2.6.25 will contain better EH logic which behaves a bit more aggressively, so it will have a better chance of recovering from such failures. Closing as DOCUMENTED for now. Thanks.
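For readers following the link-speed angle, the SStatus and SControl values libata prints in the logs are hex and pack three 4-bit fields each. The sketch below decodes them under the standard SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11); the function names are invented for this example.

```python
GENERATION_SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps"}


def split_fields(value):
    """Split an SStatus or SControl value into its DET, SPD and IPM nibbles."""
    return {"det": value & 0xF, "spd": (value >> 4) & 0xF, "ipm": (value >> 8) & 0xF}


def describe_sstatus(value):
    """Summarise SStatus: DET == 3 means a device is present and the PHY link is up."""
    fields = split_fields(value)
    if fields["det"] != 3:
        return "link down"
    return "link up " + GENERATION_SPEED.get(fields["spd"], "unknown speed")


def describe_scontrol_limit(value):
    """Summarise the speed cap in SControl: SPD == 0 means no restriction."""
    spd = split_fields(value)["spd"]
    if spd == 0:
        return "no speed limit"
    return "limited to " + GENERATION_SPEED.get(spd, "unknown speed")


if __name__ == "__main__":
    # Values as they appear in the logs (printed by the kernel in hex).
    print(describe_sstatus(0x113), "/", describe_scontrol_limit(0x300))
    print(describe_sstatus(0x113), "/", describe_scontrol_limit(0x310))
```

Read this way, "SControl 310" in the failing runs shows the error handler had already dropped the port to a 1.5 Gbps cap on its own before IDENTIFY timed out.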
Reopening for a twist. I had this problem (http://bugzilla.kernel.org/show_bug.cgi?id=9659) where kernels later than 2.6.21 couldn't resume from suspend-to-ram unless I passed sata_nv.adma=0. I moved to a libata-only configuration and could then resume from suspend-to-ram without sata_nv.adma=0. Now that I have put the external drive on sata_nv, I can't hotplug the drive without sata_nv.adma=0...:-( As soon as I turn the drive on, it just locks up. If I pass sata_nv.adma=0 and turn it on, it works fine. Surprisingly, if I boot the same kernel but x86_64 on the same hardware, sata_nv.adma=0 is not required for hotplug to work. It just works. But I can't use x86_64 for other reasons.
> it just locks up

I meant the whole machine. Only a reset can be done. Just like the resume problem.
This bug report sort of drifted from ahci to sata_nv ADMA. Maybe it's best to file a separate bug report for this. Robert, can you please take a look at this? Thanks.
We had another report of hotplug lockups on 32-bit only: http://bugzilla.kernel.org/show_bug.cgi?id=8421 which seems a bit bizarre. I can't see how a 64-bit vs. 32-bit kernel should affect anything. It also seems to be somehow dependent on motherboard revision, according to some reports anyway. Feel free to add comments to that bug report. It hasn't had much activity recently.
I know Ingo recently found a bug on x86 which had been present in the kernel for a long time and had something to do with CONFIG_PCI_BIOS. http://lkml.org/lkml/2008/3/10/171

Not very surprisingly, the only major difference in the .config of the x86_64 and x86 kernels I use is in that area:

x86:
# grep CONFIG_PCI /usr/src/linux/.config
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set

x86_64:
# grep CONFIG_PCI /mnt/x64-root/usr/src/linux/.config
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set

I have no idea how these are related to each other, or to sata_nv.adma=0. So, facts:
1. x86_64 works, x86 doesn't.
2. x86 works if sata_nv.adma=0 is passed.
3. x86 detects the drive fine if booted with it on, but hard locks if it is turned on after bootup.
4. CONFIG_PCI options are not the same in x86_64 and x86 by default. GOANY is not even there in menuconfig.

Does anybody know these config options well enough to comment?
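To make point 4 concrete, the two grep outputs can be compared mechanically. The following is just a small sketch of such a comparison; `parse_config` and `config_diff` are names invented here, and the two strings are the CONFIG_PCI listings quoted in this comment, with nothing added.

```python
X86 = """\
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
"""

X86_64 = """\
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
"""

NOT_SET = " is not set"


def parse_config(text):
    """Map option name -> value; 'n' stands for '# ... is not set'."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and line.endswith(NOT_SET):
            opts[line[2:-len(NOT_SET)]] = "n"
        elif "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def config_diff(a, b):
    """Return {option: (value_in_a, value_in_b)} where they differ; None means absent."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


diff = config_diff(parse_config(X86), parse_config(X86_64))
for name, (v32, v64) in diff.items():
    print(f"{name}: x86={v32} x86_64={v64}")
```

On these two listings the only options that differ are the GO* access-method choices and CONFIG_PCI_BIOS, all of which appear only on the x86 side. That narrows the question but does not by itself say which option, if any, is involved in the lockup.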
I have a problem similar to davidd's in comment 10. I connect a WD MyBook 500GB Studio Edition over eSATA to a JMicron JMB363 controller (before power on). Then in Linux, right after I access the drive (by running hdparm -tT /dev/sdd; sdd is the MyBook), it locks up (see dmesg). This is a 64-bit kernel from RIPLinux 6.1 (2.6.26).

/proc/version:
Linux version 2.6.26.64bit (root@darkstar) (gcc version 4.2.3) #4 Mon Jul 14 05:39:37 UTC 2008

uname -a:
Linux RIPLinuX 2.6.26.64bit #4 Mon Jul 14 05:39:37 UTC 2008 x86_64 GNU/Linux

I attach the larger files.
Created attachment 17190 dmesg, error at end
Created attachment 17191 lspci output
Created attachment 17192 lspci -vvn output
Sorry, here is some more detail:

Latest working kernel version: ? (probably none)
Earliest failing kernel version: 2.6.26
Distribution: RIPLinux 6.1
Hardware Environment: Asus P5K-E WiFi (P35+ICH9R+JMB363)
Software Environment: ?
Problem Description: ? see above

Also: The BIOS mode setting for JMB363 is "AHCI". The disk works fine in Windows (but has a lot of problems there too; I only managed to get it to work properly with the very latest driver from JMicron, v 1.17.39). After the problem happens in Linux, the drive does not work in Windows either, not even after several reboots and power cycles. Last time this happened, it worked again the next day, so I have no idea what makes it work again...
I've been talking with JMicron and they suggest that the problem is caused by a misbehaving SATA bridge chip inside the WD device, and they haven't found a proper solution on Windows either. I'm asking WD for more information, but it seems we're out of luck and might have to just note the case in the linux-ata wiki and give up on the combination. I'll update when I know more. Thanks.
Only in my case, it was a Seagate 750GB drive.
For info, I received this from WD support about my JMB363+WD_MyBook problem: This sounds like an incompatibility perhaps with the ESATA controller. Please see the link below for tested ESATA controllers. http://www.wdc.com/en/products/resources/esataupgrade.asp
Yeah, there are two bugs being tracked here: the nv one and the jmb + wdc one. For the jmb + wd one, jmb acks that there's a hardware compatibility problem with the wd external drive, and I'm talking with both jmb and wd to find out what's going on. For the nv one, I have no idea what's going on. :-(
For the record, I see the same behavior as initially reported: works when booted with everything on, errors when trying to hotplug, even when hotplugging (expresscard) the esata card with the esata cable already connected.

Hardware: MacBookPro, ST-LAB C-230 (JMB363 based) express card, Seagate FreeAgent Pro.

Software:
* Gentoo, kernel 2.6.22-gentoo-r9 with mactel patches
* Gentoo, kernel 2.6.27-rc6-wl

Log:
ata5: exception Emask 0x10 SAct 0x0 SErr 0x49d0000 action 0x2 frozen
ata5: hard resetting port
ata5: port is slow to respond, please be patient (Status 0xff)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting port
ata5: port is slow to respond, please be patient (Status 0xff)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting port
ata5: port is slow to respond, please be patient (Status 0xff)
ata5: COMRESET failed (errno=-16)
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting port
ata5: COMRESET failed (errno=-16)
ata5: reset failed, giving up
ata5: EH complete
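The SErr value in the exception line can be broken down into individual flags, which helps when comparing logs from different setups. The sketch below covers only the diagnostic half of the register (bits 16 to 26). The bit positions and short names are written from memory of the SATA SError layout and of the labels libata prints, so treat the table as an assumption and check it against the kernel source before relying on it. `decode_serr` is a made-up helper name.

```python
# Assumed SError DIAG bit layout (bits 16..26); names mimic libata's short labels.
SERR_DIAG_BITS = {
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(value):
    """Return (flag_names, leftover) for the DIAG bits of an SErr value.

    `leftover` holds any bits this table does not cover, so nothing is
    silently dropped.
    """
    names = []
    leftover = value
    for bit in sorted(SERR_DIAG_BITS):
        mask = 1 << bit
        if value & mask:
            names.append(SERR_DIAG_BITS[bit])
            leftover &= ~mask
    return names, leftover


for serr in (0x990000, 0x49D0000):
    names, leftover = decode_serr(serr)
    print(f"SErr {serr:#x}: {{ {' '.join(names)} }} leftover={leftover:#x}")
```

With this table, 0x990000 from the original report decodes to the same four flags the kernel printed there (PHYRdyChg 10B8B Dispar LinkSeq), which is at least a sanity check on those bits. The 0x49d0000 above has the same four plus bits 18 and 26, which the table labels CommWake and DevExch.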
Please disregard my last comment. It turned out the problem was not with the JMB363, but with a construction flaw of the Seagate FreeAgent Pro, which doesn't allow proper fitting of the eSATA cable. A disassembled FAP with the cable properly fitted works perfectly, including hotplug.
Hi, Tejun. Any news? Should the WD/JMB combination be added to the wiki?
Hello, sorry about the long delay. I've tested four WD My Book drives (studio, studio ii, home and dvr expander) with jmb360, ich7 and sata_sil24. There seems to be a certain quirkiness to the drives. For example, the studio and home ones shut down their SATA interface after certain transmission errors, leaving no way to recover other than power cycling the drive. But, in general, it seems the problem isn't specific to any controller or drive combination but is rather heavily affected by cabling.

With short to medium direct cables, all four drives worked fine with all controllers. With a SATA -> eSATA converter in the middle and a longer cable, detection and operation afterwards become quite flaky, and different controller and drive combinations give different results, which BTW is quite consistent with davidd and devsk's reports that forcing 1.5Gbps resolves the problem.

What libata can do here is:

1. Slow down to 1.5Gbps if hotplug events occur repeatedly. libata does slow down after a certain number of transmission errors once a device is probed, but during probing it doesn't. This prevents successful probing in certain cabling conditions.

2. Force 1.5Gbps for drives which are known to react badly to transmission errors. As the hardware seems to shut down completely after transmission errors, the only thing the driver can do is to avoid them. Lowering the link speed helps quite a bit and there is virtually no downside.

David, can you please try libata.force=1.5Gbps and a shorter cable and see whether that makes any difference? The difference between Windows and Linux could be that Windows defaults to 1.5Gbps, thus avoiding such transmission failures from the beginning.

Thanks.
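The two ideas above can be pictured with a small toy model. This is purely illustrative and is not the libata code or the posted patchset: the class name, the failure threshold of 2, and the "EXAMPLE-DRIVE" model string are all assumptions chosen for the sketch.

```python
SPEEDS = ["3.0 Gbps", "1.5 Gbps"]  # fastest first; the last entry is the floor


class ProbeSpeedPolicy:
    """Toy model: step the link speed down after repeated probe failures,
    and start at the lowest speed for drives on a known-bad list."""

    def __init__(self, model=None, slow_models=(), threshold=2):
        self.threshold = threshold  # failures tolerated before slowing down (assumed)
        self.failures = 0
        # Idea 2: known-bad drives start at the lowest speed right away.
        self.level = len(SPEEDS) - 1 if model in slow_models else 0

    @property
    def speed(self):
        return SPEEDS[self.level]

    def probe_failed(self):
        """Idea 1: record a failed probe/hotplug attempt; return the speed to try next."""
        self.failures += 1
        if self.failures >= self.threshold and self.level < len(SPEEDS) - 1:
            self.level += 1
            self.failures = 0
        return self.speed

    def probe_succeeded(self):
        """A successful probe clears the failure count but keeps the current speed."""
        self.failures = 0


policy = ProbeSpeedPolicy()
attempts = [policy.probe_failed() for _ in range(3)]
print(attempts)

forced = ProbeSpeedPolicy(model="EXAMPLE-DRIVE", slow_models={"EXAMPLE-DRIVE"})
print(forced.speed)
```

The point of keeping the lower speed after a success is that the cabling has not changed, so going back up would just invite the same transmission errors that some of these drives cannot recover from.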
Patchset posted. http://thread.gmane.org/gmane.linux.ide/37942
(To avoid spamming the bug, I'm sending this directly to you.)

Would it be possible to always start in 1.5Gbps mode and then switch to 3Gbps under certain conditions? (whitelists, blacklists, crystal ball....) Just an idea...

Regards,
David

> -----Original Message-----
> From: bugme-daemon@bugzilla.kernel.org
> Sent: Thursday, January 29, 2009 12:34 PM
> To: David Balazic
> Subject: [Bug 9913] AHCI (JMICRON) hotplug doesn't work with esata drive
>
> http://bugzilla.kernel.org/show_bug.cgi?id=9913
>
> ------- Comment #39 from tj@kernel.org 2009-01-29 03:34 -------
> Patchset posted.
>
> http://thread.gmane.org/gmane.linux.ide/37942