Bug 9913

Summary: AHCI (JMICRON) hotplug doesn't work with esata drive
Product: IO/Storage Reporter: devsk (funtoos)
Component: Serial ATA Assignee: Tejun Heo (tj)
Status: CLOSED CODE_FIX    
Severity: normal CC: david.balazic, David.Dierickx, hancockrwd, Petr.Nejedly, serge
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24, 2.6.23.8 Subsystem:
Regression: --- Bisected commit-id:
Attachments: limit-link-to-1.5Gbps.patch
dmesg, error at end
lspci output
lspci -vvn output

Description devsk 2008-02-07 20:10:59 UTC
Latest working kernel version: Not known
Earliest failing kernel version: 2.6.23.8
Distribution: Gentoo
Hardware Environment: JMicron AHCI controller in PCIE, Nvidia Nforce4,
Software Environment: gcc 4.2.2, binutils 2.18, glibc 2.7
Problem Description:

I have AHCI in kernel:

If I have the drive on during boot, the drive is detected, /dev/sdi is created and works fine.
If I turn the drive on after boot, the drive throws errors and no /dev/sdi is created (referred to below as "errors").
Turning the drive off and on again doesn't help.

I have AHCI built as module:

If the drive is on and hasn't seen errors earlier, when I modprobe ahci, the drive is detected fine, /dev/sdi is created and works.

If the drive is off and I modprobe ahci, errors. Running rmmod and then modprobe again doesn't help.

If I modprobe ahci and then turn on the drive, errors.

Steps to reproduce: Turn the device on after the AHCI driver has loaded; the errors follow.

The errors seen are:

###################### drive is turned on and modprobe ahci is done #########################
Feb  7 19:13:03 localhost [  361.513897] ahci 0000:04:00.0: version 3.0
Feb  7 19:13:03 localhost [  361.514148] ACPI: PCI Interrupt Link [APC2] enabled at IRQ 17
Feb  7 19:13:03 localhost [  361.514155] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb  7 19:13:04 localhost [  362.513197] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb  7 19:13:04 localhost [  362.513202] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb  7 19:13:04 localhost [  362.513206] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb  7 19:13:04 localhost [  362.513568] scsi7 : ahci
Feb  7 19:13:04 localhost [  362.514081] scsi8 : ahci
Feb  7 19:13:04 localhost [  362.514106] ata7: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb  7 19:13:04 localhost [  362.514110] ata8: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb  7 19:13:05 localhost [  362.818688] ata7: SATA link down (SStatus 0 SControl 300)
Feb  7 19:13:05 localhost [  363.427706] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb  7 19:13:05 localhost [  363.428912] ata8.00: ATA-7: ST3750330AS, SD04, max UDMA/133
Feb  7 19:13:05 localhost [  363.428915] ata8.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Feb  7 19:13:05 localhost [  363.430484] ata8.00: configured for UDMA/133
Feb  7 19:13:05 localhost [  363.429554] scsi 8:0:0:0: Direct-Access     ATA      ST3750330AS      SD04 PQ: 0 ANSI: 5
Feb  7 19:13:05 localhost [  363.429694] sd 8:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb  7 19:13:05 localhost [  363.429771] sd 8:0:0:0: [sdi] Write Protect is off
Feb  7 19:13:05 localhost [  363.429774] sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb  7 19:13:05 localhost [  363.429918] sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb  7 19:13:05 localhost [  363.430030] sd 8:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb  7 19:13:05 localhost [  363.430105] sd 8:0:0:0: [sdi] Write Protect is off
Feb  7 19:13:05 localhost [  363.430107] sd 8:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb  7 19:13:05 localhost [  363.430222] sd 8:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb  7 19:13:05 localhost [  363.430224]  sdi: sdi1 sdi2 sdi3 sdi4 < sdi5 >
Feb  7 19:13:05 localhost [  363.470136] sd 8:0:0:0: [sdi] Attached SCSI disk
Feb  7 19:13:05 localhost [  363.470170] sd 8:0:0:0: Attached scsi generic sg8 type 0

############### drive is turned off ######################
Feb  7 19:13:52 localhost [  410.461251] ata8: exception Emask 0x10 SAct 0x0 SErr 0x990000 action 0xa frozen
Feb  7 19:13:52 localhost [  410.461256] ata8: irq_stat 0x00400000, PHY RDY changed
Feb  7 19:13:52 localhost [  410.461259] ata8: SError: { PHYRdyChg 10B8B Dispar LinkSeq }
Feb  7 19:13:52 localhost [  410.461267] ata8: hard resetting link
Feb  7 19:13:53 localhost [  411.183002] ata8: SATA link down (SStatus 0 SControl 300)
Feb  7 19:13:53 localhost [  411.183013] ata8: failed to recover some devices, retrying in 5 secs
Feb  7 19:13:58 localhost [  416.175868] ata8: hard resetting link
Feb  7 19:13:59 localhost [  416.480379] ata8: SATA link down (SStatus 0 SControl 300)
Feb  7 19:13:59 localhost [  416.480388] ata8: failed to recover some devices, retrying in 5 secs
Feb  7 19:14:04 localhost [  421.473247] ata8: hard resetting link
Feb  7 19:14:04 localhost [  421.777761] ata8: SATA link down (SStatus 0 SControl 300)
Feb  7 19:14:04 localhost [  421.777773] ata8.00: disabled
Feb  7 19:14:04 localhost [  422.277938] ata8: EH complete
Feb  7 19:14:04 localhost [  422.277949] ata8.00: detaching (SCSI 8:0:0:0)
Feb  7 19:14:04 localhost [  422.278148] sd 8:0:0:0: [sdi] Synchronizing SCSI cache
Feb  7 19:14:04 localhost [  422.278172] sd 8:0:0:0: [sdi] Result: hostbyte=0x04 driverbyte=0x00
Feb  7 19:14:04 localhost [  422.278192] sd 8:0:0:0: [sdi] Stopping disk
Feb  7 19:14:04 localhost [  422.278197] sd 8:0:0:0: [sdi] START_STOP FAILED
Feb  7 19:14:04 localhost [  422.278198] sd 8:0:0:0: [sdi] Result: hostbyte=0x04 driverbyte=0x00

############### drive is turned on ######################
Feb  7 19:14:40 localhost [  458.388903] ata8: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xa frozen
Feb  7 19:14:40 localhost [  458.388908] ata8: irq_stat 0x00000040, connection status changed
Feb  7 19:14:40 localhost [  458.388912] ata8: SError: { PHYRdyChg CommWake DevExch }
Feb  7 19:14:40 localhost [  458.388920] ata8: hard resetting link
Feb  7 19:14:43 localhost [  461.093772] ata8: classification failed
Feb  7 19:14:43 localhost [  461.093777] ata8: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:14:50 localhost [  468.371925] ata8: hard resetting link
Feb  7 19:14:53 localhost [  471.077524] ata8: classification failed
Feb  7 19:14:53 localhost [  471.077529] ata8: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:15:00 localhost [  478.355678] ata8: hard resetting link
Feb  7 19:15:03 localhost [  481.061277] ata8: classification failed
Feb  7 19:15:03 localhost [  481.061282] ata8: reset failed (errno=-22), retrying in 33 secs
Feb  7 19:15:35 localhost [  513.298813] ata8: limiting SATA link speed to 1.5 Gbps
Feb  7 19:15:35 localhost [  513.298817] ata8: hard resetting link
Feb  7 19:15:38 localhost [  516.004410] ata8: classfication failed, assuming ATA
Feb  7 19:15:38 localhost [  516.004421] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb  7 19:16:08 localhost [  545.955668] ata8.00: qc timeout (cmd 0xec)
Feb  7 19:16:08 localhost [  545.955678] ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb  7 19:16:08 localhost [  545.955681] ata8: failed to recover some devices, retrying in 5 secs

############### rmmod ahci is done ######################
Feb  7 19:16:13 localhost [  550.948540] ata8: EH complete
Feb  7 19:16:13 localhost [  550.948732] ACPI: PCI interrupt for device 0000:04:00.0 disabled

############### drive is on, modprobe ahci is done ######################
Feb  7 19:16:40 localhost [  577.634831] ahci 0000:04:00.0: version 3.0
Feb  7 19:16:40 localhost [  577.634853] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb  7 19:16:41 localhost [  578.633470] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb  7 19:16:41 localhost [  578.633475] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb  7 19:16:41 localhost [  578.633482] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb  7 19:16:41 localhost [  578.633714] scsi9 : ahci
Feb  7 19:16:41 localhost [  578.633791] scsi10 : ahci
Feb  7 19:16:41 localhost [  578.633816] ata9: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb  7 19:16:41 localhost [  578.633819] ata10: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb  7 19:16:41 localhost [  578.939004] ata9: SATA link down (SStatus 0 SControl 300)
Feb  7 19:16:43 localhost [  581.227269] ata10: classification failed
Feb  7 19:16:43 localhost [  581.227273] ata10: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:16:53 localhost [  591.211023] ata10: classification failed
Feb  7 19:16:53 localhost [  591.211028] ata10: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:17:03 localhost [  601.194777] ata10: classification failed
Feb  7 19:17:03 localhost [  601.194781] ata10: reset failed (errno=-22), retrying in 33 secs
Feb  7 19:17:36 localhost [  633.849639] ata10: limiting SATA link speed to 1.5 Gbps
Feb  7 19:17:38 localhost [  636.137917] ata10: classfication failed, assuming ATA
Feb  7 19:17:38 localhost [  636.137928] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb  7 19:18:08 localhost [  666.089175] ata10.00: qc timeout (cmd 0xec)
Feb  7 19:18:08 localhost [  666.089186] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb  7 19:18:08 localhost [  666.089189] ata10: failed to recover some devices, retrying in 5 secs
Feb  7 19:18:16 localhost [  673.370326] ata10: classification failed
Feb  7 19:18:16 localhost [  673.370332] ata10: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:18:26 localhost [  683.356010] ata10: classification failed
Feb  7 19:18:26 localhost [  683.356015] ata10: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:18:36 localhost [  693.337760] ata10: classification failed
Feb  7 19:18:36 localhost [  693.337765] ata10: reset failed (errno=-22), retrying in 33 secs

##############  drive is turned off #####################
Feb  7 19:19:09 localhost [  726.297135] ata10: SATA link down (SStatus 0 SControl 310)

############### rmmod ahci is done ######################
Feb  7 19:19:20 localhost [  738.092615] ACPI: PCI interrupt for device 0000:04:00.0 disabled

############### modprobe ahci is done ######################
Feb  7 19:19:29 localhost [  746.913039] ahci 0000:04:00.0: version 3.0
Feb  7 19:19:29 localhost [  746.913060] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb  7 19:19:30 localhost [  747.911889] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb  7 19:19:30 localhost [  747.911893] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb  7 19:19:30 localhost [  747.911899] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb  7 19:19:30 localhost [  747.912262] scsi11 : ahci
Feb  7 19:19:30 localhost [  747.912366] scsi12 : ahci
Feb  7 19:19:30 localhost [  747.912391] ata11: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb  7 19:19:30 localhost [  747.912394] ata12: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb  7 19:19:31 localhost [  748.217467] ata11: SATA link down (SStatus 0 SControl 300)
Feb  7 19:19:31 localhost [  748.521972] ata12: SATA link down (SStatus 0 SControl 300)

##############  drive is turned on #####################
Feb  7 19:19:41 localhost [  758.280168] ata12: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xa frozen
Feb  7 19:19:41 localhost [  758.280173] ata12: irq_stat 0x00000040, connection status changed
Feb  7 19:19:41 localhost [  758.280176] ata12: SError: { PHYRdyChg CommWake DevExch }
Feb  7 19:19:41 localhost [  758.280184] ata12: hard resetting link
Feb  7 19:19:43 localhost [  760.985679] ata12: classification failed
Feb  7 19:19:43 localhost [  760.985684] ata12: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:19:51 localhost [  768.263833] ata12: hard resetting link
Feb  7 19:19:53 localhost [  770.969432] ata12: classification failed
Feb  7 19:19:53 localhost [  770.969438] ata12: reset failed (errno=-22), retrying in 8 secs
Feb  7 19:20:01 localhost [  778.247586] ata12: hard resetting link
Feb  7 19:20:03 localhost [  780.953185] ata12: classification failed
Feb  7 19:20:03 localhost [  780.953190] ata12: reset failed (errno=-22), retrying in 33 secs
Feb  7 19:20:36 localhost [  813.190720] ata12: limiting SATA link speed to 1.5 Gbps
Feb  7 19:20:36 localhost [  813.190725] ata12: hard resetting link
Feb  7 19:20:38 localhost [  815.896318] ata12: classfication failed, assuming ATA
Feb  7 19:20:38 localhost [  815.896329] ata12: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Feb  7 19:21:08 localhost [  845.847580] ata12.00: qc timeout (cmd 0xec)
Feb  7 19:21:08 localhost [  845.847591] ata12.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb  7 19:21:08 localhost [  845.847594] ata12: failed to recover some devices, retrying in 5 secs

############### rmmod ahci is done ######################
Feb  7 19:21:13 localhost [  850.840453] ata12: EH complete
Feb  7 19:21:13 localhost [  850.839736] ACPI: PCI interrupt for device 0000:04:00.0 disabled

##############  drive is turned off, on and modprobe ahci is done #####################
Feb  7 19:22:20 localhost [  917.495408] ahci 0000:04:00.0: version 3.0
Feb  7 19:22:20 localhost [  917.495430] ACPI: PCI Interrupt 0000:04:00.0[A] -> Link [APC2] -> GSI 17 (level, low) -> IRQ 22
Feb  7 19:22:21 localhost [  918.494237] ahci 0000:04:00.0: AHCI 0001.0000 32 slots 2 ports 3 Gbps 0x3 impl SATA mode
Feb  7 19:22:21 localhost [  918.494243] ahci 0000:04:00.0: flags: 64bit ncq pm led clo pmp pio slum part
Feb  7 19:22:21 localhost [  918.494250] PCI: Setting latency timer of device 0000:04:00.0 to 64
Feb  7 19:22:21 localhost [  918.494484] scsi13 : ahci
Feb  7 19:22:21 localhost [  918.494564] scsi14 : ahci
Feb  7 19:22:21 localhost [  918.494588] ata13: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe100 irq 22
Feb  7 19:22:21 localhost [  918.494591] ata14: SATA max UDMA/133 abar m8192@0xfdafe000 port 0xfdafe180 irq 22
Feb  7 19:22:21 localhost [  918.799878] ata13: SATA link down (SStatus 0 SControl 300)
Feb  7 19:22:22 localhost [  919.407887] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb  7 19:22:22 localhost [  919.409093] ata14.00: ATA-7: ST3750330AS, SD04, max UDMA/133
Feb  7 19:22:22 localhost [  919.409096] ata14.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 31/32)
Feb  7 19:22:22 localhost [  919.410647] ata14.00: configured for UDMA/133
Feb  7 19:22:22 localhost [  919.409520] scsi 14:0:0:0: Direct-Access     ATA      ST3750330AS      SD04 PQ: 0 ANSI: 5
Feb  7 19:22:22 localhost [  919.409580] sd 14:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb  7 19:22:22 localhost [  919.409589] sd 14:0:0:0: [sdi] Write Protect is off
Feb  7 19:22:22 localhost [  919.409591] sd 14:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb  7 19:22:22 localhost [  919.409602] sd 14:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb  7 19:22:22 localhost [  919.409641] sd 14:0:0:0: [sdi] 1465149168 512-byte hardware sectors (750156 MB)
Feb  7 19:22:22 localhost [  919.409647] sd 14:0:0:0: [sdi] Write Protect is off
Feb  7 19:22:22 localhost [  919.409649] sd 14:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Feb  7 19:22:22 localhost [  919.409659] sd 14:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb  7 19:22:22 localhost [  919.409661]  sdi: sdi1 sdi2 sdi3 sdi4 < sdi5 >
Feb  7 19:22:22 localhost [  919.451333] sd 14:0:0:0: [sdi] Attached SCSI disk
Feb  7 19:22:22 localhost [  919.451370] sd 14:0:0:0: Attached scsi generic sg8 type 0
Feb  7 19:22:50 localhost [  947.116521] kjournald starting.  Commit interval 5 seconds
Feb  7 19:22:50 localhost [  947.117102] EXT3 FS on sdi3, internal journal
Feb  7 19:22:50 localhost [  947.117107] EXT3-fs: mounted filesystem with ordered data mode.

I could put as much load as I wanted on the drive after this. I have run seatools short and long tests a couple of times and the drive itself is in great health.

The drive in the same enclosure behaves fine as far as hotplug is concerned in my laptop which has a jmicron express card esata controller.
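
For reading the "SStatus" and "SControl" values in the logs above, here is a minimal Python sketch. It assumes the standard SATA register layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that the kernel prints both values in hexadecimal. The function and table names are made up for illustration.

```python
# Field layout assumed from the SATA SStatus/SControl register definitions:
# bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM.
NEGOTIATED_SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps"}          # SStatus.SPD
SPEED_LIMIT = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps"}    # SControl.SPD


def split_fields(reg):
    """Split an SStatus or SControl value into its DET, SPD and IPM nibbles."""
    return {"det": reg & 0xF, "spd": (reg >> 4) & 0xF, "ipm": (reg >> 8) & 0xF}


def describe_link(sstatus, scontrol):
    """Summarise link state from the two values printed in the log lines."""
    st = split_fields(sstatus)
    ct = split_fields(scontrol)
    link_up = st["det"] == 3  # DET=3: device present, PHY communication established
    return {
        "link_up": link_up,
        "negotiated": NEGOTIATED_SPEED.get(st["spd"], "unknown") if link_up else None,
        "speed_limit": SPEED_LIMIT.get(ct["spd"], "unknown"),
    }


# Value pairs taken from the logs in this report.
for sstatus, scontrol in [(0x113, 0x300), (0x113, 0x310), (0x123, 0x300), (0x0, 0x300)]:
    print(f"SStatus {sstatus:X} SControl {scontrol:X} ->", describe_link(sstatus, scontrol))
```

Read this way, "SControl 310" after "limiting SATA link speed to 1.5 Gbps" means the port was told to stay at 1.5 Gbps, while "SControl 300" means no speed limit was set.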
Comment 1 devsk 2008-02-07 20:11:59 UTC
# lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 RAID bus controller: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 RAID bus controller: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0c.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0d.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2)
04:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
04:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
05:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
Comment 2 devsk 2008-02-07 21:19:23 UTC
> The drive in the same enclosure behaves fine as far as hotplug is
> concerned in my laptop which has a jmicron express card esata controller.

Sorry for the wrong info. I repeated the experiments on the laptop, and I was mistaken: the behavior is the same on both the laptop and the desktop. The laptop uses a 2.6.23 kernel and the same controller (as reported by lspci), but on the ExpressCard bus.

Please let me know if you want me to do some experiments or want some info.
Comment 3 Tejun Heo 2008-02-11 00:13:05 UTC
That's weird.  I've never seen any hotplug problem w/ JMBs.  Do you have a different hard drive to try?  What happens if you do 'echo - - - > /sys/class/scsi_host/hostX/scan' where hostX is the JMB port?
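
The rescan suggested above can also be scripted. This is a minimal Python sketch of the same sysfs write; the function name and the `sysfs_root` parameter are hypothetical, added only so the code can be pointed at a scratch directory instead of the real /sys.

```python
from pathlib import Path


def rescan_scsi_host(host, sysfs_root="/sys"):
    """Ask the SCSI layer to rescan every channel, target and LUN on hostN.

    Same effect as: echo "- - -" > /sys/class/scsi_host/hostN/scan
    Needs root on a real system. `sysfs_root` is a hypothetical parameter
    used here only to make the function testable.
    """
    scan_file = Path(sysfs_root) / "class" / "scsi_host" / f"host{host}" / "scan"
    scan_file.write_text("- - -\n")
    return scan_file
```

In the first log of this report the drive shows up as "sd 8:0:0:0", so `rescan_scsi_host(8)` would target that port. The host number changes after every rmmod/modprobe cycle (scsi10, scsi12, scsi14 in the later logs), so it has to be looked up each time.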
Comment 4 devsk 2008-02-13 12:30:34 UTC
Sorry for the delay. Unfortunately, I don't have a spare drive to test. I will post the info you requested when I go home.
Comment 5 devsk 2008-02-14 21:25:16 UTC
I don't have the data with a different hard drive but I tried esata drive on the sata_nv port that I have and here is the output. So, what I have done is put one of the internal disks on JMicron and put esata on one of the on-board ports.

It seems drive hotplug works with sata_nv (on the mobo port), with the caveat that it keeps printing "rejecting I/O to dead device" every 65 seconds, although the disk is working fine. I can mount the partitions and access data without problems. The only difference I can see with respect to AHCI is that the JMicron is a PCIe x1 controller card.

1. Is there a way to suppress the "rejecting IOs..." message?
2. Why is it printing it anyway? 3:0:0:0 is not dead anymore.

############ turned off drive and turned it back on ###############################
Feb 14 21:03:56 localhost [  550.215868] ata4: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xa frozen
Feb 14 21:03:56 localhost [  550.215874] ata4: SError: { PHYRdyChg LinkSeq TrStaTrns }
Feb 14 21:03:56 localhost [  550.215880] ata4: hard resetting link
Feb 14 21:03:56 localhost [  550.937395] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:03:56 localhost [  550.937404] ata4: failed to recover some devices, retrying in 5 secs
Feb 14 21:04:01 localhost [  555.930276] ata4: hard resetting link
Feb 14 21:04:02 localhost [  556.234776] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:04:02 localhost [  556.234787] ata4: failed to recover some devices, retrying in 5 secs
Feb 14 21:04:07 localhost [  561.227652] ata4: hard resetting link
Feb 14 21:04:07 localhost [  561.532155] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:04:07 localhost [  561.532168] ata4.00: disabled
Feb 14 21:04:08 localhost [  562.334854] ata4: soft resetting link
Feb 14 21:04:08 localhost [  562.334862] ata4: SATA link down (SStatus 0 SControl 300)
Feb 14 21:04:08 localhost [  562.334872] ata4: EH complete
Feb 14 21:04:08 localhost [  562.334880] ata4.00: detaching (SCSI 3:0:0:0)
Feb 14 21:04:08 localhost [  562.335060] sd 3:0:0:0: [sdd] Synchronizing SCSI cache
Feb 14 21:04:08 localhost [  562.335090] sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 21:04:08 localhost [  562.335093] sd 3:0:0:0: [sdd] Stopping disk
Feb 14 21:04:08 localhost [  562.335145] sd 3:0:0:0: [sdd] START_STOP FAILED
Feb 14 21:04:08 localhost [  562.335147] sd 3:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Feb 14 21:04:22 localhost [  576.285357] ata4: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xa frozen
Feb 14 21:04:22 localhost [  576.285363] ata4: SError: { PHYRdyChg CommWake }
Feb 14 21:04:22 localhost [  576.285370] ata4: hard resetting link
Feb 14 21:04:28 localhost [  582.198527] ata4: port is slow to respond, please be patient (Status 0x80)
Feb 14 21:04:32 localhost [  586.269900] ata4: SRST failed (errno=-16)
Feb 14 21:04:32 localhost [  586.269905] ata4: hard resetting link
Feb 14 21:04:33 localhost [  587.445994] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 14 21:04:33 localhost [  587.449433] ata4.00: ATA-7: ST3750330AS, SD04, max UDMA/133
Feb 14 21:04:33 localhost [  587.449436] ata4.00: 1465149168 sectors, multi 0: LBA48 NCQ (depth 0/32)
Feb 14 21:04:33 localhost [  587.455423] ata4.00: configured for UDMA/133
Feb 14 21:04:33 localhost [  587.455432] ata4: EH complete
Feb 14 21:04:33 localhost [  587.455482] scsi 3:0:0:0: Direct-Access     ATA      ST3750330AS      SD04 PQ: 0 ANSI: 5
Feb 14 21:04:33 localhost [  587.455546] sd 3:0:0:0: [sdj] 1465149168 512-byte hardware sectors (750156 MB)
Feb 14 21:04:33 localhost [  587.455554] sd 3:0:0:0: [sdj] Write Protect is off
Feb 14 21:04:33 localhost [  587.455556] sd 3:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Feb 14 21:04:33 localhost [  587.455566] sd 3:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 14 21:04:33 localhost [  587.455603] sd 3:0:0:0: [sdj] 1465149168 512-byte hardware sectors (750156 MB)
Feb 14 21:04:33 localhost [  587.455609] sd 3:0:0:0: [sdj] Write Protect is off
Feb 14 21:04:33 localhost [  587.455611] sd 3:0:0:0: [sdj] Mode Sense: 00 3a 00 00
Feb 14 21:04:33 localhost [  587.455620] sd 3:0:0:0: [sdj] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 14 21:04:33 localhost [  587.455622]  sdj: sdj1 sdj2 sdj3 sdj4 < sdj5 sdj6 >
Feb 14 21:04:33 localhost [  587.506074] sd 3:0:0:0: [sdj] Attached SCSI disk
Feb 14 21:04:33 localhost [  587.506108] sd 3:0:0:0: Attached scsi generic sg3 type 0
Feb 14 21:05:10 localhost [  624.609995] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:05:10 localhost [  624.610002] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:06:15 localhost [  689.709946] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:06:15 localhost [  689.709954] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:07:21 localhost [  754.834827] scsi 3:0:0:0: rejecting I/O to dead device
Feb 14 21:07:21 localhost [  754.834835] scsi 3:0:0:0: rejecting I/O to dead device
Comment 6 devsk 2008-02-15 08:49:05 UTC
One more observation is that I can't really boot from disks on the JMicron ports, which is bizarre.
Comment 7 Tejun Heo 2008-02-20 18:55:13 UTC
rejecting I/O to dead device messages are for in-flight commands for the old 3:0:0:0 which got detached.  It doesn't continue, right?  It's really bizarre, what do you mean by 'can't boot from disks connected to the JMB'?  What happens?
Comment 8 devsk 2008-02-20 19:23:13 UTC
(In reply to comment #7)
> rejecting I/O to dead device messages are for in-flight commands for the old
> 3:0:0:0 which got detached.  It doesn't continue, right?

That's where the problem is. The "rejecting ..." messages continue every 65 seconds until I reboot the box. This happens even though 3:0:0:0 is now a valid device and the old 3:0:0:0 didn't have any I/O going to it (I unmounted and sync'ed).

>  It's really bizarre,
> what do you mean by 'can't boot from disks connected to the JMB'?  What
> happens?

Grub doesn't see the disk. Only OS can see it after loading AHCI module.
Comment 9 devsk 2008-02-20 19:54:26 UTC
> Grub doesn't see the disk. Only OS can see it after loading AHCI module.

BIOS sees the disks connected to JMB as well and allows me to reorder boot sequence around them but when it reaches grub side of things, the drive disappears. It may just be how grub talks with BIOS.
Comment 10 davidd 2008-02-21 07:00:33 UTC
Earliest failing kernel version: 2.6.24
Distribution: Debian
Hardware Environment: JMicron AHCI controller , ICH8 chipset
Software Environment: gcc 4.2.2, binutils 2.18, glibc 2.7
Problem Description:

Hi, I don't know if this will help, but I have a similar problem when I connect a WD 1 TB My Book to the eSATA port on my Asus G1S. The eSATA port is connected to the JMicron chip. Even if I connect the hard drive during boot, I have the same problem.

lspci: 

00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
07:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 02)

dmesg when I connect the My Book:

ata4: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xa frozen
ata4: irq_stat 0x00000040, connection status changed
ata4: SError: { DevExch }
ata4: hard resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: ATA-6: WD My Book, 01.01B01, max UDMA/133
ata4.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 1)
ata4.00: configured for UDMA/133
ata4: EH complete
scsi 3:0:0:0: Direct-Access     ATA      WD My Book       01.0 PQ: 0 ANSI: 5
sd 3:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 3:0:0:0: [sdb] Write Protect is off
sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2
sd 3:0:0:0: [sdb] Attached SCSI disk
sd 3:0:0:0: Attached scsi generic sg1 type 0
ata4.00: exception Emask 0x10 SAct 0x1 SErr 0x780100 action 0x2
ata4.00: irq_stat 0x08000000
ata4: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
ata4.00: cmd 60/b8:00:30:00:00/00:00:00:00:00/40 tag 0 ncq 94208 in
         res 40/00:04:30:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
ata4.00: status: { DRDY }
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4: failed to recover some devices, retrying in 5 secs
ata4: hard resetting link
ata4: SATA link down (SStatus 0 SControl 300)
ata4.00: disabled
ata4: exception Emask 0x1 SAct 0x0 SErr 0x0 action 0x0 t4
ata4: irq_stat 0x48000000
sd 3:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 3:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
        00 00 00 30 
sd 3:0:0:0: [sdb] Add. Sense: No additional sense information
end_request: I/O error, dev sdb, sector 48
Buffer I/O error on device sdb, logical block 6
Buffer I/O error on device sdb, logical block 7
Buffer I/O error on device sdb, logical block 8
Buffer I/O error on device sdb, logical block 9
Buffer I/O error on device sdb, logical block 10
Buffer I/O error on device sdb, logical block 11
Buffer I/O error on device sdb, logical block 12
Buffer I/O error on device sdb, logical block 13
Buffer I/O error on device sdb, logical block 14
Buffer I/O error on device sdb, logical block 15
ata4: EH complete
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
sd 3:0:0:0: rejecting I/O to offline device
ata4.00: detaching (SCSI 3:0:0:0)
sd 3:0:0:0: [sdb] Synchronizing SCSI cache
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
sd 3:0:0:0: [sdb] Stopping disk
sd 3:0:0:0: [sdb] START_STOP FAILED
sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
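
The "SErr" masks and the "SError: { ... }" flag lists in these logs carry the same information. The Python sketch below maps one to the other. The bit table is limited to the flags that appear in this report and was checked against the mask/name pairs printed in the logs; the register defines more bits, which the sketch reports as unknown. The names `SERR_FLAGS` and `decode_serror` are made up for illustration.

```python
# Bit position -> short name, restricted to flags seen in this report's logs.
SERR_FLAGS = [
    (8, "UnrecovData"),
    (16, "PHYRdyChg"),
    (18, "CommWake"),
    (19, "10B8B"),
    (20, "Dispar"),
    (21, "BadCRC"),
    (22, "Handshk"),
    (23, "LinkSeq"),
    (24, "TrStaTrns"),
    (26, "DevExch"),
]


def decode_serror(serr):
    """Return the flag names set in an SErr mask, lowest bit first."""
    names = []
    known = 0
    for bit, name in SERR_FLAGS:
        if serr & (1 << bit):
            names.append(name)
            known |= 1 << bit
    leftover = serr & ~known
    if leftover:
        # Bits outside the table above are reported rather than guessed at.
        names.append(f"unknown:0x{leftover:x}")
    return names


# Masks taken from the logs in this report.
for mask in (0x990000, 0x4050000, 0x1810000, 0x780100):
    print(f"SErr 0x{mask:x} ->", " ".join(decode_serror(mask)))
```

The hotplug events in the earlier comments only set link-state flags (PHYRdyChg, CommWake, DevExch), whereas the failure in this comment (0x780100) sets data-integrity flags such as BadCRC and Handshk during an NCQ read.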
Comment 11 davidd 2008-02-21 07:07:03 UTC
I forgot to mention that I installed Vista to test whether there was a problem with the drive, but it worked just fine.
Comment 12 Tejun Heo 2008-02-21 18:15:27 UTC
Created attachment 14944 [details]
limit-link-to-1.5Gbps.patch

Your problem is probably different.  Does this patch help?
Comment 13 devsk 2008-02-21 21:26:01 UTC
I tried another thing when the external disk was on sata_nv port:

---------------------------------
# scsiadd -r 3 0 0 0

Feb 21 20:57:57 localhost [15852.982493] sd 3:0:0:0: [sdi] Synchronizing SCSI cache
Feb 21 20:57:57 localhost [15852.982904] sd 3:0:0:0: [sdi] Stopping disk
Feb 21 20:57:57 localhost [15852.983738] ata4.00: disabled
Feb 21 20:58:25 localhost [15880.735688] ata4: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xa frozen
Feb 21 20:58:25 localhost [15880.735694] ata4: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
Feb 21 20:58:25 localhost [15880.735701] ata4: hard resetting link
Feb 21 20:58:26 localhost [15881.459164] ata4: SATA link down (SStatus 0 SControl 300)
Feb 21 20:58:26 localhost [15881.459178] ata4: EH complete

--------------------------

No more "rejecting IOs...". I wonder why it doesn't do the same thing when I just pull the cable?

Was that patch supposed to be only for David? Or do you think my troubles might have something to do with link speed as well? My external disk works at 1.5Gbps (jumper is set to 1.5 and the external case only supports 1.5) and the JMB controller is 3Gbps.
Comment 14 davidd 2008-02-22 14:37:25 UTC
Thanks for the patch, I'll test it as soon as I have my eSATA cable back. Would this mean that, if the patch works, it has the same effect as setting the jumper on the HD to 1.5G?

My apologies if I've made this report less clear.
Comment 15 davidd 2008-02-26 01:49:28 UTC
I've tested the patch. Everything works just fine now. But in theory the hard drive supports 3Gb/s. Is there a particular reason 3Gb/s doesn't work?

Anyhow, many thanks
Comment 16 Tejun Heo 2008-02-28 01:43:35 UTC
Davidd, I don't know.  It's a hardware issue and many external SATA devices seem to have PHY related issues.  It's also weird that the device couldn't recover from such failures.  :-(  Patch to implement libata.force module parameter is pending, so you'll be able to use "libata.force=4:1.5Gbps" instead of the patch.

Devsk, the message repetition is because somebody is holding the SCSI device and issuing commands every 65 seconds.  Because it's holding the device node, the zombie device can't die and just keeps rejecting IO requests being issued to it.  fuser on the device node should tell you who's deep into voodoo.
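
If fuser isn't at hand, the same question (who is holding the device node?) can be approximated by walking /proc. This is a rough sketch, assuming a Linux /proc layout; the function name is made up, it only sees processes whose fd directories the caller may read (so run it as root), and it is not how fuser itself is implemented.

```python
import os


def holders(path, proc="/proc"):
    """Return sorted PIDs that have path open, judged by /proc/<pid>/fd links."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join(proc, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            # Process exited or we lack permission; skip it.
            continue
        for fd in fds:
            try:
                if os.path.realpath(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue
    return sorted(pids)


if __name__ == "__main__":
    print(holders("/dev/sdi"))
```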
Comment 17 devsk 2008-02-28 20:22:05 UTC
(In reply to comment #16)
> Devsk, the message repetition is because somebody is holding the SCSI device
> and issuing commands every 65 seconds.  Because it's holding the device node,
> the zombie device can't die and just keeps rejecting IO requests being issued
> to it.  fuser on the device node should tell you who's deep into voodoo.
> 

So, it is hddtemp answering gkrellm's request for the temperature of the disk (every minute), with hddtemp making SCSI ioctl calls. The question is: why does it go away if I do scsiadd -r? Does this mean that some part of the SCSI subsystem is not really aware that the disk is gone? Does any of this make sense?
Comment 18 Tejun Heo 2008-02-29 04:20:48 UTC
Ah... okay.  The device can be either disabled or detached.  The device is disabled if the device isn't responding but it can't be positively determined that the device is actually gone (PHY offline).  In this case, the device node hangs around in killed state.  'scsiadd -r' tells the kernel that the device should be removed and thus the device node goes away.
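
For anyone without scsiadd installed, a similar removal request can be made through sysfs. The sketch below assumes the per-device "delete" attribute under /sys/class/scsi_device/H:C:T:L/device/; whether scsiadd uses this or another interface is not something stated in this thread. The helper names are hypothetical, and writing to the file needs root.

```python
def scsi_delete_path(host, channel, target, lun, sysfs="/sys"):
    """Build the assumed sysfs path of the delete attribute for H:C:T:L."""
    return (f"{sysfs}/class/scsi_device/"
            f"{host}:{channel}:{target}:{lun}/device/delete")


def remove_scsi_device(host, channel, target, lun, sysfs="/sys"):
    """Ask the kernel to remove the SCSI device by writing 1 to delete."""
    with open(scsi_delete_path(host, channel, target, lun, sysfs), "w") as f:
        f.write("1\n")


if __name__ == "__main__":
    # Same address as 'scsiadd -r 3 0 0 0' earlier in this report.
    print(scsi_delete_path(3, 0, 0, 0))
```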

Back to the original problem, have you tried the force 1.5Gbps patch?
Comment 19 devsk 2008-03-05 20:18:04 UTC
ok, forcing to 1.5Gbps works. But I like the other configuration with the removable disk on sata_nv...;-)
Comment 20 Tejun Heo 2008-03-05 21:32:44 UTC
Hmmm... The behavior is a bit disturbing in that libata EH can't recover the device automatically but I don't see anything which the driver can do automatically to work around the problem other than forcing 1.5Gbps.

2.6.25 will contain better EH logic which behaves a bit more aggressively, so it will have a better chance of recovering from such failures.  Closing as DOCUMENTED for now.  Thanks.
Comment 21 devsk 2008-03-12 22:01:06 UTC
Reopening for a twist. I had this problem (http://bugzilla.kernel.org/show_bug.cgi?id=9659) where kernels later than 2.6.21 couldn't resume from suspend-to-ram unless I passed sata_nv.adma=0. I moved to a libata-only configuration and I could resume from suspend-to-ram without sata_nv.adma=0. Now that I have put the external drive on sata_nv, I can't hot plug the drive without sata_nv.adma=0...:-(

As soon as I turn the drive on, it just locks up. If I pass sata_nv.adma=0 and turn it on, it works fine.

Surprisingly, if I boot the same kernel but x86_64 on the same hardware, sata_nv.adma=0 is not required for hotplug to work. It just works. But I can't use x86_64 for other reasons.
Comment 22 devsk 2008-03-12 22:01:59 UTC
> it just locks up

I meant the whole machine. Only reset can be done. Just like the resume problem.
Comment 23 Tejun Heo 2008-03-12 22:07:24 UTC
This bug report sort of drifted from ahci to sata_nv ADMA.  Maybe it's best to file a separate bug report for this.  Robert, can you please take a look at this?

Thanks.
Comment 24 Robert Hancock 2008-03-12 22:13:57 UTC
We had another report of hotplug lockups on 32-bit only:

http://bugzilla.kernel.org/show_bug.cgi?id=8421

which seems a bit bizarre; I can't see how a 64-bit vs. 32-bit kernel should affect anything. It also seems to be somehow dependent on motherboard revision, according to some reports anyway. Feel free to add comments to that bug report. It hasn't had much activity recently.
Comment 25 devsk 2008-03-12 23:41:26 UTC
I know Ingo found a bug on x86 recently which was present for a long time in the kernel and had something to do with CONFIG_PCI_BIOS.

http://lkml.org/lkml/2008/3/10/171

Not very surprisingly, the only major differences in the .config of the x86_64 and x86 kernels I use are in that area:

x86:

# grep CONFIG_PCI /usr/src/linux/.config
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set

x86_64:

# grep CONFIG_PCI /mnt/x64-root/usr/src/linux/.config
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set

I have no idea how these are related to each other, and with sata_nv.adma=0.
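
To spell out the difference between the two listings, here is a small sketch that compares the enabled options. The two strings are copied from the grep output above, with the "is not set" lines left out since they are not enabled in either case. The function name is just for illustration.

```python
X86 = """
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
"""

X86_64 = """
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
"""


def enabled_options(config_text):
    """Return the set of CONFIG_ names that are assigned a value."""
    opts = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            opts.add(line.partition("=")[0])
    return opts


if __name__ == "__main__":
    print(sorted(enabled_options(X86) - enabled_options(X86_64)))
    print(sorted(enabled_options(X86_64) - enabled_options(X86)))
```

Only CONFIG_PCI_GOANY and CONFIG_PCI_BIOS are enabled on x86 and absent on x86_64; nothing is enabled on x86_64 that x86 lacks.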

So, facts:

1. x86_64 works, x86 doesn't.
2. x86 works if sata_nv.adma=0 is passed.
3. x86 detects the drive fine if booted with it on, but hard locks if turned on after bootup.
4. CONFIG_PCI options are not same in x86_64 and x86 by default. GOANY is not even there in menuconfig.

Anybody know these config options well enough to comment?
Comment 26 David Balažic 2008-08-12 13:10:01 UTC
I have a similar problem as davidd in comment 10.

I connect a WD MyBook 500GB Studio Edition over eSATA to a JMicron JMB363 controller (before power on).
Then in linux right after I access the drive (by running hdparm -tT /dev/sdd; sdd is the MyBook), it locks up (see dmesg).

This is a 64 bit kernel from RIPLinux 6.1 (2.6.26).

/proc/version:
Linux version 2.6.26.64bit (root@darkstar) (gcc version 4.2.3) #4 Mon Jul 14 05:39:37 UTC 2008

uname -a:
Linux RIPLinuX 2.6.26.64bit #4 Mon Jul 14 05:39:37 UTC 2008 x86_64 GNU/Linux

I attach the larger files.
Comment 27 David Balažic 2008-08-12 13:11:06 UTC
Created attachment 17190 [details]
dmesg, error at end
Comment 28 David Balažic 2008-08-12 13:11:36 UTC
Created attachment 17191 [details]
lspci output
Comment 29 David Balažic 2008-08-12 13:11:59 UTC
Created attachment 17192 [details]
lspci- vvn output
Comment 30 David Balažic 2008-08-12 13:17:52 UTC
Sorry, here is some more detail:

Latest working kernel version: ? (probably none)
Earliest failing kernel version: 2.6.26
Distribution: RIPLinux 6.1
Hardware Environment: Asus P5K-E WiFi (P35+ICH9R+JMB363)
Software Environment: ?
Problem Description: ? see above

Also:
The BIOS mode setting for JMB363 is "AHCI".

The disk works fine in Windows (but has a lot of problems there too; I only managed to get it to work properly with the very latest driver from JMicron, v 1.17.39).

After the problem happens in linux, the drive does not work in Windows either.
Not even after several reboots and power cycles.
Last time this happened, it worked again next day, so I have no idea what makes it work ok again...
Comment 31 Tejun Heo 2008-08-25 01:48:34 UTC
I've been talking with JMicron and they suggest that the problem is caused by a misbehaving SATA bridge chip inside the WD device, and they haven't found a proper solution on Windows either.  I'm asking WD for more information but it seems we're out of luck and might have to just note the case in the linux-ata wiki and give up on the combination.  I'll update when I know more.  Thanks.
Comment 32 devsk 2008-08-25 08:51:53 UTC
Only in my case, it was Seagate 750GB drive.
Comment 33 David Balažic 2008-08-26 01:04:38 UTC
For info, I received this from WD support about my JMB363+WD_MyBook problem:

This sounds like an incompatibility perhaps with the ESATA controller. Please see the link below for tested ESATA controllers.

http://www.wdc.com/en/products/resources/esataupgrade.asp
Comment 34 Tejun Heo 2008-08-29 03:40:57 UTC
Yeah, there are two bugs being tracked here: the nv one and the jmb + wdc one.  For the jmb + wd one, jmb acks that there's a hardware compatibility problem with the wd external drive and I'm talking with both jmb and wd to find out what's going on.  For the nv one, I have no idea what's going on.  :-(
Comment 35 Petr Nejedly 2008-09-26 10:51:35 UTC
For the record, I see the same behavior as initially reported: works when booted with everything on, errors when trying to hotplug, even when hotplugging (expresscard) the esata card with the esata cable already connected.
Hardware: MacBookPro, ST-LAB C-230 (JMB363 based) express card, Seagate FreeAgent Pro.

Software:
* Gentoo, kernel 2.6.22-gentoo-r9 with mactel patches
* Gentoo, kernel 2.6.27-rc6-wl

Log:
ata5: exception Emask 0x10 SAct 0x0 SErr 0x49d0000 action 0x2 frozen
ata5: hard resetting port
ata5: port is slow to respond, please be patient (Status 0xff)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting port
ata5: port is slow to respond, please be patient (Status 0xff)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting port
ata5: port is slow to respond, please be patient (Status 0xff)
ata5: COMRESET failed (errno=-16)
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting port
ata5: COMRESET failed (errno=-16)
ata5: reset failed, giving up
ata5: EH complete
Comment 36 Petr Nejedly 2008-09-26 15:31:49 UTC
Please disregard my last comment. It turned out the problem was not with JMB363, but with a construction flaw of the Seagate FreeAgent Pro, which doesn't allow proper fitting of the eSATA cable. The disassembled FAP with the cable properly fitted works perfectly, including hotplug.
Comment 37 David Balažic 2008-12-08 00:58:02 UTC
Hi, Tejun.
Any news ? Should the WD/JMB combination be added to the wiki ?
Comment 38 Tejun Heo 2009-01-28 18:28:33 UTC
Hello, sorry about the long delay.

I've tested four WD My Book drives (studio, studio ii, home and dvr expander) with jmb360, ich7 and sata_sil24.  The drives seem to have certain quirks.  For example, the studio and home ones shut down their SATA interface after certain transmission errors, leaving no way to recover other than power cycling the drive.  But, in general, it seems the problem isn't specific to any controller or drive combination but is rather heavily affected by cabling.  With short to medium direct cables, all four drives worked fine with all controllers.  With a SATA -> eSATA converter in the middle and a longer cable, detection and operation afterwards become quite flaky, and different controller and drive combinations give different results, which BTW is quite consistent with davidd and devsk's reports that forcing 1.5Gbps resolves the problem.

Here is what libata can do...

1. Slow down to 1.5Gbps if hotplug events occur repeatedly.  libata does slow down after a certain number of transmission errors once a device has been probed, but during probing it doesn't.  This prevents successful probing in certain cabling conditions.

2. Force 1.5Gbps for drives which are known to react badly to transmission errors.  As the hardware seems to shut down completely after transmission errors, the only thing the driver can do is to avoid them.  Lowering link speed helps quite a bit and there is virtually no downside.
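
The first option above can be pictured as a retry loop that drops the link speed after repeated failures during probing. This is only an illustrative sketch and not libata's EH code: the function name, the try_reset callback and the count of three tries per speed are all assumptions for the example.

```python
def probe_with_fallback(try_reset, speeds=("3.0Gbps", "1.5Gbps"),
                        tries_per_speed=3):
    """Try to bring the link up, lowering the speed after repeated failures.

    try_reset(speed) is a caller-supplied (hypothetical) function that
    resets the link at the given speed and returns True on success.
    Returns the speed that worked, or None if every attempt failed.
    """
    for speed in speeds:
        for _ in range(tries_per_speed):
            if try_reset(speed):
                return speed
    return None


if __name__ == "__main__":
    # Simulated flaky link that only comes up at the lower speed.
    print(probe_with_fallback(lambda speed: speed == "1.5Gbps"))
```

The second option corresponds to starting the loop with only "1.5Gbps" in the speed list for drives known to misbehave.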

David, can you please try libata.force=1.5Gbps and a shorter cable and see whether that makes any difference?  The difference between Windows and Linux could be that Windows defaults to 1.5Gbps, thus avoiding such transmission failures from the beginning.

Thanks.
Comment 39 Tejun Heo 2009-01-29 03:34:09 UTC
Patchset posted.

http://thread.gmane.org/gmane.linux.ide/37942
Comment 40 David Balažic 2009-01-29 04:16:41 UTC
(to not spam the bug I'm sending directly to you)

Would it be possible to always start in 1.5Gbps mode
and then switch to 3 in certain conditions ?
(whitelists, blacklists, crystal ball....)

Just an idea...

Regards,
David
 
