I have two WD WD20EARS Green Adv. Fmt. drives Each of them has it's own NexStar CX enclosure (USB/eSATA). Each drive & enclosure works ok on it own. They can be plugged, are recognized, are mountable, are u-mountable, no problems of any kind. Problem: I cannot hot-swap any of the disks with the other one. If i power off an enclosure and replace it with another one, i get: May 8 17:12:42 ulquiorra kernel: [ 9913.167921] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen May 8 17:12:42 ulquiorra kernel: [ 9913.167932] ata5: irq_stat 0x00000040, connection status changed May 8 17:12:42 ulquiorra kernel: [ 9913.167942] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } May 8 17:12:42 ulquiorra kernel: [ 9913.167958] ata5: limiting SATA link speed to 1.5 Gbps May 8 17:12:42 ulquiorra kernel: [ 9913.167966] ata5: hard resetting link May 8 17:12:48 ulquiorra kernel: [ 9918.940037] ata5: link is slow to respond, please be patient (ready=0) May 8 17:12:52 ulquiorra kernel: [ 9923.173258] ata5: COMRESET failed (errno=-16) May 8 17:12:52 ulquiorra kernel: [ 9923.173271] ata5: hard resetting link May 8 17:12:58 ulquiorra kernel: [ 9928.945419] ata5: link is slow to respond, please be patient (ready=0) May 8 17:13:02 ulquiorra kernel: [ 9933.183068] ata5: COMRESET failed (errno=-16) May 8 17:13:02 ulquiorra kernel: [ 9933.183081] ata5: hard resetting link May 8 17:13:08 ulquiorra kernel: [ 9938.909277] ata5: link is slow to respond, please be patient (ready=0) May 8 17:13:37 ulquiorra kernel: [ 9968.201074] ata5: COMRESET failed (errno=-16) May 8 17:13:37 ulquiorra kernel: [ 9968.201089] ata5: hard resetting link May 8 17:13:38 ulquiorra kernel: [ 9968.924046] ata5: SATA link down (SStatus 0 SControl 310) May 8 17:13:38 ulquiorra kernel: [ 9968.924069] ata5: EH complete Workaround: * power on, mount, use, umount the first enclosure. * power off the first enclosure (for example port ATA5) * issue '# echo "- - - " > /sys/class/scsi_host/host4/scan' * power on the second enclosure (also port ATA5), it will be detected correclty. NOTE: The first time the port allways detects any disk correctly. If the port is not rescanned manually without drives attached, it is unusable. ---- Sata Controller: * nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2) OS: Fedora 15 beta kernel: 2.6.38.5-24.fc15.x86_64 HD1: WDC WD20EARS-00MVWB0, 50.0AB50 HD2: WDC WD20EARS-00MVWB0, 51.0AB51
Created attachment 56982 [details] ata-reset-fix.patch Does this patch make any difference? Also, can you please post full kernel log including boot and detection failure messages preferably with printk timestamp turned on? Thanks.
I experienced the issue in * F15 (2.6.38.x) * F13 (2.6.34.x) * Slackware64 13.1 (different messages but same results in slack). So i will first update Slack64 to 2.6.38.5 vanilla,then after reproducing the problem i will try to patch it and see. If that fails i will try with fedora.
Probably different distros wouldn't affect the problem too much. I would suggest just sticking with the same distro but rolling your own vanilla kernel for testing. Thanks.
For me, compiling in Slackware is more easy and faster (and maybe fun) Well, i tried this: * downloaded kernel 2.6.38.5 from kernel.org. And setup two different directories for each kernel (just in case). */usr/src/2.6.38.5-stock (using 2.6.38.5-stock.conf) */usr/src/2.6.38.5-patch (using 2.6.38.5-patch.conf) I will upload these after posting this. These are self-compiled kernels. I was able to reproduce the problem with the 'stock' kernel, so it is not fedora specific, so i will try the patch. Sadly the proposed patch in comment#1 did not change things. --- OS: Slackware64 13.1 Kernel: 2.6.38.5stock-0 (no patch applied) ------------------------------------------------------- Power-ON ENC#2: May 15 17:09:56 shadow kernel: ata5: hard resetting link May 15 17:09:57 shadow kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 15 17:09:57 shadow kernel: ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133 May 15 17:09:57 shadow kernel: ata5.00: 3907029168 sectors, multi 0: LBA48 May 15 17:09:57 shadow kernel: ata5.00: configured for UDMA/133 May 15 17:09:57 shadow kernel: ata5: EH complete May 15 17:09:57 shadow kernel: scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5 May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) May 15 17:09:57 shadow kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0 May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] Write Protect is off May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 15 17:09:57 shadow kernel: sdc: sdc1 sdc2 sdc3 sdc4 sdc5 sdc6 sdc7 May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] Attached SCSI disk Power-OFF ENC#2: May 15 17:10:38 shadow kernel: ata5: hard resetting link May 15 17:10:39 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300) May 15 17:10:44 shadow kernel: ata5: hard resetting link May 15 17:10:44 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300) May 15 17:10:49 shadow kernel: ata5: hard resetting link May 15 17:10:49 shadow kernel: ata5: SATA link down (SStatus 0 SControl 310) May 15 17:10:49 shadow kernel: ata5: EH complete May 15 17:10:49 shadow kernel: ata5.00: detaching (SCSI 4:0:0:0) May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc] Synchronizing SCSI cache May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc] Stopping disk May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 Relpace ENC#2 with ENC#1: Power-ON ENC#1: May 15 17:12:24 shadow kernel: ata5: hard resetting link May 15 17:12:44 shadow last message repeated 2 times May 15 17:13:19 shadow kernel: ata5: hard resetting link May 15 17:13:24 shadow kernel: ata5: EH complete Power-OFF ENC#1: (After led flashing for more than a minute): [no messages] REBOOT. OS: Slackware64 13.1 Kernel: 2.6.38.5patch-esata-1 (patch applied) ------------------------------------------------------- Power-ON ENC#2: May 15 17:21:44 shadow kernel: ata5: hard resetting link May 15 17:21:45 shadow kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 15 17:21:45 shadow kernel: ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133 May 15 17:21:45 shadow kernel: ata5.00: 3907029168 sectors, multi 0: LBA48 May 15 17:21:45 shadow kernel: ata5.00: configured for UDMA/133 May 15 17:21:45 shadow kernel: ata5: EH complete May 15 17:21:45 shadow kernel: scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5 May 15 17:21:45 shadow kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0 May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] Write Protect is off May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 15 17:21:45 shadow kernel: sdc: sdc1 sdc2 sdc3 sdc4 sdc5 sdc6 sdc7 May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] Attached SCSI disk Power-OFF ENC#2: May 15 17:22:23 shadow kernel: ata5: hard resetting link May 15 17:22:24 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300) May 15 17:22:29 shadow kernel: ata5: hard resetting link May 15 17:22:29 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300) May 15 17:22:34 shadow kernel: ata5: hard resetting link May 15 17:22:34 shadow kernel: ata5: SATA link down (SStatus 0 SControl 310) May 15 17:22:34 shadow kernel: ata5: EH complete May 15 17:22:34 shadow kernel: ata5.00: detaching (SCSI 4:0:0:0) May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc] Synchronizing SCSI cache May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc] Stopping disk May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 Relpace ENC#2 with ENC#1: Power-ON ENC#1: May 15 17:23:50 shadow kernel: ata5: hard resetting link May 15 17:24:10 shadow last message repeated 2 times May 15 17:24:45 shadow kernel: ata5: hard resetting link May 15 17:24:50 shadow kernel: ata5: EH complete Power-OFF ENC#1: (After led flashing for more than a minute): [no messages]
Created attachment 57932 [details] stock kernel, unpatched & self-compiled
Created attachment 57942 [details] patched kernel, self-compiled
Sorry about the delay. Can you please attach the output of "dmesg" after triggering the error? It seems all KERN_DEBUG messages are missing. Thanks.
Created attachment 58642 [details] sata test (stock) I repeated the same previous test, for stock, then for patch: dmesg_pre (after boot) dmesg_enc2 (power-on enclosure #2) dmesg_enc2_off (power-off enclosure #2) dmesg_enc1_off (power- on enclosure #1, wait and power-off enclosure #1) also included messages. (there are 8 files per archive)
Created attachment 58652 [details] sata test (patch) These are the patched dmesg / messages. Both attachments are *.tar.gz with 8 logfiles each.
So, the problem basically comes down to the following reset failure. ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen ata5: irq_stat 0x00000040, connection status changed ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } ata5: hard resetting link ata5: link is slow to respond, please be patient (ready=0) ata5: COMRESET failed (errno=-16) ata5: hard resetting link ata5: link is slow to respond, please be patient (ready=0) ata5: COMRESET failed (errno=-16) ata5: hard resetting link ata5: link is slow to respond, please be patient (ready=0) ata5: COMRESET failed (errno=-16) ata5: limiting SATA link speed to 1.5 Gbps ata5: hard resetting link ata5: COMRESET failed (errno=-16) ata5: reset failed, giving up ata5: EH complete After which, libata turns off PHY event detection on the port until manual rescan is explicitly issued (we might want to change that behavior but that's a different issue). Can you please clarify the followings? 1. Right after EH complete is printed, if you issue manual rescan while leaving the device attached, does it get detected properly? 2. What happens if you power off the enclosure, wait for it to be detached and then power it up again? Does it get detected properly? ie. Is exchanging to the other enclosure necessary to trigger the reset failure? Thanks.
> 2. What happens if you power off the enclosure, wait for it to be detached > and > then power it up again? Does it get detected properly? That was a surprise, it does NOT get detected properly. The oldest enclosure is used on a wdtvl, and in the past certainly was able to do that. (I did assume that powering on an enclosure, then off then on again worked but it seems to not to work any more). > ie. Is exchanging to the other enclosure necessary to trigger the reset > failure? That is what i was thinking until today. In the past it was necessary, now it seems not. I will try with older kernels. > 1. Right after EH complete is printed, if you issue manual rescan while > leaving > the device attached, does it get detected properly? no, it does not. i already tested it but i repeated the test with the patched kernel, just in case. the rescan must be executed without drives.
Created attachment 58802 [details] new test with parameter libata.force=5:dump_id Tried with slackwares's 2.6.33.4 and Fedora 13's 2.6.34.9 and: * encA -> on > off > on >>>>>>>>>> fails. * encA -> on > off > encB > on >>> fails. * manual port rescan is always needed when changing drives or power cycle the current one. Repeated tests with patched stock 2.6.38.5 and the following kernel parameter: libata.force=5:dump_id This parameter is not documented or absent in 2.6.33/34 and present in 2.6.37/38, it prints the 'identify data'. Output may be useful. attached logfiles, order: dmesg-t8.pre.out dmesg-t8-enc2_on.out dmesg-t8-enc2_off.out dmesg-t8-enc2_on_off.out dmesg-t8-enc2_rescan_empty.out dmesg-t8-enc2_on_after_rescan.out dmesg-t8-enc1_on_after_rescan.out dmesg-t8-enc1_on_after_rescan_II.out dmesg-t8-enc1_off_rescan.o
Created attachment 59082 [details] thaw-reset-failed-ports.patch Can you please try this patch? This might not remove the problem itself but you should be able to trigger retry by simply disconnecting and replugging the enclosure. Please let me know how it works. Also, can you please enable CONFIG_PRINTK_TIME? Thanks.
Created attachment 59172 [details] 2.6.38.5 with thaw-reset-failed-ports.patch > This might not remove the problem itself but > you should be able to trigger retry by simply disconnecting and replugging > the > enclosure. It seems you are right, with the last patch, now i get: ON > OFF > ON (FAIL) > OFF > ON (GOOD) Previously, if you power of the enclosure in failed state (on [fail] state) there were no messages, now i get: [ 381.605365] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen [ 381.605367] ata5: irq_stat 0x00400000, PHY RDY changed [ 381.605370] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns } [ 381.605375] ata5: hard resetting link [ 382.328032] ata5: SATA link down (SStatus 10 SControl 300) [ 382.328041] ata5: EH complete The first, third, seventh... time it will work. The second, fourth, sixth... time it wont. --- * As for the .config i enabled printk timing and increased dmesg log size a bit, just in case. * in the attached messages file there is a syslog restart, because i changed it to put debug messages in the messages file. ---- do i try the latest 2.6.39 ?
I don't think 2.6.39 would change anything and with timestamp I think it should be enough to attach single dmesg output after everything took place. It's quite interesting and I'm wondering whether this is caused by the controller or enclosure. Can you please try to take the hard drive out of the enclosure and connect directly to the SATA port and try hotplugging? Plugging and unplugging both power and SATA signal connectors while the system is hot should be safe - the internal ones are designed to be hotplug safe too. Can you still see the interleaved success/fail pattern that way? Thanks.
> It's quite interesting and I'm wondering whether this is caused by the > controller or enclosure. I already had some problems with it in the past: https://bugzilla.redhat.com --> Bug 611350 - ATA frozen , random pauses, every disk (workaround: pcie_aspm=off) > Can you please try to take the hard drive out of the > enclosure and connect directly to the SATA port and try hotplugging? I will try it, but it could take a couple of days since i have to open the case... ---- I did a quick test with one enclosure. I replaced one WD20EARS with a WD5000AAKS and later with a WD1200JS. Only the WD20EARS 'triggers' the issue. And the 'issue' seems to be related to the WD20EARS being powered down, whatever that comes next to this event, hits the invisible wall. * WD1200JS (It is a SATA 1.5Gbps NON-NCQ 120GB drive). * WD5000AAKS (It is a SATA 3Gbps NCQ 500gb drive). input: CHESEN USB Keyboard as /devices/pci0000:00/0000:00:04.0/usb4/4-3/4-3:1.1/input/input3 generic-usb 0003:0A81:0101.0002: input,hidraw1: USB HID v1.10 Device [CHESEN USB Keyboard] on usb-0000:00:04.0-3/input1 input: Genius Optical Mouse as /devices/pci0000:00/0000:00:02.0/usb3/3-3/3-3:1.0/input/input4 generic-usb 0003:0458:003A.0003: input,hidraw2: USB HID v1.10 Mouse [Genius Optical Mouse] on usb-0000:00:02.0-3/input0 usbcore: registered new interface driver usbhid usbhid: USB HID core driver forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64. ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22 forcedeth 0000:00:0a.0: PCI INT A -> Link[LMAC] -> GSI 22 (level, low) -> IRQ 22 forcedeth 0000:00:0a.0: setting latency timer to 64 wmi: Mapper loaded forcedeth 0000:00:0a.0: ifname eth1, PHY OUI 0x732 @ 3, addr 00:24:8c:38:2b:0c forcedeth 0000:00:0a.0: highdma csum pwrctl mgmt gbit lnktim msi desc-v3 udev: renamed network interface eth1 to eth2 ACPI: PCI Interrupt Link [LN0B] enabled at IRQ 19 HDA Intel 0000:02:00.1: PCI INT B -> Link[LN0B] -> GSI 19 (level, low) -> IRQ 19 HDA Intel 0000:02:00.1: irq 41 for MSI/MSI-X HDA Intel 0000:02:00.1: setting latency timer to 64 ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 18 CA0106 0000:01:09.0: PCI INT A -> Link[LNKB] -> GSI 18 (level, low) -> IRQ 18 snd-ca0106: Model 1006 Rev 00000000 Serial 10061102 fuse init (API version 7.16) ppdev: user-space parallel port driver lp: driver loaded but no devices found Linux agpgart interface v0.103 NET: Registered protocol family 10 ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen ata5: irq_stat 0x00000040, connection status changed ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } ata5: hard resetting link ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata5.00: ATA-8: WDC WD5000AAKS-00A7B2, 01.03B01, max UDMA/133 ata5.00: 976773168 sectors, multi 0: LBA48 ata5.00: configured for UDMA/133 ata5: EH complete scsi 4:0:0:0: Direct-Access ATA WDC WD5000AAKS-0 01.0 PQ: 0 ANSI: 5 sd 4:0:0:0: Attached scsi generic sg3 type 0 sd 4:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/465 GiB) sd 4:0:0:0: [sdc] Write Protect is off sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 > sd 4:0:0:0: [sdc] Attached SCSI disk ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen ata5: irq_stat 0x00400000, PHY RDY changed ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns } ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 300) ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 300) ata5: limiting SATA link speed to 1.5 Gbps ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5.00: disabled ata5: limiting SATA link speed to 1.5 Gbps ata5: EH complete ata5.00: detaching (SCSI 4:0:0:0) sd 4:0:0:0: [sdc] Synchronizing SCSI cache sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 sd 4:0:0:0: [sdc] Stopping disk sd 4:0:0:0: [sdc] START_STOP FAILED sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen ata5: irq_stat 0x00000040, connection status changed ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } ata5: limiting SATA link speed to 1.5 Gbps ata5: hard resetting link ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310) ata5.00: ATA-8: WDC WD5000AAKS-00A7B2, 01.03B01, max UDMA/133 ata5.00: 976773168 sectors, multi 0: LBA48 ata5.00: configured for UDMA/133 ata5: EH complete scsi 4:0:0:0: Direct-Access ATA WDC WD5000AAKS-0 01.0 PQ: 0 ANSI: 5 sd 4:0:0:0: Attached scsi generic sg3 type 0 sd 4:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/465 GiB) sd 4:0:0:0: [sdc] Write Protect is off sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 > sd 4:0:0:0: [sdc] Attached SCSI disk ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen ata5: irq_stat 0x00400000, PHY RDY changed ata5: SError: { PHYRdyChg LinkSeq TrStaTrns } ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5.00: disabled ata5: limiting SATA link speed to 1.5 Gbps ata5: EH complete ata5.00: detaching (SCSI 4:0:0:0) sd 4:0:0:0: [sdc] Synchronizing SCSI cache sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 sd 4:0:0:0: [sdc] Stopping disk sd 4:0:0:0: [sdc] START_STOP FAILED sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen ata5: irq_stat 0x00000040, connection status changed ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } ata5: hard resetting link ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133 ata5.00: 234441648 sectors, multi 0: LBA48 ata5.00: configured for UDMA/133 ata5: EH complete scsi 4:0:0:0: Direct-Access ATA WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5 sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB) sd 4:0:0:0: [sdc] Write Protect is off sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 4:0:0:0: Attached scsi generic sg3 type 0 sdc: sdc1 sd 4:0:0:0: [sdc] Attached SCSI disk ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen ata5: irq_stat 0x00400000, PHY RDY changed ata5: SError: { PHYRdyChg LinkSeq TrStaTrns } ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 300) ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 300) ata5: limiting SATA link speed to 1.5 Gbps ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5.00: disabled ata5: limiting SATA link speed to 1.5 Gbps ata5: EH complete ata5.00: detaching (SCSI 4:0:0:0) sd 4:0:0:0: [sdc] Synchronizing SCSI cache sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 sd 4:0:0:0: [sdc] Stopping disk sd 4:0:0:0: [sdc] START_STOP FAILED sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen ata5: irq_stat 0x00000040, connection status changed ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } ata5: limiting SATA link speed to 1.5 Gbps ata5: hard resetting link ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310) ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133 ata5.00: 234441648 sectors, multi 0: LBA48 ata5.00: configured for UDMA/133 ata5: EH complete scsi 4:0:0:0: Direct-Access ATA WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5 sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB) sd 4:0:0:0: Attached scsi generic sg3 type 0 sd 4:0:0:0: [sdc] Write Protect is off sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdc: sdc1 sd 4:0:0:0: [sdc] Attached SCSI disk ata5: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen ata5: irq_stat 0x00400000, PHY RDY changed ata5: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns } ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5: hard resetting link ata5: SATA link down (SStatus 0 SControl 310) ata5.00: disabled ata5: limiting SATA link speed to 1.5 Gbps ata5: EH complete ata5.00: detaching (SCSI 4:0:0:0) sd 4:0:0:0: [sdc] Synchronizing SCSI cache sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 sd 4:0:0:0: [sdc] Stopping disk sd 4:0:0:0: [sdc] START_STOP FAILED sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen ata5: irq_stat 0x00000040, connection status changed ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } ata5: limiting SATA link speed to 1.5 Gbps ata5: hard resetting link ata5: link is slow to respond, please be patient (ready=0) ata5: COMRESET failed (errno=-16) ata5: hard resetting link ata5: link is slow to respond, please be patient (ready=0) ata5: COMRESET failed (errno=-16) ata5: hard resetting link ata5: link is slow to respond, please be patient (ready=0) ata5: COMRESET failed (errno=-16) ata5: hard resetting link ata5: COMRESET failed (errno=-16) ata5: reset failed, giving up ata5: EH complete This is the stock kernel, and what was done is: WD5000AAKS on, off, on, off WD1200JS on, off, on, off WD20EARS on (fail), off, on (fail) [no messages were shown] WD1200JS on (fail), off [no messages were shown] Since it is the same kernel from the other time, printk timing is disabled i wil compile it later for testing.
> Only the WD20EARS 'triggers' the issue. And the 'issue' seems to be related > to > the WD20EARS being powered down, whatever that comes next to this event, hits > the invisible wall. Please ignore that paragraph.
I recompiled stock with printk timing enabled. In the example below, i powered on the WD1200JS, then on again. I noticed that the WD1200JS comes in 1.5Gbps ... instead of 3Gbps... I jumpered the other enclosure's WD20EARS to 1.5Gbps and when i power it on it come up with that link speed... May it be that: * The first time anything is plugged in in a port, the controller config it ok. * the second time it is unable to pass the "3Gbps stage" (i am not a programer) probing and get's stuck. Maybe the WD20EARS cannot default to 1.5Gbps while the others can --- Test's dmesg (stock): [ 5.458271] ACPI: Power Button [PWRF] [ 5.465011] usb 3-3: new low speed USB device using ohci_hcd and address 2 [ 5.470183] ACPI: acpi_idle registered with cpuidle [ 5.495846] forcedeth 0000:00:0a.0: ifname eth0, PHY OUI 0x732 @ 3, addr 00:24:8c:38:2b:0c [ 5.495912] forcedeth 0000:00:0a.0: highdma csum pwrctl mgmt gbit lnktim msi desc-v3 [ 5.496023] i2c i2c-0: nForce2 SMBus adapter at 0x600 [ 5.496070] ACPI: resource nForce2_smbus [io 0x0700-0x073f] conflicts with ACPI region SM00 [io 0x700-0x73f] [ 5.496133] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver [ 5.660013] usb 3-3: New USB device found, idVendor=0458, idProduct=003a [ 5.660060] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 5.660104] usb 3-3: Product: Optical Mouse [ 5.660146] usb 3-3: Manufacturer: Genius [ 5.665728] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 19 [ 5.665775] CA0106 0000:01:09.0: PCI INT A -> Link[LNKB] -> GSI 19 (level, low) -> IRQ 19 [ 5.665860] snd-ca0106: Model 1006 Rev 00000000 Serial 10061102 [ 5.673158] input: Genius Optical Mouse as /devices/pci0000:00/0000:00:02.0/usb3/3-3/3-3:1.0/input/input4 [ 5.673304] generic-usb 0003:0458:003A.0003: input,hidraw2: USB HID v1.10 Mouse [Genius Optical Mouse] on usb-0000:00:02.0-3/input0 [ 5.687225] sundance.c:v1.2 11-Sep-2006 Written by Donald Becker [ 5.687460] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 18 [ 5.687506] sundance 0000:01:08.0: PCI INT A -> Link[LNKA] -> GSI 18 (level, low) -> IRQ 18 [ 5.687950] eth1: IC Plus Corporation IP100A FAST Ethernet Adapter at 000000000001dc00, 00:06:4f:5a:dd:a8, IRQ 18. [ 5.688475] eth1: MII PHY found at address 0, status 0x786d advertising 01e1. [ 6.264107] udev: renamed network interface eth0 to eth1 [ 6.269061] udev: renamed network interface eth1_rename to eth0 [ 6.456110] fuse init (API version 7.16) [ 7.179301] ppdev: user-space parallel port driver [ 7.188039] lp: driver loaded but no devices found [ 7.255209] Linux agpgart interface v0.103 [ 13.803413] forcedeth 0000:00:0a.0: irq 42 for MSI/MSI-X [ 13.803605] forcedeth 0000:00:0a.0: eth1: no link during initialization [ 32.650457] NET: Registered protocol family 10 [ 32.650786] ADDRCONF(NETDEV_UP): eth1: link is not ready [ 102.784010] eth0: Link up [ 102.784010] eth0: Link changed: 100Mbps, full duplex [ 113.378028] eth0: no IPv6 routers present [ 249.212071] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen [ 249.212074] ata5: irq_stat 0x00000040, connection status changed [ 249.212076] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } [ 249.212082] ata5: hard resetting link [ 249.935026] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 249.935728] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133 [ 249.935731] ata5.00: 234441648 sectors, multi 0: LBA48 [ 249.936483] ata5.00: configured for UDMA/133 [ 249.936487] ata5: EH complete [ 249.936560] scsi 4:0:0:0: Direct-Access ATA WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5 [ 249.936739] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 249.936806] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB) [ 249.936856] sd 4:0:0:0: [sdc] Write Protect is off [ 249.936858] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [ 249.936872] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 249.942388] sdc: sdc1 [ 249.942539] sd 4:0:0:0: [sdc] Attached SCSI disk [ 257.908844] md127: unknown partition table [ 1800.643750] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen [ 1800.643753] ata5: irq_stat 0x00400000, PHY RDY changed [ 1800.643756] ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns } [ 1800.643760] ata5: hard resetting link [ 1801.366025] ata5: SATA link down (SStatus 0 SControl 300) [ 1806.366068] ata5: hard resetting link [ 1806.671023] ata5: SATA link down (SStatus 0 SControl 300) [ 1806.671027] ata5: limiting SATA link speed to 1.5 Gbps [ 1811.671041] ata5: hard resetting link [ 1811.976031] ata5: SATA link down (SStatus 0 SControl 310) [ 1811.976037] ata5.00: disabled [ 1811.976046] ata5: EH complete [ 1811.976054] ata5.00: detaching (SCSI 4:0:0:0) [ 1811.979104] sd 4:0:0:0: [sdc] Synchronizing SCSI cache [ 1811.979131] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 1811.979134] sd 4:0:0:0: [sdc] Stopping disk [ 1811.979140] sd 4:0:0:0: [sdc] START_STOP FAILED [ 1811.979141] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 1837.863267] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen [ 1837.863270] ata5: irq_stat 0x00000040, connection status changed [ 1837.863273] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } [ 1837.863278] ata5: limiting SATA link speed to 1.5 Gbps [ 1837.863280] ata5: hard resetting link [ 1838.586032] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 1838.586760] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133 [ 1838.586762] ata5.00: 234441648 sectors, multi 0: LBA48 [ 1838.587523] ata5.00: configured for UDMA/133 [ 1838.587530] ata5: EH complete [ 1838.587620] scsi 4:0:0:0: Direct-Access ATA WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5 [ 1838.587782] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB) [ 1838.587814] sd 4:0:0:0: [sdc] Write Protect is off [ 1838.587817] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [ 1838.587830] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 1838.587848] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 1838.594879] sdc: sdc1 [ 1838.595048] sd 4:0:0:0: [sdc] Attached SCSI disk [ 1869.230327] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen [ 1869.230330] ata5: irq_stat 0x00400000, PHY RDY changed [ 1869.230333] ata5: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns } [ 1869.230337] ata5: hard resetting link [ 1869.953031] ata5: SATA link down (SStatus 0 SControl 310) [ 1874.953038] ata5: hard resetting link [ 1875.258032] ata5: SATA link down (SStatus 0 SControl 310) [ 1880.258169] ata5: hard resetting link [ 1880.563033] ata5: SATA link down (SStatus 0 SControl 310) [ 1880.563039] ata5.00: disabled [ 1880.563043] ata5: limiting SATA link speed to 1.5 Gbps [ 1880.563051] ata5: EH complete [ 1880.563060] ata5.00: detaching (SCSI 4:0:0:0) [ 1880.566103] sd 4:0:0:0: [sdc] Synchronizing SCSI cache [ 1880.566132] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 1880.566135] sd 4:0:0:0: [sdc] Stopping disk [ 1880.566140] sd 4:0:0:0: [sdc] START_STOP FAILED [ 1880.566141] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 1894.012105] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen [ 1894.012107] ata5: irq_stat 0x00000040, connection status changed [ 1894.012110] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } [ 1894.012115] ata5: limiting SATA link speed to 1.5 Gbps [ 1894.012118] ata5: hard resetting link [ 1894.735030] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 1894.735754] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133 [ 1894.735757] ata5.00: 234441648 sectors, multi 0: LBA48 [ 1894.736511] ata5.00: configured for UDMA/133 [ 1894.736517] ata5: EH complete [ 1894.736595] scsi 4:0:0:0: Direct-Access ATA WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5 [ 1894.736743] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB) [ 1894.736775] sd 4:0:0:0: [sdc] Write Protect is off [ 1894.736777] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [ 1894.736791] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 1894.736818] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 1894.743510] sdc: sdc1 [ 1894.743664] sd 4:0:0:0: [sdc] Attached SCSI disk [ 2143.322966] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen [ 2143.322968] ata5: irq_stat 0x00400000, PHY RDY changed [ 2143.322971] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns } [ 2143.322975] ata5: hard resetting link [ 2144.045033] ata5: SATA link down (SStatus 0 SControl 310) [ 2149.045040] ata5: hard resetting link [ 2149.350032] ata5: SATA link down (SStatus 0 SControl 310) [ 2154.350027] ata5: hard resetting link [ 2154.655024] ata5: SATA link down (SStatus 0 SControl 310) [ 2154.655030] ata5.00: disabled [ 2154.655041] ata5: EH complete [ 2154.655049] ata5.00: detaching (SCSI 4:0:0:0) [ 2154.664101] sd 4:0:0:0: [sdc] Synchronizing SCSI cache [ 2154.664353] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 2154.664356] sd 4:0:0:0: [sdc] Stopping disk [ 2154.664361] sd 4:0:0:0: [sdc] START_STOP FAILED [ 2154.664362] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 2482.520422] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5940000 action 0xe frozen [ 2482.520425] ata5: irq_stat 0x00000040, connection status changed [ 2482.520428] ata5: SError: { CommWake Dispar LinkSeq TrStaTrns DevExch } [ 2482.520433] ata5: hard resetting link [ 2483.243032] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 2483.261208] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133 [ 2483.261212] ata5.00: 3907029168 sectors, multi 0: LBA48 [ 2483.264916] ata5.00: configured for UDMA/133 [ 2483.264923] ata5: EH complete [ 2483.265035] scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5 [ 2483.265212] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2483.265231] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) [ 2483.265285] sd 4:0:0:0: [sdc] Write Protect is off [ 2483.265287] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [ 2483.265310] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 2483.278936] sdc: sdc1 [ 2483.279541] sd 4:0:0:0: [sdc] Attached SCSI disk [ 2498.115437] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen [ 2498.115440] ata5: irq_stat 0x00400000, PHY RDY changed [ 2498.115442] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns } [ 2498.115446] ata5: hard resetting link [ 2498.838031] ata5: SATA link down (SStatus 0 SControl 300) [ 2503.838046] ata5: hard resetting link [ 2504.143032] ata5: SATA link down (SStatus 0 SControl 300) [ 2504.143038] ata5: limiting SATA link speed to 1.5 Gbps [ 2509.143048] ata5: hard resetting link [ 2509.448031] ata5: SATA link down (SStatus 0 SControl 310) [ 2509.448037] ata5.00: disabled [ 2509.448042] ata5: limiting SATA link speed to 1.5 Gbps [ 2509.448050] ata5: EH complete [ 2509.448058] ata5.00: detaching (SCSI 4:0:0:0) [ 2509.451100] sd 4:0:0:0: [sdc] Synchronizing SCSI cache [ 2509.451130] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 2509.451133] sd 4:0:0:0: [sdc] Stopping disk [ 2509.451139] sd 4:0:0:0: [sdc] START_STOP FAILED [ 2509.451140] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 2528.851233] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen [ 2528.851236] ata5: irq_stat 0x00000040, connection status changed [ 2528.851239] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } [ 2528.851244] ata5: limiting SATA link speed to 1.5 Gbps [ 2528.851247] ata5: hard resetting link [ 2529.574031] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 2529.587720] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133 [ 2529.587723] ata5.00: 3907029168 sectors, multi 0: LBA48 [ 2529.591414] ata5.00: configured for UDMA/133 [ 2529.591420] ata5: EH complete [ 2529.591515] scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5 [ 2529.591684] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2529.591750] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) [ 2529.591782] sd 4:0:0:0: [sdc] Write Protect is off [ 2529.591784] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [ 2529.591798] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 2529.605472] sdc: sdc1 [ 2529.605676] sd 4:0:0:0: [sdc] Attached SCSI disk [ 2565.891518] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen [ 2565.891521] ata5: irq_stat 0x00400000, PHY RDY changed [ 2565.891523] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns } [ 2565.891527] ata5: hard resetting link [ 2566.614034] ata5: SATA link down (SStatus 0 SControl 310) [ 2571.614070] ata5: hard resetting link [ 2571.919032] ata5: SATA link down (SStatus 0 SControl 310) [ 2576.919039] ata5: hard resetting link [ 2577.224030] ata5: SATA link down (SStatus 0 SControl 310) [ 2577.224037] ata5.00: disabled [ 2577.224041] ata5: limiting SATA link speed to 1.5 Gbps [ 2577.224049] ata5: EH complete [ 2577.224057] ata5.00: detaching (SCSI 4:0:0:0) [ 2577.228101] sd 4:0:0:0: [sdc] Synchronizing SCSI cache [ 2577.228133] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 2577.228136] sd 4:0:0:0: [sdc] Stopping disk [ 2577.228141] sd 4:0:0:0: [sdc] START_STOP FAILED [ 2577.228142] sd 4:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00 [ 2598.347557] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5940000 action 0xe frozen [ 2598.347560] ata5: irq_stat 0x00000040, connection status changed [ 2598.347563] ata5: SError: { CommWake Dispar LinkSeq TrStaTrns DevExch } [ 2598.347567] ata5: limiting SATA link speed to 1.5 Gbps [ 2598.347570] ata5: hard resetting link [ 2599.070033] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [ 2599.088528] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133 [ 2599.088532] ata5.00: 3907029168 sectors, multi 0: LBA48 [ 2599.092230] ata5.00: configured for UDMA/133 [ 2599.092237] ata5: EH complete [ 2599.092324] scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5 [ 2599.092499] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2599.092651] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) [ 2599.092685] sd 4:0:0:0: [sdc] Write Protect is off [ 2599.092687] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 [ 2599.092701] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 2599.106280] sdc: sdc1 [ 2599.106474] sd 4:0:0:0: [sdc] Attached SCSI disk
Correction: * The WD1200JS is actually a SATA 3Gbps drive. (Non-NCQ).
Issue: SATA Hot-Swap either Fails or Partially Succeeds (non-optimal), based on hardware combination. * Results in non-optimal drive setup for INTEL for all tested disks * Results in non-functional port after hot-swap for NVIDIA for certain disks AND non-optimal disk setup for the rest of the working tested disks. WORKAROUND: * Manually rescan the port, but no disks must be powered on that port (EMPTY PORT). # echo "- - - " > /sys/class/scsi_host/host4/scan Affected Kernels: * F13 / F14 / F15 (2.6.33*, 2.6.34*) * Sysrescuecd (sysresccd 2.6.35-std211-i586) * Slackware64 13.1 (2.6.33*) * Stock 2.6.38.5 downloaded & self-compiled from kernel.org (with userland Slackware64 13.1) Affected SATA Controllers: * RAID bus controller [0104]: Intel Corporation Mobile 82801 SATA RAID Controller [8086:282a] (rev 05) * SATA controller [0106]: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2) Hardware: * For NVIDIA: Asus M4N72-E * For INTEL: Dell Lattitude 4310 Explanation: [1] The first time an enclosure is powered on, it is recognized correctly and configuted correctly. Drives that support 3Gbps speeds are configured correctly. [1] INTEL (enclosure1 power-on): May 15 13:39:14 sysresccd kernel: [ 1258.830169] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen May 15 13:39:14 sysresccd kernel: [ 1258.830173] ata5: irq_stat 0x00400040, connection status changed May 15 13:39:14 sysresccd kernel: [ 1258.830178] ata5: SError: { PHYRdyChg CommWake DevExch } May 15 13:39:14 sysresccd kernel: [ 1258.830190] ata5: hard resetting link May 15 13:39:14 sysresccd kernel: [ 1259.552436] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 15 13:39:14 sysresccd kernel: [ 1259.565598] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133 May 15 13:39:14 sysresccd kernel: [ 1259.565603] ata5.00: 3907029168 sectors, multi 0: LBA48 May 15 13:39:14 sysresccd kernel: [ 1259.569585] ata5.00: configured for UDMA/133 May 15 13:39:14 sysresccd kernel: [ 1259.569594] ata5: EH complete May 15 13:39:14 sysresccd kernel: [ 1259.569781] scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5 May 15 13:39:14 sysresccd kernel: [ 1259.570053] sd 4:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) May 15 13:39:14 sysresccd kernel: [ 1259.570065] sd 4:0:0:0: Attached scsi generic sg2 type 0 May 15 13:39:14 sysresccd kernel: [ 1259.570126] sd 4:0:0:0: [sdb] Write Protect is off May 15 13:39:14 sysresccd kernel: [ 1259.570129] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00 May 15 13:39:14 sysresccd kernel: [ 1259.570158] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 15 13:39:14 sysresccd kernel: [ 1259.570313] sdb: sdb1 May 15 13:39:14 sysresccd kernel: [ 1259.583671] sd 4:0:0:0: [sdb] Attached SCSI disk [1] NVIDIA (enclosure1 power-on): May 24 21:56:26 sysresccd kernel: [ 193.399937] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen May 24 21:56:26 sysresccd kernel: [ 193.399940] ata5: irq_stat 0x00000040, connection status changed May 24 21:56:26 sysresccd kernel: [ 193.399943] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } May 24 21:56:26 sysresccd kernel: [ 193.399949] ata5: hard resetting link May 24 21:56:26 sysresccd kernel: [ 194.122024] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) May 24 21:56:26 sysresccd kernel: [ 194.122725] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133 May 24 21:56:26 sysresccd kernel: [ 194.122728] ata5.00: 234441648 sectors, multi 0: LBA48 May 24 21:56:26 sysresccd kernel: [ 194.123505] ata5.00: configured for UDMA/133 May 24 21:56:26 sysresccd kernel: [ 194.123512] ata5: EH complete May 24 21:56:26 sysresccd kernel: [ 194.123607] scsi 4:0:0:0: Direct-Access ATA WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5 May 24 21:56:26 sysresccd kernel: [ 194.123746] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB) May 24 21:56:26 sysresccd kernel: [ 194.123751] sd 4:0:0:0: Attached scsi generic sg3 type 0 May 24 21:56:26 sysresccd kernel: [ 194.123778] sd 4:0:0:0: [sdc] Write Protect is off May 24 21:56:26 sysresccd kernel: [ 194.123781] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 May 24 21:56:26 sysresccd kernel: [ 194.123795] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 24 21:56:26 sysresccd kernel: [ 194.123941] sdc: sdc1 May 24 21:56:26 sysresccd kernel: [ 194.128271] sd 4:0:0:0: [sdc] Attached SCSI disk [2] That enclosure (now named enclosure1) is powered-off: [2] INTEL (enclosure1 powered-off): May 15 13:40:04 sysresccd kernel: [ 1309.637238] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen May 15 13:40:04 sysresccd kernel: [ 1309.637242] ata5: irq_stat 0x00400040, connection status changed May 15 13:40:04 sysresccd kernel: [ 1309.637248] ata5: SError: { PHYRdyChg 10B8B DevExch } May 15 13:40:04 sysresccd kernel: [ 1309.637257] ata5: hard resetting link May 15 13:40:05 sysresccd kernel: [ 1310.360284] ata5: SATA link down (SStatus 0 SControl 300) May 15 13:40:10 sysresccd kernel: [ 1315.360305] ata5: hard resetting link May 15 13:40:10 sysresccd kernel: [ 1315.665470] ata5: SATA link down (SStatus 0 SControl 300) May 15 13:40:10 sysresccd kernel: [ 1315.665483] ata5: limiting SATA link speed to 1.5 Gbps May 15 13:40:15 sysresccd kernel: [ 1320.665357] ata5: hard resetting link May 15 13:40:16 sysresccd kernel: [ 1320.970264] ata5: SATA link down (SStatus 0 SControl 310) May 15 13:40:16 sysresccd kernel: [ 1320.970275] ata5.00: disabled May 15 13:40:16 sysresccd kernel: [ 1320.970290] ata5: EH complete May 15 13:40:16 sysresccd kernel: [ 1320.970302] ata5.00: detaching (SCSI 4:0:0:0) May 15 13:40:16 sysresccd kernel: [ 1320.974419] sd 4:0:0:0: [sdb] Synchronizing SCSI cache May 15 13:40:16 sysresccd kernel: [ 1320.974489] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK May 15 13:40:16 sysresccd kernel: [ 1320.974496] sd 4:0:0:0: [sdb] Stopping disk May 15 13:40:16 sysresccd kernel: [ 1320.974515] sd 4:0:0:0: [sdb] START_STOP FAILED May 15 13:40:16 sysresccd kernel: [ 1320.974519] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [2] NVIDIA (enclosure1 powered-off): May 24 21:56:50 sysresccd kernel: [ 217.598434] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen May 24 21:56:50 sysresccd kernel: [ 217.598437] ata5: irq_stat 0x00400000, PHY RDY changed May 24 21:56:50 sysresccd kernel: [ 217.598439] ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns } May 24 21:56:50 sysresccd kernel: [ 217.598444] ata5: hard resetting link May 24 21:56:51 sysresccd kernel: [ 218.321024] ata5: SATA link down (SStatus 0 SControl 300) May 24 21:56:56 sysresccd kernel: [ 223.321017] ata5: hard resetting link May 24 21:56:56 sysresccd kernel: [ 223.626024] ata5: SATA link down (SStatus 0 SControl 300) May 24 21:56:56 sysresccd kernel: [ 223.626029] ata5: limiting SATA link speed to 1.5 Gbps May 24 21:57:01 sysresccd kernel: [ 228.626027] ata5: hard resetting link May 24 21:57:01 sysresccd kernel: [ 228.931024] ata5: SATA link down (SStatus 0 SControl 310) May 24 21:57:01 sysresccd kernel: [ 228.931028] ata5.00: disabled May 24 21:57:01 sysresccd kernel: [ 228.931043] ata5: EH complete May 24 21:57:01 sysresccd kernel: [ 228.931054] ata5.00: detaching (SCSI 4:0:0:0) May 24 21:57:01 sysresccd kernel: [ 228.935104] sd 4:0:0:0: [sdc] Synchronizing SCSI cache May 24 21:57:01 sysresccd kernel: [ 228.935154] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK May 24 21:57:01 sysresccd kernel: [ 228.935158] sd 4:0:0:0: [sdc] Stopping disk May 24 21:57:01 sysresccd kernel: [ 228.935165] sd 4:0:0:0: [sdc] START_STOP FAILED May 24 21:57:01 sysresccd kernel: [ 228.935174] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK [3] Enlcosure2 is powered-on in the same port as enclosure1 (now removed). Please notice that both drives came up with lower link speed. Please also note that WD20EARS in intel worked at 1.5Gbps WITHOUT a jumper and failed in nvidia UNLESS JUMPERED. (See previous posts) [3] INTEL (Power-on enclosure2): May 15 13:41:38 sysresccd kernel: [ 1403.306041] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen May 15 13:41:38 sysresccd kernel: [ 1403.306045] ata5: irq_stat 0x00400040, connection status changed May 15 13:41:38 sysresccd kernel: [ 1403.306050] ata5: SError: { RecovComm PHYRdyChg CommWake DevExch } May 15 13:41:38 sysresccd kernel: [ 1403.306061] ata5: hard resetting link May 15 13:41:39 sysresccd kernel: [ 1404.028172] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300) May 15 13:41:39 sysresccd kernel: [ 1404.058974] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133 May 15 13:41:39 sysresccd kernel: [ 1404.058979] ata5.00: 3907029168 sectors, multi 0: LBA48 May 15 13:41:39 sysresccd kernel: [ 1404.063161] ata5.00: configured for UDMA/133 May 15 13:41:39 sysresccd kernel: [ 1404.063170] ata5: EH complete May 15 13:41:39 sysresccd kernel: [ 1404.063311] scsi 4:0:0:0: Direct-Access ATA WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5 May 15 13:41:39 sysresccd kernel: [ 1404.063513] sd 4:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB) May 15 13:41:39 sysresccd kernel: [ 1404.063531] sd 4:0:0:0: Attached scsi generic sg2 type 0 May 15 13:41:39 sysresccd kernel: [ 1404.063557] sd 4:0:0:0: [sdb] Write Protect is off May 15 13:41:39 sysresccd kernel: [ 1404.063559] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00 May 15 13:41:39 sysresccd kernel: [ 1404.063578] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 15 13:41:39 sysresccd kernel: [ 1404.063741] sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7 May 15 13:41:39 sysresccd kernel: [ 1404.113474] sd 4:0:0:0: [sdb] Attached SCSI disk [3] NVIDIA (Power-on enclosure2): May 24 21:57:20 sysresccd kernel: [ 247.520154] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen May 24 21:57:20 sysresccd kernel: [ 247.520156] ata5: irq_stat 0x00000040, connection status changed May 24 21:57:20 sysresccd kernel: [ 247.520159] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch } May 24 21:57:20 sysresccd kernel: [ 247.520165] ata5: hard resetting link May 24 21:57:21 sysresccd kernel: [ 248.243024] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300) May 24 21:57:21 sysresccd kernel: [ 248.243729] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133 May 24 21:57:21 sysresccd kernel: [ 248.243732] ata5.00: 234441648 sectors, multi 0: LBA48 May 24 21:57:21 sysresccd kernel: [ 248.244530] ata5.00: configured for UDMA/133 May 24 21:57:21 sysresccd kernel: [ 248.244536] ata5: EH complete May 24 21:57:21 sysresccd kernel: [ 248.244624] scsi 4:0:0:0: Direct-Access ATA WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5 May 24 21:57:21 sysresccd kernel: [ 248.244748] sd 4:0:0:0: Attached scsi generic sg3 type 0 May 24 21:57:21 sysresccd kernel: [ 248.244771] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB) May 24 21:57:21 sysresccd kernel: [ 248.244805] sd 4:0:0:0: [sdc] Write Protect is off May 24 21:57:21 sysresccd kernel: [ 248.244808] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00 May 24 21:57:21 sysresccd kernel: [ 248.244822] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 24 21:57:21 sysresccd kernel: [ 248.245445] sdc: sdc1 May 24 21:57:21 sysresccd kernel: [ 248.248935] sd 4:0:0:0: [sdc] Attached SCSI disk Enclosure: 2 x NexStar CX (USB+eSATA) Tested with these disks: 1 x WD20EARS-00MVWB0 (50.0AB50) 2.0TB 3Gbps NCQ 1 x WD20EARS-00MVWB0 (51.0AB51) 2.0TB 3Gbps NCQ 1 x WD1200JS-00MHB0 (02.01C03) 120GB 3Gbps N/A 1 x WD5000AAKS-00A7B2 (01.03B01) 500GB 3Gbps NCQ
Looks more and more like the enclosure has some oddity in link negotiation. Please take the enclosure out of the equation and see whether the problem persists. Thanks.
Created attachment 59412 [details] manual hot-swap without any enclosure I performed a disk hot-swap by hand. I put the disk onto O'Reilly "BIND & DNS" and put some DVD boxes and did the following: For the WD20EARS: * I booted with only that drive attached. > The drive booted at 3Gbps. * Disconnected the data cable. * Reconnected the data cable. > The WD20EARS was configured at 1.5Gbps (the enclosure failed to do that with the nvidia controller but succeeded with the intel). * Disconnected the power cable. * Reconnected the power cable. > The WD20EARS was able to get configured at 3Gbps with some errors. (!) * Disconnected the power cable and the data cable. Now with the WD5000AAKS drive: * Connected data and power cable. > The WD5000AAKS gets configured at 1.5Gbps. * Disconnected the power cable. * Performed a manual port rescan * Reconnected the power cable. > The WD5000AAKS gets configured at 3Gbps. --- I will try the patch.
So, it's working as expected. If the power is maintained to the drive, the drive seems to remember the last negotiated speed, nothing too strange there and the patch wouldn't make any difference. So, the enclosure, whatever chipset it may be using, is the cause of the problem. It is incomplete but with the patch just replugging the drive makes it work, so that might be the work around we'll have to accept for now. Thanks.
The correct workaround seems to jumper any drive with any enclosure to 1.5Gbps for the broader compatibility, just in case i plug the drive in any random computer. > So, the enclosure, whatever chipset it may be using, is the cause of the > problem. As for the enclosure chipset, i could not found which one it uses since it is not listed at the vantec site. (Model is NST-300SU-BK). As for the behaviour of the nvidia controller, it is slighty better but not ok at all. >> For the WD20EARS: >> * I booted with only that drive attached. >> > The drive booted at 3Gbps. >> * Disconnected the data cable. >> * Reconnected the data cable. >> > The WD20EARS was configured at 1.5Gbps (the enclosure failed to do that >> with the nvidia controller but succeeded with the intel). Granted, as i said, the enclosure adds noise and worsens the problem. But the drive should have used 3Gbps optimally and there is no enclosure here. >> * Disconnected the power cable. >> * Reconnected the power cable. >> > The WD20EARS was able to get configured at 3Gbps with some errors. (!) >> * Disconnected the power cable and the data cable. I only tried 2 times, i may need to try more to see how really lucky i was. >> Now with the WD5000AAKS drive: >> * Connected data and power cable. >> > The WD5000AAKS gets configured at 1.5Gbps. Again, this drive must have used 3Gbps instead of 1.5Gbps. >> * Disconnected the power cable. >> * Performed a manual port rescan >> * Reconnected the power cable. >> > The WD5000AAKS gets configured at 3Gbps. The manual port rescans fixes the problem, since no enclosure was used here it must be the controller (who knows, it may be m4n72-e specific or just bad luck picking this particular motherboard). I will perform more specific test in the weekend. I will boot the sysrescue on this mobo and try all ports and all drives, now including the blacks main disks. Also i will try with an older computer which uses sata_nv. And with and without the enclosure. As for the intel controller, the behaviour is not ok also but since i can only test stuff with the enclosure, a third(sata_nv) or fourth(have to find one) controller test is needed. Rigth now i am compiling a bugreport list from major distros for analysis, it will take some time to me to read them all. I will also test to wich extent does this particular enclosure handles the other dma/udma modes, and i expect it to not support PIO but i will test it because these are part of the SATA EH routines, if i am not wrong.
A patch referencing this bug report has been merged in Linux v3.1-rc1: commit 8ea7645c5a949f9d0ea86edc0778713b5e63ab74 Author: Tejun Heo <tj@kernel.org> Date: Wed May 25 12:23:56 2011 +0200 libata: leave port thawed after reset failure