Bug 34712

Summary: sata issue with nvidia / intel ahci (workaround = manually rescan port) hot swap does not work properly
Product: IO/Storage Reporter: Reartes Guillermo (rtguille)
Component: Serial ATAAssignee: Jeff Garzik (jgarzik)
Status: CLOSED CODE_FIX    
Severity: low CC: alan, florian, tj
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:
Attachments: ata-reset-fix.patch
stock kernel, unpatched & self-compiled
patched kernel, self-compiled
sata test (stock)
sata test (patch)
new test with parameter libata.force=5:dump_id
thaw-reset-failed-ports.patch
2.6.38.5 with thaw-reset-failed-ports.patch
manual hot-swap without any enclosure

Description Reartes Guillermo 2011-05-08 20:48:17 UTC
I have two  WD WD20EARS Green Adv. Fmt. drives
Each of them has it's own NexStar CX enclosure (USB/eSATA).
Each drive & enclosure works ok on it own. They can be plugged, are recognized, are mountable, are u-mountable, no problems of any kind.

Problem:
I cannot hot-swap any of the disks with the other one.
If i power off an enclosure and replace it with another one, i get:

May  8 17:12:42 ulquiorra kernel: [ 9913.167921] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
May  8 17:12:42 ulquiorra kernel: [ 9913.167932] ata5: irq_stat 0x00000040, connection status changed
May  8 17:12:42 ulquiorra kernel: [ 9913.167942] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
May  8 17:12:42 ulquiorra kernel: [ 9913.167958] ata5: limiting SATA link speed to 1.5 Gbps
May  8 17:12:42 ulquiorra kernel: [ 9913.167966] ata5: hard resetting link
May  8 17:12:48 ulquiorra kernel: [ 9918.940037] ata5: link is slow to respond, please be patient (ready=0)
May  8 17:12:52 ulquiorra kernel: [ 9923.173258] ata5: COMRESET failed (errno=-16)
May  8 17:12:52 ulquiorra kernel: [ 9923.173271] ata5: hard resetting link
May  8 17:12:58 ulquiorra kernel: [ 9928.945419] ata5: link is slow to respond, please be patient (ready=0)
May  8 17:13:02 ulquiorra kernel: [ 9933.183068] ata5: COMRESET failed (errno=-16)
May  8 17:13:02 ulquiorra kernel: [ 9933.183081] ata5: hard resetting link
May  8 17:13:08 ulquiorra kernel: [ 9938.909277] ata5: link is slow to respond, please be patient (ready=0)
May  8 17:13:37 ulquiorra kernel: [ 9968.201074] ata5: COMRESET failed (errno=-16)
May  8 17:13:37 ulquiorra kernel: [ 9968.201089] ata5: hard resetting link
May  8 17:13:38 ulquiorra kernel: [ 9968.924046] ata5: SATA link down (SStatus 0 SControl 310)
May  8 17:13:38 ulquiorra kernel: [ 9968.924069] ata5: EH complete

Workaround:

* power on, mount, use, umount the first enclosure.
* power off the first enclosure (for example port ATA5) 

* issue '# echo "- - - " > /sys/class/scsi_host/host4/scan'

* power on the second enclosure (also port ATA5), it will be detected correclty.

NOTE:
The first time the port allways detects any disk correctly.
If the port is not rescanned manually without drives attached, it is unusable.

----

Sata Controller:
* nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2)

OS:      Fedora 15 beta

kernel:  2.6.38.5-24.fc15.x86_64

HD1: WDC WD20EARS-00MVWB0, 50.0AB50
HD2: WDC WD20EARS-00MVWB0, 51.0AB51
Comment 1 Tejun Heo 2011-05-09 08:46:04 UTC
Created attachment 56982 [details]
ata-reset-fix.patch

Does this patch make any difference?  Also, can you please post full kernel log including boot and detection failure messages preferably with printk timestamp turned on?

Thanks.
Comment 2 Reartes Guillermo 2011-05-12 02:19:14 UTC
I experienced the issue in
* F15 (2.6.38.x)
* F13 (2.6.34.x) 
* Slackware64 13.1 (different messages but same results in slack).

So i will first update Slack64 to 2.6.38.5 vanilla,then after reproducing the problem i will try to patch it and see. If that fails i will try with fedora.
Comment 3 Tejun Heo 2011-05-12 08:24:18 UTC
Probably different distros wouldn't affect the problem too much.  I would suggest just sticking with the same distro but rolling your own vanilla kernel for testing.  Thanks.
Comment 4 Reartes Guillermo 2011-05-15 20:56:47 UTC
For me, compiling in Slackware is more easy and faster (and maybe fun)

Well, i tried this:

* downloaded kernel 2.6.38.5 from kernel.org. 

And setup two different directories for each kernel (just in case).
 */usr/src/2.6.38.5-stock (using 2.6.38.5-stock.conf)
 */usr/src/2.6.38.5-patch (using 2.6.38.5-patch.conf)

I will upload these after posting this.
These are self-compiled kernels.

I was able to reproduce the problem with the 'stock' kernel, so it is not fedora specific, so i will try the patch.

Sadly the proposed patch in comment#1 did not change things.
---

OS: Slackware64 13.1
Kernel: 2.6.38.5stock-0 (no patch applied)
-------------------------------------------------------

Power-ON ENC#2:

May 15 17:09:56 shadow kernel: ata5: hard resetting link
May 15 17:09:57 shadow kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 15 17:09:57 shadow kernel: ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133
May 15 17:09:57 shadow kernel: ata5.00: 3907029168 sectors, multi 0: LBA48 
May 15 17:09:57 shadow kernel: ata5.00: configured for UDMA/133
May 15 17:09:57 shadow kernel: ata5: EH complete
May 15 17:09:57 shadow kernel: scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5
May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
May 15 17:09:57 shadow kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0
May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] Write Protect is off
May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 15 17:09:57 shadow kernel:  sdc: sdc1 sdc2 sdc3 sdc4 sdc5 sdc6 sdc7
May 15 17:09:57 shadow kernel: sd 4:0:0:0: [sdc] Attached SCSI disk

Power-OFF ENC#2:

May 15 17:10:38 shadow kernel: ata5: hard resetting link
May 15 17:10:39 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300)
May 15 17:10:44 shadow kernel: ata5: hard resetting link
May 15 17:10:44 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300)
May 15 17:10:49 shadow kernel: ata5: hard resetting link
May 15 17:10:49 shadow kernel: ata5: SATA link down (SStatus 0 SControl 310)
May 15 17:10:49 shadow kernel: ata5: EH complete
May 15 17:10:49 shadow kernel: ata5.00: detaching (SCSI 4:0:0:0)
May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc] Synchronizing SCSI cache
May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc] Stopping disk                                                                      
May 15 17:10:49 shadow kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00

Relpace ENC#2 with ENC#1:

Power-ON ENC#1:

May 15 17:12:24 shadow kernel: ata5: hard resetting link                                                                            
May 15 17:12:44 shadow last message repeated 2 times                                                                                
May 15 17:13:19 shadow kernel: ata5: hard resetting link                                                                            
May 15 17:13:24 shadow kernel: ata5: EH complete

Power-OFF ENC#1: (After led flashing for more than a minute):

[no messages]

REBOOT.

OS: Slackware64 13.1
Kernel: 2.6.38.5patch-esata-1 (patch applied)
-------------------------------------------------------

Power-ON ENC#2:

May 15 17:21:44 shadow kernel: ata5: hard resetting link
May 15 17:21:45 shadow kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 15 17:21:45 shadow kernel: ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133
May 15 17:21:45 shadow kernel: ata5.00: 3907029168 sectors, multi 0: LBA48 
May 15 17:21:45 shadow kernel: ata5.00: configured for UDMA/133
May 15 17:21:45 shadow kernel: ata5: EH complete
May 15 17:21:45 shadow kernel: scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5
May 15 17:21:45 shadow kernel: sd 4:0:0:0: Attached scsi generic sg3 type 0
May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] Write Protect is off
May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 15 17:21:45 shadow kernel:  sdc: sdc1 sdc2 sdc3 sdc4 sdc5 sdc6 sdc7
May 15 17:21:45 shadow kernel: sd 4:0:0:0: [sdc] Attached SCSI disk

Power-OFF ENC#2:

May 15 17:22:23 shadow kernel: ata5: hard resetting link
May 15 17:22:24 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300)                                                        
May 15 17:22:29 shadow kernel: ata5: hard resetting link                                                                            
May 15 17:22:29 shadow kernel: ata5: SATA link down (SStatus 0 SControl 300)                                                        
May 15 17:22:34 shadow kernel: ata5: hard resetting link                                                                            
May 15 17:22:34 shadow kernel: ata5: SATA link down (SStatus 0 SControl 310)                                                        
May 15 17:22:34 shadow kernel: ata5: EH complete                                                                                    
May 15 17:22:34 shadow kernel: ata5.00: detaching (SCSI 4:0:0:0)                                                                    
May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc] Synchronizing SCSI cache                                                           
May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc] Stopping disk
May 15 17:22:34 shadow kernel: sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00

Relpace ENC#2 with ENC#1:

Power-ON ENC#1:

May 15 17:23:50 shadow kernel: ata5: hard resetting link
May 15 17:24:10 shadow last message repeated 2 times
May 15 17:24:45 shadow kernel: ata5: hard resetting link
May 15 17:24:50 shadow kernel: ata5: EH complete


Power-OFF ENC#1: (After led flashing for more than a minute):

[no messages]
Comment 5 Reartes Guillermo 2011-05-15 20:57:45 UTC
Created attachment 57932 [details]
stock kernel, unpatched & self-compiled
Comment 6 Reartes Guillermo 2011-05-15 20:58:44 UTC
Created attachment 57942 [details]
patched kernel, self-compiled
Comment 7 Tejun Heo 2011-05-19 13:46:42 UTC
Sorry about the delay.  Can you please attach the output of "dmesg" after triggering the error?  It seems all KERN_DEBUG messages are missing.

Thanks.
Comment 8 Reartes Guillermo 2011-05-19 20:44:39 UTC
Created attachment 58642 [details]
sata test (stock)

I repeated the same previous test, for stock, then for patch:

dmesg_pre (after boot)
dmesg_enc2 (power-on enclosure #2)
dmesg_enc2_off (power-off enclosure #2)
dmesg_enc1_off (power- on enclosure #1, wait and power-off enclosure #1)

also included messages. (there are 8 files per archive)
Comment 9 Reartes Guillermo 2011-05-19 20:46:15 UTC
Created attachment 58652 [details]
sata test (patch)

These are the patched dmesg / messages.

Both attachments are *.tar.gz with 8 logfiles each.
Comment 10 Tejun Heo 2011-05-20 10:04:38 UTC
So, the problem basically comes down to the following reset failure.

 ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
 ata5: irq_stat 0x00000040, connection status changed
 ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
 ata5: hard resetting link
 ata5: link is slow to respond, please be patient (ready=0)
 ata5: COMRESET failed (errno=-16)
 ata5: hard resetting link
 ata5: link is slow to respond, please be patient (ready=0)
 ata5: COMRESET failed (errno=-16)
 ata5: hard resetting link
 ata5: link is slow to respond, please be patient (ready=0)
 ata5: COMRESET failed (errno=-16)
 ata5: limiting SATA link speed to 1.5 Gbps
 ata5: hard resetting link
 ata5: COMRESET failed (errno=-16)
 ata5: reset failed, giving up
 ata5: EH complete

After which, libata turns off PHY event detection on the port until manual rescan is explicitly issued (we might want to change that behavior but that's a different issue).  Can you please clarify the followings?

1. Right after EH complete is printed, if you issue manual rescan while leaving the device attached, does it get detected properly?

2. What happens if you power off the enclosure, wait for it to be detached and then power it up again?  Does it get detected properly?  ie. Is exchanging to the other enclosure necessary to trigger the reset failure?

Thanks.
Comment 11 Reartes Guillermo 2011-05-20 14:10:22 UTC
> 2. What happens if you power off the enclosure, wait for it to be detached
> and
> then power it up again?  Does it get detected properly? 

That was a surprise, it does NOT get detected properly. The oldest enclosure is used on a wdtvl, and in the past certainly was able to do that. (I did assume that powering on an enclosure, then off then on again worked but it seems to not to work any more).

> ie. Is exchanging to the other enclosure necessary to trigger the reset
> failure?

That is what i was thinking until today. In the past it was necessary, now it seems not. I will try with older kernels.

> 1. Right after EH complete is printed, if you issue manual rescan while
> leaving
> the device attached, does it get detected properly?

no, it does not. i already tested it but i repeated the test with the patched kernel, just in case. the rescan must be executed without drives.
Comment 12 Reartes Guillermo 2011-05-20 18:14:30 UTC
Created attachment 58802 [details]
new test with parameter libata.force=5:dump_id

Tried with slackwares's 2.6.33.4 and Fedora 13's 2.6.34.9 and:
 * encA -> on > off > on >>>>>>>>>> fails.
 * encA -> on > off > encB > on >>> fails.
 * manual port rescan is always needed when changing drives or power cycle the current one.

Repeated tests with patched stock 2.6.38.5 and the following kernel parameter:
 libata.force=5:dump_id

This parameter is not documented or absent in 2.6.33/34 and present in 2.6.37/38, it prints the 'identify data'. Output may be useful.

attached logfiles, order:

dmesg-t8.pre.out
dmesg-t8-enc2_on.out
dmesg-t8-enc2_off.out
dmesg-t8-enc2_on_off.out
dmesg-t8-enc2_rescan_empty.out
dmesg-t8-enc2_on_after_rescan.out
dmesg-t8-enc1_on_after_rescan.out
dmesg-t8-enc1_on_after_rescan_II.out
dmesg-t8-enc1_off_rescan.o
Comment 13 Tejun Heo 2011-05-23 12:07:34 UTC
Created attachment 59082 [details]
thaw-reset-failed-ports.patch

Can you please try this patch?  This might not remove the problem itself but you should be able to trigger retry by simply disconnecting and replugging the enclosure.  Please let me know how it works.  Also, can you please enable CONFIG_PRINTK_TIME?

Thanks.
Comment 14 Reartes Guillermo 2011-05-24 15:01:45 UTC
Created attachment 59172 [details]
2.6.38.5 with thaw-reset-failed-ports.patch

> This might not remove the problem itself but
> you should be able to trigger retry by simply disconnecting and replugging
> the
> enclosure. 

It seems you are right, with the last patch, now i get:

ON > OFF > ON (FAIL) > OFF > ON (GOOD)

Previously, if you power of the enclosure in failed state (on [fail] state) there were no messages, now i get:

[  381.605365] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
[  381.605367] ata5: irq_stat 0x00400000, PHY RDY changed
[  381.605370] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
[  381.605375] ata5: hard resetting link
[  382.328032] ata5: SATA link down (SStatus 10 SControl 300)
[  382.328041] ata5: EH complete  

The first, third, seventh... time it will work.
The second, fourth, sixth... time it wont.

---

* As for the .config i enabled printk timing and increased dmesg log size a bit, just in case.

* in the attached messages file there is a syslog restart, because i changed it to put debug messages in the messages file.

----

do i try the latest 2.6.39 ?
Comment 15 Tejun Heo 2011-05-24 15:14:33 UTC
I don't think 2.6.39 would change anything and with timestamp I think it should be enough to attach single dmesg output after everything took place.

It's quite interesting and I'm wondering whether this is caused by the controller or enclosure.  Can you please try to take the hard drive out of the enclosure and connect directly to the SATA port and try hotplugging?  Plugging and unplugging both power and SATA signal connectors while the system is hot should be safe - the internal ones are designed to be hotplug safe too.  Can you still see the interleaved success/fail pattern that way?

Thanks.
Comment 16 Reartes Guillermo 2011-05-24 17:01:19 UTC
> It's quite interesting and I'm wondering whether this is caused by the
> controller or enclosure.

I already had some problems with it in the past: 
https://bugzilla.redhat.com --> Bug 611350  - ATA frozen , random pauses, every disk (workaround: pcie_aspm=off)

> Can you please try to take the hard drive out of the
> enclosure and connect directly to the SATA port and try hotplugging?

I will try it, but it could take a couple of days since i have to open the case...

----

I did a quick test with one enclosure.

I replaced one WD20EARS with a WD5000AAKS and later with a WD1200JS.

Only the WD20EARS 'triggers' the issue. And the 'issue' seems to be related to the WD20EARS being powered down, whatever that comes next to this event, hits the invisible wall.

* WD1200JS (It is a SATA 1.5Gbps NON-NCQ 120GB drive).
* WD5000AAKS (It is a SATA 3Gbps NCQ 500gb drive).

input: CHESEN USB Keyboard as /devices/pci0000:00/0000:00:04.0/usb4/4-3/4-3:1.1/input/input3
generic-usb 0003:0A81:0101.0002: input,hidraw1: USB HID v1.10 Device [CHESEN USB Keyboard] on usb-0000:00:04.0-3/input1
input: Genius Optical Mouse as /devices/pci0000:00/0000:00:02.0/usb3/3-3/3-3:1.0/input/input4
generic-usb 0003:0458:003A.0003: input,hidraw2: USB HID v1.10 Mouse [Genius Optical Mouse] on usb-0000:00:02.0-3/input0
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
forcedeth: Reverse Engineered nForce ethernet driver. Version 0.64.
ACPI: PCI Interrupt Link [LMAC] enabled at IRQ 22
forcedeth 0000:00:0a.0: PCI INT A -> Link[LMAC] -> GSI 22 (level, low) -> IRQ 22
forcedeth 0000:00:0a.0: setting latency timer to 64
wmi: Mapper loaded
forcedeth 0000:00:0a.0: ifname eth1, PHY OUI 0x732 @ 3, addr 00:24:8c:38:2b:0c
forcedeth 0000:00:0a.0: highdma csum pwrctl mgmt gbit lnktim msi desc-v3
udev: renamed network interface eth1 to eth2
ACPI: PCI Interrupt Link [LN0B] enabled at IRQ 19
HDA Intel 0000:02:00.1: PCI INT B -> Link[LN0B] -> GSI 19 (level, low) -> IRQ 19
HDA Intel 0000:02:00.1: irq 41 for MSI/MSI-X
HDA Intel 0000:02:00.1: setting latency timer to 64
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 18
CA0106 0000:01:09.0: PCI INT A -> Link[LNKB] -> GSI 18 (level, low) -> IRQ 18
snd-ca0106: Model 1006 Rev 00000000 Serial 10061102
fuse init (API version 7.16)
ppdev: user-space parallel port driver
lp: driver loaded but no devices found
Linux agpgart interface v0.103
NET: Registered protocol family 10
ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
ata5: irq_stat 0x00000040, connection status changed
ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
ata5: hard resetting link
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata5.00: ATA-8: WDC WD5000AAKS-00A7B2, 01.03B01, max UDMA/133
ata5.00: 976773168 sectors, multi 0: LBA48 
ata5.00: configured for UDMA/133
ata5: EH complete
scsi 4:0:0:0: Direct-Access     ATA      WDC WD5000AAKS-0 01.0 PQ: 0 ANSI: 5
sd 4:0:0:0: Attached scsi generic sg3 type 0
sd 4:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/465 GiB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 >
sd 4:0:0:0: [sdc] Attached SCSI disk
ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
ata5: irq_stat 0x00400000, PHY RDY changed
ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 300)
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 300)
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5.00: disabled
ata5: limiting SATA link speed to 1.5 Gbps
ata5: EH complete
ata5.00: detaching (SCSI 4:0:0:0)
sd 4:0:0:0: [sdc] Synchronizing SCSI cache
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
sd 4:0:0:0: [sdc] Stopping disk
sd 4:0:0:0: [sdc] START_STOP FAILED
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
ata5: irq_stat 0x00000040, connection status changed
ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting link
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: ATA-8: WDC WD5000AAKS-00A7B2, 01.03B01, max UDMA/133
ata5.00: 976773168 sectors, multi 0: LBA48 
ata5.00: configured for UDMA/133
ata5: EH complete
scsi 4:0:0:0: Direct-Access     ATA      WDC WD5000AAKS-0 01.0 PQ: 0 ANSI: 5
sd 4:0:0:0: Attached scsi generic sg3 type 0
sd 4:0:0:0: [sdc] 976773168 512-byte logical blocks: (500 GB/465 GiB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 >
sd 4:0:0:0: [sdc] Attached SCSI disk
ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
ata5: irq_stat 0x00400000, PHY RDY changed
ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5.00: disabled
ata5: limiting SATA link speed to 1.5 Gbps
ata5: EH complete
ata5.00: detaching (SCSI 4:0:0:0)
sd 4:0:0:0: [sdc] Synchronizing SCSI cache
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
sd 4:0:0:0: [sdc] Stopping disk
sd 4:0:0:0: [sdc] START_STOP FAILED
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
ata5: irq_stat 0x00000040, connection status changed
ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
ata5: hard resetting link
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133
ata5.00: 234441648 sectors, multi 0: LBA48 
ata5.00: configured for UDMA/133
ata5: EH complete
scsi 4:0:0:0: Direct-Access     ATA      WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5
sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB)
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 4:0:0:0: Attached scsi generic sg3 type 0
 sdc: sdc1
sd 4:0:0:0: [sdc] Attached SCSI disk
ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
ata5: irq_stat 0x00400000, PHY RDY changed
ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 300)
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 300)
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5.00: disabled
ata5: limiting SATA link speed to 1.5 Gbps
ata5: EH complete
ata5.00: detaching (SCSI 4:0:0:0)
sd 4:0:0:0: [sdc] Synchronizing SCSI cache
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
sd 4:0:0:0: [sdc] Stopping disk
sd 4:0:0:0: [sdc] START_STOP FAILED
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
ata5: irq_stat 0x00000040, connection status changed
ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting link
ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133
ata5.00: 234441648 sectors, multi 0: LBA48 
ata5.00: configured for UDMA/133
ata5: EH complete
scsi 4:0:0:0: Direct-Access     ATA      WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5
sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB)
sd 4:0:0:0: Attached scsi generic sg3 type 0
sd 4:0:0:0: [sdc] Write Protect is off
sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdc: sdc1
sd 4:0:0:0: [sdc] Attached SCSI disk
ata5: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
ata5: irq_stat 0x00400000, PHY RDY changed
ata5: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5: hard resetting link
ata5: SATA link down (SStatus 0 SControl 310)
ata5.00: disabled
ata5: limiting SATA link speed to 1.5 Gbps
ata5: EH complete
ata5.00: detaching (SCSI 4:0:0:0)
sd 4:0:0:0: [sdc] Synchronizing SCSI cache
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
sd 4:0:0:0: [sdc] Stopping disk
sd 4:0:0:0: [sdc] START_STOP FAILED
sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
ata5: irq_stat 0x00000040, connection status changed
ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting link
ata5: COMRESET failed (errno=-16)
ata5: reset failed, giving up
ata5: EH complete


This is the stock kernel, and what was done is:

WD5000AAKS on, off, on, off 
WD1200JS on, off, on, off
WD20EARS on (fail), off, on (fail) [no messages were shown] 
WD1200JS on (fail), off  [no messages were shown]

Since it is the same kernel from the other time, printk timing is disabled i wil compile it later for testing.
Comment 17 Reartes Guillermo 2011-05-24 17:02:58 UTC
> Only the WD20EARS 'triggers' the issue. And the 'issue' seems to be related
> to
> the WD20EARS being powered down, whatever that comes next to this event, hits
> the invisible wall.

Please ignore that paragraph.
Comment 18 Reartes Guillermo 2011-05-24 18:42:02 UTC
I recompiled stock with printk timing enabled.

In the example below, i powered on the WD1200JS, then on again.
I noticed that the WD1200JS comes in 1.5Gbps ... instead of 3Gbps...
I jumpered the other enclosure's WD20EARS to 1.5Gbps and when i power it
on it come up with that link speed...

May it be that:
* The first time anything is plugged in in a port, the controller config it ok.
* the second time it is unable to pass the "3Gbps stage" (i am not a programer) probing and get's stuck. Maybe the WD20EARS cannot default to 1.5Gbps while the others can

---
Test's dmesg (stock):

[    5.458271] ACPI: Power Button [PWRF]
[    5.465011] usb 3-3: new low speed USB device using ohci_hcd and address 2
[    5.470183] ACPI: acpi_idle registered with cpuidle
[    5.495846] forcedeth 0000:00:0a.0: ifname eth0, PHY OUI 0x732 @ 3, addr 00:24:8c:38:2b:0c
[    5.495912] forcedeth 0000:00:0a.0: highdma csum pwrctl mgmt gbit lnktim msi desc-v3
[    5.496023] i2c i2c-0: nForce2 SMBus adapter at 0x600
[    5.496070] ACPI: resource nForce2_smbus [io  0x0700-0x073f] conflicts with ACPI region SM00 [io 0x700-0x73f]
[    5.496133] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    5.660013] usb 3-3: New USB device found, idVendor=0458, idProduct=003a
[    5.660060] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    5.660104] usb 3-3: Product: Optical Mouse
[    5.660146] usb 3-3: Manufacturer: Genius
[    5.665728] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 19
[    5.665775] CA0106 0000:01:09.0: PCI INT A -> Link[LNKB] -> GSI 19 (level, low) -> IRQ 19
[    5.665860] snd-ca0106: Model 1006 Rev 00000000 Serial 10061102
[    5.673158] input: Genius Optical Mouse as /devices/pci0000:00/0000:00:02.0/usb3/3-3/3-3:1.0/input/input4
[    5.673304] generic-usb 0003:0458:003A.0003: input,hidraw2: USB HID v1.10 Mouse [Genius Optical Mouse] on usb-0000:00:02.0-3/input0
[    5.687225] sundance.c:v1.2 11-Sep-2006 Written by Donald Becker
[    5.687460] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 18
[    5.687506] sundance 0000:01:08.0: PCI INT A -> Link[LNKA] -> GSI 18 (level, low) -> IRQ 18
[    5.687950] eth1: IC Plus Corporation IP100A FAST Ethernet Adapter at 000000000001dc00, 00:06:4f:5a:dd:a8, IRQ 18.
[    5.688475] eth1: MII PHY found at address 0, status 0x786d advertising 01e1.
[    6.264107] udev: renamed network interface eth0 to eth1
[    6.269061] udev: renamed network interface eth1_rename to eth0
[    6.456110] fuse init (API version 7.16)
[    7.179301] ppdev: user-space parallel port driver
[    7.188039] lp: driver loaded but no devices found
[    7.255209] Linux agpgart interface v0.103
[   13.803413] forcedeth 0000:00:0a.0: irq 42 for MSI/MSI-X
[   13.803605] forcedeth 0000:00:0a.0: eth1: no link during initialization
[   32.650457] NET: Registered protocol family 10
[   32.650786] ADDRCONF(NETDEV_UP): eth1: link is not ready
[  102.784010] eth0: Link up
[  102.784010] eth0: Link changed: 100Mbps, full duplex
[  113.378028] eth0: no IPv6 routers present
[  249.212071] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
[  249.212074] ata5: irq_stat 0x00000040, connection status changed
[  249.212076] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
[  249.212082] ata5: hard resetting link
[  249.935026] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[  249.935728] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133
[  249.935731] ata5.00: 234441648 sectors, multi 0: LBA48 
[  249.936483] ata5.00: configured for UDMA/133
[  249.936487] ata5: EH complete
[  249.936560] scsi 4:0:0:0: Direct-Access     ATA      WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5
[  249.936739] sd 4:0:0:0: Attached scsi generic sg3 type 0
[  249.936806] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB)
[  249.936856] sd 4:0:0:0: [sdc] Write Protect is off
[  249.936858] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[  249.936872] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  249.942388]  sdc: sdc1
[  249.942539] sd 4:0:0:0: [sdc] Attached SCSI disk
[  257.908844]  md127: unknown partition table
[ 1800.643750] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
[ 1800.643753] ata5: irq_stat 0x00400000, PHY RDY changed
[ 1800.643756] ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
[ 1800.643760] ata5: hard resetting link
[ 1801.366025] ata5: SATA link down (SStatus 0 SControl 300)
[ 1806.366068] ata5: hard resetting link
[ 1806.671023] ata5: SATA link down (SStatus 0 SControl 300)
[ 1806.671027] ata5: limiting SATA link speed to 1.5 Gbps
[ 1811.671041] ata5: hard resetting link
[ 1811.976031] ata5: SATA link down (SStatus 0 SControl 310)
[ 1811.976037] ata5.00: disabled
[ 1811.976046] ata5: EH complete
[ 1811.976054] ata5.00: detaching (SCSI 4:0:0:0)
[ 1811.979104] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
[ 1811.979131] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 1811.979134] sd 4:0:0:0: [sdc] Stopping disk
[ 1811.979140] sd 4:0:0:0: [sdc] START_STOP FAILED
[ 1811.979141] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 1837.863267] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
[ 1837.863270] ata5: irq_stat 0x00000040, connection status changed
[ 1837.863273] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
[ 1837.863278] ata5: limiting SATA link speed to 1.5 Gbps
[ 1837.863280] ata5: hard resetting link
[ 1838.586032] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1838.586760] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133
[ 1838.586762] ata5.00: 234441648 sectors, multi 0: LBA48 
[ 1838.587523] ata5.00: configured for UDMA/133
[ 1838.587530] ata5: EH complete
[ 1838.587620] scsi 4:0:0:0: Direct-Access     ATA      WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5
[ 1838.587782] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB)
[ 1838.587814] sd 4:0:0:0: [sdc] Write Protect is off
[ 1838.587817] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 1838.587830] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1838.587848] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 1838.594879]  sdc: sdc1
[ 1838.595048] sd 4:0:0:0: [sdc] Attached SCSI disk
[ 1869.230327] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1910000 action 0xe frozen
[ 1869.230330] ata5: irq_stat 0x00400000, PHY RDY changed
[ 1869.230333] ata5: SError: { PHYRdyChg Dispar LinkSeq TrStaTrns }
[ 1869.230337] ata5: hard resetting link
[ 1869.953031] ata5: SATA link down (SStatus 0 SControl 310)
[ 1874.953038] ata5: hard resetting link
[ 1875.258032] ata5: SATA link down (SStatus 0 SControl 310)
[ 1880.258169] ata5: hard resetting link
[ 1880.563033] ata5: SATA link down (SStatus 0 SControl 310)
[ 1880.563039] ata5.00: disabled
[ 1880.563043] ata5: limiting SATA link speed to 1.5 Gbps
[ 1880.563051] ata5: EH complete
[ 1880.563060] ata5.00: detaching (SCSI 4:0:0:0)
[ 1880.566103] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
[ 1880.566132] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 1880.566135] sd 4:0:0:0: [sdc] Stopping disk
[ 1880.566140] sd 4:0:0:0: [sdc] START_STOP FAILED
[ 1880.566141] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 1894.012105] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
[ 1894.012107] ata5: irq_stat 0x00000040, connection status changed
[ 1894.012110] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
[ 1894.012115] ata5: limiting SATA link speed to 1.5 Gbps
[ 1894.012118] ata5: hard resetting link
[ 1894.735030] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1894.735754] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133
[ 1894.735757] ata5.00: 234441648 sectors, multi 0: LBA48 
[ 1894.736511] ata5.00: configured for UDMA/133
[ 1894.736517] ata5: EH complete
[ 1894.736595] scsi 4:0:0:0: Direct-Access     ATA      WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5
[ 1894.736743] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB)
[ 1894.736775] sd 4:0:0:0: [sdc] Write Protect is off
[ 1894.736777] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 1894.736791] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1894.736818] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 1894.743510]  sdc: sdc1
[ 1894.743664] sd 4:0:0:0: [sdc] Attached SCSI disk
[ 2143.322966] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
[ 2143.322968] ata5: irq_stat 0x00400000, PHY RDY changed
[ 2143.322971] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
[ 2143.322975] ata5: hard resetting link
[ 2144.045033] ata5: SATA link down (SStatus 0 SControl 310)
[ 2149.045040] ata5: hard resetting link
[ 2149.350032] ata5: SATA link down (SStatus 0 SControl 310)
[ 2154.350027] ata5: hard resetting link
[ 2154.655024] ata5: SATA link down (SStatus 0 SControl 310)
[ 2154.655030] ata5.00: disabled
[ 2154.655041] ata5: EH complete
[ 2154.655049] ata5.00: detaching (SCSI 4:0:0:0)
[ 2154.664101] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
[ 2154.664353] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 2154.664356] sd 4:0:0:0: [sdc] Stopping disk
[ 2154.664361] sd 4:0:0:0: [sdc] START_STOP FAILED
[ 2154.664362] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 2482.520422] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5940000 action 0xe frozen
[ 2482.520425] ata5: irq_stat 0x00000040, connection status changed
[ 2482.520428] ata5: SError: { CommWake Dispar LinkSeq TrStaTrns DevExch }
[ 2482.520433] ata5: hard resetting link
[ 2483.243032] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2483.261208] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
[ 2483.261212] ata5.00: 3907029168 sectors, multi 0: LBA48 
[ 2483.264916] ata5.00: configured for UDMA/133
[ 2483.264923] ata5: EH complete
[ 2483.265035] scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5
[ 2483.265212] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 2483.265231] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 2483.265285] sd 4:0:0:0: [sdc] Write Protect is off
[ 2483.265287] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 2483.265310] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2483.278936]  sdc: sdc1
[ 2483.279541] sd 4:0:0:0: [sdc] Attached SCSI disk
[ 2498.115437] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
[ 2498.115440] ata5: irq_stat 0x00400000, PHY RDY changed
[ 2498.115442] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
[ 2498.115446] ata5: hard resetting link
[ 2498.838031] ata5: SATA link down (SStatus 0 SControl 300)
[ 2503.838046] ata5: hard resetting link
[ 2504.143032] ata5: SATA link down (SStatus 0 SControl 300)
[ 2504.143038] ata5: limiting SATA link speed to 1.5 Gbps
[ 2509.143048] ata5: hard resetting link
[ 2509.448031] ata5: SATA link down (SStatus 0 SControl 310)
[ 2509.448037] ata5.00: disabled
[ 2509.448042] ata5: limiting SATA link speed to 1.5 Gbps
[ 2509.448050] ata5: EH complete
[ 2509.448058] ata5.00: detaching (SCSI 4:0:0:0)
[ 2509.451100] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
[ 2509.451130] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 2509.451133] sd 4:0:0:0: [sdc] Stopping disk
[ 2509.451139] sd 4:0:0:0: [sdc] START_STOP FAILED
[ 2509.451140] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 2528.851233] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
[ 2528.851236] ata5: irq_stat 0x00000040, connection status changed
[ 2528.851239] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
[ 2528.851244] ata5: limiting SATA link speed to 1.5 Gbps
[ 2528.851247] ata5: hard resetting link
[ 2529.574031] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2529.587720] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
[ 2529.587723] ata5.00: 3907029168 sectors, multi 0: LBA48 
[ 2529.591414] ata5.00: configured for UDMA/133
[ 2529.591420] ata5: EH complete
[ 2529.591515] scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5
[ 2529.591684] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 2529.591750] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 2529.591782] sd 4:0:0:0: [sdc] Write Protect is off
[ 2529.591784] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 2529.591798] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2529.605472]  sdc: sdc1
[ 2529.605676] sd 4:0:0:0: [sdc] Attached SCSI disk
[ 2565.891518] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
[ 2565.891521] ata5: irq_stat 0x00400000, PHY RDY changed
[ 2565.891523] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
[ 2565.891527] ata5: hard resetting link
[ 2566.614034] ata5: SATA link down (SStatus 0 SControl 310)
[ 2571.614070] ata5: hard resetting link
[ 2571.919032] ata5: SATA link down (SStatus 0 SControl 310)
[ 2576.919039] ata5: hard resetting link
[ 2577.224030] ata5: SATA link down (SStatus 0 SControl 310)
[ 2577.224037] ata5.00: disabled
[ 2577.224041] ata5: limiting SATA link speed to 1.5 Gbps
[ 2577.224049] ata5: EH complete
[ 2577.224057] ata5.00: detaching (SCSI 4:0:0:0)
[ 2577.228101] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
[ 2577.228133] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 2577.228136] sd 4:0:0:0: [sdc] Stopping disk
[ 2577.228141] sd 4:0:0:0: [sdc] START_STOP FAILED
[ 2577.228142] sd 4:0:0:0: [sdc]  Result: hostbyte=0x04 driverbyte=0x00
[ 2598.347557] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5940000 action 0xe frozen
[ 2598.347560] ata5: irq_stat 0x00000040, connection status changed
[ 2598.347563] ata5: SError: { CommWake Dispar LinkSeq TrStaTrns DevExch }
[ 2598.347567] ata5: limiting SATA link speed to 1.5 Gbps
[ 2598.347570] ata5: hard resetting link
[ 2599.070033] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2599.088528] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133
[ 2599.088532] ata5.00: 3907029168 sectors, multi 0: LBA48 
[ 2599.092230] ata5.00: configured for UDMA/133
[ 2599.092237] ata5: EH complete
[ 2599.092324] scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5
[ 2599.092499] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 2599.092651] sd 4:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[ 2599.092685] sd 4:0:0:0: [sdc] Write Protect is off
[ 2599.092687] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 2599.092701] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2599.106280]  sdc: sdc1
[ 2599.106474] sd 4:0:0:0: [sdc] Attached SCSI disk
Comment 19 Reartes Guillermo 2011-05-24 19:16:35 UTC
Correction: 
 * The WD1200JS is actually a SATA 3Gbps drive. (Non-NCQ).
Comment 20 Reartes Guillermo 2011-05-25 02:20:04 UTC
Issue:

SATA Hot-Swap either Fails or Partially Succeeds (non-optimal), based on
hardware combination.

* Results in non-optimal drive setup for INTEL for all tested disks
* Results in non-functional port after hot-swap for NVIDIA for certain
  disks AND non-optimal disk setup for the rest of the working tested disks.

WORKAROUND:

* Manually rescan the port, but no disks must be powered on that port 
  (EMPTY PORT).

# echo "- - - " > /sys/class/scsi_host/host4/scan


Affected Kernels: 

* F13 / F14 / F15  (2.6.33*, 2.6.34*)
* Sysrescuecd (sysresccd 2.6.35-std211-i586)
* Slackware64 13.1 (2.6.33*)

* Stock 2.6.38.5 downloaded & self-compiled from kernel.org 
  (with userland Slackware64 13.1)

Affected SATA Controllers:

* RAID bus controller [0104]: Intel Corporation Mobile 82801 SATA RAID Controller [8086:282a] (rev 05)
* SATA controller [0106]: nVidia Corporation MCP78S [GeForce 8200] AHCI Controller [10de:0ad4] (rev a2)

Hardware:

* For NVIDIA: Asus M4N72-E
* For INTEL: Dell Lattitude 4310

Explanation:

[1] The first time an enclosure is powered on, it is recognized correctly
    and configuted correctly.
    Drives that support 3Gbps speeds are configured correctly.

[1] INTEL (enclosure1 power-on):

May 15 13:39:14 sysresccd kernel: [ 1258.830169] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen              
May 15 13:39:14 sysresccd kernel: [ 1258.830173] ata5: irq_stat 0x00400040, connection status changed                              
May 15 13:39:14 sysresccd kernel: [ 1258.830178] ata5: SError: { PHYRdyChg CommWake DevExch }                                      
May 15 13:39:14 sysresccd kernel: [ 1258.830190] ata5: hard resetting link                                                         
May 15 13:39:14 sysresccd kernel: [ 1259.552436] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)                            
May 15 13:39:14 sysresccd kernel: [ 1259.565598] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 50.0AB50, max UDMA/133                      
May 15 13:39:14 sysresccd kernel: [ 1259.565603] ata5.00: 3907029168 sectors, multi 0: LBA48                                       
May 15 13:39:14 sysresccd kernel: [ 1259.569585] ata5.00: configured for UDMA/133                                                  
May 15 13:39:14 sysresccd kernel: [ 1259.569594] ata5: EH complete                                                                 
May 15 13:39:14 sysresccd kernel: [ 1259.569781] scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 50.0 PQ: 0 ANSI: 5      
May 15 13:39:14 sysresccd kernel: [ 1259.570053] sd 4:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)          
May 15 13:39:14 sysresccd kernel: [ 1259.570065] sd 4:0:0:0: Attached scsi generic sg2 type 0                                      
May 15 13:39:14 sysresccd kernel: [ 1259.570126] sd 4:0:0:0: [sdb] Write Protect is off                                            
May 15 13:39:14 sysresccd kernel: [ 1259.570129] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00                                         
May 15 13:39:14 sysresccd kernel: [ 1259.570158] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA                                                                                                                              
May 15 13:39:14 sysresccd kernel: [ 1259.570313]  sdb: sdb1                                                                        
May 15 13:39:14 sysresccd kernel: [ 1259.583671] sd 4:0:0:0: [sdb] Attached SCSI disk

[1] NVIDIA (enclosure1 power-on): 

May 24 21:56:26 sysresccd kernel: [  193.399937] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
May 24 21:56:26 sysresccd kernel: [  193.399940] ata5: irq_stat 0x00000040, connection status changed
May 24 21:56:26 sysresccd kernel: [  193.399943] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
May 24 21:56:26 sysresccd kernel: [  193.399949] ata5: hard resetting link
May 24 21:56:26 sysresccd kernel: [  194.122024] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
May 24 21:56:26 sysresccd kernel: [  194.122725] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133
May 24 21:56:26 sysresccd kernel: [  194.122728] ata5.00: 234441648 sectors, multi 0: LBA48 
May 24 21:56:26 sysresccd kernel: [  194.123505] ata5.00: configured for UDMA/133
May 24 21:56:26 sysresccd kernel: [  194.123512] ata5: EH complete
May 24 21:56:26 sysresccd kernel: [  194.123607] scsi 4:0:0:0: Direct-Access     ATA      WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5
May 24 21:56:26 sysresccd kernel: [  194.123746] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB)
May 24 21:56:26 sysresccd kernel: [  194.123751] sd 4:0:0:0: Attached scsi generic sg3 type 0
May 24 21:56:26 sysresccd kernel: [  194.123778] sd 4:0:0:0: [sdc] Write Protect is off
May 24 21:56:26 sysresccd kernel: [  194.123781] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
May 24 21:56:26 sysresccd kernel: [  194.123795] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 24 21:56:26 sysresccd kernel: [  194.123941]  sdc: sdc1
May 24 21:56:26 sysresccd kernel: [  194.128271] sd 4:0:0:0: [sdc] Attached SCSI disk


[2] That enclosure (now named enclosure1) is powered-off:

[2] INTEL (enclosure1 powered-off):

May 15 13:40:04 sysresccd kernel: [ 1309.637238] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen              
May 15 13:40:04 sysresccd kernel: [ 1309.637242] ata5: irq_stat 0x00400040, connection status changed                              
May 15 13:40:04 sysresccd kernel: [ 1309.637248] ata5: SError: { PHYRdyChg 10B8B DevExch }
May 15 13:40:04 sysresccd kernel: [ 1309.637257] ata5: hard resetting link
May 15 13:40:05 sysresccd kernel: [ 1310.360284] ata5: SATA link down (SStatus 0 SControl 300)
May 15 13:40:10 sysresccd kernel: [ 1315.360305] ata5: hard resetting link
May 15 13:40:10 sysresccd kernel: [ 1315.665470] ata5: SATA link down (SStatus 0 SControl 300)
May 15 13:40:10 sysresccd kernel: [ 1315.665483] ata5: limiting SATA link speed to 1.5 Gbps
May 15 13:40:15 sysresccd kernel: [ 1320.665357] ata5: hard resetting link
May 15 13:40:16 sysresccd kernel: [ 1320.970264] ata5: SATA link down (SStatus 0 SControl 310)
May 15 13:40:16 sysresccd kernel: [ 1320.970275] ata5.00: disabled
May 15 13:40:16 sysresccd kernel: [ 1320.970290] ata5: EH complete
May 15 13:40:16 sysresccd kernel: [ 1320.970302] ata5.00: detaching (SCSI 4:0:0:0)
May 15 13:40:16 sysresccd kernel: [ 1320.974419] sd 4:0:0:0: [sdb] Synchronizing SCSI cache
May 15 13:40:16 sysresccd kernel: [ 1320.974489] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 15 13:40:16 sysresccd kernel: [ 1320.974496] sd 4:0:0:0: [sdb] Stopping disk
May 15 13:40:16 sysresccd kernel: [ 1320.974515] sd 4:0:0:0: [sdb] START_STOP FAILED
May 15 13:40:16 sysresccd kernel: [ 1320.974519] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK 

[2] NVIDIA (enclosure1 powered-off):

May 24 21:56:50 sysresccd kernel: [  217.598434] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1990000 action 0xe frozen
May 24 21:56:50 sysresccd kernel: [  217.598437] ata5: irq_stat 0x00400000, PHY RDY changed
May 24 21:56:50 sysresccd kernel: [  217.598439] ata5: SError: { PHYRdyChg 10B8B Dispar LinkSeq TrStaTrns }
May 24 21:56:50 sysresccd kernel: [  217.598444] ata5: hard resetting link
May 24 21:56:51 sysresccd kernel: [  218.321024] ata5: SATA link down (SStatus 0 SControl 300)
May 24 21:56:56 sysresccd kernel: [  223.321017] ata5: hard resetting link
May 24 21:56:56 sysresccd kernel: [  223.626024] ata5: SATA link down (SStatus 0 SControl 300)
May 24 21:56:56 sysresccd kernel: [  223.626029] ata5: limiting SATA link speed to 1.5 Gbps
May 24 21:57:01 sysresccd kernel: [  228.626027] ata5: hard resetting link
May 24 21:57:01 sysresccd kernel: [  228.931024] ata5: SATA link down (SStatus 0 SControl 310)
May 24 21:57:01 sysresccd kernel: [  228.931028] ata5.00: disabled
May 24 21:57:01 sysresccd kernel: [  228.931043] ata5: EH complete
May 24 21:57:01 sysresccd kernel: [  228.931054] ata5.00: detaching (SCSI 4:0:0:0)
May 24 21:57:01 sysresccd kernel: [  228.935104] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
May 24 21:57:01 sysresccd kernel: [  228.935154] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
May 24 21:57:01 sysresccd kernel: [  228.935158] sd 4:0:0:0: [sdc] Stopping disk
May 24 21:57:01 sysresccd kernel: [  228.935165] sd 4:0:0:0: [sdc] START_STOP FAILED
May 24 21:57:01 sysresccd kernel: [  228.935174] sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

[3] Enlcosure2 is powered-on in the same port as enclosure1 (now removed).
    Please notice that both drives came up with lower link speed.
    Please also note that WD20EARS in intel worked at 1.5Gbps WITHOUT a 
    jumper and failed in nvidia UNLESS JUMPERED. (See previous posts)

[3] INTEL (Power-on enclosure2):

May 15 13:41:38 sysresccd kernel: [ 1403.306041] ata5: exception Emask 0x10 SAct 0x0 SErr 0x4050002 action 0xe frozen
May 15 13:41:38 sysresccd kernel: [ 1403.306045] ata5: irq_stat 0x00400040, connection status changed
May 15 13:41:38 sysresccd kernel: [ 1403.306050] ata5: SError: { RecovComm PHYRdyChg CommWake DevExch }
May 15 13:41:38 sysresccd kernel: [ 1403.306061] ata5: hard resetting link
May 15 13:41:39 sysresccd kernel: [ 1404.028172] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
May 15 13:41:39 sysresccd kernel: [ 1404.058974] ata5.00: ATA-8: WDC WD20EARS-00MVWB0, 51.0AB51, max UDMA/133
May 15 13:41:39 sysresccd kernel: [ 1404.058979] ata5.00: 3907029168 sectors, multi 0: LBA48 
May 15 13:41:39 sysresccd kernel: [ 1404.063161] ata5.00: configured for UDMA/133
May 15 13:41:39 sysresccd kernel: [ 1404.063170] ata5: EH complete
May 15 13:41:39 sysresccd kernel: [ 1404.063311] scsi 4:0:0:0: Direct-Access     ATA      WDC WD20EARS-00M 51.0 PQ: 0 ANSI: 5
May 15 13:41:39 sysresccd kernel: [ 1404.063513] sd 4:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
May 15 13:41:39 sysresccd kernel: [ 1404.063531] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 15 13:41:39 sysresccd kernel: [ 1404.063557] sd 4:0:0:0: [sdb] Write Protect is off
May 15 13:41:39 sysresccd kernel: [ 1404.063559] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
May 15 13:41:39 sysresccd kernel: [ 1404.063578] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 15 13:41:39 sysresccd kernel: [ 1404.063741]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7
May 15 13:41:39 sysresccd kernel: [ 1404.113474] sd 4:0:0:0: [sdb] Attached SCSI disk


[3] NVIDIA (Power-on enclosure2):

May 24 21:57:20 sysresccd kernel: [  247.520154] ata5: exception Emask 0x10 SAct 0x0 SErr 0x5840000 action 0xe frozen
May 24 21:57:20 sysresccd kernel: [  247.520156] ata5: irq_stat 0x00000040, connection status changed
May 24 21:57:20 sysresccd kernel: [  247.520159] ata5: SError: { CommWake LinkSeq TrStaTrns DevExch }
May 24 21:57:20 sysresccd kernel: [  247.520165] ata5: hard resetting link
May 24 21:57:21 sysresccd kernel: [  248.243024] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
May 24 21:57:21 sysresccd kernel: [  248.243729] ata5.00: ATA-7: WDC WD1200JS-00MHB0, 02.01C03, max UDMA/133
May 24 21:57:21 sysresccd kernel: [  248.243732] ata5.00: 234441648 sectors, multi 0: LBA48 
May 24 21:57:21 sysresccd kernel: [  248.244530] ata5.00: configured for UDMA/133
May 24 21:57:21 sysresccd kernel: [  248.244536] ata5: EH complete
May 24 21:57:21 sysresccd kernel: [  248.244624] scsi 4:0:0:0: Direct-Access     ATA      WDC WD1200JS-00M 02.0 PQ: 0 ANSI: 5
May 24 21:57:21 sysresccd kernel: [  248.244748] sd 4:0:0:0: Attached scsi generic sg3 type 0
May 24 21:57:21 sysresccd kernel: [  248.244771] sd 4:0:0:0: [sdc] 234441648 512-byte logical blocks: (120 GB/111 GiB)
May 24 21:57:21 sysresccd kernel: [  248.244805] sd 4:0:0:0: [sdc] Write Protect is off
May 24 21:57:21 sysresccd kernel: [  248.244808] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
May 24 21:57:21 sysresccd kernel: [  248.244822] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 24 21:57:21 sysresccd kernel: [  248.245445]  sdc: sdc1
May 24 21:57:21 sysresccd kernel: [  248.248935] sd 4:0:0:0: [sdc] Attached SCSI disk


Enclosure:

2 x NexStar CX (USB+eSATA)

Tested with these disks:

1 x WD20EARS-00MVWB0  (50.0AB50) 2.0TB 3Gbps NCQ
1 x WD20EARS-00MVWB0  (51.0AB51) 2.0TB 3Gbps NCQ
1 x WD1200JS-00MHB0   (02.01C03) 120GB 3Gbps N/A
1 x WD5000AAKS-00A7B2 (01.03B01) 500GB 3Gbps NCQ
Comment 21 Tejun Heo 2011-05-25 09:27:38 UTC
Looks more and more like the enclosure has some oddity in link negotiation.  Please take the enclosure out of the equation and see whether the problem persists.

Thanks.
Comment 22 Reartes Guillermo 2011-05-25 15:41:53 UTC
Created attachment 59412 [details]
manual hot-swap without any enclosure

I performed a disk hot-swap by hand. I put the disk onto  O'Reilly "BIND & DNS"
and put some DVD boxes and did the following:

For the WD20EARS:
 * I booted with only that drive attached.
   > The drive booted at 3Gbps.
 * Disconnected the data cable.
 * Reconnected the data cable.
   > The WD20EARS was configured at 1.5Gbps (the enclosure failed to do that
     with the nvidia controller but succeeded with the intel).
 * Disconnected the power cable.
 * Reconnected the power cable.
   > The WD20EARS was able to get configured at 3Gbps with some errors. (!)
 * Disconnected the power cable and the data cable.

Now with the WD5000AAKS drive:
 * Connected data and power cable.
   > The WD5000AAKS gets configured at 1.5Gbps.
 * Disconnected the power cable.
 * Performed a manual port rescan 
 * Reconnected the power cable.
   > The WD5000AAKS gets configured at 3Gbps.

---

I will try the patch.
Comment 23 Tejun Heo 2011-05-26 13:28:51 UTC
So, it's working as expected.  If the power is maintained to the drive, the drive seems to remember the last negotiated speed, nothing too strange there and the patch wouldn't make any difference.  So, the enclosure, whatever chipset it may be using, is the cause of the problem.

It is incomplete but with the patch just replugging the drive makes it work, so that might be the work around we'll have to accept for now.

Thanks.
Comment 24 Reartes Guillermo 2011-05-27 12:13:32 UTC
The correct workaround seems to jumper any drive with any enclosure to 1.5Gbps for the broader compatibility, just in case i plug the drive in any random computer.

> So, the enclosure, whatever chipset it may be using, is the cause of the 
> problem.

As for the enclosure chipset, i could not found which one it uses since it is not listed at the vantec site. (Model is NST-300SU-BK).

As for the behaviour of the nvidia controller, it is slighty better but not ok at all.

>> For the WD20EARS:
>> * I booted with only that drive attached.
>>   > The drive booted at 3Gbps.
>> * Disconnected the data cable.
>> * Reconnected the data cable.
>>   > The WD20EARS was configured at 1.5Gbps (the enclosure failed to do that
>>     with the nvidia controller but succeeded with the intel).
Granted, as i said, the enclosure adds noise and worsens the problem.
But the drive should have used 3Gbps optimally and there is no enclosure here.

>> * Disconnected the power cable.
>> * Reconnected the power cable.
>>   > The WD20EARS was able to get configured at 3Gbps with some errors. (!)
>> * Disconnected the power cable and the data cable.
I only tried 2 times, i may need to try more to see how really lucky i was.

>> Now with the WD5000AAKS drive:
>> * Connected data and power cable.
>>   > The WD5000AAKS gets configured at 1.5Gbps.
Again, this drive must have used 3Gbps instead of 1.5Gbps.

>> * Disconnected the power cable.
>> * Performed a manual port rescan 
>> * Reconnected the power cable.
>>   > The WD5000AAKS gets configured at 3Gbps.
The manual port rescans fixes the problem, since no enclosure was used here it must be the controller (who knows, it may be m4n72-e specific or just bad luck picking this particular motherboard).

I will perform more specific test in the weekend. I will boot the sysrescue on this mobo and try all ports and all drives, now including the blacks main disks.
Also i will try with an older computer which uses sata_nv. And with and without the enclosure.

As for the intel controller, the behaviour is not ok also but since i can only test stuff with the enclosure, a third(sata_nv) or fourth(have to find one) controller test is needed.

Rigth now i am compiling a bugreport list from major distros for analysis, it will take some time to me to read them all.

I will also test to wich extent does this particular enclosure handles the other dma/udma modes, and i expect it to not support PIO but i will test it because these are part of the SATA EH routines, if i am not wrong.
Comment 25 Florian Mickler 2011-08-08 08:25:30 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc1:

commit 8ea7645c5a949f9d0ea86edc0778713b5e63ab74
Author: Tejun Heo <tj@kernel.org>
Date:   Wed May 25 12:23:56 2011 +0200

    libata: leave port thawed after reset failure