Bug 9309

Summary: Drive Seagate ST380011AS needs to be blacklisted
Product: IO/Storage Reporter: Hans-Joachim Baader (Hans-Joachim.Baader)
Component: Serial ATA Assignee: Tejun Heo (htejun)
Status: REJECTED UNREPRODUCIBLE    
Severity: normal CC: akpm, protasnb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.22.11 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Output of lspci -v
dmesg before patch
dmesg after patch

Description Hans-Joachim Baader 2007-11-05 10:31:19 UTC
Problem Description:
Kernel 2.6.22.x leaves occasional errors in the log:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata1.00: cmd 61/02:00:69:f6:36/00:00:00:00:00/40 tag 0 cdb 0x0 data 1024 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: softreset failed (port busy but CLO unavailable)
ata1: reset failed (errno=-95), retrying in 10 secs
ata1: hard resetting port
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: COMRESET failed (errno=-16)
ata1: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
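
For readers decoding the "cmd" line in the log above, here is a minimal sketch in C. It assumes the field order libata uses in its error report (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the ATA convention that NCQ commands carry the sector count in the feature fields and the tag in bits 7:3 of the count field. The struct and function names (tf_line, parse_tf, and so on) are hypothetical helpers for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Taskfile fields in the order they appear on a libata "cmd" line. */
struct tf_line {
    unsigned command, feature, nsect, lbal, lbam, lbah;
    unsigned hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned device;
};

/* Parse e.g. "61/02:00:69:f6:36/00:00:00:00:00/40". Returns 0 on success. */
static int parse_tf(const char *s, struct tf_line *tf)
{
    int n;

    assert(s != NULL && tf != NULL);
    n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
               &tf->command, &tf->feature, &tf->nsect,
               &tf->lbal, &tf->lbam, &tf->lbah,
               &tf->hob_feature, &tf->hob_nsect,
               &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
               &tf->device);
    return n == 12 ? 0 : -1;
}

/* 0x60 is READ FPDMA QUEUED, 0x61 is WRITE FPDMA QUEUED (the NCQ commands). */
static int tf_is_ncq(const struct tf_line *tf)
{
    return tf->command == 0x60 || tf->command == 0x61;
}

/* 48-bit LBA: the "hob" bytes supply the upper 24 bits. */
static uint64_t tf_lba48(const struct tf_line *tf)
{
    return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
           ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8)      |  (uint64_t)tf->lbal;
}

/* For NCQ commands the sector count is in feature/hob_feature; 0 means 65536. */
static unsigned tf_ncq_sectors(const struct tf_line *tf)
{
    unsigned n = (tf->hob_feature << 8) | tf->feature;
    return n ? n : 65536;
}

/* For NCQ commands the tag sits in bits 7:3 of the sector count field. */
static unsigned tf_ncq_tag(const struct tf_line *tf)
{
    return (tf->nsect >> 3) & 0x1f;
}
```

Applied to the line in this report, the sketch gives command 0x61 (an NCQ write), tag 0, 2 sectors (1024 bytes, consistent with "data 1024 out") and LBA 0x36f669. So the command that timed out was an NCQ command, which is presumably why the NCQ blacklist comes up below; it does not by itself show that NCQ caused the timeout.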

hdparm -i /dev/sda:

 Model=ST380011AS                              , FwRev=3.00    ,
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:

This is similar to other reports, so I suggest adding it to the blacklist:

        { "ST380011AS",         "3.00",         ATA_HORKAGE_NONCQ, },

(haven't checked yet if it works)
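
To illustrate what such an entry does, here is a simplified, self-contained sketch in C of a model/firmware blacklist lookup. It is not the libata implementation: the names blacklist_entry, lookup_horkage and HORKAGE_NONCQ, and the flag value, are assumptions for illustration, and it does plain exact string matching on already-trimmed ID strings (note that hdparm prints the model padded with spaces).

```c
#include <stddef.h>
#include <string.h>

/* Illustrative flag; the value is arbitrary and not the kernel's. */
#define HORKAGE_NONCQ (1u << 0)

struct blacklist_entry {
    const char *model;    /* exact model string */
    const char *fw_rev;   /* exact firmware revision, or NULL for any */
    unsigned    horkage;  /* quirk flags to apply on a match */
};

static const struct blacklist_entry blacklist[] = {
    { "Maxtor 6L250S0", "BANC1G10", HORKAGE_NONCQ },
    { "Maxtor 6B200M0", "BANC1B10", HORKAGE_NONCQ },
    { "ST380011AS",     "3.00",     HORKAGE_NONCQ }, /* entry proposed in this report */
};

/* Return the quirk flags for a drive, or 0 if it is not listed. */
static unsigned lookup_horkage(const char *model, const char *fw_rev)
{
    size_t i;

    for (i = 0; i < sizeof(blacklist) / sizeof(blacklist[0]); i++) {
        const struct blacklist_entry *e = &blacklist[i];

        if (strcmp(e->model, model) != 0)
            continue;
        if (e->fw_rev == NULL || strcmp(e->fw_rev, fw_rev) == 0)
            return e->horkage;
    }
    return 0;
}
```

With this table, lookup_horkage("ST380011AS", "3.00") returns a value with HORKAGE_NONCQ set, while the same model with a different firmware revision returns 0. A driver would then skip NCQ setup for any drive whose flags include HORKAGE_NONCQ.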

Regards,
hjb
Comment 1 Anonymous Emailer 2007-11-05 13:27:41 UTC
Reply-To: akpm@linux-foundation.org

On Mon,  5 Nov 2007 10:31:20 -0800 (PST)
bugme-daemon@bugzilla.kernel.org wrote:

> (haven't checked yet if it works)

When you have done so, please send a tested patch to

Jeff Garzik <jeff@garzik.org>
Andrew Morton <akpm@linux-foundation.org>
linux-ide@vger.kernel.org

thanks.
Comment 2 Jeff Garzik 2007-11-08 11:20:48 UTC
A timeout tells us nothing, sorta like this bug report :)  There is definitely -zero- information implying that this should be added to an NCQ blacklist.

Furthermore, on hardware like Intel ICH, a timeout is the only way we have to know that a DMA error occurred (it only sends an interrupt on success; assumes OS will notice failure via timeout).

It always recovers from the timeout, so I don't see any problem here.

Of course, a full dmesg and lspci might shed additional light.
Comment 3 Tejun Heo 2007-11-08 17:40:57 UTC
The driver is complaining about missing CLO support, so the controller seems to be a non-Intel variant of AHCI or, if it is an Intel one, a pretty early one.

Anyway, the most likely cause is a hardware issue.

* Does 'smartctl -a /dev/sdX' indicate any problem?

* Perform common hardware debugging: swap / reseat cables, connect the hard drive to a separate power supply, etc.
Comment 4 Hans-Joachim Baader 2007-11-09 10:31:54 UTC
We have several of these machines; only the two with the new ATA drivers cause problems. The others run a 2.6.13.5 kernel.

smartctl doesn't indicate problems.

Suggested patch:

--- drivers/ata/libata-core.c.orig      2007-10-22 11:34:23.000000000 +0200
+++ drivers/ata/libata-core.c   2007-11-09 19:05:31.000000000 +0100
@@ -3789,6 +3789,7 @@
        /* NCQ is broken */
        { "Maxtor 6L250S0",     "BANC1G10",     ATA_HORKAGE_NONCQ },
        { "Maxtor 6B200M0",     "BANC1B10",     ATA_HORKAGE_NONCQ },
+       { "ST380011AS",         "3.00",         ATA_HORKAGE_NONCQ },
        /* NCQ hard hangs device under heavier load, needs hard power cycle */
        { "Maxtor 6B250S0",     "BANC1B70",     ATA_HORKAGE_NONCQ },
        /* Blacklist entries taken from Silicon Image 3124/3132

I am attaching the lspci and dmesg output (before and after the patch).
Comment 5 Hans-Joachim Baader 2007-11-09 10:34:13 UTC
Created attachment 13482 [details]
Output of lspci -v
Comment 6 Hans-Joachim Baader 2007-11-09 10:35:23 UTC
Created attachment 13483 [details]
dmesg before patch
Comment 7 Hans-Joachim Baader 2007-11-09 10:36:12 UTC
Created attachment 13484 [details]
dmesg after patch
Comment 8 Tejun Heo 2007-11-12 06:20:31 UTC
I see, it's an ICH6.  The cause could be the ICH6 rather than the drive.  Can you please connect another NCQ-capable drive to the controller and see if the same problem occurs?  Or, even better, connect the ST380011AS to another NCQ-capable controller and see whether it works.

If ICH6 AHCI turns out to be the culprit, we'll need to turn off NCQ support for the controller, not the drive.

Thanks.
Comment 9 Natalie Protasevich 2008-02-08 02:27:42 UTC
Hans,
Any update on this? Have you been able to try what was recommended in comment #8?
Comment 10 Hans-Joachim Baader 2008-02-08 02:59:14 UTC
I'm sorry, I didn't have the right hardware combination for further tests. I'll try to find some next week.
Comment 11 Alan 2008-09-22 10:53:22 UTC
No activity for months, closing. Please re-open if you get time.