Problem Description: Kernel 2.6.22.x leaves occasional errors in the log:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x2 frozen
ata1.00: cmd 61/02:00:69:f6:36/00:00:00:00:00/40 tag 0 cdb 0x0 data 1024 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: soft resetting port
ata1: softreset failed (port busy but CLO unavailable)
ata1: reset failed (errno=-95), retrying in 10 secs
ata1: hard resetting port
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: COMRESET failed (errno=-16)
ata1: hard resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

hdparm -i /dev/sda:

 Model=ST380011AS , FwRev=3.00 ,
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156301488
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:

This is similar to other reports, so I suggest adding it to the blacklist:

 { "ST380011AS", "3.00", ATA_HORKAGE_NONCQ, },

(haven't checked yet if it works)

Regards,
hjb
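For readers unfamiliar with the "cmd" line in the log above, here is a minimal Python sketch that decodes it. It assumes the twelve hex bytes are printed in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and that for NCQ opcodes the sector count travels in the feature registers and the tag in bits 7:3 of the sector count register. The function name `decode_cmd` and the result keys are made up for illustration.

```python
import re

# Twelve hex bytes separated by '/' or ':' following the word "cmd".
# Field order is an assumption about libata's error report format.
TF_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

# ATA opcodes of the two NCQ data commands.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_cmd(line):
    """Decode opcode, LBA and (for NCQ commands) sector count and tag."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    regs = [int(b, 16) for b in re.split(r"[/:]", m.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = regs

    # 48-bit LBA: the "hob" (high order byte) registers carry bits 47..24.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)

    info = {
        "opcode": command,
        "name": NCQ_OPCODES.get(command, "not an NCQ data command"),
        "lba": lba,
    }
    if command in NCQ_OPCODES:
        info["sectors"] = hob_feature << 8 | feature  # count lives in feature
        info["tag"] = nsect >> 3                      # tag in bits 7:3
    return info


line = ("ata1.00: cmd 61/02:00:69:f6:36/00:00:00:00:00/40 "
        "tag 0 cdb 0x0 data 1024 out")
print(decode_cmd(line))
```

Under these assumptions the failed command is a two-sector NCQ write, which agrees with "data 1024 out" and "tag 0" on the same log line. That shows NCQ was in use when the timeout hit, but not that NCQ caused it.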
Reply-To: akpm@linux-foundation.org

On Mon, 5 Nov 2007 10:31:20 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> (haven't checked yet if it works)

When you have done so, please send a tested patch to

Jeff Garzik <jeff@garzik.org>
Andrew Morton <akpm@linux-foundation.org>
linux-ide@vger.kernel.org

thanks.
A timeout tells us nothing, sorta like this bug report :) There is definitely -zero- information implying that this should be added to an NCQ blacklist. Furthermore, on hardware like Intel ICH, a timeout is the only way we have to know that a DMA error occurred (it only sends an interrupt on success; assumes OS will notice failure via timeout). It always recovers from the timeout, so I don't see any problem here. Of course, a full dmesg and lspci might shed additional light.
The driver is complaining about missing CLO support, so the controller seems to be a non-Intel variant of AHCI or, if it is an Intel one, a pretty early one. Anyway, the most likely cause is a hardware issue.

* Does 'smartctl -a /dev/sdX' indicate any problem?
* Perform common hardware debugging: swap or reseat cables, connect the hard drive to a separate power supply, etc.
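To make the CLO remark concrete: whether an AHCI controller supports Command List Override is advertised in its HBA Capabilities (CAP) register, next to the NCQ capability bit. The sketch below decodes those two bits and names the two errno values from the log. The bit positions follow the AHCI specification (SCLO is bit 24, SNCQ is bit 30). The sample CAP value is hypothetical and is not taken from the reporter's machine.

```python
# Bit positions from the AHCI HBA Capabilities (CAP) register.
CAP_SCLO = 1 << 24  # Supports Command List Override
CAP_SNCQ = 1 << 30  # Supports Native Command Queuing

# Linux errno values that appear in the log (sign dropped).
ERRNO_NAMES = {16: "EBUSY", 95: "EOPNOTSUPP"}


def describe_cap(cap):
    """Report the two capability bits relevant to this thread."""
    return {"clo": bool(cap & CAP_SCLO), "ncq": bool(cap & CAP_SNCQ)}


def errno_name(value):
    """Map a (possibly negative) errno from the log to its symbolic name."""
    return ERRNO_NAMES.get(abs(value), "unknown")


# Hypothetical controller: NCQ capable, but without CLO.
example_cap = CAP_SNCQ
print(describe_cap(example_cap))
print(errno_name(-95), errno_name(-16))
```

Read this way, "reset failed (errno=-95)" is consistent with the soft reset giving up because the port was busy and no CLO was available to clear it, and the later "COMRESET failed (errno=-16)" with the device still reporting busy.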
We have several of these machines; only the two with the new ATA drivers cause problems. The others run a 2.6.13.5 kernel. smartctl doesn't indicate problems.

Suggested patch:

--- drivers/ata/libata-core.c.orig	2007-10-22 11:34:23.000000000 +0200
+++ drivers/ata/libata-core.c	2007-11-09 19:05:31.000000000 +0100
@@ -3789,6 +3789,7 @@
 	/* NCQ is broken */
 	{ "Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ },
 	{ "Maxtor 6B200M0", "BANC1B10", ATA_HORKAGE_NONCQ },
+	{ "ST380011AS", "3.00", ATA_HORKAGE_NONCQ },
 	/* NCQ hard hangs device under heavier load, needs hard power cycle */
 	{ "Maxtor 6B250S0", "BANC1B70", ATA_HORKAGE_NONCQ },
 	/* Blacklist entries taken from Silicon Image 3124/3132

I attach lspci and dmesg output (before and after the patch).
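As a rough illustration of what such a table entry does, here is a userspace Python model of a model/firmware blacklist lookup. It is a sketch under assumptions, not the kernel's code: it uses exact string comparison only (the kernel's matcher may behave differently, for example with wildcards), the flag value is illustrative, and `horkage_for` is a made-up name.

```python
# Illustrative flag value; the kernel defines ATA_HORKAGE_NONCQ in its headers.
ATA_HORKAGE_NONCQ = 1 << 2

# (model, firmware revision or None for "any revision", flags)
BLACKLIST = [
    ("Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ),
    ("Maxtor 6B200M0", "BANC1B10", ATA_HORKAGE_NONCQ),
    ("ST380011AS", "3.00", ATA_HORKAGE_NONCQ),  # entry proposed in the patch
    ("Maxtor 6B250S0", "BANC1B70", ATA_HORKAGE_NONCQ),
]


def horkage_for(model, firmware):
    """Return the quirk flags for a drive, or 0 if it is not listed."""
    # IDENTIFY strings are space padded (see the hdparm output), so trim them.
    model, firmware = model.strip(), firmware.strip()
    for bl_model, bl_rev, flags in BLACKLIST:
        if bl_model == model and (bl_rev is None or bl_rev == firmware):
            return flags
    return 0


flags = horkage_for("ST380011AS ", "3.00 ")
print("NCQ disabled" if flags & ATA_HORKAGE_NONCQ else "NCQ allowed")
```

Because the entry is keyed on the drive's model and firmware, it turns off NCQ for that drive on every controller. That is why it matters whether the drive or the controller is actually at fault.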
Created attachment 13482: Output of lspci -v
Created attachment 13483: dmesg before patch
Created attachment 13484: dmesg after patch
I see, it's an ICH6. The cause could be the ICH6 rather than the drive. Can you please connect another NCQ-capable drive to the controller and see if the same problem occurs? Or, even better, connect the ST380011AS to another NCQ-capable controller and see whether it works. If ICH6 AHCI turns out to be the culprit, we'll need to turn off NCQ support for the controller, not the drive. Thanks.
Hans, any update on this? Have you been able to try what was recommended in #8?
I'm sorry, I didn't have the right hardware combination for further tests. I'll try to find some next week.
No activity for months, closing. Please re-open if you get time.