Most recent kernel where this bug did not occur: none
Distribution: Debian unstable
Hardware Environment: Lenovo 3000 V200
Software Environment:
Problem Description: Getting HSM violation messages, and then the driver resets. Ugly.

Jul 13 04:57:51 v200 kernel: ata1.00: exception Emask 0x2 SAct 0x300 SErr 0x0 action 0x2 frozen
Jul 13 04:57:51 v200 kernel: ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x300 FIS=004040a1:00000080)
Jul 13 04:57:51 v200 kernel: ata1.00: cmd 61/28:40:c0:b5:84/00:00:08:00:00/40 tag 8 cdb 0x0 data 20480 out
Jul 13 04:57:51 v200 kernel: res 40/00:48:f8:b5:84/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jul 13 04:57:51 v200 kernel: ata1.00: cmd 61/30:48:f8:b5:84/00:00:08:00:00/40 tag 9 cdb 0x0 data 24576 out
Jul 13 04:57:51 v200 kernel: res 40/00:48:f8:b5:84/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jul 13 04:57:51 v200 kernel: ata1: soft resetting port
Jul 13 04:57:51 v200 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jul 13 04:57:51 v200 kernel: ata1.00: configured for UDMA/100
Jul 13 04:57:51 v200 kernel: ata1: EH complete
Jul 13 04:57:51 v200 kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Jul 13 04:57:51 v200 kernel: sd 0:0:0:0: [sda] Write Protect is off
Jul 13 04:57:51 v200 kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jul 13 04:57:51 v200 kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 13 07:56:31 v200 kernel: ata1.00: exception Emask 0x2 SAct 0x300 SErr 0x0 action 0x2 frozen
Jul 13 07:56:31 v200 kernel: ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x300 FIS=004040a1:00000080)
Jul 13 07:56:31 v200 kernel: ata1.00: cmd 61/10:40:d8:b7:84/00:00:08:00:00/40 tag 8 cdb 0x0 data 8192 out
Jul 13 07:56:31 v200 kernel: res 40/00:68:b0:9d:83/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jul 13 07:56:31 v200 kernel: ata1.00: cmd 61/20:48:18:b8:84/00:00:08:00:00/40 tag 9 cdb 0x0 data 16384 out
Jul 13 07:56:31 v200 kernel: res 40/00:68:b0:9d:83/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jul 13 07:56:31 v200 kernel: ata1: soft resetting port
Jul 13 07:56:31 v200 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jul 13 07:56:31 v200 kernel: ata1.00: configured for UDMA/100
Jul 13 07:56:31 v200 kernel: ata1: EH complete
Jul 13 07:56:31 v200 kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Jul 13 07:56:31 v200 kernel: sd 0:0:0:0: [sda] Write Protect is off
Jul 13 07:56:31 v200 kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jul 13 07:56:31 v200 kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 13 09:50:15 v200 kernel: ata1.00: exception Emask 0x2 SAct 0x300 SErr 0x0 action 0x2 frozen
Jul 13 09:50:15 v200 kernel: ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x300 FIS=004040a1:00000080)
Jul 13 09:50:15 v200 kernel: ata1.00: cmd 61/18:40:50:e9:84/00:00:08:00:00/40 tag 8 cdb 0x0 data 12288 out
Jul 13 09:50:15 v200 kernel: res 40/00:48:d8:ea:84/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jul 13 09:50:15 v200 kernel: ata1.00: cmd 61/30:48:d8:ea:84/00:00:08:00:00/40 tag 9 cdb 0x0 data 24576 out
Jul 13 09:50:15 v200 kernel: res 40/00:48:d8:ea:84/00:00:08:00:00/40 Emask 0x2 (HSM violation)
Jul 13 09:50:15 v200 kernel: ata1: soft resetting port
Jul 13 09:50:16 v200 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jul 13 09:50:16 v200 kernel: ata1.00: configured for UDMA/100
Jul 13 09:50:16 v200 kernel: ata1: EH complete
Jul 13 09:50:16 v200 kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Jul 13 09:50:16 v200 kernel: sd 0:0:0:0: [sda] Write Protect is off
Jul 13 09:50:16 v200 kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jul 13 09:50:16 v200 kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

# hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
        Model Number:       ST9160821AS
        Serial Number:      5MA2N6RV
        Firmware Revision:  3.CLF
Standards:
        Supported: 7 6 5 4
        Likely used: 7

So, do we add
{ "ST9160821AS", NULL, ATA_HORKAGE_NONCQ, }
to libata-core.c? But someone who has the same disk in an HP laptop says that he has no problem, so is this my chipset (Santa Rosa?) Help!

Steps to reproduce: Wait long enough (it happens several times a day).
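For anyone reading the dumps above, here is a minimal sketch of how the fields in the "cmd" lines and the SAct mask can be decoded. It assumes the usual libata layout for those lines (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count travels in the feature registers while the tag sits in bits 7:3 of the count field. The struct and function names are made up for illustration and are not kernel code.

```c
#include <stdint.h>
#include <stdio.h>

/* One taskfile dump as printed in the log (field order is an assumption):
 * command/feature:nsect:lbal:lbam:lbah/
 * hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device */
struct tf_fields {
    unsigned command, feature, nsect, lbal, lbam, lbah;
    unsigned hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned device;
};

/* Parse e.g. "61/28:40:c0:b5:84/00:00:08:00:00/40"; returns 0 on success. */
int parse_tf(const char *s, struct tf_fields *tf)
{
    int n = sscanf(s, "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
                   &tf->command, &tf->feature, &tf->nsect,
                   &tf->lbal, &tf->lbam, &tf->lbah,
                   &tf->hob_feature, &tf->hob_nsect,
                   &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
                   &tf->device);
    return n == 12 ? 0 : -1;
}

/* NCQ commands (0x60 READ / 0x61 WRITE FPDMA QUEUED) carry the tag
 * in bits 7:3 of the count field. */
unsigned ncq_tag(const struct tf_fields *tf)
{
    return tf->nsect >> 3;
}

/* NCQ commands carry the sector count in the feature registers;
 * a value of 0 stands for 65536 sectors. Result is in bytes. */
uint32_t ncq_bytes(const struct tf_fields *tf)
{
    uint32_t sectors = (tf->hob_feature << 8) | tf->feature;
    if (sectors == 0)
        sectors = 65536;
    return sectors * 512u;
}

/* Assemble the 48-bit LBA from the six LBA bytes. */
uint64_t tf_lba(const struct tf_fields *tf)
{
    return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
           ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
           ((uint64_t)tf->lbam << 8)      |  (uint64_t)tf->lbal;
}

/* List the tags whose bits are set in SAct; returns how many were found. */
int sact_tags(uint32_t sact, int tags[32])
{
    int count = 0;
    for (int bit = 0; bit < 32; bit++)
        if ((sact >> bit) & 1u)
            tags[count++] = bit;
    return count;
}
```

Read this way, SAct 0x300 means tags 8 and 9 were outstanding, which matches the two "cmd" lines, and the feature byte 0x28 in the first one is 40 sectors, i.e. the 20480 bytes shown as "data 20480 out".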
It might have something to do with the firmware revision. Can you get the hdparm -I output from the other guy with the HP laptop?
I asked him to join us here. Meanwhile, look at http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e8914f11039c64d1/40b6026882549b3e It seems relevant. Sam.
Bill Stearns here (wstearns@pobox.com). I have 2 of those drives, set up with mirroring on an HP DV9225 laptop. For more details on the laptop, see http://www.stearns.org/doc/hp-dv9225us-fedora-6.html . One possibly relevant piece is that I have to use "noapic irqfixup irqpoll" to avoid a lockup problem; see "ACPI and lockups" in that web page. ACPI appears now to be unrelated. I've used most of the stock fedora kernels from fedora 6 and 7 and am currently running 2.6.21-1.3228.fc7. I'd be happy to provide any other details or run any tests that might be useful.

/dev/sda:
ATA device, with non-removable media
        Model Number:       ST9160821AS
        Serial Number:      5MA0LZ8A
        Firmware Revision:  3.ALC
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  312581808
        device size with M = 1024*1024:      152627 MBytes
        device size with M = 1000*1000:      160041 MBytes (160 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        Advanced power management level: unknown setting (0x8080)
        Recommended acoustic management value: 254, current value: 0
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
           *    Advanced Power Management feature set
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    IDLE_IMMEDIATE with UNLOAD
           *    SATA-I signaling speed (1.5Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
                Device-initiated interface power management
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct
Bill Stearns, so you don't see those spurious interrupt messages, right?
Syslog only goes back 4 weeks, but I don't have the word "spurious" in messages* or secure*, so no. Is there any chance that one of "noapic irqfixup irqpoll" is masking the problem on my system?
Bill, methinks he was asking if you saw any "HSM violations" in the logs.
I haven't seen any 'hsm viol' in the logs either.
Alright, then, I'll submit a patch to blacklist firmware 3.CLF. Thanks.
Patch submitted. Closing.
i'm having the same issue with 2.6.22.9:

/dev/sda:
ATA device, with non-removable media
        Model Number:       ST9160821AS
        Serial Number:      5MA37BSD
        Firmware Revision:  3.CDD
Standards:
        Supported: 7 6 5 4
        Likely used: 7
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  268435455
        LBA48  user addressable sectors:  312581808
        device size with M = 1024*1024:      152627 MBytes
        device size with M = 1000*1000:      160041 MBytes (160 GB)
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 8
        Advanced power management level: unknown setting (0x8080)
        Recommended acoustic management value: 128, current value: 254
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    DOWNLOAD_MICROCODE
           *    Advanced Power Management feature set
                SET_MAX security extension
           *    Automatic Acoustic Management feature set
           *    48-bit Address feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    IDLE_IMMEDIATE with UNLOAD
           *    SATA-I signaling speed (1.5Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
                Device-initiated interface power management
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
        not     supported: enhanced erase
Checksum: correct
Frank, please post kernel boot log and dmesg including error messages.
this is what i consider the relevant part of the boot process (let me know if you need the whole boot process log):

[...]
Oct 1 15:42:30 mescalito ahci 0000:00:1f.2: version 2.2
Oct 1 15:42:30 mescalito ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 17
Oct 1 15:42:30 mescalito ahci 0000:00:1f.2: nr_ports (3) and implemented port map (0x5) don't match
Oct 1 15:42:30 mescalito ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 3 ports 3 Gbps 0x5 impl SATA mode
Oct 1 15:42:30 mescalito ahci 0000:00:1f.2: flags: 64bit ncq pm led clo pio slum part
Oct 1 15:42:30 mescalito PCI: Setting latency timer of device 0000:00:1f.2 to 64
Oct 1 15:42:30 mescalito scsi0 : ahci
Oct 1 15:42:30 mescalito scsi1 : ahci
Oct 1 15:42:30 mescalito scsi2 : ahci
Oct 1 15:42:30 mescalito ata1: SATA max UDMA/133 cmd 0xffffc20000026900 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 315
Oct 1 15:42:30 mescalito ata2: DUMMY
Oct 1 15:42:30 mescalito ata3: SATA max UDMA/133 cmd 0xffffc20000026a00 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 315
Oct 1 15:42:30 mescalito ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 1 15:42:30 mescalito ata1.00: ATA-7: ST9160821AS, 3.CDD, max UDMA/133
Oct 1 15:42:30 mescalito ata1.00: 312581808 sectors, multi 8: LBA48 NCQ (depth 31/32)
Oct 1 15:42:30 mescalito ata1.00: configured for UDMA/133
Oct 1 15:42:30 mescalito ata3: SATA link down (SStatus 0 SControl 300)
Oct 1 15:42:30 mescalito scsi 0:0:0:0: Direct-Access ATA ST9160821AS 3.CD PQ: 0 ANSI: 5
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] Write Protect is off
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] Write Protect is off
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 1 15:42:30 mescalito sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
Oct 1 15:42:30 mescalito sd 0:0:0:0: [sda] Attached SCSI disk
Oct 1 15:42:30 mescalito ACPI: PCI Interrupt 0000:00:1a.7[C] -> GSI 22 (level, low) -> IRQ 22
Oct 1 15:42:30 mescalito PCI: Setting latency timer of device 0000:00:1a.7 to 64
[...]

and the error messages:

[...]
Oct 1 15:57:29 mescalito ata1.00: exception Emask 0x2 SAct 0x180 SErr 0x0 action 0x2 frozen
Oct 1 15:57:29 mescalito ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x180 FIS=004040a1:00000040)
Oct 1 15:57:29 mescalito ata1.00: cmd 61/08:38:bf:79:fb/00:00:09:00:00/40 tag 7 cdb 0x0 data 4096 out
Oct 1 15:57:29 mescalito res 40/00:50:27:bb:c3/00:00:03:00:00/40 Emask 0x2 (HSM violation)
Oct 1 15:57:29 mescalito ata1.00: cmd 61/50:40:e7:79:fb/00:00:09:00:00/40 tag 8 cdb 0x0 data 40960 out
Oct 1 15:57:29 mescalito res 40/00:50:27:bb:c3/00:00:03:00:00/40 Emask 0x2 (HSM violation)
Oct 1 15:57:29 mescalito ata1: soft resetting port
Oct 1 15:57:29 mescalito ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Oct 1 15:57:29 mescalito ata1.00: configured for UDMA/133
Oct 1 15:57:29 mescalito ata1: EH complete
Oct 1 15:57:29 mescalito sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Oct 1 15:57:29 mescalito sd 0:0:0:0: [sda] Write Protect is off
Oct 1 15:57:29 mescalito sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Oct 1 15:57:29 mescalito sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[...]
It is fixed in 2.6.23-rcN. We will have an official 2.6.23 shortly, Linus willing.
i don't think so, from 2.6.23 final:

[...]
libata version 2.21 loaded.
[...]
ahci 0000:00:1f.2: version 2.3
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 17 (level, low) -> IRQ 17
ahci 0000:00:1f.2: nr_ports (3) and implemented port map (0x5) don't match
ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 3 ports 3 Gbps 0x5 impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part
PCI: Setting latency timer of device 0000:00:1f.2 to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
ata1: SATA max UDMA/133 cmd 0xffffc20000026900 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 315
ata2: DUMMY
ata3: SATA max UDMA/133 cmd 0xffffc20000026a00 ctl 0x0000000000000000 bmdma 0x0000000000000000 irq 315
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: ST9160821AS, 3.CDD, max UDMA/133
ata1.00: 312581808 sectors, multi 8: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata3: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access ATA ST9160821AS 3.CD PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
ata_piix 0000:00:1f.1: version 2.12
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:00:1f.1 to 64
scsi3 : ata_piix
scsi4 : ata_piix
ata4: PATA max UDMA/100 cmd 0x00000000000101f0 ctl 0x00000000000103f6 bmdma 0x0000000000016fa0 irq 14
ata5: PATA max UDMA/100 cmd 0x0000000000010170 ctl 0x0000000000010376 bmdma 0x0000000000016fa8 irq 15
ata4.00: ATAPI: HL-DT-ST DVD+/-RW GSA-T11N, A103, max UDMA/33
ata4.00: configured for UDMA/33
ata5: port disabled. ignoring.
scsi 3:0:0:0: CD-ROM HL-DT-ST DVD+-RW GSA-T11N A103 PQ: 0 ANSI: 5
sr0: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 3:0:0:0: Attached scsi CD-ROM sr0
sr 3:0:0:0: Attached scsi generic sg1 type 5
[...]

after running hdparm -t /dev/sda:

[...]
ata1.00: exception Emask 0x2 SAct 0x8 SErr 0x0 action 0x2 frozen
ata1.00: spurious completions during NCQ issue=0x0 SAct=0x8 FIS=004040a1:00000004
ata1.00: cmd 60/f8:18:f8:9c:00/00:00:00:00:00/40 tag 3 cdb 0x0 data 126976 in
res 40/00:18:f8:9c:00/00:00:00:00:00/40 Emask 0x2 (HSM violation)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

so maybe the patch was lost somewhere... what's the fix you are talking about? just a blacklist entry for drive/firmware or any modification in the NCQ code? where can i find the patch? thank you.
It's a simple patch to add a blacklist entry. The patch was added to the devel branch and will be merged into 2.6.24-rc1. I forgot to synchronize the 2.6.23 branch. Will submit a patch. Thanks.
The original, Model=ST9160821AS, FwRev=3.CLF is indeed fixed, see lines 3784-5 in libata-core.c:
{ "ST9160821AS", "3.CLF", ATA_HORKAGE_NONCQ, },
{ "ST3160812AS", "3.AD", ATA_HORKAGE_NONCQ, },
Apparently you need ANOTHER line, for 3.CD, or suchlike. Maybe all of these disks are bad, and we should go back to my original proposition of
{ "ST9160821AS", NULL, ATA_HORKAGE_NONCQ, }
(Assuming NULL == '*'.) In any case, my HW is 3.CLF, and I am happy with 2.6.23.
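To make the difference between the two kinds of entry concrete, here is a small standalone sketch of a model/firmware lookup. It is not the kernel's code: the struct, the flag value and lookup_horkage() are hypothetical, and it simply takes the "NULL == '*'" assumption above at face value, treating a NULL revision as matching every firmware of that model.

```c
#include <stddef.h>
#include <string.h>

#define HORKAGE_NONCQ (1u << 2)   /* hypothetical flag value, for illustration */

struct blacklist_entry {
    const char *model;    /* exact model string */
    const char *fw_rev;   /* exact firmware revision, or NULL for "any" (assumed) */
    unsigned    horkage;
};

/* Per-firmware entries, as quoted from libata-core.c above. */
static const struct blacklist_entry fw_specific[] = {
    { "ST9160821AS", "3.CLF", HORKAGE_NONCQ },
    { "ST3160812AS", "3.AD",  HORKAGE_NONCQ },
    { NULL, NULL, 0 }     /* end of table */
};

/* The model-wide alternative proposed in this thread. */
static const struct blacklist_entry model_wide[] = {
    { "ST9160821AS", NULL, HORKAGE_NONCQ },
    { NULL, NULL, 0 }
};

/* Return the horkage flags of the first matching entry, or 0 if none. */
unsigned lookup_horkage(const struct blacklist_entry *tbl,
                        const char *model, const char *fw_rev)
{
    for (; tbl->model != NULL; tbl++) {
        if (strcmp(tbl->model, model) != 0)
            continue;
        if (tbl->fw_rev == NULL || strcmp(tbl->fw_rev, fw_rev) == 0)
            return tbl->horkage;
    }
    return 0;
}
```

With the per-firmware table, 3.CLF is caught but 3.CDD is not, which would explain the report against 2.6.23. With the model-wide entry, 3.CDD is caught too, but so is the 3.ALC firmware that shows no errors in this thread, so NCQ would be turned off there as well.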
I'm not sure whether the problem is with the firmware version or the drive model. I'll ask around. For the time being, patches to include 3.CDD and to synchronize the blacklist with upstream have been submitted. Thanks.