Bug 9819 - sata_promise: frequent HSM violations
Summary: sata_promise: frequent HSM violations
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 normal
Assignee: Jeff Garzik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-25 18:07 UTC by Tejun Heo
Modified: 2012-05-17 15:30 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.24-rc8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci output (18.96 KB, text/plain)
2008-01-26 06:11 UTC, Christian Kuehn
dmesg started at boot-time (23.10 KB, application/octet-stream)
2008-01-26 06:12 UTC, Christian Kuehn
Boot-log (28.86 KB, application/octet-stream)
2008-01-26 06:13 UTC, Christian Kuehn

Description Tejun Heo 2008-01-25 18:07:33 UTC
Latest working kernel version: probably none
Earliest failing kernel version: dunno
Distribution: openSUSE 10.3
Hardware Environment: PDC40775 (SATA 300 Tx2Plus) + 2 * SAMSUNG HD403J
Problem Description:

Under 2.6.24-rc8, when the drives are loaded with IO, frequent HSM violations occur.

Steps to reproduce:

Boot and fire up some IO.
Comment 1 Tejun Heo 2008-01-25 18:09:57 UTC
This problem was originally reported in the Novell bugzilla against SL103 (2.6.22 + 23-rcX libata).  I asked the reporter to test the ASIC PRD fix, but it didn't make any difference.  The reporter then built 2.6.24-rc8, which shows the same problem.  I'm attaching a partial log and will ask the original reporter to attach the lspci output and the full boot log.  Thanks.
Comment 2 Tejun Heo 2008-01-25 18:10:17 UTC
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata4.00: port_status 0x20080000
ata4.00: cmd 25/00:b0:df:9e:b8/00:00:12:00:00/e0 tag 0 dma 90112 in
         res 50/00:00:8e:9f:b8/00:00:12:00:00/e0 Emask 0x2 (HSM violation)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
sd 3:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata4.00: port_status 0x20080000
ata4.00: cmd 25/00:80:47:ba:b8/00:00:12:00:00/e0 tag 0 dma 65536 in
         res 50/00:00:c6:ba:b8/00:00:12:00:00/e0 Emask 0x2 (HSM violation)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
sd 3:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata4.00: port_status 0x20080000
ata4.00: cmd 25/00:88:df:97:e4/00:01:2a:00:00/e0 tag 0 dma 200704 in
         res 50/00:00:66:99:e4/00:00:2a:00:00/e0 Emask 0x2 (HSM violation)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
sd 3:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
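
For anyone reading the log above, here is a minimal Python sketch for decoding the cmd/res taskfile lines. It assumes the byte order libata uses in its error report (command/feature:nsect:lbal:lbam:lbah, then the five high-order bytes, then device); the function names are made up for illustration. Command 0x25 is READ DMA EXT, so the 16-bit sector count and the 48-bit LBA can be rebuilt from the bytes and checked against the logged byte count.

```python
import re

# Assumed field order of a libata "cmd"/"res" line:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TF_RE = re.compile(r"(?:cmd|res)\s+((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")
NAMES = ["command", "feature", "nsect", "lbal", "lbam", "lbah",
         "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device"]

def parse_taskfile(line):
    """Split a 'cmd' or 'res' log line into its twelve register bytes."""
    m = TF_RE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    values = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    return dict(zip(NAMES, values))

def lba48(tf):
    """Rebuild the 48-bit LBA from the low and high-order (hob) bytes."""
    return (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24 |
            tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])

def sector_count(tf):
    """16-bit sector count of a non-NCQ LBA48 command (0 encodes 65536)."""
    return (tf["hob_nsect"] << 8 | tf["nsect"]) or 65536

if __name__ == "__main__":
    cmd = parse_taskfile("ata4.00: cmd 25/00:b0:df:9e:b8/00:00:12:00:00/e0 tag 0 dma 90112 in")
    res = parse_taskfile("res 50/00:00:8e:9f:b8/00:00:12:00:00/e0 Emask 0x2 (HSM violation)")
    print(hex(lba48(cmd)), sector_count(cmd), sector_count(cmd) * 512)
    print(hex(lba48(res)), lba48(res) - lba48(cmd))
```

Applied to the three exceptions above, the sector counts reproduce the logged byte counts (90112, 65536 and 200704), and in each case the LBA in the res line equals the command's start LBA plus the sector count minus one.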
Comment 3 Christian Kuehn 2008-01-26 06:11:18 UTC
Created attachment 14586
lspci output
Comment 4 Christian Kuehn 2008-01-26 06:12:28 UTC
Created attachment 14587
dmesg started at boot-time
Comment 5 Christian Kuehn 2008-01-26 06:13:31 UTC
Created attachment 14588
Boot-log
Comment 6 Christian Kuehn 2008-01-26 06:15:10 UTC
Please find the requested details in the uploaded files:

* output of lspci
* dmesg 
* /var/log/boot.msg
Comment 7 Mikael Pettersson 2008-01-26 14:28:44 UTC
(In reply to comment #1)
> This problem is originally reported in Novell bugzilla against SL103 (2.6.22 +
> 23-rcX libata).  I asked the reporter to test ASIC PRD fix but it didn't make
> any difference.  The reporter rolled 2.6.24-rc8 and it shows the same problem.
> I'm attaching the partial log.  Will ask the original reporter to attach lspci
> and full boot log.  Thanks.

In this case I think the problem is one of PSU overloading or a poor chipset. Look at the system components:
- a very old VIA chipset; nowadays they're OK, but way back they weren't
- apparently neither a local APIC nor an I/O APIC; having those would, for instance, reduce interrupt handling overhead
- VIA PATA (on the mainboard) controlling a DVD drive and two Maxtor disks
- Promise SATA (add-on) controlling two Samsung SATA disks
- two SCSI controllers, or one masquerading as two, with two SCSI tape drives
- a presumably add-on r8169 gigabit Ethernet controller
If the PSU is as old as the mainboard, I'd definitely be concerned about the system's power budget. And if that's not the case, I'd still be worried about running a RAID on the PCI bus of such an old VIA chipset.

The ata exceptions on sata_promise only occur on the second Samsung disk, and they list a DRDY status which I don't remember being in the "ASIC bug" logs.
The error data (port_status 0x20080000) says "overrun error in packet cycle", which means that the disk asserted INTRQ before the entire SG data had been transmitted. The disk got unhappy, but we don't know why.
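
For reference, a minimal Python sketch that lists which bits are set in that port_status value. The only name it attaches is the overrun flag on bit 19, which is an assumption based on my reading of sata_promise.c; bit 29 is deliberately left unnamed because I have not verified its meaning.

```python
def set_bits(value):
    """Return the positions of all bits set in a 32-bit register value."""
    return [bit for bit in range(32) if value >> bit & 1]

# Assumption: bit 19 is the overrun flag (PDC_OVERRUN_ERR as I recall it from
# sata_promise.c). Any other bit is reported as unverified, not guessed.
KNOWN_BITS = {19: "overrun error"}

def describe(value):
    """Pair each set bit with its assumed name, or 'unverified' if unknown."""
    return [(bit, KNOWN_BITS.get(bit, "unverified")) for bit in set_bits(value)]

if __name__ == "__main__":
    for bit, name in describe(0x20080000):
        print("bit %2d: %s" % (bit, name))
```

For 0x20080000 this reports bits 19 and 29 as set.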

The only thing I can suggest right now is to try to change the system in various ways to see if anything makes a difference. Like:
- try another more modern mainboard if at all possible
- try a newer and more powerful PSU
- try different SATA cables
- try another disk than the one on ata4 that threw the errors
- try removing the SCSI and DVD stuff (to reduce power consumption)

(Minor nit: PDC40775 is the SATA300 TX4, the 300 TX2 plus is the PDC20775. Looks like lspci has a bug.)
Comment 8 Christian Kuehn 2008-01-27 03:54:10 UTC
We changed the PSU a week ago to a new 460W one, with the same result.

Changing the mainboard could be a problem.

We will change the SATA cables as the next step.

The single SCSI controller (it has two channels) is connected to one external and one internal tape drive.

But the PSU should not be the problem; 460W should be enough for
* mainboard (incl. network and graphics card)
* 2x PATA-disks
* 2x SATA-disks
* 1x SCSI-controller
* 1x internal tape
Comment 9 Natalie Protasevich 2008-06-02 15:24:07 UTC
Any updates on this problem? Christian, how is it working after hw updates?
Comment 10 Daniel Fuhrmann 2008-08-18 14:50:15 UTC
Same problem.

Noticed on openSUSE 10.3 and 11.1 alpha.

Aug 18 20:44:48 linux kernel: ata1.00: exception Emask 0x10 SAct 0x2 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:48 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:48 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:48 linux kernel: ata1.00: cmd 60/58:08:e7:00:00/00:00:00:00:00/40 tag 1 ncq 45056 in
Aug 18 20:44:48 linux kernel:          res 40/00:08:e7:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:48 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:48 linux kernel: ata1: hard resetting link
Aug 18 20:44:48 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:48 linux kernel: ata1.00: configured for UDMA/133
Aug 18 20:44:48 linux kernel: ata1: EH complete
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:48 linux kernel: ata1.00: exception Emask 0x10 SAct 0x6 SErr 0x180000 action 0x6 frozen
Aug 18 20:44:48 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:48 linux kernel: ata1: SError: { 10B8B Dispar }
Aug 18 20:44:48 linux kernel: ata1.00: cmd 60/a8:08:fa:43:c0/00:00:02:00:00/40 tag 1 ncq 86016 in
Aug 18 20:44:48 linux kernel:          res 40/00:10:a2:44:c0/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:48 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:48 linux kernel: ata1.00: cmd 60/57:10:a2:44:c0/00:00:02:00:00/40 tag 2 ncq 44544 in
Aug 18 20:44:48 linux kernel:          res 40/00:10:a2:44:c0/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:48 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:48 linux kernel: ata1: hard resetting link
Aug 18 20:44:49 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:49 linux kernel: ata1.00: configured for UDMA/133
Aug 18 20:44:49 linux kernel: ata1: EH complete
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:49 linux kernel: ata1.00: exception Emask 0x10 SAct 0x2 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:49 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:49 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:49 linux kernel: ata1.00: cmd 60/fc:08:c9:ac:c0/00:00:0c:00:00/40 tag 1 ncq 129024 in
Aug 18 20:44:49 linux kernel:          res 40/00:08:c9:ac:c0/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:49 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:49 linux kernel: ata1: hard resetting link
Aug 18 20:44:50 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:50 linux kernel: ata1.00: configured for UDMA/133
Aug 18 20:44:50 linux kernel: ata1: EH complete
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:50 linux kernel: ata1.00: limiting speed to UDMA/100:PIO4
Aug 18 20:44:50 linux kernel: ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:50 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:50 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:50 linux kernel: ata1.00: cmd 60/fc:00:c9:ac:c0/00:00:0c:00:00/40 tag 0 ncq 129024 in
Aug 18 20:44:50 linux kernel:          res 40/00:00:c9:ac:c0/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:50 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:50 linux kernel: ata1: hard resetting link
Aug 18 20:44:50 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:50 linux kernel: ata1.00: configured for UDMA/100
Aug 18 20:44:50 linux kernel: ata1: EH complete
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:50 linux kernel: ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:50 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:50 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:50 linux kernel: ata1.00: cmd 60/fc:00:c9:ac:c0/00:00:0c:00:00/40 tag 0 ncq 129024 in
Aug 18 20:44:50 linux kernel:          res 40/00:00:c9:ac:c0/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:50 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:50 linux kernel: ata1: hard resetting link
Aug 18 20:44:51 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:51 linux kernel: ata1.00: configured for UDMA/100
Aug 18 20:44:51 linux kernel: ata1: EH complete
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
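
A note on reading the NCQ lines in this log, as a minimal Python sketch. For command 0x60 (READ FPDMA QUEUED) the sector count is carried in the feature bytes and the tag in bits 7:3 of the count register, while SAct is a bitmask of the tags that were outstanding. The function names are made up for illustration.

```python
def ncq_fields(feature, hob_feature, nsect):
    """Decode (sector count, tag) from a READ/WRITE FPDMA QUEUED taskfile."""
    sectors = hob_feature << 8 | feature
    if sectors == 0:
        sectors = 65536  # a count of zero encodes 65536 sectors
    tag = nsect >> 3     # the tag lives in bits 7:3 of the count register
    return sectors, tag

def active_tags(sact):
    """List the NCQ tags whose bits are set in the SAct value."""
    return [tag for tag in range(32) if sact >> tag & 1]

if __name__ == "__main__":
    # cmd 60/58:08:e7:00:00/00:00:00:00:00/40 tag 1 ncq 45056 in
    sectors, tag = ncq_fields(0x58, 0x00, 0x08)
    print(sectors, tag, sectors * 512)
    # SAct 0x6 from the second exception
    print(active_tags(0x6))
```

With the values from the log, feature 0x58 and count 0x08 give 88 sectors (45056 bytes) on tag 1, and SAct 0x6 corresponds to tags 1 and 2, matching the two commands listed in that exception.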
Comment 11 Daniel Fuhrmann 2008-08-19 02:15:03 UTC
> In this case I think the problem is one of PSU overloading or a poor chipset.
> Look at the system components:
> - a very old VIA chipset; nowadays they're OK but way back they weren't
> - apparently neither a local APIC nor an I/O APIC; having those would for
> instance reduce interrupt handling overheads
> - VIA pata (the mainboard) controlling a DVD drive and two Maxtor disks

What's the matter with two Maxtor drives? I had two.
It worked after I changed from SATA1 to SATA4, but I also removed the second Maxtor drive then. Is SATA1 now broken, or what is causing this malfunction?

My mainboard is an MSI K9N Neo V3.
Comment 12 Tejun Heo 2008-08-19 21:35:59 UTC
Daniel, the controller on your machine is reporting transmission errors via SError.  That's really a hardware problem.  As for the original HSM violation, I don't have many ideas left.  :-(
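
To make the SError point concrete, here is a minimal Python sketch that maps the SErr values from the log in comment 10 to the names libata printed next to them. Only the three bits seen in that log are named; the table name and function name are made up for illustration, and any other bit is printed by number.

```python
# Bits observed in comment 10: 0x180000 printed { 10B8B Dispar } and
# 0x380000 printed { 10B8B Dispar BadCRC }. Other bits are not named here.
SERR_DIAG_BITS = {19: "10B8B", 20: "Dispar", 21: "BadCRC"}

def decode_serr(serr):
    """Return the names of the SError bits set in serr, lowest bit first."""
    names = []
    for bit in range(32):
        if serr >> bit & 1:
            names.append(SERR_DIAG_BITS.get(bit, "bit%d" % bit))
    return names

if __name__ == "__main__":
    for value in (0x380000, 0x180000):
        print(hex(value), decode_serr(value))
```

All three are link-level transmission errors (10b/8b decode, disparity, CRC), which is why this points at cabling, connectors or the port rather than the driver.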
