Latest working kernel version: probably none
Earliest failing kernel version: unknown
Distribution: openSUSE 10.3
Hardware Environment: PDC40775 (SATA 300 Tx2Plus) + 2 * SAMSUNG HD403J
Problem Description: Under 2.6.24-rc8, frequent HSM violations occur when the drives are under I/O load.
Steps to reproduce: Boot and generate some I/O.
This problem was originally reported in the Novell bugzilla against SL103 (2.6.22 + 23-rcX libata). I asked the reporter to test the ASIC PRD fix, but it didn't make any difference. The reporter built 2.6.24-rc8 and it shows the same problem. I'm attaching a partial log. I'll ask the original reporter to attach lspci output and a full boot log. Thanks.
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata4.00: port_status 0x20080000
ata4.00: cmd 25/00:b0:df:9e:b8/00:00:12:00:00/e0 tag 0 dma 90112 in
         res 50/00:00:8e:9f:b8/00:00:12:00:00/e0 Emask 0x2 (HSM violation)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
sd 3:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata4.00: port_status 0x20080000
ata4.00: cmd 25/00:80:47:ba:b8/00:00:12:00:00/e0 tag 0 dma 65536 in
         res 50/00:00:c6:ba:b8/00:00:12:00:00/e0 Emask 0x2 (HSM violation)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
sd 3:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2
ata4.00: port_status 0x20080000
ata4.00: cmd 25/00:88:df:97:e4/00:01:2a:00:00/e0 tag 0 dma 200704 in
         res 50/00:00:66:99:e4/00:00:2a:00:00/e0 Emask 0x2 (HSM violation)
ata4.00: status: { DRDY }
ata4: soft resetting link
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.00: configured for UDMA/133
ata4: EH complete
sd 3:0:0:0: [sdd] 781422768 512-byte hardware sectors (400088 MB)
sd 3:0:0:0: [sdd] Write Protect is off
sd 3:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 3:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
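The `cmd` fields in the log above are raw taskfile register dumps. As a sketch, assuming libata's usual print order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), the Python below decodes them. `parse_taskfile`, `lba48` and `sector_count48` are illustrative helpers, not kernel APIs. Opcode 0x25 is READ DMA EXT, so the 48-bit sector count is hob_nsect:nsect, and the 512-byte sector size is taken from the `sd` lines in the log.

```python
import re

SECTOR_SIZE = 512  # from "512-byte hardware sectors" in the sdd log lines

# Assumed libata field order for the "cmd"/"res" taskfile dump.
TF_RE = re.compile(
    r"(?P<command>[0-9a-f]{2})/"
    r"(?P<feature>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?P<hob_feature>[0-9a-f]{2}):(?P<hob_nsect>[0-9a-f]{2}):"
    r"(?P<hob_lbal>[0-9a-f]{2}):(?P<hob_lbam>[0-9a-f]{2}):(?P<hob_lbah>[0-9a-f]{2})/"
    r"(?P<device>[0-9a-f]{2})"
)


def parse_taskfile(text: str) -> dict:
    """Split a taskfile dump like '25/00:b0:...' into named integer fields."""
    m = TF_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a taskfile string: {text!r}")
    return {name: int(value, 16) for name, value in m.groupdict().items()}


def lba48(tf: dict) -> int:
    """Assemble the 48-bit LBA from the low and high-order (hob) bytes."""
    return (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
            | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])


def sector_count48(tf: dict) -> int:
    """Sector count of a non-NCQ LBA48 command; a value of 0 encodes 65536."""
    return (tf["hob_nsect"] << 8 | tf["nsect"]) or 65536


samples = [
    ("25/00:b0:df:9e:b8/00:00:12:00:00/e0", 90112),
    ("25/00:80:47:ba:b8/00:00:12:00:00/e0", 65536),
    ("25/00:88:df:97:e4/00:01:2a:00:00/e0", 200704),
]
for text, logged_bytes in samples:
    tf = parse_taskfile(text)
    sectors = sector_count48(tf)
    print(f"cmd 0x{tf['command']:02x} lba {lba48(tf):#x} sectors {sectors} "
          f"bytes {sectors * SECTOR_SIZE} (logged: {logged_bytes})")
```

For the three failing commands, sector count × 512 reproduces the `dma 90112`, `dma 65536` and `dma 200704` figures libata printed, which is a quick sanity check that the fields are being read in the right order.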
Created attachment 14586: lspci output
Created attachment 14587: dmesg captured at boot time
Created attachment 14588: Boot log
Please find the requested details in the uploaded files:
* output of lspci
* dmesg
* /var/log/boot.msg
(In reply to comment #1)
> This problem is originally reported in Novell bugzilla against SL103 (2.6.22
> + 23-rcX libata). I asked the reporter to test ASIC PRD fix but it didn't make
> any difference. The reporter rolled 2.6.24-rc8 and it shows the same problem.
> I'm attaching the partial log. Will ask the original reporter to attach lspci
> and full boot log. Thanks.

In this case I think the problem is either PSU overloading or a poor chipset. Look at the system components:
- a very old VIA chipset; nowadays they're OK, but back then they weren't
- apparently neither a local APIC nor an I/O APIC; having those would, for instance, reduce interrupt-handling overhead
- VIA PATA (on the mainboard) controlling a DVD drive and two Maxtor disks
- Promise SATA (add-on) controlling two Samsung SATA disks
- two SCSI controllers, or one masquerading as two, with two SCSI tape drives
- a presumably add-on r8169 Gigabit Ethernet controller

If the PSU is as old as the mainboard, I'd definitely be concerned about the system's power budget. And even if that's not the case, I'd still be worried about running a RAID on the PCI bus of such an old VIA chipset.

The ATA exceptions on sata_promise occur only on the second Samsung disk, and they list a DRDY status, which I don't remember seeing in the "ASIC bug" logs. The error data (port_status 0x20080000) indicates "overrun error in packet cycle", which means that the disk asserted INTRQ before the entire SG data had been transmitted. The disk got unhappy, but we don't know why.

The only thing I can suggest right now is to change the system in various ways to see if anything makes a difference, for example:
- try another, more modern mainboard if at all possible
- try a newer and more powerful PSU
- try different SATA cables
- try a different disk in place of the one on ata4 that threw the errors
- try removing the SCSI and DVD hardware (to reduce power consumption)

(Minor nit: PDC40775 is the SATA300 TX4; the SATA300 TX2plus is the PDC20775. Looks like lspci has a bug.)
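The reading of port_status 0x20080000 as an overrun can be reproduced with a small decoder. This is a sketch only: the bit positions and the overrun/underrun to HSM mapping are recalled from drivers/ata/sata_promise.c rather than taken from this report, so verify them against your kernel source. `PDC_PORT_STATUS_BITS`, `decode_port_status` and `libata_err_mask` are hypothetical names, and bit 29 is deliberately left undecoded.

```python
# ASSUMPTION: bit positions recalled from sata_promise.c; check your kernel tree.
PDC_PORT_STATUS_BITS = {
    19: "PDC_OVERRUN_ERR (S/G byte count larger than the drive requires)",
    20: "PDC_UNDERRUN_ERR (S/G byte count smaller than the drive requires)",
    21: "PDC_DRIVE_ERR (drive error)",
}

# libata error classes (the "Emask" value printed in the log).
AC_ERR_DEV = 1 << 0
AC_ERR_HSM = 1 << 1


def decode_port_status(value: int) -> list[str]:
    """Name every set bit, marking the ones this table does not know."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            names.append(PDC_PORT_STATUS_BITS.get(bit, f"bit {bit} (not decoded here)"))
    return names


def libata_err_mask(value: int) -> int:
    """Partial sketch of how the driver appears to map port_status to Emask."""
    mask = 0
    if value & (1 << 21):
        mask |= AC_ERR_DEV
    if value & ((1 << 19) | (1 << 20)):  # overrun or underrun -> HSM violation
        mask |= AC_ERR_HSM
    return mask


status = 0x20080000  # value from the ata4 log
for name in decode_port_status(status):
    print(name)
print(f"Emask {libata_err_mask(status):#x}")
```

Run against 0x20080000, this reports the overrun bit plus an undecoded bit 29, and an Emask of 0x2, which matches the `Emask 0x2 (HSM violation)` in the ata4 log.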
We changed the PSU a week ago to a new 460 W unit, with the same result.
Changing the mainboard... could be a problem.
We will change the SATA cables as the next step.
The one SCSI controller (which has two channels) is connected to one external tape drive and one internal tape drive.
But the PSU should not be the problem with 460 W for:
* mainboard (incl. network and graphics card)
* 2x PATA disks
* 2x SATA disks
* 1x SCSI controller
* 1x internal tape drive
Any updates on this problem? Christian, how is it working after hw updates?
Same problem, seen with openSUSE 10.3 and 11.1 alpha:
Aug 18 20:44:48 linux kernel: ata1.00: exception Emask 0x10 SAct 0x2 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:48 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:48 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:48 linux kernel: ata1.00: cmd 60/58:08:e7:00:00/00:00:00:00:00/40 tag 1 ncq 45056 in
Aug 18 20:44:48 linux kernel:          res 40/00:08:e7:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:48 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:48 linux kernel: ata1: hard resetting link
Aug 18 20:44:48 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:48 linux kernel: ata1.00: configured for UDMA/133
Aug 18 20:44:48 linux kernel: ata1: EH complete
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:48 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:48 linux kernel: ata1.00: exception Emask 0x10 SAct 0x6 SErr 0x180000 action 0x6 frozen
Aug 18 20:44:48 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:48 linux kernel: ata1: SError: { 10B8B Dispar }
Aug 18 20:44:48 linux kernel: ata1.00: cmd 60/a8:08:fa:43:c0/00:00:02:00:00/40 tag 1 ncq 86016 in
Aug 18 20:44:48 linux kernel:          res 40/00:10:a2:44:c0/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:48 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:48 linux kernel: ata1.00: cmd 60/57:10:a2:44:c0/00:00:02:00:00/40 tag 2 ncq 44544 in
Aug 18 20:44:48 linux kernel:          res 40/00:10:a2:44:c0/00:00:02:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:48 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:48 linux kernel: ata1: hard resetting link
Aug 18 20:44:49 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:49 linux kernel: ata1.00: configured for UDMA/133
Aug 18 20:44:49 linux kernel: ata1: EH complete
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:49 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:49 linux kernel: ata1.00: exception Emask 0x10 SAct 0x2 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:49 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:49 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:49 linux kernel: ata1.00: cmd 60/fc:08:c9:ac:c0/00:00:0c:00:00/40 tag 1 ncq 129024 in
Aug 18 20:44:49 linux kernel:          res 40/00:08:c9:ac:c0/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:49 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:49 linux kernel: ata1: hard resetting link
Aug 18 20:44:50 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:50 linux kernel: ata1.00: configured for UDMA/133
Aug 18 20:44:50 linux kernel: ata1: EH complete
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:50 linux kernel: ata1.00: limiting speed to UDMA/100:PIO4
Aug 18 20:44:50 linux kernel: ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:50 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:50 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:50 linux kernel: ata1.00: cmd 60/fc:00:c9:ac:c0/00:00:0c:00:00/40 tag 0 ncq 129024 in
Aug 18 20:44:50 linux kernel:          res 40/00:00:c9:ac:c0/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:50 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:50 linux kernel: ata1: hard resetting link
Aug 18 20:44:50 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:50 linux kernel: ata1.00: configured for UDMA/100
Aug 18 20:44:50 linux kernel: ata1: EH complete
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:50 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 18 20:44:50 linux kernel: ata1.00: exception Emask 0x10 SAct 0x1 SErr 0x380000 action 0x6 frozen
Aug 18 20:44:50 linux kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Aug 18 20:44:50 linux kernel: ata1: SError: { 10B8B Dispar BadCRC }
Aug 18 20:44:50 linux kernel: ata1.00: cmd 60/fc:00:c9:ac:c0/00:00:0c:00:00/40 tag 0 ncq 129024 in
Aug 18 20:44:50 linux kernel:          res 40/00:00:c9:ac:c0/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Aug 18 20:44:50 linux kernel: ata1.00: status: { DRDY }
Aug 18 20:44:50 linux kernel: ata1: hard resetting link
Aug 18 20:44:51 linux kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 18 20:44:51 linux kernel: ata1.00: configured for UDMA/100
Aug 18 20:44:51 linux kernel: ata1: EH complete
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] 312581808 512-byte hardware sectors (160042 MB)
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] Write Protect is off
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 18 20:44:51 linux kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
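The failing commands in Daniel's log are NCQ reads (opcode 0x60, READ FPDMA QUEUED), where the taskfile registers are used differently: the sector count sits in the feature registers and the NCQ tag in bits 7:3 of nsect. A minimal sketch, assuming the same libata field order as the non-NCQ log in the original report; `decode_ncq`, `parse_tf` and `FIELDS` are hypothetical helpers, and the 512-byte sector size comes from the `sd` lines above.

```python
import re

SECTOR_SIZE = 512  # from "512-byte hardware sectors" in the sda log lines

# Matches e.g. "cmd 60/58:08:e7:00:00/00:00:00:00:00/40 tag 1 ncq 45056 in".
LINE_RE = re.compile(
    r"cmd (?P<tf>[0-9a-f/:]+) tag (?P<tag>\d+) ncq (?P<nbytes>\d+) (?:in|out)"
)

# Assumed libata print order of the taskfile fields.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device")


def parse_tf(tf: str) -> dict:
    values = [int(x, 16) for x in re.split(r"[/:]", tf)]
    if len(values) != len(FIELDS):
        raise ValueError(f"unexpected taskfile layout: {tf!r}")
    return dict(zip(FIELDS, values))


def decode_ncq(line: str) -> dict:
    """Decode one NCQ 'cmd' log line and keep libata's own annotations for comparison."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("no NCQ cmd field in line")
    tf = parse_tf(m.group("tf"))
    # FPDMA: sector count in hob_feature:feature (0 encodes 65536).
    sectors = (tf["hob_feature"] << 8 | tf["feature"]) or 65536
    lba = (tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
           | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"])
    return {
        "opcode": tf["command"],
        "tag_in_taskfile": tf["nsect"] >> 3,  # NCQ tag lives in nsect bits 7:3
        "tag_logged": int(m.group("tag")),
        "lba": lba,
        "bytes": sectors * SECTOR_SIZE,
        "bytes_logged": int(m.group("nbytes")),
    }


for line in (
    "ata1.00: cmd 60/58:08:e7:00:00/00:00:00:00:00/40 tag 1 ncq 45056 in",
    "ata1.00: cmd 60/57:10:a2:44:c0/00:00:02:00:00/40 tag 2 ncq 44544 in",
    "ata1.00: cmd 60/fc:00:c9:ac:c0/00:00:0c:00:00/40 tag 0 ncq 129024 in",
):
    print(decode_ncq(line))
```

For each logged command, the decoded tag and byte count agree with libata's own `tag N ncq BYTES` annotation.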
> In this case I think the problem is one of PSU overloading or a poor chipset.
> Look at the system components:
> - a very old VIA chipset; nowadays the're Ok but way back they weren't
> - apparently neither a local APIC nor an I/O APIC; having those would for
> instance reduce interrupt handling overheads
> - VIA pata (the mainboard) controlling a DVD drive and two Maxtor disks

What's the matter with two Maxtor drives? I had two. Switching from SATA1 to SATA4 made it work, but then I removed the second Maxtor drive. Is SATA1 now broken, or what causes this malfunction? My mainboard is an MSI K9N Neo V3.
Daniel, on your machine the controller is reporting transmission errors via SError. That really is a hardware problem. As for the original HSM violation, I'm running out of ideas. :-(
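The point that the controller is reporting transmission errors can be checked by decoding the SATA status-control register values in Daniel's log. This sketch assumes the SError/SStatus/SControl bit layout from the SATA specification and reuses libata's short names for the SError bits; `decode_serror`, `split_scr` and `describe_link` are hypothetical helpers. Note that libata prints SStatus and SControl in hex without a 0x prefix.

```python
# SError bit -> libata short name (bits 0-11: ERR field, 16-26: DIAG field).
SERROR_NAMES = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}

# SPD encoding shared by SStatus (current speed) and SControl (speed limit).
SPEED_GBPS = {1: "1.5 Gbps", 2: "3.0 Gbps"}


def decode_serror(value: int) -> list[str]:
    """Return the names of all set SError bits, lowest bit first."""
    return [name for bit, name in sorted(SERROR_NAMES.items()) if value & (1 << bit)]


def split_scr(value: int) -> tuple[int, int, int]:
    """Return the (IPM, SPD, DET) nibbles shared by SStatus and SControl."""
    return (value >> 8) & 0xF, (value >> 4) & 0xF, value & 0xF


def describe_link(sstatus: int, scontrol: int) -> str:
    _ipm, spd, det = split_scr(sstatus)
    _ctl_ipm, spd_limit, _ctl_det = split_scr(scontrol)
    speed = SPEED_GBPS.get(spd, f"SPD={spd}")
    limit = "none" if spd_limit == 0 else SPEED_GBPS.get(spd_limit, f"SPD={spd_limit}")
    return f"DET={det} speed={speed} speed limit={limit}"


for serr in (0x380000, 0x180000):
    print(f"SErr {serr:#x}: {{ {' '.join(decode_serror(serr))} }}")
print("ata1:", describe_link(0x113, 0x310))  # Daniel's log
print("ata4:", describe_link(0x123, 0x300))  # original report
```

SErr 0x380000 decodes to 10B8B, Dispar and BadCRC (10b/8b decode, disparity and CRC errors on the wire), the same set libata printed. Decoding SStatus 113 / SControl 310 also shows ata1 running at 1.5 Gbps with a Gen1 speed limit set in SControl, whereas ata4 in the original report ran at 3.0 Gbps with no limit.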