Bug 212695
Summary: | ASMedia ASM1062 needs MPS = 256 quirk | ||
---|---|---|---|
Product: | Drivers | Reporter: | Marek Behún (kabel) |
Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
Status: | NEW --- | ||
Severity: | normal | CC: | clement, kabel, laurent, pali |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://lore.kernel.org/linux-pci/20210317115924.31885-1-kabel@kernel.org/ | ||
Kernel Version: | all | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
lspci-without-quirk.txt
lspci-with-quirk.txt
config.txt
lspci-nnvv ASRockRack TRX40D8-2N2T
lspci-nnvv ASRockRack TRX40D8-2N2T with 5.4.157
lspci post crash |
Description
Marek Behún
2021-04-16 13:51:45 UTC
Created attachment 296407 [details]
lspci-with-quirk.txt
Created attachment 296409 [details]
config.txt
ASRockRack TRX40D8-2N2T and X470D4U have an ASMedia ASM1062 SATA controller that controls two ports. On these motherboards we lose SATA SSDs plugged into those ports after a while. The same SSDs plugged into other (non-ASM1062) ports have no issue. Swapping cables does not resolve the issue on the ASM1062.

25:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)

On both motherboards the PCI bridge has DevCap: MaxPayload 512 bytes, so it looks like it may be the same issue. Log below of the SATA SSDs being removed.

[125768.573175] ata1.00: exception Emask 0x0 SAct 0x400040 SErr 0x0 action 0x6 frozen
[125768.573204] ata1.00: failed command: WRITE FPDMA QUEUED
[125768.573219] ata1.00: cmd 61/00:30:88:31:3e/01:00:0d:00:00/40 tag 6 ncq dma 131072 out
         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[125768.573246] ata1.00: status: { DRDY }
[125768.573256] ata1.00: failed command: WRITE FPDMA QUEUED
[125768.573270] ata1.00: cmd 61/10:b0:88:32:3e/00:00:0d:00:00/40 tag 22 ncq dma 8192 out
         res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[125768.573303] ata1.00: status: { DRDY }
[125768.573313] ata1: hard resetting link
[125768.573340] ata2.00: exception Emask 0x0 SAct 0x8010000 SErr 0x0 action 0x6 frozen
[125768.573368] ata2.00: failed command: WRITE FPDMA QUEUED
[125768.573384] ata2.00: cmd 61/00:80:28:31:3e/01:00:0d:00:00/40 tag 16 ncq dma 131072 out
         res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[125768.573418] ata2.00: status: { DRDY }
[125768.573428] ata2.00: failed command: WRITE FPDMA QUEUED
[125768.573443] ata2.00: cmd 61/00:d8:28:32:3e/01:00:0d:00:00/40 tag 27 ncq dma 131072 out
         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[125768.573998] ata2.00: status: { DRDY }
[125768.574470] ata2: hard resetting link
[125778.573335] ata2: softreset failed (1st FIS failed)
[125778.573806] ata2: hard resetting link
[125778.574223] ata1: softreset failed (1st FIS failed)
[125778.574802] ata1: hard resetting link
[125788.573341] ata2: softreset failed (1st FIS failed)
[125788.573812] ata2: hard resetting link
[125788.574225] ata1: softreset failed (1st FIS failed)
[125788.574826] ata1: hard resetting link
[125823.573875] ata2: softreset failed (1st FIS failed)
[125823.574266] ata2: limiting SATA link speed to 3.0 Gbps
[125823.574572] ata2: hard resetting link
[125823.574896] ata1: softreset failed (1st FIS failed)
[125823.576037] ata1: limiting SATA link speed to 3.0 Gbps
[125823.576479] ata1: hard resetting link
[125828.573692] ata2: softreset failed (1st FIS failed)
[125828.574091] ata2: reset failed, giving up
[125828.574407] ata2.00: disabled
[125828.574733] sd 1:0:0:0: [sdb] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[125828.574740] ata1: softreset failed (1st FIS failed)
[125828.575059] sd 1:0:0:0: [sdb] tag#16 Sense Key : Not Ready [current]
[125828.575598] ata1: reset failed, giving up
[125828.576081] sd 1:0:0:0: [sdb] tag#16 Add. Sense: Logical unit not ready, hard reset required
[125828.576588] ata1.00: disabled

This issue was fixed (worked around) in the following commit (in 5.15), which forces MPS to 256 bytes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b12d93e9958e028856cbcb061b6e64728ca07755

It was also backported to stable kernel versions 5.14.6, 5.13.19, 5.10.67, 5.4.148, 4.19.207, 4.14.247, 4.9.283 and 4.4.284.

Laurent, please provide the "lspci -nn -vv" output, update the kernel to a patched version, and check whether the issue is still there. Ideally, also check whether any PCIe AER message was logged before or after the failure.

Pali, thanks for all the info. I've pinged my kernel provider to update to 5.4.148 or later. I'll report here whether it fixes my issue: it happens after a few hours of heavy I/O, so I'll be able to tell after a day or so if it's fixed.

lspci-nnvv.txt: the problematic ASMedia is 49:00.0

Created attachment 299191 [details]
lspci-nnvv ASRockRack TRX40D8-2N2T
grep -i aer /var/log/kern.log returns only init info. The only errors are ATA errors at the time of the failure:

Oct 9 02:59:33 p kernel: [125768.573175] ata1.00: exception Emask 0x0 SAct 0x400040 SErr 0x0 action 0x6 frozen
Oct 9 02:59:33 p kernel: [125768.573204] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 9 02:59:33 p kernel: [125768.573219] ata1.00: cmd 61/00:30:88:31:3e/01:00:0d:00:00/40 tag 6 ncq dma 131072 out
Oct 9 02:59:33 p kernel: [125768.573219] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 9 02:59:33 p kernel: [125768.573246] ata1.00: status: { DRDY }
Oct 9 02:59:33 p kernel: [125768.573256] ata1.00: failed command: WRITE FPDMA QUEUED
Oct 9 02:59:33 p kernel: [125768.573270] ata1.00: cmd 61/10:b0:88:32:3e/00:00:0d:00:00/40 tag 22 ncq dma 8192 out
Oct 9 02:59:33 p kernel: [125768.573270] res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 9 02:59:33 p kernel: [125768.573303] ata1.00: status: { DRDY }
Oct 9 02:59:33 p kernel: [125768.573313] ata1: hard resetting link

Ok! If the crash happens again and you are able, try to provide new lspci output again. Comparing the PCIe registers from the output before the crash (which you have already posted) and after the crash could bring some new information...

Created attachment 300021 [details]
lspci-nnvv ASRockRack TRX40D8-2N2T with 5.4.157
After 7 days of running 5.4.157, no issue so far. New lspci output attached, but it looks identical to the older one.
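The suggestion earlier in this thread to compare PCIe registers before and after a crash can be partly automated. The following is a minimal Python sketch, not a tool used by anyone in this report: it assumes the usual `lspci -vv` layout, where `MaxPayload N bytes` appears on the `DevCap:` line and on a continuation line under `DevCtl:`. The function names and the command-line usage are hypothetical.

```python
import re
import sys

# Device header line, e.g. "25:00.0 SATA controller ..." (optional PCI domain prefix).
BDF_RE = re.compile(r"^(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\b")
# Indented register tag such as "DevCap:" or "DevCtl:".
TAG_RE = re.compile(r"^\s+(\w+):")
MPS_RE = re.compile(r"MaxPayload (\d+) bytes")


def parse_max_payload(lspci_text):
    """Return {device address: {"DevCap": bytes, "DevCtl": bytes}}."""
    result = {}
    device = None
    section = None
    for line in lspci_text.splitlines():
        header = BDF_RE.match(line)
        if header:
            device = header.group(0)
            section = None
            continue
        tag = TAG_RE.match(line)
        if tag:
            # Continuation lines carry no tag, so the last tag stays active.
            section = tag.group(1)
        mps = MPS_RE.search(line)
        if device and mps and section in ("DevCap", "DevCtl"):
            result.setdefault(device, {})[section] = int(mps.group(1))
    return result


def diff_max_payload(before_text, after_text):
    """Return {device: (before, after)} for devices whose values differ."""
    before = parse_max_payload(before_text)
    after = parse_max_payload(after_text)
    changes = {}
    for dev in sorted(set(before) | set(after)):
        if before.get(dev) != after.get(dev):
            changes[dev] = (before.get(dev), after.get(dev))
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python mps_diff.py before.txt after.txt
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for dev, (old, new) in diff_max_payload(f_before.read(), f_after.read()).items():
            print(dev, old, "->", new)
```

An empty result only means the MaxPayload fields match between the two dumps; other registers (for example the status bits in DevSta) are not compared by this sketch.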
Unfortunately it didn't last, and even with 5.15.64-1-pve I have the failure. I purchased a PCIe storage card:
https://www.startech.com/fr-fr/cartes-additionelles-et-peripheriques/8p6g-pcie-sata-card

Unluckily for me it also has ASM1062 on it:

51:00.0 0106: 1b21:0612 (rev 02)
52:00.0 0106: 1b21:0612 (rev 02)
53:00.0 0106: 1b21:0612 (rev 02)
54:00.0 0106: 1b21:0612 (rev 02)

And disks plugged into it disappear with the same error after a few hours.

Created attachment 303083 [details]
lspci post crash
Hi,

It seems that this problem also affects the ASM1061, and I'm guessing this patch only applies to the ASM1062? Even worse, in my case the affected machine completely freezes. Is there any way to have this patch applied other than recompiling a custom kernel?

Mar 27 21:51:40 pve2 kernel: [ 1349.324899] ata7.00: exception Emask 0x73 SAct 0x1c000 SErr 0xffffffff action 0xe frozen
Mar 27 21:51:40 pve2 kernel: [ 1349.324928] ata7.00: irq_stat 0xffffffff, unknown FIS 00000000 00000000 00000000 00000000, host bus
Mar 27 21:51:40 pve2 kernel: [ 1349.324946] ata7: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
Mar 27 21:51:40 pve2 kernel: [ 1349.324974] ata7.00: failed command: WRITE FPDMA QUEUED
Mar 27 21:51:40 pve2 kernel: [ 1349.324987] ata7.00: cmd 61/58:70:80:72:e7/00:00:8f:00:00/40 tag 14 ncq dma 45056 out
Mar 27 21:51:40 pve2 kernel: [ 1349.324987] res 40/00:80:30:73:e7/00:00:8f:00:00/40 Emask 0x72 (host bus error)
Mar 27 21:51:40 pve2 kernel: [ 1349.325019] ata7.00: status: { DRDY }
Mar 27 21:51:40 pve2 kernel: [ 1349.325032] ata7.00: failed command: WRITE FPDMA QUEUED
Mar 27 21:51:40 pve2 kernel: [ 1349.325047] ata7.00: cmd 61/58:78:d8:72:e7/00:00:8f:00:00/40 tag 15 ncq dma 45056 out
Mar 27 21:51:40 pve2 kernel: [ 1349.325047] res 40/00:80:30:73:e7/00:00:8f:00:00/40 Emask 0x72 (host bus error)
Mar 27 21:51:40 pve2 kernel: [ 1349.325079] ata7.00: status: { DRDY }
Mar 27 21:51:40 pve2 kernel: [ 1349.325094] ata7.00: failed command: WRITE FPDMA QUEUED
Mar 27 21:51:40 pve2 kernel: [ 1349.325109] ata7.00: cmd 61/d0:80:30:73:e7/04:00:8f:00:00/40 tag 16 ncq dma 630784 out
Mar 27 21:51:40 pve2 kernel: [ 1349.325109] res 40/00:80:30:73:e7/00:00:8f:00:00/40 Emask 0x72 (host bus error)
Mar 27 21:51:40 pve2 kernel: [ 1349.325140] ata7.00: status: { DRDY }
Mar 27 21:51:40 pve2 kernel: [ 1349.325157] ata7: hard resetting link
Mar 27 21:51:40 pve2 kernel: [ 1349.375033] ahci 0000:05:00.0: AHCI controller unavailable!
Mar 27 21:51:41 pve2 kernel: [ 1349.675685] ata8.00: exception Emask 0x73 SAct 0x800000 SErr 0xffffffff action 0xe frozen
Mar 27 21:51:41 pve2 kernel: [ 1349.675709] ata8.00: irq_stat 0xffffffff, unknown FIS 00000000 00000000 00000000 00000000, host bus
Mar 27 21:51:41 pve2 kernel: [ 1349.675726] ata8: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
Mar 27 21:51:41 pve2 kernel: [ 1349.675753] ata8.00: failed command: WRITE FPDMA QUEUED
Mar 27 21:51:41 pve2 kernel: [ 1349.675766] ata8.00: cmd 61/b8:b8:f0:70:e7/01:00:8f:00:00/40 tag 23 ncq dma 225280 out
Mar 27 21:51:41 pve2 kernel: [ 1349.675766] res 40/00:bc:f0:70:e7/00:00:8f:00:00/40 Emask 0x72 (host bus error)
Mar 27 21:51:41 pve2 kernel: [ 1349.675793] ata8.00: status: { DRDY }
Mar 27 21:51:41 pve2 kernel: [ 1349.675806] ata8: hard resetting link
Mar 27 21:51:41 pve2 kernel: [ 1349.725797] ahci 0000:05:00.0: AHCI controller unavailable!
Mar 27 21:51:43 pve2 kernel: [ 1351.199208] ata7: failed to resume link (SControl FFFFFFFF)
Mar 27 21:51:43 pve2 kernel: [ 1351.750470] ata7: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Mar 27 21:51:48 pve2 kernel: [ 1356.911883] ata7: hard resetting link
Mar 27 21:51:48 pve2 kernel: [ 1356.952522] ahci 0000:05:00.0: AHCI controller unavailable!
Mar 27 21:51:48 pve2 kernel: [ 1357.253183] ata8: failed to resume link (SControl FFFFFFFF)
Mar 27 21:51:49 pve2 kernel: [ 1357.804463] ata8: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Mar 27 21:51:54 pve2 kernel: [ 1363.056031] ata8: hard resetting link
Mar 27 21:51:54 pve2 kernel: [ 1363.096719] ahci 0000:05:00.0: AHCI controller unavailable!
Mar 27 21:51:56 pve2 kernel: [ 1364.610195] ata7: failed to resume link (SControl FFFFFFFF)
Mar 27 21:51:56 pve2 kernel: [ 1365.161462] ata7: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
[MACHINE IS FROZEN]
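
Before rebuilding a kernel, it may be worth checking whether the running kernel version is already one of those listed earlier in this report as carrying the MPS = 256 workaround (mainline 5.15, or the listed stable releases). The sketch below is only a version-string heuristic in Python. The function name is hypothetical, the version table is copied from the comment above, and a distribution kernel may carry or lack the patch regardless of its version number.

```python
import platform
import re

# First stable release per series that contains the backport,
# as listed in this bug report (not verified against the git history here).
FIXED_STABLE = {
    (5, 14): 6,
    (5, 13): 19,
    (5, 10): 67,
    (5, 4): 148,
    (4, 19): 207,
    (4, 14): 247,
    (4, 9): 283,
    (4, 4): 284,
}
MAINLINE_FIX = (5, 15)  # the report says the commit landed in 5.15


def has_asm1062_mps_fix(release):
    """Guess from a release string such as '5.15.64-1-pve' whether the fix is in."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    if (major, minor) >= MAINLINE_FIX:
        return True
    needed = FIXED_STABLE.get((major, minor))
    # Series not named in the report are treated as unfixed.
    return needed is not None and patch >= needed


if __name__ == "__main__":
    release = platform.release()
    print(release, "fix expected:", has_asm1062_mps_fix(release))
```

A positive result only suggests that the quirk for the ASM1062 should be present. As reported above, the failure was still seen on 5.15.64-1-pve, so it does not imply the problem is gone, and it says nothing about the ASM1061.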