Bug 209149 - "iommu/vt-d: Enable PCI ACS for platform opt in hint" makes NVMe config space not accessible after S3
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-04 14:31 UTC by Kai-Heng Feng
Modified: 2024-07-01 14:50 UTC

See Also:
Kernel Version: mainline
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with dynamic debug enabled (123.13 KB, text/plain)
2020-09-04 14:32 UTC, Kai-Heng Feng
lspci -vvnn (28.02 KB, text/plain)
2020-09-04 14:32 UTC, Kai-Heng Feng
lspci -t (960 bytes, text/plain)
2020-09-04 14:32 UTC, Kai-Heng Feng
workaround patch (490 bytes, patch)
2020-09-04 14:33 UTC, Kai-Heng Feng
dmesg with the quirk patch applied (115.61 KB, text/plain)
2020-09-04 14:34 UTC, Kai-Heng Feng
dmesg (583.69 KB, text/plain)
2020-09-23 05:28 UTC, Kai-Heng Feng
lspci -tv, Intel NVMe (1.08 KB, text/plain)
2020-09-23 05:28 UTC, Kai-Heng Feng
Proposed patch (1.62 KB, patch)
2020-09-23 05:30 UTC, Kai-Heng Feng
Print AER status (1.35 KB, patch)
2021-02-05 15:05 UTC, Kai-Heng Feng
dmesg with AER status printed (69.54 KB, text/plain)
2021-02-05 15:05 UTC, Kai-Heng Feng
proposed patch (5.93 KB, patch)
2024-06-18 21:32 UTC, Bjorn Helgaas

Description Kai-Heng Feng 2020-09-04 14:31:20 UTC
Here's the error:
[   50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
[   50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
[   50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[   50.947830] pcieport 0000:00:1b.0:   device [8086:06ac] error status/mask=00200000/00010000
[   50.947831] pcieport 0000:00:1b.0:    [21] ACSViol                (First)
[   50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
[   50.947843] nvme nvme0: frozen state error detected, reset controller
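
For anyone reading the log by hand, the two hex values can be decoded bit by bit. The following is only a minimal sketch, not kernel code: the bit positions follow the PCIe AER Uncorrectable Error Status and DPC Status register layouts as I understand them, and the helper names (decode_aer_uncor, decode_dpc_status) are made up for illustration.

```python
# Subset of the AER Uncorrectable Error Status bit layout (PCIe spec).
# Helper and table names here are illustrative, not kernel identifiers.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}

# DPC Status: bit 0 = trigger status, bits 2:1 = trigger reason.
DPC_TRIGGER_REASONS = {
    0: "unmasked uncorrectable error",
    1: "ERR_NONFATAL received",
    2: "ERR_FATAL received",
    3: "see Trigger Reason Extension",
}


def decode_aer_uncor(value):
    """Return the names of all bits set in an AER uncorrectable status/mask."""
    return [
        AER_UNCOR_BITS.get(bit, f"bit {bit}")
        for bit in range(32)
        if value & (1 << bit)
    ]


def decode_dpc_status(status):
    """Return (triggered, trigger reason) from a DPC Status register value."""
    triggered = bool(status & 0x1)
    reason = DPC_TRIGGER_REASONS[(status >> 1) & 0x3]
    return triggered, reason


# Values taken from the log lines above.
status, mask = 0x00200000, 0x00010000
print("reported:", decode_aer_uncor(status & ~mask))
print("masked:  ", decode_aer_uncor(mask))
print("DPC:     ", decode_dpc_status(0x1F01))
```

Decoded this way, the status word has only bit 21 (ACS Violation) set, which matches the "[21] ACSViol" line, and the only masked error is bit 16 (Unexpected Completion), so the ACS violation is reported and triggers DPC.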
Comment 1 Kai-Heng Feng 2020-09-04 14:32:13 UTC
Created attachment 292327 [details]
dmesg with dynamic debug enabled
Comment 2 Kai-Heng Feng 2020-09-04 14:32:33 UTC
Created attachment 292329 [details]
lspci -vvnn
Comment 3 Kai-Heng Feng 2020-09-04 14:32:51 UTC
Created attachment 292331 [details]
lspci -t
Comment 4 Kai-Heng Feng 2020-09-04 14:33:50 UTC
Created attachment 292333 [details]
workaround patch

With a quirk applied for the root port, the issue is gone.
Comment 5 Kai-Heng Feng 2020-09-04 14:34:17 UTC
Created attachment 292335 [details]
dmesg with the quirk patch applied
Comment 6 Kai-Heng Feng 2020-09-04 14:34:44 UTC
So I wonder whether an ACS quirk is also required for Comet Lake?
Comment 7 Kai-Heng Feng 2020-09-23 05:28:29 UTC
Created attachment 292565 [details]
dmesg

The same issue occurs on an Intel NVMe after the ACS quirk is applied.
Comment 8 Kai-Heng Feng 2020-09-23 05:28:52 UTC
Created attachment 292567 [details]
lspci -tv, Intel NVMe
Comment 9 Kai-Heng Feng 2020-09-23 05:30:26 UTC
Created attachment 292569 [details]
Proposed patch

Unconditionally disabling ACS redir for Intel bridges can work around the issue.
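
To make "disabling ACS redir" concrete, here is a rough sketch of the bit arithmetic on the ACS Control register. This is not the attached patch: the bit values follow the ACS Control register layout in the PCIe spec, the starting value 0x1d is an assumed example (Source Validation, Request Redirect, Completion Redirect, and Upstream Forwarding enabled), and the function name is hypothetical.

```python
# ACS Control register bits (PCIe spec layout).
ACS_SV = 0x01  # Source Validation
ACS_TB = 0x02  # Translation Blocking
ACS_RR = 0x04  # P2P Request Redirect
ACS_CR = 0x08  # P2P Completion Redirect
ACS_UF = 0x10  # Upstream Forwarding
ACS_EC = 0x20  # P2P Egress Control
ACS_DT = 0x40  # Direct Translated P2P

# The redirect-related bits that "disable ACS redir" would clear.
ACS_REDIR_BITS = ACS_RR | ACS_CR | ACS_EC


def disable_acs_redir(ctrl):
    """Hypothetical helper: clear the redirect bits, keep everything else."""
    return ctrl & ~ACS_REDIR_BITS & 0xFFFF


# Assumed example value, not read from the reporter's hardware.
before = ACS_SV | ACS_RR | ACS_CR | ACS_UF
after = disable_acs_redir(before)
print(f"before={before:#06x} after={after:#06x}")
```

Note that Source Validation stays enabled in this sketch, so clearing only the redirect bits would not by itself stop a root port from flagging ACS violations.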
Comment 10 Kai-Heng Feng 2020-10-15 15:33:00 UTC
#9 is just a placebo. The issue is still reproducible with ACS redir forcibly disabled.
Comment 11 Kai-Heng Feng 2021-02-05 15:05:05 UTC
Created attachment 295077 [details]
Print AER status
Comment 12 Kai-Heng Feng 2021-02-05 15:05:37 UTC
Created attachment 295079 [details]
dmesg with AER status printed
Comment 13 Bjorn Helgaas 2024-06-18 21:32:51 UTC
Created attachment 306473 [details]
proposed patch

I'm not completely clear on the mechanism here, but this is a possible fix for this issue (at least, this bug is mentioned in the commit log).
Comment 14 Kai-Heng Feng 2024-06-19 06:06:29 UTC
Confirmed the patch solves the issue.
Comment 15 Werner Sembach [TUXEDO] 2024-07-01 14:50:36 UTC
(In reply to Bjorn Helgaas from comment #13)
> Created attachment 306473 [details]
> proposed patch
> 
> I'm not completely clear on the mechanism here, but this is a possible fix
> for this issue (at least, this bug is mentioned in the commit log).

Also works for me (applied without conflict against 6.8.12; I couldn't use 6.10-rc5 because the nvidia driver does not yet support that kernel).
