Here's the error:
[ 50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
[ 50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
[ 50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[ 50.947830] pcieport 0000:00:1b.0: device [8086:06ac] error status/mask=00200000/00010000
[ 50.947831] pcieport 0000:00:1b.0: [21] ACSViol (First)
[ 50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
[ 50.947843] nvme nvme0: frozen state error detected, reset controller
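The raw register values in the log can be decoded by hand. The sketch below is illustrative only and not from this thread: the helper names are hypothetical, and the bit layouts are assumed from the PCIe spec's AER Uncorrectable Error Status/Mask and DPC Status registers (field masks mirror Linux `pci_regs.h`). Decoding status 0x00200000 gives only bit 21 (ACS Violation), mask 0x00010000 masks only bit 16 (Unexpected Completion), and DPC status 0x1f01 shows the trigger bit set with reason "unmasked uncorrectable error", matching the second log line.

```python
from __future__ import annotations

# Hypothetical helper, not from the bug report: subset of the AER
# Uncorrectable Error Status/Mask bit positions (PCIe spec names).
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
    22: "Uncorrectable Internal Error",
}

# DPC Status bits 2:1 (DPC Trigger Reason).
DPC_TRIGGER_REASONS = {
    0: "unmasked uncorrectable error",
    1: "ERR_NONFATAL received",
    2: "ERR_FATAL received",
    3: "see trigger reason extension",
}


def decode_aer_uncor(value: int) -> list[tuple[int, str]]:
    """Return (bit, name) for every known bit set in an AER status/mask value."""
    return [(bit, name) for bit, name in sorted(AER_UNCOR_BITS.items())
            if value & (1 << bit)]


def decode_dpc_status(status: int) -> dict:
    """Split a DPC Status register value into its main fields."""
    return {
        "trigger": bool(status & 0x0001),                     # DPC Trigger Status
        "reason": DPC_TRIGGER_REASONS[(status & 0x0006) >> 1],
        "interrupt": bool(status & 0x0008),                   # DPC Interrupt Status
        "rp_busy": bool(status & 0x0010),                     # DPC RP Busy
        "rp_pio_fep": (status & 0x1F00) >> 8,                 # RP PIO First Error Pointer
    }


if __name__ == "__main__":
    # Values copied from the dmesg lines above.
    print("status:", decode_aer_uncor(0x00200000))
    print("mask:  ", decode_aer_uncor(0x00010000))
    print("dpc:   ", decode_dpc_status(0x1F01))
```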
Created attachment 292327: dmesg with dynamic debug enabled
Created attachment 292329: lspci -vvnn
Created attachment 292331: lspci -t
Created attachment 292333: workaround patch. With an ACS quirk applied to the root port, the issue is gone.
Created attachment 292335: dmesg with the quirk patch applied
So I wonder whether an ACS quirk is also required for Comet Lake?
Created attachment 292565: dmesg. Same issue on an Intel NVMe drive, even with the ACS quirk applied.
Created attachment 292567: lspci -tv, Intel NVMe
Created attachment 292569: proposed patch. Unconditionally disabling ACS redirection for Intel bridges works around the issue.
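The attached patch itself is not reproduced here, so as an assumption: "ACS redir" is taken to mean the P2P Request Redirect and P2P Completion Redirect bits of the ACS Control register. The hypothetical helper below only shows the bit arithmetic of clearing them; the constant values mirror the `PCI_ACS_*` definitions in Linux `pci_regs.h`, and this is a sketch, not the patch's code.

```python
# ACS Control register bits (values as in Linux pci_regs.h).
PCI_ACS_SV = 0x0001  # Source Validation
PCI_ACS_TB = 0x0002  # Translation Blocking
PCI_ACS_RR = 0x0004  # P2P Request Redirect
PCI_ACS_CR = 0x0008  # P2P Completion Redirect
PCI_ACS_UF = 0x0010  # Upstream Forwarding
PCI_ACS_EC = 0x0020  # P2P Egress Control
PCI_ACS_DT = 0x0040  # Direct Translated P2P


def clear_acs_redirect(ctrl: int) -> int:
    """Hypothetical helper: return a 16-bit ACS Control value with both redirect bits cleared."""
    return ctrl & ~(PCI_ACS_RR | PCI_ACS_CR) & 0xFFFF


if __name__ == "__main__":
    ctrl = PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF
    print(f"before: {ctrl:#06x}  after: {clear_acs_redirect(ctrl):#06x}")
```

Source Validation and Upstream Forwarding are left untouched, so only redirection behavior would change.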
The patch in #9 is just a placebo. The issue is still reproducible with ACS redirection forcibly disabled.
Created attachment 295077: Print AER status
Created attachment 295079: dmesg with AER status printed
Created attachment 306473: proposed patch. I'm not completely clear on the mechanism here, but this is a possible fix for this issue (at least, this bug is mentioned in the commit log).
Confirmed the patch solves the issue.
(In reply to Bjorn Helgaas from comment #13)
> Created attachment 306473
> proposed patch
>
> I'm not completely clear on the mechanism here, but this is a possible fix
> for this issue (at least, this bug is mentioned in the commit log).

Also works for me (applied without conflict against 6.8.12; I couldn't use 6.10-rc5 because the NVIDIA driver does not yet support that kernel).