Here's the error:
[ 50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
[ 50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
[ 50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[ 50.947830] pcieport 0000:00:1b.0: device [8086:06ac] error status/mask=00200000/00010000
[ 50.947831] pcieport 0000:00:1b.0:  ACSViol (First)
[ 50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
[ 50.947843] nvme nvme0: frozen state error detected, reset controller
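For reference, the register values in the log above can be decoded by hand. The following is a minimal sketch, assuming the bit layout of the AER Uncorrectable Error Status/Mask registers and of the DPC Status register as defined in Linux's pci_regs.h (only a subset of the AER bits is listed). The helper names `decode_aer_uncor` and `decode_dpc_status` are hypothetical and exist only for this illustration.

```python
# Subset of bit positions in the AER Uncorrectable Error Status/Mask
# registers, using the short names the Linux AER driver prints.
# Assumption: layout as in include/uapi/linux/pci_regs.h.
AER_UNCOR_BITS = {
    4: "DLP",         # Data Link Protocol error
    5: "SDES",        # Surprise Down
    12: "TLP",        # Poisoned TLP
    13: "FCP",        # Flow Control Protocol error
    14: "CmpltTO",    # Completion Timeout
    15: "CmpltAbrt",  # Completer Abort
    16: "UnxCmplt",   # Unexpected Completion
    17: "RxOF",       # Receiver Overflow
    18: "MalfTLP",    # Malformed TLP
    19: "ECRC",       # ECRC error
    20: "UnsupReq",   # Unsupported Request
    21: "ACSViol",    # ACS Violation
}


def decode_aer_uncor(value):
    """Return the names of all set bits; unlisted bits appear as 'bit N'."""
    return [
        AER_UNCOR_BITS.get(bit, f"bit {bit}")
        for bit in range(32)
        if value & (1 << bit)
    ]


def decode_dpc_status(status):
    """Split a DPC Status value into its low-order fields (hypothetical helper)."""
    return {
        "triggered": bool(status & 0x0001),
        "trigger_reason": (status & 0x0006) >> 1,
        "interrupt": bool(status & 0x0008),
        "rp_busy": bool(status & 0x0010),
        "trigger_reason_ext": (status & 0x0060) >> 5,
    }


# Values taken from the "error status/mask=00200000/00010000" line.
status, mask = 0x00200000, 0x00010000
print(decode_aer_uncor(status))          # ['ACSViol']
print(decode_aer_uncor(mask))            # ['UnxCmplt']
print(decode_aer_uncor(status & ~mask))  # ['ACSViol'], i.e. the error is unmasked

# Value taken from the "DPC: containment event, status:0x1f01" line.
print(decode_dpc_status(0x1F01))
```

Under these assumptions the only reported error is an unmasked ACS Violation, and DPC trigger reason 0 corresponds to the "unmasked uncorrectable error detected" message in the log.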
Created attachment 292327 [details]
dmesg with dynamic debug enabled
Created attachment 292329 [details]
Created attachment 292331 [details]
Created attachment 292333 [details]
Once a quirk is applied to the root port, the issue is gone.
Created attachment 292335 [details]
dmesg with the quirk patch applied
So I wonder whether an ACS quirk is also required for Comet Lake.
Created attachment 292565 [details]
Same issue on an Intel NVMe, even with the ACS quirk applied.
Created attachment 292567 [details]
lspci -tv, Intel NVMe
Created attachment 292569 [details]
Unconditionally disabling ACS redir for Intel bridges can work around the issue.
#9 is just a placebo. The issue is still reproducible with ACS redir forcibly disabled.
Created attachment 295077 [details]
Print AER status
Created attachment 295079 [details]
dmesg with AER status printed