Bug 215265

Summary: thunderbolt device got DMAR error and can't access pci config
Product: Platform Specific/Hardware Reporter: KaiChuan-Hsieh (kaichuan.hsieh)
Component: IA-64 Assignee: platform_ia-64
Status: NEW ---    
Severity: blocking CC: bavay, chris.chiu
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 5.16-rc3 Subsystem:
Regression: No Bisected commit-id:
Attachments: kernel log
Kernel Msg with MapleRidge adapter
Kernel Msg with MapleRidge adapter
lspci_nnv output
lspci_vt output

Description KaiChuan-Hsieh 2021-12-08 10:13:44 UTC
Created attachment 299929 [details]
kernel log

Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [05:00.0] fault addr 0x5d5b6000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [05:00.0] fault addr 0x5d5b6000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [05:00.0] fault addr 0x5d5b6000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: thunderbolt 0000:05:00.0: failed to send driver ready to ICM
Dec 08 00:21:39 C6 kernel: thunderbolt: probe of 0000:05:00.0 failed with error -110
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [6f:00.0] fault addr 0x5d133000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [6f:00.0] fault addr 0x5d133000 [fault reason 0x05] PTE Write access is not set
Comment 1 Chris Chiu 2021-12-09 08:51:55 UTC
Created attachment 299959 [details]
Kernel Msg with MapleRidge adapter
Comment 2 Chris Chiu 2021-12-09 09:09:23 UTC
Created attachment 299961 [details]
Kernel Msg with MapleRidge adapter

I have more machines that seem to have a very similar problem. I think it is also similar to https://bugzilla.kernel.org/show_bug.cgi?id=214259. However, the patch at https://lkml.org/lkml/2020/6/17/751 does not help, and neither does the kernel parameter `pci=nocrs`. I'm not sure whether the two bugs share the same root cause.

Here is the kernel log, which shows the DMAR faults followed by the ICM "driver ready" failure. The full kernel messages and `lspci` output are attached for reference.
```
[   27.684113] RTX3070 kernel: DMAR: DRHD: handling fault status reg 2
[   27.684131] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set
[   48.164130] RTX3070 kernel: DMAR: DRHD: handling fault status reg 2
[   48.164148] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set
[   68.644108] RTX3070 kernel: DMAR: DRHD: handling fault status reg 2
[   68.644125] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set
[   89.120424] RTX3070 kernel: thunderbolt 0000:03:00.0: failed to send driver ready to ICM
```
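
The repeated fault lines can be condensed per device before comparing machines. The following is a minimal sketch, not a kernel-provided tool: the regular expression and the `summarize_faults` helper are assumptions written to match only the line formats quoted in this report (with and without `0x` prefixes, with `NO_PASID` or a separate `PASID` field).

```python
import re
from collections import Counter

# Hypothetical pattern covering the DMAR fault line formats quoted in this report.
FAULT_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)[^\]]*\] Request device "
    r"\[(?:0x)?(?P<bus>[0-9a-fA-F]{2}):(?:0x)?(?P<dev>[0-9a-fA-F]{2})\.(?P<fn>[0-7])\]"
    r".*?fault addr (?:0x)?(?P<addr>[0-9a-fA-F]+) "
    r"\[fault reason (?:0x)?(?P<reason>[0-9a-fA-F]+)\]"
)


def summarize_faults(lines):
    """Count DMAR faults keyed by (bus:dev.fn, operation, address, reason)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.search(line)
        if m is None:
            continue  # not a DMAR fault detail line
        bdf = "{}:{}.{}".format(m.group("bus"), m.group("dev"), m.group("fn")).lower()
        key = (bdf, m.group("op"), int(m.group("addr"), 16), int(m.group("reason"), 16))
        counts[key] += 1
    return counts


# Sample lines copied from the log above.
SAMPLE = [
    "[   27.684113] RTX3070 kernel: DMAR: DRHD: handling fault status reg 2",
    "[   27.684131] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set",
    "[   48.164148] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set",
]

for (bdf, op, addr, reason), n in summarize_faults(SAMPLE).items():
    print("{} DMA {} addr={:#x} reason={:#04x} count={}".format(bdf, op, addr, reason, n))
```

Feeding it the full attached logs instead of `SAMPLE` would show whether every fault on a machine comes from the same device and address, as the excerpts suggest.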

Please suggest how to identify the cause of the problem. Thanks.
Comment 3 Chris Chiu 2021-12-09 09:10:13 UTC
Created attachment 299963 [details]
lspci_nnv output
Comment 4 Chris Chiu 2021-12-09 09:10:35 UTC
Created attachment 299965 [details]
lspci_vt output
Comment 5 Mathias Bavay 2022-01-14 13:39:06 UTC
I'm not sure whether this is exactly the same problem or just a related one, but here is what I get on a Dell XPS 7590 running Ubuntu 20.04, kernel 5.11.0-46-generic:

[    0.074884] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.074884] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.075109] DMAR: DRHD: handling fault status reg 2
[    0.075113] DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr 78890000 [fault reason 05] PTE Write access is not set
[    0.077976] DMAR-IR: Enabled IRQ remapping in x2apic mode
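
To tell which device is behind an address such as `[00:14.0]` in a fault line, the standard PCI sysfs entries can be read. This is a minimal sketch under assumptions: the helper names `pci_sysfs_dir` and `describe_device` are hypothetical, PCI domain `0000` is assumed, and the `sysfs_root` parameter exists only so the code can be pointed at a different tree.

```python
import os


def pci_sysfs_dir(bdf, domain="0000", sysfs_root="/sys"):
    """Build the sysfs directory path for a PCI device given as bus:dev.fn."""
    return os.path.join(sysfs_root, "bus", "pci", "devices", "{}:{}".format(domain, bdf))


def describe_device(bdf, domain="0000", sysfs_root="/sys"):
    """Return vendor ID, device ID and IOMMU group, or None if the device is absent."""
    dev_dir = pci_sysfs_dir(bdf, domain, sysfs_root)
    if not os.path.isdir(dev_dir):
        return None
    info = {}
    for attr in ("vendor", "device"):
        try:
            with open(os.path.join(dev_dir, attr)) as f:
                info[attr] = f.read().strip()
        except OSError:
            info[attr] = None
    # iommu_group is a symlink whose target directory name is the group number.
    group_link = os.path.join(dev_dir, "iommu_group")
    if os.path.islink(group_link):
        info["iommu_group"] = os.path.basename(os.path.realpath(group_link))
    else:
        info["iommu_group"] = None
    return info


# Device address taken from the fault line above; prints None if it does not exist.
print(describe_device("00:14.0"))
```

The same lookup applies to the `05:00.0`, `6f:00.0` and `03:00.0` addresses from the earlier comments, provided the device is still visible on the bus after the failure.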