Bug 215265 - thunderbolt device got DMAR error and can't access pci config
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: IA-64
Hardware: Intel Linux
Importance: P1 blocking
Assignee: platform_ia-64
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-08 10:13 UTC by KaiChuan-Hsieh
Modified: 2022-01-14 13:39 UTC

See Also:
Kernel Version: 5.16-rc3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel log (511.18 KB, text/plain)
2021-12-08 10:13 UTC, KaiChuan-Hsieh
Kernel Msg with MapleRidge adapter (852.36 KB, text/plain)
2021-12-09 08:51 UTC, Chris Chiu
Kernel Msg with MapleRidge adapter (799.13 KB, text/plain)
2021-12-09 09:09 UTC, Chris Chiu
lspci_nnv output (17.50 KB, text/plain)
2021-12-09 09:10 UTC, Chris Chiu
lspci_vt output (1.36 KB, text/plain)
2021-12-09 09:10 UTC, Chris Chiu

Description KaiChuan-Hsieh 2021-12-08 10:13:44 UTC
Created attachment 299929
kernel log

Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [05:00.0] fault addr 0x5d5b6000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [05:00.0] fault addr 0x5d5b6000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [05:00.0] fault addr 0x5d5b6000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: thunderbolt 0000:05:00.0: failed to send driver ready to ICM
Dec 08 00:21:39 C6 kernel: thunderbolt: probe of 0000:05:00.0 failed with error -110
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [6f:00.0] fault addr 0x5d133000 [fault reason 0x05] PTE Write access is not set
Dec 08 00:21:39 C6 kernel: DMAR: DRHD: handling fault status reg 2
Dec 08 00:21:39 C6 kernel: DMAR: [DMA Write NO_PASID] Request device [6f:00.0] fault addr 0x5d133000 [fault reason 0x05] PTE Write access is not set
Comment 1 Chris Chiu 2021-12-09 08:51:55 UTC
Created attachment 299959
Kernel Msg with MapleRidge adapter
Comment 2 Chris Chiu 2021-12-09 09:09:23 UTC
Created attachment 299961
Kernel Msg with MapleRidge adapter

I have several more machines that show a very similar problem. It also looks similar to https://bugzilla.kernel.org/show_bug.cgi?id=214259. However, the patch at https://lkml.org/lkml/2020/6/17/751 does not help, and neither does the kernel parameter `pci=nocrs`, so I'm not sure the two bugs share the same root cause.

Here is the kernel log showing the DMAR faults and the failure to send driver ready to ICM. The full kernel messages and `lspci` output are attached for reference.
```
[   27.684113] RTX3070 kernel: DMAR: DRHD: handling fault status reg 2
[   27.684131] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set
[   48.164130] RTX3070 kernel: DMAR: DRHD: handling fault status reg 2
[   48.164148] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set
[   68.644108] RTX3070 kernel: DMAR: DRHD: handling fault status reg 2
[   68.644125] RTX3070 kernel: DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set
[   89.120424] RTX3070 kernel: thunderbolt 0000:03:00.0: failed to send driver ready to ICM
```
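
To compare logs across the affected machines, the DMAR fault lines can be tallied per device and fault address. The following is only a rough sketch: the names `DMAR_FAULT`, `parse_fault` and `summarize` are made up for illustration, and the regular expression assumes only the three line formats that appear in this report (with and without `0x` prefixes, and with the older `PASID ffffffff` field).

```python
import re
from collections import Counter

# Assumption: matches only the DMAR fault line layouts quoted in this bug report.
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)[^\]]*\] "
    r"Request device \[(?P<dev>[^\]]+)\] "
    r"(?:PASID \S+ )?"
    r"fault addr (?P<addr>(?:0x)?[0-9a-fA-F]+) "
    r"\[fault reason (?P<reason>(?:0x)?[0-9a-fA-F]+)\]"
)


def parse_fault(line):
    """Return (device, fault_addr, fault_reason) or None if not a fault line."""
    m = DMAR_FAULT.search(line)
    if m is None:
        return None
    # Normalize "0x03:0x00.0" to "03:00.0" so devices compare across kernels.
    dev = m.group("dev").replace("0x", "")
    return dev, int(m.group("addr"), 16), int(m.group("reason"), 16)


def summarize(lines):
    """Count identical (device, address, reason) faults in an iterable of lines."""
    return Counter(f for f in map(parse_fault, lines) if f is not None)


# Sample lines copied from the logs in this report.
sample = [
    "DMAR: [DMA Write NO_PASID] Request device [05:00.0] fault addr 0x5d5b6000 [fault reason 0x05] PTE Write access is not set",
    "DMAR: [DMA Write NO_PASID] Request device [0x03:0x00.0] fault addr 0x3f9b4000 [fault reason 0x05] PTE Write access is not set",
    "thunderbolt 0000:03:00.0: failed to send driver ready to ICM",
]
for (dev, addr, reason), count in summarize(sample).most_common():
    print(f"{dev} addr={addr:#x} reason={reason:#x} count={count}")
```

Feeding it a full log (for example the lines of a saved `dmesg` file) shows whether each device always faults on the same address, as it does in the excerpts here.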

Please suggest how to identify the cause of the problem. Thanks.
Comment 3 Chris Chiu 2021-12-09 09:10:13 UTC
Created attachment 299963
lspci_nnv output
Comment 4 Chris Chiu 2021-12-09 09:10:35 UTC
Created attachment 299965
lspci_vt output
Comment 5 Mathias Bavay 2022-01-14 13:39:06 UTC
I'm not sure if this is the exact same problem or just a related one, but here is what I've got on a Dell XPS 7590 running Ubuntu 20.04, kernel 5.11.0-46-generic:

[    0.074884] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.074884] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.075109] DMAR: DRHD: handling fault status reg 2
[    0.075113] DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr 78890000 [fault reason 05] PTE Write access is not set
[    0.077976] DMAR-IR: Enabled IRQ remapping in x2apic mode
