Bug 108251
| Summary: | tpm_crb / DMAR: DRHD: handling fault status reg 3 | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Pierre Chifflier (chifflier) |
| Component: | Other | Assignee: | drivers_other |
| Status: | NEW | | |
| Severity: | normal | CC: | bigon, bordjukov, dion, dwmw2, jarkko.sakkinen, klondike+kernel, matthias.nagel |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 4.2.6 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | dmesg; DMAR table; DMAR.dsl; Patch to alias function 0 to function 7 on the HECI | | |
Description
Pierre Chifflier
2015-11-21 16:32:43 UTC
Quick update, bug is still present in 4.3.5.

Also, adding TPM 2 driver author to CC.

Thanks for reporting this. We have some luck since I happen to have the very same laptop.

Please could I see the full dmesg? I'd like to see what the ACPI tables tell the IOMMU code about the mapping of ACPI devices to PCI dev/fn. And also I'd like to see what is at address 0xccdff000: is that reserved memory?

Created attachment 203281 [details]
dmesg
Dmesg, kernel 4.3.5
Boot is in UEFI mode only (CSM disabled)
Thanks. So...

    [ 0.000000] BIOS-e820: [mem 0x00000000ccdfe000-0x00000000ccdfefff] usable
    [ 0.000000] BIOS-e820: [mem 0x00000000f80f8000-0x00000000f80f8fff] reserved
    [ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved

A chunk of usable memory ends at 0xccdfefff, and our faulting address 0xccdff000 is right after that. I think this is the TPM 'control area'. (Btw, isn't that a BIOS bug, and this memory should be explicitly marked as reserved, rather than just left out of the E820 map? Otherwise, we might attempt to map PCI BARs over it? But I suspect that's harmless in this case.)

It looks like the DMA from the TPM appears as if it comes from (non-existent) PCI device 00:16.7. That's normal: the firmware is supposed to provide that information in ANDD records in the DMAR table, with a mapping for each DMA-capable ACPI device to the PCI bus/dev/fn that it appears as. Although your BIOS doesn't seem to be providing that.

I also suspect that the BIOS is supposed to ask the OS to set up a 1:1 mapping for the control area with an RMRR record in the DMAR table. It doesn't look like it's doing so.

Can you attach the contents of /sys/firmware/acpi/tables/DMAR please? It's possible that the BIOS *is* doing the right thing, but that we don't obey because there isn't actually a PCI device at 00:16.7. I'd just like to check.

Created attachment 203291 [details]
DMAR table
/sys/firmware/acpi/tables/DMAR
Device 16 is the MEI, which provides the vTPM
I don't know how to parse the DMAR table, so I provided the binary.
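
For anyone who wants to repeat the e820 check from the analysis above on their own machine, here is a minimal Python sketch. It only assumes the `BIOS-e820: [mem start-end] type` line format seen in the dmesg excerpt; the function names `parse_e820` and `classify` are made up for illustration, and `SAMPLE` contains just the three lines quoted in this report, not a full memory map.

```python
import re

# Matches lines such as:
#   BIOS-e820: [mem 0x00000000ccdfe000-0x00000000ccdfefff] usable
E820_RE = re.compile(r"BIOS-e820: \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)\] (.+)$")


def parse_e820(dmesg_text):
    """Return a list of (start, end, kind) tuples; end is inclusive."""
    regions = []
    for line in dmesg_text.splitlines():
        m = E820_RE.search(line.strip())
        if m:
            regions.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return regions


def classify(regions, addr):
    """Return the e820 type covering addr, or None if no entry covers it."""
    for start, end, kind in regions:
        if start <= addr <= end:
            return kind
    return None


# Only the three entries quoted in this bug, not the whole map.
SAMPLE = """\
[    0.000000] BIOS-e820: [mem 0x00000000ccdfe000-0x00000000ccdfefff] usable
[    0.000000] BIOS-e820: [mem 0x00000000f80f8000-0x00000000f80f8fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
"""

regions = parse_e820(SAMPLE)
print(classify(regions, 0xCCDFEFFF))  # last byte of the usable chunk
print(classify(regions, 0xCCDFF000))  # the faulting address
```

With a full dmesg, a result of `None` for the faulting address would mean the firmware left that page out of the map entirely rather than marking it reserved.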
Communication with PTT is through an ACPI function that transfers control to SMM and communicates with the ME. The address of the control area is easy to check from /sys/firmware/acpi/tables/TPM2.

On my laptop (x250), if I set intel_iommu=on, the driver times out when it first tries to communicate with the TPM and waits for the response. I do not see the above errors.

BTW, have you ever updated your BIOS for that laptop?

> Created attachment 203291 [details]
> DMAR table
>
> /sys/firmware/acpi/tables/DMAR
>
> Device 16 is the MEI, which provides the vTPM
>
> I don't know how to parse the DMAR table, so I provided the binary.
There is a tool called iasl that you can use to disassemble ACPI tables.
I can't find an entry for that device in the DMAR table.
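
If iasl is not at hand, the same question (is there an RMRR whose device scope names the device?) can be answered with a short script. The following is a rough Python sketch, not a validated parser: the offsets are my reading of the DMAR layout (36-byte ACPI header, 12 more bytes, then type/length-prefixed remapping structures, RMRR being type 1), and the function names are invented for this example.

```python
import struct

DMAR_FIRST_STRUCT = 48  # 36-byte ACPI header + width, flags, 10 reserved bytes
RMRR = 1                # remapping structure type for RMRR


def iter_structs(table):
    """Yield (type, raw_bytes) for each remapping structure in the table."""
    total = min(len(table), struct.unpack_from("<I", table, 4)[0])
    off = DMAR_FIRST_STRUCT
    while off + 4 <= total:
        stype, slen = struct.unpack_from("<HH", table, off)
        if slen < 4 or off + slen > total:
            break  # malformed entry: stop instead of looping forever
        yield stype, table[off:off + slen]
        off += slen


def parse_scopes(data):
    """Return [(start_bus, [(dev, fn), ...]), ...] from device scope entries."""
    scopes, off = [], 0
    while off + 6 <= len(data):
        length = data[off + 1]
        if length < 6 or off + length > len(data):
            break
        path = data[off + 6:off + length]
        hops = [(path[i], path[i + 1]) for i in range(0, len(path) - 1, 2)]
        scopes.append((data[off + 5], hops))
        off += length
    return scopes


def list_rmrrs(table):
    """Return [(segment, base, limit, scopes), ...] for every RMRR."""
    out = []
    for stype, raw in iter_structs(table):
        if stype == RMRR and len(raw) >= 24:
            segment, base, limit = struct.unpack_from("<HQQ", raw, 6)
            out.append((segment, base, limit, parse_scopes(raw[24:])))
    return out
```

On a live system this would be called as `list_rmrrs(open("/sys/firmware/acpi/tables/DMAR", "rb").read())`, which usually needs root. A scope of `(0, [(0x16, 7)])` would correspond to 00:16.7; if no RMRR lists that path with a range covering the faulting address, the table is not asking for the mapping.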
Created attachment 203301 [details]
DMAR.dsl
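
To confirm the guess that the faulting address is the TPM control area, the address can be read from the TPM2 table mentioned earlier. A minimal Python sketch, assuming the usual TPM2 table layout (36-byte ACPI header, 4 bytes of platform class and reserved, then a 64-bit control area address at offset 40 and a 32-bit start method at offset 48); `tpm2_control_area` is a hypothetical helper name, not part of any existing tool.

```python
import struct


def tpm2_control_area(table):
    """Return (control_area_address, start_method) from a raw TPM2 ACPI table."""
    if table[:4] != b"TPM2":
        raise ValueError("not a TPM2 table")
    if len(table) < 52:
        raise ValueError("table too short to hold the control area fields")
    # Little-endian: u64 address at offset 40, u32 start method at offset 48.
    address, start_method = struct.unpack_from("<QI", table, 40)
    return address, start_method
```

Typical use would be `addr, method = tpm2_control_area(open("/sys/firmware/acpi/tables/TPM2", "rb").read())` followed by `print(hex(addr))`. If the printed address matches the `fault addr` in the DMAR messages, the blocked DMA is indeed aimed at the control area.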
So the BIOS isn't *asking* us to permit DMA from 00:16.7 to 0xccdff000. And thus we don't configure the IOMMU to permit it, and stuff thus doesn't work.

Having diagnosed a BIOS bug, I probably now have about three years until it gets fixed (and then only if you buy new hardware). During which time I can fix the Linux bug that if the BIOS *had* asked for such a mapping, we wouldn't have honoured it anyway because there is no *actual* PCI device at 00:16.7 :)

Jarkko, no idea why you don't see the same faults; perhaps some deep chipset/firmware magic hides them from you. But the symptoms for the TPM look the same; that DMA to the control area is blocked.

(In reply to jarkko.sakkinen from comment #8)
> On my laptop (x250), if I set intel_iommu=on, the driver times out when it
> first tries to communicate with the TPM and waits for the response. I do not
> see the above errors.
>
> BTW, have you ever updated your BIOS for that laptop?

Yes, current version is 1.17

(In reply to David Woodhouse from comment #11)
> So the BIOS isn't *asking* us to permit DMA from 00:16.7 to 0xccdff000. And
> thus we don't configure the IOMMU to permit it, and stuff thus doesn't work.
>
> Having diagnosed a BIOS bug, I probably now have about three years until it
> gets fixed (and then only if you buy new hardware). During which time I can
> fix the Linux bug that if the BIOS *had* asked for such a mapping, we
> wouldn't have honoured it anyway because there is no *actual* PCI device at
> 00:16.7 :)
>
> Jarkko, no idea why you don't see the same faults; perhaps some deep
> chipset/firmware magic hides them from you. But the symptoms for the TPM
> look the same; that DMA to the control area is blocked.

I know it now. Starting from Skylake, DMA is not used. In Skylake the communication is MMIO based. That's why I didn't have such a problem.

Should this be worked around somehow? Just thinking what I should do with this bug.
Hello, I'm still experiencing a similar issue with my Lenovo T550 even with the latest firmware. Can something be done in the kernel for this? Could that be reported to Lenovo somehow? I'm sure that will cause trouble for people who try to dual-boot with Windows 11 (as it requires TPM 2.0).

(In reply to David Woodhouse from comment #11)
> Having diagnosed a BIOS bug, I probably now have about three years until it
> gets fixed (and then only if you buy new hardware). During which time I can
> fix the Linux bug that if the BIOS *had* asked for such a mapping, we
> wouldn't have honoured it anyway because there is no *actual* PCI device at
> 00:16.7 :)

The firmware bug can be addressed by overriding the DMAR ACPI table using an initrd (as I have already done). This still does not address the issue that Linux is not honouring the BIOS request, so... How can this second issue be solved?

I patched the tables using an ACPI device definition and using a patch that added support for RMRRs using them. Baolu instead suggested using a PCI quirk to allow the RMRR to apply to the (hidden) function used by the HECI. The resulting patch is significantly smaller both on the kernel and ACPI tables.

If anybody here wants to test the patch I can provide them with the patched DMAR table for an X240 ThinkPad. But it might be better if you patch the table yourself (I'll try to write down a post on how to do so).

Created attachment 303249 [details]
Patch to alias function 0 to function 7 on the HECI
> I patched the tables using an ACPI device definition and using a patch that
> added support for RMRRs using them.
>
> [...]
> If anybody here wants to test the patch I can provide them with the patched
> DMAR table for an X240 ThinkPad. But it might be better if you patch the
> table yourself (I'll try to write down a post on how to do so).

Could you provide me with the patch or help me with what to do? I believe I have the same error on a Lenovo X1 Carbon 3rd Generation from 2015. During boot I get

> DMAR: DRHD: handling fault status reg 3
> DMAR: [DMA Read NO_PASID] Request device [00:16.7] fault addr 0xacdff000 [fault reason 0x02] Present bit in context entry is clear
> DMAR: DRHD: handling fault status reg 2
> DMAR: [DMA Write NO_PASID] Request device [00:16.7] fault addr 0xacdff000 [fault reason 0x02] Present bit in context entry is clear
> DMAR: DRHD: handling fault status reg 2
> DMAR: [DMA Write NO_PASID] Request device [00:16.7] fault addr 0xacdff000 [fault reason 0x02] Present bit in context entry is clear

I have enabled TPM2 in the UEFI for secure dual boot with Windows and I also use signed Linux kernels. I didn't care that I have not been able to access the TPM2 from within Linux. However, since the last systemd update I got a boot delay of 90s, because systemd tries to access /dev/tpmrm0 and times out after 90s.

Hi, the blog post can be found at https://klondike.es/klog/2022/11/21/patching-the-acpi-dmar-table-to-allow-tpm2-0/
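
For others trying to tell whether they are hitting the same bug, the fault lines quoted in this report can be summarised with a small script. This is only a sketch in Python: the regular expression is written against the message format shown in this thread (with or without `NO_PASID` and the `0x` prefix), other kernel versions may print it differently, and `summarize` is a name made up for this example.

```python
import re
from collections import Counter

# Written against the fault lines quoted in this bug; other kernels may differ.
FAULT_RE = re.compile(
    r"DMAR: \[DMA (Read|Write)[^\]]*\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr (?:0x)?([0-9a-f]+)"
)


def summarize(dmesg_text):
    """Count faults per (requesting device, fault address, direction)."""
    counts = Counter()
    for m in FAULT_RE.finditer(dmesg_text):
        direction, bdf, addr = m.groups()
        counts[(bdf, int(addr, 16), direction)] += 1
    return counts


# The messages from the X1 Carbon report above, with the line wrapping undone.
SAMPLE = """\
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read NO_PASID] Request device [00:16.7] fault addr 0xacdff000 [fault reason 0x02] Present bit in context entry is clear
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Write NO_PASID] Request device [00:16.7] fault addr 0xacdff000 [fault reason 0x02] Present bit in context entry is clear
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Write NO_PASID] Request device [00:16.7] fault addr 0xacdff000 [fault reason 0x02] Present bit in context entry is clear
"""

for (bdf, addr, direction), n in sorted(summarize(SAMPLE).items()):
    print(f"{bdf} {direction:5} {addr:#x} x{n}")
```

Faults that all come from 00:16.7 and all hit the same page, as in the sample, match the pattern discussed in this bug; faults from other devices or scattered addresses probably have a different cause.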