Bug 217631 - Kernel 6.4 failing to access TPM on Framework Laptop 12th gen
Summary: Kernel 6.4 failing to access TPM on Framework Laptop 12th gen
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: Intel Linux
Importance: P3 normal
Assignee: drivers_other
URL: https://community.frame.work/t/boot-a...
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-03 19:38 UTC by roubro1991
Modified: 2023-08-10 12:18 UTC
CC List: 12 users

See Also:
Kernel Version: 6.4.1
Subsystem:
Regression: Yes
Bisected commit-id: e644b2f498d297a928efcb7ff6f900c27f8b788e


Attachments
Archive with two screenshots and kernelconfig (4.62 MB, application/octet-stream)
2023-07-03 19:38 UTC, roubro1991
full dmesg with 6.4.1 kernel (81.57 KB, text/plain)
2023-07-06 07:36 UTC, Herbie Hopkins

Description roubro1991 2023-07-03 19:38:49 UTC
Created attachment 304537
Archive with two screenshots and kernelconfig

After updating to linux-6.4.1, several users of the Framework Laptop with Intel 12th gen CPUs reported that their machines fail to boot (in my case an i5-1240P). The model with 11th gen CPUs appears to be unaffected. See https://community.frame.work/t/boot-and-shutdown-hangs-with-arch-linux-kernel-6-4-1-mainline-and-arch/33118


* When TPM is used to unlock a LUKS volume, the boot waits indefinitely for systemd-cryptsetup to start. When a reboot is forced, shutdown still waits for this service to stop and, after some time, prints the message shown in the attached cryptsetup_stack.jpg.
* When TPM unlock is disabled, systemd-cryptsetup starts successfully and the volume can be unlocked with the passphrase. However, the boot then waits indefinitely for systemd-pcrphase to start. When a reboot is forced, shutdown still waits for this service to stop (see pcrphase_stack.jpg).

I was able to boot successfully either by reverting e644b2f498d297a928efcb7ff6f900c27f8b788e or by disabling TPM interrupts for my model, as was done for another laptop in this patch: https://lore.kernel.org/linux-integrity/20230620-flo-lenovo-l590-tpm-fix-v1-1-16032a8b5a1d%40bezdeka.de/

The DMI match for my model is:

DMI_MATCH(DMI_SYS_VENDOR, "Framework"),
DMI_MATCH(DMI_PRODUCT_VERSION, "A4"),
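For context on the quirk approach above: in the kernel, a DMI quirk entry applies only when all of its match fields match the running machine, and DMI_MATCH compares by substring (DMI_EXACT_MATCH requires equality). The following is a minimal userspace sketch of that matching logic, not kernel code; `struct dmi_ident`, `struct quirk`, `field_matches()` and `quirk_applies()` are hypothetical names, not the kernel's `struct dmi_system_id` / `dmi_check_system()` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot of the DMI fields of the running machine. */
struct dmi_ident {
    const char *sys_vendor;
    const char *product_version;
};

/* Hypothetical quirk entry: a NULL field means "not constrained". */
struct quirk {
    const char *sys_vendor;
    const char *product_version;
};

/* Substring comparison, mirroring the semantics of DMI_MATCH. */
static bool field_matches(const char *actual, const char *wanted)
{
    if (wanted == NULL)
        return true;   /* this entry does not constrain the field */
    if (actual == NULL)
        return false;  /* the machine does not report this field */
    return strstr(actual, wanted) != NULL;
}

/* True if any entry matches on every field it constrains. */
static bool quirk_applies(const struct dmi_ident *id,
                          const struct quirk *table, size_t n)
{
    assert(id != NULL && (table != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (field_matches(id->sys_vendor, table[i].sys_vendor) &&
            field_matches(id->product_version, table[i].product_version))
            return true;
    }
    return false;
}

/* One entry mirroring the DMI_MATCH lines above. */
static const struct quirk tpm_irq_quirks[] = {
    { "Framework", "A4" },
};
```

With this single entry, a machine reporting vendor "Framework" and version "A4" gets the quirk, while the same vendor with any version string not containing "A4" does not.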
Comment 1 Bagas Sanjaya 2023-07-04 02:10:25 UTC
Can you attach dmesg instead?
Comment 2 roubro1991 2023-07-04 04:50:26 UTC
Unfortunately, I do not know how to obtain dmesg in this situation.

* Since the boot is already stuck in the initrd, no logs are stored anywhere, as far as I know
* I could not find anything about those failed attempts in journalctl
* Since it is stuck before any console is available, I cannot issue any commands
* Pressing Ctrl+Alt+Del 7 times within 2 seconds attempts a forced reboot, but even this does not work, because shutdown waits for the TPM tasks. The screenshots show the messages after shutdown had been pending for >120s
* The machine has to be hard reset by holding the power button for several seconds

If there is a guide anywhere on how to obtain dmesg in such situations, I would appreciate a pointer to it.
Comment 3 roubro1991 2023-07-04 05:18:08 UTC
A user in the Framework community confirmed that it also happens on 13th gen CPUs:
DMI_MATCH(DMI_SYS_VENDOR, "Framework"),
DMI_MATCH(DMI_PRODUCT_VERSION, "A6"),

He pointed out that dmesg always outputs this line:
> tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
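The message above indicates that the driver tested the TPM interrupt, did not see it fire, and fell back to polling. As a simplified model only (this is not the tpm_tis code; `struct fake_tpm`, `fake_isr()`, `send_test_command()` and `probe_irq()` are hypothetical), the decision can be sketched as sending a test command and checking whether the interrupt handler ran:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum tpm_mode { TPM_MODE_IRQ, TPM_MODE_POLL };

/* Hypothetical device model: irq_works stands in for whether the
 * firmware wired up the TPM interrupt correctly. */
struct fake_tpm {
    bool irq_works;
    bool irq_seen;  /* set by the (simulated) interrupt handler */
};

/* Stand-in for the interrupt handler. */
static void fake_isr(struct fake_tpm *t)
{
    t->irq_seen = true;
}

/* Stand-in for a harmless test command (comment 8 suggests
 * tpm2_get_tpm_pt() is involved); the device only raises an
 * interrupt on completion if interrupts actually work. */
static int send_test_command(struct fake_tpm *t)
{
    if (t->irq_works)
        fake_isr(t);
    return 0;
}

/* Keep interrupts if the test command triggered one, else poll. */
static enum tpm_mode probe_irq(struct fake_tpm *t)
{
    assert(t != NULL);
    t->irq_seen = false;
    if (send_test_command(t) != 0)
        return TPM_MODE_POLL;
    if (!t->irq_seen) {
        fprintf(stderr, "[Firmware Bug]: TPM interrupt not working, polling instead\n");
        return TPM_MODE_POLL;
    }
    return TPM_MODE_IRQ;
}
```

This models only the fallback decision implied by the log line; it does not reproduce the hang described in this report.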
Comment 4 ggrundik 2023-07-04 10:50:09 UTC
The same problem occurs on the MSI E13FlipEvo A12MT laptop (12th gen i5-1240P CPU).

It's tricky to obtain dmesg, since the freeze occurs too early in the boot process.

Dmidecode:
        Manufacturer: Micro-Star International Co., Ltd.
        Product Name: Summit E13FlipEvo A12MT
        Version: REV:1.0
        Wake-up Type: Power Switch
        SKU Number: 13P3.1
        Family: Summit
Comment 5 ggrundik 2023-07-04 10:56:11 UTC
Output of `dmesg | grep -i tpm` from the 6.3.7 kernel, which works fine:
[    0.000000] efi: ACPI=0x58ad5000 ACPI 2.0=0x58ad5014 TPMFinalLog=0x58b28000 SMBIOS=0x5bc1d000 SMBIOS 3.0=0x5bc1c000 MEMATTR=0x4c682498 ESRT=0x51254898 MOKvar=0x5bbe4000 RNG=0x58a1b018 TPMEventLog=0x4bbcd018 
[    0.013128] ACPI: TPM2 0x0000000058A1E000 00004C (v04 MSI_NB MEGABOOK 00000001 AMI  00000000)
[    0.013152] ACPI: Reserving TPM2 table memory at [mem 0x58a1e000-0x58a1e04b]
[    1.615095] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22)
[    1.636071] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
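`grep -i tpm` keeps every line containing "tpm" regardless of case, which is why the EFI and ACPI lines mentioning TPMFinalLog and TPM2 appear alongside the driver messages. A small sketch of the same filter in C, plus a check for the polling-fallback message that several people in this thread report; `contains_ci()` and `is_irq_fallback_line()` are hypothetical helpers:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive substring search, i.e. what `grep -i` does per line. */
static bool contains_ci(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t n = strlen(needle);
    for (const char *p = haystack; *p != '\0'; p++) {
        size_t i = 0;
        /* Compare character by character without running past p's end. */
        while (i < n && p[i] != '\0' &&
               tolower((unsigned char)p[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return true;
    }
    return n == 0;  /* an empty needle matches even an empty line */
}

/* Flags the "polling instead" line reported in this thread. */
static bool is_irq_fallback_line(const char *line)
{
    return contains_ci(line, "TPM interrupt not working, polling instead");
}
```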
Comment 6 Herbie Hopkins 2023-07-06 07:35:24 UTC
I am also on a 12th gen Framework and am experiencing this after upgrading to 6.4.1. I am not using TPM to unlock my LUKS volume, just entering the full passphrase. For me, the boot process hangs at

   Starting TPM2 PCR Machine ID Measurement

for 2 minutes and then hangs again at

   Starting TPM2 PCR Barrier

for an additional 5-6 minutes. I do eventually get into the system if I wait long enough, though, so I can attach dmesg.

With kernels prior to 6.4.1 I get "tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead", but otherwise the boot process is unhindered.
Comment 7 Herbie Hopkins 2023-07-06 07:36:33 UTC
Created attachment 304556
full dmesg with 6.4.1 kernel
Comment 8 roubro1991 2023-07-06 12:33:56 UTC
I am now also able to boot an affected kernel: I disabled TPM unlock in the initramfs and masked systemd-pcrphase, systemd-pcrphase-sysinit, systemd-pcrphase-initrd and systemd-pcrmachine. Notably, this kernel never powers off the machine on shutdown.

However, my dmesg only contains one line about TPM:
> tpm_tis NTC0702:00: 2.0 TPM (device-id 0xFC, rev-id 1)

I am available to test any changes or to add more logging statements wherever you want them for troubleshooting, in case it helps.

tpm2_get_tpm_pt() returns 0, so it seems the interrupt code is basically working, but somehow runs into a (dead)lock later...
Comment 9 Andrea Cervesato 2023-07-09 12:18:54 UTC
I can confirm this bug on my laptop as well, a Framework 13th gen Intel.

There is a post on the Framework community forum that gives more info: https://community.frame.work/t/boot-and-shutdown-hangs-with-arch-linux-kernel-6-4-1-mainline-and-arch/33118

Please don't hesitate to ask for any further details.
Comment 10 Andrea Cervesato 2023-07-09 12:24:54 UTC
It seems the problem is related to this commit:
e644b2f498d297a928efcb7ff6f900c27f8b788e
https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux/+/e644b2f498d297a928efcb7ff6f900c27f8b788e%5E%21/#F0

Author: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Comment 11 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-07-10 07:44:20 UTC
I forwarded the bug report to the developers[1]:
https://lore.kernel.org/all/c0ee4b7c-9d63-0bb3-c677-2be045deda43@leemhuis.info/

Where the discussion to fix this will happen depends a bit on the developers: maybe in replies to the above mail, maybe here, maybe a bit of both.

Sadly, for legal reasons I could not CC people there who are CCed here. :-/

[1] Reminder: bugzilla.kernel.org is often a bad place to report bugs; see https://docs.kernel.org/admin-guide/reporting-issues.html and https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/ (yes, I wish things were different, but that's how it is for now)
Comment 12 mapleleaf 2023-07-10 16:37:03 UTC
I have the same problem and I own a Framework 12th-gen, but for whatever reason my DMI_PRODUCT_VERSION is A8 instead of A6...

$ sudo dmidecode -s baseboard-version
A8
Comment 13 mapleleaf 2023-07-10 16:41:29 UTC
And also:

$ sudo dmidecode -s system-version
A8
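The values printed by `dmidecode -s system-version` and `dmidecode -s baseboard-version` correspond to the DMI product version and board version. On Linux these fields are also exposed under /sys/class/dmi/id/ (product_version, board_version, product_name, sys_vendor), which are typically readable without root. A minimal sketch for reading one such attribute; `read_dmi_field()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one DMI attribute file (e.g. /sys/class/dmi/id/product_version)
 * into buf and strips the trailing newline. Returns 0 on success, -1 if
 * the file cannot be opened or is empty. */
static int read_dmi_field(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For example, `read_dmi_field("/sys/class/dmi/id/product_name", buf, sizeof buf)` would return the string that a DMI_PRODUCT_NAME match is compared against.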
Comment 14 Christian Hesse 2023-07-10 21:26:39 UTC
It looks like we have to match on product name, not product version. Can anybody test and verify that these patches work?

https://lore.kernel.org/all/20230710211635.4735-1-mail@eworm.de/
https://lore.kernel.org/all/20230710211635.4735-2-mail@eworm.de/
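Why matching on product name is more robust here: the 12th gen Framework units in this thread report different product versions (A4 in the description, A8 in comment 12), so a version-based entry misses some units, while a name-based entry covers every unit sharing the same product name. A minimal sketch with substring matching; `struct machine`, `struct rule` and `rule_matches()` are hypothetical, and any product name used with it is a placeholder, not the string from the patches:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum dmi_field { DMI_FIELD_PRODUCT_NAME, DMI_FIELD_PRODUCT_VERSION };

/* DMI fields of one machine, as reported by its firmware. */
struct machine {
    const char *product_name;
    const char *product_version;
};

/* A single-field match rule: which field, and what it must contain. */
struct rule {
    enum dmi_field field;
    const char *substr;
};

static const char *field_value(const struct machine *m, enum dmi_field f)
{
    return f == DMI_FIELD_PRODUCT_NAME ? m->product_name : m->product_version;
}

/* Substring match on the chosen field, as DMI_MATCH does. */
static bool rule_matches(const struct rule *r, const struct machine *m)
{
    assert(r != NULL && m != NULL);
    const char *v = field_value(m, r->field);
    return v != NULL && strstr(v, r->substr) != NULL;
}
```

For instance, with a hypothetical product name shared by an A4 unit and an A8 unit, a version rule for "A4" matches only the first unit, while a rule on the product name matches both.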
Comment 15 ggrundik 2023-07-10 21:33:11 UTC
Just a side note, as I stated before: this bug is NOT limited to Framework laptops; it has a wider impact.
Comment 16 roubro1991 2023-07-11 08:48:25 UTC
Confirmed for 12th gen that matching on product name works. Thank you!
Comment 17 roubro1991 2023-07-24 19:27:50 UTC
Thank you for the help with this issue. I can confirm it is resolved with >6.4.5.
* Fix for Framework 13th gen: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=aa6e6c72cc9a9deaebc0ad370d0b4484b2ec14bb
* Fix for Framework 12th gen: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=60057602899c442d3d3f08caacdf231b8db5e975
* Fix for any other device which was not explicitly blacklisted: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7f13d7f68763362a6c11f97428b66de52456228f
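The thread does not describe how the generic fix detects misbehaving interrupts on devices that are not listed explicitly. Purely as an illustration, and assuming an approach that counts interrupts for which the TPM shows no pending event and switches to polling once a threshold is reached, such logic could look like this; `UNHANDLED_IRQ_LIMIT`, `struct irq_guard` and `on_tpm_irq()` are hypothetical and not taken from the linked commit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define UNHANDLED_IRQ_LIMIT 1000u  /* hypothetical threshold */

struct irq_guard {
    unsigned int unhandled;  /* consecutive interrupts with no TPM event */
    bool polling;            /* true once interrupts have been given up */
};

/* Called for every interrupt; had_event says whether the TPM status
 * showed a reason for it. Returns true if the interrupt was handled. */
static bool on_tpm_irq(struct irq_guard *g, bool had_event)
{
    assert(g != NULL);
    if (g->polling)
        return false;      /* already polling: ignore further interrupts */
    if (had_event) {
        g->unhandled = 0;  /* a genuine interrupt resets the count */
        return true;
    }
    if (++g->unhandled >= UNHANDLED_IRQ_LIMIT) {
        g->polling = true;
        fprintf(stderr, "too many unhandled TPM interrupts, switching to polling\n");
    }
    return false;
}
```

Resetting the counter on every genuine interrupt is one design choice; it tolerates occasional spurious interrupts but would not catch a storm interleaved with real events.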
Comment 18 aaravchen 2023-08-07 04:41:00 UTC
The MSI E13FlipEvo A12MT device (the same one mentioned in comment #4) is not fixed by >6.4.5.

I can confirm 6.4.7 still exhibits the same symptoms, and is "fixed" only by hiding the TPM in the BIOS.

The blacklist of specific device identifiers doesn't include any MSI devices, so maybe the automatic detection didn't actually make it in yet?
Comment 19 ggrundik 2023-08-07 08:37:38 UTC
(In reply to aaravchen from comment #18)
> The specific device identifiers don't include any MSI devices in the
> blacklist, so maybe the automated detection didn't actually make it in yet?

Automatic detection should be done by this commit:
https://lore.kernel.org/linux-integrity/CTYXI8TL7C36.2SCWH82FAZWBO@suppilovahvero/T/#me895f1920ca6983f791b58a6fa0c157161a33849

It is included in kernel 6.4.5, so, unfortunately, it's not enough.
Comment 20 André Barata 2023-08-10 12:18:09 UTC
Same issue on my laptop, an MSI Summit E16 Flip (A12UCT model). I can only boot with the TPM disabled; the kernel argument `tpm_tis.interrupts=0` doesn't work.

Using kernel version 6.4.8
Fedora 38


tpm_tis MSFT0101:00: 2.0 TPM (device-id 0x1B, rev-id 22)
tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
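When a kernel parameter such as `tpm_tis.interrupts=0` seems to have no effect, a first check is whether it actually reached the running kernel, which reports its command line in /proc/cmdline. A minimal sketch of that check, assuming simple whitespace-separated tokens and ignoring quoting; `cmdline_has_token()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if param appears as a whole whitespace-separated token
 * in cmdline (e.g. the contents of /proc/cmdline). */
static bool cmdline_has_token(const char *cmdline, const char *param)
{
    assert(cmdline != NULL && param != NULL);
    size_t len = strlen(param);
    if (len == 0)
        return false;
    for (const char *p = cmdline; (p = strstr(p, param)) != NULL; p++) {
        /* The match must start at the beginning or after whitespace... */
        bool starts = p == cmdline || p[-1] == ' ' || p[-1] == '\t' || p[-1] == '\n';
        /* ...and end at the end of the string or before whitespace. */
        char end = p[len];
        bool ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';
        if (starts && ends)
            return true;
    }
    return false;
}
```

Whole-token matching avoids false positives such as `tpm_tis.interrupts=01` being mistaken for `tpm_tis.interrupts=0`.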
