Bug 217890 - [BISECTED] Since kernel 6.1 intermittent suspend (S3) freeze with some Intel fTPMs as RNG source
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P3 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL: https://bbs.archlinux.org/viewtopic.p...
Reported: 2023-09-08 15:17 UTC by chriscjsus
Modified: 2024-03-17 14:57 UTC

Kernel Version: 6.1
Regression: Yes
Bisected commit-id: b006c439d58db625318bf2207feabf847510a8a6


Attachments
Bisect log for NUC7i3BNB (2.92 KB, text/plain)
2023-10-21 09:18 UTC, mahasler

Description chriscjsus 2023-09-08 15:17:23 UTC
Since kernel 6.1 the system intermittently freezes when trying to suspend. The power light and fans remain on, and a hard reset with the power button is required. After reboot there is nothing in the journal that would indicate a cause.

- first bad commit: [b006c439d58db625318bf2207feabf847510a8a6] hwrng: core - start hwrng kthread also for untrusted sources

Here is the full bisect log:

git bisect start
- status: waiting for both good and bad commits
- good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0
git bisect good 4fe89d07dcc2804c8b562f6c7896a45643d34b2f
- bad: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
git bisect bad 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
- good: [33e591dee915832c618cf68bb1058c8e7d296128] Merge tag 'phy-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy
git bisect good 33e591dee915832c618cf68bb1058c8e7d296128
- bad: [de492c83cae0af72de370b9404aacda93dafcad5] prandom: remove unused functions
git bisect bad de492c83cae0af72de370b9404aacda93dafcad5
- good: [30c999937f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 30c999937f69abf935b0228b8411713737377d9e
- bad: [70442fc54e6889a2a77f0e9554e8188a1557f00e] Merge tag 'x86_mm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 70442fc54e6889a2a77f0e9554e8188a1557f00e
- good: [d4013bc4d49f6da8178a340348369bb9920225c9] Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux
git bisect good d4013bc4d49f6da8178a340348369bb9920225c9
- bad: [706eacadd5c5cc13510ba69eea2917c2ce5ffa99] Merge tag 'devicetree-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad 706eacadd5c5cc13510ba69eea2917c2ce5ffa99
- good: [d310dc2554a5296a338f974d2b4e4f9af2687558] crypto: hisilicon - support get algs by the capability register
git bisect good d310dc2554a5296a338f974d2b4e4f9af2687558
- good: [803184f1ef815b39ec266ff25a0e7f00760e2e69] dt-bindings: virtio: Convert virtio,pci-iommu to DT schema
git bisect good 803184f1ef815b39ec266ff25a0e7f00760e2e69
- bad: [b006c439d58db625318bf2207feabf847510a8a6] hwrng: core - start hwrng kthread also for untrusted sources
git bisect bad b006c439d58db625318bf2207feabf847510a8a6
- good: [c4b1ce72b5c9f7d5772b2f2d4efa25ef0e6fb576] crypto: tcrypt - add async speed test for aria cipher
git bisect good c4b1ce72b5c9f7d5772b2f2d4efa25ef0e6fb576
- good: [4a209078656c3ada49c81d69c4b556be2dda1310] crypto: virtio - fix memory-leak
git bisect good 4a209078656c3ada49c81d69c4b556be2dda1310
- good: [0cb3c9cdf7fcc2ef75a6008223d2e3ee58ea00e1] crypto: octeontx2 - Remove the unneeded result variable
git bisect good 0cb3c9cdf7fcc2ef75a6008223d2e3ee58ea00e1
- good: [4edff849f7a0abca962374512907b3e2151091f4] crypto: zip - remove the unneeded result variable
git bisect good 4edff849f7a0abca962374512907b3e2151091f4
- first bad commit: [b006c439d58db625318bf2207feabf847510a8a6] hwrng: core - start hwrng kthread also for untrusted sources


Reverting this last commit did stop the suspend issue. To also find the RNG source that was causing this, I disabled the Intel fTPM, as that's the only hwrng source besides rdrand in my system. The Intel fTPM was the cause after all.
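
For anyone who wants to check which hardware RNG sources the kernel has registered and which one is currently active, the hw_random sysfs attributes can be read directly. This is a minimal sketch, assuming the usual /sys/class/misc/hw_random location; the function name and the optional directory argument are made up for illustration:

```shell
#!/bin/sh
# Print the hwrng sources the kernel has registered and the one in use.
# The optional first argument overrides the sysfs directory; it is an
# illustrative convenience, not something the kernel provides.
show_hwrng() {
    dir="${1:-/sys/class/misc/hw_random}"
    for attr in rng_available rng_current; do
        if [ -r "$dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dir/$attr")"
        else
            printf '%s: (not readable)\n' "$attr"
        fi
    done
}
```

Running `show_hwrng` with no argument reads the live system. A TPM registered as an RNG source typically shows up under a name like tpm-rng-0, but the exact names depend on the drivers that are loaded.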


MSI GS40 6QE
CPU:  i7-6700HQ
GPU:  GeForce GTX 970M
Comment 1 Artem S. Tashkinov 2023-09-08 21:11:55 UTC
This could have been fixed in 6.5.2, please give it a try.
Comment 2 chriscjsus 2023-09-11 23:33:32 UTC
The test failed on 6.5.2. Same symptoms.
Comment 3 Artem S. Tashkinov 2023-09-13 12:59:40 UTC
Dominik, please take a look, it's your commit.
Comment 4 Dominik Brodowski 2023-09-16 08:38:36 UTC
Jarkko, do you have any idea why an Intel fTPM would cause such a suspend issue? Otherwise, we'd need to extend the check in drivers/char/tpm/tpm_crb.c:crb_check_flags() to set TPM_CHIP_FLAG_HWRNG_DISABLED on affected systems.
Comment 5 zebul666 2023-09-23 12:52:55 UTC
Here on Dell Optiplex 7060, i5-8500, UHD 630, 6.5.2 (linux-zen package on archlinux) was good for me, and suspend to sleep worked again as expected.

But it is failing again with linux-zen 6.5.4. Moreover, it no longer seems to be random: it now happens every time. The sleep is triggered automatically by GNOME.
Comment 6 chriscjsus 2023-09-24 09:30:09 UTC
(In reply to zebul666 from comment #5)
> Here on Dell Optiplex 7060, i5-8500, UHD 630, 6.5.2 (linux-zen package on
> archlinux) was good for me, and suspend to sleep worked again as expected.
> 
> But this is failing now with linux-zen 6.5.4. Moreover, this does not seem
> to be random but happening every time, now. The sleep is triggered by gnome
> automatically.

According to

https://www.dell.com/support/kbdoc/en-uk/000103639/how-to-troubleshoot-and-resolve-common-issues-with-tpm-and-bitlocker#TPM_models

your computer does not have an Intel fTPM, so it's a different issue from this bug. It has a Nuvoton 750 instead.
Comment 7 chriscjsus 2023-09-24 09:41:44 UTC
@zebul666
You could still try disabling the discrete TPM to see if that helps.
Comment 8 jarkko 2023-09-25 15:42:02 UTC
AMD RNG fixes took a few rounds to get right and that did cause issues with Intel fTPM.

The final nail so far was 8f7f35e5aa6f2182eabcfa3abef4d898a48e9aa8.

So, if at all possible, can you test the mainline tree at that commit ID and at 9c377852ddfdc557b1370f196b0cfdf28d233460?
Comment 9 jarkko 2023-09-25 15:42:31 UTC
Sorry for the latency, I was on holiday last week.
Comment 10 chriscjsus 2023-09-26 06:31:06 UTC
The latest stable kernels in 6.1 and 6.5 already have those two commits.  Is it OK to test with those?
Comment 11 jarkko 2023-09-26 13:17:42 UTC
(In reply to chriscjsus from comment #10)
> The latest stable kernels in 6.1 and 6.5 already have those two commits.  Is
> it OK to test with those?

Thanks for pointing this out. I'd say yes.
Comment 12 chriscjsus 2023-09-28 17:07:47 UTC
This test failed unfortunately.
Comment 13 chriscjsus 2023-10-02 00:03:05 UTC
(In reply to jarkko from comment #11)
> (In reply to chriscjsus from comment #10)
> > The latest stable kernels in 6.1 and 6.5 already have those two commits.  Is
> > it OK to test with those?
> 
> thanks for pointing this out. i'd say yes.

I tried 6.1.55 and 6.5.5 and suspend failed on both.
Comment 14 jarkko 2023-10-02 23:23:35 UTC
I'll see if I can reproduce this on Monday at the earliest (I lack access to appropriate hardware until then). But I've put this on my TODO list...
Comment 15 chriscjsus 2023-10-03 05:11:31 UTC
I have updated the bug details with a link to the forum thread where I got some help with this, and others having the same problem.


https://bbs.archlinux.org/viewtopic.php?id=282837
Comment 16 jarkko 2023-10-10 12:22:07 UTC
OK, I'll test this with NUC7 within this week and see if I can reproduce this, once I have a time slot.

I'll also run the test with QEMU's emulated tpm_crb:

https://github.com/jarkkojs/buildroot-tpmdd/tree/linux-6.5.y

The qemu_x86_64_defconfig target builds a full image and I've added '--tpm-crb' to enable emulated tpm_crb (requires swtpm):

https://github.com/jarkkojs/buildroot-tpmdd/blob/linux-6.5.y/board/qemu/start-qemu.sh.in

Kernel version can be patched in https://github.com/jarkkojs/buildroot-tpmdd/blob/linux-6.5.y/configs/qemu_x86_64_defconfig. See BR2_LINUX_KERNEL_CUSTOM_VERSION_VALUE.

I'm just writing backlog of reasonable steps (not a request to test anything).
Comment 17 jarkko 2023-10-10 12:23:48 UTC
[I'm later on also expanding this repository to have a build target for SD card image bootable on x86 hardware for lean testing, probably once I fork a linux-6.6.y branch]
Comment 18 chriscjsus 2023-10-11 11:46:48 UTC
As "intermittent" in the bug title suggests, and as the posts in the thread I linked confirm, it can take several days of suspend/resume cycles before the problem manifests. And that is without shutting down or rebooting the computer during that time, or you're just starting over.
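
One way to rack up suspend/resume cycles without waiting for days of normal use is to loop over rtcwake. This is only a sketch: nothing in this thread establishes that an accelerated loop triggers the same freeze. The function name, the log file name, and the SUSPEND_CMD and SETTLE_SECS variables are illustrative assumptions; rtcwake itself needs root.

```shell
#!/bin/sh
# Run COUNT suspend/resume cycles and log each one before suspending, so the
# last log entry shows which cycle was in progress if the machine freezes.
# Arguments: COUNT [WAKE_AFTER_SECONDS] [LOGFILE]
# SUSPEND_CMD (hypothetical override) replaces rtcwake, e.g. for a dry run.
suspend_cycles() {
    count="$1"
    wake_after="${2:-30}"
    log="${3:-suspend-cycles.log}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i of $count starting" >> "$log"
        sync    # flush to disk so the entry survives a hard reset
        if [ -n "${SUSPEND_CMD:-}" ]; then
            $SUSPEND_CMD || return 1
        else
            # Suspend to RAM and let the RTC wake the machine again.
            rtcwake -m mem -s "$wake_after" || return 1
        fi
        sleep "${SETTLE_SECS:-10}"    # give devices time to come back
        i=$((i + 1))
    done
}
```

For example, `suspend_cycles 200 30` as root would attempt 200 cycles with a 30-second sleep each. After a freeze and hard reset, the last line of the log shows how many cycles it took.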

How does using qemu help with this? I've been using the emulated TPM device for a long time in VMs without any problem.

I've read that Intel PTT is part of the Intel Management Engine, so I updated the ME firmware, but that did not help.

Anyway thanks for your hard work here. Even if you can't reproduce this, I still have the workaround of disabling the fTPM or blacklisting the driver.
Comment 19 jarkko 2023-10-11 12:42:12 UTC
Please read again:

"I'll test this with NUC7 within this week and see if I can reproduce this, once I have a time slot."

I believe what you said but it is still good to do comparative testing, i.e. two test subjects is better than one, right?

I might also extend my buildroot tree to have sd card or USB stick build because I don't have any drive in the NUC ATM :-) That is the main reason why I said "within this week".

I can share the prebuilt image once I have done my testing.
Comment 20 jarkko 2023-10-11 12:45:18 UTC
I.e. I have this https://www.intel.com/content/www/us/en/products/sku/214619/intel-nuc-kit-nuc7pjyhn/specifications.html. I haven't updated BIOS for it for some time, so I'll probably do testing both with the existing BIOS version and the latest.

That sort of gives potentially two versions of Management Engine (where fTPM runs), which gives more data.
Comment 21 Boris Carvajal 2023-10-11 23:46:01 UTC
I have the same problem, so I can be another test subject.

On my system, in addition to S3, the freeze also happens on shutdown, reboot and suspend to disk (here I can resume normally after a forced shutdown, it seems disks are fully powered off before the freeze).
I could say the issue has like a 20% chance of showing up.

I'm currently running with TPM disabled in BIOS for a few weeks without any issue.

MB: Z370 TOMAHAWK (MS-7B47)
CPU: i5-8400
Comment 22 jarkko 2023-10-16 18:54:57 UTC
Hi, I had a bit of flu in the latter part of the week so I could not proceed. Looking into it.
Comment 23 jarkko 2023-10-20 17:54:52 UTC
I've tried my NUC7 and also my i9-13900k desktop with various v6.5 versions which should be fixed. I could not reproduce it. A dmesg log would help.
Comment 24 mahasler 2023-10-21 09:16:38 UTC
My own bisect is complete but I ended up with a different result. What I can say for sure is that commit 706eacadd5c5 ("Merge tag 'devicetree-for-6.1' ...") is broken and freezes within a day or two. Both parents, ada3bfb6492a ("Merge tag 'tpmdd-next-v6.1-rc1' ...") and 7a7f58575483 ("of: base: Shift refcount decrement ..."), seem to work fine here, though. I've been switching back and forth between the two, running both for more than a week each time, without any issues. I'll attach my own bisect log.

I'm currently testing b006c439d58d ("hwrng: core - start hwrng ...") once again, even though I had already run it for more than a week before declaring it good. Since this issue only occurs sporadically it is difficult to be sure if a commit is really good, even after a week without issues.

My system is a NUC7i3BNB. Actually these freezes happened a lot more often at first, when I was still running the pre-installed firmware. After upgrading to the latest version available from Intel they became much more sporadic, which didn't exactly help with bisecting.
Comment 25 mahasler 2023-10-21 09:18:05 UTC
Created attachment 305270 [details]
Bisect log for NUC7i3BNB
Comment 26 jarkko 2023-10-23 15:07:53 UTC
(In reply to mahasler from comment #24)

Thanks a lot for the bisect log!

I intend to redo testing with time today with:

- NUC7PJYHN
- i9-13900k

I have more time this week. Last week I did the testing in a rush so I might have been careless. I'll try 7a7f58575483 first and see what sort of results I get with it.

It would not be a surprise if there were some firmware bugs. We've been circulating them in the past. This whole mess started with the AMD RNG bug though, and the fixes for that caused issues with Intel fTPM. I thought we got it fixed on that side but apparently not.
Comment 27 jarkko 2023-10-24 01:00:50 UTC
(In reply to mahasler from comment #24)

So... commit 706eacadd5c5 is from the v6.1 cycle, as is b006c439d58d. Does this reproduce with either v6.1.59 or v6.5.8?

BR, Jarkko
Comment 28 Serhii Tsynailo 2023-11-01 11:20:23 UTC
(In reply to jarkko from comment #27)

My Intel 8250U laptop freezes on suspend on the latest 6.5.9.arch2-1 kernel.
Comment 29 Serhii Tsynailo 2023-11-01 11:23:49 UTC
This issue started with the 6.1 branch and is still present in the latest 6.5.9.
Comment 30 mahasler 2023-11-01 11:35:19 UTC
I've been running 706eacadd5c5 for a week with Intel Platform Trust Technology disabled in UEFI and then once again ada3bfb6492a with PTT enabled, in both cases without a freeze.

I'm now also testing 6.5.9.arch2-1 with PTT enabled.
Comment 31 mahasler 2023-11-05 23:06:53 UTC
As expected, 6.5.9.arch2-1 freezes for me, too. I'll try with PTT disabled next.
Comment 32 Serhii Tsynailo 2023-11-06 09:41:06 UTC
(In reply to mahasler from comment #31)
> As expected, 6.5.9.arch2-1 freezes for me, too. I'll try with PTT disabled
> next.

So far, after 5 days with PTT disabled in BIOS, the 6.5.9 Arch Linux kernel is working with no problems.
Comment 33 Sergio 2024-01-13 21:08:14 UTC
Hi guys, my Asus UX310UQK laptop (Intel i7-7500U) is also affected by this bug. Unfortunately the option to disable PTT isn't present on its UEFI (last available version from 2020) so I'm stuck on kernel 5.15.

I've tried to blacklist all TPM modules but I'm still getting the intermittent freezes when the laptop suspends on kernel 6.1 and newer. I'm running Manjaro.
Comment 34 Sergio 2024-01-17 10:52:23 UTC
Just to add, I've tried kernel 6.7.0, same result, also with all TPM modules blacklisted and "bootctl status" reporting "TPM2 Support: firmware only, driver unavailable".

I was also thinking that soon there will be a lot more people affected by this bug, as Ubuntu 24.04 LTS comes out and switches people from kernel 5.15 (22.04 LTS) to a more recent kernel, which will be affected.
Comment 35 jarkko 2024-01-19 21:59:14 UTC
(In reply to jarkko from comment #23)
> I've tried my NUC7 and also my i9-13900k desktop with various v6.5 versions
> which should be fixed. I could not reproduce it. A dmesg log would help.

I'm sorry if I've missed this at some point but I'm not seeing a kernel log that would demonstrate the issue. Since it does not reproduce on any of the machines I have at hand, even with the bisect information, there is not much I can do, unfortunately.

One way or another there should be klog output that shows what is going on when things fail.

This is the primary reason for sluggish progress...
Comment 36 jarkko 2024-01-19 22:00:31 UTC
The only, probably unfeasible, alternative would be a hardware donation :-) Not expecting this from anyone, just pointing out that a klog is now pretty essential.
Comment 37 jarkko 2024-01-19 22:03:07 UTC
Also preferably with unpatched mainline kernel. I really cannot say much of downstream kernels. I guess Arch kernel does not have any patches on top so I guess that is very useful but Ubuntu LTS has probably tons of patches on top of the original version so on those machines it would really require booting with the mainline.
Comment 38 jarkko 2024-01-19 22:11:23 UTC
If you have a bug with e.g. Ubuntu kernel, then a better route to get it fixed goes as follows:

1. Report to Ubuntu bug database.
2. Ubuntu's kernel team takes whatever action it sees fit. They could e.g. report the issue on LKML.

I don't really care if there is a bug in a new Ubuntu release because it is not in the scope of this project but I do really care if there is an identifiable mainline bug. They know their downstream kernel best and could possibly produce the information that makes sense for me.

I don't have all the possible hardware in existence, nor do I have all the possible distributions installed, nor do I have any customers. If you use Ubuntu, then you are in some ways a customer of Canonical. The only responsibility I have is to deal with identified *mainline* bugs.

I've tested this with hardware I have and with the mainline kernel, and I do not get anything.
Comment 39 Sergio 2024-01-19 23:30:00 UTC
Hi Jarkko, thanks for the replies.

Unfortunately there seem to be no logs of this happening at all, as far as the logs are concerned the machine goes to sleep successfully. Maybe, as it was mentioned, the disk is already powered off when the machine freezes.

I'm not really sure how to help solving this as I'm relatively new to Linux, as you could probably tell by my Ubuntu comment :-) I don't use it but it came to my mind that it moving on from 5.15 would have caused lots of people to be affected, without realizing that it would be out of scope here. Sorry about that.

I hope that the more experienced people on this thread can help you track down this bug.
Comment 40 jarkko 2024-01-22 19:57:39 UTC
(In reply to Sergio from comment #39)

OK, I do get that. I'd suggest filing bug reports with the specific distributions and linking this bug there. That could help in getting some relevant data, which might help to move this forward.

I.e. the point is not to deny that there is a bug. The point is that more muscle than just me is needed to find the root cause of the bug.

AFAIK, for Ubuntu the bug database exists in https://launchpad.net/
Comment 41 Sergio 2024-01-23 15:20:15 UTC
I've opened a support thread on the Manjaro forum, which is the distribution I use:

https://forum.manjaro.org/t/intermittent-freezes-at-suspend-with-kernel-6-1-and-later/155631
Comment 42 jarkko 2024-02-01 23:46:54 UTC
Despite the original report saying that the journal did not show anything useful, it would still make sense for any commenter to check whether there is something interesting in it by issuing

journalctl -b -1 -k
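
Note that the previous boot's log is only available when the journal is stored persistently (for example, when /var/log/journal exists). To cut the output down to lines that are likely relevant, a small filter can be piped after the command. This is a sketch; the function name and the pattern list are illustrative guesses at what matters, not an exhaustive or authoritative filter:

```shell
#!/bin/sh
# Keep only log lines that mention power management, ACPI sleep states,
# the TPM, the hwrng core or task freezing. Reads stdin, writes stdout.
filter_suspend_lines() {
    grep -E -i 'PM: |ACPI: .*sleep|tpm|hw_random|Freezing'
}

# Example use on the previous boot's kernel log:
#   journalctl -b -1 -k --no-pager | filter_suspend_lines
```

If the filtered output ends right after "PM: suspend entry (deep)", the freeze happened before the kernel could log anything further to disk.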
Comment 43 Sergio 2024-02-02 15:31:21 UTC
Adding logs from the last time I ran kernel 6.6.10. After the last freeze below I booted back kernel 5.15.146 to avoid the freezes.

No point posting the whole logs as they're huge and a mix of the 2 different kernels (6.6 and 5.15) as I switched back and forth between them.


Logs with the laptop freezing on suspend (kernel 6.6.10):

Jan 29 09:41:50 zb kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15-x86_64 root=UUID=ff3e61d9-5eaf-4c18-a7a9-493fcf90e01c rw quiet
Jan 29 09:41:50 zb kernel: Linux version 5.15.146-1-MANJARO (builduser@fv-az1497-577) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.41.0) #1 SMP PREEMPT Fri Jan 5 16:20:43 UTC 2024
Jan 29 09:41:50 zb kernel: microcode: microcode updated early to revision 0xf4, date = 2023-02-22
-- Boot c16d91f46bdc4922828db058ab74a02e --
Jan 29 09:40:47 zb kernel: PM: suspend entry (deep)
Jan 29 09:40:47 zb systemd-sleep[46373]: Performing sleep operation 'suspend'...
Jan 29 09:40:47 zb systemd-sleep[46373]: Failed to lock home directories: Unknown object '/org/freedesktop/home1'.
Jan 29 09:40:47 zb kernel: ata1: SATA link down (SStatus 4 SControl 300)
Jan 29 09:40:46 zb wpa_supplicant[637]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Jan 29 09:40:46 zb wpa_supplicant[637]: wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 09:40:46 zb wpa_supplicant[637]: wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 09:40:46 zb systemd[1]: Starting System Suspend...
Jan 29 09:40:46 zb wpa_supplicant[637]: nl80211: deinit ifname=p2p-dev-wlp2s0 disabled_11b_rates=0
Jan 29 09:40:46 zb wpa_supplicant[637]: p2p-dev-wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 09:40:46 zb wpa_supplicant[637]: p2p-dev-wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 09:40:46 zb systemd[1]: Reached target Sleep.


Logs with the laptop NOT freezing on suspend (also kernel 6.6.10):

Jan 29 08:46:58 zb kernel: CPU1 is up
Jan 29 08:46:58 zb kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
Jan 29 08:46:58 zb kernel: Enabling non-boot CPUs ...
Jan 29 08:46:58 zb kernel: ACPI: PM: Restoring platform NVS memory
Jan 29 08:46:58 zb kernel: ACPI: EC: EC started
Jan 29 08:46:58 zb kernel: ACPI: PM: Low-level resume complete
Jan 29 08:46:58 zb kernel: [Firmware Bug]: TSC ADJUST differs: CPU0 0 --> -319045650. Restoring
Jan 29 08:46:58 zb kernel: smpboot: CPU 3 is now offline
Jan 29 08:46:58 zb kernel: smpboot: CPU 2 is now offline
Jan 29 08:46:58 zb kernel: smpboot: CPU 1 is now offline
Jan 29 08:46:58 zb kernel: Disabling non-boot CPUs ...
Jan 29 08:46:58 zb kernel: ACPI: PM: Saving platform NVS memory
Jan 29 08:46:58 zb kernel: ACPI: EC: EC stopped
Jan 29 08:46:58 zb kernel: ACPI: EC: event blocked
Jan 29 08:46:58 zb kernel: ACPI: PM: Preparing to enter system sleep state S3
Jan 29 08:46:58 zb kernel: ACPI: EC: interrupt blocked
Jan 29 08:46:58 zb kernel: ata3.00: Entering standby power mode
Jan 29 08:46:58 zb kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Jan 29 08:46:58 zb kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Jan 29 08:46:58 zb kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
Jan 29 08:46:58 zb kernel: Freezing remaining freezable tasks
Jan 29 08:46:58 zb kernel: OOM killer disabled.
Jan 29 08:46:58 zb kernel: Freezing user space processes completed (elapsed 0.005 seconds)
Jan 29 08:46:58 zb kernel: Freezing user space processes
Jan 29 08:24:18 zb bluetoothd[572]: Failed to remove UUID: Failed (0x03)
Jan 29 08:24:18 zb bluetoothd[572]: Failed to remove UUID: Failed (0x03)
Jan 29 08:24:18 zb bluetoothd[572]: Failed to remove UUID: Failed (0x03)
Jan 29 08:24:18 zb bluetoothd[572]: Failed to remove UUID: Failed (0x03)
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/opus_05_duplex
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSink/opus_05_duplex
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/opus_05
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSink/opus_05
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/faststream_duplex
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/faststream
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/aptx_ll_duplex_0
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/aptx_ll_duplex_1
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/aptx_ll_0
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/aptx_ll_1
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/sbc_xq
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSink/sbc_xq
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/sbc
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSink/sbc
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/aac
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSink/aac
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/aptx
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSink/aptx
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/aptx_hd
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSink/aptx_hd
Jan 29 08:24:18 zb bluetoothd[572]: Endpoint unregistered: sender=:1.251 path=/MediaEndpoint/A2DPSource/ldac
Jan 29 08:24:18 zb kernel: Filesystems sync: 0.013 seconds
Jan 29 08:24:18 zb kernel: PM: suspend entry (deep)
Jan 29 08:24:18 zb systemd-sleep[43876]: Performing sleep operation 'suspend'...
Jan 29 08:24:18 zb systemd-sleep[43876]: Failed to lock home directories: Unknown object '/org/freedesktop/home1'.
Jan 29 08:24:18 zb kernel: ata1: SATA link down (SStatus 4 SControl 300)
Jan 29 08:24:17 zb wpa_supplicant[637]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Jan 29 08:24:17 zb wpa_supplicant[637]: wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 08:24:17 zb wpa_supplicant[637]: wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 08:24:17 zb systemd[1]: Starting System Suspend...
Jan 29 08:24:17 zb wpa_supplicant[637]: nl80211: deinit ifname=p2p-dev-wlp2s0 disabled_11b_rates=0
Jan 29 08:24:17 zb wpa_supplicant[637]: p2p-dev-wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 08:24:17 zb wpa_supplicant[637]: p2p-dev-wlp2s0: CTRL-EVENT-DSCP-POLICY clear_all
Jan 29 08:24:17 zb systemd[1]: Reached target Sleep.
Comment 44 Adam Alves 2024-03-06 22:12:46 UTC
Hi, I used to have the same issue until I decided to dig deep to find the root cause. After testing some changes in the TPM shutdown code, I finally found the reason and have patched my local kernel, with no more hangs before suspend or shutdown.

Some buggy firmware might require the TPM device to be in the default locality (Locality 0) before suspend or shutdown. Failing to do so leaves the system hung before sleep or power off (after the “reboot: power down.” message). Such is the case for the ASUSTeK COMPUTER INC. TUF GAMING B460M-PLUS board, and I believe it might be the case for several other boards, based on the reports I found on the internet while trying to fix my specific issue. Most forums suggest disabling the TPM device in the firmware (BIOS) setup to work around this issue, which also disables several nice security features provided by the TPM, such as secure boot attestation, automatic decryption and the hardware random number generator.

The patch lets a user configure the kernel through the “tpm.locality_on_suspend=1” boot parameter so that the locality is set before suspend/shutdown, in order to diagnose whether or not the board is one of the buggy ones that require this workaround. Since this bug is related to the board/platform rather than the specific TPM chip, the patch also includes a call to dmi_check_system in the tpm_init function so that this setting is automatically enabled for boards listed in the code (I already included my specific board in this patch). Automatic configuration only works when CONFIG_DMI is set, though, since dmi_check_system is a no-op when CONFIG_DMI is not set.

In case “tpm.locality_on_suspend=0” (the default) this patch maintains the integrity of all current tpm-related functionality thus not changing behavior of any other board except ASUSTeK COMPUTER INC. TUF GAMING B460M-PLUS and possibly future boards as we successfully diagnose other boards with the same issue fixed by using “tpm.locality_on_suspend=1”.
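
When testing, it helps to confirm that the running kernel was really booted with the parameter. A minimal sketch, assuming a kernel that carries this patch (the parameter does not exist otherwise); the function name and the optional file argument are illustrative:

```shell
#!/bin/sh
# Succeed if the kernel command line contains tpm.locality_on_suspend=1.
# The optional first argument overrides the file to inspect.
has_locality_param() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -q -w 'tpm\.locality_on_suspend=1' "$cmdline_file"
}

# Example:
#   has_locality_param && echo "workaround requested on the command line"
```

This only shows that the parameter was passed at boot. Whether the kernel acted on it depends on the patch being applied to the running kernel.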

My PC would hang on almost every shutdown/suspend until I started testing this patch, and in the past week I haven’t experienced the problem at all.

I suspect that the root cause on my specific board is that after the ACPI command to put the device to S3 or S5, some firmware application/driver will try to use the TPM chip expecting it to be in Locality 0 as expected by https://trustedcomputinggroup.org/wp-content/uploads/TCG-PC-Client-Platform-Firmware-Profile-Version-1.06-Revision-52_pub-1.pdf (3.1.1 – Pre-OS Environment) and then when it fails to do so it simply halts my whole system.

I am looking for a way to submit the patch so that you can try this change on your kernels and see whether it fixes the issue.
Comment 45 Adam Alves 2024-03-07 11:24:43 UTC
(In reply to Adam Alves from comment #44)
> I am finding a way to submit the patch so that you might try this change on
> your versions to see if it fixes.

Hi, I submitted the patch below for those who wish to test whether or not this is the case for your board.

https://patchwork.kernel.org/project/linux-integrity/patch/20240307000331.14848-2-adamoa@gmail.com/
