I'm working for LG laptops, and I have run serveral LG PC with ubuntu OS. You may know, most LG laptops has intel soc. I found out a critical issue, system reboot on S3 sleep/wake up. Enviornments: - PC BIOS : Phoenix Technologies - Intel Jasperlake or Intel Lunarlake - OS Ubuntu 22.04(Jasperlake), 24.04.1(Lunarlake) - linux kernel version 6.x.0(Jasperlake) or up-to-date 6.11(Lunarlake) Symptom: Running the aging scripts like below, system reboots. ------------------------- #!/bin/bash <snip> for (( i=1; i<=10000 ; i++ )) sudo rtcwake -m mem -s 10 >> ${LOG} 2>&1 <snip> ------------------------- The scripts works like below, 1. waits 10 secs 2. echo mem > /sys/power/state 3. waits 10 secs again and wake up system like press power button. My analysis: I had reproduced several times to find that BIOS side triggered the system reboots. | pm_suspend() | syscore_suspend() | acpi_suspend_enter() | ... | < BIOS > | ...| acpi_suspend_enter() | syscore_resume() | ...| Debugging on BIOS, TPM2 can generate cold reset when it detects something wrong after TPM resuming. In the BIOS code, if there are active PCR banks that are not supported by the Platform mask, it supposes to be update the TPM allocations and reboot the machine. It means that something in linux kernel side can effect operations of tpm when going to sleep. So, I have debuggered and traced the functions related to tpm, such as tpm_chip_start whenever the symptoms represented. In normal case, tpm_chip_start() called once like below, tpm_pm_suspend()-> tpm_chip_start(). but issued case, additionally called below hwrng_fillfn -> rng_get_data -> tpm_hwrng_read -> tpm_get_random -> tpm_find_get_ops -> tpm_try_get_ops -> tpm_chip_start -> I found out that when running hwrng_fillfn(), related to Hardware random number generator, called during system_sleep, it can cause system reboots. To Verify it, I have tested with custom kernel which includes below patch. ----------------------- From 373e92bb6d471c5fb42bacb97a4caf5375df5522 Mon Sep 17 00:00:00 2001 From: mike Seo <mikeseohyungjin@gmail.com> Date: Thu, 10 Oct 2024 14:04:57 +0900 Subject: [PATCH] test_patch test_patch for reboot while sleep/wakeup Signed-off-by: mike Seo <mikeseohyungjin@gmail.com> --- drivers/char/hw_random/core.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c index 57c51efa5..d3f0059a4 100644 --- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -25,6 +25,7 @@ #include <linux/slab.h> #include <linux/string.h> #include <linux/uaccess.h> +#include <linux/suspend.h> #define RNG_MODULE_NAME "hw_random" @@ -469,6 +470,22 @@ static struct attribute *rng_dev_attrs[] = { ATTRIBUTE_GROUPS(rng_dev); + +static int hwrng_pm_notification(struct notifier_block *nb, unsigned long action, void *data) +{ + + switch (action) { + case PM_SUSPEND_PREPARE: + is_suspend_prepare = 1; + break; + default: + is_suspend_prepare = 0; + break; + } + return 0; +} + +static struct notifier_block pm_notifier = { .notifier_call = hwrng_pm_notification }; static int hwrng_fillfn(void *unused) { size_t entropy, entropy_credit = 0; /* in 1/1024 of a bit */ @@ -478,6 +495,9 @@ static int hwrng_fillfn(void *unused) unsigned short quality; struct hwrng *rng; + while (is_suspend_prepare) + msleep(500); + rng = get_current_rng(); if (IS_ERR(rng) || !rng) break; @@ -549,6 +569,7 @@ int hwrng_register(struct hwrng *rng) goto out_unlock; } mutex_unlock(&rng_mutex); + WARN_ON(register_pm_notifier(&pm_notifier)); return 0; out_unlock: mutex_unlock(&rng_mutex); -- 2.43.0 ------------------------ And I had passed over 10000 times of s3 wake/sleep aging test. Can you make some patches for this issue and merges? Thank you, Mike
*** Bug 219368 has been marked as a duplicate of this bug. ***
Dear All, It could be a big help if anyone responds....
Looks like TPM. CCing the proper people. On Mon, Oct 14, 2024 at 12:46:26AM +0000, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=219383 > > Bug ID: 219383 > Summary: System reboot on S3 sleep/wakeup test > Product: Platform Specific/Hardware > Version: 2.5 > Hardware: All > OS: Linux > Status: NEW > Severity: normal > Priority: P3 > Component: x86-64 > Assignee: platform_x86_64@kernel-bugs.osdl.org > Reporter: mikeseohyungjin@gmail.com > Regression: No > > I'm working for LG laptops, and I have run serveral LG PC with ubuntu OS. You > may know, most LG laptops has intel soc. > I found out a critical issue, system reboot on S3 sleep/wake up. > > Enviornments: > - PC BIOS : Phoenix Technologies > - Intel Jasperlake or Intel Lunarlake > - OS Ubuntu 22.04(Jasperlake), 24.04.1(Lunarlake) > - linux kernel version 6.x.0(Jasperlake) or up-to-date 6.11(Lunarlake) > > Symptom: > > Running the aging scripts like below, system reboots. > ------------------------- > #!/bin/bash > <snip> > for (( i=1; i<=10000 ; i++ )) > sudo rtcwake -m mem -s 10 >> ${LOG} 2>&1 > <snip> > ------------------------- > The scripts works like below, > 1. waits 10 secs > 2. echo mem > /sys/power/state > 3. waits 10 secs again and wake up system like press power button. > > > My analysis: > > I had reproduced several times to find that BIOS side triggered the system > reboots. > | pm_suspend() | syscore_suspend() | acpi_suspend_enter() | ... | < BIOS > | > ...| acpi_suspend_enter() | syscore_resume() | ...| > > Debugging on BIOS, TPM2 can generate cold reset when it detects something > wrong > after TPM resuming. > In the BIOS code, if there are active PCR banks that are not supported by the > Platform mask, it supposes to be update the TPM allocations and reboot the > machine. > > It means that something in linux kernel side can effect operations of tpm > when > going to sleep. > So, I have debuggered and traced the functions related to tpm, such as > tpm_chip_start whenever the symptoms represented. > > In normal case, tpm_chip_start() called once like below, > tpm_pm_suspend()-> tpm_chip_start(). > but issued case, additionally called below > hwrng_fillfn -> > rng_get_data -> > tpm_hwrng_read -> > tpm_get_random -> > tpm_find_get_ops -> > tpm_try_get_ops -> > tpm_chip_start -> > > I found out that when running hwrng_fillfn(), related to Hardware random > number > generator, called during system_sleep, it can cause system reboots. > To Verify it, I have tested with custom kernel which includes below patch. > > ----------------------- > From 373e92bb6d471c5fb42bacb97a4caf5375df5522 Mon Sep 17 00:00:00 2001 > From: mike Seo <mikeseohyungjin@gmail.com> > Date: Thu, 10 Oct 2024 14:04:57 +0900 > Subject: [PATCH] test_patch > > test_patch for reboot while sleep/wakeup > > Signed-off-by: mike Seo <mikeseohyungjin@gmail.com> > --- > drivers/char/hw_random/core.c | 21 +++++++++++++++++++++ > 1 file changed, 21 insertions(+) > > diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c > index 57c51efa5..d3f0059a4 100644 > --- a/drivers/char/hw_random/core.c > +++ b/drivers/char/hw_random/core.c > @@ -25,6 +25,7 @@ > #include <linux/slab.h> > #include <linux/string.h> > #include <linux/uaccess.h> > +#include <linux/suspend.h> > > #define RNG_MODULE_NAME "hw_random" > > @@ -469,6 +470,22 @@ static struct attribute *rng_dev_attrs[] = { > > ATTRIBUTE_GROUPS(rng_dev); > > + > +static int hwrng_pm_notification(struct notifier_block *nb, unsigned long > action, void *data) > +{ > + > + switch (action) { > + case PM_SUSPEND_PREPARE: > + is_suspend_prepare = 1; > + break; > + default: > + is_suspend_prepare = 0; > + break; > + } > + return 0; > +} > + > +static struct notifier_block pm_notifier = { .notifier_call = > hwrng_pm_notification }; > static int hwrng_fillfn(void *unused) > { > size_t entropy, entropy_credit = 0; /* in 1/1024 of a bit */ > @@ -478,6 +495,9 @@ static int hwrng_fillfn(void *unused) > unsigned short quality; > struct hwrng *rng; > > + while (is_suspend_prepare) > + msleep(500); > + > rng = get_current_rng(); > if (IS_ERR(rng) || !rng) > break; > @@ -549,6 +569,7 @@ int hwrng_register(struct hwrng *rng) > goto out_unlock; > } > mutex_unlock(&rng_mutex); > + WARN_ON(register_pm_notifier(&pm_notifier)); > return 0; > out_unlock: > mutex_unlock(&rng_mutex); > -- > 2.43.0 > ------------------------ > > And I had passed over 10000 times of s3 wake/sleep aging test. > > Can you make some patches for this issue and merges? > > Thank you, > Mike > > -- > You may reply to this email to add a comment. > > You are receiving this mail because: > You are watching the assignee of the bug.
On Mon Oct 21, 2024 at 7:02 PM EEST, Borislav Petkov wrote: > Looks like TPM. CCing the proper people. Thanks for forwarding this to me. My fixes for an issue with TPM bus encryption and AMD boot time have been stuck for a while (over a month) [1]. They've also tested by the reporter [2] but seems to be hard to get any reviewed-by's for this, which is weird given how dead obvious the changes are. There's already a decent plan for overhaul of that functionality [3] so looking into to the new issue you forwarded would make a lot of sense to me while at it. BR, Jarkko [1] https://lore.kernel.org/linux-integrity/20241021053921.33274-1-jarkko@kernel.org/T/#t [2] https://lore.kernel.org/linux-integrity/CALSz7m3SXE3v-yB=_E3Xf5zCDv6bAYhjb+KHrnZ6J14ay2q9sw@mail.gmail.com/ [3] https://chaos.social/@gromit/113345582873908273
I think next week I can send the PR for AMD boot-time fixes (thanks to Stefan Berger's help with reviewing them. As soon as I have bandwidth that week, I will look more into this.
Thank you. Mr. jarkko I have tested if your patch is affectable for 5 days, over 100 hours, made a based on linux-next, v6.12.0-rc4. And my issue does not reproduced. diff is like this, which is a little bit different for your [1] patch. diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 854546000c92..f223f497587d 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -678,8 +678,10 @@ void tpm_chip_unregister(struct tpm_chip *chip) if (tpm_is_hwrng_enabled(chip)) hwrng_unregister(&chip->hwrng); tpm_bios_log_teardown(chip); - if (chip->flags & TPM_CHIP_FLAG_TPM2 && !tpm_is_firmware_upgrade(chip)) + if (chip->flags & TPM_CHIP_FLAG_TPM2 && !tpm_is_firmware_upgrade(chip)){ + tpm2_end_auth_session(chip); tpm_devs_remove(chip); + } tpm_del_char_device(chip); } EXPORT_SYMBOL_GPL(tpm_chip_unregister); diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-common.c index c3fbbf4d3db7..7be817504a53 100644 --- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -47,8 +47,10 @@ static ssize_t tpm_dev_transmit(struct tpm_chip *chip, struct tpm_space *space, if (!ret) ret = tpm2_commit_space(chip, space, buf, &len); - else + else{ + tpm2_end_auth_session(chip); tpm2_flush_space(chip); + } out_rc: return ret ? ret : len; diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index 5da134f12c9a..8b2f5f4c682a 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -379,9 +379,10 @@ int tpm_pm_suspend(struct device *dev) rc = tpm_try_get_ops(chip); if (!rc) { - if (chip->flags & TPM_CHIP_FLAG_TPM2) + if (chip->flags & TPM_CHIP_FLAG_TPM2){ + tpm2_end_auth_session(chip); tpm2_shutdown(chip, TPM2_SU_STATE); - else + }else rc = tpm1_pm_suspend(chip, tpm_suspend_pcr); tpm_put_ops(chip); diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c index 511c67061728..e7f887efce24 100644 --- a/drivers/char/tpm/tpm2-sessions.c +++ b/drivers/char/tpm/tpm2-sessions.c @@ -333,6 +333,8 @@ void tpm_buf_append_hmac_session(struct tpm_chip *chip, struct tpm_buf *buf, } #ifdef CONFIG_TCG_TPM2_HMAC + /* The first write to /dev/tpm{rm0} will flush the session. */ + attributes |= TPM2_SA_CONTINUE_SESSION; /* * The Architecture Guide requires us to strip trailing zeros * before computing the HMAC I hope my test was helpful for your works. Thank you.
Created attachment 307087 [details] A fix candidate
(In reply to Mike Seo from comment #6) > Thank you. Mr. jarkko > > I have tested if your patch is affectable for 5 days, over 100 hours, made > a based on linux-next, v6.12.0-rc4. > And my issue does not reproduced. > diff is like this, which is a little bit different for your [1] patch. > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c > index 854546000c92..f223f497587d 100644 > --- a/drivers/char/tpm/tpm-chip.c > +++ b/drivers/char/tpm/tpm-chip.c > @@ -678,8 +678,10 @@ void tpm_chip_unregister(struct tpm_chip *chip) > if (tpm_is_hwrng_enabled(chip)) > hwrng_unregister(&chip->hwrng); > tpm_bios_log_teardown(chip); > - if (chip->flags & TPM_CHIP_FLAG_TPM2 && !tpm_is_firmware_upgrade(chip)) > + if (chip->flags & TPM_CHIP_FLAG_TPM2 && > !tpm_is_firmware_upgrade(chip)){ > + tpm2_end_auth_session(chip); > tpm_devs_remove(chip); > + } > tpm_del_char_device(chip); > } > EXPORT_SYMBOL_GPL(tpm_chip_unregister); > diff --git a/drivers/char/tpm/tpm-dev-common.c > b/drivers/char/tpm/tpm-dev-common.c > index c3fbbf4d3db7..7be817504a53 100644 > --- a/drivers/char/tpm/tpm-dev-common.c > +++ b/drivers/char/tpm/tpm-dev-common.c > @@ -47,8 +47,10 @@ static ssize_t tpm_dev_transmit(struct tpm_chip *chip, > struct tpm_space *space, > > if (!ret) > ret = tpm2_commit_space(chip, space, buf, &len); > - else > + else{ > + tpm2_end_auth_session(chip); > tpm2_flush_space(chip); > + } > > out_rc: > return ret ? ret : len; > diff --git a/drivers/char/tpm/tpm-interface.c > b/drivers/char/tpm/tpm-interface.c > index 5da134f12c9a..8b2f5f4c682a 100644 > --- a/drivers/char/tpm/tpm-interface.c > +++ b/drivers/char/tpm/tpm-interface.c > @@ -379,9 +379,10 @@ int tpm_pm_suspend(struct device *dev) > > rc = tpm_try_get_ops(chip); > if (!rc) { > - if (chip->flags & TPM_CHIP_FLAG_TPM2) > + if (chip->flags & TPM_CHIP_FLAG_TPM2){ > + tpm2_end_auth_session(chip); > tpm2_shutdown(chip, TPM2_SU_STATE); > - else > + }else > rc = tpm1_pm_suspend(chip, tpm_suspend_pcr); > > tpm_put_ops(chip); > diff --git a/drivers/char/tpm/tpm2-sessions.c > b/drivers/char/tpm/tpm2-sessions.c > index 511c67061728..e7f887efce24 100644 > --- a/drivers/char/tpm/tpm2-sessions.c > +++ b/drivers/char/tpm/tpm2-sessions.c > @@ -333,6 +333,8 @@ void tpm_buf_append_hmac_session(struct tpm_chip *chip, > struct tpm_buf *buf, > } > > #ifdef CONFIG_TCG_TPM2_HMAC > + /* The first write to /dev/tpm{rm0} will flush the session. */ > + attributes |= TPM2_SA_CONTINUE_SESSION; > /* > * The Architecture Guide requires us to strip trailing zeros > * before computing the HMAC > > > I hope my test was helpful for your works. > Thank you. I did not expect my fixes to fix this. It was fix for https://bugzilla.kernel.org/show_bug.cgi?id=219229. I sent pull request for it so should merged soon to mainline: https://lore.kernel.org/linux-integrity/D57UNU1ZUXJS.E2RDQDB8XFKI@kernel.org/T/#u I attached a fix candidate to the bug.
Dear Mr.jarkko, I have experienced this bug reported by me, Bug 219383 System reboot on S3 sleep/wakeup test, on various kernel versions among 6.1 to 6.12. On comment #6, actually I had tested in lg laptop based on lunarlake, uptodate platform of intel. But unstable kernel such as 6.11 and 6.12 can operate normally in the device. Anyway some devices without your boot time patch also didn't represent this symptom. So my comment #6 may be wrong. To correct this, I had japerlake devices which have had this bug. Let me test your fix candidate and let you know. Thank you for your supports.
(In reply to Mike Seo from comment #9) > Dear Mr.jarkko, > > I have experienced this bug reported by me, Bug 219383 System reboot on S3 > sleep/wakeup test, on various kernel versions among 6.1 to 6.12. > > On comment #6, actually I had tested in lg laptop based on lunarlake, > uptodate platform of intel. > But unstable kernel such as 6.11 and 6.12 can operate normally in the > device. > Anyway some devices without your boot time patch also didn't represent this > symptom. > So my comment #6 may be wrong. > > To correct this, I had japerlake devices which have had this bug. > Let me test your fix candidate and let you know. > Thank you for your supports. I refined my fix to be (fully I believe) race-free: https://lore.kernel.org/linux-integrity/20241031004501.165027-1-jarkko@kernel.org/T/#u
Created attachment 307096 [details] Fix candidate v2
So: I believe there was an actual race condition in tpm_pm_suspend(). If the 2nd candidate works for you I'll add "Tested-by: Mike Seo <mikeseohyungjin@gmail.com>" to the bug and it should be done then, and can still make into 6.12.
Created attachment 307111 [details] Fix candidate v3
https://lore.kernel.org/linux-integrity/D5AEYFC1VUYN.24WN7GVHN1MDU@kernel.org/T/#t
Created attachment 307118 [details] Final Fix
(In reply to Mike Seo from comment #9) > Dear Mr.jarkko, > > I have experienced this bug reported by me, Bug 219383 System reboot on S3 > sleep/wakeup test, on various kernel versions among 6.1 to 6.12. > > On comment #6, actually I had tested in lg laptop based on lunarlake, > uptodate platform of intel. > But unstable kernel such as 6.11 and 6.12 can operate normally in the > device. > Anyway some devices without your boot time patch also didn't represent this > symptom. > So my comment #6 may be wrong. > > To correct this, I had japerlake devices which have had this bug. > Let me test your fix candidate and let you know. > Thank you for your supports. See the attachment ("Final Fix"). It should fully address the issue so if possible check if it works for you, thanks.
Dear Mr. Jarkko, I tested your Fix candidate v3 with 10487 sleep/wake tests, almost 84 hours, and found out no problem. Without your patch, the symptom reproduced under 1000 tests in the device, which has kernel v6.8 and ubuntu 22.04. Let me test your Final Fix as well.
Dear Mr. Jarkko, I think your Final Fix and Fix candidate v3 are same. I am sure that your The Final Fix fixes this issue as well. Thank you for your efforts.
(In reply to Mike Seo from comment #18) > Dear Mr. Jarkko, > > I think your Final Fix and Fix candidate v3 are same. > I am sure that your The Final Fix fixes this issue as well. > Thank you for your efforts. Yeah for most part yes. OK I'll add your tested-by before sending the patch to Linus. Thank you for reporting this and help with testing!
Pull request sent: https://lore.kernel.org/linux-integrity/D5CYH0IJXX72.35A25M1YICZRP@kernel.org/
The fix landed. Thanks for the report!