Bug 219368 - System reboot on S3 sleep/wakeup test
Summary: System reboot on S3 sleep/wakeup test
Status: RESOLVED DUPLICATE of bug 219383
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P3 high
Assignee: drivers_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-10-10 06:22 UTC by Mike Seo
Modified: 2024-10-14 00:47 UTC
CC List: 1 user

See Also:
Kernel Version: 6.11
Subsystem:
Regression: No
Bisected commit-id:


Description Mike Seo 2024-10-10 06:22:25 UTC
Dear TPM and hw_random developers,

I work on LG laptops, and I have run several LG PCs with Ubuntu. As you may know, most LG laptops use an Intel SoC.
I found a critical issue: the system reboots during S3 sleep/wake-up.

Environment:
- PC BIOS: Phoenix Technologies
- Intel Jasper Lake or Intel Lunar Lake
- OS: Ubuntu 22.04 (Jasper Lake), 24.04.1 (Lunar Lake)
- Linux kernel version 6.x.0 (Jasper Lake) or up-to-date 6.11 (Lunar Lake)

Symptom:

When an aging script like the one below is running, the system reboots.
-------------------------
#!/bin/bash
<snip>
for (( i=1; i<=10000 ; i++ ))
sudo rtcwake -m mem -s 10 >> ${LOG} 2>&1
<snip>
-------------------------
The script works as follows:
1. Wait 10 seconds.
2. echo mem > /sys/power/state
3. Wait 10 more seconds, then wake the system as if the power button had been pressed.
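
For readers who want to reproduce the loop, the steps above can be sketched as a self-contained script. This is only an illustration under assumptions: the function name run_aging, the DRY_RUN switch, and the log path in the example are hypothetical and are not taken from the original (snipped) script. Only the rtcwake invocation comes from the report.

```shell
#!/bin/bash
# Sketch of the S3 aging loop. run_aging and DRY_RUN are illustrative names,
# not part of the original script.

# run_aging CYCLES SECONDS LOGFILE
# Suspends to RAM CYCLES times; the RTC alarm wakes the system after SECONDS.
run_aging() {
    local cycles=$1 secs=$2 log=$3
    local i=1
    while [ "$i" -le "$cycles" ]; do
        # Record the cycle number so the log shows where a reboot happened.
        echo "cycle ${i}/${cycles}" >> "${log}"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of suspending the machine.
            echo "rtcwake -m mem -s ${secs}"
        else
            # Step 1: wait before suspending.
            sleep "${secs}"
            # Steps 2 and 3: suspend to RAM and wake via the RTC alarm.
            sudo rtcwake -m mem -s "${secs}" >> "${log}" 2>&1 || return 1
        fi
        i=$((i + 1))
    done
}

# Example (prints the commands only): DRY_RUN=1 run_aging 3 10 /tmp/s3_aging.log
```

Because the cycle number is written to the log before each suspend, the last line of the log after an unexpected reboot indicates the cycle on which it occurred.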


My analysis:

I reproduced the issue several times and found that the BIOS side triggers the system reboot.
| pm_suspend() | syscore_suspend() | acpi_suspend_enter() | ... |  < BIOS > |  ...| acpi_suspend_enter() |  syscore_resume() | ...|

Debugging in the BIOS shows that the TPM2 code can generate a cold reset when it detects something wrong after the TPM resumes.
In the BIOS code, if there are active PCR banks that are not supported by the platform mask, it is supposed to update the TPM allocations and reboot the machine.

This means that something on the Linux kernel side can affect TPM operation when going to sleep.
So I debugged and traced the TPM-related functions, such as tpm_chip_start(), whenever the symptom appeared.

In the normal case, tpm_chip_start() is called once, as below:
 tpm_pm_suspend() -> tpm_chip_start()
In the failing case, it is additionally called through the path below:
 hwrng_fillfn ->
  rng_get_data ->
    tpm_hwrng_read ->
      tpm_get_random ->
        tpm_find_get_ops ->
           tpm_try_get_ops ->
             tpm_chip_start

I found that when hwrng_fillfn(), which belongs to the hardware random number generator, is called during system sleep, it can cause the system to reboot.
To verify this, I tested a custom kernel that includes the patch below.

-----------------------
From 373e92bb6d471c5fb42bacb97a4caf5375df5522 Mon Sep 17 00:00:00 2001
From: mike Seo <mikeseohyungjin@gmail.com>
Date: Thu, 10 Oct 2024 14:04:57 +0900
Subject: [PATCH] test_patch

test_patch for reboot while sleep/wakeup

Signed-off-by: mike Seo <mikeseohyungjin@gmail.com>
---
 drivers/char/hw_random/core.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 57c51efa5..d3f0059a4 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -25,6 +25,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
+#include <linux/suspend.h>
 
 #define RNG_MODULE_NAME		"hw_random"
 
@@ -469,6 +470,22 @@ static struct attribute *rng_dev_attrs[] = {
 
 ATTRIBUTE_GROUPS(rng_dev);
 
+
+static int hwrng_pm_notification(struct notifier_block *nb, unsigned long action, void *data)
+{
+
+	switch (action) {
+	case PM_SUSPEND_PREPARE:
+		is_suspend_prepare = 1;
+		break;
+	default:
+		is_suspend_prepare = 0;
+		break;
+	}
+	return 0;
+}
+
+static struct notifier_block pm_notifier = { .notifier_call = hwrng_pm_notification };
 static int hwrng_fillfn(void *unused)
 {
 	size_t entropy, entropy_credit = 0; /* in 1/1024 of a bit */
@@ -478,6 +495,9 @@ static int hwrng_fillfn(void *unused)
 		unsigned short quality;
 		struct hwrng *rng;
 
+		while (is_suspend_prepare)
+			msleep(500);
+
 		rng = get_current_rng();
 		if (IS_ERR(rng) || !rng)
 			break;
@@ -549,6 +569,7 @@ int hwrng_register(struct hwrng *rng)
 			goto out_unlock;
 	}
 	mutex_unlock(&rng_mutex);
+	WARN_ON(register_pm_notifier(&pm_notifier));
 	return 0;
 out_unlock:
 	mutex_unlock(&rng_mutex);
-- 
2.43.0
------------------------

With this patch, the S3 sleep/wake aging test passed more than 10000 cycles.
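
A complementary check that needs no kernel rebuild is to confirm that the TPM is the active hwrng source on the affected machine, since hwrng_fillfn() only reaches tpm_hwrng_read() in that case. The sketch below reads the hw_random sysfs attribute rng_current. The helper name tpm_is_current_rng is hypothetical, and the assumption that a TPM-backed source has a name starting with "tpm" should be checked against the actual output on the system under test.

```shell
# tpm_is_current_rng [FILE]
# Returns 0 if the active hwrng source name starts with "tpm",
# 1 if another source is active, 2 if the attribute cannot be read.
# FILE defaults to the hw_random sysfs attribute rng_current.
tpm_is_current_rng() {
    local f=${1:-/sys/class/misc/hw_random/rng_current}
    local name
    name=$(cat "$f" 2>/dev/null) || return 2
    case "$name" in
        tpm*) return 0 ;;   # assumed naming of the TPM-backed source
        *)    return 1 ;;
    esac
}

# Example:
#   if tpm_is_current_rng; then echo "TPM feeds hwrng_fillfn"; fi
```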

Could you prepare a patch for this issue and merge it?

Thank you,
Mike
Comment 1 Mike Seo 2024-10-14 00:47:26 UTC

*** This bug has been marked as a duplicate of bug 219383 ***
