Bug 219368

Summary: System reboot on S3 sleep/wakeup test
Product: Platform Specific/Hardware
Reporter: Mike Seo (mikeseohyungjin)
Component: x86-64
Assignee: drivers_other
Status: RESOLVED DUPLICATE
Severity: high
CC: mikeseohyungjin
Priority: P3
Hardware: All
OS: Linux
Kernel Version: 6.11
Subsystem:
Regression: No
Bisected commit-id:

Description Mike Seo 2024-10-10 06:22:25 UTC
Dear TPM and hw_random developers,

I work on LG laptops and have run several LG PCs with Ubuntu. As you may know, most LG laptops use an Intel SoC.
I found a critical issue: the system reboots during the S3 sleep/wakeup test.

Environment:
- PC BIOS : Phoenix Technologies
- Intel Jasper Lake or Intel Lunar Lake
- OS: Ubuntu 22.04 (Jasper Lake), 24.04.1 (Lunar Lake)
- Linux kernel version 6.x.0 (Jasper Lake) or up-to-date 6.11 (Lunar Lake)

Symptom:

Running an aging script like the one below makes the system reboot.
-------------------------
#!/bin/bash
<snip>
for (( i=1; i<=10000 ; i++ ))
sudo rtcwake -m mem -s 10 >> ${LOG} 2>&1
<snip>
-------------------------
The script works as follows:
1. Waits 10 seconds.
2. Runs echo mem > /sys/power/state.
3. Waits another 10 seconds, then wakes the system up as if the power button had been pressed.
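
The steps above can be sketched as a complete loop. This is only an illustration under stated assumptions, not the original script (whose snipped parts are unknown): the names CYCLES, WAIT_SECS, DRY_RUN, one_cycle and run_aging are hypothetical, and the default log path is a placeholder. In dry-run mode it only prints the commands it would run.

```shell
#!/bin/bash
# Sketch of an S3 aging loop; NOT the reporter's original script.
# CYCLES, WAIT_SECS, LOG and DRY_RUN are hypothetical names.
CYCLES="${CYCLES:-10000}"
WAIT_SECS="${WAIT_SECS:-10}"
LOG="${LOG:-/tmp/s3_aging.log}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, do not suspend

# One cycle: wait, then suspend to RAM with an RTC alarm WAIT_SECS ahead.
one_cycle() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "sleep ${WAIT_SECS}"
        echo "rtcwake -m mem -s ${WAIT_SECS}"
    else
        sleep "${WAIT_SECS}"
        sudo rtcwake -m mem -s "${WAIT_SECS}" >> "${LOG}" 2>&1
    fi
}

# Repeat the cycle CYCLES times, stopping at the first failure.
run_aging() {
    local i
    for (( i = 1; i <= CYCLES; i++ )); do
        echo "cycle ${i}/${CYCLES}"
        one_cycle || return 1
    done
}
```

For example, CYCLES=3 DRY_RUN=1 run_aging prints three cycles without suspending; setting DRY_RUN=0 performs real suspend/resume cycles and needs root privileges.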


My analysis:

I reproduced the issue several times and found that the reboot is triggered on the BIOS side.
| pm_suspend() | syscore_suspend() | acpi_suspend_enter() | ... |  < BIOS > |  ...| acpi_suspend_enter() |  syscore_resume() | ...|

Debugging the BIOS showed that its TPM2 code can trigger a cold reset when it detects something wrong after the TPM resumes.
In the BIOS code, if there are active PCR banks that are not supported by the platform mask, the BIOS is supposed to update the TPM allocations and reboot the machine.

This means that something on the Linux kernel side can affect TPM operation when the system goes to sleep.
So I debugged and traced the TPM-related functions, such as tpm_chip_start(), each time the symptom appeared.
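
The report does not say which tracing tool was used. One possible way to see who calls a function such as tpm_chip_start() is the ftrace function tracer with per-call stack traces. The sketch below assumes tracefs is mounted at /sys/kernel/tracing and that the function is listed in available_filter_functions; TRACEFS and trace_callers_of are hypothetical names.

```shell
#!/bin/bash
# Sketch: record a stack trace on every call to one kernel function.
# TRACEFS and trace_callers_of are hypothetical names; run as root.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"

trace_callers_of() {
    local func="$1"
    # Pause tracing while reconfiguring.
    echo 0 > "${TRACEFS}/tracing_on" &&
    # Restrict the function tracer to the one function of interest
    # BEFORE enabling stack traces, to keep the overhead bounded.
    echo "${func}" > "${TRACEFS}/set_ftrace_filter" &&
    echo function > "${TRACEFS}/current_tracer" &&
    echo 1 > "${TRACEFS}/options/func_stack_trace" &&
    echo 1 > "${TRACEFS}/tracing_on"
}
```

After calling trace_callers_of tpm_chip_start and running a suspend/resume cycle, the recorded call chains can be read from the trace file in the same directory.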

In the normal case, tpm_chip_start() is called once, as below:
 tpm_pm_suspend() -> tpm_chip_start()
In the failing case, it is additionally called through the following path:
 hwrng_fillfn ->
  rng_get_data ->
    tpm_hwrng_read ->
      tpm_get_random ->
        tpm_find_get_ops ->
           tpm_try_get_ops ->
             tpm_chip_start ->

I found that when hwrng_fillfn(), which belongs to the hardware random number generator core, runs during system sleep, it can cause the system to reboot.
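
This call path only matters when the TPM is the RNG currently selected by the hw_random core. A minimal sketch for checking that from user space, assuming the hw_random sysfs directory is /sys/class/misc/hw_random and that TPM-backed RNGs are named tpm-rng-N (both are assumptions to verify on the target machine); HWRNG_SYSFS and current_rng_is_tpm are hypothetical names.

```shell
#!/bin/bash
# Sketch: report whether the active hwrng looks TPM-backed.
# HWRNG_SYSFS and current_rng_is_tpm are hypothetical names.
HWRNG_SYSFS="${HWRNG_SYSFS:-/sys/class/misc/hw_random}"

# Returns 0 if rng_current starts with "tpm-rng-" (assumed naming),
# 1 if another RNG is selected, 2 if rng_current cannot be read.
current_rng_is_tpm() {
    local cur
    cur="$(cat "${HWRNG_SYSFS}/rng_current" 2>/dev/null)" || return 2
    echo "rng_current=${cur}"
    [[ "${cur}" == tpm-rng-* ]]
}
```

If the function returns non-zero, hwrng_fillfn() is feeding from a different source and the call chain above would not reach the TPM.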
To verify that hwrng_fillfn() is the trigger, I tested a custom kernel that includes the patch below.

-----------------------
From 373e92bb6d471c5fb42bacb97a4caf5375df5522 Mon Sep 17 00:00:00 2001
From: mike Seo <mikeseohyungjin@gmail.com>
Date: Thu, 10 Oct 2024 14:04:57 +0900
Subject: [PATCH] test_patch

test_patch for reboot while sleep/wakeup

Signed-off-by: mike Seo <mikeseohyungjin@gmail.com>
---
 drivers/char/hw_random/core.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 57c51efa5..d3f0059a4 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -25,6 +25,7 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
+#include <linux/suspend.h>
 
 #define RNG_MODULE_NAME		"hw_random"
 
@@ -469,6 +470,22 @@ static struct attribute *rng_dev_attrs[] = {
 
 ATTRIBUTE_GROUPS(rng_dev);
 
+
+static int hwrng_pm_notification(struct notifier_block *nb, unsigned long action, void *data)
+{
+
+	switch (action) {
+	case PM_SUSPEND_PREPARE:
+		is_suspend_prepare = 1;
+		break;
+	default:
+		is_suspend_prepare = 0;
+		break;
+	}
+	return 0;
+}
+
+static struct notifier_block pm_notifier = { .notifier_call = hwrng_pm_notification };
 static int hwrng_fillfn(void *unused)
 {
 	size_t entropy, entropy_credit = 0; /* in 1/1024 of a bit */
@@ -478,6 +495,9 @@ static int hwrng_fillfn(void *unused)
 		unsigned short quality;
 		struct hwrng *rng;
 
+		while (is_suspend_prepare)
+			msleep(500);
+
 		rng = get_current_rng();
 		if (IS_ERR(rng) || !rng)
 			break;
@@ -549,6 +569,7 @@ int hwrng_register(struct hwrng *rng)
 			goto out_unlock;
 	}
 	mutex_unlock(&rng_mutex);
+	WARN_ON(register_pm_notifier(&pm_notifier));
 	return 0;
 out_unlock:
 	mutex_unlock(&rng_mutex);
-- 
2.43.0
------------------------

With this patch, the system passed more than 10000 cycles of the S3 sleep/wake aging test.

Could you prepare a patch for this issue and merge it?

Thank you,
Mike
Comment 1 Mike Seo 2024-10-14 00:47:26 UTC

*** This bug has been marked as a duplicate of bug 219383 ***