Bug 210457

Summary: Fan sporadically maxed on wake-up due to unavailable sensor temperature
Product: Drivers Reporter: Daniel T. (pterion)
Component: Platform_x86 Assignee: drivers_platform_x86 (drivers_platform_x86)
Status: NEW ---    
Severity: normal CC: enetor, jwrdegoede, kernel.org, ntran005, pterion, reescf, rui.zhang, s-cvhajmmblfsofmpsh
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.4.0-53-generic Subsystem:
Regression: No Bisected commit-id:

Description Daniel T. 2020-12-02 15:37:48 UTC
I'm running Kubuntu on my Lenovo ThinkPad X1 Carbon (4th generation). Since a recent upgrade to 20.04 LTS, I have the following problem.

Upon wake-up from standby the laptop's fan is sporadically set to max speed (around 7000 RPM) regardless of the actual CPU load. It helps to put the laptop to sleep and wake it up again; there is then a chance that the fan will react normally to CPU load and temperature.

Running sensors when the fan is maxed, I get the following:

$ sensors
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +35.0°C  

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +42.5°C  

BAT0-acpi-0
Adapter: ACPI interface
in0:          16.97 V  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        6932 RPM
temp1:            N/A  
temp2:            N/A  
temp3:         +0.0°C  
temp4:         +0.0°C  
temp5:         +0.0°C  
temp6:         +0.0°C  
temp7:         +0.0°C  
temp8:         +0.0°C  
temp9:         +0.0°C  
temp10:        +1.0°C  
temp11:        +0.0°C  
temp12:        +0.0°C  
temp13:        +0.0°C  
temp14:        +0.0°C  
temp15:        +0.0°C  
temp16:        +0.0°C  

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +43.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +42.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +43.0°C  (high = +100.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +48.0°C  (crit = +128.0°C)


And running it when the fan behaves normally, i.e. in accordance with the CPU load, I notice that temp1 is no longer N/A but has a value.

...
thinkpad-isa-0000
Adapter: ISA adapter
fan1:        3240 RPM
temp1:        +46.0°C  
temp2:            N/A  
...
Comment 1 cfr 2021-03-10 19:24:24 UTC
Lenovo ThinkPad X720 is also affected by this, but it is not sporadic, as far as I can tell. Not once since installing my current kernel have I seen normal behaviour following sleep.

The problem persists if the machine is rebooted. Only powering off and restarting restores the expected behaviour.

The bug seems to be a regression. See https://bugzilla.kernel.org/show_bug.cgi?id=196129. My machine was not affected by the original bug, but the problem looks to be the same.

Following sleep:

cat /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon7/temp*
0
1000
0
0
0
0
0
cat: /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon7/temp1_input: No such device or address
cat: /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon7/temp2_input: No such device or address
0
0
0
0
0
0
0

and

cat /proc/acpi/ibm/thermal 
temperatures:   -128 -128 0 0 0 0 0 0 0 0 1 0 0 0 0 0

Prior to sleep, only the second sensor is missing and acpitz-acpi-0 gives a sane reading. Following sleep, the first goes AWOL as well and acpitz-acpi-0 is stuck at 48.

Removing and reloading thinkpad_acpi makes no difference and, for me, re-sleeping and rewaking makes no difference either.
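
For anyone comparing readings before and after sleep, a minimal Python sketch along these lines lists the thinkpad_hwmon sensors in numeric order and marks the ones that cannot be read. The shell glob above appears to print temp10 to temp16 before temp1, which makes the raw cat output hard to line up with sensor numbers. The function name read_hwmon_temps is hypothetical, and the hwmon index (hwmon7 here) can differ between boots and kernels, so the directory is passed in as an argument.

```python
import re
from pathlib import Path


def read_hwmon_temps(hwmon_dir):
    """Return {sensor_number: millidegrees or None} for temp*_input files.

    None marks a sensor whose read failed, for example with ENXIO
    ("No such device or address") as in the output above.
    """
    readings = {}
    for path in Path(hwmon_dir).glob("temp*_input"):
        match = re.fullmatch(r"temp(\d+)_input", path.name)
        if match is None:
            continue  # skip names that do not follow the tempN_input pattern
        try:
            value = int(path.read_text().strip())
        except (OSError, ValueError):
            value = None  # unreadable or non-numeric sensor
        readings[int(match.group(1))] = value
    # Sort by sensor number so that temp1 comes before temp10.
    return dict(sorted(readings.items()))
```

Calling read_hwmon_temps("/sys/devices/platform/thinkpad_hwmon/hwmon/hwmon7") on the affected machine would be expected to show None for sensors 1 and 2 after sleep, assuming the directory name matches the one in this comment.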
Comment 2 cfr 2021-03-10 19:31:03 UTC
I should have looked at the date of this report. The problem I'm seeing is new. I installed a new kernel this week and the problem started then.

ArchLinux kernel package version is 5.11.4-arch1-1. I did NOT see this bug with 5.10.16.arch1-1.
Comment 3 cfr 2021-03-10 19:48:23 UTC
I tried adding 

acpi.ec_freeze_events=Y acpi.ec_suspend_yield=Y

to my kernel command line and rebooting, but I never got past the screen displaying the command. The machine froze and I had to power off.

Is there a newer invocation I could try here?
Comment 4 cfr 2021-03-10 19:54:20 UTC
The bug does NOT manifest if I boot linux-lts 5.10.21-1. Following sleep:

cat /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon6/temp*
0
1000
0
0
0
0
0
38000
cat: /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon6/temp2_input: No such device or address
0
0
0
0
0
0
0

and the fan is at 0RPM, while acpitz-acpi-0 and thinkpad-isa-0000 temp1 are both +38.0°C.
Comment 5 Thorsten 2021-03-21 15:41:33 UTC
I have the same issue on an X1 Carbon 5th generation (i.e. the lack of temp1 makes the fan run at max speed) and the bug still exists in 5.11.7.arch1-1. Going back to the LTS kernel (5.10.24-1) makes temp1 appear.
Comment 6 Zhang Rui 2021-03-22 12:51:52 UTC
This seems like a duplicate of #211313, right?
Comment 7 Thorsten 2021-03-22 13:23:04 UTC
It's probably the same issue. However, here we know that the main issue is that the first entry of /proc/acpi/ibm/thermal is erroneously -128 some (or most) of the time (so the fan is just a symptom, not the issue).

temperatures:   -128 -128 0 0 0 0 0 0 0 0 1 0 0 0 0 0

Iirc, the first entry is /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon*/temp1_input

(By the way, the second entry being -128 is fine)
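
To check for this condition from a script, a minimal Python sketch such as the following parses the "temperatures:" line and reports whether the first entry is -128. The function names are hypothetical, and treating -128 as "sensor unavailable" is an assumption based on the readings in this report, not on the driver documentation.

```python
def parse_thermal_line(line):
    """Parse a 'temperatures: ...' line into a list of ints (degrees C)."""
    label, _, values = line.partition(":")
    if label.strip() != "temperatures":
        raise ValueError(f"unexpected line: {line!r}")
    return [int(field) for field in values.split()]


def first_sensor_missing(line, sentinel=-128):
    """Return True if the first entry equals the sentinel value.

    Assumption: -128 means the sensor is unavailable, as observed in
    this report. The second entry being -128 is deliberately ignored.
    """
    temps = parse_thermal_line(line)
    return bool(temps) and temps[0] == sentinel
```

The line would typically come from reading /proc/acpi/ibm/thermal, for example first_sensor_missing(open("/proc/acpi/ibm/thermal").read()), assuming the file contains only the single line shown above.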
Comment 8 cfr 2021-03-29 02:57:17 UTC
On my machine, the issue seems to have been caused by a firmware bug, which only manifested in symptoms with the kernel update. That is, the firmware was the same for 3+ years with no issue, the new kernel triggered the bug, but a firmware update seems to have resolved it.
Comment 9 Hans de Goede 2021-04-07 10:36:17 UTC
(In reply to cfr from comment #8)
> On my machine, the issue seems to have been caused by a firmware bug, which
> only manifested in symptoms with the kernel update. That is, the firmware
> was the same for 3+ years with no issue, the new kernel triggered the bug,
> but a firmware update seems to have resolved it.

Thank you.

Can the other reporters of this bug please also see if the latest BIOS resolves this? ThinkPad BIOS updates are available on lvfs, so they can be done under Linux through fwupdmgr.
Comment 10 Daniel T. 2021-04-09 17:01:56 UTC
(In reply to Hans de Goede from comment #9)
> (In reply to cfr from comment #8)
> > On my machine, the issue seems to have been caused by a firmware bug, which
> > only manifested in symptoms with the kernel update. That is, the firmware
> > was the same for 3+ years with no issue, the new kernel triggered the bug,
> > but a firmware update seems to have resolved it.
> 
> Thank you.
> 
> Can the other reporters of this bug please also see if the latest BIOS
> resolves this? ThinkPad BIOS updates are available on lvfs, so they can be
> done under Linux through fwupdmgr.

The situation is NOT resolved for me: the behaviour is the same after the recent firmware / BIOS update.
Comment 11 Nghia T 2021-05-05 09:04:59 UTC
I'm running X1 Carbon BIOS 1.48 (latest) and experienced this issue on kernel 5.4.0-58-generic. It was also intermittent and did not happen on every resume from lid close/suspend.
Comment 12 Thorsten 2021-06-24 12:53:26 UTC
After updating to kernel 5.12.12(-arch1-1), the issue still existed, but only after multiple suspend/resume-cycles (I couldn't find a deterministic way to reproduce it). After a BIOS update I wasn't able to reproduce it anymore. The BIOS now reports: UEFI BIOS Version: N1MET65W (1.50), Embedded Controller Version: N1MHT31W (1.20), Machine Type Model: 20HQS3KG00. However, I do not own/hold the laptop anymore, so I can't tell whether the issue returns after running the laptop for a longer time (days, or so).