Bug 203191 - Fan speed is reported as 65535 even though the fan is stopped
Summary: Fan speed is reported as 65535 even though the fan is stopped
Status: ASSIGNED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Platform_x86
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_platform_x86@kernel-bugs.osdl.org
 
Reported: 2019-04-09 07:59 UTC by Ilya
Modified: 2024-01-04 13:47 UTC

Kernel Version: 4.15.0
Regression: No


Attachments
system journal (261.52 KB, application/x-xz)
2019-04-09 15:46 UTC, Ilya

Description Ilya 2019-04-09 07:59:33 UTC
Sometimes, after resuming from suspend, the fan speed is reported as 65535. Suspend/resume works around the problem.

The actual problem is that when the hardware starts to overheat, the system doesn't start the fan at all and still reports the speed as 65535.

I have a ThinkPad T470s.

dikiy@rosh:~$ sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +40.0°C  

thinkpad-isa-0000
Adapter: ISA adapter
fan1:        65535 RPM

acpitz-virtual-0
Adapter: Virtual device
temp1:        +38.0°C  (crit = +128.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +39.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +36.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +36.0°C  (high = +100.0°C, crit = +100.0°C)

pch_skylake-virtual-0
Adapter: Virtual device
temp1:        +35.5°C  

dikiy@rosh:~$ cat /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon2/device/hwmon/hwmon2/fan1_input 
65535
dikiy@rosh:~$
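
The check above can be scripted. This is a minimal sketch, assuming the `thinkpad_hwmon` sysfs location shown above (the hwmon number is not fixed, hence the glob) and treating 65535 (0xFFFF, the largest 16-bit value) as an invalid reading rather than a real speed. That interpretation is an assumption, and `classify_rpm` and `read_fans` are hypothetical helper names.

```python
import glob

# 65535 == 0xFFFF. Assumption: this value means "no valid reading",
# because the thread reports it while the fan is not spinning.
INVALID_RPM = 0xFFFF


def classify_rpm(raw):
    """Return 'invalid' for 65535, 'stopped' for 0, otherwise 'ok'."""
    rpm = int(raw.strip())
    if rpm == INVALID_RPM:
        return "invalid"
    return "stopped" if rpm == 0 else "ok"


def read_fans(pattern="/sys/devices/platform/thinkpad_hwmon/hwmon/hwmon*/fan*_input"):
    """Classify every fan*_input file that matches the glob pattern."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as f:
                result[path] = classify_rpm(f.read())
        except (OSError, ValueError):
            # The file vanished, could not be read, or held a non-number.
            result[path] = "unreadable"
    return result
```

Calling `read_fans()` on an affected machine should list `fan1_input` as `invalid` while the problem is present. On a machine without the `thinkpad_hwmon` device it returns an empty dict.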
Comment 1 Ilya 2019-04-09 11:46:37 UTC
>Suspend/resume works around the problem.

I mean that hibernate (suspend to disk) helps. It seems the hardware needs to be reset in some sense.
Comment 2 Jean Delvare 2019-04-09 14:54:19 UTC
Ilya, is it a new problem, or did this machine always behave like that?

You will have to provide more information, starting with dmesg of regular boot and new lines in dmesg after suspend/resume.

I also invite you to look for a firmware update. Suspend/resume issues are often caused by BIOS bugs.
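
One way to extract the "new lines in dmesg after suspend/resume" requested above is to save the output of `dmesg` before suspending and again after resuming, then compare the two captures. This is a minimal sketch, not an official tool; `new_dmesg_lines` is a hypothetical helper name, and the fallback branch assumes that lines are unique enough (for example because of timestamps) to be compared as a set.

```python
def new_dmesg_lines(before, after):
    """Return the lines of `after` that were not part of `before`."""
    before_lines = before.splitlines()
    after_lines = after.splitlines()
    n = len(before_lines)
    # Normal case: the later capture starts with the earlier one,
    # so everything past that prefix is new.
    if after_lines[:n] == before_lines:
        return after_lines[n:]
    # Fallback, e.g. when old lines have dropped out of the kernel
    # ring buffer: keep only the lines never seen in the first capture.
    seen = set(before_lines)
    return [line for line in after_lines if line not in seen]
```

With the two captures saved as files (the names `before.txt` and `after.txt` are only examples), the function can be applied to `open("before.txt").read()` and `open("after.txt").read()`.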
Comment 3 Ilya 2019-04-09 15:36:20 UTC
It is not a very new problem. I noticed it once, about half a year ago, when I tried to set up the kernel against Intel's "Core Melting bugs". You know what I mean.

After applying the new microcode I noticed this bug. I undid the microcode loading and the bug disappeared, so I thought that it was related.

But now, when I check the sensors output, I _sometimes_ notice the same behavior, so I decided to report it as a bug.
Comment 4 Ilya 2019-04-09 15:36:51 UTC
>I also invite you to look for a firmware update

firmware update of what? Of a BIOS?
Comment 5 Ilya 2019-04-09 15:46:21 UTC
Created attachment 282195
system journal

output of journalctl -b

The bug seems to have occurred between 8 April and 9 April.
Comment 6 Jean Delvare 2019-04-09 19:47:24 UTC
(In reply to Ilya from comment #4)
> firmware update of what? Of a BIOS?

Yes. Lenovo releases BIOS updates for all their enterprise laptops on a regular basis, so I would start there.
Comment 7 Matthias Schiffer 2021-03-18 16:39:46 UTC
The T470 (without the 's' suffix) is affected as well. Still reproducible with the latest BIOS 1.65 and Linux 5.11.6.

I believe the issue is not directly related to suspend/resume, as I don't use suspend. Instead, the fan control can break any time the fan is supposed to spin up, which may be when waking up from suspend, or simply because of increased CPU workload.

Whenever the system gets into the broken state:

- the fan is not spinning at all
- hwmon and /proc/acpi/ibm/fan report 65535 RPM
- manual control via /proc/acpi/ibm/fan is ignored (Note: I normally don't use this interface, but rely on the builtin fan control)
- only a full poweroff will fix the issue. When attempting to reboot without a poweroff, the BIOS will refuse to boot with a "Fan error" message.

As far as I remember, the issue originally started a few years ago, when Lenovo fixed a BIOS issue that occasionally caused the fan to hang at 100% after suspend.

I'd be happy to provide any information that may help fix this issue, as I'm seeing it pretty much daily.
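
The broken state described in the list above can be detected from `/proc/acpi/ibm/fan` as well. This is a minimal sketch, assuming the file consists of `key: value` lines that include a `speed:` line; the sample text used to exercise it should be treated as illustrative, not as captured output. `parse_ibm_fan` and `is_broken` are hypothetical helper names.

```python
def parse_ibm_fan(text):
    """Parse 'key: value' lines into a dict, keeping the first value per key."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines without a colon; keep only the first occurrence of
        # a key, since some keys may appear on several lines.
        if sep and key and key not in fields:
            fields[key] = value.strip()
    return fields


def is_broken(fields):
    """True when the reported speed is the 65535 value seen in this bug."""
    return fields.get("speed") == "65535"
```

On a live system the input would come from `open("/proc/acpi/ibm/fan").read()`. Since manual control is reported to be ignored in the broken state, the sketch only reads the file and never writes to it.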
Comment 8 Hans de Goede 2021-04-07 10:33:58 UTC
This sounds like a firmware issue to me. I've asked Lenovo if they can take a look, no promises this will result in anything though.
Comment 9 Manuel Lopez Antequera 2021-08-19 08:08:01 UTC
I believe I'm also affected (on Fedora 33, 5.13.10). Can I help in any way to debug this?
Comment 10 mlk.nrw.de 2022-02-08 09:51:48 UTC
It seems that the T420 is also affected.

**BIOS Information**

- Version: 83ET82WW (1.52 )
- Release Date: 06/04/2018
- Firmware Revision: 1.20

**Base Board Information**

- Product Name: 4236PAG


Linux 4.13.0-32-generic #35-Ubuntu SMP Thu Jan 25 09:13:46 UTC 2018 x86_64
Comment 11 Stefano Fabri 2022-09-26 11:42:51 UTC
I can still confirm this bug with Linux 5.19.6.
Comment 12 Stefano Fabri 2022-10-24 14:49:33 UTC
Kernel 6.0.3

- cat /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon7/fan2_input is always fixed at: 65535
Comment 13 Hans de Goede 2022-10-24 15:16:24 UTC
(In reply to Stefano Fabri from comment #12)
> Kernel 6.0.3
> 
> - cat /sys/devices/platform/thinkpad_hwmon/hwmon/hwmon7/fan2_input is always
> fixed at: 65535

A fan2_input that is sometimes stuck at 65535 is a different issue, which actually is a kernel bug. It is fixed by this patch, which I will send to Linus real soon:

https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/commit/?h=fixes&id=a10d50983f7befe85acf95ea7dbf6ba9187c2d70
Comment 14 Walter 2023-03-29 22:19:01 UTC
Just wanted to know if a fix will come or not.
My ThinkPad X201, running 5.19.0-35-generic,
still shows the bug: after sleep there is only heat,
fan1 shows 65535 RPM but the fan isn't moving.

The only resolution: shut down, remove the battery and
power cord, wait, boot.


The dual-boot Windows installation shows no problem with the fan and sleep mode.
Comment 15 Charalampos Mitrodimas 2023-04-17 20:41:34 UTC
Confirming on TP-T470s, kernel version 5.15.0-69-generic.
Comment 16 antony.gelberg 2023-11-22 12:02:54 UTC
Did anyone get anywhere with this? Suffering as well, P14s gen 2, Ubuntu 22.04 LTS, kernel 5.14.0-1048-oem, latest BIOS N34ET57W (1.57).

Not using thinkfan, laptop has always been pretty reliable for the last couple of years; this is a recent-ish problem for me, having to reboot every few hours is painful.

Am considering a warranty repair but will be inconvenient to be without laptop and unsure if it is a hardware problem.

Has anyone noticed anything in the system logs at all? I added a fan speed indicator in my XFCE panel to check (usually when I'm like hmm, the laptop is running slowly then I look at the fan speed and it's 65535).

Also, often, I'll hear the fan struggling to start running like brr brr, then sometimes it runs normally (2000-4000 RPM on average) but most times it's 65535 and a heating CPU.

Also, when I reboot, I get the BIOS Fan Error screen and the laptop powers off. Then I power it back on and things run normally for x hours.

The bug is in NEEDINFO but what info is needed?
Comment 17 antony.gelberg 2023-12-04 00:41:59 UTC
Circling back with an update.

More recently, I suddenly got 10 days without the issue occurring. I assumed that it was better, but today it's back again.

I've upgraded to 6.2.0-37-generic, but the issue still occurs (only two hours of uptime before rebooting).

So many people commented on this bug a while ago, is there still no insight into whether this is a potential hardware issue? I'd love to hear what worked (or not) for you because I can't believe everyone who commented has continued to suffer the issue for 2-4 years; it makes it "impossible" to work effectively.

I've opened a support ticket with Lenovo for a warranty repair but I am nervous they will say there's nothing wrong as the fan does pass the stress test in the BIOS / diagnostics (although I only ran it once).
Comment 18 antony.gelberg 2023-12-04 00:52:01 UTC
Apologies for the "spam". I noticed I have a `sensors` output before and after the issue, and there are some other differences.

1. GPU went from reporting a temperature to N/A.
2. ucsi_source_psy_USBC000:002-isa-0000 went from reporting a voltage to zero.

Could this be related? Could it indicate a more widespread failure e.g. motherboard fault than "just" the fan? Where would be a good place to get further community insight before shipping it off for repair and waiting for the possible "there is no hardware fault"?


Before:

$ sensors
thinkpad-isa-0000                                                                     
Adapter: ISA adapter                                                                  
fan1:        4815 RPM                                                                 
CPU:          +51.0°C                                                                 
GPU:          +50.0°C                                                                 
temp3:        +33.0°C                                                                 
temp4:         +0.0°C                                                                 
temp5:        +42.0°C                                                                 
temp6:        +51.0°C                                                                 
temp7:        +51.0°C                                                                 
temp8:            N/A                                                                 
                                                                                      
coretemp-isa-0000                                                                     
Adapter: ISA adapter                                                                  
Package id 0:  +54.0°C  (high = +100.0°C, crit = +100.0°C)                            
Core 0:        +54.0°C  (high = +100.0°C, crit = +100.0°C)                            
Core 1:        +52.0°C  (high = +100.0°C, crit = +100.0°C)                            
Core 2:        +51.0°C  (high = +100.0°C, crit = +100.0°C)                            
Core 3:        +48.0°C  (high = +100.0°C, crit = +100.0°C)                            
                                                                                      
ucsi_source_psy_USBC000:001-isa-0000                                                  
Adapter: ISA adapter                                                                  
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)                               
curr1:         0.00 A  (max =  +0.00 A)                                               
                                                                                      
BAT0-acpi-0                                                                           
Adapter: ACPI interface                                                               
in0:          12.91 V                                                                 
                                                                                      
iwlwifi_1-virtual-0                                                                   
Adapter: Virtual device                                                               
temp1:        +58.0°C                                                                 
                                                                                      
ucsi_source_psy_USBC000:002-isa-0000                                                  
Adapter: ISA adapter                                                                  
in0:           5.00 V  (min =  +5.00 V, max = +20.00 V)                               
curr1:         3.00 A  (max =  +3.25 A)                                               
                                                                                      
nvme-pci-0400                                                                         
Adapter: PCI adapter                                                                  
Composite:    +37.9°C  (low  =  -0.1°C, high = +79.8°C)                               
                       (crit = +82.8°C)                                               
                                                                                      
acpitz-acpi-0                                                                         
Adapter: ACPI interface                                                               
temp1:        +51.0°C  (crit = +128.0°C)


After:

$ sensors
thinkpad-isa-0000
Adapter: ISA adapter
fan1:        65535 RPM
CPU:          +47.0°C  
GPU:              N/A  
temp3:        +34.0°C  
temp4:         +0.0°C  
temp5:        +42.0°C  
temp6:        +47.0°C  
temp7:        +47.0°C  
temp8:            N/A  

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +47.0°C  (high = +100.0°C, crit = +100.0°C)
Core 0:        +46.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:        +46.0°C  (high = +100.0°C, crit = +100.0°C)
Core 2:        +46.0°C  (high = +100.0°C, crit = +100.0°C)
Core 3:        +45.0°C  (high = +100.0°C, crit = +100.0°C)

ucsi_source_psy_USBC000:001-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:         0.00 A  (max =  +0.00 A)

BAT0-acpi-0
Adapter: ACPI interface
in0:          12.08 V  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +44.0°C  

ucsi_source_psy_USBC000:002-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:         0.00 A  (max =  +0.00 A)

nvme-pci-0400
Adapter: PCI adapter
Composite:    +32.9°C  (low  =  -0.1°C, high = +79.8°C)
                       (crit = +82.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +47.0°C  (crit = +128.0°C)
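
Comparing two captures like the ones above by eye is error-prone. This is a minimal sketch that parses both and lists the readings that differ. It assumes the layout seen above: blank-line-separated blocks, a chip name on the first line of each block, an `Adapter:` line, then `label: value` lines with optional limits in parentheses. The input is the `sensors` output without the shell prompt line. `parse_sensors` and `changed` are hypothetical helper names.

```python
def parse_sensors(text):
    """Return {(chip, label): value} parsed from `sensors` text output."""
    readings = {}
    chip = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            chip = None              # a blank line ends the current chip block
            continue
        if chip is None:
            chip = line.strip()      # first line of a block is the chip name
            continue
        if line[0].isspace():
            continue                 # continuation line such as "(crit = ...)"
        label, sep, rest = line.partition(":")
        if not sep or label == "Adapter":
            continue
        # Keep only the reading itself, dropping limits in parentheses.
        readings[(chip, label)] = rest.split("(")[0].strip()
    return readings


def changed(before, after):
    """Map each (chip, label) whose value differs to (old, new)."""
    a, b = parse_sensors(before), parse_sensors(after)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Most temperatures will differ between any two captures, so the entries of interest are those that change to `65535 RPM`, `N/A`, or zero.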
Comment 19 antony.gelberg 2023-12-04 01:03:42 UTC
Okay, final one (oh that Bugzilla would let one edit comments).

About the above, I realized that the discrepancy in ucsi_source_psy_USBC000:002-isa-0000 is because the PSU was unplugged in the second sample.

Also the GPU is now reporting temperature again (not sure if this is related to me reconnecting the PSU).

:shrug:
Comment 20 antony.gelberg 2024-01-04 13:47:53 UTC
Following up here. My laptop underwent warranty repair and all seems okay now. They replaced the fan module and the motherboard as they couldn't tell which was responsible due to the intermittent nature of the problem.

So I would say that it looks like this should be closed; if the problem still occurs I will report back.
