Bug 203335
Summary: | Lenovo T580 always complains it's overheating | ||
---|---|---|---|
Product: | ACPI | Reporter: | Laurent Bigonville (bigon) |
Component: | ACPICA-Core | Assignee: | acpi_acpica-core (acpi_acpica-core) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | rui.zhang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | lspci output, dmesg 5.3-rc5, thermal.txt | ||
Description
Laurent Bigonville
2019-04-16 12:04:09 UTC
Please attach the output of lspci -vx.

Created attachment 282379
lspci output
Please check if the problem can be reproduced in the latest upstream kernel. BTW, what distribution are you using?

Please make sure thermald is running.

I'm still seeing that today using kernel 5.2 from Debian unstable.

thermald was not running; I installed it now. Let's see if this is working.

It's still complaining even with thermald.

Oh, please confirm CONFIG_PROC_THERMAL_MMIO_RAPL is set with your test.

Looks like CONFIG_PROC_THERMAL_MMIO_RAPL doesn't exist in 5.2 (it was added in 5.3).

Created attachment 284875
dmesg 5.3-rc5
I updated to 5.3-rc5 (from Debian) and I see that CONFIG_PROC_THERMAL_MMIO_RAPL is enabled, but I'm still seeing the same messages:
[ 510.202054] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 92)
[ 510.202055] mce: CPU6: Core temperature above threshold, cpu clock throttled (total events = 92)
[ 510.202056] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202057] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202058] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202059] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202060] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202061] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202090] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202091] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202941] mce: CPU2: Core temperature/speed normal
[ 510.202942] mce: CPU6: Core temperature/speed normal
[ 510.202943] mce: CPU6: Package temperature/speed normal
[ 510.202943] mce: CPU2: Package temperature/speed normal
[ 510.202998] mce: CPU0: Package temperature/speed normal
[ 510.202999] mce: CPU3: Package temperature/speed normal
[ 510.203000] mce: CPU5: Package temperature/speed normal
[ 510.203000] mce: CPU7: Package temperature/speed normal
[ 510.203001] mce: CPU1: Package temperature/speed normal
[ 510.203002] mce: CPU4: Package temperature/speed normal
Looking in dmesg I see:
[ 62.127412] thermal thermal_zone8: failed to read out thermal zone (-61)
Is that expected?
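To get a feel for how often the throttling messages fire, the dmesg output can be summarized with a small script. This is only a sketch: the function name `summarize_throttling` and the regular expression are assumptions made for illustration, matched against the message format quoted above, and are not part of any kernel tooling.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 510.202054] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 92)
THROTTLE_RE = re.compile(
    r"mce: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)


def summarize_throttling(dmesg_text):
    """Return (message count per scope, highest 'total events' value per scope)."""
    messages = Counter()
    highest_total = {}
    for line in dmesg_text.splitlines():
        match = THROTTLE_RE.search(line)
        if match is None:
            continue  # not a throttling message (e.g. "temperature/speed normal")
        scope = match.group("scope")
        messages[scope] += 1
        total = int(match.group("total"))
        highest_total[scope] = max(highest_total.get(scope, 0), total)
    return dict(messages), highest_total


# Three lines taken from the log above.
SAMPLE = """\
[ 510.202054] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 92)
[ 510.202056] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 216)
[ 510.202941] mce: CPU2: Core temperature/speed normal
"""

if __name__ == "__main__":
    print(summarize_throttling(SAMPLE))
```

Feeding it the full text of `dmesg` instead of `SAMPLE` gives the per-scope counts for the whole boot.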
(In reply to Laurent Bigonville from comment #8)
> I updated to 5.3-rc5 (from debian) and I see that
> CONFIG_PROC_THERMAL_MMIO_RAPL is enabled, I'm still seeing the same messages:
> [...]

I'm curious whether the system is really overheating when these messages are generated. How often do you get these errors?

> Looking in dmesg I see:
>
> [ 62.127412] thermal thermal_zone8: failed to read out thermal zone (-61)
>
> Is that expected?

That should be okay.

Please attach the output of "grep . /sys/class/thermal/thermal*/*".

Created attachment 284897
thermal.txt
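For reference, roughly the same information as the grep request can be collected with a short script. This is a sketch, not an official tool: `read_thermal_zones` is a hypothetical helper name, and it assumes the usual sysfs layout where each `thermal_zone*` directory has a `type` file and a `temp` file in millidegrees Celsius.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: (type, temperature in degrees C or None)} for each thermal_zone*."""
    zones = {}
    for zone_dir in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone_dir / "type").read_text().strip()
        except OSError:
            zone_type = None
        try:
            # sysfs reports the temperature in millidegrees Celsius.
            temp_c = int((zone_dir / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            # Some zones cannot be read, like thermal_zone8 in the dmesg line quoted above.
            temp_c = None
        zones[zone_dir.name] = (zone_type, temp_c)
    return zones


if __name__ == "__main__":
    for name, (zone_type, temp_c) in read_thermal_zones().items():
        print(name, zone_type, temp_c)
```

Zones whose `temp` file cannot be read are reported with `None` rather than aborting the whole listing.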
I don't know; it also happened during the night, when the laptop was not in use and left unattended.
Temperatures look OK:
$ sensors
iwlwifi-virtual-0
Adapter: Virtual device
temp1: +49.0°C
BAT0-acpi-0
Adapter: ACPI interface
in0: +16.49 V
pch_skylake-virtual-0
Adapter: Virtual device
temp1: +43.0°C
acpitz-acpi-0
Adapter: ACPI interface
temp1: +45.0°C (crit = +98.0°C)
BAT1-acpi-0
Adapter: ACPI interface
in0: +12.68 V
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 0: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 1: +44.0°C (high = +100.0°C, crit = +100.0°C)
Core 2: +45.0°C (high = +100.0°C, crit = +100.0°C)
Core 3: +44.0°C (high = +100.0°C, crit = +100.0°C)
thinkpad-isa-0000
Adapter: ISA adapter
fan1: 0 RPM
temp1: +45.0°C
temp2: N/A
temp3: +0.0°C
temp4: +0.0°C
temp5: +0.0°C
temp6: +0.0°C
temp7: +0.0°C
temp8: +0.0°C
temp9: +0.0°C
temp10: +0.0°C
temp11: +66.0°C
temp12: +0.0°C
temp13: +0.0°C
temp14: +0.0°C
temp15: +0.0°C
temp16: +0.0°C
But to answer your question, it's happening multiple times a day.

Please run a kernel later than 5.4-rc2, check the location of the file tcc_offset_degree_celsius with "find /sys/ | grep tcc_offset_degree_celsius", and then get the content of this file.

bigon@edoras:~$ sudo find /sys/ | grep tcc_offset_degree_celsius
/sys/devices/pci0000:00/0000:00:04.0/tcc_offset_degree_celsius
bigon@edoras:~$ cat '/sys/devices/pci0000:00/0000:00:04.0/tcc_offset_degree_celsius'
24
bigon@edoras:~$ uname -a
Linux edoras 5.4.0-1-amd64 #1 SMP Debian 5.4.6-1 (2019-12-27) x86_64 GNU/Linux

I think you can set it to a smaller value to get rid of the overheating messages. Say, run "echo 14 > /sys/devices/pci0000:00/0000:00:04.0/tcc_offset_degree_celsius".

FTR, I found a long thread on the Lenovo forums that looks related: https://forums.lenovo.com/t5/Other-Linux-Discussions/X1C6-T480s-low-cTDP-and-trip-temperature-in-Linux/td-p/4028489

That is really a long thread. I think that can be fixed with thermald running, right?

We have made kernel changes for thermald to improve this. Bug closed. Please feel free to reopen it if you still have any questions.
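As a rough illustration of why a smaller tcc_offset_degree_celsius value should quiet the messages: assuming throttling starts at TjMax minus the TCC offset (my understanding of the Intel TCC offset, not something stated in this thread), and assuming TjMax is 100 °C as the coretemp "crit" value in the sensors output suggests, the arithmetic looks like this. The helper names are hypothetical, and the sysfs path is the one reported above; it may differ on other machines.

```python
from pathlib import Path

# Path reported in this thread; other machines may expose it elsewhere.
TCC_OFFSET_PATH = "/sys/devices/pci0000:00/0000:00:04.0/tcc_offset_degree_celsius"


def read_tcc_offset(path=TCC_OFFSET_PATH):
    """Read the TCC offset (degrees Celsius) from the sysfs file."""
    return int(Path(path).read_text().strip())


def throttle_threshold_c(tjmax_c, tcc_offset_c):
    """Assumed relation: throttling starts at TjMax minus the TCC offset."""
    return tjmax_c - tcc_offset_c


# Assumption: TjMax = 100 degrees C, taken from the coretemp "crit" value above.
ASSUMED_TJMAX_C = 100

if __name__ == "__main__":
    for offset in (24, 14):  # the value read in this thread, then the suggested one
        print(offset, throttle_threshold_c(ASSUMED_TJMAX_C, offset))
```

Under those assumptions, the offset of 24 puts the threshold at 76 °C, and lowering it to 14 moves the threshold up to 86 °C.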