Bug 99381 - Random values of a sensor reading
Summary: Random values of a sensor reading
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_power-thermal
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-03 11:20 UTC by Andrzej Wes
Modified: 2016-09-20 08:50 UTC
CC List: 3 users

See Also:
Kernel Version: 4.0.4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump (468.74 KB, text/plain)
2015-06-30 07:41 UTC, Andrzej Wes

Description Andrzej Wes 2015-06-03 11:20:59 UTC
Hello!

I'm using Arch Linux. 
I've recently upgraded the distro kernel from 4.0.4-1 to 4.0.4-2. The difference between them is that new config options are enabled in 4.0.4-2:
+CONFIG_THERMAL_HWMON=y
+CONFIG_HWMON=y

Since then I have had random automatic shutdowns caused by the sensor readings.
I have an Acer V17 (VN7-791G) with NVIDIA graphics, but I don't use them (only the integrated Intel graphics).

The update introduced a new section in the `sensors` output:

acpitz-virtual-0
Adapter: Virtual device
temp1:        +27.8°C  (crit = +105.0°C)
temp2:        +29.8°C  (crit = +105.0°C)
temp3:        +117.0°C  (crit = +100.0°C)
temp4:        +56.0°C  (crit = +92.0°C)

The problem is "temp3". Here are a few measurements (1 second step):
temp3:        +30.0°C  (crit = +100.0°C)
temp3:       +102.0°C  (crit = +100.0°C)
temp3:       +114.0°C  (crit = +100.0°C)
temp3:        +30.0°C  (crit = +100.0°C)
temp3:        +30.0°C  (crit = +100.0°C)
temp3:       +102.0°C  (crit = +100.0°C)
temp3:       +102.0°C  (crit = +100.0°C)
temp3:        +30.0°C  (crit = +100.0°C)
temp3:        +30.0°C  (crit = +100.0°C)
temp3:        +30.0°C  (crit = +100.0°C)
temp3:        +30.0°C  (crit = +100.0°C)
temp3:       +102.0°C  (crit = +100.0°C)
temp3:       +102.0°C  (crit = +100.0°C)
...

More data here: http://pastebin.com/AedH74NA
Here's also a plot of "temp3" (measured every 1s): http://i62.tinypic.com/i2sll4.png
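
To make the jumps easier to spot in a long capture, the samples can be filtered against their own crit value. This is only a rough sketch: `flag_over_crit` is a hypothetical helper name, and it assumes the `sensors` line layout shown above (label, reading, then the crit value in parentheses).

```shell
# Hypothetical helper: read `sensors`-style lines on stdin and mark each
# temp3 sample as OVER when the reading is at or above the reported crit value.
flag_over_crit() {
    awk '/^temp3:/ {
        line = $0
        sub(/^temp3:/, "", line)          # drop the label (it contains a digit)
        gsub(/[^0-9.+ -]/, "", line)      # drop units, "crit", "=" and parentheses
        split(line, f, " ")               # f[1] = reading, f[2] = crit
        reading = f[1] + 0
        crit = f[2] + 0
        printf "%s %s\n", reading, (reading >= crit ? "OVER" : "ok")
    }'
}
```

Piping `sensors` into `flag_over_crit` once per second should give a log comparable to the measurements listed above.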


`sensors-detect` recognized the following hardware:
Intel digital thermal sensor...                             Success!
    (driver `coretemp')

Trying family `National Semiconductor/ITE'...               Yes    

What data should I provide to help with solving this problem?
Comment 1 Andrzej Wes 2015-06-10 07:43:43 UTC
Found in logs:
"kernel: thermal thermal_zone2: critical temperature reached(113 C),shutting down"

thermal_zone2 is:

root:/sys/class/thermal/thermal_zone2# grep -r . /sys/class/thermal/thermal_zone2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/temp:30000
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone2/power/control:auto
/sys/class/thermal/thermal_zone2/power/async:disabled
/sys/class/thermal/thermal_zone2/power/runtime_enabled:disabled
/sys/class/thermal/thermal_zone2/power/runtime_active_kids:0
/sys/class/thermal/thermal_zone2/power/runtime_active_time:0
grep: /sys/class/thermal/thermal_zone2/power/autosuspend_delay_ms: Input/output error
/sys/class/thermal/thermal_zone2/power/runtime_status:unsupported
/sys/class/thermal/thermal_zone2/power/runtime_usage:0
/sys/class/thermal/thermal_zone2/power/runtime_suspended_time:0
/sys/class/thermal/thermal_zone2/trip_point_0_temp:100000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/policy:step_wise
/sys/class/thermal/thermal_zone2/passive:0


"thermal_zone2" is what `sensors` command displays as "temp3".
I have (hopefully) worked around the problem by disabling kernel thermal management for this zone with:
echo "disabled" > /sys/class/thermal/thermal_zone2/mode
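
The sysfs values in the listing above are in millidegrees Celsius, so temp:30000 is 30 °C and trip_point_0_temp:100000 is the 100 °C critical trip point. A minimal sketch for checking a zone against its critical trip point follows. `zone_status` is a hypothetical helper name, and it assumes the zone directory contains `temp` and `trip_point_0_temp` as in the listing, with trip point 0 being the critical one.

```shell
# Hypothetical helper: zone_status DIR
# Prints the current and critical temperature of a thermal zone in whole
# degrees Celsius, plus OVER when the reading is at or above the trip point.
# Assumes DIR/temp and DIR/trip_point_0_temp hold millidegree values.
zone_status() {
    zone_dir=$1
    temp_mc=$(cat "$zone_dir/temp") || return 1
    crit_mc=$(cat "$zone_dir/trip_point_0_temp") || return 1
    state=ok
    if [ "$temp_mc" -ge "$crit_mc" ]; then
        state=OVER
    fi
    echo "$(basename "$zone_dir"): $((temp_mc / 1000)) C, crit $((crit_mc / 1000)) C, $state"
}
```

Called as `zone_status /sys/class/thermal/thermal_zone2`, it would report the same zone that `sensors` shows as "temp3".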
Comment 2 Aaron Lu 2015-06-24 07:20:36 UTC
Please attach acpidump:
# acpidump > acpidump.txt
Comment 3 Aaron Lu 2015-06-24 07:22:00 UTC
BTW, in previous kernels, I suppose you still have thermal_zone2 and its temp file doesn't show a random high value?
Comment 4 Andrzej Wes 2015-06-30 07:41:19 UTC
Created attachment 181451
acpidump
Comment 5 Andrzej Wes 2015-06-30 07:56:34 UTC
Hi,
I've attached my acpidump, but it looks like the problem has been solved in 4.0.6!
dmesg shows "thermal zone will be disabled" and thermal_zone2 is disabled.
Thanks!
