Bug 50041
Summary: | fan is actually on while ACPI shows it is off - HP NW9440 | ||
---|---|---|---|
Product: | ACPI | Reporter: | Matthias (morpheusxyz123) |
Component: | Power-Fan | Assignee: | Zhang Rui (rui.zhang) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | aaron.lu, adstl, lenb, me, me |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.6.5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg
kernelconfig
cpuinfo
lspci
acpidump of hp nw9440
Measurements for linux-3.9-rc7 at full speed of fan for comment 37
Findings for comment 39 |
Description
Matthias
2012-11-04 11:13:45 UTC
Created attachment 85451 [details]
dmesg
Created attachment 85461 [details]
kernelconfig
Created attachment 85471 [details]
cpuinfo
Created attachment 85481 [details]
lspci
> I can only observe this bug when the nvidia binary blob is
> enabled because no other driver lets the system reach the
> lowest possible fan speed.
Unclear that we can work on bugs that can only be reproduced when
the nvidia binary blob is enabled.
Please attach the output of "grep . /sys/class/thermal/*/*" when the system is idle and the fan is running at full speed.

(In reply to comment #6)
> please attach the output of "grep . /sys/class/thermal/*/*" when system is
> idle
> and the fan is running in full speed.

I will post this as soon as possible. Thanks in advance for the help.

ping...

Today the bug showed up again. Here is the requested information.

grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:0
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:0
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/temp:48000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/temp:41000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/temp:41000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:55000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/temp:36000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/temp:31900
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/temp:20000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz

(In reply to comment #9)
> Today the bug did show up again. Here is the required information.
>
> grep . /sys/class/thermal/*/*
> /sys/class/thermal/cooling_device0/cur_state:0
> /sys/class/thermal/cooling_device0/type:Fan
> /sys/class/thermal/cooling_device10/cur_state:0
> /sys/class/thermal/cooling_device10/type:Fan
> /sys/class/thermal/cooling_device1/cur_state:0
> /sys/class/thermal/cooling_device1/type:Fan
> /sys/class/thermal/cooling_device2/cur_state:0
> /sys/class/thermal/cooling_device2/type:Fan
> /sys/class/thermal/cooling_device3/cur_state:0
> /sys/class/thermal/cooling_device3/type:Fan
> /sys/class/thermal/cooling_device4/cur_state:
> /sys/class/thermal/cooling_device4/type:Fan
> /sys/class/thermal/cooling_device5/cur_state:0
> /sys/class/thermal/cooling_device5/type:Fan
> /sys/class/thermal/cooling_device6/cur_state:0
> /sys/class/thermal/cooling_device6/type:Fan
> /sys/class/thermal/cooling_device7/cur_state:0
> /sys/class/thermal/cooling_device7/type:Fan
> /sys/class/thermal/cooling_device8/cur_state:0
> /sys/class/thermal/cooling_device8/type:Fan
> /sys/class/thermal/cooling_device9/cur_state:0
> /sys/class/thermal/cooling_device9/type:Fan

Well, all of these show that the ACPI FAN is in the OFF state.

Hmm, can you do the following test? When the system is idle and the fan does not spin, try "sudo echo 1 > /sys/class/thermal/cooling_deviceX/cur_state" for each cooling device with type "Fan", one by one. Can you hear the fan spin after these commands?
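The test requested above can also be scripted. What follows is only a minimal Python sketch, not something posted in this thread: the helper names `fan_devices` and `set_state` are invented for illustration, and the `root` parameter exists only so the functions can be pointed at a scratch directory instead of the real sysfs tree. One reason to script it: in `sudo echo 1 > file` the redirection is performed by the calling (unprivileged) shell, so that form only works from a shell that is already root.

```python
from pathlib import Path


def fan_devices(root="/sys/class/thermal"):
    """Return (name, cur_state) for every cooling device whose type is "Fan"."""
    fans = []
    for dev in Path(root).glob("cooling_device[0-9]*"):
        type_file = dev / "type"
        if not type_file.is_file() or type_file.read_text().strip() != "Fan":
            continue
        fans.append((dev.name, (dev / "cur_state").read_text().strip()))
    # Sort by the numeric suffix so cooling_device10 follows cooling_device9.
    return sorted(fans, key=lambda item: int(item[0][len("cooling_device"):]))


def set_state(device, state, root="/sys/class/thermal"):
    """Write a new cur_state for one device (needs root on a real system)."""
    (Path(root) / device / "cur_state").write_text(f"{state}\n")


# Hypothetical usage, run as root with the system idle:
#   for name, state in fan_devices():
#       print(name, state)
#   set_state("cooling_device0", 1)   # then listen for a change in fan speed
```

Switching one device at a time and writing 0 back before moving on keeps the effect of each device separate.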
First of all, let me define the fan speeds. There are five distinct audible fan speeds:
[STEP_0]: lowest speed
[STEP_1]: [STEP_0] +1
[STEP_2]: [STEP_1] +1
[STEP_3]: [STEP_2] +1
[STEP_4]: full speed

I left each setting in place and went through the commands one by one:
echo 1 > /sys/class/thermal/cooling_device0/cur_state: fan goes slowly from [STEP_0] to [STEP_4]. For your information, the bug turns the fan on to [STEP_4] instantly.
echo 1 > /sys/class/thermal/cooling_device1/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device2/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device3/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device4/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device5/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device6/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device7/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device8/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device9/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device10/cur_state: no audible change

Result:
grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:1
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:1
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:1
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:1
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device1/cur_state:1
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:1
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:1
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:1
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:1
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:1
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan

Playing with the parameters independently:
echo 1 > /sys/class/thermal/cooling_device0/cur_state: fan goes slowly from [STEP_0] to [STEP_4].
echo 0 > /sys/class/thermal/cooling_device0/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device1/cur_state: fan goes slowly from [STEP_0] to [STEP_3].
echo 0 > /sys/class/thermal/cooling_device1/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device2/cur_state: fan goes slowly from [STEP_0] to [STEP_2].
echo 0 > /sys/class/thermal/cooling_device2/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device3/cur_state: fan goes slowly from [STEP_0] to [STEP_1].
echo 0 > /sys/class/thermal/cooling_device3/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device4/cur_state: no audible change
echo 0 > /sys/class/thermal/cooling_device4/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device5/cur_state: fan goes slowly from [STEP_0] to [STEP_4].
echo 0 > /sys/class/thermal/cooling_device5/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device6/cur_state: fan goes slowly from [STEP_0] to [STEP_3].
echo 0 > /sys/class/thermal/cooling_device6/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device7/cur_state: fan goes slowly from [STEP_0] to [STEP_2].
echo 0 > /sys/class/thermal/cooling_device7/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device8/cur_state: fan goes slowly from [STEP_0] to [STEP_1].
echo 0 > /sys/class/thermal/cooling_device8/cur_state: fan goes slowly back to [STEP_0].
echo 1 > /sys/class/thermal/cooling_device9/cur_state: no audible change
echo 0 > /sys/class/thermal/cooling_device9/cur_state: no audible change
echo 1 > /sys/class/thermal/cooling_device10/cur_state: no audible change
echo 0 > /sys/class/thermal/cooling_device10/cur_state: no audible change

Thanks for your test.
Comment #11 shows that the fan you hear spinning is driven by the ACPI FAN devices, but comment #9 shows that the ACPI FAN state is off while you hear the fan spinning. Is this correct?
Please attach the acpidump output of this box.

Created attachment 87271 [details]
acpidump of hp nw9440
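The mismatch raised above (the fan is audible, yet every Fan device reports cur_state 0) can be checked mechanically against a pasted dump. The following is a rough Python sketch under stated assumptions: the function names `parse_dump`, `fans_on`, and `crossed_active_trips` are made up for illustration, and the parser only assumes the `path:value` layout that `grep . /sys/class/thermal/*/*` produces in this report.

```python
import re
from collections import defaultdict

# One "path:value" entry of the grep output; the value may be empty.
ENTRY = re.compile(r"/sys/class/thermal/(\w+)/(\w+):(\S*)")


def parse_dump(text):
    """Map each device or zone name to a dict of its attributes."""
    tree = defaultdict(dict)
    for name, attr, value in ENTRY.findall(text):
        tree[name][attr] = value
    return dict(tree)


def fans_on(tree):
    """Names of Fan cooling devices whose cur_state is 1, sorted as strings."""
    return sorted(
        name for name, attrs in tree.items()
        if attrs.get("type") == "Fan" and attrs.get("cur_state") == "1"
    )


def crossed_active_trips(tree):
    """(zone, trip index, trip temp) for active trips at or below the zone temp."""
    hits = []
    for name, attrs in tree.items():
        if "temp" not in attrs:
            continue  # cooling devices have no temperature
        temp = int(attrs["temp"])
        for attr, value in attrs.items():
            match = re.fullmatch(r"trip_point_(\d+)_type", attr)
            if not match or value != "active":
                continue
            limit = attrs.get(f"trip_point_{match.group(1)}_temp")
            if limit is not None and temp >= int(limit):
                hits.append((name, int(match.group(1)), int(limit)))
    return sorted(hits)
```

Applied to the values posted in comment #9, both functions should come back empty: no Fan device reports state 1, and the zone temperatures (48000 and 41000) are below the lowest active trip points (58000 and 55000). That matches the observation that ACPI considers every fan off while the fan is audibly at full speed.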
I suppose your statement is correct, but to be absolutely sure I have to wait for the bug to happen again and take a look. So far I have observed that cooling_device4 and cooling_device9 are sometimes active (ON) when the bug happens, but this is not always the case. This seems to be a tricky one. I very much appreciate what you are doing, so thank you very much. The output of acpidump is attached. If you need something else, please let me know.

Today the bug happened again. No fan was reported as active, but the fan was spinning at full speed.

The bug is also in version 3.6.10. I will test 3.7.0 as soon as possible.

Now I am running this kernel:
Linux LAPPI 3.7.1 #1 SMP PREEMPT Wed Dec 19 12:51:23 CET 2012 x86_64 Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz GenuineIntel GNU/Linux
The bug is also in this version. In addition, I now get full fan speed when I resume from suspend2ram. The output of grep . /sys/class/thermal/*/* is provided below, both for the bug and after suspend2ram. The fan also spins at full speed after suspend2ram when I use the nouveau driver for my nvidia card.
after suspend:
/sys/class/thermal/cooling_device0/cur_state:1
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:1
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:0
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:LCD
/sys/class/thermal/cooling_device1/cur_state:1
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:0
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:1
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:1
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:1
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:1
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:1
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:4
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/temp:29000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/temp:34000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/temp:32000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:55000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/temp:25000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/temp:26100
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/temp:100000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz

bug:
grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:0
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:LCD
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:0
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:0
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:4
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/temp:48000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/temp:40000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/temp:37000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:55000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/temp:33000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/temp:29500
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/temp:20000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz

Since kernel version 3.7 the nouveau driver has been working properly on my laptop. Today the bug occurred while using the nouveau driver, so I now have a non-tainted kernel. I hope this helps in finding the cause of the bug. Please tell me what I can do to solve this.

The bug is also in kernel version 3.7.4. According to acpi -V, the states of the cooling devices do not change when the bug happens. The states just stay the same as before the bug.

3.7.6 is also affected.

3.8.0 is also affected.

I think the problem should be fixed by this commit:

commit b8bb6cb999858043489c1ddef08eed2127559169
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Thu Nov 22 15:45:02 2012 +0800

    step_wise: Unify the code for both throttle and dethrottle

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>

So please check whether the problem still exists in 3.9-rc1.

The laptop is running with that kernel now. I'll let you know what happens.

Good news, so far the bug is gone. But I'd like to test this kernel a little bit longer to be absolutely sure.

Bad news, the dethrottling does not work. The fan stays at the highest speed it reached.
What I did was the following:
Put load on the machine with cat /dev/zero > /dev/null on both CPU cores.
Wait for the fan to spin up.
Terminate the two cat processes when the fan speed hits [STEP_3] (last but one).
I watched the CPU utilization go down in htop and waited 15 minutes for the fan to slow down. But it did not.

grep . /sys/class/thermal/*/* at that time:
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:1
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:LCD
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:Processor
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:1
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:4
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:48000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:39000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2 /sys/class/thermal/thermal_zone2/mode:enabled /sys/class/thermal/thermal_zone2/passive:0 /sys/class/thermal/thermal_zone2/policy:step_wise /sys/class/thermal/thermal_zone2/temp:45000 /sys/class/thermal/thermal_zone2/trip_point_0_temp:126000 /sys/class/thermal/thermal_zone2/trip_point_0_type:critical /sys/class/thermal/thermal_zone2/trip_point_1_temp:95000 /sys/class/thermal/thermal_zone2/trip_point_1_type:active /sys/class/thermal/thermal_zone2/trip_point_2_temp:86000 /sys/class/thermal/thermal_zone2/trip_point_2_type:active /sys/class/thermal/thermal_zone2/trip_point_3_temp:74000 /sys/class/thermal/thermal_zone2/trip_point_3_type:active /sys/class/thermal/thermal_zone2/trip_point_4_temp:67000 /sys/class/thermal/thermal_zone2/trip_point_4_type:active /sys/class/thermal/thermal_zone2/trip_point_5_temp:60000 /sys/class/thermal/thermal_zone2/trip_point_5_type:active /sys/class/thermal/thermal_zone2/trip_point_6_temp:42000 /sys/class/thermal/thermal_zone2/trip_point_6_type:active /sys/class/thermal/thermal_zone2/type:acpitz /sys/class/thermal/thermal_zone3/cdev0_trip_point:1 /sys/class/thermal/thermal_zone3/cdev1_trip_point:1 /sys/class/thermal/thermal_zone3/mode:enabled /sys/class/thermal/thermal_zone3/policy:step_wise /sys/class/thermal/thermal_zone3/temp:37000 /sys/class/thermal/thermal_zone3/trip_point_0_temp:105000 /sys/class/thermal/thermal_zone3/trip_point_0_type:critical /sys/class/thermal/thermal_zone3/trip_point_1_temp:95000 /sys/class/thermal/thermal_zone3/trip_point_1_type:passive /sys/class/thermal/thermal_zone3/type:acpitz /sys/class/thermal/thermal_zone4/cdev0_trip_point:1 /sys/class/thermal/thermal_zone4/cdev1_trip_point:1 /sys/class/thermal/thermal_zone4/mode:enabled /sys/class/thermal/thermal_zone4/policy:step_wise /sys/class/thermal/thermal_zone4/temp:38100 /sys/class/thermal/thermal_zone4/trip_point_0_temp:102000 /sys/class/thermal/thermal_zone4/trip_point_0_type:critical 
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/policy:step_wise
/sys/class/thermal/thermal_zone5/temp:60000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz

Expected result: fan should spin down to [STEP_0]. Let me know if you need anything else.

I was too fast. I am sorry. The bug occurred again.

grep . /sys/class/thermal/*/* at that time:
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:1
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:LCD
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:Processor
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0 /sys/class/thermal/cooling_device5/max_state:1 /sys/class/thermal/cooling_device5/type:Fan /sys/class/thermal/cooling_device6/cur_state:0 /sys/class/thermal/cooling_device6/max_state:1 /sys/class/thermal/cooling_device6/type:Fan /sys/class/thermal/cooling_device7/cur_state:0 /sys/class/thermal/cooling_device7/max_state:1 /sys/class/thermal/cooling_device7/type:Fan /sys/class/thermal/cooling_device8/cur_state:0 /sys/class/thermal/cooling_device8/max_state:1 /sys/class/thermal/cooling_device8/type:Fan /sys/class/thermal/cooling_device9/cur_state:0 /sys/class/thermal/cooling_device9/max_state:1 /sys/class/thermal/cooling_device9/type:Fan /sys/class/thermal/thermal_zone0/cdev0_trip_point:5 /sys/class/thermal/thermal_zone0/cdev1_trip_point:4 /sys/class/thermal/thermal_zone0/cdev2_trip_point:3 /sys/class/thermal/thermal_zone0/cdev3_trip_point:2 /sys/class/thermal/thermal_zone0/cdev4_trip_point:1 /sys/class/thermal/thermal_zone0/mode:enabled /sys/class/thermal/thermal_zone0/passive:0 /sys/class/thermal/thermal_zone0/policy:step_wise /sys/class/thermal/thermal_zone0/temp:48000 /sys/class/thermal/thermal_zone0/trip_point_0_temp:256000 /sys/class/thermal/thermal_zone0/trip_point_0_type:critical /sys/class/thermal/thermal_zone0/trip_point_1_temp:91000 /sys/class/thermal/thermal_zone0/trip_point_1_type:active /sys/class/thermal/thermal_zone0/trip_point_2_temp:85000 /sys/class/thermal/thermal_zone0/trip_point_2_type:active /sys/class/thermal/thermal_zone0/trip_point_3_temp:79000 /sys/class/thermal/thermal_zone0/trip_point_3_type:active /sys/class/thermal/thermal_zone0/trip_point_4_temp:68000 /sys/class/thermal/thermal_zone0/trip_point_4_type:active /sys/class/thermal/thermal_zone0/trip_point_5_temp:58000 /sys/class/thermal/thermal_zone0/trip_point_5_type:active /sys/class/thermal/thermal_zone0/type:acpitz /sys/class/thermal/thermal_zone1/cdev0_trip_point:1 /sys/class/thermal/thermal_zone1/cdev1_trip_point:1 
/sys/class/thermal/thermal_zone1/mode:enabled /sys/class/thermal/thermal_zone1/policy:step_wise /sys/class/thermal/thermal_zone1/temp:49000 /sys/class/thermal/thermal_zone1/trip_point_0_temp:102000 /sys/class/thermal/thermal_zone1/trip_point_0_type:critical /sys/class/thermal/thermal_zone1/trip_point_1_temp:97000 /sys/class/thermal/thermal_zone1/trip_point_1_type:passive /sys/class/thermal/thermal_zone1/type:acpitz /sys/class/thermal/thermal_zone2/cdev0_trip_point:1 /sys/class/thermal/thermal_zone2/cdev1_trip_point:6 /sys/class/thermal/thermal_zone2/cdev2_trip_point:5 /sys/class/thermal/thermal_zone2/cdev3_trip_point:4 /sys/class/thermal/thermal_zone2/cdev4_trip_point:3 /sys/class/thermal/thermal_zone2/cdev5_trip_point:2 /sys/class/thermal/thermal_zone2/mode:enabled /sys/class/thermal/thermal_zone2/passive:0 /sys/class/thermal/thermal_zone2/policy:step_wise /sys/class/thermal/thermal_zone2/temp:48000 /sys/class/thermal/thermal_zone2/trip_point_0_temp:126000 /sys/class/thermal/thermal_zone2/trip_point_0_type:critical /sys/class/thermal/thermal_zone2/trip_point_1_temp:95000 /sys/class/thermal/thermal_zone2/trip_point_1_type:active /sys/class/thermal/thermal_zone2/trip_point_2_temp:86000 /sys/class/thermal/thermal_zone2/trip_point_2_type:active /sys/class/thermal/thermal_zone2/trip_point_3_temp:74000 /sys/class/thermal/thermal_zone2/trip_point_3_type:active /sys/class/thermal/thermal_zone2/trip_point_4_temp:67000 /sys/class/thermal/thermal_zone2/trip_point_4_type:active /sys/class/thermal/thermal_zone2/trip_point_5_temp:60000 /sys/class/thermal/thermal_zone2/trip_point_5_type:active /sys/class/thermal/thermal_zone2/trip_point_6_temp:42000 /sys/class/thermal/thermal_zone2/trip_point_6_type:active /sys/class/thermal/thermal_zone2/type:acpitz /sys/class/thermal/thermal_zone3/cdev0_trip_point:1 /sys/class/thermal/thermal_zone3/cdev1_trip_point:1 /sys/class/thermal/thermal_zone3/mode:enabled /sys/class/thermal/thermal_zone3/policy:step_wise 
/sys/class/thermal/thermal_zone3/temp:47000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/policy:step_wise
/sys/class/thermal/thermal_zone4/temp:39800
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/policy:step_wise
/sys/class/thermal/thermal_zone5/temp:25000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz

linux-3.9-rc3 is also affected.

In linux-3.8.6 the bug is also present.

I tested linux-3.9-rc6 and the bug shows up. Furthermore, with the step_wise governor the fan does not dethrottle. I decided to test the fair_share governor. This governor stays at the fan speed selected by the BIOS after POST. I put heavy load on the machine and the temperatures rose quickly. The fan stayed at the selected speed. To avoid damaging my hardware I manually activated all cooling devices before the fan would normally speed up to full speed. Then I saw something rather odd. thermal_zone5 showed 100°C while the fan was spinning at full speed and the machine was idle.
All other thermal zones reported temperatures far below the normal temperatures for the idle machine (must be since machine is idle and fan is spinning full speed). Then I deactivated all cooling devices (echo 0 > ...) and the reported temperature of thermal_zone5 dropped to 20°C instantly. So I played around a little bit. It turns out that when I activate all cooling devices (echo 1 > ...) the temperature of this thermal zone jumps to 100°C, and when I deactivate all cooling devices (echo 0 > ...) the reported temperature drops back to room temperature. In my understanding this should not happen. Can it be that this confuses the kernel and causes my fan problem?

Oh, I forgot to mention that I switched back to the step_wise governor and did the testing (echo 1 > ... and echo 0 > ...). Sorry!

Okay. Now we have two bugs in this bug report.
1. The original bug report that the fan runs at full speed but ACPI fan shows it is OFF. And this bug can only be reproduced with the nvidia binary blob enabled. So there are two things that are controlling the fan, ACPI and nvidia. For this problem, we will not continue to debug because we cannot help with problems that may be caused by a binary driver.
2. The dethrottle issue in 3.8 and 3.9-rc6. For this problem, please file a new bug report against the Power Management/Thermal category, and I'll look at the problem there. I'm a little confused by comment #27, so please specify in that bug report what test you ran with which governors, and what result you got.
Bug closed.

To 1.: I am running a non-tainted kernel now. So there is no nvidia binary blob in this game anymore. I switched to nouveau to help debug this thing. See comment #18. And it is the same problem. The fan runs full speed but ACPI fan shows it is OFF.
As for comment #27, I am sorry I was not clear. I am running linux-3.9-rc6 with the step_wise governor.
I did the following to get the fan running at full speed:
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:00/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:01/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:02/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:03/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:04/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:05/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:06/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:07/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:08/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:09/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:0a/thermal_cooling/cur_state
As a result of this the shown temperature of thermal_zone5 jumps up to 100°C. This can't be right since there is no component giving off so much heat. When I do the following to dethrottle the fan:
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:00/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:01/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:02/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:03/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:04/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:05/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:06/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:07/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:08/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:09/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:0a/thermal_cooling/cur_state
the reported temperature of thermal_zone5 goes down to room temperature.
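The eleven writes per direction listed above can also be expressed as a loop. This is only a minimal sketch: the function names `set_all_fans` and `show_all_fans` and the optional root-directory argument are made up for illustration and are not part of the report. The sysfs path is the one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helpers (not from the report).
# set_all_fans STATE [ROOT]: write STATE (0 or 1) to the cur_state file of
# every ACPI fan device bound to the fan driver. ROOT defaults to the sysfs
# directory used in the commands above; it is a parameter only so the loop
# can be tried against a scratch directory first.
set_all_fans() {
    state=$1
    root=${2:-/sys/bus/acpi/drivers/fan}
    for f in "$root"/PNP0C0B:*/thermal_cooling/cur_state; do
        # Skip the unexpanded pattern when no fan device matches.
        [ -e "$f" ] || continue
        echo "$state" > "$f" || echo "failed to write $f" >&2
    done
}

# show_all_fans [ROOT]: print "path:value" for every fan, one per line.
show_all_fans() {
    root=${1:-/sys/bus/acpi/drivers/fan}
    for f in "$root"/PNP0C0B:*/thermal_cooling/cur_state; do
        [ -e "$f" ] || continue
        printf '%s:%s\n' "$f" "$(cat "$f")"
    done
}
```

Run as root, `set_all_fans 1` followed by `cat /sys/class/thermal/thermal_zone5/temp` would repeat the first half of the test described above, and `set_all_fans 0` the second half.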
I am a little confused by this behavior and I don't know if it is related to the problem that the fan is spinning full speed and ACPI is showing it as OFF. I am only reporting what I am seeing in the hope of giving some hints to solve the ACPI problem.
I tested linux-3.9-rc6 with the fair_share governor. With this configuration the fan always stays at the speed selected by the BIOS after POST. To avoid damaging my hardware I activated the fan manually and this is when I saw the strange behavior of thermal_zone5.
The problem with the dethrottling is only present in linux-3.9. linux-3.8 is not affected by this. I will file another bug report for the dethrottling problem. But how do we move on from now with the ACPI fan problem?

(In reply to comment #30)
> To 1.: I am running a non-tainted kernel now. So there is no nvidia binary blob
> in this game anymore. I switched to nouveau to help debug this thing. See
> comment #18. And it is the same problem. The fan runs full speed but ACPI fan
> shows it is OFF.

Bug reopened.
Please attach the output of
grep . /sys/class/thermal/*/device/path
ll /sys/class/thermal/t*/c*
Please turn on the fans one by one and check which cooling device makes the temperature bogus.

Oh wait, we are now starting to debug the bogus temperature problem in this bug report... let's move this topic to bug #56601.
For the problem that the fan is running when the system is idle, the temperature is not bogus at all, right?
As you cannot get accurate thermal information in the latest linux kernel, let's work together to fix bug #56601 first and then go back to this one. Does this sound okay to you?

(In reply to comment #33)
> oh wait,
> we are now starting to debug the bogus temperature problem in this bug report...
> let's move this topic to bug #56601.

Sounds good to me.

> for the problem that the fan is running when the system is idle, the temperature
> is not bogus at all, right?

Yes it is.
> as you cannot get accurate thermal information in the latest linux kernel,
> let's work together to fix bug #56601 first and then go back to this one.
> Does this sound okay to you?

Let's do that. I am happy with any help I can get.

As you stated before, there are four steps of fan speed you can hear, right?
Please try to reproduce the problem with the fan dethrottle problem fixed and tell me which fan speed you're hearing when the fan cooling device shows cooling state 0.

There are five different fan speeds I can hear. When the fan cooling device shows cooling state 0 there are two different speeds I can hear: the fan is spinning at the lowest speed or at the highest speed.

Just now it happened with linux-3.9-rc7 and the patch from bug 56601.

Created attachment 99201 [details]
Measurements for linux-3.9-rc7 at full speed of fan for comment 37

Please attach the output of
grep . /sys/class/thermal/thermal_zone*/cdev*/device/*
and
grep . /sys/class/thermal/thermal_zone*/cdev*/*
at the same time when the problem is reproduced again.

I need to check first whether it is because ACPI reports the wrong cooling device state.
If it is not, this seems to be a tough bug because some other unknown stuff changes the fan speed behind ACPI.

I checked your BIOS in detail, here is what I get:

Fan    POWER   PR._STA   PR._ON/_OFF
C33F   C334    C327      C328(on, 0x00, 0x00)
C340   C335    C327      C328(on, 0x00, 0x01)
C341   C336    ..        C328(on, 0x00, 0x02)
C342   C337    ..        C328(on, 0x00, 0x03)
C343   C338    ..        C328(on, 0x00, 0x04)
C344   C339    ..        C328(on, 0x01, 0x00)
C345   C33A    ..        C328(on, 0x01, 0x01)
C346   C33B    ..        C328(on, 0x01, 0x02)
C347   C33C    ..        C328(on, 0x01, 0x03)
C348   C33D    ..        C328(on, 0x01, 0x04)
C349   C33E    C326      will list below

It seems that fan C349 is quite different from the others.
Here is the ASL code for Power resource C33E:

    PowerResource (C33E, 0x00, 0x0000)
    {
        Method (_STA, 0, NotSerialized)  // _STA: Status
        {
            Return (C326)
        }

        Method (_ON, 0, NotSerialized)  // _ON_: Power On
        {
            If (LAnd (LEqual (C326, 0x00), LEqual (C142, 0x00)))
            {
                If (LGreaterEqual (C325, \_TZ.TZ2._AC0 ()))
                {
                    \_SB.C149 (0xEA74, 0x03, 0x01, 0x00, 0x00)
                    Store (0x01, C142)
                    If (LEqual (\_SB.C003.C085.C130.C134 (), 0x10DE))
                    {
                        If (LGreaterEqual (\C009 (), 0x06))
                        {
                            Store (0x01, \_SB.C003.C085.C130.C139)
                            Notify (\_SB.C003.C085.C130, 0xCA)
                        }
                    }
                }
            }

            Store (0x01, C326)
        }

        Method (_OFF, 0, NotSerialized)  // _OFF: Power Off
        {
            If (LAnd (C326, C142))
            {
                If (LLess (C325, \_TZ.TZ2._AC0 ()))
                {
                    \_SB.C149 (0xEA74, 0x03, 0x00, 0x00, 0x00)
                    Store (0x00, C142)
                    If (LEqual (\_SB.C003.C085.C130.C134 (), 0x10DE))
                    {
                        If (LGreaterEqual (\C009 (), 0x06))
                        {
                            Store (0x01, \_SB.C003.C085.C130.C139)
                            Notify (\_SB.C003.C085.C130, 0xCA)
                        }
                    }
                }
            }

            Store (0x00, C326)
        }
    }

We can see that
1. the status of this power resource is a variable; it must be ON after evaluating the _ON method and must be OFF after evaluating the _OFF method.
2. there is a Notify (\_SB.C003.C085.C130, 0xCA) in the _ON/_OFF methods, and C130 is the nvidia vga controller, so this does have something to do with graphics.

Please boot with acpi_osi="!Windows 2006" and see if the problem still exists.

I'll test that.

(In reply to comment #40)
> I need to check first whether it is because ACPI reports the wrong cooling
> device state.
> If it is not, this seems to be a tough bug because some other unknown stuff
> changes the fan speed behind ACPI.

Normally the fastest fan speed is not reached on this machine, not even in hot summers and on full load. It seems as if it is an emergency system for not damaging the hardware if it gets really really hot. Is there a piece of the kernel which could trigger this "emergency" system?
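To make point 1 about power resource C33E concrete, the control flow of the quoted _ON/_OFF methods can be modeled with plain shell variables. This is only a reading aid under stated assumptions, not a diagnosis: the names `pr_on`, `pr_off`, `pr_sta` and `HW` are invented here, `AC0=95` is a placeholder value for `\_TZ.TZ2._AC0 ()`, the Notify branch is left out, and the report does not state what C325, C142 or `\_SB.C149` actually represent.

```shell
#!/bin/sh
# Model of the quoted ASL for PowerResource C33E (hypothetical helper names).
C326=0    # value returned by _STA
C142=0    # flag that only changes inside the guarded branch
C325=0    # compared against \_TZ.TZ2._AC0 (); meaning not stated in the report
AC0=95    # placeholder for \_TZ.TZ2._AC0 () (assumed value)
HW=0      # stand-in for the effect of the \_SB.C149 (...) call

pr_on() {
    if [ "$C326" -eq 0 ] && [ "$C142" -eq 0 ]; then
        if [ "$C325" -ge "$AC0" ]; then
            HW=1       # \_SB.C149 (0xEA74, 0x03, 0x01, 0x00, 0x00)
            C142=1
        fi
    fi
    C326=1             # stored unconditionally, as in the ASL
}

pr_off() {
    if [ "$C326" -ne 0 ] && [ "$C142" -ne 0 ]; then
        if [ "$C325" -lt "$AC0" ]; then
            HW=0       # \_SB.C149 (0xEA74, 0x03, 0x00, 0x00, 0x00)
            C142=0
        fi
    fi
    C326=0             # stored unconditionally, as in the ASL
}

pr_sta() {
    echo "$C326"
}
```

In this model the value behind `pr_sta` always follows the last `pr_on`/`pr_off` call, while `HW` only changes when the comparison against `AC0` allows it. For example, with `C325=100`, calling `pr_on` and then `pr_off` leaves `pr_sta` at 0 while `HW` is still 1. Whether the real firmware behaves this way depends on what C325 and `\_SB.C149` do, which the thread does not establish.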
Matthias, please do the test with the patch in
https://bugzilla.kernel.org/show_bug.cgi?id=56591#c29
which is the final patch to fix the dethrottle problem in bug 56601.

Your final patch to fix the dethrottle problem works for me with linux-3.9-rc8. Thank you!
As for the other problem, I will get back to you on that when it shows.

(In reply to comment #45)
> Your final patch to fix the dethrottle problem works for me with linux-3.9-rc8.
> Thank you!

Great, thanks for the testing.
Renamed the title of this bug report; let's focus on why the fan is on while ACPI shows it is off.

(In reply to comment #43)
> I'll test that.
>
> (In reply to comment #40)
> > I need to check first whether it is because ACPI reports the wrong cooling
> > device state.
> > If it is not, this seems to be a tough bug because some other unknown stuff
> > changes the fan speed behind ACPI.
>
> Normally the fastest fan speed is not reached on this machine, not even in hot
> summers and on full load.

About the "fastest fan speed", I assume that you mean the STP4 that you can hear, right?
That's probably because the temperature never goes up to 91C/95C.

> It seems as if it is an emergency system for not
> damaging the hardware if it gets really really hot. Is there a piece of the
> kernel which could trigger this "emergency" system?

No. If the temperature goes above 91C/95C, cooling device 0 and cooling device 5 should be turned on automatically.
You can try to heat the system over 91C/95C and check whether the fan still does not run at the fastest speed. But be careful with such a test... :p

The fastest fan speed is indeed STP4. I tested a little bit and it turns out STP4 is reached when reported temperatures hit exactly 80°C (CORETEMP reports 81°C). Normally after reaching 79°C fan speed STP3 suffices to hold the temperature steady.
I don't want to heat up the machine more. It is my main workhorse and I don't want to damage the hardware. As the Intel spec states, the CPU should not reach 100°C.
So I prefer this 20°C "safe zone" over 91/95°C.
You said that the temperature must hit 91/95°C to hit fan speed STP4. But now what turns on the fan full speed at 80°C? Seems like we are going somewhere...

Created attachment 100411 [details]
Findings for comment 39

Problem exists with acpi_osi="!Windows 2006" parameter, too.

Please attach the output of
grep . /sys/class/thermal/cooling_device*/device/path

When the problem happens again, please try
"echo 1 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
and then
"echo 0 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
Can you still hear the fan spinning?

grep . /sys/class/thermal/cooling_device*/device/path
/sys/class/thermal/cooling_device0/device/path:\_TZ_.C33F
/sys/class/thermal/cooling_device10/device/path:\_TZ_.C349
/sys/class/thermal/cooling_device11/device/path:\_SB_.C003.C085.C130.C14C
/sys/class/thermal/cooling_device12/device/path:\_PR_.CPU0
/sys/class/thermal/cooling_device13/device/path:\_PR_.CPU1
/sys/class/thermal/cooling_device1/device/path:\_TZ_.C340
/sys/class/thermal/cooling_device2/device/path:\_TZ_.C341
/sys/class/thermal/cooling_device3/device/path:\_TZ_.C342
/sys/class/thermal/cooling_device4/device/path:\_TZ_.C343
/sys/class/thermal/cooling_device5/device/path:\_TZ_.C344
/sys/class/thermal/cooling_device6/device/path:\_TZ_.C345
/sys/class/thermal/cooling_device7/device/path:\_TZ_.C346
/sys/class/thermal/cooling_device8/device/path:\_TZ_.C347
/sys/class/thermal/cooling_device9/device/path:\_TZ_.C348

The rest I will provide as soon as the problem shows again.

(In reply to comment #51)
> when the problem happens again, please try
> "echo 1 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
> and then
> "echo 0 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
> can you still hear the fan spinning?

Or you can try
"echo 0 > /sys/class/thermal/cooling_device10/cur_state"
and
"echo 1 > /sys/class/thermal/cooling_device10/cur_state"
which I think is the same thing.
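Since the requests above keep pairing each cooling device's state with its ACPI path, both can be collected in one pass. A minimal sketch, assuming the sysfs attributes already shown in this report (`type`, `cur_state`, `max_state`, `device/path`); the function name `list_cooling_devices` and its optional root argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the report).
# list_cooling_devices [ROOT]: print one line per cooling device with its
# type, current/maximum state and ACPI path. ROOT defaults to
# /sys/class/thermal and is a parameter only to allow dry runs elsewhere.
list_cooling_devices() {
    root=${1:-/sys/class/thermal}
    for d in "$root"/cooling_device*; do
        [ -d "$d" ] || continue
        # Missing or unreadable attributes are printed as empty fields.
        kind=$(cat "$d/type" 2>/dev/null)
        cur=$(cat "$d/cur_state" 2>/dev/null)
        max=$(cat "$d/max_state" 2>/dev/null)
        path=$(cat "$d/device/path" 2>/dev/null)
        printf '%s type=%s state=%s/%s path=%s\n' \
            "${d##*/}" "$kind" "$cur" "$max" "$path"
    done
}
```

Calling `list_cooling_devices` at the moment the fan is audibly at full speed would capture, in one snapshot, which ACPI fan object each `cur_state` value belongs to.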
I did some measurements and found out the following, using the bogus thermal_zone which reports the fan speed in %. Activating the cooling_devices results in the following speeds:
cooling_device0 -> 20%
cooling_device1 -> 70%
cooling_device2 -> 60%
cooling_device3 -> 40%
cooling_device4 -> 25%
cooling_device5 -> 100%
cooling_device6 -> 70%
cooling_device7 -> 60%
cooling_device8 -> 40%
cooling_device9 -> 25%
cooling_device10 -> 20%
cooling_device11 -> 20%
cooling_device12 -> 20%
When the bug shows, the bogus thermal_zone shows 20% fan speed but the fan spins at full speed.

OK. I tried these commands when the bug showed:

echo 1 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
Result: No change in fan speed
echo 0 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
Result: No change in fan speed

echo 0 > /sys/class/thermal/cooling_device10/cur_state
Result: No change in fan speed
echo 1 > /sys/class/thermal/cooling_device10/cur_state
Result: No change in fan speed

echo 1 > /sys/class/thermal/cooling_device1/cur_state
Result: Fan speed slows down to 20% and then speeds up to 70%
echo 0 > /sys/class/thermal/cooling_device1/cur_state
Result: Fan speed goes down to 20%

After roughly five minutes of silence the bug showed again.
So I tested further commands:

echo 1 > /sys/class/thermal/cooling_device2/cur_state
Result: Fan speed goes down to 20% and then up to 60%
echo 0 > /sys/class/thermal/cooling_device2/cur_state
Result: Fan speed goes down to 20%

Well the bug showed again:

echo 1 > /sys/class/thermal/cooling_device3/cur_state
Result: Fan speed goes down to 20% and then up to 40%
echo 0 > /sys/class/thermal/cooling_device3/cur_state
Result: Fan speed goes down to 20%

Same game again:

echo 1 > /sys/class/thermal/cooling_device4/cur_state
Result: Fan speed goes down to 20% and then up to 25%
echo 0 > /sys/class/thermal/cooling_device4/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device5/cur_state
Result: Fan speed slows down to 20% and then speeds up to 100%
echo 0 > /sys/class/thermal/cooling_device5/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device6/cur_state
Result: Fan speed slows down to 20% and then speeds up to 70%
echo 0 > /sys/class/thermal/cooling_device6/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device7/cur_state
Result: Fan speed slows down to 20% and then speeds up to 60%
echo 0 > /sys/class/thermal/cooling_device7/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device8/cur_state
Result: Fan speed slows down to 20% and then speeds up to 40%
echo 0 > /sys/class/thermal/cooling_device8/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device9/cur_state
Result: Fan speed slows down to 20%
echo 0 > /sys/class/thermal/cooling_device9/cur_state
Result: no change

And again:

echo 1 > /sys/class/thermal/cooling_device11/cur_state
Result: no change
echo 0 > /sys/class/thermal/cooling_device11/cur_state
Result: no change

And again:

echo 1 > /sys/class/thermal/cooling_device12/cur_state
Result: no change
echo 0 > /sys/class/thermal/cooling_device12/cur_state
Result: no change

It seems that there are six different fan speeds. I am sorry for reporting that wrong but I did not hear the differences. I hope you can work with these measurements. Once the bug shows up it repeats itself quite often when you play with the settings of the cooling_devices.

(In reply to comment #54)
> And again:
>
> echo 1 > /sys/class/thermal/cooling_device9/cur_state
> Result: Fan speed slows down to 20%

Made a little error: this should be
Result: Fan speed slows down to 20% and then up to 25%.
I verified that just now.

Just to make sure, the problem can only be reproduced when the nouveau driver is loaded, right?

I will test whether the problem shows when no graphics driver is loaded. So far I have only tested with a graphical environment.

As far as I can see from the acpidump, no other OS code will change the ACPI fan state, so IMO it is the BIOS that changes the fan speed.
So it would be nice if you could check the BIOS setup to see if there are any fan-related options.

OK, there is an option to turn the fan off when connected to the AC adapter. This activates the same fan configuration as if the laptop is running from battery power. With this the bug shows also. Otherwise there is no fan-related setting in the BIOS.

If the BIOS changes the fan speed, why does this behavior not occur when running Windows (XP and 7 Pro tested) and with <=linux-2.6.31?

Does it suffice to rmmod the nouveau module when the bug shows and see what happens? I let the laptop run for a while without nouveau (or any other graphics driver) but so far the bug did not show. But this does not indicate that it does not happen. I need more testing time for this.

(In reply to comment #59)
> OK, there is an option to turn the fan off when connected to the AC adapter.
> This activates the same fan configuration as if the laptop is running from
> battery power. With this the bug shows also. Otherwise there is no fan-related
> setting in the BIOS.

Bad news.
> If the BIOS changes the fan speed, why does this behavior not occur when
> running Windows (XP and 7 Pro tested) and with <=linux-2.6.31?

Good question.
Hmm, you can still use a 2.6.31 kernel without this problem, right?

> Does it suffice to rmmod the nouveau module when the bug shows and see what
> happens? I let the laptop run for a while without nouveau (or any other
> graphics driver) but so far the bug did not show. But this does not indicate
> that it does not happen. I need more testing time for this.

Okay. If the problem cannot be reproduced without the nouveau driver, this suggests that the graphics driver changes the fan speed without ACPI's awareness, and it also explains why this is a regression: some code/new functionality introduced in the nouveau driver touches the fan speed.
I'll reassign to the graphics people to see if they can find something interesting, because from ACPI's point of view we can really do nothing here.

At the moment I am traveling and I don't have the machine with me. I can try to run the old 2.6.31 kernel. This kernel was the last one without the problem but nouveau was not usable on this kernel. So back then I used the nvidia binary blob. I tested the same graphics drivers on 2.6.31 and 2.6.32. 2.6.32 had the problem regardless of the nvidia graphics driver.

At the moment I can't tell you if the problem only occurs when a graphics driver is loaded. I have to test it further because sometimes the bug really shows. The longest time between the bug showing was one and a half weeks. That time I thought the bug was solved but it wasn't. So I will leave the machine running for a long time without the nouveau driver to see if it shows or not when I am back home.

Thanks for helping!

(In reply to comment #61)
> At the moment I am traveling and I don't have the machine with me. I can try to
> run the old 2.6.31 kernel. This kernel was the last one without the problem but
> nouveau was not usable on this kernel.
> So back then I used the nvidia binary blob. I tested the same graphics drivers
> on 2.6.31 and 2.6.32. 2.6.32 had the problem regardless of the nvidia graphics
> driver.

Right, the kernel nvidia driver is shipped in 2.6.33.
So is it possible that you run a 2.6.31/32 kernel without the nvidia binary blob, say in text mode?

> At the moment I can't tell you if the problem only occurs when a graphics
> driver is loaded. I have to test it further because sometimes the bug really
> shows. The longest time between the bug showing was one and a half weeks. That
> time I thought the bug was solved but it wasn't. So I will leave the machine
> running for a long time without the nouveau driver to see if it shows or not
> when I am back home.

Great. Please check if the problem can be reproduced without the Nvidia driver. This is important as I can do nothing if the fan state is changed beyond ACPI's scope; in that case, we need help from the graphics experts.

Matthias, any update?

Hello everyone,

I have an HP nw8240 and I believe that I'm experiencing the same bug with 3.9.4, the difference being that on my machine the range of available speeds is 5, from 0 rpm (quiet) to what I believe to be a 4th speed (although I've never reached it).

First of all, here are a few outputs to understand the configuration:

grep . /sys/class/thermal/*/device/path
/sys/class/thermal/cooling_device0/device/path:\_TZ_.C255
/sys/class/thermal/cooling_device1/device/path:\_TZ_.C256
/sys/class/thermal/cooling_device2/device/path:\_TZ_.C257
/sys/class/thermal/cooling_device3/device/path:\_TZ_.C258
/sys/class/thermal/cooling_device4/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/device/path:\_TZ_.TZ1_
/sys/class/thermal/thermal_zone1/device/path:\_TZ_.TZ2_
/sys/class/thermal/thermal_zone2/device/path:\_TZ_.TZ3_
/sys/class/thermal/thermal_zone3/device/path:\_TZ_.TZ4_

grep .
/sys/class/thermal/thermal_zone*/cdev*/device/* /sys/class/thermal/thermal_zone0/cdev0/device/hid:LNXCPU /sys/class/thermal/thermal_zone0/cdev0/device/modalias:acpi:LNXCPU: /sys/class/thermal/thermal_zone0/cdev0/device/path:\_PR_.C001 /sys/class/thermal/thermal_zone0/cdev0/device/uevent:DRIVER=processor /sys/class/thermal/thermal_zone0/cdev0/device/uevent:MODALIAS=acpi:LNXCPU: /sys/class/thermal/thermal_zone0/cdev1/device/hid:PNP0C0B /sys/class/thermal/thermal_zone0/cdev1/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev1/device/path:\_TZ_.C258 /sys/class/thermal/thermal_zone0/cdev1/device/power_state:D3cold /sys/class/thermal/thermal_zone0/cdev1/device/real_power_state:D3cold /sys/class/thermal/thermal_zone0/cdev1/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev1/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev1/device/uid:3 /sys/class/thermal/thermal_zone0/cdev2/device/hid:PNP0C0B /sys/class/thermal/thermal_zone0/cdev2/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev2/device/path:\_TZ_.C257 /sys/class/thermal/thermal_zone0/cdev2/device/power_state:D3cold /sys/class/thermal/thermal_zone0/cdev2/device/real_power_state:D3cold /sys/class/thermal/thermal_zone0/cdev2/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev2/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev2/device/uid:2 /sys/class/thermal/thermal_zone0/cdev3/device/hid:PNP0C0B /sys/class/thermal/thermal_zone0/cdev3/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev3/device/path:\_TZ_.C256 /sys/class/thermal/thermal_zone0/cdev3/device/power_state:D3cold /sys/class/thermal/thermal_zone0/cdev3/device/real_power_state:D3cold /sys/class/thermal/thermal_zone0/cdev3/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev3/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev3/device/uid:1 /sys/class/thermal/thermal_zone0/cdev4/device/hid:PNP0C0B 
/sys/class/thermal/thermal_zone0/cdev4/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev4/device/path:\_TZ_.C255 /sys/class/thermal/thermal_zone0/cdev4/device/power_state:D3cold /sys/class/thermal/thermal_zone0/cdev4/device/real_power_state:D3cold /sys/class/thermal/thermal_zone0/cdev4/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev4/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev4/device/uid:0 /sys/class/thermal/thermal_zone2/cdev0/device/hid:LNXCPU /sys/class/thermal/thermal_zone2/cdev0/device/modalias:acpi:LNXCPU: /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.C001 /sys/class/thermal/thermal_zone2/cdev0/device/uevent:DRIVER=processor /sys/class/thermal/thermal_zone2/cdev0/device/uevent:MODALIAS=acpi:LNXCPU: /sys/class/thermal/thermal_zone3/cdev0/device/hid:LNXCPU /sys/class/thermal/thermal_zone3/cdev0/device/modalias:acpi:LNXCPU: /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.C001 /sys/class/thermal/thermal_zone3/cdev0/device/uevent:DRIVER=processor /sys/class/thermal/thermal_zone3/cdev0/device/uevent:MODALIAS=acpi:LNXCPU: grep . 
/sys/class/thermal/thermal_zone*/* /sys/class/thermal/thermal_zone0/cdev0_trip_point:1 /sys/class/thermal/thermal_zone0/cdev1_trip_point:5 /sys/class/thermal/thermal_zone0/cdev2_trip_point:4 /sys/class/thermal/thermal_zone0/cdev3_trip_point:3 /sys/class/thermal/thermal_zone0/cdev4_trip_point:2 /sys/class/thermal/thermal_zone0/mode:enabled /sys/class/thermal/thermal_zone0/policy:step_wise /sys/class/thermal/thermal_zone0/temp:45000 /sys/class/thermal/thermal_zone0/trip_point_0_temp:105000 /sys/class/thermal/thermal_zone0/trip_point_0_type:critical /sys/class/thermal/thermal_zone0/trip_point_1_temp:100000 /sys/class/thermal/thermal_zone0/trip_point_1_type:passive /sys/class/thermal/thermal_zone0/trip_point_2_temp:85000 /sys/class/thermal/thermal_zone0/trip_point_2_type:active /sys/class/thermal/thermal_zone0/trip_point_3_temp:70000 /sys/class/thermal/thermal_zone0/trip_point_3_type:active /sys/class/thermal/thermal_zone0/trip_point_4_temp:60000 /sys/class/thermal/thermal_zone0/trip_point_4_type:active /sys/class/thermal/thermal_zone0/trip_point_5_temp:50000 /sys/class/thermal/thermal_zone0/trip_point_5_type:active /sys/class/thermal/thermal_zone0/type:acpitz /sys/class/thermal/thermal_zone1/mode:enabled /sys/class/thermal/thermal_zone1/passive:0 /sys/class/thermal/thermal_zone1/policy:step_wise /sys/class/thermal/thermal_zone1/temp:50000 /sys/class/thermal/thermal_zone1/trip_point_0_temp:110000 /sys/class/thermal/thermal_zone1/trip_point_0_type:critical /sys/class/thermal/thermal_zone1/type:acpitz /sys/class/thermal/thermal_zone2/cdev0_trip_point:1 /sys/class/thermal/thermal_zone2/mode:enabled /sys/class/thermal/thermal_zone2/policy:step_wise /sys/class/thermal/thermal_zone2/temp:38200 /sys/class/thermal/thermal_zone2/trip_point_0_temp:105000 /sys/class/thermal/thermal_zone2/trip_point_0_type:critical /sys/class/thermal/thermal_zone2/trip_point_1_temp:60000 /sys/class/thermal/thermal_zone2/trip_point_1_type:passive /sys/class/thermal/thermal_zone2/type:acpitz 
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1 /sys/class/thermal/thermal_zone3/mode:enabled /sys/class/thermal/thermal_zone3/policy:step_wise /sys/class/thermal/thermal_zone3/temp:0 /sys/class/thermal/thermal_zone3/trip_point_0_temp:110000 /sys/class/thermal/thermal_zone3/trip_point_0_type:critical /sys/class/thermal/thermal_zone3/trip_point_1_temp:110000 /sys/class/thermal/thermal_zone3/trip_point_1_type:passive /sys/class/thermal/thermal_zone3/type:acpitz This is what it is supposed to happen (and I'm simplifying it, but I believe it is enough to isolate where the bugs lies): - when the TZ0 temp > 50, CD3 fan is turned on using the lowest available speed and CD3 cur_state is changed from 0 to 1, and the fan is running until TZ0 temp goes < 45, when the CD3 fan is turned off and CD3 cur_state is changed from 1 to 0. All the other fan speeds are supposed to kick in once the corresponding trip points are reached, but since I'm using an 'undervolted' cpu I'm rarely reaching the 2nd speed, much less any higher speeds, even with a relatively high cpu load. @Matthias you might want to look into the linux-phc option as well. So this is how it looks like when the fan is spinning, after TZ0 went above 50 and until it manages to go under 45. grep . /sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state /sys/class/thermal/thermal_zone0/temp:46000 /sys/class/thermal/thermal_zone1/temp:51000 /sys/class/thermal/thermal_zone2/temp:38100 /sys/class/thermal/thermal_zone3/temp:40000 /sys/devices/virtual/thermal/cooling_device0/cur_state:0 /sys/devices/virtual/thermal/cooling_device1/cur_state:0 /sys/devices/virtual/thermal/cooling_device2/cur_state:0 /sys/devices/virtual/thermal/cooling_device3/cur_state:1 /sys/devices/virtual/thermal/cooling_device4/cur_state:0 And this is how it looks like when the fan is off, after TZ0 went below 45 and before it will go above 50 again. grep . 
/sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state
/sys/class/thermal/thermal_zone0/temp:47000
/sys/class/thermal/thermal_zone1/temp:51000
/sys/class/thermal/thermal_zone2/temp:38100
/sys/class/thermal/thermal_zone3/temp:0
/sys/devices/virtual/thermal/cooling_device0/cur_state:0
/sys/devices/virtual/thermal/cooling_device1/cur_state:0
/sys/devices/virtual/thermal/cooling_device2/cur_state:0
/sys/devices/virtual/thermal/cooling_device3/cur_state:0
/sys/devices/virtual/thermal/cooling_device4/cur_state:0

Now, when the bug occurs, the CD3 fan continues to spin (at the same lowest speed) after TZ0 has gone below 45, even though the cur_state of CD3 has already changed from 1 to 0. Here is what the output looks like:

grep . /sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state
/sys/class/thermal/thermal_zone0/temp:39000
/sys/class/thermal/thermal_zone1/temp:51000
/sys/class/thermal/thermal_zone2/temp:38100
/sys/class/thermal/thermal_zone3/temp:40000
/sys/devices/virtual/thermal/cooling_device0/cur_state:0
/sys/devices/virtual/thermal/cooling_device1/cur_state:0
/sys/devices/virtual/thermal/cooling_device2/cur_state:0
/sys/devices/virtual/thermal/cooling_device3/cur_state:0
/sys/devices/virtual/thermal/cooling_device4/cur_state:0

I have not found what triggers the bug, but I don't think it has anything to do with the nouveau video driver. In my case I'm using the radeon open source driver, and even with the gpu clock lowered through sysfs to 1/3 of the maximum speed (TZ1 = gpu temp between 48-51, well below the 58-61 seen when the gpu is running at full speed) the bug continues to show up randomly, sometimes multiple times during a single day, other times once every few days.

What I'm doing is keeping an eye on the TZs and CDs with

watch -n1 grep .
/sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state

and when the bug shows I have to do

echo 1 > /sys/class/thermal/cooling_device3/cur_state
echo 0 > /sys/class/thermal/cooling_device3/cur_state

to turn off the fan.

Another thing I observed is that the TZ3 temp has only 2 states: 40 when the fan is on and 0 when the fan is off. It goes from 0 to 40 in one step once the fan has been turned on, but after the fan has been turned off it takes a few successive steps (40->36->32->27->16->0) over 3-4 seconds to reach 0. The system has only one physical fan.

Btw, before I switched to 3.9.4 I used to run 2.6.31, with uptimes sometimes in excess of 6 months and daily hibernates/resumes, without any acpi glitches.

If you need any additional information please let me know, as I would also like to see this bug fixed.

Hello Zhang,

sorry for my late response. I was traveling.

I think we need a graphics expert on this. I had the machine running for a long while without the nouveau driver and the fan bug has not occurred since. I even tried to recreate some of my normal daily workloads.

As for the 2.6.31/32 kernels: I haven't tried to run them yet. I suspect it will be difficult but I will give it a try. If my system won't run with these kernels, can I boot with an old live CD? Would this help?

Greetings
Matthias

Well, here are my thoughts on the different graphics drivers:

I would expect nouveau and the nvidia binary blob to behave in somewhat similar ways. Thanks to Al we know that the problem can also be reproduced with the radeon driver enabled.

@Al: Does your machine show the bug with the radeon driver not loaded? It would be great if you could confirm my findings. I would not like to send somebody on a wild goose chase.

Since both these machines share one cooling system between CPU and GPU, it would make sense for the graphics driver to change the fan speed to protect the GPU from heat damage.
But I never saw any GPU temperatures that would justify turning the fan to full speed.

I have sad news. Today the bug occurred without a graphics driver loaded. Now we know that it happens less often without a graphics driver. Good thing I left the machine running, I guess. Sorry!

That's actually good news. If the source of the bug is confined to the acpi subsystem, figuring out what it is and eventually fixing it should happen sooner.

There are a few other thoughts that I would like to add. The main problem I see with this bug is that, afaict, none of us knows how or what is actually setting it off, so “reproducing it” really means just having the system up for long enough for it to happen. For example, I have been running the same instance for maybe 6 days since the last hibernate/resume cycle. The 'bug' happened once on the 1st day, maybe 4 or 5 times on the 2nd day, not once over the next 3 days, and once again today, while the system has gone through the same daily usage routine +/- other tasks.

I've also played a little bit with the CDs just to see how the system behaves when an external source interferes with the usual cycles. More precisely, the fan was still on after TZ0 had hit the > 50 trip point and the temperature was slowly decreasing toward < 45. When the temperature was around 46 I echoed a '0' to CD3, which turned off the fan manually (also interrupting the usual cycle that would have turned it off once TZ0 had gone < 45).
What happened next is that once the TZ0 temperature started to rise again and reached the 1st trip point at > 50, the CD3 fan was not turned on. The TZ0 temperature therefore continued to rise. (TZ1 also started to rise slowly without any additional load, which is absolutely normal considering that there is only one physical fan and the cpu and gpu share the same heat sink/pipe.) Only after TZ0 had reached the > 60 trip point was the fan turned on, at the 2nd speed, which changed the cur_state of both CD2 and CD3 from 0 to 1.

Then the TZ0 temperature started to decrease (and TZ1 too, of course, but I don't think TZ1 interacts with any CDs as long as it stays <= 60). Once TZ0 reached 53 the fan changed to the 1st (lower) speed, changing the state of CD2 from 1 to 0 and keeping CD3 at 1 until < 45, when the fan was turned off and CD3 was changed to 0 too.

Is this the expected behaviour or is this another bug? (That is, once you interfere with a CD's state, the automated action is skipped at the next trip point and the default behaviour only returns at the 2nd.) Also, is it normal to have more than one CD marked as active when a single physical fan obviously can't spin at 2 different speeds at the same time?

I know one thing: manually activating/deactivating (echo 1/0 > …) one CD at a time does not change the state of the others. For example, if I issue an echo 1 to CD0, that turns on the fan at its maximum speed, but none of the other CDs turn from 0 to 1.

Another observation: once the bug has shown up (TZ0 < 45, the fan spinning at its 1st speed, while the cur_state of CD3 says the fan is off; in fact every CD's cur_state says the fan is off, but in my case CD3 is the one that turns it off and on at its 1st speed), an echo 0 does not do anything if you want to turn the fan off. You need to do an echo 1 first and then an echo 0 to turn it off.
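The echo 1 followed by echo 0 sequence described here could be wrapped in a small helper, for example for use from a cron job. This is only a sketch of the workaround, not a fix. The function name reset_fan and the THERMAL_ROOT override (added so the helper can be tried against a scratch directory) are assumptions, and the cooling device number differs per machine (cooling_device3 is the one reported for the nw8240).

```shell
#!/bin/sh
# Sketch of the manual workaround: write 1, then 0, to a fan cooling
# device so that the physical fan matches the reported cur_state again.
# reset_fan and THERMAL_ROOT are hypothetical names, not kernel interfaces.

reset_fan() {
    # $1 is the cooling device number, e.g. 3 for cooling_device3
    root="${THERMAL_ROOT:-/sys/class/thermal}"
    state_file="$root/cooling_device$1/cur_state"

    if [ ! -w "$state_file" ]; then
        echo "reset_fan: cannot write $state_file" >&2
        return 1
    fi

    # As reported in this thread, a plain "echo 0" has no effect once the
    # bug has hit (cur_state already reads 0), so switch the fan on first.
    echo 1 > "$state_file" || return 1
    echo 0 > "$state_file"
}
```

Run as root, `reset_fan 3` would perform the same two writes as the manual commands. It does not detect the bug by itself, so it still has to be triggered when the fan is heard spinning while cur_state reads 0.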
I'm hoping that for somebody familiar with the code and logic behind the acpi kernel subsystem (especially TZs and CDs) this additional information will help to further refine the search for a solution. Does anybody know what has changed since 2.6.31 in this respect?

Another odd thing happened today, and this time something went wrong in the CDs' activity in a direction that could also pose a risk to the hardware. The system has now been up for 7 days, and today I found it with TZ0 at 54 and CD3 on (1). With the cpu locked at the lowest clock speed I have never seen TZ0 go over 51, because once CD3 is turned on at > 50 it immediately pulls the temperature below 50. The ambient temperature was lower than usual this past night (maybe 3-4 degrees lower than normal for this time of the year), there wasn't any significant load (avg ~ 10% with isolated short peaks up to 20%), and the cpu was locked at the lowest speed for the entire time I was not actively using it. So there can be only one explanation: the CD3 fan failed to turn on after the > 50 trip point was met, and quite certainly it was only turned on at > 60, probably using CD2 (2nd fan speed). When I caught it, it was on its way down, still within the range where CD3 (1st speed) is supposed to be active, but after TZ0 had reached the < 53 trip point that turned CD2 off and switched to CD3.

I just ran a few tests to see if this assumption is plausible. Indeed, even with a constant 100% load at the lowest cpu speed, once CD3 is turned on TZ0 is pulled below 50 almost instantly (after the brief moment when TZ0 > 50 turned CD3 on), and within 25 seconds TZ0 is < 45 again. In fact, with a constant 100% load and the cpu clock locked at its 2nd fastest speed (there are 6 available cpu speeds) I can't push TZ0 above 53 once CD3 has been turned on.
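The on/off cycle these tests rely on (CD3 switched on once TZ0 rises above 50, kept on until TZ0 falls below 45) is a simple hysteresis. The sketch below models it so that logged temperatures can be checked against the expected CD3 state. The function name expected_cd3 and the hard-coded thresholds, taken from the values reported in this thread for the nw8240, are assumptions and say nothing about how the kernel thermal code is actually written.

```shell
#!/bin/sh
# Hypothetical model of the first fan speed (CD3) on the nw8240:
# on above 50 C, off below 45 C, unchanged in between.
# Temperatures are in millidegrees, as reported by thermal_zone0/temp.

expected_cd3() {
    temp="$1"   # current TZ0 temperature, e.g. 46000
    prev="$2"   # previous CD3 state, 0 or 1
    if [ "$temp" -gt 50000 ]; then
        echo 1
    elif [ "$temp" -lt 45000 ]; then
        echo 0
    else
        echo "$prev"
    fi
}
```

For example, `expected_cd3 54000 0` yields 1, so a reading of TZ0 at 54 with the physical fan still off would point at a missed trip point like the one described here.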
A constant 100% load at the maximum cpu speed while CD3 is on caps the maximum TZ0 temperature at 58, so under the current conditions it is virtually impossible for CD2 to be turned on automatically unless something has failed in the automated cycles that the CDs are supposed to follow. The fact that TZ1 (gpu) was also around 52-53 seems to confirm this scenario, since in the absence of any load on the gpu, and assuming that everything works normally with the cooling devices, TZ1 stays around 49-50 degrees.

Zhang, can you tell us which information you need from the old kernels?

linux-3.10-rc7 is also affected.

Is anybody still looking into this? Do you need any additional information?

@Matthias: until somebody comes up with a solution, a user space workaround is probably the only option right now. I'm at least using a cron script to deal with it.

@Al: Well, I reboot or suspend2ram. This makes the problem go away temporarily. The problem is that when the fan bug occurs and the CPU gets hot, the fan speed drops until 80°C is reached. That is not a nice thing. I used to echo 1 && echo 0 to all the cooling devices. This drops the fan speed to normal levels, but it helps only for a very short period of time. Then the fan spins up to full speed again. This puts unnecessary stress on the fan motor.

It is good that you reported this problem too. Perhaps we can find somebody else with the same bug to report it. More people can "cover more ground" when testing this.

I can't run 2.6.31, as this kernel is too old for my up-to-date gentoo. I can't even compile it anymore. I have downloaded SystemRescueCD, which contains this kernel, to test it. As SystemRescueCD is based on gentoo, I think it is a good starting point for debugging this.

The bug shows in linux-3.10.0 too. I can't test linux-3.11 because of bug 60568. It takes ages for the bug to show without nouveau loaded. @Al, can you test that kernel to see if it has the same bug?

TBH, I think I'm stuck on this issue.
I've run out of ideas about why this happens. I think the reason why it is "easier" to reproduce the problem with a graphics driver loaded is that a working GPU may heat the system more often.

Oh, btw, as it is really hard to reproduce this bug, is it possible that this is not a regression? Say, the problem actually exists in old kernels like 2.6.31, but it was just very difficult to reproduce because there was no nouveau driver at that time.

Well, I never experienced the bug when I was on 2.6.31. As this bug is independent of the graphics driver loaded (it happens with both nouveau and the nvidia binary blob), I do not think it existed in 2.6.31. Al is experiencing the same bug with other graphics hardware. He is using a radeon. If the logic that drives the fan did not change much between 2.6.31 and 2.6.32, I would perhaps suspect a timing issue. Can you tell me if there is a test I can do to see if this is the case?

It was definitely not present in 2.6.31. I used that kernel for years (and still use it from time to time) and never had a single occurrence of this bug. Also, with 3.9.4 I'm running my system for days or sometimes weeks with the gpu clock locked at minimum speed, so the heat dissipation is the lowest possible (for this gpu the default speed is the highest one, and I don't think it can be modified without the proper driver), and the effects of the bug are present and show up just as often.
I really don't think that there's any correlation between the video driver and this bug. I tend to believe it is the result of a change in the logic of the acpi code and/or a timing issue, like Matthias said. Apparently a lot of the proc related stuff was removed and/or moved to sys between 2.6.31 and 3.9.4. I don't know exactly when that happened because I jumped directly from 2.6.31 to 3.9.4, but a lot of the proc code doesn't exist anymore in 3.9.4, so is it possible that the logic of the code was not entirely preserved and some changes were made between these 2 releases?

Maybe somebody who's familiar with the code could take a look at the parts that change the status of the CDs and TZs: how these interact with each other, the order of changes, any potential race conditions, whether there's any feedback from the TZ or the CD once a change has been performed, and so on.

Another thing is that this bug occurs both ways. It fails to actually stop the fan, claiming that the fan is stopped while it is on, AND it also claims that the fan was activated while the fan is in fact off (in which case the next trip point is where the situation gets corrected). Here is a capture of the system after it failed to activate the fan at the first trip point (see TZ3, which is 0 instead of 40, so the physical fan is actually off, while TZ0 shows a temp above the 1st trip point and CD3 claims that the fan is on):

grep .
/sys/class/thermal/*/* /sys/class/thermal/cooling_device0/cur_state:0 /sys/class/thermal/cooling_device0/max_state:1 /sys/class/thermal/cooling_device0/type:Fan /sys/class/thermal/cooling_device1/cur_state:0 /sys/class/thermal/cooling_device1/max_state:1 /sys/class/thermal/cooling_device1/type:Fan /sys/class/thermal/cooling_device2/cur_state:0 /sys/class/thermal/cooling_device2/max_state:1 /sys/class/thermal/cooling_device2/type:Fan /sys/class/thermal/cooling_device3/cur_state:1 /sys/class/thermal/cooling_device3/max_state:1 /sys/class/thermal/cooling_device3/type:Fan /sys/class/thermal/cooling_device4/cur_state:0 /sys/class/thermal/cooling_device4/max_state:10 /sys/class/thermal/cooling_device4/type:Processor /sys/class/thermal/thermal_zone0/cdev0_trip_point:1 /sys/class/thermal/thermal_zone0/cdev1_trip_point:5 /sys/class/thermal/thermal_zone0/cdev2_trip_point:4 /sys/class/thermal/thermal_zone0/cdev3_trip_point:3 /sys/class/thermal/thermal_zone0/cdev4_trip_point:2 /sys/class/thermal/thermal_zone0/mode:enabled /sys/class/thermal/thermal_zone0/policy:step_wise /sys/class/thermal/thermal_zone0/temp:56000 /sys/class/thermal/thermal_zone0/trip_point_0_temp:105000 /sys/class/thermal/thermal_zone0/trip_point_0_type:critical /sys/class/thermal/thermal_zone0/trip_point_1_temp:100000 /sys/class/thermal/thermal_zone0/trip_point_1_type:passive /sys/class/thermal/thermal_zone0/trip_point_2_temp:85000 /sys/class/thermal/thermal_zone0/trip_point_2_type:active /sys/class/thermal/thermal_zone0/trip_point_3_temp:70000 /sys/class/thermal/thermal_zone0/trip_point_3_type:active /sys/class/thermal/thermal_zone0/trip_point_4_temp:60000 /sys/class/thermal/thermal_zone0/trip_point_4_type:active /sys/class/thermal/thermal_zone0/trip_point_5_temp:45000 /sys/class/thermal/thermal_zone0/trip_point_5_type:active /sys/class/thermal/thermal_zone0/type:acpitz /sys/class/thermal/thermal_zone1/mode:enabled /sys/class/thermal/thermal_zone1/passive:0 /sys/class/thermal/thermal_zone1/policy:step_wise 
/sys/class/thermal/thermal_zone1/temp:53000 /sys/class/thermal/thermal_zone1/trip_point_0_temp:110000 /sys/class/thermal/thermal_zone1/trip_point_0_type:critical /sys/class/thermal/thermal_zone1/type:acpitz /sys/class/thermal/thermal_zone2/cdev0_trip_point:1 /sys/class/thermal/thermal_zone2/mode:enabled /sys/class/thermal/thermal_zone2/policy:step_wise /sys/class/thermal/thermal_zone2/temp:38600 /sys/class/thermal/thermal_zone2/trip_point_0_temp:105000 /sys/class/thermal/thermal_zone2/trip_point_0_type:critical /sys/class/thermal/thermal_zone2/trip_point_1_temp:60000 /sys/class/thermal/thermal_zone2/trip_point_1_type:passive /sys/class/thermal/thermal_zone2/type:acpitz /sys/class/thermal/thermal_zone3/cdev0_trip_point:1 /sys/class/thermal/thermal_zone3/mode:enabled /sys/class/thermal/thermal_zone3/policy:step_wise /sys/class/thermal/thermal_zone3/temp:0 /sys/class/thermal/thermal_zone3/trip_point_0_temp:110000 /sys/class/thermal/thermal_zone3/trip_point_0_type:critical /sys/class/thermal/thermal_zone3/trip_point_1_temp:110000 /sys/class/thermal/thermal_zone3/trip_point_1_type:passive /sys/class/thermal/thermal_zone3/type:acpitz ------------------------------------ grep . /sys/class/thermal/*/device/path /sys/class/thermal/cooling_device0/device/path:\_TZ_.C255 /sys/class/thermal/cooling_device1/device/path:\_TZ_.C256 /sys/class/thermal/cooling_device2/device/path:\_TZ_.C257 /sys/class/thermal/cooling_device3/device/path:\_TZ_.C258 /sys/class/thermal/cooling_device4/device/path:\_PR_.C001 /sys/class/thermal/thermal_zone0/device/path:\_TZ_.TZ1_ /sys/class/thermal/thermal_zone1/device/path:\_TZ_.TZ2_ /sys/class/thermal/thermal_zone2/device/path:\_TZ_.TZ3_ /sys/class/thermal/thermal_zone3/device/path:\_TZ_.TZ4_ ------------------------------------ grep . 
/sys/class/thermal/thermal_zone*/cdev*/device/* /sys/class/thermal/thermal_zone0/cdev0/device/hid:LNXCPU /sys/class/thermal/thermal_zone0/cdev0/device/modalias:acpi:LNXCPU: /sys/class/thermal/thermal_zone0/cdev0/device/path:\_PR_.C001 /sys/class/thermal/thermal_zone0/cdev0/device/uevent:DRIVER=processor /sys/class/thermal/thermal_zone0/cdev0/device/uevent:MODALIAS=acpi:LNXCPU: /sys/class/thermal/thermal_zone0/cdev1/device/hid:PNP0C0B /sys/class/thermal/thermal_zone0/cdev1/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev1/device/path:\_TZ_.C258 /sys/class/thermal/thermal_zone0/cdev1/device/power_state:D0 /sys/class/thermal/thermal_zone0/cdev1/device/real_power_state:D0 /sys/class/thermal/thermal_zone0/cdev1/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev1/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev1/device/uid:3 /sys/class/thermal/thermal_zone0/cdev2/device/hid:PNP0C0B /sys/class/thermal/thermal_zone0/cdev2/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev2/device/path:\_TZ_.C257 /sys/class/thermal/thermal_zone0/cdev2/device/power_state:D3cold /sys/class/thermal/thermal_zone0/cdev2/device/real_power_state:D3cold /sys/class/thermal/thermal_zone0/cdev2/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev2/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev2/device/uid:2 /sys/class/thermal/thermal_zone0/cdev3/device/hid:PNP0C0B /sys/class/thermal/thermal_zone0/cdev3/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev3/device/path:\_TZ_.C256 /sys/class/thermal/thermal_zone0/cdev3/device/power_state:D3cold /sys/class/thermal/thermal_zone0/cdev3/device/real_power_state:D3cold /sys/class/thermal/thermal_zone0/cdev3/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev3/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev3/device/uid:1 /sys/class/thermal/thermal_zone0/cdev4/device/hid:PNP0C0B 
/sys/class/thermal/thermal_zone0/cdev4/device/modalias:acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev4/device/path:\_TZ_.C255 /sys/class/thermal/thermal_zone0/cdev4/device/power_state:D3cold /sys/class/thermal/thermal_zone0/cdev4/device/real_power_state:D3cold /sys/class/thermal/thermal_zone0/cdev4/device/uevent:DRIVER=fan /sys/class/thermal/thermal_zone0/cdev4/device/uevent:MODALIAS=acpi:PNP0C0B: /sys/class/thermal/thermal_zone0/cdev4/device/uid:0 /sys/class/thermal/thermal_zone2/cdev0/device/hid:LNXCPU /sys/class/thermal/thermal_zone2/cdev0/device/modalias:acpi:LNXCPU: /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.C001 /sys/class/thermal/thermal_zone2/cdev0/device/uevent:DRIVER=processor /sys/class/thermal/thermal_zone2/cdev0/device/uevent:MODALIAS=acpi:LNXCPU: /sys/class/thermal/thermal_zone3/cdev0/device/hid:LNXCPU /sys/class/thermal/thermal_zone3/cdev0/device/modalias:acpi:LNXCPU: /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.C001 /sys/class/thermal/thermal_zone3/cdev0/device/uevent:DRIVER=processor /sys/class/thermal/thermal_zone3/cdev0/device/uevent:MODALIAS=acpi:LNXCPU: ------------------------------------ grep . 
/sys/class/thermal/thermal_zone*/cdev*/*
/sys/class/thermal/thermal_zone0/cdev0/cur_state:0
/sys/class/thermal/thermal_zone0/cdev0/max_state:10
/sys/class/thermal/thermal_zone0/cdev0/type:Processor
/sys/class/thermal/thermal_zone0/cdev1/cur_state:1
/sys/class/thermal/thermal_zone0/cdev1/max_state:1
/sys/class/thermal/thermal_zone0/cdev1/type:Fan
/sys/class/thermal/thermal_zone0/cdev2/cur_state:0
/sys/class/thermal/thermal_zone0/cdev2/max_state:1
/sys/class/thermal/thermal_zone0/cdev2/type:Fan
/sys/class/thermal/thermal_zone0/cdev3/cur_state:0
/sys/class/thermal/thermal_zone0/cdev3/max_state:1
/sys/class/thermal/thermal_zone0/cdev3/type:Fan
/sys/class/thermal/thermal_zone0/cdev4/cur_state:0
/sys/class/thermal/thermal_zone0/cdev4/max_state:1
/sys/class/thermal/thermal_zone0/cdev4/type:Fan
/sys/class/thermal/thermal_zone2/cdev0/cur_state:0
/sys/class/thermal/thermal_zone2/cdev0/max_state:10
/sys/class/thermal/thermal_zone2/cdev0/type:Processor
/sys/class/thermal/thermal_zone3/cdev0/cur_state:0
/sys/class/thermal/thermal_zone3/cdev0/max_state:10
/sys/class/thermal/thermal_zone3/cdev0/type:Processor

And here is the log after the 2nd trip point was reached. Now both CD3 and CD2 show as being on, while obviously there's only one physical active cooling device in the system, which cannot run at 2 different speeds at the same time. TZ3 is also 55, indicating that the physical fan is actually on and should be running at the 2nd lowest speed, which is exactly what is happening.
Apparently TZ3 is:
- 0 when the actual fan is off and the cpu temperature indicated by TZ0 is below the 1st trip point,
- 40 when the fan is running at its 1st speed and the cpu temperature as indicated by TZ0 is in the range covered by this speed,
- 55 when the fan is running at its 2nd speed and the system's temperature is in the range which should be covered by this speed,
and so on.

Basically TZ3 is not an actual temperature sensor. It just jumps from 0 to 40 to 55 and so on, and then back to 55, then to 40 and finally to 0, depending on the actual state of the physical fan and/or the temperature range the cpu/system is in, based on the fan speed which should cover that range, as indicated, I believe, by the trip_point_*_temp values for the current state of the system:

grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:0
/sys/class/thermal/cooling_device4/max_state:10
/sys/class/thermal/cooling_device4/type:Processor
/sys/class/thermal/thermal_zone0/cdev0_trip_point:1
/sys/class/thermal/thermal_zone0/cdev1_trip_point:5
/sys/class/thermal/thermal_zone0/cdev2_trip_point:4
/sys/class/thermal/thermal_zone0/cdev3_trip_point:3
/sys/class/thermal/thermal_zone0/cdev4_trip_point:2
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:59000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical /sys/class/thermal/thermal_zone0/trip_point_1_temp:100000 /sys/class/thermal/thermal_zone0/trip_point_1_type:passive /sys/class/thermal/thermal_zone0/trip_point_2_temp:85000 /sys/class/thermal/thermal_zone0/trip_point_2_type:active /sys/class/thermal/thermal_zone0/trip_point_3_temp:70000 /sys/class/thermal/thermal_zone0/trip_point_3_type:active /sys/class/thermal/thermal_zone0/trip_point_4_temp:55000 /sys/class/thermal/thermal_zone0/trip_point_4_type:active /sys/class/thermal/thermal_zone0/trip_point_5_temp:45000 /sys/class/thermal/thermal_zone0/trip_point_5_type:active /sys/class/thermal/thermal_zone0/type:acpitz /sys/class/thermal/thermal_zone1/mode:enabled /sys/class/thermal/thermal_zone1/passive:0 /sys/class/thermal/thermal_zone1/policy:step_wise /sys/class/thermal/thermal_zone1/temp:54000 /sys/class/thermal/thermal_zone1/trip_point_0_temp:110000 /sys/class/thermal/thermal_zone1/trip_point_0_type:critical /sys/class/thermal/thermal_zone1/type:acpitz /sys/class/thermal/thermal_zone2/cdev0_trip_point:1 /sys/class/thermal/thermal_zone2/mode:enabled /sys/class/thermal/thermal_zone2/policy:step_wise /sys/class/thermal/thermal_zone2/temp:38700 /sys/class/thermal/thermal_zone2/trip_point_0_temp:105000 /sys/class/thermal/thermal_zone2/trip_point_0_type:critical /sys/class/thermal/thermal_zone2/trip_point_1_temp:60000 /sys/class/thermal/thermal_zone2/trip_point_1_type:passive /sys/class/thermal/thermal_zone2/type:acpitz /sys/class/thermal/thermal_zone3/cdev0_trip_point:1 /sys/class/thermal/thermal_zone3/mode:enabled /sys/class/thermal/thermal_zone3/policy:step_wise /sys/class/thermal/thermal_zone3/temp:55000 /sys/class/thermal/thermal_zone3/trip_point_0_temp:110000 /sys/class/thermal/thermal_zone3/trip_point_0_type:critical /sys/class/thermal/thermal_zone3/trip_point_1_temp:110000 /sys/class/thermal/thermal_zone3/trip_point_1_type:passive /sys/class/thermal/thermal_zone3/type:acpitz 
------------------------------------
grep . /sys/class/thermal/*/device/path
/sys/class/thermal/cooling_device0/device/path:\_TZ_.C255
/sys/class/thermal/cooling_device1/device/path:\_TZ_.C256
/sys/class/thermal/cooling_device2/device/path:\_TZ_.C257
/sys/class/thermal/cooling_device3/device/path:\_TZ_.C258
/sys/class/thermal/cooling_device4/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/device/path:\_TZ_.TZ1_
/sys/class/thermal/thermal_zone1/device/path:\_TZ_.TZ2_
/sys/class/thermal/thermal_zone2/device/path:\_TZ_.TZ3_
/sys/class/thermal/thermal_zone3/device/path:\_TZ_.TZ4_
------------------------------------
grep . /sys/class/thermal/thermal_zone*/cdev*/device/*
/sys/class/thermal/thermal_zone0/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone0/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev1/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev1/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/path:\_TZ_.C258
/sys/class/thermal/thermal_zone0/cdev1/device/power_state:D0
/sys/class/thermal/thermal_zone0/cdev1/device/real_power_state:D0
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/uid:3
/sys/class/thermal/thermal_zone0/cdev2/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev2/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/path:\_TZ_.C257
/sys/class/thermal/thermal_zone0/cdev2/device/power_state:D0
/sys/class/thermal/thermal_zone0/cdev2/device/real_power_state:D0
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/uid:2
/sys/class/thermal/thermal_zone0/cdev3/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev3/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/path:\_TZ_.C256
/sys/class/thermal/thermal_zone0/cdev3/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/uid:1
/sys/class/thermal/thermal_zone0/cdev4/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev4/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/path:\_TZ_.C255
/sys/class/thermal/thermal_zone0/cdev4/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/uid:0
/sys/class/thermal/thermal_zone2/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone2/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone3/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
------------------------------------
grep . /sys/class/thermal/thermal_zone*/cdev*/*
/sys/class/thermal/thermal_zone0/cdev0/cur_state:0
/sys/class/thermal/thermal_zone0/cdev0/max_state:10
/sys/class/thermal/thermal_zone0/cdev0/type:Processor
/sys/class/thermal/thermal_zone0/cdev1/cur_state:1
/sys/class/thermal/thermal_zone0/cdev1/max_state:1
/sys/class/thermal/thermal_zone0/cdev1/type:Fan
/sys/class/thermal/thermal_zone0/cdev2/cur_state:1
/sys/class/thermal/thermal_zone0/cdev2/max_state:1
/sys/class/thermal/thermal_zone0/cdev2/type:Fan
/sys/class/thermal/thermal_zone0/cdev3/cur_state:0
/sys/class/thermal/thermal_zone0/cdev3/max_state:1
/sys/class/thermal/thermal_zone0/cdev3/type:Fan
/sys/class/thermal/thermal_zone0/cdev4/cur_state:0
/sys/class/thermal/thermal_zone0/cdev4/max_state:1
/sys/class/thermal/thermal_zone0/cdev4/type:Fan
/sys/class/thermal/thermal_zone2/cdev0/cur_state:0
/sys/class/thermal/thermal_zone2/cdev0/max_state:10
/sys/class/thermal/thermal_zone2/cdev0/type:Processor
/sys/class/thermal/thermal_zone3/cdev0/cur_state:0
/sys/class/thermal/thermal_zone3/cdev0/max_state:10
/sys/class/thermal/thermal_zone3/cdev0/type:Processor

I have made an additional observation. The more often the fan speed changes, the sooner the bug shows up. After suspend to RAM the bug shows up even sooner. I assume the ACPI subsystem is reinitialized on wake-up from suspend to RAM. Does reinitialization also happen during "normal" runtime? And Zhang, you are right: the more heat there is, the sooner the bug shows up. That is why it is so hard to reproduce the bug without a GPU driver.

Why was the regression status changed from yes to no? Is this supposed to be a feature now, randomly indicating that the fan is on while it is actually off and vice versa? In addition to what Matthias initially reported, the bug 'works' both ways, as I explained in my previous messages.
This is certainly a regression, and a very bad one, because the average user is not even aware of it. One potential consequence is premature wear of the fan motor: while the software layer indicates that the fan is off, it keeps running with no possibility of being interrupted unless the user intervenes. In the other direction, when the software indicates that the fan is on while it is actually off, not only the CPU temperature but the general temperature of the system rises, overstressing everything inside the computer and thus certainly shortening its life. I'm sorry if I sound rude, but somebody f**** up the logic in the ACPI subsystem in such a subtle way that no one now seems able to even get close to where the problem is, let alone come up with a solution.

As for the regression status: the fan worked with linux-2.6.31, and after the update to linux-2.6.32 it did not. IMHO this is a regression. What can we do? I will test the new linux-3.12 and report back. So long...

Doing a git bisect may be helpful, since we know v2.6.31 works and v2.6.32 doesn't. http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

Today the bug hit me while I was running linux-3.12.7. I tried a git bisect once, but one of the bisected kernels ate my filesystem. Today I can't even compile linux-2.6.31 anymore. The problem is that I am on Gentoo, which is a rolling-release distribution, so rolling back is not easy. I will try to install Debian. Perhaps I can do the bisecting from there. Have a nice day...

Any update on the bisect?

I am bisecting now. The bug shows up rarely, so here is what I am doing. I changed the kernel config to support an initramfs. I boot and let the initramfs drop to a busybox shell. Modules are loaded as usual, but I do not mount my rootfs. Now I have to sit and wait to see if the bug shows up. This is going to take a while. I will get back to you with the full bisection log.
Hopefully this will lead to a solution for the problem.

The bug seems to have been introduced by a patch in the 2.6.32 series. At the moment I am testing whether the bug exists in linux-2.6.32.7. My first report on this was wrong, and I am sorry for that. It turns out Gentoo did not name the kernel the way it was named upstream: 2.6.32 in Gentoo was actually 2.6.32.8 upstream. I checked that. I will do a bisect once I find the first bad release. I know for sure that 2.6.32.8 shows the bug. In 2.6.32.8 the bug showed up after the second cold boot and 4:27 hours of uptime. This bug hunting is quite time consuming, so I need some more time to find the first bad commit. Have a nice day!

Matthias, any update about this problem?

I am still searching for the first bad one. I thought I had it, but the bisect I did went wrong and gave me a powerpc driver patch as the bad patch.

I know it is hard, but I think this is the only way to find the root cause. Thanks for your effort, Matthias!

As I use my machine for my daily work, I am running linux-3.16.1 when not searching for the fan bug. I have been running this kernel for 15 days, and so far I have not experienced the fan bug. I am tempted to say the bug got fixed somewhere between linux-3.15.3 and linux-3.16.1. I will keep linux-3.16.1 for another 15 days just to be sure. I will keep investigating what caused the bug, as I do not want it to return in any way.

Hi Matthias, any good news? :)

linux-3.16.1 has passed the test. No fan bug occurred during testing. Now I am running linux-3.16.6, and so far the bug is gone. I will try the 3.17 series in a few days. This is good news, isn't it?

Yes, good news. I will close this bug as the problem is gone in 3.16. Please feel free to re-open it if the problem comes back in the latest upstream kernel. |