Bug 50041 - fan is actually on while ACPI shows it is off - HP NW9440
Status: CLOSED CODE_FIX
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: All Linux
Importance: P1 normal
Assigned To: Zhang Rui
Depends on:
Blocks:
Reported: 2012-11-04 11:13 UTC by Matthias
Modified: 2014-11-23 12:56 UTC (History)
CC List: 5 users

See Also:
Kernel Version: 3.6.5
Tree: Mainline
Regression: No


Attachments
dmesg (47.35 KB, text/plain), 2012-11-04 11:14 UTC, Matthias
kernelconfig (68.77 KB, text/plain), 2012-11-04 11:15 UTC, Matthias
cpuinfo (1.48 KB, text/plain), 2012-11-04 11:15 UTC, Matthias
lspci (28.11 KB, text/plain), 2012-11-04 11:15 UTC, Matthias
acpidump of hp nw9440 (95.15 KB, application/octet-stream), 2012-11-26 08:35 UTC, Matthias
Measurements for linux-3.9-rc7 at full speed of fan for comment 37 (7.88 KB, text/plain), 2013-04-18 11:35 UTC, Matthias
Findings for comment 39 (11.63 KB, text/plain), 2013-05-02 07:02 UTC, Matthias

Description Matthias 2012-11-04 11:13:45 UTC
After a while, the fan of my laptop spins at full speed even though the machine is not under any load. To reproduce this, the following conditions have to be met:

1. Let the laptop enter the lowest fan speed. This is fan off when running on battery, or the lowest speed when running on AC.

2. Do nothing on the laptop; just let it sit there. After a while the fan starts spinning at full speed. This does not always happen: sometimes the laptop works for days before I hit the bug again. It may be important to know that even under heavy load the laptop does not engage full fan speed; the second-highest speed is sufficient to cool this machine.

The laptop is an HP Compaq NW9440 EY612EA#ABD. It has an nvidia quadro fx1500m graphics chip. The problem is that the system shares one cooling system between the CPU and the graphics card. I can only observe this bug when the nvidia binary blob is enabled because no other driver lets the system reach the lowest possible fan speed. With the nouveau driver the fan runs two steps faster when idling. The latest kernel version without this bug is 2.6.31. All later kernels show the bug under the same conditions; the latest version tested is 3.6.5. I tried to bisect this but ran into a dead end, because sometimes the nvidia driver would not work and at other times the bisected kernel crashed my computer. Because this problem is so annoying, I kindly ask for a hint about where to look to get rid of it.
Furthermore, I ran 'echo disable > /sys/class/thermal/thermal_zone0/mode' for each thermal zone, but it did not help. Neither did playing with the acpi_osi parameter, nor fixing the two bugs in the DSDT table.

Output of acpi -V:

Battery 0: Unknown, 99%
Battery 0: design capacity 3978 mAh, last full capacity 3978 mAh = 100%
Adapter 0: on-line
Thermal 0: ok, 20.0 degrees C
Thermal 0: trip point 0 switches to mode critical at temperature 110.0 degrees C
Thermal 1: ok, 35.0 degrees C
Thermal 1: trip point 0 switches to mode critical at temperature 102.0 degrees C
Thermal 1: trip point 1 switches to mode passive at temperature 60.0 degrees C
Thermal 2: ok, 42.0 degrees C
Thermal 2: trip point 0 switches to mode critical at temperature 105.0 degrees C
Thermal 2: trip point 1 switches to mode passive at temperature 95.0 degrees C
Thermal 3: ok, 50.0 degrees C
Thermal 3: trip point 0 switches to mode critical at temperature 126.0 degrees C
Thermal 3: trip point 1 switches to mode active at temperature 95.0 degrees C
Thermal 3: trip point 2 switches to mode active at temperature 86.0 degrees C
Thermal 3: trip point 3 switches to mode active at temperature 74.0 degrees C
Thermal 3: trip point 4 switches to mode active at temperature 67.0 degrees C
Thermal 4: ok, 44.0 degrees C
Thermal 4: trip point 0 switches to mode critical at temperature 102.0 degrees C
Thermal 4: trip point 1 switches to mode passive at temperature 97.0 degrees C
Thermal 5: ok, 48.0 degrees C
Thermal 5: trip point 0 switches to mode critical at temperature 256.0 degrees C
Thermal 5: trip point 1 switches to mode active at temperature 91.0 degrees C
Thermal 5: trip point 2 switches to mode active at temperature 85.0 degrees C
Thermal 5: trip point 3 switches to mode active at temperature 79.0 degrees C
Thermal 5: trip point 4 switches to mode active at temperature 68.0 degrees C
Cooling 0: Processor 0 of 10
Cooling 1: Processor 0 of 10
Cooling 2: Fan 0 of 1
Cooling 3: Fan 0 of 1
Cooling 4: Fan 0 of 1
Cooling 5: Fan 0 of 1
Cooling 6: Fan 0 of 1
Cooling 7: Fan 0 of 1
Cooling 8: Fan 0 of 1
Cooling 9: Fan 0 of 1
Cooling 10: Fan 0 of 1
Cooling 11: Fan 0 of 1
Cooling 12: Fan 0 of 1

Interestingly, when the bug occurs, no active thermal zone or fan is visible in acpi -V.
I tried to get help from HP for this, but they said I should use Windows!? I know my kernel is tainted, but I hope to get some hints about where to look and what to try to get rid of this bug. I even tried to replace all of the ACPI drivers with the old ones, but since I am no programmer, the result failed to compile. Any help is greatly appreciated.

Some information: 

cat /proc/version:
Linux version 3.6.5 (root@LAPPI) (gcc version 4.5.4 (Gentoo 4.5.4 p1.0, pie-0.4.7) ) #1 SMP PREEMPT Fri Nov 2 12:21:16 CET 2012

Kernel without this bug: 2.6.31

Environment: Gentoo Linux AMD64 stable

/usr/src/linux/scripts/ver_linux                                                                     
If some fields are empty or look unusual you may have an old version.                                          
Compare to the current minimal requirements in Documentation/Changes.                                                 
                                                                                                                      
Linux LAPPI 3.6.5 #1 SMP PREEMPT Fri Nov 2 12:21:16 CET 2012 x86_64 Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz GenuineIntel GNU/Linux
                                                                                                                              
Gnu C                  4.5.4                                                                                                           
Gnu make               3.82                                                                                                            
binutils               2.22                                                                                                            
util-linux             2.21.2                                                                                                          
mount                  support                                                                                                         
module-init-tools      3.16                                                                                                                       
e2fsprogs              1.42                                                                                                                                   
Linux C Library        2.15                                                                                                                                   
Dynamic linker (ldd)   2.15                                                                                                                                                 
Procps                 3.2.8                                                                                                                                                
Net-tools              1.60_p20110409135728                                                                                                                                 
Kbd                    1.15.3wip                                                                                                                                            
Sh-utils               8.16                                                                                                                                                 
wireless-tools         29                                                                                                                                                   
Modules Loaded         usblp usb_storage firewire_sbp2 tifm_sd mmc_core xt_LOG xt_limit xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables squashfs hid_logitech hid_generic usbhid hid nvidia snd_hda_codec_analog arc4 uhci_hcd ehci_hcd snd_hda_intel snd_hda_codec snd_pcm usbcore snd_timer snd iwl3945 iwlegacy mac80211 cfg80211 tg3 sr_mod soundcore cdrom hp_accel rfkill firewire_ohci lis3lv02d snd_page_alloc usb_common agpgart libphy ata_piix firewire_core wmi input_polldev tifm_7xx1 i2c_core tifm_core

cat /proc/modules
usblp 10919 0 - Live 0xffffffffa0104000
usb_storage 45352 0 - Live 0xffffffffa01cd000
firewire_sbp2 12513 0 - Live 0xffffffffa0158000
tifm_sd 8970 0 - Live 0xffffffffa00cc000
mmc_core 73103 1 tifm_sd, Live 0xffffffffa01b2000
xt_LOG 7279 1 - Live 0xffffffffa0085000
xt_limit 1865 2 - Live 0xffffffffa0071000
xt_tcpudp 2423 4 - Live 0xffffffffa0052000
nf_conntrack_ipv4 5966 4 - Live 0xffffffffa0062000
nf_defrag_ipv4 1195 1 nf_conntrack_ipv4, Live 0xffffffffa0046000
xt_state 1151 4 - Live 0xffffffffa001c000
nf_conntrack 48873 2 nf_conntrack_ipv4,xt_state, Live 0xffffffffa013e000
iptable_filter 1136 1 - Live 0xffffffffa0013000
ip_tables 15712 1 iptable_filter, Live 0xffffffffa005d000
x_tables 15811 6 xt_LOG,xt_limit,xt_tcpudp,xt_state,iptable_filter,ip_tables, Live 0xffffffffa002d000
squashfs 22566 1 - Live 0xffffffffa0056000
hid_logitech 8133 0 - Live 0xffffffffa0ee9000
hid_generic 1033 0 - Live 0xffffffffa0ee5000
usbhid 26046 0 - Live 0xffffffffa0ed9000
hid 57699 3 hid_logitech,hid_generic,usbhid, Live 0xffffffffa0eb1000
nvidia 11208150 32 - Live 0xffffffffa034d000 (PO)
snd_hda_codec_analog 77036 1 - Live 0xffffffffa0334000
arc4 1885 2 - Live 0xffffffffa0330000
uhci_hcd 22603 0 - Live 0xffffffffa0308000
ehci_hcd 40135 0 - Live 0xffffffffa02bd000
snd_hda_intel 22512 7 - Live 0xffffffffa02b1000
snd_hda_codec 72988 2 snd_hda_codec_analog,snd_hda_intel, Live 0xffffffffa0295000
snd_pcm 69421 4 snd_hda_intel,snd_hda_codec, Live 0xffffffffa027b000
usbcore 134688 5 usblp,usb_storage,usbhid,uhci_hcd,ehci_hcd, Live 0xffffffffa0190000
snd_timer 17937 3 snd_pcm, Live 0xffffffffa0187000
snd 46967 15 snd_hda_codec_analog,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer, Live 0xffffffffa0173000
iwl3945 51825 0 - Live 0xffffffffa015f000
iwlegacy 45088 1 iwl3945, Live 0xffffffffa014b000
mac80211 215736 2 iwl3945,iwlegacy, Live 0xffffffffa0108000
cfg80211 170987 3 iwl3945,iwlegacy,mac80211, Live 0xffffffffa00d2000
tg3 136938 0 - Live 0xffffffffa00a9000
sr_mod 13167 0 - Live 0xffffffffa00a1000
soundcore 864 1 snd, Live 0xffffffffa009d000
cdrom 34525 1 sr_mod, Live 0xffffffffa0090000
hp_accel 16248 0 - Live 0xffffffffa0089000
rfkill 14839 1 cfg80211, Live 0xffffffffa0080000
firewire_ohci 31212 0 - Live 0xffffffffa0074000
lis3lv02d 9964 1 hp_accel, Live 0xffffffffa006d000
snd_page_alloc 6657 2 snd_hda_intel,snd_pcm, Live 0xffffffffa0068000
usb_common 801 1 usbcore, Live 0xffffffffa0054000
agpgart 25617 1 nvidia, Live 0xffffffffa0048000
libphy 18536 1 tg3, Live 0xffffffffa003e000
ata_piix 22391 0 - Live 0xffffffffa0033000
firewire_core 50755 2 firewire_sbp2,firewire_ohci, Live 0xffffffffa001f000
wmi 8075 0 - Live 0xffffffffa0019000
input_polldev 2514 1 lis3lv02d, Live 0xffffffffa0015000
tifm_7xx1 4242 0 - Live 0xffffffffa0010000
i2c_core 18184 1 nvidia, Live 0xffffffffa0006000
tifm_core 4639 2 tifm_sd,tifm_7xx1, Live 0xffffffffa0000000

cat /proc/ioports
0000-0cf7 : PCI Bus 0000:00
  0000-001f : dma1
  0020-0021 : pic1
  0040-0043 : timer0
  0050-0053 : timer1
  0060-0060 : keyboard
  0062-0062 : EC data
  0064-0064 : keyboard
  0066-0066 : EC cmd
  0070-0071 : rtc0
  0080-008f : dma page reg
  00a0-00a1 : pic2
  00c0-00df : dma2
  00f0-00ff : fpu
  0170-0177 : 0000:00:1f.1
    0170-0177 : ata_piix
  01f0-01f7 : 0000:00:1f.1
    01f0-01f7 : ata_piix
  0376-0376 : 0000:00:1f.1
    0376-0376 : ata_piix
  03c0-03df : vga+
  03f6-03f6 : 0000:00:1f.1
    03f6-03f6 : ata_piix
  04d0-04d1 : pnp 00:0b
  0500-057f : pnp 00:09
  0800-080f : pnp 00:09
0cf8-0cff : PCI conf1
0d00-ffff : PCI Bus 0000:00
  1000-107f : 0000:00:1f.0
    1000-107f : pnp 00:0b
      1000-1003 : ACPI PM1a_EVT_BLK
      1004-1005 : ACPI PM1a_CNT_BLK
      1008-100b : ACPI PM_TMR
      1010-1015 : ACPI CPU throttle
      1020-1020 : ACPI PM2_CNT_BLK
      1028-102f : ACPI GPE0_BLK
  1100-113f : 0000:00:1f.0
    1100-113f : pnp 00:0b
  1200-121f : pnp 00:0b
  1370-1377 : 0000:00:1f.2
    1370-1377 : ahci
  13f0-13f7 : 0000:00:1f.2
    13f0-13f7 : ahci
  1574-1577 : 0000:00:1f.2
    1574-1577 : ahci
  15f4-15f7 : 0000:00:1f.2
    15f4-15f7 : ahci
  2000-2fff : PCI Bus 0000:08
  3000-3fff : PCI Bus 0000:10
  4000-4fff : PCI Bus 0000:01
    4000-407f : 0000:01:00.0
  5000-501f : 0000:00:1d.0
    5000-501f : uhci_hcd
  5020-503f : 0000:00:1d.1
    5020-503f : uhci_hcd
  5040-505f : 0000:00:1d.2
    5040-505f : uhci_hcd
  5060-507f : 0000:00:1d.3
    5060-507f : uhci_hcd
  5080-508f : 0000:00:1f.1
    5080-508f : ata_piix
  50b0-50bf : 0000:00:1f.2
    50b0-50bf : ahci
  6000-6fff : PCI Bus 0000:02
    6000-60ff : PCI CardBus 0000:03
    6400-64ff : PCI CardBus 0000:03

cat /proc/iomem
00000000-00000fff : reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000cdbff : Video ROM
000cdc00-000cffff : pnp 00:0c
000d0000-000dffff : PCI Bus 0000:00
000e0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-d7fcffff : System RAM
  01000000-0138705a : Kernel code
  0138705b-014ef8bf : Kernel data
  01567000-015f2fff : Kernel bss
d7fd0000-d7fe55ff : reserved
d7fe5600-d7ff7fff : ACPI Non-volatile Storage
d7ff8000-d7ffffff : reserved
d8000000-fedfffff : PCI Bus 0000:00
  d8000000-dbffffff : PCI Bus 0000:02
    d8000000-dbffffff : PCI CardBus 0000:03
  dc000000-dc1fffff : PCI Bus 0000:08
  dc200000-dc3fffff : PCI Bus 0000:10
  e0000000-efffffff : PCI Bus 0000:01
    e0000000-efffffff : 0000:01:00.0
  f0000000-f3ffffff : PCI CardBus 0000:03
  f4000000-f40fffff : PCI Bus 0000:10
    f4000000-f4000fff : 0000:10:00.0
      f4000000-f4000fff : iwl3945
  f4100000-f41fffff : PCI Bus 0000:08
    f4100000-f410ffff : 0000:08:00.0
      f4100000-f410ffff : tg3
  f4200000-f45fffff : PCI Bus 0000:02
    f4200000-f4200fff : 0000:02:06.0
    f4201000-f42017ff : 0000:02:06.1
      f4201000-f42017ff : firewire_ohci
    f4204000-f4207fff : 0000:02:06.1
    f4208000-f4208fff : 0000:02:06.2
      f4208000-f4208fff : tifm_7xx1
    f4209000-f42090ff : 0000:02:06.3
    f420a000-f420afff : 0000:02:06.4
    f420b000-f420bfff : 0000:02:06.4
  f5000000-f6ffffff : PCI Bus 0000:01
    f5000000-f5ffffff : 0000:01:00.0
      f5000000-f5ffffff : nvidia
    f6000000-f6ffffff : 0000:01:00.0
  f7000000-f7003fff : 0000:00:1b.0
    f7000000-f7003fff : ICH HD audio
  f7004000-f70043ff : 0000:00:1d.7
    f7004000-f70043ff : ehci_hcd
  f7005000-f70053ff : 0000:00:1f.2
    f7005000-f70053ff : ahci
  f8000000-fbffffff : PCI MMCONFIG 0000 [bus 00-3f]
    f8000000-fbffffff : pnp 00:0b
  fec00000-fec00fff : reserved
    fec00000-fec003ff : IOAPIC 0
  fed00000-fed003ff : HPET 0
  fed20000-fed9afff : reserved
    fed20000-fed3ffff : pnp 00:0b
    fed45000-fed8ffff : pnp 00:0b
    fed90000-fed9afff : pnp 00:0b
  feda0000-fedbffff : reserved
    feda0000-fedbffff : pnp 00:0c
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
    fee00000-fee00fff : pnp 00:0c
fee01000-ffffffff : PCI Bus 0000:00
  ffb00000-ffbfffff : reserved
    ffb00000-ffbfffff : pnp 00:09
  fff00000-ffffffff : reserved
    fff00000-ffffffff : pnp 00:09
Comment 1 Matthias 2012-11-04 11:14:44 UTC
Created attachment 85451
dmesg
Comment 2 Matthias 2012-11-04 11:15:09 UTC
Created attachment 85461
kernelconfig
Comment 3 Matthias 2012-11-04 11:15:30 UTC
Created attachment 85471
cpuinfo
Comment 4 Matthias 2012-11-04 11:15:51 UTC
Created attachment 85481
lspci
Comment 5 Len Brown 2012-11-06 03:05:15 UTC
> I can only observe this bug when the nvidia binary blob is
> enabled because no other driver lets the system reach the
> lowest possible fan speed.

Unclear that we can work on bugs that can only be reproduced when
the nvidia binary blob is enabled.
Comment 6 Zhang Rui 2012-11-13 06:20:30 UTC
Please attach the output of "grep . /sys/class/thermal/*/*" when the system is idle and the fan is running at full speed.
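Because the problem shows up only intermittently, it may help to save the requested output to a timestamped file the moment the fan spins up. The following is a minimal sketch, assuming a POSIX shell; the function name snapshot_thermal and the default output file name are illustrative and not part of any tool mentioned in this report.

```sh
# snapshot_thermal [ROOT] [OUTFILE]
# Hypothetical helper: saves the output of "grep . ROOT/*/*" to OUTFILE
# and prints the name of the file that was written.
snapshot_thermal() {
    root=${1:-/sys/class/thermal}
    out=${2:-thermal-$(date +%Y%m%d-%H%M%S).txt}
    # /dev/null as an extra operand makes grep prefix every line with the
    # file name even when only a single file matches.
    # On real sysfs the glob also matches directories, so grep reports
    # errors and a non-zero status; both are ignored here.
    grep . /dev/null "$root"/*/* > "$out" 2>/dev/null || :
    echo "$out"
}
```

Calling snapshot_thermal without arguments reads /sys/class/thermal and writes a file such as thermal-YYYYMMDD-HHMMSS.txt into the current directory.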
Comment 7 Matthias 2012-11-13 06:41:26 UTC
(In reply to comment #6)
> Please attach the output of "grep . /sys/class/thermal/*/*" when the system is idle
> and the fan is running at full speed.

I will post this as soon as possible. Thanks in advance for the help.
Comment 8 Zhang Rui 2012-11-20 02:03:32 UTC
ping...
Comment 9 Matthias 2012-11-22 17:37:43 UTC
Today the bug showed up again. Here is the requested information.

grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:0
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:0
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/temp:48000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/temp:41000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/temp:41000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:55000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/temp:36000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/temp:31900
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/temp:20000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz
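To compare what ACPI reports with what can be heard, a dump like the one above can be condensed to one line per fan. The sketch below is only an illustration: fan_states is a hypothetical helper name, it reads saved grep output on standard input, and it prints "empty" where cur_state has no value (as for cooling_device4 above) and "missing" where no cur_state line exists at all.

```sh
# fan_states: hypothetical helper that reads "grep . /sys/class/thermal/*/*"
# output on stdin and prints "<device> <cur_state>" for every cooling
# device whose type is Fan, sorted by device number.
fan_states() {
    awk -F: '
        {
            # $1 is the sysfs path, $2 the value after the first colon
            n = split($1, p, "/")
            if (n < 2 || p[n-1] !~ /^cooling_device[0-9]+$/) next
            dev = p[n-1]
            if (p[n] == "type")      type[dev] = $2
            if (p[n] == "cur_state") state[dev] = ($2 == "" ? "empty" : $2)
        }
        END {
            for (dev in type) {
                if (type[dev] != "Fan") continue
                idx = dev
                sub(/^cooling_device/, "", idx)
                # the leading number is only used for sorting and is cut off below
                print idx, dev, ((dev in state) ? state[dev] : "missing")
            }
        }
    ' | sort -n | cut -d" " -f2-
}
```

It can be fed directly from a live system, for example with grep . /sys/class/thermal/*/* 2>/dev/null | fan_states, or from a saved snapshot file.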
Comment 10 Zhang Rui 2012-11-23 06:59:55 UTC
(In reply to comment #9)
> Today the bug showed up again. Here is the requested information.
> 
> grep . /sys/class/thermal/*/*
> /sys/class/thermal/cooling_device0/cur_state:0
> /sys/class/thermal/cooling_device0/type:Fan
> /sys/class/thermal/cooling_device10/cur_state:0
> /sys/class/thermal/cooling_device10/type:Fan
> /sys/class/thermal/cooling_device1/cur_state:0
> /sys/class/thermal/cooling_device1/type:Fan
> /sys/class/thermal/cooling_device2/cur_state:0
> /sys/class/thermal/cooling_device2/type:Fan
> /sys/class/thermal/cooling_device3/cur_state:0
> /sys/class/thermal/cooling_device3/type:Fan
> /sys/class/thermal/cooling_device4/cur_state:
> /sys/class/thermal/cooling_device4/type:Fan
> /sys/class/thermal/cooling_device5/cur_state:0
> /sys/class/thermal/cooling_device5/type:Fan
> /sys/class/thermal/cooling_device6/cur_state:0
> /sys/class/thermal/cooling_device6/type:Fan
> /sys/class/thermal/cooling_device7/cur_state:0
> /sys/class/thermal/cooling_device7/type:Fan
> /sys/class/thermal/cooling_device8/cur_state:0
> /sys/class/thermal/cooling_device8/type:Fan
> /sys/class/thermal/cooling_device9/cur_state:0
> /sys/class/thermal/cooling_device9/type:Fan

Well, all of these show that the ACPI fan is in the OFF state.

Hmm, can you do the following test?
When the system is idle and the fan does not spin,
try "echo 1 | sudo tee /sys/class/thermal/cooling_deviceX/cur_state"
for each cooling device of type "Fan", one by one.

Can you hear the fan spin after these commands?
Comment 11 Matthias 2012-11-23 08:48:55 UTC
First, let me define the fan speeds:

There are five audibly distinct fan speeds:
[STEP_0]: lowest speed
[STEP_1]: [STEP_0] +1
[STEP_2]: [STEP_1] +1
[STEP_3]: [STEP_2] +1
[STEP_4]: full speed

I left the settings alone and went through the commands one by one:

echo 1 > /sys/class/thermal/cooling_device0/cur_state: fan goes slowly from [STEP_0] to [STEP_4]. For your information, when the bug occurs the fan jumps to [STEP_4] instantly.
echo 1 > /sys/class/thermal/cooling_device1/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device2/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device3/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device4/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device5/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device6/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device7/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device8/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device9/cur_state: no hearable change
echo 1 > /sys/class/thermal/cooling_device10/cur_state: no hearable change

result: grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:1
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:1
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:1
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:1
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device1/cur_state:1
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:1
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:1
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:1
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:1
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:1
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan

Playing with the parameters independently:

echo 1 > /sys/class/thermal/cooling_device0/cur_state: fan goes slowly from [STEP_0] to [STEP_4].
echo 0 > /sys/class/thermal/cooling_device0/cur_state: fan goes slowly back to [STEP_0].

echo 1 > /sys/class/thermal/cooling_device1/cur_state: fan goes slowly from [STEP_0] to [STEP_3].
echo 0 > /sys/class/thermal/cooling_device1/cur_state: fan goes slowly back to [STEP_0].

echo 1 > /sys/class/thermal/cooling_device2/cur_state: fan goes slowly from [STEP_0] to [STEP_2].
echo 0 > /sys/class/thermal/cooling_device2/cur_state: fan goes slowly back to [STEP_0] .

echo 1 > /sys/class/thermal/cooling_device3/cur_state: fan goes slowly from [STEP_0] to [STEP_1].
echo 0 > /sys/class/thermal/cooling_device3/cur_state: fan goes slowly back to [STEP_0].

echo 1 > /sys/class/thermal/cooling_device4/cur_state: no hearable change
echo 0 > /sys/class/thermal/cooling_device4/cur_state: no hearable change

echo 1 > /sys/class/thermal/cooling_device5/cur_state: fan goes slowly from [STEP_0] to [STEP_4].
echo 0 > /sys/class/thermal/cooling_device5/cur_state: fan goes slowly back to [STEP_0].

echo 1 > /sys/class/thermal/cooling_device6/cur_state: fan goes slowly from [STEP_0] to [STEP_3].
echo 0 > /sys/class/thermal/cooling_device6/cur_state: fan goes slowly back to [STEP_0].

echo 1 > /sys/class/thermal/cooling_device7/cur_state: fan goes slowly from [STEP_0] to [STEP_2].
echo 0 > /sys/class/thermal/cooling_device7/cur_state: fan goes slowly back to [STEP_0].

echo 1 > /sys/class/thermal/cooling_device8/cur_state: fan goes slowly from [STEP_0] to [STEP_1].
echo 0 > /sys/class/thermal/cooling_device8/cur_state: fan goes slowly back to [STEP_0].

echo 1 > /sys/class/thermal/cooling_device9/cur_state: no hearable change
echo 0 > /sys/class/thermal/cooling_device9/cur_state: no hearable change

echo 1 > /sys/class/thermal/cooling_device10/cur_state: no hearable change
echo 0 > /sys/class/thermal/cooling_device10/cur_state: no hearable change
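The independent on/off test above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested tool: probe_fans is a hypothetical helper name, the 20-second default pause is an arbitrary guess at how long the fan needs to ramp up, and writing to cur_state on a real system requires root.

```sh
# probe_fans [ROOT] [DELAY_SECONDS]
# Hypothetical helper: for every cooling device of type Fan under ROOT,
# write 1 to cur_state, wait so the change can be heard, then write 0.
probe_fans() {
    root=${1:-/sys/class/thermal}
    delay=${2:-20}
    for dev in "$root"/cooling_device*; do
        # skip entries without a readable type and anything that is not a fan
        [ -r "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = "Fan" ] || continue
        echo "${dev##*/}: on, listen for a change"
        echo 1 > "$dev/cur_state" || continue
        sleep "$delay"
        echo 0 > "$dev/cur_state"
    done
}
```

Run as root with no arguments, it walks through /sys/class/thermal and leaves every fan device it touched in state 0, so devices of type Processor are never modified.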
Comment 12 Zhang Rui 2012-11-26 02:31:57 UTC
thanks for your test.

comment #11 shows that it is the ACPI FAN from which you hear the fan spinning,
but comment #9 shows that the ACPI FAN state is off when you hear the fan spinning, is this correct?

please attach the acpidump output of this box.
Comment 13 Matthias 2012-11-26 08:35:54 UTC
Created attachment 87271
acpidump of hp nw9440
Comment 14 Matthias 2012-11-26 08:43:53 UTC
I suppose your statement is correct but to be absolutely sure I have to wait for the bug to happen again and take a look. So far I could observe that sometimes cooling_device4 and cooling_device9 are active (ON) when the bug happens. But this is not always the case. This seems to be a tricky one. 
I very much appreciate what you are doing. So thank you very much. 

Attached the output of acpidump. If you need something else, please let me know.
Comment 15 Matthias 2012-12-09 18:42:22 UTC
Today the bug happened again. No fan was reported as active, but the fan was spinning at full speed.
Comment 16 Matthias 2012-12-12 15:08:34 UTC
Bug is also in version 3.6.10. I will test 3.7.0 as soon as possible.
Comment 17 Matthias 2012-12-20 15:19:26 UTC
Now I am running this kernel: Linux LAPPI 3.7.1 #1 SMP PREEMPT Wed Dec 19 12:51:23 CET 2012 x86_64 Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz GenuineIntel GNU/Linux

The bug is also present in this version. In addition, I now get full fan speed when I resume from suspend2ram. Output from grep . /sys/class/thermal/*/* is provided below for the bug and after suspend2ram. The fan also spins at full speed after suspend2ram when I use the nouveau driver for my nvidia card.

after suspend:
/sys/class/thermal/cooling_device0/cur_state:1
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:1
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:0
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:LCD
/sys/class/thermal/cooling_device1/cur_state:1
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:0
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:1
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:1
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:1
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:1
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:1
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:4
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/temp:29000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/temp:34000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/temp:32000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:55000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/temp:25000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/temp:26100
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/temp:100000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz 

bug:
grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:0
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:Processor
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:LCD
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:0
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:0
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:4
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/temp:48000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/temp:40000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/temp:37000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:55000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/temp:33000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/temp:29500
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/temp:20000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz
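
The two listings above differ in only a handful of lines, which is hard to see by eye. A minimal sketch for comparing such dumps, assuming each one is saved to a file first; snapshot_thermal and diff_snapshots are hypothetical helper names, not existing tools:

```shell
#!/bin/sh
# Sketch only: snapshot_thermal and diff_snapshots are illustrative helpers.

# Print "path:value" for every readable attribute, like `grep . ROOT/*/*`.
# ROOT defaults to /sys/class/thermal.
snapshot_thermal() {
    root="${1:-/sys/class/thermal}"
    # /dev/null forces grep to print file names even if only one file matches;
    # 2>/dev/null hides directories and unreadable attributes.
    grep . "$root"/*/* /dev/null 2>/dev/null | sort
}

# Show only the lines that differ between two saved snapshots.
# "<" lines come from the first file, ">" lines from the second.
diff_snapshots() {
    diff "$1" "$2" | grep '^[<>]'
}
```

Possible usage: run `snapshot_thermal > after_suspend.txt` after resume, `snapshot_thermal > bug.txt` when the bug shows up, then `diff_snapshots after_suspend.txt bug.txt`.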
Comment 18 Matthias 2013-01-03 08:22:36 UTC
Since kernel version 3.7 the nouveau driver has been working properly on my laptop. Today the bug occurred while using the nouveau driver, so I now have a non-tainted kernel. I hope this helps in finding the cause of the bug.
Please tell me what I can do to solve this.
Comment 19 Matthias 2013-01-25 14:33:34 UTC
The bug is also in kernel version 3.7.4. According to acpi -V the states of the cooling devices are not changing when the bug happens. The states just stay the same as before the bug.
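
Since the bug shows up only occasionally, it may help to log the cooling device states over time and check the log once the fan spins up. A rough sketch under that assumption; log_fan_states is a hypothetical helper and the sampling interval is arbitrary:

```shell
#!/bin/sh
# Sketch only: log_fan_states is an illustrative helper, not an existing tool.

# log_fan_states [ROOT] [COUNT] [INTERVAL]
# Print COUNT timestamped lines, one every INTERVAL seconds, each listing
# the cur_state of every cooling device found under ROOT.
log_fan_states() {
    root="${1:-/sys/class/thermal}"
    count="${2:-1}"
    interval="${3:-60}"
    i=0
    while [ "$i" -lt "$count" ]; do
        line=""
        for dev in "$root"/cooling_device*; do
            [ -r "$dev/cur_state" ] || continue
            line="$line $(basename "$dev")=$(cat "$dev/cur_state")"
        done
        printf '%s%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
    return 0
}
```

Possible usage: `log_fan_states /sys/class/thermal 1440 60 >> fan.log` records one line per minute for a day.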
Comment 20 Matthias 2013-02-07 13:23:31 UTC
3.7.6 is also affected.
Comment 21 Matthias 2013-02-20 11:59:21 UTC
3.8.0 is also affected.
Comment 22 Zhang Rui 2013-03-05 00:47:29 UTC
I think the problem should be fixed by this commit

commit b8bb6cb999858043489c1ddef08eed2127559169
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Thu Nov 22 15:45:02 2012 +0800

    step_wise: Unify the code for both throttle and dethrottle

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>

so please check if the problem still exists in 3.9-rc1.
Comment 23 Matthias 2013-03-05 12:33:08 UTC
The laptop is running with that kernel now. I'll let you know what happens.
Comment 24 Matthias 2013-03-08 13:52:25 UTC
Good news, so far the bug is gone. But I'd like to test this kernel a little bit longer to be absolutely sure.  
Bad news: dethrottling does not work. The fan stays at the highest speed it reached.
What I did was the following: Put load on the machine with cat /dev/zero > /dev/null on both cpu cores. Wait for the fan to spin up. Then I terminated the two cat processes when the fan speed hit [STEP_3] (last but one). I watched with htop the cpu utilization go down and waited 15 minutes for the fan to slow down. But it did not.

grep . /sys/class/thermal/*/* at that time:

/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:1
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:LCD
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:Processor
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:1
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:4
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:48000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:39000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/policy:step_wise
/sys/class/thermal/thermal_zone2/temp:45000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:42000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/policy:step_wise
/sys/class/thermal/thermal_zone3/temp:37000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/policy:step_wise
/sys/class/thermal/thermal_zone4/temp:38100
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/policy:step_wise
/sys/class/thermal/thermal_zone5/temp:60000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz
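
A dump like the one above is easier to read when reduced to the fan devices that are currently on and the zone temperatures in °C (sysfs reports temperatures in millidegrees Celsius). A minimal sketch; fans_on and zone_temps are hypothetical helper names:

```shell
#!/bin/sh
# Sketch only: fans_on and zone_temps are illustrative helpers.

# fans_on [ROOT]: print the name of every cooling device of type "Fan"
# whose cur_state is not 0.
fans_on() {
    root="${1:-/sys/class/thermal}"
    for dev in "$root"/cooling_device*; do
        [ -r "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = "Fan" ] || continue
        if [ "$(cat "$dev/cur_state")" != "0" ]; then
            basename "$dev"
        fi
    done
    return 0
}

# zone_temps [ROOT]: print "zone temperature" with the temperature
# converted from millidegrees to degrees Celsius.
zone_temps() {
    root="${1:-/sys/class/thermal}"
    for zone in "$root"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        LC_ALL=C awk -v z="$(basename "$zone")" \
            '{ printf "%s %.1f\n", z, $1 / 1000 }' "$zone/temp"
    done
    return 0
}
```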

Expected result: fan should spin down to [STEP_0].

Let me know if you need anything else.
Comment 25 Matthias 2013-03-08 20:35:10 UTC
I was too fast. I am sorry. The bug occurred again. 
grep . /sys/class/thermal/*/* at that time:

/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device10/cur_state:0
/sys/class/thermal/cooling_device10/max_state:1
/sys/class/thermal/cooling_device10/type:Fan
/sys/class/thermal/cooling_device11/cur_state:1
/sys/class/thermal/cooling_device11/max_state:10
/sys/class/thermal/cooling_device11/type:LCD
/sys/class/thermal/cooling_device12/cur_state:0
/sys/class/thermal/cooling_device12/max_state:10
/sys/class/thermal/cooling_device12/type:Processor
/sys/class/thermal/cooling_device13/cur_state:0
/sys/class/thermal/cooling_device13/max_state:10
/sys/class/thermal/cooling_device13/type:Processor
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:1
/sys/class/thermal/cooling_device5/type:Fan
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:1
/sys/class/thermal/cooling_device6/type:Fan
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:1
/sys/class/thermal/cooling_device7/type:Fan
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:1
/sys/class/thermal/cooling_device8/type:Fan
/sys/class/thermal/cooling_device9/cur_state:0
/sys/class/thermal/cooling_device9/max_state:1
/sys/class/thermal/cooling_device9/type:Fan
/sys/class/thermal/thermal_zone0/cdev0_trip_point:5
/sys/class/thermal/thermal_zone0/cdev1_trip_point:4
/sys/class/thermal/thermal_zone0/cdev2_trip_point:3
/sys/class/thermal/thermal_zone0/cdev3_trip_point:2
/sys/class/thermal/thermal_zone0/cdev4_trip_point:1
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:48000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:256000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:91000
/sys/class/thermal/thermal_zone0/trip_point_1_type:active
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:79000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:68000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:58000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:49000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:97000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/cdev1_trip_point:6
/sys/class/thermal/thermal_zone2/cdev2_trip_point:5
/sys/class/thermal/thermal_zone2/cdev3_trip_point:4
/sys/class/thermal/thermal_zone2/cdev4_trip_point:3
/sys/class/thermal/thermal_zone2/cdev5_trip_point:2
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/passive:0
/sys/class/thermal/thermal_zone2/policy:step_wise
/sys/class/thermal/thermal_zone2/temp:48000
/sys/class/thermal/thermal_zone2/trip_point_0_temp:126000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone2/trip_point_1_type:active
/sys/class/thermal/thermal_zone2/trip_point_2_temp:86000
/sys/class/thermal/thermal_zone2/trip_point_2_type:active
/sys/class/thermal/thermal_zone2/trip_point_3_temp:74000
/sys/class/thermal/thermal_zone2/trip_point_3_type:active
/sys/class/thermal/thermal_zone2/trip_point_4_temp:67000
/sys/class/thermal/thermal_zone2/trip_point_4_type:active
/sys/class/thermal/thermal_zone2/trip_point_5_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_5_type:active
/sys/class/thermal/thermal_zone2/trip_point_6_temp:42000
/sys/class/thermal/thermal_zone2/trip_point_6_type:active
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/cdev1_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/policy:step_wise
/sys/class/thermal/thermal_zone3/temp:47000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:95000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
/sys/class/thermal/thermal_zone4/cdev0_trip_point:1
/sys/class/thermal/thermal_zone4/cdev1_trip_point:1
/sys/class/thermal/thermal_zone4/mode:enabled
/sys/class/thermal/thermal_zone4/policy:step_wise
/sys/class/thermal/thermal_zone4/temp:39800
/sys/class/thermal/thermal_zone4/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone4/trip_point_0_type:critical
/sys/class/thermal/thermal_zone4/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone4/trip_point_1_type:passive
/sys/class/thermal/thermal_zone4/type:acpitz
/sys/class/thermal/thermal_zone5/mode:enabled
/sys/class/thermal/thermal_zone5/passive:0
/sys/class/thermal/thermal_zone5/policy:step_wise
/sys/class/thermal/thermal_zone5/temp:25000
/sys/class/thermal/thermal_zone5/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone5/trip_point_0_type:critical
/sys/class/thermal/thermal_zone5/type:acpitz
Comment 26 Matthias 2013-03-18 18:05:03 UTC
linux-3.9-rc3 is also affected.
Comment 27 Matthias 2013-04-13 13:18:09 UTC
In linux-3.8.6 the bug is also present.
I tested linux-3.9-rc6 and the bug shows up. Furthermore, with the step_wise governor the fan does not dethrottle. I decided to test the fair_share governor. With this governor the fan stays at the speed selected by the BIOS after POST. I put heavy load on the machine and the temperatures rose quickly, but the fan stayed at the selected speed. To avoid damaging my hardware, I manually activated all cooling devices before the point where the fan would normally speed up to full speed. Then I saw something rather odd. thermal_zone5 showed 100°C while the fan was spinning at full speed and the machine was idle. All other thermal zones reported temperatures far below the normal idle temperatures (as expected, since the machine is idle and the fan is spinning at full speed). Then I deactivated all cooling devices (echo 0 > ...) and the reported temperature of thermal_zone5 dropped to 20°C instantly. So I played around a little. It turns out that when I activate all cooling devices (echo 1 > ...) the temperature of this thermal zone jumps to 100°C, and when I deactivate all cooling devices (echo 0 > ...) the reported temperature drops back to room temperature.
In my understanding this should not happen. Can it be that this confuses the kernel and causes my fan problem?
Comment 28 Matthias 2013-04-13 13:20:04 UTC
Oh, I forgot to mention that I switched back to the step_wise governor and did the testing (echo 1 > ... and echo 0 > ...). Sorry!
Comment 29 Zhang Rui 2013-04-13 17:29:18 UTC
Okay. Now we have two bugs in this bug report.
1. The original bug report: the fan runs at full speed but the ACPI fan shows it is OFF. This bug can only be reproduced with the nvidia binary blob enabled.
So there are two things that are controlling the fan, ACPI and nvidia.
For this problem, we will not continue debugging because we cannot help with problems that may be caused by a binary driver.
2. The dethrottle issue in 3.8 and 3.9-rc6. For this problem, please file a new bug report against the Power Management/Thermal category, and I'll look at the problem there. I'm a little confused by comment #27.
So please specify which tests you ran with which governors, and what results you got, in that bug report.

Bug closed.
Comment 30 Matthias 2013-04-14 10:09:32 UTC
To 1.: I am running a non tainted kernel now. So there is no nvidia binary blob in this game anymore. I switched to nouveau to help debug this thing. See comment #18. And it is the same problem. The fan runs full speed but ACPI fan shows it is OFF. 

As for comment #27, I am sorry I was not clear. I am running linux-3.9-rc6 with the step_wise governor.

I did the following to get the fan running at full speed: 
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:00/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:01/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:02/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:03/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:04/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:05/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:06/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:07/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:08/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:09/thermal_cooling/cur_state
echo 1 > /sys/bus/acpi/drivers/fan/PNP0C0B\:0a/thermal_cooling/cur_state

As a result, the reported temperature of thermal_zone5 jumps up to 100°C. This cannot be right, since there is no component giving off that much heat.

When I do the following to dethrottle the fan:

echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:00/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:01/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:02/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:03/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:04/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:05/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:06/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:07/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:08/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:09/thermal_cooling/cur_state
echo 0 > /sys/bus/acpi/drivers/fan/PNP0C0B\:0a/thermal_cooling/cur_state

the reported temperature of thermal_zone5 goes down to room temperature. 
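
The two command lists above could be collapsed into a loop. A minimal sketch, assuming the fan devices are exposed as PNP0C0B:* under /sys/bus/acpi/drivers/fan as in the commands above; set_all_fans is a hypothetical helper name, and writing to the real sysfs files requires root:

```shell
#!/bin/sh
# Sketch only: set_all_fans is an illustrative helper.

# set_all_fans STATE [ROOT]: write STATE (0 or 1) to the cur_state of every
# ACPI fan device found under ROOT.
set_all_fans() {
    state="$1"
    root="${2:-/sys/bus/acpi/drivers/fan}"
    for f in "$root"/PNP0C0B:*/thermal_cooling/cur_state; do
        # Skip the unexpanded glob and files we may not write to.
        [ -w "$f" ] || continue
        echo "$state" > "$f"
    done
    return 0
}
```

Possible usage: `set_all_fans 1; cat /sys/class/thermal/thermal_zone5/temp; set_all_fans 0; cat /sys/class/thermal/thermal_zone5/temp`. Note that the thermal governor may change the states again afterwards.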

I am a little confused by this behavior and I don't know if it is related to the problem that the fan is spinning full speed and ACPI is showing it as OFF. I am only reporting what I am seeing in good hope to give any hints to solve the ACPI problem. 

I tested linux-3.9-rc6 with fair_share governor. With this configuration the fan stays always at the speed selected by the bios after POST. Not to damage my hardware I activated the fan manually and this is when I saw the strange behavior of thermal_zone5. 

The problem with the dethrottling is only present in linux-3.9. linux-3.8 is not affected by this.

I will file another bug report for the dethrottling problem.

But how do we move on from now with the ACPI fan problem?
Comment 31 Zhang Rui 2013-04-15 00:45:44 UTC
(In reply to comment #30)
> To 1.: I am running a non tainted kernel now. So there is no nvidia binary blob
> in this game anymore. I switched to nouveau to help debug this thing. See
> comment #18. And it is the same problem. The fan runs full speed but ACPI fan
> shows it is OFF. 
> 
bug reopened
Comment 32 Zhang Rui 2013-04-15 00:51:08 UTC
please attach the output of
grep . /sys/class/thermal/*/device/path
ll /sys/class/thermal/t*/c*

please turn on the fans one by one and check which cooling device makes the temperature bogus.
Comment 33 Zhang Rui 2013-04-15 01:08:50 UTC
oh wait,
we are now starting to debug the bogus temperature problem in this bug report...
let's move this topic to bug #56601.

for the problem that fan is running when system is idle, the temperature is not bogus at all, right?

as you can not get accurate thermal information in the latest linux kernel, let's work together to fix bug #56601 first and then go back to this one, does this sound okay for you?
Comment 34 Matthias 2013-04-15 14:57:01 UTC
(In reply to comment #33)
> oh wait,
> we are now starting to debug the bogus temperature problem in this bug report...
> let's move this topic to bug #56601.

Sounds good to me.

> for the problem that fan is running when system is idle, the temperature is not
> bogus at all, right?

Yes it is. 

> as you can not get accurate thermal information in the latest linux kernel,
> let's work together to fix bug #56601 first and then go back to this one, does
> this sound okay for you?

Let's do that. I am happy with any help I can get.
Comment 35 Zhang Rui 2013-04-16 15:14:00 UTC
as you stated before, there are four fan speed steps you can hear, right?
please try to reproduce the problem with the fan dethrottle problem fixed, and tell me which fan speed you hear when the fan cooling device shows cooling state 0.
Comment 36 Matthias 2013-04-17 16:09:10 UTC
There are five different fan speeds I can hear. When the fan cooling device shows cooling state 0 there are two different speeds I can hear. Fan is spinning at lowest speed or at highest speed.
Comment 37 Matthias 2013-04-18 11:33:27 UTC
Just now it happened with linux-3.9-rc7 and the patch from bug 56601.
Comment 38 Matthias 2013-04-18 11:35:00 UTC
Created attachment 99201 [details]
Measurements for linux-3.9-rc7 at full speed of fan for comment 37
Comment 39 Zhang Rui 2013-04-22 05:20:16 UTC
please attach the output of
grep . /sys/class/thermal/thermal_zone*/cdev*/device/*
and
grep . /sys/class/thermal/thermal_zone*/cdev*/*
at the same time when the problem is reproduced again.
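
To get both listings from the same moment, something like the sketch below may help. It is only an illustration: the function name thermal_snapshot and its optional root argument are made up here, and it assumes a grep that supports -H (GNU grep does).

```shell
# Print a UTC timestamp followed by both listings, back to back, so they
# describe (nearly) the same instant.
thermal_snapshot() {
    root="${1:-/sys/class/thermal}"
    date -u '+%Y-%m-%d %H:%M:%S UTC'
    # -H forces the "file:" prefix even for a single match; stderr is dropped
    # because the globs also match directories and unreadable attributes.
    grep -H . "$root"/thermal_zone*/cdev*/device/* 2>/dev/null
    grep -H . "$root"/thermal_zone*/cdev*/* 2>/dev/null
    return 0
}
```

For example, "thermal_snapshot > fan-bug.txt" while the fan is spinning at full speed gives one file to attach.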
Comment 40 Zhang Rui 2013-04-22 05:21:33 UTC
I need to check if it is because ACPI reports wrong cooling device state first.
If it is not, this seems to be a tough bug because some other unknown stuff changes the fan speed behind ACPI.
Comment 41 Zhang Rui 2013-04-22 05:53:26 UTC
I checked your BIOS in detail, here is what I get,

Fan   POWER  PR._STA  PR._ON/_OFF
C33F  C334   C327     C328(on, 0x00, 0x00)
C340  C335   C327     C328(on, 0x00, 0x01)
C341  C336   ..       C328(on, 0x00, 0x02)
C342  C337   ..       C328(on, 0x00, 0x03)
C343  C338   ..       C328(on, 0x00, 0x04)
C344  C339   ..       C328(on, 0x01, 0x00)
C345  C33A   ..       C328(on, 0x01, 0x01)
C346  C33B   ..       C328(on, 0x01, 0x02)
C347  C33C   ..       C328(on, 0x01, 0x03)
C348  C33D   ..       C328(on, 0x01, 0x04)
C349  C33E   C326     will list below

it seems that fan C349 is quite different from the others.
here is the ASL code for Power resource C33E
        PowerResource (C33E, 0x00, 0x0000)
        {
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                Return (C326)
            }

            Method (_ON, 0, NotSerialized)  // _ON_: Power On
            {
                If (LAnd (LEqual (C326, 0x00), LEqual (C142, 0x00)))
                {
                    If (LGreaterEqual (C325, \_TZ.TZ2._AC0 ()))
                    {
                        \_SB.C149 (0xEA74, 0x03, 0x01, 0x00, 0x00)
                        Store (0x01, C142)
                        If (LEqual (\_SB.C003.C085.C130.C134 (), 0x10DE))
                        {
                            If (LGreaterEqual (\C009 (), 0x06))
                            {
                                Store (0x01, \_SB.C003.C085.C130.C139)
                                Notify (\_SB.C003.C085.C130, 0xCA)
                            }
                        }
                    }
                }

                Store (0x01, C326)
            }

            Method (_OFF, 0, NotSerialized)  // _OFF: Power Off
            {
                If (LAnd (C326, C142))
                {
                    If (LLess (C325, \_TZ.TZ2._AC0 ()))
                    {
                        \_SB.C149 (0xEA74, 0x03, 0x00, 0x00, 0x00)
                        Store (0x00, C142)
                        If (LEqual (\_SB.C003.C085.C130.C134 (), 0x10DE))
                        {
                            If (LGreaterEqual (\C009 (), 0x06))
                            {
                                Store (0x01, \_SB.C003.C085.C130.C139)
                                Notify (\_SB.C003.C085.C130, 0xCA)
                            }
                        }
                    }
                }

                Store (0x00, C326)
            }
        }
we can see that
1. the status of this power resource is a variable, it must be ON after evaluating _ON method and must be OFF after evaluating _OFF method.
2. there is a Notify (\_SB.C003.C085.C130, 0xCA) in _ON/_OFF methods, and C130 is nvidia vga controller. so this does have something to do with graphics.
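
To make point 1 concrete, here is a toy shell model of the _ON/_OFF logic above. It is a sketch under assumptions, not a confirmed cause of the bug: C325 is assumed to be a temperature value compared against \_TZ.TZ2._AC0, C142 is assumed to track what the hardware was last told through \_SB.C149, the value AC0=80 is an arbitrary example, and the nvidia Notify branch is left out.

```shell
# Toy model of PowerResource C33E.
# C326 = value returned by _STA
# C142 = assumed to track what the hardware was last told (1 = on)
# C325 = value compared against _AC0 (assumed to be a temperature)
C326=0; C142=0; C325=0; AC0=80   # AC0=80 is an arbitrary example value

c33e_on() {
    if [ "$C326" -eq 0 ] && [ "$C142" -eq 0 ] && [ "$C325" -ge "$AC0" ]; then
        C142=1      # stands in for \_SB.C149 (0xEA74, 0x03, 0x01, 0x00, 0x00)
    fi
    C326=1          # unconditional, as in the ASL
}

c33e_off() {
    if [ "$C326" -ne 0 ] && [ "$C142" -ne 0 ] && [ "$C325" -lt "$AC0" ]; then
        C142=0      # stands in for \_SB.C149 (0xEA74, 0x03, 0x00, 0x00, 0x00)
    fi
    C326=0          # unconditional, as in the ASL
}

c33e_sta() { echo "$C326"; }
```

In this model, calling c33e_on and then c33e_off while C325 stays at or above AC0 leaves C142 at 1 while c33e_sta prints 0. So the ASL as written allows the reported state and the last hardware request to differ.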
Comment 42 Zhang Rui 2013-04-22 05:54:13 UTC
please boot with acpi_osi="!Windows 2006" and see if the problem still exists.
Comment 43 Matthias 2013-04-22 06:05:15 UTC
I'll test that. 

(In reply to comment #40)
> I need to check if it is because ACPI reports wrong cooling device state first.
> If it is not, this seems to be a tough bug because some other unknown stuff
> changes the fan speed behind ACPI.

Normally the fastest fan speed is not reached on this machine not even in hot summers and on full load. It seems as if it is an emergency system for not damaging the hardware if it gets really really hot. Is there a piece of the kernel which could trigger this "emergency" system?
Comment 44 Zhang Rui 2013-04-23 02:15:27 UTC
matthias,
please do the test with the patch in https://bugzilla.kernel.org/show_bug.cgi?id=56591#c29
which is the final patch to fix the dethrottle problem in bug 56601.
Comment 45 Matthias 2013-04-23 09:07:50 UTC
Your final patch to fix the dethrottle problem works for me with linux-3.9-rc8. Thank you! 

As for the other problem, I will get back to you on that when it shows.
Comment 46 Zhang Rui 2013-04-23 09:32:55 UTC
(In reply to comment #45)
> Your final patch to fix the dethrottle problem works for me with linux-3.9-rc8.
> Thank you! 
> 
great.
thanks for the testing.

rename the title of this bug report, and let's focus on why the fan is on while ACPI shows it is off.
Comment 47 Zhang Rui 2013-04-23 09:39:50 UTC
(In reply to comment #43)
> I'll test that. 
> 
> (In reply to comment #40)
> > I need to check if it is because ACPI reports wrong cooling device state first.
> > If it is not, this seems to be a tough bug because some other unknown stuff
> > changes the fan speed behind ACPI.
> 
> Normally the fastest fan speed is not reached on this machine not even in hot
> summers and on full load.

about the "fastest fan speed", I assume that you mean STP4 that you can hear, right?
that's probably because the temperature never goes up to 91C/95C.

> It seems as if it is an emergency system for not
> damaging the hardware if it gets really really hot. Is there a piece of the
> kernel which could trigger this "emergency" system?

No, if the temperature goes above 91C/95C, cooling device 0 and cooling device 5 should be turned on automatically.

you can try to heat the system above 91C/95C and check whether the fan still does not run at the fastest speed. But be careful with such a test... :p
Comment 48 Matthias 2013-04-24 10:50:00 UTC
The fastest fan speed is indeed STP4. I tested a little and it turns out STP4 is reached when the reported temperature hits exactly 80°C (CORETEMP reports 81°C). Normally, after reaching 79°C, fan speed STP3 suffices to hold the temperature steady. I don't want to heat the machine up any further. It is my main workhorse and I don't want to damage the hardware. 
According to the Intel spec the CPU should not reach 100°C, so I prefer this 20°C "safe zone" over 91/95°C. 
You said that the temperature must hit 91/95°C to reach fan speed STP4. But then what turns the fan on at full speed at 80°C? Seems like we are getting somewhere...
Comment 49 Matthias 2013-05-02 07:02:47 UTC
Created attachment 100411 [details]
Findings for comment 39
Comment 50 Matthias 2013-05-02 19:24:48 UTC
Problem exists with acpi_osi="!Windows 2006" parameter, too.
Comment 51 Zhang Rui 2013-05-06 02:32:53 UTC
please attach the output of
grep . /sys/class/thermal/cooling_device*/device/path

when the problem happens again, please try
"echo 1 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
and then
"echo 0 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
can you still hear the fan spinning?
Comment 52 Matthias 2013-05-06 13:51:37 UTC
grep . /sys/class/thermal/cooling_device*/device/path
/sys/class/thermal/cooling_device0/device/path:\_TZ_.C33F
/sys/class/thermal/cooling_device10/device/path:\_TZ_.C349
/sys/class/thermal/cooling_device11/device/path:\_SB_.C003.C085.C130.C14C
/sys/class/thermal/cooling_device12/device/path:\_PR_.CPU0
/sys/class/thermal/cooling_device13/device/path:\_PR_.CPU1
/sys/class/thermal/cooling_device1/device/path:\_TZ_.C340
/sys/class/thermal/cooling_device2/device/path:\_TZ_.C341
/sys/class/thermal/cooling_device3/device/path:\_TZ_.C342
/sys/class/thermal/cooling_device4/device/path:\_TZ_.C343
/sys/class/thermal/cooling_device5/device/path:\_TZ_.C344
/sys/class/thermal/cooling_device6/device/path:\_TZ_.C345
/sys/class/thermal/cooling_device7/device/path:\_TZ_.C346
/sys/class/thermal/cooling_device8/device/path:\_TZ_.C347
/sys/class/thermal/cooling_device9/device/path:\_TZ_.C348

The rest I will provide as soon as the problem shows again.
Comment 53 Zhang Rui 2013-05-06 14:03:32 UTC
(In reply to comment #51)

> when the problem happens again, please try
> "echo 1 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
> and then
> "echo 0 > /sys/class/thermal/thermal_zone2/cdev0/cur_state"
> can you still hear the fan spinning?

or you can try
"echo 0 > /sys/class/thermal/cooling_device10/cur_state"
and
"echo 1 > /sys/class/thermal/cooling_device10/cur_state"
which I think is the same thing.
Comment 54 Matthias 2013-05-07 09:49:55 UTC
I did some measurements and found the following out using the bogus thermal_zone which reports the fan speed in %. Activating the cooling_devices results in the following speeds:

cooling_device0 -> 20%
cooling_device1 -> 70%
cooling_device2 -> 60%
cooling_device3 -> 40%
cooling_device4 -> 25%
cooling_device5 -> 100%
cooling_device6 -> 70%
cooling_device7 -> 60%
cooling_device8 -> 40%
cooling_device9 -> 25%
cooling_device10 -> 20%
cooling_device11 -> 20%
cooling_device12 -> 20%

When the bug shows the bogus thermal_zone shows 20% fan speed but the fan spins full speed.

OK. I tried these commands when the bug showed:

echo 1 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
Result: No change in fan speed

echo 0 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
Result: No change in fan speed

echo 0 > /sys/class/thermal/cooling_device10/cur_state
Result: No change in fan speed

echo 1 > /sys/class/thermal/cooling_device10/cur_state
Result: No change in fan speed

echo 1 > /sys/class/thermal/cooling_device1/cur_state
Result: Fan speed slows down to 20% and then speeds up to 70%

echo 0 > /sys/class/thermal/cooling_device1/cur_state
Result: Fan speed goes down to 20%

After roughly five minutes of silence the bug showed again. So I tested further commands:

echo 1 > /sys/class/thermal/cooling_device2/cur_state
Result: Fan speed goes down to 20% and then up to 60%

echo 0 > /sys/class/thermal/cooling_device2/cur_state
Result: Fan speed goes down to 20%

Well the bug showed again:

echo 1 > /sys/class/thermal/cooling_device3/cur_state
Result: Fan speed goes down to 20% and then up to 40%

echo 0 > /sys/class/thermal/cooling_device3/cur_state
Result: Fan speed goes down to 20%

Same game again:

echo 1 > /sys/class/thermal/cooling_device4/cur_state
Result: Fan speed goes down to 20% and then up to 25%

echo 0 > /sys/class/thermal/cooling_device4/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device5/cur_state
Result: Fan speed slows down to 20% and then speeds up to 100%

echo 0 > /sys/class/thermal/cooling_device5/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device6/cur_state
Result: Fan speed slows down to 20% and then speeds up to 70%

echo 0 > /sys/class/thermal/cooling_device6/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device7/cur_state
Result: Fan speed slows down to 20% and then speeds up to 60%

echo 0 > /sys/class/thermal/cooling_device7/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device8/cur_state
Result: Fan speed slows down to 20% and then speeds up to 40%

echo 0 > /sys/class/thermal/cooling_device8/cur_state
Result: Fan speed goes down to 20%

And again:

echo 1 > /sys/class/thermal/cooling_device9/cur_state
Result: Fan speed slows down to 20%

echo 0 > /sys/class/thermal/cooling_device9/cur_state
Result: no change 

And again:

echo 1 > /sys/class/thermal/cooling_device11/cur_state
Result: no change

echo 0 > /sys/class/thermal/cooling_device11/cur_state
Result: no change 

And again:

echo 1 > /sys/class/thermal/cooling_device12/cur_state
Result: no change

echo 0 > /sys/class/thermal/cooling_device12/cur_state
Result: no change 

It seems that there are six different fan speeds. I am sorry for reporting that wrong but I did not hear the differences. I hope you can work with these measurements.
Once the bug shows up it repeats itself quite often when you play with the settings of the cooling_devices.
Comment 55 Matthias 2013-05-07 09:53:32 UTC
(In reply to comment #54)
> And again:
> 
> echo 1 > /sys/class/thermal/cooling_device9/cur_state
> Result: Fan speed slows down to 20%

Made a little error: this should be
 Result: Fan speed slows down to 20% and then up to 25%. I verified that just now.
Comment 56 Zhang Rui 2013-05-13 01:28:11 UTC
Just to make sure, the problem can only be reproduced when  the nouveau driver is loaded, right?
Comment 57 Matthias 2013-05-13 06:35:18 UTC
I will test if the problem shows when no graphic driver is loaded. So far I have only tested with a graphical environment.
Comment 58 Zhang Rui 2013-05-15 07:21:54 UTC
as far as I can see from the acpidump, no other OS code will change the ACPI fan state, so IMO it is the BIOS that changes the fan speed.
so it would be nice if you could check the BIOS setup to see whether there are any fan-related options.
Comment 59 Matthias 2013-05-17 13:56:12 UTC
OK, there is an option to turn the fan off when connected to the ac adapter. This activates the same fan configuration as if the laptop were running from battery power. The bug also shows with this option. Otherwise there is no fan-related setting in the BIOS. 

If the BIOS changes the fan speed, why does this behavior not occur when running Windows (XP and 7 Pro tested) and with <=linux-2.6.31? 

Does it suffice to rmmod the nouveau module when the bug shows and see what happens? I let the laptop run for a while without the nouveau (and any other graphics driver) but so far the bug did not show. But this does not indicate that it does not happen. I need more testing time for this.
Comment 60 Zhang Rui 2013-05-20 03:00:38 UTC
(In reply to comment #59)
> OK, there is an option to turn the fan off when connected to the ac adapter. This
> activates the same fan configuration as if the laptop is running from battery
> power. With this the bug shows also. Otherwise there is no fan related setting
> in BIOS. 
> 
bad news.

> If the BIOS changes the fan speed, why does this behavior not occur when
> running Windows (XP and 7 Pro tested) and with <=linux-2.6.31? 
> 
good question.
hmm, you can still run a 2.6.31 kernel without this problem, right?

> Does it suffice to rmmod the nouveau module when the bug shows and see what
> happens? I let the laptop run for a while without the nouveau (and any other
> graphics driver) but so far the bug did not show. But this does not indicate
> that it does not happen. I need more testing time for this.

okay.
If the problem cannot be reproduced without the nouveau driver, this suggests that the graphics driver changes the fan speed without ACPI's awareness. It would also explain why this is a regression: some code or new functionality introduced in the nouveau driver touches the fan speed.

I'll reassign to the graphics people to see if they can find something interesting,
because from ACPI's point of view we can really do nothing here.
Comment 61 Matthias 2013-05-22 13:17:05 UTC
At the moment I am traveling and I don't have the machine with me. I can try to run the old 2.6.31 kernel. This kernel was the last one without the problem, but nouveau was not usable on this kernel. So back then I used the nvidia binary blob. I tested the same graphics drivers on 2.6.31 and 2.6.32. 2.6.32 had the problem regardless of the nvidia graphics driver. 

At the moment I can't tell you if the problem only occurs when a graphics driver is loaded. I have to test it further because sometimes the bug really shows. The longest time between two occurrences of the bug was one and a half weeks. At that time I thought the bug was solved, but it wasn't. So when I am back home I will leave the machine running for a long time without the nouveau driver to see whether it shows or not. 

Thanks for helping!
Comment 62 Zhang Rui 2013-06-03 05:05:24 UTC
(In reply to comment #61)
> At the moment I am traveling and I don't have the machine with me. I can try to
> run the old 2.6.31 kernel. This kernel was the last one without the problem but
> nouveau was not usable on this kernel. So back then I used the nvidia
> binary blob. I tested the same graphic drivers on 2.6.31 and 2.6.32. 2.6.32 had
> the problem regardless of the nvidia graphic driver.

Right, the in-kernel nvidia driver (nouveau) first shipped in 2.6.33.

so is it possible that you run 2.6.31/32 kernel without the nvidia binary blob, say in text mode?

> 
> At the moment I can't tell you if the problem only occurs when a graphic driver
> is loaded. I have to test it further because sometimes the bug really shows.
> The longest time between the bug showing was one and a half week. That time I
> thought the bug was solved but it wasn't. So I will leave the machine running
> for a long time without the nouveau driver to see if it shows or not when I am
> back home. 

great. please check if the problem can be reproduced without the Nvidia driver. this is important, as I can do nothing if the fan state is changed beyond ACPI's scope; in that case we need help from the graphics experts.
Comment 63 Zhang Rui 2013-06-14 07:54:01 UTC
Matthias, any update?
Comment 64 Al 2013-06-17 01:07:34 UTC
Hello everyone,

I have an HP nw8240 and I believe that I'm experiencing the same bug with 3.9.4, the difference being that on my machine there are 5 available speeds, from 0 rpm (quiet) up to what I believe to be a 4th speed (although I've never reached it). First of all, here are a few outputs to understand the configuration:
grep . /sys/class/thermal/*/device/path
/sys/class/thermal/cooling_device0/device/path:\_TZ_.C255
/sys/class/thermal/cooling_device1/device/path:\_TZ_.C256
/sys/class/thermal/cooling_device2/device/path:\_TZ_.C257
/sys/class/thermal/cooling_device3/device/path:\_TZ_.C258
/sys/class/thermal/cooling_device4/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/device/path:\_TZ_.TZ1_
/sys/class/thermal/thermal_zone1/device/path:\_TZ_.TZ2_
/sys/class/thermal/thermal_zone2/device/path:\_TZ_.TZ3_
/sys/class/thermal/thermal_zone3/device/path:\_TZ_.TZ4_

grep . /sys/class/thermal/thermal_zone*/cdev*/device/*
/sys/class/thermal/thermal_zone0/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone0/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev1/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev1/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/path:\_TZ_.C258
/sys/class/thermal/thermal_zone0/cdev1/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev1/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/uid:3
/sys/class/thermal/thermal_zone0/cdev2/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev2/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/path:\_TZ_.C257
/sys/class/thermal/thermal_zone0/cdev2/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev2/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/uid:2
/sys/class/thermal/thermal_zone0/cdev3/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev3/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/path:\_TZ_.C256
/sys/class/thermal/thermal_zone0/cdev3/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/uid:1
/sys/class/thermal/thermal_zone0/cdev4/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev4/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/path:\_TZ_.C255
/sys/class/thermal/thermal_zone0/cdev4/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/uid:0
/sys/class/thermal/thermal_zone2/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone2/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone3/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:

grep . /sys/class/thermal/thermal_zone*/*
/sys/class/thermal/thermal_zone0/cdev0_trip_point:1
/sys/class/thermal/thermal_zone0/cdev1_trip_point:5
/sys/class/thermal/thermal_zone0/cdev2_trip_point:4
/sys/class/thermal/thermal_zone0/cdev3_trip_point:3
/sys/class/thermal/thermal_zone0/cdev4_trip_point:2
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:45000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:100000
/sys/class/thermal/thermal_zone0/trip_point_1_type:passive
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:70000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:60000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:50000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/passive:0
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:50000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/policy:step_wise
/sys/class/thermal/thermal_zone2/temp:38200
/sys/class/thermal/thermal_zone2/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_1_type:passive
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/policy:step_wise
/sys/class/thermal/thermal_zone3/temp:0
/sys/class/thermal/thermal_zone3/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:110000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
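
The trip_point_N_temp values above are in millidegrees Celsius, so 105000 means 105 degrees. A small helper along these lines prints one zone's trip points in whole degrees; it is only a sketch, and the name list_trip_points is made up here.

```shell
# Print "trip_point_N <type> <degrees C>" for every trip point of one zone.
# The argument is the zone directory, e.g. /sys/class/thermal/thermal_zone0.
list_trip_points() {
    zone_dir="$1"
    for t in "$zone_dir"/trip_point_*_temp; do
        [ -e "$t" ] || continue                 # zone without trip points
        base="${t%_temp}"                       # .../trip_point_N
        milli=$(cat "$t")
        # Integer division: sysfs reports millidegrees Celsius.
        printf '%s %s %s\n' "$(basename "$base")" "$(cat "${base}_type")" "$((milli / 1000))"
    done
}
```

Called as "list_trip_points /sys/class/thermal/thermal_zone0", it lists the critical, passive and active trip points shown above, one per line.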

This is what is supposed to happen (I'm simplifying, but I believe it is enough to isolate where the bug lies):
- when the TZ0 temp goes > 50, the CD3 fan is turned on at the lowest available speed and CD3 cur_state changes from 0 to 1. The fan runs until the TZ0 temp goes < 45, when the CD3 fan is turned off and CD3 cur_state changes from 1 to 0. All the other fan speeds are supposed to kick in once the corresponding trip points are reached, but since I'm using an 'undervolted' cpu I rarely reach the 2nd speed, much less any higher speeds, even with a relatively high cpu load. @Matthias you might want to look into the linux-phc option as well. 
So this is what it looks like when the fan is spinning, after TZ0 went above 50 and until it manages to go under 45.
grep . /sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state
/sys/class/thermal/thermal_zone0/temp:46000
/sys/class/thermal/thermal_zone1/temp:51000
/sys/class/thermal/thermal_zone2/temp:38100
/sys/class/thermal/thermal_zone3/temp:40000
/sys/devices/virtual/thermal/cooling_device0/cur_state:0
/sys/devices/virtual/thermal/cooling_device1/cur_state:0
/sys/devices/virtual/thermal/cooling_device2/cur_state:0
/sys/devices/virtual/thermal/cooling_device3/cur_state:1
/sys/devices/virtual/thermal/cooling_device4/cur_state:0

And this is what it looks like when the fan is off, after TZ0 went below 45 and before it goes above 50 again.
grep . /sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state
/sys/class/thermal/thermal_zone0/temp:47000
/sys/class/thermal/thermal_zone1/temp:51000
/sys/class/thermal/thermal_zone2/temp:38100
/sys/class/thermal/thermal_zone3/temp:0
/sys/devices/virtual/thermal/cooling_device0/cur_state:0
/sys/devices/virtual/thermal/cooling_device1/cur_state:0
/sys/devices/virtual/thermal/cooling_device2/cur_state:0
/sys/devices/virtual/thermal/cooling_device3/cur_state:0
/sys/devices/virtual/thermal/cooling_device4/cur_state:0

Now when the bug occurs, the CD3 fan continues to spin (at the same lowest speed) even though TZ0 went below 45 and the cur_state of CD3 has already changed from 1 to 0. Here's what the output looks like:
grep . /sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state
/sys/class/thermal/thermal_zone0/temp:39000
/sys/class/thermal/thermal_zone1/temp:51000
/sys/class/thermal/thermal_zone2/temp:38100
/sys/class/thermal/thermal_zone3/temp:40000
/sys/devices/virtual/thermal/cooling_device0/cur_state:0
/sys/devices/virtual/thermal/cooling_device1/cur_state:0
/sys/devices/virtual/thermal/cooling_device2/cur_state:0
/sys/devices/virtual/thermal/cooling_device3/cur_state:0
/sys/devices/virtual/thermal/cooling_device4/cur_state:0

I have not found what is causing the bug to kick in, but I don't think it has anything to do with the nouveau video driver, because in my case I'm using the radeon open source driver and even with a lowered gpu clock through sysfs to 1/3 of the maximum speed (TZ1= gpu temp between 48-51, well below 58-61 when the gpu is running at full speed) the bug continues to randomly show up, sometimes multiple times during a single day, other times once every few days etc.
What I'm doing is keeping an eye on TZs and CDs with 
watch -n1 grep . /sys/class/thermal/thermal_zone*/temp /sys/devices/virtual/thermal/cooling_device*/cur_state
and when the bug shows I have to do a
echo 1 > /sys/class/thermal/cooling_device3/cur_state
echo 0 > /sys/class/thermal/cooling_device3/cur_state
to turn off the fan.
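
The two writes above can be wrapped in a small helper. This is only a sketch: the name kick_cooling_device, the optional root argument and the KICK_DELAY variable are made up here, and the pause between the two writes is an assumption (the commands above do not use one).

```shell
# Toggle one cooling device on and then off again to bring the real fan
# back in line with the reported state.
# $1 = cooling device number, $2 = optional thermal class root.
kick_cooling_device() {
    cdev="$1"
    root="${2:-/sys/class/thermal}"
    state_file="$root/cooling_device$cdev/cur_state"
    echo 1 > "$state_file" || return 1
    sleep "${KICK_DELAY:-1}"        # assumed pause, in seconds
    echo 0 > "$state_file"
}
```

Run as root, "kick_cooling_device 3" corresponds to the two echo commands above.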
Also another thing that I observed is that TZ3 temp has only 2 states either 40 when the fan is on or 0 when the fan is off, but while from 0 to 40 it goes in one step once the fan has been turned on, from 40 it takes a few successive steps 40->36->32->27->16->0 in a matter of 3-4 seconds to reach 0 when the fan has been turned off. The system has only one physical fan. Btw, I also used to run 2.6.31 before I switched to 3.9.4 with uptimes sometimes in excess of 6 months and daily hibernates/resumes without any acpi glitches. 
If you need any additional information please let me know as I would also like to see this bug fixed.
Comment 65 Matthias 2013-06-18 14:52:14 UTC
Hello Zhang,

sorry for my late response. I was traveling. 

I think we need a graphics expert on this. I had the machine running for a long while without the nouveau driver and the fan bug has not occurred since. I even tried to recreate some of my normal daily workloads. 

As for the 2.6.31/32 kernels: I haven't tried to run them yet. I suspect it to be difficult but I will give it a try. If my system won't run with these kernels, can I boot with an old live CD? Would this help?

Greetings

Matthias
Comment 66 Matthias 2013-06-18 15:11:17 UTC
Well, here are my thoughts on the different graphic drivers:

I would expect nouveau and the nvidia binary blob to behave in somewhat similar ways.
Thanks to Al we know that the problem can also be reproduced with the radeon driver enabled. 
@Al: Does your machine show the bug with the radeon driver not loaded? It would be great if you can confirm my findings. I would not like to send somebody on a wild goose chase. 

Since both these machines share one cooling system between CPU and GPU it would make sense to change the fan speed from the graphics driver to prevent the GPU from any heat damage. But I never saw any GPU temperatures justifying to turn the fan on full speed.
Comment 67 Matthias 2013-06-19 06:19:31 UTC
I have sad news. Today the bug occurred without a graphics driver loaded. Now we know that it happens less often without a graphics driver. Good thing I left the machine running, I guess.

Sorry!
Comment 68 Al 2013-06-20 04:44:28 UTC
That's actually good news. If the source of the bug is confined to the acpi subsystem, figuring out what it is and eventually fixing it should happen sooner.
There are a few other thoughts that I would like to add. The main problem I see with this bug is that afaict, none of us knows how or what's actually setting it off, so “reproducing it” really means just having the system up for long enough for it to happen. For example I'm running the same instance for maybe 6 days since the last hibernate/resume cycle and the 'bug' only happened once in the 1st day, maybe 4 or 5 times on the 2nd day, but not once for the next 3 days and again today when it happened once, while the system has gone through the same daily usage routine +/- other tasks. 
I've also played a little bit with the CDs just to see how the system behaves when an external source interferes with the usual cycles. More precisely, while the fan was still on after TZ0 had hit the > 50 trip point and the temperature was slowly decreasing toward < 45, I echoed a '0' to CD3 when the temperature was around 46. That turned off the fan manually (also interrupting the usual cycle that would have turned it off once TZ0 had gone < 45). What happened next is that once the TZ0 temperature started to rise again and reached the 1st trip point at > 50, the CD3 fan was not turned on, so the TZ0 temperature continued to rise (TZ1 also started to rise slowly without any additional load, which is absolutely normal considering that there is only one physical fan and the cpu and gpu share the same heat sink/pipe). Only after TZ0 had reached the > 60 trip point was the fan turned on at the 2nd speed, which in turn changed the cur_state of both CD2 and CD3 from 0 to 1. Then the TZ0 temperature (and TZ1 of course, but I don't think that this interacts with any CDs as long as it stays <= 60) started to decrease, and once it reached 53 for TZ0 the fan changed to the 1st (lower) speed, also changing the state of CD2 from 1 to 0 and still keeping CD3 at 1 until < 45, when the fan was turned off and CD3 was changed to 0 too. Is this the expected behaviour or is this another bug (once you interfere with the CDs' state, the automated action is skipped at the next trip point and reverts to the default at the 2nd)? Also, is it normal to have more than one CD marked as active while obviously a single physical fan can't spin at 2 different speeds at the same time?
I know one thing: manually activating/deactivating (echo 1/0 > …) one CD at a time does not change the state of the others. So for example, if I issue an echo 0 to CD0, that turns on the fan at its maximum speed, but none of the other CDs turn from 0 to 1.
Another observation: once the bug has shown up (TZ0 < 45, the fan spinning at its 1st speed, while CD3 cur_state shows that the fan is off; in fact, all CDs' cur_state values show that the fan is off, but in my case CD3 is the one that takes care of turning it off and on at its 1st speed), echo 0 does not do anything if you want to turn the fan off; you need to do an echo 1 first and then an echo 0 to turn it off.
I'm hoping that for somebody familiar with the code and logic behind the acpi kernel subsystem (especially TZs and CDs) this additional information will help to further refine the search for a solution. Does anybody know what has changed since 2.6.31 in this respect?
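For anyone who wants to try the "echo 1 first, then echo 0" sequence described above, a minimal shell sketch might look like the following. It assumes the cooling devices are exposed under /sys/class/thermal, as in the listings elsewhere in this report. The names THERMAL_BASE, RESET_DELAY and reset_cdev are made up for this example, and on real hardware it would have to run as root.

```shell
#!/bin/sh
# Sketch only: toggle one cooling device on and then off again.
# THERMAL_BASE, RESET_DELAY and reset_cdev are hypothetical names.
THERMAL_BASE="${THERMAL_BASE:-/sys/class/thermal}"

reset_cdev() {
    # $1 = cooling device index, e.g. 3 for cooling_device3
    state_file="$THERMAL_BASE/cooling_device$1/cur_state"
    if [ ! -w "$state_file" ]; then
        echo "cannot write $state_file" >&2
        return 1
    fi
    echo 1 > "$state_file"       # force the device on first ...
    sleep "${RESET_DELAY:-1}"    # ... wait a moment (the delay is a guess) ...
    echo 0 > "$state_file"       # ... then switch it off again
}
```

Calling reset_cdev 3 would toggle cooling_device3. Whether this is safe to do from a cron job depends on the machine; it only papers over the symptom.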
Comment 69 Al 2013-06-21 18:35:50 UTC
Another odd thing happened today, and this time something has gone wrong in the CDs' activity in a direction that could also pose a risk to the hardware. The system has now been up for 7 days and today I found it with TZ0 at 54 and CD3 on (1). With the cpu locked at the lowest clock speed I have never seen TZ0 go over 51, because once CD3 is turned on at > 50 it immediately pulls the temperature below 50. Since the ambient temperature was noticeably lower this past night (maybe 3-4 degrees lower than usual for this time of the year), there wasn't any significant load on it (avg ~ 10% with isolated short peaks up to 20%) and the cpu was locked at the lowest speed for the entire time I was not actively using it, there can be only one explanation for this: the CD3 fan failed to be turned on after the > 50 trip point was met, and quite certainly it was only turned on at > 60, probably using CD2 (2nd fan speed). At the time when I caught it, it was on its way down, still within the range where CD3 (1st speed) is supposed to be active, but after TZ0 had reached the < 53 trip point that had turned CD2 off and switched to CD3. I just ran a few tests to see if this assumption is plausible, and indeed, even with a constant 100% load at the lowest cpu speed, once CD3 is turned on (after the brief moment when TZ0 > 50 triggers it) TZ0 is instantly pulled below 50, and within 25 seconds TZ0 is < 45 again. In fact, with a constant 100% load and the cpu clock locked at its 2nd fastest speed (there are 6 available cpu speeds) I can't push TZ0 above 53 once CD3 has been turned on. A constant 100% load at the maximum cpu speed while CD3 is on caps the maximum TZ0 temperature at 58, so in the current conditions it is virtually impossible for CD2 to be turned on automatically unless something has failed in the automated cycles that the CDs are supposed to follow.
The fact that TZ1 (gpu) was also around 52-53 seems to confirm this scenario, since in the absence of any load on the gpu, and assuming that everything works normally with the cooling devices, TZ1 stays around 49-50 degrees.
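A missed trip point like this one is easier to confirm after the fact with a periodic log of the zone temperature and the fan states. The sketch below is one way to collect that, assuming the usual /sys/class/thermal layout with temperatures in millidegrees; THERMAL_BASE and thermal_snapshot are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch only: print one log line with the zone temperature and fan states.
# THERMAL_BASE and thermal_snapshot are hypothetical names.
THERMAL_BASE="${THERMAL_BASE:-/sys/class/thermal}"

thermal_snapshot() {
    # $1 = thermal zone index (defaults to 0)
    zone="$THERMAL_BASE/thermal_zone${1:-0}"
    millideg=$(cat "$zone/temp") || return 1
    # sysfs reports millidegrees Celsius; integer division drops the fraction
    line="$(date '+%Y-%m-%d %H:%M:%S') temp=$((millideg / 1000))"
    for dev in "$THERMAL_BASE"/cooling_device*; do
        [ -r "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = "Fan" ] || continue
        # append e.g. "cooling_device3=1" for every fan-type device
        line="$line ${dev##*/}=$(cat "$dev/cur_state")"
    done
    echo "$line"
}
```

Something like `while sleep 10; do thermal_snapshot 0; done >> /tmp/thermal.log` would then record whether a fan state changed when TZ0 crossed 50.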
Comment 70 Matthias 2013-06-24 12:49:30 UTC
Zhang, can you tell us which information you need from the old kernels?
Comment 71 Matthias 2013-06-30 14:22:50 UTC
linux-3.10-rc7 is also affected.
Comment 72 Al 2013-07-04 03:00:52 UTC
Is anybody still looking into this? Do you need any additional information?
@Matthias – until somebody comes up with a solution, a user space workaround is probably the only option right now. I'm at least using a cron script to deal with it.
Comment 73 Matthias 2013-07-04 06:28:53 UTC
@Al: Well, I reboot or suspend2ram. This makes the problem go away temporarily. Problem is when the fan bug occurs and the CPU gets hot the fan speed drops until 80°C is reached. That is not a nice thing. 
I used to echo 1 && echo 0 to all the cooling devices. This drops fan speed to normal levels. But this helps only for a very short period of time. Then the fan spins up to full speed again. This puts unnecessary stress on the fan motor.

It is good that you reported this problem also. Perhaps we can find somebody else with the same bug to report this. More persons "can cover more ground" on testing this.    

I can't run 2.6.31 as this kernel is too old for my up to date gentoo. I can't even compile this thing anymore. I have downloaded SystemRescueCD which contains this kernel to test it. As SystemRescueCD is based on gentoo, I think it is a good point to start debugging this thing.
Comment 74 Matthias 2013-07-10 08:27:48 UTC
Bug shows in linux-3.10.0 too.
Comment 75 Matthias 2013-07-23 14:48:13 UTC
I can't test linux-3.11 because of bug 60568. It takes ages for the bug to show without nouveau loaded. 

@Al can you test that kernel to see if it has the same bug?
Comment 76 Zhang Rui 2013-08-15 07:39:03 UTC
TBH, I think I'm stuck on this issue. I've run out of ideas about why this happens.

I think the reason why it is "easier" to reproduce the problem with graphics driver loaded is because a working GPU may heat the system more often.
Comment 77 Zhang Rui 2013-08-15 07:52:33 UTC
Oh, btw, as it is really hard to reproduce this bug, is it possible that this is not a regression?
Say, the problem actually exists in old kernels like 2.6.31, but it is just very difficult to reproduce because there was no nouveau driver at that time.
Comment 78 Matthias 2013-08-17 09:37:02 UTC
Well, I have never experienced the bug when I was on 2.6.31. As this bug is independent of the graphics driver loaded (it happens with nouveau and the nvidia binary blob), I do not think it existed in 2.6.31. Al is experiencing the same bug with other graphics hardware. He is using a radeon.

If the logic did not change much between 2.6.31 and 2.6.32 to drive the fan, I would perhaps suspect it to be a timing issue. Can you tell me if I can do a test to see if this is the case?
Comment 79 Al 2013-08-18 21:56:34 UTC
It was definitely not present in 2.6.31. I used that for years (and still use it from time to time) and never had a single occurrence of this bug. Also, with 3.9.4 I'm running my system for days or sometimes weeks with the gpu clock locked at minimum speed so the heat dissipation is the lowest possible (for this gpu the default speed is the highest one and I don't think it can be modified without the proper driver), and the effects of the bug are present and show up just as often. I really don't think that there's any correlation between the video driver and this bug. I tend to believe it's the result of a change in the logic of the acpi code and/or a timing issue, like Matthias said. Apparently a lot of the proc related stuff was removed and/or moved to sys between 2.6.31 and 3.9.4. I don't know exactly when that happened because I jumped directly from 2.6.31 to 3.9.4, but again, a lot of the proc code doesn't exist anymore in 3.9.4, so is it possible that the logic of the code was not entirely preserved and some changes have been made between these 2 releases? Maybe somebody who's familiar with the code could take a look at the parts that change the status of the CDs and TZs and how these interact with each other: the order of changes, any potential race conditions, whether there's any feedback from the TZ or the CD once a change has been performed, and so on.
Another thing is that this bug occurs both ways: failing to actually stop the fan (claiming that the fan is stopped while it is on) AND also the other way, claiming that the fan was activated while the fan is in fact off (in which case the next trip point is where the situation gets corrected). Here is a capture of the system after it failed to activate the fan at the first trip point (see TZ3, which is 0 instead of 40, so the physical fan is actually off, while TZ0 shows a temp above the 1st trip point and CD3 claims that the fan is on):

grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:0
/sys/class/thermal/cooling_device4/max_state:10
/sys/class/thermal/cooling_device4/type:Processor
/sys/class/thermal/thermal_zone0/cdev0_trip_point:1
/sys/class/thermal/thermal_zone0/cdev1_trip_point:5
/sys/class/thermal/thermal_zone0/cdev2_trip_point:4
/sys/class/thermal/thermal_zone0/cdev3_trip_point:3
/sys/class/thermal/thermal_zone0/cdev4_trip_point:2
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:56000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:100000
/sys/class/thermal/thermal_zone0/trip_point_1_type:passive
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:70000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:60000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:45000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/passive:0
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:53000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/policy:step_wise
/sys/class/thermal/thermal_zone2/temp:38600
/sys/class/thermal/thermal_zone2/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_1_type:passive
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/policy:step_wise
/sys/class/thermal/thermal_zone3/temp:0
/sys/class/thermal/thermal_zone3/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:110000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
------------------------------------
grep . /sys/class/thermal/*/device/path
/sys/class/thermal/cooling_device0/device/path:\_TZ_.C255
/sys/class/thermal/cooling_device1/device/path:\_TZ_.C256
/sys/class/thermal/cooling_device2/device/path:\_TZ_.C257
/sys/class/thermal/cooling_device3/device/path:\_TZ_.C258
/sys/class/thermal/cooling_device4/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/device/path:\_TZ_.TZ1_
/sys/class/thermal/thermal_zone1/device/path:\_TZ_.TZ2_
/sys/class/thermal/thermal_zone2/device/path:\_TZ_.TZ3_
/sys/class/thermal/thermal_zone3/device/path:\_TZ_.TZ4_
------------------------------------
grep . /sys/class/thermal/thermal_zone*/cdev*/device/*
/sys/class/thermal/thermal_zone0/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone0/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev1/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev1/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/path:\_TZ_.C258
/sys/class/thermal/thermal_zone0/cdev1/device/power_state:D0
/sys/class/thermal/thermal_zone0/cdev1/device/real_power_state:D0
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/uid:3
/sys/class/thermal/thermal_zone0/cdev2/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev2/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/path:\_TZ_.C257
/sys/class/thermal/thermal_zone0/cdev2/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev2/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/uid:2
/sys/class/thermal/thermal_zone0/cdev3/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev3/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/path:\_TZ_.C256
/sys/class/thermal/thermal_zone0/cdev3/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/uid:1
/sys/class/thermal/thermal_zone0/cdev4/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev4/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/path:\_TZ_.C255
/sys/class/thermal/thermal_zone0/cdev4/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/uid:0
/sys/class/thermal/thermal_zone2/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone2/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone3/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
------------------------------------
grep . /sys/class/thermal/thermal_zone*/cdev*/*
/sys/class/thermal/thermal_zone0/cdev0/cur_state:0
/sys/class/thermal/thermal_zone0/cdev0/max_state:10
/sys/class/thermal/thermal_zone0/cdev0/type:Processor
/sys/class/thermal/thermal_zone0/cdev1/cur_state:1
/sys/class/thermal/thermal_zone0/cdev1/max_state:1
/sys/class/thermal/thermal_zone0/cdev1/type:Fan
/sys/class/thermal/thermal_zone0/cdev2/cur_state:0
/sys/class/thermal/thermal_zone0/cdev2/max_state:1
/sys/class/thermal/thermal_zone0/cdev2/type:Fan
/sys/class/thermal/thermal_zone0/cdev3/cur_state:0
/sys/class/thermal/thermal_zone0/cdev3/max_state:1
/sys/class/thermal/thermal_zone0/cdev3/type:Fan
/sys/class/thermal/thermal_zone0/cdev4/cur_state:0
/sys/class/thermal/thermal_zone0/cdev4/max_state:1
/sys/class/thermal/thermal_zone0/cdev4/type:Fan
/sys/class/thermal/thermal_zone2/cdev0/cur_state:0
/sys/class/thermal/thermal_zone2/cdev0/max_state:10
/sys/class/thermal/thermal_zone2/cdev0/type:Processor
/sys/class/thermal/thermal_zone3/cdev0/cur_state:0
/sys/class/thermal/thermal_zone3/cdev0/max_state:10
/sys/class/thermal/thermal_zone3/cdev0/type:Processor


And here is the log after the 2nd trip point was reached. Now you see both CD3 and CD2 as being on, while obviously there's only one physical active cooling device in the system, which cannot run at 2 different speeds at the same time. Also, TZ3 is 55, indicating that the physical fan is actually on and should be running at the 2nd lowest speed, which is exactly what is happening. Apparently TZ3 is 0 when the actual fan is off and the cpu temperature indicated by TZ0 is below the 1st trip point, 40 when the fan is running at its 1st speed and the cpu temperature as indicated by TZ0 is in the range covered by this speed, 55 when the fan is running at its 2nd speed and the system's temperature is in the range which should be covered by this speed, and so on. Basically, TZ3 is not an actual temperature sensor: it just jumps from 0 to 40 to 55 and so on, and then back to 55, then to 40 and finally to 0, depending on the actual state of the physical fan and/or the temperature range in which the cpu/system is, based on the fan speed which should cover that range, as indicated, I believe, by the trip_point_*_temp values for the current state of the system:

grep . /sys/class/thermal/*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:1
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:1
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:0
/sys/class/thermal/cooling_device4/max_state:10
/sys/class/thermal/cooling_device4/type:Processor
/sys/class/thermal/thermal_zone0/cdev0_trip_point:1
/sys/class/thermal/thermal_zone0/cdev1_trip_point:5
/sys/class/thermal/thermal_zone0/cdev2_trip_point:4
/sys/class/thermal/thermal_zone0/cdev3_trip_point:3
/sys/class/thermal/thermal_zone0/cdev4_trip_point:2
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:59000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:100000
/sys/class/thermal/thermal_zone0/trip_point_1_type:passive
/sys/class/thermal/thermal_zone0/trip_point_2_temp:85000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:70000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:55000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:45000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/passive:0
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:54000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/type:acpitz
/sys/class/thermal/thermal_zone2/cdev0_trip_point:1
/sys/class/thermal/thermal_zone2/mode:enabled
/sys/class/thermal/thermal_zone2/policy:step_wise
/sys/class/thermal/thermal_zone2/temp:38700
/sys/class/thermal/thermal_zone2/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone2/trip_point_0_type:critical
/sys/class/thermal/thermal_zone2/trip_point_1_temp:60000
/sys/class/thermal/thermal_zone2/trip_point_1_type:passive
/sys/class/thermal/thermal_zone2/type:acpitz
/sys/class/thermal/thermal_zone3/cdev0_trip_point:1
/sys/class/thermal/thermal_zone3/mode:enabled
/sys/class/thermal/thermal_zone3/policy:step_wise
/sys/class/thermal/thermal_zone3/temp:55000
/sys/class/thermal/thermal_zone3/trip_point_0_temp:110000
/sys/class/thermal/thermal_zone3/trip_point_0_type:critical
/sys/class/thermal/thermal_zone3/trip_point_1_temp:110000
/sys/class/thermal/thermal_zone3/trip_point_1_type:passive
/sys/class/thermal/thermal_zone3/type:acpitz
------------------------------------
grep . /sys/class/thermal/*/device/path
/sys/class/thermal/cooling_device0/device/path:\_TZ_.C255
/sys/class/thermal/cooling_device1/device/path:\_TZ_.C256
/sys/class/thermal/cooling_device2/device/path:\_TZ_.C257
/sys/class/thermal/cooling_device3/device/path:\_TZ_.C258
/sys/class/thermal/cooling_device4/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/device/path:\_TZ_.TZ1_
/sys/class/thermal/thermal_zone1/device/path:\_TZ_.TZ2_
/sys/class/thermal/thermal_zone2/device/path:\_TZ_.TZ3_
/sys/class/thermal/thermal_zone3/device/path:\_TZ_.TZ4_
------------------------------------
grep . /sys/class/thermal/thermal_zone*/cdev*/device/*
/sys/class/thermal/thermal_zone0/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone0/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone0/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone0/cdev1/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev1/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/path:\_TZ_.C258
/sys/class/thermal/thermal_zone0/cdev1/device/power_state:D0
/sys/class/thermal/thermal_zone0/cdev1/device/real_power_state:D0
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev1/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev1/device/uid:3
/sys/class/thermal/thermal_zone0/cdev2/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev2/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/path:\_TZ_.C257
/sys/class/thermal/thermal_zone0/cdev2/device/power_state:D0
/sys/class/thermal/thermal_zone0/cdev2/device/real_power_state:D0
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev2/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev2/device/uid:2
/sys/class/thermal/thermal_zone0/cdev3/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev3/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/path:\_TZ_.C256
/sys/class/thermal/thermal_zone0/cdev3/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev3/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev3/device/uid:1
/sys/class/thermal/thermal_zone0/cdev4/device/hid:PNP0C0B
/sys/class/thermal/thermal_zone0/cdev4/device/modalias:acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/path:\_TZ_.C255
/sys/class/thermal/thermal_zone0/cdev4/device/power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/real_power_state:D3cold
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:DRIVER=fan
/sys/class/thermal/thermal_zone0/cdev4/device/uevent:MODALIAS=acpi:PNP0C0B:
/sys/class/thermal/thermal_zone0/cdev4/device/uid:0
/sys/class/thermal/thermal_zone2/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone2/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone2/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/hid:LNXCPU
/sys/class/thermal/thermal_zone3/cdev0/device/modalias:acpi:LNXCPU:
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.C001
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:DRIVER=processor
/sys/class/thermal/thermal_zone3/cdev0/device/uevent:MODALIAS=acpi:LNXCPU:
------------------------------------
grep . /sys/class/thermal/thermal_zone*/cdev*/*
/sys/class/thermal/thermal_zone0/cdev0/cur_state:0
/sys/class/thermal/thermal_zone0/cdev0/max_state:10
/sys/class/thermal/thermal_zone0/cdev0/type:Processor
/sys/class/thermal/thermal_zone0/cdev1/cur_state:1
/sys/class/thermal/thermal_zone0/cdev1/max_state:1
/sys/class/thermal/thermal_zone0/cdev1/type:Fan
/sys/class/thermal/thermal_zone0/cdev2/cur_state:1
/sys/class/thermal/thermal_zone0/cdev2/max_state:1
/sys/class/thermal/thermal_zone0/cdev2/type:Fan
/sys/class/thermal/thermal_zone0/cdev3/cur_state:0
/sys/class/thermal/thermal_zone0/cdev3/max_state:1
/sys/class/thermal/thermal_zone0/cdev3/type:Fan
/sys/class/thermal/thermal_zone0/cdev4/cur_state:0
/sys/class/thermal/thermal_zone0/cdev4/max_state:1
/sys/class/thermal/thermal_zone0/cdev4/type:Fan
/sys/class/thermal/thermal_zone2/cdev0/cur_state:0
/sys/class/thermal/thermal_zone2/cdev0/max_state:10
/sys/class/thermal/thermal_zone2/cdev0/type:Processor
/sys/class/thermal/thermal_zone3/cdev0/cur_state:0
/sys/class/thermal/thermal_zone3/cdev0/max_state:10
/sys/class/thermal/thermal_zone3/cdev0/type:Processor
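The two captures above suggest a way to detect the mismatch from user space on this particular machine. The sketch below rests on the observation in this comment that thermal_zone3 reads 0 while the physical fan is off and a non-zero value while it runs; that is an assumption specific to this hardware, not a general property of ACPI thermal zones. THERMAL_BASE, any_fan_reported_on and fan_state_mismatch are hypothetical names.

```shell
#!/bin/sh
# Sketch only: compare what the fan cooling devices report with the
# "indicator" zone described above. All names here are hypothetical.
THERMAL_BASE="${THERMAL_BASE:-/sys/class/thermal}"

# Print 1 if any cooling device of type "Fan" has cur_state != 0, else 0.
any_fan_reported_on() {
    for dev in "$THERMAL_BASE"/cooling_device*; do
        [ -r "$dev/type" ] || continue
        [ "$(cat "$dev/type")" = "Fan" ] || continue
        if [ "$(cat "$dev/cur_state")" != "0" ]; then
            echo 1
            return 0
        fi
    done
    echo 0
}

# Exit status 0 when reported and indicated states disagree, 1 when they
# agree, 2 when the indicator zone cannot be read.
fan_state_mismatch() {
    # $1 = index of the indicator zone (3 on the machine in this comment)
    indicator=$(cat "$THERMAL_BASE/thermal_zone${1:-3}/temp") || return 2
    reported=$(any_fan_reported_on)
    if [ "$indicator" -eq 0 ] && [ "$reported" -eq 1 ]; then
        echo "fan reported on, indicator zone says off"
        return 0
    fi
    if [ "$indicator" -ne 0 ] && [ "$reported" -eq 0 ]; then
        echo "fan reported off, indicator zone says on"
        return 0
    fi
    return 1
}
```

With the values from the first capture (thermal_zone3 at 0, cooling_device3 cur_state 1) this reports a mismatch; with the second capture (thermal_zone3 at 55000, two fan devices on) it does not.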
Comment 80 Matthias 2013-08-23 09:54:58 UTC
I have made an additional observation. The more often the speed of the fan changes the faster the bug shows.

After suspend to ram the bug shows even faster. I assume the acpi system is reinitialized on wake up from suspend to ram. Does reinitialization happen at "normal" runtime? 

And Zhang you are right. The more heat there is the faster the bug shows. That is why it is so hard to reproduce the bug without a gpu driver.
Comment 81 Al 2013-10-14 19:06:59 UTC
Why was the regression status changed from yes to no? Is this supposed to be a feature now? To randomly indicate that the fan is on while it is actually off and vice versa? In addition to what was initially reported by Matthias, the bug 'works' both ways, as I explained in my previous messages. This is certainly a regression and a very bad one, because the average user is not even aware of it. The potential consequence is premature wear of the fan motor, because while the software layer indicates that it is off, the fan continues to run without any possibility of being interrupted unless the user intervenes. And in the other direction, when the software indicates that the fan is on while it is actually off, not only the cpu's temperature but the general temperature of the system rises, overstressing and thus certainly reducing the life of everything inside that computer.
I'm sorry if I may sound rude, but somebody f**** up the logic in the acpi subsystem in a very subtle way that now no one seems to be able to even get close to where it is, let alone come up with a solution.
Comment 82 Matthias 2013-11-10 16:00:28 UTC
As for the regression status: The fan did work with linux-2.6.31 and after the update to linux-2.6.32 it did not work. IMHO this is a regression.

What can we do? I will test the new linux-3.12 and report back.

So long...
Comment 83 Aaron Lu 2013-12-11 02:18:02 UTC
Doing a git bisect may be helpful since we know v2.6.31 works and v2.6.32 doesn't.
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/
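As a rough sketch of what such a bisect could look like: the helper below only wraps `git bisect start`, which takes the bad revision first and the good one second, plus an optional list of paths to restrict the search. It assumes a clone of the kernel tree that contains both tags; start_fan_bisect is a made-up name, and restricting to paths such as drivers/acpi is only a guess about where the change lives, not something established in this report.

```shell
#!/bin/sh
# Sketch only: start a bisect between a known good and a known bad revision.
# start_fan_bisect is a hypothetical helper name.
start_fan_bisect() {
    if [ "$#" -lt 2 ]; then
        echo "usage: start_fan_bisect GOOD BAD [PATH...]" >&2
        return 1
    fi
    good=$1
    bad=$2
    shift 2
    # git bisect start expects the bad revision first, then the good one;
    # anything after "--" limits the search to commits touching those paths.
    git bisect start "$bad" "$good" -- "$@"
}
```

From the top of the kernel tree, `start_fan_bisect v2.6.31 v2.6.32` would check out the first candidate. After building and testing it, `git bisect good` or `git bisect bad` moves on, `git bisect skip` passes over a kernel that does not build or boot, and `git bisect reset` ends the session.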
Comment 84 Matthias 2014-01-15 19:53:17 UTC
Today the bug hit me while I was running linux-3.12.7. 
I tried a git bisect once but one bisected kernel ate my filesystem. Today I can't even compile linux-2.6.31 anymore. Problem is I am on Gentoo which is a rolling release distribution. Rolling back is not done easily. I will try to install debian. Perhaps I can do the bisecting from there. 

Have a nice day...
Comment 85 Aaron Lu 2014-01-27 08:32:37 UTC
Any update on bisect?
Comment 86 Matthias 2014-01-30 09:45:11 UTC
I am bisecting now. The bug shows rarely.
So here is what I am doing. I changed the kernel config to support an initramfs. I boot and let the initramfs drop to a busybox shell. Modules are loaded as usual but I do not mount my rootfs. Now I have to sit and wait to see if the bug shows. This is going to take a while. I will get back to you with the full bisection log. Hopefully this will lead to a solution for the problem.
Comment 87 Matthias 2014-03-18 20:04:16 UTC
The bug seems to be introduced by a patch in the 2.6.32 series. At the moment I am testing if the bug exists in linux-2.6.32.7. My first bug report of this was wrong and I am sorry for that. It turns out gentoo did not name the kernel like it was named upstream. 2.6.32 in gentoo was actually 2.6.32.8 upstream. I checked that. 
I will do a bisect once I find the first bad release. I know for sure that 2.6.32.8 shows the bug. The bug in 2.6.32.8 showed after the second cold boot and 4:27 hours of uptime. 
This bug searching is quite time consuming. So I need some more time to find the first bad commit.

Have a nice day!
Comment 88 Zhang Rui 2014-06-03 02:49:58 UTC
Matthias,
any update about this problem?
Comment 89 Matthias 2014-07-02 06:12:53 UTC
I am still searching for the first bad one. I thought I had it but the bisect I did went wrong and gave me a powerpc driver patch as the bad patch.
Comment 90 Zhang Rui 2014-07-02 06:23:47 UTC
I know it would be hard but I think this is the only way to find the root cause.
Thanks for your effort, Matthias!
Comment 91 Matthias 2014-09-09 12:21:48 UTC
As I use my machine for my daily work, I am running, when not searching for the fan bug, linux-3.16.1. I have been running this kernel for 15 days and so far I have not experienced the fan bug. 
I am tempted to say the bug got fixed somewhere between linux-3.15.3 and linux-3.16.1. I will keep running linux-3.16.1 for another 15 days just to be sure.

I will keep investigating what caused the bug, as I want to make sure it does not return in any way.
Comment 92 Zhang Rui 2014-10-23 07:58:24 UTC
hi, matthias, any good news? :)
Comment 93 Matthias 2014-10-23 18:46:03 UTC
linux-3.16.1 has passed the test. No fan bug occurred during testing. Now I am running linux-3.16.6 and so far the bug is gone. Will try the 3.17 series in a few days.

This is good news, isn't it?
Comment 94 Zhang Rui 2014-10-24 02:34:14 UTC
yes, good news.
I will close this bug as the problem is gone in 3.16.
please feel free to re-open it if the problem comes back again in the latest upstream kernel.
