Bug 67101

Summary: weird fan control with 3.12, was ok in 3.9 - HP ProBook 6555b
Product: Power Management Reporter: Olaf Hering (olaf)
Component: Thermal Assignee: Zhang Rui (rui.zhang)
Status: CLOSED OBSOLETE    
Severity: normal CC: aaron.lu, manuelkrause
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.12 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg-3.12.5.txt
config-3.12.5.txt
config-diff-3.9-to3.12.txt
dmesg-3.11.10.txt
acpidump.txt
sys-thermal-3.11.10.txt
sys-thermal-3.12.5.txt
dmesg-3.15.9.txt
acpidump.txt
v3.19-fan.patch
bko67101-v4.0-rc4.txt
bko67101-v4.0-rc4.txt
bko67101-v4.0-rc4.txt
bko67101-start-fan-after-resume.txt
debug patch
bko67101-v4.0-rc4+.txt
bko67101-v4.0-rc5-syslog.txt
bko67101-v4.0-rc5.txt
bko67101-v4.0-rc5.txt

Description Olaf Hering 2013-12-16 12:51:22 UTC
Created attachment 118561 [details]
dmesg-3.12.5.txt

HP ProBook 6555b, openSUSE 11.4, GNOME desktop, 
AMD Phenom(tm) II N830 Triple-Core Processor 

With mainline kernel 3.9 the fan runs smoothly, just enough to provide a little air flow. It's almost, but not entirely, silent. If there is some load the fan speeds up after a while, then slows down again. Under really high load it runs fast and loud until the temp is below a certain value, then it slows down again. Or, if I'm out of luck, the load stays so high for so long that the system just shuts itself off. The latter can be avoided by forcing the "powersave" scaling_governor. All this continues to work after a suspend/resume cycle.

Now with 3.12 the fan behaves differently. During initial bootup the fan behaves just as with 3.9, everything is fine. After a suspend/resume the fan is either fully off or fully on. What I see (according to sensors(1)) is that while the fan is off the CPU temp slowly rises from 69° to about 76°. Then the fan starts at high speed. Now the temp slowly drops towards 69°. Once that is reached the fan goes off again, with the result that the temp slowly starts to rise again until the fan kicks in.

Since that fan noise is annoying it would be nice to know what is causing the difference between 3.9 and 3.12.

As a start I attach the 3.12 dmesg, .config and .config diff between 3.9 and 3.12. Since it's my workstation it will take some time to figure out when this started happening.
Comment 1 Olaf Hering 2013-12-16 12:52:03 UTC
Created attachment 118571 [details]
config-3.12.5.txt
Comment 2 Olaf Hering 2013-12-16 12:54:10 UTC
Created attachment 118581 [details]
config-diff-3.9-to3.12.txt

Differences in .config. Nothing there appears to cause this, unless I have an important option disabled which starts to matter after 3.9.
Comment 3 Olaf Hering 2013-12-16 14:16:20 UTC
Created attachment 118591 [details]
dmesg-3.11.10.txt

3.11 appears to behave like 3.9. I just did a suspend/resume cycle and the fans do not go up and down.
Comment 4 Aaron Lu 2013-12-17 07:36:43 UTC
Please attach acpidump:
# acpidump > acpidump.txt
Comment 5 Olaf Hering 2013-12-17 19:11:31 UTC
Created attachment 118811 [details]
acpidump.txt
Comment 6 Aaron Lu 2013-12-18 02:28:45 UTC
Please show me:
$ ls -l /sys/class/thermal/thermal_zone*/

And show the below cmd both before suspend and after resume:
$ cd /sys/class/thermal
$ grep . */* 2>/dev/null

It seems that some of the ACPI FAN device stops working after resume.
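
To make the before/after comparison easier, the two commands above can be wrapped into a small snapshot helper. This is only a sketch: the function name thermal_snapshot, the THERMAL_DIR variable and the /tmp file names are illustrative assumptions, not anything the kernel provides.

```shell
#!/bin/sh
# Sketch: dump every readable thermal sysfs attribute into a file so that
# the state before suspend and after resume can be compared with diff.
# THERMAL_DIR is an assumption; it defaults to the usual sysfs location.
THERMAL_DIR="${THERMAL_DIR:-/sys/class/thermal}"

thermal_snapshot() {
    # $1: output file. Uses the same "grep . */*" idiom as above, run in a
    # subshell so the caller's working directory is left untouched.
    ( cd "$THERMAL_DIR" && grep . */* 2>/dev/null ) > "$1"
}

# Usage (as root or as a normal user; reading these files needs no privileges):
#   thermal_snapshot /tmp/thermal-before.txt
#   ... suspend and resume ...
#   thermal_snapshot /tmp/thermal-after.txt
#   diff -u /tmp/thermal-before.txt /tmp/thermal-after.txt
```

Values such as thermal_zone*/temp change between any two reads, so only differences in cur_state and trip_point_*_temp are of interest here.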
Comment 7 Olaf Hering 2013-12-18 16:11:09 UTC
Created attachment 118921 [details]
sys-thermal-3.11.10.txt

3.11 output, after a few resume cycles. 3.12 will follow.
Comment 8 Olaf Hering 2013-12-18 16:43:20 UTC
Created attachment 118941 [details]
sys-thermal-3.12.5.txt

this is before suspend. After resume the difference is essentially:
...
cooling_device2/max_state:1
cooling_device2/type:Fan
[-cooling_device3/cur_state:1-]
{+cooling_device3/cur_state:0+}
cooling_device3/max_state:1
cooling_device3/type:Fan
[-cooling_device4/cur_state:1-]
{+cooling_device4/cur_state:0+}
cooling_device4/max_state:1
cooling_device4/type:Fan
...
thermal_zone0/policy:step_wise
[-thermal_zone0/temp:69000-]
{+thermal_zone0/temp:70600+}
thermal_zone0/trip_point_0_temp:105000
...
thermal_zone0/trip_point_3_type:active
[-thermal_zone0/trip_point_4_temp:74000-]
{+thermal_zone0/trip_point_4_temp:67000+}
thermal_zone0/trip_point_4_type:active
...
Comment 9 Aaron Lu 2013-12-19 06:23:51 UTC
When the temperature drops below 67, cooling_device4 should be kept on while cooling_device3 is turned off. It seems we didn't keep cooling_device4 on in that case; I'll need to check the code to see if there is something wrong. In the meantime, can you please test: when the temp drops below 67 and you feel that the FAN is off, does manually turning on cooling_device3 help? You can manually turn on the fan represented by cooling_device3 by:
# cd /sys/class/thermal/cooling_device3
# cat cur_state
// should be 0
# echo 1 > cur_state
Does the fan start now?


Another interesting thing is that the temp for trip point 4 changed after resume.
Comment 10 Olaf Hering 2014-01-18 13:55:00 UTC
I'm not sure what is supposed to happen with these cur_state files.

With 3.12 nothing changes if I write to cooling_device3.

Now I run 3.11 and I did:
for i in cooling_device*/c* ; do echo 0 > $i ; done

This did indeed lead to instant silence.
However, writing 1 or 3 to the files does not cause any fan to start. Instead the system slowly starts to heat up (according to sensors). Once temp reaches 80° the fan starts to run fast. That is not reflected in the cur_state files. Then temp drops again and fan stops.

Is the kernel supposed to actively drive the fans anyway via these files?
Comment 11 Olaf Hering 2014-01-18 16:12:14 UTC
Now after fresh boot into 3.11:
root@probook:/sys/class/thermal # head cooling_device*/c*
==> cooling_device0/cur_state <==
0

==> cooling_device1/cur_state <==
0

==> cooling_device2/cur_state <==
1

==> cooling_device3/cur_state <==
1

==> cooling_device4/cur_state <==
1

==> cooling_device5/cur_state <==
0

==> cooling_device6/cur_state <==
0

==> cooling_device7/cur_state <==
0
root@probook:/sys/class/thermal # echo 0 > cooling_device3/cur_state
root@probook:/sys/class/thermal # head cooling_device*/c*
==> cooling_device0/cur_state <==
0

==> cooling_device1/cur_state <==
0

==> cooling_device2/cur_state <==
0

==> cooling_device3/cur_state <==
0

==> cooling_device4/cur_state <==
1

==> cooling_device5/cur_state <==
0

==> cooling_device6/cur_state <==
0

==> cooling_device7/cur_state <==
0

one of the fans is now off, another one is still running. Looks like that single write affected both #2 and #3.

If I write 1 to #2, I hear the fan go a little faster for one or two seconds, then it goes off or slower again, but cur_state remains 0.
Comment 12 Aaron Lu 2014-01-20 02:01:19 UTC
Writing 0 to cur_state puts the cooling device representing the FAN into the off state; writing 1 puts it into the on state.

Most laptops have one physical FAN, while the firmware presents several virtual FANs representing different speeds of that physical FAN. The OS can only see the virtual FANs by querying the firmware, so the multiple cooling_deviceX entries should be those virtual FANs.

You wrote 0 to all of them, which puts all those virtual FANs into the OFF state, so you get a quiet laptop. Writing 1 to them should turn the FAN on.

When you play with those cooling_deviceX sysfs files, be sure to check the type; only cooling devices of type Fan should be touched.
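
The type check can be folded into a small wrapper so that LCD and Processor cooling devices are never written to by accident. A minimal sketch; the function name set_fan_state is made up for illustration, and writing to the real sysfs files requires root.

```shell
#!/bin/sh
# Sketch: write a state to a cooling device only if its type is "Fan".
# set_fan_state is a hypothetical helper name, not an existing tool.
set_fan_state() {
    # $1: cooling device directory, e.g. /sys/class/thermal/cooling_device3
    # $2: state to write (0 = off, 1 = on for these max_state:1 fans)
    dev="$1"
    if [ "$(cat "$dev/type" 2>/dev/null)" != "Fan" ]; then
        echo "skipping $dev: type is not Fan" >&2
        return 1
    fi
    echo "$2" > "$dev/cur_state"
}

# Usage:
#   set_fan_state /sys/class/thermal/cooling_device3 1
```

The wrapper only guards against touching the wrong device type; whether the firmware actually honours the write is a separate question.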
Comment 13 Olaf Hering 2014-01-23 12:15:54 UTC
With 3.11 I'm apparently able to control fan speed by writing 1 to one of the cooling_device{0..4}/cur_state files. Each one seems to represent a certain speed, maybe there is even more than one fan inside. No idea. 

I will see if 3.13 allows the same. If not I have to bisect 3.11..3.12 to see if a certain change causes the bug.
Comment 14 Olaf Hering 2014-01-31 14:32:49 UTC
From what I have seen this week, writing 1 to
/sys/class/thermal/cooling_device{2,3,4}/cur_state while the firmware starts the fans in an emergency seems to restore the pre-suspend state. I have not found the exact pattern and time when writing to these files will restore the state.

Next week I will try to bisect between 3.11 and 3.12 to find the offending commit.
Comment 15 Zhang Rui 2014-06-23 06:00:00 UTC
Does the problem still exist in the latest upstream kernel, say 3.16-rc2?
Comment 16 Olaf Hering 2014-08-12 10:22:39 UTC
I had to send it to HP for repair because it failed to power on after suspend.
The motherboard was replaced. Now for some reason the fan control does not seem to work at all. The fan is quiet, then it runs for a while, then it is quiet again. The temp goes up and down between 51° and maybe 60°.

Apparently the firmware changed from "BIOS 68DTM Ver. F.09 05/04/2011" to "BIOS 68DTM Ver. F.21 06/14/2012". In the BIOS settings there is a "FAN always on when on AC power" option, but toggling it does not seem to change anything. I tested 3.15 and 3.16.

I will attach dmesg and acpidump.
Comment 17 Olaf Hering 2014-08-12 10:23:47 UTC
Created attachment 146321 [details]
dmesg-3.15.9.txt
Comment 18 Olaf Hering 2014-08-12 10:24:11 UTC
Created attachment 146331 [details]
acpidump.txt
Comment 19 Zhang Rui 2014-10-23 11:46:07 UTC
As there was a firmware upgrade, please attach the output of "grep . /sys/class/thermal/thermal*/*" with a 3.15 or 3.16 kernel.
Comment 20 Zhang Rui 2014-12-02 08:37:24 UTC
ping...
Comment 21 Olaf Hering 2014-12-12 14:49:44 UTC
with 3.18.0: 
 # grep . /sys/class/thermal/thermal*/*
/sys/class/thermal/thermal_zone0/cdev0_trip_point:1
/sys/class/thermal/thermal_zone0/cdev1_trip_point:1
/sys/class/thermal/thermal_zone0/cdev2_trip_point:1
/sys/class/thermal/thermal_zone0/cdev3_trip_point:6
/sys/class/thermal/thermal_zone0/cdev4_trip_point:5
/sys/class/thermal/thermal_zone0/cdev5_trip_point:4
/sys/class/thermal/thermal_zone0/cdev6_trip_point:3
/sys/class/thermal/thermal_zone0/cdev7_trip_point:2
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:50200
/sys/class/thermal/thermal_zone0/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:100000
/sys/class/thermal/thermal_zone0/trip_point_1_type:passive
/sys/class/thermal/thermal_zone0/trip_point_2_temp:90000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:82000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:74000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:66000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/trip_point_6_temp:45000
/sys/class/thermal/thermal_zone0/trip_point_6_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/cdev2_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:27500
/sys/class/thermal/thermal_zone1/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:55000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
Comment 22 Zhang Rui 2015-02-16 03:10:59 UTC
please try the two patches at comment #131 and comment #132 in bug #78201 on top of 3.19 kernel and see if the problem still exists.
Comment 23 Olaf Hering 2015-02-16 14:24:17 UTC
Created attachment 167121 [details]
v3.19-fan.patch

This is the variant I used on top of v3.19.

In the end it does not help. As said earlier: after the mainboard was exchanged the fan stays off most of the time. During this time the system heats up to around 60°. Then the fan kicks in at full speed until it's cooled down to around 50°.

Apparently there is just a (the?) fan running very slowly. It can be turned off by writing 0 to cooling_device4/cur_state. But writing 1 to it does not power it on. Every other cur_state file has 0, and writing 1 to it has no effect.
Comment 24 Olaf Hering 2015-02-16 14:26:05 UTC
Do we have any contact at HP who can help with understanding what is actually required to control the fans? Or is that information already available within the accessible ACPI tables?
Comment 25 Zhang Rui 2015-03-16 03:31:05 UTC
please try the patch set at https://bugzilla.kernel.org/show_bug.cgi?id=78201#c150 (please ignore "0001-Debug-patch-to-sync-thermal-zone-update.patch" from the tarball as the code obviously reappears under the correct new name "0004-Thermal-make-thermal_zone_device_update-atomic.patch".)
and boot with boot option module.dyndbg="module thermal_sys +fp" and then attach the dmesg output after the problem is reproduced.
Comment 26 Olaf Hering 2015-03-16 11:28:45 UTC
Created attachment 170741 [details]
bko67101-v4.0-rc4.txt

This is dmesg+.config from a fresh boot with v4.0-rc4 plus the patchset mentioned in previous comment.

The/One fan still runs very very slowly, the temp goes up to around 62°, then it runs at full speed until it's cooled down, then it runs again very very slowly while it continues to heat up.
Comment 27 Zhang Rui 2015-03-16 11:42:20 UTC
I checked your dmesg output, and everything works as expected.
Say, during boot, temperature is 60C, and the fan is running at lowest speed, because only the lowest trip point is crossed. (trip1 = 45C, trip2 = 66C)

Plus, I didn't see the fan running at full speed in your dmesg output. Was the dmesg output captured after the fan had been running at full speed?
If not, please capture it again, and please also attach the output of "grep . /sys/class/thermal/cooling*/*" when the fan is running at full speed, at around 60C.
Comment 28 Olaf Hering 2015-03-16 13:52:30 UTC
Created attachment 170781 [details]
bko67101-v4.0-rc4.txt

After a while there are indeed some messages. But nothing shows in dmesg while the temperature goes up and down between 51° and 62°.

The sysfs output does not seem to change:

+ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +51.5C  (crit = +105.0C)
temp2:        +29.2C  (crit = +105.0C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +51.5C  (high = +70.0C)
                       (crit = +100.0C, hyst = +95.0C)

+ grep .  /sys/class/thermal/cooling*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:24
/sys/class/thermal/cooling_device5/type:LCD
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:3
/sys/class/thermal/cooling_device6/type:Processor
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:3
/sys/class/thermal/cooling_device7/type:Processor
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:3
/sys/class/thermal/cooling_device8/type:Processor
Comment 29 Olaf Hering 2015-03-17 09:06:27 UTC
Created attachment 170921 [details]
bko67101-v4.0-rc4.txt

After suspend and resume the situation changed. 

All cooling_device{0,1,2,3,4}/cur_state are zero. The temperature goes up and down (with noisy fan) between 46° and (I think) 57°.

After a while I realized that all cur_state are zero. So first I wrote 1 into #3, nothing changed. Then I wrote 1 into #4, its cur_state became 1 and the fan ran very slowly. Then I wrote 1 into #2. Its cur_state remains zero, but the fan starts to run slowly and the temperature stays at 51°, ±1°.

#4 behaves the same after fresh boot, the fan can be controlled for very low speed and cur_state changes. Now #2 reacts at least. But only if I first write 1 into #4. 

After playing around some more it goes like this:
write 0 to #2, #4 turns it off
write 1 to #2 does nothing
write 1 to #4 and #2 starts the fan.
write 1 to #4 and #1 starts the fan at slightly higher speed.
write 1 to #4 and #0 starts at even higher speed.
and writing 1 to #2 slows down the fan again.

But, just a single 'echo 1 > cur_state' is not enough, I have to do it in a while loop until I hear that the fan reacts.
Comment 30 Olaf Hering 2015-03-18 07:44:02 UTC
Created attachment 171051 [details]
bko67101-start-fan-after-resume.txt

Today I tried to kick the fan after resume.

First #4 did not react at all to my echo 1 > cur_state. I had to wait until the kernel recognized some state change. Once that happened, I was able to write 1 into 4/cur_state and 1/cur_state. Now the fan runs at a reasonably low speed which seems to keep the temperature around 51°.
Comment 31 Manuel Krause 2015-03-18 23:55:54 UTC
The fans you can turn off are ascending in numbers but descending in power.
You would need to turn off cooling_device0 (100%) first with "echo 0 > /sys/class/thermal/cooling_device0/cur_state", and then each further (lower) speed, numbers 1, 2, 3 up to 4, in that order.
Turning all the others plus No. 4 to 0 means no cooling, so take precautions against possible overheating.
Comment 32 Manuel Krause 2015-03-19 00:01:43 UTC
Just kidding somehow: Some people on the internet even recommend that we should just only clean our Notebook FAN?! ^^
Comment 33 Zhang Rui 2015-03-20 03:47:34 UTC
First of all, about the strange behavior when setting the fan manually:
I checked the BIOS code, and this is caused by the BIOS.
        Method (FNON, 2, Serialized)
        {
            ShiftLeft (Arg0, 0x01, Local0)
            Decrement (Local0)
            If (LEqual (And (CRTF, Local0), 0x00))
            {
                \_SB.PCI0.LPCB.EC0.KSFS (Arg1)
            }

            Or (CRTF, Arg0, CRTF)
        }

        Method (FNOF, 2, Serialized)
        {
            And (CRTF, Not (Arg0), CRTF)
            Store (0x00, Local0)
            If (CRTF)
            {
                Store (Arg1, Local0)
            }

            If (LOr (Arg1, LEqual (CRTF, 0x00)))
            {
                \_SB.PCI0.LPCB.EC0.KSFS (Local0)
            }
        }

This is how the fan works:
1. After boot, you must set fan4 first; any other fan setting is a nop before fan4 is turned on.
2. After resume, setting the fan speed to zero does not work.
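
For anyone who does not read ASL, the two methods above can be transcribed into shell arithmetic to experiment with the CRTF bitmask. This is only a sketch of the quoted code: the mask and speed arguments in the usage lines are made-up values (the real ones come from each fan's _ON/_OFF methods, which are not quoted here), and ksfs merely records what would be sent to the EC.

```shell
#!/bin/sh
# Sketch: shell transcription of FNON/FNOF from the DSDT excerpt above.
CRTF=0        # bitmask of virtual fans the firmware considers "on"
KSFS_LOG=""   # every value that would be passed to EC0.KSFS, in order

ksfs() { KSFS_LOG="$KSFS_LOG $1"; }

fnon() {  # fnon <mask> <speed>
    # Local0 = (Arg0 << 1) - 1: all bits up to and including Arg0's bit.
    local0=$(( ($1 << 1) - 1 ))
    # KSFS is only called if none of those bits is already set in CRTF.
    if [ $(( $CRTF & $local0 )) -eq 0 ]; then
        ksfs "$2"
    fi
    CRTF=$(( $CRTF | $1 ))
}

fnof() {  # fnof <mask> <speed>
    CRTF=$(( $CRTF & ~$1 ))
    local0=0
    if [ "$CRTF" -ne 0 ]; then
        local0=$2
    fi
    # KSFS is skipped when Arg1 is 0 and other fan bits are still set.
    if [ "$2" -ne 0 ] || [ "$CRTF" -eq 0 ]; then
        ksfs "$local0"
    fi
}

# Usage with hypothetical masks (4, 1, 2) and speeds (30, 50, 40):
#   fnon 4 30; fnon 1 50; fnon 2 40
#   echo "CRTF=$CRTF KSFS calls:$KSFS_LOG"
```

In that usage sequence the third call changes CRTF but never reaches KSFS, because a lower bit is already set. This shows how a write to cur_state can be accepted by the firmware and still have no effect on the actual fan speed.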
Comment 34 Zhang Rui 2015-03-20 03:54:40 UTC
(In reply to Olaf Hering from comment #28)
> Created attachment 170781 [details]
> bko67101-v4.0-rc4.txt
> 
> After a while there are indeed some messages. But nothing shows in dmesg
> while the temperate goes up and down between 51° and 62°.
> 

/sys/class/thermal/thermal_zone0/trip_point_5_temp:66000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/trip_point_6_temp:45000
/sys/class/thermal/thermal_zone0/trip_point_6_type:active

there is no trip point between 45C and 66C, so it is reasonable that there is no fan state change when the temperature goes up and down between 51C and 62C.
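
The same point can be checked with a few lines of shell. This is a simplified sketch: crossed_trips is a hypothetical helper, and treating a trip point as crossed whenever the temperature is at or above it (ignoring hysteresis and the temperature trend) is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch: list which trip points a given temperature has crossed.
crossed_trips() {
    # $1: temperature in millidegrees C (same unit as the sysfs temp file)
    # remaining arguments: trip point temperatures in millidegrees C
    temp="$1"
    shift
    for trip in "$@"; do
        if [ "$temp" -ge "$trip" ]; then
            echo "$trip"
        fi
    done
}

# Usage with the two lowest active trip points quoted above:
#   crossed_trips 51000 66000 45000
#   crossed_trips 62000 66000 45000
```

Both calls report only the 45000 trip point, so from the governor's point of view nothing changes anywhere in the 51C to 62C range.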

(In reply to Olaf Hering from comment #29)
> Created attachment 170921 [details]
> bko67101-v4.0-rc4.txt
> 
> After suspend and resume the situation changed. 
> 
> All cooling_device{0,1,2,3,4}/cur_state are zero. The temperature goes up
> and down (with noisy fan) between 46° and (I think) 57°.
> 
> After a while I realized that all cur_state are zero. So first I wrote 1
> into #3, nothing changed. Then I wrote 1 into #4, its cur_state became 1 and
> the fan ran very slowly. Then I wrote 1 into #2. Its cur_state remains zero,
> but the fan starts to run slowly and the temperature stays at 51°, ±1°.
> 
Mar 18 08:38:59 probook kernel: [62377.578066] thermal_zone_trip_update: thermal thermal_zone0: Trip6[type=0,temp=53000]:trend=1,throttle=1
Mar 18 08:38:59 probook kernel: [62377.580840] get_target_state: thermal cooling_device4: cur_state=0
Mar 18 08:38:59 probook kernel: [62377.580849] thermal_zone_trip_update: thermal cooling_device4: old_target=-1, target=1
Mar 18 08:38:59 probook kernel: [62377.580855] thermal_cdev_update: thermal cooling_device4: zone0->target=1
Mar 18 08:38:59 probook kernel: [62377.583872] thermal_cdev_update: thermal cooling_device4: set to state 1

The dmesg shows that fan4 is set to state 1.
After this message appears, please double check whether cooling_device4/cur_state is still 0.
Comment 35 Zhang Rui 2015-03-20 03:55:16 UTC
Anyhow, this does not seem like a kernel problem to me.
I will send you a customized DSDT, please check if the problem still exists.
Comment 36 Zhang Rui 2015-03-20 04:07:23 UTC
Created attachment 171351 [details]
debug patch

please apply this patch on top and see if fan misbehavior still exists after resume.
Comment 37 Zhang Rui 2015-03-20 04:11:57 UTC
BTW, there is no customized DSDT, please just follow comment #36.
Comment 38 Olaf Hering 2015-03-20 09:59:15 UTC
(In reply to Zhang Rui from comment #34)
> The dmesg shows that fan4 is set to state 1.
> After this message happens, please double check if a
> cooling_device4/cur_state is still 0.

cooling_device4/cur_state always shows the 0 or 1 I write into it. Just the others are always zero. 

Is this just this model which does not show the selected fan speed? Hmm, I think with the old firmware also 0-3 showed the value I wrote into cur_state.
Comment 39 Olaf Hering 2015-03-20 10:01:42 UTC
And after a fresh boot I was able to start the fan by writing 1 to cooling_device2/cur_state. Have to check if that is caused by the patch from #36. Will attach dmesg.
Comment 40 Olaf Hering 2015-03-20 10:04:15 UTC
Created attachment 171401 [details]
bko67101-v4.0-rc4+.txt

dmesg after fresh boot with the patch from comment #36.
Comment 41 Olaf Hering 2015-03-23 16:42:57 UTC
Created attachment 171811 [details]
bko67101-v4.0-rc5-syslog.txt

 /var/log/messages

reboot into -rc5 around 11am
Comment 42 Olaf Hering 2015-03-23 16:44:01 UTC
Created attachment 171821 [details]
bko67101-v4.0-rc5.txt

full dmesg 4.0-rc5 + .config
Comment 43 Olaf Hering 2015-03-23 16:57:48 UTC
comment #41 and comment #42 contain today's boot.

What happened at 9am is that the laptop was cold, so cooling_device4/cur_state did not react to "echo 1 > cur_state" until the temperature reached some level. I don't have logs for that.
Then I rebased to -rc5 and booted that new kernel, around 11am.
At some point I was able to use cur_state in #4 to turn on the fan, and used #2 to set it to low speed. I also set a low CPU speed at that time:
# for i in /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor
# do
#  echo powersave > $i
# done

Temperature and noise level were fine. One hour later I left for lunch. When I returned around 2pm the fan was running loud. I think the temperature was around 66°. #4 cur_state was 0. 

I was able to set it to 1, then enabled #2. At some point the fan started to run slowly.

To me it looks like the temp fell below a certain point, which disabled #4. This essentially means the "firmware" is now in charge to handle fan and temperature.

I wonder if a) the kernel has correct data anyway to keep the fan at a certain speed as temp goes up and down during usage, and b) the kernel can monitor state of the hardware.

Not sure if a is true. Are these "trip points" correct?
What about b? Since "cur_state" is zero for #0 to #3, how would it know what's going on?
Comment 44 Olaf Hering 2015-03-24 07:55:03 UTC
Created attachment 171941 [details]
bko67101-v4.0-rc5.txt

dmesg from cold boot. Writes to #4/cur_state are not recognized until the temperature reached 53°. Then enabling the fan via #2/cur_state works.
Comment 45 Olaf Hering 2015-03-26 14:07:04 UTC
Where is the place where all the cooling_device#|temperatures are read?

It's all wrong on that model. #4 is a fan on/off knob. #0-#3 are fan speeds, but their attached temperatures are wrong. If temp goes above 66° #3 is turned on, even if I selected #0 manually already. The result is that the fan is nearly off, while it ran fast once I selected it. 
Looks like the temps should be like this:
#0 60°
#1 56°
#2 48°
#3 1° or whatever
#4 on/off

Is the kernel supposed to enable #2 when temp goes from 58° to 54°, or will it do that only if temp reaches 48°?
Comment 46 Zhang Rui 2016-03-15 05:37:35 UTC
We have made a couple of changes in the thermal subsystem recently; please confirm whether the problem still exists in 4.5.

Note that it is also possible that the problem still exists but with a different symptom, so please give a detailed description of the problem if it's still not working well.
Comment 47 Olaf Hering 2016-03-23 17:32:28 UTC
The problem does still exist, perhaps I should either open a fresh new bug or update the Summary.

Essentially the kernel does not enable the fan itself during boot. This has to be done manually with something like:

while :
do
 for i in 2 4
 do
  echo 1 > /sys/class/thermal/cooling_device$i/cur_state
 done
 sleep 2
done

But this works only if the temperature reaches 55C (or similar). Once that is done, another loop variant is required to really kick the fan to low speed:

while :
do
 for i in 2 4
 do
  echo 1 > /sys/class/thermal/cooling_device$i/cur_state
  cat /sys/class/thermal/cooling_device$i/cur_state > /dev/null
 done
 sleep 2
done

All this works only if the fancontrol is set to user_space. step_wise will just follow ACPI, and since the provided temp values are all way off nothing will ever happen.
Comment 48 Zhang Rui 2016-05-09 02:19:47 UTC
(In reply to Olaf Hering from comment #47)
> The problem does still exist, perhaps I should either open a fresh new bug
> or update the Summary.
>
Exactly.
Please file a new bug with the acpidump, dmesg and a detailed description of the symptoms of the problem.

> 
> All this works only if the fancontrol is set to user_space.

No, please don't use user_space governor.

> step_wise will
> just follow ACPI, and since the provided temp values are all way off nothing
> will ever happen.

What do you mean by "temp values are all way off"?
From your previous dmesg output, I can see the temperature is changing as expected. Why can't you use the step_wise governor?
Comment 49 Zhang Rui 2016-05-09 02:21:59 UTC
When filing a new bug report, please
1. give a detailed description of the problem
2. attach the acpidump
3. stick to the step_wise governor, enable thermal dynamic debug, and attach the dmesg output after the problem is reproduced.

Closing this bug report as obsolete.