Created attachment 118561 [details] dmesg-3.12.5.txt
HP ProBook 6555b, openSUSE 11.4, GNOME desktop, AMD Phenom(tm) II N830 Triple-Core Processor
With mainline kernel 3.9 the fan runs smoothly, just enough to provide a little air flow. It is almost, but not entirely, silent. Under some load the fan speeds up after a while, then slows down again. Under really high load it runs fast and loud until the temperature is below a certain value, then it slows down again. Or, if I'm out of luck, the load stays so high for so long that the system just shuts itself off. The latter can be avoided by forcing the "powersave" scaling_governor. All this continues to work after a suspend/resume cycle.
Now with 3.12 the fan behaves differently. During initial bootup the fan behaves just as with 3.9, everything is fine. After a suspend/resume the fan is either off or on. What I see (according to sensors(1)) is that while the fan is off the CPU temperature slowly rises from 69° to about 76°. Then the fan starts fast. Now the temperature slowly drops towards 69°. Once that is reached the fan goes off again, so the temperature slowly starts to rise again until the fan kicks in.
Since that fan noise is annoying, it would be nice to know what is causing the difference between 3.9 and 3.12. As a start I attach the 3.12 dmesg, .config and the .config diff between 3.9 and 3.12. Since it's my workstation, it will take some time to figure out when this started happening.
Created attachment 118571 [details] config-3.12.5.txt
Created attachment 118581 [details] config-diff-3.9-to3.12.txt
Differences in .config. Nothing in there appears to cause this, unless I have an important option disabled that starts to matter after 3.9.
Created attachment 118591 [details] dmesg-3.11.10.txt 3.11 appears to behave like 3.9. I just did a suspend/resume cycle and the fans do not go up and down.
Please attach acpidump:
# acpidump > acpidump.txt
Created attachment 118811 [details] acpidump.txt
Please show me:
$ ls -l /sys/class/thermal/thermal_zone*/
And show the output of the commands below, both before suspend and after resume:
$ cd /sys/class/thermal
$ grep . */* 2>/dev/null
It seems that one of the ACPI fan devices stops working after resume.
Created attachment 118921 [details] sys-thermal-3.11.10.txt 3.11 output, after a few resume cycles. 3.12 will follow.
Created attachment 118941 [details] sys-thermal-3.12.5.txt
This is before suspend. After resume the difference is essentially:
...
cooling_device2/max_state:1
cooling_device2/type:Fan
[-cooling_device3/cur_state:1-]
{+cooling_device3/cur_state:0+}
cooling_device3/max_state:1
cooling_device3/type:Fan
[-cooling_device4/cur_state:1-]
{+cooling_device4/cur_state:0+}
cooling_device4/max_state:1
cooling_device4/type:Fan
...
thermal_zone0/policy:step_wise
[-thermal_zone0/temp:69000-]
{+thermal_zone0/temp:70600+}
thermal_zone0/trip_point_0_temp:105000
...
thermal_zone0/trip_point_3_type:active
[-thermal_zone0/trip_point_4_temp:74000-]
{+thermal_zone0/trip_point_4_temp:67000+}
thermal_zone0/trip_point_4_type:active
...
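The before/after comparison requested above can also be scripted. The following is only a minimal sketch: the helper names `snapshot` and `changed` are made up for illustration, and it assumes the usual layout of one directory per cooling device or thermal zone with plain-text attribute files inside.

```python
from pathlib import Path


def snapshot(base="/sys/class/thermal"):
    """Return {"dir/file": contents} for every readable, non-empty file one
    level below each entry of `base`, roughly what `grep . */*` prints."""
    result = {}
    for entry in sorted(Path(base).iterdir()):
        if not entry.is_dir():
            continue
        for item in sorted(entry.iterdir()):
            if not item.is_file():
                continue
            try:
                text = item.read_text().strip()
            except (OSError, UnicodeDecodeError):
                continue  # some attributes cannot be read; skip them
            if text:
                result[f"{entry.name}/{item.name}"] = text
    return result


def changed(before, after):
    """Return {path: (old, new)} for entries that differ between two snapshots."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}
```

Taking one `snapshot()` before suspend and one after resume, then printing `changed(before, after)`, should list the same kind of differences as the hand-made diff above (cur_state, temp and trip point changes).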
When the temperature drops below 67, cooling_device4 should be kept on while cooling_device3 is turned off. It seems we didn't keep cooling_device4 on in that case. I'll need to check the code to see if there is something wrong. In the meantime, can you please test the following: when the temperature drops below 67 and you feel that the fan is off, does manually turning on cooling_device3 help? You can manually turn on the fan represented by cooling_device3 with:
# cd /sys/class/thermal/cooling_device3
# cat cur_state    // should be 0
# echo 1 > cur_state
Does the fan start now?
Another interesting thing is that the temperature for trip point 4 changed after resume.
I'm not sure what is supposed to happen with these cur_state files. With 3.12 nothing changes if I write to cooling_device3. Now I run 3.11 and I did:
for i in cooling_device*/c* ; do echo 0 > $i ; done
This did indeed lead to instant silence. However, writing 1 or 3 to the files does not cause any fan to start. Instead the system slowly starts to heat up (according to sensors). Once the temperature reaches 80° the fan starts to run fast. That is not reflected in the cur_state files. Then the temperature drops again and the fan stops.
Is the kernel supposed to actively drive the fans anyway via these files?
Now after a fresh boot into 3.11:
root@probook:/sys/class/thermal # head cooling_device*/c*
==> cooling_device0/cur_state <==
0
==> cooling_device1/cur_state <==
0
==> cooling_device2/cur_state <==
1
==> cooling_device3/cur_state <==
1
==> cooling_device4/cur_state <==
1
==> cooling_device5/cur_state <==
0
==> cooling_device6/cur_state <==
0
==> cooling_device7/cur_state <==
0
root@probook:/sys/class/thermal # echo 0 > cooling_device3/cur_state
root@probook:/sys/class/thermal # head cooling_device*/c*
==> cooling_device0/cur_state <==
0
==> cooling_device1/cur_state <==
0
==> cooling_device2/cur_state <==
0
==> cooling_device3/cur_state <==
0
==> cooling_device4/cur_state <==
1
==> cooling_device5/cur_state <==
0
==> cooling_device6/cur_state <==
0
==> cooling_device7/cur_state <==
0
One of the fans is now off, another one is still running. It looks like that single write affected both #2 and #3. If I write 1 to #2, I hear the fan run a little faster for one or two seconds, then it goes off or slows down again, but cur_state remains 0.
Writing 0 to cur_state puts the cooling device representing the fan into the off state; writing 1 puts it into the on state. Most machines have one physical fan, while the firmware presents several virtual fans, each representing a different speed of that physical fan. The OS can only see the virtual fans by querying the firmware, so the multiple cooling_deviceX entries should be those virtual fans. You wrote 0 to all of them, which puts all the virtual fans into the off state, so you get a quiet laptop. Writing 1 to them should turn the fan on. When you play with those cooling_deviceX sysfs files, be sure to check the type of each one; only cooling devices of type Fan should be touched.
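The advice to touch only Fan-type cooling devices can be enforced in a small helper. This is a hedged sketch, not an existing tool: `fan_devices` and `set_fan_state` are hypothetical names, and the `type` string "Fan" is taken from the sysfs listings in this report.

```python
from pathlib import Path


def fan_devices(base="/sys/class/thermal"):
    """Return the cooling_device* directories whose `type` file reads 'Fan'."""
    fans = []
    for dev in sorted(Path(base).glob("cooling_device*")):
        type_file = dev / "type"
        if type_file.is_file() and type_file.read_text().strip() == "Fan":
            fans.append(dev)
    return fans


def set_fan_state(dev, state):
    """Write 0 (off) or 1 (on) to one fan's cur_state; refuse anything else."""
    if state not in (0, 1):
        raise ValueError("the virtual fans in this report have max_state 1")
    (Path(dev) / "cur_state").write_text(f"{state}\n")
```

Looping over `fan_devices()` and calling `set_fan_state(dev, 0)` would then skip the LCD and Processor entries. On a real system this needs root, and switching every fan off removes active cooling.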
With 3.11 I'm apparently able to control the fan speed by writing 1 to one of the cooling_device{0..4}/cur_state files. Each one seems to represent a certain speed; maybe there is even more than one fan inside. No idea. I will see if 3.13 allows the same. If not, I have to bisect 3.11..3.12 to see if a certain change causes the bug.
What I have seen this week is that writing 1 to /sys/class/thermal/cooling_device{2,3,4}/cur_state while the firmware starts the fans in an emergency seems to restore the pre-suspend state. I have not found the exact pattern and time at which writing to these files restores the state. Next week I will try to bisect between 3.11 and 3.12 to find the offending commit.
does the problem still exist in latest upstream kernel, say 3.16-rc2?
I had to send it to HP for repair because it failed to power on after suspend. The motherboard was replaced. Now for some reason the fan control does not seem to work at all. The fan is quiet, then it runs for a while, then it is quiet again. The temperature goes up and down between 51° and maybe 60°. Apparently the firmware changed from "BIOS 68DTM Ver. F.09 05/04/2011" to "BIOS 68DTM Ver. F.21 06/14/2012". In the BIOS settings there is a "FAN always on when on AC power" option, but toggling it does not seem to change anything. I tested 3.15 and 3.16. I will attach dmesg and acpidump.
Created attachment 146321 [details] dmesg-3.15.9.txt
Created attachment 146331 [details] acpidump.txt
as there is a firmware upgrade, please attach the output of "grep . /sys/class/thermal/thermal*/*" in 3.15 or 3.16 kernel.
ping...
with 3.18.0:
# grep . /sys/class/thermal/thermal*/*
/sys/class/thermal/thermal_zone0/cdev0_trip_point:1
/sys/class/thermal/thermal_zone0/cdev1_trip_point:1
/sys/class/thermal/thermal_zone0/cdev2_trip_point:1
/sys/class/thermal/thermal_zone0/cdev3_trip_point:6
/sys/class/thermal/thermal_zone0/cdev4_trip_point:5
/sys/class/thermal/thermal_zone0/cdev5_trip_point:4
/sys/class/thermal/thermal_zone0/cdev6_trip_point:3
/sys/class/thermal/thermal_zone0/cdev7_trip_point:2
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:50200
/sys/class/thermal/thermal_zone0/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:100000
/sys/class/thermal/thermal_zone0/trip_point_1_type:passive
/sys/class/thermal/thermal_zone0/trip_point_2_temp:90000
/sys/class/thermal/thermal_zone0/trip_point_2_type:active
/sys/class/thermal/thermal_zone0/trip_point_3_temp:82000
/sys/class/thermal/thermal_zone0/trip_point_3_type:active
/sys/class/thermal/thermal_zone0/trip_point_4_temp:74000
/sys/class/thermal/thermal_zone0/trip_point_4_type:active
/sys/class/thermal/thermal_zone0/trip_point_5_temp:66000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/trip_point_6_temp:45000
/sys/class/thermal/thermal_zone0/trip_point_6_type:active
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/cdev0_trip_point:1
/sys/class/thermal/thermal_zone1/cdev1_trip_point:1
/sys/class/thermal/thermal_zone1/cdev2_trip_point:1
/sys/class/thermal/thermal_zone1/mode:enabled
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/temp:27500
/sys/class/thermal/thermal_zone1/trip_point_0_temp:105000
/sys/class/thermal/thermal_zone1/trip_point_0_type:critical
/sys/class/thermal/thermal_zone1/trip_point_1_temp:55000
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:acpitz
please try the two patches at comment #131 and comment #132 in bug #78201 on top of 3.19 kernel and see if the problem still exists.
Created attachment 167121 [details] v3.19-fan.patch
This is the variant I used on top of v3.19. In the end it does not help.
As said earlier: after the mainboard was exchanged, the fan stays off most of the time. During this time the system heats up to around 60°. Then the fan kicks in at full speed until it has cooled down to around 50°.
Apparently there is just a (the?) fan running very slowly. It can be turned off by writing 0 to cooling_device4/cur_state, but writing 1 to it does not power it on. Every other cur_state file contains 0, and writing 1 to it has no effect.
Do we have any contact at HP who can help with understanding what is actually required to control the fans? Or is that info already available within the accessible ACPI infos?
please try the patch set at https://bugzilla.kernel.org/show_bug.cgi?id=78201#c150 (please ignore "0001-Debug-patch-to-sync-thermal-zone-update.patch" from the tarball as the code obviously reappears under the correct new name "0004-Thermal-make-thermal_zone_device_update-atomic.patch".) and boot with boot option module.dyndbg="module thermal_sys +fp" and then attach the dmesg output after the problem is reproduced.
Created attachment 170741 [details] bko67101-v4.0-rc4.txt
This is dmesg+.config from a fresh boot with v4.0-rc4 plus the patch set mentioned in the previous comment. The (or one) fan still runs very, very slowly and the temperature goes up to around 62°. Then the fan runs at full speed until it has cooled down, then it runs very, very slowly again while the system continues to heat up.
I checked your dmesg output, and everything works as expected. For example, during boot the temperature is 60C and the fan runs at its lowest speed, because only the lowest trip point is crossed (trip1 = 45C, trip2 = 66C).
Also, I don't see the fan running at full speed in your dmesg output. Was the dmesg output captured after the fan had been running at full speed? If not, please capture it again, and please also attach the output of "grep . /sys/class/thermal/cooling*/*" taken while the fan is running at full speed, at around 60C.
Created attachment 170781 [details] bko67101-v4.0-rc4.txt
After a while there are indeed some messages. But nothing shows up in dmesg while the temperature goes up and down between 51° and 62°. The sysfs output does not seem to change:
+ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +51.5C (crit = +105.0C)
temp2: +29.2C (crit = +105.0C)
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +51.5C (high = +70.0C)
       (crit = +100.0C, hyst = +95.0C)
+ grep . /sys/class/thermal/cooling*/*
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:1
/sys/class/thermal/cooling_device0/type:Fan
/sys/class/thermal/cooling_device1/cur_state:0
/sys/class/thermal/cooling_device1/max_state:1
/sys/class/thermal/cooling_device1/type:Fan
/sys/class/thermal/cooling_device2/cur_state:0
/sys/class/thermal/cooling_device2/max_state:1
/sys/class/thermal/cooling_device2/type:Fan
/sys/class/thermal/cooling_device3/cur_state:0
/sys/class/thermal/cooling_device3/max_state:1
/sys/class/thermal/cooling_device3/type:Fan
/sys/class/thermal/cooling_device4/cur_state:1
/sys/class/thermal/cooling_device4/max_state:1
/sys/class/thermal/cooling_device4/type:Fan
/sys/class/thermal/cooling_device5/cur_state:0
/sys/class/thermal/cooling_device5/max_state:24
/sys/class/thermal/cooling_device5/type:LCD
/sys/class/thermal/cooling_device6/cur_state:0
/sys/class/thermal/cooling_device6/max_state:3
/sys/class/thermal/cooling_device6/type:Processor
/sys/class/thermal/cooling_device7/cur_state:0
/sys/class/thermal/cooling_device7/max_state:3
/sys/class/thermal/cooling_device7/type:Processor
/sys/class/thermal/cooling_device8/cur_state:0
/sys/class/thermal/cooling_device8/max_state:3
/sys/class/thermal/cooling_device8/type:Processor
Created attachment 170921 [details] bko67101-v4.0-rc4.txt
After suspend and resume the situation changed.
All cooling_device{0,1,2,3,4}/cur_state are zero. The temperature goes up and down (with noisy fan) between 46° and (I think) 57°.
After a while I realized that all cur_state are zero. So first I wrote 1 into #3, nothing changed. Then I wrote 1 into #4, its cur_state became 1 and the fan ran very slowly. Then I wrote 1 into #2. Its cur_state remains zero, but the fan starts to run slowly and the temperature stays at 51°, ±1°.
#4 behaves the same after a fresh boot: the fan can be controlled for very low speed and cur_state changes. Now #2 reacts at least, but only if I first write 1 into #4.
After playing around some more it goes like this:
write 0 to #2, #4 turns it off
write 1 to #2 does nothing
write 1 to #4 and #2 starts the fan.
write 1 to #4 and #1 starts the fan at slightly higher speed.
write 1 to #4 and #0 starts at even higher speed.
and writing 1 to #2 slows down the fan again.
But a single 'echo 1 > cur_state' is not enough; I have to do it in a while loop until I hear that the fan reacts.
Created attachment 171051 [details] bko67101-start-fan-after-resume.txt
Today I tried to kick the fan after resume. At first #4 did not react at all to my echo 1 > cur_state. I had to wait until the kernel recognized some state change. Once that happened, I was able to write 1 into 4/cur_state and 1/cur_state. Now the fan runs at a reasonably low speed, which seems to keep the temperature around 51°.
The fans you can turn off are ascending in number but descending in power. You would need to turn off cooling_device0 (100%) first, with "echo 0 > /sys/class/thermal/cooling_device0/cur_state", and then each lower speed in turn: numbers 1, 2, 3 and finally 4. Setting all the others plus No. 4 to 0 means no cooling at all, so take precautions against possible overheating.
Just kidding somehow: Some people on the internet even recommend that we should just only clean our Notebook FAN?! ^^
First of all, about the strange behavior when setting the fan manually: I checked the BIOS code, and this is caused by the BIOS.
Method (FNON, 2, Serialized)
{
    ShiftLeft (Arg0, 0x01, Local0)
    Decrement (Local0)
    If (LEqual (And (CRTF, Local0), 0x00))
    {
        \_SB.PCI0.LPCB.EC0.KSFS (Arg1)
    }
    Or (CRTF, Arg0, CRTF)
}
Method (FNOF, 2, Serialized)
{
    And (CRTF, Not (Arg0), CRTF)
    Store (0x00, Local0)
    If (CRTF)
    {
        Store (Arg1, Local0)
    }
    If (LOr (Arg1, LEqual (CRTF, 0x00)))
    {
        \_SB.PCI0.LPCB.EC0.KSFS (Local0)
    }
}
This is how the fan works:
1. After boot, you must turn on fan4 first; until then, setting any other fan is a no-op.
2. After resume, setting the fan speed to zero does not work.
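To experiment with the two firmware methods quoted above without touching hardware, they can be transcribed line by line. This is a sketch under assumptions: the class name `FanModel` is made up, `ksfs_calls` merely records what would be passed to `EC0.KSFS` (what that call does inside the embedded controller is not shown here), and the bit and speed arguments each virtual fan passes are not part of the quoted code, so the values in the usage note are invented.

```python
class FanModel:
    """Line-by-line Python transcription of the FNON/FNOF ASL methods."""

    def __init__(self):
        self.crtf = 0          # mirrors the firmware's CRTF bit field
        self.ksfs_calls = []   # values that would be handed to EC0.KSFS

    def fnon(self, bit, speed):
        # ShiftLeft (Arg0, 1) then Decrement: mask of this bit and all lower bits
        mask = (bit << 1) - 1
        # Only talk to the EC if none of those bits is currently set
        if (self.crtf & mask) == 0:
            self.ksfs_calls.append(speed)
        self.crtf |= bit

    def fnof(self, bit, speed):
        self.crtf &= ~bit
        # Local0 is Arg1 while other bits are still set, otherwise 0
        value = speed if self.crtf else 0
        if speed or self.crtf == 0:
            self.ksfs_calls.append(value)
```

For example, with made-up arguments, `fnon(0x1, 80)` on a fresh model records a KSFS call with 80, while a following `fnon(0x4, 40)` records nothing because a lower bit is already set in `crtf`. Replaying the real per-fan arguments from the acpidump would show which write sequences reach the embedded controller at all.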
(In reply to Olaf Hering from comment #28)
> Created attachment 170781 [details]
> bko67101-v4.0-rc4.txt
>
> After a while there are indeed some messages. But nothing shows in dmesg
> while the temperate goes up and down between 51° and 62°.
>
/sys/class/thermal/thermal_zone0/trip_point_5_temp:66000
/sys/class/thermal/thermal_zone0/trip_point_5_type:active
/sys/class/thermal/thermal_zone0/trip_point_6_temp:45000
/sys/class/thermal/thermal_zone0/trip_point_6_type:active
There is no trip point between 45C and 66C, so it is reasonable that there is no fan state change when the temperature goes up and down between 51C and 62C.
(In reply to Olaf Hering from comment #29)
> Created attachment 170921 [details]
> bko67101-v4.0-rc4.txt
>
> After suspend and resume the situation changed.
>
> All cooling_device{0,1,2,3,4}/cur_state are zero. The temperature goes up
> and down (with noisy fan) between 46° and (I think) 57°.
>
> After a while I realized that all cur_state are zero. So first I wrote 1
> into #3, nothing changed. Then I wrote 1 into #4, its cur_state became 1 and
> the fan ran very slowly. Then I wrote 1 into #2. Its cur_state remains zero,
> but the fan starts to run slowly and the temperature stays at 51°, ±1°.
>
Mar 18 08:38:59 probook kernel: [62377.578066] thermal_zone_trip_update: thermal thermal_zone0: Trip6[type=0,temp=53000]:trend=1,throttle=1
Mar 18 08:38:59 probook kernel: [62377.580840] get_target_state: thermal cooling_device4: cur_state=0
Mar 18 08:38:59 probook kernel: [62377.580849] thermal_zone_trip_update: thermal cooling_device4: old_target=-1, target=1
Mar 18 08:38:59 probook kernel: [62377.580855] thermal_cdev_update: thermal cooling_device4: zone0->target=1
Mar 18 08:38:59 probook kernel: [62377.583872] thermal_cdev_update: thermal cooling_device4: set to state 1
The dmesg shows that fan4 is set to state 1. After this message appears, please double check whether cooling_device4/cur_state is still 0.
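The trip point argument can be checked with a few lines of arithmetic. The sketch below uses the active trip temperatures of thermal_zone0 as posted for 3.18; the function name is made up, and the model is deliberately simplified to a plain threshold comparison without hysteresis or trend handling.

```python
# Active trip points of thermal_zone0 as posted for 3.18, in millidegrees Celsius.
ACTIVE_TRIPS = {2: 90000, 3: 82000, 4: 74000, 5: 66000, 6: 45000}


def crossed_trips(temp_mc, trips=ACTIVE_TRIPS):
    """Return the trip numbers whose temperature is at or below temp_mc,
    ordered from the lowest trip temperature upwards."""
    return sorted((n for n, t in trips.items() if temp_mc >= t),
                  key=lambda n: trips[n])
```

Under these assumptions `crossed_trips(51000)` and `crossed_trips(62000)` both give `[6]`, so nothing changes between 51C and 62C; only at 66000 does trip 5 join the list.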
Anyhow, this does not seem like a kernel problem to me. I will send you a customized DSDT, please check if the problem still exists.
Created attachment 171351 [details] debug patch please apply this patch on top and see if fan misbehavior still exists after resume.
BTW, there is no customized DSDT, please just follow comment #36.
(In reply to Zhang Rui from comment #34)
> The dmesg shows that fan4 is set to state 1.
> After this message happens, please double check if a
> cooling_device4/cur_state is still 0.
cooling_device4/cur_state always shows the 0 or 1 I write into it. Only the others are always zero. Is it just this model which does not show the selected fan speed? Hmm, I think with the old firmware 0-3 also showed the value I wrote into cur_state.
And after a fresh boot I was able to start the fan by writing 1 to cooling_device2/cur_state. Have to check if that is caused by the patch from #36. Will attach dmesg.
Created attachment 171401 [details] bko67101-v4.0-rc4+.txt dmesg after fresh boot and comment #136.
Created attachment 171811 [details] bko67101-v4.0-rc5-syslog.txt /var/log/messages reboot into -rc5 around 11am
Created attachment 171821 [details] bko67101-v4.0-rc5.txt full dmesg 4.0-rc5 + .config
comment #41 and comment #42 contain today's boot.
What happened at 9am is that the laptop was cold, so cooling_device4/cur_state did not react to "echo 1 > cur_state" until the temperature reached some level. I don't have logs for that. Then I rebased to -rc5 and booted that new kernel, around 11am. At some point I was able to use cur_state in #4 to turn on the fan, and used #2 to set it to low speed. I also set a low CPU speed at that time:
# for i in /sys/devices/system/cpu/cpu?/cpufreq/scaling_governor
# do
#   echo powersave > $i
# done
Temperature and noise level were fine. One hour later I left for lunch. When I returned around 2pm the fan was running loud. I think the temperature was around 66°. #4 cur_state was 0. I was able to set it to 1, then enabled #2. At some point the fan started to run slowly.
To me it looks like the temperature fell below a certain point, which disabled #4. This essentially means the "firmware" is now in charge of handling fan and temperature.
I wonder if a) the kernel has correct data anyway to keep the fan at a certain speed as the temperature goes up and down during usage, and b) the kernel can monitor the state of the hardware. Not sure if a is true. Are these "trip points" correct? What about b? Since "cur_state" is zero for #0 to #3, how would it know what's going on?
Created attachment 171941 [details] bko67101-v4.0-rc5.txt dmesg from cold boot. Writes to #4/cur_state are not recognized until the temperature reached 53°. Then enabling the fan via #2/cur_state works.
Where is the place where all the cooling_device# temperatures are read? It's all wrong on that model. #4 is a fan on/off knob. #0-#3 are fan speeds, but their attached temperatures are wrong. If the temperature goes above 66°, #3 is turned on, even if I have already selected #0 manually. The result is that the fan is nearly off, while it ran fast once I had selected it. It looks like the temperatures should be like this:
#0 60°
#1 56°
#2 48°
#3 1° or whatever
#4 on/off
Is the kernel supposed to enable #2 when the temperature goes from 58° to 54°, or will it do that only if the temperature reaches 48°?
we have made a couple of changes in thermal subsystem recently, please confirm if the problem still exists in 4.5. Note that, it is also possible that the problem still exists, but with different symptom, so, please give a detailed description about the problem if it's still not working well.
The problem does still exist, perhaps I should either open a fresh new bug or update the Summary.
Essentially the kernel does not enable the fan itself during boot. This has to be done manually with something like:
while :
do
    for i in 2 4
    do
        echo 1 > /sys/class/thermal/cooling_device$i/cur_state
    done
    sleep 2
done
But this works only if the temperature reaches 55C (or similar). Once that is done, another loop variant is required to really kick the fan to low speed:
while :
do
    for i in 2 4
    do
        echo 1 > /sys/class/thermal/cooling_device$i/cur_state
        cat /sys/class/thermal/cooling_device$i/cur_state > /dev/null
    done
    sleep 2
done
All this works only if the fancontrol is set to user_space. step_wise will just follow ACPI, and since the provided temp values are all way off nothing will ever happen.
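For experiments it may be handier to bound the second loop above instead of running it forever. The following is only a sketch of that workaround in Python: the function name `kick_fans` and its parameters are made up, the device numbers 2 and 4 and the 2 second pause come from the loops above, and reading cur_state back is done purely because the second shell variant does so (on this machine the value read back from #2 stays 0, so it is not a success indicator).

```python
import time
from pathlib import Path


def kick_fans(devices=(2, 4), rounds=5, delay=2.0, base="/sys/class/thermal"):
    """Write 1 to each listed cooling device's cur_state, read it back,
    and repeat for a fixed number of rounds. Returns the last values read."""
    last = {}
    for _ in range(rounds):
        for n in devices:
            cur = Path(base) / f"cooling_device{n}" / "cur_state"
            cur.write_text("1\n")
            last[n] = cur.read_text().strip()
        time.sleep(delay)
    return last
```

This needs root on a real system and, like the shell loops, only makes sense while the thermal zone policy is user_space.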
(In reply to Olaf Hering from comment #47)
> The problem does still exist, perhaps I should either open a fresh new bug
> or update the Summary.
>
Exactly. Please file a new bug with the acpidump, dmesg and a detailed description of the symptoms.
>
> All this works only if the fancontrol is set to user_space.
No, please don't use the user_space governor.
> step_wise will
> just follow ACPI, and since the provided temp values are all way off nothing
> will ever happen.
What do you mean by saying "temp values are all way off"? From your previous dmesg output, I can see the temperature is changing as expected. Why can't you use the step_wise governor?
When filing a new bug report, please
1. give a detailed description of the problem
2. attach the acpidump
3. stick to the step_wise governor, enable thermal dynamic debug, and attach the dmesg output after the problem is reproduced.
Closing this bug report as obsolete.