Bug 19452 - Fan control fails after wakeup from S3 on HP laptop Compaq 6735b
Status: CLOSED DOCUMENTED
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: All Linux
Importance: P1 normal
Assignee: Lan Tianyu
Reported: 2010-10-01 22:26 UTC by Manuel Ullmann
Modified: 2015-01-25 23:13 UTC
CC List: 5 users

Kernel Version: 3.2.6-3.10-rc7
Regression: No

Attachments
acpidump output (438.01 KB, application/octet-stream)
2010-10-09 14:32 UTC, Manuel Ullmann
output of "grep . /sys/devices/virtual/thermal/*/*" (2.84 KB, text/plain)
2010-10-09 14:33 UTC, Manuel Ullmann
output of "grep . /sys/devices/virtual/thermal/*/trip_point_?_temp" (268 bytes, text/plain)
2010-10-09 20:01 UTC, Manuel Ullmann
output of stopwatch script (1.06 KB, text/plain)
2010-10-14 08:02 UTC, Manuel Ullmann
debug.patch (491 bytes, patch)
2013-06-29 06:12 UTC, Lan Tianyu
output of acpi_listen with patched kernel (609 bytes, application/octet-stream)
2013-06-29 09:22 UTC, Manuel Ullmann
bash script used as workaround for this bug (5.44 KB, application/x-shellscript)
2015-01-06 14:34 UTC, Manuel Ullmann
bash script used as workaround for this bug (6.19 KB, application/x-shellscript)
2015-01-25 23:13 UTC, Manuel Ullmann

Description Manuel Ullmann 2010-10-01 22:26:59 UTC
After resuming from S3, ACPI fan control seems to be disabled. Trip points are ignored even when their temperature is reached.

Steps to Reproduce:

1) Suspend the machine.

2) Resume.

3) Put the machine under load until a trip-point temperature is reached.
(I prefer make -j3 in a kernel source directory.)

Actual results:

The fan stays in its current state.

Expected results:

The fan should have increased its RPM.

Build Date & Platform:

Build 2010-09-31 on Gentoo Linux 10.0

Did not test this on other platforms.

Additional information:

My laptop fan has 5 states (FAN0-4).

On resuming it is always in the following state:
FAN0:off #loud
FAN1:off
FAN2:off
FAN3:on
FAN4:on # silent
At least this is what is reported to the OS; in reality the fan always runs at the FAN0 level. Running echo 3 > /proc/acpi/fan/FAN3/state after resuming fixes this. The fan state may be set by the BIOS, but this is just a guess.

If the temperature reaches about 85°C, the fan is switched to the FAN0 level again without the OS knowing about it. This could also be a BIOS mechanism, as such temperatures can reduce hardware lifetime.

Hibernation works fine as long as the system is hibernated from a clean session.
If the session was suspended before, fan control won't work on resume.

I can control the fan manually via the /proc/acpi/fan interface. Activating a fan state only works if its trip-point temperature has been reached.
For example:
The trip point of FAN2 is 55°C. If I run
echo 0 > /proc/acpi/fan/FAN2/state
while the temperature is <55°C, this has no effect.
If I do this while the temperature is >=55°C, it activates the fan state.

If you can't confirm this, the bug could be specific to my laptop model (it wouldn't be the first time).

Please tell me if I can provide further information.
Comment 1 Zhang Rui 2010-10-08 03:18:19 UTC
Please attach the acpidump output.
Please attach the output of "grep . /sys/class/thermal/*/*".
Comment 2 Manuel Ullmann 2010-10-09 14:32:26 UTC
Created attachment 32982
acpidump output
Comment 3 Manuel Ullmann 2010-10-09 14:33:11 UTC
Created attachment 32992
output of "grep . /sys/devices/virtual/thermal/*/*"
Comment 4 Manuel Ullmann 2010-10-09 20:01:38 UTC
Created attachment 33002
output of "grep . /sys/devices/virtual/thermal/*/trip_point_?_temp"

The trip-point temperatures are reduced by 5°C if they are reached while the fan state of the next lower trip point is not active.
So if trip_point_2_temp (60°C) is reached after I have manually deactivated the lower fan state with

echo 0 > /sys/devices/virtual/thermal/cooling_device3/cur_state

the trip-point temperature is reduced to 55°C. By manually deactivating the fan states, I can trigger the activation of cooling_device0 at exactly 85°C, which is not reflected in .../thermal/*/cur_state (see attachment).

The deactivation of this fan state seems to depend on CPU load and temperature.
There is no fixed temperature at which it reproducibly stops.
Judging from my tests, the time between interrupting the kernel compile and the fan spinning down gets shorter the lower the temperature is.
It ranges from 18 to 2 seconds: 18 seconds after an immediate interruption, and 2 seconds after an interruption at 78°C.

When I suspend and wake up, the trip-point temperatures are restored to their original values.
The fan states are restored correctly; disregard that point in the initial comment.
It seems that after suspend the trip-point temperature for the fan state mentioned above is raised to about 87-89°C. Maybe it actually depends entirely on CPU load, but then the 85°C trip point on the first activation wouldn't be reproducible. Very strange, however.

I have manual fan control via sysfs unless
  - I switch off a fan state other than the highest one.
  - cooling_device0 is activated.
I can't control any fan state associated with a trip-point temperature higher than the current temperature. Therefore cooling_device0 will never be activated in the normal way (which would be visible in sysfs).

On most wakeups the fan spins up to state 0, which is not seen by the system. This can be fixed by deactivating and reactivating the highest fan state. In rare cases the fan state is restored correctly.
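To compare trip-point temperatures with the cooling-device states described above, both can be printed side by side. The following is a minimal sketch and not one of the attachments; the function name thermal_snapshot and its optional root argument are hypothetical, and it assumes the usual sysfs layout in which trip_point_N_temp holds millidegrees Celsius.

```shell
#!/bin/bash
# Hypothetical helper: print every trip-point temperature and every
# cooling-device state found below a thermal sysfs root.
thermal_snapshot() {
    local root=${1:-/sys/class/thermal} f zone dev
    for f in "$root"/thermal_zone*/trip_point_*_temp; do
        [ -r "$f" ] || continue
        zone=$(basename "$(dirname "$f")")
        # sysfs reports millidegrees Celsius; integer division drops the fraction.
        printf '%s %s %sC\n' "$zone" "$(basename "$f")" "$(( $(cat "$f") / 1000 ))"
    done
    for f in "$root"/cooling_device*/cur_state; do
        [ -r "$f" ] || continue
        dev=$(basename "$(dirname "$f")")
        printf '%s cur_state %s\n' "$dev" "$(cat "$f")"
    done
}
```

Saving the output of thermal_snapshot before suspend and again after resume, then running diff on the two files, would show whether trip points or reported states changed across the cycle.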
Comment 5 Manuel Ullmann 2010-10-09 20:09:53 UTC
I forgot to mention: after suspend I can control only the states that were restored. I've updated to kernel-2.6.36-rc7-git1. Note that I could also reproduce this with 2.6.27.54.
Comment 6 Manuel Ullmann 2010-10-14 08:02:44 UTC
Created attachment 33572
output of stopwatch script

Hi again,

I wrote a little script to measure the time intervals for the fan spin-up issue at about 85-89°C. I annotated and trimmed the output and will attach it now.

Here is the script:

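#!/bin/bash
# stopwatch.sh
# Print the temperature (hwmon reports millidegrees Celsius), time how long
# it takes until a key is pressed, then print the temperature again.
cat /sys/class/hwmon/hwmon0/device/temp1_input
time read -s -n1 -p "Hit any key to stop counting."
cat /sys/class/hwmon/hwmon0/device/temp1_input
exit 0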

I hope this helps you form an opinion about this issue.
Comment 7 Manuel Ullmann 2010-10-14 08:27:28 UTC
Updating affected kernel version.

Could you tell me the valid values for /sys/devices/virtual/thermal/thermal_zone?/mode? The documentation seems to be outdated. It still states that the values are 'kernel' and 'user', but the file is set to neither of these (it reads 'enabled'), and sysfs does not accept them (invalid argument).
Maybe deactivating and reactivating the trip-point system would trigger a recalculation of the trip points and the current fan state.

Also, two people with my laptop model replied to my last bug report, bug 15166: a kernel developer with ssh access to one and an owner of one. Maybe they could confirm these issues (see comment #7 and #40). Maybe I should put them on the CC list. What do you think?
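The idea of deactivating and reactivating the trip-point handling could be tried roughly as follows. This is a sketch under assumptions: current kernel documentation lists enabled and disabled as the values of the mode attribute, the function name cycle_zone_mode and its optional pause argument are hypothetical, and whether the cycle actually triggers a recalculation of the trip points on this machine is untested.

```shell
#!/bin/bash
# Hypothetical helper: switch a thermal zone off and on again.
# Usage: cycle_zone_mode /sys/class/thermal/thermal_zone0 [PAUSE_SECONDS]
# Writing to the real sysfs node needs root.
cycle_zone_mode() {
    local zone_dir=$1
    echo disabled > "$zone_dir/mode" || return 1
    sleep "${2:-1}"   # optional pause before re-enabling
    echo enabled > "$zone_dir/mode"
}
```

While the zone is disabled the kernel does not act on trip points, so the pause should be kept short on a machine that is already warm.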
Comment 8 Zhang Rui 2010-12-16 06:34:30 UTC
There have been quite a lot of changes to ACPI power resources recently, and some of the fixes have not shipped in the upstream kernel yet.
It would be great if you could wait until 2.6.37-rc6 is released and give it a try.
Comment 9 Zhang Rui 2010-12-27 01:53:07 UTC
ping, 2.6.37-rc7 released, can you please verify if the problem still exists?
Comment 10 Manuel Ullmann 2010-12-30 17:17:54 UTC
Sorry, I have not checked my mail recently. I tested 2.6.37-rc8 and could reproduce the issues mentioned above. I'm still trying to figure out whether anything changed. I'm compiling 2.6.36.2 to compare and will report back if I notice something.
Comment 11 Manuel Ullmann 2010-12-30 19:10:44 UTC
Ok, I have noticed a few things.

- sysfs is much less responsive. I need 3-8 tries until I can successfully deactivate a fan state.

- Resuming from hibernation causes loss of fan control. It can be regained by hibernating while the trip point between 80 and 90°C is active. Resuming such a session activates most fan states. Fan control is then regained by deactivating those fan states once the temperature is lower. The trip point at 60°C is not recognised when the temperature falls below it.

- Resuming from suspend still causes the same behaviour. Windows XP seems to have similar problems after resuming from S3. It regains fan control at 90°C.
Comment 12 Zhang Rui 2012-01-18 02:19:33 UTC
It's great that kernel bugzilla is back.

Can you please verify whether the problem still exists in the latest upstream kernel?
Comment 13 Manuel Ullmann 2012-01-30 13:02:52 UTC
My laptop is being repaired at the moment. I'll test the latest kernel when I get it back.
Comment 14 Manuel Ullmann 2012-02-20 14:09:09 UTC
I tested 3.2.6 and gentoo-sources 3.2.1-r2. The two versions behave differently.
With 3.2.6 the behaviour matches what I reported earlier: at about 88°C the fan is switched to its highest state and falls back to the restored state when CPU load is low or the temperature is around 80°C.

With gentoo-sources the fan is switched to its highest state at 90°C and fan control is regained. This is the same behaviour as in Windows XP.

It is still not possible to activate a fan state with echo 1 > /sys/devices/virtual/thermal/cooling_device?/cur_state, but deactivating a state with echo 0 > ... is possible.

If you like, I can test vanilla-sources 3.2.1 to see whether it behaves like the Gentoo kernel. The Gentoo patches are unlikely to have fixed this issue; they basically consist of patches for the live CD (console decorations etc.).
Comment 15 Manuel Ullmann 2012-06-15 15:30:29 UTC
Tested 3.4.2 and gentoo-sources 3.2.12. Same behaviour. Suspend to disk works fine.
Comment 16 Manuel Ullmann 2012-10-23 19:34:58 UTC
Tested 3.6.2. Same behaviour. Updated affected kernel versions. I will test kernel 3.8 next, when it is released.
Comment 17 Lan Tianyu 2013-06-25 07:36:28 UTC
Hi:
    Could you check whether the issue occurs on v3.10-rc7?
Comment 18 Manuel Ullmann 2013-06-26 15:30:22 UTC
It still behaves the same on 3.10-rc7. What would be the correct status to set now?
Comment 19 Lan Tianyu 2013-06-28 08:05:48 UTC
Could you apply the following patch and use acpi_listen to catch ACPI events?
Compare the results when the temperature rises before and after system suspend.


diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index a33821c..3cf1816 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -976,6 +976,7 @@ static void acpi_thermal_notify(struct acpi_device *device, u32 event)
        switch (event) {
        case ACPI_THERMAL_NOTIFY_TEMPERATURE:
                acpi_thermal_check(tz);
+               acpi_bus_generate_proc_event(device, event, 0);
                break;
        case ACPI_THERMAL_NOTIFY_THRESHOLDS:
                acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_THRESHOLDS);
Comment 20 Manuel Ullmann 2013-06-28 18:11:14 UTC
I am somehow having trouble applying the patch. I copied the patch into a file called acpi.patch in the Linux source directory. I removed the line break before u32 event and am now trying to apply it with "patch -p1 -i acpi.patch", but the output is
"patching file drivers/acpi/thermal.c
Hunk #1 FAILED at 976.
1 out of 1 hunk FAILED -- saving rejects to file drivers/acpi/thermal.c.rej".
What is wrong?
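A hunk that fails like this is often caused by whitespace damage, since copying a diff out of a web page tends to turn tabs into spaces. Whether that is the cause here is an assumption. The following minimal sketch uses GNU patch, whose -l (--ignore-whitespace) option matches runs of blanks loosely; the function name apply_pasted_patch is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: apply a patch whose tabs and spaces may have been
# altered by copy and paste (requires GNU patch).
# Usage: apply_pasted_patch PATCH_FILE [SOURCE_DIR]
# PATCH_FILE must be an absolute path or relative to SOURCE_DIR.
apply_pasted_patch() {
    local patch_file=$1 src_dir=${2:-.}
    # Dry run first, so a failing hunk does not modify the tree.
    (cd "$src_dir" && patch -p1 -l --dry-run -i "$patch_file" >/dev/null) || return 1
    (cd "$src_dir" && patch -p1 -l -i "$patch_file")
}
```

With the file from this comment, the call would be apply_pasted_patch acpi.patch from inside the Linux source directory. Lines added this way keep the whitespace of the pasted patch, so an attached patch file remains the cleaner option.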
Comment 21 Lan Tianyu 2013-06-29 06:12:14 UTC
Created attachment 106301
debug.patch

I have attached the patch. Please try it on v3.10-rc7.
Comment 22 Manuel Ullmann 2013-06-29 09:22:32 UTC
Created attachment 106321
output of acpi_listen with patched kernel

OK, I tested the kernel and am attaching the output of acpi_listen (which is basically disabled after suspend). With the patch I get two outputs for each event, but I can't see what information this adds at the moment. The fan switching to its highest state is not shown by acpi_listen. Yesterday, during a Firefox compile with kernel 3.8.13-gentoo, this was also the case in a normal session.
Comment 23 Lan Tianyu 2013-06-30 09:36:30 UTC
From the output you attached, the BIOS doesn't produce thermal events after system suspend. So this should be a BIOS bug.
Comment 24 Manuel Ullmann 2013-06-30 13:00:22 UTC
Yeah, the BIOS is kind of broken. The option "Fan always on while on ac" has no effect. The fan always runs at full speed outside of the OS. The daylight-saving time change does not work properly. Since the last BIOS update every second boot fails because of an "unknown Wlan module"; rebooting resolves it. If a bootable DVD is in the DVD drive and it takes too long to load, a few LEDs start to blink and the boot is aborted.
There are certainly a few more bugs that I have not noticed. The laptop model has an early UEFI without secure boot. I don't think HP will ship any further BIOS updates for this old model. I will use it until the next hardware defect and then try another vendor.
Comment 25 Manuel Ullmann 2015-01-06 14:34:20 UTC
Created attachment 162611
bash script used as workaround for this bug

I found a kind of workaround for this bug. There is a nice kernel option, THERMAL_EMULATION. Playing around with it, I found out that the BIOS seems to check for temperature equality before sending its ACPI events. If the temperature rises too fast (as in every average heating scenario), it will send an ACPI switch-fan-off event.
If the temperature is reduced by 5°C, this will usually switch the fan off completely.
So far the script has not been able to activate the last fan state.
echo 80000 > emul_temp
had no effect, no matter how long I waited. In any case, that state was triggered quite rarely in normal operation.
Currently the fan is usually spun down while emul_temp is 0 (deactivated), so I could not test the condition where the active fan state is too high.
There is still the emergency ventilation, which spins the fan up within half a second. It now kicks in earlier, just above 70°C. I noticed that k10temp has suspiciously set its temp1_max to 70°C, but I don't know whether that is related.
On the other hand, deactivating the fan via sysfs nodes and heating the system sometimes caused this behaviour directly after reboot, so this might still be caused by the BIOS.
Well, I have talked enough for now.
Maybe this will help any owners of this laptop who still use it.
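The observation that the BIOS seems to react only when a temperature value is hit exactly suggests stepping the emulated temperature slowly instead of jumping. The following sketch is not the attached script. It assumes a kernel built with THERMAL_EMULATION, which exposes emul_temp (in millidegrees Celsius) in each thermal zone directory; the function names ramp_emul_temp and stop_emulation and the step and delay defaults are hypothetical.

```shell
#!/bin/bash
# Raise the emulated temperature from FROM to TO in small steps so that
# every intermediate value is reported for a while.
# Usage: ramp_emul_temp ZONE_DIR FROM TO [STEP] [DELAY_SECONDS]
# Writing to the real sysfs node needs root.
ramp_emul_temp() {
    local zone_dir=$1 from=$2 to=$3 step=${4:-1000} delay=${5:-1} t
    t=$from
    while [ "$t" -le "$to" ]; do
        echo "$t" > "$zone_dir/emul_temp" || return 1
        sleep "$delay"
        t=$((t + step))
    done
}

# Writing 0 switches emulation off, so the real sensor value is used again.
stop_emulation() {
    echo 0 > "$1/emul_temp"
}
```

For example, ramp_emul_temp /sys/class/thermal/thermal_zone0 50000 60000 would walk from 50°C to 60°C in 1°C steps, one second apart, after which stop_emulation on the same directory returns control to the real sensor.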
Comment 26 Manuel Ullmann 2015-01-25 23:13:56 UTC
Created attachment 164731
bash script used as workaround for this bug

The case of the fan being too fast has now come up a few times, and as expected the script was buggy. I fixed that, did some other cleanup, and took into account that the cooling-device order in thermal_zone changes from session to session, so I do a little sorting. The script has now been working for quite a while. There is room for improvement, but it does what it has to.
If the reported temperature decreases by 2°C too fast, the fan will still shut down, and I can't avoid this. I'm currently testing whether waiting longer than 30 seconds in the tooHigh case changes anything. If not, I can comment out the whole tooHigh part, simply echo 34000 to emul_temp and reinitialise the fan.
Nevertheless, I can't avoid it if I have to activate a fan state for the 55000 trip point while the current temperature is already at 65°C.
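Because the cooling-device order inside a thermal zone can change between sessions, a script cannot rely on fixed cdev numbers. A minimal sketch of one way to sort the bindings by trip-point temperature follows; it is not the attached script. It assumes the standard sysfs layout (cdevN symlinks, cdevN_trip_point holding a trip index, trip_point_N_temp in millidegrees Celsius), and the function name list_zone_bindings is hypothetical.

```shell
#!/bin/bash
# Print "<trip temperature> <cooling device>" for every binding of a zone,
# sorted by temperature, independent of the cdevN numbering.
list_zone_bindings() {
    local zone_dir=$1 link idx temp_file
    for link in "$zone_dir"/cdev[0-9]*; do
        [ -L "$link" ] || continue            # skip cdevN_trip_point etc.
        idx=$(cat "${link}_trip_point" 2>/dev/null) || continue
        temp_file="$zone_dir/trip_point_${idx}_temp"
        [ -r "$temp_file" ] || continue       # binding without a usable trip point
        printf '%s %s\n' "$(cat "$temp_file")" "$(basename "$(readlink "$link")")"
    done | sort -n
}
```

The first column can then be compared with the current temperature to pick the cooling device that belongs to a given trip point, whatever number it was assigned in this session.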
