Bug 9485
Summary: | ACPI: Unable to turn cooling device 'on' | ||
---|---|---|---|
Product: | ACPI | Reporter: | nico (nico) |
Component: | Power-Thermal | Assignee: | Zhang Rui (rui.zhang) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | acpi-bugzilla, yakui.zhao |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.24-rc3 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
cpi_dmesg
acpi_dmesg
acpidump (kernel 2.6.24-rc6; Abit IC7-G)
dmesg - verbose
dmesg after 4h uptime
Patch 1/2: ACPI cleanup in power resource
Patch 2/2: Delete the strick check in power transistion
dmesg - verbose - working
dmesg - verbose - w/ patches and w/o acpi.power_nocheck |
Description
nico
2007-12-01 15:56:09 UTC
Created attachment 13813 [details]
cpi_dmesg
dmesg output after some minutes
Created attachment 13814 [details]
acpi_dmesg
dmesg output after some minutes
I am now using the fancontrol script provided by lm_sensors, which does much the same job as the Abit FanEQ, just in software. However, the bug with FanEQ and ACPI is still there. I think the kernel gets confused when FanEQ is turned on and ACPI then tries to turn the fan on (which is on all the time).

Hi, Nico,
what do you mean by the fan being quite loud? Is it spinning very fast even when the temperature is low?
Please attach the acpidump output of your laptop.
I'm not clear about what the Abit FanEQ is, but if it programs the fan, then there may well be conflicts between the Abit FanEQ script and ACPI, and there may also be conflicts between lm_sensors and ACPI.
So please stop Abit FanEQ, clear CONFIG_HWMON, and attach the result of "cat /proc/acpi/thermal/*/*".

Okay, just to clarify things: it's not a laptop, it's a desktop system. The motherboard is an Abit IC7-G, and FanEQ is a simple function in the BIOS to control the fan (you can set a percentage for the fan speed and a temperature limit above which the fan spins at full speed).
Here is a screenshot of it: http://www.all-about-pc.de/Hardware/Motherboards/ABIT_IC7_MAX3/images/faneq2.jpg
It's a really simple function, but with it turned on I get the dmesg spam (I think it starts spamming once the limit is exceeded).
I said the fan is loud; well, it is loud by default and I have to throttle it. At the moment I am using the fancontrol script provided by lm_sensors to do that (with FanEQ it would spam dmesg). lm_sensors works fine and there is no problem with that; it's just the FanEQ stuff. It would be much more comfortable to use FanEQ, though.
First of all, I don't know how to create an acpidump; maybe you can give me a link to a howto or something. Also, do I have to enable ACPI_DEBUG to create a dump?
Secondly, do you need a dump taken while the problem is happening or when it's not?
Until now I have disabled FanEQ in the BIOS, disabled CONFIG_HWMON in the kernel, enabled ACPI_DEBUG in the kernel, and also disabled the fancontrol script. Are there any further things I have to do?

Hi, nico,
sorry for the delay.
So there are two problems in all.
1. dmesg spam when FanEQ is turned on.
I think this is a conflict between the ACPI fan and the BIOS-controlled fan. They may be the same device, but controlled via different interfaces.
2. The fan is too loud while it is controlled via ACPI.
There are only two states for an ACPI fan, on and off, so the fan is always turned on, right?
Please attach the result of "cat /proc/acpi/thermal/*/*" and "cat /proc/acpi/fan/*/*".
>Please attach the acpidump output
Please download the latest pmtools source code at http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/, run make, and use the acpidump tool to get the acpidump.
>do I have to enable ACPI_DEBUG to create a dump?
No, you don't, but please set it as this may be helpful for debugging.
>do you need a dump when the problem is happening or when it's not?
This doesn't matter.
>Are there any further things I have to do?
No.

Created attachment 14286 [details]
acpidump (kernel 2.6.24-rc6; Abit IC7-G)
~ # cat /proc/acpi/thermal_zone/*/*
0 - Active; 1 - Passive
<polling disabled>
state: ok
temperature: 41 C
critical (S5): 70 C
active[0]: 60 C: devices= FAN
~ # cat /proc/acpi/fan/*/*
status: off
The fan is always on.
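The two listings above can be turned into numbers with a few lines of Python. This is only a sketch, assuming the procfs text layout shown in this comment; `parse_thermal_zone` is a hypothetical helper name, and the `>=` comparison against active[0] is an assumption about how the trip point is evaluated, not something taken from the kernel source.

```python
import re

# Sample text copied from the thermal_zone output in this comment.
SAMPLE = """\
state: ok
temperature: 41 C
critical (S5): 70 C
active[0]: 60 C: devices= FAN
"""

def parse_thermal_zone(text):
    """Extract current temperature, critical trip and active[N] trips (all in C)."""
    info = {"temperature": None, "critical": None, "active": {}}
    for line in text.splitlines():
        m = re.match(r"temperature:\s*(\d+) C", line)
        if m:
            info["temperature"] = int(m.group(1))
            continue
        m = re.match(r"critical \(S5\):\s*(\d+) C", line)
        if m:
            info["critical"] = int(m.group(1))
            continue
        m = re.match(r"active\[(\d+)\]:\s*(\d+) C", line)
        if m:
            info["active"][int(m.group(1))] = int(m.group(2))
    return info

zone = parse_thermal_zone(SAMPLE)
# Assumption: the fan is requested once the temperature reaches active[0].
fan_wanted = zone["temperature"] >= zone["active"][0]
print(zone, fan_wanted)
```

With the values reported here (41 C against an active[0] of 60 C) the sketch would not request the fan, which matches the "status: off" that ACPI reports even though the fan is physically spinning.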
active[0] is the temperature above which the fan starts spinning at full speed; at the moment the fan is set to 60% speed (FanEQ). No dmesg spam so far.
Now I am compiling something, the CPU temperature is above 60°C, the fan starts spinning at full speed, and the dmesg spam starts:
ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'
ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'
/proc/acpi/fan/FAN/state still says:
status: off
The compile is done now, the CPU has cooled down below the active[0] limit (~43°C), and the fan DOES NOT throttle back down to 60%. It's still running at full speed (I think this is wrong), but there is no more dmesg spam.
This is really strange to me.

>at the moment the fan is set to 60% speed (FanEQ).
>It's still running at fullspeed (I think this is wrong.)
No, an ACPI fan device supports only two states in all, ON and OFF.
So if you want to use ACPI to control the fan, you cannot throttle the fan.
Using ACPI and FanEQ to control the fan at the same time is a bad idea. :(
So please do the same test with FanEQ disabled and CONFIG_HWMON cleared. :)
After the system has booted,
what do you see in the dmesg if you "echo 0 >/proc/acpi/fan/FAN/state"?
Can you turn on the fan successfully?
And what do you see if you turn off the fan, ie, "echo 3 >/proc/acpi/fan/FAN/state"?
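For repeated testing, the two echo commands above could be wrapped in a small helper. This is a hedged sketch: the values 0 (on) and 3 (off) and the procfs path come from the commands quoted above, while `set_fan` and `STATE_FOR` are hypothetical names, and the demo deliberately writes to a temporary file instead of the real procfs node.

```python
import os
import tempfile

# Path and values taken from the echo commands in this comment.
FAN_STATE_PATH = "/proc/acpi/fan/FAN/state"
STATE_FOR = {"on": "0", "off": "3"}

def set_fan(action, path=FAN_STATE_PATH):
    """Write the state value for 'on' or 'off' to the given state file."""
    if action not in STATE_FOR:
        raise ValueError("action must be 'on' or 'off'")
    with open(path, "w") as f:
        f.write(STATE_FOR[action])

# Demo against a scratch file so the sketch runs without root or ACPI.
demo_path = os.path.join(tempfile.mkdtemp(), "state")
set_fan("on", demo_path)
with open(demo_path) as f:
    written_on = f.read()
set_fan("off", demo_path)
with open(demo_path) as f:
    written_off = f.read()
print(written_on, written_off)
```

On a real system the call would be `set_fan("on")` as root; whether the write is accepted depends on the kernel version and the fan driver.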
Closing this bug, as there has been no response from the bug reporter.

Hello and sorry for the inactivity. I am on kernel 2.6.26.5 now and the problem still persists. I was trying to do what you said in your last post, but "echo 0 >/proc/acpi/fan/FAN/state" just responds with "echo: write error: exec format error".
I am willing to help you fix this bug, but now I need new instructions on what to do :)

Will you please try the patch set on the latest kernel and see whether the problem still exists?
>http://bugzilla.kernel.org/show_bug.cgi?id=11000#C30,31,32
After the patch is applied, please add the boot option "acpi.power_nocheck=1".
Thanks.

I am on kernel 2.6.27-rc6 now. With the patches I get the
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f78165a0] 'on'
messages even when the fan has not started spinning at full speed.

Hm, no: the messages appear when the temperature is over the one I have set in FanEQ, but then the fan does not even start spinning at full speed as it used to. This _is_ confusing me :(
Just to recapitulate everything:
I have an option called FanEQ in the BIOS. I can set a percentage (atm 60%) for the fan speed and a temperature (atm 55°C) above which the fan starts spinning at full speed. So my fan is spinning at 60% speed all the time, and when the CPU gets hotter than 55° it starts spinning at full speed. After the CPU has cooled down below 55°, the fan spins at 60% speed again. This is how it is supposed to work (and it does with other operating systems).
My problem is that when the fan has started spinning at full speed (CPU hotter than 55°), dmesg gets spammed with
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f78165a0] 'on'
Forget everything I said about lm_sensors and HWMON; that was what I used when FanEQ was off.
I *think* the kernel ACPI is not even needed for FanEQ to work; the kernel ACPI code just confuses this function.
It tries to turn the fan on after it is spinning faster, which is wrong because it is already on.
With the three patches applied the fan does not start spinning faster, but the messages printed to dmesg are the same.

Hi, Nico
Thanks for the test and info. The ACPI thermal driver can't do what FanEQ does. It only turns the FAN device on/off by comparing the thermal zone temperature with the predefined temperature (which is returned by the AC0 object).
Do you mean that the following message still appears after the patch is applied?
>ACPI: Transitioning device [FAN] to D0
>ACPI: Unable to turn cooling device [f78165a0] 'on'
Can you confirm whether the boot option "acpi.power_nocheck=1" was added?
It would be great if you could add the boot options "acpi.debug_layer=0x00810000 acpi.debug_level=0x17" and attach the output of dmesg. Of course the "acpi.power_nocheck" option is still needed.
Thanks.

This message still appeared with the patch, yes (acpi.power_nocheck=1 was also on). I also forgot to say that the fan does NOT get slower when the CPU has cooled down below 55° (set in FanEQ). It keeps spinning at full speed even if the CPU goes below the temperature set in FanEQ, which is wrong.
For this testing I have enabled the ACPI Fan option, but the "ACPI: Transitioning dev... ACPI: Unable to t.." messages get spammed to dmesg even if I have ACPI Fan disabled in the kernel, which I normally have.
I'm doing testing with "acpi.debug_layer=0x00810000 acpi.debug_level=0x17" now.

Created attachment 17900 [details]
dmesg - verbose
uhm, that does not look really good.
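The two debug values requested for this test are bit masks, so it can help to see which bits they set. A minimal sketch follows, with `set_bits` as a hypothetical helper; which ACPI component or message level each bit selects is defined by the kernel's ACPI debug definitions and is deliberately not guessed at here.

```python
def set_bits(mask):
    """Return the positions of the bits that are set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

# Values copied from the boot options quoted in this thread.
debug_layer = 0x00810000
debug_level = 0x17

layer_bits = set_bits(debug_layer)
level_bits = set_bits(debug_level)

# Boot options as they would be appended to the kernel command line.
cmdline = "acpi.power_nocheck=1 acpi.debug_layer=0x%08x acpi.debug_level=0x%x" % (
    debug_layer, debug_level)
print(layer_bits, level_bits)
print(cmdline)
```

The layer mask enables two bits (16 and 23) and the level mask four (0, 1, 2 and 4), so the resulting log is limited to a couple of components instead of all ACPI debug output.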
Hi, Nico
From the log in comment #17 it seems that the following message is not present.
>ACPI: Transitioning device [FAN] to D0
>ACPI: Unable to turn cooling device [f78165a0] 'on'
Will you please try it again? It would be better to attach the output of dmesg after the temperature has risen above the predefined temperature and then dropped back below it.
Thanks.

Created attachment 17925 [details]
dmesg after 4h uptime
Here is the dmesg output after 4h uptime. The spam starts as soon as the temperature is greater than the predefined temp, but I can't tell you when the temp goes back under the limit.
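A long dmesg like this one is easier to judge when the two messages are counted instead of read. The sketch below assumes the message wording quoted earlier in this report; `summarize` is a hypothetical helper and the sample text is the excerpt from the first test, not the attached log.

```python
import re
from collections import Counter

# Excerpt quoted earlier in this report (not the attached 4h log).
SAMPLE_DMESG = """\
ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'
ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'
"""

def summarize(text):
    """Count power transitions and failed cooling-device switches per device."""
    transitions = Counter()
    failures = Counter()
    for line in text.splitlines():
        m = re.search(r"ACPI: Transitioning device \[(\w+)\] to (D\d)", line)
        if m:
            transitions[(m.group(1), m.group(2))] += 1
            continue
        m = re.search(r"ACPI: Unable to turn cooling device \[(\w+)\] '(on|off)'", line)
        if m:
            failures[(m.group(1), m.group(2))] += 1
    return transitions, failures

transitions, failures = summarize(SAMPLE_DMESG)
print(dict(transitions), dict(failures))
```

To run it on a real log, the text could be read from a saved dmesg file and passed to `summarize` in place of the sample.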
Created attachment 17961 [details]
Patch 1/2: ACPI cleanup in power resource
It is changed from state to *state.
Created attachment 17962 [details]
Patch 2/2: Delete the strick check in power transistion
Delete the too-strict check in power transition
Hi, Nico
Will you please try the above two patches and see whether the warning message still appears? Of course the boot option "acpi.power_nocheck=1" and the patch set in comment #12 are still required.
Please also add the boot options "acpi.debug_layer=0x00810000 acpi.debug_level=0x017" and attach the output of dmesg.
Thanks.

Created attachment 18004 [details]
dmesg - verbose - working
Yay! These patches seem to work very well :)
I did some testing and everything is fine; the dmesg output is in the attachment.
Hi, Nico
Thanks for the test and response.
Please don't add the boot option "acpi.power_nocheck=1" this time and see whether it still works well.
Thanks.

Created attachment 18032 [details]
dmesg - verbose - w/ patches and w/o acpi.power_nocheck
Without acpi.power_nocheck=1 it does NOT work: after the fan has started spinning faster and the CPU has cooled down, it keeps running at full speed and does not slow down.
dmesg attached
Reopening the bug, since I don't know whether it is acceptable that it does not work when acpi.power_nocheck is not used.

Hi, Nico
Thanks for caring about this problem. This issue is related to a broken BIOS: the system can't work unless acpi.power_nocheck is added.
In the current kernel, when a device is turned on/off, the ACPI driver checks the state again to confirm whether the device was actually turned on/off. If the ACPI object can't return the correct device power state, the OS treats the device power transition as failed. Windows has no such strict check, so the box works well there.
From the dmesg in comment #25 it seems that the ACPI object can't return the correct device power state if acpi.power_nocheck is not added.
Thanks.

Hi, nico,
this is a BIOS problem which we won't fix in the Linux kernel, so please use the boot option acpi.power_nocheck=1 as a workaround instead.
IMO, we can close it.

Hm, this sucks. Also, I don't think I am the only one using fancontrol on an Abit mainboard. Well, I'm using acpi.power_nocheck=1 now. Thx for your help anyway.

patch in comment #21 shipped in 2.6.28-rc4-git3
closed

commit 676962dac6e267ce7c13f73962208f9124a084bb
Author: Zhao Yakui <yakui.zhao@intel.com>
Date: Mon Oct 27 16:05:39 2008 +0800

ACPI: fan: Delete the strict check in power transition
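The strict check described in this thread can be illustrated with a toy model. This is not the kernel code: `BrokenFirmwareFan` and `transition_on` are hypothetical names, and the model simply assumes a firmware status method that always reports "off", which is what the reporter's "status: off" readings suggest.

```python
class BrokenFirmwareFan:
    """Toy fan whose firmware status never reflects the real power state."""

    def __init__(self):
        self.really_on = False

    def turn_on(self):
        # The hardware does switch on...
        self.really_on = True

    def reported_state(self):
        # ...but the firmware keeps reporting 'off' (assumed BIOS defect).
        return "off"

def transition_on(dev, nocheck=False):
    """Turn the device on; unless nocheck is set, verify the reported state."""
    dev.turn_on()
    if nocheck:
        return True  # trust the transition, as with acpi.power_nocheck=1
    return dev.reported_state() == "on"  # strict check re-reads the state

strict_fan = BrokenFirmwareFan()
strict_ok = transition_on(strict_fan)
relaxed_fan = BrokenFirmwareFan()
relaxed_ok = transition_on(relaxed_fan, nocheck=True)
print(strict_ok, strict_fan.really_on, relaxed_ok)
```

In the model the fan is physically on in both cases; only the strict variant reports a failure, which corresponds to the "Unable to turn cooling device" message seen without the workaround.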