Bug 9485 - ACPI: Unable to turn cooling device 'on'
Summary: ACPI: Unable to turn cooling device 'on'
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-01 15:56 UTC by nico
Modified: 2008-11-12 21:23 UTC

See Also:
Kernel Version: 2.6.24-rc3
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
cpi_dmesg (29.93 KB, text/plain)
2007-12-01 15:58 UTC, nico
acpi_dmesg (29.93 KB, text/plain)
2007-12-01 15:58 UTC, nico
acpidump (kernel 2.6.24-rc6; Abit IC7-G) (86.72 KB, text/plain)
2008-01-04 14:40 UTC, nico
dmesg - verbose (30.33 KB, text/plain)
2008-09-20 07:35 UTC, nico
dmesg after 4h uptime (30.28 KB, text/plain)
2008-09-21 09:14 UTC, nico
Patch 1/2: ACPI cleanup in power resource (768 bytes, patch)
2008-09-22 20:39 UTC, ykzhao
Patch 2/2: Delete the strick check in power transistion (1.18 KB, patch)
2008-09-22 20:40 UTC, ykzhao
dmesg - verbose - working (30.42 KB, text/plain)
2008-09-24 05:44 UTC, nico
dmesg - verbose - w/ patches and w/o acpi.power_nocheck (30.38 KB, text/plain)
2008-09-25 06:31 UTC, nico

Description nico 2007-12-01 15:56:09 UTC
Most recent kernel where this bug did not occur: I really don't know; I think I have had this problem since the end of 2006 or the start of 2007.
Distribution: gentoo x86
Hardware Environment: Abit IC7-G
Software Environment: Portage 2.1.4_rc4 (default-linux/x86/2007.0/desktop, gcc-4.2.2, glibc-2.7-r0, 2.6.24-rc3 i686)

Problem Description:
With FanEQ enabled in the BIOS (fan at 60% speed while under 55°C), ACPI spams my messages file every six seconds; see the attachment acpi_dmesg. Without logrotate my messages file would be 200 MB+ by now. This is really annoying. I have now turned FanEQ off in the BIOS and the bad ACPI messages are gone, but my fan is really loud. Is there another way to control it, by the way?
If you google for "unable to turn cooling device" you can see that I'm not the only one this happens to. Bugs #9227 and #9432 are related.

Steps to reproduce:
Throttle the CPU fan speed in the BIOS, get the CPU hotter than ??°C, and check dmesg.
Comment 1 nico 2007-12-01 15:58:00 UTC
Created attachment 13813
cpi_dmesg

dmesg output after some minutes
Comment 2 nico 2007-12-01 15:58:10 UTC
Created attachment 13814
acpi_dmesg

dmesg output after some minutes
Comment 3 nico 2007-12-02 11:05:35 UTC
I am now using the fancontrol script provided by lm_sensors, which does much the same job as Abit FanEQ, just in software. However, the bug with FanEQ and ACPI is still there. I think the kernel gets confused when FanEQ is turned on, and then ACPI tries to turn the fan on (which is on all the time).
Comment 4 Zhang Rui 2007-12-19 21:12:03 UTC
Hi, Nico,
What do you mean by the fan being quite loud?
Is it spinning very fast even when the temperature is low?
Please attach the acpidump output of your laptop.
I'm not clear about what Abit FanEQ is, but if it programs the fan, then there may well be conflicts between the Abit FanEQ script and ACPI, and there may also be conflicts between lm_sensors and ACPI.
So please stop Abit FanEQ, clear CONFIG_HWMON, and attach the result of
"cat /proc/acpi/thermal_zone/*/*".
Comment 5 nico 2007-12-23 10:17:32 UTC
Okay, just to clarify things: it's not a laptop, it's a desktop system. The motherboard is an Abit IC7-G, and FanEQ is a simple function in the BIOS to control the fan (you can set a percentage for the fan speed and a temperature limit above which the fan starts spinning at full speed). Here is a screenshot of it: http://www.all-about-pc.de/Hardware/Motherboards/ABIT_IC7_MAX3/images/faneq2.jpg It's a really simple function, but with it turned on I get the dmesg spam (I think it starts spamming once the limit is exceeded). I said the fan is loud: it is loud by default and I have to throttle it. At the moment I am using the fancontrol script provided by lm_sensors to do that (with FanEQ it would spam dmesg). lm_sensors works fine and there is no problem with that; it's just the FanEQ stuff. It would be much more comfortable to use FanEQ, though.

First of all, I don't know how to create an acpidump; maybe you can give me a link to a howto or something. Also, do I have to enable ACPI_DEBUG to create a dump?

Secondly do you need a dump when the problem is happening or when it's not?

Until now I have disabled FanEQ in the BIOS, disabled CONFIG_HWMON in the kernel, enabled ACPI_DEBUG in the kernel and also disabled the fancontrol script. Are there any further things I have to do?
Comment 6 Zhang Rui 2008-01-03 19:19:42 UTC
Hi, nico, sorry for the delay.
So there are two problems in all.
1. dmesg spam when FanEQ is turned on.
I think this is a conflict between the ACPI fan and the BIOS-controlled fan.
They may be the same device but controlled via different interfaces.
2. The fan is too loud when it is controlled via ACPI.
There are only two states for an ACPI fan, on and off.
So the fan is always turned on, right?
Please attach the result of "cat /proc/acpi/thermal_zone/*/*"
and "cat /proc/acpi/fan/*/*".
>Please attach the acpidump output
Please download the latest pmtools source code at
http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/
make, and use the acpidump tool to get the acpidump.
>do I have to enable ACPI_DEBUG to create a dump?
no, you don't. but please set it as this may be helpful for debugging.
>do you need a dump when the problem is happening or when it's not?
this doesn't matter.
>Are there any further things I have to do?
no.
Comment 7 nico 2008-01-04 14:40:51 UTC
Created attachment 14286
acpidump (kernel 2.6.24-rc6; Abit IC7-G)

~ # cat /proc/acpi/thermal_zone/*/*   
0 - Active; 1 - Passive
<polling disabled>
state:                   ok
temperature:             41 C
critical (S5):           70 C
active[0]:               60 C: devices= FAN

~ # cat /proc/acpi/fan/*/*
status:                  off

The fan is always on.
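
For anyone who wants to watch these values from a script, the thermal zone output above could be parsed roughly as follows. This is only a sketch: it assumes the field layout shown in this comment, and the function name `parse_thermal_zone` and the `SAMPLE` string are made up for illustration.

```python
import re

# Sample text copied from the output in this comment.
SAMPLE = """\
state:                   ok
temperature:             41 C
critical (S5):           70 C
active[0]:               60 C: devices= FAN
"""

def parse_thermal_zone(text):
    """Extract temperature, critical and active trip points (in degrees C)."""
    info = {"active": {}}
    for line in text.splitlines():
        m = re.match(r"temperature:\s+(\d+) C", line)
        if m:
            info["temperature"] = int(m.group(1))
            continue
        m = re.match(r"critical \(S5\):\s+(\d+) C", line)
        if m:
            info["critical"] = int(m.group(1))
            continue
        m = re.match(r"active\[(\d+)\]:\s+(\d+) C: devices=\s*(.*)", line)
        if m:
            # Map trip index -> (trip temperature, list of cooling devices).
            info["active"][int(m.group(1))] = (int(m.group(2)), m.group(3).split())
    return info
```

With the sample above, the fan is only expected to be switched on by ACPI once the temperature reaches the `active[0]` trip point.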
Comment 8 nico 2008-01-04 15:03:54 UTC
active[0] is the temperature where the fan starts spinning at full speed once exceeded; at the moment the fan is set to 60% speed (FanEQ). No dmesg spam so far.

Now I am compiling something, the CPU temperature is 60°C+, the fan starts spinning at full speed, and the dmesg spam starts:

ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'
ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'

/proc/acpi/fan/FAN/state still says: status: off

The compile is done now and the CPU is cooling down below the active[0] limit (~43°C), but the fan DOES NOT throttle back down to 60%. It is still running at full speed (I think this is wrong). But there is no more dmesg spam.

This is really strange to me.
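
To quantify the spam, the failure messages can be counted per cooling device from saved dmesg text. A minimal sketch, assuming the message format quoted in this comment; `count_fan_failures` is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Matches the failure line quoted above; the closing quote is optional
# in case a line was cut off when it was copied.
PATTERN = re.compile(r"ACPI: Unable to turn cooling device \[([0-9a-f]+)\] 'on'?")

def count_fan_failures(log_text):
    """Count 'Unable to turn cooling device' messages per device address."""
    return Counter(m.group(1) for m in PATTERN.finditer(log_text))

LOG = """\
ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'
ACPI: Transitioning device [FAN] to D0
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f7c165a0] 'on'
"""
```

Calling `count_fan_failures(LOG)` on the excerpt gives one entry for the device address with a count of two.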
Comment 9 Zhang Rui 2008-01-06 22:01:35 UTC
>at the moment the fan is set to 60% speed (FanEQ).
>It's still running at fullspeed (I think this is wrong.)
No, the ACPI fan device only supports two states in all, ON and OFF.
So if you want to use ACPI to control the fan, you cannot throttle the fan.
Using ACPI and FanEQ to control the fan at the same time is a bad idea. :(

So please do the same test with FanEQ disabled and CONFIG_HWMON cleared. :)
After the system has booted,
what do you see in dmesg if you "echo 0 >/proc/acpi/fan/FAN/state"?
Can you turn on the fan successfully?
And what do you see if you turn off the fan, i.e., "echo 3 >/proc/acpi/fan/FAN/state"?
Comment 10 Zhang Rui 2008-02-18 18:28:43 UTC
Closing this bug as there has been no response from the bug reporter.
Comment 11 nico 2008-09-18 18:49:01 UTC
Hello and sorry for the inactivity.

I am on kernel 2.6.26.5 now and the problem still persists.

I tried to do what you asked in your last post, but "echo 0 >/proc/acpi/fan/FAN/state" just responds with "echo: write error: exec format error".

I am willing to help you fix this bug, but I need new instructions on what to do :)
Comment 12 ykzhao 2008-09-18 19:15:25 UTC
Will you please try the patch set on the latest kernel and see whether the problem still exists?
     >http://bugzilla.kernel.org/show_bug.cgi?id=11000#C30,31,32

    After the patch is applied, please add the boot option of "acpi.power_nocheck=1".
    
   thanks.
Comment 13 nico 2008-09-19 07:01:08 UTC
I am on kernel 2.6.27-rc6 now.

With the patches I get the

ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f78165a0] 'on'

messages even when the fan has not started spinning at full speed.
Comment 14 nico 2008-09-19 07:34:37 UTC
Hm, no: the messages appear when the temperature is over the one I have set in FanEQ, but then the fan does not even start spinning at full speed as it used to. This _is_ confusing me :(

Just to recapitulate everything:
I have an option called FanEQ in the BIOS. I can set a percentage for the fan speed (currently 60%) and a temperature (currently 55°C) above which the fan starts spinning at full speed. So my fan spins at 60% speed all the time, and when the CPU gets hotter than 55°C it starts spinning at full speed. After the CPU has cooled down below 55°C, the fan spins at 60% speed again. This is how it is supposed to work (and it does with other operating systems).

My problem is that once the fan has started spinning at full speed (CPU hotter than 55°C), dmesg is spammed with
ACPI: Transitioning device [FAN] to D0
ACPI: Unable to turn cooling device [f78165a0] 'on'

Forget everything I said about lm_sensors and HWMON; that was what I used while FanEQ was off.

I *think* the kernel ACPI code is not even needed for FanEQ to work; it just confuses this function. It tries to turn the fan on after it has started spinning faster, which is wrong because the fan is already on.

With the three patches applied the fan does not start spinning faster, but the messages printed to dmesg are the same.
Comment 15 ykzhao 2008-09-20 06:52:38 UTC
Hi, Nico
    Thanks for the test and info.
    The ACPI thermal driver cannot do what FanEQ does. It only turns the FAN device on or off by comparing the thermal zone temperature with a predefined temperature (returned by the AC0 object).
    
    Do you mean that the following messages still appear after the patches are applied?
    >ACPI: Transitioning device [FAN] to D0
    >ACPI: Unable to turn cooling device [f78165a0] 'on'
    Can you confirm whether the boot option "acpi.power_nocheck=1" was added?
    It would be great if you could add the boot options "acpi.debug_layer=0x00810000 acpi.debug_level=0x17" and attach the output of dmesg. Of course, the "acpi.power_nocheck" option is still needed.
   Thanks.
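
The difference between the two control schemes discussed here can be sketched as a toy model. This is not the kernel's or the BIOS's code; the thresholds (55°C and 60% for FanEQ, a 60°C active trip point for ACPI) are the values reported elsewhere in this thread, and both function names are hypothetical.

```python
def faneq_speed_percent(temp_c, threshold_c=55, base_percent=60):
    """FanEQ as described by the reporter: reduced speed, full speed above the limit."""
    return 100 if temp_c > threshold_c else base_percent

def acpi_fan_on(temp_c, ac0_trip_c=60):
    """ACPI active cooling, simplified: the fan is either on or off, nothing in between."""
    return temp_c >= ac0_trip_c
```

In this model FanEQ has an intermediate speed while the ACPI fan has none, which is why the ACPI thermal driver cannot reproduce the 60% setting.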
Comment 16 nico 2008-09-20 07:30:40 UTC
Yes, this message still appeared with the patches applied (and acpi.power_nocheck=1 was set). I also forgot to say that the fan does NOT slow down after the CPU has cooled down below 55°C (the limit set in FanEQ). It keeps spinning at full speed even when the CPU goes below the temperature set in FanEQ, which is wrong. For this testing I enabled the ACPI Fan option, but the "ACPI: Transitioning dev... ACPI: Unable to t.." messages are spammed to dmesg even when ACPI Fan is disabled in the kernel, which is how I normally run it.
I'm doing testing with "acpi.debug_layer=0x00810000 acpi.debug_level=0x17" now.
Comment 17 nico 2008-09-20 07:35:40 UTC
Created attachment 17900
dmesg - verbose

uhm, that does not look really good.
Comment 18 ykzhao 2008-09-21 07:30:33 UTC
Hi, Nico
    From the log in comment #17 it seems that the following messages do not appear.
    >ACPI: Transitioning device [FAN] to D0
    >ACPI: Unable to turn cooling device [f78165a0] 'on'

    Will you please try it again? It would be best to attach the output of dmesg captured after the thermal temperature has risen above the predefined temperature and then dropped below it again.
    Thanks.
Comment 19 nico 2008-09-21 09:14:12 UTC
Created attachment 17925
dmesg after 4h uptime

Here is the dmesg output after 4h uptime. The spam starts as soon as the thermal temperature is greater than the predefined temperature, but I can't tell you when the temperature goes back under the limit.
Comment 20 ykzhao 2008-09-22 20:39:10 UTC
Created attachment 17961
Patch 1/2: ACPI cleanup in power resource

It is changed from state to *state.
Comment 21 ykzhao 2008-09-22 20:40:23 UTC
Created attachment 17962
Patch 2/2: Delete the strick check in power transistion

Delete the overly strict check in power transition
Comment 22 ykzhao 2008-09-22 20:43:16 UTC
Hi, Nico
    Will you please try the above two patches and see whether the warning message still exists? 
    Of course the boot option of "acpi.power_nocheck=1" and patch set in comment #12 are still required. 
     Please also add the boot option of "acpi.debug_layer=0x00810000 acpi.debug_level=0x017" and attach the output of dmesg.
     Thanks.
    
Comment 23 nico 2008-09-24 05:44:34 UTC
Created attachment 18004
dmesg - verbose - working

Yay! These patches seem to work very well :)
I did some testing - everything is fine, dmesg output is in attachment.
Comment 24 ykzhao 2008-09-24 18:16:40 UTC
Hi, Nico
    thanks for the test and response.
    Please don't add the boot option of "acpi.power_nocheck=1" and see whether it still works well.
    Thanks.
Comment 25 nico 2008-09-25 06:31:17 UTC
Created attachment 18032
dmesg - verbose - w/ patches and w/o acpi.power_nocheck

Without acpi.power_nocheck=1 it does NOT work: after the fan has started spinning faster and the CPU has cooled down, it keeps running at full speed and does not slow down.

dmesg attached
Comment 26 nico 2008-09-28 05:58:49 UTC
Reopening the bug, since I don't know whether it is acceptable that this does not work when acpi.power_nocheck is not used.
Comment 27 ykzhao 2008-10-03 05:46:19 UTC
Hi, Nico
    Thanks for following up on this problem. This issue is caused by a broken BIOS, which is why the system does not work unless acpi.power_nocheck is added. In the current kernel, when a device is turned on or off, the ACPI driver checks the state again to confirm that the device really was turned on or off. If the ACPI object does not return the correct device power state, the OS treats the device power transition as failed. On Windows there is no such strict check, so the box works well there.
    From the dmesg in comment #25 it seems that the ACPI object does not return the correct device power state, so the transition fails if acpi.power_nocheck is not added.
    Thanks.
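
The check described above can be illustrated with a small model. It is a sketch of the idea only, not the kernel's implementation; the class `FanPowerResource` and the function `transition_to_d0` are hypothetical, and the `power_nocheck` argument merely mimics the effect of the acpi.power_nocheck boot option.

```python
class FanPowerResource:
    """Stand-in for a firmware power resource (hypothetical, for illustration)."""

    def __init__(self, status_is_reliable):
        self.status_is_reliable = status_is_reliable
        self.actually_on = False

    def turn_on(self):
        self.actually_on = True

    def status(self):
        # A broken BIOS keeps reporting "off" even after the fan was switched on.
        return self.actually_on and self.status_is_reliable


def transition_to_d0(resource, power_nocheck=False):
    """Turn the resource on; verify the reported state unless the check is skipped."""
    resource.turn_on()
    if power_nocheck:
        return True  # trust the transition without reading the state back
    return resource.status()  # strict check: fails if the firmware misreports
```

With `FanPowerResource(status_is_reliable=False)`, the strict path reports a failed transition even though the fan was switched on, while passing `power_nocheck=True` lets the transition count as successful.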
Comment 28 Zhang Rui 2008-10-05 20:25:29 UTC
Hi, nico,
This is a BIOS problem which we won't fix in the Linux kernel.
So please use the boot option acpi.power_nocheck=1 as a workaround instead.
IMO, we can close it.
Comment 29 nico 2008-10-06 09:53:11 UTC
Hm this sucks. Also I don't think that I am the only one who is using fancontrol on an abit mainboard.
Well I'm using acpi.power_nocheck=1 now. Thx for your help anyways.
Comment 30 Len Brown 2008-11-12 21:23:09 UTC
patch in comment #21 shipped in 2.6.28-rc4-git3

closed

commit 676962dac6e267ce7c13f73962208f9124a084bb
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Mon Oct 27 16:05:39 2008 +0800

    ACPI: fan: Delete the strict check in power transition
