Bug 8374 - ACPI: Unable to turn cooling device [c146aec8] 'on'
Status: CLOSED CODE_FIX
Product: ACPI
Classification: Unclassified
Component: Power-Fan
Hardware: i386 Linux
Importance: P2 normal
Assigned To: Zhang Rui
Reported: 2007-04-27 01:16 UTC by Peter Ganzhorn
Modified: 2008-08-07 21:04 UTC (History)
Kernel Version: 2.6.21-rc7
Tree: Mainline
Regression: ---


Attachments
Output of 'cat /proc/acpi/thermal_zone/*/*' (816 bytes, text/plain)
2007-04-28 02:54 UTC, Peter Ganzhorn
Shows the temperatures increasing without fan kicking in before auto-shut-down (1.74 KB, text/plain)
2007-04-28 02:55 UTC, Peter Ganzhorn
debug patch: export ACPI device bus_id instead of acpi_handle (3.36 KB, patch)
2007-04-28 22:29 UTC, Zhang Rui
Add Thomas as this patch's author. (3.40 KB, patch)
2007-04-28 22:37 UTC, Zhang Rui
dmesg output of HP nc6000 running 2.6.21 (13.81 KB, text/plain)
2007-04-30 08:53 UTC, Peter Ganzhorn

Description Peter Ganzhorn 2007-04-27 01:16:04 UTC
Most recent kernel where this bug did *NOT* occur: None
Distribution: Debian 4.0 - Etch (Final)
Hardware Environment: HP/Compaq nc6000 (laptop), Pentium M 1.6 GHz, Intel ICH4,
ATI Radeon Mobility 9600, 512 MB RAM, newest available BIOS
Software Environment: Debian Etch, no self-compiled tools but the kernel
Problem Description: This is gonna be quite a long but hopefully equally useful
description:
Shortly after I bought the laptop and installed Linux on it, I noticed that it
sometimes turned itself off while I was away and did not turn off the computer.
It took me quite a while to figure out what was wrong until the laptop shut down
by itself while I was using it.
The kernel printed a message that an ACPI critical trip point was reached and
shut the system down to prevent damage. After I knew there was a temperature
problem I wrote a script to log the temperatures and after it happened again I
saw that the temperatures suddenly started to rise from about 40
Comment 1 Zhang Rui 2007-04-27 19:31:46 UTC
>There are _four_ fans reported, but I only have _one_ real fan in the
>laptop.
It's possible that four fans reported by ACPI stand for the different speed of 
one real fan.
>Another crazy thing is that echoing the number "3" into the state files makes
>the fan slow down sometimes, but no other number seems to work...
That's right. echo "o"/"3" to the state file will turn the fan on/off.
>The next annoying thing is that the real fan can run with various speeds,
>but combining the virtual fans mostly just turns it either ON or OFF, nothing
>in between.
Now you know the reason. :)
> "ACPI: Unable to turn cooling device [c146aec8] 'on'"
This seems to be the real problem to me.
Please attach the result of
#cat /proc/acpi/thermal_zone/*/*
It would also help to provide the dmesg output from when the system begins to
overheat. Maybe a script that keeps polling
/proc/acpi/thermal_zone/xxx/temperature will help you do this.
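
A polling script of the kind suggested here could look roughly like the following. This is a minimal sketch, not the script the reporter actually used. It assumes the 2.6-era procfs layout with one "temperature" file per zone and a line format like "temperature:             45 C". The names parse_temperature, read_zones and poll are hypothetical.

```python
import re
import time
from pathlib import Path

# Assumed line format of a temperature file: "temperature:             45 C"
_TEMP_RE = re.compile(r"temperature:\s*(-?\d+)\s*C")


def parse_temperature(text):
    """Return the temperature in degrees Celsius, or None if the text does not match."""
    match = _TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def read_zones(base="/proc/acpi/thermal_zone"):
    """Map each thermal zone name under base to its current temperature."""
    readings = {}
    for temp_file in sorted(Path(base).glob("*/temperature")):
        readings[temp_file.parent.name] = parse_temperature(temp_file.read_text())
    return readings


def poll(interval_s=5.0, samples=12, base="/proc/acpi/thermal_zone"):
    """Print one timestamped line per sample so it can be lined up with dmesg."""
    for _ in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, read_zones(base), flush=True)
        time.sleep(interval_s)
```

Running poll(interval_s=5, samples=120) while the machine is under load would give about ten minutes of timestamped readings to compare against the kernel log.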
Comment 2 Peter Ganzhorn 2007-04-28 02:54:00 UTC
Created attachment 11309 [details]
Output of 'cat /proc/acpi/thermal_zone/*/*'
Comment 3 Peter Ganzhorn 2007-04-28 02:55:02 UTC
Created attachment 11310 [details]
Shows the temperatures increasing without fan kicking in before auto-shut-down
Comment 4 Peter Ganzhorn 2007-04-28 03:09:43 UTC
>>There are _four_ fans reported, but I only have _one_ real fan in the
>laptop.
>It's possible that four fans reported by ACPI stand for the different speed of 
>one real fan.
May be possible, but is surely malfunctioning as well, see below.
>>Another crazy thing is that echoing the number "3" into the state files makes
>>the fan slow down sometimes, but no other number seems to work...
>That's right. echo "o"/"3" to the state file will turn the fan on/off.
If you mean the letter "o" will turn it on, you're right. I tried with a "0"
(Zero) but this did not work ;)
If the virtual fans represent different speeds of the real fan, there has to be
something wrong as well:
I got four virtual fans in /proc/acpi/fan named C20F, C210, C211 and C212.
Usually only C212 is "on" and the fan blows as slow as it can go.
echo "o" > /proc/acpi/fan/C20F ==> fan blows with full speed
echo "3" > /proc/acpi/fan/C20F ==> fan won't slow down, though its state file
reports "off" again.
echo "3" > /proc/acpi/fan/C212 ==> fan stops completely
echo "o" > /proc/acpi/fan/C212 ==> fan starts to blow in slowest mode again
Why does it not return to the slow mode when turning the first virtual fan off?
The state files change correctly, but the real fan is not affected by the change
until all virtual fans are turned off.
>> "ACPI: Unable to turn cooling device [c146aec8] 'on'"
>This seems to be the real problem to me.
Yes, probably that's the biggest bug we're hunting here ;)
>Please attach the result of
>#cat /proc/acpi/thermal_zone/*/*
Done.
>And better provide the dmesg output when the system begins to over-heat. Maybe 
>a script that keeps on polling /proc/acpi/thermal_zone/xxx/temperature will help 
>you do this.
I attached a log from a script I wrote myself to record temperatures. The
attached part shows the temperatures rising before the system shut itself down
after reaching a critical trip point. This is what /var/log/messages recorded
at the same time:
Apr 26 12:53:43 shutdownm1 kernel: ACPI: Critical trip point
Apr 26 12:53:44 shutdownm1 shutdown[27743]: shutting down for system halt

So if you need anything else just tell me, and you should know that I appreciate
your help a lot!
BTW: Shutting down the system with the power button after playing with the fans
did not work because acpid must have failed, I found out with the help of google
 - killing acpid reproduces this.
At the moment I have disabled acpid completely to be sure that it does not
interfere or even cause the bug though this seems unlikely to me.
Comment 5 Zhang Rui 2007-04-28 22:21:13 UTC
>>That's right. echo "o"/"3" to the state file will turn the fan on/off.
>If you mean the letter "o" will turn it on, you're right.
Sorry, I meant "0"/"3". But anyway, all invalid characters are treated as a
"0", so the result is the same. :)

>I tried with a "0" (Zero) but this did not work ;)
I don't think so. I have an HP nx5000 on hand, and the ACPI in this box is pretty
much the same as yours. I ran some tests and it seems to work well.

>echo "o" > /proc/acpi/fan/C20F ==> fan blows with full speed
>echo "3" > /proc/acpi/fan/C20F ==> fan won't slow down,
>though its state file reports "off" again.
Make sure the other fans are off, and try again. I think you'll get something
new. :)

>Why does it not return to the slow mode when turning the first virtual fan off?
>The state files change correctly, but the real fan is not affected by the
>change until all virtual fans are turned off.
Please turn on the fan C212,C211,C210,C20F in turn, and turn them off in the
opposite order. I did this test and all the fans seem to work well.
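
The ordering described above can be sketched as a small planner. This is only an illustration under stated assumptions: the fan names and the slowest-to-fastest order are taken from this report, "0" and "3" are the on/off values mentioned earlier in the thread, and the state file is assumed to live at <base>/<fan>/state. The names plan_writes and apply_writes are hypothetical.

```python
# Fan names from this report, slowest first. "0" means on and "3" means off.
FANS_SLOWEST_FIRST = ["C212", "C211", "C210", "C20F"]


def plan_writes(current_level, target_level, fans=FANS_SLOWEST_FIRST):
    """Return the (fan, value) writes that move from one speed level to another.

    Level n means the n slowest virtual fans are on; level 0 means all are off.
    """
    if not (0 <= current_level <= len(fans) and 0 <= target_level <= len(fans)):
        raise ValueError("level out of range")
    if target_level >= current_level:
        # Speed up: switch on the next faster fans, slowest first.
        return [(fan, "0") for fan in fans[current_level:target_level]]
    # Slow down: switch off in the opposite order, fastest first.
    return [(fan, "3") for fan in reversed(fans[target_level:current_level])]


def apply_writes(writes, base="/proc/acpi/fan"):
    """Write each value to <base>/<fan>/state (assumed path; needs root)."""
    for fan, value in writes:
        with open(f"{base}/{fan}/state", "w") as state_file:
            state_file.write(value)
```

For example, plan_writes(4, 1) lists C20F, C210 and C211 being switched off in that order, which leaves only C212 on.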

>>> "ACPI: Unable to turn cooling device [c146aec8] 'on'"
>>This seems to be the real problem to me.
>Yes, probably that's the biggest bug we're hunting here ;)
Let's dig into this further.
Please apply the debug patch I attached and
#echo 0x1f >/sys/module/acpi/parameters/debug_level
and attach the dmesg output after "ACPI: Unable to turn cooling device ....
'on', result is .." pops up.
Comment 6 Zhang Rui 2007-04-28 22:29:31 UTC
Created attachment 11324 [details]
debug patch: export ACPI device bus_id instead of acpi_handle
Comment 7 Zhang Rui 2007-04-28 22:37:30 UTC
Created attachment 11325 [details]
Add Thomas as this patch's author.
Comment 8 Zhang Rui 2007-04-28 22:52:43 UTC
And please test if compiling CONFIG_MOUSE_PS2=m (and best not loading it)
or compiling it out helps. :)
Comment 9 Peter Ganzhorn 2007-04-29 06:51:23 UTC
Seems you were right about the fan, turning them on and off in reverse order
works to make the fan go faster and slower in steps, though there is no
difference between C210 and C20F. C212, C211 and C210 however are different
speeds of the fan for sure.
But why does the fan _not_ slow down when C210 and C20F are "on" and C20F is
turned off?
This behaviour is 100% reproducible and I'm not sure whether it is correct.

I'm going to patch my kernel now and build it without CONFIG_MOUSE_PS2 - btw,
how could this module make a difference?
Comment 10 Peter Ganzhorn 2007-04-29 07:14:19 UTC
ACPI: Transitioning device [C211] to D0
ACPI: Unable to turn cooling device [C211] 'on', result is -8

CONFIG_MOUSE_PS2 was still enabled, Kernel was 2.6.21 (final).
The problem occurred when C212 and C20F were on and C211 was switched on and off a
few times.
"bash: echo: write error: Exec format error" occurred as well.
As expected the fan did not kick in when producing load by compiling something.
I stopped compiling when about 70
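
The "result is -8" in the kernel message and the "Exec format error" from bash are the same error: on Linux, errno 8 is ENOEXEC, whose standard message is "Exec format error". A minimal sketch for decoding such lines follows. It assumes the message format shown in this comment and treats the result as a negated errno value. The function name decode_failure is hypothetical.

```python
import errno
import os
import re

# Matches lines like:
# ACPI: Unable to turn cooling device [C211] 'on', result is -8
_FAIL_RE = re.compile(
    r"Unable to turn cooling device \[(?P<device>[^\]]+)\] '(?P<state>on|off)', "
    r"result is (?P<result>-?\d+)"
)


def decode_failure(line):
    """Return (device, state, errno name, message) for a matching line, else None."""
    match = _FAIL_RE.search(line)
    if match is None:
        return None
    # Assumption: the kernel reports a negated errno value here.
    code = abs(int(match.group("result")))
    name = errno.errorcode.get(code, "UNKNOWN")
    return match.group("device"), match.group("state"), name, os.strerror(code)
```

The errno names and messages come from the platform running the script, so this is best run on a Linux machine.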
Comment 11 Peter Ganzhorn 2007-04-29 10:02:57 UTC
With CONFIG_MOUSE_PS2 disabled my RAID won't get detected anymore - though both
hard drives which are part of my RAID1 array are detected properly and the
partition type still reads "FD - Linux Raid Autodetect", /dev/md0 will not be
created.
Did we just discover another bug? What's so special about CONFIG_MOUSE_PS2 and
why may it interfere with the ACPI subsystem or the RAID subsystem?
I really don't see what connection may exist, but I tried a few times rebooting
now - the kernel never detects my raid without CONFIG_MOUSE_PS2 but never fails
with it.
Is this a known issue (you told me to get rid of it, so you may know at least
something I suppose) or should I open yet another bug report? :P
Comment 12 Thomas Renninger 2007-04-30 02:23:16 UTC
> What's so special about CONFIG_MOUSE_PS2 and why may it interfere with the 
> ACPI subsystem
Hard to explain. The EC (Embedded Controller) touches the mouse/kbd HW, and large
parts of the ACPI code make use of the EC (e.g. reading out temp, battery, ...).
The EC is a separate microcontroller with its own firmware, so don't ask why/where
exactly the interference happens, it just happens, especially on the HPs, which
(as far as I can tell) seem to have broken EC firmware.

> or the RAID subsystem
This is really weird. Better recheck your .config file with previous ones...

The patches that should fix most mouse interference problems on HPs can be
found in bug #7689. The latest kernels should already include them, but
you might want to double-check that yours does too...

Can you please also attach a full dmesg output (Have you experienced any 
AE_TIME errors in /var/log/messages?).
Comment 13 Peter Ganzhorn 2007-04-30 08:52:26 UTC
>Hard to explain. The EC (Embedded Controller) touches the mouse/kbd HW, and large
>parts of the ACPI code make use of the EC (e.g. reading out temp, battery, ...).
>The EC is a separate microcontroller with its own firmware, so don't ask why/where
>exactly the interference happens, it just happens, especially on the HPs, which
>(as far as I can tell) seem to have broken EC firmware.
Crazy, but interesting. Do you think HP will fix this with a BIOS update or is
this beyond the BIOS' functions? If there is no way to reflash this separate
microcontroller's memory, this is of course not an option ;)

>This is really weird. Better recheck your .config file with previous ones...
I double-checked everything, the .config is identical with older kernels of mine
except for CONFIG_MOUSE_PS2 - it's crazy, but my RAID won't function without a
MOUSE_PS2-enabled kernel. I'm gonna ask the guys at linuxforen.de about this, a
few of them know a lot as well - if they don't have a clue as well I'm gonna
file another bug report about this.

>The patches that should fix most mouse interference problems on HPs can be
>found in bug #7689. The latest kernels should already include them, but
>you might want to double-check that yours does too...
Thanks, I'll see if any of those patches are not in my kernel. BTW, it's plain
Vanilla 2.6.21.

>Can you please also attach a full dmesg output (Have you experienced any 
>AE_TIME errors in /var/log/messages?).
"cat /var/log/messages | grep AE_" does not show anything at all.
dmesg output attached.
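
A slightly more targeted version of the grep above would count each ACPI exception code separately, which makes a run of AE_TIME errors easy to spot. This is a minimal sketch, assuming the codes appear in the log as upper-case tokens starting with "AE_". The function name count_acpi_exceptions is hypothetical.

```python
import re
from collections import Counter

# ACPI exception codes are assumed to appear as tokens such as AE_TIME.
_AE_RE = re.compile(r"\bAE_[A-Z0-9_]+\b")


def count_acpi_exceptions(lines):
    """Count every AE_* code found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(_AE_RE.findall(line))
    return counts
```

Passing an open file works, for example count_acpi_exceptions(open("/var/log/messages")). An empty result matches the "does not show anything at all" outcome reported here.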
Comment 14 Peter Ganzhorn 2007-04-30 08:53:55 UTC
Created attachment 11347 [details]
dmesg output of HP nc6000 running 2.6.21
Comment 15 Peter Ganzhorn 2007-05-06 11:32:30 UTC
Though I have not had any trouble with the laptop for a few days now, the
auto-shutdown problem still exists with 2.6.21 and acpid disabled - guess acpid
is not involved after all. Just wanted to let you know that.
But I can tell you at least one good thing, I identified the CONFIG_MOUSE_PS2
and RAID mystery's cause: without CONFIG_MOUSE_PS2 my external usb disks (of
which my RAID consists) will be detected after the kernel checks if there are
any disks which are part of a RAID array and therefore no RAID will be detected -
with CONFIG_MOUSE_PS2 it's simply the other way round and everything works like
a charm.
For details check http://bugzilla.kernel.org/show_bug.cgi?id=8412 .
Comment 16 Peter Ganzhorn 2007-05-26 09:46:25 UTC
Since 2.6.22-rc1 I haven't had any more problems with the laptop, maybe the
problem is fixed now? 2.6.21 still did auto-shutdowns, but 2.6.22-rc* seem to
work a lot better for me.
Just wanted to let you know about this.
Comment 17 Zhang Rui 2007-08-27 02:07:06 UTC
Already fixed in the upstream kernel.
Please reopen it if you still have some problems.
Comment 18 Peter Ganzhorn 2007-09-01 02:56:39 UTC
Thanks a lot, looks pretty fixed to me ;)
Comment 19 cruz 2008-07-23 23:46:33 UTC
Hi,

I just installed Debian 4.0 with the same machine. I have exactly the same problem but my kernel is still:
$ uname -r
2.6.18-6-686

I am new to linux, could you please tell me how I could upgrade to 2.6.22-rc1?

Thanks
Comment 20 Peter Ganzhorn 2008-07-24 08:46:50 UTC
I don't know what the newest kernel available for stock Debian 4.0 is, but personally I don't recommend that any new/inexperienced Linux user upgrade the kernel manually.
There is, however, a tool named "synaptic" included with the Debian Linux distribution (used for package management, i.e. installing/uninstalling software provided by Debian). Look for this tool and see what the latest kernel available there is. You may want to search for "kernel" or "linux" in the available packages, then look for the newest kernel with the same suffix as yours, which would be "x.y.z-*-686". (The "*" stands for Debian-specific updates to that specific kernel version, so higher numbers are better for this field.)
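
The "newest kernel with the same suffix" rule can be sketched as follows. This is only an illustration: it assumes package names of the form "linux-image-x.y.z-N-suffix", and the function name newest_kernel_package is hypothetical. The list of names has to come from the package manager.

```python
import re

# Assumed package name shape: linux-image-<x>.<y>.<z>-<N>-<suffix>
_PKG_RE = re.compile(r"^linux-image-(\d+)\.(\d+)\.(\d+)-(\d+)-(.+)$")


def newest_kernel_package(package_names, suffix="686"):
    """Return the highest-versioned package with the given suffix, or None."""
    candidates = []
    for name in package_names:
        match = _PKG_RE.match(name)
        if match and match.group(5) == suffix:
            # Compare numerically so that e.g. 2.6.24 sorts above 2.6.18.
            version = tuple(int(part) for part in match.group(1, 2, 3, 4))
            candidates.append((version, name))
    return max(candidates)[1] if candidates else None
```

Versions are compared as numbers rather than as text, so a revision such as "-10" sorts above "-6".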

If you still have trouble you may want to search with Google for a how-to (there are lots) or ask in a Linux support forum.
A great German support forum is linuxforen.de, just in case you're from Germany.
Comment 21 cruz 2008-08-07 21:04:43 UTC
Hi,

I installed on my HP NC6000:

Mandriva 2008 (2.6.24.4-laptop-lmnb) --- problem fixed
Fedora 9 (2.6.25-14.fc9.i686) --- problem fixed
Debian Lenny/sid (2.6.24-1-686) --- problem not fixed

Why does the Debian kernel with higher version still have this problem?

Thanks
