Bug 9432

Summary: unable to turn cooling device 'off' - LG LE50 Express laptop
Product: ACPI
Reporter: Marcus Better (marcus)
Component: Power-Fan
Assignee: ykzhao (yakui.zhao)
Status: REJECTED INSUFFICIENT_DATA
Severity: normal
CC: acpi-bugzilla, mingo, pioto, pliny
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.22, 2.6.23, 2.6.24
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: dmesg output, acpidump output

Description Marcus Better 2007-11-22 01:32:20 UTC
Most recent kernel where this bug did not occur: 2.6.24-rc1
Distribution: Debian
Hardware Environment: LG LE50 Express laptop
Software Environment: Debian testing/unstable (i386), X.org, KDE
Problem Description:

I get lots of these kernel log messages:

Nov 22 09:13:38 better kernel: ACPI: Transitioning device [FAN0] to D3
Nov 22 09:13:38 better kernel: ACPI: Transitioning device [FAN0] to D3
Nov 22 09:13:38 better kernel: ACPI: Unable to turn cooling device [f7c3a5b8] 'off'
Nov 22 09:13:39 better kernel: ACPI: Transitioning device [FAN0] to D3
Nov 22 09:13:39 better kernel: ACPI: Transitioning device [FAN0] to D3
Nov 22 09:13:39 better kernel: ACPI: Unable to turn cooling device [f7c3a5b8] 'off'
Nov 22 09:13:43 better kernel: ACPI: Transitioning device [FAN0] to D3
Nov 22 09:13:43 better kernel: ACPI: Transitioning device [FAN0] to D3
Nov 22 09:13:43 better kernel: ACPI: Unable to turn cooling device [f7c3a5b8] 'off'
Comment 1 Marcus Better 2007-11-22 01:34:11 UTC
Created attachment 13691
dmesg output
Comment 2 Len Brown 2007-11-22 07:20:22 UTC
These messages started between 2.6.24-rc1 and -rc3?
Can you git-bisect to find out what broke it?
Or even simpler, can you try reverting one at a time
drivers/acpi/fan.c and drivers/acpi/ec.c?
Comment 3 Marcus Better 2007-11-22 23:51:29 UTC
> These messages started between 2.6.24-rc1 and -rc3?

I'm not so sure now. The messages yesterday stopped after about 40 minutes, 
and today I don't see them at all (with -rc3). Yesterday I had just carried 
my laptop to work in cold weather (5-10 degrees centigrade) which I don't 
normally do, so perhaps the low temperature could have triggered something.

Is there any other way to test this, or do I have to walk it again? :)
Comment 4 Marcus Better 2007-11-23 00:16:38 UTC
Ok, this just occurred again. I'll try to revert the files to -rc1 versions.
Comment 5 Ingo Molnar 2007-12-04 04:10:46 UTC
Marcus, did the messages occur again? -rc4 has about 1000 lines of ACPI related fixes relative to -rc3, so it might be worth checking -rc4 as well. ec.c in particular had a few updates.
Comment 6 Marcus Better 2007-12-04 04:22:32 UTC
> ------- Comment #5 from mingo@elte.hu  2007-12-04 04:10 -------
> Marcus, did the messages occur again?

I've been on vacation but expect to test it soon.
Comment 7 Marcus Better 2007-12-04 06:51:57 UTC
> ------- Comment #2 from len.brown@intel.com  2007-11-22 07:20 -------
> These messages started between 2.6.24-rc1 and -rc3?
> Can you git-bisect to find out what broke it?
> Or even simpler, can you try reverting one at a time
> drivers/acpi/fan.c and drivers/acpi/ec.c?

The messages persist even after reverting both files to the -rc3 versions. So 
I'm no longer sure about -rc3.
Comment 8 Marcus Better 2007-12-05 01:59:23 UTC
> ------- Comment #5 from mingo@elte.hu  2007-12-04 04:10 -------
> Marcus, did the messages occur again? -rc4 has about 1000 lines of ACPI
> related fixes relative to -rc3, so it might be worth checking -rc4 as well.

The messages still occur on -rc4.
Comment 9 Alexey Starikovskiy 2007-12-05 02:01:43 UTC
Please attach output of acpidump
Comment 10 Marcus Better 2007-12-05 02:29:37 UTC
Created attachment 13858
acpidump output
Comment 11 Alexey Starikovskiy 2007-12-05 05:45:30 UTC
please try to revert drivers/acpi/power.c to rc1 version...
In your DSDT, power._ON and power._OFF call the same method, STHT, with no arguments, so it acts on its own internal state rather than on what we ask of it.
Comment 12 Marcus Better 2007-12-11 04:05:22 UTC
(In reply to comment #11)
> please try to revert drivers/acpi/power.c to rc1 version...

This doesn't help.

However I may have been wrong earlier about the effect of reverting fan.c and ec.c - I think I misunderstood how to use git, and did only 
  git reset v2.6.24-rc1 drivers/acpi/fan.c 
But this only affects the index, right? So I would need
  git checkout-index drivers/acpi/fan.c
as well, if I understand correctly?

Should I try reverting those two files again?
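
For reference, a minimal sketch of taking a single file back to a tagged version. `git reset <rev> <path>` only updates the index, and `git checkout-index` needs `-f` before it will overwrite a file that already exists, whereas `git checkout <rev> -- <path>` updates the index and the working tree in one step. The function name `revert_file_to` is only an illustration, not something from the kernel tree.

```shell
# Restore a single path to the version recorded at a given tag or commit.
# Unlike `git reset <rev> <path>`, which only updates the index, this also
# rewrites the file in the working tree.
revert_file_to() {
    rev=$1
    path=$2
    git checkout "$rev" -- "$path"
}

# Example (run inside a kernel tree):
#   revert_file_to v2.6.24-rc1 drivers/acpi/ec.c
#   git diff --cached --stat    # shows what now differs from HEAD
```

After this the file on disk matches the tagged version, so a rebuild actually tests the older code.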
Comment 13 Marcus Better 2007-12-11 04:21:43 UTC
(In reply to comment #2)
> Or even simpler, can you try reverting one at a time
> drivers/acpi/fan.c and drivers/acpi/ec.c?

Ok, now that I finally reverted the files correctly, I noticed that fan.c cannot be reverted to 2.6.24-rc1 independently because of the commit "ACPI: Fan: Drop force_power_state acpi_device option".
Comment 14 Marcus Better 2007-12-12 02:17:43 UTC
(In reply to comment #2)
> Or even simpler, can you try reverting one at a time
> drivers/acpi/fan.c and drivers/acpi/ec.c?

Reverting ec.c to 2.6.24-rc1 still produces the messages.
Comment 15 Marcus Better 2007-12-13 02:49:28 UTC
I just got the messages on 2.6.23 also. Never noticed that before.
Comment 16 Rafael J. Wysocki 2007-12-13 07:12:07 UTC
When you get these messages, does the fan finally go 'off' after some time?
Comment 17 Marcus Better 2007-12-14 01:01:26 UTC
> ------- Comment #16 from rjwysocki@sisk.pl  2007-12-13 07:12 -------
> When you get these messages, does the fan finally go 'off' after some time?

No, it is still on. (I doubt the system could work for long without a fan, it 
is generating some heat.)

~$ cat /proc/acpi/fan/FAN0/state
status:                  on
Comment 18 Marcus Better 2008-01-28 03:22:39 UTC
Still present in 2.6.24. This is still NEEDINFO; can I provide any other info?
Comment 19 Alexey Starikovskiy 2008-01-28 12:45:30 UTC
Marcus,
do you know of any kernel where you don't have this problem?
Comment 20 Marcus Better 2008-02-05 00:45:54 UTC
(In reply to comment #19)
> Marcus,
> do you know of any kernel where you don't have this problem?

No. I just reproduced it on 2.6.22.11 (Debian linux-image-2.6.22-3-686 2.6.22-6).
Comment 21 Marcus Better 2008-03-19 05:59:13 UTC
Confirmed on 2.6.25-rc6
Comment 22 Mike Kelly 2008-04-02 19:54:01 UTC
I'm also having this issue, it seems. Kernel is 2.6.24-gentoo-r3. My system still responds to pings, but no other net traffic/serial console stuff... The console is just endlessly filled with this:

ACPI: Unable to turn cooling device [df00f2a0] 'off'
ACPI: Transitioning device [FAN1] to D3
ACPI: Transitioning device [FAN1] to D3

I'll post acpidump stuff in a bit.
Comment 23 Kenneth Brown 2008-05-20 00:33:39 UTC
(In reply to comment #22)
> I'm also having this issue, it seems. Kernel is 2.6.24-gentoo-r3. 

Count me in as vexed by this one, as well as a bunch of Ubuntu users. This problem has been reported over on their side since at least 2.6.20. The inability to turn off the fan leads to a kacpi/kacpi_notify runaway.

You can find one of the longer Ubuntu bug reports here: https://bugs.launchpad.net/ubuntu/+source/acpi/+bug/75174

This is my best guess for what's going on.

1.) Some load heats up the CPU enough for the fan to kick in.
2.) The fan does its job and the CPU cools enough for the kernel to decide it's time to turn it off.
3.) ACPI tries and fails to turn off the fan. A lot. And tries to blog about it via klogd.
4.) CPU temp increases. ACPI takes things personal and *keeps* trying to turn the fan off.
5.) Profit!

On CPUs that support throttling, this sequence can be short-circuited by throttling down the CPU. My ill-informed guess is that once the temp gets low enough, the BIOS is shutting off the fan, at which point ACPI stops trying to do so.

If I understand the way fan control is supposed to work, you can test this by writing 0 (on) and 3 (off) to /proc/acpi/fan/$FAN/state. In my case I can turn it on, but attempts to turn it off result in a write error.
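
A minimal sketch of that test, assuming the procfs fan interface described above. The helper name `set_fan_state` and the default fan name FAN0 are illustrative; check /proc/acpi/fan/ for the name on your machine.

```shell
# Ask the ACPI fan driver to change the fan's power state through procfs.
# As described above: 0 requests "on", 3 requests "off".
# The default path uses FAN0, which is an assumption; adjust for your system.
set_fan_state() {
    value=$1
    state_file=${2:-/proc/acpi/fan/FAN0/state}
    if printf '%s\n' "$value" > "$state_file"; then
        echo "wrote $value to $state_file"
    else
        echo "write of $value to $state_file failed" >&2
        return 1
    fi
}

# Example (as root):
#   set_fan_state 0    # turn the fan on
#   set_fan_state 3    # try to turn it off
#   cat /proc/acpi/fan/FAN0/state
```

On an affected machine I would expect the second call to report a failed write, matching the write error mentioned above.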
Comment 24 ykzhao 2008-07-07 19:39:43 UTC
Can someone try the workaround patch in 
    http://bugzilla.kernel.org/show_bug.cgi?id=11000#c20 
and see whether the laptop can work well?
    (Note: the boot option "acpi.power_nocheck=1" is required.)

   Thanks.
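
To confirm the option actually reached the kernel after rebooting, something like the following sketch could be used. It only inspects /proc/cmdline; the helper name `has_boot_option` is made up for illustration.

```shell
# Report whether a given option appears on the kernel command line.
# /proc/cmdline holds the parameters the running kernel was booted with.
has_boot_option() {
    option=$1
    cmdline_file=${2:-/proc/cmdline}
    # Split on whitespace and compare whole words, so that e.g.
    # "acpi.power_nocheck=10" does not match "acpi.power_nocheck=1".
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- "$option"
}

# Example:
#   has_boot_option acpi.power_nocheck=1 && echo "power_nocheck is set"
```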
Comment 25 ykzhao 2008-08-07 22:24:14 UTC
Since there has been no response for more than a month, this bug is being rejected.
If the problem still exists, please reopen it and try the debug patch mentioned in comment #24.
Thanks.