Bug 5000

Summary: fan always on after waking from swsusp
Product: ACPI
Reporter: Sanjoy Mahajan (sanjoy)
Component: Power-Fan
Assignee: Konstantin Karasyov (konstantin.karasyov)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, nigel.cunningham, pavel
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.13-rc6
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
  syslog msgs with acpi_debug=0x1F
  adding suspend/resume functionality
  fan implementation cleanup
  Suspend/resume support for fan device
  suspend/resume functionalty w/ additions from Patrick Mochel
  updated patch
  debug patch
  clean patch

Description Sanjoy Mahajan 2005-08-04 23:45:10 UTC
Most recent kernel where this bug did not occur: swsusp never worked for me
before 2.6.13-rc3 or so, and I noticed the fan problem alongside swsusp working.

Distribution: Debian testing
Hardware Environment: TP 600X, latest BIOS
Software Environment: dmidecode etc. output (from a slightly different kernel,
but same DSDT/BIOS) is at Bug #4989

Problem Description: After rebooting from swsusp, the fan stays on no matter
what I do.  Normally the fan behaves well (turning on only when the temperatures
exceed trip points).  After the reboot, the system knows the temperatures are
low enough, and that no active cooling is reported:

     Thermal 1: ok, 40.0 degrees C
     Thermal 2: ok, 38.0 degrees C
     Thermal 3: ok, 34.0 degrees C
     Thermal 4: ok, 36.0 degrees C

But /proc/acpi/fan/FN*/state shows 'on' for all the fans (I think there's only
one actual fan, but as far as the ACPI layer imagines it there may be >1), and
they are definitely on and noisy.  The air coming out is quite cool.  Echoing
'0' or 'off' to the 'state' file doesn't do anything.

Reloading all the ACPI modules (fan, battery, etc.) doesn't help.

Steps to reproduce: Boot into X, swsusp, then reboot.  Listen for fan!
Comment 1 Nigel Cunningham 2005-08-05 04:32:55 UTC
If you build the ACPI fan support as a module and unload it while suspending,
does it help at all?

Regards,

Nigel
Comment 2 Sanjoy Mahajan 2005-08-05 06:28:44 UTC
Created attachment 5517 [details]
syslog msgs with acpi_debug=0x1F

A good idea, which I just tried.  Alas, unloading the fan module doesn't help. 
I've attached the syslog msgs with acpi_debug=0x1F
Comment 3 Sanjoy Mahajan 2005-08-05 07:29:42 UTC
Just tried with 2.6.13-rc5-git3 (with unloading and reloading the fan module). 
Almost the same problem as with -git2: Now it comes back with the fan always
off.  Like before, /proc/acpi/fan/FN*/state says they are all on.  'acpi -t'
says thermal zone 2 is doing active cooling:

# acpi -t
     Battery 1: charged, 100%
     Battery 2: charged, 100%
     Thermal 1: ok, 42.0 degrees C
     Thermal 2: active[0], 39.0 degrees C
     Thermal 3: ok, 33.0 degrees C
     Thermal 4: ok, 35.0 degrees C

It should be doing active cooling, but it isn't.  So -git3 is quieter but more
dangerous than -git2 (which has the fan always on).
Comment 4 Pavel Machek 2005-08-08 05:07:24 UTC
You may want to try 'echo -n 0' or 'echo -n 3' into the fan state files to force
the fan on or off, respectively.  Also, the fan module lacks error checking; if
an AML call fails, nothing is printed.
Comment 5 Sanjoy Mahajan 2005-08-09 08:55:53 UTC
(I changed the kernel version to -rc6 since it happens there too)

I tried 

echo 3 > fan_state_file

and that turned off the fan.  After that the fan turns off and on automatically
as it should.  Should I add some debug lines to fan.c to find out what is going on?
Comment 6 Pavel Machek 2005-08-09 13:18:03 UTC
Well, that fan.c lacks suspend/resume support. Hint hint :-).

Put the fan at full speed in the _suspend() hook, and put the fan back into a
sane state in the _resume() hook.
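
The suggestion above can be pictured with a small model.  This is only a
sketch in Python, not kernel code: FakeFanHardware, FanDriver, and the method
names are hypothetical and do not correspond to anything in fan.c.  It assumes
the hardware may come back from hibernation in an arbitrary state, so the
resume hook rewrites the state the driver wants instead of trusting what it
finds.

```python
class FakeFanHardware:
    """Stand-in for the real fan; hypothetical, for illustration only."""

    def __init__(self):
        self.running = False

    def set_running(self, running):
        self.running = running


class FanDriver:
    """Toy driver with suspend/resume hooks (all names are assumptions)."""

    def __init__(self, hardware):
        self.hardware = hardware
        self.wanted = False  # state the thermal logic last asked for

    def set_state(self, running):
        self.wanted = running
        self.hardware.set_running(running)

    def suspend(self):
        # Fail safe: leave the fan running while the system goes down.
        self.hardware.set_running(True)

    def resume(self):
        # Do not trust the hardware after wake-up; rewrite the wanted state.
        self.hardware.set_running(self.wanted)


hw = FakeFanHardware()
fan = FanDriver(hw)
fan.set_state(False)            # system is cool, fan is off
fan.suspend()
state_during_suspend = hw.running
hw.running = True               # simulate the fan being left on after wake-up
fan.resume()
state_after_resume = hw.running
```

In this model the fan runs while the system is suspended and returns to the
requested "off" state after resume, whatever state it woke up in.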
Comment 7 Konstantin Karasyov 2006-05-08 08:10:44 UTC
Sanjoy,

If it is still an issue, could you try the following patches which add 
suspend/resume support for ACPI devices.

First, apply 'common_suspend_resume.patch' to add suspend/resume functionality 
to ACPI subsystem.
Next, to add suspend/resume implementation for fan, apply 'fan_cleanup.patch' 
followed by 'fan_suspend_resume.patch'.
Comment 8 Konstantin Karasyov 2006-05-08 08:13:53 UTC
Created attachment 8048 [details]
adding suspend/resume functionality

This patch adds suspend/resume functionality to ACPI devices.
Comment 9 Konstantin Karasyov 2006-05-08 08:15:42 UTC
Created attachment 8049 [details]
fan implementation cleanup

Cleans up ACPI fan implementation.
Comment 10 Konstantin Karasyov 2006-05-08 08:18:15 UTC
Created attachment 8050 [details]
Suspend/resume support for fan device

This patch implements suspend/resume support for fan.
Comment 11 Pavel Machek 2006-05-09 04:35:48 UTC
Variable naming is "interesting", but it looks okay to me. Can you post patches
to LKML for review?
Comment 12 Konstantin Karasyov 2006-05-10 10:33:36 UTC
Created attachment 8086 [details]
suspend/resume functionalty w/ additions from Patrick Mochel

Avoids the use of sysdevs, is safer, and includes some clean-ups and debugging.
Comment 13 Len Brown 2006-05-15 00:20:37 UTC
applied patches in comment #10 and comment #12 to acpi-test tree  
did not apply clean-up patch in comment #9 -- 
please e-mail me this cleanup later vs. early 2.6.18. 
Comment 14 Sanjoy Mahajan 2006-05-15 18:49:36 UTC
I just applied the patches (using 2.6.16-rc5).  It still has those problems,
this time of the dangerous variety (fan doesn't turn on when it should).  I
hibernated (swsusp) without unloading fan.  After resuming I ran a few CPU and
disk intensive processes to drive up the temperature, and it got high:

$ acpi -t
     Battery 1: charged, 99%
     Battery 2: charging, 78%, 00:29:22 until charged
     Thermal 1: ok, 89.0 degrees C
     Thermal 2: active[0], 62.0 degrees C
     Thermal 3: ok, 35.0 degrees C
     Thermal 4: ok, 39.0 degrees C

The trip point for thermal 2 is 45 C.  Despite what 'acpi' reports in thermal
zone 2 ('active[0]'), the fan was not on.  I had to turn it on by hand by doing
'echo 0' to the fan's state file.
Comment 15 Pavel Machek 2006-05-16 02:21:44 UTC
does your machine use "smbus unhide" hack? If so, try disabling it (and open new
bug report)
Comment 16 Sanjoy Mahajan 2006-05-16 07:26:21 UTC
> does your machine use "smbus unhide" hack?

I don't think so.  I don't know what that hack is, but a google search
for it along with 'thinkpad' did not turn up my machine (TP 600X).

Comment 17 Sanjoy Mahajan 2006-05-16 13:46:10 UTC
The change also seems to confuse S3 sleep/wake, or maybe it's as
confused as it ever was.  I woke it up from S3 and then noticed that
the fan is on even though the thermal system thinks the fan is off:

$ acpi -t
     Battery 1: charged, 95%
     Battery 2: discharging, 74%, 01:07:19 remaining
     Thermal 1: ok, 33.0 degrees C
     Thermal 2: ok, 32.0 degrees C   /* trip point is 45C */
     Thermal 3: ok, 28.0 degrees C
     Thermal 4: ok, 30.0 degrees C

But the fan module knows that the fan is on:

# cat /proc/acpi/fan/FN20/*
status:                  on

I can probably get it into a correct state by changing the THM2 trip point
to 27C, so the thermal system will then agree with the actual fan
state (i.e. that it's on), then change it back to 45C.  Okay, I did
that and the fan is now off in reality and according to 'acpi -t' and
according to fan/FN20/.

Comment 18 Konstantin Karasyov 2006-06-21 07:22:39 UTC
Created attachment 8362 [details]
updated patch

Sanjoy,
Could you try this patch - it is against 2.6.17
This patch updates thermal resume method to reset fan states.
It worked for me.
Comment 19 Sanjoy Mahajan 2006-06-24 09:47:54 UTC
> Could you try this patch - it is against 2.6.17

I just tried it with no luck.  I hibernated (swsusp) and it came back
with the fan off, but in a state I've never seen before:

$ acpi -t
     Thermal 1: active[0], 45.0 degrees C
     Thermal 2: active[0], 43.0 degrees C
     Thermal 3: active[0], 33.0 degrees C
     Thermal 4: active[0], 35.0 degrees C

Usually I see active[0] only for Thermal 2, but the change may be due
to improvements in 2.6.17's ACPI relative to 2.6.16.  Either way, the
problem is that the fan is off but the system thinks it's on, so it's
unlikely to turn on when needed.  (If the actual temperature drops
below the trip point, then it'll be okay because the reported and
actual states will match again.)
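
One way to picture why a mismatch like this sticks: if the thermal code only
sends a command to the fan when its own idea of the state changes, a stale
"on" means the "turn on" command is never sent.  The model below is a sketch
under that assumption.  It is not taken from thermal.c; ThermalZone, check,
and resync are hypothetical names, and the 45 C trip point and 62 C reading
are borrowed from comment 14 purely as sample values.

```python
class ThermalZone:
    """Toy model: commands are sent only when the believed state changes."""

    def __init__(self, trip_c):
        self.trip_c = trip_c
        self.believed_on = False   # what the driver thinks
        self.actual_on = False     # what the physical fan is doing

    def check(self, temp_c):
        want_on = temp_c >= self.trip_c
        if want_on != self.believed_on:
            # Transition: the only place the hardware is touched.
            self.actual_on = want_on
            self.believed_on = want_on

    def resync(self, temp_c):
        # Resume-time fix: ignore the cached belief and write unconditionally.
        want_on = temp_c >= self.trip_c
        self.actual_on = want_on
        self.believed_on = want_on


zone = ThermalZone(trip_c=45.0)
zone.believed_on = True    # after resume: driver thinks the fan is on
zone.actual_on = False     # but the fan is really off

zone.check(62.0)           # hot, but no transition is seen
stuck_off = not zone.actual_on

zone.resync(62.0)          # a forced update turns the fan on
fixed = zone.actual_on
```

The same model also reproduces the self-healing case: once the temperature
drops below the trip point, check() sees a transition, both states become
"off", and the next rise turns the fan on normally.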

Comment 20 Len Brown 2006-06-25 21:35:16 UTC
patch versions in comment #13 shipped in 2.6.17-git9 
 
Comment 21 Konstantin Karasyov 2006-06-28 09:20:05 UTC
Created attachment 8438 [details]
debug patch

Sanjoy,
Could you try this patch - it adds debug prints to suspend/resume and
updates thermal zone structures during resume routine. It applies over
patch #8362 (the last patch I posted).

After trying this patch could you also check 'dmesg' output - it
should contain strings similar to the following:
..........................
!!! 0 active[0]: trip 3282 temp 3212 state disabled
..........................
!!! 1 active[0]: trip 3282 temp 3242 state disabled
!!! 2 active[0]: trip 3282 temp 3282 state enabled
!!! 3 active[0]: trip 3282 temp 3242 state disabled
..........................

Each string shows the info for a particular trip point (trip point
temperature, current temperature, and whether the trip point has been
entered) at different stages.
String starting with '0' - from the suspend method,
'1' - before acpi_thermal_active() is called for this trip point,
'2' - after acpi_thermal_active() has exited,
'3' - after acpi_thermal_check() has been called for the whole thermal zone.

If you notice any inconsistencies between 'dmesg', 'acpi -t', and the
actual fan behavior, could you post the 'dmesg' output and describe the
situation.
The system I'm using for testing has only 1 active trip point, so I
cannot validate all possible situations, so your help would be very
important.
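
For comparing these lines against 'acpi -t', it may help to convert the
numbers.  The sketch below assumes the trip and temp fields are in tenths of
a kelvin, the unit ACPI uses for thermal objects, and that the lines have
exactly the shape shown above.  The regular expression and the function names
are illustrative and not part of the patch.

```python
import re

# Matches lines such as:
#   !!! 2 active[0]: trip 3282 temp 3282 state enabled
LINE_RE = re.compile(
    r"!!! (?P<stage>\d+) active\[(?P<index>\d+)\]: "
    r"trip (?P<trip>\d+) temp (?P<temp>\d+) state (?P<state>enabled|disabled)"
)


def decikelvin_to_celsius(value):
    # Assumption: 2732 tenths of a kelvin is treated as 0 degrees C.
    return (value - 2732) / 10.0


def parse_debug_line(line):
    """Return a dict for a debug line, or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "stage": int(m.group("stage")),
        "index": int(m.group("index")),
        "trip_c": decikelvin_to_celsius(int(m.group("trip"))),
        "temp_c": decikelvin_to_celsius(int(m.group("temp"))),
        "enabled": m.group("state") == "enabled",
    }


info = parse_debug_line("!!! 2 active[0]: trip 3282 temp 3282 state enabled")
```

Under that assumption, the sample trip value 3282 corresponds to 55.0 degrees C.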
Comment 22 Sanjoy Mahajan 2006-06-28 18:14:03 UTC
> updates thermal zone structures during resume routine

That might have fixed it.  With the new patch, the fan is behaving
fine after resume from swsusp.  (I haven't tested it with S3 suspend
because the vanilla kernel needs extensive hacks to avoid hanging in
_PTS -- a.k.a. bug #5989.)

Now after swsusp resume, when the fan is running only one zone (THM2)
is active, which is the usual behavior.  And the 'acpi -t' output is
consistent with the fan state.  All four zones have an active trip
point, but I think they turn on the same physical fan, and I've never
seen the others on (except when testing the previous patch!).

I'll keep running this kernel and let you know if the fan has any
problems.

Comment 23 Len Brown 2006-07-05 16:26:46 UTC
marking as RESOLVED since there is a patch under test/review & consideration 
for pushing upstream.
Comment 24 Konstantin Karasyov 2006-07-06 09:42:27 UTC
Created attachment 8493 [details]
clean patch

Here is the clean patch (debug messages removed, fan functionality updated)
against 2.6.17. It replaces attachments 8362 and 8438.
Comment 25 Len Brown 2006-07-11 09:38:03 UTC
Through the wonders of open source, a derivative of the  
debug patch in comment #21 addressing the thermal.c part  
of this problem has made it into Linus' tree:  
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bed936f7eab946c60170bc92a1aea597da158e02  
  
Thus, the cleaned up patch in comment #24 that also addresses  
the fan problem will no longer apply.  
 
As the submitter is satisfied, this bug report is closed.
 
Konstantin, if there are additional issues that are not addressed 
by the commit above, they will have to be addressed elsewhere 
on top of the commit above.