Most recent kernel where this bug did not occur: swsusp never worked for me before 2.6.13-rc3 or so, and I noticed the fan problem as soon as swsusp started working.
Distribution: Debian testing
Hardware Environment: TP 600X, latest BIOS
Software Environment: dmidecode etc. output (from a slightly different kernel, but same DSDT/BIOS) is at Bug #4989
Problem Description: After rebooting from swsusp, the fan stays on no matter what I do. Normally the fan behaves well (turning on only when the temperatures exceed trip points). After the reboot, the system knows the temperatures are low enough and reports no active cooling:

    Thermal 1: ok, 40.0 degrees C
    Thermal 2: ok, 38.0 degrees C
    Thermal 3: ok, 34.0 degrees C
    Thermal 4: ok, 36.0 degrees C

But /proc/acpi/fan/FN*/state shows 'on' for all the fans (I think there's only one actual fan, but as far as the ACPI layer imagines it there may be >1), and they are definitely on and noisy. The air coming out is quite cool. Echoing '0' or 'off' to the 'state' file doesn't do anything. Reloading all the ACPI modules (fan, battery, etc.) doesn't help.
Steps to reproduce: Boot into X, swsusp, then reboot. Listen for fan!
If you build the ACPI fan support as a module and unload it while suspending, does it help at all? Regards, Nigel
Created attachment 5517 [details] syslog msgs with acpi_debug=0x1F A good idea, which I just tried. Alas, unloading the fan module doesn't help. I've attached the syslog msgs with acpi_debug=0x1F
Just tried with 2.6.13-rc5-git3 (with unloading and reloading the fan module). Almost the same problem as with -git2: Now it comes back with the fan always off. Like before, /proc/acpi/fan/FN*/state says they are all on. 'acpi -t' says thermal zone 2 is doing active cooling:

    # acpi -t
    Battery 1: charged, 100%
    Battery 2: charged, 100%
    Thermal 1: ok, 42.0 degrees C
    Thermal 2: active[0], 39.0 degrees C
    Thermal 3: ok, 33.0 degrees C
    Thermal 4: ok, 35.0 degrees C

It should be doing active cooling, but it isn't. So -git3 is quieter but more dangerous than -git2 (which has the fan always on).
You may want to try 'echo -n 0' or 'echo -n 3' into the fan state files to force the fan on or off. Also, the fan module lacks error checking; if the AML call fails, nothing is printed.
(I changed the kernel version to -rc6 since it happens there too) I tried echo 3 > fan_state_file and that turned off the fan. After that the fan turns off and on automatically as it should. Should I add some debug lines to fan.c to find out what is going on?
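The workaround used above (writing 0 or 3 to the fan's state file) can be wrapped in a small helper. This is only a sketch: the function name `set_fan_state` and the constants are hypothetical, the path is passed in by the caller, and the meaning of the values (0 turns the fan on, 3 turns it off) is taken from the comments in this thread rather than from any interface documentation.

```python
from pathlib import Path

# Values as used in this thread: "0" forces the fan on, "3" forces it off.
# (Assumption: these correspond to ACPI device power states D0 and D3.)
FAN_ON = "0"
FAN_OFF = "3"


def set_fan_state(state_file, value):
    """Write a fan state value to the given state file.

    `state_file` is whatever path the caller supplies, for example one of
    the /proc/acpi/fan/FN*/state files mentioned in this report.
    The value is written without a trailing newline, like `echo -n`.
    """
    if value not in (FAN_ON, FAN_OFF):
        raise ValueError("expected '0' (on) or '3' (off), got %r" % (value,))
    Path(state_file).write_text(value)

# Example (needs root and the old /proc/acpi interface; not run here):
#     set_fan_state("/proc/acpi/fan/FN20/state", FAN_OFF)
```

Rejecting anything other than 0 or 3 makes a mistyped value fail loudly, which matters here because the fan module itself prints nothing when a request fails.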
Well, that fan.c lacks suspend/resume support. Hint hint :-). Put fan at full speed in _suspend() hook, and make hardware put the fan back to sane state during _resume() hook.
Sanjoy, If it is still an issue, could you try the following patches which add suspend/resume support for ACPI devices. First, apply 'common_suspend_resume.patch' to add suspend/resume functionality to ACPI subsystem. Next, to add suspend/resume implementation for fan, apply 'fan_cleanup.patch' followed by 'fan_suspend_resume.patch'.
Created attachment 8048 [details] adding suspend/resume functionality This patch adds suspend/resume functionality to ACPI devices.
Created attachment 8049 [details] fan implementation cleanup Cleans up ACPI fan implementation.
Created attachment 8050 [details] Suspend/resume support for fan device This patch implements suspend/resume support for fan.
Variable naming is "interesting", but it looks okay to me. Can you post patches to lkml for review?
Created attachment 8086 [details] suspend/resume functionality w/ additions from Patrick Mochel Avoids use of sysdevs, is safer, and adds some clean-ups and debugging.
applied patches in comment #10 and comment #12 to acpi-test tree
did not apply clean-up patch in comment #9 -- please e-mail me this cleanup later vs. early 2.6.18.
I just applied the patches (using 2.6.16-rc5). It still has those problems, this time of the dangerous variety (fan doesn't turn on when it should). I hibernated (swsusp) without unloading fan. After resuming I ran a few CPU and disk intensive processes to drive up the temperature, and it got high:

    $ acpi -t
    Battery 1: charged, 99%
    Battery 2: charging, 78%, 00:29:22 until charged
    Thermal 1: ok, 89.0 degrees C
    Thermal 2: active[0], 62.0 degrees C
    Thermal 3: ok, 35.0 degrees C
    Thermal 4: ok, 39.0 degrees C

The trip point for thermal 2 is 45 C. Despite what 'acpi' reports in thermal zone 2 ('active[0]'), the fan was not on. I had to turn it on by hand by doing 'echo 0' to the fan's state file.
does your machine use "smbus unhide" hack? If so, try disabling it (and open new bug report)
> does your machine use "smbus unhide" hack? I don't think so. I don't know what that hack is, but a google search for it along with 'thinkpad' did not turn up my machine (TP 600X).
The change also seems to confuse S3 sleep/wake, or maybe it's as confused as it ever was. I woke it up from S3 and then noticed that the fan is on even though the thermal system thinks the fan is off:

    $ acpi -t
    Battery 1: charged, 95%
    Battery 2: discharging, 74%, 01:07:19 remaining
    Thermal 1: ok, 33.0 degrees C
    Thermal 2: ok, 32.0 degrees C /* trip point is 45C */
    Thermal 3: ok, 28.0 degrees C
    Thermal 4: ok, 30.0 degrees C

But the fan module knows that the fan is on:

    # cat /proc/acpi/fan/FN20/*
    status: on

I can probably get it into a correct state by changing the THM2 trip point to 27C, so the thermal system will then agree with the actual fan state (i.e. that it's on), then change it back to 45C. Okay, I did that and the fan is now off in reality and according to 'acpi -t' and according to fan/FN20/.
Created attachment 8362 [details] updated patch
Sanjoy,
Could you try this patch - it is against 2.6.17. This patch updates the thermal resume method to reset fan states. It worked for me.
> Could you try this patch - it is against 2.6.17
I just tried it with no luck. I hibernated (swsusp) and it came back with the fan off, but in a state I've never seen before:

    $ acpi -t
    Thermal 1: active[0], 45.0 degrees C
    Thermal 2: active[0], 43.0 degrees C
    Thermal 3: active[0], 33.0 degrees C
    Thermal 4: active[0], 35.0 degrees C

Usually I see active[0] only for Thermal 2, but the change may be due to improvements in 2.6.17's ACPI relative to 2.6.16. Either way, the problem is that the fan is off but the system thinks it's on, so it'll be unlikely to turn on (if the actual temperature drops below the trip point, then it'll be okay because the believed and real states will match).
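The recurring symptom in this report is a disagreement between what 'acpi -t' says and what the fan is really doing. A rough way to flag that automatically is sketched below. The helper names (`parse_acpi_thermal`, `fan_state_mismatch`) are hypothetical, the line format is copied from the 'acpi -t' output quoted in this thread, and the sketch assumes that all zones drive one physical fan, so any zone in 'active' mode means the fan should be running.

```python
import re

# Matches lines such as:
#   Thermal 1: ok, 42.0 degrees C
#   Thermal 2: active[0], 39.0 degrees C
THERMAL_RE = re.compile(
    r"Thermal (\d+): (\w+)(?:\[(\d+)\])?, ([\d.]+) degrees C"
)


def parse_acpi_thermal(text):
    """Extract the thermal zones from `acpi -t` style output."""
    zones = []
    for m in THERMAL_RE.finditer(text):
        zones.append({
            "zone": int(m.group(1)),
            "mode": m.group(2),           # "ok" or "active" in this thread
            "temp_c": float(m.group(4)),
        })
    return zones


def fan_state_mismatch(acpi_text, fan_is_on):
    """Return True when the thermal layer and the observed fan disagree.

    Assumption: every zone controls the same single fan, so the fan is
    expected to run whenever at least one zone reports active cooling.
    """
    wants_fan = any(z["mode"] == "active" for z in parse_acpi_thermal(acpi_text))
    return wants_fan != fan_is_on
```

Whether the fan is really on still has to be observed by hand (or read from the fan's state file), since the whole point of the bug is that the kernel's own view cannot be trusted after resume.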
patch versions in comment #13 shipped in 2.6.17-git9
Created attachment 8438 [details] debug patch
Sanjoy,
Could you try this patch - it adds debug prints to suspend/resume and updates thermal zone structures during the resume routine. It applies over patch #8362 (the last patch I posted). After trying this patch, could you also check the 'dmesg' output - it should contain strings similar to the following:

    ..........................
    !!! 0 active[0]: trip 3282 temp 3212 state disabled
    ..........................
    !!! 1 active[0]: trip 3282 temp 3242 state disabled
    !!! 2 active[0]: trip 3282 temp 3282 state enabled
    !!! 3 active[0]: trip 3282 temp 3242 state disabled
    ..........................

Each string shows the info for a particular trip point (trip point temperature, current temperature, whether the trip point is entered or not) at different stages:

    '0' - from the suspend method
    '1' - before acpi_thermal_active() is called for this trip point
    '2' - after acpi_thermal_active() has exited
    '3' - after acpi_thermal_check() has been called for the whole thermal zone

If you notice any inconsistencies between 'dmesg' and 'acpi -t', fan behavior, etc., could you post the 'dmesg' output and describe the situation? The system I'm using for testing has only 1 active trip point, so I cannot validate all possible situations, and your help would be very important.
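To make the debug output easier to compare against 'acpi -t', the '!!!' lines can be parsed and converted to degrees Celsius. The sketch below rests on two assumptions that are not stated in this thread: that the trip and temp values are in tenths of a kelvin (the unit ACPI uses for temperatures), and that a trip point should be 'enabled' exactly when temp >= trip, which is what the sample lines above suggest. The function names are hypothetical.

```python
import re

# Matches lines such as:
#   !!! 2 active[0]: trip 3282 temp 3282 state enabled
DEBUG_RE = re.compile(
    r"!!! (\d) active\[(\d+)\]: trip (\d+) temp (\d+) state (enabled|disabled)"
)

# Stage numbers as described in the comment above.
STAGES = {
    0: "suspend method",
    1: "before acpi_thermal_active()",
    2: "after acpi_thermal_active()",
    3: "after acpi_thermal_check()",
}


def decikelvin_to_celsius(value):
    """Convert tenths of a kelvin to degrees Celsius (assumed unit)."""
    return (value - 2732) / 10.0


def parse_debug_lines(dmesg_text):
    """Return one dict per '!!!' debug line found in the text."""
    entries = []
    for m in DEBUG_RE.finditer(dmesg_text):
        entries.append({
            "stage": STAGES.get(int(m.group(1)), "unknown"),
            "trip_index": int(m.group(2)),
            "trip": int(m.group(3)),
            "temp": int(m.group(4)),
            "state": m.group(5),
        })
    return entries


def is_inconsistent(entry):
    """Flag entries whose state does not follow from temp vs. trip.

    Assumption: 'enabled' is expected exactly when temp >= trip.
    """
    expected = "enabled" if entry["temp"] >= entry["trip"] else "disabled"
    return entry["state"] != expected
```

Under these assumptions the sample trip value 3282 corresponds to 55.0 degrees C, and none of the four sample lines would be flagged.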
> updates thermal zone structures during resume routine That might have fixed it. With the new patch, the fan is behaving fine after resume from swsusp. (I haven't tested it with S3 suspend because the vanilla kernel needs extensive hacks to avoid hanging in _PTS -- a.k.a. bug #5989.) Now after swsusp resume, when the fan is running only one zone (THM2) is active, which is the usual behavior. And the 'acpi -t' output is consistent with the fan state. All four zones have an active trip point, but I think they turn on the same physical fan, and I've never seen the others on (except when testing the previous patch!). I'll keep running this kernel and let you know if the fan has any problems.
marking as RESOLVED since there is a patch under test/review & consideration for pushing upstream.
Created attachment 8493 [details] clean patch Here is the clean patch (debug messages removed, fan functionality updated) against 2.6.17. It replaces patches ##8362, 8438.
Through the wonders of open source, a derivative of the debug patch in comment #21 addressing the thermal.c part of this problem has made it into Linus' tree: http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bed936f7eab946c60170bc92a1aea597da158e02 Thus, the cleaned-up patch in comment #24 that also addresses the fan problem will no longer apply. As the submitter is satisfied, this bug report is closed. Konstantin, if there are additional issues that are not addressed by the commit above, they will have to be addressed elsewhere on top of that commit.