Bug 58301

Summary: When resuming after suspend, HP 2510p turns on fan full blast and never turns it off
Product: Power Management Reporter: Jake Edge (jake)
Component: Thermal Assignee: Zhang Rui (rui.zhang)
Status: CLOSED MOVED    
Severity: normal CC: aaron.lu, auxsvr, gojrzan, lenb, me, micgro2, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.7-3.9 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: output from grep + sensors after resume
ll /sys/class/thermal/thermal_zone*/
debug patch to check cooling state transition in step_wise governor
dmesg from boot through resume
debug patch to check cooling state transition in step_wise governor - v2
patch: change cooling device state based on cached value instead of real state
Patch 1/4
patch 2/4
patch 3/4
patch 4/4

Description Jake Edge 2013-05-16 02:55:45 UTC
Created attachment 101721 [details]
output from grep + sensors after resume

Since kernel 3.7, my HP 2510p laptop turns on its fan(s) full blast after it resumes from a suspend.  Debugging in bug #56601 and bug #56591 (including trying the patch in https://bugzilla.kernel.org/show_bug.cgi?id=56591#c32 on top of 3.9 by cherry-picking 94a409319561ec1847fd9bf996a2d5843ad00932 from the 3.10-rc mainline) has not solved the problem.

It seems to be related to some bogus temperature reporting that is apparently caused by a BIOS bug, but it all worked before 3.7.  See: https://bugzilla.kernel.org/show_bug.cgi?id=56601#c17 and https://bugzilla.kernel.org/show_bug.cgi?id=56601#c18 for more information.

As requested by Rui I am also attaching:

> the output of "grep . /sys/class/thermal/*/*" when the bug is
> reproduced after resume.
Comment 1 Zhang Rui 2013-05-16 04:17:08 UTC
please attach the output of
"ll /sys/class/thermal/thermal_zone*/" when the bug occurs as well.
Comment 2 Jake Edge 2013-05-16 14:40:21 UTC
Created attachment 101741 [details]
ll /sys/class/thermal/thermal_zone*/

after the problem occurs (after resume, fan comes on full)
Comment 3 Zhang Rui 2013-05-20 03:07:07 UTC
Created attachment 101971 [details]
debug patch to check cooling state transition in step_wise governor

please apply this patch and attach the dmesg output after the problem occurs after resume.
Comment 4 Jake Edge 2013-05-20 03:58:27 UTC
Created attachment 101981 [details]
dmesg from boot through resume

The patch does not apply to 3.9 (no thermal_core.c), but I applied it to thermal_sys.c successfully.  Here is the dmesg from boot through the sleep (around 60s in) and after the resume.
Comment 5 Zhang Rui 2013-05-20 06:02:54 UTC
Okay, I think I've found the problem.

First, the ACPI fan driver turns on all fans during suspend.
During resume it updates the power state, i.e. makes sure they are on again.

Here is how the problem occurs:
Before suspend, the fan is off, so the thermal framework believes the fan is off.
After resume, there is a thermal notification and the thermal framework starts to update the thermal zones. But the temperature returned is equal to the temperature captured last time, which was before suspend, so the thermal trend is stable and the thermal core keeps the fan state as it is. Without suspend/resume, the thermal framework keeps the fan in the OFF state, but after suspend/resume it keeps the fan in the ON state...
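
The sequence described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel's step_wise governor: the names `Fan`, `governor_update`, `suspend_resume` and `simulate` are hypothetical, the fan is reduced to on/off, and the only rule taken from the comment is that a stable trend leaves the current cooling state unchanged. The `use_cached` switch mimics the idea named in the title of the fix patch (act on the cached value instead of the real state).

```python
# Toy model of the failure described above; NOT the kernel code.
# All names here are hypothetical and exist only for illustration.

class Fan:
    def __init__(self):
        self.real_state = 0    # what the hardware is actually doing
        self.cached_state = 0  # what the thermal framework last requested


def governor_update(fan, last_temp, temp, use_cached):
    """Pick a new fan state from the temperature trend (on/off only)."""
    current = fan.cached_state if use_cached else fan.real_state
    if temp > last_temp:      # rising trend: turn the fan on
        target = 1
    elif temp < last_temp:    # dropping trend: turn the fan off
        target = 0
    else:                     # stable trend: keep the current state
        target = current
    fan.real_state = target
    fan.cached_state = target
    return target


def suspend_resume(fan):
    # The fan driver switches the fan on across suspend/resume,
    # without the thermal framework's cached value being updated.
    fan.real_state = 1


def simulate(use_cached):
    fan = Fan()
    temp_before = 45            # arbitrary pre-suspend reading
    suspend_resume(fan)
    temp_after = temp_before    # stale reading, identical to the old one
    governor_update(fan, temp_before, temp_after, use_cached)
    return fan.real_state
```

In this model, `simulate(use_cached=False)` leaves the fan on after resume because the governor trusts the real state, while `simulate(use_cached=True)` turns it back off.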

please apply the refreshed debug patch and the fix patch attached later.
Comment 6 Zhang Rui 2013-05-20 09:19:24 UTC
Created attachment 102011 [details]
debug patch to check cooling state transition in step_wise governor - v2

please apply this refreshed debug patch instead.
Comment 7 Zhang Rui 2013-05-20 09:21:40 UTC
Created attachment 102021 [details]
patch: change cooling device state based on cached value instead of real state

please try this patch which I think should fix the problem for you.

Note that I'm still thinking of a proper solution for this problem, so this is probably not the real fix targeted for upstream.

But anyway, please check if it helps or not.
Comment 8 Jake Edge 2013-05-20 14:19:26 UTC
that fixed the problem.  multiple suspend/resumes without the fans coming on full blast.  also, temp6 is not at 100° as it had been. thanks!
Comment 9 Zhang Rui 2013-05-27 04:56:08 UTC
Hi, Jake,

The 4 patches attached below are the ones I proposed to fix the problem for upstream.
can you please try them WITHOUT the patch in comment #7 to see if they help?
Comment 10 Zhang Rui 2013-05-27 04:59:38 UTC
Created attachment 102621 [details]
Patch 1/4
Comment 11 Zhang Rui 2013-05-27 04:59:56 UTC
Created attachment 102631 [details]
patch 2/4
Comment 12 Zhang Rui 2013-05-27 05:00:15 UTC
Created attachment 102641 [details]
patch 3/4
Comment 13 Zhang Rui 2013-05-27 05:00:32 UTC
Created attachment 102651 [details]
patch 4/4
Comment 14 Zhang Rui 2013-05-30 01:47:56 UTC
please apply the patch at
https://patchwork.kernel.org/patch/2633071/
on top of the four patches and see if they help.
Comment 15 Jake Edge 2013-06-03 16:00:47 UTC
Built 3.10-rc4, which exhibited the problem (no surprise) ... added the four patches above and the fan was on at a low level right after booting and stayed that way with a basically idle system.  After suspend then resume, the fan was on at a higher level, but not at full blast as it has been in the past.

added the patch from patchwork, at boot time, no fans are running.  after suspend/resume, the fan comes on briefly (at a low level) and then turns off.  Seems like the last patch fixes things reasonably ...
Comment 16 Zhang Rui 2013-06-21 08:09:59 UTC
Hi, Jake,

please also try this series on top of a clean upstream kernel, or just pull thermal -next branch and see if the problem still exists.
https://patchwork.kernel.org/patch/2733361/
https://patchwork.kernel.org/patch/2733371/
https://patchwork.kernel.org/patch/2733391/
https://patchwork.kernel.org/patch/2733401/
https://patchwork.kernel.org/patch/2733411/
https://patchwork.kernel.org/patch/2733421/
Comment 17 Jake Edge 2013-06-27 17:46:50 UTC
I built 3.10-rc7 and confirmed that it still has the problem (no surprise), though it does have the fan on at a low level after booting, but it seems to turn off the fan after a minute or two.  I then added these 6 patches, booted, slept, resumed and the fan came on full blast :(

So these patches don't fix the problem for me.  What info do you need?

thanks
Comment 18 Grzegorz Ojrzanowski 2013-07-07 13:54:48 UTC
I've experienced this bug just after switching to 3.7.x.
After resume, the states of all Fan-type cooling devices were set to 1; however, setting them to 0 caused overheating because the fan never turned on again.

Since 3.10.0 (and its RC's) it got a little better. Most of the cooling devices are properly set after resume, except two, which still put the fan at full speed.
I don't know whether it's relevant, but their /sys/devices/virtual/thermal/cooling_deviceX/device/path values differ from those of the other five: _TZ_.C3B1 and _TZ_.C3B2, while the five which now work properly have values from _TZ_.C3C8 to _TZ_.C3CC.

Since the sensor labeled temp6 seems to report fan speed rather than the actual temperature, I came up with these values:

coolingdevice	path		temp6_value	cur_state after resume on 3.10
0		\_TZ_.C3B1	100		1
1		\_TZ_.C3B2	70		1
2		\_TZ_.C3C8	100		0 or 1 if still needed
3		\_TZ_.C3C9	90		0 or 1 if still needed
4		\_TZ_.C3CA	70		0 or 1 if still needed
5		\_TZ_.C3CB	50		0 or 1 if still needed
6		\_TZ_.C3CC	30		0 or 1 if still needed

_TZ_.C3B1 and _TZ_.C3B2 seem to put the fan in the same speeds as _TZ_.C3C8 and _TZ_.C3CA do. I've checked on 3.6.11 and the _TZ_.C3B's are never used even at 100% cpu load with the machine stuffed under a pillow.
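
A table like the one above can be collected with a short script. The following is a minimal sketch, assuming the cooling devices are visible under /sys/class/thermal and that ACPI-backed devices expose a `device/path` attribute as in the path quoted in this comment; the function name `read_cooling_devices` and the dictionary keys are hypothetical. Missing attributes are reported as `None` rather than treated as errors.

```python
import glob
import os


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_cooling_devices(root="/sys/class/thermal"):
    """List cooling devices under `root`, ordered by their numeric index."""
    prefix = "cooling_device"
    devices = []
    for d in glob.glob(os.path.join(root, prefix + "[0-9]*")):
        name = os.path.basename(d)
        devices.append({
            "index": int(name[len(prefix):]),
            "type": _read(os.path.join(d, "type")),
            "acpi_path": _read(os.path.join(d, "device", "path")),
            "cur_state": _read(os.path.join(d, "cur_state")),
        })
    devices.sort(key=lambda dev: dev["index"])
    return devices


if __name__ == "__main__":
    for dev in read_cooling_devices():
        print(dev["index"], dev["type"], dev["acpi_path"], dev["cur_state"],
              sep="\t")
```

Running it once before suspend and once after resume gives two listings that can be compared by eye or with diff.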
Comment 19 Michael Großhäuser 2013-08-07 17:05:56 UTC
The bug is also in 3.11 rc3
Comment 20 Aaron Lu 2013-12-19 07:17:27 UTC
Hi Rui,

What's the status of the patches?
Comment 21 auxsvr 2014-01-17 05:21:06 UTC
Any update on this? I'm forced to stick to an old kernel because of this bug and the patches do not apply to any kernel that openSUSE supports.
Comment 22 Rafael J. Wysocki 2014-05-07 18:35:19 UTC
Can you please try 3.15-rc4?  We've made a change to the ACPI fan driver that may affect this.
Comment 23 Jake Edge 2014-05-08 00:20:14 UTC
Well, it's better, but not completely fixed I would say.

I booted 3.15-rc4 and the fan came on at a low level (which it shouldn't in my opinion).  But, then I slept it, and resumed -- the fan came back on at the low level and fairly quickly slowed down and stopped.  I slept it again and got the same behavior.  So I don't think it should come on at boot (and stay on ... I gave it a few minutes but it never shut down).  FWIW, it comes on as soon as power is applied, before it even gets out of the BIOS ... so maybe Linux just needs to turn it off at boot time (if it isn't too hot) ... here is the fan status (which, interestingly doesn't change) and sensors output before and after:

[root@ouzel talks]# cat /sys/bus/acpi/drivers/fan/PNP0C0B\:0?/thermal_cooling/cur_state
0
0
0
0
0
0
1
[root@ouzel talks]# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +25.0°C  (crit = +70.0°C)
temp2:        +50.0°C  (crit = +256.0°C)
temp3:        +51.0°C  (crit = +110.0°C)
temp4:        +39.0°C  (crit = +105.0°C)
temp5:        +28.1°C  (crit = +110.0°C)
temp6:        +30.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +47.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:       +48.0°C  (high = +100.0°C, crit = +100.0°C)

<sleep - resume >

[root@ouzel talks]# cat /sys/bus/acpi/drivers/fan/PNP0C0B\:0?/thermal_cooling/cur_state
0
0
0
0
0
0
1
[root@ouzel talks]# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +25.0°C  (crit = +70.0°C)
temp2:        +50.0°C  (crit = +256.0°C)
temp3:        +50.0°C  (crit = +110.0°C)
temp4:        +38.0°C  (crit = +105.0°C)
temp5:        +28.2°C  (crit = +110.0°C)
temp6:        +30.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +46.0°C  (high = +100.0°C, crit = +100.0°C)
Core 1:       +48.0°C  (high = +100.0°C, crit = +100.0°C)


what other info do you need?
Comment 24 Rafael J. Wysocki 2014-05-08 09:30:13 UTC
Well, so the behavior seems to have changed with respect to the bug subject and description.

I wonder if we should continue debugging it here or open a new bug?
Comment 25 Jake Edge 2014-05-08 14:26:38 UTC
I'm open to most anything, though I don't really use that laptop any more (except rarely for scanning, since Fedora seems to have broken that in 19 and 20, sigh, but I digress) ... want me to open a new bug?  or is there an existing bug for the fan not being turned off at boot?  or we could just drop it ...

thanks,

jake
Comment 26 Rafael J. Wysocki 2014-05-08 17:14:06 UTC
Well, depending on how much effort you're willing to spend on that. :-)

It will involve getting some debug info from that box and trying to figure out what's wrong with it and that may be a couple of things ...
Comment 27 Jake Edge 2014-05-08 17:38:13 UTC
Well, I am certainly willing to do some debugging, so I filed another: bug #75741 ... hopefully tracking this down will help others since I don't use that laptop much any more ...
Comment 28 Rafael J. Wysocki 2014-05-08 19:03:04 UTC
OK, I'll mark this one as resolved.  Hopefully, it won't regress again suspend-wise.