See https://bugzilla.redhat.com/show_bug.cgi?id=895276 for more details and additional thread on lkml: http://lkml.indiana.edu/hypermail/linux/kernel/1305.1/03870.html Summary is that post 3.7 up to 3.8.13, I see cooling devices corresponding to fan levels turn on after a resume from S3 and don't turn off. The bug appears to be fixed in 3.9.1 and on and 3.10-rc1 Copying my comment #32 from above bugzilla: ------------------------------- I see this problem on the Samsung 550 Chromebook (on the Pixel we either have the EC auto-set fan speed, or have user-space communicate directly with the EC to control fan speeds) It looks like this was introduced somewhere in this merge: Merge: 125c4c7 c072fed Author: Len Brown <len.brown@intel.com> AuthorDate: Tue Oct 9 01:35:52 2012 -0400 Commit: Len Brown <len.brown@intel.com> CommitDate: Tue Oct 9 01:35:52 2012 -0400 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux into thermal Conflicts: drivers/staging/omap-thermal/omap-thermal-common. OMAP supplied dummy TC1 and TC2, at the same time that the thermal tree removed them from thermal_zone_device_register() drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c propogate the upstream MAX_IDR_LEVEL re-name to prevent a build failure Previously-fixed-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Len Brown <len.brown@intel.com> but unfortunately, those patches within that merge branch aren't completing a suspend-resume for me, so I can't bisect down to a specific patch, but it's probably one of the ones to the generic thermal framework. I will try to rebase those patches later and see which one is the culprit later. --------------------------- This patch seemed to fix the problem in 3.9: commit 94a409319561ec1847fd9bf996a2d5843ad00932 Author: Zhang Rui <rui.zhang@intel.com> Date: Fri Apr 26 09:19:53 2013 +0000 ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points Commit 4ae46be "Thermal: Introduce thermal_zone_trip_update()" introduced a regression causing the fan to be always on even when the system is idle. however, I'm still seeing it in 3.8.13, which has that patch.
(In reply to comment #0) > See https://bugzilla.redhat.com/show_bug.cgi?id=895276 for more details > > and additional thread on lkml: > http://lkml.indiana.edu/hypermail/linux/kernel/1305.1/03870.html > > Summary is that post 3.7 up to 3.8.13, I see cooling devices corresponding to > fan levels turn on after a resume from S3 and don't turn off. > > The bug appears to be fixed in 3.9.1 and on and 3.10-rc1 > what do you mean? the problem can not be reproduced with 3.10-rc1 kernel?
(In reply to comment #1) > (In reply to comment #0) > > See https://bugzilla.redhat.com/show_bug.cgi?id=895276 for more details > > > > and additional thread on lkml: > > http://lkml.indiana.edu/hypermail/linux/kernel/1305.1/03870.html > > > > Summary is that post 3.7 up to 3.8.13, I see cooling devices corresponding > to > > fan levels turn on after a resume from S3 and don't turn off. > > > > The bug appears to be fixed in 3.9.1 and on and 3.10-rc1 > > > what do you mean? > the problem can not be reproduced with 3.10-rc1 kernel? if the problem still exists. please do the same test as requested at https://bugzilla.kernel.org/show_bug.cgi?id=58301#c3
(In reply to comment #2) > > if the problem still exists. please do the same test as requested at > https://bugzilla.kernel.org/show_bug.cgi?id=58301#c3 please try the refreshed patch in comment #6 and #7 in bug #58301, and see if it helps or not.
That patch doesn't seem to apply to my 3.8 tree: patching file drivers/thermal/step_wise.c Hunk #1 FAILED at 119. Hunk #2 succeeded at 154 (offset -11 lines). 1 out of 2 hunks FAILED -- saving rejects to file drivers/thermal/step_wise.c.rej can't find file to patch at input line 53 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -------------------------- |diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c |index d755440..aeca923 100644 |--- a/drivers/thermal/thermal_core.c |+++ b/drivers/thermal/thermal_core.c
Created attachment 102281 [details] A thermal logging patch for 3.8 Here's a version of the logging patch that I created for 3.8
Created attachment 102301 [details] Log for 3.8.13 across resume with previous patch Here's a log of a 3.8.13 with the patch from 102281 enabled
This is the state after resume assosicated with logs in comment #6 (none) ~ # cat /sys/class/thermal/cooling_device[01234]/cur_state 1 1 1 0 1
(In reply to comment #4) > That patch doesn't seem to apply to my 3.8 tree: > > patching file drivers/thermal/step_wise.c > Hunk #1 FAILED at 119. > Hunk #2 succeeded at 154 (offset -11 lines). > 1 out of 2 hunks FAILED -- saving rejects to file > drivers/thermal/step_wise.c.rej > can't find file to patch at input line 53 > Perhaps you used the wrong -p or --strip option? > The text leading up to this was: > -------------------------- > |diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > |index d755440..aeca923 100644 > |--- a/drivers/thermal/thermal_core.c > |+++ b/drivers/thermal/thermal_core.c you should replace thermal_core.c with thermal_sys.c in the patch.
Replacing thermal_core.c with thermal_sys.c still doesn't apply: Applying: Debug patch to check cooling state transition. error: patch failed: drivers/thermal/step_wise.c:119 error: drivers/thermal/step_wise.c: patch does not apply error: patch failed: drivers/thermal/thermal_sys.c:441 error: drivers/thermal/thermal_sys.c: patch does not apply Patch failed at 0001 Debug patch to check cooling state transition. The copy of the patch that failed is found in: /home/sonnyrao/trunk/src/third_party/kernel-next/.git/rebase-apply/patch When you have resolved this problem, run "git am --resolved". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". can you look at the patch in the attachment in comment #6 and see if that is roughly equivalent? Also, I didn't have patch7 (since it didn't apply) in there when I made the log the first time, I made a port of it to 3.8: diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c index 6840230..6b7ac56 100644 --- a/drivers/thermal/step_wise.c +++ b/drivers/thermal/step_wise.c @@ -51,6 +51,9 @@ static unsigned long get_target_state(struct thermal_instance *instance, } else if (trend == THERMAL_TREND_DROPPING) { cur_state = cur_state > instance->lower ? (cur_state - 1) : instance->lower; + } else if (trend == THERMAL_TREND_STABLE) { + /* Keep the cooling state as it is */ + cur_state = instance->target; } printk("%s: new cur_state %d\n", __func__, cur_state); return cur_state; but that still didn't fix the issue.
Also, I just rebased the patches that I mentioned above, instead of using a merge commit, and bisected that down to this commit (sha won't match since rebased) 7b4977c3b139b9096ad17083ebd6de00d94804e7 is the first bad commit commit 7b4977c3b139b9096ad17083ebd6de00d94804e7 Author: Zhang Rui <rui.zhang@intel.com> Date: Wed Jun 27 14:13:04 2012 +0800 Thermal: Introduce simple arbitrator for setting device cooling state This fixes the problem that a cooling device may be referenced by by multiple trip points in multiple thermal zones. With this patch, we have two stages for updating a thermal zone, 1. check if a thermal_instance needs to be updated or not 2. update the cooling device, based on the target cooling state of all its instances. Note that, currently, the cooling device is set to the deepest cooling state required. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Eduardo Valentin <eduardo.valentin@ti.com>
I'm starting to think that the problem could be that cdev instances which are "inactive" according to linux state really are active, because they were activated by acpi_fan_suspend() in drivers/acpi/fan.c static int acpi_fan_suspend(struct device *dev) { if (!dev) return -EINVAL; acpi_bus_set_power(to_acpi_device(dev)->handle, ACPI_STATE_D0); return AE_OK; } that codes sets the state to D0 just prior to suspend, and they are still in D0 coming out of suspend into resume, but as far as the thermal zone code is concerned, they are marked inactive, so it does nothing to them.
yes, that is the problem. and my patch in comment #7 bug #50301 should fix it. please forget about the debug patch and try the fix patch only.
I don't know if it is direct related, but I see the same problem (patch not tested and no know how (yet) how to do that). But what I see on Ubuntu 12.10 with 3.9.2 custom compiled kernel is after suspend no fancontrol starting. I have had the sleepd script, but that was not always working. By accident I came across this threat. Did test this (before and after suspend, both resulting in "0"): cat /sys/class/thermal/cooling_device[01234]/cur_state Furthermore, no errors in logs. Main problem is before suspend the fancontrol is running smooth (by a fresh startup) after suspend they are running at max speed. Before suspend an RPM of 3k and after suspend 6k. The only thing I conclude so far is that it ain't been fixed in 3.9.2 Can you clarify how to proceed?
(In reply to comment #12) > yes, that is the problem. > and my patch in comment #7 bug #50301 should fix it. > please forget about the debug patch and try the fix patch only. sorry I made a mistake here. The fix patch should be in #7 bug #58301. https://bugzilla.kernel.org/show_bug.cgi?id=58301#c7
(In reply to comment #14) > Can you clarify how to proceed? please try the patch at https://bugzilla.kernel.org/show_bug.cgi?id=58301#c7 and see if it helps.
Okay, just recompiled kernel with option (fan from acpi on -> just to be sure named here) Furthermore the step_wise.c as followed: case THERMAL_TREND_DROP_FULL: if (cur_state == instance->lower) { if (!throttle) cur_state = -1; } else cur_state = instance->lower; break; case THERMAL_TREND_STABLE: /* Keep the cooling state as it is */ cur_state = instance->target; default: break; } This is not working. Before suspend my service fancontrol is running fine. However after suspend it is not comming back online. This needs to be done by manual restart. Bios all auto options are disabled. Please advise (also if more logs are needed)
(In reply to comment #17) > > This is not working. Before suspend my service fancontrol is running fine. Hmm, I do not know how this works. But as the fan should be controlled by kernel driver, can you reproduce the problem without this service?
BTW, please try the 4 patches attached in bug #58301 and see if they help or not.
please apply the patch at https://patchwork.kernel.org/patch/2633071/ on top of the four patches and see if the problem still exists.
I did a reverse bisect of 3.8 -> 3.9 + the thermal trend rising patch applied during the bisect, which seemed to fix the problem on 3.9.2, just to try to narrow down the combination of patches which fix the problem for me on 3.8 and it bisected to this patch: commit 3dbfff3dfe6714aeefb615c65bec0800dc5a4c51 Author: Zhang Rui <rui.zhang@intel.com> Date: Mon Nov 19 16:10:20 2012 +0800 Introduce THERMAL_TREND_RAISE/DROP_FULL support for step_wise governor the patch right after this, where the problem is fixes is: commit b8bb6cb999858043489c1ddef08eed2127559169 Author: Zhang Rui <rui.zhang@intel.com> Date: Thu Nov 22 15:45:02 2012 +0800 step_wise: Unify the code for both throttle and dethrottle Below is the thermal trend rising patch which fixes the problem for me in 3.9.2 but not 3.8.13 by itself. commit 94a409319561ec1847fd9bf996a2d5843ad00932 Author: Zhang Rui <rui.zhang@intel.com> Date: Fri Apr 26 09:19:53 2013 +0000 ACPI / thermal: do not always return THERMAL_TREND_RAISING for active trip points so, if I cherry pick in these three patches onto 3.8.13, then the problem is fixed (which also contains the THERMAL_TREND_RAISING patch above): commit d069015e268bcac6c8bd8997b3235f5f977d3ab6 Author: Zhang Rui <rui.zhang@intel.com> Date: Mon Nov 19 15:33:51 2012 +0800 Introduce THERMAL_TREND_RAISE_FULL and THERMAL_TREND_DROP_FULL commit 3dbfff3dfe6714aeefb615c65bec0800dc5a4c51 Author: Zhang Rui <rui.zhang@intel.com> Date: Mon Nov 19 16:10:20 2012 +0800 Introduce THERMAL_TREND_RAISE/DROP_FULL support for step_wise governor commit b8bb6cb999858043489c1ddef08eed2127559169 Author: Zhang Rui <rui.zhang@intel.com> Date: Thu Nov 22 15:45:02 2012 +0800 step_wise: Unify the code for both throttle and dethrottle and I confirmed that the last patch is indeed required, which was a bit surprising.
(In reply to comment #21) > I did a reverse bisect of 3.8 -> 3.9 + the thermal trend rising patch applied > during the bisect, which seemed to fix the problem on 3.9.2, just to try to > narrow down the combination of patches which fix the problem for me on 3.8 > and > it bisected to this patch: > > > commit 3dbfff3dfe6714aeefb615c65bec0800dc5a4c51 > Author: Zhang Rui <rui.zhang@intel.com> > Date: Mon Nov 19 16:10:20 2012 +0800 > > Introduce THERMAL_TREND_RAISE/DROP_FULL support for step_wise governor > > > the patch right after this, where the problem is fixes is: > commit b8bb6cb999858043489c1ddef08eed2127559169 > Author: Zhang Rui <rui.zhang@intel.com> > Date: Thu Nov 22 15:45:02 2012 +0800 > > step_wise: Unify the code for both throttle and dethrottle > > > Below is the thermal trend rising patch which fixes the problem for me in > 3.9.2 > but not 3.8.13 by itself. > so the thermal trend patch fixes the problem for 3.9. and the thermal trend patch plus commit b8bb6cb fixes the problem in 3.8 stable kernel, right? > commit 94a409319561ec1847fd9bf996a2d5843ad00932 > Author: Zhang Rui <rui.zhang@intel.com> > Date: Fri Apr 26 09:19:53 2013 +0000 > > ACPI / thermal: do not always return THERMAL_TREND_RAISING for active > trip > points > > > so, if I cherry pick in these three patches onto 3.8.13, then the problem is > fixed (which also contains the THERMAL_TREND_RAISING patch above): > > commit d069015e268bcac6c8bd8997b3235f5f977d3ab6 > Author: Zhang Rui <rui.zhang@intel.com> > Date: Mon Nov 19 15:33:51 2012 +0800 > > Introduce THERMAL_TREND_RAISE_FULL and THERMAL_TREND_DROP_FULL > > commit 3dbfff3dfe6714aeefb615c65bec0800dc5a4c51 > Author: Zhang Rui <rui.zhang@intel.com> > Date: Mon Nov 19 16:10:20 2012 +0800 > > Introduce THERMAL_TREND_RAISE/DROP_FULL support for step_wise governor > > > commit b8bb6cb999858043489c1ddef08eed2127559169 > Author: Zhang Rui <rui.zhang@intel.com> > Date: Thu Nov 22 15:45:02 2012 +0800 > > step_wise: Unify the code for both throttle and dethrottle > > and I confirmed that the last patch is indeed required, which was a bit > surprising. "the last patch", I suppose you mean commit b8bb6cb99, right?
(In reply to comment #22) > (In reply to comment #21) > > I did a reverse bisect of 3.8 -> 3.9 + the thermal trend rising patch > applied > > during the bisect, which seemed to fix the problem on 3.9.2, just to try to > > narrow down the combination of patches which fix the problem for me on 3.8 > and > > it bisected to this patch: > > > > > > commit 3dbfff3dfe6714aeefb615c65bec0800dc5a4c51 > > Author: Zhang Rui <rui.zhang@intel.com> > > Date: Mon Nov 19 16:10:20 2012 +0800 > > > > Introduce THERMAL_TREND_RAISE/DROP_FULL support for step_wise governor > > > > > > the patch right after this, where the problem is fixes is: > > commit b8bb6cb999858043489c1ddef08eed2127559169 > > Author: Zhang Rui <rui.zhang@intel.com> > > Date: Thu Nov 22 15:45:02 2012 +0800 > > > > step_wise: Unify the code for both throttle and dethrottle > > > > > > Below is the thermal trend rising patch which fixes the problem for me in > 3.9.2 > > but not 3.8.13 by itself. > > > so the thermal trend patch fixes the problem for 3.9. > and the thermal trend patch plus commit b8bb6cb fixes the problem in 3.8 > stable > kernel, right? The thermal trend patch did fix it in 3.9, and thermal trend + all 3 of those patches below fix it in 3.8 > > > commit 94a409319561ec1847fd9bf996a2d5843ad00932 > > Author: Zhang Rui <rui.zhang@intel.com> > > Date: Fri Apr 26 09:19:53 2013 +0000 > > > > ACPI / thermal: do not always return THERMAL_TREND_RAISING for active > trip > > points > > > > > > so, if I cherry pick in these three patches onto 3.8.13, then the problem > is > > fixed (which also contains the THERMAL_TREND_RAISING patch above): > > > > commit d069015e268bcac6c8bd8997b3235f5f977d3ab6 > > Author: Zhang Rui <rui.zhang@intel.com> > > Date: Mon Nov 19 15:33:51 2012 +0800 > > > > Introduce THERMAL_TREND_RAISE_FULL and THERMAL_TREND_DROP_FULL > > > > commit 3dbfff3dfe6714aeefb615c65bec0800dc5a4c51 > > Author: Zhang Rui <rui.zhang@intel.com> > > Date: Mon Nov 19 16:10:20 2012 +0800 > > > > Introduce THERMAL_TREND_RAISE/DROP_FULL support for step_wise governor > > > > > > commit b8bb6cb999858043489c1ddef08eed2127559169 > > Author: Zhang Rui <rui.zhang@intel.com> > > Date: Thu Nov 22 15:45:02 2012 +0800 > > > > step_wise: Unify the code for both throttle and dethrottle > > > > and I confirmed that the last patch is indeed required, which was a bit > > surprising. > > "the last patch", I suppose you mean commit b8bb6cb99, right? yes
I see. Close this bug report as the problem has been fixed in upstream kernel. will backport the three patches to stable tree.
Sorry to reactivate such an old bug report, but I fear that this bug came back in kernel 3.17.x and is still alive in 4.2.x (I did not check 4.3.x or later yet) I confirm that it was fixed at least in 3.13.x and 3.16.x. Please comment if you think it is a regression or a new bug, and if I shall open a new bug report for it. Some infos from affected users: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1504751 http://ubuntuforums.org/showthread.php?t=2294906
At this point someone with a device that exhibits the problem probably needs to run a git bisect on it to find out when it broke again.