Bug 48791 - i915 doesn't reach RC6 state after s2ram - Thinkpad T420, X220
Status: RESOLVED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: All Linux
Importance: P1 normal
Assigned To: intel-gfx-bugs@lists.freedesktop.org
Reported: 2012-10-14 14:27 UTC by Toralf Förster
Modified: 2014-03-04 12:55 UTC (History)

See Also:
Kernel Version: 3.10
Tree: Mainline
Regression: Yes



Description Toralf Förster 2012-10-14 14:27:20 UTC
I sometimes observe high temperatures on my ThinkPad T420 (i5-2540M CPU) with integrated Intel graphics (i915 module) after wakeup from s2ram.

Powertop-2.1 shows that the GPU is at 100% - it seems that the RC6 state isn't reached any longer.

Another s2ram helps.

BTW 3.5.x is fine.
Comment 1 Florian Bruhin 2012-10-15 16:56:37 UTC
I can confirm this on 3.6.2 with Archlinux on a Thinkpad X220 Tablet with an i7-2620M.
Comment 2 Toralf Förster 2012-10-15 17:17:46 UTC
Furthermore it happens both under a RHEL 64 bit kernel and a stable Gentoo x86 system.
Comment 3 Toralf Förster 2012-10-22 20:10:05 UTC
With 3.6.3 this issue becomes more and more annoying. About 50% of all reboots result in a system with hot CPUs (86 °C) and a fan running at high speed.
I have to s2ram the system and wake it up again to get back to a normal state.
Comment 4 Toralf Förster 2012-10-26 09:04:15 UTC
Could somebody please point me to a /sys or /proc file where the current state is stored? I'm planning to add a warning about this to my s2ram/s2disk ACPI scripts to avoid overheating/fan damage on my system.
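For reference, the i915 driver exposes per-card RC6 residency counters (in milliseconds) under sysfs. Below is a minimal sketch for listing them. The function name list_rc6_counters and the default base directory are assumptions made for illustration, and which counters exist depends on the kernel and hardware.

```shell
#!/bin/sh
# Hypothetical helper: print every RC6 residency counter found below a
# DRM sysfs directory, one "path value" pair per line.
# $1 = base directory (assumed default: /sys/class/drm)
list_rc6_counters() {
    base=${1:-/sys/class/drm}
    for f in "$base"/card*/power/rc6*_residency_ms; do
        # skip the unexpanded glob pattern and unreadable files
        [ -r "$f" ] || continue
        printf '%s %s\n' "$f" "$(cat "$f")"
    done
}
```

Calling list_rc6_counters twice, a few seconds apart, shows whether any counter advances. A counter that stays constant on an idle system suggests that state is not being entered.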
Comment 5 Florian Bruhin 2012-10-30 19:42:54 UTC
Related bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=54089
https://bugzilla.kernel.org/show_bug.cgi?id=48721

Trying out the patches mentioned there. The patch at https://bugs.freedesktop.org/show_bug.cgi?id=54089#c26 didn't work so far.
Comment 6 Toralf Förster 2012-10-31 20:31:28 UTC
Patches don't help / can't be applied - will close this as a dup

*** This bug has been marked as a duplicate of bug 48721 ***
Comment 7 Toralf Förster 2013-01-17 20:54:44 UTC
due to https://bugzilla.kernel.org/show_bug.cgi?id=48721#c36 probably neither dup nor solved.
Comment 8 Thomas Kahle 2013-01-17 21:04:45 UTC
This behavior can be confirmed with newer versions of PowerTOP which displays GPU state statistics on the second tab.  For reference, I'm using version 2.20.17 of the intel xf86 driver.
Comment 9 Toralf Förster 2013-01-17 21:18:38 UTC
(In reply to comment #8)
> This behavior can be confirmed with newer versions of PowerTOP which displays
> GPU state statistics on the second tab.  For reference, I'm using version
> 2.20.17 of the intel xf86 driver.

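A simple script is:

#!/bin/bash
#
# Warn (or run a custom action) if the GPU spent no time in RC6
# during a 3-second window. bash is required for the [[ ]] tests.

ACTION=$1

# residency counter of the deepest RC6 state, in milliseconds
S=/sys/class/drm/card0/power/rc6pp_residency_ms

B=$(cat "$S")
sleep 3
A=$(cat "$S")

# an unchanged counter means the state was not entered in between
if [[ $A -eq $B ]]; then
        if [[ -n "$ACTION" ]]; then
                $ACTION
        else
                echo "
                *
                *
                *       RC6 issue
                *
                *
                "
                aplay /usr/share/sounds/pop.wav
        fi
fi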
Comment 10 Thomas Kahle 2013-01-25 09:01:45 UTC
Lukas Hejtmanek, who commented on this bug: https://bugs.freedesktop.org/show_bug.cgi?id=54089
seems to have exactly the same problem.
It may be some interaction of kernel and graphics driver.  I'm running the 3.4 series at the moment with xf86-video-intel-2.20.17.  There the problem does not occur.  If I upgrade to a recent 3.8 rc then it does happen. 
I'm not sure if the patch mentioned in the linked bug already landed in that particular graphics driver.  It was added to Gentoo on Jan 10th.
Comment 11 Thomas Kahle 2013-01-25 12:50:37 UTC
It may be that this patch here solves the issue for me.  So far it looks good:
https://bugzilla.kernel.org/show_bug.cgi?id=52411
Comment 12 Thomas Kahle 2013-01-25 15:57:39 UTC
OK, whatever my issue is, I reproduced it with 3.8-rc4 and the patch from bug 52411.
Comment 13 Jonas Jelten 2013-01-30 17:39:32 UTC
Reproduced it with a Thinkpad X220 Tablet, Core i5-2520M, kernel 3.8.0-rc5: GPU 100% active according to both powertop and /sys/class/drm/card0/power/rc6_residency_ms
Comment 14 Thomas Kahle 2013-01-30 19:17:58 UTC
Interestingly I reproduced the issue with my "backup" kernel 3.4.24 too.  With this kernel powertop does not show the rc6 state, but it seems very likely that this is the same issue:  On a small fraction of resumes the power usage spikes (20W instead of 9W) and going through another suspend/resume cycle solves the problem (with reasonable probability).

If anyone tells me what to do to narrow it down further, I'll be happy to help. Because of the low frequency with which the problem happens, bisecting seems to be out of reach.
Comment 15 Toralf Förster 2013-01-30 19:45:04 UTC
(In reply to comment #14)
> Interestingly I reproduced the issue with my "backup" kernel 3.4.24 too

Interesting - I'm pretty sure I didn't observe high power consumption during my v3.4.x usage (I'd definitely have noticed it b/c of the noisy fan of my ThinkPad T420). Since v3.6 was released in Oct 2012 and I closely follow mainline, there's a chance that this behaviour was backported with a stable patch after that.

That said, could something between v3.4.12 and v3.2.24 contain this bug?
Comment 16 Thomas Kahle 2013-01-30 19:54:37 UTC
I can't exclude the backporting of the problem!  For a long time I used 3.4.2, where I can't remember having noticed the problem (but I'm not 100% sure, my fan is silent).
Comment 17 Toralf Förster 2013-02-01 17:39:36 UTC
FWIW I tried this patch (queued for 3.7.6):

https://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob;f=queue-3.7/drm-i915-fix-forcewake-posting-reads.patch;h=72ec3dc35e5fa46d38e73beac8a3f868ec92418d;hb=dead61ec5305d9c2aa485da37012aea4762d41d7

applied on top of 3.7.5 on a stable Gentoo, but w/o success.

Moreover, with 3.7.5 I now run into this issue at nearly every boot and after s2disk (but not after s2ram).
Comment 18 Kai Hendry 2013-02-05 15:42:16 UTC
I think this bug is related to https://bugzilla.kernel.org/show_bug.cgi?id=52411
I don't know how to mark this bug as being related.
Comment 19 Jani Nikula 2013-02-05 15:56:14 UTC
(In reply to comment #18)
> I think this bug is related to
> https://bugzilla.kernel.org/show_bug.cgi?id=52411
> I don't know how to mark this bug as being related.

Just like that, for example. ;)

However, Toralf says in comment #17 that the commit that fixed bug 52411 did not fix this one for him.
Comment 20 Toralf Förster 2013-02-05 17:03:39 UTC
(In reply to comment #19)
> However, Toralf says in comment #17 that the commit that fixed bug 52411 did
> not fix this one for him.
Right - and to make it clear - it fails w/ s2ram too, but not every time (and not every reboot is affected). However, if I run into this issue e.g. after an s2ram, then I have to s2ram again *and* wait a few seconds. If I wake up my system immediately after s2ram, the issue appears again.
Comment 21 Jani Nikula 2013-02-14 09:49:32 UTC
FWIW, can't reproduce this with X220T running 3.8.0-rc7.
Comment 22 Toralf Förster 2013-02-14 21:21:06 UTC
Could this point to the culprit: https://lkml.org/lkml/2013/2/12/5 ?
Comment 23 Thomas Kahle 2013-02-15 08:36:59 UTC
Huh?  Your link is about the CPU, our problem here is the GPU (and was not present for me before the 3.5 series)
Comment 24 Elvis Stansvik 2013-02-17 14:34:44 UTC
I'm running 3.8.0-RC7 on an X220 and I think I'm still seeing this issue.

powertop says my GPU is ~70% in active state when the system is idling :/
Comment 25 Jani Nikula 2013-02-19 13:15:22 UTC
(In reply to comment #24)
> I'm running 3.8.0-RC7 on an X220 and I think I'm still seeing this issue.
> 
> powertop says my GPU is ~70% in active state when the system is idling :/

Not reaching rc6 at all (like in this bug report) is not the same as reaching 30% rc6.
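To make that distinction measurable, here is a minimal sketch that turns two readings of the residency counter into a percentage of the sampling window. The function names rc6_percent and rc6_sample are hypothetical, and the default path (taken from comment 13) assumes card0 is the Intel GPU.

```shell
#!/bin/sh
# Hypothetical helper: share of a sampling window spent in RC6, in percent.
# $1 = counter before (ms), $2 = counter after (ms), $3 = window length (ms)
rc6_percent() {
    echo $(( ($2 - $1) * 100 / $3 ))
}

# Read the counter twice, $2 seconds apart (default 3), and print the
# percentage. $1 = counter file; the default path is an assumption.
rc6_sample() {
    file=${1:-/sys/class/drm/card0/power/rc6_residency_ms}
    secs=${2:-3}
    before=$(cat "$file") || return 1
    sleep "$secs"
    after=$(cat "$file") || return 1
    rc6_percent "$before" "$after" $(( secs * 1000 ))
}
```

On an idle system, a result of 0 would correspond to the state described in this bug report, while a value around 30 would correspond to the situation in comment #24.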
Comment 26 Jani Nikula 2013-02-19 13:57:11 UTC
Please try the patch at https://bugs.freedesktop.org/attachment.cgi?id=74922&action=edit
Comment 27 Toralf Förster 2013-02-19 15:51:58 UTC
(In reply to comment #26)
> Please try the patch at
> https://bugs.freedesktop.org/attachment.cgi?id=74922&action=edit

It does not apply against 3.7.9:

sudo patch -p1 --dry-run < /home/tfoerste/devel/0001-drm-i915-Fix-SNB-RC6-init-sequence.patch 
patching file drivers/gpu/drm/i915/i915_reg.h
Hunk #1 succeeded at 4140 (offset -41 lines).
Hunk #2 succeeded at 4210 (offset -43 lines).
Hunk #3 succeeded at 4226 with fuzz 2 (offset -43 lines).
patching file drivers/gpu/drm/i915/intel_pm.c
Hunk #1 succeeded at 2479 (offset -117 lines).
Hunk #2 FAILED at 2605.
Hunk #3 FAILED at 4475.
2 out of 3 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/intel_pm.c.rej
Comment 28 Kai Hendry 2013-02-20 00:12:35 UTC
Woke my 3.8.0-1-rc7-mainline-dirty X220 machine from suspend this morning and GPU is 100% active.
http://stats.webconverger.org/x220/temp/051.png
Comment 29 Toralf Förster 2013-02-22 15:04:55 UTC
vanilla-3.8.0 still suffers from this
Comment 30 Tomas Janousek 2013-02-24 01:55:57 UTC
(In reply to comment #25)
> Not reaching rc6 at all (like in this bug report) is not the same as reaching
> 30% rc6.

Is there a bugzilla/launchpad/whatever for that issue, somewhere? I've been stuck with 3.3.8 for months as it's the last power-friendly kernel for ThinkPad T420. :-(
Comment 31 João Gomes 2013-04-20 11:18:55 UTC
I tried kernel 3.8.8 on a Samsung Series 5 and it seems that the problem is still present.
I also noticed that if I turn the laptop on and take some time to log in, it enters the same condition, with the GPU not going to rc6.
Comment 32 Toralf Förster 2013-05-04 13:12:26 UTC
With 3.8.8 I no longer observed it here on a ThinkPad T420 with integrated Intel graphics (only), and I have not observed it with 3.9.0 (yet) either.
Comment 33 Daniel Vetter 2013-05-06 08:31:09 UTC
Can you all please test whether

https://patchwork.kernel.org/patch/2481431/

improves the rc6 behaviour?
Comment 34 Toralf Förster 2013-05-25 08:43:57 UTC
The issue is back here on a stable 32-bit Gentoo Linux; IMO it started again somewhere in 3.9.2-3.9.4.
Comment 35 Jonas Jelten 2013-05-25 15:33:33 UTC
got hit again by the bug after resuming from ram. another s2r cycle turned it back to normal.

Linux 3.10.0-rc2-JJ #1 SMP PREEMPT Tue May 21 16:50:48 CEST 2013 x86_64 Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz GNU/Linux
Comment 36 Tomas Janousek 2013-05-29 17:44:11 UTC
I can confirm that this happens way more often on 3.9.4 than it did on 3.9.0 and 3.9.2. Also, I've been running 3.9.4+https://patchwork.kernel.org/patch/2481431/ for the last 21 hours and I hit the problem just now. So no, it doesn't help, I'm afraid. :-(
Comment 37 David Kellum 2013-06-19 16:22:01 UTC
Confirmed and reproduced with various 3.6+ kernels on Thinkpad T520, i7-2640M, i915 only, on Fedora. I've stayed on Fedora 17 and started building my own late series 3.4.x kernels to avoid this. Currently no issue on the following. Thankfully the issue has not been back-ported to 3.4:

Linux retro 3.4.48-1.dek.fc17.x86_64 #1 SMP Mon Jun 10 18:34:02 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux

All related:

https://bugzilla.redhat.com/show_bug.cgi?id=866212
https://bugs.freedesktop.org/show_bug.cgi?id=54089
https://bbs.archlinux.org/viewtopic.php?id=150743
Comment 38 Jonas Jelten 2013-06-20 00:23:34 UTC
related?
https://bugzilla.kernel.org/show_bug.cgi?id=58971
Comment 39 Toralf Förster 2013-06-20 15:25:09 UTC
(In reply to comment #38)
> related?
> https://bugzilla.kernel.org/show_bug.cgi?id=58971

I don't think so; governor problems occurred on my system completely independently of this issue, and much later.
Comment 40 Toralf Förster 2013-06-24 16:08:01 UTC
(In reply to comment #38)
> related?
> https://bugzilla.kernel.org/show_bug.cgi?id=58971

I checked out 15239099d7 and reverted that particular commit - issue does still exist - seems not to be related.
Comment 41 Daniel Vetter 2014-01-08 20:53:44 UTC
Please retest with 3.13 - we've improved the rc6 and rps logic considerably and it should now be much more robust at entering low-power states on idle systems.
Comment 42 Toralf Förster 2014-01-08 20:59:38 UTC
(In reply to Daniel Vetter from comment #41)
> Please retest with 3.13 - we've improved the rc6 and rps logic considerably
> and it should now be much more robust at entering low-power states on idle
> systems.

Even 3.12.6 is much more stable than before - I have not observed this at all since 3.12.x.
Comment 43 m6tt 2014-03-04 12:55:42 UTC
confirming issue on 3.11.7

cpu does not descend into rc states until the machine is suspended; it alternates between 8W and 30W as a result.

Kernel command line contains 'i915.modeset=1 i915.i915_enable_rc6=4 i915.i915_enable_fbc=1 i915.lvds_downclock=1 pcie_aspm=force intel-audio-powersave=true' to no effect.
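One way to check which of these options the running driver actually picked up is to read them back from sysfs. This is a minimal sketch; the function name show_i915_params is hypothetical. Only the i915.* entries would appear under /sys/module/i915/parameters, the available names vary between kernel versions, and some parameters may be readable by root only.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of selected module parameters.
# $1 = parameters directory, e.g. /sys/module/i915/parameters
# remaining arguments = parameter names without the "i915." prefix
show_i915_params() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # missing on this kernel, or not readable by the current user
            printf '%s=<not readable>\n' "$name"
        fi
    done
}
```

For example: show_i915_params /sys/module/i915/parameters modeset i915_enable_rc6 i915_enable_fbc lvds_downclock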
