Bug 54691

Summary: Error enabling power savings on Ironlake chip since 3.8.0
Product: Drivers Reporter: Coacher (itumaykin+kernel)
Component: Video(DRI - Intel)Assignee: intel-gfx-bugs (intel-gfx-bugs)
Status: RESOLVED CODE_FIX    
Severity: normal CC: daniel, intel-gfx-bugs, ketetefid
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: >= 3.8.0 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: lspci -vvv
enable interrupts earlier when resuming
backported patch
2nd try at a backport

Description Coacher 2013-03-03 04:07:10 UTC
Created attachment 94311 [details]
lspci -vvv

After upgrading kernel to 3.8.0 there is the following stack
trace is in my dmesg upon every boot:

[22762.594676] ------------[ cut here ]------------
[22762.594700] WARNING: at drivers/gpu/drm/i915/i915_gem.c:1021
__wait_seqno+0x4fd/0x530 [i915]()
[22762.594701] Hardware name: Aspire 1830T
[22762.594728] Modules linked in: ipv6 nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack iptable_filter ip_tables bnep bluetooth dm_mod
uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev ums_realtek
hid_logitech_dj i915 intel_agp iTCO_wdt acer_wmi iTCO_vendor_support ath9k
ath9k_common ath9k_hw ath atl1c mac80211 cfg80211 sparse_keymap drm_kms_helper
intel_gtt video wmi
[22762.594731] Pid: 13103, comm: kworker/u:0 Tainted: G        W   
3.8.1-ACER-1830T #1
[22762.594732] Call Trace:
[22762.594743]  [<ffffffff8103907b>] ? warn_slowpath_common+0x7b/0xc0
[22762.594753]  [<ffffffffa01dc5dd>] ? __wait_seqno+0x4fd/0x530 [i915]
[22762.594766]  [<ffffffffa0211501>] ? __i915_update_gfx_val+0x31/0xf0 [i915]
[22762.594778]  [<ffffffffa02141a4>] ? i915_update_gfx_val+0x24/0x40 [i915]
[22762.594788]  [<ffffffffa01dcd5f>] ? i915_add_request+0x2bf/0x2e0 [i915]
[22762.594800]  [<ffffffffa0214a2f>] ? intel_enable_gt_powersave+0x49f/0x870
[i915]
[22762.594813]  [<ffffffffa01fc2aa>] ? intel_modeset_init_hw+0x3a/0x50 [i915]
[22762.594820]  [<ffffffffa01c91c9>] ? __i915_drm_thaw+0xd9/0xf0 [i915]
[22762.594828]  [<ffffffffa01c96f4>] ? i915_resume+0x74/0xd0 [i915]
[22762.594834]  [<ffffffff812d18a0>] ? pci_pm_thaw+0x80/0x80
[22762.594842]  [<ffffffff8137d3b6>] ? dpm_run_callback.isra.5+0x36/0x80
[22762.594845]  [<ffffffff8137d6d6>] ? device_resume+0xa6/0x140
[22762.594849]  [<ffffffff8137d784>] ? async_resume+0x14/0x40
[22762.594853]  [<ffffffff810630f7>] ? async_run_entry_fn+0xa7/0x1c0
[22762.594858]  [<ffffffff81055d07>] ? process_one_work+0x117/0x530
[22762.594861]  [<ffffffff81053119>] ? need_to_create_worker+0x9/0x20
[22762.594865]  [<ffffffff81055a8f>] ? manage_workers+0x1df/0x2a0
[22762.594867]  [<ffffffff81063050>] ? async_schedule+0x10/0x10
[22762.594871]  [<ffffffff81056562>] ? worker_thread+0x182/0x490
[22762.594874]  [<ffffffff810645bf>] ? __wake_up_common+0x4f/0x80
[22762.594878]  [<ffffffff810563e0>] ? rescuer_thread+0x280/0x280
[22762.594881]  [<ffffffff8105b9e3>] ? kthread+0xb3/0xc0
[22762.594884]  [<ffffffff81060000>] ? hrtimers_resume+0x10/0x50
[22762.594887]  [<ffffffff8105b930>] ? kthread_freezable_should_stop+0x60/0x60
[22762.594891]  [<ffffffff81564b2c>] ? ret_from_fork+0x7c/0xb0
[22762.594922]  [<ffffffff8105b930>] ? kthread_freezable_should_stop+0x60/0x60
[22762.594924] ---[ end trace 5bb601912bd1588a ]---
[22762.594927] [drm:ironlake_enable_rc6] *ERROR* failed to enable ironlake
power power savings

There was no such problem with 3.7.*. My notebook is Acer 1830T with Ironlake
chip running Gentoo amd64. I've attached lscpi output and if you need some more
info I am ready to provide it.
Comment 1 Coacher 2013-03-03 04:07:57 UTC
I've done bisecting between 3.7.10 and 3.8.0, here is the result:

3e9605018ab3e333d51cc90fccfde2031886763b is the first bad commit
commit 3e9605018ab3e333d51cc90fccfde2031886763b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 27 16:22:54 2012 +0000

    drm/i915: Rearrange code to only have a single method for waiting upon the ring

    Replace the wait for the ring to be clear with the more common wait for
    the ring to be idle. The principle advantage is one less exported
    intel_ring_wait function, and the removal of a hardcoded value.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 aeb1a24182824a9da2090842ec3e521400037b0e 25ab310832eb71fc0d3ef9d41bfb1d949247a455 M      drivers
Comment 2 Coacher 2013-03-03 04:08:34 UTC
git bisect log output (JIC):

git bisect start
# good: [356d8c6fb2a7cf49e836742738a8b9a47e77cfea] Linux 3.7.10
git bisect good 356d8c6fb2a7cf49e836742738a8b9a47e77cfea
# bad: [19f949f52599ba7c3f67a5897ac6be14bfcb1200] Linux 3.8
git bisect bad 19f949f52599ba7c3f67a5897ac6be14bfcb1200
# good: [29594404d7fe73cd80eaa4ee8c43dcc53970c60e] Linux 3.7
git bisect good 29594404d7fe73cd80eaa4ee8c43dcc53970c60e
# good: [dadfab4873256d2145640c0ce468fcbfb48977fe] Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
git bisect good dadfab4873256d2145640c0ce468fcbfb48977fe
# bad: [992956189de58cae9f2be40585bc25105cd7c5ad] efi: Fix the build with user namespaces enabled.
git bisect bad 992956189de58cae9f2be40585bc25105cd7c5ad
# good: [2b8318881ddbcb67c5e8d2178b42284749442222] Merge tag 'fbdev-for-3.8' of git://gitorious.org/linux-omap-dss2/linux
git bisect good 2b8318881ddbcb67c5e8d2178b42284749442222
# bad: [3c2e81ef344a90bb0a39d84af6878b4aeff568a2] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect bad 3c2e81ef344a90bb0a39d84af6878b4aeff568a2
# good: [640631d04cd2cfbb4792d6a8fc5fcab14ee273a5] drm/exynos: use sgt instead of pages for framebuffer address
git bisect good 640631d04cd2cfbb4792d6a8fc5fcab14ee273a5
# good: [aed606e3bc1f10753254db308d3fd8c053c41328] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
git bisect good aed606e3bc1f10753254db308d3fd8c053c41328
# good: [221392c3ad0432e39fd74a349364f66cb0ed78f6] sched: numa: Fix build error if CONFIG_NUMA_BALANCING && !CONFIG_TRANSPARENT_HUGEPAGE
git bisect good 221392c3ad0432e39fd74a349364f66cb0ed78f6
# good: [01ce113ca5b18aea4c97dea62287394ca4f8ad7f] drm/exynos: modify wait_for_vblank of fimd
git bisect good 01ce113ca5b18aea4c97dea62287394ca4f8ad7f
# good: [2f3f24061c5c489074ad492bf694a5a76ebd8fc5] Merge branch 'exynos-drm-next' of git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next
git bisect good 2f3f24061c5c489074ad492bf694a5a76ebd8fc5
# bad: [633cf8f5056c3e72158e4dbc387b3d65926d2d55] drm/i915: Don't allow ring tail to reach the same cacheline as head
git bisect bad 633cf8f5056c3e72158e4dbc387b3d65926d2d55
# good: [45e2b5f640b3766da3eda48f6c35f088155c06f3] drm/i915: force restore on lid open
git bisect good 45e2b5f640b3766da3eda48f6c35f088155c06f3
# bad: [3e9605018ab3e333d51cc90fccfde2031886763b] drm/i915: Rearrange code to only have a single method for waiting upon the ring
git bisect bad 3e9605018ab3e333d51cc90fccfde2031886763b
# good: [b662a0663230853fccdfceeda5db031f5d4b657c] drm/i915: Simplify flushing activity on the ring
git bisect good b662a0663230853fccdfceeda5db031f5d4b657c
Comment 4 Daniel Vetter 2013-03-04 10:12:08 UTC
Created attachment 94361 [details]
enable interrupts earlier when resuming

Please test the attached patch, thanks.
Comment 5 Kete Tefid 2013-03-04 10:56:52 UTC
(In reply to comment #4)
> Created an attachment (id=94361) [details]
> enable interrupts earlier when resuming
> 
> Please test the attached patch, thanks.

I would be most thankful if you would kindly provide a patch for 3.8.1 as well. I definitely could change the patch to suit the kernel but I was too afraid of doing that. I am sorry for my lack of expertise.
I would provide immediate testing, since I am affected by this.

patch -p1 </home/tefid/Downloads/i915_pm.patch --dry-run
patching file drivers/gpu/drm/i915/i915_drv.c
Hunk #1 succeeded at 486 (offset -7 lines).
Hunk #2 FAILED at 567.
1 out of 2 hunks FAILED -- saving rejects to file drivers/gpu/drm/i915/i915_drv.c.rej
Comment 6 Daniel Vetter 2013-03-04 16:28:27 UTC
Created attachment 94391 [details]
backported patch

Some other patches are required, hopefully I've picked up all the right ones. Cumulative diff attached.
Comment 7 Kete Tefid 2013-03-04 20:41:19 UTC
The patch from the last comment made an unbootable kernel. Unfortunately, my configuration got lost and I am in the process of building a new kernel.
Is the patch completely OK? It does not produce any messages -- just and immediate black screen and a hard reset is needed.
Comment 8 Daniel Vetter 2013-03-04 23:05:58 UTC
Hm, I haven't tested the backport at all besides with the compiler ... Can you first check whether the first patch works on top of 3.9-rc1, then we'll worry about the backport?
Comment 9 Coacher 2013-03-04 23:46:55 UTC
I confirm that with the suggested patch ("enable interrupts earlier when resuming" one) on top of kernel-3.9-rc1 the problem is fixed.

@Kete Tefid can you confirm this so I can mark this as resolved?
Comment 10 Coacher 2013-03-05 00:04:18 UTC
(In reply to comment #9)
> I confirm that with the suggested patch ("enable interrupts earlier when
> resuming" one) on top of kernel-3.9-rc1 the problem is fixed.

Out of curiosity I've tested pure 3.9.0-rc1 and the problem is gone too. Seems that it was fixed already by some other change.
Comment 11 Coacher 2013-03-05 00:55:18 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > I confirm that with the suggested patch ("enable interrupts earlier when
> > resuming" one) on top of kernel-3.9-rc1 the problem is fixed.
> 
> Out of curiosity I've tested pure 3.9.0-rc1 and the problem is gone too.
> Seems
> that it was fixed already by some other change.

Sorry for the disinformation. This issue is still presented in vanilla 3.9.0-rc1 and the patch mentioned above really do fix it. On 3.8.0 this issue was triggered upon every boot on my machine, but on 3.9.0-rc1 a suspend/resume cycle is needed, that's why I thought it's gone completely. My bad.
Comment 12 Daniel Vetter 2013-03-05 09:15:26 UTC
Fix is merged into drm-intel-fixes now:

commit 58db81c51c080595995219f591ca834e7352731c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Mar 5 09:50:58 2013 +0100

    drm/i915: enable irqs earlier when resuming

It is a bit unclear to me why 3.8 also regressed, since all the seemingly relevant commits are only in 3.9-rc1.
Comment 13 Daniel Vetter 2013-03-05 09:16:22 UTC
Created attachment 94541 [details]
2nd try at a backport

I've found the prerequisite patch I think and added it to the backport. Please test this on top of 3.8.
Comment 14 Coacher 2013-03-06 20:50:23 UTC
(In reply to comment #12)
> It is a bit unclear to me why 3.8 also regressed, since all the seemingly
> relevant commits are only in 3.9-rc1.

I don't know either, but on my machine 3.8.0 is also affected.
Comment 15 Daniel Vetter 2013-03-06 20:52:00 UTC
Yeah, the commit which regressed things is actually in 3.8 already. I've added a cc: stable to the fix and will forward it asap:

commit 15239099d7a7a9ecdc1ccb5b187ae4cda5488ff9
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Mar 5 09:50:58 2013 +0100

    drm/i915: enable irqs earlier when resuming
Comment 16 Coacher 2013-03-06 20:58:16 UTC
Thank you for your help!
Comment 17 Coacher 2013-03-27 15:31:58 UTC
Hello everyone.

I haven't tested fresh 3.8.x kernels lately because there is also another bug that's really annoying and I was running 3.7.10, but I've finally decided to give 3.8.4 a try.
Surprisingly, this bug is there.

After a little investigation I found out that at some point patch named:
"drm/i915: enable irqs earlier when resuming" was introduced to 3.8 branch and this is merely the patch attached above, but recently revert of this patch was also merged into 3.8 branch AND is included in 3.8.4. That's why this bug is resurfaced again.

The commit message of revert says: "It caused problems in the 3.8-stable series, but 3.9-rc is just fine." So, this revert breaking things again on 3.8.4 kernel and doesn't belong to 3.8 branch as stated in commit message.

Please revert this revert, i.e. leave the changes made with the patch "drm/i915: enable irqs earlier when resuming".
Comment 18 Daniel Vetter 2013-03-27 15:43:45 UTC
Nope, ilk rc6 is known broken, disabled by default and this patch caused at least 5 different other bugs (and tons more "me, too" reports). Since the init sequence is this messy I've essentially given up on fixing this for 3.8, since 3.9-rc kernels work correctly.

Everything should work as expected with default options on 3.8.
Comment 19 Coacher 2013-03-27 16:33:14 UTC
Ok, thank you for clarification. Will wait till 3.9 then.