Bug 42907

Summary: [SNB] 3.3.0-rc5+git: WARNING: at drivers/gpu/drm/i915/i915_irq.c:652 ironlake_irq_handler+0x4ea/0x500()
Product: Drivers
Reporter: Maciej Rutecki (maciej.rutecki)
Component: Video(DRI - Intel)
Assignee: Ben Widawsky (ben)
Status: CLOSED CODE_FIX
Severity: normal
CC: ben, daniel, florian, jbarnes, maciej.rutecki, mmokrejs, patryk, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.3.0-rc5+git
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 42644

Description Maciej Rutecki 2012-03-11 19:35:06 UTC
Subject    : 3.3.0-rc5+git: WARNING: at drivers/gpu/drm/i915/i915_irq.c:652 ironlake_irq_handler+0x4ea/0x500()
Submitter  : Soeren Sonnenburg <sonne@debian.org>
Date       : 2012-03-05 23:59
Message-ID : 1330991976.9223.16.camel@no
References : http://marc.info/?l=linux-kernel&m=133099242810332&w=2

This entry is being used for tracking a regression from 3.2. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Daniel Vetter 2012-03-18 16:40:56 UTC
We've had sightings of this before 3.2 and tried to fix it for 3.2. Evidently there's still something not quite right in the logic, but afaics this does not smell like a regression.

While I have the attention of the regression tracking team, can someone please look at:

https://bugzilla.kernel.org/show_bug.cgi?id=42762
Comment 2 Ben Widawsky 2012-03-30 02:40:15 UTC
(In reply to comment #0)
> Subject    : 3.3.0-rc5+git: WARNING: at drivers/gpu/drm/i915/i915_irq.c:652
> ironlake_irq_handler+0x4ea/0x500()
> Submitter  : Soeren Sonnenburg <sonne@debian.org>
> Date       : 2012-03-05 23:59
> Message-ID : 1330991976.9223.16.camel@no
> References : http://marc.info/?l=linux-kernel&m=133099242810332&w=2
> 
> This entry is being used for tracking a regression from 3.2. Please don't
> close it until the problem is fixed in the mainline.

Would it be possible to bisect this? Daniel's fix first went into v3.2-rc1 and has been there ever since. This logic shouldn't have changed much since then.
Comment 3 Daniel Vetter 2012-04-15 10:07:23 UTC
*** Bug 43107 has been marked as a duplicate of this bug. ***
Comment 4 Jesse Barnes 2012-04-18 21:11:53 UTC
Maciej, any update?
Comment 5 Maciej Rutecki 2012-04-19 18:56:05 UTC
I have no new information.

Regards
Comment 6 Jesse Barnes 2012-04-19 19:05:12 UTC
Any chance you can bisect like Ben asked?
Comment 7 Patryk Rządziński 2012-04-30 17:34:03 UTC
Hello,

I started seeing strange behavior the moment I disabled PM Runtime in the kernel (vanilla-3.3.3). When booting, progress gets stuck for anywhere from 30 seconds to a few minutes, shortly after init starts. Please note the timing.

[    2.957943] Freeing unused kernel memory: 424k freed
[    2.959940] Freeing unused kernel memory: 756k freed
[   26.700741] ------------[ cut here ]------------
[   26.700754] WARNING: at drivers/gpu/drm/i915/i915_irq.c:652 0xffffffff81308c22()
[   26.700761] Hardware name: Dell System Vostro 3750
[   26.700765] Missed a PM interrupt
[   26.700769] Modules linked in:
[   26.700778] Pid: 0, comm: swapper/0 Not tainted 3.3.3 #1
[   26.700783] Call Trace:
[   26.700787]  <IRQ>  [<ffffffff8107058b>] ? 0xffffffff8107058b
[   26.700800]  [<ffffffff81070685>] ? 0xffffffff81070685
[   26.700806]  [<ffffffff8108839e>] ? 0xffffffff8108839e
[   26.700812]  [<ffffffff81308c22>] ? 0xffffffff81308c22
[   26.700818]  [<ffffffff810cb79a>] ? 0xffffffff810cb79a
[   26.700833]  [<ffffffff810cb8e1>] ? 0xffffffff810cb8e1
[   26.700835]  [<ffffffff810ce7ff>] ? 0xffffffff810ce7ff
[   26.700837]  [<ffffffff81037625>] ? 0xffffffff81037625
[   26.700839]  [<ffffffff81037533>] ? 0xffffffff81037533
[   26.700841]  [<ffffffff81589dee>] ? 0xffffffff81589dee
[   26.700843]  [<ffffffff81096691>] ? 0xffffffff81096691
[   26.700845]  [<ffffffff81076260>] ? 0xffffffff81076260
[   26.700847]  [<ffffffff810aa0ef>] ? 0xffffffff810aa0ef
[   26.700849]  [<ffffffff8158b8dc>] ? 0xffffffff8158b8dc
[   26.700851]  [<ffffffff81037695>] ? 0xffffffff81037695
[   26.700853]  [<ffffffff8107663e>] ? 0xffffffff8107663e
[   26.700855]  [<ffffffff810501f8>] ? 0xffffffff810501f8
[   26.700857]  [<ffffffff8158b09e>] ? 0xffffffff8158b09e
[   26.700858]  <EOI>  [<ffffffff8126cd70>] ? 0xffffffff8126cd70
[   26.700862]  [<ffffffff8126cd4f>] ? 0xffffffff8126cd4f
[   26.700864]  [<ffffffff81437481>] ? 0xffffffff81437481
[   26.700866]  [<ffffffff81034125>] ? 0xffffffff81034125
[   26.700868]  [<ffffffff818748e0>] ? 0xffffffff818748e0
[   26.700870]  [<ffffffff81874000>] ? 0xffffffff81874000
[   26.700872]  [<ffffffff8187421a>] ? 0xffffffff8187421a
[   26.700875] ---[ end trace b7fe085284267851 ]---

I think this is the same issue. If you disagree, please feel free to delete this comment. I hope it helps point you toward a resolution.
Comment 8 Jesse Barnes 2012-06-20 20:08:58 UTC
I have a new theory that this message is bogus due to our two level interrupt scheme.  Our IIR can hold up to two events, so if we get two PM related interrupts in rapid succession (before masking or acking it), we'll go through the mask/ack code and on the next interrupt will read out the queued value, which may be the same as the one we just received.

So unless there are bad effects from this warning, I'd say we should just remove it, or somehow handle the queued events better.
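
The queueing theory above can be illustrated with a toy model. This is a minimal sketch, not the i915 code: the class names, the single PM_EVENT bit, and the rule that an ack promotes a queued event are all assumptions made for illustration. It only shows how two back-to-back events could trip a "Missed a PM interrupt" check even though nothing was lost.

```python
PM_EVENT = 1 << 0  # hypothetical bit standing in for one PM interrupt source


class TwoLevelIIR:
    """Toy register that holds one visible and one queued event per bit (assumed model)."""

    def __init__(self):
        self.visible = 0  # what a read of the register returns
        self.queued = 0   # second event waiting behind the visible one
        self.mask = 0     # masked bits do not latch new events

    def raise_event(self, bit):
        if self.mask & bit:
            return  # masked: the event is not latched in this model
        if self.visible & bit:
            self.queued |= bit  # first slot is busy, use the second one
        else:
            self.visible |= bit

    def ack(self, bits):
        # Clearing the visible bits promotes any queued copies of them.
        self.visible &= ~bits
        promoted = self.queued & bits
        self.queued &= ~promoted
        self.visible |= promoted


class Handler:
    """Toy handler: mask and ack in the interrupt, unmask later in deferred work."""

    def __init__(self, iir):
        self.iir = iir
        self.pending = 0  # events handed to deferred work but not yet processed
        self.warnings = []

    def irq(self):
        events = self.iir.visible
        if not events:
            return
        if self.pending & events:
            # Same bit seen again before the deferred work ran.
            self.warnings.append("Missed a PM interrupt")
        self.pending |= events
        self.iir.mask = self.pending
        self.iir.ack(events)


def simulate(events_before_handler):
    """Raise N events before the handler first runs, then take two interrupts."""
    iir = TwoLevelIIR()
    handler = Handler(iir)
    for _ in range(events_before_handler):
        iir.raise_event(PM_EVENT)
    handler.irq()  # handles the first event; the ack promotes the queued copy
    handler.irq()  # the promoted copy arrives while the first is still pending
    return handler.warnings


if __name__ == "__main__":
    print(simulate(1))
    print(simulate(2))
```

In this model a single event produces no warning, while two events raised before the first mask/ack produce one, although both events were in fact delivered. That matches the reading that the message can be bogus rather than a sign of a lost interrupt.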
Comment 9 Florian Mickler 2012-07-01 09:42:55 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc5:

commit 58bf8062d0b293b8e1028e5b0342082002886bd4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Jun 21 14:55:22 2012 +0200

    drm/i915: rip out the PM_IIR WARN