Bug 37752

Summary: Kernel Panic in drm_vblank_put+0x13/0x50 on P4 HT machine with 82915G/GV/910GL Integrated Graphics Controller
Product: Drivers
Reporter: Martin Rogge (marogge)
Component: Video (DRI - Intel)
Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: RESOLVED CODE_FIX
Severity: high
CC: chris, daniel, florian, maciej.rutecki, marogge, rbyshko, rjw, samuel-kbugs
Priority: P1
Hardware: i386
OS: Linux
URL: https://bugs.freedesktop.org/show_bug.cgi?id=34211
Kernel Version: 2.6.39.3, 3.0
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 32012
Attachments:
  kernel config
  output of lspci -vv
  Screenshots of two different panics

Description Martin Rogge 2011-06-17 13:02:11 UTC
Created attachment 62512 [details]
kernel config

The kernel panic occurs non-deterministically. I have never seen it in runlevel 3 (console). For me, the most reliable way to trigger it within minutes or hours is to run the X screensaver Atlantis.

The stack trace can look very different each time. Most often it looks like the one in Foto0105.jpg, with a lot of hardware IRQ routines on the stack. Once I caught the infamous "Hangcheck timer elapsed... render ring idle" (Foto0102.jpg).

Btw, the same kernel version does not show this issue on my other machines with i915 drm: one with 945GM and one with Clarkdale graphics.

I'll attach the kernel config and output of lspci.

The screenshots can be downloaded here:
http://www.wupload.com/file/21411414/panic_screenshots.tar
Comment 1 Martin Rogge 2011-06-17 13:03:29 UTC
Created attachment 62522 [details]
output of lspci -vv
Comment 2 Martin Rogge 2011-06-17 13:04:50 UTC
Created attachment 62532 [details]
Screenshots of two different panics
Comment 3 Rafael J. Wysocki 2011-06-28 21:43:03 UTC
On Tuesday, June 28, 2011, Martin wrote:
> On Monday 27 June 2011 00:35:16 Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.38 and 2.6.39.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.38 and 2.6.39.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=37752
> > Subject     : Kernel Panic in drm_vblank_put+0x13/0x50 on P4 HT machine with 82915G/GV/910GL Integrated Graphics Controller
> > Submitter   : Martin Rogge <marogge@onlinehome.de>
> > Date        : 2011-06-17 13:02 (10 days old)
> 
> As far as I can see, the bug is still present in the 2.6.39 line. However, I
> have good news: on a whim I've been trying 3.0-rc4, and the panic has not
> occurred in a few days.
> 
> I can try and bisect which commit fixed the issue. I guess I have to reverse 
> the semantics of good and bad for this one. Don't tell the clergy.
> 
> Anyway, since it takes a while to establish the absence of a panic, don't 
> expect any results soon.
> 
> Martin
Comment 4 Martin Rogge 2011-06-29 08:38:00 UTC
I have good news and bad news. The bad news is that kernel 3.0-rc4 did freeze after 5 days of uptime. The good news is that it didn't panic but threw a kernel BUG with the same EIP as the panics I was getting before. This is what I caught in the syslog:

Jun 29 10:08:38 darkstar kernel: ------------[ cut here ]------------
Jun 29 10:08:38 darkstar kernel: kernel BUG at drivers/gpu/drm/drm_irq.c:924!
Jun 29 10:08:38 darkstar kernel: invalid opcode: 0000 [#1] PREEMPT SMP 
Jun 29 10:08:38 darkstar kernel: 
Jun 29 10:08:38 darkstar kernel: Pid: 11234, comm: git Not tainted 3.0.0-rc4 #1 IBM 8143WZG/IBM
Jun 29 10:08:38 darkstar kernel: EIP: 0060:[<c1192a22>] EFLAGS: 00010046 CPU: 0
Jun 29 10:08:38 darkstar kernel: EIP is at drm_vblank_put+0x13/0x50
Jun 29 10:08:38 darkstar kernel: EAX: 00000000 EBX: f726f800 ECX: f7104c00 EDX: f7246dc0
Jun 29 10:08:38 darkstar kernel: ESI: 00000000 EDI: 00ac1e80 EBP: 00000000 ESP: f7009f08
Jun 29 10:08:38 darkstar kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jun 29 10:08:38 darkstar kernel: Process git (pid: 11234, ti=f7008000 task=f696b4e0 task.ti=dfc28000)
Jun 29 10:08:38 darkstar kernel: Stack:
Jun 29 10:08:38 darkstar kernel:  f726f800 00000000 c11b10ba 00000001 00000002 c0d40180 cecae6d8 122bb12b
Jun 29 10:08:38 darkstar kernel:  f7104c00 f7104db0 00000000 00000082 f71f6000 4e0add86 0005fcb7 4e0add86
Jun 29 10:08:38 darkstar kernel:  0006011f f71f6000 f7104c00 00000000 00000800 c11a267b 00000001 00000000
Jun 29 10:08:38 darkstar kernel: Call Trace:
Jun 29 10:08:38 darkstar kernel:  [<c11b10ba>] ? do_intel_finish_page_flip+0x187/0x1e5
Jun 29 10:08:38 darkstar kernel:  [<c11a267b>] ? i915_driver_irq_handler+0x2dc/0x530
Jun 29 10:08:38 darkstar kernel:  [<c11f3300>] ? ata_bmdma_port_intr+0x75/0xcb
Jun 29 10:08:38 darkstar kernel:  [<c11ffc1a>] ? tg3_interrupt_tagged+0x2d/0xa5
Jun 29 10:08:38 darkstar kernel:  [<c104869e>] ? handle_irq_event_percpu+0x1d/0xf9
Jun 29 10:08:38 darkstar kernel:  [<c10487a3>] ? handle_irq_event+0x29/0x42
Jun 29 10:08:38 darkstar kernel:  [<c1049e5d>] ? handle_level_irq+0x91/0x91
Jun 29 10:08:38 darkstar kernel:  [<c1049ec0>] ? handle_fasteoi_irq+0x63/0x7f
Jun 29 10:08:38 darkstar kernel:  <IRQ> 
Jun 29 10:08:38 darkstar kernel:  [<c100372e>] ? do_IRQ+0x2e/0x84
Jun 29 10:08:38 darkstar kernel:  [<c1311329>] ? common_interrupt+0x29/0x30
Jun 29 10:08:38 darkstar kernel: Code: ff 8b 54 24 14 8b 44 24 0c e8 58 d9 17 00 89 f8 83 c4 28 5b 5e 5f 5d c3 56 53 89 c1 c1 e2 02 03 90 74 01 00 00 8b 02 85 c0 75 02 <0f> 0b f0 ff 0a 0f 94 c0 84 c0 74 2e a1 e8 91 42 c1 85 c0 74 25 
Jun 29 10:08:38 darkstar kernel: EIP: [<c1192a22>] drm_vblank_put+0x13/0x50 SS:ESP 0068:f7009f08
Jun 29 10:08:38 darkstar kernel: BUG: scheduling while atomic: git/11234/0x00010002
Jun 29 10:08:38 darkstar kernel: 
Jun 29 10:08:38 darkstar kernel: Pid: 11234, comm: git Not tainted 3.0.0-rc4 #1 IBM 8143WZG/IBM
Jun 29 10:08:38 darkstar kernel: EIP: 0073:[<080b137a>] EFLAGS: 00000206 CPU: 0
Jun 29 10:08:38 darkstar kernel: EIP is at 0x80b137a
Jun 29 10:08:38 darkstar kernel: EAX: 087be190 EBX: 087388f8 ECX: 08c1ea68 EDX: 086a0108
Jun 29 10:08:38 darkstar kernel: ESI: 08981870 EDI: 000000f1 EBP: bfe98e28 ESP: bfe98de0
Jun 29 10:08:38 darkstar kernel:  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Jun 29 10:08:38 darkstar kernel: Process git (pid: 11234, ti=f7008000 task=f696b4e0 task.ti=dfc28000)
Jun 29 10:08:38 darkstar kernel: 
Jun 29 10:08:38 darkstar kernel: Call Trace:
Jun 29 10:09:38 darkstar kernel: INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1} (detected by 0, t=18002 jiffies)
Jun 29 10:12:38 darkstar kernel: INFO: rcu_preempt_state detected stalls on CPUs/tasks: { 1} (detected by 0, t=72034 jiffies)
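
For readers following the trace: the bytes around the fault in the Code: line (`8b 02 85 c0 75 02 <0f> 0b f0 ff 0a`) decode to a load, a test for zero, a `ud2` trap, and a locked decrement, and EAX is 00000000 in the register dump. That is consistent with a reference count being dropped when it was already zero. The following is a minimal userspace sketch of that pattern only. The names `struct model_refcount`, `model_vblank_get` and `model_vblank_put` are hypothetical and chosen for illustration; this is not the kernel's code, and it reports the unbalanced put through a return value instead of trapping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CRTC vblank reference counter. */
struct model_refcount {
    atomic_int count;
};

/* Take one reference. */
static void model_vblank_get(struct model_refcount *r)
{
    assert(r != NULL);
    atomic_fetch_add(&r->count, 1);
}

/*
 * Drop one reference. Returns false on an unbalanced put (the count is
 * already zero), which is the situation where the trace above shows the
 * kernel hitting its BUG trap. As in the decoded instructions, the zero
 * check and the decrement are two separate steps.
 */
static bool model_vblank_put(struct model_refcount *r)
{
    assert(r != NULL);
    if (atomic_load(&r->count) == 0)
        return false;               /* kernel: ud2, i.e. "kernel BUG at ..." */
    atomic_fetch_sub(&r->count, 1);
    return true;
}
```

In this model a put without a matching earlier get returns false. In the trace, the caller of `drm_vblank_put` is `do_intel_finish_page_flip`, running in interrupt context.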
Comment 5 Chris Wilson 2011-06-29 09:47:30 UTC
Looks similar to the races found in https://bugs.freedesktop.org/show_bug.cgi?id=34211
Comment 6 Martin Rogge 2011-06-29 11:45:02 UTC
It is very likely a race condition because it seems to trigger as soon as I put the system under load (while the 3D screensaver is running). 

NB: I've just had another kernel BUG ("EIP is at drm_vblank_put+0x13/0x50"), followed by a panic ("fatal exception in interrupt") when I went through the Alt-SysRq sequence.
Comment 7 Rafael J. Wysocki 2011-07-11 19:42:13 UTC
On Monday, July 11, 2011, Martin wrote:
> On Sunday 10 July 2011 12:58:54 Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.38 and 2.6.39.
> > 
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.38 and 2.6.39.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=37752
> > Subject     : Kernel Panic in drm_vblank_put+0x13/0x50 on P4 HT machine with 82915G/GV/910GL Integrated Graphics Controller
> > Submitter   : Martin Rogge <marogge@onlinehome.de>
> > Date        : 2011-06-17 13:02 (24 days old)
> 
> I have verified today that both 2.6.39.3 and 3.0-rc6 still show the problem. 
> As before, 3.0-rc6 seems to have a longer uptime than 2.6.39.3. Both times I 
> did not catch a proper kernel panic. The machine simply froze.
Comment 8 Martin Rogge 2011-07-26 18:35:34 UTC
Just for info, I tested kernel v3.0 today. After a few hours of running the Atlantis screensaver while simultaneously compiling a kernel, the BUG was triggered again.
Comment 9 Daniel Vetter 2012-03-25 12:26:17 UTC
Can you please retest with at least 3.2? That kernel contains the fix for

https://bugs.freedesktop.org/show_bug.cgi?id=34211

I presume this is it; if I'm wrong, please reopen this bug (and hit me with the cluestick ;-). Relevant commit:

commit 7317c75e66fce0c9f82fbe6f72f7e5256b315422
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Aug 29 09:45:28 2011 -0700

    drm/i915: don't set unpin_work if vblank_get fails
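
Reading the commit title literally, the idea can be sketched as follows: take the vblank reference first, and only publish the pending-flip work item if that succeeded, so the completion path never drops a reference that was never taken. This is a simplified, single-threaded userspace model that ignores locking. `struct model_crtc`, `model_get`, `model_queue_flip` and `model_finish_flip` are hypothetical names for illustration; only `unpin_work` and the notion of a failing vblank get come from the commit title, and the actual patch may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a CRTC's page-flip state. */
struct model_crtc {
    int vblank_refs;        /* references currently held on the vblank interrupt */
    bool vblank_available;  /* whether taking a reference can succeed */
    void *unpin_work;       /* non-NULL while a flip is pending */
};

/* Try to take a vblank reference. Returns 0 on success, -1 on failure. */
static int model_get(struct model_crtc *c)
{
    if (!c->vblank_available)
        return -1;
    c->vblank_refs++;
    return 0;
}

/* Queue a flip: take the reference first, publish the work only on success. */
static int model_queue_flip(struct model_crtc *c, void *work)
{
    assert(work != NULL);
    if (model_get(c) != 0)
        return -1;          /* nothing published, so nothing to put later */
    c->unpin_work = work;
    return 0;
}

/* Completion path: drops the reference only if a flip was published. */
static bool model_finish_flip(struct model_crtc *c)
{
    if (c->unpin_work == NULL)
        return false;       /* no pending flip, so no put */
    c->unpin_work = NULL;
    c->vblank_refs--;
    return true;
}
```

With this ordering, a failed get leaves `unpin_work` unset, so a later completion interrupt finds no pending flip and the reference count stays balanced.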
Comment 10 Daniel Vetter 2012-03-25 13:01:25 UTC
*** Bug 35092 has been marked as a duplicate of this bug. ***