Bug 13490

Summary: after some time of inactivity display turns black - requires hard restart
Product: Drivers
Reporter: Khashayar Naderehvandi (khashayar.lists)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: bgamari, gordon.jin, jbarnes
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30-rc8
Subsystem:
Regression: No
Bisected commit-id:
Attachments: GPU dump

Description Khashayar Naderehvandi 2009-06-08 20:08:32 UTC
I'm using an up-to-date Xorg stack from the Ubuntu xorg-edgers PPA on Ubuntu 9.04, and kernel 2.6.30-rc8 from Ubuntu's mainline builds.

After some time of inactivity, the screen turns dark and the only way to recover is a SysRq+REISUB reboot.

dmesg shows me this:

[   31.511419] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   32.865067] padlock: VIA PadLock not detected.
[   42.260047] wlan0: no IPv6 routers present
[   91.717152] CE: hpet increasing min_delta_ns to 15000 nsec
[  103.692112] CE: hpet increasing min_delta_ns to 22500 nsec
[  213.670931] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[ 4440.721189] INFO: task events/0:9 blocked for more than 120 seconds.
[ 4440.721196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4440.721201] events/0      D f0ea7d3c     0     9      2
[ 4440.721210]  f7077f20 00000046 f72fa000 f0ea7d3c 000003e9 00000000 f5943a40 00000000
[ 4440.721225]  c07c5ee0 c07c5ee0 f7036ff0 f7037284 c201bee0 00000000 f0ea91f2 000003e9
[ 4440.721239]  f7037284 f6b5fc18 f6b5fc14 ffffffff f7077f4c c053fcfe f7077f4c f7036ff0
[ 4440.721252] Call Trace:
[ 4440.721266]  [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721273]  [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721300]  [<f85241d8>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 4440.721309]  [<c014c6f5>] run_workqueue+0x95/0x130
[ 4440.721329]  [<f85241b0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 4440.721336]  [<c014c818>] worker_thread+0x88/0xf0
[ 4440.721345]  [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4440.721352]  [<c014c790>] ? worker_thread+0x0/0xf0
[ 4440.721359]  [<c0150336>] kthread+0x46/0x80
[ 4440.721365]  [<c01502f0>] ? kthread+0x0/0x80
[ 4440.721373]  [<c0103f17>] kernel_thread_helper+0x7/0x10
[ 4440.721410] INFO: task compiz.real:4649 blocked for more than 120 seconds.
[ 4440.721414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4440.721419] compiz.real   D e3679024     0  4649   4590
[ 4440.721426]  f52a1e34 00200082 f5a73200 e3679024 000003eb 00000001 f5943a40 00000000
[ 4440.721441]  c07c5ee0 c07c5ee0 f66e4aa0 f66e4d34 c201bee0 00000000 00000000 00000000
[ 4440.721454]  f66e4d34 f6b5fc18 f6b5fc14 ffffffff f52a1e60 c053fcfe c0452f71 f66e4aa0
[ 4440.721468] Call Trace:
[ 4440.721476]  [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721483]  [<c0452f71>] ? __sock_recvmsg+0x61/0x70
[ 4440.721490]  [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721512]  [<f8524aaa>] i915_gem_set_domain_ioctl+0x7a/0xe0 [i915]
[ 4440.721541]  [<f843950a>] drm_ioctl+0x13a/0x3a0 [drm]
[ 4440.721562]  [<f8524a30>] ? i915_gem_set_domain_ioctl+0x0/0xe0 [i915]
[ 4440.721571]  [<c029f6cf>] ? security_file_permission+0xf/0x20
[ 4440.721580]  [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4440.721592]  [<c01da179>] vfs_ioctl+0x79/0x90
[ 4440.721599]  [<c01da542>] do_vfs_ioctl+0x72/0x2d0
[ 4440.721605]  [<c01da803>] sys_ioctl+0x63/0x70
[ 4440.721612]  [<c01033ec>] syscall_call+0x7/0xb
[ 4560.720156] INFO: task events/0:9 blocked for more than 120 seconds.
[ 4560.720163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4560.720169] events/0      D f0ea7d3c     0     9      2
[ 4560.720177]  f7077f20 00000046 f72fa000 f0ea7d3c 000003e9 00000000 f5943a40 00000000
[ 4560.720192]  c07c5ee0 c07c5ee0 f7036ff0 f7037284 c201bee0 00000000 f0ea91f2 000003e9
[ 4560.720206]  f7037284 f6b5fc18 f6b5fc14 ffffffff f7077f4c c053fcfe f7077f4c f7036ff0
[ 4560.720220] Call Trace:
[ 4560.720234]  [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4560.720241]  [<c053fa80>] mutex_lock+0x20/0x40
[ 4560.720268]  [<f85241d8>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 4560.720277]  [<c014c6f5>] run_workqueue+0x95/0x130
[ 4560.720297]  [<f85241b0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 4560.720305]  [<c014c818>] worker_thread+0x88/0xf0
[ 4560.720313]  [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4560.720320]  [<c014c790>] ? worker_thread+0x0/0xf0
[ 4560.720327]  [<c0150336>] kthread+0x46/0x80
[ 4560.720334]  [<c01502f0>] ? kthread+0x0/0x80
[ 4560.720342]  [<c0103f17>] kernel_thread_helper+0x7/0x10
[ 4560.720379] INFO: task compiz.real:4649 blocked for more than 120 seconds.
[ 4560.720383] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4560.720387] compiz.real   D e3679024     0  4649   4590
[ 4560.720395]  f52a1e34 00200082 f5a73200 e3679024 000003eb 00000001 f5943a40 00000000
[ 4560.720410]  c07c5ee0 c07c5ee0 f66e4aa0 f66e4d34 c201bee0 00000000 00000000 00000000
[ 4560.720423]  f66e4d34 f6b5fc18 f6b5fc14 ffffffff f52a1e60 c053fcfe c0452f71 f66e4aa0
[ 4560.720436] Call Trace:
[ 4560.720445]  [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4560.720452]  [<c0452f71>] ? __sock_recvmsg+0x61/0x70
[ 4560.720460]  [<c053fa80>] mutex_lock+0x20/0x40
[ 4560.720482]  [<f8524aaa>] i915_gem_set_domain_ioctl+0x7a/0xe0 [i915]
[ 4560.720511]  [<f843950a>] drm_ioctl+0x13a/0x3a0 [drm]
[ 4560.720532]  [<f8524a30>] ? i915_gem_set_domain_ioctl+0x0/0xe0 [i915]
[ 4560.720542]  [<c029f6cf>] ? security_file_permission+0xf/0x20
[ 4560.720551]  [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4560.720562]  [<c01da179>] vfs_ioctl+0x79/0x90
[ 4560.720569]  [<c01da542>] do_vfs_ioctl+0x72/0x2d0
[ 4560.720575]  [<c01da803>] sys_ioctl+0x63/0x70
[ 4560.720582]  [<c01033ec>] syscall_call+0x7/0xb
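
The hung-task reports above can be summarised mechanically. What follows is a minimal sketch, not part of the original report: the function name `hung_tasks`, the regular expressions, and the shortened `SAMPLE` excerpt are illustrative assumptions. It lists each blocked task together with its reliable stack frames (the ones not marked `?`), which makes it easier to see that both tasks are waiting in `mutex_lock` from inside i915 code.

```python
import re

# "INFO: task NAME:PID blocked for more than N seconds."
TASK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] INFO: task (.+):(\d+) blocked for more than (\d+) seconds"
)
# "[<address>] symbol+0xoff/0xsize"; a leading "?" marks an unreliable frame.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\] (\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Shortened excerpt of the dmesg output quoted in this report.
SAMPLE = """\
[ 4440.721189] INFO: task events/0:9 blocked for more than 120 seconds.
[ 4440.721252] Call Trace:
[ 4440.721266]  [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721273]  [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721300]  [<f85241d8>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 4440.721329]  [<f85241b0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 4440.721410] INFO: task compiz.real:4649 blocked for more than 120 seconds.
[ 4440.721468] Call Trace:
[ 4440.721476]  [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721490]  [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721512]  [<f8524aaa>] i915_gem_set_domain_ioctl+0x7a/0xe0 [i915]
"""


def hung_tasks(dmesg_text):
    """Return one dict per hung-task report, with its reliable stack frames."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = {
                "time": float(task.group(1)),
                "task": task.group(2),
                "pid": int(task.group(3)),
                "frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        # Skip frames flagged with "?" and anything seen before the first report.
        if frame and current is not None and not frame.group(1):
            current["frames"].append(frame.group(2))
    return reports


if __name__ == "__main__":
    for report in hung_tasks(SAMPLE):
        print(report["task"], report["pid"], " <- ".join(report["frames"]))
```

Running it against the full `dmesg` output instead of `SAMPLE` should work the same way, assuming the log keeps the timestamped format shown here.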
Comment 1 Ben Gamari 2009-06-08 21:14:31 UTC
I am seeing this as well on GM965.
Comment 2 Khashayar Naderehvandi 2009-06-08 21:35:43 UTC
Sorry, I forgot to mention the hardware.

I'm seeing this on a GM965 (Dell XPS m1330) as well as a G45 (Asus N20A).
The snippet above is from the GM965 laptop.
Comment 3 Ben Gamari 2009-06-14 09:05:58 UTC
Indeed, the problem does seem to be a GPU wedge.

Xorg backtrace:
(gdb) bt
#0  0x00007fcd99d6bec7 in ioctl () from /lib/libc.so.6
#1  0x00007fcd98b0879e in drmIoctl (fd=8, request=25688, arg=0x0)
    at xf86drm.c:187
#2  0x00007fcd98b0c518 in drmCommandNone (fd=8, drmCommandIndex=24)
    at xf86drm.c:2313
#3  0x00007fcd9868aca0 in I830BlockHandler (i=0, 
    blockData=<value optimized out>, pTimeout=0x7fff18e176e0, 
    pReadmask=0x87d8e0) at i830_driver.c:2295
#4  0x00000000004e7999 in AnimCurScreenBlockHandler (screenNum=0, 
    blockData=0x0, pTimeout=0x7fff18e176e0, pReadmask=0x87d8e0)
    at animcur.c:222
#5  0x00000000005d1ca2 in compBlockHandler (i=0, blockData=0x0, 
    pTimeout=0x7fff18e176e0, pReadmask=0x87d8e0) at compinit.c:166
#6  0x000000000042c420 in BlockHandler (pTimeout=0x7fff18e176e0, 
    pReadmask=0x87d8e0) at dixutils.c:379
#7  0x000000000047db6f in WaitForSomething (pClientsReady=0x4be0a50)
    at WaitFor.c:215
#8  0x0000000000446924 in Dispatch () at dispatch.c:362
#9  0x000000000042689e in main (argc=10, argv=0x7fff18e178d8, 
    envp=0x7fff18e17930) at main.c:283



Xorg components as of Sun Jun 14 04:59:52 EDT 2009
drm: 	6e88027eb5ae669cbe9710bca5309b3a06b0adc5
xf86-video-intel: 	374368109c1db603b6fb514212d0e9661b93f913
mesa: 	d9617deb008b75f4a605a30408aeb1948139c33e
xserver: 	92bc088aab7a904d64641d6e5d2a76058e9fa6fc

Linux ben-laptop 2.6.30-ben #21 SMP Wed Jun 10 13:27:14 EDT 2009 x86_64 GNU/Linux
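
As a cross-check on the backtrace, the `request=25688` value in frame #1 can be decoded by hand. This is a minimal sketch assuming the usual Linux ioctl number layout on x86 (8-bit number, 8-bit type, 14-bit size, 2-bit direction) and the DRM convention that driver-private commands start at 0x40; the helper name `decode_ioctl` is made up for illustration.

```python
DRM_IOCTL_BASE = "d"       # ioctl "type" character used by DRM
DRM_COMMAND_BASE = 0x40    # first driver-private DRM command number


def decode_ioctl(request):
    """Split a Linux ioctl request number into its fields (x86 layout assumed)."""
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,
    }


fields = decode_ioctl(25688)                   # request value from frame #1
driver_index = fields["nr"] - DRM_COMMAND_BASE
print(fields, driver_index)
# driver_index is 24, matching drmCommandIndex=24 in frame #2
```

The decoded request is a DRM ioctl with no argument (size 0, direction 0), consistent with `drmCommandNone` and `arg=0x0`. I believe driver command 24 (0x18) is `DRM_I915_GEM_THROTTLE` in `i915_drm.h`, but that mapping is my reading and is not shown in the backtrace itself.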
Comment 4 Ben Gamari 2009-06-14 09:07:23 UTC
Created attachment 21909 [details]
GPU dump
Comment 5 Ben Gamari 2009-06-14 16:36:51 UTC
I am fairly certain this is not a kernel issue. It looks like the problem is in the userland driver (DDX). When the chip goes down, the batch buffer head is located at:

0x0c1db014: HEAD 0x54f08806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 1, dst tile 1)
0x0c1db018: HEAD 0x03cc0780:    format 8888, dst pitch 1920, clipping disabled
0x0c1db01c: HEAD 0x04980766:    dst (1894,1176)
0x0c1db020: HEAD 0x04b00780:    dst (1920,1200)
0x0c1db024: HEAD 0x04331000:    dst offset 0x04331000
0x0c1db028: HEAD 0x04980766:    src (1894,1176)
0x0c1db02c: HEAD 0x00000780:    src pitch 1920
0x0c1db030: HEAD 0x06696000:    src offset 0x06696000

The dst and src offsets seem to be invalid given that tiling is enabled:

Destination Base Address: (base address of the destination surface: X=0, Y=0)
When Dest Tiling is enabled (Bit 11 enabled), this address is limited to 4Kbytes.

Source Base Address: (base address of the source surface: X=0, Y=0)
When Src Tiling is enabled (Bit 15 enabled), this address is limited to 4Kbytes.
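
For anyone wanting to re-check the decoded fields, here is a minimal sketch of how the raw dwords in the dump map to the values shown next to them. It assumes the coordinate dwords pack x in the low 16 bits and y in the high 16 bits (which matches the decoder output above), and it treats the "limited to 4Kbytes" wording as a 4 KiB alignment requirement, which is only one possible reading. The helper names are made up for illustration.

```python
def unpack_xy(dword):
    """Split a packed coordinate dword into (x, y).

    Assumes x is in the low 16 bits and y in the high 16 bits; values are
    treated as unsigned, which is enough for the coordinates in this dump.
    """
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


def low_12_bits(offset):
    """Return the part of an offset below a 4 KiB boundary (0 means aligned)."""
    return offset & 0xFFF


# Raw dwords copied from the batch buffer dump above.
print(unpack_xy(0x04980766))          # dword at 0x0c1db01c
print(unpack_xy(0x04B00780))          # dword at 0x0c1db020
print(hex(low_12_bits(0x04331000)))   # dst offset
print(hex(low_12_bits(0x06696000)))   # src offset
```

The two coordinate dwords decode to (1894, 1176) and (1920, 1200), as in the dump. Under the alignment reading, both offsets have their low 12 bits clear, so if they are invalid for tiled surfaces it would have to be because of a different constraint than plain 4 KiB alignment.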
Comment 6 Ben Gamari 2009-06-14 16:48:07 UTC
It seems that Ave has attempted to bisect this, and the result points to xf86-video-intel commit ec2fde7c8250fdc30984f16c8a1d3587d70b0144 as the first bad commit.
Comment 7 Ben Gamari 2009-06-14 17:01:07 UTC
Given that this is probably not a kernel bug, I've opened a report on bugs.freedesktop.org: Bug #22283 (https://bugs.freedesktop.org/show_bug.cgi?id=22283). I think future discussion should move to that bug. Someone probably ought to close this one as well.
Comment 8 Gordon Jin 2009-09-16 06:50:44 UTC
Thanks Ben. You are right, we'd better track it at freedesktop.org (and it's said to be fixed there).