I'm using an up-to-date Xorg stack from the Ubuntu xorg-edgers PPA on Ubuntu 9.04, with kernel 2.6.30-rc8 from Ubuntu's mainline builds. After some time of inactivity, the screen turns dark and the only way to recover is SysRq+REISUB. dmesg shows this:

[ 31.511419] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[ 32.865067] padlock: VIA PadLock not detected.
[ 42.260047] wlan0: no IPv6 routers present
[ 91.717152] CE: hpet increasing min_delta_ns to 15000 nsec
[ 103.692112] CE: hpet increasing min_delta_ns to 22500 nsec
[ 213.670931] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[ 4440.721189] INFO: task events/0:9 blocked for more than 120 seconds.
[ 4440.721196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4440.721201] events/0 D f0ea7d3c 0 9 2
[ 4440.721210] f7077f20 00000046 f72fa000 f0ea7d3c 000003e9 00000000 f5943a40 00000000
[ 4440.721225] c07c5ee0 c07c5ee0 f7036ff0 f7037284 c201bee0 00000000 f0ea91f2 000003e9
[ 4440.721239] f7037284 f6b5fc18 f6b5fc14 ffffffff f7077f4c c053fcfe f7077f4c f7036ff0
[ 4440.721252] Call Trace:
[ 4440.721266] [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721273] [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721300] [<f85241d8>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 4440.721309] [<c014c6f5>] run_workqueue+0x95/0x130
[ 4440.721329] [<f85241b0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 4440.721336] [<c014c818>] worker_thread+0x88/0xf0
[ 4440.721345] [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4440.721352] [<c014c790>] ? worker_thread+0x0/0xf0
[ 4440.721359] [<c0150336>] kthread+0x46/0x80
[ 4440.721365] [<c01502f0>] ? kthread+0x0/0x80
[ 4440.721373] [<c0103f17>] kernel_thread_helper+0x7/0x10
[ 4440.721410] INFO: task compiz.real:4649 blocked for more than 120 seconds.
[ 4440.721414] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4440.721419] compiz.real D e3679024 0 4649 4590
[ 4440.721426] f52a1e34 00200082 f5a73200 e3679024 000003eb 00000001 f5943a40 00000000
[ 4440.721441] c07c5ee0 c07c5ee0 f66e4aa0 f66e4d34 c201bee0 00000000 00000000 00000000
[ 4440.721454] f66e4d34 f6b5fc18 f6b5fc14 ffffffff f52a1e60 c053fcfe c0452f71 f66e4aa0
[ 4440.721468] Call Trace:
[ 4440.721476] [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721483] [<c0452f71>] ? __sock_recvmsg+0x61/0x70
[ 4440.721490] [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721512] [<f8524aaa>] i915_gem_set_domain_ioctl+0x7a/0xe0 [i915]
[ 4440.721541] [<f843950a>] drm_ioctl+0x13a/0x3a0 [drm]
[ 4440.721562] [<f8524a30>] ? i915_gem_set_domain_ioctl+0x0/0xe0 [i915]
[ 4440.721571] [<c029f6cf>] ? security_file_permission+0xf/0x20
[ 4440.721580] [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4440.721592] [<c01da179>] vfs_ioctl+0x79/0x90
[ 4440.721599] [<c01da542>] do_vfs_ioctl+0x72/0x2d0
[ 4440.721605] [<c01da803>] sys_ioctl+0x63/0x70
[ 4440.721612] [<c01033ec>] syscall_call+0x7/0xb
[ 4560.720156] INFO: task events/0:9 blocked for more than 120 seconds.
[ 4560.720163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4560.720169] events/0 D f0ea7d3c 0 9 2
[ 4560.720177] f7077f20 00000046 f72fa000 f0ea7d3c 000003e9 00000000 f5943a40 00000000
[ 4560.720192] c07c5ee0 c07c5ee0 f7036ff0 f7037284 c201bee0 00000000 f0ea91f2 000003e9
[ 4560.720206] f7037284 f6b5fc18 f6b5fc14 ffffffff f7077f4c c053fcfe f7077f4c f7036ff0
[ 4560.720220] Call Trace:
[ 4560.720234] [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4560.720241] [<c053fa80>] mutex_lock+0x20/0x40
[ 4560.720268] [<f85241d8>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 4560.720277] [<c014c6f5>] run_workqueue+0x95/0x130
[ 4560.720297] [<f85241b0>] ? i915_gem_retire_work_handler+0x0/0x70 [i915]
[ 4560.720305] [<c014c818>] worker_thread+0x88/0xf0
[ 4560.720313] [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4560.720320] [<c014c790>] ? worker_thread+0x0/0xf0
[ 4560.720327] [<c0150336>] kthread+0x46/0x80
[ 4560.720334] [<c01502f0>] ? kthread+0x0/0x80
[ 4560.720342] [<c0103f17>] kernel_thread_helper+0x7/0x10
[ 4560.720379] INFO: task compiz.real:4649 blocked for more than 120 seconds.
[ 4560.720383] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4560.720387] compiz.real D e3679024 0 4649 4590
[ 4560.720395] f52a1e34 00200082 f5a73200 e3679024 000003eb 00000001 f5943a40 00000000
[ 4560.720410] c07c5ee0 c07c5ee0 f66e4aa0 f66e4d34 c201bee0 00000000 00000000 00000000
[ 4560.720423] f66e4d34 f6b5fc18 f6b5fc14 ffffffff f52a1e60 c053fcfe c0452f71 f66e4aa0
[ 4560.720436] Call Trace:
[ 4560.720445] [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4560.720452] [<c0452f71>] ? __sock_recvmsg+0x61/0x70
[ 4560.720460] [<c053fa80>] mutex_lock+0x20/0x40
[ 4560.720482] [<f8524aaa>] i915_gem_set_domain_ioctl+0x7a/0xe0 [i915]
[ 4560.720511] [<f843950a>] drm_ioctl+0x13a/0x3a0 [drm]
[ 4560.720532] [<f8524a30>] ? i915_gem_set_domain_ioctl+0x0/0xe0 [i915]
[ 4560.720542] [<c029f6cf>] ? security_file_permission+0xf/0x20
[ 4560.720551] [<c01506e0>] ? autoremove_wake_function+0x0/0x50
[ 4560.720562] [<c01da179>] vfs_ioctl+0x79/0x90
[ 4560.720569] [<c01da542>] do_vfs_ioctl+0x72/0x2d0
[ 4560.720575] [<c01da803>] sys_ioctl+0x63/0x70
[ 4560.720582] [<c01033ec>] syscall_call+0x7/0xb
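For triage, the hung-task reports above can be reduced to "which task is stuck in which driver function". The following is only a rough sketch: the helper names (`summarize_hung_tasks`, `first_module_frame`) and the regular expressions are illustrative assumptions, not part of any kernel tool, and the sample is an abbreviated copy of the dmesg lines above. Frames marked with `?` are skipped because the kernel flags them as unreliable.

```python
import re

# Abbreviated copy of the hung-task reports from the dmesg output above.
SAMPLE = """\
[ 4440.721189] INFO: task events/0:9 blocked for more than 120 seconds.
[ 4440.721252] Call Trace:
[ 4440.721266] [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721273] [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721300] [<f85241d8>] i915_gem_retire_work_handler+0x28/0x70 [i915]
[ 4440.721309] [<c014c6f5>] run_workqueue+0x95/0x130
[ 4440.721410] INFO: task compiz.real:4649 blocked for more than 120 seconds.
[ 4440.721468] Call Trace:
[ 4440.721476] [<c053fcfe>] __mutex_lock_slowpath+0xbe/0x120
[ 4440.721483] [<c0452f71>] ? __sock_recvmsg+0x61/0x70
[ 4440.721490] [<c053fa80>] mutex_lock+0x20/0x40
[ 4440.721512] [<f8524aaa>] i915_gem_set_domain_ioctl+0x7a/0xe0 [i915]
[ 4440.721541] [<f843950a>] drm_ioctl+0x13a/0x3a0 [drm]
"""

# "INFO: task <name>:<pid> blocked for more than <n> seconds."
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# "[<addr>] [? ]symbol+0xoff/0xsize [module]"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_hung_tasks(log):
    """Return one dict per hung-task report, with its reliable stack frames."""
    tasks = []
    for line in log.splitlines():
        task = TASK_RE.search(line)
        if task:
            tasks.append(
                {"name": task.group(1), "pid": int(task.group(2)), "frames": []}
            )
            continue
        frame = FRAME_RE.search(line)
        # Skip frames flagged with "?" (unreliable) and frames before any report.
        if frame and tasks and not frame.group(1):
            tasks[-1]["frames"].append((frame.group(2), frame.group(3)))
    return tasks


def first_module_frame(task, module):
    """Return the innermost function of `module` in the task's stack, or None."""
    for func, mod in task["frames"]:
        if mod == module:
            return func
    return None


if __name__ == "__main__":
    for t in summarize_hung_tasks(SAMPLE):
        print(t["name"], t["pid"], first_module_frame(t, "i915"))
```

Run against the sample, this pairs each blocked task with the i915 function that was calling mutex_lock, which is the pattern visible in both traces.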
I am seeing this as well on GM965.
Sorry, I forgot to mention the hardware. I'm seeing this on a GM965 (Dell XPS m1330) as well as a G45 (Asus N20A). The snippet above is from the GM965 laptop.
Indeed, the problem does seem to be a gpu wedge, Xorg backtrace:

(gdb) bt
#0 0x00007fcd99d6bec7 in ioctl () from /lib/libc.so.6
#1 0x00007fcd98b0879e in drmIoctl (fd=8, request=25688, arg=0x0) at xf86drm.c:187
#2 0x00007fcd98b0c518 in drmCommandNone (fd=8, drmCommandIndex=24) at xf86drm.c:2313
#3 0x00007fcd9868aca0 in I830BlockHandler (i=0, blockData=<value optimized out>, pTimeout=0x7fff18e176e0, pReadmask=0x87d8e0) at i830_driver.c:2295
#4 0x00000000004e7999 in AnimCurScreenBlockHandler (screenNum=0, blockData=0x0, pTimeout=0x7fff18e176e0, pReadmask=0x87d8e0) at animcur.c:222
#5 0x00000000005d1ca2 in compBlockHandler (i=0, blockData=0x0, pTimeout=0x7fff18e176e0, pReadmask=0x87d8e0) at compinit.c:166
#6 0x000000000042c420 in BlockHandler (pTimeout=0x7fff18e176e0, pReadmask=0x87d8e0) at dixutils.c:379
#7 0x000000000047db6f in WaitForSomething (pClientsReady=0x4be0a50) at WaitFor.c:215
#8 0x0000000000446924 in Dispatch () at dispatch.c:362
#9 0x000000000042689e in main (argc=10, argv=0x7fff18e178d8, envp=0x7fff18e17930) at main.c:283

Xorg components as of Sun Jun 14 04:59:52 EDT 2009
drm: 6e88027eb5ae669cbe9710bca5309b3a06b0adc5
xf86-video-intel: 374368109c1db603b6fb514212d0e9661b93f913
mesa: d9617deb008b75f4a605a30408aeb1948139c33e
xserver: 92bc088aab7a904d64641d6e5d2a76058e9fa6fc

Linux ben-laptop 2.6.30-ben #21 SMP Wed Jun 10 13:27:14 EDT 2009 x86_64 GNU/Linux
Created attachment 21909: GPU dump
I am fairly certain this is not a kernel issue. It looks like the problem is in the userland driver (DDX). When the chip goes down, the batch buffer head is located at:

0x0c1db014: HEAD 0x54f08806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 1, dst tile 1)
0x0c1db018: HEAD 0x03cc0780: format 8888, dst pitch 1920, clipping disabled
0x0c1db01c: HEAD 0x04980766: dst (1894,1176)
0x0c1db020: HEAD 0x04b00780: dst (1920,1200)
0x0c1db024: HEAD 0x04331000: dst offset 0x04331000
0x0c1db028: HEAD 0x04980766: src (1894,1176)
0x0c1db02c: HEAD 0x00000780: src pitch 1920
0x0c1db030: HEAD 0x06696000: src offset 0x06696000

The dst and src offsets seem to be invalid given that tiling is enabled. The documentation says:

Destination Base Address: (base address of the destination surface: X=0, Y=0) When Dest Tiling is enabled (Bit 11 enabled), this address is limited to 4Kbytes.
Source Base Address: (base address of the source surface: X=0, Y=0) When Src Tiling is enabled (Bit 15 enabled), this address is limited to 4Kbytes.
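One way to sanity-check the blit quoted above is to test the tiling bits in the command dword and the alignment of the two offsets. This is a minimal sketch under stated assumptions: the bit positions (11 for dst tiling, 15 for src tiling) come from the quoted text, "limited to 4Kbytes" is read here as "must be 4 KiB aligned" (an interpretation, not something the quote states outright), and `check_blit` is a hypothetical helper name.

```python
# Bit positions taken from the documentation text quoted above.
DST_TILED_BIT = 1 << 11
SRC_TILED_BIT = 1 << 15
# Assumption: "limited to 4Kbytes" means the base address must be 4 KiB aligned.
PAGE_SIZE = 4096


def check_blit(cmd, dst_offset, src_offset):
    """Report tiling flags and 4 KiB alignment for one XY_SRC_COPY_BLT."""
    return {
        "dst_tiled": bool(cmd & DST_TILED_BIT),
        "src_tiled": bool(cmd & SRC_TILED_BIT),
        "dst_aligned": dst_offset % PAGE_SIZE == 0,
        "src_aligned": src_offset % PAGE_SIZE == 0,
    }


if __name__ == "__main__":
    # Values copied from the batch buffer dump above.
    result = check_blit(0x54F08806, 0x04331000, 0x06696000)
    for key, value in result.items():
        print(key, value)
```

With the values from this dump, both tiling bits are set and both offsets come out 4 KiB aligned under this reading, so the restriction may need to be read differently, or another field of the blit may be at fault.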
It seems that Ave has attempted to bisect this, and xf86-video-intel commit ec2fde7c8250fdc30984f16c8a1d3587d70b0144 appears to be the first bad commit.
Given that this is probably not a kernel bug, I've opened a report on bugs.freedesktop.org: Bug #22283 (https://bugs.freedesktop.org/show_bug.cgi?id=22283). I think future discussion should move to that bug. Someone probably ought to close this one as well.
Thanks, Ben. You are right, we had better track it at freedesktop.org (and it is reported as fixed there).