Created attachment 33792: Syslog

Due to various i915 display-related issues on a couple of machines (GM45, i3 Clarkdale), I am frequently updating and testing parts of the i915 stack. With some of the latest user-space components (xorg 1.8.2, libdrm-2.4.22, mesa-7.9, xf86-video-intel-2.13.0, pixman-0.19.4, cairo-1.10.0), I now hit a sporadic display freeze that leaves a trace in the syslog. Unfortunately, I cannot really bisect because the freeze is so sporadic.

I logged two different symptoms in the attached system log.

The first was less severe and still allowed me to switch to a virtual terminal. I could kill and restart X, but the display remained black and I had to reboot. The log has many lines of the form:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 7527114 at 7527108)

The kernel was a BFS-modified 2.6.35.4.

The second incident (for which this ticket is created) happened under a vanilla 2.6.36-rc8 kernel. The symptom was more severe: I could not switch to virtual terminals, but I could ssh in. Interestingly, the system did not react to kill -9 of the X server, i.e. the process list remained unchanged (X and all its children in state S). I had to reboot. The main characteristic in the log is the kernel BUG and stack trace.
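The second error line above can be read as a sequence-number (seqno) comparison. -5 is -EIO, and, as far as I can tell from the i915 driver of that era, the two numbers are the seqno being waited for and the last seqno the GPU reported as completed. Once hangcheck has declared the GPU hung, that gap never closes, which is presumably why the wait gave up with -EIO. The sketch below models the wrap-safe comparison on the driver's i915_seqno_passed() helper; seqno_outstanding() is a hypothetical helper added only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "has seq1 reached seq2?" test on 32-bit sequence numbers,
 * modelled on the i915 helper i915_seqno_passed(). The signed cast of the
 * difference keeps the answer correct when the counter wraps past 2^32. */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Hypothetical helper (not in the driver): how many seqnos the GPU is
 * still behind the awaited one, or 0 if it has already passed it. */
static uint32_t seqno_outstanding(uint32_t completed, uint32_t awaited)
{
	return seqno_passed(completed, awaited) ? 0 : awaited - completed;
}
```

With the values from the log, seqno_passed(7527108, 7527114) is false and seqno_outstanding(7527108, 7527114) is 6, i.e. the hung GPU was six sequence numbers short of the one being waited for.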
Created attachment 33802: Kernel config
Created attachment 33812: Dmesg
Created attachment 33822: Lspci -vv
I have upgraded to vanilla 2.6.36. After 24 hours I was caught out by an X freeze with more or less the same symptoms as before with 2.6.35.4-ck1. However, nothing was logged in the syslog this time. I had full control from an ssh session, but the display was irrevocably stuck: it would neither switch back to text mode nor display a freshly started Xorg server. A reboot was the only option.
Another crash last night. Again I could not kill any processes, so I had to ssh in and reboot. Nothing was caught in the system logs.
The BUG was coincidentally fixed with

commit 69dc4987cbe5fe70ae1c2a08906d431d53cdd242
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Oct 19 10:36:51 2010 +0100

    drm/i915: Track objects in global active list (as well as per-ring)

    To handle retirements, we need per-ring tracking of active objects.
    To handle evictions, we need global tracking of active objects.

    As we enable more rings, rebuilding the global list from the
    individual per-ring lists quickly grows tiresome and overly
    complicated. Tracking the active objects in two lists is the lesser
    of two evils.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

The underlying issue (the cause of the hangs leading to this BUG) is a broken userspace driver.
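The commit message describes keeping each active object on two lists at once: a per-ring list walked on retirement and a global list walked on eviction. A minimal sketch of that idea in plain C follows. It is not the i915 code: struct link is a standalone re-implementation in the spirit of the kernel's struct list_head, and struct ring, struct gem_obj and the obj_*/ring_* functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, in the spirit of struct list_head. */
struct link { struct link *prev, *next; };

static void link_init(struct link *h) { h->prev = h->next = h; }
static bool link_empty(const struct link *h) { return h->next == h; }

static void link_add_tail(struct link *n, struct link *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void link_del(struct link *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	link_init(n); /* leave the node self-linked so a second del is harmless */
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for a ring and a GEM object. */
struct ring { struct link active_list; };

struct gem_obj {
	bool active;
	struct ring *ring;
	struct link ring_link;   /* on ring->active_list: used for retirement */
	struct link global_link; /* on the global active list: used for eviction */
};

static void obj_init(struct gem_obj *obj)
{
	obj->active = false;
	obj->ring = NULL;
	link_init(&obj->ring_link);
	link_init(&obj->global_link);
}

/* Mark an object busy on a ring: it goes onto BOTH lists. */
static void obj_move_to_active(struct gem_obj *obj, struct ring *ring,
			       struct link *global_active)
{
	if (obj->active) {
		link_del(&obj->ring_link);
		link_del(&obj->global_link);
	}
	obj->active = true;
	obj->ring = ring;
	link_add_tail(&obj->ring_link, &ring->active_list);
	link_add_tail(&obj->global_link, global_active);
}

/* Retirement walks one ring's list and unlinks each object from both lists. */
static void ring_retire_all(struct ring *ring)
{
	while (!link_empty(&ring->active_list)) {
		struct gem_obj *obj = container_of(ring->active_list.next,
						   struct gem_obj, ring_link);
		assert(obj->ring == ring);
		link_del(&obj->ring_link);
		link_del(&obj->global_link);
		obj->active = false;
		obj->ring = NULL;
	}
}

/* Eviction only needs the global list; no per-ring merging required. */
static size_t count_active(const struct link *global_active)
{
	size_t n = 0;
	for (const struct link *l = global_active->next; l != global_active; l = l->next)
		n++;
	return n;
}
```

The cost is one extra link per object and two unlinks on retirement; in exchange, eviction never has to rebuild a global view from the individual rings, which is the trade-off the commit calls "the lesser of two evils".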
Thanks! Due to dependencies, the commit doesn't apply to vanilla 2.6.36. Has it been submitted to the stable branch of 2.6.36?
No, I was looking at solving a different problem and only later realized the bug that lurked there. The minimal patch that would be a candidate for stable is:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 90b1d67..a538002 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2045,6 +2045,8 @@ i915_gpu_idle(struct drm_device *dev)
 	if (seqno1 == 0)
 		return -ENOMEM;
 	ret = i915_wait_request(dev, seqno1, &dev_priv->render_ring);
+	if (ret)
+		return ret;
 
 	if (HAS_BSD(dev)) {
 		seqno2 = i915_add_request(dev, NULL, I915_GEM_GPU_DOMAINS,
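Why the two added lines matter: as the diff context suggests, ret from the render-ring wait is reused for the BSD-ring wait, so when HAS_BSD(dev) is true a -EIO from a hung render ring can presumably be overwritten by a successful BSD wait, and the caller carries on as if the GPU were idle. The sketch below models only that control flow. The fake_* types and functions are hypothetical stand-ins that do not call the real driver, and the with_fix flag exists only to compare behaviour with and without the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real i915 API. */
struct fake_ring {
	bool hung;           /* pretend hangcheck declared this ring hung */
	uint32_t next_seqno; /* last seqno handed out on this ring */
	int waits;           /* number of waits issued on this ring */
};

struct fake_dev {
	struct fake_ring render_ring;
	struct fake_ring bsd_ring;
	bool has_bsd;
};

static uint32_t fake_add_request(struct fake_ring *ring)
{
	return ++ring->next_seqno; /* 0 would signal allocation failure */
}

static int fake_wait_request(struct fake_ring *ring, uint32_t seqno)
{
	assert(seqno != 0);
	ring->waits++;
	return ring->hung ? -EIO : 0;
}

/* Shape of i915_gpu_idle() around the patched hunk. */
static int fake_gpu_idle(struct fake_dev *dev, bool with_fix)
{
	uint32_t seqno;
	int ret;

	seqno = fake_add_request(&dev->render_ring);
	if (seqno == 0)
		return -ENOMEM;
	ret = fake_wait_request(&dev->render_ring, seqno);
	if (with_fix && ret)
		return ret; /* the two lines added by the stable patch */

	if (dev->has_bsd) {
		seqno = fake_add_request(&dev->bsd_ring);
		if (seqno == 0)
			return -ENOMEM;
		/* Without the fix, this assignment can overwrite an earlier -EIO. */
		ret = fake_wait_request(&dev->bsd_ring, seqno);
	}
	return ret;
}
```

With a hung render ring and a healthy BSD ring, fake_gpu_idle(&dev, false) reports success while fake_gpu_idle(&dev, true) returns -EIO without ever touching the BSD ring.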
Thanks again. I shall apply the patch, and if the system survives for a week, we can call it a 100% test success. ;-)
Me again. I have not observed the kernel BUG since applying the patch, so we can call it tested and closed. Unfortunately, I have experienced a couple of different Xorg freezes, for which I may open another ticket if I can find the motivation. To be honest, after six months of trying I am getting very tired of this buggy stuff. All I want is a stable system, but obviously that is asking too much.