Bug 15004
| Summary: | i915: *ERROR* Execbuf while wedged | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | tomas m (tmezzadra) |
| Component: | Video(DRI - Intel) | Assignee: | drivers_video-dri-intel (drivers_video-dri-intel) |
| Status: | CLOSED DOCUMENTED | | |
| Severity: | normal | CC: | anarsoul, colin, finstaden, james, jasondbecker, jbarnes, jcnengel, loonyphoenix, rjw |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.32.2 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 14230 | | |
| Attachments: | config, Xorg log, Part of syslog | | |
Description
tomas m
2010-01-07 18:53:44 UTC
Saw a load of these after a suspend/resume cycle. (Also running BFS.)

Same messages as above.

May I add more info: I've got 2 screens, a 1280x800 LVDS notebook screen and a 1680x1050 screen attached to the VGA port. Both are placed in Xorg in a single framebuffer:
intel(0): Allocate new frame buffer 1680x1850 stride 2048
and compiz is running. I'm attaching Xorg.0.log too.

Created attachment 24483 [details]
Xorg log
*** Bug 15045 has been marked as a duplicate of this bug. ***

What's the last working kernel here?

Hmmm, can't remember when it started happening... I'm building 2.6.31.11 right now to test.

I'm almost sure it wasn't a problem with 2.6.31; meanwhile testing with my distribution's 2.6.31.6.

Looks like the GPU hang check code is getting it wrong, the EIR is all zeros. Maybe some batch is just taking forever to finish? Can you bisect this down to a specific bad commit?

For me I can use a 2.6.32.3 kernel fine with intel driver 2.9.x but with 2.10 it bombs out with this error. So while the problem could be in the kernel, the trigger case could be something in 2.10 only? May or may not be helpful :p

I tested 2.6.31.11 and while it locks the hardware similarly, it does so with a different error and I could not switch to a tty. I could ssh into the system though. I'm trying to bisect the kernel. Could you do the same with the intel driver?

I can confirm both statements: 2.10.0 together with 2.6.32.* shows the behaviour described above, 2.9.x + 2.6.31.11 works, 2.9.x + 2.6.32.* does not throw the error messages, but freezes as tomas said. Since I cannot ssh into my machine, I cannot give the exact error message though.

I'm using 2.6.33-rc3-git5 with intel 2.9.1 and it's working perfectly. Using 2.10.0 gives exactly the same symptoms as johannes explains.

I have this issue as well on (openSUSE) 2.6.32-3-pae kernel with a dual monitor setup: 1280x800 notebook screen and a 1680x1050 screen attached to the VGA port.
lspci:
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA controller])
    Subsystem: Toshiba America Info Systems Device ff03
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at dc100000 (32-bit, non-prefetchable) [size=512K]
    I/O ports at 1800 [size=8]
    Memory at c0000000 (32-bit, prefetchable) [size=256M]
    Memory at dc200000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
    Capabilities: [d0] Power Management version 2
    Kernel driver in use: i915

dmesg:
[171330.189103] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[171330.189119] render error detected, EIR: 0x00000000
[171330.189125] i915: Waking up sleeping processes
[171330.189193] reboot required
[171330.189209] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 1778246 at 1778244)
[171330.222994] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[171330.249447] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[171330.249783] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[snip]

Let me know if I can provide additional info to troubleshoot.
Cheers

I'm also having this on Arch Linux with both these kernels:
2.6.32.3-1 (Arch's default kernel)
2.6.32-5 (a patched kernel with BFS support, http://aur.archlinux.org/packages.php?ID=15224)
This intel driver:
2.10.0-2 (http://aur.archlinux.org/packages.php?ID=22968)
And this Intel DRI module:
7.7-1 (don't remember its source)

I'm trying to bisect this, but it has proven more difficult than I first thought. Sometimes it takes 12 hours to trigger the bug, so it's difficult to tell good commits from bad ones... If anyone is willing to help me trigger this, mail me privately so that I can send the bisect log, and we can tackle this together.

Created attachment 24614 [details]
Part of syslog
I'm getting this error from time to time on my 945gm with latest stable kernel
(2.6.32.2 with gentoo patches). I've noticed that it's really easy to reproduce
this bug with kde 4.4rc1 - just switch plasma to netbook mode.
As you can see in the attached log, the "[drm:i915_gem_object_pin] *ERROR* Failure to
install fence: -28" message preceded the message in this bug's summary, so it seems there's a fence leak somewhere.
Video driver components versions:
mesa-7.7
libdrm-2.4.17
xf86-video-intel-2.10.0
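
For readers following the log excerpts in this report, here is a minimal Python sketch that pulls the DRM "*ERROR*" lines out of a log and decodes the negative return codes into errno names. On Linux, -5 is EIO and -28 is ENOSPC, which is consistent with the guess above that the driver ran out of fence registers. The sample lines are copied from the comments in this bug; the regular expressions and the helper names `decode_errno` and `summarize` are illustrative assumptions, not part of any kernel or driver tooling.

```python
import errno
import re

# Sample lines copied from the dmesg and syslog excerpts in this report.
SAMPLE_LOG = """\
[171330.189103] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[171330.189119] render error detected, EIR: 0x00000000
[171330.189209] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 1778246 at 1778244)
[171330.222994] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[drm:i915_gem_object_pin] *ERROR* Failure to install fence: -28
"""

# Matches "[drm:<function>] *ERROR* <message>", with or without a timestamp.
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+)\] \*ERROR\* (?P<msg>.*)")
# A negative integer preceded by whitespace, e.g. "returns -5" or "fence: -28".
NEG_CODE = re.compile(r"(?<=\s)-(\d+)\b")


def decode_errno(message):
    """Return the symbolic errno name for a negative code in message, or None."""
    match = NEG_CODE.search(message)
    if match is None:
        return None
    return errno.errorcode.get(int(match.group(1)))


def summarize(log_text):
    """Return (function, message, errno_name) for each DRM *ERROR* line."""
    events = []
    for line in log_text.splitlines():
        match = DRM_ERROR.search(line)
        if match:
            msg = match.group("msg")
            events.append((match.group("func"), msg, decode_errno(msg)))
    return events


if __name__ == "__main__":
    for func, msg, name in summarize(SAMPLE_LOG):
        suffix = f" [{name}]" if name else ""
        print(f"{func}: {msg}{suffix}")
```

Feeding it a full syslog instead of `SAMPLE_LOG` should show whether a fence failure precedes the first hangcheck message, which is the ordering described in the comment above.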
*** Bug 15072 has been marked as a duplicate of this bug. ***

I was given an easy way to trigger this through email:
- enable compiz resize in normal mode (window contents are updated on the fly)
- resize like crazy for about 10 ~ 20 secs.

I've found that at a certain point in the drm intel next branch, the bug turns from what we have reported to:
[drm:i915_gem_object_bind_to_gtt] *ERROR* Invalid object alignment requested 4096
where the screen hard locks (no tty switch available), but I can ssh into the system. I'm not sure if it's the same bug, or an old bug that was fixed. This is somewhere around the 2.6.31-rc kernels. Right now I'm hunting for the commit that flips this error. Am I doing the right thing?

My results:
e67b8ce1b59006ba41245838db60b6fcda365ba8 is the first bad commit
commit e67b8ce1b59006ba41245838db60b6fcda365ba8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Sep 14 16:50:26 2009 +0100

This commit introduces the "Execbuf while wedged" error, where the mouse pointer still works and you can click on the frozen screen to close programs. Prior to that, I get the "Invalid object alignment" error, where the screen + mouse freeze and I need to ssh into the box to reboot it. It looks like they are both related. What's the next step? Should I hunt down the "Invalid object" bug?

Maybe it's an intel driver issue. 2.9.1 works OK with 2.6.32.4; 2.9.99.901 fails with the same kernel (wedged message error). I tried to bisect this but half of the commits between those versions fail to build or break Xorg badly. Should I file a bug with the xf86-video-intel people?

There already is a bug: https://bugs.freedesktop.org/show_bug.cgi?id=25475

On Wednesday 27 January 2010, Chris Wilson wrote:
> On Wed, 27 Jan 2010 10:01:44 -0800, Jesse Barnes <jbarnes@virtuousgeek.org>
> wrote:
> > On Sun, 24 Jan 2010 23:23:11 +0100 (CET)
> > "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> >
> > > This message has been generated automatically as a part of a report
> > > of regressions introduced between 2.6.31 and 2.6.32.
> > >
> > > The following bug entry is on the current list of known regressions
> > > introduced between 2.6.31 and 2.6.32. Please verify if it still should
> > > be listed and let me know (either way).
> > >
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=15004
> > > Subject : i915: *ERROR* Execbuf while wedged
> > > Submitter : tomas m <tmezzadra@gmail.com>
> > > Date : 2010-01-07 18:53 (18 days old)
> >
> > Chris, any ideas about this one? I remember seeing a few like this
> > that have been bisected to 2D driver changes recently...
>
> Yes, this is almost certainly our userspace driver sending the GPU into a
> spin. And no, we haven't identified the cause yet, there have been a lot
> of conflicting reports and guesswork, with very little information.

Rafael, I don't know what to make of your LKML quote. Concerning what Chris said, I don't know how to provide more information. Whatever info I could collect, I've already provided, here and in the bug report @ xorg (comment #20).

This is an additional status update, it doesn't mean you're expected to do anything more.

There has been a patch applied to libdrm that appears to fix this bug... and many others. This appears to be fixed for me. More info here: http://bugs.freedesktop.org/show_bug.cgi?id=25475#c88
So I guess this bug can be closed now ;)

OK, closing.