Bug 42172 - WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x39f/0x3d0()
Status: RESOLVED DUPLICATE of bug 42162
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assigned To: drivers_video-dri
Depends on:
Blocks:
Reported: 2011-09-01 15:35 UTC by nissarin
Modified: 2011-09-03 01:18 UTC (History)

See Also:
Kernel Version: 3.1-rc3
Tree: Mainline
Regression: Yes


Description nissarin 2011-09-01 15:35:49 UTC
Basically, some time after login the GPU stalls. After that the X environment becomes unusable because of further errors ("couldn't schedule IB"), so I have to switch to a console to reboot the PC.

If I'm not mistaken, I hit this as soon as I tried 3.1 (rc2, perhaps even rc1?), but at this point I'm not sure about that.
Kernel 3.0 (+some stuff from airlied - radeon-testing/fixes) works fine. 

I have no idea how to reproduce this. Usually I'm running only Firefox and/or Thunderbird and some terminal windows (ssh sessions, rtorrent). It can happen a few minutes after login or after a few hours; the one below actually took a while, as normally it happens within an hour.

I'm using...
 - Radeon 6850
 - Gentoo/AMD64 system
 - Xfce/xfwm v4.8.1 w/ compositor enabled
 - xorg-server 1.10.4/1.11
 - ati-driver (r600g), libdrm and mesa from git

I could try to bisect this, but due to the intermittent nature of the bug I don't know how long it would take.
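
The cost of such a bisect can be roughly estimated. The sketch below assumes that bisecting n candidate commits needs about ceil(log2(n)) build-and-test rounds, and that each round must run at least as long as the bug usually takes to show up (about an hour, per the description). The function names and the commit count of 8000 are hypothetical placeholders, not the real number of commits between 3.0 and 3.1-rc.

```python
def bisect_steps(n_commits: int) -> int:
    """Approximate number of kernels to build and test: ceil(log2(n_commits))."""
    if n_commits < 1:
        raise ValueError("n_commits must be at least 1")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) without float rounding.
    return (n_commits - 1).bit_length()


def estimated_hours(n_commits: int, hours_per_test: float) -> float:
    """Total test time, assuming every round runs for hours_per_test."""
    return bisect_steps(n_commits) * hours_per_test


# Hypothetical input: 8000 candidate commits, one hour of testing per kernel.
print(bisect_steps(8000))          # 13
print(estimated_hours(8000, 1.0))  # 13.0
```

Because the lockup is intermittent, a run without a lockup never proves a commit good, so the per-round time is a lower bound and the real total is likely higher.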


Sep 01 15:06:00 [kernel] [15000.167025] radeon 0000:01:00.0: GPU lockup CP stall for more than 10036msec
Sep 01 15:06:00 [kernel] [15000.167032] ------------[ cut here ]------------
Sep 01 15:06:00 [kernel] [15000.167046] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x39f/0x3d0()
Sep 01 15:06:00 [kernel] [15000.167053] Hardware name: GA-MA78G-DS3H
Sep 01 15:06:00 [kernel] [15000.167058] GPU lockup (waiting for 0x000A529F last fence id 0x000A529C)
Sep 01 15:06:00 [kernel] [15000.167062] Modules linked in: reiserfs
Sep 01 15:06:00 [kernel] [15000.167073] Pid: 3280, comm: X Not tainted 3.1.0-rc4-00131-g9e79e3e #242
Sep 01 15:06:00 [kernel] [15000.167078] Call Trace:
Sep 01 15:06:00 [kernel] [15000.167092]  [<ffffffff810694ab>] ? warn_slowpath_common+0x7b/0xc0
Sep 01 15:06:00 [kernel] [15000.167101]  [<ffffffff810695a5>] ? warn_slowpath_fmt+0x45/0x50
Sep 01 15:06:00 [kernel] [15000.167111]  [<ffffffff812ea3cf>] ? radeon_fence_wait+0x39f/0x3d0
Sep 01 15:06:00 [kernel] [15000.167119]  [<ffffffff81084be0>] ? wake_up_bit+0x40/0x40
Sep 01 15:06:00 [kernel] [15000.167129]  [<ffffffff812b47ef>] ? ttm_bo_wait+0x10f/0x1b0
Sep 01 15:06:00 [kernel] [15000.167139]  [<ffffffff8130413f>] ? radeon_gem_wait_idle_ioctl+0x8f/0x110
Sep 01 15:06:00 [kernel] [15000.167147]  [<ffffffff8129d4e1>] ? drm_ioctl+0x401/0x4a0
Sep 01 15:06:00 [kernel] [15000.167156]  [<ffffffff813040b0>] ? radeon_gem_set_tiling_ioctl+0xb0/0xb0
Sep 01 15:06:00 [kernel] [15000.167164]  [<ffffffff810773a8>] ? set_current_blocked+0x38/0x60
Sep 01 15:06:00 [kernel] [15000.167172]  [<ffffffff81031d2a>] ? do_signal+0x21a/0x770
Sep 01 15:06:00 [kernel] [15000.167181]  [<ffffffff8110da7c>] ? do_vfs_ioctl+0x9c/0x540
Sep 01 15:06:00 [kernel] [15000.167188]  [<ffffffff810773a8>] ? set_current_blocked+0x38/0x60
Sep 01 15:06:00 [kernel] [15000.167195]  [<ffffffff810324f8>] ? sys_rt_sigreturn+0x1e8/0x200
Sep 01 15:06:00 [kernel] [15000.167203]  [<ffffffff8110df69>] ? sys_ioctl+0x49/0x80
Sep 01 15:06:00 [kernel] [15000.167212]  [<ffffffff815d6b7b>] ? system_call_fastpath+0x16/0x1b
Sep 01 15:06:00 [kernel] [15000.167218] ---[ end trace f6bfd0dc5ce37413 ]---
Sep 01 15:06:00 [kernel] [15000.168398] radeon 0000:01:00.0: GPU softreset 
Sep 01 15:06:00 [kernel] [15000.168405] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003828
Sep 01 15:06:00 [kernel] [15000.168410] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
Sep 01 15:06:00 [kernel] [15000.168416] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
Sep 01 15:06:00 [kernel] [15000.168422] radeon 0000:01:00.0:   SRBM_STATUS=0x20020EC0
Sep 01 15:06:00 [kernel] [15000.345258] radeon 0000:01:00.0: Wait for MC idle timedout !
Sep 01 15:06:00 [kernel] [15000.345265] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
Sep 01 15:06:00 [kernel] [15000.345372] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
Sep 01 15:06:00 [kernel] [15000.345377] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
Sep 01 15:06:00 [kernel] [15000.345383] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
Sep 01 15:06:00 [kernel] [15000.345388] radeon 0000:01:00.0:   SRBM_STATUS=0x200206C0
Sep 01 15:06:00 [kernel] [15000.346396] radeon 0000:01:00.0: GPU reset succeed
Sep 01 15:06:00 [kernel] [15000.556795] radeon 0000:01:00.0: Wait for MC idle timedout !
Sep 01 15:06:00 [kernel] [15000.744306] radeon 0000:01:00.0: Wait for MC idle timedout !
Sep 01 15:06:00 [kernel] [15000.747925] radeon 0000:01:00.0: WB enabled
Sep 01 15:06:00 [kernel] [15000.969041] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8504)=0xCAFEDEAD)
Sep 01 15:06:00 [kernel] [15000.969049] [drm:evergreen_resume] *ERROR* evergreen startup failed on resume
Sep 01 15:06:00 [kernel] [15000.977509] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(15).
Sep 01 15:06:00 [kernel] [15000.977518] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
Sep 01 15:06:00 [kernel] [15000.982304] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(0).
Sep 01 15:06:00 [kernel] [15000.982307] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
Sep 01 15:06:00 [kernel] [15000.984280] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(1).
Sep 01 15:06:00 [kernel] [15000.984283] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
Sep 01 15:06:00 [kernel] [15000.984819] [drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(2).
Sep 01 15:06:00 [kernel] [15000.984821] [drm:radeon_cs_ioctl] *ERROR* Failed to schedule IB !
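
Two numbers can be read straight out of the log above: how far the last signalled fence is behind the one being waited for, and which status register bits changed across the soft reset. The following is a minimal sketch that extracts both from an excerpt of the log. It assumes fence ids are increasing sequence numbers, the function names are hypothetical, and it makes no attempt to name the individual register bits.

```python
import re

# Excerpt copied from the log above (syslog prefix trimmed for brevity).
LOG = """\
[15000.167025] radeon 0000:01:00.0: GPU lockup CP stall for more than 10036msec
[15000.167058] GPU lockup (waiting for 0x000A529F last fence id 0x000A529C)
[15000.168405] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003828
[15000.168422] radeon 0000:01:00.0:   SRBM_STATUS=0x20020EC0
[15000.345372] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[15000.345388] radeon 0000:01:00.0:   SRBM_STATUS=0x200206C0
"""

FENCE_RE = re.compile(r"waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)")
REG_RE = re.compile(r"(\w+)=(0x[0-9A-Fa-f]+)")


def fence_gap(log):
    """Return (awaited fence id - last signalled fence id), or None if absent."""
    match = FENCE_RE.search(log)
    if match is None:
        return None
    waiting, last = (int(value, 16) for value in match.groups())
    return waiting - last


def register_changes(log):
    """Map each register name to the XOR of its first and last logged value."""
    seen = {}
    for name, value in REG_RE.findall(log):
        seen.setdefault(name, []).append(int(value, 16))
    return {name: values[0] ^ values[-1] for name, values in seen.items()}


print(fence_gap(LOG))  # 3
for name, diff in register_changes(LOG).items():
    print(f"{name}: changed bits 0x{diff:08X}")
```

For this excerpt the awaited fence is 3 ids ahead of the last signalled one, GRBM_STATUS differs in bits 0xA0000000 and SRBM_STATUS in bit 0x00000800 between the two dumps.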
Comment 1 Niels Ole Salscheider 2011-09-01 15:45:56 UTC
This might be the same as bug 42162, which I bisected. You could check whether commit b03e7495a862b028294f59fc87286d6d78ee7fa1 is the first bad commit...
Comment 2 nissarin 2011-09-01 16:42:41 UTC
OK, after some "extensive" testing (as in glxgears x 20, duh), b03e7495a862b028294f59fc87286d6d78ee7fa1 "crashed" near the 30-minute mark. Currently I'm running 5f66d2b58ca879e70740c82422354144845d6dd3; let's see what happens.

As a side note, I've noticed you are also using a Gigabyte mobo...
I might be oversensitive here, but I have encountered some strange bugs before: bugs that are annoying, yet I haven't seen many people reporting them.
Comment 3 nissarin 2011-09-02 00:59:20 UTC
More than 8 hours have passed without a single glitch (same workload as before: glxgears, web browser, wesnoth, etc.), so yeah, it appears that b03e7495a862b028294f59fc87286d6d78ee7fa1 is most likely the cause.
Comment 4 Alex Deucher 2011-09-02 05:02:08 UTC
Should mark this as a dupe of bug 42162 then.  Does the patch on bug 42162 help?
Comment 5 nissarin 2011-09-02 09:52:26 UTC
Yes, it seems to be the same issue. I'm currently testing the patch and will let you know later whether it works for me.
Comment 6 nissarin 2011-09-03 01:18:38 UTC
The patch works, thanks.

*** This bug has been marked as a duplicate of bug 42162 ***
