Bug 38792

Summary: Radeon HD 5750: GPU lockup CP stall while browsing in Firefox
Product: Drivers
Reporter: Jure Repinc (jlp.bugs)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED OBSOLETE
Severity: normal
CC: alan, alexdeucher, jdelvare, Jonathon.Reinhart, synfin
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0-rc5
Tree: Mainline
Regression: No
Attachments: dmesg, lspci, Xorg.0.log

Description Jure Repinc 2011-07-05 16:48:32 UTC
I was browsing with Firefox 5.0 (in KDE 4.6.4 with desktop effects enabled). When I clicked a button to return to the previous page, the screen suddenly went completely black. When it came back, the mouse pointer was not visible, but the windows under it still reacted to mouse-over. I had to switch to a non-X virtual console and back to restore everything. I immediately checked dmesg, and this is what it showed:

radeon 0000:01:00.0: GPU lockup CP stall for more than 131410msec
------------[ cut here ]------------
WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x2f1/0x330 [radeon]()
Hardware name: System Product Name
GPU lockup (waiting for 0x00A08F1B last fence id 0x00A08F18)
Modules linked in: w83627ehf hwmon_vid coretemp snd_seq snd_seq_device usb_storage radeon snd_hda_codec_hdmi snd_hda_codec_realtek ttm drm_kms_helper drm snd_hda_intel snd_hda_codec hwmon i2c_i801 snd_pcm snd_timer xhci_hcd backlight ehci_hcd i2c_algo_bit r8169 mii snd snd_page_alloc
Pid: 2162, comm: X Not tainted 3.0.0-rc5-dirty #9
Call Trace:
 [<ffffffffa0167b00>] ? radeon_fence_wait+0x2f0/0x330 [radeon]
 [<ffffffff81038e9b>] ? warn_slowpath_common+0x7b/0xc0
 [<ffffffff81038f95>] ? warn_slowpath_fmt+0x45/0x50
 [<ffffffffa0167b01>] ? radeon_fence_wait+0x2f1/0x330 [radeon]
 [<ffffffff81054dd0>] ? add_wait_queue+0x60/0x60
 [<ffffffffa007cb0a>] ? ttm_bo_wait+0xba/0x1c0 [ttm]
 [<ffffffffa007f12b>] ? ttm_bo_move_buffer+0x7b/0x180 [ttm]
 [<ffffffffa007d642>] ? ttm_bo_reserve_locked+0xc2/0x160 [ttm]
 [<ffffffffa018311e>] ? radeon_cs_parser_relocs+0x5e/0x250 [radeon]
 [<ffffffffa0082b65>] ? ttm_eu_list_ref_sub+0x35/0x50 [ttm]
 [<ffffffffa007f2c5>] ? ttm_bo_validate+0x95/0x120 [ttm]
 [<ffffffffa01698c3>] ? radeon_bo_list_validate+0x73/0xd0 [radeon]
 [<ffffffffa018377c>] ? radeon_cs_ioctl+0xac/0x210 [radeon]
 [<ffffffffa00cb21c>] ? drm_ioctl+0x39c/0x460 [drm]
 [<ffffffffa01836d0>] ? radeon_cs_finish_pages+0xa0/0xa0 [radeon]
 [<ffffffff81022589>] ? do_page_fault+0x189/0x420
 [<ffffffff810b726d>] ? mmap_region+0x27d/0x520
 [<ffffffff810ecc0e>] ? do_vfs_ioctl+0x8e/0x4f0
 [<ffffffff810ed0b9>] ? sys_ioctl+0x49/0x80
 [<ffffffff8143aa7b>] ? system_call_fastpath+0x16/0x1b
---[ end trace edd1ae68ff942751 ]---
radeon 0000:01:00.0: GPU softreset 
radeon 0000:01:00.0:   GRBM_STATUS=0xA0003828
radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
radeon 0000:01:00.0: GPU reset succeed
radeon 0000:01:00.0: WB enabled
[drm] ring test succeeded in 1 usecs
[drm] ib test succeeded in 1 usecs
[drm] force priority to high
[drm] force priority to high
[drm] force priority to high
[drm] force priority to high

Other software:
Mesa 7.12-devel (git-3fccc14)
libdrm from git
xorg-server 1.10.99.901
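
For anyone comparing their own logs against this report, here is a minimal Python sketch that pulls the stall time and the two fence ids out of the lockup lines quoted above. The function name summarize_lockup and the "fences_behind" field are hypothetical, and treating the ids as increasing sequence numbers is an assumption based only on how the values in this log read, not something the log itself states.

```python
import re

# Patterns match the two lockup lines exactly as they appear in the dmesg excerpt.
STALL_RE = re.compile(r"GPU lockup CP stall for more than (\d+)msec")
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)\)"
)


def summarize_lockup(dmesg_text):
    """Return stall time and fence ids from a radeon lockup report, or None."""
    stall = STALL_RE.search(dmesg_text)
    fence = FENCE_RE.search(dmesg_text)
    if not stall or not fence:
        return None
    waiting = int(fence.group(1), 16)
    last = int(fence.group(2), 16)
    return {
        "stall_msec": int(stall.group(1)),
        "waiting_for": waiting,
        "last_fence": last,
        # Assumption: ids are sequence numbers, so this is the gap between them.
        "fences_behind": waiting - last,
    }


sample = """\
radeon 0000:01:00.0: GPU lockup CP stall for more than 131410msec
GPU lockup (waiting for 0x00A08F1B last fence id 0x00A08F18)
"""
print(summarize_lockup(sample))
```

On the two lines from this report, the gap between the fence being waited for and the last fence id is 3.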
Comment 1 Jure Repinc 2011-07-05 16:49:19 UTC
Created attachment 64672
dmesg
Comment 2 Jure Repinc 2011-07-05 16:49:50 UTC
Created attachment 64682
lspci
Comment 3 Jure Repinc 2011-07-05 16:50:29 UTC
Created attachment 64692
Xorg.0.log
Comment 4 Alex Deucher 2011-07-05 17:04:02 UTC
The driver reset the GPU because it detected a lockup.  This is more likely a bug in the userspace acceleration drivers than a kernel bug.  Are you using WebGL?
Comment 5 Jure Repinc 2011-07-05 17:10:12 UTC
As far as I remember WebGL isn't enabled by default in Firefox 5. Also I checked on this page - http://www.doesmybrowsersupportwebgl.com/ - and it says "Nay".
Comment 6 jdk 2011-07-22 11:52:12 UTC
Same experience here with HD5770. Reported to Xorg Driver/Radeon at https://bugs.freedesktop.org/show_bug.cgi?id=39469
Comment 7 Jonathon Reinhart 2011-08-25 21:58:40 UTC
FWIW, I am seeing this on Ubuntu with kernel 2.6.38-11-generic.  This particular time, I have two of these in dmesg, within 60 sec of boot.  Radeon HD 6850 on HP h8-1070t (Sandy Bridge Core i7-2600)
Comment 8 Alex Deucher 2011-08-25 22:07:08 UTC
As I said before, this is most likely a bug in the ddx or 3D driver.  The drm is doing what it's supposed to do: resetting the GPU when it detects a hang.  Does upgrading mesa or xf86-video-ati help?  I'd suggest mesa 7.11 or git master.
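
To see what the soft reset actually changed, the register dump in the description can be compared before and after the reset. The Python sketch below only reports which bit positions differ; it deliberately does not name the bits, since their meaning has to come from the driver's register headers. The helper name changed_bits is hypothetical.

```python
def changed_bits(before, after):
    """Return the bit positions that differ between two 32-bit register values."""
    diff = (before ^ after) & 0xFFFFFFFF
    return [bit for bit in range(32) if diff & (1 << bit)]


# Values copied from the dmesg excerpt in the description.
before_reset = {
    "GRBM_STATUS": 0xA0003828,
    "GRBM_STATUS_SE0": 0x00000007,
    "GRBM_STATUS_SE1": 0x00000007,
    "SRBM_STATUS": 0x200000C0,
}
after_reset = {
    "GRBM_STATUS": 0x00003828,
    "GRBM_STATUS_SE0": 0x00000007,
    "GRBM_STATUS_SE1": 0x00000007,
    "SRBM_STATUS": 0x200000C0,
}

for name, value in before_reset.items():
    print(name, changed_bits(value, after_reset[name]))
```

For this log, only GRBM_STATUS changes (bits 29 and 31 are cleared); the other three registers read the same before and after the reset.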
Comment 9 Jean Delvare 2012-07-01 09:40:01 UTC
I have been experiencing the GPU lockup CP stall on Radeon HD 6450 / openSUSE 12.1 x86_64 / GNOME 3.2 since kernel 3.5-rc3+. No problem with kernels 3.4.x. I tested neither rc1 nor rc2, so I can't comment on those.

Alex, you say this is a user-space issue, and you may be right, but from a user perspective, what I see is a regression when updating my kernel. I am currently trying to bisect it to find out which commit introduced (or revealed) the problem. However, unrelated build-breaking bugs in the kernel tree make that a little difficult. I'll report if I find anything.
Comment 10 Jean Delvare 2012-07-01 12:56:39 UTC
Bisection pointed to:

commit 416a2bd274566a6f607a271f524b2dc0b84d9106
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Thu May 31 19:00:25 2012 -0400

    drm/radeon: fixup tiling group size and backendmap on r6xx-r9xx (v4)
    
    Tiling group size is always 256bits on r6xx/r7xx/r8xx/9xx. Also fix and
    simplify render backend map. This now properly sets up the backend map
    on r6xx-9xx which should improve 3D performance.
    
    Vadim benchmarked also:
    Some benchmarks on juniper (5750), fullscreen 1920x1080,
    first result - kernel 3.4.0+ (fb21affa), second - with these patches:
    
    Lightsmark:   91 fps => 123 fps    +35%
    Doom3:        74 fps => 101 fps    +36%
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Jerome Glisse <jglisse@redhat.com>
    Signed-off-by: Dave Airlie <airlied@redhat.com>

Reverting this single commit on top of kernel 3.5-rc5 fixes the problem for me. Jure, Jonathon and everyone else hitting this issue, please try reverting 416a2bd274566a6f607a271f524b2dc0b84d9106 and report whether it helped or not. At least this will tell us if my problem is really the same as yours.
Comment 11 Alex Deucher 2012-07-01 19:08:59 UTC
(In reply to comment #10)
> Reverting this single commit on top of kernel 3.5-rc5 fixes the problem for
> me.
> Jure, Jonathon and everyone else hitting this issue, please try reverting 
> 416a2bd274566a6f607a271f524b2dc0b84d9106 and report whether it helped or not.
> At least this will tell us if my problem is really the same as yours.

There are tons of things that could cause a GPU lockup (kernel issue, userspace issue, or some bad combination of kernel and userspace bits).  416a2bd274566a6f607 is only a month or so old, so it's not related to the original bug report and should probably be reported as a separate issue.
Comment 12 Jean Delvare 2012-07-01 19:37:17 UTC
Oops, sorry. I misread the kernel version field as 3.5-rc5 while it is 3.0-rc5. Obviously the commit I pointed out can't be the cause of it. I'll open a separate report for my issue. Sorry for the confusion and noise.