With kernels 3.5-rc3 to 3.5-rc5, I hit a GPU lockup CP stall issue whenever I do some actions in Firefox: if I need to authenticate to access a given site, or when the download target selection window pops up. I'm running Gnome 3.2 on openSUSE 12.1. When this happens, the whole Gnome interface freezes, with gnome-shell stuck at 100% CPU. In the kernel logs I see the following:

    radeon 0000:08:00.0: GPU lockup CP stall for more than 10000msec
    radeon 0000:08:00.0: GPU lockup (waiting for 0x00000000000113f3 last fence id 0x00000000000113f0)
    radeon 0000:08:00.0: GPU softreset
    radeon 0000:08:00.0: GRBM_STATUS=0xE55008A0
    radeon 0000:08:00.0: GRBM_STATUS_SE0=0xEC000001
    radeon 0000:08:00.0: GRBM_STATUS_SE1=0x00000007
    radeon 0000:08:00.0: SRBM_STATUS=0x200000C0
    radeon 0000:08:00.0: GRBM_SOFT_RESET=0x00007F6B
    radeon 0000:08:00.0: GRBM_STATUS=0x00003828
    radeon 0000:08:00.0: GRBM_STATUS_SE0=0x00000007
    radeon 0000:08:00.0: GRBM_STATUS_SE1=0x00000007
    radeon 0000:08:00.0: SRBM_STATUS=0x200000C0
    radeon 0000:08:00.0: GPU reset succeed
    [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
    radeon 0000:08:00.0: WB enabled
    radeon 0000:08:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff88013557bc00
    [drm] ring test on 0 succeeded in 0 usecs
    [drm] ib test on ring 0 succeeded in 0 usecs

I have more samples if needed. No problem when doing the same with kernel 3.4.4. I ran "git bisect" and found that reverting the following commit fixes the problem:

    commit 416a2bd274566a6f607a271f524b2dc0b84d9106
    Author: Alex Deucher <alexander.deucher@amd.com>
    Date:   Thu May 31 19:00:25 2012 -0400

        drm/radeon: fixup tiling group size and backendmap on r6xx-r9xx (v4)

        Tiling group size is always 256bits on r6xx/r7xx/r8xx/9xx. Also fix and simplify render backend map. This now properly sets up the backend map on r6xx-9xx which should improve 3D performance.

        Vadim benchmarked also:
        Some benchmarks on juniper (5750), fullscreen 1920x1080, first result - kernel 3.4.0+ (fb21affa), second - with these patches:

        Lightsmark: 91 fps => 123 fps +35%
        Doom3: 74 fps => 101 fps +36%

        Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
        Signed-off-by: Jerome Glisse <jglisse@redhat.com>
        Signed-off-by: Dave Airlie <airlied@redhat.com>

Let me know if you need more debugging information, I'll do whatever I can to help.
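Since more lockup samples are available, it may help to pull the fence IDs out of each one and see how far the last reported fence lags behind the awaited one. A minimal sketch in Python, assuming the log lines have exactly the "GPU lockup (waiting for ... last fence id ...)" wording quoted above; the function name fence_gaps is made up for this illustration.

```python
import re

# Matches the lockup line as it appears in the kernel log quoted in this report.
LOCKUP = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)\)"
)

def fence_gaps(log_text):
    """Return (waiting_for, last_fence, gap) for every lockup line in log_text."""
    result = []
    for waiting, last in LOCKUP.findall(log_text):
        w, l = int(waiting, 16), int(last, 16)
        result.append((w, l, w - l))
    return result

sample = (
    "radeon 0000:08:00.0: GPU lockup (waiting for 0x00000000000113f3 "
    "last fence id 0x00000000000113f0)"
)

for w, l, gap in fence_gaps(sample):
    print(f"waiting for {w:#x}, last fence {l:#x}, gap {gap}")
```

For the sample quoted in this report the gap is 3. Whether that number is the same across all samples is something only the additional logs can tell.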
Can you dump the following registers using radeonreg or avivotool (http://cgit.freedesktop.org/~airlied/radeontool/) with the patch applied and reverted and attach both results?

    CC_RB_BACKEND_DISABLE (0x98F4)
    CC_SYS_RB_BACKEND_DISABLE (0x3F88)
    GC_USER_RB_BACKEND_DISABLE (0x9B7C)
    CC_GC_SHADER_PIPE_CONFIG (0x8950)
    GB_BACKEND_MAP (0x98FC)

(as root):

    radeonreg regmatch 0x98F4

etc.
With 3.5-rc5 kernel (failing):

    0x98F4 0x00000001 (1)
    0x3F88 0x00000001 (1)
    0x9B7C 0x00000000 (0)
    0x8950 0xfffcf001 (-200703)
    0x98FC 0x00000000 (0)

With commit 416a2bd2 reverted (working):

    0x98F4 0x00000001 (1)
    0x3F88 0x00000001 (1)
    0x9B7C 0x00fe0000 (16646144)
    0x8950 0xfffcf001 (-200703)
    0x98FC 0x00000000 (0)

So the value of register GC_USER_RB_BACKEND_DISABLE (0x9B7C) differs.
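The comparison can also be done mechanically, which is handy once more registers get dumped. A minimal sketch in Python, assuming the dumps are the "offset value (decimal)" triples pasted in this comment; parse_dump and diff_dumps are names made up for this illustration and are not part of radeontool.

```python
import re

# One "offset value (decimal)" triple, e.g. "0x9B7C 0x00fe0000 (16646144)".
TRIPLE = re.compile(r"(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)\s+\((-?\d+)\)")

def parse_dump(text):
    """Return {register offset: 32-bit value} for every triple found in text."""
    return {int(off, 16): int(val, 16) for off, val, _ in TRIPLE.findall(text)}

def diff_dumps(a, b):
    """Return {offset: (value_a, value_b, differing bit positions)} for mismatches."""
    out = {}
    for off in sorted(a.keys() & b.keys()):
        if a[off] != b[off]:
            xor = a[off] ^ b[off]
            bits = [i for i in range(32) if (xor >> i) & 1]
            out[off] = (a[off], b[off], bits)
    return out

# Values copied from the two dumps in this comment.
failing = parse_dump(
    "0x98F4 0x00000001 (1) 0x3F88 0x00000001 (1) 0x9B7C 0x00000000 (0) "
    "0x8950 0xfffcf001 (-200703) 0x98FC 0x00000000 (0)"
)
working = parse_dump(
    "0x98F4 0x00000001 (1) 0x3F88 0x00000001 (1) 0x9B7C 0x00fe0000 (16646144) "
    "0x8950 0xfffcf001 (-200703) 0x98FC 0x00000000 (0)"
)

for off, (bad, good, bits) in diff_dumps(failing, working).items():
    print(f"0x{off:04X}: failing=0x{bad:08x} working=0x{good:08x} bits={bits}")
```

On these two dumps the only mismatch is 0x9B7C, where bits 17 through 23 are set in the working case and clear in the failing one. What those bits mean is up to the register definition in the driver; the script only reports which ones differ.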
Created attachment 74671: properly disable render backend
Does this patch fix it?
I tested the patch in comment #3 but unfortunately it doesn't solve the problem.
With this patch applied, I get:

    0x98F4 0x00000001 (1)
    0x3F88 0x00000001 (1)
    0x9B7C 0x00fe0000 (16646144)
    0x8950 0xfffcf001 (-200703)
    0x98FC 0x00000000 (0)
    0x8954 0x00000000 (0)

So the value of register 0x9B7C is correct now, but this was not sufficient.
Created attachment 74701: properly disable render backend
This one?
Created attachment 74711: possible fix
Or this variant, although AFAIK programming the USER register variants shouldn't be necessary, as the default values (0) are valid.
Does booting up a clean kernel without any patches applied or reverted work if you manually set the following registers to their "patch reverted" values using radeonreg? Just to be sure, write all of them even if the values are the same. Do this without X running.

    0x98F4
    0x3F88
    0x9B7C
    0x8950
    0x98FC
    0x8954

e.g.,

    radeonreg regset 0x8950 0xfffcf001
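To avoid typos when writing the registers by hand, the command lines can be generated from the "patch reverted" dump. A minimal sketch in Python that only prints the commands (it does not touch the hardware); the values are copied from the working dump earlier in this thread, and regset_commands is a name made up for this illustration. 0x8954 is left out because the reverted dump did not include it, so its value has to be read separately first.

```python
# "Patch reverted" values copied from the working dump earlier in this thread.
# 0x8954 was not part of that dump, so it is deliberately not listed here.
REVERTED = {
    0x98F4: 0x00000001,
    0x3F88: 0x00000001,
    0x9B7C: 0x00FE0000,
    0x8950: 0xFFFCF001,
    0x98FC: 0x00000000,
}

def regset_commands(values):
    """Build one 'radeonreg regset <offset> <value>' line per register."""
    return [
        f"radeonreg regset 0x{off:04X} 0x{val:08x}" for off, val in values.items()
    ]

# Print the commands so they can be reviewed, then run as root without X.
for cmd in regset_commands(REVERTED):
    print(cmd)
```

The printed lines use the same form as the regset example above and are meant to be checked by eye before being run.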
Patch from comment #6 doesn't work, testing patch from comment #7 now.
Patch from comment #7 did not work either. Then I followed the instructions from comment #8, but it also did not help.
Created attachment 74771: possible fix
Another possible fix, but I don't think it will help as it touches things never previously touched. I don't think the issue is the USER registers, but it's worth a shot I suppose.
Patch from comment #11 didn't work at all: not only did it fail to fix the original issue, it also caused additional trouble (gdm wouldn't even show up).
Reproducibility information:
* I cannot reproduce the GPU lockup on a Radeon HD 4350 card.
* On the Radeon HD 6450, I can reproduce the GPU lockup with applications other than Firefox. I was able to do so with Claws Mail for example. The parent window has to be maximized for it to happen. Then, as soon as a title-less dialog box is opened (for example by pressing Ctrl+S for "Save As..."), the GPU lockup happens.
I managed to fix the problem with a user-space stack update. I updated:
* libdrm from version 2.4.26 to 2.4.33
* Mesa from version 7.11 to 8.0.3
* xorg-x11-libX11 version 7.6 to libX11 version 1.5.0
and I no longer see the GPU lockup. So I guess I can close this bug as invalid, if the actual bug was in user-space.
That's the problem with GPU drivers. It's impossible to test every combination of userspace and kernel drivers and there can be very subtle bugs with certain combinations like this one that are almost impossible to track down.