Bug 42678
| Summary: | [3.3-rc1] radeon stuck in kernel after lockup | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Maciej Rutecki (maciej.rutecki) |
| Component: | Video (DRI - non Intel) | Assignee: | drivers_video-dri |
| Status: | NEW | | |
| Severity: | normal | CC: | alexandre.f.demers, alexdeucher, bart, camaradetux, clouserw, fedux, florian, glisse, hater.zlin, just.for.lkml, L.Bonnaud, maciej.rutecki, rjw, szg00000, vono29 |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.3-rc1 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 42644 | | |
Description
Maciej Rutecki
2012-01-28 15:16:27 UTC
Regression: the kernel gets stuck after a GPU lockup, see http://marc.info/?l=linux-kernel&m=132774626706709&w=2

The lockup is not a regression in itself. For the lockup itself I have filed: https://bugs.freedesktop.org/show_bug.cgi?id=45329

The kernel regression has been partly addressed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=9fc04b503df9a34ec1a691225445c5b7dfd022e7

The mutex deadlock has been fixed, but X still fails to recover from the GPU lockup. With kernel 3.1 or 3.2 I didn't even notice that these lockups were happening.

See http://marc.info/?l=linux-kernel&m=132739068529857&w=2 for a SysRq+W backtrace of the stuck X process.

The fix for the lockup itself is now in mainline and should be released in 3.3-rc3. But I can confirm that the regression (that X no longer recovers from the GPU lockup / GPU reset) is still there in 3.3-rc2. For my log, first the lockup:

Feb 4 08:55:25 thoregon kernel: [15457.570126] radeon 0000:07:00.0: GPU lockup CP stall for more than 10000msec
Feb 4 08:55:25 thoregon kernel: [15457.570134] GPU lockup (waiting for 0x00070CAA last fence id 0x00070CA9)
Feb 4 08:55:25 thoregon kernel: [15457.586330] radeon 0000:07:00.0: GPU softreset
Feb 4 08:55:25 thoregon kernel: [15457.586337] radeon 0000:07:00.0: R_008010_GRBM_STATUS=0xA0003028
Feb 4 08:55:25 thoregon kernel: [15457.586343] radeon 0000:07:00.0: R_008014_GRBM_STATUS2=0x00000002
Feb 4 08:55:25 thoregon kernel: [15457.586349] radeon 0000:07:00.0: R_000E50_SRBM_STATUS=0x200000C0
Feb 4 08:55:25 thoregon kernel: [15457.586362] radeon 0000:07:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Feb 4 08:55:25 thoregon kernel: [15457.601387] radeon 0000:07:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Feb 4 08:55:25 thoregon kernel: [15457.617378] radeon 0000:07:00.0: R_008010_GRBM_STATUS=0x00003028
Feb 4 08:55:25 thoregon kernel: [15457.617384] radeon 0000:07:00.0: R_008014_GRBM_STATUS2=0x00000002
Feb 4 08:55:25 thoregon kernel: [15457.617390] radeon 0000:07:00.0: R_000E50_SRBM_STATUS=0x200000C0
Feb 4 08:55:25 thoregon kernel: [15457.618393] radeon 0000:07:00.0: GPU reset succeed
Feb 4 08:55:25 thoregon kernel: [15457.623326] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Feb 4 08:55:25 thoregon kernel: [15457.623361] radeon 0000:07:00.0: WB enabled
Feb 4 08:55:25 thoregon kernel: [15457.623367] [drm] fence driver on ring 0 use gpu addr 0x20000c00 and cpu addr 0xffff880328696c00
Feb 4 08:55:25 thoregon kernel: [15457.669623] [drm] ring test on 0 succeeded in 1 usecs
Feb 4 08:55:25 thoregon kernel: [15457.669648] [drm] ib test on ring 0 succeeded in 1 usecs

Then, when the X server tries to unblank the screens, it gets stuck.
There no longer is a mutex deadlock for the hung task detector to log, but SysRq+W shows X in D state:

Feb 4 09:28:30 thoregon kernel: [17441.917129] SysRq : Changing Loglevel
Feb 4 09:28:30 thoregon kernel: [17441.917140] Loglevel set to 6
Feb 4 09:28:31 thoregon kernel: [17443.659030] SysRq : Show Blocked State
Feb 4 09:28:31 thoregon kernel: [17443.659040] task PC stack pid father
Feb 4 09:28:31 thoregon kernel: [17443.659122] X D ffff880337d50a00 0 3048 3027 0x00400004
Feb 4 09:28:31 thoregon kernel: [17443.659133] ffff880328709700 0000000000000082 ffff8802f2dc5c00 0000000000010a00
Feb 4 09:28:31 thoregon kernel: [17443.659143] ffff88031bf2bfd8 0000000000010a00 ffff88031bf2a000 ffff88031bf2bfd8
Feb 4 09:28:31 thoregon kernel: [17443.659152] 0000000000010a00 ffff880328709700 0000000000010a00 0000000000010a00
Feb 4 09:28:31 thoregon kernel: [17443.659161] Call Trace:
Feb 4 09:28:31 thoregon kernel: [17443.659177] [<ffffffff815ee9d7>] ? schedule_timeout+0x157/0x220
Feb 4 09:28:31 thoregon kernel: [17443.659188] [<ffffffff8103fcb0>] ? run_timer_softirq+0x240/0x240
Feb 4 09:28:31 thoregon kernel: [17443.659197] [<ffffffff8133ee39>] ? radeon_fence_wait+0x239/0x3b0
Feb 4 09:28:31 thoregon kernel: [17443.659207] [<ffffffff8104f420>] ? wake_up_bit+0x40/0x40
Feb 4 09:28:31 thoregon kernel: [17443.659215] [<ffffffff81352f77>] ? radeon_ib_get+0x257/0x2e0
Feb 4 09:28:31 thoregon kernel: [17443.659224] [<ffffffff81354f4a>] ? radeon_cs_ioctl+0x27a/0x4d0
Feb 4 09:28:31 thoregon kernel: [17443.659232] [<ffffffff812f4184>] ? drm_ioctl+0x3e4/0x490
Feb 4 09:28:31 thoregon kernel: [17443.659240] [<ffffffff81354cd0>] ? radeon_cs_finish_pages+0xa0/0xa0
Feb 4 09:28:31 thoregon kernel: [17443.659249] [<ffffffff810247e9>] ? do_page_fault+0x199/0x420
Feb 4 09:28:31 thoregon kernel: [17443.659257] [<ffffffff810af4dc>] ? mmap_region+0x1dc/0x570
Feb 4 09:28:31 thoregon kernel: [17443.659265] [<ffffffff810de636>] ? do_vfs_ioctl+0x96/0x4e0
Feb 4 09:28:31 thoregon kernel: [17443.659273] [<ffffffff810deac9>] ? sys_ioctl+0x49/0x90
Feb 4 09:28:31 thoregon kernel: [17443.659281] [<ffffffff815f18e2>] ? system_call_fastpath+0x16/0x1b
Feb 4 09:28:41 thoregon kernel: [17453.327296] SysRq : Emergency Sync
Feb 4 09:28:41 thoregon kernel: [17453.327912] Emergency Sync complete

Apart from the X server the system was still working. I was able to ssh into it and do a normal shutdown.

How do you trigger the lockup?

Not completely sure about that. I wait until the screensaver kicks in (or better: let KDE's PowerDevil switch the monitor off, I do not have a screensaver program running) and then let the system idle for 10-20 min.

The cause of the lockup has now been fixed: https://bugs.freedesktop.org/show_bug.cgi?id=45329

But I was still seeing the regression that X fails to recover in 3.3-rc2. Until 3.3-rc1 X always recovered from these lockups; I didn't even notice they were happening. The earliest of these lockups I found in my logs was under 3.1, but the trigger that caused them was not the kernel upgrade to 3.1 but an upgrade of xf86-video-ati from 6.14.2 to 6.14.3.

Handled-By : Jérôme Glisse <glisse@freedesktop.org>

You no longer have those lockups? The fix in the ddx might explain why the kernel was no longer able to recover from the lockup. Sadly, a userspace change can affect how successful the kernel is at things like lockup recovery.

I think you're not getting away with blaming userspace. ;-) But this issue is rather complicated, because there is more than one bug / change involved.
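For reference, a minimal sketch of capturing the same blocked-task dump from a shell (for example over ssh) rather than with the SysRq key combination, assuming the magic SysRq interface is enabled (CONFIG_MAGIC_SYSRQ):

echo 1 > /proc/sys/kernel/sysrq    # allow all SysRq functions, if not already allowed
echo w > /proc/sysrq-trigger       # equivalent of SysRq+W: dump blocked (D-state) tasks
dmesg | tail -n 100                # the backtrace of the stuck X process lands in the kernel log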
To summarize the issues:

* A change in xf86-video-ati 6.14.2 -> 6.14.3: that was the initial trigger for the GPU lockup messages on my system. While this change was partly buggy (this has now been fixed, but I think that fix is not released yet), it was merely a trigger for a kernel bug.
  "Proof" that 6.14.3 is to blame for this:
  6.14.2 + kernel 3.1 -> no GPU lockup messages
  6.14.3 + kernel 3.1 -> first GPU lockup messages
  Also, downgrading to 6.14.2 no longer showed this with later kernels.
  "Proof" that the real bug causing these lockups was a kernel bug: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=1b61925061660009f5b8047f93c5297e04541273 -> with this change 6.14.3 can no longer trigger GPU lockups.
* The kernel bug causing GPU lockups -> wrong DESKTOP_HEIGHT setup. That was probably always triggerable from userspace, but only the changes in 6.14.3 made this bug visible. This is fixed with the above commit 1b61925061660009f5b8047f93c5297e04541273. This bug is not the regression wrt. 3.3-rcX, as I have been seeing it since 3.1.
* First regression in 3.3-rc1: the mutex deadlock that you have already fixed. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=9fc04b503df9a34ec1a691225445c5b7dfd022e7
* A second, still open regression in 3.3-rc1 that had been masked by the first regression: even with the mutex fix applied to the kernel (i.e. 3.3-rc2), X was still failing to recover from the GPU lockups. See comment #3. This is why I would still consider this bug (42678) to be open.

And I think this is a kernel regression and not a userspace issue, because:

* 6.14.3 (with the GPU lockup trigger) and 3.2 (with the GPU lockup bug) will cause the GPU lockup messages in dmesg, but I did not even notice this was happening at all, because X was always able to recover without noticeable effects.
* The same userspace (6.14.3 with the trigger) and 3.3-rc2 (still with the GPU lockup bug, but without the mutex deadlock) will trigger the GPU lockup messages in dmesg, but X will be stuck in the kernel and fail to turn my monitors back on.

So I think the stuck X process is caused by the kernel changes between 3.2 and 3.3-rc2. Since 3.3-rc3 X has not gotten stuck again, but this is because the underlying kernel GPU lockup bug has been fixed, so there never was a need to recover and any recovery bug could no longer be triggered. Does this description of the issues involved make sense to you? Please ask if I was unclear or messed up my explanation.

I think I'm hitting the same issue, but I'm reproducing it very easily. Basically, I start the computer, KDM, KDE, and boom. http://notk.org/~adrien/heat_issue/lockup/dmesg
As for X, I'm using xf86-video-ati 6.14.4 which came out today, and libdrm 2.4.33, along with xorg-server 1.12.0. I'll try with a 3.2.12 kernel tomorrow instead of my 3.3.0+ (middle of the merge window for 3.4). I'm fairly motivated to get this sorted out.

Adrien, the question is: is Xorg stuck inside the kernel? Fixing the root cause of a GPU lockup is a different matter (basically you have to go through several GB of data and there are no tools to do that; the only tool you can make is one that helps you shrink the amount of data you have to analyze).

Torsten, is 3.4 still affected for you?

> Is 3.4 still affected?

I don't know, but I suspect it is, because since the fix of my underlying GPU hang in 3.3-rc3 there hasn't been a need for a recovery, nor a change that hangs X again. As I tried to explain in comment #8, there were 3 kernel bugs involved for me:
1.: a GPU lockup that happened since 3.1 and was fixed in 3.3-rc3 2.: a regression 3.2 -> 3.3-rc1 in the mutex locking, fixed in 3.3-rc2 3.: a regression 3.2 -> 3.3-rc1 that prevents X to recover from the GPU lockup. 3. was visible in 3.3-rc2 (see comment #3) as 2. was already fixed, but 1. was still happening. But after 3.3-rc3 1. has been fixed, so 3. no longer triggers for me, but I suspect that the GPU lockup recovery is still broken, because I did not see any patch that claimed to fix it. Well other things might have fixed it. I will force lockup but code inspection never leaded me to the issue. My previous comment was missing some bits because of my lack of sleep. My first issue was that, with a new laptop, which has two AMD cards (one integrated, one discrete), the both cards are enabled and running even though it's the integrated one which is actually used. That makes the laptop use a lot more power than it should and it gets very hot. I finally found out that I could save a lot of power by using vga_switcheroo to switch to the dedicated card and then to the integrated one. That saves almost 45W of power consumption. However, when I start X, even with only twm, I get MANY MANY MANY lockups as soon as X is starting. At some point, X seems unable to recover. After something like maybe 20 lockups... Maybe that X could manage better but the current rate of lockups is one every 10 seconds, and that's with each lockup taking 10 seconds before a reset. Maybe that this should go in another bug report however. At Jérôme's request, I've exported my dmesg in order to check that X was stuck inside the kernel. It contains my dmesg output with several executions of "echo t > /proc/sysrq-trigger". http://notk.org/~adrien/heat_issue/lockup/dmesg_sysrq_t_lockup I've tried on 3.2.13; 3.3.0, 3.4-rc1(+) and I've had issues with all of these. I'm running the latest individual tarballs of X (as of 5 days ago), along with the latest libdrm, and a mesa git. PS: I've started my laptop a bit after starting to write this. I've just reached the 53 lockups, after abit more than 10 minutes after issuing "startx" (and X is still recovering, oh, it seeems it has stopped recovering after 56 lockups) I'm affected with the same problem, I've log a bug report on launchpad (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/986524). Don't know if it will help. I can reproduce the lockups easily by switching from my 2 monitors in default mode to dual screen mode. Lockups start happening right away. Worked flawlessy with Ubuntu 11.10's kernel/radeon driver. If there's anything i can do to help debug this? (In reply to comment #15) > I can reproduce the lockups easily by switching from my 2 monitors in default > mode to dual screen mode. Lockups start happening right away. Worked > flawlessy > with Ubuntu 11.10's kernel/radeon driver. If there's anything i can do to > help > debug this? Can you track down the problematic component (kernel, ddx, mesa, etc.) and bisect? I managed to finally switch to dual screen mode without hangs. But while using the desktop, i have frequent hangs -> black screen -> restore loops. Using kernel 3.4.0-rc4 provided by the launchpad bugreport in comment #14. If you can give me some pointers, i'll do my best to get some more info! (In reply to comment #15) > I can reproduce the lockups easily by switching from my 2 monitors in default > mode to dual screen mode. Lockups start happening right away. 
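For reference, a minimal sketch of the vga_switcheroo switching described above, assuming debugfs is mounted at /sys/kernel/debug and both GPUs are registered with vga_switcheroo:

cat /sys/kernel/debug/vgaswitcheroo/switch          # list the GPUs; '+' marks the active one
echo DIS > /sys/kernel/debug/vgaswitcheroo/switch   # switch to the discrete GPU
echo IGD > /sys/kernel/debug/vgaswitcheroo/switch   # switch back to the integrated GPU
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch   # power down the GPU that is not in use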
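As a pointer for the bisection request, a minimal sketch of bisecting the kernel between a good and a bad release; the tags here are only examples (substitute whatever is known good/bad on your machine), and each step needs a build, install, boot and reproduction attempt:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git bisect start
git bisect bad v3.3-rc1     # example: first release where recovery failed
git bisect good v3.2        # example: last release where recovery worked
# build, install and boot the checked-out kernel, try to reproduce, then mark it:
git bisect good             # or: git bisect bad
# repeat until git names the first bad commit, then clean up:
git bisect reset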
(In reply to comment #15)
> I can reproduce the lockups easily by switching from my 2 monitors in default
> mode to dual screen mode. Lockups start happening right away.

Note that this bug report isn't about lockups per se but about the inability to recover from a lockup. You should probably look for another bug report about monitor switching causing lockups, or file your own.

Please, one person, one bug report; we will mark the appropriate bug as a duplicate.

As a side note, it could be related to Bug 45018. For me, it all started at the same time. Since then, it happens a lot less with the latest drm, ddx, mesa, kernel and X server, but it still happens from time to time, randomly. As I was saying, it may or may not be related, but since everything happened at the same time as Bug 45018 and using a 3.2 kernel fixes most of what I see, I think there is a similar root to all this. I can usually reproduce the lockup followed by a stuck X by playing a movie (this is the easiest way I've been able to do it, or with what I reported in bug 45018). It often locks up after nearly 40 minutes of video. In a few seconds, the image skips, turns greenish in some parts, and BAM! It locks up, resets and hangs with X unable to come back on its feet. Lately, on some occasions, it was able to get back into X, but everything related to 3D is then dead. I may have missed it, but which video card/chipset is Maciej using? Radeon 6950 over here.

This bug is still present in Ubuntu raring with this kernel package:
Package: linux-image-3.8.0-19-generic
Version: 3.8.0-19.29
which is based on kernel 3.8.8. The GPU did hang a few times and the kernel was able to recover. But later the kernel was caught in an infinite loop of GPU hangs and I was not able to take back control of the X server, and therefore I lost unsaved work in my X session. Between 2 hangs I was able to switch to a VT and run dmesg, so here is the end of the kernel log:

[73670.536108] radeon 0000:01:00.0: GPU lockup CP stall for more than 10168msec
[73670.536200] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000c7f78)
[73670.536203] radeon 0000:01:00.0: failed to get a new IB (-35)
[73670.536255] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[73670.537377] radeon 0000:01:00.0: Saved 1017 dwords of commands on ring 0.
[73670.537380] radeon 0000:01:00.0: GPU softreset: 0x00000003
[73670.691016] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA27034E0
[73670.691018] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000103
[73670.691021] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200200C0
[73670.691023] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x01000000
[73670.691025] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00001002
[73670.691027] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00028C86
[73670.691030] radeon 0000:01:00.0: R_008680_CP_STAT = 0x808386C5
[73670.691032] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[73670.705914] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[73670.720796] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
[73670.720799] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
[73670.720801] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200280C0
[73670.720803] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[73670.720805] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[73670.720808] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[73670.720810] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
[73670.731190] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[73670.748806] [drm] probing gen 2 caps for device 8086:2a41 = 1/0
[73670.920570] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[73670.920609] radeon 0000:01:00.0: WB enabled
[73670.920612] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0xffff8801347e0c00
[73670.920614] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000010000c0c and cpu addr 0xffff8801347e0c0c
[73671.118349] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[73671.118480] [drm:r600_resume] *ERROR* r600 startup failed on resume

Hi, my log shows:
radeon: failed testing IB on GFX ring (-35).
instead of:
radeon 0000:01:00.0: failed to get a new IB (-35)
on:
Linux black 3.8.0-26-generic #38-Ubuntu SMP Mon Jun 17 21:43:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Perhaps it helps.

Jul 8 20:44:11 black kernel: [ 3220.802169] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Jul 8 20:44:11 black kernel: [ 3220.802183] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000002fb90 last fence id 0x000000000002fb80)
Jul 8 20:44:11 black kernel: [ 3220.802192] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jul 8 20:44:11 black kernel: [ 3220.802202] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jul 8 20:44:11 black kernel: [ 3220.802208] radeon 0000:01:00.0: ib ring test failed (-35).
Jul 8 20:44:11 black kernel: [ 3220.818242] radeon 0000:01:00.0: GPU softreset: 0x00000003
Jul 8 20:44:11 black kernel: [ 3220.818434] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003028
Jul 8 20:44:11 black kernel: [ 3220.818441] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000002
Jul 8 20:44:11 black kernel: [ 3220.818448] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200048C0
Jul 8 20:44:11 black kernel: [ 3220.818454] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jul 8 20:44:11 black kernel: [ 3220.818460] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010002
Jul 8 20:44:11 black kernel: [ 3220.818466] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00005086
Jul 8 20:44:11 black kernel: [ 3220.818472] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80098647
Jul 8 20:44:11 black kernel: [ 3220.818478] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Jul 8 20:44:11 black kernel: [ 3220.833358] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jul 8 20:44:11 black kernel: [ 3220.848230] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0x00003028
Jul 8 20:44:11 black kernel: [ 3220.848237] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000002
Jul 8 20:44:11 black kernel: [ 3220.848243] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200040C0
Jul 8 20:44:11 black kernel: [ 3220.848250] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jul 8 20:44:11 black kernel: [ 3220.848256] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jul 8 20:44:11 black kernel: [ 3220.848262] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jul 8 20:44:11 black kernel: [ 3220.848268] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Jul 8 20:44:11 black kernel: [ 3220.850258] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jul 8 20:44:11 black kernel: [ 3220.852076] [drm] probing gen 2 caps for device 1022:9603 = 2/0
Jul 8 20:44:11 black kernel: [ 3220.852080] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jul 8 20:44:11 black kernel: [ 3220.855831] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jul 8 20:44:11 black kernel: [ 3220.855944] radeon 0000:01:00.0: WB enabled
Jul 8 20:44:11 black kernel: [ 3220.855954] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8801187bac00
Jul 8 20:44:11 black kernel: [ 3220.855962] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8801187bac0c
Jul 8 20:44:11 black kernel: [ 3220.902019] [drm] ring test on 0 succeeded in 1 usecs
Jul 8 20:44:11 black kernel: [ 3220.902091] [drm] ring test on 3 succeeded in 1 usecs
Jul 8 20:44:11 black kernel: [ 3220.902136] [drm] ib test on ring 0 succeeded in 0 usecs
Jul 8 20:44:11 black kernel: [ 3220.902167] [drm] ib test on ring 3 succeeded in 1 usecs
Jul 8 20:44:24 black kernel: [ 3233.257577] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Jul 8 20:44:24 black kernel: [ 3233.257592] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000002fbc8 last fence id 0x000000000002fb9c)
Jul 8 20:44:24 black kernel: [ 3233.273631] radeon 0000:01:00.0: Saved 1545 dwords of commands on ring 0.
Jul 8 20:44:24 black kernel: [ 3233.273647] radeon 0000:01:00.0: GPU softreset: 0x00000007
Jul 8 20:44:24 black kernel: [ 3233.281226] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003028
Jul 8 20:44:24 black kernel: [ 3233.281234] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000002
Jul 8 20:44:24 black kernel: [ 3233.281241] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200400C0
Jul 8 20:44:24 black kernel: [ 3233.281247] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jul 8 20:44:24 black kernel: [ 3233.281253] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010002
Jul 8 20:44:24 black kernel: [ 3233.281260] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000086
Jul 8 20:44:24 black kernel: [ 3233.281267] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80018647
Jul 8 20:44:24 black kernel: [ 3233.281273] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Jul 8 20:44:24 black kernel: [ 3233.296145] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jul 8 20:44:24 black kernel: [ 3233.311017] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0x00003028
Jul 8 20:44:24 black kernel: [ 3233.311024] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000002
Jul 8 20:44:24 black kernel: [ 3233.311030] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200400C0
Jul 8 20:44:24 black kernel: [ 3233.311036] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jul 8 20:44:24 black kernel: [ 3233.311042] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jul 8 20:44:24 black kernel: [ 3233.311049] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jul 8 20:44:24 black kernel: [ 3233.311055] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Jul 8 20:44:24 black kernel: [ 3233.311062] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x46483146
Jul 8 20:44:24 black kernel: [ 3233.311121] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jul 8 20:44:24 black kernel: [ 3233.313109] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jul 8 20:44:24 black kernel: [ 3233.315390] [drm] probing gen 2 caps for device 1022:9603 = 2/0
Jul 8 20:44:24 black kernel: [ 3233.315400] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jul 8 20:44:24 black kernel: [ 3233.319041] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jul 8 20:44:24 black kernel: [ 3233.319161] radeon 0000:01:00.0: WB enabled
Jul 8 20:44:24 black kernel: [ 3233.319172] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8801187bac00
Jul 8 20:44:24 black kernel: [ 3233.319179] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8801187bac0c
Jul 8 20:44:24 black kernel: [ 3233.365400] [drm] ring test on 0 succeeded in 1 usecs
Jul 8 20:44:24 black kernel: [ 3233.365464] [drm] ring test on 3 succeeded in 1 usecs
Jul 8 20:44:34 black kernel: [ 3243.743621] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec

Thanks