Bug 42678 - [3.3-rc1] radeon stuck in kernel after lockup
Status: NEW
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assigned To: drivers_video-dri
Depends on:
Blocks: 42644
Reported: 2012-01-28 15:16 UTC by Maciej Rutecki
Modified: 2016-03-23 18:36 UTC

See Also:
Kernel Version: 3.3-rc1
Tree: Mainline
Regression: Yes


Description Maciej Rutecki 2012-01-28 15:16:27 UTC
Subject    : [3.3-rc1]radeon 0000:07:00.0: GPU lockup CP stall for more than 10000msec
Submitter  : Torsten Kaiser <just.for.lkml@googlemail.com>
Date       : 2012-01-21 19:03
Message-ID : CAPVoSvSXMvRb=1itu9DjF+s=6zfAChvUxS-x=b8EV9kOZinNpA@mail.gmail.com
References : http://marc.info/?l=linux-kernel&m=132717279606670&w=2

This entry is being used for tracking a regression from 3.2. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Jérôme Glisse 2012-01-28 18:27:51 UTC
Regression is kernel stuck after lockup
http://marc.info/?l=linux-kernel&m=132774626706709&w=2

The lockup is not a regression in itself.
Comment 2 Torsten Kaiser 2012-01-29 20:31:43 UTC
For the lockup itself I have filed: https://bugs.freedesktop.org/show_bug.cgi?id=45329

The kernel regression has been partly addressed by:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=9fc04b503df9a34ec1a691225445c5b7dfd022e7

The mutex deadlock has been fixed, but X still fails to recover from the GPU lockup. With kernel 3.1 or 3.2 I didn't even notice that these lockups were happening.

See http://marc.info/?l=linux-kernel&m=132739068529857&w=2 for a SysRq+W backtrace of the stuck X process
Comment 3 Torsten Kaiser 2012-02-04 08:39:42 UTC
The fix for the lockup itself is now in mainline and should be released in 3.3-rc3.

But I can confirm that the regression (that X is no longer recovering from the GPU lockup / GPU reset) is still there in 3.3-rc2.

From my log, first the lockup:
Feb  4 08:55:25 thoregon kernel: [15457.570126] radeon 0000:07:00.0: GPU lockup CP stall for more than 10000msec
Feb  4 08:55:25 thoregon kernel: [15457.570134] GPU lockup (waiting for 0x00070CAA last fence id 0x00070CA9)
Feb  4 08:55:25 thoregon kernel: [15457.586330] radeon 0000:07:00.0: GPU softreset 
Feb  4 08:55:25 thoregon kernel: [15457.586337] radeon 0000:07:00.0:   R_008010_GRBM_STATUS=0xA0003028
Feb  4 08:55:25 thoregon kernel: [15457.586343] radeon 0000:07:00.0:   R_008014_GRBM_STATUS2=0x00000002
Feb  4 08:55:25 thoregon kernel: [15457.586349] radeon 0000:07:00.0:   R_000E50_SRBM_STATUS=0x200000C0
Feb  4 08:55:25 thoregon kernel: [15457.586362] radeon 0000:07:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Feb  4 08:55:25 thoregon kernel: [15457.601387] radeon 0000:07:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Feb  4 08:55:25 thoregon kernel: [15457.617378] radeon 0000:07:00.0:   R_008010_GRBM_STATUS=0x00003028
Feb  4 08:55:25 thoregon kernel: [15457.617384] radeon 0000:07:00.0:   R_008014_GRBM_STATUS2=0x00000002
Feb  4 08:55:25 thoregon kernel: [15457.617390] radeon 0000:07:00.0:   R_000E50_SRBM_STATUS=0x200000C0
Feb  4 08:55:25 thoregon kernel: [15457.618393] radeon 0000:07:00.0: GPU reset succeed
Feb  4 08:55:25 thoregon kernel: [15457.623326] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Feb  4 08:55:25 thoregon kernel: [15457.623361] radeon 0000:07:00.0: WB enabled
Feb  4 08:55:25 thoregon kernel: [15457.623367] [drm] fence driver on ring 0 use gpu addr 0x20000c00 and cpu addr 0xffff880328696c00
Feb  4 08:55:25 thoregon kernel: [15457.669623] [drm] ring test on 0 succeeded in 1 usecs
Feb  4 08:55:25 thoregon kernel: [15457.669648] [drm] ib test on ring 0 succeeded in 1 usecs

Then, when the X server tries to unblank the screens it gets stuck. There no longer is a mutex deadlock for the hung task detector to log, but SysRq+W shows X in D state:
Feb  4 09:28:30 thoregon kernel: [17441.917129] SysRq : Changing Loglevel
Feb  4 09:28:30 thoregon kernel: [17441.917140] Loglevel set to 6
Feb  4 09:28:31 thoregon kernel: [17443.659030] SysRq : Show Blocked State
Feb  4 09:28:31 thoregon kernel: [17443.659040]   task                        PC stack   pid father
Feb  4 09:28:31 thoregon kernel: [17443.659122] X               D ffff880337d50a00     0  3048   3027 0x00400004
Feb  4 09:28:31 thoregon kernel: [17443.659133]  ffff880328709700 0000000000000082 ffff8802f2dc5c00 0000000000010a00
Feb  4 09:28:31 thoregon kernel: [17443.659143]  ffff88031bf2bfd8 0000000000010a00 ffff88031bf2a000 ffff88031bf2bfd8
Feb  4 09:28:31 thoregon kernel: [17443.659152]  0000000000010a00 ffff880328709700 0000000000010a00 0000000000010a00
Feb  4 09:28:31 thoregon kernel: [17443.659161] Call Trace:
Feb  4 09:28:31 thoregon kernel: [17443.659177]  [<ffffffff815ee9d7>] ? schedule_timeout+0x157/0x220
Feb  4 09:28:31 thoregon kernel: [17443.659188]  [<ffffffff8103fcb0>] ? run_timer_softirq+0x240/0x240
Feb  4 09:28:31 thoregon kernel: [17443.659197]  [<ffffffff8133ee39>] ? radeon_fence_wait+0x239/0x3b0
Feb  4 09:28:31 thoregon kernel: [17443.659207]  [<ffffffff8104f420>] ? wake_up_bit+0x40/0x40
Feb  4 09:28:31 thoregon kernel: [17443.659215]  [<ffffffff81352f77>] ? radeon_ib_get+0x257/0x2e0
Feb  4 09:28:31 thoregon kernel: [17443.659224]  [<ffffffff81354f4a>] ? radeon_cs_ioctl+0x27a/0x4d0
Feb  4 09:28:31 thoregon kernel: [17443.659232]  [<ffffffff812f4184>] ? drm_ioctl+0x3e4/0x490
Feb  4 09:28:31 thoregon kernel: [17443.659240]  [<ffffffff81354cd0>] ? radeon_cs_finish_pages+0xa0/0xa0
Feb  4 09:28:31 thoregon kernel: [17443.659249]  [<ffffffff810247e9>] ? do_page_fault+0x199/0x420
Feb  4 09:28:31 thoregon kernel: [17443.659257]  [<ffffffff810af4dc>] ? mmap_region+0x1dc/0x570
Feb  4 09:28:31 thoregon kernel: [17443.659265]  [<ffffffff810de636>] ? do_vfs_ioctl+0x96/0x4e0
Feb  4 09:28:31 thoregon kernel: [17443.659273]  [<ffffffff810deac9>] ? sys_ioctl+0x49/0x90
Feb  4 09:28:31 thoregon kernel: [17443.659281]  [<ffffffff815f18e2>] ? system_call_fastpath+0x16/0x1b
Feb  4 09:28:41 thoregon kernel: [17453.327296] SysRq : Emergency Sync
Feb  4 09:28:41 thoregon kernel: [17453.327912] Emergency Sync complete

Apart from the X server the system was still working. I was able to ssh into it and do a normal shutdown.
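
For anyone who wants to check a saved kernel log for the same symptom, here is a minimal Python sketch. It assumes the log contains a SysRq+W ("Show Blocked State") dump in the format shown above. The names `blocked_tasks` and `stuck_in` are made up for this illustration and are not part of any existing tool.

```python
import re

# Task header printed by SysRq+W for a blocked (D state) task, e.g.
# "[17443.659122] X    D ffff880337d50a00     0  3048   3027 0x00400004"
TASK_RE = re.compile(r"\]\s+(\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+\d+")
# One call-trace frame, e.g. "[<ffffffff8133ee39>] ? radeon_fence_wait+0x239/0x3b0"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(\w+)\+0x")


def blocked_tasks(lines):
    """Return [(name, pid, [functions in call trace])] for each D-state task."""
    tasks = []
    for line in lines:
        header = TASK_RE.search(line)
        if header:
            tasks.append((header.group(1), int(header.group(2)), []))
            continue
        frame = FRAME_RE.search(line)
        if frame and tasks:
            # Frames always follow the header of the task they belong to.
            tasks[-1][2].append(frame.group(1))
    return tasks


def stuck_in(tasks, function="radeon_fence_wait"):
    """Keep only the tasks whose call trace mentions the given function."""
    return [(name, pid) for name, pid, trace in tasks if function in trace]


# A few lines copied from the dump above, with the syslog prefix stripped.
SAMPLE = """\
[17443.659030] SysRq : Show Blocked State
[17443.659122] X               D ffff880337d50a00     0  3048   3027 0x00400004
[17443.659161] Call Trace:
[17443.659177]  [<ffffffff815ee9d7>] ? schedule_timeout+0x157/0x220
[17443.659197]  [<ffffffff8133ee39>] ? radeon_fence_wait+0x239/0x3b0
[17443.659224]  [<ffffffff81354f4a>] ? radeon_cs_ioctl+0x27a/0x4d0
""".splitlines()

if __name__ == "__main__":
    print(stuck_in(blocked_tasks(SAMPLE)))
```

The regular expressions only search for the bracketed kernel timestamp and what follows it, so the same functions should also work on lines that still carry the "Feb  4 09:28:31 thoregon kernel:" syslog prefix.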
Comment 4 Jérôme Glisse 2012-02-06 22:11:19 UTC
How do you trigger the lockup?
Comment 5 Torsten Kaiser 2012-02-07 06:59:30 UTC
Not completely sure about that. I wait until the screensaver kicks in (or better: let KDE's powerdevil switch the monitor off, I do not have a screensaver program running) and then let the system idle for 10 to 20 minutes.

The cause of the lockup has now been fixed: https://bugs.freedesktop.org/show_bug.cgi?id=45329

But I was still seeing the regression that X fails to recover in 3.3-rc2.

Until 3.3-rc1 X always recovered from these lockups; I didn't even notice they were happening. The earliest of these lockups I found in my logs was under 3.1, but the trigger that caused them to happen was not the kernel upgrade to 3.1, but an upgrade of xf86-video-ati from 6.14.2 to 6.14.3.
Comment 6 Rafael J. Wysocki 2012-02-23 22:32:02 UTC
Handled-By : Jérôme Glisse <glisse@freedesktop.org>
Comment 7 Jérôme Glisse 2012-02-24 02:53:43 UTC
You no longer have those lockups? The fix in the ddx might explain why the kernel was no longer able to recover from the lockup. Sadly, userspace changes can affect how well the kernel succeeds at things like lockup recovery.
Comment 8 Torsten Kaiser 2012-02-24 08:42:55 UTC
I think you're not getting away with blaming userspace. ;-)

But this issue is rather complicated, because there is more than one bug / change involved.

To summarize the issues:

* a change in xf86-video-ati-6.14.2 -> 6.14.3: That was the initial trigger for the GPU lockup messages on my system. While this change was partly buggy (this has now been fixed, but I think that fix is not released yet), it was merely a trigger for a kernel bug.
"Proof" that 6.14.3 is to blame for this:
6.14.2 + kernel 3.1 -> no GPU lockup messages
6.14.3 + kernel 3.1 -> first GPU lockup messages
also downgrading to 6.14.2 no longer showed this with later kernels
"Proof" that the real bug causing these lockups was a kernel bug:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=1b61925061660009f5b8047f93c5297e04541273
-> with this change 6.14.3 can no longer trigger GPU lockups

* the kernel bug causing GPU lockups -> wrong DESKTOP_HEIGHT setup.
That was probably always triggerable from userspace, but only the changes in 6.14.3 made this bug visible.
This is fixed with above commit 1b61925061660009f5b8047f93c5297e04541273
This bug is not the 3.3-rcX regression tracked here, as I had been seeing it since 3.1.

* first regression in 3.3-rc1: mutex deadlock that you have already fixed.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=9fc04b503df9a34ec1a691225445c5b7dfd022e7

* a second, still open regression in 3.3-rc1 that had been masked by the first regression: Even with the mutex fix applied to the kernel (i.e. 3.3-rc2) X was still failing to recover from the GPU lockups. See comment #3
This is the issue for which I would still consider this bug (42678) to be open.

And I think this is a kernel regression and not a userspace issue because:
* 6.14.3 (with the GPU lockup trigger) and 3.2 (with the GPU lockup bug) will cause the GPU lockup messages in dmesg, but I did not even notice this was happening at all, because X was always able to recover without noticeable effects.
* the same userspace (6.14.3 with the trigger) and 3.3-rc2 (still with the GPU lockup bug, but without the mutex deadlock) will trigger the GPU lockup messages in dmesg, but X will be stuck in the kernel and fail to turn my monitors back on.
So I think the stuck X process is caused by the kernel changes between 3.2 and 3.3-rc2.

Since 3.3-rc3 X did not get stuck again, but this is because the underlying kernel GPU lockup bug has been fixed, so there never was a need to recover and any recovery bug could no longer be triggered.

Does this description of the issues involved make sense to you? Please ask if I was unclear or messed up my explanation.
Comment 9 Adrien Nader 2012-03-29 22:07:17 UTC
I think I'm hitting the same issue but I'm reproducing it very easily. Basically, I start the computer, KDM, KDE, and boom.

  http://notk.org/~adrien/heat_issue/lockup/dmesg

As for X, I'm using xf86-video-ati 6.14.4 which came out today, and libdrm 2.4.33, along with xorg-server 1.12.0.

I'll try with a 3.2.12 kernel tomorrow instead of my 3.3.0+ (middle of the merge window for 3.4). I'm fairly motivated to get this sorted out.
Comment 10 Jérôme Glisse 2012-03-29 22:57:06 UTC
Adrien, the question is: is Xorg stuck inside the kernel? Fixing the root cause of a GPU lockup is a different matter (basically you have to go through several GB of data and there are no tools to do that; the only tool you can make is one that helps you shrink the amount of data you have to analyze).

Torsten, is 3.4 still affected for you?
Comment 11 Torsten Kaiser 2012-04-03 19:38:37 UTC
> Is 3.4 still affected?

I don't know, but I suspect it.
Because since the fix of my underlying GPU hang in 3.3-rc3 there hasn't been a need for a recovery or a chance to hang X again.

As I tried to explain in comment #8, there were 3 kernel bugs involved for me.
1.: a GPU lockup that happened since 3.1 and was fixed in 3.3-rc3
2.: a regression 3.2 -> 3.3-rc1 in the mutex locking, fixed in 3.3-rc2
3.: a regression 3.2 -> 3.3-rc1 that prevents X from recovering from the GPU lockup.

3. was visible in 3.3-rc2 (see comment #3) as 2. was already fixed, but 1. was still happening.
But after 3.3-rc3 1. has been fixed, so 3. no longer triggers for me, but I suspect that the GPU lockup recovery is still broken, because I did not see any patch that claimed to fix it.
Comment 12 Jérôme Glisse 2012-04-03 21:24:00 UTC
Well, other things might have fixed it. I will force a lockup, but code inspection never led me to the issue.
Comment 13 Adrien Nader 2012-04-04 12:43:51 UTC
My previous comment was missing some bits because of my lack of sleep.

My first issue was that, on a new laptop which has two AMD cards (one integrated, one discrete), both cards are enabled and running even though it's the integrated one which is actually used. That makes the laptop use a lot more power than it should and it gets very hot.

I finally found out that I could save a lot of power by using vga_switcheroo to switch to the dedicated card and then to the integrated one. That saves almost 45W of power consumption.

However, when I start X, even with only twm, I get MANY MANY MANY lockups as soon as X is starting. At some point, X seems unable to recover. After something like maybe 20 lockups...

Maybe X could manage better, but the current rate of lockups is one every 10 seconds, and that's with each lockup taking 10 seconds before a reset.

Maybe this should go in another bug report, however.


At Jérôme's request, I've exported my dmesg in order to check that X was stuck inside the kernel. It contains my dmesg output with several executions of "echo t > /proc/sysrq-trigger".

  http://notk.org/~adrien/heat_issue/lockup/dmesg_sysrq_t_lockup

I've tried on 3.2.13, 3.3.0 and 3.4-rc1(+), and I've had issues with all of these. I'm running the latest individual tarballs of X (as of 5 days ago), along with the latest libdrm, and a mesa git.

PS: I started my laptop a bit after starting to write this. I've just reached 53 lockups, a bit more than 10 minutes after issuing "startx" (and X is still recovering; oh, it seems it has stopped recovering after 56 lockups).
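
Counting lockups by hand gets tedious at this rate. Here is a minimal Python sketch that counts the "GPU lockup CP stall" messages in a saved dmesg and reports the time between them, assuming the log keeps the bracketed kernel timestamps used throughout this report. The function names are made up for this illustration.

```python
import re

# Only the "CP stall" line is counted, so the follow-up
# "GPU lockup (waiting for ...)" line does not inflate the total.
LOCKUP_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+radeon \S+ GPU lockup CP stall")


def lockup_times(lines):
    """Return the kernel timestamps (seconds since boot) of each lockup."""
    times = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


def intervals(times):
    """Return the gaps in seconds between consecutive lockups."""
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]


# Sample lines in the dmesg format quoted in this report.
SAMPLE = [
    "[ 3220.802169] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec",
    "[ 3220.802183] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000002fb90 last fence id 0x000000000002fb80)",
    "[ 3233.257577] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec",
    "[ 3243.743621] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec",
]

if __name__ == "__main__":
    times = lockup_times(SAMPLE)
    print(len(times), "lockups, gaps in seconds:", intervals(times))
```

Gaps close to the 10-second stall timeout would suggest the GPU locks up again almost immediately after each reset.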
Comment 14 Yvon TANGUY 2012-05-03 11:17:41 UTC
I'm affected by the same problem. I've logged a bug report on Launchpad (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/986524). Don't know if it will help.
Comment 15 Bart Verwilst 2012-05-04 08:10:21 UTC
I can reproduce the lockups easily by switching from my 2 monitors in default mode to dual screen mode. Lockups start happening right away. Worked flawlessly with Ubuntu 11.10's kernel/radeon driver. If there's anything I can do to help debug this?
Comment 16 Alex Deucher 2012-05-04 12:50:55 UTC
(In reply to comment #15)
> I can reproduce the lockups easily by switching from my 2 monitors in default
> mode to dual screen mode. Lockups start happening right away. Worked flawlessly
> with Ubuntu 11.10's kernel/radeon driver. If there's anything I can do to help
> debug this?

Can you track down the problematic component (kernel, ddx, mesa, etc.) and bisect?
Comment 17 Bart Verwilst 2012-05-04 13:04:11 UTC
I finally managed to switch to dual screen mode without hangs. But while using the desktop, I have frequent hangs -> black screen -> restore loops. Using kernel 3.4.0-rc4 provided by the Launchpad bug report in comment #14.

If you can give me some pointers, I'll do my best to get some more info!
Comment 18 Michel Dänzer 2012-05-04 13:08:18 UTC
(In reply to comment #15)
> I can reproduce the lockups easily by switching from my 2 monitors in default
> mode to dual screen mode. Lockups start happening right away.

Note that this bug report isn't about lockups per se but about the inability to recover from a lockup. You should probably look for another bug report about monitor switching causing lockups, or file your own.
Comment 19 Jérôme Glisse 2012-05-04 14:21:53 UTC
Please, one person, one bug report; we will mark the appropriate bugs as duplicates.
Comment 20 Alexandre Demers 2012-05-04 17:27:14 UTC
As a side note, it could be related to Bug 45018. For me, it all started at the same time. Since then, it happens a lot less with the latest drm, ddx, mesa, kernel and X server, but it still happens randomly from time to time. As I was saying, it may or may not be related, but since everything started at the same time as Bug 45018 and using a 3.2 kernel fixes most of what I see, I think there is a similar root to all this. I can usually reproduce the lockup followed by a stuck X by playing a movie (this is the easiest way I've found, along with what I reported in bug 45018). It often locks up after nearly 40 minutes of video. Within a few seconds, the image skips, turns greenish in some parts, and BAM! It locks up, resets and hangs with X unable to get back on its feet. Lately, on some occasions, it was able to get back into X, but everything related to 3D is then dead.

I may have missed it, but which video card/chipset is Maciej using? Radeon 6950 over here.
Comment 21 Laurent Bonnaud 2013-05-01 17:50:14 UTC
This bug is still present in Ubuntu raring with this kernel package:

Package: linux-image-3.8.0-19-generic
Version: 3.8.0-19.29

which is based on kernel 3.8.8.

The GPU hung a few times and the kernel was able to recover. But later the kernel was caught in an infinite loop of GPU hangs and I was not able to take back control of the X server, so I lost unsaved work in my X session. Between 2 hangs I was able to switch to a VT and run dmesg, so here is the end of the kernel log:

[73670.536108] radeon 0000:01:00.0: GPU lockup CP stall for more than 10168msec
[73670.536200] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000c7f78)
[73670.536203] radeon 0000:01:00.0: failed to get a new IB (-35)
[73670.536255] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[73670.537377] radeon 0000:01:00.0: Saved 1017 dwords of commands on ring 0.
[73670.537380] radeon 0000:01:00.0: GPU softreset: 0x00000003
[73670.691016] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA27034E0
[73670.691018] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000103
[73670.691021] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200200C0
[73670.691023] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
[73670.691025] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00001002
[73670.691027] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00028C86
[73670.691030] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x808386C5
[73670.691032] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[73670.705914] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
[73670.720796] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
[73670.720799] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
[73670.720801] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200280C0
[73670.720803] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[73670.720805] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[73670.720808] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[73670.720810] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
[73670.731190] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[73670.748806] [drm] probing gen 2 caps for device 8086:2a41 = 1/0
[73670.920570] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[73670.920609] radeon 0000:01:00.0: WB enabled
[73670.920612] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0xffff8801347e0c00
[73670.920614] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000010000c0c and cpu addr 0xffff8801347e0c0c
[73671.118349] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[73671.118480] [drm:r600_resume] *ERROR* r600 startup failed on resume
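
The logs in this report show two different endings after a soft reset: the ring test succeeds (as in comment 3), or it fails and the resume is aborted (as above). A minimal Python sketch to tell them apart in a saved log follows. The function name `reset_outcomes` and the outcome labels are made up for this illustration, and the message patterns are taken only from the logs quoted in this report, so other kernel versions may word them differently.

```python
import re

FAILED_RE = re.compile(r"ring \d+ test failed|startup failed on resume")
RESUMED_RE = re.compile(r"ring test on \d+ succeeded")


def reset_outcomes(lines):
    """Return one label per "GPU softreset" line: resumed, resume failed or unknown."""
    outcomes = []
    for line in lines:
        if "GPU softreset" in line:
            # Each soft reset starts a new record with no verdict yet.
            outcomes.append("unknown")
        elif not outcomes:
            continue  # ignore anything logged before the first reset
        elif FAILED_RE.search(line):
            outcomes[-1] = "resume failed"
        elif RESUMED_RE.search(line) and outcomes[-1] == "unknown":
            outcomes[-1] = "resumed"
    return outcomes


# Lines copied from the log above.
SAMPLE = [
    "[73670.537380] radeon 0000:01:00.0: GPU softreset: 0x00000003",
    "[73670.731190] radeon 0000:01:00.0: GPU reset succeeded, trying to resume",
    "[73671.118349] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)",
    "[73671.118480] [drm:r600_resume] *ERROR* r600 startup failed on resume",
]

if __name__ == "__main__":
    print(reset_outcomes(SAMPLE))
```

Note that "resumed" only means the ring test passed after the reset. As comment 3 shows, X can still be stuck in the kernel afterwards, so this does not prove that the session recovered.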
Comment 22 Ľubomír Mlích 2013-07-12 20:13:30 UTC
Hi, 

my log shows:
radeon: failed testing IB on GFX ring (-35).

instead of:
radeon 0000:01:00.0: failed to get a new IB (-35)

on:
Linux black 3.8.0-26-generic #38-Ubuntu SMP Mon Jun 17 21:43:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

perhaps it helps. 

Jul  8 20:44:11 black kernel: [ 3220.802169] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Jul  8 20:44:11 black kernel: [ 3220.802183] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000002fb90 last fence id 0x000000000002fb80)
Jul  8 20:44:11 black kernel: [ 3220.802192] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
Jul  8 20:44:11 black kernel: [ 3220.802202] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
Jul  8 20:44:11 black kernel: [ 3220.802208] radeon 0000:01:00.0: ib ring test failed (-35).
Jul  8 20:44:11 black kernel: [ 3220.818242] radeon 0000:01:00.0: GPU softreset: 0x00000003
Jul  8 20:44:11 black kernel: [ 3220.818434] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003028
Jul  8 20:44:11 black kernel: [ 3220.818441] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
Jul  8 20:44:11 black kernel: [ 3220.818448] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200048C0
Jul  8 20:44:11 black kernel: [ 3220.818454] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jul  8 20:44:11 black kernel: [ 3220.818460] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Jul  8 20:44:11 black kernel: [ 3220.818466] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00005086
Jul  8 20:44:11 black kernel: [ 3220.818472] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80098647
Jul  8 20:44:11 black kernel: [ 3220.818478] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Jul  8 20:44:11 black kernel: [ 3220.833358] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jul  8 20:44:11 black kernel: [ 3220.848230] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0x00003028
Jul  8 20:44:11 black kernel: [ 3220.848237] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
Jul  8 20:44:11 black kernel: [ 3220.848243] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200040C0
Jul  8 20:44:11 black kernel: [ 3220.848250] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jul  8 20:44:11 black kernel: [ 3220.848256] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jul  8 20:44:11 black kernel: [ 3220.848262] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jul  8 20:44:11 black kernel: [ 3220.848268] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Jul  8 20:44:11 black kernel: [ 3220.850258] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jul  8 20:44:11 black kernel: [ 3220.852076] [drm] probing gen 2 caps for device 1022:9603 = 2/0
Jul  8 20:44:11 black kernel: [ 3220.852080] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jul  8 20:44:11 black kernel: [ 3220.855831] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jul  8 20:44:11 black kernel: [ 3220.855944] radeon 0000:01:00.0: WB enabled
Jul  8 20:44:11 black kernel: [ 3220.855954] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8801187bac00
Jul  8 20:44:11 black kernel: [ 3220.855962] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8801187bac0c
Jul  8 20:44:11 black kernel: [ 3220.902019] [drm] ring test on 0 succeeded in 1 usecs
Jul  8 20:44:11 black kernel: [ 3220.902091] [drm] ring test on 3 succeeded in 1 usecs
Jul  8 20:44:11 black kernel: [ 3220.902136] [drm] ib test on ring 0 succeeded in 0 usecs
Jul  8 20:44:11 black kernel: [ 3220.902167] [drm] ib test on ring 3 succeeded in 1 usecs
Jul  8 20:44:24 black kernel: [ 3233.257577] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Jul  8 20:44:24 black kernel: [ 3233.257592] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000002fbc8 last fence id 0x000000000002fb9c)
Jul  8 20:44:24 black kernel: [ 3233.273631] radeon 0000:01:00.0: Saved 1545 dwords of commands on ring 0.
Jul  8 20:44:24 black kernel: [ 3233.273647] radeon 0000:01:00.0: GPU softreset: 0x00000007
Jul  8 20:44:24 black kernel: [ 3233.281226] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003028
Jul  8 20:44:24 black kernel: [ 3233.281234] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
Jul  8 20:44:24 black kernel: [ 3233.281241] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200400C0
Jul  8 20:44:24 black kernel: [ 3233.281247] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jul  8 20:44:24 black kernel: [ 3233.281253] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Jul  8 20:44:24 black kernel: [ 3233.281260] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000086
Jul  8 20:44:24 black kernel: [ 3233.281267] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80018647
Jul  8 20:44:24 black kernel: [ 3233.281273] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
Jul  8 20:44:24 black kernel: [ 3233.296145] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Jul  8 20:44:24 black kernel: [ 3233.311017] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0x00003028
Jul  8 20:44:24 black kernel: [ 3233.311024] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000002
Jul  8 20:44:24 black kernel: [ 3233.311030] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200400C0
Jul  8 20:44:24 black kernel: [ 3233.311036] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Jul  8 20:44:24 black kernel: [ 3233.311042] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Jul  8 20:44:24 black kernel: [ 3233.311049] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Jul  8 20:44:24 black kernel: [ 3233.311055] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Jul  8 20:44:24 black kernel: [ 3233.311062] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x46483146
Jul  8 20:44:24 black kernel: [ 3233.311121] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Jul  8 20:44:24 black kernel: [ 3233.313109] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jul  8 20:44:24 black kernel: [ 3233.315390] [drm] probing gen 2 caps for device 1022:9603 = 2/0
Jul  8 20:44:24 black kernel: [ 3233.315400] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
Jul  8 20:44:24 black kernel: [ 3233.319041] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Jul  8 20:44:24 black kernel: [ 3233.319161] radeon 0000:01:00.0: WB enabled
Jul  8 20:44:24 black kernel: [ 3233.319172] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8801187bac00
Jul  8 20:44:24 black kernel: [ 3233.319179] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8801187bac0c
Jul  8 20:44:24 black kernel: [ 3233.365400] [drm] ring test on 0 succeeded in 1 usecs
Jul  8 20:44:24 black kernel: [ 3233.365464] [drm] ring test on 3 succeeded in 1 usecs
Jul  8 20:44:34 black kernel: [ 3243.743621] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec

Thanks
