Bug 78661

Summary: GPU sometimes locks up after boot and/or resume
Product: Drivers
Reporter: Nikolaus Waxweiler (madigens)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: abandonedaccountubdprczb8hs, Actualize.in.Material+bugzillakernel, alexdeucher, bjo, Dieter, zazdxscf+bugzilla.kernel.org
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.15.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  Full dmesg output for the day where the hangs occurred up to the bug report
  It happened again with radeon.dpm=0 and power_profile set to low.
  Rebuilt kernel with patch, just got a hang again :(
  Got a temporary hang again on boot-up, managed to reboot...
  New hangs on cold boot on 3.15.6

Description Nikolaus Waxweiler 2014-06-22 09:34:26 UTC
I've been getting these hangs since at least 3.13.x on Ubuntu 14.04, but I remember getting them before that, too.

It doesn't always happen; most of the time the system works as usual. Sometimes, though, I boot up or resume and, upon opening an application or moving the mouse, I get a garbled screen. After a few seconds it goes back to normal, only to hang again shortly thereafter or to hang completely. I can't remember this ever happening during normal use: if it doesn't happen shortly after boot or resume, the system works fine.

I cut some suspicious lines from syslog:

Jun 22 10:31:40 nikolaus-desktop kernel: [   86.067026] radeon 0000:01:00.0: ring 5 stalled for more than 81420msec
Jun 22 10:31:40 nikolaus-desktop kernel: [   86.067034] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000001 last fence id 0x0000000000000000 on ring 5)
Jun 22 10:31:41 nikolaus-desktop kernel: [   86.929541] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jun 22 10:31:41 nikolaus-desktop kernel: [   86.986723] [drm] PCIE gen 2 link speeds already enabled
Jun 22 10:31:41 nikolaus-desktop kernel: [   86.988854] [drm] PCIE GART of 1024M enabled (table at 0x000000000025D000).
Jun 22 10:31:41 nikolaus-desktop kernel: [   86.988930] radeon 0000:01:00.0: WB enabled
Jun 22 10:31:41 nikolaus-desktop kernel: [   86.988932] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8800d9752c00
Jun 22 10:31:41 nikolaus-desktop kernel: [   86.988933] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8800d9752c0c
Jun 22 10:31:41 nikolaus-desktop kernel: [   86.989307] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc90004f9c418
Jun 22 10:31:41 nikolaus-desktop kernel: [   87.005388] [drm] ring test on 0 succeeded in 1 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [   87.005442] [drm] ring test on 3 succeeded in 1 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [   87.200919] [drm] ring test on 5 succeeded in 1 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [   87.200923] [drm] UVD initialized successfully.
Jun 22 10:31:41 nikolaus-desktop kernel: [   87.200941] [drm] ib test on ring 0 succeeded in 0 usecs
Jun 22 10:31:41 nikolaus-desktop kernel: [   87.200959] [drm] ib test on ring 3 succeeded in 1 usecs
Jun 22 10:31:42 nikolaus-desktop kernel: [   87.370683] [drm:uvd_v1_0_ib_test] *ERROR* radeon: failed to get create msg (-22).
Jun 22 10:31:42 nikolaus-desktop kernel: [   87.370688] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-22).
Jun 22 10:31:42 nikolaus-desktop kernel: [   87.370708] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
Jun 22 10:31:52 nikolaus-desktop kernel: [   98.016760] radeon 0000:01:00.0: ring 5 stalled for more than 10000msec
Jun 22 10:31:52 nikolaus-desktop kernel: [   98.016768] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000001 last fence id 0x0000000000000000 on ring 5)
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.669107] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.725911] [drm] PCIE gen 2 link speeds already enabled
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.727900] [drm] PCIE GART of 1024M enabled (table at 0x000000000025D000).
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.727974] radeon 0000:01:00.0: WB enabled
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.727976] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8800d9752c00
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.727977] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8800d9752c0c
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.728365] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc90004f9c418
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.744428] [drm] ring test on 0 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.744483] [drm] ring test on 3 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.940008] [drm] ring test on 5 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.940012] [drm] UVD initialized successfully.
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.940031] [drm] ib test on ring 0 succeeded in 0 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [  912.940050] [drm] ib test on ring 3 succeeded in 1 usecs
Jun 22 10:45:27 nikolaus-desktop kernel: [  913.109743] [drm:uvd_v1_0_ib_test] *ERROR* radeon: failed to get create msg (-22).
Jun 22 10:45:27 nikolaus-desktop kernel: [  913.109748] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-22).
Jun 22 10:45:27 nikolaus-desktop kernel: [  913.109763] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
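
For anyone who wants to pull only the stall, lockup and *ERROR* lines out of a longer syslog, here is a minimal Python sketch. The regular expressions are derived solely from the messages quoted above; the function name summarize is made up for illustration, and other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the radeon messages quoted in this report (assumption:
# other kernel versions may phrase them differently).
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(r"GPU lockup \(.*on ring (\d+)\)")
ERROR_RE = re.compile(r"\*ERROR\* (.*)$")


def summarize(lines):
    """Return (stalls, lockups, errors) found in an iterable of log lines.

    stalls  -> list of (ring, milliseconds)
    lockups -> list of ring numbers
    errors  -> list of the text following the *ERROR* marker
    """
    stalls, lockups, errors = [], [], []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            stalls.append((int(m.group(1)), int(m.group(2))))
            continue
        m = LOCKUP_RE.search(line)
        if m:
            lockups.append(int(m.group(1)))
            continue
        m = ERROR_RE.search(line)
        if m:
            errors.append(m.group(1).strip())
    return stalls, lockups, errors


# Three lines copied from the excerpt above.
sample = [
    "Jun 22 10:31:40 nikolaus-desktop kernel: [   86.067026] radeon 0000:01:00.0: ring 5 stalled for more than 81420msec",
    "Jun 22 10:31:40 nikolaus-desktop kernel: [   86.067034] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000001 last fence id 0x0000000000000000 on ring 5)",
    "Jun 22 10:31:42 nikolaus-desktop kernel: [   87.370708] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed",
]
result = summarize(sample)
```

Feeding it the lines of a saved syslog file (for example an open file object) should work the same way, since it only needs an iterable of strings.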

What else do you need from me?
Comment 1 Nikolaus Waxweiler 2014-06-22 09:38:12 UTC
Argh, I forgot: I have a HD5870.
Comment 2 Alex Deucher 2014-06-23 14:49:53 UTC
Does booting with radeon.dpm=0 on the kernel command line in grub help?  Please attach your full dmesg output.
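
To confirm that the option actually reached the kernel after editing the grub configuration, the running command line can be read from /proc/cmdline. A minimal Python sketch follows; the helper names are hypothetical, and returning the last occurrence assumes that a later radeon.dpm entry takes precedence over an earlier one.

```python
def radeon_dpm_setting(cmdline):
    """Return the value of radeon.dpm in a kernel command line string, or None.

    Assumption: if the parameter appears more than once, the last one wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "radeon.dpm" and sep:
            value = val
    return value


def current_radeon_dpm(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only) and check it."""
    with open(path) as f:
        return radeon_dpm_setting(f.read())


# Illustrative command line, not taken from the reporter's system.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet splash radeon.dpm=0"
setting = radeon_dpm_setting(example)
```

Calling current_radeon_dpm() on the affected machine returns None when the parameter is not set at all, in which case the driver default applies.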
Comment 3 Nikolaus Waxweiler 2014-06-24 14:35:47 UTC
Created attachment 140851 [details]
Full dmesg output for the day where the hangs occurred up to the bug report
Comment 4 Nikolaus Waxweiler 2014-06-24 14:39:01 UTC
Will try radeon.dpm=0 and report back.
Comment 5 Nikolaus Waxweiler 2014-07-01 23:38:12 UTC
Created attachment 141821 [details]
It happened again with radeon.dpm=0 and power_profile set to low.

Syslog since I started using radeon.dpm=0 on the kernel command line.
Comment 6 Dieter Nützel 2014-07-02 01:06:16 UTC
(In reply to Nikolaus Waxweiler from comment #1)
> Argh, I forgot: I have a HD5870.

Try this patch:
https://bugzilla.kernel.org/attachment.cgi?id=141741&action=diff

With radeon.dpm=1 ;-)
Comment 7 Nikolaus Waxweiler 2014-07-03 15:09:02 UTC
Created attachment 142021 [details]
Rebuilt kernel with patch, just got a hang again :(
Comment 8 Nikolaus Waxweiler 2014-07-08 14:44:04 UTC
Speaking of hangs, every once in a while I get lockups on boot where the screen stays corrupted or black. Even the reset button doesn't help, and I have to long-press the power button. There's no log for those hangs, so I don't know if they're related to this bug. Is there anything I can do to further analyze these hangs?
Comment 9 Alex Deucher 2014-07-08 15:23:14 UTC
This might be related to this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=76998

Can you try this patch:
https://bugs.freedesktop.org/attachment.cgi?id=102392
Comment 10 Nikolaus Waxweiler 2014-07-08 18:40:53 UTC
Alright, both patches active. Will test and report back.
Comment 11 Nikolaus Waxweiler 2014-07-09 23:28:30 UTC
Created attachment 142621 [details]
Got a temporary hang again on boot-up, managed to reboot...
Comment 12 Alex Deucher 2014-07-10 12:59:31 UTC
Make sure you power off completely before testing the new patch, rather than just doing a warm reboot, so that the old register state is not retained.
Comment 13 Nikolaus Waxweiler 2014-07-13 14:51:37 UTC
Okay :)

After a few days of use, I got my first corrupted screen this morning. I could still REISUB, though. Unfortunately, there is nothing in the logs... will keep testing.
Comment 14 Nikolaus Waxweiler 2014-07-21 04:25:21 UTC
Created attachment 143581 [details]
New hangs on cold boot on 3.15.6

Alright, the lockups were few and far between on 3.15.5 with both patches. Now, with 3.15.6 (installed from Ubuntu's mainline kernel repo), I always get a short hang on the first boot in the morning. The boot continues, but I have to reboot once to avoid further short hangs and work normally. Log from the first cold boot attached.
Comment 15 Nikolaus Waxweiler 2014-08-19 16:47:54 UTC
The number of hangs has increased since 3.16 :(
Comment 16 Bjoern Franke 2015-03-25 18:05:09 UTC
I got lockups with 3.19.2 and 3.14.36:

[  334.649402] radeon 0000:02:00.0: ring 5 stalled for more than 10000msec
[  334.649408] radeon 0000:02:00.0: GPU lockup (current fence id 0x0000000000000002 last fence id 0x0000000000000004 on ring 5)
[  334.649520] [drm:uvd_v1_0_ib_test [radeon]] *ERROR* radeon: fence wait failed (-35).
[  334.649563] [drm:radeon_ib_ring_tests [radeon]] *ERROR* radeon: failed testing IB on ring 5 (-35).
[  334.832010] [drm:rv770_dpm_set_power_state [radeon]] *ERROR* rv770_set_sw_state failed