Bug 117171

Summary: Radeon GPU lockup in X windows after resuming from blank screen
Product: Drivers
Reporter: Erik Brangs (erik.brangs)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.6 rc5 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-rc5-wily/
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output acquired from frozen system with GPU lockup
Xorg log
glxinfo from working system

Description Erik Brangs 2016-04-25 13:01:04 UTC
Created attachment 213991
dmesg output acquired from frozen system with GPU lockup

I'm using Ubuntu 16.04 with the 4.6 rc5 kernel build from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-rc5-wily/ . AFAIK this is a mainline kernel with an Ubuntu configuration.

Sometimes, when I'm using X windows and resuming from a blank screen, X windows freezes: the previous screen content is properly restored, but X windows locks up and stops processing input. The system isn't completely frozen, though: I can still SSH into it. In this case, the dmesg log doesn't have any message about the lockup at the end, and the end of the Xorg log only has entries about mieq overflowing. This behaviour has been present for as long as I have actually tried to get some debugging output.

At other times, I can still attempt to switch to the terminal. In that case, the system locks up after the X windows screen content has been removed but before the terminal is shown, i.e. the screen shows a single colour. When this happens, the tail of the dmesg output contains some lines about a GPU lockup:

[ 9553.364081] radeon 0000:01:00.0: ring 0 stalled for more than 10440msec
[ 9553.364093] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000009b9e last fence id 0x0000000000009b9f on ring 0)
[ 9553.368211] radeon 0000:01:00.0: Saved 215675 dwords of commands on ring 0.
[ 9553.369258] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 9553.369272] radeon 0000:01:00.0: ffff880035298800 unpin not necessary
[ 9553.385589] [drm] radeon: 1 quad pipes, 1 z pipes initialized.
[ 9553.386659] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[ 9553.386666] radeon 0000:01:00.0: WB enabled
[ 9553.386672] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000008000000 and cpu addr 0xffff880034dee000
[ 9553.386721] [drm] radeon: ring at 0x0000000008001000
[ 9553.941187] [drm:r100_ring_test [radeon]] *ERROR* radeon: ring test failed (scratch(0x15E8)=0xCAFEDEAD)
[ 9553.941220] [drm:r100_cp_init [radeon]] *ERROR* radeon: cp isn't working (-22).
[ 9553.941224] radeon 0000:01:00.0: failed initializing CP (-22).
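
For anyone comparing logs from similar freezes, here is a minimal Python sketch that pulls the key values out of dmesg text shaped like the lines above: the stall duration, the fence ids, and the radeon *ERROR* messages. The function name and the regular expressions are my own illustration and are based only on the message formats quoted in this report; other kernel versions may word these messages differently.

```python
import re

# Patterns derived from the dmesg lines quoted in this report (assumption:
# other kernel versions may format these messages differently).
STALL_RE = re.compile(r"radeon (\S+): ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)
ERROR_RE = re.compile(r"\*ERROR\* radeon: (.+)$")


def summarize_radeon_lockup(dmesg_text):
    """Collect ring stalls, GPU lockups and radeon errors from dmesg text."""
    summary = {"stalls": [], "lockups": [], "errors": []}
    for line in dmesg_text.splitlines():
        m = STALL_RE.search(line)
        if m:
            summary["stalls"].append(
                {"device": m.group(1), "ring": int(m.group(2)), "msec": int(m.group(3))}
            )
            continue
        m = LOCKUP_RE.search(line)
        if m:
            summary["lockups"].append(
                {
                    "current_fence": int(m.group(1), 16),
                    "last_fence": int(m.group(2), 16),
                    "ring": int(m.group(3)),
                }
            )
            continue
        m = ERROR_RE.search(line)
        if m:
            summary["errors"].append(m.group(1).strip())
    return summary
```

The function takes the log as a string, so it can be fed the contents of a saved dmesg file (for example one captured over SSH while X is frozen) and returns a dictionary with the keys "stalls", "lockups" and "errors".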

I don't think that this is a regression because I have seen freezes like this for a long time (since before the 4.x series). The basic problem (i.e. lockup of X windows) hasn't changed in that time. The frequency of occurrence and the timing of the lockup seem to vary based on the kernel version.
Comment 1 Erik Brangs 2016-04-25 13:02:33 UTC
Created attachment 214001
Xorg log
Comment 2 Erik Brangs 2016-04-25 13:03:12 UTC
Created attachment 214011
glxinfo from working system
Comment 3 Erik Brangs 2016-04-29 14:58:41 UTC
The problem might be related to DPMS: I haven't seen any freezes after deactivating DPMS in xscreensaver.

When I use DPMS with "Standby" in xscreensaver and set the times for "Suspend" and "Off" to a high value so that they won't be triggered, the problem still occurs.
Comment 4 Erik Brangs 2016-06-06 06:36:27 UTC
I think that this was caused by failing hardware.