Bug 65911

Summary: radeon: garbled output/only noise through HDMI and GPU lockups
Product: Drivers
Reporter: tomka (tom)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW ---
Severity: normal
CC: alexdeucher, szg00000
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.12.1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg 3.12.1 with radeon.audio=1
dmesg 3.12.1 with radeon.audio=0
Xorg log file
avivotool regs hdmi with radeon.audio=0
avivotool regs hdmi with radeon.audio=1
Xorg log file

Description tomka 2013-11-27 04:00:14 UTC
Created attachment 116301 [details]
dmesg 3.12.1 with radeon.audio=1

Hi, I am not able to start X over HDMI on a computer with an ATI Radeon HD 7540D graphics card running the Linux 3.12.1 kernel. I also tried 3.10.10, which didn't change anything. I can see my TTY without any problems, though. Starting X, however, gives me a completely garbled screen without any structure or information; it is basically color noise. Here and there is a black or blue block, but in general it is noise.

Looking at the logs, I can see that there are GPU lockups, after which X resets it. This happens independently of whether I set radeon.audio=1 or 0. I've attached dmesg output for both settings.

After a couple of GPU lockups the kernel will eventually panic. This is the top of the callstack:

Call Trace:
 <IRQ>
 ? __wake_up
 drm_send_vblank_event [drm]
 radeon_crtc_handle_flip [radeon]
 evergreen_irq_process [radeon]

I've seen issues #60709 and #60687, but the fix posted there is already in my kernel and apparently doesn't fix my issue. I also tried something suggested in #60687: hdmi regset 0x12c44 0x00000033

With radeon.audio=0, before I try to start X:
OLD: 0x12c44 (12c44)    0x00000033 (51)
NEW: 0x12c44 (12c44)    0x00000033 (51)

With radeon.audio=0, after I have started X:
OLD: 0x12c44 (12c44)    0x00000000 (0)
NEW: 0x12c44 (12c44)    0x00000033 (51)

With radeon.audio=1, before I try to start X:
OLD: 0x12c44 (12c44)    0x00000000 (0)
NEW: 0x12c44 (12c44)    0x00000033 (51)

With radeon.audio=1, after I have started X:
OLD: 0x12c44 (12c44)    0x00000033 (51)
NEW: 0x12c44 (12c44)    0x00000033 (51)

All this, however, didn't change anything.
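
To make the four readings above easier to compare, here is a minimal Python sketch that parses the pasted OLD/NEW lines and reports whether the write actually changed the register. It is not part of avivotool; the names `parse_regset` and `write_changed_register` are hypothetical, and the line format is assumed to be exactly what is pasted above.

```python
import re

# Matches lines such as "OLD: 0x12c44 (12c44)    0x00000033 (51)".
# The format is assumed from the output pasted in this report.
LINE_RE = re.compile(
    r"^(OLD|NEW):\s+0x([0-9a-fA-F]+)\s+\([0-9a-fA-F]+\)"
    r"\s+0x([0-9a-fA-F]+)\s+\((\d+)\)$"
)


def parse_regset(text):
    """Return {'OLD': (register, value), 'NEW': (register, value)}."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            label, reg, value, _decimal = match.groups()
            result[label] = (int(reg, 16), int(value, 16))
    return result


def write_changed_register(text):
    """True if the NEW value differs from the OLD value."""
    parsed = parse_regset(text)
    return parsed["OLD"][1] != parsed["NEW"][1]


# The four readings from this report.
SAMPLES = {
    "audio=0, before X": "OLD: 0x12c44 (12c44)    0x00000033 (51)\n"
                         "NEW: 0x12c44 (12c44)    0x00000033 (51)",
    "audio=0, after X": "OLD: 0x12c44 (12c44)    0x00000000 (0)\n"
                        "NEW: 0x12c44 (12c44)    0x00000033 (51)",
    "audio=1, before X": "OLD: 0x12c44 (12c44)    0x00000000 (0)\n"
                         "NEW: 0x12c44 (12c44)    0x00000033 (51)",
    "audio=1, after X": "OLD: 0x12c44 (12c44)    0x00000033 (51)\n"
                        "NEW: 0x12c44 (12c44)    0x00000033 (51)",
}

for name, text in SAMPLES.items():
    state = "changed" if write_changed_register(text) else "unchanged"
    print(f"{name}: {state}")
```

In two of the four cases (audio=0 after starting X, audio=1 before starting X) the register held 0 and the write changed it to 0x33; in the other two it already held 0x33.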

If radeon.audio=1 is set, I can play audio over HDMI until there is a GPU lockup, and again after the GPU reset.

I've also tried radeon options dpm and no_wb without any success. It also didn't help to add 'Option "AccelMethod" "EXA"' to the radeon driver/device section in /etc/X11/xorg.conf.d/20-radeon.conf.

Are there more things I could test or provide?
Comment 1 tomka 2013-11-27 04:00:50 UTC
Created attachment 116311 [details]
dmesg 3.12.1 with radeon.audio=0
Comment 2 tomka 2013-11-27 04:01:07 UTC
Created attachment 116321 [details]
Xorg log file
Comment 3 tomka 2013-11-27 04:01:34 UTC
Created attachment 116331 [details]
avivotool regs hdmi with radeon.audio=0
Comment 4 tomka 2013-11-27 04:01:47 UTC
Created attachment 116341 [details]
avivotool regs hdmi with radeon.audio=1
Comment 5 tomka 2013-11-27 04:03:00 UTC
Created attachment 116351 [details]
Xorg log file

Added the Xorg log as a patch by accident before.
Comment 6 Alex Deucher 2013-11-27 05:37:29 UTC
Is this a regression?  If so when was it last working properly?  I don't think it has anything to do with audio.  I'd suggest trying a different userspace driver stack.  Generally GPU resets are caused by a bad combination of commands sent by the mesa 3D driver (r600g) or Xorg ddx (xf86-video-ati).
Comment 7 tomka 2013-11-27 14:41:11 UTC
I cannot tell if it is a regression since I never used X on this machine before. However, I tested with the 3.10.10 kernel which didn't make any difference. Being on arch linux, I tried xf86-video-ati-git and mesa-git, both also in different versions. All without effect. And isn't my GPU (HD7540D) rather part of the Northern Islands family than of the r600g family?
Comment 8 Alex Deucher 2013-11-27 14:58:05 UTC
(In reply to tomka from comment #7)
> I cannot tell if it is a regression since I never used X on this machine
> before. However, I tested with the 3.10.10 kernel which didn't make any
> difference. Being on arch linux, I tried xf86-video-ati-git and mesa-git,
> both also in different versions. All without effect. And isn't my GPU
> (HD7540D) rather part of the Northern Islands family than of the r600g
> family?

All r6xx-NI parts use the same 3D driver.  The hw has a similar programming interface and ISA for all the included families.
Comment 9 tomka 2013-11-28 05:34:25 UTC
Thanks for the clarification. I just tested different versions of the userspace tools, unfortunately without any success. For every component tested, I made sure the others were used in their recent stable version.


libdrm-git with the current master and

c6d73cfeeaff9596c735d0a10b248f94b2e1e347
Tue Jul 2 09:24:53 2013 +0100

040f6b015ef7d9c1bda09f78a8873f6da45d5e95 (first this year)
Thu May 9 12:55:42 2013 +1000

2089a0080edb1b42449ee9a97f2cef7399c16d53
Mon Nov 5 22:21:42 2012 +0000


xf86-video-ati-git with current master and

67fb82a3f0759b171fea21b475a70fa825693570
Tue Oct 1 09:35:30 2013 -0400

fdb7563a5cbc736b09c2864b67a93b475c98b2bd
Thu Jan 24 21:17:11 2013 -0500

4e35b2f530e2ca8c7b7220cacd05c661de43d20d
Thu Jan 10 12:10:52 2013 +0100

60cd6ceaf44b506433aebf6b3a639a17604dfddd
Wed Nov 21 18:42:56 2012


mesa-git (and thereby mesa-libgl-git and ati-dri-git) with current master and

e556286802811b4b99c692d1ff5197f8ee1f011b
Wed Mar 20 11:54:33 2013 -0700

5ffa28df4e4cc22481b4ed41c78632f35765f41d
Wed Jul 31 15:18:52 2013 +0200


For mesa-git, I couldn't easily go further back in history, because Bison 3 wasn't supported before. As I said, no change fixed the GPU lockups or even changed the error itself. During the tests, however, I noticed slightly more detail in the dmesg log that I hadn't seen before:

kernel: radeon 0000:00:01.0: GPU lockup CP stall f...ec
kernel: radeon 0000:00:01.0: GPU lockup (waiting f...2)
kernel: [drm:r600_ib_test] *ERROR* radeon: fence w...).
kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon:...).
kernel: radeon 0000:00:01.0: ib ring test failed (-35).

Do you have any suggestions on what I could try next?
Comment 10 tomka 2013-11-29 06:14:50 UTC
Oops, I figured the last dmesg log lines were truncated. Here they are again:

[  170.607701] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[  170.607711] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000000007 last fence id 0x0000000000000002)
[  170.607717] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[  170.607723] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[  170.607727] radeon 0000:00:01.0: ib ring test failed (-35).
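
For reference, a small Python sketch that pulls the key numbers out of the lines above: the stall timeout, the fence the driver was waiting for, the last fence that signalled, and the error code. The function name `summarize` is hypothetical and the regular expressions are only assumed to fit the message wording shown here. On Linux, errno 35 is EDEADLK ("Resource deadlock would occur").

```python
import re

# Patterns assumed from the exact wording of the dmesg lines in this report.
LOCKUP_RE = re.compile(r"GPU lockup CP stall for more than (\d+)msec")
FENCE_RE = re.compile(r"waiting for 0x([0-9a-f]+) last fence id 0x([0-9a-f]+)")
ERRNO_RE = re.compile(r"\((-\d+)\)\.?\s*$")

DMESG = """\
[  170.607701] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[  170.607711] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000000007 last fence id 0x0000000000000002)
[  170.607717] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[  170.607723] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[  170.607727] radeon 0000:00:01.0: ib ring test failed (-35).
"""


def summarize(log_text):
    """Collect stall time, fence ids and error codes from radeon lockup lines."""
    summary = {
        "stall_msec": None,
        "waiting_for": None,
        "last_fence": None,
        "error_codes": set(),
    }
    for line in log_text.splitlines():
        lockup = LOCKUP_RE.search(line)
        if lockup:
            summary["stall_msec"] = int(lockup.group(1))
        fence = FENCE_RE.search(line)
        if fence:
            summary["waiting_for"] = int(fence.group(1), 16)
            summary["last_fence"] = int(fence.group(2), 16)
        code = ERRNO_RE.search(line)
        if code:
            # Negative kernel error code, e.g. -35 (-EDEADLK on Linux).
            summary["error_codes"].add(int(code.group(1)))
    return summary


print(summarize(DMESG))
```

Read this way, the log says the driver waited 10 seconds for fence 7 while the last fence to signal was 2, and the IB test on the GFX ring then failed with -35 after the reset.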
Comment 11 tomka 2013-11-29 23:55:52 UTC
In another bug report (https://bugzilla.redhat.com/show_bug.cgi?id=990986) it was suggested to disable glx-tls during build configuration. So I replaced the line "--enable-glx-tls" with "--disable-glx-tls". However, the GPU lock ups still happen and I can't start X.

Then I tried setting several environment variables, as suggested in yet another bug report (https://bugs.freedesktop.org/show_bug.cgi?id=69728): none of R600_DEBUG=nohyperz, R600_DEBUG=nodma, R600_DEBUG=nosb or R600_LLVM=0 helped, neither alone nor combined.
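
To make "alone and combined" concrete, here is a rough Python sketch that enumerates the environment settings covered by such a test. The helper name `environment_variants` is hypothetical, and it assumes that several R600_DEBUG flags can be given as one comma-separated value; treat that as an assumption rather than documented behaviour.

```python
from itertools import combinations

# The R600_DEBUG flags named in this comment.
R600_DEBUG_FLAGS = ("nohyperz", "nodma", "nosb")


def environment_variants(flags=R600_DEBUG_FLAGS):
    """Yield one dict of environment overrides per combination to try."""
    for llvm in (None, "0"):  # R600_LLVM unset, then R600_LLVM=0
        for size in range(len(flags) + 1):
            for combo in combinations(flags, size):
                env = {}
                if combo:
                    # Assumption: multiple flags are joined with commas.
                    env["R600_DEBUG"] = ",".join(combo)
                if llvm is not None:
                    env["R600_LLVM"] = llvm
                if env:  # skip the case where nothing is overridden
                    yield env


variants = list(environment_variants())
for env in variants:
    print(" ".join(f"{key}={value}" for key, value in env.items()))
```

Each dict could be merged into a copy of `os.environ` and passed as the `env` argument of `subprocess.run` when launching X, which keeps the test matrix reproducible between runs.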

Additionally, I removed the lines "--with-llvm-shared-libs" and "--enable-gallium-llvm" from the ./configure parameters, because I read in bug 69728 that there might be some issues with LLVM. This also didn't change anything (the binaries obviously got much bigger, though).

The next thing I wanted to try is to disable "the new DMA ring for ttm bo moves" like suggested by you (Alex) in another thread about GPU lockups (https://groups.google.com/forum/#!topic/fa.linux.kernel/1_KzqknQn_U). However, this change seems to be already in the mainline kernel.
Comment 12 tomka 2013-11-30 00:35:11 UTC
To verify that this problem doesn't originate from my particular operating system setup, I tested recent live USB systems of Ubuntu 13.10 and Manjaro 0.8.8. Both produce the same GPU lockups and therefore can't start X on startup. So I wonder: could this actually be a hardware problem?
Comment 13 tomka 2013-11-30 01:24:05 UTC
Another thing I tested was running weston-launch from a TTY to use Wayland. This leads to the same distorted/garbled display. However, I can't see the GPU lockup in the logs, nor is there a reset; it just seems to crash. Therefore, I assume the problem is not xf86-video-ati, but either Mesa or the radeon driver (or the hardware is broken).
Comment 14 tomka 2013-11-30 02:29:47 UTC
Yet another data point: I just installed ATI's Catalyst driver. With this in place and configured, everything works as expected: I can start X and play audio through HDMI without any error. This at least means my hardware is alright. Still, it would be great to be able to use Mesa and the radeon driver.