Bug 65911 - radeon: garbled output/only noise through HDMI and GPU lockups
Summary: radeon: garbled output/only noise through HDMI and GPU lockups
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-27 04:00 UTC by tomka
Modified: 2016-03-23 18:58 UTC

See Also:
Kernel Version: 3.12.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg 3.12.1 with radeon.audio=1 (68.01 KB, text/plain)
2013-11-27 04:00 UTC, tomka
dmesg 3.12.1 with radeon.audio=0 (70.22 KB, text/plain)
2013-11-27 04:00 UTC, tomka
Xorg log file (32.49 KB, patch)
2013-11-27 04:01 UTC, tomka
avivotool regs hdmi with radeon.audio=0 (3.67 KB, text/plain)
2013-11-27 04:01 UTC, tomka
avivotool regs hdmi with radeon.audio=1 (3.67 KB, text/plain)
2013-11-27 04:01 UTC, tomka
Xorg log file (32.49 KB, text/plain)
2013-11-27 04:03 UTC, tomka

Description tomka 2013-11-27 04:00:14 UTC
Created attachment 116301 [details]
dmesg 3.12.1 with radeon.audio=1

Hi, I am not able to start X on a computer with an ATI Radeon HD 7540D graphics card connected through HDMI, running the Linux 3.12.1 kernel. I also tried 3.10.10, which didn't change anything. I can see my TTY without any problems, though. Starting X, however, gives me a totally messed-up screen without any structure or information; it is basically color noise. Here and there a black or blue block appears, but in general it is noise.

Looking at the logs, I can see that there are GPU lockups, after which X resets it. This happens independently of whether I set radeon.audio=1 or 0. I've attached dmesg output for both settings.

After a couple of GPU lockups the kernel will eventually panic. This is the top of the callstack:

Call Trace:
 <IRQ>
 ? __wake_up
 drm_send_vblank_event [drm]
 radeon_crtc_handle_flip [radeon]
 evergreen_irq_process [radeon]

I've seen issues #60709 and #60687, but the fix posted there is already in my kernel and apparently doesn't fix my issue. I also tried something suggested in #60687: hdmi regset 0x12c44 0x00000033

With radeon.audio=0, before I try to start X:
OLD: 0x12c44 (12c44)    0x00000033 (51)
NEW: 0x12c44 (12c44)    0x00000033 (51)

With radeon.audio=0, after I have started X:
OLD: 0x12c44 (12c44)    0x00000000 (0)
NEW: 0x12c44 (12c44)    0x00000033 (51)

With radeon.audio=1, before I try to start X:
OLD: 0x12c44 (12c44)    0x00000000 (0)
NEW: 0x12c44 (12c44)    0x00000033 (51)

With radeon.audio=1, after I have started X:
OLD: 0x12c44 (12c44)    0x00000033 (51)
NEW: 0x12c44 (12c44)    0x00000033 (51)

All this, however, didn't change anything.
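
The four register dumps above can be compared mechanically. The following is a minimal sketch, assuming the OLD/NEW output format is exactly as quoted in this report; the function names and case labels are hypothetical and only serve the illustration. It shows for which cases the write actually altered the value at 0x12c44.

```python
import re

# Matches lines such as "OLD: 0x12c44 (12c44)    0x00000033 (51)".
# The format is taken from the output quoted in this report.
LINE_RE = re.compile(
    r"^(OLD|NEW):\s+(0x[0-9a-fA-F]+)\s+\([0-9a-fA-F]+\)"
    r"\s+(0x[0-9a-fA-F]+)\s+\(\d+\)\s*$"
)


def parse_regset(text):
    """Return {"OLD": (reg, value), "NEW": (reg, value)} from regset output."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            tag, reg, value = match.groups()
            result[tag] = (int(reg, 16), int(value, 16))
    return result


def was_changed(text):
    """True if the write altered the register value."""
    parsed = parse_regset(text)
    return parsed["OLD"][1] != parsed["NEW"][1]


# The four cases quoted in this report (labels are hypothetical).
CASES = {
    "audio=0, before X": "OLD: 0x12c44 (12c44)    0x00000033 (51)\n"
                         "NEW: 0x12c44 (12c44)    0x00000033 (51)",
    "audio=0, after X": "OLD: 0x12c44 (12c44)    0x00000000 (0)\n"
                        "NEW: 0x12c44 (12c44)    0x00000033 (51)",
    "audio=1, before X": "OLD: 0x12c44 (12c44)    0x00000000 (0)\n"
                         "NEW: 0x12c44 (12c44)    0x00000033 (51)",
    "audio=1, after X": "OLD: 0x12c44 (12c44)    0x00000033 (51)\n"
                        "NEW: 0x12c44 (12c44)    0x00000033 (51)",
}


def summarize(cases):
    """Map each case label to whether the register write changed the value."""
    return {label: was_changed(output) for label, output in cases.items()}
```

Calling summarize(CASES) reports a change only for "audio=0, after X" and "audio=1, before X"; in the other two cases the register already held 0x33.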

If radeon.audio=1 is set, I can play audio over HDMI until there is a GPU lockup, and again after the GPU reset.

I've also tried radeon options dpm and no_wb without any success. It also didn't help to add 'Option "AccelMethod" "EXA"' to the radeon driver/device section in /etc/X11/xorg.conf.d/20-radeon.conf.

Are there more things I could test or provide?
Comment 1 tomka 2013-11-27 04:00:50 UTC
Created attachment 116311 [details]
dmesg 3.12.1 with radeon.audio=0
Comment 2 tomka 2013-11-27 04:01:07 UTC
Created attachment 116321 [details]
Xorg log file
Comment 3 tomka 2013-11-27 04:01:34 UTC
Created attachment 116331 [details]
avivotool regs hdmi with radeon.audio=0
Comment 4 tomka 2013-11-27 04:01:47 UTC
Created attachment 116341 [details]
avivotool regs hdmi with radeon.audio=1
Comment 5 tomka 2013-11-27 04:03:00 UTC
Created attachment 116351 [details]
Xorg log file

Added the Xorg log as a patch by accident before.
Comment 6 Alex Deucher 2013-11-27 05:37:29 UTC
Is this a regression?  If so when was it last working properly?  I don't think it has anything to do with audio.  I'd suggest trying a different userspace driver stack.  Generally GPU resets are caused by a bad combination of commands sent by the mesa 3D driver (r600g) or Xorg ddx (xf86-video-ati).
Comment 7 tomka 2013-11-27 14:41:11 UTC
I cannot tell if it is a regression since I never used X on this machine before. However, I tested with the 3.10.10 kernel, which didn't make any difference. Being on Arch Linux, I tried xf86-video-ati-git and mesa-git, both in several versions, all without effect. And isn't my GPU (HD7540D) rather part of the Northern Islands family than of the r600g family?
Comment 8 Alex Deucher 2013-11-27 14:58:05 UTC
(In reply to tomka from comment #7)
> I cannot tell if it is a regression since I never used X on this machine
> before. However, I tested with the 3.10.10 kernel which didn't make any
> difference. Being on arch linux, I tried xf86-video-ati-git and mesa-git,
> both also in different versions. All without effect. And isn't my GPU
> (HD7540D) rather part of the Northern Islands family than of the r600g
> family?

All r6xx-NI parts use the same 3D driver.  The hw has a similar programming interface and ISA for all the included families.
Comment 9 tomka 2013-11-28 05:34:25 UTC
Thanks for the clarification. I just tested different versions of the userspace tools, unfortunately without any success. For every component tested, I made sure the others were used in their recent stable versions.


libdrm-git with the current master and

c6d73cfeeaff9596c735d0a10b248f94b2e1e347
Tue Jul 2 09:24:53 2013 +0100

040f6b015ef7d9c1bda09f78a8873f6da45d5e95 (first this year)
Thu May 9 12:55:42 2013 +1000

2089a0080edb1b42449ee9a97f2cef7399c16d53
Mon Nov 5 22:21:42 2012 +0000


xf86-video-ati-git with current master and

67fb82a3f0759b171fea21b475a70fa825693570
Tue Oct 1 09:35:30 2013 -0400

fdb7563a5cbc736b09c2864b67a93b475c98b2bd
Thu Jan 24 21:17:11 2013 -0500

4e35b2f530e2ca8c7b7220cacd05c661de43d20d
Thu Jan 10 12:10:52 2013 +0100

60cd6ceaf44b506433aebf6b3a639a17604dfddd
Wed Nov 21 18:42:56 2012


mesa-git (and thereby mesa-libgl-git and ati-dri-git) with current master and

e556286802811b4b99c692d1ff5197f8ee1f011b
Wed Mar 20 11:54:33 2013 -0700

5ffa28df4e4cc22481b4ed41c78632f35765f41d
Wed Jul 31 15:18:52 2013 +0200


For mesa-git, I couldn't easily get further back in history, because Bison 3 wasn't supported before. Like I said, no change fixed the GPU lockups or even changed the error itself. During the tests, however, I noticed slightly more detail in the dmesg log which I hadn't seen before:

kernel: radeon 0000:00:01.0: GPU lockup CP stall f...ec
kernel: radeon 0000:00:01.0: GPU lockup (waiting f...2)
kernel: [drm:r600_ib_test] *ERROR* radeon: fence w...).
kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon:...).
kernel: radeon 0000:00:01.0: ib ring test failed (-35).

Do you have any suggestions on what I could try next?
Comment 10 tomka 2013-11-29 06:14:50 UTC
Oops, I noticed the last dmesg log lines were truncated. Here they are again:

[  170.607701] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[  170.607711] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000000007 last fence id 0x0000000000000002)
[  170.607717] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[  170.607723] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[  170.607727] radeon 0000:00:01.0: ib ring test failed (-35).
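
For anyone comparing several dmesg captures, here is a minimal sketch that pulls the lockup lines out of a log and reports the gap between the awaited and the last fence id. It assumes the message format is exactly as in the lines above; the function name and dictionary keys are hypothetical. The error code -35 corresponds to EDEADLK on Linux.

```python
import re

# Matches the "GPU lockup (waiting for ... last fence id ...)" line as it
# appears in the dmesg excerpt quoted in this report.
LOCKUP_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+radeon (?P<dev>[0-9a-f:.]+): GPU lockup "
    r"\(waiting for (?P<want>0x[0-9a-f]+) last fence id (?P<last>0x[0-9a-f]+)\)"
)

SAMPLE = """\
[  170.607701] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[  170.607711] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000000007 last fence id 0x0000000000000002)
[  170.607717] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[  170.607723] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[  170.607727] radeon 0000:00:01.0: ib ring test failed (-35).
"""


def find_lockups(log_text):
    """Return one dict per lockup line that carries fence ids."""
    events = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.match(line)
        if match:
            want = int(match.group("want"), 16)
            last = int(match.group("last"), 16)
            events.append({
                "timestamp": float(match.group("ts")),
                "device": match.group("dev"),
                "waiting_for": want,
                "last_fence": last,
                # Difference between the awaited and last reported fence id.
                "pending": want - last,
            })
    return events
```

Applied to SAMPLE, this yields a single event for device 0000:00:01.0 with a fence gap of 5.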
Comment 11 tomka 2013-11-29 23:55:52 UTC
In another bug report (https://bugzilla.redhat.com/show_bug.cgi?id=990986) it was suggested to disable glx-tls during build configuration. So I replaced the line "--enable-glx-tls" with "--disable-glx-tls". However, the GPU lockups still happen and I can't start X.

Then I tried setting several environment variables as suggested in yet another bug report (https://bugs.freedesktop.org/show_bug.cgi?id=69728): none of R600_DEBUG=nohyperz, R600_DEBUG=nodma, R600_DEBUG=nosb or R600_LLVM=0 helped, neither alone nor combined.
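
To make such "alone and combined" tests repeatable, the variable combinations can be generated instead of typed by hand. This is only a sketch: it assumes that R600_DEBUG accepts several flags as a comma-separated list, and the function name is hypothetical.

```python
import os
from itertools import combinations

# Flags named in this report; joining them with commas is an assumption
# about how R600_DEBUG is parsed.
R600_DEBUG_FLAGS = ["nohyperz", "nodma", "nosb"]


def debug_environments(flags=R600_DEBUG_FLAGS, base_env=None):
    """Build one environment per non-empty combination of the settings.

    Every subset of the R600_DEBUG flags is tried both with and without
    R600_LLVM=0; the case with no setting at all is skipped.
    """
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for size in range(0, len(flags) + 1):
        for combo in combinations(flags, size):
            for llvm_off in (False, True):
                if not combo and not llvm_off:
                    continue  # nothing set, same as the default run
                env = dict(base)
                if combo:
                    env["R600_DEBUG"] = ",".join(combo)
                if llvm_off:
                    env["R600_LLVM"] = "0"
                envs.append(env)
    return envs
```

Each resulting dictionary could then be passed as the env argument of subprocess.run when launching X or a test client, with the dmesg output checked after every run.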

Additionally, I removed the lines "--with-llvm-shared-libs" and "--enable-gallium-llvm" from the ./configure parameters, because I read in bug 69728 that there might be some issues with LLVM. This also didn't change anything (the binaries obviously got much bigger, though).

The next thing I wanted to try was to disable "the new DMA ring for ttm bo moves" as suggested by you (Alex) in another thread about GPU lockups (https://groups.google.com/forum/#!topic/fa.linux.kernel/1_KzqknQn_U). However, this change seems to be in the mainline kernel already.
Comment 12 tomka 2013-11-30 00:35:11 UTC
To verify that this problem doesn't originate from my particular operating system setup, I tested recent live USB systems of Ubuntu 13.10 and Manjaro 0.8.8. Both produce the same GPU lockups and therefore can't start X on startup. So I wonder: could this actually be a hardware problem?
Comment 13 tomka 2013-11-30 01:24:05 UTC
Another thing I tested was to run weston-launch from a TTY to utilize Wayland. This leads to the same distorted/garbled display. However, I can't see the GPU lockup in the logs, nor is there a reset; it just seems to crash. Therefore, I assume the problem is not xf86-video-ati, but either Mesa or the radeon driver (or the hardware is broken).
Comment 14 tomka 2013-11-30 02:29:47 UTC
Yet another data point: I just installed ATI's Catalyst driver. With this in place and configured, everything works as expected: I can start X and play audio through HDMI without any error. This at least means my hardware is alright. Still, it would be great to be able to use Mesa and the radeon driver.
