Bug 60791

Summary: Display corruption with Radeon driver during boot and on desktop
Product: Drivers Reporter: Brian Hall (hallbw)
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: NEW ---    
Severity: normal CC: alexdeucher, dragosh44, hallbw, Hamsi2k, szg00000
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.10.5 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: showing the problem during boot
showing the problem during boot
green arc at lightdm login prompt
hash of flickering green lines when displaying website
hash of flickering green lines when displaying website
hash of flickering green lines when displaying website
dmesg output
lspci -vvv output
Xorg log
radeonreg regs dce3 from 3.10.5 (broken.regs)
radeonreg regs dce3 from 3.10.4 (working.regs)
fixes radeon DVI display corruption
copy of video bios
possible fix
3.11.0 Working register dump
3.11.0 Non-Working register dump
copy of video bios
possible fix

Description Brian Hall 2013-08-25 21:20:07 UTC
Created attachment 107302 [details]
showing the problem during boot

As of vanilla kernel 3.10.5, I am seeing green speckled display artifacts during boot and on my desktop. During boot and on solid backgrounds they take the form of green arcs in multiple places; at other times they appear as hashed green lines in small areas of the display (such as on the Phoronix website rendered in Chromium). This corruption does NOT occur on kernels 3.10.4 and below, nor on 3.9 kernels (tested up to 3.9.11 vanilla). I think the corruption is happening AFTER the display buffer: when I take a screenshot on a problematic kernel, the green speckling is only visible while I am booted into that kernel. If I reboot into a kernel without the issue and display the same screenshot, the corruption is not visible.

I have tried using "radeon.audio=0" on the kernel command line, but that had no effect. I am not using HDMI video or audio, only DVI.

Since I couldn't take a screenshot, I took some pictures with a camera. The corruption is hard to see in them because of the focus and because the short green horizontal lines flicker madly.

System details:
x86_64 Solydx 2013.08.06 (Debian jessie/sid rolling base)
BIOSTAR A785GE with onboard Radeon HD4200 video
AMD Phenom II X4 940 CPU
4GB DDR2 memory

More details attached.
Comment 1 Brian Hall 2013-08-25 21:20:53 UTC
Created attachment 107303 [details]
showing the problem during boot
Comment 2 Brian Hall 2013-08-25 21:22:15 UTC
Created attachment 107304 [details]
green arc at lightdm login prompt
Comment 3 Brian Hall 2013-08-25 21:24:04 UTC
Created attachment 107305 [details]
hash of flickering green lines when displaying website
Comment 4 Brian Hall 2013-08-25 21:24:18 UTC
Created attachment 107306 [details]
hash of flickering green lines when displaying website
Comment 5 Brian Hall 2013-08-25 21:24:31 UTC
Created attachment 107307 [details]
hash of flickering green lines when displaying website
Comment 6 Brian Hall 2013-08-25 21:26:42 UTC
Created attachment 107308 [details]
dmesg output
Comment 7 Brian Hall 2013-08-25 21:27:11 UTC
Created attachment 107309 [details]
lspci -vvv output
Comment 8 Brian Hall 2013-08-25 21:27:50 UTC
Created attachment 107310 [details]
Xorg log
Comment 9 Brian Hall 2013-08-25 21:46:05 UTC
The problem only occurs when using the DVI output from my motherboard. When I switch to the VGA motherboard output, the problem on kernels > 3.10.4 disappears. I'd prefer to use DVI, so I don't consider this a permanent workaround; I'd still like to get the corruption problem with DVI fixed.
Comment 10 Alex Deucher 2013-08-26 13:01:02 UTC
Can you bisect?
Comment 11 Brian Hall 2013-08-27 00:48:59 UTC
Working the bisection now, may take me a day or two.
Comment 12 Brian Hall 2013-08-27 23:54:58 UTC
Bisect results below. According to my Xorg.0.log, my board is a 
"ATI Radeon HD 4200" (ChipID = 0x9710) and lspci calls it a RS880.

6f8bbaf568c7f2c497558bfd04654c0b9841ad57 is the first bad commit
commit 6f8bbaf568c7f2c497558bfd04654c0b9841ad57
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Tue Jul 30 00:22:53 2013 -0400

    drm/radeon/atom: initialize more atom interpretor elements to 0
    
    commit 42a21826dc54583cdb79cc8477732e911ac9c376 upstream.
    
    The ProcessAuxChannel table on some rv635 boards assumes
    the divmul members are initialized to 0 otherwise we get
    an invalid fb offset since it has a bad mask set when
    setting the fb base.  While here initialize all the
    atom interpretor elements to 0.
    
    Fixes:
    https://bugzilla.kernel.org/show_bug.cgi?id=60639
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

:040000 040000 d2bb057047f71419a89def40e6e21dc948c5784c 7e49987ae73078e644723f0cb6c791e15e102ab0 M	drivers
Comment 13 Alex Deucher 2013-08-28 13:04:25 UTC
Can you revert parts of the patch to find out which element is causing the problem?  E.g., try:

       /* reset data block */
//       ctx->data_block = 0;

and see if that helps, then:

       /* reset divmul */
//       ctx->divmul[0] = 0;
       ctx->divmul[1] = 0;

etc.

Additionally, can you dump the display registers in the working and non-working states using radeonreg (http://cgit.freedesktop.org/~airlied/radeontool/)?

(as root)

boot with broken kernel:
./radeonreg regs dce3 > broken.regs

boot with working kernel:
./radeonreg regs dce3 > working.regs
Comment 14 Brian Hall 2013-08-29 02:11:48 UTC
Created attachment 107349 [details]
radeonreg regs dce3 from 3.10.5 (broken.regs)
Comment 15 Brian Hall 2013-08-29 02:15:21 UTC
Created attachment 107350 [details]
radeonreg regs dce3 from 3.10.4 (working.regs)
Comment 16 Brian Hall 2013-09-03 03:00:01 UTC
I cannot reproduce the problem by modifying drivers/gpu/drm/radeon/atom.c; apparently my bisect was incorrect. The problem does not occur even if I undo all the code changes for that commit.

I did reconfirm the basic problem still occurs with 3.10.5 but not 3.10.4. Will attempt to re-bisect at the next opportunity.
Comment 17 Brian Hall 2013-09-08 17:22:36 UTC
Fixed it!

The problem was in atom.c, my bisect was correct. Starting with the bad 3.10.5 atom.c, I copied it into the good 3.10.4 tree, commented out the reset data block part, rebuilt, and that fixed it. Commenting out the reset divmul part, without removing the reset data block part, did not fix the corruption.

So I generated a patch, and applied that to a 3.11 tree. Fixed the corruption on 3.11. This is the first time I've booted anything higher than 3.10.4 without this problem. Now I'm running 3.11+fix_radeon_dvi_corruption.patch on DVI and there's no corruption during boot or on my desktop.
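
For anyone following along, the partial-revert experiment from comment 13 and the result above can be modelled outside the kernel. The sketch below is not the code from atom.c: `struct mini_atom_ctx`, `enum reset_mask` and `apply_resets` are hypothetical names, only the two fields quoted in comment 13 (`data_block`, `divmul`) are modelled, and the integer widths are assumptions. It only illustrates that leaving one reset out keeps whatever value was there before.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the interpreter context; only the fields
 * quoted in comment 13 are modelled, and the widths are assumptions. */
struct mini_atom_ctx {
	uint16_t data_block;
	uint32_t divmul[2];
};

/* Which resets to apply; mirrors commenting lines in or out one by one. */
enum reset_mask {
	RESET_DATA_BLOCK = 1u << 0,
	RESET_DIVMUL     = 1u << 1,
};

/* Apply only the selected resets and leave the other fields untouched. */
static void apply_resets(struct mini_atom_ctx *ctx, unsigned int mask)
{
	assert(ctx != NULL);
	if (mask & RESET_DATA_BLOCK)
		ctx->data_block = 0;
	if (mask & RESET_DIVMUL) {
		ctx->divmul[0] = 0;
		ctx->divmul[1] = 0;
	}
}
```

Calling `apply_resets(&ctx, RESET_DIVMUL)` corresponds to the build with only the data block reset commented out, which is the configuration that removed the corruption here.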
Comment 18 Brian Hall 2013-09-08 17:23:44 UTC
Created attachment 107621 [details]
fixes radeon DVI display corruption
Comment 19 Alex Deucher 2013-09-08 21:42:09 UTC
Please attach a copy of your vbios.

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom
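
Before attaching the dump it may be worth checking that the file looks like a ROM image at all. PCI expansion ROM images begin with the bytes 0x55 0xAA, so a minimal sketch of such a check could look like the following. The helper names `has_rom_signature` and `rom_file_looks_valid` are made up for illustration, and passing the check says nothing about whether the rest of the image is intact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* PCI expansion ROM images start with the two bytes 0x55 0xAA. */
static int has_rom_signature(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	return len >= 2 && buf[0] == 0x55 && buf[1] == 0xAA;
}

/* Hypothetical helper: returns 1 if the file at path starts with the
 * signature, 0 if it does not, and -1 if the file cannot be opened. */
static int rom_file_looks_valid(const char *path)
{
	unsigned char head[2];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (f == NULL)
		return -1;
	n = fread(head, 1, sizeof(head), f);
	fclose(f);
	return has_rom_signature(head, n);
}
```

With the commands above, the call would be `rom_file_looks_valid("/tmp/vbios.rom")`.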
Comment 20 Brian Hall 2013-09-09 00:01:02 UTC
Created attachment 107711 [details]
copy of video bios
Comment 21 Quallenauge 2013-09-09 04:29:29 UTC
*** Bug 61011 has been marked as a duplicate of this bug. ***
Comment 22 Alex Deucher 2013-09-09 15:08:34 UTC
Created attachment 107911 [details]
possible fix

Does this patch fix the issue?  Please apply without your patch.
Comment 23 Quallenauge 2013-09-09 19:16:54 UTC
Sadly it doesn't fix the issue on my setup.
I applied it on the 3.11.0 release without previous patches.
Comment 24 Alex Deucher 2013-09-09 19:18:35 UTC
(In reply to Hamsi2k from comment #23)
> Sadly it doesn't fix the issue on my setup.
> I applied it on the 3.11.0 release without previous patches.

Can you attach a copy of your vbios and dump the broken and working registers as per comment 13?
Comment 25 Quallenauge 2013-09-09 19:36:02 UTC
Created attachment 107941 [details]
3.11.0 Working register dump
Comment 26 Quallenauge 2013-09-09 19:36:23 UTC
Created attachment 107951 [details]
3.11.0 Non-Working register dump
Comment 27 Quallenauge 2013-09-09 19:44:58 UTC
Created attachment 107961 [details]
copy of video bios
Comment 28 Brian Hall 2013-09-10 01:16:19 UTC
(In reply to Alex Deucher from comment #22)
> Created attachment 107911 [details]
> possible fix
> 
> Does this patch fix the issue?  Please apply without your patch.

Patch 0001-drm-radeon-atom-workaround-vbios-bug-in-transmitter-.patch did fix my issue, on 3.11.0 without the fix_radeon_dvi_corruption.patch. I had some trouble getting the atom-workaround patch to apply, so I just manually modified atombios_encoders.c as per the patch:

        /* some early dce3.2 boards have a bug in their transmitter control table */
//      if ((rdev->family != CHIP_RV710) && (rdev->family != CHIP_RV730))
        /* some dce3.x boards have a bug in their transmitter control table.
         * ACTION_ENABLE_OUTPUT can probably be dropped since ACTION_ENABLE
         * does the same thing and more.
         */
        if ((rdev->family != CHIP_RV710) && (rdev->family != CHIP_RV730) &&
            (rdev->family != CHIP_RS780))
                atombios_dig_transmitter_setup(encoder, ATOM_TRANSMITTER_ACTION_ENABLE_OUTPUT, 0, 0);
Comment 29 Alex Deucher 2013-09-10 14:05:45 UTC
Created attachment 108011 [details]
possible fix

Sorry, I had the wrong chip check in the last version.  This patch should be correct.
Comment 30 Quallenauge 2013-09-10 18:08:12 UTC
The last patch works for me :)
Thanks!
Comment 31 Brian Hall 2013-09-27 21:40:36 UTC
Since we have a patch that fixes the problem, can we get it submitted for 3.12, or at least 3.12.1?
Comment 33 Dragos Taranu 2013-10-10 19:36:41 UTC
I seem to have a related problem - in my case the DVI is completely disabled upon boot, so I have a blank screen (monitor seems to go to standby). I have an older, 780G-based mainboard:
Biostar A780G M2+ SE Ver. 6.0
Fedora 19 x86_64 (kernel 3.10.4 is the last working one).

The patch works for me too if I change CHIP_RS880 to CHIP_RS780 (or just add RS780 to the comparison).
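
The "add RS780 to the comparison" variant amounts to skipping the extra ENABLE_OUTPUT call on both RS780 and RS880. A minimal standalone sketch of that combined condition follows, assuming the check otherwise has the shape quoted in comment 28. It is not the kernel code: `enum chip_family` here is a cut-down stand-in with its own values, `CHIP_OTHER` is a placeholder, and `wants_enable_output` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of chip families; the enumerator names follow the
 * thread, but the values are not the kernel's. */
enum chip_family {
	CHIP_RV710,
	CHIP_RV730,
	CHIP_RS780,
	CHIP_RS880,
	CHIP_OTHER, /* placeholder for every family not named in the thread */
};

/* True when the extra ENABLE_OUTPUT transmitter call would still be made. */
static bool wants_enable_output(enum chip_family family)
{
	assert(family <= CHIP_OTHER);
	return family != CHIP_RV710 && family != CHIP_RV730 &&
	       family != CHIP_RS780 && family != CHIP_RS880;
}
```

In this model only `CHIP_OTHER` still gets the call; the four families named in the thread all skip it.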