Bug 100871

Summary: radeon fails to initialize one DisplayPort monitor
Product: Drivers Reporter: Charles R. Anderson (cra)
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: NEW ---    
Severity: normal CC: alexdeucher, reg, szg00000, tiwai, vedran
Priority: P1    
Hardware: All   
OS: Linux   
See Also: https://bugzilla.redhat.com/show_bug.cgi?id=1232562
Kernel Version: v3.19-7478-g796e1c55717e Subsystem:
Regression: Yes Bisected commit-id:
Attachments: lspci-nn.txt
Xorg journal messages from bad kernel
Xorg journal messages from good 3.19.8-200.fc21 kernel
kernel journal messages from bad kernel
kernel journal messages from good 3.19.8-200.fc21 kernel
dmsg - All 6 screens good kernel-3.16.7-35-default
Xorg.0.log - All 6 screens good kernel-3.16.7-35-default
Logs to compare all screens good on boot to some bad on boot

Description Charles R. Anderson 2015-07-03 16:00:17 UTC
Ever since Fedora 21 was updated to kernel-4.x, the radeon drm driver fails to initialize one DisplayPort monitor of a multi-monitor setup.  Setup is four mini-DP ports on [AMD/ATI] Cedar GL [FirePro 2460], two Dell U2410 monitors connected via DP, and two Dell 2001FP monitors connected via DVI using DP-to-DVI adapters.

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar GL [FirePro 2460] (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 2002
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 29
	Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at f7e20000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at e000 [size=256]
	Expansion ROM at f7e00000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: radeon
	Kernel modules: radeon

I did a git bisect between Linux v3.19-6676-g1fa185ebcbce and v3.19-7478-g796e1c55717e and found:

# bad: [e55bca26188e45f209597abf986c87cc5a49894a] radeon/audio: enable DP audio
git bisect bad e55bca26188e45f209597abf986c87cc5a49894a
# first bad commit: [e55bca26188e45f209597abf986c87cc5a49894a] radeon/audio: enable DP audio

commit e55bca26188e45f209597abf986c87cc5a49894a
Author: Slava Grigorev <slava.grigorev@amd.com>
Date:   Fri Dec 12 17:01:42 2014 -0500

    radeon/audio: enable DP audio
    
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Slava Grigorev <slava.grigorev@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

See also https://bugzilla.redhat.com/show_bug.cgi?id=1232562
Comment 1 Alex Deucher 2015-07-03 19:29:54 UTC
Does it work correctly if you set radeon.audio=0 on the kernel command line in grub?  Also does the problematic monitor support audio?  Please attach your xorg log and dmesg output.
Comment 2 Charles R. Anderson 2015-07-04 21:19:07 UTC
I'll try radeon.audio=0.  I never tried DisplayPort audio on the Dell U2410, but it apparently does support it:

http://en.community.dell.com/support-forums/peripherals/f/3529/t/19311234

This problem happens on the text VC even before X is started.  You can find the logs in the Red Hat bugzilla linked above.
Comment 3 Charles R. Anderson 2015-07-18 03:10:30 UTC
radeon.audio=0 indeed works around the problem.
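
For anyone wanting to confirm that the workaround is active on a given boot, here is a minimal sketch that checks a kernel command line for the parameter. The helper name has_kernel_arg is made up for this illustration; it is not part of any existing tool.

```shell
#!/bin/sh
# Sketch: succeed if a parameter appears as a whole word on a kernel
# command line. has_kernel_arg is a hypothetical helper name.
has_kernel_arg() {
    # $1 = parameter to look for, e.g. radeon.audio=0
    # $2 = command line to search (optional; defaults to /proc/cmdline)
    arg=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only whole words match.
    case " $cmdline " in
        *" $arg "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on the running system:
#   has_kernel_arg radeon.audio=0 && echo "workaround active"
```

On Fedora, something like `grubby --update-kernel=ALL --args="radeon.audio=0"` should make the parameter persistent across kernel updates, though that is a suggestion and was not tested as part of this report.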
Comment 4 Charles R. Anderson 2015-10-20 22:48:11 UTC
(In reply to Charles R. Anderson from comment #3)
> radeon.audio=0 indeed works around the problem.

This problem still exists in 4.2.3 as released in Fedora 22 (kernel-4.2.3-200.fc22).  I still need the workaround.
Comment 5 Charles R. Anderson 2015-10-20 22:56:48 UTC
Created attachment 190671 [details]
lspci-nn.txt
Comment 6 Charles R. Anderson 2015-10-20 22:57:10 UTC
Created attachment 190681 [details]
Xorg journal messages from bad kernel
Comment 7 Charles R. Anderson 2015-10-20 22:57:34 UTC
Created attachment 190691 [details]
Xorg journal messages from good 3.19.8-200.fc21 kernel
Comment 8 Charles R. Anderson 2015-10-20 22:58:02 UTC
Created attachment 190701 [details]
kernel journal messages from bad kernel
Comment 9 Charles R. Anderson 2015-10-20 22:58:26 UTC
Created attachment 190711 [details]
kernel journal messages from good 3.19.8-200.fc21 kernel
Comment 10 Charles R. Anderson 2016-08-08 15:23:27 UTC
Problem still exists in Linux 4.6.4 in Fedora 24 (kernel-4.6.4-301.fc24.x86_64).
Comment 11 Reg 2016-09-01 08:13:36 UTC
I think I have the same problem, but I have a more complicated setup, and because of that I have been able to identify more symptoms, which may help.

In any case, here's everything I have been able to determine but first the hardware setup: My graphics card is "HD 5870 Eyefinity 6" which has 6 DisplayPorts. I have them setup in a grid of 3 across by 2 down. Each display is at a resolution of 2560x1440 creating a total work area of 7680x2880 in a Xinerama setup running on the KDE4 desktop.

I currently have 3 kernels in my grub list which are:
  kernel-3.16.7
  kernel-4.7.0
  kernel-4.7.2

Of these, 3.16.7 came with openSUSE 13.2 and the other two came into being when I switched over to Tumbleweed, SUSE's rolling distribution.

With the kernel 3.16.7 I had no problem with all DisplayPorts turning on as they should all the time. When I changed over to Tumbleweed it still worked fine. However, the other two kernels would only turn on the first two displays. That happens during boot long before Xorg gets loaded.

In Xorg the behavior is a little strange when it gets DisplayPorts off from the kernel. Xorg will acknowledge all 6 displays but it is not able to turn on any that are initially off when the kernel was handling them. E.g.: the last 4 monitors in the case of the 4.x kernels.

The upshot is that when I go to the multidisplay setup part of KDE all 6 displays are showing as active even though only the first two are turned on in reality. If I disable and re-enable the displays turned off, they don't turn on. If I use xrandr to turn them on, no dice. That is, if they are off when the kernel was handling them they are off for good, nothing in Xorg or KDE can change it that I have found.

There was a bunch of updates for Tumbleweed a few days ago. With this update the kernel 4.7.2 was added and 3.16.7 started to not always boot with all the displays on. In fact, it was consistently leaving out displays 0 and 5 (first and last on the graphics card). However, after much playing with the "radeon." kernel boot parameters, I found that setting radeon.agpmode=-1 seemed to make it consistently leave only one monitor off, monitor 0. No other "radeon." setting seemed to help.

However, on several boots I could get variations... I must have rebooted 50+ times last night. Occasionally I would get only 4 of the 6 on and even more occasionally I would get all 6 on like it should be.

Trying all the "radeon." settings seemed to have no effect on the 4.x kernels and they still only booted with 2 of the 6 displays on... as if someone hard coded a 2 output limit in the kernel code for testing and forgot to remove the test code.

I also found two other curious symptoms on the 3.16.7 kernel:

- If a monitor was left off by the kernel and I power it off and back on while booting, sometimes I can get the kernel to recognise that display and leave it on during boot. If it gets to the GUI before I can power-cycle the monitor, then it's too late. Again, this is very iffy: sometimes it works and sometimes it doesn't.

- Since the latest updates, if I let KDE turn off all the monitors, say I walk away for a while so that power saving kicks in, then all the monitors that were on will come back on. However, if I leave it too long, like overnight, then some displays may not come back on, and once they are off when they should be on again, there is no turning them back on without clearing the KDE cache and rebooting before that cache gets refreshed. This usually means logging out of my profile, logging in as root, clearing my profile's KDE/plasma cache, rebooting, making sure I get a boot where the kernel turns on all the displays, and then logging in to my profile again... not exactly a workable long-term way to be.

I have attached my dmesg with the 3.16.7 kernel working correctly (I just got very lucky so I preserved the logs). Tomorrow, I can get you the logs of 3.16.7 not coming up correctly and the other two kernels coming up with only 2 displays on out of the 6 there should be.

That's all that I have figured out so far. As you can guess with my setup it's rather important that I get this fixed or I'll have to revert back to an older release which I don't want to do for several reasons. The upshot, I am at your disposal to figure this out, just tell me what you need me to do.
Comment 12 Reg 2016-09-01 08:19:17 UTC
Created attachment 231661 [details]
dmsg - All 6 screens good kernel-3.16.7-35-default
Comment 13 Reg 2016-09-01 08:20:14 UTC
Created attachment 231671 [details]
Xorg.0.log - All 6 screens good kernel-3.16.7-35-default
Comment 14 Reg 2016-09-06 10:28:10 UTC
Created attachment 232241 [details]
Logs to compare all screens good on boot to some bad on boot

First I have to take one thing back: radeon.audio=0 definitely makes a difference, and I am not so sure that radeon.agpmode=-1 helps anymore. That said, things still go wrong. Because the biggest issue here seems to be a lack of reproducibility, which makes it almost impossible to track down, I went to the trouble of writing a script to gather information.

Within the tarred file, I found that to see what's different between a good and a bad boot, all you have to do is run a diff on the files:
    ./Logs/timing-stripped/filtered-drm/
        screens-0-4-good-5-bad_kernel-4.7.2-1-default_logo.nologo-radeon.audio=0-debug-debug_objects_dmsg.txt
        screens-0-5-good_kernel-4.7.2-1-default_logo.nologo-radeon.audio=0-debug-debug_objects_dmsg.txt

Anybody who wanted to also gather comprehensive information for the developers could take the file ./gather-info-for-diagnostics.sh in the tarred file and modify as needed for their own system.

That said, below explains in detail what's in the tarred compressed file.

Directory structure
===================
.
└── logs
    ├── filtered-drm
    └── timing-stripped
        └── filtered-drm

This structure is as follows:
    . 
    =
    The script that creates the log files and script to turn on any screens that are off during boot (more on this one later).

    ./Logs
    ======
    The raw log files the script gathered which include:
        dmsg.txt                            - from dmesg
        proc-cmdline.txt                    - from /proc/cmdline
        module-kernel-parameters.txt        - from /module/kernel/parameters/*
        module-processor-parameters.txt     - from /module/processor/parameters/*
        sys-module-radeon-parameters.txt    - from /module/radeon/parameters/*
        Xorg.0.log.txt                      - from /var/log/Xorg.0.log

    ./Logs/filtered-drm
    ===================
    Some of the above raw Log files with lines that do not contain radeon information removed. Makes it easier to see what's relevant. If you want to know exactly how the lines were filtered you can look at the script ./gather-info-for-diagnostics.sh.

    ./Logs/timing-stripped
    ======================
    The above raw Log files with the timing at the beginning of each line removed. This makes using diff programs easier. If you want to know exactly how this was done you can look at the script ./gather-info-for-diagnostics.sh.

    ./Logs/timing-stripped/filtered-drm
    ===================================
    Some of the above raw Log files with the timing at the beginning of each line removed and lines that do not contain radeon information removed. Again, makes it easier to see what's relevant.  If you want to know exactly how this was done you can look at the script ./gather-info-for-diagnostics.sh.
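
The two transformations described above can be sketched roughly as follows. This is not the code from ./gather-info-for-diagnostics.sh, which is only available in the attachment; the function names and the exact filter patterns ("radeon" and "drm", case-insensitive) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: approximates the "timing-stripped" and "filtered-drm"
# steps. Function names and filter patterns are assumptions, not taken
# from the attached script.

# Remove the leading "[    1.234567] " timestamp that dmesg prints.
strip_timing() {
    sed 's/^\[ *[0-9]*\.[0-9]*\] *//'
}

# Keep only lines that mention radeon or drm (case-insensitive).
filter_drm() {
    grep -i -e 'radeon' -e 'drm'
}

# Example use:
#   dmesg | strip_timing | filter_drm > boot-a.txt
```

With timestamps removed, two boots can be compared with a plain `diff -u good.txt bad.txt`, which is why the timing-stripped variants are the ones worth diffing.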


Scripts
=======

./gather-info-for-diagnostics.sh
--------------------------------
Does all the heavy lifting in gathering the info.

./display-on.sh
---------------
This was a curious discovery and may make fixing the issue easier. When the script looked like this:

    xrandr --output DisplayPort-${1} --mode 1920x1080
    xrandr --output DisplayPort-${1} --mode 2560x1440

I found that sometimes it would turn the display on, but at other times it would turn it off. To consistently turn the display on, I had to change it to this:

    xrandr --output DisplayPort-${1} --mode 1920x1080
    sleep 
    xrandr --output DisplayPort-${1} --mode 2560x1440

suggesting there might be a timing problem that needs to be addressed.
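
A slightly more defensive sketch of the same two-step mode switch is shown here, written as a function so the sequence can be dry-run without touching the displays. The variable names XRANDR and DELAY and the function name display_on are inventions of this sketch; the default delay of 5 seconds follows the correction posted later in this thread.

```shell
#!/bin/sh
# Sketch: switch an output to a lower mode, wait, then switch to the
# target mode. XRANDR and DELAY are hypothetical overrides added here
# so the sequence can be tested with a stub command.
XRANDR=${XRANDR:-xrandr}
DELAY=${DELAY:-5}

display_on() {
    # $1 = DisplayPort output index, e.g. 5 for DisplayPort-5
    output="DisplayPort-$1"
    "$XRANDR" --output "$output" --mode 1920x1080 || return 1
    # The pause between the two mode sets is what made the original
    # script behave consistently.
    sleep "$DELAY"
    "$XRANDR" --output "$output" --mode 2560x1440
}

# Example use:
#   display_on 5
```

Setting XRANDR=echo and DELAY=0 prints the two xrandr invocations instead of running them, which is handy for checking the output name before trying it for real.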


File Names
==========

File names take the form of:
    <what happened to the screens at boot>_<partial command line when booting the kernel>_<the file name>.txt
    E.g. The file:
        screens-0-4-good-5-bad_kernel-4.7.2-1-default_logo.nologo-radeon.audio=0-debug-debug_objects_dmsg.txt

    can be broken down to:
        screens-0-4-good-5-bad      = The first 5 of the 6 screens came on as they should during boot but the 6th one (number 5) did not.
        kernel-4.7.2-1-default_logo.nologo-radeon.audio=0-debug-debug_objects
                                    = shows most of the boot command line
        dmsg                        = A key indicating the file contents, from dmesg in this case
        .txt                        = Indicates that it is a text file

If the file starts off with something like this:  screens-0-5-good-after-5-fixed-with_display-on.sh it means after booting and logging in I ran the script ./display-on.sh to turn on the display and then gathered all the log information. I will have gathered the log information prior to running the script as well so you will also see files prefixed with just screens-0-5-good in such a case.
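
To make the naming scheme concrete, here is a small sketch that splits such a file name into its three parts using shell parameter expansion. The function name parse_log_name and the output labels are hypothetical. It assumes the screen-status part contains no underscore, so it does not handle the "...-fixed-with_display-on.sh" prefix mentioned above.

```shell
#!/bin/sh
# Sketch: split <screens>_<cmdline>_<key>.txt into its parts.
# parse_log_name is a hypothetical helper. It assumes the first
# underscore ends the screen-status part and the last underscore
# starts the key.
parse_log_name() {
    name=${1##*/}        # drop any directory part
    status=${name%%_*}   # text before the first underscore
    rest=${name#*_}      # text after the first underscore
    key=${rest##*_}      # text after the last underscore
    cmdline=${rest%_*}   # text between the first and last underscore
    key=${key%.txt}      # drop the extension
    printf 'screens=%s\ncmdline=%s\nkey=%s\n' "$status" "$cmdline" "$key"
}

# Example use:
#   parse_log_name ./some-dir/screens-0-5-good_kernel-4.7.2-1-default_dmsg.txt
```

The command-line part may itself contain underscores (for example debug_objects), which is why the split uses the first and last underscore and does not split on every one.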
Comment 15 Reg 2016-09-06 10:50:20 UTC
Hm, I can't edit that last message. Here are a couple of corrections:

Where I showed:
    xrandr --output DisplayPort-${1} --mode 1920x1080
    sleep 
    xrandr --output DisplayPort-${1} --mode 2560x1440

It should have been:
    xrandr --output DisplayPort-${1} --mode 1920x1080
    sleep 5
    xrandr --output DisplayPort-${1} --mode 2560x1440

Where I showed: 
    from /module/...
It should have been:
    from /sys/module/...
Comment 16 Reg 2016-09-06 18:13:31 UTC
Update: Regarding c14 (comment 14) about ./display-on.sh. Even though running this script can turn on a display that was erroneously off during boot, the display will turn itself back off after a few seconds or so, so it's not a usable workaround. I guess there is some status flag set in the kernel during boot that ultimately can't be changed or overridden and that eventually reasserts itself.
Comment 17 Charles R. Anderson 2022-08-08 21:37:43 UTC
Still a problem on Fedora 36 / Linux kernel 5.18.16-200.fc36.x86_64.  I'm now using two newer monitors (DELL U3219Q) connected via DP instead of the previous four monitors (2 DP, 2 DVI) and neither monitor turns on unless radeon.audio=0 is passed.

Maybe there needs to be a quirk added to the driver to keep audio turned off for this card?

Advanced Micro Devices, Inc. [AMD/ATI] Cedar GL [FirePro 2460] (prog-if 00 [VGA controller])
Comment 18 Charles R. Anderson 2022-08-08 21:48:30 UTC
I believe the PCI ID is 1002:68f1:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cedar GL [FirePro 2460] [1002:68f1] (prog-if 00 [VGA controller])
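
Until a driver-side quirk exists, a user-space stand-in could key off that PCI ID. The sketch below only detects the card from `lspci -nn` output; the function name is_firepro_2460 is hypothetical, and this is not a substitute for a quirk in the radeon driver itself.

```shell
#!/bin/sh
# Sketch: detect the Cedar GL [FirePro 2460] by PCI ID in `lspci -nn`
# output read from stdin. is_firepro_2460 is a hypothetical helper.
is_firepro_2460() {
    grep -q '\[1002:68f1\]'
}

# Example use:
#   lspci -nn | is_firepro_2460 && echo "consider booting with radeon.audio=0"
```

Where radeon is loaded as a module, a line such as `options radeon audio=0` in a file under /etc/modprobe.d/ is another way to apply the setting, assuming the initramfs picks it up; that route was not tested in this report.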