Bug 207901

Summary: Nouveau: In a 4-monitor setup, 1-2 displays remain black after boot
Product: Drivers
Reporter: Maurice Gale (mauricegale1)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW ---
Severity: normal
CC: imirkin, lyude
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.3.0-51
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Stack Trace that occurs when issue arises
Log file with drm debugging enabled
PCI Information
Config File
CPU Information
Log for 5.6
Log for 5.7
Log after firmware installation
Log after Nouveau Patch
New log with drm 116

Description Maurice Gale 2020-05-26 19:17:33 UTC
Created attachment 289307
Stack Trace that occurs when issue arises

When four displays are connected to a GPU (NVIDIA P2000), two of the four displays remain black. The monitors that stay black are the same ones on every boot; their connectors are DisplayPort->Mini DisplayPort and DisplayPort->HDMI. The issue also occurred while installing Ubuntu.
Comment 1 Maurice Gale 2020-05-26 19:19:42 UTC
Created attachment 289311
Log file with drm debugging enabled
Comment 2 Maurice Gale 2020-05-26 19:20:16 UTC
Created attachment 289313
PCI Information
Comment 3 Maurice Gale 2020-05-26 19:20:50 UTC
Created attachment 289315
Config File
Comment 4 Maurice Gale 2020-05-26 19:21:31 UTC
Created attachment 289317
CPU Information
Comment 5 Ilia Mirkin 2020-05-26 19:26:16 UTC
Does this continue to occur on recent kernels? e.g. 5.6 or 5.7-rc?

It looks like aux port 9 is stuck somehow, not 100% sure that's something that'd have been addressed, but DP stuff definitely got improvements.
Comment 6 Maurice Gale 2020-05-26 19:28:53 UTC
(In reply to Ilia Mirkin from comment #5)
> Does this continue to occur on recent kernels? e.g. 5.6 or 5.7-rc?
> 
> It looks like aux port 9 is stuck somehow, not 100% sure that's something
> that'd have been addressed, but DP stuff definitely got improvements.

I will install 5.7 to give that a shot. Thanks!
Comment 7 Maurice Gale 2020-05-27 15:37:21 UTC
Interestingly, after installing both 5.6 and 5.7-rc, I get no displays at all when connected to my Nvidia p2000. 

However, when I use integrated graphics, I get both of the displays. It seems as though things have gotten worse.
Comment 8 Maurice Gale 2020-05-27 15:38:10 UTC
Created attachment 289347
Log for 5.6
Comment 9 Maurice Gale 2020-05-27 15:38:42 UTC
Created attachment 289349
Log for 5.7
Comment 10 Ilia Mirkin 2020-05-27 16:36:51 UTC
It looks like you don't have firmware present, which starting with v5.6 fails the whole module load. (I believe that's a separate bug.)

Try installing linux-firmware, and making the firmware available in /lib/firmware at nouveau module load time (so if it's loaded from initrd, it needs to be in initrd).
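Below is a minimal Python sketch of the "is the firmware actually there?" check. It assumes the P2000's firmware lives under `nvidia/gp106` with `acr`, `gr` and `sec2` subdirectories. That path and those subdirectory names are illustrative assumptions, not taken from this report, so adjust them to whatever linux-firmware ships for your GPU. The sketch only inspects the root filesystem. If nouveau loads from the initrd, the same files must also be packed into the initrd image.

```python
from pathlib import Path

# Hypothetical defaults for a Pascal (GP106) board; verify against linux-firmware.
ASSUMED_CHIP_DIR = "nvidia/gp106"
ASSUMED_SUBDIRS = ("acr", "gr", "sec2")


def missing_firmware(fw_root="/lib/firmware", chip_dir=ASSUMED_CHIP_DIR,
                     subdirs=ASSUMED_SUBDIRS):
    """Return the expected firmware subdirectories that are absent or empty."""
    base = Path(fw_root) / chip_dir
    missing = []
    for sub in subdirs:
        d = base / sub
        # A directory that exists but contains no files counts as missing too.
        if not d.is_dir() or not any(d.iterdir()):
            missing.append(sub)
    return missing
```

An empty list from `missing_firmware()` means every expected directory has at least one file in it.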
Comment 11 Maurice Gale 2020-05-27 19:35:17 UTC
Ok, I installed the firmware. Now I have 2/4 displays.
Comment 12 Maurice Gale 2020-05-27 19:35:49 UTC
Created attachment 289357
Log after firmware installation
Comment 13 Ilia Mirkin 2020-05-27 19:37:18 UTC
OK, that looks roughly identical to your v5.3 situation.
Comment 14 Lyude Paul 2020-06-10 22:28:18 UTC
Hi! I was wondering if you could give this a try with the closed source nvidia driver and see if your displays work there? I mainly ask because I'm suspicious that the stuck i2c aux channel we're seeing might just be throwing some status we don't handle correctly in nouveau, and I'd also like to make sure there isn't something strange going on with the monitors here.
Comment 15 Maurice Gale 2020-06-10 23:49:06 UTC
Hi! I have installed the closed source nvidia driver (nvidia-driver-440), and I am able to successfully get every display on every single boot.
Comment 16 Lyude Paul 2020-06-11 17:31:14 UTC
(In reply to Maurice Gale from comment #15)
> Hi! I have installed the closed source nvidia driver(nvidia-driver-440), and
> I am able to successfully get every display on every single boot.

Cool, so we should at least be able to trace the nvidia blob to figure out what's going on. Could you grab an mmiotrace of the nvidia blob with all of your displays hooked up?

https://wiki.ubuntu.com/X/MMIOTracing

You might also need to add:

iomem=relaxed

to the kernel command line to get that working. FWIW, that guide also says to stop X before stopping the trace. That may or may not work; it should be fine to just leave X running when you stop the trace instead.
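mmiotrace is driven through the kernel's ftrace files. The Python sketch below wraps the two writes involved. It assumes debugfs is mounted at `/sys/kernel/debug` and that the script runs as root. The helper names are hypothetical, and the wiki page above remains the authoritative procedure.

```python
from pathlib import Path

# Default ftrace location when debugfs is mounted at /sys/kernel/debug.
TRACING_DIR = Path("/sys/kernel/debug/tracing")


def set_tracer(name, tracing_dir=TRACING_DIR):
    """Select the active ftrace tracer: "mmiotrace" to start, "nop" to stop."""
    (Path(tracing_dir) / "current_tracer").write_text(name + "\n")


def add_marker(message, tracing_dir=TRACING_DIR):
    """Insert a marker line into the trace stream to label a point in time."""
    (Path(tracing_dir) / "trace_marker").write_text(message + "\n")
```

The rough order is as follows:
1. Call `set_tracer("mmiotrace")`.
2. Keep something like `cat /sys/kernel/debug/tracing/trace_pipe > mmiotrace.log` running in another shell.
3. Load the driver and start X. The trace generally needs to be running before the driver maps its registers.
4. Call `add_marker("X is up")`.
5. Call `set_tracer("nop")` with X still running.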

The mmiotrace is probably going to be fairly large, so I would recommend compressing it with xz and just emailing it to me directly.
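Running `xz` on the file is all that's needed. For reference, here is a standard-library Python equivalent that streams the trace into an `.xz` container, so a multi-gigabyte trace never has to fit in memory. The file name used with it is hypothetical.

```python
import lzma
import shutil


def compress_xz(src, dst=None, preset=6):
    """Stream `src` into an .xz file and return the output path."""
    dst = dst if dst is not None else f"{src}.xz"
    with open(src, "rb") as fin, lzma.open(
        dst, "wb", format=lzma.FORMAT_XZ, preset=preset
    ) as fout:
        # copyfileobj copies in chunks, keeping memory use flat.
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `compress_xz("mmiotrace.log")` writes `mmiotrace.log.xz`, which `xz -d` can unpack.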
Comment 17 Lyude Paul 2020-06-18 15:14:05 UTC
(got the mmiotrace via email, thanks!)

Another update on this: we actually got some feedback from nvidia regarding this issue, and I think we have a pretty good idea of what's happening, so I should have some patches for you to try soon.
Comment 18 Maurice Gale 2020-06-18 15:30:03 UTC
That's awesome. Thanks!
Comment 19 Lyude Paul 2020-06-23 16:49:29 UTC
Hi! Ben Skeggs pushed a fix for this to their kernel repository:

https://github.com/skeggsb/nouveau/commit/2b9345f98be41e2145f561f6cfecc99686f4ef71.patch

You should be able to apply it with git am (you will need to pass --directory=drivers/gpu/ for it to apply). Would you mind testing this and letting me know if it fixes your issue?
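Presumably `--directory` is needed because paths in the out-of-tree nouveau repository start at `drm/nouveau/` rather than `drivers/gpu/drm/nouveau/`. `git am` passes the option through to `git apply`, which prepends the given root to every filename in the patch. The Python below is a hypothetical illustration of that path rewrite only, not how git implements it. It handles just the `diff --git` and `---`/`+++` header lines.

```python
import re

# Header lines that carry file paths in a git-formatted patch.
GIT_HEADER_RE = re.compile(r"^diff --git a/(\S+) b/(\S+)$")
FILE_HEADER_RE = re.compile(r"^(---|\+\+\+) ([ab])/(.+)$")


def prepend_directory(patch_text, root):
    """Prefix every patched path with `root`, leaving /dev/null untouched."""
    root = root.strip("/") + "/"
    out = []
    in_header = False  # True between a `diff --git` line and its first hunk
    for line in patch_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line terminator
        m = GIT_HEADER_RE.match(body)
        if m:
            in_header = True
            body = f"diff --git a/{root}{m.group(1)} b/{root}{m.group(2)}"
        elif body.startswith("@@"):
            in_header = False  # hunk content: '---' here is a removed line
        elif in_header:
            m = FILE_HEADER_RE.match(body)
            if m:
                body = f"{m.group(1)} {m.group(2)}/{root}{m.group(3)}"
        out.append(body + ending)
    return "".join(out)
```

With `root="drivers/gpu/"`, a header such as `--- a/drm/nouveau/x.c` becomes `--- a/drivers/gpu/drm/nouveau/x.c`, which matches the layout of the kernel tree.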
Comment 20 Maurice Gale 2020-06-25 15:27:39 UTC
Hi,

I tried the patch, but I am still missing two displays. I have attached the new log.
Comment 21 Maurice Gale 2020-06-25 15:28:36 UTC
Created attachment 289887
Log after Nouveau Patch
Comment 22 Lyude Paul 2020-07-02 19:03:49 UTC
Hi! Sorry it took me a while to reply. It looks like we did indeed fix the i2c timeout issue I was seeing on your board, so the next suspicious thing in your log is that one of your DP ports (DP-7 in particular) appears to think it's being continuously hotplugged:

[   90.996867] nouveau 0000:01:00.0: DRM: unplugged DP-7
[   91.191704] [drm:drm_add_display_info [drm]] non_desktop set to 0
[   91.191711] [drm:drm_add_display_info [drm]] HDMI: DVI dual 0, max TMDS clock 300000 kHz
[   91.297638] [drm:drm_dp_dpcd_access [drm_kms_helper]] Too many retries, giving up. First error: -110
[   91.297642] [drm:drm_helper_hpd_irq_event [drm_kms_helper]] [CONNECTOR:73:DP-4] status updated from connected to connected
[   91.493173] [drm:drm_add_display_info [drm]] non_desktop set to 0
[   91.493179] [drm:drm_add_display_info [drm]] HDMI: DVI dual 0, max TMDS clock 300000 kHz
[   91.599120] [drm:drm_dp_dpcd_access [drm_kms_helper]] Too many retries, giving up. First error: -110
[   91.599123] [drm:drm_helper_hpd_irq_event [drm_kms_helper]] [CONNECTOR:76:DP-5] status updated from connected to connected
[   91.599335] nouveau 0000:01:00.0: DRM: display: 4x540000 dpcd 0x12
[   91.599336] nouveau 0000:01:00.0: DRM: encoder: 4x810000
[   91.599336] nouveau 0000:01:00.0: DRM: maximum: 4x540000
[   91.605980] [drm:drm_add_display_info [drm]] non_desktop set to 0
[   91.605986] [drm:drm_add_display_info [drm]] HDMI: DVI dual 0, max TMDS clock 0 kHz
[   91.606300] [drm:drm_helper_hpd_irq_event [drm_kms_helper]] [CONNECTOR:79:DP-6] status updated from connected to connected
[   91.606518] nouveau 0000:01:00.0: DRM: display: 4x540000 dpcd 0x12
[   91.606519] nouveau 0000:01:00.0: DRM: encoder: 4x810000
[   91.606519] nouveau 0000:01:00.0: DRM: maximum: 4x540000
[   91.613389] [drm:drm_add_display_info [drm]] non_desktop set to 0
[   91.613394] [drm:drm_add_display_info [drm]] HDMI: DVI dual 0, max TMDS clock 0 kHz
[   91.613722] [drm:drm_helper_hpd_irq_event [drm_kms_helper]] [CONNECTOR:82:DP-7] status updated from connected to connected
[   91.614016] nouveau 0000:01:00.0: DRM: plugged DP-7
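When a boot log is large, a quick tally makes this kind of churn easier to spot. The Python sketch below counts three things: nouveau's plugged/unplugged notices per connector, hotplug re-probes that report no status change, and DPCD accesses that gave up with -110 (-ETIMEDOUT). The regular expressions are keyed to the message formats in the excerpt above. Other kernel versions may word these messages differently, so treat them as assumptions.

```python
import re
from collections import Counter

# Patterns keyed to the dmesg excerpt above; wording may vary between kernels.
PLUG_RE = re.compile(r"DRM: (plugged|unplugged) (\S+)")
HPD_RE = re.compile(r"\[CONNECTOR:\d+:([^\]]+)\] status updated from (\S+) to (\S+)")
RETRY_RE = re.compile(r"Too many retries, giving up\. First error: (-?\d+)")
ETIMEDOUT = 110  # Linux errno value; the log reports it negated (-110)


def summarize(lines):
    """Count hotplug notices, no-change HPD re-probes and DPCD timeouts."""
    plug_events = Counter()   # (connector, "plugged"/"unplugged") -> count
    hpd_reprobes = Counter()  # connector -> re-probes with unchanged status
    dpcd_timeouts = 0
    for line in lines:
        m = PLUG_RE.search(line)
        if m:
            plug_events[(m.group(2), m.group(1))] += 1
        m = HPD_RE.search(line)
        if m and m.group(2) == m.group(3):
            hpd_reprobes[m.group(1)] += 1
        m = RETRY_RE.search(line)
        if m and int(m.group(1)) == -ETIMEDOUT:
            dpcd_timeouts += 1
    return plug_events, hpd_reprobes, dpcd_timeouts
```

For example, `summarize(open("dmesg.txt"))` (hypothetical file name) on a saved log returns the three tallies. A connector whose plugged/unplugged counts keep growing while it stays physically attached is the pattern described above.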

How are your monitors hooked up? Are they on any laptop docks or standalone MST hubs, do they go through any video adaptors, and what models/brands and kinds of connectors do they use?

Also, could you grab another log from your system using these kernel parameters:

log_buf_len=50M drm.debug=0x116 nouveau.debug=disp=trace

(Please include the full log from your boot and don't trim it, since it's likely going to be a big log.)
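`log_buf_len=50M` enlarges the kernel log buffer so a verbose boot isn't overwritten. `drm.debug` is a bitmask of debug categories. The Python sketch below decodes that mask. It assumes the `DRM_UT_*` bit values from `include/drm/drm_print.h` in the 5.x kernels discussed here; newer kernels define additional bits. Under that assumption, 0x116 enables DRIVER, KMS, ATOMIC and DP, and the DP bit is the one behind the DisplayPort/DP MST messages.

```python
# DRM_UT_* bit values for drm.debug as of the 5.x kernels in this thread.
DRM_DEBUG_CATEGORIES = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask):
    """Return the category names enabled by a drm.debug mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]
```

Here, `decode_drm_debug(0x116)` yields `["DRIVER", "KMS", "ATOMIC", "DP"]`.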

I have a feeling one of your MST devices is doing some weird out-of-spec behavior that we might need to teach nouveau to handle.
Comment 23 Maurice Gale 2020-07-13 14:51:17 UTC
Hi!

Here's my environment:

Monitor:
 - Dell P4317Q (quad-display monitor)
 - Aside from the above, I have also tested with 4 separate monitors.

Connectors:
 - DisplayPort -> HDMI (Brand: Ivanky)
 - DisplayPort -> HDMI (Brand: Ivanky)
 - DisplayPort -> Mini DisplayPort (Brand: Ivanky)
 - DisplayPort -> DisplayPort (Brand: Ivanky)

Desktop:
 - Dell OptiPlex XE3

- No laptops or special adaptors are used

Link to connectors:
https://www.amazon.com/gp/product/B076RMR48C/ref=ppx_yo_dt_b_asin_title_o06_s00?ie=UTF8&psc=1

I have attached the new log with the suggested boot options (116-drm-log.txt).
Please let me know if there is anything else that you need.
Comment 24 Maurice Gale 2020-07-13 14:52:43 UTC
Created attachment 290253
New log with drm 116
Comment 25 Lyude Paul 2020-07-20 21:44:40 UTC
Hi! Could you please retry this with a kernel from drm-tip? I don't see any DisplayPort/DP MST debugging output there, so I'm guessing you either forgot to add drm.debug=0x116 or your kernel (which appears to be based on the Ubuntu kernel) is too old to support it. I can also see a couple of other nouveau bugs in that log that have already been fixed in more recent kernels.
Comment 26 Maurice Gale 2020-07-27 15:32:42 UTC
Sorry for the delay and the mistake. The system booted the 5.3 kernel by default. I recreated the log using 5.7-rc5 with the i2c timeout patch and added the new grub options. The file is much larger than the old one, so it should be correct this time.

I noticed that I have more luck getting all monitors to display when the DisplayPort->Mini DisplayPort connector is not attached (i.e., when I swap it out for a DisplayPort->HDMI one). It seems as though the Mini DisplayPort connection is causing some issues.

Again, thank you so much for your help, and please let me know if you need anything else.
Comment 27 Ilia Mirkin 2020-07-27 15:37:14 UTC
DP -> HDMI is, most likely, just passive, so the displays are being driven as HDMI rather than as DP. DP is the problematic protocol...
Comment 28 Lyude Paul 2020-07-31 21:23:19 UTC
(In reply to Ilia Mirkin from comment #27)
> DP -> HDMI is, most likely, just passive, so the displays are being driven
> as HDMI rather than as DP. DP is the problematic protocol...

Hi! So my first guess then might actually be a bad DP adapter. In my experience, DP cables (usually the full-size ones, but I've had it happen with one or two passive adapters as well) have a bad habit of breaking really easily. One time I spent a week trying to fix a bug that I'd already tried troubleshooting by replacing the DP cable, only to eventually find out that the replacement cable had also magically gone bad.

Mind posting one more log with the same parameters as last time? I probably can't tell for sure, but you'll usually see a lot of link training failures or timeouts when the cable has gone bad (though sometimes the symptoms are a lot more sinister...). I'd also try another adapter, of course.
Comment 29 Maurice Gale 2020-08-10 15:32:28 UTC
I can try replacing the DP cable to see if that does the trick; DP adapters seem to be very finicky. I would still be curious why it works with the nvidia drivers, however.

I have emailed an additional log. I really appreciate your help. After getting a new DP adapter, I will update you on whether it worked or not, as well as give you an additional log.
Comment 30 Lyude Paul 2020-09-21 18:54:13 UTC
Hi! Sorry for the long wait, I got distracted with some other work that needed to be done. Also, sorry for making you buy that adapter since it looks like things aren't working still. So I took a look at the logs that you gave me and noticed some interesting things that didn't catch my eye before. In the last log you gave me with the new DP adapter I can't see nouveau noticing any of the monitors. On the log before that though, I actually see it detect all four monitors. I also see it actually turn all four on successfully, huh.

The other thing I noticed is that it looks like you're using the nouveau DDX in your X config. That makes me wonder, do the screens come up -at all- during the boot process before the GUI comes up? As well, have you tried the modesetting DDX to see if that makes any difference, or perhaps wayland?
Comment 31 Maurice Gale 2020-11-02 15:48:51 UTC
Sorry for the late reply; I was also working on a different project.

To answer the above questions:

1. Only the main screen is displayed during grub; the others are displayed once Ubuntu finishes loading.

2. I have tried modesetting DDX, but got the same results.

3. I have also tried wayland, but got the same results.

I have also tried starting off with 1 display, then 2 and so on. I have noticed that it can consistently load 3 displays. The trouble seems to occur once the 4th display is introduced.

I also occasionally get all displays staying black when 4 displays are connected.