Bug 200695
Summary: | Blank screen on RX 580 with amdgpu.dc=1 enabled (no displays detected) | ||
---|---|---|---|
Product: | Drivers | Reporter: | Claude Heiland-Allen (claude) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | NEW --- | ||
Severity: | normal | CC: | 2kbrian, adia, alexdeucher, alexvkaam, andrey.arapov, anode.dev, babgozdtb, bjo, exiele.dfo, farshad, harry.wentland, lukas.fink1, magicmyth, mericsakarya, nicholas.kazlauskas, smirky, sndirsch, ty, virtuousfox |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.17.19, 4.18.5 -- 4.18.20, 4.19-rc1 -- 4.19.89, 4.20-rc1 -- 4.20.17, 5.0-rc1 -- 5.0.5, 5.1-rc3, 5.3.9, 5.3.16, 5.4.3, 5.5-rc2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg after boot with amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=6
dmesg after replugging monitor with amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=6
dmesg after boot with amdgpu.dc=0 drm.debug=6
Xorg.0.log with amdgpu=1
Xorg.0.log with amdgpu.dc=0
xorg.conf
dmesg for 4.19-rc1 amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=6
dmesg logs after booting to X with displays connected to both DVI and HDMI
5.4.0 amdgpu.dc=0 drm.debug=0xe
5.4.0 amdgpu.dc=1 drm.debug=0xe |
Created attachment 277635 [details]
dmesg after replugging monitor with amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=6
Created attachment 277637 [details]
dmesg after boot with amdgpu.dc=0 drm.debug=6
Created attachment 277639 [details]
Xorg.0.log with amdgpu=1
Created attachment 277641 [details]
Xorg.0.log with amdgpu.dc=0
Created attachment 277643 [details]
xorg.conf
still an issue with 4.18.5
still an issue with 4.19-rc1, I will attach dmesg
Created attachment 278167 [details]
dmesg for 4.19-rc1 amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=6
still an issue with 4.18.6
still an issue with 4.19-rc2
if you need more logs, let me know which boot options I should add
console displays fine on first boot but after a few seconds dmesg reports
[ 5.318879] [drm] Cannot find any crtc or sizes
and screen goes blank, and the monitor turns itself off after a few moments
still an issue with 4.18.7
still an issue with 4.19-rc3
Could you try enabling CONFIG_DRM_AMD_DC_PRE_VEGA option when rebuilding the kernel and see if that works with amdgpu.dc=1 ?
Oh, that option is gone from 4.18.
not an issue with 4.14.70 (I think it does not have amdgpu.dc as an option?)
still an issue with 4.17.19 compiled with CONFIG_DRM_AMD_DC_PRE_VEGA=y
still an issue with 4.18.8
still an issue with 4.19-rc4

Could you please try reverting this commit https://github.com/torvalds/linux/commit/e03fd3f300f6184c1264186a4c815e93bf658abb , rebuilding your kernel and let us know if it fixes your issue?
Not sure if your problem is related to mine here https://github.com/Dunedan/mbp-2016-linux/issues/73#issuecomment-422397681
But it has helped in my case.

I checked out linux v4.19-rc4 from git, then reverted that commit - no change, display goes blank about 5 seconds into boot.
I noticed something else in the dmesg (it was there in 4.19-rc1,rc2,rc3,rc4,rc4 with reverted commit, but not earlier versions):
[ 5.109572] amdgpu: [powerplay] Failed to retrieve minimum clocks.
[ 5.109577] amdgpu: [powerplay] Error in phm_get_clock_info
[ 5.109627] [drm] DM_PPLIB: values for Engine clock
[ 5.109629] [drm] DM_PPLIB: 300000
[ 5.109631] [drm] DM_PPLIB: 600000
[ 5.109632] [drm] DM_PPLIB: 900000
[ 5.109633] [drm] DM_PPLIB: 1145000
[ 5.109634] [drm] DM_PPLIB: 1215000
[ 5.109636] [drm] DM_PPLIB: 1257000
[ 5.109637] [drm] DM_PPLIB: 1300000
[ 5.109638] [drm] DM_PPLIB: 1366000
[ 5.109640] [drm] DM_PPLIB: Validation clocks:
[ 5.109641] [drm] DM_PPLIB: engine_max_clock: 136600
[ 5.109642] [drm] DM_PPLIB: memory_max_clock: 200000
[ 5.109644] [drm] DM_PPLIB: level : 8
[ 5.109646] [drm] DM_PPLIB: values for Memory clock
[ 5.109647] [drm] DM_PPLIB: 300000
[ 5.109648] [drm] DM_PPLIB: 1000000
[ 5.109649] [drm] DM_PPLIB: 2000000
[ 5.109651] [drm] DM_PPLIB: Validation clocks:
[ 5.109652] [drm] DM_PPLIB: engine_max_clock: 136600
[ 5.109653] [drm] DM_PPLIB: memory_max_clock: 200000
[ 5.109655] [drm] DM_PPLIB: level : 8
[ 5.124083] [drm] Display Core initialized with v3.1.59!
The last (largest) value for "engine clock" and "memory clock" are 10x the validation values for "engine clock max" and "memory clock max". I see in the amd/powerplay sources some values are in units of 10kHz, some in units of 1kHz(?) - maybe a conversion was missed somewhere?
Or maybe the printout is totally normal and I know nothing :)

The error message common to all kernels with amdgpu.dc=1 since 4.17 is:
[ 5.256378] [drm] Cannot find any crtc or sizes

still an issue in 4.18.9
still an issue in 4.18.10
still an issue in 4.18.11
still an issue in 4.19-rc5
still an issue in 4.19-rc6
still an issue in 4.18.12
still an issue in 4.18.13
still an issue in 4.19-rc7
still an issue in 4.18.14
still an issue in 4.18.15
still an issue in 4.18.16
still an issue in 4.19-rc8
still an issue in 4.19
still an issue in 4.18.17
still an issue in 4.19.1
still an issue in 4.20-rc1 (configured with HSA enabled)
still an issue in 4.18.18
still an issue in 4.20-rc2
still an issue in 4.18.19
still an issue in 4.19.2
still an issue in 4.20-rc3
still an issue in 4.18.20 4.19.3 4.19.4 4.19.5 4.19.6 4.20-rc4 4.20-rc5

> amdgpu: [powerplay] Failed to retrieve minimum clocks.
Confirmed, I started getting this error and entire black screen from 4.19.4 and all latest kernels
product: Lexa PRO [Radeon RX 550/550X]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
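
The 10x gap pointed out earlier in this thread, between the highest DM_PPLIB level and the validation clock, is what a 1 kHz versus 10 kHz unit difference would produce. A minimal Python sketch of that arithmetic follows. It uses only numbers copied from the dmesg excerpt above; the unit interpretation is the reporter's guess and has not been confirmed against the driver source, and `ratio_to_validation` is a hypothetical helper name.

```python
# Values copied from the DM_PPLIB lines quoted earlier in this thread.
ENGINE_LEVELS = [300000, 600000, 900000, 1145000, 1215000, 1257000, 1300000, 1366000]
MEMORY_LEVELS = [300000, 1000000, 2000000]
ENGINE_MAX_VALIDATION = 136600
MEMORY_MAX_VALIDATION = 200000


def ratio_to_validation(levels, validation_max):
    """Return the highest reported level divided by the validation maximum."""
    return max(levels) / validation_max


if __name__ == "__main__":
    # A ratio of exactly 10 is consistent with (but does not prove) one value
    # being printed in 1 kHz units and the other in 10 kHz units.
    print("engine ratio:", ratio_to_validation(ENGINE_LEVELS, ENGINE_MAX_VALIDATION))
    print("memory ratio:", ratio_to_validation(MEMORY_LEVELS, MEMORY_MAX_VALIDATION))
```

Both ratios come out the same, which fits a consistent unit difference in the printout better than a single corrupted value.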
same issue with an HP Notebook - 17-ca0710nd
Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98e4] (rev da)
Kernel 4.19.12
need to use amdgpu.dc=0 OR video=1024x768M@60m to get the built-in screen to work

still an issue in 4.19.15 4.20.2 5.0-rc1 (I didn't check intermediate versions since comment 21 above https://bugzilla.kernel.org/show_bug.cgi?id=200695#c21 )
[drm] Cannot find any crtc or sizes
followed by screen going blank, monitor turning off, no display detected

still an issue in 4.19.25 4.20.12 5.0-rc8 (I didn't check intermediate versions since comment 24 above)

hi
on request of the people over at SUSE I tested my Tumbleweed install with the Kernel of the Day.
first did:
BOOT_IMAGE=/boot/vmlinuz-5.0.0-rc8-1.g4ddf057-default root=UUID=fce8b6dd-98d6-4b86-a5d9-2812f3c1e242 splash=silent resume=/dev/disk/by-uuid/c9f35ed5-bdbe-413a-801b-7df1c8a64145 quiet
and it booted with a screen
then did:
BOOT_IMAGE=/boot/vmlinuz-5.0.0-rc8-1.g4ddf057-default root=UUID=fce8b6dd-98d6-4b86-a5d9-2812f3c1e242 splash=silent resume=/dev/disk/by-uuid/c9f35ed5-bdbe-413a-801b-7df1c8a64145 quiet amdgpu.dc=1
and it also booted with a screen
so on this specific hardware (HP Notebook - 17-ca0710nd, Advanced Micro Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics]) it seems to be fixed

My RX550 boots successfully with 4.20 kernels and newer, but the OS freezes every minute for 30 seconds if I open YouTube or play a game, and nothing works during a freeze - I can't move the mouse or press REISUB. The entire system is stuck.
I downgraded the kernel to 4.14 and it works like a charm without lags or freezes. It seems 4.14 doesn't have AMD DC (amdgpu.dc=0 doesn't work for me)

(In reply to Alex van Kaam from comment #26)
> hi
>
> on request of the people over at SUSE I tested my Tumbleweed install with
> the Kernel of the Day.
>
> first did: BOOT_IMAGE=/boot/vmlinuz-5.0.0-rc8-1.g4ddf057-default
> root=UUID=fce8b6dd-98d6-4b86-a5d9-2812f3c1e242 splash=silent
> resume=/dev/disk/by-uuid/c9f35ed5-bdbe-413a-801b-7df1c8a64145 quiet
>
> and it booted with a screen
>
> then did: BOOT_IMAGE=/boot/vmlinuz-5.0.0-rc8-1.g4ddf057-default
> root=UUID=fce8b6dd-98d6-4b86-a5d9-2812f3c1e242 splash=silent
> resume=/dev/disk/by-uuid/c9f35ed5-bdbe-413a-801b-7df1c8a64145 quiet
> amdgpu.dc=1
>
> and it also booted with a screen
>
> so on this specific hardware HP Notebook - 17-ca0710nd, Advanced Micro
> Devices, Inc. [AMD/ATI] Stoney [Radeon R2/R3/R4/R5 Graphics] it seems to be
> fixed
Why do you set amdgpu.dc=1? Isn't this the default value anyway?
From what I understood, you tested exactly the same thing.

amdgpu.dc=1 has been set by default for all kernels since 4.17, and it seems that amdgpu.dc=0 is a deprecated option, as the new Display Core is the replacement for the old amdgpu display code

5.0 works the same as 4.20, with a lot of warnings and messages in dmesg
https://bugzilla.kernel.org/show_bug.cgi?id=201957

>
> Why do you set amdgpu.dc=1? Isn't this the default value anyway?
> From what I understood, you tested exactly the same thing.
just to be sure, nothing more. I did not know for 100% it was the default now and is the default in the opensuse daily kernel.
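
Since the exchange above turns on whether amdgpu.dc=1 was passed explicitly or was simply the default, it can help to record both when reporting results. The Python sketch below parses a kernel command line for an explicit `amdgpu.dc=` token and reads the value the loaded module reports. It assumes the module exposes the parameter at `/sys/module/amdgpu/parameters/dc`, which should be checked on the system in question; the function names are hypothetical.

```python
from pathlib import Path


def dc_from_cmdline(cmdline):
    """Return the amdgpu.dc value given on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        # "amdgpu.dc_log=1" does not match because of the trailing "=" in the prefix.
        if token.startswith("amdgpu.dc="):
            value = token.split("=", 1)[1]  # assumption: the last occurrence takes effect
    return value


def dc_from_sysfs(param_path="/sys/module/amdgpu/parameters/dc"):
    """Return the value reported by the loaded module, or None if it cannot be read."""
    try:
        return Path(param_path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("explicit on command line:", dc_from_cmdline(cmdline))
    print("reported by module:", dc_from_sysfs())
```

If the first value is `None`, the kernel was booted with whatever default the driver chose, which is the case the question above was asking about.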
Created attachment 282101 [details]
dmesg logs after booting to X with displays connected to both DVI and HDMI
Good news: the bug only seems to affect the DVI output from my card.
I managed to connect another screen via an HDMI port and it works ok with amdgpu.dc=1 (both screens have kernel messages mirrored until about 5 seconds into boot, when the DVI screen turns off - the HDMI screen stays on and I can log into X).
Attached are 7 dmesg logs, with amdgpu.dc=1 unless specified.
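
To compare what the kernel reports per output (for example DVI versus HDMI, as in the comment above), the connector entries under `/sys/class/drm` can be listed together with their `status` files. The following is a minimal Python sketch; `connector_status` is a hypothetical helper, and the connector names that appear (such as `card0-DVI-D-1`) depend on the card and driver.

```python
from pathlib import Path


def connector_status(drm_root="/sys/class/drm"):
    """Map connector entries (e.g. 'card0-DVI-D-1') to the contents of their status file."""
    result = {}
    root = Path(drm_root)
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        status_file = entry / "status"
        # Only connector entries carry a "status" file; card and render nodes are skipped.
        if status_file.is_file():
            result[entry.name] = status_file.read_text().strip()
    return result


if __name__ == "__main__":
    for name, status in connector_status().items():
        print(f"{name}: {status}")
```

Running this once with amdgpu.dc=0 and once with amdgpu.dc=1 gives a compact record of which connectors exist and which the kernel considers connected in each mode.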
All works perfectly in 5.3, no black screen and no errors (Polaris RX540)

This probably has reports from multiple separate bugs resulting in loss of output. It's easy to mix up with other bugs if you have only one monitor and your system is silently stuck on an invisible login screen, or if the entire driver failed.
In my particular case, it's triggered by UEFI boot mode: when amdgpu is loaded it shuts down the output port that was used by the UEFI BIOS during boot, so I am stuck with my secondary monitor only. It behaves as if the monitor was yanked out of the port. Reconnecting it physically does nothing. It works fine if CSM is used in the motherboard, but my new MB forces full UEFI, so I'm stuck with that.
Does anyone get "[drm] Cannot find any crtc or sizes" with multiple monitors ?

I'm getting this issue with only one monitor connected via HDMI. It was gone some kernel versions before and came back with 5.4.x, it seems.

Still getting this issue: with monitor connected via DVI, screen goes blank and monitor turns off a few seconds into boot (is fine before that, in motherboard power on display and grub and start of boot process). With same monitor connected via HDMI and HDMI/DVI adaptor there are no issues.
Kernel versions tested today, with behaviour as above:
- 4.19.89
- 5.3.16
- 5.4.3 (partial test, forgot to test with DVI; HDMI works fine)
- 5.5-rc2

I've run into this issue now when moving a Radeon RX 580 card into an Intel Z77 system with a 2500K. The same card has worked perfectly with DC enabled on AMD B350 and X470 motherboards. It is only this Asus P8Z77-V LX motherboard that has demonstrated the issue. I tested with both Legacy and UEFI boot modes with the same result. The testing has been done with both Linux 5.3 and 5.4 kernels from Ubuntu.

As several comments mention different behavior between the different connector types I found a spare DVI cable and tried connecting to the same monitor via that input and it worked! So once logged in I plugged in the HDMI cable and switched input and it worked just fine (monitor was glitchy at first but once I disabled DVI output in monitor configuration it worked perfectly). To confirm this was indeed working with the DC code path I tested audio over HDMI and it worked just fine (and does not work with DC disabled, as expected). So I rebooted with both the DVI and HDMI in and the output over HDMI worked all through boot. I then disconnected the DVI and rebooted and the HDMI continues to work so far. Unfortunately I don't have a displayport monitor around to test if that behaves as weirdly. Just to be clear, I had tried to make this monitor work for a couple weeks over HDMI and the only thing that worked was amdgpu.dc=0 until I plugged in the DVI connector so this was not a one off fail.
I forgot to mention before that when the HDMI output was not working with DC my Xorg logs also showed no monitor detected.
I can attach the two different Xorg logs if that helps?

(In reply to Adam from comment #37)
> As several comments mention different behavior between the different
> connector types I found a spare DVI cable and tried connecting to the same
> monitor via that input and it worked! So once logged in I plugged in the
> HDMI cable and switched input and it worked just fine (monitor was glitchy
> at first but once I disabled DVI output in monitor configuration it worked
> perfectly). To confirm this was indeed working with the DC code path I
> tested audio over HDMI and it worked just fine (and does not work with DC
> disabled, as expected). So I rebooted with both the DVI and HDMI in and the
> output over HDMI worked all through boot. I then disconnected the DVI and
> rebooted and the HDMI continues to work so far. Unfortunately I don't have a
> displayport monitor around to test if that behaves as weirdly. Just to be
> clear, I had tried to make this monitor work for a couple weeks over HDMI
> and the only thing that worked was amdgpu.dc=0 until I plugged in the DVI
> connector so this was not a one off fail.
>
> I forgot to mention before that when the HDMI output was not working with DC
> my Xorg logs also showed no monitor detected.
>
> I can attach the two different Xorg logs if that helps?
I have been following this thread since it was created. My R9 380 with DVI-I output is connected to my VGA monitor with a DVI-I male to VGA female adapter, and I am getting a black/blank screen while booting with amdgpu.dc=1, which is the default since kernel 4.17. It is fine with amdgpu.dc=0. I can provide logs too if any kernel contributor wants to investigate the problem any further.

I experience the same problem with a R9 380X. Still can't boot into any distro with DC enabled. My monitor doesn't offer any other connection than VGA input, my GPU only offers DVI output, so I use a VGA to DVI adapter. It's disappointing this issue still continues after so many kernel revisions. I'll watch this thread hoping for good news.

(In reply to Meric07 from comment #39)
> I experience the same problem with a R9 380X. Still can't boot into any
> distro with DC enabled. My monitor doesn't offer any other connection than
> VGA input, my GPU only offers DVI output, so i use a VGA to DVI adapter.
> It's disappointing this issue still continues after so many kernel
> revisions. I'll watch this thread hoping for good news.
I don't think this is related to the original report. Does passing amdgpu.dc=0 on the kernel command line in grub fix the issue?

(In reply to Alex Deucher from comment #40)
> (In reply to Meric07 from comment #39)
> > I experience the same problem with a R9 380X. Still can't boot into any
> > distro with DC enabled. My monitor doesn't offer any other connection than
> > VGA input, my GPU only offers DVI output, so i use a VGA to DVI adapter.
> > It's disappointing this issue still continues after so many kernel
> > revisions. I'll watch this thread hoping for good news.
>
> I don't think this is related to the original report. Does passing
> amdgpu.dc=0 on the kernel command line in grub fix the issue?
Yes, with dc=0 the system starts up normally. I'm not sure if this is a fix or a workaround though.

(In reply to Meric07 from comment #41)
> (In reply to Alex Deucher from comment #40)
> > (In reply to Meric07 from comment #39)
> > > I experience the same problem with a R9 380X. Still can't boot into any
> > > distro with DC enabled. My monitor doesn't offer any other connection than
> > > VGA input, my GPU only offers DVI output, so i use a VGA to DVI adapter.
> > > It's disappointing this issue still continues after so many kernel
> > > revisions. I'll watch this thread hoping for good news.
> >
> > I don't think this is related to the original report. Does passing
> > amdgpu.dc=0 on the kernel command line in grub fix the issue?
> Yes, with dc=0 the system starts up normally. I'm not sure if this is a fix
> or a workaround though.
It has to be just a workaround. There is a problem with the AMDGPU Display Code or something that uses it, but since even the console goes black, the problem is somewhere within the kernel. Setting "amdgpu.dc=0" basically disables the AMDGPU Display Code, so we do not benefit from it.

(In reply to babgozd from comment #38)
> I am following this thread since it has created. My R9 380 with DVI-I output
> is connected to my VGA monitor with DVI-I male to VGA female adapter and I
> am getting black/blank screen while booting with amdgpu.dc=1 which is
> default since kernel 4.17. It is fine with amdgpu.dc=0. I can provide logs
> too if any kernel contributor wants to investigate the problem any further.
To me it seems as if the DVI-I port is somehow misdetected/misconfigured as a DVI-D port.
With dc=0 I find the files "card1-DVI-I-1" and "card1-DVI-D-1" in "/sys/class/drm/card1". With dc=1 there are "card1-DVI-D-1" and "card1-DVI-D-2".
It appears to correctly negotiate resolution and refresh rate over DDC, but because it only outputs the digital signal, a display connected over VGA receives nothing and turns off.

There is no support for analog displays in DC.

(In reply to Alex Deucher from comment #44)
> There is no support for analog displays in DC.
Which was a horrible decision. Luckily, decent DP->VGA adapters actually work even with proper ≥1080p ≥85fps CRTs.
But it still has some problems with DVI ports and/or HDMI->DVI adapters or certain monitors. Seems my issue with no output at all in UEFI mode was a separate one and has to do with BIOS's signing DRM nonsense. Hacked VBIOS works fine with UEFI… if you don't count just initializing sole DP port at boot… which has backup/special-purpose CRT connected and not real primary display in DVI. But in either mode with dc=1 it just doesn't want to acknowledge my BENQ G2320HDB connected with a simple HDMI->DVI adapter, as if it doesn't exist, and there is no second DVI port. Maybe HDCP shenanigans ?
There is also another issue with amdgpu failing to initialize and hanging display output at boot if 64-bit PCIe addressing ("above 4G decoding") is enabled, unless pci=nocrs is specified for the kernel, but that's a third issue and likely has to do with the crappy BIOS of my new unlicensed LGA2011 motherboard (even though it works fine on Windows):
amdgpu: [powerplay] failed to send message 254 ret is 0
amdgpu: [powerplay] SMU load firmware failed
amdgpu: [powerplay] fw load failed
smu firmware loading failed
amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
amdgpu 0000:03:00.0: Fatal error during GPU init
[drm] amdgpu: finishing device.

I have the same problem.
Kernel version 5.4.0-39-generic on Ubuntu.
Graphics card: AMD RX580.
Motherboard: ASRock Z87.
Monitor: Compaq x22LED (connected using an HDMI to DVI-D adapter)
With the default configuration the screen goes black early in startup; with amdgpu.dc=0 it displays normally.
I'll post the dmesg output for dc=0 and dc=1 to help diagnose the problem.
Created attachment 289905 [details]
5.4.0 amdgpu.dc=0 drm.debug=0xe
Created attachment 289907 [details]
5.4.0 amdgpu.dc=1 drm.debug=0xe
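
When comparing dmesg captures taken with dc=0 and dc=1 like the two attached above, a quick first pass is to search each log for the messages already quoted in this thread. The Python sketch below does only that. The signature list is illustrative and limited to strings that appear in earlier comments, and `find_signatures` is a hypothetical helper, not part of any kernel tooling.

```python
# Message fragments quoted in this thread; illustrative, not an exhaustive list.
SIGNATURES = (
    "Cannot find any crtc or sizes",
    "Failed to retrieve minimum clocks",
    "Error in phm_get_clock_info",
)


def find_signatures(dmesg_text, signatures=SIGNATURES):
    """Return (line_number, signature, line) for each dmesg line containing a signature."""
    hits = []
    for number, line in enumerate(dmesg_text.splitlines(), start=1):
        for signature in signatures:
            if signature in line:
                hits.append((number, signature, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Reads a saved dmesg capture from standard input.
    for number, _signature, line in find_signatures(sys.stdin.read()):
        print(f"{number}: {line}")
```

A log that contains none of these strings can still show the blank-screen symptom, so an empty result only rules out the specific messages searched for.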
Had the same problem on a Dell Latitude 5410 with a Lexa Radeon E9171 MCM and Linux kernel 4.19.0.13 (Debian Buster). I read it's still an issue with 4.20-rc5.
*Now operational with kernel 5.11* and default options.
The Lexa series is in the RX540, RX550, 550X and RX550X range (sold for $80 in 2017).
DRM was loading, followed by an ACPI error: field tmpb at bit offset exceed size of target buffer... method parse/execution failed SB.PCIO.GFX0.ATRM, AE_AML_BUFFER_LIMIT
Error message was: Error DC : number of connector is zero |
Created attachment 277633 [details]
dmesg after boot with amdgpu.dc=1 amdgpu.dc_log=1 drm.debug=6
When amdgpu.dc=1 is initialized at boot, the console goes blank as it thinks all displays are disconnected. Xorg is not able to enable the display either. With amdgpu.dc=0 all is fine.
Tried with various (mostly Debian) kernels from 4.16 through 4.18~rc4, all have the issue. I built a 4.18~rc7 from kernel.org to rule out Debian patches being the issue and will provide logs.