Bug 206351

Summary: RX 5600 XT Not Correctly Recognized, Max Memory Frequency Below Where it Should Be
Product: Drivers
Reporter: Matt McDonald (gardotd426)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.5.0, 5.4.14
Subsystem:
Regression: No
Bisected commit-id:
Attachments: glxinfo output

Description Matt McDonald 2020-01-30 09:24:54 UTC
Created attachment 287033
glxinfo output

NOTE: I hope "DRI - non Intel" is the correct component for this bug. I wasn't sure whether to file it under that or under console/framebuffers.

My 5600 XT is not correctly recognized as such by my Arch Linux system. Some applications and system utilities report it as "Unknown AMD GPU", some as "RADV/NAVI10 GPU", glxinfo reports "AMD NAVI10", and others, such as inxi, report it as a "Radeon RX 5700 / 5700 XT". This occurs regardless of vBIOS version: on both the performance and silent vBIOS, and on both the original vBIOS and the "upgraded" one AMD pushed out right before launch.

Also, the max memory frequency on this card is supposed to be 1500MHz, which it shows in Windows, but on Linux the memory range is shown as 625-930MHz (and that's with amdgpu.ppfeaturemask set). There are also multiple rendering issues, but I've filed a report with mesa for those.


I know there's supposed to be a new firmware release that fixes the performance issues with the new vBIOS, but this isn't a performance issue, and it's present even with the original vBIOS (which had no reported performance issues). It seems that support for the 5600 XT is not properly built into the kernel yet (which is to be expected since it's so new). However, I will say that my Ryzen 3200G processor, which I got within a month of its launch, was properly recognized, and its frequency limits were also properly recognized and implemented.

inxi -Gxxz:
Graphics:
  Device-1: AMD Navi 10 [Radeon RX 5700 / 5700 XT] vendor: Sapphire Limited 
  driver: amdgpu v: kernel bus ID: 09:00.0 chip ID: 1002:731f 
  Display: x11 server: X.Org 1.20.7 driver: amdgpu compositor: kwin_x11 
  resolution: 1366x768~60Hz, 1920x1080~60Hz 
  OpenGL: renderer: AMD NAVI10 (DRM 3.36.0 5.5.0-3-tkg-pds LLVM 9.0.1) 
  v: 4.6 Mesa 20.0.0-devel (git-6e1411c9e8) direct render: Yes

sudo cat /sys/class/drm/card0/device/pp_od_clk_voltage:
OD_SCLK:
0: 800Mhz
1: 1780Mhz
OD_MCLK:
1: 900MHz
OD_VDDC_CURVE:
0: 800MHz @ 0mV
1: 1290MHz @ 0mV
2: 1780MHz @ 0mV
OD_RANGE:
SCLK:     800Mhz       1820Mhz
MCLK:     625Mhz        930Mhz
VDDC_CURVE_SCLK[0]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[0]:     800mV        1050mV
VDDC_CURVE_SCLK[1]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[1]:     800mV        1050mV
VDDC_CURVE_SCLK[2]:     800Mhz       1820Mhz
VDDC_CURVE_VOLT[2]:     800mV        1050mV
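
For anyone who wants to compare these limits across kernels or vBIOS versions, here is a minimal sketch of pulling the OD_RANGE values out of that text. It is only an illustration: `parse_od_range` and `SAMPLE` are hypothetical names, the sample is a shortened copy of the output above, and the parser assumes the layout shown here rather than any guaranteed format.

```python
import re

# Shortened copy of the pp_od_clk_voltage output shown above.
SAMPLE = """\
OD_SCLK:
0: 800Mhz
1: 1780Mhz
OD_MCLK:
1: 900MHz
OD_RANGE:
SCLK:     800Mhz       1820Mhz
MCLK:     625Mhz        930Mhz
VDDC_CURVE_VOLT[0]:     800mV        1050mV
"""

# One OD_RANGE row: "<name>:  <min><unit>  <max><unit>", unit is MHz or mV.
ROW = re.compile(r"^(\S+):\s+(\d+)(?:mhz|mv)\s+(\d+)(?:mhz|mv)$", re.IGNORECASE)


def parse_od_range(text):
    """Return {name: (min, max)} for every row after the OD_RANGE header."""
    ranges = {}
    in_range = False
    for line in text.splitlines():
        line = line.strip()
        if line == "OD_RANGE:":
            in_range = True
            continue
        if not in_range:
            continue
        match = ROW.match(line)
        if match:
            ranges[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return ranges


if __name__ == "__main__":
    print(parse_od_range(SAMPLE)["MCLK"])
```

On a real system the text would be read from /sys/class/drm/card0/device/pp_od_clk_voltage instead of `SAMPLE`.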


vulkaninfo | grep -i "AMD GPU":
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
		GPU id 	: 0 (Unknown AMD GPU)
		GPU id 	: 1 (Unknown AMD GPU)
GPU id : 0 (Unknown AMD GPU):
GPU id : 1 (Unknown AMD GPU):
GPU id : 0 (Unknown AMD GPU):
GPU id : 1 (Unknown AMD GPU):
			Unknown AMD GPU (ID: 0)
		Unknown AMD GPU (ID: 0)
			Unknown AMD GPU (ID: 0)
			Unknown AMD GPU (ID: 0)
		Unknown AMD GPU (ID: 0)
			Unknown AMD GPU (ID: 0)
	deviceName     = Unknown AMD GPU
	deviceName     = Unknown AMD GPU


sudo pacman -Q linux-firmware:
linux-firmware 20191220.6871bff-1


Kernels: the issue persists across at least 5 different kernels: linux-zen-5.4.14, linux-fsync-5.4.14, linux-5.4.14, linux-amd-staging-drm-next-git-5.5, and linux-55-tkg-pds (TK-Glitch custom kernel). I've also attached the output of glxinfo, as it was far too long to include in this post and still be readable. I'm happy to provide any other information needed, and to help with testing patches or any debugging necessary.
Comment 1 Matt McDonald 2020-01-30 09:31:14 UTC
Actually, the new vBIOS memory clock should be 1750MHz, not 1500. Either way it's far too low on Linux, and this was not an issue with the Polaris cards, which correctly set memory frequencies in line with the vBIOS.
Comment 2 Alex Deucher 2020-01-30 18:12:39 UTC
The only thing missing is the marketing strings.  See this libdrm merge request:
https://gitlab.freedesktop.org/mesa/drm/merge_requests/44
That will provide a proper string for mesa libraries like OpenGL or Vulkan.
Tools like lspci and inxi use the PCI IDs database (http://pci-ids.ucw.cz/) to get their strings.  However, the strings on that site are tied to the device IDs.  The marketing names for AMD parts vary based on the device ID and the revision ID, as well as the subsystem IDs in some cases.  The PCI IDs database does not provide a way to supply a string based on a combination of IDs.
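
To illustrate why the two kinds of lookup give different answers, here is a minimal sketch. Everything in it is hypothetical: the function names, the table layout, and in particular the revision IDs, which are placeholders and not the real values for these boards. Only the device ID 731f and the device-only string come from the inxi output in the description.

```python
# Device-only table, in the spirit of a database keyed on the device ID alone.
# The string is the one inxi printed in the description.
DEVICE_ONLY_NAMES = {
    0x731F: "Navi 10 [Radeon RX 5700 / 5700 XT]",
}

# Table keyed on (device ID, revision ID).
# The revision IDs 0xA1 and 0xA2 are PLACEHOLDERS for illustration only.
MARKETING_NAMES = {
    (0x731F, 0xA1): "Radeon RX 5700 XT",
    (0x731F, 0xA2): "Radeon RX 5600 XT",
}


def name_by_device(device_id):
    """Look up a name from the device ID alone."""
    return DEVICE_ONLY_NAMES.get(device_id, "Unknown device")


def name_by_device_and_revision(device_id, revision_id):
    """Look up a name from the (device ID, revision ID) pair."""
    return MARKETING_NAMES.get((device_id, revision_id), "Unknown AMD GPU")


if __name__ == "__main__":
    # Same device ID, different revision: only the second lookup can tell them apart.
    print(name_by_device(0x731F))
    print(name_by_device_and_revision(0x731F, 0xA2))
```

With a device-only key, every board that shares the ID gets the same string, which matches what inxi shows. A pair that is missing from the second table falls back to a generic string until an entry is added.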

If you want the newer smc firmware, it's available here:
https://people.freedesktop.org/~agd5f/radeon_ucode/navi10/new/navi10_smc.bin
It will be upstreamed to linux-firmware once it has finished its internal QA cycle.

As to the clocks, they are provided in the vbios.  That is where the driver gets them.  Can you provide more information about what OEM product this is?
Comment 3 Matt McDonald 2020-01-30 18:52:51 UTC
This is the card and its specifications from the official website: https://www.sapphiretech.com/en/consumer/pulse-radeon-rx-5600-xt-6g-gddr6#Specification

As you'll see, the memory clock is supposed to be 14Gbps on the new vBIOS, which is equal to 1750MHz, and that is exactly what GPU-Z reports on Windows (as does every other piece of GPU monitoring software, like MSI Afterburner and Radeon Wattman). If you look at the specs for the card with the old performance vBIOS, it was 12Gbps effective, which is 1500MHz, and again GPU-Z and every other Windows utility correctly reports this. Also, Linux correctly reports the memory clock frequency for older cards like the RX 580: my XFX GTS XXX model was correctly reported with the memory clock boosting to 1750MHz.

However, with the 5600 XT, even with overclocking enabled, the powerplay table's memory range tops out at 930MHz.
Comment 4 Alex Deucher 2020-01-30 21:21:32 UTC
(In reply to Matt McDonald from comment #3)
> This is the card and its specifications from the official website:
> https://www.sapphiretech.com/en/consumer/pulse-radeon-rx-5600-xt-6g-gddr6#Specification
> 

The 1750 MHz clock on the webpage is the engine boost clock, which is in line with the settings for your card.

OD_SCLK:
1: 1780Mhz

With respect to the memory clock, the default clock on your board is 900 MHz and it has a 192-bit interface.  GDDR6 sends data on the rising and falling edge of the wave, so you multiply it by two.  Hence:

OD_MCLK:
1: 900MHz

900 Mhz * 192 bit * 2 = 345,600 Mbps = 337.5 Gbps

Which is right in line with the bandwidth for your card.
Comment 5 Matt McDonald 2020-01-31 01:46:38 UTC
I'm not referring to the 1750MHz boost clock. I'm referring to the 14Gbps memory clock on the same page, which is 1750MHz (1750MHz * 8 for octo-pumped GDDR6 = 14Gbps or 14GT/s, which is the stated memory frequency of the card). That is how it's reported in Windows as well, as explained here:

https://forums.tomshardware.com/threads/effective-memory-clock-speed-confusions.3518637/

and here:
https://www.techpowerup.com/forums/threads/how-to-calculate-gddr6-speed-from-gpu-z.250747/

Literally everything I can find says to calculate the GDDR6 data rate as double data rate (x2) and quad-pumped (x4), so 1750 * 2 * 4 = 14000, or 14Gbps. The specs for the card itself show its memory frequency as 14Gbps, which fits everything I've seen. Windows reports the memory clock (no, not the boost clock, they're listed separately) as 1750MHz, which also lines up.

Am I missing something? If I am, I apologize, but literally everything I can find says otherwise. If I am missing something, how does 14Gbps (which is the official memory clock frequency of the card) end up being 900MHz?
Comment 6 Alex Deucher 2020-01-31 18:00:35 UTC
We expose the actual memory controller clock rate in Linux, not the effective memory clock of the DRAMs.  To translate between the two, use the following formulas:

Clock conversion (MHz):
HBM: effective_memory_clock = memory_controller_clock * 1
G5:  effective_memory_clock = memory_controller_clock * 1
G6:  effective_memory_clock = memory_controller_clock * 2

DRAM data rate (MT/s):
HBM: effective_memory_clock * 2 = data_rate
G5:  effective_memory_clock * 4 = data_rate
G6:  effective_memory_clock * 8 = data_rate

Bandwidth (MB/s):
data_rate * vram_bit_width / 8 = memory_bandwidth

Some examples:
G5 on RX460:
memory_controller_clock = 1750 MHz
effective_memory_clock = 1750 MHz * 1 = 1750 MHz
data_rate = 1750 * 4 = 7000 MT/s
memory_bandwidth = 7000 * 128 bits / 8 = 112000 MB/s

G6 on RX5600:
memory_controller_clock = 900 MHz
effective_memory_clock = 900 MHz * 2 = 1800 MHz
data_rate = 1800 * 8 = 14400 MT/s
memory_bandwidth = 14400 * 192 bits / 8 = 345600 MB/s
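
The same conversion can be written as a short Python sketch. The multipliers are the ones from the tables above; the function name `memory_figures` and the `MULTIPLIERS` table are hypothetical names chosen for illustration.

```python
# (clock multiplier, data-rate multiplier) per memory type, from the tables above.
MULTIPLIERS = {
    "HBM": (1, 2),
    "G5": (1, 4),
    "G6": (2, 8),
}


def memory_figures(mem_type, controller_mhz, bus_width_bits):
    """Return (effective clock in MHz, data rate in MT/s, bandwidth in MB/s)."""
    clock_mult, rate_mult = MULTIPLIERS[mem_type]
    effective_mhz = controller_mhz * clock_mult
    data_rate = effective_mhz * rate_mult
    bandwidth = data_rate * bus_width_bits // 8
    return effective_mhz, data_rate, bandwidth


if __name__ == "__main__":
    print(memory_figures("G5", 1750, 128))  # RX460 example
    print(memory_figures("G6", 900, 192))   # RX5600 example
```

Running it reproduces the two worked examples: (1750, 7000, 112000) for the RX460 and (1800, 14400, 345600) for the RX5600.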
Comment 7 Matt McDonald 2020-01-31 18:28:30 UTC
Ahhh I understand now. So, just to recap and make sure everything is covered...

- The new firmware for the new vBIOS is in QA right now but available for download (which I've tested, it seems to work for me).

- The Memory clock is correct, but it's the true clock and not the effective speed

- The marketing strings required for the card to properly be recognized (and recognized as a 5600 XT as opposed to a 5700/5700 XT) for Vulkan/OpenGL/etc. are currently being worked on/added?

- This wasn't mentioned in the initial report but I'm assuming the lack of memory voltage reporting is just an aspect of Navi? For example, Polaris cards provide a core frequency and a voltage for both the memory and core, but with Navi it just provides a vddc curve (which is for some reason listed as 0mV for every state), but I've heard this is just down to Navi. Is this true, or will we at some point be able to see (and potentially set) voltages for memory (and individual core states)?

- And finally, there are some rendering issues in certain games that did not exist with the same drivers with my Polaris card (even with the new firmware). Are these bugs that should be reported to Mesa, or the amdgpu kernel devs?
Comment 8 Alex Deucher 2020-01-31 19:26:41 UTC
(In reply to Matt McDonald from comment #7)
> Ahhh I understand now. So, just to recap and make sure everything is
> covered...
> 
> - The new firmware for the new vBIOS is in QA right now but available for
> download (which I've tested, it seems to work for me).

Yes, I provided the link in comment 2.  It will be upstreamed to linux-firmware soon.

> 
> - The Memory clock is correct, but it's the true clock and not the effective
> speed
> 

Correct.

> - The marketing strings required for the card to properly be recognized (and
> recognized as a 5600 XT as opposed to a 5700/5700 XT) for Vulkan/OpenGL/etc.
> are currently being worked on/added?
> 

The patch to add the strings is here:
https://gitlab.freedesktop.org/mesa/drm/merge_requests/44

> - This wasn't mentioned in the initial report but I'm assuming the lack of
> memory voltage reporting is just an aspect of Navi? For example, Polaris
> cards provide a core frequency and a voltage for both the memory and core,
> but with Navi it just provides a vddc curve (which is for some reason listed
> as 0mV for every state), but I've heard this is just down to Navi. Is this
> true, or will we at some point be able to see (and potentially set) voltages
> for memory (and individual core states)?
> 

Vega20 and newer do not have discrete power states like older parts did, so there are no states to adjust.  You can only adjust the min and max engine clocks, the max memory clock, and the vddc curve.  You can adjust the voltage curve today; the only thing that is missing is printing the default voltage levels of the curve.  I'm not sure how to query that as of yet.
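
As a rough illustration of those four adjustments, the sketch below only builds the command strings and does not write anything. The syntax ("s 0"/"s 1" for min/max engine clock, "m 1" for max memory clock, "vc" for a curve point, "c" to commit) is an assumption based on the kernel's amdgpu documentation for pp_od_clk_voltage and should be checked against the documentation for your kernel. The function name `od_commands` is hypothetical.

```python
def od_commands(sclk_min=None, sclk_max=None, mclk_max=None, curve_points=()):
    """Build the strings one would write to pp_od_clk_voltage, one per write.

    Clocks are in MHz and voltages in mV. curve_points is an iterable of
    (point index, clock, voltage) tuples. Nothing is written to sysfs here.
    """
    commands = []
    if sclk_min is not None:
        commands.append(f"s 0 {sclk_min}")   # minimum engine clock
    if sclk_max is not None:
        commands.append(f"s 1 {sclk_max}")   # maximum engine clock
    if mclk_max is not None:
        commands.append(f"m 1 {mclk_max}")   # maximum memory clock
    for point, clock, voltage in curve_points:
        commands.append(f"vc {point} {clock} {voltage}")  # one vddc curve point
    commands.append("c")                     # commit the pending changes
    return commands


if __name__ == "__main__":
    for command in od_commands(sclk_max=1800, mclk_max=930,
                               curve_points=[(2, 1780, 1050)]):
        print(command)
```

The example values stay inside the OD_RANGE limits reported in the description (SCLK up to 1820, MCLK up to 930, curve voltage up to 1050mV).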

> - And finally, there are some rendering issues in certain games that did not
> exist with the same drivers with my Polaris card (even with the new
> firmware). Are these bugs that should be reported to Mesa, or the amdgpu
> kernel devs?

Probably mesa.