Bug 206309

Summary: Experimental amdgpu w/ Dell E6540 with HD 8790M (MARS XTX), massive performance improvement after ACPI suspend
Product: Drivers
Reporter: Travis Juntara (changhaitravis)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED WILL_NOT_FIX
Severity: low
CC: jerbear3.14159
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.4.12-200.fc31.x86_64
Subsystem:
Regression: No
Bisected commit-id:

Description Travis Juntara 2020-01-26 01:41:28 UTC
Hi, 
I have a Dell E6540 with HD 8790M (MARS XTX)
I also have the `radeon.si_support=0 amdgpu.si_support=1` kernel flags set to enable experimental amdgpu support.
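Because the behavior differs between the radeon and amdgpu drivers, it is worth confirming that both flags actually reached the kernel. This is a minimal sketch that relies only on the standard `/proc/cmdline` file; the function name `has_amdgpu_si_flags` is hypothetical, and a different file can be passed in for testing.

```sh
# Hypothetical helper: check that the SI driver-selection flags are present
# on the running kernel's command line (or in a file passed as $1).
has_amdgpu_si_flags() {
    cmdline_file=${1:-/proc/cmdline}
    # Pad with spaces so each flag only matches as a whole word.
    cmdline=" $(cat "$cmdline_file") "
    case "$cmdline" in
        *" radeon.si_support=0 "*) ;;
        *) echo "missing radeon.si_support=0"; return 1 ;;
    esac
    case "$cmdline" in
        *" amdgpu.si_support=1 "*) ;;
        *) echo "missing amdgpu.si_support=1"; return 1 ;;
    esac
    echo "amdgpu SI flags present"
}
```

If the flags are present, `lspci -k -s 01:00.0` should also list `Kernel driver in use: amdgpu` for the Mars XTX.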

I run games such as Insurgency, Tomb Raider (2013), Talos Principle, and Half-Life 2 Episode 1 on it with the `DRI_PRIME=1` environment variable.

All of these games perform poorly (under 30 fps in most cases) until I suspend the computer and wake it up. (This works even while a game is running.)

Here are some things I've noticed:
1. With the `amdgpu.dpm=0` kernel flag set, suspending and waking the computer DOES NOT double or triple the gaming graphics performance like it usually does.
2. Setting `/sys/class/drm/card1/device/power_dpm_force_performance_level` or `/sys/class/drm/card1/device/power_dpm_state` before suspending the computer does not noticeably affect performance.
3. If I reboot the computer after having suspended it, instead of shutting down or hibernating, it at least sometimes holds on to its gaming graphics performance, so I don't need to suspend again while playing a game.
4. This suspend/wake performance improvement does not seem reproducible with RadeonSI (`radeon.si_support=1`), although I can't tell whether performance there is lower or higher overall: Half-Life 2 performs well, but many other games run closer to the reduced (pre-suspend) performance state.
5. Wayland or X11 makes no difference.
6. Games using the Intel IGP are not affected by suspend + wake.
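To check whether the DPM knobs from observation 2 change at all across a suspend/resume cycle, their values can be captured before and after. This is a sketch: `card1` is taken from this report and may differ on other systems, and the function name `dpm_snapshot` is hypothetical.

```sh
# Hypothetical helper: print the DPM-related sysfs attributes of one card.
# $1 is the card's device directory (defaults to card1 from this report).
dpm_snapshot() {
    dev=${1:-/sys/class/drm/card1/device}
    for attr in power_dpm_state power_dpm_force_performance_level; do
        if [ -r "$dev/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$dev/$attr")"
        else
            printf '%s=<unavailable>\n' "$attr"
        fi
    done
}
```

Run it once before suspending and once after resuming, then `diff` the two outputs. If both snapshots are identical while performance still changes, the difference is happening below these user-visible settings.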

In the dmesg output below, I suspend and wake the computer at ~1588 s.

LOG:
[travis@Claes ~]$ dmesg | grep -E -i "dpm|amdgpu|radeon"
[    0.000000] Command line: BOOT_IMAGE=(hd0,msdos2)/vmlinuz-5.4.12-200.fc31.x86_64 root=/dev/mapper/altarus-root ro resume=/dev/mapper/altarus-swap rd.lvm.lv=altarus/root rd.lvm.lv=altarus/swap rhgb quiet radeon.si_support=0 amdgpu.si_support=1 zswap.enabled=1 zswap.compressor=lz4 zswap.zpool=z3fold amdgpu.dpm=1 amdgpu.dc=1
[    0.129882] Kernel command line: BOOT_IMAGE=(hd0,msdos2)/vmlinuz-5.4.12-200.fc31.x86_64 root=/dev/mapper/altarus-root ro resume=/dev/mapper/altarus-swap rd.lvm.lv=altarus/root rd.lvm.lv=altarus/swap rhgb quiet radeon.si_support=0 amdgpu.si_support=1 zswap.enabled=1 zswap.compressor=lz4 zswap.zpool=z3fold amdgpu.dpm=1 amdgpu.dc=1
[    3.018497] [drm] radeon kernel modesetting enabled.
[    3.018785] radeon 0000:01:00.0: SI support disabled by module param
[    3.271878] [drm] amdgpu kernel modesetting enabled.
[    3.272148] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
[    3.272149] amdgpu 0000:01:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xf7c00000 -> 0xf7c3ffff
[    3.272158] amdgpu 0000:01:00.0: enabling device (0000 -> 0003)
[    3.272841] [drm] add ip block number 5 <si_dpm>
[    3.272843] amdgpu 0000:01:00.0: kfd not supported on this ASIC
[    3.306053] amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[    3.306054] amdgpu 0000:01:00.0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[    3.306399] [drm] amdgpu: 2048M of VRAM memory ready
[    3.306401] [drm] amdgpu: 3072M of GTT memory ready.
[    3.306899] amdgpu 0000:01:00.0: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[    3.307310] [drm] amdgpu: dpm initialized
[    3.307352] [drm] AMDGPU Display Connectors
[    3.878634] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:01:00.0 on minor 1
[   12.958605] amdgpu 0000:01:00.0: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[   27.123524] amdgpu 0000:01:00.0: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 1588.479472] amdgpu 0000:01:00.0: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 1590.890634] amdgpu 0000:01:00.0: PCIE GART of 1024M enabled (table at 0x000000F400000000).
[ 1676.994027] amdgpu 0000:01:00.0: PCIE GART of 1024M enabled (table at 0x000000F400000000).

[travis@Claes ~]$ DRI_PRIME=1 glxinfo -B
name of display: :1
display: :1  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: X.Org (0x1002)
    Device: AMD Radeon HD 8790M (OLAND, DRM 3.35.0, 5.4.12-200.fc31.x86_64, LLVM 9.0.0) (0x6606)
    Version: 19.2.8
    Accelerated: yes
    Video memory: 2048MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 4.5
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 2039 MB, largest block: 2039 MB
    VBO free aux. memory - total: 3055 MB, largest block: 3055 MB
    Texture free memory - total: 2039 MB, largest block: 2039 MB
    Texture free aux. memory - total: 3055 MB, largest block: 3055 MB
    Renderbuffer free memory - total: 2039 MB, largest block: 2039 MB
    Renderbuffer free aux. memory - total: 3055 MB, largest block: 3055 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 2048 MB
    Total available memory: 5120 MB
    Currently available dedicated video memory: 2039 MB
OpenGL vendor string: X.Org
OpenGL renderer string: AMD Radeon HD 8790M (OLAND, DRM 3.35.0, 5.4.12-200.fc31.x86_64, LLVM 9.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.2.8
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.2.8
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.2.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

[travis@Claes ~]$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mars XTX [Radeon HD 8790M]
Comment 1 Travis Juntara 2020-02-11 02:03:02 UTC
So I just got a Dell E-series docking station, and I've just noticed that if I boot while docked, I get the full performance. 

I'll test at some point whether the AC power brick makes a difference; if it does, I'll close this out and chalk it up to a poor third-party power supply.
Comment 2 jerbear3.14159 2024-06-15 13:31:56 UTC
Hi, did you ever figure anything else out with this bug? I have the same problem on a Dell Precision M2800 which has pretty much identical specs:

~$ DRI_PRIME=1 glxinfo -B
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: AMD (0x1002)
    Device: AMD Radeon HD 8790M (oland, LLVM 15.0.7, DRM 3.54, 6.5.0-41-generic) (0x6606)
    Version: 23.2.1
    Accelerated: yes
    Video memory: 2048MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.6
    Max compat profile version: 4.6
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
Memory info (GL_ATI_meminfo):
    VBO free memory - total: 1667 MB, largest block: 1667 MB
    VBO free aux. memory - total: 3882 MB, largest block: 3882 MB
    Texture free memory - total: 1667 MB, largest block: 1667 MB
    Texture free aux. memory - total: 3882 MB, largest block: 3882 MB
    Renderbuffer free memory - total: 1667 MB, largest block: 1667 MB
    Renderbuffer free aux. memory - total: 3882 MB, largest block: 3882 MB
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 2048 MB
    Total available memory: 5966 MB
    Currently available dedicated video memory: 1667 MB
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon HD 8790M (oland, LLVM 15.0.7, DRM 3.54, 6.5.0-41-generic)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 23.2.1-1ubuntu3.1
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6 (Compatibility Profile) Mesa 23.2.1-1ubuntu3.1
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

I also set `radeon.si_support=0 amdgpu.si_support=1` in order to force the amdgpu driver.

The interesting thing is that the bug happens specifically when I boot up with the charger plugged in (AC power brick; not a docking station), and when I unplug it, the GPU clock speeds tank to 300MHz memory/300MHz shader. I've tried forcing the performance level back up to "high" which gives me this error:

echo -n high > /sys/class/drm/card1/device/power_dpm_force_performance_level
bash: echo: write error: Invalid argument

The clocks stay low until I reboot, and then the cycle restarts. It's pretty annoying how careful I have to be about staying plugged in. Any help is appreciated!
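Since `high` is one of the documented values for `power_dpm_force_performance_level`, the "Invalid argument" above points at the driver refusing the request rather than at a typo. A write-and-read-back helper makes that distinction explicit. This is a sketch: it deliberately accepts only the four basic levels (`auto`, `low`, `high`, `manual`), the name `set_perf_level` is hypothetical, and writing the real sysfs file requires root.

```sh
# Hypothetical helper: write a forced performance level and verify it stuck.
# Returns 2 for an unsupported value, 1 if the write is rejected or ignored.
set_perf_level() {
    level=$1
    file=${2:-/sys/class/drm/card1/device/power_dpm_force_performance_level}
    case "$level" in
        auto|low|high|manual) ;;
        *) echo "unsupported level: $level" >&2; return 2 ;;
    esac
    # A kernel-side rejection surfaces here as a failed write (e.g. EINVAL).
    if ! printf '%s' "$level" > "$file"; then
        echo "kernel rejected '$level' for $file" >&2
        return 1
    fi
    current=$(cat "$file")
    if [ "$current" != "$level" ]; then
        echo "wrote '$level' but file reports '$current'" >&2
        return 1
    fi
    echo "performance level is now $current"
}
```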
Comment 3 Travis Juntara 2024-06-15 18:28:45 UTC
Pretty sure this is due to the chipset or BIOS. After upgrading from an i5 4310M to an i7 4810MQ, standby mode no longer resets the power throttling, but rebooting does. I believe I've also seen similar behavior reported by Windows users.
Comment 4 jerbear3.14159 2024-06-15 18:45:15 UTC
Thanks for the quick reply. Guess I'll have to dust off ye ol' Windows installer and confirm once and for all who's to blame here. I'll post my findings here probably in a few weeks when I can. Thanks again.
Comment 5 jerbear3.14159 2024-09-08 03:54:05 UTC
It's a Linux bug!

I finally set aside a day to waste on Windows installation to check whether it behaves the same way. Windows 10 repeatedly crashed and bricked itself because of course it did. It works perfectly on Windows 8.1 with the official graphics driver on Dell's website.
https://dl.dell.com/FOLDER02868143M/5/AMD-FirePro-Graphics-Driver_CWRN6_WIN_14.502.1005_A02_02.EXE

Memory Clock (MHz)/Shader Clock (MHz):
Idle: 150/300
Battery: 600/400
AC: 1000/900

On Xubuntu 23.10 it's more like:
Idle: 150/300
Battery: 150/300
AC: 1000/900 (but it drops down to 150/300 after unplugging, and requires a reboot to bring it back up)
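Readings in the same memory/shader form can be taken from sysfs, assuming the kernel exposes `pp_dpm_mclk` and `pp_dpm_sclk` for this GPU (they may be absent on some older ASICs). Those files list one level per line, such as `1: 900Mhz *`, with `*` marking the active level. The function names below are hypothetical.

```sh
# Hypothetical helper: print the MHz value of the level marked with "*".
active_mhz() {
    awk '/\*[[:space:]]*$/ { v = $2; sub(/[Mm][Hh]z$/, "", v); print v; exit }' "$1"
}

# Print "memory/shader" in MHz for one card, matching the table above.
clock_pair() {
    dev=${1:-/sys/class/drm/card1/device}
    printf '%s/%s\n' "$(active_mhz "$dev/pp_dpm_mclk")" "$(active_mhz "$dev/pp_dpm_sclk")"
}
```

If either file is missing, awk reports an error and that half of the pair is empty.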

I've gone through 2 official 130W Dell chargers and 2 90W Targus universal chargers. Linux exhibits the power management bug regardless. I have never tried a docking station.

uname -a gives:
Linux 6.5.0-44-generic #44-Ubuntu SMP PREEMPT_DYNAMIC Fri Jun  7 15:10:09 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

The only reprieve I've found is to follow these steps:
1. Boot with the laptop plugged in. If I'm using a 90W charger and this is a cold start (as opposed to a reboot), I need to reboot as soon as it finishes booting.
2. echo -n high > /sys/class/drm/card1/device/power_dpm_force_performance_level
3. Pray
4. Unplug
5. Now for some reason, there's a small chance that the clock speeds stick high and don't drop. If not, start over from step 1. If yes, then typically (but not always) I can repeatedly unplug+plug safely up until it's time to shut down.

Problems I've encountered with this method include:
- Low success rate
- Runs hotter/harms battery life
- If I'm working unplugged, I need to manually lower the clock speeds down to 600/400. If I forget and leave it at 1000/900 and try to do anything graphically intense, the battery can't provide enough power and the entire laptop shuts down immediately. You can see that Windows does this automatically when unplugged.
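The manual part of this routine (forcing `high` on AC and remembering to back off on battery) can be scripted as a single pass. This is a sketch under several assumptions: the adapter is `/sys/class/power_supply/AC` (some laptops name it `ACAD` or `ADP1`), the GPU is `card1`, and the names `pick_level` and `apply_power_policy` are hypothetical. It hands control back to `auto` on battery rather than reproducing the 600/400 state Windows uses, and it does not fix the driver bug; after the clocks drop, the `high` write may still fail with "Invalid argument".

```sh
# Hypothetical helper: choose "high" when the adapter reports online, else "auto".
pick_level() {
    if [ "$(cat "$1" 2>/dev/null)" = "1" ]; then echo high; else echo auto; fi
}

# One pass of the policy; paths can be overridden via environment variables.
apply_power_policy() {
    ac=${AC_ONLINE:-/sys/class/power_supply/AC/online}
    target=${PERF_LEVEL_FILE:-/sys/class/drm/card1/device/power_dpm_force_performance_level}
    want=$(pick_level "$ac")
    # Only write when the level differs, to avoid needless sysfs writes.
    if [ "$(cat "$target")" != "$want" ]; then
        printf '%s' "$want" > "$target" || return 1
    fi
    echo "$want"
}
```

Run as root, this could be called periodically (for example `while sleep 5; do apply_power_policy; done`) so a forgotten `high` setting does not shut the laptop down under load on battery.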

All in all, this is a nightmare for a game developer who needs maximum GPU performance. I live in constant fear of bumping the power cord out or needing to switch rooms. I'm happy to provide any additional information, but glancing at the amdgpu source code it seems a little over my head!
Comment 6 Travis Juntara 2024-09-08 05:40:09 UTC
I ended up upgrading to a slightly newer Dell business laptop with an AMD GPU (with none of these issues) and getting rid of my Latitude E6540.

As such, I can't really help test anymore. I also can't remember who set this ticket's status to "Resolved - WILL_NOT_FIX", but I hesitate to reopen it in case it wasn't me. If you want to pursue this further, you'll want to open a new ticket.