Bug 208839 - AMDGPU: DPM is not enabled after hibernate and resume for CIK/Hawaii GPUs (e.g. R9 390)
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-06 16:59 UTC by sandy.8925
Modified: 2020-08-30 07:20 UTC
CC List: 2 users

See Also:
Kernel Version: 5.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Resume after suspend ftrace (172.49 KB, text/plain)
2020-08-06 16:59 UTC, sandy.8925
Resume after hibernate ftrace (127.04 KB, text/plain)
2020-08-06 17:00 UTC, sandy.8925
dmesg on Arch Linux 5.8 kernel, after hibernate and resume (103.16 KB, text/plain)
2020-08-06 18:48 UTC, sandy.8925

Description sandy.8925 2020-08-06 16:59:58 UTC
Created attachment 290801
Resume after suspend ftrace

After hibernating and resuming, DPM is not enabled. This remains the case even if you test hibernate using the steps here: https://www.kernel.org/doc/html/latest/power/basic-pm-debugging.html

I debugged the problem and found that in hardwaremanager.c, in the function phm_enable_dynamic_state_management(), the check 'if (!hwmgr->pp_one_vf && smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev) && adev->in_suspend)' evaluates to true in the hibernate case and to false in the suspend case.

This means that for the hibernate case, the AMDGPU driver doesn't enable DPM (even though it should) and simply returns from that function. In the suspend case, it goes ahead and enables DPM, even though it doesn't need to.

I debugged further, and found out that in the case of suspend, for the CIK/Hawaii GPUs, smum_is_dpm_running(hwmgr) returns false, while in the case of hibernate, smum_is_dpm_running(hwmgr) returns true.
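A minimal sketch of the check described above, written as standalone C rather than driver code. The struct dpm_check_inputs and both helper functions are hypothetical names introduced only for illustration; the field comments note which driver expression each one stands in for. The sketch assumes the two reported cases differ only in the value returned by smum_is_dpm_running(hwmgr), with the other three inputs identical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the values the quoted condition reads.
 * In the driver these come from struct pp_hwmgr and the amdgpu device. */
struct dpm_check_inputs {
    bool pp_one_vf;    /* hwmgr->pp_one_vf */
    bool dpm_running;  /* result of smum_is_dpm_running(hwmgr) */
    bool passthrough;  /* result of amdgpu_passthrough(adev) */
    bool in_suspend;   /* adev->in_suspend */
};

/* Mirrors the quoted condition. A true result means the function
 * returns early, so DPM is not enabled again. */
static bool skips_dpm_enable(const struct dpm_check_inputs *in)
{
    return !in->pp_one_vf && in->dpm_running &&
           !in->passthrough && in->in_suspend;
}

/* The two reported cases. Assumption: only dpm_running differs. */
static void check_reported_cases(void)
{
    struct dpm_check_inputs hibernate = { false, true, false, true };
    struct dpm_check_inputs suspend = { false, false, false, true };

    assert(skips_dpm_enable(&hibernate)); /* early return, DPM stays off */
    assert(!skips_dpm_enable(&suspend));  /* falls through, DPM enabled */
}
```

Calling check_reported_cases() from a main() exercises both cases. Under these assumptions, the smum_is_dpm_running() result alone decides whether the early return is taken.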

For CIK, the function ci_is_smc_ram_running() is ultimately used to determine whether DPM is currently enabled, and this seems to provide the wrong answer.

I've attached ftrace traces for the resume after suspend and resume after hibernate cases (with some functions excluded for brevity).
Comment 1 sandy.8925 2020-08-06 17:00:37 UTC
Created attachment 290803
Resume after hibernate ftrace
Comment 2 Alex Deucher 2020-08-06 17:42:03 UTC
Please attach your dmesg output.  

ci_is_smc_ram_running() is correct. With hibernate, everything happens twice. On resume from hibernate, the system loads, then suspends that state, and then resumes the original hibernated state. For suspend to RAM, there is just one suspend and one resume. For some reason the SMU was not properly shut down on the first suspend during the resume-from-disk cycle, so it shows up as running.
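The sequence described in this comment can be illustrated with a toy model. This is only a sketch: struct toy_gpu and every function below are hypothetical, a single flag stands in for the SMU state, and the model assumes that suspend to RAM leaves the SMU stopped while the freeze step during resume from disk leaves it running, as the comment suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag for whether the SMU firmware is running. */
struct toy_gpu {
    bool smu_running;
};

/* Driver init at boot starts the SMU. */
static void toy_boot_init(struct toy_gpu *g) { g->smu_running = true; }

/* Assumption: the SMU is stopped when the GPU loses power. */
static void toy_power_off(struct toy_gpu *g) { g->smu_running = false; }

/* Freeze during resume from disk; per the comment, the SMU is not
 * shut down here, so the flag is left untouched. */
static void toy_freeze_without_smu_stop(struct toy_gpu *g) { (void)g; }

/* Suspend to RAM: one suspend, one resume. */
static bool smu_seen_running_on_s3_resume(void)
{
    struct toy_gpu g = { false };

    toy_boot_init(&g);
    toy_power_off(&g);       /* suspend */
    return g.smu_running;    /* state seen at resume */
}

/* Hibernate: the freshly loaded system initializes the GPU, then
 * freezes it before the hibernated state is resumed. */
static bool smu_seen_running_on_hibernate_restore(void)
{
    struct toy_gpu g = { false };

    toy_boot_init(&g);                /* original session */
    toy_power_off(&g);                /* machine off after hibernating */
    toy_boot_init(&g);                /* system loads again */
    toy_freeze_without_smu_stop(&g);  /* first suspend of the cycle */
    return g.smu_running;             /* state seen by the resumed session */
}
```

In this model the resumed session finds the SMU running only in the hibernate path, which matches the differing smum_is_dpm_running() results reported earlier in the thread.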
Comment 3 sandy.8925 2020-08-06 18:39:19 UTC
That doesn't seem to be the case, based on the traces I took. AFAICT the kernel just calls the freeze function and then the restore function.
Comment 4 sandy.8925 2020-08-06 18:48:54 UTC
Created attachment 290805
dmesg on Arch Linux 5.8 kernel, after hibernate and resume
Comment 5 Alex Deucher 2020-08-06 18:59:16 UTC
(In reply to sandy.8925 from comment #3)
> That doesn't seem to be the case, based on the traces I took. AFAICT the
> kernel just calls the freeze function and then the restore function.

It's not necessarily the same kernel.  See:
https://www.kernel.org/doc/html/latest/driver-api/pm/devices.html
Comment 6 sandy.8925 2020-08-06 19:04:01 UTC
It is the same kernel in this case. I didn't install any new kernel before hibernating, so the kernel that entered hibernation is the same kernel that resumed from it.
Comment 7 Alex Deucher 2020-08-06 19:10:02 UTC
It has nothing to do with kernels that are installed.  From the link:

"Although in principle the image might be loaded into memory and the pre-hibernation memory contents restored by the boot loader, in practice this can’t be done because boot loaders aren’t smart enough and there is no established protocol for passing the necessary information. So instead, the boot loader loads a fresh instance of the kernel, called “the restore kernel”, into memory and passes control to it in the usual way. Then the restore kernel reads the system image, restores the pre-hibernation memory contents, and passes control to the image kernel. Thus two different kernel instances are involved in resuming from hibernation. In fact, the restore kernel may be completely different from the image kernel: a different configuration and even a different version."

Anyway, your patch looks correct.
Comment 8 sandy.8925 2020-08-30 07:20:13 UTC
Resolving, since the patch was merged.
