Bug 215436
Summary: | admgpu: suspend and resuming from suspend don't work | ||
---|---|---|---|
Product: | Drivers | Reporter: | spasswolf |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alexdeucher, juha-matti.rensujeff, mario.limonciello, superm1 |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 5.16-rc{1-8} | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
full kern.log from boot to reboot
dmesg with linux-5.16-rc8 dmesg with linux-5.16-rc8 while ignoring adev->in_s0ix Possible patch that ignores WA in s0ix Possible patch that ignores WA in s0ix w/ debugging statements (v2) debugging patch corrected debugging patch new debuggin patch |
Description
spasswolf
2021-12-29 12:28:51 UTC
Could you please help add some prints and confirm that the commit doesn't change the code sequence in rn_update_clocks() especially for the following part in suspend and wake sequence: if (display_count == 0 && !hpd_state) { rn_vbios_smu_set_dcn_low_power_state(clk_mgr, DCN_PWR_STATE_LOW_POWER); /* update power state */ clk_mgr_base->clks.pwr_state = DCN_PWR_STATE_LOW_POWER; It would also be appreciated if you could print out the time that was spent in the function rn_update_clocks() with and w/o the commit. I added printk at the beginnig and exit(s) of rn_clocks_update: This is the result when resuming from suspend on commit d020970959169627d59a711769f8c4b87bf5f90c Dec 30 13:55:42 lisa kernel: [ 140.198849] PM: suspend exit Dec 30 13:55:42 lisa kernel: [ 140.208063] amdgpu: entering rn_update_clocks ... Dec 30 13:55:42 lisa kernel: [ 140.209251] amdgpu: leaving rn_update_clocks ... Dec 30 13:55:42 lisa kernel: [ 140.211147] amdgpu: entering rn_update_clocks ... Dec 30 13:55:42 lisa kernel: [ 140.211148] amdgpu: leaving rn_update_clocks ... Dec 30 13:55:42 lisa kernel: [ 140.226830] Generic FE-GE Realtek PHY r8169-0-500:00: attached PHY driver (mii_bus:phy_addr=r8169-0-500:00, irq=MAC) Dec 30 13:55:42 lisa kernel: [ 140.277915] amdgpu: entering rn_update_clocks ... Dec 30 13:55:42 lisa kernel: [ 140.277925] amdgpu: leaving rn_update_clocks ... Dec 30 13:55:42 lisa kernel: [ 140.278233] amdgpu: entering rn_update_clocks ... Dec 30 13:55:42 lisa kernel: [ 140.278234] amdgpu: leaving rn_update_clocks ... In this case the system does not hang, but suspend doesn't seem to work properly, either (Fan and keyboard do not switch off). I probably missed that during the first bisection. Unfortunately I optimized my .config to make bisections faster, but now I cannot reproduce the Hang with commit 2a50edbf10c8c56e930bfb53d8f9f00a33fd837e. So now I will start over with the bisection ... There might also be different suspend errors we're dealing with: linux-5.16-rc7 e.g. seems to hang when going into suspend with the last log message before reboot being: Dec 30 14:53:19 lisa kernel: [ 85.056243] PM: suspend entry (s2idle) After sorting out some .config troubles I've reverted commit 2a50edbf10c8c56e930bfb53d8f9f00a33fd837e in linux-5.16-rc7 and this seems to cure both the not properly suspending issue (fan and keyboard lights stayed on) and the hang on resuming from suspend issue. The [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002 are still there, but they don't seem to be harmful, in fact they were there even in linux.5.15.11 where suspend and resume seemed to work. Just did a bit of experimentation and changed dc_get_hpd_state_dcn21 (in 5.16-rc7) to uint32_t dc_get_hpd_state_dcn21(struct irq_service *irq_service, enum dc_irq_source source) { const struct irq_source_info *info; uint32_t addr; uint32_t value; uint32_t current_status; info = find_irq_source_info(irq_service, source); if (!info) return 0; addr = info->status_reg; if (!addr) return 0; value = dm_read_reg(irq_service->ctx, addr); //current_status = // get_reg_field_value( // value, // HPD0_DC_HPD_INT_STATUS, // DC_HPD_SENSE); //return current_status; return 0; } This also leads to working suspend resume, the get_reg_field_value seems to trigger the problems. get_reg_field_value(value, HPD0_DC_HPD_INT_STATUS, DC_HPD_SENSE) is just a macro expanding to an inline function, i.e. current_status = (value & HPD0_DC_HPD_INT_STATUS__DC_HPD_SENSE_MASK) >> HPD0_DC_HPD_IN_STATUS__DC_HPD_SENSE__SHIFT; with ..._MASK = 0x2 being 0x2 and ...__SHIFT = 0x1, problems seem to occur when current_status != 0. current_status != 0 causes rn_update_clocks to behave differently, basically rn_vbios_smu_set_dcn_low_power_state can never get called. I wonder if we want to just not apply this WA as part of the suspend/resume paths, but only during the runtime paths? Created attachment 300211 [details]
dmesg with linux-5.16-rc8
With linux-5.16-rc8, suspend does still not work properly (fan stays on). Resume seemed to work at first, but on the next suspend/resume test I get the usual hang.
Suspend works again if I ignore hpd_state in rn_update_clocks. // if (display_count == 0 && !hpd_state) { if (display_count == 0) { I added a some dev_info calls: static int amdgpu_pmops_suspend(struct device *dev) { struct drm_device *drm_dev = dev_get_drvdata(dev); struct amdgpu_device *adev = drm_to_adev(drm_dev); int r; if (amdgpu_acpi_is_s0ix_active(adev)) adev->in_s0ix = true; adev->in_s3 = true; r = amdgpu_device_suspend(drm_dev, true); dev_info(adev->dev, "amdgpu_device_suspend(...) = %d\n", r); adev->in_s3 = false; if (r) return r; dev_info(adev->dev, "adev->in_s0ix = %d\n", adev->in_s0ix); if (!adev->in_s0ix) r = amdgpu_asic_reset(adev); return r; } which gives Jan 3 11:13:40 lisa kernel: [ 38.792335] amdgpu 0000:08:00.0: amdgpu: amdgpu_device_suspend(...) = 0 Jan 3 11:13:40 lisa kernel: [ 38.792341] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 1 So the gpu is not actually reset ... Created attachment 300212 [details]
dmesg with linux-5.16-rc8 while ignoring adev->in_s0ix
Simply ignoring the value of adev->in_s0ix and calling reset anyway does not work either:
Jan 3 11:23:51 lisa kernel: [ 32.649142] amdgpu 0000:08:00.0: amdgpu: amdgpu_device_suspend(...) = 0
Jan 3 11:23:51 lisa kernel: [ 32.649146] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 1
Jan 3 11:23:51 lisa kernel: [ 32.649149] amdgpu 0000:08:00.0: amdgpu: MODE2 reset
Jan 3 11:23:51 lisa kernel: [ 37.404614] amdgpu 0000:08:00.0: amdgpu: Mode2 reset failed!
Jan 3 11:23:51 lisa kernel: [ 37.404616] PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x60 [amdgpu] returns -62
Jan 3 11:23:51 lisa kernel: [ 37.404733] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns -62
Created attachment 300214 [details]
Possible patch that ignores WA in s0ix
Actually this is more like what I was suggesting. It effectively doesn't use the WA during s0ix suspend or resume.
Added a dev_info: dev_info(adev->dev, "adev->in_s0ix = %d\n", adev->in_s0ix); if (display_count == 0 && (adev->in_s0ix || !hpd_state)) { with the result: It seems that adev->in_s0ix = 0 during suspend. Only later during resume (after ACPI: EC: interrupt unblocked) do we have adev->in_s0ix = 1. Jan 3 18:00:24 lisa kernel: [ 40.367310] wlp4s0: deauthenticating from 54:67:51:3d:a2:e0 by local choice (Reason: 3=DEAUTH_LEAVING) Jan 3 18:00:24 lisa kernel: [ 40.437382] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 0 Jan 3 18:00:27 lisa kernel: [ 44.031004] PM: suspend entry (s2idle) Jan 3 18:00:28 lisa kernel: [ 44.252353] Filesystems sync: 0.221 seconds Jan 3 18:00:58 lisa kernel: [ 44.253185] Freezing user space processes ... (elapsed 0.002 seconds) done. Jan 3 18:00:58 lisa kernel: [ 44.255407] OOM killer disabled. Jan 3 18:00:58 lisa kernel: [ 44.255407] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done. Jan 3 18:00:58 lisa kernel: [ 44.256406] printk: Suspending console(s) (use no_console_suspend to debug) Jan 3 18:00:58 lisa kernel: [ 44.277805] queueing ieee80211 work while going to suspend Jan 3 18:00:58 lisa kernel: [ 44.831434] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000). Jan 3 18:00:58 lisa kernel: [ 44.831464] [drm] PSP is resuming... Jan 3 18:00:58 lisa kernel: [ 45.019729] [drm] reserve 0xa00000 from 0x81fe000000 for PSP TMR Jan 3 18:00:58 lisa kernel: [ 45.091174] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available Jan 3 18:00:58 lisa kernel: [ 45.100301] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available Jan 3 18:00:58 lisa kernel: [ 45.100305] amdgpu 0000:03:00.0: amdgpu: SMU is resuming... Jan 3 18:00:58 lisa kernel: [ 45.100308] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw version = 0x003b2500 (59.37.0) Jan 3 18:00:58 lisa kernel: [ 45.100311] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched Jan 3 18:00:58 lisa kernel: [ 45.151255] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully! Jan 3 18:00:58 lisa kernel: [ 45.152433] [drm] DMUB hardware initialized: version=0x02020007 Jan 3 18:00:58 lisa kernel: [ 45.194058] [drm] kiq ring mec 2 pipe 1 q 0 Jan 3 18:00:58 lisa kernel: [ 45.198331] [drm] VCN decode and encode initialized successfully(under DPG Mode). Jan 3 18:00:58 lisa kernel: [ 45.198603] [drm] JPEG decode initialized successfully. Jan 3 18:00:58 lisa kernel: [ 45.198652] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198654] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198655] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198655] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198656] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198657] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198658] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198659] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198659] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198660] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198661] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198661] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0 Jan 3 18:00:58 lisa kernel: [ 45.198662] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1 Jan 3 18:00:58 lisa kernel: [ 45.198662] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1 Jan 3 18:00:58 lisa kernel: [ 45.198663] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1 Jan 3 18:00:58 lisa kernel: [ 45.198663] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1 Jan 3 18:00:58 lisa kernel: [ 45.202326] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes Jan 3 18:00:58 lisa kernel: [ 45.527398] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002 Jan 3 18:00:58 lisa kernel: [ 45.844705] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002 Jan 3 18:00:58 lisa kernel: [ 45.889752] [drm] free PSP TMR buffer Jan 3 18:00:58 lisa kernel: [ 45.939048] amdgpu 0000:03:00.0: amdgpu: MODE1 reset Jan 3 18:00:58 lisa kernel: [ 45.939052] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset Jan 3 18:00:58 lisa kernel: [ 45.939132] amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset Jan 3 18:00:58 lisa kernel: [ 46.450444] ACPI: EC: interrupt blocked Jan 3 18:00:58 lisa kernel: [ 46.472489] xhci_hcd 0000:08:00.3: refused to change power state from D0 to D3hot Jan 3 18:00:58 lisa kernel: [ 55.081262] ACPI: EC: interrupt unblocked Jan 3 18:00:58 lisa kernel: [ 55.632167] pci 0000:00:00.2: can't derive routing for PCI INT A Jan 3 18:00:58 lisa kernel: [ 55.632172] pci 0000:00:00.2: PCI INT A: no GSI Jan 3 18:00:58 lisa kernel: [ 55.635561] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000). Jan 3 18:00:58 lisa kernel: [ 55.637155] [drm] PSP is resuming... Jan 3 18:00:58 lisa kernel: [ 55.637252] [drm] PCIE GART of 1024M enabled. Jan 3 18:00:58 lisa kernel: [ 55.637254] [drm] PTB located at 0x000000F400900000 Jan 3 18:00:58 lisa kernel: [ 55.637769] amdgpu 0000:08:00.0: amdgpu: SMU is resuming... Jan 3 18:00:58 lisa kernel: [ 55.640076] amdgpu 0000:08:00.0: amdgpu: dpm has been disabled Jan 3 18:00:58 lisa kernel: [ 55.643194] amdgpu 0000:08:00.0: amdgpu: SMU is resumed successfully! Jan 3 18:00:58 lisa kernel: [ 55.649205] [drm] DMUB hardware initialized: version=0x0101001C Jan 3 18:00:58 lisa kernel: [ 55.661720] nvme nvme0: 15/0/0 default/read/poll queues Jan 3 18:00:58 lisa kernel: [ 55.702951] nvme nvme1: Shutdown timeout set to 10 seconds Jan 3 18:00:58 lisa kernel: [ 55.704396] nvme nvme1: 8/0/0 default/read/poll queues Jan 3 18:00:58 lisa kernel: [ 55.737425] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 1 Jan 3 18:00:58 lisa kernel: [ 55.828624] [drm] reserve 0xa00000 from 0x81fe000000 for PSP TMR Jan 3 18:00:58 lisa kernel: [ 55.898619] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available Jan 3 18:00:58 lisa kernel: [ 55.907754] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available Jan 3 18:00:58 lisa kernel: [ 55.907757] amdgpu 0000:03:00.0: amdgpu: SMU is resuming... Jan 3 18:00:58 lisa kernel: [ 55.907761] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw version = 0x003b2500 (59.37.0) Jan 3 18:00:58 lisa kernel: [ 55.907767] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched Jan 3 18:00:58 lisa kernel: [ 55.958092] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully! Jan 3 18:00:58 lisa kernel: [ 55.959299] [drm] DMUB hardware initialized: version=0x02020007 Jan 3 18:00:58 lisa kernel: [ 55.968918] amdgpu 0000:08:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110) Jan 3 18:00:58 lisa kernel: [ 55.969120] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110 Jan 3 18:00:58 lisa kernel: [ 55.969298] amdgpu 0000:08:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). Jan 3 18:00:58 lisa kernel: [ 55.969301] PM: dpm_run_callback(): pci_pm_resume+0x0/0xd0 returns -110 Jan 3 18:00:58 lisa kernel: [ 55.969313] amdgpu 0000:08:00.0: PM: failed to resume async: error -110 Created attachment 300215 [details]
Possible patch that ignores WA in s0ix w/ debugging statements (v2)
OK thanks for trying it! Here's an updated patch, to try to give an extra explicit call to update the clocks again during the suspend path. Also included a debugging statement so we can see the state of the variables that matter in each call to update_clocks.
If you can please share the output from your kernel log again with it applied.
The problem seems to be that rn_update_clocks tries to call rn_vbios_smu_set_dcn_low power before adev->in_s0ix is set to 1 in amdpgu_pmops_suspend: Changes to rn_update_clocks: dev_info(adev->dev, "adev->in_s0ix = %d\n", adev->in_s0ix); //if (display_count == 0 && (adev->in_s0ix || !hpd_state)) { if (display_count == 0) { dev_info(adev->dev, "calling rn_vbios_smu_set_dcn_low_power\n"); rn_vbios_smu_set_dcn_low_power_state(clk_mgr, DCN_PWR_STATE_LOW_POWER); Changes to amdgpu_pmops_suspend: if (amdgpu_acpi_is_s0ix_active(adev)) adev->in_s0ix = true; dev_info(adev->dev, "amdgpu_pmops_suspend: adev->in_s0ix = %d\n", adev->in_s0ix); [ 36.010259] wlp4s0: deauthenticating from 54:67:51:3d:a2:e0 by local choice (Reason: 3=DEAUTH_LEAVING) [ 36.043902] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 0 [ 36.043904] amdgpu 0000:08:00.0: amdgpu: calling rn_vbios_smu_set_dcn_low_power [ 40.015570] PM: suspend entry (s2idle) [ 40.239013] Filesystems sync: 0.223 seconds [ 40.239912] Freezing user space processes ... (elapsed 0.002 seconds) done. [ 40.242407] OOM killer disabled. [ 40.242407] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 40.243513] printk: Suspending console(s) (use no_console_suspend to debug) [ 40.246850] amdgpu 0000:08:00.0: amdgpu: amdgpu_pmops_suspend: adev->in_s0ix = 1 [ 40.266376] queueing ieee80211 work while going to suspend Created attachment 300222 [details]
debugging patch
I added some more dev_info calls to figure out where things are called:
[ 34.946911] wlp4s0: deauthenticating from 54:67:51:3d:a2:e0 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 35.079021] amdgpu 0000:08:00.0: amdgpu: Calling update_clocks from dcn10_set_clock
[ 35.079026] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 0
[ 35.079027] amdgpu 0000:08:00.0: amdgpu: calling rn_vbios_smu_set_dcn_low_power
[ 39.030889] PM: suspend entry (s2idle)
[ 39.270864] Filesystems sync: 0.239 seconds
[ 39.271929] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 39.274694] OOM killer disabled.
The WLAN message is the first message one gets after initiating suspend. The interesting thing is that rn_vbios_set_dc_low_power is called before suspend entry (and there's also a long 4s gap).
I also have no idea where dcn10_set_clock is called. The only place I found is from within dc_set_clock, but dc_set_clock is called nowhere in the entire amdgpu module.
Created attachment 300223 [details]
corrected debugging patch
The patch above contained an error
rn_update_clocks is actually called from dcn20_optimize_bandwidth
Created attachment 300224 [details]
new debuggin patch
Seems the call trace of rn_vbios_smu_set_dcn_low_power is amdgpu_dm_atomic_commit_tail -> dc_commit_state -> dc_commit_state_nocheck -> dcn20_optimize_bandwith -> rn_update_clocks -> rn_vbios_smu_set_dcn_low_power
53.562322] amdgpu 0000:08:00.0: amdgpu: Calling dc_commit_state from amdgpu_dm_atomic_commit_tail
[ 53.562426] amdgpu 0000:08:00.0: amdgpu: Calling dc_commit_state from amdgpu_dm_atomic_commit_tail
[ 53.563183] wlp4s0: deauthenticating from 54:67:51:3d:a2:e0 by local choice (Reason: 3=DEAUTH_LEAVING)
[ 53.673655] amdgpu 0000:08:00.0: amdgpu: Calling optimize_bandwidth from dc_commit_state_no_check
[ 53.673669] amdgpu 0000:08:00.0: amdgpu: Calling update_clocks from dcn20_optimize_bandwidth
[ 53.673672] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 0
[ 53.673673] amdgpu 0000:08:00.0: amdgpu: calling rn_vbios_smu_set_dcn_low_power
[ 53.674562] amdgpu 0000:08:00.0: amdgpu: Calling dc_commit_state from amdgpu_dm_atomic_commit_tail
This seems to be not really connected to suspend.
I get similar message if I just lock the screen which then switches off: [ 1732.526162] amdgpu 0000:08:00.0: amdgpu: Calling dc_commit_state from amdgpu_dm_atomic_commit_tail [ 1732.640992] amdgpu 0000:08:00.0: amdgpu: Calling optimize_bandwidth from dc_commit_state_no_check [ 1732.641005] amdgpu 0000:08:00.0: amdgpu: Calling update_clocks from dcn20_optimize_bandwidth [ 1732.641007] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 0 [ 1732.641008] amdgpu 0000:08:00.0: amdgpu: calling rn_vbios_smu_set_dcn_low_power Did you try out my patch? It should have given an extra explicit call in the suspend path. That's it! The extra call to clk_mgr_optimize_pwr_state now leads to rn_update_clock being called with adev->in_s0ix = 1: [ 31.142514] wlp4s0: deauthenticating from 54:67:51:3d:a2:e0 by local choice (Reason: 3=DEAUTH_LEAVING) [ 31.176805] amdgpu 0000:08:00.0: amdgpu: Calling optimize_bandwidth from dc_commit_state_no_check [ 31.176815] amdgpu 0000:08:00.0: amdgpu: Calling update_clocks from dcn20_optimize_bandwidth [ 31.176818] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 0 [ 35.056209] PM: suspend entry (s2idle) [ 36.535688] Filesystems sync: 1.479 seconds [ 36.536404] Freezing user space processes ... (elapsed 0.024 seconds) done. [ 36.561024] OOM killer disabled. [ 36.561025] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 36.562212] printk: Suspending console(s) (use no_console_suspend to debug) [ 36.566381] amdgpu 0000:08:00.0: amdgpu: amdgpu_pmops_suspend: adev->in_s0ix = 1 [ 36.566557] amdgpu 0000:08:00.0: amdgpu: calling update_clocks from dcn21_optimize_pwr_state [ 36.566567] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 1 [ 36.566569] amdgpu 0000:08:00.0: amdgpu: calling rn_vbios_smu_set_dcn_low_power Can you check the state of the other variables though with that extra call? That was in my debugging patch but not inyours. Is the HPD active? For the extra call hpd_state = 0: [ 30.684201] wlp4s0: deauthenticating from 54:67:51:3d:a2:e0 by local choice (Reason: 3=DEAUTH_LEAVING) [ 30.794571] amdgpu 0000:08:00.0: amdgpu: Calling optimize_bandwidth from dc_commit_state_no_check [ 30.794609] amdgpu 0000:08:00.0: amdgpu: Calling update_clocks from dcn20_optimize_bandwidth [ 30.794613] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 0, hpd_state = 1 [ 35.026604] PM: suspend entry (s2idle) [ 35.256153] Filesystems sync: 0.229 seconds [ 35.257044] Freezing user space processes ... (elapsed 0.002 seconds) done. [ 35.259636] OOM killer disabled. [ 35.259636] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 35.260642] printk: Suspending console(s) (use no_console_suspend to debug) [ 35.264212] amdgpu 0000:08:00.0: amdgpu: amdgpu_pmops_suspend: adev->in_s0ix = 1 [ 35.264369] amdgpu 0000:08:00.0: amdgpu: calling update_clocks from dcn21_optimize_pwr_state [ 35.264379] amdgpu 0000:08:00.0: amdgpu: adev->in_s0ix = 1, hpd_state = 0 [ 35.264381] amdgpu 0000:08:00.0: amdgpu: calling rn_vbios_smu_set_dcn_low_power So one could go back to the old version of rn_update_clocks with if (display_count == 0 && !hpd_state) { Great! OK, let me send that out for review then and see what folks think. As this is now in 5.17-rc1: https://github.com/torvalds/linux/commit/c4849f88164b13dd141885e28210f599741b304b this issue can be closed. |