Bug 211875
Summary: | CPU frequency scaling lost after "WARNING: CPU: 2 PID: 2358578 at smu8_send_msg_to_smc_with_parameter+0xfe/0x140 [amdgpu]" | ||
---|---|---|---|
Product: | Drivers | Reporter: | Erhard F. (erhard_f) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED UNREPRODUCIBLE | ||
Severity: | normal | CC: | alexdeucher, christian.koenig |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.11 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg (kernel 5.11, A10-9700E)
kernel .config (kernel 5.11, A10-9700E) kernel .config (kernel 5.12-rc1, A10-9700E) dmesg (kernel 5.12-rc5, A10-9700E) kernel .config (kernel 5.12-rc5, A10-9700E) dmesg (kernel 4.14.228, A10-9700E) dmesg (kernel 5.13-rc1, A10-9700E) kernel .config (kernel 5.13-rc1, A10-9700E) |
Created attachment 295383 [details]
kernel .config (kernel 5.11, A10-9700E)
------------[ cut here ]------------ smu8_send_msg_to_smc_with_parameter(0x0011, 0x460) timed out after 3236309 us WARNING: CPU: 1 PID: 403 at smu8_send_msg_to_smc_with_parameter+0xfe/0x140 [amdgpu] Modules linked in: dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher dm_mod fuse input_leds joydev hid_generic usbhid hid rfkill raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx ext4 crc16 mbcache jbd2 md_mod amdgpu evdev crc32_generic crc32_pclmul f2fs aesni_intel libaes crypto_simd cryptd glue_helper snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio led_class drm_ttm_helper snd_hda_codec_hdmi k10temp ttm ehci_pci gpu_sched fam15h_power ehci_hcd i2c_algo_bit r8169 i2c_piix4 snd_hda_intel snd_intel_dspcfg realtek mdio_devres xhci_pci libphy snd_hda_codec drm_kms_helper snd_hwdep xhci_hcd snd_hda_core cfbfillrect syscopyarea cfbimgblt sysfillrect usbcore sysimgblt snd_pcm fb_sys_fops usb_common cfbcopyarea snd_timer fb snd font fbdev soundcore video acpi_cpufreq button processor nfsd nct6775 hwmon_vid zram auth_rpcgss zsmalloc lockd jc42 grace drm hwmon drm_panel_orientation_quirks sunrpc backlight nfs_ssc CPU: 1 PID: 403 Comm: X Not tainted 5.11.0-gentoo #2 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R3.0, BIOS P3.10 06/26/2019 RIP: 0010:smu8_send_msg_to_smc_with_parameter+0xfe/0x140 [amdgpu] Code: c0 48 c7 c7 a0 5b ac c0 48 89 c1 48 f7 ea 48 89 c8 44 89 f1 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 ea e8 db 98 e0 cf <0f> 0b eb b3 41 bc ea ff ff ff eb ab 48 8b 7d 60 be c0 01 00 00 48 RSP: 0018:ffff9fc60749b788 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 00000d3740b628cb RCX: ffff9d430f492d58 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9d430f492d50 RBP: ffff9d400f23a800 R08: 0000000000000000 R09: ffff9fc60749b5c0 R10: 0000000000000003 R11: ffffffff90e57e48 R12: 00000000ffffffc2 R13: 0000000000000011 R14: 0000000000000460 R15: ffff9d405006988c FS: 00007ff812a7b980(0000) GS:ffff9d430f480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff7e8fef000 CR3: 000000010f26e000 CR4: 00000000001506e0 Call Trace: smum_send_msg_to_smc_with_parameter+0xbe/0xf0 [amdgpu] smu8_set_power_state_tasks+0x1d6/0x360 [amdgpu] phm_set_power_state+0x44/0x68 [amdgpu] psm_adjust_power_state_dynamic+0xb6/0x1b8 [amdgpu] ? amdgpu_fence_wait_empty+0x80/0xd0 [amdgpu] pp_dpm_dispatch_tasks+0x35/0x58 [amdgpu] dm_pp_apply_display_requirements+0x197/0x1c0 [amdgpu] dce11_update_clocks+0x8f/0xe8 [amdgpu] dc_commit_state+0x1d2/0xa70 [amdgpu] amdgpu_dm_atomic_commit_tail+0x3ee/0x2378 [amdgpu] ? free_one_page+0x4e/0xc0 ? bw_calcs+0x656/0x1df0 [amdgpu] ? _cond_resched+0x11/0x38 ? __ww_mutex_lock.constprop.0+0x2c/0x6f8 ? kfree+0x9f/0x3a0 ? ttm_bo_move_to_lru_tail+0x1c/0xc0 [ttm] ? ttm_eu_backoff_reservation+0x3e/0x58 [ttm] ? amdgpu_bo_gpu_offset+0x65/0xb0 [amdgpu] ? dm_plane_helper_prepare_fb+0x127/0x248 [amdgpu] ? _cond_resched+0x11/0x38 ? wait_for_completion_timeout+0x31/0xe0 ? wait_for_completion_interruptible+0x2e/0x128 commit_tail+0x8d/0x120 [drm_kms_helper] drm_atomic_helper_commit+0x111/0x138 [drm_kms_helper] drm_atomic_connector_commit_dpms+0xd4/0xf0 [drm] drm_mode_obj_set_property_ioctl+0x183/0x3a0 [drm] ? drm_connector_set_obj_prop+0x80/0x80 [drm] drm_connector_property_set_ioctl+0x32/0x50 [drm] drm_ioctl_kernel+0xa5/0xf0 [drm] drm_ioctl+0x20a/0x398 [drm] ? drm_connector_set_obj_prop+0x80/0x80 [drm] amdgpu_drm_ioctl+0x44/0x78 [amdgpu] __x64_sys_ioctl+0x81/0xa0 do_syscall_64+0x33/0x48 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7ff812686cc7 Code: 3c 1c 48 f7 d8 49 39 c4 72 b9 e8 24 ff ff ff 85 c0 78 be 5b 4c 89 e0 5d 41 5c c3 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 71 41 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffd6465ce88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffd6465cec0 RCX: 00007ff812686cc7 RDX: 00007ffd6465cec0 RSI: 00000000c01064ab RDI: 000000000000000c RBP: 00000000c01064ab R08: 00007ffd64740080 R09: 00007ff811fa7d10 R10: 00007ff811fa7d20 R11: 0000000000000246 R12: 0000555c25ecedd0 R13: 000000000000000c R14: 0000000000000000 R15: 0000555c24e06540 ---[ end trace c56a82515c29af45 ]--- Created attachment 295573 [details]
kernel .config (kernel 5.12-rc1, A10-9700E)
Same for kernel v5.12-rc1. Backtrace looks slightly different with KASAN.
[...]
------------[ cut here ]------------
smu8_send_msg_to_smc_with_parameter(0x0011, 0x460) timed out after 3350731 us
WARNING: CPU: 3 PID: 598651 at smu8_send_msg_to_smc_with_parameter+0x180/0x1e0 [amdgpu]
Modules linked in: rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds led_class joydev hid_generic dm_mod usbhid hid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod amdgpu f2fs crc32_pclmul evdev ext4 crc16 mbcache jbd2 aesni_intel libaes crypto_simd cryptd gpu_sched i2c_algo_bit drm_ttm_helper ttm snd_hda_codec_generic snd_hda_codec_hdmi drm_kms_helper cfbfillrect snd_hda_intel syscopyarea cfbimgblt snd_intel_dspcfg sysfillrect sysimgblt xhci_pci fb_sys_fops snd_hda_codec ehci_pci cfbcopyarea snd_hwdep xhci_hcd snd_hda_core snd_pcm ehci_hcd fb i2c_piix4 snd_timer usbcore k10temp fam15h_power font fbdev snd soundcore usb_common 8250 8250_base serial_core video acpi_cpufreq button processor nfsd zram auth_rpcgss zsmalloc nct6775 hwmon_vid lockd jc42 hwmon grace drm drm_panel_orientation_quirks fuse backlight configfs sunrpc nfs_ssc efivarfs
CPU: 3 PID: 598651 Comm: kworker/3:0 Not tainted 5.12.0-rc1-Excavator #2
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R3.0, BIOS P3.10 06/26/2019
Workqueue: events dm_irq_work_func [amdgpu]
RIP: 0010:smu8_send_msg_to_smc_with_parameter+0x180/0x1e0 [amdgpu]
Code: 20 48 c7 c7 a0 c5 16 c1 48 89 c1 48 f7 ea 48 89 c8 44 89 f1 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 ea e8 22 db e9 d2 <0f> 0b eb ac 41 bc ea ff ff ff eb a4 4c 89 ef e8 dc 54 5d d2 48 8b
RSP: 0018:ffffc900090174d0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000011 RCX: 0000000000000000
RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52001202e8c
RBP: ffff88810abc6800 R08: ffffffff931ecfbe R09: ffff8883d39ab09b
R10: ffffed107a735613 R11: 0000000000000001 R12: 00000000ffffffc2
R13: 0000000000000011 R14: 0000000000000460 R15: ffffffffc114fce0
FS: 0000000000000000(0000) GS:ffff8883d3980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6d79d34000 CR3: 00000001163d6000 CR4: 00000000001506e0
Call Trace:
smum_send_msg_to_smc_with_parameter+0x13c/0x170 [amdgpu]
smu8_set_power_state_tasks+0x34c/0x560 [amdgpu]
phm_set_power_state+0xb2/0xf0 [amdgpu]
? phm_power_down_asic+0x60/0x60 [amdgpu]
? smu8_apply_state_adjust_rules+0x1bf/0x290 [amdgpu]
psm_adjust_power_state_dynamic+0x12c/0x300 [amdgpu]
? psm_set_user_performance_state+0xd0/0xd0 [amdgpu]
? amdgpu_fence_wait_empty+0x24f/0x380 [amdgpu]
? mutex_lock_io_nested+0xac0/0xac0
? memcpy+0x39/0x60
hwmgr_handle_task+0xf1/0x180 [amdgpu]
? hwmgr_resume+0x70/0x70 [amdgpu]
? amdgpu_fence_fallback+0x20/0x20 [amdgpu]
? program_stutter_watermark+0x1b4/0x1d0 [amdgpu]
pp_dpm_dispatch_tasks+0x40/0x60 [amdgpu]
? pp_get_power_limit+0xe0/0xe0 [amdgpu]
amdgpu_pm_compute_clocks.part.0+0x18a/0xa00 [amdgpu]
? wait_for_completion_killable_timeout+0x200/0x200
? amdgpu_dpm_get_vrefresh+0xd0/0xd0 [amdgpu]
? phm_store_dal_configuration_data+0x165/0x190 [amdgpu]
? pp_get_current_clocks+0x2d0/0x2d0 [amdgpu]
? pp_get_current_clocks+0x2d0/0x2d0 [amdgpu]
dm_pp_apply_display_requirements+0x3ae/0x3f0 [amdgpu]
dce11_update_clocks+0x149/0x1e0 [amdgpu]
? dce11_pplib_apply_display_requirements+0x470/0x470 [amdgpu]
dc_commit_updates_for_stream+0x243b/0x3c20 [amdgpu]
? __mutex_lock+0x2e9/0xb80
? dm_set_dpms_off+0xc0/0x190 [amdgpu]
? dc_check_update_surfaces_for_stream+0x1130/0x1130 [amdgpu]
? dmub_reg_value_burst_set_pack+0x240/0x240 [amdgpu]
? dal_gpio_create_irq+0x20/0x20 [amdgpu]
? kasan_quarantine_put+0xd1/0x1f0
? trace_hardirqs_on+0x1c/0x100
dm_set_dpms_off+0x120/0x190 [amdgpu]
? dm_handle_hpd_rx_irq+0x270/0x270 [amdgpu]
? clk_mgr_optimize_pwr_state+0x8e/0xd0 [amdgpu]
handle_hpd_irq+0x1ed/0x250 [amdgpu]
? handle_hpd_rx_irq+0x2f0/0x2f0 [amdgpu]
? lock_release+0xb2/0x430
? __schedule+0x5f8/0x10e0
? lockdep_hardirqs_on_prepare+0xe/0x200
? _raw_spin_unlock_irq+0x1f/0x30
? handle_hpd_rx_irq+0x2f0/0x2f0 [amdgpu]
dm_irq_work_func+0x77/0x90 [amdgpu]
process_one_work+0x4f3/0x9b0
? pwq_dec_nr_in_flight+0x110/0x110
? rwlock_bug.part.0+0x50/0x50
worker_thread+0x2ce/0x6c0
? process_one_work+0x9b0/0x9b0
kthread+0x1d9/0x200
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x22/0x30
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<ffffffff930690fa>] copy_process+0x8fa/0x31e0
softirqs last enabled at (0): [<ffffffff930690fa>] copy_process+0x8fa/0x31e0
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 6e48f7e48c11b7c0 ]---
amdgpu: smu8_send_msg_to_smc_with_parameter(0x026e) aborted; SMU still servicing msg (0x0011)
amdgpu: smu8_send_msg_to_smc_with_parameter(0x002f) aborted; SMU still servicing msg (0x0011)
amdgpu: smu8_send_msg_to_smc_with_parameter(0x0012) aborted; SMU still servicing msg (0x0011)
If this is a regression, can you bisect? https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html I don't know. This is a new machine and have seen this since 5.10.x which was the first kernel I tried on it. Created attachment 296145 [details]
dmesg (kernel 5.12-rc5, A10-9700E)
Gave kernel 5.12-rc5 a try but the issue stays.
Looks like I really need to find an older kernel where this does not happen and start a bisect from there. This will take some time...
Created attachment 296147 [details]
kernel .config (kernel 5.12-rc5, A10-9700E)
Created attachment 296273 [details]
dmesg (kernel 4.14.228, A10-9700E)
Traced the issue back to kernel v4.14.228 which is still affected (v4.19.184 and v5.4.109 too). On v4.14.228 no stack trace like in recent kernels but these messages:
[...]
[28541.868617] amdgpu: [powerplay] min_core_set_clock not set
[28542.483228] cz_send_msg_to_smc_async (0x0011) failed
[28543.097905] cz_send_msg_to_smc_async (0x026e) failed
[28543.712424] cz_send_msg_to_smc_async (0x002f) failed
[28543.712719] amdgpu: [powerplay] min_core_set_clock not set
[28544.330105] cz_send_msg_to_smc_async (0x0011) failed
[28544.947054] cz_send_msg_to_smc_async (0x026e) failed
[28545.564013] cz_send_msg_to_smc_async (0x002f) failed
[28545.564251] amdgpu: [powerplay] min_core_set_clock not set
[28546.179695] cz_send_msg_to_smc_async (0x0011) failed
[28546.794880] cz_send_msg_to_smc_async (0x026e) failed
[28547.409986] cz_send_msg_to_smc_async (0x002f) failed
Apart from that the machine behaves the same after these "cz_send_msg_to_smc_async (0x002f) failed" - CPU permanently downclocked to 800 MHz, desktop 'freezing' issues with display going black after some time. Access via ssh still works.
(In reply to Alex Deucher from comment #4) > If this is a regression, can you bisect? > https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html Sorry but my bisecting efforts came to a halt. Last kernel I was able to boot was 4.14.228. And this one still has the issue. Kernels 4.13.16, 4.12.14, 4.11.12, 4.10.17, 4.9.264 I was able to build but they don't boot into desktop or even console. Just getting a "no signal" message von my monitor after the kernel has booted, and some of these kernels do a reboot. Don't know how to proceed from here. Created attachment 296715 [details]
dmesg (kernel 5.13-rc1, A10-9700E)
Some locking stacktrace on kernel v5.13-rc1 before the usual "smu8_send_msg_to_smc_with_parameter" messages:
[...]
================================
WARNING: inconsistent lock state
5.13.0-rc1-bdver4 #9 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
X/591 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffffffffbe37d160 (fs_reclaim){?.+.}-{0:0}, at: fs_reclaim_acquire+0xf7/0x160
{HARDIRQ-ON-W} state was registered at:
lock_acquire+0x1a0/0x6e0
fs_reclaim_acquire+0x117/0x160
kmem_cache_alloc_trace+0x3b/0x320
init_rescuer+0x80/0x330
workqueue_init+0x12f/0x2fd
kernel_init_freeable+0x305/0x584
kernel_init+0x8/0x116
ret_from_fork+0x22/0x30
irq event stamp: 68143332
hardirqs last enabled at (68143331): [<ffffffffbd800baf>] asm_sysvec_apic_timer_interrupt+0xf/0x20
hardirqs last disabled at (68143332): [<ffffffffbd774184>] common_interrupt+0x14/0xa0
softirqs last enabled at (68143304): [<ffffffffbc0d0cd9>] irq_exit_rcu+0x119/0x1a0
softirqs last disabled at (68143299): [<ffffffffbc0d0cd9>] irq_exit_rcu+0x119/0x1a0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(fs_reclaim);
<Interrupt>
lock(fs_reclaim);
*** DEADLOCK ***
5 locks held by X/591:
#0: ffff88810397fa20 (crtc_ww_class_acquire){+.+.}-{0:0}, at: set_property_atomic+0xb2/0x2f0 [drm]
#1: ffff88812bc00538 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock+0x364/0x500 [drm]
#2: ffff88812bc1cb58 (&adev->dm.dc_lock){+.+.}-{3:3}, at: amdgpu_dm_atomic_commit_tail+0xa12/0x9870 [amdgpu]
#3: ffff8881538d6880 (&hwmgr->smu_lock){+.+.}-{3:3}, at: pp_dpm_dispatch_tasks+0x50/0x90 [amdgpu]
#4: ffff8881538d6910 (&hwmgr->msg_lock){+.+.}-{3:3}, at: smum_send_msg_to_smc_with_parameter+0x1bf/0x300 [amdgpu]
stack backtrace:
CPU: 1 PID: 591 Comm: X Not tainted 5.13.0-rc1-bdver4 #9
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R3.0, BIOS P3.10 06/26/2019
Call Trace:
<IRQ>
dump_stack+0xa5/0xe6
mark_lock.cold+0x145/0x14f
? lock_chain_count+0x20/0x20
? mark_lock+0xee/0x2fd0
? lock_chain_count+0x20/0x20
? lock_chain_count+0x20/0x20
? lock_chain_count+0x20/0x20
? rcu_read_lock_sched_held+0x3a/0x70
__lock_acquire+0x146a/0x5d40
? debug_check_no_locks_held+0xa0/0xa0
? mark_lock+0xee/0x2fd0
? debug_check_no_locks_held+0xa0/0xa0
lock_acquire+0x1a0/0x6e0
? fs_reclaim_acquire+0xf7/0x160
? lock_release+0x730/0x730
? amdgpu_dm_irq_handler+0x1ad/0x780 [amdgpu]
? lock_downgrade+0x6e0/0x6e0
fs_reclaim_acquire+0x117/0x160
? fs_reclaim_acquire+0xf7/0x160
? amdgpu_dm_irq_handler+0x2d1/0x780 [amdgpu]
kmem_cache_alloc_trace+0x3b/0x320
amdgpu_dm_irq_handler+0x2d1/0x780 [amdgpu]
amdgpu_irq_dispatch+0x280/0x590 [amdgpu]
? amdgpu_irq_add_id+0x2c0/0x2c0 [amdgpu]
? drm_print_bits+0x190/0x190 [drm]
? __lock_acquire+0xd5b/0x5d40
amdgpu_ih_process+0x1c4/0x390 [amdgpu]
? amdgpu_irq_disable_all+0x300/0x300 [amdgpu]
amdgpu_irq_handler+0x22/0x210 [amdgpu]
? amdgpu_irq_disable_all+0x300/0x300 [amdgpu]
__handle_irq_event_percpu+0x248/0x600
handle_irq_event+0xfa/0x260
? handle_irq_event_percpu+0x140/0x140
? lapic_next_event+0x53/0x80
? clockevents_program_event+0x1c7/0x280
handle_edge_irq+0x1f8/0xa90
__common_interrupt+0x70/0x150
common_interrupt+0x76/0xa0
</IRQ>
asm_common_interrupt+0x1b/0x40
RIP: 0010:delay_halt_mwaitx+0x31/0x40
Code: 00 65 48 03 05 a8 83 22 43 31 d2 48 89 d1 0f 01 fa bb ff ff ff ff b8 f0 00 00 00 b9 02 00 00 00 48 39 de 48 0f 46 de 0f 01 fb <5b> c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 83 ec 08 48 89 f8
RSP: 0018:ffff88810397ec38 EFLAGS: 00000297
RAX: 00000000000000f0 RBX: 0000000000000bb3 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000bb3 RDI: 00002c4339a8e963
RBP: 0000000000000bb3 R08: 0000000000000000 R09: ffffffffbe61f63f
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881538d6800
R13: 00000000000001d0 R14: 0000000000000000 R15: ffffed102a71ad3e
delay_halt+0x36/0x60
phm_wait_for_register_unequal+0xd5/0x240 [amdgpu]
? amdgpu_device_wreg.part.0+0x2ae/0x350 [amdgpu]
smu8_send_msg_to_smc_with_parameter+0x1e9/0x380 [amdgpu]
smum_send_msg_to_smc_with_parameter+0x215/0x300 [amdgpu]
smu8_set_power_state_tasks+0x691/0xe10 [amdgpu]
? debug_check_no_locks_held+0xa0/0xa0
phm_set_power_state+0xcc/0x130 [amdgpu]
? phm_power_down_asic+0x90/0x90 [amdgpu]
psm_adjust_power_state_dynamic+0x172/0x570 [amdgpu]
? psm_set_user_performance_state+0x1c0/0x1c0 [amdgpu]
? mutex_lock_io_nested+0xfd0/0xfe0
? memcpy+0x39/0x60
? psm_set_states+0x109/0x190 [amdgpu]
hwmgr_handle_task+0x10a/0x1f0 [amdgpu]
? hwmgr_resume+0xc0/0xc0 [amdgpu]
? amdgpu_debugfs_fence_info_show+0x450/0x450 [amdgpu]
pp_dpm_dispatch_tasks+0x5e/0x90 [amdgpu]
? pp_get_power_limit+0x250/0x250 [amdgpu]
amdgpu_pm_compute_clocks.part.0+0x245/0x1500 [amdgpu]
? wait_for_completion_killable_timeout+0x2c0/0x2c0
? amdgpu_dpm_get_vrefresh+0x1c0/0x1c0 [amdgpu]
? phm_store_dal_configuration_data+0x2d0/0x3f0 [amdgpu]
? pp_get_current_clocks+0x6e0/0x6e0 [amdgpu]
? pp_get_current_clocks+0x6e0/0x6e0 [amdgpu]
dm_pp_apply_display_requirements+0x92c/0xc60 [amdgpu]
dce11_update_clocks+0x27f/0x3d0 [amdgpu]
? dce11_pplib_apply_display_requirements+0xbb0/0xbb0 [amdgpu]
dc_commit_state+0x422/0x27a0 [amdgpu]
? amdgpu_dm_atomic_commit_tail+0xa12/0x9870 [amdgpu]
? unwind_next_frame+0x15a6/0x1cf0
? dc_allow_idle_optimizations+0x1c0/0x1c0 [amdgpu]
? drm_atomic_helper_calc_timestamping_constants+0x15a/0x1f0 [drm_kms_helper]
amdgpu_dm_atomic_commit_tail+0xa47/0x9870 [amdgpu]
? mark_lock+0xee/0x2fd0
? dm_crtc_helper_atomic_check+0x960/0x960 [amdgpu]
? mark_lock+0xee/0x2fd0
? lock_chain_count+0x20/0x20
? lock_chain_count+0x20/0x20
? __lock_acquire+0xd5b/0x5d40
? check_chain_key+0x24a/0x580
? __lock_acquire+0xd5b/0x5d40
? mark_lock+0xee/0x2fd0
? debug_check_no_locks_held+0xa0/0xa0
? mark_lock+0xee/0x2fd0
? lock_chain_count+0x20/0x20
? __lock_acquire+0xd5b/0x5d40
? check_chain_key+0x24a/0x580
? wait_for_completion_timeout+0x19d/0x220
? mark_held_locks+0x9e/0xe0
? rwlock_bug.part.0+0x90/0x90
? lockdep_hardirqs_on_prepare+0x273/0x3f0
? _raw_spin_unlock_irq+0x1f/0x30
? trace_hardirqs_on+0x1c/0x110
? _raw_spin_unlock_irq+0x1f/0x30
? wait_for_completion_timeout+0x19d/0x220
? wait_for_completion_io+0x210/0x210
? hpet_msi_interrupt_handler+0x50/0x50
? mark_held_locks+0x9e/0xe0
? drm_crtc_commit_wait+0x29/0x60 [drm]
? drm_atomic_helper_wait_for_dependencies+0x429/0x670 [drm_kms_helper]
commit_tail+0x1a9/0x300 [drm_kms_helper]
drm_atomic_helper_commit+0x216/0x290 [drm_kms_helper]
drm_atomic_connector_commit_dpms+0x2d7/0x4f0 [drm]
? drm_modeset_acquire_init+0x1e9/0x390 [drm]
? set_property_atomic+0xb2/0x2f0 [drm]
set_property_atomic+0x1be/0x2f0 [drm]
? drm_mode_object_get+0x110/0x110 [drm]
? lock_acquire+0x1a0/0x6e0
drm_mode_obj_set_property_ioctl+0x339/0x4c0 [drm]
? drm_connector_set_obj_prop+0x160/0x160 [drm]
drm_connector_property_set_ioctl+0xee/0x180 [drm]
? drm_connector_set_obj_prop+0x160/0x160 [drm]
? drm_is_current_master+0x65/0x120 [drm]
? drm_connector_set_obj_prop+0x160/0x160 [drm]
? drm_ioctl_permit+0x132/0x160 [drm]
drm_ioctl_kernel+0x1c8/0x260 [drm]
? drm_ioctl_permit+0x160/0x160 [drm]
? debug_check_no_locks_held+0xa0/0xa0
drm_ioctl+0x419/0x890 [drm]
? drm_connector_set_obj_prop+0x160/0x160 [drm]
? drm_ioctl_kernel+0x260/0x260 [drm]
? lock_downgrade+0x6e0/0x6e0
? lockdep_hardirqs_on_prepare+0x273/0x3f0
? _raw_spin_unlock_irqrestore+0x42/0x50
? trace_hardirqs_on+0x1c/0x110
amdgpu_drm_ioctl+0xc9/0x1a0 [amdgpu]
__x64_sys_ioctl+0x11b/0x170
do_syscall_64+0x61/0x80
? lockdep_hardirqs_on_prepare+0x273/0x3f0
? do_syscall_64+0x6e/0x80
? do_syscall_64+0x6e/0x80
? lockdep_hardirqs_on_prepare+0x273/0x3f0
? do_syscall_64+0x6e/0x80
? do_syscall_64+0x6e/0x80
? do_syscall_64+0x6e/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7acfa36937
Code: 3c 1c 48 f7 d8 4c 39 e0 77 b9 e8 24 ff ff ff 85 c0 78 be 5b 4c 89 e0 5d 41 5c c3 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 01 45 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcba76bd48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f7acfa36937
RDX: 00007ffcba76bd98 RSI: 00000000c01064ab RDI: 000000000000000c
RBP: 00000000c01064ab R08: 0000562d03ee9cf8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000562d040468b0
R13: 00007ffcba76bd98 R14: 0000000000000000 R15: 0000562d027ab540
------------[ cut here ]------------
Created attachment 296717 [details]
kernel .config (kernel 5.13-rc1, A10-9700E)
Sold the hardware, can no longer test nor reproduce. |
Created attachment 295381 [details] dmesg (kernel 5.11, A10-9700E) I get this on kernel 5.10.17 and it happens on 5.11 too (though a lot less often). Leaving the machine running for some hours sooner or later I get this "smu8_send_msg_to_smc_with_parameter(0x0011, 0x460) timed out after 3234846 us" in kernel dmesg and after that CPU clock frequency is hampered to the 800 MHz base clock of the A10-9700E. Setting cpu manually to performance or back to ondemand does not change a thing, clock stays down at 800 MHz. The machine does not crash and is still usable but well ... slow.. [...] ------------[ cut here ]------------ smu8_send_msg_to_smc_with_parameter(0x0011, 0x460) timed out after 3234846 us WARNING: CPU: 2 PID: 2358578 at smu8_send_msg_to_smc_with_parameter+0xfe/0x140 [amdgpu] Modules linked in: fuse rfkill input_leds joydev ext4 crc16 mbcache jbd2 hid_generic usbhid hid crc32_generic crc32_pclmul f2fs evdev amdgpu aesni_intel libaes crypto_simd cryptd snd_hda_codec_hdmi glue_helper drm_ttm_helper snd_hda_codec_realtek ttm snd_hda_codec_generic ledtrig_audio led_class r8169 fam15h_power realtek gpu_sched k10temp mdio_devres ehci_pci ehci_hcd i2c_piix4 libphy snd_hda_intel snd_intel_dspcfg i2c_algo_bit snd_hda_codec drm_kms_helper snd_hwdep cfbfillrect snd_hda_core syscopyarea xhci_pci cfbimgblt snd_pcm xhci_hcd sysfillrect sysimgblt snd_timer usbcore fb_sys_fops cfbcopyarea snd usb_common fb soundcore font fbdev video acpi_cpufreq button processor nfsd zram nct6775 auth_rpcgss zsmalloc hwmon_vid jc42 lockd hwmon grace sunrpc drm drm_panel_orientation_quirks nfs_ssc backlight CPU: 2 PID: 2358578 Comm: kworker/2:0 Not tainted 5.11.0-gentoo #2 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./A320M-HDV R3.0, BIOS P3.10 06/26/2019 Workqueue: events dm_irq_work_func [amdgpu] RIP: 0010:smu8_send_msg_to_smc_with_parameter+0xfe/0x140 [amdgpu] Code: c0 48 c7 c7 a0 4b a8 c0 48 89 c1 48 f7 ea 48 89 c8 44 89 f1 48 c1 f8 3f 48 c1 fa 07 48 29 c2 49 89 d0 44 89 ea e8 db a8 e4 d5 <0f> 0b eb b3 41 bc ea ff ff ff eb ab 48 8b 7d 60 be c0 01 00 00 48 RSP: 0018:ffffb25b4015bb48 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 00009c65e3a5a2a5 RCX: ffffa23c0f512d58 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffffa23c0f512d50 RBP: ffffa23908ceec00 R08: 0000000000000000 R09: ffffb25b4015b980 R10: 0000000000000003 R11: ffffffff96e57e48 R12: 00000000ffffffc2 R13: 0000000000000011 R14: 0000000000000460 R15: ffffa2390e53128c FS: 0000000000000000(0000) GS:ffffa23c0f500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f99db4fbbc0 CR3: 0000000134f00000 CR4: 00000000001506e0 Call Trace: smum_send_msg_to_smc_with_parameter+0xbe/0xf0 [amdgpu] smu8_set_power_state_tasks+0x1d6/0x360 [amdgpu] phm_set_power_state+0x44/0x68 [amdgpu] psm_adjust_power_state_dynamic+0xb6/0x1b8 [amdgpu] ? amdgpu_fence_wait_empty+0x80/0xd0 [amdgpu] pp_dpm_dispatch_tasks+0x35/0x58 [amdgpu] dm_pp_apply_display_requirements+0x197/0x1c0 [amdgpu] dce11_update_clocks+0x8f/0xe8 [amdgpu] dc_commit_updates_for_stream+0xc2c/0x1980 [amdgpu] ? dc_link_detect_helper+0x516/0xe90 [amdgpu] dm_set_dpms_off+0x85/0xd0 [amdgpu] handle_hpd_irq+0xf1/0x148 [amdgpu] dm_irq_work_func+0x44/0x50 [amdgpu] process_one_work+0x1b8/0x318 worker_thread+0x4b/0x3a0 ? rescuer_thread+0x360/0x360 kthread+0x11c/0x138 ? __kthread_bind_mask+0x58/0x58 ret_from_fork+0x22/0x30 ---[ end trace 0fa0eb110cfd5fb7 ]--- amdgpu: smu8_send_msg_to_smc_with_parameter(0x026e) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x002f) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x0012) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x0013) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x0011) aborted; SMU still servicing msg (0x0011) [drm] amdgpu_dm_irq_schedule_work FAILED src 2 [drm] amdgpu_dm_irq_schedule_work FAILED src 2 amdgpu: smu8_send_msg_to_smc_with_parameter(0x026e) aborted; SMU still servicing msg (0x0011) [drm] amdgpu_dm_irq_schedule_work FAILED src 2 amdgpu: smu8_send_msg_to_smc_with_parameter(0x002f) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x0012) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x0013) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x0011) aborted; SMU still servicing msg (0x0011) amdgpu: smu8_send_msg_to_smc_with_parameter(0x026e) aborted; SMU still servicing msg (0x0011) [...] Some data about the machine: $ inxi -b System: Kernel: 5.11.0-gentoo x86_64 bits: 64 Desktop: MATE 1.24.0 Distro: Gentoo Base System release 2.7 Machine: Type: Desktop Mobo: ASRock model: A320M-HDV R3.0 serial: <superuser/root required> UEFI [Legacy]: American Megatrends v: P3.10 date: 06/26/2019 CPU: Info: Quad Core AMD A10-9700E RADEON R7 10 COMPUTE CORES 4C+6G [MCP] speed: 2485 MHz min/max: 800/3000 MHz Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] driver: amdgpu v: kernel Display: x11 server: X.Org 1.20.10 driver: amdgpu,ati unloaded: fbdev,modesetting resolution: 1920x1080~60Hz OpenGL: renderer: AMD CARRIZO (DRM 3.40.0 5.11.0-gentoo LLVM 11.0.0) v: 4.6 Mesa 20.2.6 Network: Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8169