Bug 216092

Summary: rn_vbios_smu_send_msg_with_param+0xf9/0x100 - amdgpu
Product: Drivers
Reporter: sander44 (ionut_n2001)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: alexdeucher, arek.rusi, CoelacanthusHex, mario.limonciello, tr.ml
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.19.0-rc1
Subsystem:
Regression: No
Bisected commit-id:

Description sander44 2022-06-07 06:19:27 UTC
Hi Kernel Team,

I built 5.19.0-rc1 today and noticed this issue:

Git: 

Merge tag 'dma-mapping-5.19-2022-06-06' of git://git.infradead.org/users/hch/dma-mapping HEAD master
Pull dma-mapping fixes from Christoph Hellwig:

 - fix a regression in setting swiotlb ->force_bounce (me)

 - make dma-debug less chatty (Rob Clark)

* tag 'dma-mapping-5.19-2022-06-06' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: fix setting ->force_bounce
  dma-debug: make things less spammy under memory pressure

Error/Warning:

[    2.491290] ------------[ cut here ]------------
[    2.491291] WARNING: CPU: 15 PID: 272 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.c:98 rn_vbios_smu_send_msg_with_param+0xf9/0x100 [amdgpu]
[    2.491554] Modules linked in: hid_asus asus_wmi sparse_keymap platform_profile usbhid amdgpu(+) iommu_v2 gpu_sched drm_buddy i2c_algo_bit drm_ttm_helper ttm drm_display_helper cec rc_core drm_kms_helper syscopyarea sysfillrect sysimgblt nvme hid_generic fb_sys_fops crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd drm nvme_core xhci_pci i2c_piix4 xhci_pci_renesas wmi i2c_hid_acpi i2c_hid video hid
[    2.491574] CPU: 15 PID: 272 Comm: systemd-udevd Not tainted 5.19.0-rc1-lowlatency #1
[    2.491577] Hardware name: ASUSTeK COMPUTER INC. ROG Zephyrus G14 GA401QM_GA401QM/GA401QM, BIOS GA401QM.410 12/13/2021
[    2.491578] RIP: 0010:rn_vbios_smu_send_msg_with_param+0xf9/0x100 [amdgpu]
[    2.491809] Code: 1e 49 8b 3c 24 48 c7 c2 e0 8c a2 c0 be 93 62 01 00 e8 5b cc e8 ff 5b 41 5c 41 5d 41 5e 5d c3 3d fe 00 00 00 74 db 0f 0b eb d7 <0f> 0b e9 55 ff ff ff 0f 1f 44 00 00 55 31 d2 be 02 00 00 00 48 89
[    2.491811] RSP: 0018:ffffb259409973d8 EFLAGS: 00010202
[    2.491813] RAX: 00000000000000fe RBX: 0000000000030d41 RCX: 000000000000000b
[    2.491815] RDX: 0000000000000000 RSI: 000000000001629b RDI: ffff890d5d9c0000
[    2.491816] RBP: ffffb259409973f8 R08: ffffffffc0c84a60 R09: 0000000000000002
[    2.491817] R10: 0000000000000002 R11: 0000000000000001 R12: ffff890d58c43e00
[    2.491818] R13: 000000000000000d R14: 0000000000000001 R15: 0000000000000003
[    2.491819] FS:  00007f9aeb0dd8c0(0000) GS:ffff89140e9c0000(0000) knlGS:0000000000000000
[    2.491820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.491821] CR2: 00007f9aeb77b800 CR3: 00000001199e4000 CR4: 0000000000750ee0
[    2.491823] PKRU: 55555554
[    2.491824] Call Trace:
[    2.491825]  <TASK>
[    2.491827]  rn_vbios_smu_enable_48mhz_tmdp_refclk_pwrdwn+0x17/0x20 [amdgpu]
[    2.492098]  rn_clk_mgr_construct+0x13c/0xe40 [amdgpu]
[    2.492296]  dc_clk_mgr_create+0x408/0x590 [amdgpu]
[    2.492487]  dc_create+0x24e/0x640 [amdgpu]
[    2.492701]  amdgpu_dm_init.isra.0+0x222/0x300 [amdgpu]
[    2.492901]  ? dev_vprintk_emit+0x171/0x195
[    2.492904]  ? dev_printk_emit+0x4e/0x65
[    2.492906]  dm_hw_init+0x13/0x30 [amdgpu]
[    2.493096]  amdgpu_device_init.cold+0x1a06/0x1ed9 [amdgpu]
[    2.493284]  ? pci_read_config_word+0x27/0x40
[    2.493286]  ? do_pci_enable_device+0xd7/0x100
[    2.493288]  amdgpu_driver_load_kms+0x1c/0x160 [amdgpu]
[    2.493366]  amdgpu_pci_probe+0x16f/0x3b0 [amdgpu]
[    2.493442]  local_pci_probe+0x4b/0x90
[    2.493444]  ? pci_match_device+0xde/0x130
[    2.493445]  pci_device_probe+0xc8/0x270
[    2.493447]  really_probe+0x1d2/0x3b0
[    2.493449]  __driver_probe_device+0x115/0x190
[    2.493450]  driver_probe_device+0x23/0xc0
[    2.493451]  __driver_attach+0xbd/0x1e0
[    2.493452]  ? __device_attach_driver+0x110/0x110
[    2.493453]  bus_for_each_dev+0x7f/0xc0
[    2.493454]  driver_attach+0x1e/0x20
[    2.493454]  bus_add_driver+0x170/0x210
[    2.493455]  driver_register+0x95/0xf0
[    2.493456]  __pci_register_driver+0x68/0x70
[    2.493457]  amdgpu_init+0x6e/0x1000 [amdgpu]
[    2.493542]  ? 0xffffffffc0e08000
[    2.493543]  do_one_initcall+0x49/0x210
[    2.493545]  ? kmem_cache_alloc_trace+0x1a6/0x320
[    2.493548]  do_init_module+0x52/0x210
[    2.493550]  load_module+0x1ec6/0x2350
[    2.493552]  __do_sys_finit_module+0xc5/0x130
[    2.493553]  ? __do_sys_finit_module+0xc5/0x130
[    2.493555]  __x64_sys_finit_module+0x18/0x20
[    2.493556]  do_syscall_64+0x5c/0x80
[    2.493558]  ? ksys_mmap_pgoff+0x10c/0x250
[    2.493559]  ? do_syscall_64+0x69/0x80
[    2.493560]  ? exit_to_user_mode_prepare+0x35/0x170
[    2.493561]  ? syscall_exit_to_user_mode+0x26/0x40
[    2.493563]  ? __x64_sys_mmap+0x33/0x40
[    2.493565]  ? do_syscall_64+0x69/0x80
[    2.493565]  ? sysvec_call_function+0x4e/0x90
[    2.493566]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[    2.493568] RIP: 0033:0x7f9aeb7d5a3d
[    2.493570] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[    2.493571] RSP: 002b:00007fff950f45f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    2.493572] RAX: ffffffffffffffda RBX: 0000561e0e3360a0 RCX: 00007f9aeb7d5a3d
[    2.493573] RDX: 0000000000000000 RSI: 00007f9aeb96c441 RDI: 000000000000001a
[    2.493573] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
[    2.493574] R10: 000000000000001a R11: 0000000000000246 R12: 00007f9aeb96c441
[    2.493574] R13: 0000561e0e32f820 R14: 0000561e0e32bd60 R15: 0000561e0e32e6d0
[    2.493575]  </TASK>
[    2.493576] ---[ end trace 0000000000000000 ]---
[    2.494098] [drm] Display Core initialized with v3.2.186!
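
When comparing reports like this one, it can help to pull the source file, line number, and symbol offset out of the WARNING line mechanically. The following is a minimal Python sketch, not an official kernel tool; the regular expression and the helper name `parse_warning` are assumptions made for illustration, and the sample line is copied from the log above.

```python
import re

# Hypothetical pattern for a line of the form:
# "WARNING: CPU: <n> PID: <n> at <file>:<line> <symbol>+<off>/<size> [<module>]"
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_warning(text):
    """Return the fields of a kernel WARNING line as a dict, or None if absent."""
    match = WARN_RE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    # Decimal fields.
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    # Hexadecimal offset into the function and the function size.
    for key in ("offset", "size"):
        info[key] = int(info[key], 16)
    return info


sample = (
    "[    2.491291] WARNING: CPU: 15 PID: 272 at "
    "drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.c:98 "
    "rn_vbios_smu_send_msg_with_param+0xf9/0x100 [amdgpu]"
)
info = parse_warning(sample)
print(info["file"], info["line"], info["symbol"], hex(info["offset"]))
```

The source file and line identify the check that fired; the symbol offset depends on how the kernel was compiled, so it can differ between builds of the same source.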
Comment 1 RockT 2022-06-15 09:39:17 UTC
I see exactly the same with Manjaro testing kernel 5.19.0-rc1.


[    7.843798] amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    7.852927] amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    7.852928] amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    7.853513] amdgpu 0000:06:00.0: amdgpu: SMU is initialized successfully!
[    7.853691] ------------[ cut here ]------------
[    7.853692] WARNING: CPU: 0 PID: 432 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.c:98 rn_vbios_smu_send_msg_with_param+0xf1/0x100 [amdgpu]
[    7.853897] Modules linked in: amd64_edac(-) pcc_cpufreq(-) fjes(+) snd_usb_audio(+) mhi_wwan_ctrl mhi_wwan_mbim snd_usbmidi_lib snd_rawmidi snd_seq_device bnep squashfs loop btusb btrtl qrtr btbcm uvcvideo btintel btmtk videobuf2_vmalloc videobuf2_memops intel_rapl_msr snd_acp3x_rn snd_soc_dmic bluetooth videobuf2_v4l2 videobuf2_common videodev snd_acp3x_pdm_dma vfat ecdh_generic hid_multitouch iwlmvm snd_sof_amd_renoir think_lmi(+) crc16 mc fat snd_ctl_led snd_hda_codec_realtek firmware_attributes_class snd_sof_amd_acp wmi_bmof intel_rapl_common amdgpu(+) snd_sof_pci mac80211 snd_hda_codec_generic snd_hda_codec_hdmi snd_sof snd_hda_intel snd_intel_dspcfg snd_sof_utils snd_intel_sdw_acpi snd_soc_core libarc4 edac_mce_amd snd_hda_codec snd_compress iwlwifi ac97_bus gpu_sched kvm_amd snd_hda_core snd_pcm_dmaengine drm_buddy snd_acp_pci drm_ttm_helper iwlmei snd_pci_acp6x thinkpad_acpi ttm snd_pci_acp5x snd_hwdep ledtrig_audio r8169 cfg80211 kvm snd_pcm snd_rn_pci_acp3x ucsi_acpi
[    7.853926]  platform_profile snd_timer snd_acp_config realtek drm_display_helper snd typec_ucsi irqbypass rfkill mhi_pci_generic mdio_devres sp5100_tco snd_soc_acpi tpm_crb snd_pci_acp3x cec soundcore rapl psmouse mei libphy typec mhi i2c_piix4 k10temp roles i2c_hid_acpi tpm_tis i2c_hid video wmi tpm_tis_core amd_pmc acpi_cpufreq pinctrl_amd joydev mousedev mac_hid uinput ipmi_devintf ipmi_msghandler fuse crypto_user bpf_preload ip_tables x_tables hid_logitech_hidpp xfs libcrc32c crc32c_generic hid_logitech_dj hid_jabra usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm dm_mod serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul vivaldi_fmap crc32c_intel ghash_clmulni_intel aesni_intel nvme crypto_simd ccp xhci_pci cryptd i8042 nvme_core rng_core xhci_pci_renesas serio
[    7.853954] CPU: 0 PID: 432 Comm: systemd-udevd Not tainted 5.19.0-1-MANJARO #1 12b6001e0e27e2c5de0e86e6e0c9807155e77ed6
[    7.853957] Hardware name: LENOVO 20XF006GGE/20XF006GGE, BIOS R1NET50W (1.20) 04/14/2022
[    7.853958] RIP: 0010:rn_vbios_smu_send_msg_with_param+0xf1/0x100 [amdgpu]
[    7.854144] Code: f8 01 75 1b 48 8b 7d 00 5b be 93 62 01 00 48 c7 c2 a0 9f 6c c1 5d 41 5c 41 5d e9 3a ed f4 ff 3d fe 00 00 00 74 de 0f 0b eb da <0f> 0b e9 58 ff ff ff 0f 1f 84 00 00 00 00 00 66 0f 1f 00 0f 1f 44
[    7.854146] RSP: 0018:ffff9ae4030076f0 EFLAGS: 00010202
[    7.854148] RAX: 00000000000000fe RBX: 0000000000030d41 RCX: ffffffffc1938118
[    7.854149] RDX: 0000000000000000 RSI: 000000000001629b RDI: ffff8be184400000
[    7.854150] RBP: ffff8be186cdf800 R08: ffff8be1a1086800 R09: 0000000000000cb1
[    7.854151] R10: 000000000000001e R11: 0036ee8000000000 R12: 000000000000000d
[    7.854152] R13: 0000000000000001 R14: 000000000000018f R15: ffff8be186cdf800
[    7.854153] FS:  00007f8967b45080(0000) GS:ffff8be84ee00000(0000) knlGS:0000000000000000
[    7.854154] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.854154] CR2: 00005636d87b17c8 CR3: 0000000105fde000 CR4: 0000000000750ef0
[    7.854155] PKRU: 55555554
[    7.854156] Call Trace:
[    7.854158]  <TASK>
[    7.854160]  rn_clk_mgr_construct+0x151/0x620 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.854316]  dc_clk_mgr_create+0x42c/0x5d0 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.854464]  dc_create+0x23c/0x5b0 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.854620]  amdgpu_dm_init.isra.0+0x22d/0x350 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.854775]  ? dev_vprintk_emit+0x177/0x19c
[    7.854781]  dm_hw_init+0x12/0x20 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.854927]  amdgpu_device_init.cold+0x17b4/0x1d57 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.855089]  amdgpu_driver_load_kms+0x19/0x130 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.855214]  amdgpu_pci_probe+0x148/0x360 [amdgpu a6a2d017171775457fc880c1e7e3ceb4f3d662e5]
[    7.855335]  local_pci_probe+0x42/0x80
[    7.855339]  pci_device_probe+0xc1/0x220
[    7.855341]  ? sysfs_do_create_link_sd+0x6a/0xd0
[    7.855344]  really_probe+0x1bc/0x390
[    7.855348]  __driver_probe_device+0xfc/0x170
[    7.855349]  driver_probe_device+0x1f/0x90
[    7.855351]  __driver_attach+0xbf/0x1b0
[    7.855352]  ? __device_attach_driver+0xe0/0xe0
[    7.855353]  bus_for_each_dev+0x84/0xd0
[    7.855355]  bus_add_driver+0x15d/0x200
[    7.855357]  driver_register+0x8d/0xe0
[    7.855358]  ? 0xffffffffc1acb000
[    7.855359]  do_one_initcall+0x5a/0x220
[    7.855363]  do_init_module+0x4a/0x1e0
[    7.855365]  __do_sys_init_module+0x138/0x1b0
[    7.855367]  do_syscall_64+0x5c/0x90
[    7.855370]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[    7.855372] RIP: 0033:0x7f896831299e
[    7.855374] Code: 48 8b 0d fd a3 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca a3 0e 00 f7 d8 64 89 01 48
[    7.855375] RSP: 002b:00007ffdbabccaf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    7.855376] RAX: ffffffffffffffda RBX: 00005564361c9ba0 RCX: 00007f896831299e
[    7.855376] RDX: 00007f896899a32c RSI: 00000000011c22e7 RDI: 00007f8964af6010
[    7.855377] RBP: 00007f8964af6010 R08: 00005564361c2ae0 R09: 0000000000000000
[    7.855377] R10: 0000000000000005 R11: 0000000000000246 R12: 00007f896899a32c
[    7.855378] R13: 00005564361c4e80 R14: 00005564361c9ba0 R15: 000055643611f560
[    7.855379]  </TASK>
[    7.855380] ---[ end trace 0000000000000000 ]---
[    7.855519] [drm] Display Core initialized with v3.2.186!

System:
Lenovo T14s
CPU AMD Ryzen 7 PRO 5850U with Radeon Graphics
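
Entries prefixed with `?` in the call traces are addresses found on the stack that the unwinder could not confirm as part of the call chain, so they are usually skipped when reading a trace. A minimal Python sketch of that filtering follows; the regular expression and the helper name `reliable_frames` are assumptions for illustration, and the sample lines are taken from the trace above (with the module build ID shortened).

```python
import re

# Hypothetical pattern for a call-trace entry such as
# "[    7.854781]  dm_hw_init+0x12/0x20 [amdgpu ...]".
# An optional "? " prefix marks an unreliable entry.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\? )?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(lines):
    """Return symbol names of trace entries that are not marked with '?'."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match and not match.group("guess"):
            frames.append(match.group("symbol"))
    return frames


trace = [
    "[    7.854160]  rn_clk_mgr_construct+0x151/0x620 [amdgpu a6a2d017]",
    "[    7.854775]  ? dev_vprintk_emit+0x177/0x19c",
    "[    7.854781]  dm_hw_init+0x12/0x20 [amdgpu a6a2d017]",
    "[    7.855358]  ? 0xffffffffc1acb000",
    "[    7.855359]  do_one_initcall+0x5a/0x220",
]
frames = reliable_frames(trace)
print(frames)
```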
Comment 2 Alex Deucher 2022-06-15 13:42:43 UTC
Can you bisect?
Comment 3 Alex Deucher 2022-06-15 13:44:07 UTC
Does reverting c1b972a18d05d007f0ddff31db2ff50790576e92 fix the issue?
Comment 4 RockT 2022-06-15 13:56:32 UTC
(In reply to Alex Deucher from comment #3)
> Does reverting c1b972a18d05d007f0ddff31db2ff50790576e92 fix the issue?

I have never rebuilt an Arch/Manjaro kernel.
I will try, but cannot promise.
Comment 5 Arek Ruśniak 2022-07-25 12:52:07 UTC
Hi, 
I have an Asus ROG Zephyrus G15 with a Cezanne GPU, and this bug occurred for me too.

I built 5.19-rc8 with the following commit reverted:

commit c1b972a18d05d007f0ddff31db2ff50790576e92
Author: Oliver Logush <oliver.logush@amd.com>
Date:   Tue Mar 22 10:26:19 2022 -0400

    drm/amd/display: Insert pulling smu busy status before sending another request

That fixed the issue!
@Alex, do you still need a bisect for some reason? If so, please ping me.
Comment 6 Alex Deucher 2022-08-08 16:53:46 UTC
See also:
https://gitlab.freedesktop.org/drm/amd/-/issues/2110
Comment 7 Mario Limonciello (AMD) 2022-08-08 18:32:02 UTC
As pointed out in the linked #2110, it's fixed in Linus' tree by this:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=149f6d1a6035a7aa6595ac6eeb9c8f566b2103cd

This will be part of 6.0-rc1.