Bug 210467 - amdgpu Vega 3 lock MCLK on 1200mhz
Summary: amdgpu Vega 3 lock MCLK on 1200mhz
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel) (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
: 210465 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-12-03 11:24 UTC by Alexey
Modified: 2020-12-22 06:29 UTC (History)
2 users (show)

See Also:
Kernel Version: 5.9.11, 5.4.80-2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (88.66 KB, text/plain)
2020-12-03 11:29 UTC, Alexey
Details
dmesg debug mode (422.03 KB, text/plain)
2020-12-03 11:30 UTC, Alexey
Details
possible warning fix (1.98 KB, patch)
2020-12-03 21:28 UTC, Alex Deucher
Details | Diff
lshw (22.09 KB, text/plain)
2020-12-06 12:13 UTC, Alexey
Details
glxinfo (146.31 KB, text/plain)
2020-12-06 16:06 UTC, Alexey
Details
modinfo amdgpu (workly) (37.56 KB, text/plain)
2020-12-07 20:32 UTC, Alexey
Details
modules info (workly) (24.52 KB, text/plain)
2020-12-07 20:46 UTC, Alexey
Details

Description Alexey 2020-12-03 11:24:40 UTC
Lock at start (disabled display manager, check in console):
[root@hard alex]# cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 400Mhz 
1: 933Mhz 
2: 1067Mhz 
3: 1200Mhz *

Found two warnings in dmesg:

[   18.539019] ------------[ cut here ]------------
[   18.539021] Bogus possible_crtcs: [ENCODER:65:TMDS-65] possible_crtcs=0xf (full crtc mask=0x7)
[   18.539077] WARNING: CPU: 3 PID: 439 at drivers/gpu/drm/drm_mode_config.c:617 drm_mode_config_validate+0x178/0x200 [drm]
[   18.539077] Modules linked in: cmac algif_hash algif_skcipher af_alg bnep 8821ce(OE) btusb btrtl btbcm edac_mce_amd btintel kvm_amd rtw88_8821ce amdgpu(+) nls_iso8859_1 bluetooth rtw88_8821c nls_cp437 vfat fat rtw88_pci kvm rtw88_core ecdh_generic wmi_bmof ecc mac80211 uvcvideo snd_hda_codec_realtek pcspkr snd_hda_codec_generic irqbypass rapl ledtrig_audio snd_hda_codec_hdmi videobuf2_vmalloc videobuf2_memops rndis_host snd_rn_pci_acp3x k10temp cdc_ether snd_hda_intel snd_pci_acp3x videobuf2_v4l2 usbnet snd_intel_dspcfg videobuf2_common mousedev input_leds snd_hda_codec i2c_hid snd_hda_core joydev mii videodev r8169 snd_hwdep snd_pcm mc cfg80211 realtek snd_timer mdio_devres of_mdio snd fixed_phy ideapad_laptop gpu_sched sp5100_tco sparse_keymap i2c_algo_bit soundcore libphy libarc4 rfkill battery i2c_piix4 ac elan_i2c mac_hid evdev wmi pinctrl_amd acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) vboxvideo drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec rc_core drm syscopyarea
[   18.539113]  sysfillrect sysimgblt fb_sys_fops vboxsf fuse vboxguest crypto_user acpi_call(OE) agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm hid_generic usbhid hid dm_mod serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas ccp xhci_hcd rng_core i8042 serio
[   18.539134] CPU: 3 PID: 439 Comm: systemd-udevd Tainted: G           OE     5.9.11-arch2-1 #1
[   18.539135] Hardware name: LENOVO 81LW/LNVNB161216, BIOS ARCN32WW 07/11/2019
[   18.539150] RIP: 0010:drm_mode_config_validate+0x178/0x200 [drm]
[   18.539153] Code: 39 d6 75 e6 45 85 c8 74 0a 44 89 c0 f7 d0 44 85 c8 74 19 49 8b 55 38 41 8b 75 18 44 89 c9 48 c7 c7 70 61 46 c0 e8 38 9f 79 d1 <0f> 0b 49 8b 45 08 4c 8d 68 f8 49 39 c4 0f 85 ed fe ff ff 5b 5d 41
[   18.539155] RSP: 0018:ffffaf0e405cfa58 EFLAGS: 00010282
[   18.539157] RAX: 0000000000000000 RBX: ffff9f25700c1ad0 RCX: 0000000000000000
[   18.539158] RDX: 0000000000000001 RSI: ffffffff92559a97 RDI: 00000000ffffffff
[   18.539159] RBP: 0000000000000001 R08: 000000000000044f R09: 0000000000000001
[   18.539160] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f25700c1ad8
[   18.539161] R13: ffff9f25649ffc00 R14: ffff9f25700c1ad8 R15: 000000000000001f
[   18.539163] FS:  00007fc5ceee7ec0(0000) GS:ffff9f2574ac0000(0000) knlGS:0000000000000000
[   18.539164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.539165] CR2: 00007fa743c07270 CR3: 00000002b0202000 CR4: 00000000003506e0
[   18.539166] Call Trace:
[   18.539188]  drm_dev_register+0x158/0x1b0 [drm]
[   18.539303]  amdgpu_pci_probe+0x107/0x180 [amdgpu]
[   18.539310]  local_pci_probe+0x42/0x80
[   18.539313]  ? pci_match_device+0xd7/0x100
[   18.539317]  pci_device_probe+0xfa/0x1b0
[   18.539322]  really_probe+0x205/0x460
[   18.539325]  driver_probe_device+0xe1/0x150
[   18.539329]  device_driver_attach+0xa1/0xb0
[   18.539332]  __driver_attach+0x8a/0x150
[   18.539334]  ? device_driver_attach+0xb0/0xb0
[   18.539336]  ? device_driver_attach+0xb0/0xb0
[   18.539338]  bus_for_each_dev+0x89/0xd0
[   18.539340]  bus_add_driver+0x12b/0x1e0
[   18.539342]  driver_register+0x8b/0xe0
[   18.539345]  ? 0xffffffffc17d2000
[   18.539348]  do_one_initcall+0x59/0x240
[   18.539353]  do_init_module+0x5c/0x260
[   18.539359]  load_module+0x21a7/0x2450
[   18.539365]  __do_sys_init_module+0x12d/0x180
[   18.539373]  do_syscall_64+0x33/0x40
[   18.539376]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   18.539378] RIP: 0033:0x7fc5cf7d6e4e
[   18.539380] Code: 48 8b 0d 25 10 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f2 0f 0c 00 f7 d8 64 89 01 48
[   18.539381] RSP: 002b:00007ffd09434ca8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[   18.539383] RAX: ffffffffffffffda RBX: 00005634744ec9f0 RCX: 00007fc5cf7d6e4e
[   18.539384] RDX: 00005634744ec3d0 RSI: 0000000000afd361 RDI: 00007fc5cd0db010
[   18.539385] RBP: 00007fc5cd0db010 R08: fffffffffffffff0 R09: 0000000000afd370
[   18.539386] R10: 0000563474393010 R11: 0000000000000246 R12: 00005634744ec3d0
[   18.539387] R13: 0000000000000060 R14: 00005634744f1b30 R15: 00005634744ec9f0
[   18.539390] ---[ end trace 6aa1c68690dd6fcb ]---
[   18.539391] ------------[ cut here ]------------
[   18.539393] Bogus possible_crtcs: [ENCODER:71:TMDS-71] possible_crtcs=0xf (full crtc mask=0x7)
[   18.539432] WARNING: CPU: 3 PID: 439 at drivers/gpu/drm/drm_mode_config.c:617 drm_mode_config_validate+0x178/0x200 [drm]
[   18.539432] Modules linked in: cmac algif_hash algif_skcipher af_alg bnep 8821ce(OE) btusb btrtl btbcm edac_mce_amd btintel kvm_amd rtw88_8821ce amdgpu(+) nls_iso8859_1 bluetooth rtw88_8821c nls_cp437 vfat fat rtw88_pci kvm rtw88_core ecdh_generic wmi_bmof ecc mac80211 uvcvideo snd_hda_codec_realtek pcspkr snd_hda_codec_generic irqbypass rapl ledtrig_audio snd_hda_codec_hdmi videobuf2_vmalloc videobuf2_memops rndis_host snd_rn_pci_acp3x k10temp cdc_ether snd_hda_intel snd_pci_acp3x videobuf2_v4l2 usbnet snd_intel_dspcfg videobuf2_common mousedev input_leds snd_hda_codec i2c_hid snd_hda_core joydev mii videodev r8169 snd_hwdep snd_pcm mc cfg80211 realtek snd_timer mdio_devres of_mdio snd fixed_phy ideapad_laptop gpu_sched sp5100_tco sparse_keymap i2c_algo_bit soundcore libphy libarc4 rfkill battery i2c_piix4 ac elan_i2c mac_hid evdev wmi pinctrl_amd acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) vboxvideo drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec rc_core drm syscopyarea
[   18.539461]  sysfillrect sysimgblt fb_sys_fops vboxsf fuse vboxguest crypto_user acpi_call(OE) agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted tpm hid_generic usbhid hid dm_mod serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas ccp xhci_hcd rng_core i8042 serio
[   18.539475] CPU: 3 PID: 439 Comm: systemd-udevd Tainted: G        W  OE     5.9.11-arch2-1 #1
[   18.539476] Hardware name: LENOVO 81LW/LNVNB161216, BIOS ARCN32WW 07/11/2019
[   18.539495] RIP: 0010:drm_mode_config_validate+0x178/0x200 [drm]
[   18.539499] Code: 39 d6 75 e6 45 85 c8 74 0a 44 89 c0 f7 d0 44 85 c8 74 19 49 8b 55 38 41 8b 75 18 44 89 c9 48 c7 c7 70 61 46 c0 e8 38 9f 79 d1 <0f> 0b 49 8b 45 08 4c 8d 68 f8 49 39 c4 0f 85 ed fe ff ff 5b 5d 41
[   18.539500] RSP: 0018:ffffaf0e405cfa58 EFLAGS: 00010282
[   18.539502] RAX: 0000000000000000 RBX: ffff9f25700c1ad0 RCX: 0000000000000000
[   18.539503] RDX: 0000000000000001 RSI: ffffffff92559a97 RDI: 00000000ffffffff
[   18.539504] RBP: 0000000000000001 R08: 0000000000000480 R09: 0000000000000001
[   18.539505] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f25700c1ad8
[   18.539507] R13: ffff9f2564956000 R14: ffff9f25700c1ad8 R15: 000000000000001f
[   18.539508] FS:  00007fc5ceee7ec0(0000) GS:ffff9f2574ac0000(0000) knlGS:0000000000000000
[   18.539509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.539511] CR2: 00007fa743c07270 CR3: 00000002b0202000 CR4: 00000000003506e0
[   18.539513] Call Trace:
[   18.539534]  drm_dev_register+0x158/0x1b0 [drm]
[   18.539631]  amdgpu_pci_probe+0x107/0x180 [amdgpu]
[   18.539634]  local_pci_probe+0x42/0x80
[   18.539636]  ? pci_match_device+0xd7/0x100
[   18.539640]  pci_device_probe+0xfa/0x1b0
[   18.539644]  really_probe+0x205/0x460
[   18.539648]  driver_probe_device+0xe1/0x150
[   18.539650]  device_driver_attach+0xa1/0xb0
[   18.539654]  __driver_attach+0x8a/0x150
[   18.539659]  ? device_driver_attach+0xb0/0xb0
[   18.539662]  ? device_driver_attach+0xb0/0xb0
[   18.539664]  bus_for_each_dev+0x89/0xd0
[   18.539668]  bus_add_driver+0x12b/0x1e0
[   18.539672]  driver_register+0x8b/0xe0
[   18.539675]  ? 0xffffffffc17d2000
[   18.539677]  do_one_initcall+0x59/0x240
[   18.539681]  do_init_module+0x5c/0x260
[   18.539686]  load_module+0x21a7/0x2450
[   18.539692]  __do_sys_init_module+0x12d/0x180
[   18.539697]  do_syscall_64+0x33/0x40
[   18.539700]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   18.539702] RIP: 0033:0x7fc5cf7d6e4e
[   18.539704] Code: 48 8b 0d 25 10 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f2 0f 0c 00 f7 d8 64 89 01 48
[   18.539705] RSP: 002b:00007ffd09434ca8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[   18.539708] RAX: ffffffffffffffda RBX: 00005634744ec9f0 RCX: 00007fc5cf7d6e4e
[   18.539710] RDX: 00005634744ec3d0 RSI: 0000000000afd361 RDI: 00007fc5cd0db010
[   18.539712] RBP: 00007fc5cd0db010 R08: fffffffffffffff0 R09: 0000000000afd370
[   18.539713] R10: 0000563474393010 R11: 0000000000000246 R12: 00005634744ec3d0
[   18.539715] R13: 0000000000000060 R14: 00005634744f1b30 R15: 00005634744ec9f0
[   18.539718] ---[ end trace 6aa1c68690dd6fcc ]---
Comment 1 Alexey 2020-12-03 11:28:22 UTC
*** Bug 210465 has been marked as a duplicate of this bug. ***
Comment 2 Alexey 2020-12-03 11:29:57 UTC
Created attachment 293915 [details]
dmesg
Comment 3 Alexey 2020-12-03 11:30:51 UTC
Created attachment 293917 [details]
dmesg debug mode
Comment 4 Alex Deucher 2020-12-03 21:28:23 UTC
Created attachment 293925 [details]
possible warning fix

The warning in the log is harmless, but this patch should fix it.
Comment 5 Alexey 2020-12-06 12:13:40 UTC
Created attachment 293959 [details]
lshw

what other data to provide? what to debug?
Comment 6 Alexey 2020-12-06 16:06:33 UTC
Created attachment 293971 [details]
glxinfo
Comment 7 Alexey 2020-12-07 17:29:43 UTC
Problem in firmware. I downgrade linux-firmware package, pp_dpm_mclk now "0: 400Mhz" at start.
Comment 8 Alex Deucher 2020-12-07 19:25:25 UTC
Can you narrow down which specific firmware?  Also what what versions did you try?
Comment 9 Alexey 2020-12-07 20:32:49 UTC
Created attachment 294027 [details]
modinfo amdgpu (workly)

(In reply to Alex Deucher from comment #8)
> Can you narrow down which specific firmware?  Also what what versions did
> you try?

workly: linux-firmware 20201023.dae4b4c-1
All new versions is bugged. Cant do build per single commits (expensive traffic).
Comment 10 Alexey 2020-12-07 20:46:27 UTC
Created attachment 294029 [details]
modules info (workly)
Comment 11 cpl 2020-12-22 06:29:16 UTC
Same issue on Vega 6, this is also causing worse gpu performance. Reverting this commit fixes it: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=3bcc4c1df91f879fb4f01a6b48a9c12fc7b6e724

That said, memory clock is now stuck on 933Mhz. (Performance is good, though.)

Note You need to log in before you can comment on or make changes to this bug.