Created attachment 292451 [details]
dmesg

On each boot, the kernel throws an oops.

Typical
--------------------
Arch Linux 5.8.7 and 5.8.8
Using kernel parameter amdgpu.runpm=0 (if not, immediate crash).
Using Plymouth
Model: MSI Bravo 17 - A4DDR-035NL
CPU: AMD Ryzen 7 4800H with integrated Vega GPU
GPU: AMD Navi 14 - Radeon RX 5500M

Oops
--------------------
[ 4.621337] [drm] Initialized amdgpu 3.38.0 20150101 for 0000:07:00.0 on minor 1
[ 4.640077] ------------[ cut here ]------------
[ 4.640156] WARNING: CPU: 0 PID: 953 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2545 dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[ 4.640156] Modules linked in: snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_rn_pci_acp3x snd_pci_acp3x btusb btrtl btbcm btintel bluetooth ecdh_generic ecc iwlmvm amdgpu mac80211 joydev snd_hda_codec_realtek mousedev libarc4 snd_hda_codec_generic ledtrig_audio iwlwifi snd_hda_codec_hdmi hid_multitouch snd_hda_intel hid_generic snd_intel_dspcfg snd_hda_codec gpu_sched msi_wmi i2c_algo_bit sparse_keymap edac_mce_amd ttm snd_hda_core cfg80211 kvm_amd snd_hwdep snd_pcm drm_kms_helper kvm r8169 snd_timer cec realtek rc_core syscopyarea snd sp5100_tco irqbypass sysfillrect psmouse sysimgblt rapl input_leds fb_sys_fops pcspkr k10temp soundcore libphy i2c_piix4 rfkill tpm_crb wmi battery i2c_hid ac tpm_tis tpm_tis_core hid tpm pinctrl_amd uvcvideo acpi_cpufreq videobuf2_vmalloc videobuf2_memops evdev videobuf2_v4l2 soc_button_array mac_hid videobuf2_common videodev mc vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) dm_mod drm sg crypto_user agpgart
[ 4.640182] ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas ccp xhci_hcd rng_core i8042 serio
[ 4.640190] CPU: 0 PID: 953 Comm: systemd-backlig Tainted: G OE 5.8.7-arch1-1 #1
[ 4.640191] Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.116 07/10/2020
[ 4.640240] RIP: 0010:dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[ 4.640241] Code: 30 03 00 00 31 c0 48 8d 96 c0 01 00 00 48 8b 0a 48 85 c9 74 06 48 3b 59 08 74 20 83 c0 01 48 81 c2 c8 04 00 00 83 f8 06 75 e3 <0f> 0b 45 31 e4 5b 44 89 e0 5d 41 5c 41 5d 41 5e c3 48 98 48 69 c0
[ 4.640242] RSP: 0018:ffffadc080807df0 EFLAGS: 00010246
[ 4.640243] RAX: 0000000000000006 RBX: ffff9cd5438c9800 RCX: 0000000000000000
[ 4.640243] RDX: ffff9cd53f801e70 RSI: ffff9cd53f800000 RDI: 0000000000000000
[ 4.640244] RBP: ffff9cd543900000 R08: 00000000000000ff R09: 000000000000000a
[ 4.640244] R10: 000000000000000a R11: f000000000000000 R12: 000000000000ff01
[ 4.640245] R13: 0000000000000000 R14: 000000000000ffff R15: ffff9cd549120260
[ 4.640246] FS: 00007f5b331a8000(0000) GS:ffff9cd55f600000(0000) knlGS:0000000000000000
[ 4.640246] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.640247] CR2: 00005598681fe978 CR3: 0000000408e64000 CR4: 0000000000340ef0
[ 4.640247] Call Trace:
[ 4.640302] amdgpu_dm_backlight_update_status+0xb4/0xc0 [amdgpu]
[ 4.640321] backlight_device_set_brightness+0x7e/0x130
[ 4.640323] brightness_store+0x63/0x80
[ 4.640326] kernfs_fop_write+0xce/0x1b0
[ 4.640329] vfs_write+0xc7/0x1f0
[ 4.640331] ksys_write+0x67/0xe0
[ 4.640335] do_syscall_64+0x44/0x70
[ 4.640337] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4.640339] RIP: 0033:0x7f5b33fc0f67
[ 4.640341] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 4.640342] RSP: 002b:00007ffc766ac878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4.640343] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f5b33fc0f67
[ 4.640343] RDX: 0000000000000004 RSI: 00007ffc766ac960 RDI: 0000000000000004
[ 4.640344] RBP: 00007ffc766ac960 R08: 0000000000000000 R09: 0000000000000000
[ 4.640344] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 4.640344] R13: 00005598681e83c0 R14: 0000000000000004 R15: 00007f5b34093720
[ 4.640346] ---[ end trace de75e01f35cca025 ]---

Dmesg is attached.
Possibly related to this fix:
https://bugzilla.kernel.org/show_bug.cgi?id=203905

If I read kernel.org correctly, this fix is included in kernel 5.7.8 (the one in the report). So, *possibly* this is a side effect of that fix, or a bug introduced after it?
I found out that this oops is triggered by systemd-backlight, which is enabled automatically on boot. This feature can be disabled by adding the following kernel parameter:

systemd.restore_state=0

This prevents the oops, but then another oops appears, see below. Maybe this second oops is the reason why setting the backlight oopses.

Dmesg without the systemd-backlight call is attached.

[ 4.694971] amdgpu 0000:07:00.0: amdgpu: SMU is initialized successfully!
[ 4.696369] [drm] kiq ring mec 2 pipe 1 q 0
[ 4.697106] ------------[ cut here ]------------
[ 4.697329] WARNING: CPU: 10 PID: 398 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr.c:716 rn_clk_mgr_construct+0x142/0x3f0 [amdgpu]
[ 4.697330] Modules linked in: snd_pci_acp3x btusb btrtl btbcm btintel bluetooth ecdh_generic iwlmvm ecc snd_hda_codec_realtek joydev amdgpu(+) mousedev snd_hda_codec_generic mac80211 ledtrig_audio snd_hda_codec_hdmi libarc4 snd_hda_intel edac_mce_amd gpu_sched snd_intel_dspcfg i2c_algo_bit kvm_amd snd_hda_codec ttm hid_multitouch snd_hda_core r8169 hid_generic msi_wmi iwlwifi sparse_keymap kvm drm_kms_helper snd_hwdep realtek snd_pcm cec irqbypass mdio_devres rc_core of_mdio cfg80211 psmouse rapl snd_timer input_leds fixed_phy syscopyarea pcspkr sp5100_tco snd sysfillrect libphy tpm_crb sysimgblt rfkill k10temp i2c_piix4 uvcvideo ac wmi battery tpm_tis soundcore fb_sys_fops tpm_tis_core videobuf2_vmalloc i2c_hid videobuf2_memops tpm videobuf2_v4l2 hid pinctrl_amd videobuf2_common videodev soc_button_array evdev mc acpi_cpufreq mac_hid drm sg dm_mod crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul
[ 4.697386] crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_hcd ccp rng_core i8042 serio
[ 4.697398] CPU: 10 PID: 398 Comm: systemd-udevd Not tainted 5.9.0-rc4-1-git-00038-g581cb3a26baf #1
[ 4.697400] Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.116 07/10/2020
[ 4.697608] RIP: 0010:rn_clk_mgr_construct+0x142/0x3f0 [amdgpu]
[ 4.697613] Code: 00 00 00 41 8b 8c c4 80 00 00 00 41 89 c1 89 c7 85 c9 74 10 41 8b 94 c4 84 00 00 00 85 d2 0f 85 aa 01 00 00 48 83 e8 01 73 d9 <0f> 0b 83 7b 20 01 74 0c 81 bd e8 00 00 00 ff 14 37 00 7f 27 48 8b
[ 4.697615] RSP: 0018:ffffa92c4286b6c0 EFLAGS: 00010297
[ 4.697617] RAX: ffffffffffffffff RBX: ffff973a7e874180 RCX: 0000000000000000
[ 4.697619] RDX: ffff973a92ac1e80 RSI: ffffa92c4286b6e8 RDI: 0000000000000000
[ 4.697620] RBP: ffff973a98469e00 R08: 0000000000000000 R09: 0000000000000000
[ 4.697622] R10: 7fc9117fffffffff R11: ffff973a7e88f400 R12: ffffa92c4286b6e8
[ 4.697623] R13: ffff973a7e874e40 R14: ffff973a8a190000 R15: ffff973a7e874180
[ 4.697625] FS: 00007f705a9e9440(0000) GS:ffff973a9f680000(0000) knlGS:0000000000000000
[ 4.697627] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.697628] CR2: 000055de9eb7a774 CR3: 0000000419700000 CR4: 0000000000350ee0
[ 4.697630] Call Trace:
[ 4.697842] dc_clk_mgr_create+0x172/0x1b0 [amdgpu]
[ 4.698042] dc_create+0x24a/0x7a0 [amdgpu]
[ 4.698050] ? kmem_cache_alloc_trace+0x106/0x240
[ 4.698255] amdgpu_dm_init.isra.0+0x17f/0x1e0 [amdgpu]
[ 4.698460] dm_hw_init+0xe/0x20 [amdgpu]
[ 4.698666] amdgpu_device_init.cold+0x171a/0x19d8 [amdgpu]
[ 4.698827] amdgpu_driver_load_kms+0x5c/0x230 [amdgpu]
[ 4.698984] amdgpu_pci_probe+0xf4/0x180 [amdgpu]
[ 4.698991] local_pci_probe+0x42/0x80
[ 4.698995] ? pci_match_device+0xd7/0x100
[ 4.698998] pci_device_probe+0xfa/0x1b0
[ 4.699002] really_probe+0x205/0x460
[ 4.699005] driver_probe_device+0xe1/0x150
[ 4.699008] device_driver_attach+0xa1/0xb0
[ 4.699011] __driver_attach+0x8a/0x150
[ 4.699012] ? device_driver_attach+0xb0/0xb0
[ 4.699014] ? device_driver_attach+0xb0/0xb0
[ 4.699017] bus_for_each_dev+0x89/0xd0
[ 4.699021] bus_add_driver+0x12b/0x1e0
[ 4.699024] driver_register+0x8b/0xe0
[ 4.699026] ? 0xffffffffc0fb6000
[ 4.699030] do_one_initcall+0x59/0x234
[ 4.699036] do_init_module+0x5c/0x260
[ 4.699039] load_module+0x21a7/0x2450
[ 4.699046] __do_sys_init_module+0x12d/0x180
[ 4.699053] do_syscall_64+0x33/0x40
[ 4.699057] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4.699059] RIP: 0033:0x7f705b79ae4e
[ 4.699063] Code: 48 8b 0d 25 10 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f2 0f 0c 00 f7 d8 64 89 01 48
[ 4.699064] RSP: 002b:00007ffee9ca2528 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 4.699067] RAX: ffffffffffffffda RBX: 00005624b5c38250 RCX: 00007f705b79ae4e
[ 4.699068] RDX: 00005624b5c38640 RSI: 0000000000a5a2a1 RDI: 00005624b65430e0
[ 4.699069] RBP: 00005624b65430e0 R08: ffffffffffffffe0 R09: 00007ffee9ca0671
[ 4.699070] R10: 00005624b5a34010 R11: 0000000000000246 R12: 00005624b5c38640
[ 4.699071] R13: 0000000000000008 R14: 00005624b5c2df00 R15: 00005624b5c38250
[ 4.699076] ---[ end trace 3e1ef6f5f1a6a9c8 ]---
[ 4.699158] [drm] Display Core initialized with v3.2.95!
Created attachment 292485 [details] dmesg without systemd-backlight call
The behavior of kernels 5.8.8 and 5.9.0-rc4 seems to be similar. The last dmesg is from kernel 5.9.0-rc4.
The last oops might already have been tackled by the AMD Display Core v3.2.102 patchset proposed on the amd-gfx development mailing list.
https://lists.freedesktop.org/archives/amd-gfx/2020-September/053625.html

The patch that might resolve the oops is the following one.
https://lists.freedesktop.org/archives/amd-gfx/2020-September/053633.html
I have applied and tested the patch. Unfortunately it did not resolve the problem. Apparently the ASSERT(0) in this piece of code is triggering the oops.

----------------------
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
rn_clk_mgr_helper_populate_bw_params

	/* Find lowest DPM, FCLK is filled in reverse order*/
	for (i = PP_SMU_NUM_FCLK_DPM_LEVELS - 1; i >= 0; i--) {
		if (clock_table->FClocks[i].Freq != 0 && clock_table->FClocks[i].Vol != 0) {
			j = i;
			break;
		}
	}

	if (j == -1) {
		/* clock table is all 0s, just use our own hardcode */
		ASSERT(0);
		return;
	}
----------------------
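To make the condition behind that ASSERT(0) easier to follow, here is a minimal user-space sketch of the same search logic. It is not the driver code: the struct layout, the name find_lowest_dpm and the level count of 4 are assumptions made up for illustration. The point is only that the reverse scan returns -1 when every entry has a zero Freq or a zero Vol, which is the case in which the driver asserts and returns early.

```c
#include <stddef.h>

/* Assumption: illustrative stand-in for PP_SMU_NUM_FCLK_DPM_LEVELS. */
#define NUM_FCLK_DPM_LEVELS 4

/* Hypothetical, simplified stand-in for one FClocks[] entry. */
struct fclk_entry {
	unsigned int freq;	/* 0 means "not populated" */
	unsigned int vol;	/* 0 means "not populated" */
};

/*
 * Scan the table from the last entry down to the first and return the
 * index of the first entry found whose freq and vol are both nonzero.
 * Returns -1 if no such entry exists (the "clock table is all 0s" case).
 */
int find_lowest_dpm(const struct fclk_entry *table, int n)
{
	int i;

	for (i = n - 1; i >= 0; i--) {
		if (table[i].freq != 0 && table[i].vol != 0)
			return i;
	}
	return -1;
}
```

With a table whose entries are all zero the function returns -1. With a table where only the first two entries are populated it returns 1, since the scan starts from the end. If the sketch matches the driver's behavior, the oops would mean the clock table read back for this laptop contained no usable FCLK entry at all.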
Apparently, Bugzilla is not used (anymore?) by the AMD devs, so I created a new bug report in the 'active' bug tracker.
https://gitlab.freedesktop.org/drm/amd/-/issues/1294