Bug 209225

Summary: [AMDGPU] oops on each boot: dc_link_set_backlight_level
Product: Drivers
Reporter: Arthur Borsboom (arthurborsboom)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW ---    
Severity: normal    
Priority: P1    
Hardware: x86-64   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=203905
Kernel Version: 5.8.7, 5.8.8, 5.9.0-rc4
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg
dmesg without systemd-backlight call

Description Arthur Borsboom 2020-09-10 14:46:50 UTC
Created attachment 292451 [details]
dmesg

On each boot, the kernel throws an oops.

Typical
--------------------
Arch Linux 5.8.7 and 5.8.8
Using kernel parameter amdgpu.runpm=0 (without it, the system crashes immediately).
Using Plymouth
Model: MSI Bravo 17 - A4DDR-035NL
CPU: AMD Ryzen 7 4800H with integrated Vega GPU
GPU: AMD Navi 14 - Radeon RX 5500M

Oops
--------------------

[    4.621337] [drm] Initialized amdgpu 3.38.0 20150101 for 0000:07:00.0 on minor 1
[    4.640077] ------------[ cut here ]------------
[    4.640156] WARNING: CPU: 0 PID: 953 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:2545 dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[    4.640156] Modules linked in: snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_rn_pci_acp3x snd_pci_acp3x btusb btrtl btbcm btintel bluetooth ecdh_generic ecc iwlmvm amdgpu mac80211 joydev snd_hda_codec_realtek mousedev libarc4 snd_hda_codec_generic ledtrig_audio iwlwifi snd_hda_codec_hdmi hid_multitouch snd_hda_intel hid_generic snd_intel_dspcfg snd_hda_codec gpu_sched msi_wmi i2c_algo_bit sparse_keymap edac_mce_amd ttm snd_hda_core cfg80211 kvm_amd snd_hwdep snd_pcm drm_kms_helper kvm r8169 snd_timer cec realtek rc_core syscopyarea snd sp5100_tco irqbypass sysfillrect psmouse sysimgblt rapl input_leds fb_sys_fops pcspkr k10temp soundcore libphy i2c_piix4 rfkill tpm_crb wmi battery i2c_hid ac tpm_tis tpm_tis_core hid tpm pinctrl_amd uvcvideo acpi_cpufreq videobuf2_vmalloc videobuf2_memops evdev videobuf2_v4l2 soc_button_array mac_hid videobuf2_common videodev mc vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) dm_mod drm sg crypto_user agpgart
[    4.640182]  ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_pci_renesas ccp xhci_hcd rng_core i8042 serio
[    4.640190] CPU: 0 PID: 953 Comm: systemd-backlig Tainted: G           OE     5.8.7-arch1-1 #1
[    4.640191] Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.116 07/10/2020
[    4.640240] RIP: 0010:dc_link_set_backlight_level+0x8a/0xf0 [amdgpu]
[    4.640241] Code: 30 03 00 00 31 c0 48 8d 96 c0 01 00 00 48 8b 0a 48 85 c9 74 06 48 3b 59 08 74 20 83 c0 01 48 81 c2 c8 04 00 00 83 f8 06 75 e3 <0f> 0b 45 31 e4 5b 44 89 e0 5d 41 5c 41 5d 41 5e c3 48 98 48 69 c0
[    4.640242] RSP: 0018:ffffadc080807df0 EFLAGS: 00010246
[    4.640243] RAX: 0000000000000006 RBX: ffff9cd5438c9800 RCX: 0000000000000000
[    4.640243] RDX: ffff9cd53f801e70 RSI: ffff9cd53f800000 RDI: 0000000000000000
[    4.640244] RBP: ffff9cd543900000 R08: 00000000000000ff R09: 000000000000000a
[    4.640244] R10: 000000000000000a R11: f000000000000000 R12: 000000000000ff01
[    4.640245] R13: 0000000000000000 R14: 000000000000ffff R15: ffff9cd549120260
[    4.640246] FS:  00007f5b331a8000(0000) GS:ffff9cd55f600000(0000) knlGS:0000000000000000
[    4.640246] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.640247] CR2: 00005598681fe978 CR3: 0000000408e64000 CR4: 0000000000340ef0
[    4.640247] Call Trace:
[    4.640302]  amdgpu_dm_backlight_update_status+0xb4/0xc0 [amdgpu]
[    4.640321]  backlight_device_set_brightness+0x7e/0x130
[    4.640323]  brightness_store+0x63/0x80
[    4.640326]  kernfs_fop_write+0xce/0x1b0
[    4.640329]  vfs_write+0xc7/0x1f0
[    4.640331]  ksys_write+0x67/0xe0
[    4.640335]  do_syscall_64+0x44/0x70
[    4.640337]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    4.640339] RIP: 0033:0x7f5b33fc0f67
[    4.640341] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[    4.640342] RSP: 002b:00007ffc766ac878 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[    4.640343] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f5b33fc0f67
[    4.640343] RDX: 0000000000000004 RSI: 00007ffc766ac960 RDI: 0000000000000004
[    4.640344] RBP: 00007ffc766ac960 R08: 0000000000000000 R09: 0000000000000000
[    4.640344] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[    4.640344] R13: 00005598681e83c0 R14: 0000000000000004 R15: 00007f5b34093720
[    4.640346] ---[ end trace de75e01f35cca025 ]---

Dmesg is attached.
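
The call trace above enters the driver through an ordinary write(2) on the backlight "brightness" sysfs attribute (vfs_write -> kernfs_fop_write -> brightness_store -> backlight_device_set_brightness -> amdgpu_dm_backlight_update_status). A minimal sketch of a helper that performs such a write by hand might look like the following. The function name write_brightness is hypothetical, and the sysfs path mentioned in the note below is an assumption, since the report does not name the backlight device.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a decimal brightness value, followed by a newline, to a
 * sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_brightness(const char *path, unsigned int value)
{
    char buf[16];
    int len = snprintf(buf, sizeof(buf), "%u\n", value);

    /* sysfs attributes already exist, so no O_CREAT here. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t written = write(fd, buf, (size_t)len);
    int err = 0;
    if (written < 0)
        err = -errno;
    else if (written != len)
        err = -EIO; /* short write: treat as an I/O error */

    close(fd);
    return err;
}
```

On a machine like the one in this report, calling write_brightness("/sys/class/backlight/amdgpu_bl0/brightness", 128) as root would presumably take the same kernel path as the trace. The amdgpu_bl0 name is a guess; the actual entry under /sys/class/backlight may differ.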
Comment 1 Arthur Borsboom 2020-09-10 16:10:09 UTC
Possibly related to this fix:

https://bugzilla.kernel.org/show_bug.cgi?id=203905

If I read kernel.org correctly, this fix is included in kernel 5.7.8 (the one in the report).

So, *possibly* this is a side effect or bug introduced after this fix?
Comment 2 Arthur Borsboom 2020-09-12 09:27:32 UTC
I found out that this oops is triggered by systemd-backlight, which is enabled automatically on boot. This feature can be disabled by adding the following kernel parameter:

systemd.restore_state=0

This prevents the backlight oops, but another oops then shows up, see below.
Maybe this second oops is the reason why setting the backlight oopses.

Dmesg without the systemd-backlight call is attached.

[    4.694971] amdgpu 0000:07:00.0: amdgpu: SMU is initialized successfully!
[    4.696369] [drm] kiq ring mec 2 pipe 1 q 0
[    4.697106] ------------[ cut here ]------------
[    4.697329] WARNING: CPU: 10 PID: 398 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr.c:716 rn_clk_mgr_construct+0x142/0x3f0 [amdgpu]
[    4.697330] Modules linked in: snd_pci_acp3x btusb btrtl btbcm btintel bluetooth ecdh_generic iwlmvm ecc snd_hda_codec_realtek joydev amdgpu(+) mousedev snd_hda_codec_generic mac80211 ledtrig_audio snd_hda_codec_hdmi libarc4 snd_hda_intel edac_mce_amd gpu_sched snd_intel_dspcfg i2c_algo_bit kvm_amd snd_hda_codec ttm hid_multitouch snd_hda_core r8169 hid_generic msi_wmi iwlwifi sparse_keymap kvm drm_kms_helper snd_hwdep realtek snd_pcm cec irqbypass mdio_devres rc_core of_mdio cfg80211 psmouse rapl snd_timer input_leds fixed_phy syscopyarea pcspkr sp5100_tco snd sysfillrect libphy tpm_crb sysimgblt rfkill k10temp i2c_piix4 uvcvideo ac wmi battery tpm_tis soundcore fb_sys_fops tpm_tis_core videobuf2_vmalloc i2c_hid videobuf2_memops tpm videobuf2_v4l2 hid pinctrl_amd videobuf2_common videodev soc_button_array evdev mc acpi_cpufreq mac_hid drm sg dm_mod crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul
[    4.697386]  crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper xhci_pci xhci_hcd ccp rng_core i8042 serio
[    4.697398] CPU: 10 PID: 398 Comm: systemd-udevd Not tainted 5.9.0-rc4-1-git-00038-g581cb3a26baf #1
[    4.697400] Hardware name: Micro-Star International Co., Ltd. Bravo 17 A4DDR/MS-17FK, BIOS E17FKAMS.116 07/10/2020
[    4.697608] RIP: 0010:rn_clk_mgr_construct+0x142/0x3f0 [amdgpu]
[    4.697613] Code: 00 00 00 41 8b 8c c4 80 00 00 00 41 89 c1 89 c7 85 c9 74 10 41 8b 94 c4 84 00 00 00 85 d2 0f 85 aa 01 00 00 48 83 e8 01 73 d9 <0f> 0b 83 7b 20 01 74 0c 81 bd e8 00 00 00 ff 14 37 00 7f 27 48 8b
[    4.697615] RSP: 0018:ffffa92c4286b6c0 EFLAGS: 00010297
[    4.697617] RAX: ffffffffffffffff RBX: ffff973a7e874180 RCX: 0000000000000000
[    4.697619] RDX: ffff973a92ac1e80 RSI: ffffa92c4286b6e8 RDI: 0000000000000000
[    4.697620] RBP: ffff973a98469e00 R08: 0000000000000000 R09: 0000000000000000
[    4.697622] R10: 7fc9117fffffffff R11: ffff973a7e88f400 R12: ffffa92c4286b6e8
[    4.697623] R13: ffff973a7e874e40 R14: ffff973a8a190000 R15: ffff973a7e874180
[    4.697625] FS:  00007f705a9e9440(0000) GS:ffff973a9f680000(0000) knlGS:0000000000000000
[    4.697627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.697628] CR2: 000055de9eb7a774 CR3: 0000000419700000 CR4: 0000000000350ee0
[    4.697630] Call Trace:
[    4.697842]  dc_clk_mgr_create+0x172/0x1b0 [amdgpu]
[    4.698042]  dc_create+0x24a/0x7a0 [amdgpu]
[    4.698050]  ? kmem_cache_alloc_trace+0x106/0x240
[    4.698255]  amdgpu_dm_init.isra.0+0x17f/0x1e0 [amdgpu]
[    4.698460]  dm_hw_init+0xe/0x20 [amdgpu]
[    4.698666]  amdgpu_device_init.cold+0x171a/0x19d8 [amdgpu]
[    4.698827]  amdgpu_driver_load_kms+0x5c/0x230 [amdgpu]
[    4.698984]  amdgpu_pci_probe+0xf4/0x180 [amdgpu]
[    4.698991]  local_pci_probe+0x42/0x80
[    4.698995]  ? pci_match_device+0xd7/0x100
[    4.698998]  pci_device_probe+0xfa/0x1b0
[    4.699002]  really_probe+0x205/0x460
[    4.699005]  driver_probe_device+0xe1/0x150
[    4.699008]  device_driver_attach+0xa1/0xb0
[    4.699011]  __driver_attach+0x8a/0x150
[    4.699012]  ? device_driver_attach+0xb0/0xb0
[    4.699014]  ? device_driver_attach+0xb0/0xb0
[    4.699017]  bus_for_each_dev+0x89/0xd0
[    4.699021]  bus_add_driver+0x12b/0x1e0
[    4.699024]  driver_register+0x8b/0xe0
[    4.699026]  ? 0xffffffffc0fb6000
[    4.699030]  do_one_initcall+0x59/0x234
[    4.699036]  do_init_module+0x5c/0x260
[    4.699039]  load_module+0x21a7/0x2450
[    4.699046]  __do_sys_init_module+0x12d/0x180
[    4.699053]  do_syscall_64+0x33/0x40
[    4.699057]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    4.699059] RIP: 0033:0x7f705b79ae4e
[    4.699063] Code: 48 8b 0d 25 10 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f2 0f 0c 00 f7 d8 64 89 01 48
[    4.699064] RSP: 002b:00007ffee9ca2528 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[    4.699067] RAX: ffffffffffffffda RBX: 00005624b5c38250 RCX: 00007f705b79ae4e
[    4.699068] RDX: 00005624b5c38640 RSI: 0000000000a5a2a1 RDI: 00005624b65430e0
[    4.699069] RBP: 00005624b65430e0 R08: ffffffffffffffe0 R09: 00007ffee9ca0671
[    4.699070] R10: 00005624b5a34010 R11: 0000000000000246 R12: 00005624b5c38640
[    4.699071] R13: 0000000000000008 R14: 00005624b5c2df00 R15: 00005624b5c38250
[    4.699076] ---[ end trace 3e1ef6f5f1a6a9c8 ]---
[    4.699158] [drm] Display Core initialized with v3.2.95!
Comment 3 Arthur Borsboom 2020-09-12 09:28:10 UTC
Created attachment 292485 [details]
dmesg without systemd-backlight call
Comment 4 Arthur Borsboom 2020-09-12 09:29:44 UTC
The behavior for kernel 5.8.8 and 5.9.0-rc4 seems to be similar. The last dmesg is from kernel 5.9.0-rc4
Comment 5 Arthur Borsboom 2020-09-12 21:03:05 UTC
The last oops might already be addressed by a patchset for AMD Display Core v3.2.102 that was proposed on the amd-gfx development mailing list.

https://lists.freedesktop.org/archives/amd-gfx/2020-September/053625.html

The patch that might resolve the oops is the following:

https://lists.freedesktop.org/archives/amd-gfx/2020-September/053633.html
Comment 6 Arthur Borsboom 2020-09-12 22:40:30 UTC
I have applied and tested the patch.
Unfortunately it did not resolve the problem.

Apparently the ASSERT(0) in this piece of code is triggering the oops.

----------------------

drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr.c
rn_clk_mgr_helper_populate_bw_params

        /* Find lowest DPM, FCLK is filled in reverse order*/

        for (i = PP_SMU_NUM_FCLK_DPM_LEVELS - 1; i >= 0; i--) {
                if (clock_table->FClocks[i].Freq != 0 && clock_table->FClocks[i].Vol != 0) {
                        j = i;
                        break;
                }
        }

        if (j == -1) {
                /* clock table is all 0s, just use our own hardcode */
                ASSERT(0); 
                return;
        }

----------------------
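
To make the failing path easier to follow, here is a standalone sketch of the same reverse scan, outside the kernel. The type fclk_entry, the function find_lowest_dpm and the table size of 4 are stand-ins chosen for illustration and are assumptions, not the driver's definitions. It only shows that the index stays at -1 when every entry of the clock table is zero, which is the condition that reaches the ASSERT(0) quoted above.

```c
#include <stdio.h>

/* Assumed table size; the driver uses PP_SMU_NUM_FCLK_DPM_LEVELS. */
#define NUM_FCLK_DPM_LEVELS 4

/* Hypothetical stand-in for one FCLK entry of the SMU clock table. */
struct fclk_entry {
    unsigned int freq; /* frequency, 0 means "not populated" */
    unsigned int vol;  /* voltage, 0 means "not populated" */
};

/*
 * Scan from the highest index down and return the first index whose
 * frequency and voltage are both non-zero. Returns -1 when the whole
 * table is zero, i.e. the case that hits ASSERT(0) in the excerpt.
 */
static int find_lowest_dpm(const struct fclk_entry *table, int n)
{
    int j = -1;

    for (int i = n - 1; i >= 0; i--) {
        if (table[i].freq != 0 && table[i].vol != 0) {
            j = i;
            break;
        }
    }
    return j;
}
```

With an all-zero table find_lowest_dpm returns -1. With only the first two entries populated it returns 1, the highest populated index, because the scan runs in reverse order.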
Comment 7 Arthur Borsboom 2020-09-20 18:13:05 UTC
Apparently, Bugzilla is not used (anymore?) by the AMD devs, so I created a new bug report in the 'active' bug tracker.

https://gitlab.freedesktop.org/drm/amd/-/issues/1294