Created attachment 300208
Full message from dmesg

I have an error in dmesg:

[ 7.729183] UBSAN: invalid-load in drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5882:84
[ 7.729185] load of value 32 is not a valid value for type '_Bool'
More info about system: https://linux-hardware.org/?probe=08b04c15d3
Full dmesg: https://linux-hardware.org/?probe=08b04c15d3&log=dmesg
> [ 7.729181] ================================================================================
> [ 7.729183] UBSAN: invalid-load in drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5882:84
> [ 7.729185] load of value 32 is not a valid value for type '_Bool'
> [ 7.729186] CPU: 5 PID: 4803 Comm: systemd-udevd Tainted: G O 5.15.12-gentoo-x86_64 #11
> [ 7.729188] Hardware name: HP HP Pavilion Gaming Laptop 15-ec1xxx/87B1, BIOS F.20 11/04/2020
> [ 7.729190] Call Trace:
> [ 7.729192] <TASK>
> [ 7.729193] dump_stack_lvl+0x45/0x59
> [ 7.729200] ubsan_epilogue+0x5/0x40
> [ 7.729202] __ubsan_handle_load_invalid_value.cold+0x43/0x48
> [ 7.729204] create_stream_for_sink.cold+0x3a/0x7d [amdgpu]
> [ 7.729387] create_validate_stream_for_sink+0x55/0x140 [amdgpu]
> [ 7.729537] amdgpu_dm_connector_mode_valid+0x4f/0x180 [amdgpu]
> [ 7.729678] drm_connector_mode_valid+0x35/0x80
> [ 7.729682] drm_helper_probe_single_connector_modes+0x3b2/0x880
> [ 7.729684] drm_client_modeset_probe+0x287/0x1380
> [ 7.729686] ? kmem_cache_alloc_trace+0x17d/0x380
> [ 7.729689] ? trace_hardirqs_on+0x2b/0x100
> [ 7.729692] ? ktime_get_mono_fast_ns+0x49/0xc0
> [ 7.729694] __drm_fb_helper_initial_config_and_unlock+0x44/0x500
> [ 7.729696] ? drm_file_alloc+0x199/0x280
> [ 7.729698] ? drm_client_init+0x12e/0x180
> [ 7.729700] amdgpu_fbdev_init+0xd6/0x140 [amdgpu]
> [ 7.729749] amdgpu_device_init.cold+0xfc0/0x1b4c [amdgpu]
> [ 7.729749] ? _raw_spin_unlock_irqrestore+0x15/0x40
> [ 7.729749] ? pci_conf1_read+0x99/0x100
> [ 7.729749] ? pci_bus_read_config_word+0x49/0x80
> [ 7.729749] amdgpu_driver_load_kms+0x67/0x340 [amdgpu]
> [ 7.729749] amdgpu_pci_probe+0x113/0x1c0 [amdgpu]
> [ 7.729749] pci_device_probe+0xe1/0x180
> [ 7.729749] really_probe+0x207/0x400
> [ 7.729749] __driver_probe_device+0x10d/0x1c0
> [ 7.729749] driver_probe_device+0x1e/0xc0
> [ 7.729749] __driver_attach+0xce/0x200
> [ 7.729749] ? __device_attach_driver+0x100/0x100
> [ 7.729749] bus_for_each_dev+0x78/0xc0
> [ 7.729749] bus_add_driver+0x12b/0x200
> [ 7.729749] driver_register+0x8f/0x100
> [ 7.729749] ? 0xffffffffc0d84000
> [ 7.729749] do_one_initcall+0x44/0x240
> [ 7.729749] ? kmem_cache_alloc_trace+0x17d/0x380
> [ 7.729749] do_init_module+0x87/0x280
> [ 7.729749] __do_sys_init_module+0x12d/0x1c0
> [ 7.729749] do_syscall_64+0x5c/0xc0
> [ 7.729749] ? trace_hardirqs_on_prepare+0x24/0xc0
> [ 7.729749] ? syscall_exit_to_user_mode+0x2c/0x80
> [ 7.729749] ? do_syscall_64+0x69/0xc0
> [ 7.729749] ? trace_hardirqs_on_prepare+0x24/0xc0
> [ 7.729749] ? syscall_exit_to_user_mode+0x2c/0x80
> [ 7.729749] ? do_syscall_64+0x69/0xc0
> [ 7.729749] ? do_user_addr_fault+0x1e6/0x6c0
> [ 7.729749] ? trace_hardirqs_off+0x26/0xc0
> [ 7.729749] ? exc_page_fault+0x89/0x140
> [ 7.729749] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 7.729749] RIP: 0033:0x701aa4f7be0a
> [ 7.729749] Code: 48 8b 0d 61 80 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2e 80 0b 00 f7 d8 64 89 01 48
> [ 7.729749] RSP: 002b:00007ffe4ebcb818 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> [ 7.729749] RAX: ffffffffffffffda RBX: 000060e5c69b38b0 RCX: 0000701aa4f7be0a
> [ 7.729749] RDX: 000060e5c69bea80 RSI: 000000000157d12f RDI: 0000701aa2499010
> [ 7.729749] RBP: 00007ffe4ebcb860 R08: 0000701aa3c7f000 R09: 0000000000000000
> [ 7.729749] R10: 000060e5c6b2d5f0 R11: 0000000000000246 R12: 000060e5c69bea80
> [ 7.729749] R13: 0000701aa2499010 R14: 000000000000003c R15: 000060e5c6b152b0
> [ 7.729749] </TASK>
> [ 7.730306] ================================================================================
I have this problem too, on a ThinkPad T14s AMD with kernel 5.15.12.
Also seeing this on an Ubuntu 5.15.13 mainline kernel on Ubuntu 22.04 with Xorg.

[ 63.026997] ================================================================================
[ 63.027004] fbcon: Taking over console
[ 63.027008] UBSAN: invalid-load in /home/kernel/COD/linux/drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5882:84
[ 63.027013] load of value 4 is not a valid value for type '_Bool'
[ 63.027017] CPU: 0 PID: 508 Comm: plymouthd Tainted: P OE 5.15.13-051513-generic #202201050731
[ 63.027020] Hardware name: MSI MS-7998/C236A WORKSTATION (MS-7998), BIOS 2.A0 06/15/2018
[ 63.027021] Call Trace:
[ 63.027023] <TASK>
[ 63.027025] show_stack+0x52/0x58
[ 63.027030] dump_stack_lvl+0x4a/0x5f
[ 63.027036] dump_stack+0x10/0x12
[ 63.027039] ubsan_epilogue+0x9/0x45
[ 63.027042] __ubsan_handle_load_invalid_value.cold+0x44/0x49
[ 63.027046] create_stream_for_sink.cold+0x5d/0xbb [amdgpu]
[ 63.027337] create_validate_stream_for_sink+0x59/0x150 [amdgpu]
[ 63.027572] dm_update_crtc_state+0x235/0x7b0 [amdgpu]
[ 63.027808] amdgpu_dm_atomic_check+0x596/0xcd0 [amdgpu]
[ 63.028080] ? __cond_resched+0x1a/0x50
[ 63.028085] ? ww_mutex_lock+0x83/0x90
[ 63.028088] ? dm_plane_format_mod_supported+0x1f/0x100 [amdgpu]
[ 63.028324] ? drm_plane_check_pixel_format+0x45/0x90 [drm]
[ 63.028367] ? drm_atomic_plane_check+0x12f/0x360 [drm]
[ 63.028396] drm_atomic_check_only+0x250/0x4b0 [drm]
[ 63.028423] drm_atomic_commit+0x18/0x50 [drm]
[ 63.028450] drm_client_modeset_commit_atomic+0x1df/0x220 [drm]
[ 63.028476] drm_client_modeset_commit_locked+0x5b/0x160 [drm]
[ 63.028501] ? mutex_lock+0x13/0x40
[ 63.028504] drm_client_modeset_commit+0x27/0x50 [drm]
[ 63.028531] __drm_fb_helper_restore_fbdev_mode_unlocked+0xc2/0xf0 [drm_kms_helper]
[ 63.028556] drm_fb_helper_lastclose+0x17/0x20 [drm_kms_helper]
[ 63.028571] amdgpu_driver_lastclose_kms+0xe/0x20 [amdgpu]
[ 63.028749] drm_release+0xe0/0x110 [drm]
[ 63.028773] __fput+0x9c/0x260
[ 63.028777] ____fput+0xe/0x10
[ 63.028779] task_work_run+0x6d/0xa0
[ 63.028783] do_exit+0x21b/0x3c0
[ 63.028786] do_group_exit+0x3b/0xb0
[ 63.028789] __x64_sys_exit_group+0x18/0x20
[ 63.028791] do_syscall_64+0x59/0xc0
[ 63.028794] ? asm_exc_page_fault+0x8/0x30
[ 63.028798] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 63.028802] RIP: 0033:0x7f3126a99ed1
[ 63.028805] Code: Unable to access opcode bytes at RIP 0x7f3126a99ea7.
[ 63.028806] RSP: 002b:00007ffdc2f08a48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 63.028809] RAX: ffffffffffffffda RBX: 00007f3126bc66d0 RCX: 00007f3126a99ed1
[ 63.028811] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[ 63.028813] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000001
[ 63.028815] R10: 000000000000001f R11: 0000000000000246 R12: 00007f3126bc66d0
[ 63.028816] R13: 0000000000000000 R14: 00007f3126bc6ba8 R15: 00007f3126bc6bc0
[ 63.028820] </TASK>
[ 63.028822] ================================================================================
This problem has disappeared for me in kernel 5.16.0. Can anybody else confirm?
UBSAN has been enabled for the 5.15 kernels since 5.15.8. It is not enabled for the 5.16 kernels. Check the build log files to confirm that ubsan.o is linked into the kernel for 5.15 but not for 5.16. Also see the discussion in https://gitlab.freedesktop.org/drm/amd/-/issues/1779.
UBSAN is the Undefined Behaviour Sanity Checker:

> Compile-time instrumentation is used to detect various undefined behaviours at runtime.
> For more details, see: Documentation/dev-tools/ubsan.rst

UBSAN is just a tool that helps to detect various bugs; it is not the source of the bug.

I resolved the problem by rebuilding the kernel:
- rebuilt 5.15.12 with a different kernel config: no UBSAN warnings (I made many changes at once, so I cannot tell which config option resolved the problem);
- updated to 5.15.13: no UBSAN warnings, but one crash in the amdgpu driver when closing the X subsystem (SysRq helped to restart the display manager);
- updated to 5.16.0: no UBSAN warnings, but high latencies (12-15 ms) in [drm_atomic_helper_wait_for_flip_done] under high load (detected with the latencytop program).
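To illustrate what "load of value 32 is not a valid value for type '_Bool'" means, here is a minimal userspace sketch of the kind of check the instrumentation performs on a bool load. It is only a model: the function names are hypothetical, and this is not the kernel's actual UBSAN handler. It assumes the usual representation in which a bool occupies one byte and only 0 and 1 are valid.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: a _Bool byte is valid only if it holds 0 or 1. */
static int bool_byte_is_valid(unsigned char raw)
{
    return raw <= 1;
}

/*
 * Hypothetical model of an instrumented bool load.
 * The storage is inspected as an unsigned char, which is always
 * well-defined, instead of being read through a bool lvalue.
 * Returns 0 and stores the value in *out if the byte is valid,
 * otherwise prints a UBSAN-style message and returns -1.
 */
static int checked_bool_load(const unsigned char *storage, bool *out)
{
    unsigned char raw = *storage;

    if (!bool_byte_is_valid(raw)) {
        fprintf(stderr,
                "load of value %u is not a valid value for type '_Bool'\n",
                (unsigned int)raw);
        return -1;
    }
    *out = (raw != 0);
    return 0;
}
```

Calling checked_bool_load() on a byte holding 32 or 4 (the values in the two reports above) takes the error path, while 0 and 1 load normally. The warning therefore points at garbage in memory that is being read as a bool, not at a problem in UBSAN itself.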
The bug is due to the use of an uninitialized variable, which UBSAN detected. See the patch https://patchwork.freedesktop.org/patch/468484 for the fix.
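For readers unfamiliar with this class of bug, the sketch below shows the general pattern in userspace C. The struct and function names are hypothetical stand-ins, not the amdgpu code and not the content of the linked patch. Stale stack bytes are simulated with memset(), and the flag is inspected as a raw byte so that the example itself stays free of undefined behaviour.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a driver capability struct (not a real amdgpu type). */
struct caps {
    bool is_supported;
    unsigned int rate;
};

/* Simulate an uninitialized stack slot: every byte holds 0x20 (32). */
static void fill_with_stale_bytes(struct caps *c)
{
    memset(c, 0x20, sizeof(*c));
}

/* Inspect the first byte of the flag without performing a bool load. */
static unsigned char raw_flag_byte(const struct caps *c)
{
    unsigned char raw;

    memcpy(&raw, &c->is_supported, 1);
    return raw;
}

/* The general fix pattern: give every field a defined value before any read. */
static void init_caps(struct caps *c)
{
    *c = (struct caps){0};
}
```

After fill_with_stale_bytes() the flag's byte is 32, which is exactly the kind of value reported in the first trace. After init_caps() the byte is 0 and reading is_supported is well-defined. Whether a real fix zeroes the whole struct or sets the single field explicitly is a design choice; either way the field must be written before it is read.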