Bug 215445

Summary: AMDGPU -- UBSAN: invalid-load in amdgpu_dm.c:5882:84 - load of value 32 is not a valid value for type '_Bool'
Product: Drivers
Reporter: Bogdan (bogdan.pylypenko107)
Component: Video (DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: peci1, satadru, talktome7468
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.15.12
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Full message from dmesg

Description Bogdan 2022-01-03 03:34:51 UTC
Created attachment 300208
Full message from dmesg

I see the following error in dmesg:

[    7.729183] UBSAN: invalid-load in drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5882:84
[    7.729185] load of value 32 is not a valid value for type '_Bool'
Comment 2 Bogdan 2022-01-03 03:41:17 UTC
> [    7.729181] ================================================================================
> [    7.729183] UBSAN: invalid-load in drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5882:84
> [    7.729185] load of value 32 is not a valid value for type '_Bool'
> [    7.729186] CPU: 5 PID: 4803 Comm: systemd-udevd Tainted: G           O      5.15.12-gentoo-x86_64 #11
> [    7.729188] Hardware name: HP HP Pavilion Gaming Laptop 15-ec1xxx/87B1, BIOS F.20 11/04/2020
> [    7.729190] Call Trace:
> [    7.729192]  <TASK>
> [    7.729193]  dump_stack_lvl+0x45/0x59
> [    7.729200]  ubsan_epilogue+0x5/0x40
> [    7.729202]  __ubsan_handle_load_invalid_value.cold+0x43/0x48
> [    7.729204]  create_stream_for_sink.cold+0x3a/0x7d [amdgpu]
> [    7.729387]  create_validate_stream_for_sink+0x55/0x140 [amdgpu]
> [    7.729537]  amdgpu_dm_connector_mode_valid+0x4f/0x180 [amdgpu]
> [    7.729678]  drm_connector_mode_valid+0x35/0x80
> [    7.729682]  drm_helper_probe_single_connector_modes+0x3b2/0x880
> [    7.729684]  drm_client_modeset_probe+0x287/0x1380
> [    7.729686]  ? kmem_cache_alloc_trace+0x17d/0x380
> [    7.729689]  ? trace_hardirqs_on+0x2b/0x100
> [    7.729692]  ? ktime_get_mono_fast_ns+0x49/0xc0
> [    7.729694]  __drm_fb_helper_initial_config_and_unlock+0x44/0x500
> [    7.729696]  ? drm_file_alloc+0x199/0x280
> [    7.729698]  ? drm_client_init+0x12e/0x180
> [    7.729700]  amdgpu_fbdev_init+0xd6/0x140 [amdgpu]
> [    7.729749]  amdgpu_device_init.cold+0xfc0/0x1b4c [amdgpu]
> [    7.729749]  ? _raw_spin_unlock_irqrestore+0x15/0x40
> [    7.729749]  ? pci_conf1_read+0x99/0x100
> [    7.729749]  ? pci_bus_read_config_word+0x49/0x80
> [    7.729749]  amdgpu_driver_load_kms+0x67/0x340 [amdgpu]
> [    7.729749]  amdgpu_pci_probe+0x113/0x1c0 [amdgpu]
> [    7.729749]  pci_device_probe+0xe1/0x180
> [    7.729749]  really_probe+0x207/0x400
> [    7.729749]  __driver_probe_device+0x10d/0x1c0
> [    7.729749]  driver_probe_device+0x1e/0xc0
> [    7.729749]  __driver_attach+0xce/0x200
> [    7.729749]  ? __device_attach_driver+0x100/0x100
> [    7.729749]  bus_for_each_dev+0x78/0xc0
> [    7.729749]  bus_add_driver+0x12b/0x200
> [    7.729749]  driver_register+0x8f/0x100
> [    7.729749]  ? 0xffffffffc0d84000
> [    7.729749]  do_one_initcall+0x44/0x240
> [    7.729749]  ? kmem_cache_alloc_trace+0x17d/0x380
> [    7.729749]  do_init_module+0x87/0x280
> [    7.729749]  __do_sys_init_module+0x12d/0x1c0
> [    7.729749]  do_syscall_64+0x5c/0xc0
> [    7.729749]  ? trace_hardirqs_on_prepare+0x24/0xc0
> [    7.729749]  ? syscall_exit_to_user_mode+0x2c/0x80
> [    7.729749]  ? do_syscall_64+0x69/0xc0
> [    7.729749]  ? trace_hardirqs_on_prepare+0x24/0xc0
> [    7.729749]  ? syscall_exit_to_user_mode+0x2c/0x80
> [    7.729749]  ? do_syscall_64+0x69/0xc0
> [    7.729749]  ? do_user_addr_fault+0x1e6/0x6c0
> [    7.729749]  ? trace_hardirqs_off+0x26/0xc0
> [    7.729749]  ? exc_page_fault+0x89/0x140
> [    7.729749]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [    7.729749] RIP: 0033:0x701aa4f7be0a
> [    7.729749] Code: 48 8b 0d 61 80 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2e 80 0b 00 f7 d8 64 89 01 48
> [    7.729749] RSP: 002b:00007ffe4ebcb818 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> [    7.729749] RAX: ffffffffffffffda RBX: 000060e5c69b38b0 RCX: 0000701aa4f7be0a
> [    7.729749] RDX: 000060e5c69bea80 RSI: 000000000157d12f RDI: 0000701aa2499010
> [    7.729749] RBP: 00007ffe4ebcb860 R08: 0000701aa3c7f000 R09: 0000000000000000
> [    7.729749] R10: 000060e5c6b2d5f0 R11: 0000000000000246 R12: 000060e5c69bea80
> [    7.729749] R13: 0000701aa2499010 R14: 000000000000003c R15: 000060e5c6b152b0
> [    7.729749]  </TASK>
> [    7.730306] ================================================================================
Comment 3 Martin Pecka 2022-01-05 12:45:56 UTC
I have this problem too on Thinkpad T14s AMD, kernel 5.15.12.
Comment 4 Satadru Pramanik 2022-01-06 22:05:56 UTC
Also seeing this on an Ubuntu mainline 5.15.13 kernel on Ubuntu 22.04 with Xorg.


[   63.026997] ================================================================================
[   63.027004] fbcon: Taking over console
[   63.027008] UBSAN: invalid-load in /home/kernel/COD/linux/drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5882:84
[   63.027013] load of value 4 is not a valid value for type '_Bool'
[   63.027017] CPU: 0 PID: 508 Comm: plymouthd Tainted: P           OE     5.15.13-051513-generic #202201050731
[   63.027020] Hardware name: MSI MS-7998/C236A WORKSTATION (MS-7998), BIOS 2.A0 06/15/2018
[   63.027021] Call Trace:
[   63.027023]  <TASK>
[   63.027025]  show_stack+0x52/0x58
[   63.027030]  dump_stack_lvl+0x4a/0x5f
[   63.027036]  dump_stack+0x10/0x12
[   63.027039]  ubsan_epilogue+0x9/0x45
[   63.027042]  __ubsan_handle_load_invalid_value.cold+0x44/0x49
[   63.027046]  create_stream_for_sink.cold+0x5d/0xbb [amdgpu]
[   63.027337]  create_validate_stream_for_sink+0x59/0x150 [amdgpu]
[   63.027572]  dm_update_crtc_state+0x235/0x7b0 [amdgpu]
[   63.027808]  amdgpu_dm_atomic_check+0x596/0xcd0 [amdgpu]
[   63.028080]  ? __cond_resched+0x1a/0x50
[   63.028085]  ? ww_mutex_lock+0x83/0x90
[   63.028088]  ? dm_plane_format_mod_supported+0x1f/0x100 [amdgpu]
[   63.028324]  ? drm_plane_check_pixel_format+0x45/0x90 [drm]
[   63.028367]  ? drm_atomic_plane_check+0x12f/0x360 [drm]
[   63.028396]  drm_atomic_check_only+0x250/0x4b0 [drm]
[   63.028423]  drm_atomic_commit+0x18/0x50 [drm]
[   63.028450]  drm_client_modeset_commit_atomic+0x1df/0x220 [drm]
[   63.028476]  drm_client_modeset_commit_locked+0x5b/0x160 [drm]
[   63.028501]  ? mutex_lock+0x13/0x40
[   63.028504]  drm_client_modeset_commit+0x27/0x50 [drm]
[   63.028531]  __drm_fb_helper_restore_fbdev_mode_unlocked+0xc2/0xf0 [drm_kms_helper]
[   63.028556]  drm_fb_helper_lastclose+0x17/0x20 [drm_kms_helper]
[   63.028571]  amdgpu_driver_lastclose_kms+0xe/0x20 [amdgpu]
[   63.028749]  drm_release+0xe0/0x110 [drm]
[   63.028773]  __fput+0x9c/0x260
[   63.028777]  ____fput+0xe/0x10
[   63.028779]  task_work_run+0x6d/0xa0
[   63.028783]  do_exit+0x21b/0x3c0
[   63.028786]  do_group_exit+0x3b/0xb0
[   63.028789]  __x64_sys_exit_group+0x18/0x20
[   63.028791]  do_syscall_64+0x59/0xc0
[   63.028794]  ? asm_exc_page_fault+0x8/0x30
[   63.028798]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   63.028802] RIP: 0033:0x7f3126a99ed1
[   63.028805] Code: Unable to access opcode bytes at RIP 0x7f3126a99ea7.
[   63.028806] RSP: 002b:00007ffdc2f08a48 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   63.028809] RAX: ffffffffffffffda RBX: 00007f3126bc66d0 RCX: 00007f3126a99ed1
[   63.028811] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[   63.028813] RBP: 0000000000000000 R08: ffffffffffffff80 R09: 0000000000000001
[   63.028815] R10: 000000000000001f R11: 0000000000000246 R12: 00007f3126bc66d0
[   63.028816] R13: 0000000000000000 R14: 00007f3126bc6ba8 R15: 00007f3126bc6bc0
[   63.028820]  </TASK>
[   63.028822] ================================================================================
Comment 5 Martin Pecka 2022-01-11 10:25:08 UTC
This problem has disappeared for me in kernel 5.16.0. Can anybody else confirm?
Comment 6 hock 2022-01-11 11:42:46 UTC
UBSAN has been enabled in the 5.15 kernels since 5.15.8; it is not enabled in the 5.16 kernels. Check the build logs to confirm that ubsan.o is linked into the 5.15 kernel but not into the 5.16 one.


Also check out the discussion in https://gitlab.freedesktop.org/drm/amd/-/issues/1779.
Comment 7 Bogdan 2022-01-12 20:23:32 UTC
UBSAN = Undefined Behaviour Sanity Checker:
> Compile-time instrumentation is used to detect various undefined behaviours
> at runtime.
> For more details, see: Documentation/dev-tools/ubsan.rst

UBSAN is just a tool that helps detect various bugs; it is not the source of the bug.

I resolved the problem by rebuilding the kernel:
- rebuilt 5.15.12 with a different kernel config: no UBSAN warnings (I made many changes, so I cannot tell which config option resolved the problem);
- updated to 5.15.13: no UBSAN warnings, but a single crash in the amdgpu driver when closing the X subsystem (SysRq helped to restart the display manager);
- updated to 5.16.0: no UBSAN warnings, but high latencies (12-15 ms) in [drm_atomic_helper_wait_for_flip_done] under high load (detected with the latencytop program).
Comment 8 hock 2022-01-13 02:50:28 UTC
The bug is due to the use of an uninitialized variable, which UBSAN detected.

See the patch https://patchwork.freedesktop.org/patch/468484 for the fix.
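
For readers unfamiliar with this class of warning, here is a minimal user-space sketch of how an uninitialized bool can end up holding a value such as 32. It is not the actual amdgpu code or the linked patch: struct stream_params and all three helper functions are hypothetical names made up for illustration, and the sketch assumes sizeof(bool) == 1, as on common GCC/Clang targets. The stale stack contents are simulated with memset, and the stored byte is inspected through memcpy so that the example itself never performs an invalid _Bool load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption of this sketch: bool occupies exactly one byte. */
_Static_assert(sizeof(bool) == 1, "sketch assumes a one-byte bool");

/* Hypothetical stand-in for a driver-side struct; NOT the real amdgpu type. */
struct stream_params {
    int  width;
    bool flag;   /* only the byte values 0 and 1 are valid for _Bool */
};

/* Simulate a stack variable that was never initialized: every byte is
 * leftover garbage, here 0x20 (decimal 32, as in the first report). */
static void simulate_stale_stack(struct stream_params *p)
{
    assert(p != NULL);
    memset(p, 0x20, sizeof *p);
}

/* Generic form of the fix for this bug class: give every member a
 * defined value before anything reads the struct. */
static void init_zeroed(struct stream_params *p)
{
    assert(p != NULL);
    *p = (struct stream_params){0};
}

/* Inspect the stored byte without loading it as _Bool, which is the
 * load UBSAN would flag. Returns 1 if the byte is 0 or 1, else 0. */
static int flag_storage_is_valid(const struct stream_params *p)
{
    unsigned char raw;

    assert(p != NULL);
    memcpy(&raw, &p->flag, sizeof raw);
    return raw <= 1;
}
```

With the garbage fill, flag_storage_is_valid() returns 0, which corresponds to the situation UBSAN reports as "load of value 32 is not a valid value for type '_Bool'"; after init_zeroed() the flag can be read safely. Different leftover stack contents would explain why another reporter saw the value 4 instead of 32.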