Created attachment 284909: kern.log
This looks like a bug caused by amdgpu_drm. The system hard-locks and stops responding to any input. The attached bug information was found in kern.log after a forced reset.
Running something such as "make -j 20" on the kernel sources while using a browser with GPU acceleration enabled seems to contribute to this. The lockup also happens on 5.2.14, but for some reason the logs don't show anything useful there.
The system freeze also happens on 5.3.6 (Debian testing, bullseye).
SysRq key combinations do not work during the freeze: Alt + SysRq + B does not reboot the machine.
With 5.3.7 the system no longer locks up, although KASAN reports an error in amdgpu:
[ 31.792441] ==================================================================
[ 31.792718] BUG: KASAN: global-out-of-bounds in read_indirect_azalia_reg+0x69/0x100 [amdgpu]
[ 31.792786] Read of size 4 at addr ffffffffc15b69e8 by task systemd-udevd/425
[ 31.792865] CPU: 15 PID: 425 Comm: systemd-udevd Tainted: G E 5.3.7 #1
[ 31.792870] Hardware name: System manufacturer System Product Name/TUF B450-PLUS GAMING, BIOS 1804 07/29/2019
[ 31.792874] Call Trace:
[ 31.792884] dump_stack+0x9a/0xf0
[ 31.792896] print_address_description+0x67/0x323
[ 31.793069] ? read_indirect_azalia_reg+0x69/0x100 [amdgpu]
[ 31.793240] ? read_indirect_azalia_reg+0x69/0x100 [amdgpu]
[ 31.793249] __kasan_report.cold+0x1a/0x3d
[ 31.793260] ? memcpy+0x40/0x50
[ 31.793431] ? read_indirect_azalia_reg+0x69/0x100 [amdgpu]
[ 31.793444] kasan_report+0xe/0x12
[ 31.793616] read_indirect_azalia_reg+0x69/0x100 [amdgpu]
[ 31.793797] dce_aud_endpoint_valid+0xf/0x20 [amdgpu]
[ 31.793973] resource_construct+0x24d/0x550 [amdgpu]
[ 31.794157] ? dc_destroy_resource_pool+0x70/0x70 [amdgpu]
[ 31.794168] ? kasan_unpoison_shadow+0x33/0x40
[ 31.794357] dce120_create_resource_pool+0x911/0xb10 [amdgpu]
[ 31.794544] ? dce120_i2c_hw_create+0x80/0x80 [amdgpu]
[ 31.794554] ? mark_held_locks+0x3e/0xa0
[ 31.794562] ? _raw_write_unlock_irqrestore+0x4b/0x60
[ 31.794570] ? match_held_lock+0x2e/0x240
[ 31.794749] dc_create_resource_pool+0x1c3/0x300 [amdgpu]
[ 31.794921] ? resource_parse_asic_id+0x1e0/0x1e0 [amdgpu]
[ 31.794930] ? kasan_unpoison_shadow+0x33/0x40
[ 31.794938] ? __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 31.795120] ? dal_gpio_service_create+0x10c/0x130 [amdgpu]
[ 31.795299] dc_create+0x46b/0xbd0 [amdgpu]
[ 31.795308] ? mark_lock+0xb1/0x9f0
[ 31.795486] ? destruct+0x280/0x280 [amdgpu]
[ 31.795496] ? mark_held_locks+0x3e/0xa0
[ 31.795506] ? match_held_lock+0x2e/0x240
[ 31.795512] ? lockdep_hardirqs_on+0x19a/0x290
[ 31.795533] ? kasan_unpoison_shadow+0x33/0x40
[ 31.795542] ? __kasan_kmalloc.constprop.0+0xc2/0xd0
[ 31.795725] amdgpu_dm_init+0x25e/0x320 [amdgpu]
[ 31.795740] ? rcu_read_lock_sched_held+0x9d/0xb0
[ 31.795892] ? amdgpu_mm_rreg+0x204/0x230 [amdgpu]
[ 31.796071] ? dm_resume+0x5d0/0x5d0 [amdgpu]
[ 31.796246] ? vega10_enable_fan_control_feature+0x75/0x90 [amdgpu]
[ 31.796418] ? vega10_fan_ctrl_start_smc_fan_control+0x26/0x40 [amdgpu]
[ 31.796587] ? vega10_start_thermal_controller+0x30c/0x320 [amdgpu]
[ 31.796622] ? memcpy+0x35/0x50
[ 31.796793] ? psm_set_states+0x90/0xb0 [amdgpu]
[ 31.796980] dm_hw_init+0xe/0x20 [amdgpu]
[ 31.797150] amdgpu_device_init.cold+0x23ea/0x2657 [amdgpu]
[ 31.797318] ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
[ 31.797327] ? mark_held_locks+0x3e/0xa0
[ 31.797334] ? _raw_write_unlock_irqrestore+0x4b/0x60
[ 31.797343] ? lockdep_hardirqs_on+0x19a/0x290
[ 31.797352] ? match_held_lock+0x2e/0x240
[ 31.797523] ? amdgpu_driver_load_kms+0xc2/0x3a0 [amdgpu]
[ 31.797529] ? rcu_read_lock_sched_held+0x9d/0xb0
[ 31.797690] amdgpu_driver_load_kms+0x11b/0x3a0 [amdgpu]
[ 31.797848] ? amdgpu_register_gpu_instance+0xd0/0xd0 [amdgpu]
[ 31.797859] ? __kasan_slab_free+0x141/0x170
[ 31.797905] drm_dev_register+0x1d8/0x220 [drm]
[ 31.798077] amdgpu_pci_probe+0xf7/0x160 [amdgpu]
[ 31.798235] ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
[ 31.798244] local_pci_probe+0x74/0xc0
[ 31.798259] pci_device_probe+0x1ee/0x2f0
[ 31.798267] ? pci_device_remove+0x1a0/0x1a0
[ 31.798285] ? sysfs_do_create_link_sd.isra.0+0x74/0xc0
[ 31.798306] really_probe+0x184/0x500
[ 31.798323] driver_probe_device+0x7e/0x130
[ 31.798336] device_driver_attach+0x87/0x90
[ 31.798346] ? device_driver_attach+0x90/0x90
[ 31.798351] __driver_attach+0xb0/0x1a0
[ 31.798363] ? device_driver_attach+0x90/0x90
[ 31.798368] bus_for_each_dev+0xe9/0x140
[ 31.798377] ? subsys_dev_iter_exit+0x10/0x10
[ 31.798387] ? __list_add_valid+0x2f/0x60
[ 31.798408] bus_add_driver+0x226/0x2e0
[ 31.798424] driver_register+0xd8/0x150
[ 31.798432] ? 0xffffffffc0e48000
[ 31.798443] do_one_initcall+0xbd/0x3e4
[ 31.798451] ? perf_trace_initcall_level+0x250/0x250
[ 31.798461] ? check_flags.part.0+0x82/0x210
[ 31.798478] ? kasan_unpoison_shadow+0x33/0x40
[ 31.798485] ? kasan_unpoison_shadow+0x33/0x40
[ 31.798503] do_init_module+0xfd/0x390
[ 31.798519] load_module+0x3f40/0x4290
[ 31.798584] ? module_frob_arch_sections+0x20/0x20
[ 31.798598] ? kernel_read+0x9b/0xc0
[ 31.798613] ? kernel_read_file+0x187/0x330
[ 31.798627] ? free_bprm+0xe0/0xe0
[ 31.798635] ? __seccomp_filter+0x127/0x990
[ 31.798672] ? __do_sys_finit_module+0x121/0x1b0
[ 31.798677] __do_sys_finit_module+0x121/0x1b0
[ 31.798686] ? __ia32_sys_init_module+0x40/0x40
[ 31.798699] ? vma_is_stack_for_current+0x60/0x60
[ 31.798740] ? trace_hardirqs_on_thunk+0x1a/0x20
[ 31.798751] ? mark_held_locks+0x23/0xa0
[ 31.798767] do_syscall_64+0x78/0x260
[ 31.798777] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 31.798783] RIP: 0033:0x7fee910ef0c9
[ 31.798790] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 3d 0c 00 f7 d8 64 89 01 48
[ 31.798795] RSP: 002b:00007ffc15cfb518 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 31.798802] RAX: ffffffffffffffda RBX: 000055da850b2b90 RCX: 00007fee910ef0c9
[ 31.798807] RDX: 0000000000000000 RSI: 00007fee90ff2cad RDI: 0000000000000013
[ 31.798812] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055da8509f568
[ 31.798816] R10: 0000000000000013 R11: 0000000000000246 R12: 00007fee90ff2cad
[ 31.798821] R13: 0000000000000000 R14: 000055da850a23e0 R15: 000055da850b2b90
[ 31.798873] The buggy address belongs to the variable:
[ 31.799074] audio_regs+0x108/0xfffffffffff00720 [amdgpu]
[ 31.799138] Memory state around the buggy address:
[ 31.799181] ffffffffc15b6880: fa fa fa fa 00 00 04 fa fa fa fa fa 00 00 00 00
[ 31.799242] ffffffffc15b6900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.799302] >ffffffffc15b6980: 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa
[ 31.799361] ^
[ 31.799416] ffffffffc15b6a00: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.799476] ffffffffc15b6a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
[ 31.799536] ==================================================================
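Some context on this class of report: a KASAN global-out-of-bounds read means that code read memory just past the end of a statically allocated object. Here that object is the `audio_regs` variable, and the `fa` shadow bytes mark the redzone KASAN places after globals. The C sketch below is a generic, hypothetical illustration of how an instance index larger than a per-endpoint register table causes such a read, and how a bounds check avoids it. The names `struct demo_audio_reg_set`, `demo_audio_regs` and `demo_lookup_endpoint_reg`, and the register values, are invented for illustration. This is not the amdgpu code, and it is not necessarily what the commit referenced in this report changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-endpoint register table like audio_regs.
 * Names and offsets are illustrative only, not real amdgpu registers. */
struct demo_audio_reg_set {
    uint32_t endpoint_index; /* offset of the indirect index register */
    uint32_t endpoint_data;  /* offset of the indirect data register */
};

static const struct demo_audio_reg_set demo_audio_regs[] = {
    { 0x10, 0x11 },
    { 0x20, 0x21 },
    { 0x30, 0x31 },
};

#define DEMO_NUM_AUDIO_REGS \
    (sizeof(demo_audio_regs) / sizeof(demo_audio_regs[0]))

/* Looks up the index-register offset for endpoint instance `inst`.
 * Without the bounds check, an `inst` >= DEMO_NUM_AUDIO_REGS would read
 * past the end of the global array, which is the pattern KASAN reports as
 * global-out-of-bounds. Returns 0 on success, -1 for an invalid instance. */
static int demo_lookup_endpoint_reg(size_t inst, uint32_t *out)
{
    if (inst >= DEMO_NUM_AUDIO_REGS)
        return -1;
    *out = demo_audio_regs[inst].endpoint_index;
    return 0;
}
```

With this three-entry table, `demo_lookup_endpoint_reg(3, &v)` returns -1 instead of reading into the redzone after the array.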
Splitting the KASAN-reported bug into a separate report.
Again, this seems to be fixed by the following commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c0fdf3302cb4f186c871684eac5c407a107e480