Nov 1 00:22:49 bart kernel: [ 2.139660] Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov 1 00:22:49 bart kernel: [ 2.139665] CPU: 3 PID: 113 Comm: systemd-udevd Not tainted 5.15.0 #1
Nov 1 00:22:49 bart kernel: [ 2.139671] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.21 11/17/2017
Nov 1 00:22:49 bart kernel: [ 2.139675] RIP: 0010:smu8_dpm_powergate_acp+0x7/0x30 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.140165] Code: fd ff 8b 54 24 0c 48 8b 3c 24 31 c9 be 13 00 00 00 e8 7d fe fd ff 31 c0 48 83 c4 10 c3 66 0f 1f 44 00 00 48 8b 87 c0 01 00 00 <40> 38 b0 db 01 00 00 74 1b 31 d2 40 84 f6 74 0a be 0b 00 00 00 e9
Nov 1 00:22:49 bart kernel: [ 2.140173] RSP: 0018:ffffafc6804d7bf0 EFLAGS: 00010286
Nov 1 00:22:49 bart kernel: [ 2.140177] RAX: 0000000000000000 RBX: ffff97d209b00000 RCX: 000000000000000a
Nov 1 00:22:49 bart kernel: [ 2.140181] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff97d384c89000
Nov 1 00:22:49 bart kernel: [ 2.140184] RBP: ffff97d384c89000 R08: 0000000000000282 R09: ffffeb368a132200
Nov 1 00:22:49 bart kernel: [ 2.140188] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000009
Nov 1 00:22:49 bart kernel: [ 2.140191] R13: ffff97d209b00010 R14: ffff97d209b00000 R15: 0000000000000000
Nov 1 00:22:49 bart kernel: [ 2.140195] FS: 00007f285ce788c0(0000) GS:ffff97d4ef580000(0000) knlGS:0000000000000000
Nov 1 00:22:49 bart kernel: [ 2.140200] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 1 00:22:49 bart kernel: [ 2.140203] CR2: 00000000000001db CR3: 0000000107d3e000 CR4: 00000000001506e0
Nov 1 00:22:49 bart kernel: [ 2.140207] Call Trace:
Nov 1 00:22:49 bart kernel: [ 2.140214] pp_set_powergating_by_smu+0x1bb/0x280 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.140560] acp_hw_fini+0x13c/0x140 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.140893] amdgpu_device_fini_hw+0x208/0x2d5 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.141294] amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.141641] amdgpu_pci_probe+0x127/0x1b0 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.141941] pci_device_probe+0xf5/0x160
Nov 1 00:22:49 bart kernel: [ 2.141948] really_probe+0x1f0/0x400
Nov 1 00:22:49 bart kernel: [ 2.141954] __driver_probe_device+0xf9/0x170
Nov 1 00:22:49 bart kernel: [ 2.141958] driver_probe_device+0x27/0xa0
Nov 1 00:22:49 bart kernel: [ 2.141961] __driver_attach+0xbd/0x1d0
Nov 1 00:22:49 bart kernel: [ 2.141964] ? __device_attach_driver+0xe0/0xe0
Nov 1 00:22:49 bart kernel: [ 2.141968] ? __device_attach_driver+0xe0/0xe0
Nov 1 00:22:49 bart kernel: [ 2.141971] bus_for_each_dev+0x75/0xc0
Nov 1 00:22:49 bart kernel: [ 2.141974] ? klist_add_tail+0x4f/0x90
Nov 1 00:22:49 bart kernel: [ 2.141979] bus_add_driver+0x143/0x200
Nov 1 00:22:49 bart kernel: [ 2.141982] driver_register+0x86/0xd0
Nov 1 00:22:49 bart kernel: [ 2.141985] ? 0xffffffffc0ae8000
Nov 1 00:22:49 bart kernel: [ 2.141988] do_one_initcall+0x47/0x170
Nov 1 00:22:49 bart kernel: [ 2.142030] ? kmem_cache_alloc+0x280/0x3a0
Nov 1 00:22:49 bart kernel: [ 2.142037] do_init_module+0x51/0x220
Nov 1 00:22:49 bart kernel: [ 2.142043] __do_sys_finit_module+0xca/0x140
Nov 1 00:22:49 bart kernel: [ 2.142049] do_syscall_64+0x3b/0xc0
Nov 1 00:22:49 bart kernel: [ 2.142067] entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 1 00:22:49 bart kernel: [ 2.142073] RIP: 0033:0x7f285ccbb5e9
Nov 1 00:22:49 bart kernel: [ 2.142077] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 58 0c 00 f7 d8 64 89 01 48
Nov 1 00:22:49 bart kernel: [ 2.142084] RSP: 002b:00007ffd135b22b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Nov 1 00:22:49 bart kernel: [ 2.142089] RAX: ffffffffffffffda RBX: 000055ee3452a270 RCX: 00007f285ccbb5e9
Nov 1 00:22:49 bart kernel: [ 2.142092] RDX: 0000000000000000 RSI: 00007f285ce6fe2d RDI: 000000000000000f
Nov 1 00:22:49 bart kernel: [ 2.142105] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ee34525f10
Nov 1 00:22:49 bart kernel: [ 2.142109] R10: 000000000000000f R11: 0000000000000246 R12: 00007f285ce6fe2d
Nov 1 00:22:49 bart kernel: [ 2.142112] R13: 0000000000000000 R14: 000055ee34528600 R15: 000055ee3452a270
Nov 1 00:22:49 bart kernel: [ 2.142116] Modules linked in: ahci crct10dif_pclmul crct10dif_common libahci crc32_pclmul crc32c_intel xhci_pci ehci_pci psmouse amdgpu(+) libata scsi_mod scsi_common ehci_hcd xhci_hcd usbcore i2c_piix4 usb_common r8169 realtek mdio_devres drm_ttm_helper libphy ttm mfd_core gpu_sched
Nov 1 00:22:49 bart kernel: [ 2.142146] CR2: 00000000000001db
Nov 1 00:22:49 bart kernel: [ 2.142160] ---[ end trace 175b07c8f6d66881 ]---

Booting continues and later another Oops appears:

Nov 1 00:22:49 bart kernel: [ 2.947122] Oops: 0002 [#2] PREEMPT SMP NOPTI
Nov 1 00:22:49 bart kernel: [ 2.947127] CPU: 3 PID: 39 Comm: kworker/3:1 Tainted: G D 5.15.0 #1
Nov 1 00:22:49 bart kernel: [ 2.947134] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.21 11/17/2017
Nov 1 00:22:49 bart kernel: [ 2.947137] Workqueue: events amdgpu_uvd_idle_work_handler [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.947597] RIP: 0010:smu8_dpm_powergate_uvd+0xe/0xc0 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.947942] Code: 7e cb e9 ad fd ff ff b8 f4 ff ff ff eb d9 e8 a9 16 64 db 66 0f 1f 84 00 00 00 00 00 55 48 8b 87 c0 01 00 00 40 84 f6 48 89 fd <40> 88 b0 d8 01 00 00 74 37 48 8b 3f 31 d2 be 08 00 00 00 e8 ba a5
Nov 1 00:22:49 bart kernel: [ 2.947948] RSP: 0018:ffffafc6801afe38 EFLAGS: 00010202
Nov 1 00:22:49 bart kernel: [ 2.947952] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000100000000
Nov 1 00:22:49 bart kernel: [ 2.947955] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff97d384c89000
Nov 1 00:22:49 bart kernel: [ 2.947958] RBP: ffff97d384c89000 R08: ffff97d4ef59dca0 R09: ffff97d4ef535eb4
Nov 1 00:22:49 bart kernel: [ 2.947960] R10: 0000000000000007 R11: 0000000000000005 R12: ffff97d384c89018
Nov 1 00:22:49 bart kernel: [ 2.947963] R13: 0000000000000001 R14: ffff97d209b082f8 R15: ffff97d4ef5a1d05
Nov 1 00:22:49 bart kernel: [ 2.947966] FS: 0000000000000000(0000) GS:ffff97d4ef580000(0000) knlGS:0000000000000000
Nov 1 00:22:49 bart kernel: [ 2.947970] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 1 00:22:49 bart kernel: [ 2.947972] CR2: 00000000000001d8 CR3: 0000000107d44000 CR4: 00000000001506e0
Nov 1 00:22:49 bart kernel: [ 2.947975] Call Trace:
Nov 1 00:22:49 bart kernel: [ 2.947981] pp_set_powergating_by_smu+0xdd/0x280 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.948380] amdgpu_dpm_enable_uvd+0x79/0x1a0 [amdgpu]
Nov 1 00:22:49 bart kernel: [ 2.948715] process_one_work+0x1c0/0x330
Nov 1 00:22:49 bart kernel: [ 2.948723] worker_thread+0x4b/0x3c0
Nov 1 00:22:49 bart kernel: [ 2.948727] ? rescuer_thread+0x360/0x360
Nov 1 00:22:49 bart kernel: [ 2.948730] kthread+0x12d/0x160
Nov 1 00:22:49 bart kernel: [ 2.948734] ? set_kthread_struct+0x30/0x30
Nov 1 00:22:49 bart kernel: [ 2.948738] ret_from_fork+0x22/0x30
Nov 1 00:22:49 bart kernel: [ 2.948743] Modules linked in: efivarfs autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sd_mod t10_pi sr_mod crc_t10dif cdrom crct10dif_generic ahci crct10dif_pclmul crct10dif_common libahci crc32_pclmul crc32c_intel xhci_pci ehci_pci psmouse amdgpu(+) libata scsi_mod scsi_common ehci_hcd xhci_hcd usbcore i2c_piix4 usb_common r8169 realtek mdio_devres drm_ttm_helper libphy ttm mfd_core gpu_sched
Nov 1 00:22:49 bart kernel: [ 2.948794] CR2: 00000000000001d8
Nov 1 00:22:49 bart kernel: [ 2.948798] ---[ end trace 175b07c8f6d66882 ]---
Nov 1 00:22:49 bart kernel: [ 2.948801] RIP: 0010:smu8_dpm_powergate_acp+0x7/0x30 [amdgpu]

This error also occurs using 5.15-rc7.
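
For anyone comparing the two traces: the number after "Oops:" is the x86 page-fault error code. Below is a minimal C sketch of decoding it, assuming the standard x86 bit layout (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <stdbool.h>

/* Illustrative names; only the bit positions come from the x86
 * page-fault error code format. */
struct pf_error {
	bool protection;   /* bit 0: 0 = page not present, 1 = protection violation */
	bool write;        /* bit 1: 0 = read access, 1 = write access */
	bool user;         /* bit 2: 0 = kernel mode, 1 = user mode */
	bool reserved_bit; /* bit 3: reserved bit set in a page-table entry */
	bool instr_fetch;  /* bit 4: fault happened on an instruction fetch */
};

/* Split the hexadecimal value printed after "Oops:" into its flags. */
struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e;

	e.protection   = (code & 0x01) != 0;
	e.write        = (code & 0x02) != 0;
	e.user         = (code & 0x04) != 0;
	e.reserved_bit = (code & 0x08) != 0;
	e.instr_fetch  = (code & 0x10) != 0;
	return e;
}
```

Under that layout, 0000 in the first Oops is a kernel-mode read of a non-present page and 0002 in the second is a kernel-mode write, which fits NULL-based accesses at 0x1db and 0x1d8.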
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] [1002:9874] (rev ca) (prog-if 00 [VGA controller])
	DeviceName: ATI EG BROADWAY
	Subsystem: Hewlett-Packard Company Wani [Radeon R5/R6/R7 Graphics] [103c:8332]
	Flags: bus master, fast devsel, latency 0, IRQ 37
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0800000 (64-bit, prefetchable) [size=8M]
	I/O ports at 4000 [size=256]
	Memory at f0400000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
There is another error message just before the Oops:

Nov 1 00:22:49 bart kernel: [ 2.137397] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
Nov 1 00:22:49 bart kernel: [ 2.137402] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
Nov 1 00:22:49 bart kernel: [ 2.137406] amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
Nov 1 00:22:49 bart kernel: [ 2.139639] BUG: kernel NULL pointer dereference, address: 00000000000001db
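
The fault address can be read off the Code: bytes. In the first Oops, 48 8b 87 c0 01 00 00 decodes to mov 0x1c0(%rdi),%rax and the marked instruction <40> 38 b0 db 01 00 00 to cmp %sil,0x1db(%rax). RAX is 0 in the register dump, so the access lands at 0x1db, which matches CR2. The second Oops does the same with a store to 0x1d8(%rax). Here is a small C sketch of that arithmetic, assuming a hypothetical structure layout: the struct, its padding and the field names are placeholders, not the driver's real definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder layout: only the two offsets are taken from the Oops
 * output, everything else is made up for illustration. */
struct backend_model {
	uint8_t other_fields[0x1d8];
	uint8_t flag_at_1d8;   /* written by the second faulting instruction */
	uint8_t unused[2];
	uint8_t flag_at_1db;   /* compared by the first faulting instruction */
};

/* Address the CPU touches for base->field: base plus the field offset. */
uintptr_t access_address(const struct backend_model *base, size_t field_offset)
{
	return (uintptr_t)base + field_offset;
}
```

With a NULL base the result is just the offset, so a CR2 of 0x1db or 0x1d8 suggests that the pointer loaded from offset 0x1c0 of the first argument was never set up, plausibly because init had already failed by then.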
Actually the above message is not complete:

Nov 1 00:22:49 bart kernel: [ 2.136382] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Nov 1 00:22:49 bart kernel: [ 2.136462] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
Nov 1 00:22:49 bart kernel: [ 2.136470] kfd kfd: amdgpu: Error initializing iommuv2
Nov 1 00:22:49 bart kernel: [ 2.137386] kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
Nov 1 00:22:49 bart kernel: [ 2.137393] kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874
Nov 1 00:22:49 bart kernel: [ 2.137397] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
Nov 1 00:22:49 bart kernel: [ 2.137402] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
Nov 1 00:22:49 bart kernel: [ 2.137406] amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
Nov 1 00:22:49 bart kernel: [ 2.139639] BUG: kernel NULL pointer dereference, address: 00000000000001db

The messages from kfd were there with older kernels, too, but were not fatal. They are caused by the HP Laptop 15-bw0xx/8332 either not having an IOMMU or its BIOS not initializing it properly. But linux-5.15 added the following lines to amdgpu_device_ip_init():

	r = amdgpu_amdkfd_resume_iommu(adev);
	if (r)
		goto init_failed;

These cause amdgpu_device_ip_init() to fail when kfd init fails. As a workaround one could remove them. A BIOS update could perhaps also solve the problem, but it seems to require Windows running on the laptop (which was actually sold without Windows).
Just confirmed that removing the 3 lines

	r = amdgpu_amdkfd_resume_iommu(adev);
	if (r)
		goto init_failed;

can be used as a workaround. Removing only the if (r) check is not enough: just calling amdgpu_amdkfd_resume_iommu(adev) leads to a freeze.
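
To make the control flow explicit, here is a toy C model of why those lines turn a previously harmless kfd failure into a failed GPU init. Everything in it is a stand-in: resume_iommu_stub() and ip_init_model() are hypothetical names, and the -ENODEV return value is an assumption, since the logs only show that the call fails.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for amdgpu_amdkfd_resume_iommu() on this laptop. The
 * -ENODEV value is an assumption; the logs only show a failure. */
static int resume_iommu_stub(void)
{
	return -ENODEV;
}

/* Model of the tail of amdgpu_device_ip_init(): with_iommu_resume ==
 * true mimics 5.15, false mimics the workaround with the lines removed. */
int ip_init_model(bool with_iommu_resume)
{
	int r = 0;

	if (with_iommu_resume) {
		r = resume_iommu_stub();
		if (r)
			goto init_failed;
	}

	/* ... the remaining init steps would run here ... */

init_failed:
	return r;
}
```

The model only covers the error path. It says nothing about the freeze seen when the call is kept and only the check is dropped.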
This commit leads to a freeze at startup:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu?id=714d9e4574d54596973ee3b0624ee4a16264d700
After reverting it, kernel 5.15 boots normally.
Looks like the same problem I reported here: https://bugzilla.kernel.org/show_bug.cgi?id=214859
*** This bug has been marked as a duplicate of bug 214859 ***