Bug 214901

Summary: amdgpu freezes HP laptop at start up
Product: Drivers Reporter: spasswolf
Component: Video(DRI - non Intel) Assignee: drivers_video-dri
Status: RESOLVED DUPLICATE    
Severity: normal CC: towo
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.15.0 Subsystem:
Regression: Yes Bisected commit-id:

Description spasswolf 2021-11-01 00:02:39 UTC
Nov  1 00:22:49 bart kernel: [    2.139660] Oops: 0000 [#1] PREEMPT SMP NOPTI
Nov  1 00:22:49 bart kernel: [    2.139665] CPU: 3 PID: 113 Comm: systemd-udevd Not tainted 5.15.0 #1
Nov  1 00:22:49 bart kernel: [    2.139671] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.21 11/17/2017
Nov  1 00:22:49 bart kernel: [    2.139675] RIP: 0010:smu8_dpm_powergate_acp+0x7/0x30 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.140165] Code: fd ff 8b 54 24 0c 48 8b 3c 24 31 c9 be 13 00 00 00 e8 7d fe fd ff 31 c0 48 83 c4 10 c3 66 0f 1f 44 00 00 48 8b 87 c0 01 00 00 <40> 38 b0 db 01 00 00 74 1b 31 d2 40 84 f6 74 0a be 0b 00 00 00 e9
Nov  1 00:22:49 bart kernel: [    2.140173] RSP: 0018:ffffafc6804d7bf0 EFLAGS: 00010286
Nov  1 00:22:49 bart kernel: [    2.140177] RAX: 0000000000000000 RBX: ffff97d209b00000 RCX: 000000000000000a
Nov  1 00:22:49 bart kernel: [    2.140181] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff97d384c89000
Nov  1 00:22:49 bart kernel: [    2.140184] RBP: ffff97d384c89000 R08: 0000000000000282 R09: ffffeb368a132200
Nov  1 00:22:49 bart kernel: [    2.140188] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000009
Nov  1 00:22:49 bart kernel: [    2.140191] R13: ffff97d209b00010 R14: ffff97d209b00000 R15: 0000000000000000
Nov  1 00:22:49 bart kernel: [    2.140195] FS:  00007f285ce788c0(0000) GS:ffff97d4ef580000(0000) knlGS:0000000000000000
Nov  1 00:22:49 bart kernel: [    2.140200] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  1 00:22:49 bart kernel: [    2.140203] CR2: 00000000000001db CR3: 0000000107d3e000 CR4: 00000000001506e0
Nov  1 00:22:49 bart kernel: [    2.140207] Call Trace:
Nov  1 00:22:49 bart kernel: [    2.140214]  pp_set_powergating_by_smu+0x1bb/0x280 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.140560]  acp_hw_fini+0x13c/0x140 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.140893]  amdgpu_device_fini_hw+0x208/0x2d5 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.141294]  amdgpu_driver_load_kms.cold+0x72/0x94 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.141641]  amdgpu_pci_probe+0x127/0x1b0 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.141941]  pci_device_probe+0xf5/0x160
Nov  1 00:22:49 bart kernel: [    2.141948]  really_probe+0x1f0/0x400
Nov  1 00:22:49 bart kernel: [    2.141954]  __driver_probe_device+0xf9/0x170
Nov  1 00:22:49 bart kernel: [    2.141958]  driver_probe_device+0x27/0xa0
Nov  1 00:22:49 bart kernel: [    2.141961]  __driver_attach+0xbd/0x1d0
Nov  1 00:22:49 bart kernel: [    2.141964]  ? __device_attach_driver+0xe0/0xe0
Nov  1 00:22:49 bart kernel: [    2.141968]  ? __device_attach_driver+0xe0/0xe0
Nov  1 00:22:49 bart kernel: [    2.141971]  bus_for_each_dev+0x75/0xc0
Nov  1 00:22:49 bart kernel: [    2.141974]  ? klist_add_tail+0x4f/0x90
Nov  1 00:22:49 bart kernel: [    2.141979]  bus_add_driver+0x143/0x200
Nov  1 00:22:49 bart kernel: [    2.141982]  driver_register+0x86/0xd0
Nov  1 00:22:49 bart kernel: [    2.141985]  ? 0xffffffffc0ae8000
Nov  1 00:22:49 bart kernel: [    2.141988]  do_one_initcall+0x47/0x170
Nov  1 00:22:49 bart kernel: [    2.142030]  ? kmem_cache_alloc+0x280/0x3a0
Nov  1 00:22:49 bart kernel: [    2.142037]  do_init_module+0x51/0x220
Nov  1 00:22:49 bart kernel: [    2.142043]  __do_sys_finit_module+0xca/0x140
Nov  1 00:22:49 bart kernel: [    2.142049]  do_syscall_64+0x3b/0xc0
Nov  1 00:22:49 bart kernel: [    2.142067]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov  1 00:22:49 bart kernel: [    2.142073] RIP: 0033:0x7f285ccbb5e9
Nov  1 00:22:49 bart kernel: [    2.142077] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 58 0c 00 f7 d8 64 89 01 48
Nov  1 00:22:49 bart kernel: [    2.142084] RSP: 002b:00007ffd135b22b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Nov  1 00:22:49 bart kernel: [    2.142089] RAX: ffffffffffffffda RBX: 000055ee3452a270 RCX: 00007f285ccbb5e9
Nov  1 00:22:49 bart kernel: [    2.142092] RDX: 0000000000000000 RSI: 00007f285ce6fe2d RDI: 000000000000000f
Nov  1 00:22:49 bart kernel: [    2.142105] RBP: 0000000000020000 R08: 0000000000000000 R09: 000055ee34525f10
Nov  1 00:22:49 bart kernel: [    2.142109] R10: 000000000000000f R11: 0000000000000246 R12: 00007f285ce6fe2d
Nov  1 00:22:49 bart kernel: [    2.142112] R13: 0000000000000000 R14: 000055ee34528600 R15: 000055ee3452a270
Nov  1 00:22:49 bart kernel: [    2.142116] Modules linked in: ahci crct10dif_pclmul crct10dif_common libahci crc32_pclmul crc32c_intel xhci_pci ehci_pci psmouse amdgpu(+) libata scsi_mod scsi_common ehci_hcd xhci_hcd usbcore i2c_piix4 usb_common r8169 realtek mdio_devres drm_ttm_helper libphy ttm mfd_core gpu_sched
Nov  1 00:22:49 bart kernel: [    2.142146] CR2: 00000000000001db
Nov  1 00:22:49 bart kernel: [    2.142160] ---[ end trace 175b07c8f6d66881 ]---


Booting continues, and later another Oops appears:


Nov  1 00:22:49 bart kernel: [    2.947122] Oops: 0002 [#2] PREEMPT SMP NOPTI
Nov  1 00:22:49 bart kernel: [    2.947127] CPU: 3 PID: 39 Comm: kworker/3:1 Tainted: G      D           5.15.0 #1
Nov  1 00:22:49 bart kernel: [    2.947134] Hardware name: HP HP Laptop 15-bw0xx/8332, BIOS F.21 11/17/2017
Nov  1 00:22:49 bart kernel: [    2.947137] Workqueue: events amdgpu_uvd_idle_work_handler [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.947597] RIP: 0010:smu8_dpm_powergate_uvd+0xe/0xc0 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.947942] Code: 7e cb e9 ad fd ff ff b8 f4 ff ff ff eb d9 e8 a9 16 64 db 66 0f 1f 84 00 00 00 00 00 55 48 8b 87 c0 01 00 00 40 84 f6 48 89 fd <40> 88 b0 d8 01 00 00 74 37 48 8b 3f 31 d2 be 08 00 00 00 e8 ba a5
Nov  1 00:22:49 bart kernel: [    2.947948] RSP: 0018:ffffafc6801afe38 EFLAGS: 00010202
Nov  1 00:22:49 bart kernel: [    2.947952] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000100000000
Nov  1 00:22:49 bart kernel: [    2.947955] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff97d384c89000
Nov  1 00:22:49 bart kernel: [    2.947958] RBP: ffff97d384c89000 R08: ffff97d4ef59dca0 R09: ffff97d4ef535eb4
Nov  1 00:22:49 bart kernel: [    2.947960] R10: 0000000000000007 R11: 0000000000000005 R12: ffff97d384c89018
Nov  1 00:22:49 bart kernel: [    2.947963] R13: 0000000000000001 R14: ffff97d209b082f8 R15: ffff97d4ef5a1d05
Nov  1 00:22:49 bart kernel: [    2.947966] FS:  0000000000000000(0000) GS:ffff97d4ef580000(0000) knlGS:0000000000000000
Nov  1 00:22:49 bart kernel: [    2.947970] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  1 00:22:49 bart kernel: [    2.947972] CR2: 00000000000001d8 CR3: 0000000107d44000 CR4: 00000000001506e0
Nov  1 00:22:49 bart kernel: [    2.947975] Call Trace:
Nov  1 00:22:49 bart kernel: [    2.947981]  pp_set_powergating_by_smu+0xdd/0x280 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.948380]  amdgpu_dpm_enable_uvd+0x79/0x1a0 [amdgpu]
Nov  1 00:22:49 bart kernel: [    2.948715]  process_one_work+0x1c0/0x330
Nov  1 00:22:49 bart kernel: [    2.948723]  worker_thread+0x4b/0x3c0
Nov  1 00:22:49 bart kernel: [    2.948727]  ? rescuer_thread+0x360/0x360
Nov  1 00:22:49 bart kernel: [    2.948730]  kthread+0x12d/0x160
Nov  1 00:22:49 bart kernel: [    2.948734]  ? set_kthread_struct+0x30/0x30
Nov  1 00:22:49 bart kernel: [    2.948738]  ret_from_fork+0x22/0x30
Nov  1 00:22:49 bart kernel: [    2.948743] Modules linked in: efivarfs autofs4 ext4 crc32c_generic crc16 mbcache jbd2 sd_mod t10_pi sr_mod crc_t10dif cdrom crct10dif_generic ahci crct10dif_pclmul crct10dif_common libahci crc32_pclmul crc32c_intel xhci_pci ehci_pci psmouse amdgpu(+) libata scsi_mod scsi_common ehci_hcd xhci_hcd usbcore i2c_piix4 usb_common r8169 realtek mdio_devres drm_ttm_helper libphy ttm mfd_core gpu_sched
Nov  1 00:22:49 bart kernel: [    2.948794] CR2: 00000000000001d8
Nov  1 00:22:49 bart kernel: [    2.948798] ---[ end trace 175b07c8f6d66882 ]---
Nov  1 00:22:49 bart kernel: [    2.948801] RIP: 0010:smu8_dpm_powergate_acp+0x7/0x30 [amdgpu]

This error also occurs using 5.15-rc7.
Comment 1 spasswolf 2021-11-01 00:07:18 UTC
00:01.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] [1002:9874] (rev ca) (prog-if 00 [VGA controller])
	DeviceName: ATI EG BROADWAY
	Subsystem: Hewlett-Packard Company Wani [Radeon R5/R6/R7 Graphics] [103c:8332]
	Flags: bus master, fast devsel, latency 0, IRQ 37
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0800000 (64-bit, prefetchable) [size=8M]
	I/O ports at 4000 [size=256]
	Memory at f0400000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
Comment 2 spasswolf 2021-11-01 09:10:34 UTC
There is another error message just before the Oops:
Nov  1 00:22:49 bart kernel: [    2.137397] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
Nov  1 00:22:49 bart kernel: [    2.137402] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
Nov  1 00:22:49 bart kernel: [    2.137406] amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
Nov  1 00:22:49 bart kernel: [    2.139639] BUG: kernel NULL pointer dereference, address: 00000000000001db
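The reported address can be read straight off the Oops: the bytes after the `<40>` marker in the first Code line (`38 b0 db 01 00 00`) encode a one-byte compare at offset 0x1db from RAX, and RAX is 0, so CR2 ends up as 0x1db. The second Oops follows the same pattern with a one-byte store at offset 0x1d8. A minimal user-space sketch of that arithmetic follows; the struct and field names are hypothetical stand-ins, and only the two offsets are taken from the logs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's private state. Only the two
 * offsets come from the Oops output; the field names are assumptions. */
struct fake_backend {
	uint8_t pad[0x1d8];
	uint8_t uvd_power_gated;	/* offset 0x1d8, written in Oops #2 */
	uint8_t pad2[2];
	uint8_t acp_power_gated;	/* offset 0x1db, read in Oops #1 */
};

/* Address the CPU touches when accessing a field through base pointer `base`. */
static uintptr_t fault_address(uintptr_t base, size_t field_offset)
{
	return base + field_offset;
}
```

With a NULL base pointer, `fault_address(0, offsetof(struct fake_backend, acp_power_gated))` gives the CR2 value of the first Oops, which suggests the pointer loaded into RAX was never set up because init had already failed.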
Comment 3 spasswolf 2021-11-01 09:33:56 UTC
Actually the above message is not complete:
Nov  1 00:22:49 bart kernel: [    2.136382] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Nov  1 00:22:49 bart kernel: [    2.136462] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
Nov  1 00:22:49 bart kernel: [    2.136470] kfd kfd: amdgpu: Error initializing iommuv2
Nov  1 00:22:49 bart kernel: [    2.137386] kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
Nov  1 00:22:49 bart kernel: [    2.137393] kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874
Nov  1 00:22:49 bart kernel: [    2.137397] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
Nov  1 00:22:49 bart kernel: [    2.137402] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
Nov  1 00:22:49 bart kernel: [    2.137406] amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
Nov  1 00:22:49 bart kernel: [    2.139639] BUG: kernel NULL pointer dereference, address: 00000000000001db
The kfd messages also appeared with older kernels, but they were not fatal. They are caused by the HP Laptop 15-bw0xx/8332 either not having an IOMMU or
its BIOS not initializing it properly.
But linux-5.15 has added the following lines to amdgpu_device_ip_init:
	r = amdgpu_amdkfd_resume_iommu(adev);
	if (r)
		goto init_failed;
which cause amdgpu_device_ip_init to fail when kfd init fails. As a workaround, one could remove these lines.
A BIOS update might also solve the problem, but it seems to require Windows running on the laptop (which was actually sold without Windows).
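To make the control flow explicit, here is a small user-space model of the failure path described above. It is only a sketch: every name in it (`model_dev`, `model_resume_iommu`, `model_ip_init`, the `strict` flag) is hypothetical, and the `-ENODEV` return value is an assumption, since the log does not show the actual error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are driver API. */
struct model_dev {
	bool iommu_usable;	/* false on the affected laptop */
	bool strict;		/* true models 5.15 with the three added lines */
};

/* Stand-in for the IOMMU resume step; the error code is an assumption. */
static int model_resume_iommu(const struct model_dev *dev)
{
	return dev->iommu_usable ? 0 : -ENODEV;
}

/* Stand-in for the init function: with `strict` set, an IOMMU failure
 * aborts the whole init; without it, the step is skipped entirely. */
static int model_ip_init(const struct model_dev *dev)
{
	int r = 0;

	if (dev->strict) {
		r = model_resume_iommu(dev);
		if (r)
			goto init_failed;
	}
	/* remaining init steps would run here */
init_failed:
	return r;
}
```

In this model, a device with `iommu_usable = false` fails init only when `strict` is set, which mirrors the difference between 5.15 and the older kernels where the kfd errors were logged but not fatal.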
Comment 4 spasswolf 2021-11-01 11:01:06 UTC
Just confirmed that removing the 3 lines
	r = amdgpu_amdkfd_resume_iommu(adev);
	if (r)
		goto init_failed;
can be used as a workaround. Removing only the if (r) check is not enough;
merely calling amdgpu_amdkfd_resume_iommu(adev) still leads to a freeze.
Comment 5 spasswolf 2021-11-02 10:08:34 UTC
This commit leads to a freeze at startup:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu?id=714d9e4574d54596973ee3b0624ee4a16264d700
After reverting it, kernel 5.15 boots normally.
Comment 6 towo 2021-11-02 11:40:14 UTC
Looks like the same problem I reported here:

https://bugzilla.kernel.org/show_bug.cgi?id=214859
Comment 7 spasswolf 2021-11-17 16:38:59 UTC

*** This bug has been marked as a duplicate of bug 214859 ***