Hello!

Trying the new kernel RC today (6.10.0-rc1), I no longer have video. It works with 6.9.1.

Lenovo ThinkCentre M715q
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] (rev e4)

In the journal, I have multiple entries like this one:

May 27 14:24:22 youpi kernel: iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:00:01.0 pasid=0x00000 address=0x102e89980 flags=0x0080]
May 27 14:24:22 youpi kernel: AMD-Vi: DTE[0]: 7190000000000003
May 27 14:24:22 youpi kernel: AMD-Vi: DTE[1]: 00001001034f0002
May 27 14:24:22 youpi kernel: AMD-Vi: DTE[2]: 200000010022a013
May 27 14:24:22 youpi kernel: AMD-Vi: DTE[3]: 0000000000000000

Then, multiple entries like this one:

May 27 14:24:22 youpi kernel: amdgpu 0000:00:01.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
May 27 14:24:22 youpi kernel: [drm:amdgpu_device_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
May 27 14:24:22 youpi kernel: amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
May 27 14:24:22 youpi kernel: amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
May 27 14:24:22 youpi kernel: amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
May 27 14:24:22 youpi kernel: ------------[ cut here ]------------
May 27 14:24:22 youpi kernel: WARNING: CPU: 0 PID: 179 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:630 amdgpu_irq_put+0x45/0x70 [amdgpu]
May 27 14:24:22 youpi kernel: Modules linked in: sd_mod usbhid uas hid usb_storage amdgpu(+) amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm>
May 27 14:24:22 youpi kernel: CPU: 0 PID: 179 Comm: (udev-worker) Not tainted 6.10.0-rc1-jcg #1
May 27 14:24:22 youpi kernel: Hardware name: LENOVO 10VGS02P00/3130, BIOS M1XKT57A 02/10/2022
May 27 14:24:22 youpi kernel: RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
May 27 14:24:22 youpi kernel: Code: 48 8b 4e 10 48 83 39 00 74 2c 89 d1 48 8d 04 88 8b 08 85 c9 74 14 f0 ff 08 b8 00 00 00 00 74 05 e9 80 d8 a3 fc e9 6b fd ff ff <0f>
May 27 14:24:22 youpi kernel: RSP: 0018:ffffbc9c80813a48 EFLAGS: 00010246
May 27 14:24:22 youpi kernel: RAX: ffff985ad74e3780 RBX: ffff985a82f18878 RCX: 0000000000000000
May 27 14:24:22 youpi kernel: RDX: 0000000000000000 RSI: ffff985a82f254b8 RDI: ffff985a82f00000
May 27 14:24:22 youpi kernel: RBP: ffff985a82f10208 R08: 0000000000000000 R09: 0000000000000003
May 27 14:24:22 youpi kernel: R10: ffffbc9c80813880 R11: ffffffffbdec7828 R12: ffff985a82f105e8
May 27 14:24:22 youpi kernel: R13: ffff985a82f00010 R14: ffff985a82f00000 R15: ffff985a82f254b8
May 27 14:24:22 youpi kernel: FS: 00007f18ca0058c0(0000) GS:ffff985b57600000(0000) knlGS:0000000000000000
May 27 14:24:22 youpi kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 27 14:24:22 youpi kernel: CR2: 00005563a55b3a68 CR3: 000000010f8bc000 CR4: 00000000001506f0
May 27 14:24:22 youpi kernel: Call Trace:
May 27 14:24:22 youpi kernel: <TASK>
May 27 14:24:22 youpi kernel: ? __warn+0x7c/0x120
May 27 14:24:22 youpi kernel: ? amdgpu_irq_put+0x45/0x70 [amdgpu]
May 27 14:24:22 youpi kernel: ? report_bug+0x155/0x170
May 27 14:24:22 youpi kernel: ? handle_bug+0x3f/0x80
May 27 14:24:22 youpi kernel: ? exc_invalid_op+0x13/0x60
May 27 14:24:22 youpi kernel: ? asm_exc_invalid_op+0x16/0x20
May 27 14:24:22 youpi kernel: ? amdgpu_irq_put+0x45/0x70 [amdgpu]
May 27 14:24:22 youpi kernel: amdgpu_fence_driver_hw_fini+0xfa/0x130 [amdgpu]
May 27 14:24:22 youpi kernel: amdgpu_device_fini_hw+0xa2/0x3f0 [amdgpu]
May 27 14:24:22 youpi kernel: amdgpu_driver_load_kms+0x79/0xb0 [amdgpu]
May 27 14:24:22 youpi kernel: amdgpu_pci_probe+0x182/0x4f0 [amdgpu]
May 27 14:24:22 youpi kernel: local_pci_probe+0x41/0x90
May 27 14:24:22 youpi kernel: pci_device_probe+0xbb/0x1e0
May 27 14:24:22 youpi kernel: really_probe+0xd6/0x390
May 27 14:24:22 youpi kernel: ? __pfx___driver_attach+0x10/0x10
May 27 14:24:22 youpi kernel: __driver_probe_device+0x78/0x150
May 27 14:24:22 youpi kernel: driver_probe_device+0x1f/0x90
May 27 14:24:22 youpi kernel: __driver_attach+0xce/0x1c0
May 27 14:24:22 youpi kernel: bus_for_each_dev+0x84/0xd0
May 27 14:24:22 youpi kernel: bus_add_driver+0x10e/0x240
May 27 14:24:22 youpi kernel: driver_register+0x55/0x100
May 27 14:24:22 youpi kernel: ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
May 27 14:24:22 youpi kernel: do_one_initcall+0x57/0x320
May 27 14:24:22 youpi kernel: do_init_module+0x60/0x230
May 27 14:24:22 youpi kernel: init_module_from_file+0x86/0xc0
May 27 14:24:22 youpi kernel: idempotent_init_module+0x11b/0x2b0
May 27 14:24:22 youpi kernel: __x64_sys_finit_module+0x5a/0xb0
May 27 14:24:22 youpi kernel: do_syscall_64+0x7e/0x190
May 27 14:24:22 youpi kernel: ? ksys_mmap_pgoff+0x14e/0x1f0
May 27 14:24:22 youpi kernel: ? syscall_exit_to_user_mode+0x71/0x1e0
May 27 14:24:22 youpi kernel: ? do_syscall_64+0x8a/0x190
May 27 14:24:22 youpi kernel: ? do_syscall_64+0x8a/0x190
May 27 14:24:22 youpi kernel: ? do_syscall_64+0x8a/0x190
May 27 14:24:22 youpi kernel: ? __irq_exit_rcu+0x38/0xb0
May 27 14:24:22 youpi kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 27 14:24:22 youpi kernel: RIP: 0033:0x7f18c9e79719
May 27 14:24:22 youpi kernel: Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
May 27 14:24:22 youpi kernel: RSP: 002b:00007ffd56f52208 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
May 27 14:24:22 youpi kernel: RAX: ffffffffffffffda RBX: 00005563a558e400 RCX: 00007f18c9e79719
May 27 14:24:22 youpi kernel: RDX: 0000000000000000 RSI: 00007f18ca01defd RDI: 0000000000000015
May 27 14:24:22 youpi kernel: RBP: 00007f18ca01defd R08: 0000000000000000 R09: 00005563a55902b0
May 27 14:24:22 youpi kernel: R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000020000
May 27 14:24:22 youpi kernel: R13: 0000000000000000 R14: 00005563a5591f30 R15: 000055638158bec1
May 27 14:24:22 youpi kernel: </TASK>
May 27 14:24:22 youpi kernel: ---[ end trace 0000000000000000 ]---

I suspect this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c?id=db5d28c0bfe566908719bec8e25443aabecbb802

Let me know if you need more information.

Cheers,
jC
Created attachment 306354 [details] Full logs of the boot. I added the full log of the boot process showing all the errors.
Can you bisect? https://docs.kernel.org/admin-guide/bug-bisect.html
Bisecting: 5720 revisions left to test after this (roughly 13 steps)

I'll try, but it will take some time. My machine is not very powerful...
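For a rough sense of where "roughly 13 steps" comes from: each bisection step halves the candidate range, so about log2(5720) rounded up builds are needed. The helper below is only a back-of-the-envelope sketch; `halving_steps` is a hypothetical name, and git's own estimator differs in detail.

```c
/*
 * Hypothetical helper (not part of git): smallest k such that 2^k >= n,
 * i.e. the worst-case number of halvings needed to isolate one of n
 * candidate revisions.
 */
static unsigned int halving_steps(unsigned long n)
{
    unsigned int steps = 0;

    if (n <= 1)
        return 0;

    /* Count the bits needed to represent n - 1. */
    for (n -= 1; n != 0; n >>= 1)
        steps++;

    return steps;
}
```

With the figure from the output above, `halving_steps(5720)` gives 13, since 4096 < 5720 <= 8192.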
Possibly the same as this report: https://lore.kernel.org/all/20240527192159.GEZlTdV7OoOuJrHmI0@fat_crate.local/
Created attachment 306364 [details] Check Enhanced PPR support before enabling PPR
Hi,

The attached patch should fix this issue. Can you please test it? I will send a proper patch to the mailing list soon.

-Vasant
Also, can you please attach the full dmesg? I want to see the IOMMU feature list and confirm that what I am doing is right.

-Vasant
Hi, I plan to finish the bisection today, and I'll test your patch. jC
(In reply to Jean-Christophe Guillain from comment #8)
> Hi,
>
> I plan to finish the bisection today, and I'll test your patch.
>

You mean bisecting for this issue? If so, we already know the culprit commit. The issue happens because the IOMMU driver tried to enable the PPR bit in the DTE without checking for Enhanced PPR support in the EFR register.

-Vasant
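To illustrate the kind of check described here, the following is a minimal user-space sketch, not the actual patch and not the kernel's code. The names `SKETCH_EFR_EPHSUP`, `SKETCH_DTE_PPR`, `sketch_ephsup` and `sketch_dte0` are hypothetical, and the bit positions are assumptions made for illustration; consult the AMD IOMMU specification and the kernel headers for the real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: illustrative bit positions, not copied from kernel headers. */
#define SKETCH_EFR_EPHSUP (1ULL << 50) /* hypothetical: Enhanced PPR support in EFR */
#define SKETCH_DTE_PPR    (1ULL << 52) /* hypothetical: PPR enable bit in DTE[0] */

/* Hypothetical helper: does this EFR value advertise Enhanced PPR support? */
static bool sketch_ephsup(uint64_t efr)
{
    return (efr & SKETCH_EFR_EPHSUP) != 0;
}

/*
 * Build DTE[0] from a base value. The PPR bit is set only when the device
 * wants PPR *and* the IOMMU advertises support for it; setting the bit
 * unconditionally is the failure mode described in the comment above.
 */
static uint64_t sketch_dte0(uint64_t base, uint64_t efr, bool dev_wants_ppr)
{
    uint64_t dte0 = base & ~SKETCH_DTE_PPR; /* clear any stale PPR bit first */

    if (dev_wants_ppr && sketch_ephsup(efr))
        dte0 |= SKETCH_DTE_PPR;

    return dte0;
}
```

For example, `sketch_dte0(0x3, 0, true)` leaves the PPR bit clear because the (assumed) support bit is absent from the EFR value, whereas passing `SKETCH_EFR_EPHSUP` as the EFR value sets it.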
I applied your patch to the 6.10.0-rc1 kernel, and I confirm that it fixes this bug. Thank you very much ! jC (full dmesg attached)
Created attachment 306367 [details] Full dmesg after applying Vasant's patch
(I still finished my bisection, and as you said, c4cb23111103a841c2df30058597398443bcad5f is the first bad commit.)
Thanks Jean for testing. I will send the patch with your Tested-by today.

-Vasant
*** Bug 218921 has been marked as a duplicate of this bug. ***
(In reply to Vasant Hegde from comment #5)
> Created attachment 306364 [details]
> Check Enhanced PPR support before enabling PPR

I applied your patch on top of rc2 and also confirm that it works.
Thank you.
(In reply to Hanabishi from comment #15)
> (In reply to Vasant Hegde from comment #5)
> > Created attachment 306364 [details]
> > Check Enhanced PPR support before enabling PPR
>
> I applied your patch on top of rc2 and also confirm that it works.
> Thank you.

Thanks Hanabishi for testing.

FYI, the patches were merged into -rc3.

-Vasant
I seem to have a similar problem on 6.10-rc5 after suspend. I get a black screen on resume.

[ 269.157149] amdgpu 0000:02:00.0: amdgpu: reserve 0x400000 from 0xf41f800000 for PSP TMR
[ 269.159956] iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:02:00.0 pasid=0x00000 address=0x131400000 flags=0x0180]
[ 269.159960] AMD-Vi: DTE[0]: 6190000000000003
[ 269.159962] AMD-Vi: DTE[1]: 00001001049e000b
[ 269.159963] AMD-Vi: DTE[2]: 200000013c610013
[ 269.159963] AMD-Vi: DTE[3]: 0000000000000000
[ 269.160104] amdgpu 0000:02:00.0: amdgpu: failed to load ucode SDMA0(0x1)
[ 269.160108] amdgpu 0000:02:00.0: amdgpu: psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xF)
Created attachment 306495 [details] Complete dmesg
Unfortunately, there was another bug in the suspend/resume path. Can you please test with the patch below?

https://lore.kernel.org/linux-iommu/ZnqzXyCU8bn32j4-@8bytes.org/T/#m1cd1520facb8b758efdf7a8c0261f9ee2ec217d7

-Vasant
Yes, I confirm that the patch "iommu/amd: Fix GT feature enablement again", applied to 6.10-rc5, fixes resume on my machine. Thanks for the prompt reply!
(In reply to Vasant Hegde from comment #19)
> Unfortunately there was another big in suspend/resume path. Can you please
> test with below patch?
>
> https://lore.kernel.org/linux-iommu/ZnqzXyCU8bn32j4-@8bytes.org/T/#m1cd1520facb8b758efdf7a8c0261f9ee2ec217d7
>
> -Vasant

Can confirm this patch also fixes my suspend/resume issue, thanks!
(In reply to dreamlike_clinking040 from comment #21)
> (In reply to Vasant Hegde from comment #19)
>
> Can confirm this patch also fixes my suspend/resume issue, thanks!

Thanks a lot.

-Vasant