Bug 218921

Summary: iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY ...]
Product: Drivers
Reporter: Hanabishi (i.r.e.c.c.a.k.u.n+bugzilla.kernel.org)
Component: IOMMU
Assignee: drivers_iommu
Status: RESOLVED DUPLICATE
Severity: normal
CC: regressions, vasant.hegde
Priority: P3
Hardware: AMD
OS: Linux
Kernel Version: 6.10-rc1
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: Archive with the crash logs

Description Hanabishi 2024-05-31 17:23:39 UTC
Created attachment 306392: Archive with the crash logs

With 6.10-rc1, the IOMMU causes a graphics crash on this specific machine, which has an AMD A8-7650K (KAVERI) iGPU.
6.9.x was fine.

It happens only if the IOMMU is enabled in UEFI. It starts with messages like these:

iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:00:01.0 pasid=0x00000 address=0x10131a880 flags=0x0080]
AMD-Vi: DTE[0]: 7490000000000003
AMD-Vi: DTE[1]: 00001001016a0002
AMD-Vi: DTE[2]: 20000001041f8813
AMD-Vi: DTE[3]: 0000000000000000
...

And then the graphics driver crashes:

amdgpu 0000:00:01.0: [drm:0xffffffffc0954a18] *ERROR* ring gfx test failed (-110)
[drm:0xffffffffc0e66f2d] *ERROR* hw_init of IP block <gfx_v7_0> failed -110
amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
------------[ cut here ]------------
WARNING: CPU: 2 PID: 179 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:630 0xffffffffc09ed7c9
...

Both the 'amdgpu' and 'radeon' drivers crash. Detailed crash logs are in the attached archive.
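
For anyone comparing logs across machines, the AMD-Vi lines quoted above can be pulled out of a dmesg dump with a small script. This is only a rough sketch: the regular expressions are written against the exact line format shown in this report (other kernel versions may print it differently), and the names `EVENT_RE`, `DTE_RE` and `parse_amd_vi_log` are made up for illustration. It does not decode the DTE bits, it only collects the raw values.

```python
import re

# Patterns derived from the log lines quoted in this report.
# Assumption: other kernel versions may format these lines differently.
EVENT_RE = re.compile(
    r"AMD-Vi: Event logged \[(?P<event>\w+) "
    r"device=(?P<device>[0-9a-f:.]+) "
    r"pasid=(?P<pasid>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]"
)
DTE_RE = re.compile(r"AMD-Vi: DTE\[(?P<index>\d)\]: (?P<value>[0-9a-f]{16})")


def parse_amd_vi_log(text):
    """Return one dict per logged event, with any DTE dump lines that follow it."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append({**match.groupdict(), "dte": {}})
            continue
        match = DTE_RE.search(line)
        if match and events:
            # Attach the raw 64-bit word to the most recent event.
            events[-1]["dte"][int(match["index"])] = int(match["value"], 16)
    return events


SAMPLE = """\
iommu ivhd0: AMD-Vi: Event logged [ILLEGAL_DEV_TABLE_ENTRY device=0000:00:01.0 pasid=0x00000 address=0x10131a880 flags=0x0080]
AMD-Vi: DTE[0]: 7490000000000003
AMD-Vi: DTE[1]: 00001001016a0002
AMD-Vi: DTE[2]: 20000001041f8813
AMD-Vi: DTE[3]: 0000000000000000
"""

if __name__ == "__main__":
    for event in parse_amd_vi_log(SAMPLE):
        print(event["event"], event["device"], event["address"])
        for index, value in sorted(event["dte"].items()):
            print(f"  DTE[{index}] = {value:#018x}")
```

Grouping by `device` makes it easy to check whether only the iGPU at 0000:00:01.0 is affected or other devices log the same event.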
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-06-06 08:51:35 UTC
Not my area of expertise, but what you describe looks a bit like the symptoms in existing reports that will be fixed by this patch: https://lore.kernel.org/all/20240530084801.10758-1-vasant.hegde@amd.com/ (which, with a bit of luck, might make it into 6.10-rc3).
Comment 2 Vasant Hegde 2024-06-06 10:06:18 UTC
Right. The above patch and another patch to fix the EFR issue [1] should fix this issue.

[1] https://lore.kernel.org/all/20240530071118.10297-1-vasant.hegde@amd.com/


Note that all these patches are in Joerg's fix branch. As Thorsten mentioned, hopefully they will make it into -rc3.

-Vasant
Comment 3 Hanabishi 2024-06-06 10:40:37 UTC
Thanks. I also thought that https://lore.kernel.org/lkml/ZlweciPk77ra9W7H@gmail.com/ might be related, but I have not tested that machine with rc2 yet.

I could apply the aforementioned patches ahead of time to test them though.
Comment 4 Hanabishi 2024-06-06 12:26:11 UTC
This report turns out to be a duplicate, though. The other one is more detailed, so I am closing this one anyway.

*** This bug has been marked as a duplicate of bug 218900 ***