After commit d60096b3b2c2..cd8cc7d31b49 (drm-amdgpu-init-iommu~fd-device-init.patch), X can't really start with kernel 5.14.15 on most Ryzen notebooks. It takes a long time before X starts, and dmesg is spammed with error messages like:

Okt 28 10:28:08 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Okt 28 10:28:21 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Okt 28 10:28:34 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Okt 28 10:28:47 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Okt 28 10:29:01 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Okt 28 10:29:14 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Okt 28 10:29:27 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706

and/or:

Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:128 vmid:0 pasid:0, for process pid 0 thread pid 0)
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000872000 from IH client 0x1b (UTCL2)
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00040D00
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CPG (0x6)
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x1
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: RW: 0x1
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:128 vmid:0 pasid:0, for process pid 0 thread pid 0)
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000000000872000 from IH client 0x1b (UTCL2)
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00040D00
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CPG (0x6)
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x1
Okt 28 10:29:40 kernel: amdgpu 0000:04:00.0: amdgpu: RW: 0x1

After reverting that commit, the kernel works normally again. Here are the related reports from our users (ignore the NVIDIA posts): https://forum.siduction.org/index.php?topic=8439.0
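For reading the page-fault dump, the raw VM_L2_PROTECTION_FAULT_STATUS word can be split into the fields the driver prints. The sketch below is a minimal, hypothetical decoder that assumes the GFX9-era bit layout (MORE_FAULTS bit 0, WALKER_ERROR bits 1-3, PERMISSION_FAULTS bits 4-7, MAPPING_ERROR bit 8, CID bits 9-17, RW bit 18); check the register headers for your ASIC before relying on it. With that assumption, 0x00040D00 yields exactly the values in the log above, including client ID 0x6 (CPG).

```python
# Hypothetical decoder; the field layout is an ASSUMPTION based on GFX9-era
# amdgpu register headers and may differ on other ASICs.
FIELDS = {
    # name: (shift, mask after shifting)
    "MORE_FAULTS": (0, 0x1),
    "WALKER_ERROR": (1, 0x7),
    "PERMISSION_FAULTS": (4, 0xF),
    "MAPPING_ERROR": (8, 0x1),
    "CID": (9, 0x1FF),
    "RW": (18, 0x1),
}


def decode_fault_status(status: int) -> dict:
    """Split a raw fault status word into its (assumed) bit fields."""
    return {name: (status >> shift) & mask for name, (shift, mask) in FIELDS.items()}


if __name__ == "__main__":
    # Value taken from the dmesg excerpt in this report.
    for name, value in decode_fault_status(0x00040D00).items():
        print(f"{name}: {value:#x}")
```

Printing with `#x` mirrors the "NAME: 0x..." style of the kernel messages, so the output can be compared line by line with dmesg.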
I can confirm this for a "04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c2)".
The relevant commit is 714d9e4574d54596973ee3b0624ee4a16264d700
Additional info: after installing the kernel from a working system, the first boot with that kernel works flawlessly. When rebooting with that kernel, the boot hangs for a long time; then the desktop starts, but the system is not really usable. None of these problems occur after reverting 714d9e4574d54596973ee3b0624ee4a16264d700.
I think this patch set should address the issue: https://patchwork.freedesktop.org/series/96508/
Created attachment 299413: patch to fix. I suggest upgrading to 5.15-rc7, applying this patch, and then testing.
Created attachment 299437: analysis for this issue.

Linux 5.14.15 with afd1818 applied fixes the issue. Linux 5.15-rc7 re-applies "init iommu after amdkfd device init" and "move iommu_resume before ip init/resume", which overwrite afd1818 and cause the issue again:

714d9e4 drm/amdgpu: init iommu after amdkfd device init
f02abeb drm/amdgpu: move iommu_resume before ip init/resume
afd1818 drm/amdkfd: fix boot failure when iommu is disabled in Picasso.
286826d drm/amdgpu: init iommu after amdkfd device init
9cec53c drm/amdgpu: move iommu_resume before ip init/resume
With Linux 5.14.17-rc1 and 5.15.1-rc1 the problem is gone, so I think this bug is resolved.
*** Bug 214901 has been marked as a duplicate of this bug. ***