Created attachment 307597
6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43 kernel log when the system didn't shut off properly

1. Please describe the problem:

I booted the Fedora Rawhide KDE live image Fedora-KDE-Desktop-Live-Rawhide-20250201.n.0.x86_64.iso, which had the kernel 6.14.0-0.rc0.20250130git72deda0abee6.11.fc42, on bare metal from a USB flash drive on an HP laptop with an AMD A10-9620P CPU and integrated Radeon R5 GPU. In the live session, I selected Shut Down from the Application Launcher menu. The system showed the Plymouth spinner screen. I pressed Esc to show the shutdown messages. The normal shutdown messages appeared, with the last being something like Unmounting /oldroot. The screen shut off, but the laptop's power stayed on instead of shutting off. The fan became progressively louder over the next few minutes. Alt+SysRq+B didn't reboot the system. I had to hold down the power button for 5 seconds to shut off the system.

I reproduced the problem a few times with Fedora-KDE-Desktop-Live-Rawhide-20250204.n.0.x86_64.iso, which had 6.14.0-0.rc1.15.fc42, and Fedora-KDE-Desktop-Live-Rawhide-20250208.n.0.x86_64.iso, which had 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43. The problem also happened when rebooting.

I installed 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43 in my F41 KDE installation and shut down. The problem happened after systemd-journald stopped, so I couldn't see what the problem was in the kernel log. I'm attaching the kernel log with debug added to the kernel command line.

The problem didn't happen with 6.13.1 or earlier, so I guess it was introduced in the 6.14 merge window before 72deda0abee6. I might try to bisect. The problem didn't happen in QEMU/KVM VMs in GNOME Boxes, so it might be hardware related.

2. What is the Version-Release number of the kernel:

6.14.0-0.rc0.20250130git72deda0abee6.11.fc42 to 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43

3. Did it work previously in Fedora? If so, what kernel version did the issue *first* appear? Old kernels are available for download at https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :

Yes. I first saw the issue with 6.14.0-0.rc0.20250130git72deda0abee6.11.fc42

4. Can you reproduce this issue? If so, please provide the steps to reproduce the issue below:

   1. Download Fedora-KDE-Desktop-Live-42-20250208.n.0.x86_64.iso from https://koji.fedoraproject.org/koji/buildinfo?buildID=2653640
   2. Start Fedora Media Writer
   3. Write Fedora-KDE-Desktop-Live-42-20250208.n.0.x86_64.iso to a USB flash drive
   4. Reboot
   5. Boot Fedora-KDE-Desktop-Live-42-20250208.n.0.x86_64.iso from the USB flash drive on bare metal on a system affected by this problem
   6. Select Shut Down in the Application Launcher menu in Plasma

5. Does this problem occur with the latest Rawhide kernel? To install the Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by ``sudo dnf update --enablerepo=rawhide kernel``:

Yes

6. Are you running any modules that are not shipped directly with Fedora's kernel?:

No

7. Please attach the kernel logs. You can get the complete kernel log for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the issue occurred on a previous boot, use the journalctl ``-b`` flag.

Reproducible: Always

I reported this problem at https://bugzilla.redhat.com/show_bug.cgi?id=2344500
Please bisect. https://docs.kernel.org/admin-guide/bug-bisect.html
I bisected. The first bad commit involved amdgpu pm.

ff69bba05f085cd6d4277c27ac7600160167b384 is the first bad commit
commit ff69bba05f085cd6d4277c27ac7600160167b384 (HEAD)
Author: Boyuan Zhang <boyuan.zhang@amd.com>
Date:   Wed Oct 2 23:52:01 2024 -0400

    drm/amd/pm: add inst to dpm_set_powergating_by_smu

    Add an instance parameter to amdgpu_dpm_set_powergating_by_smu() function,
    and use the instance to call set_powergating_by_smu().

    v2: remove duplicated functions.

    remove for-loop in amdgpu_dpm_set_powergating_by_smu(), and temporarily
    move it to amdgpu_dpm_enable_vcn(), in order to keep the exact same logic
    as before, until further separation in next patch.

    v3: drop SI logic in amdgpu_dpm_enable_vcn().

    Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c    | 14 +++++++-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c    |  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c    |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c     |  4 ++--
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c      |  6 +++---
 drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c      |  4 ++--
 drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c      |  4 ++--
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c      |  4 ++--
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c      |  4 ++--
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c    |  4 ++--
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c    |  4 ++--
 drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c    |  4 ++--
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c        | 37 ++++++++++++++++++++++++++-----------
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h    |  3 ++-
 16 files changed, 59 insertions(+), 43 deletions(-)

My GPU is a gfx_v8, and there's a gfx_v8_0.c changed in that patch.
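For anyone who wants to repeat the bisection on similar hardware, a manual session might look roughly like the sketch below. The helper names bisect_begin and bisect_report are hypothetical wrappers, not git commands, and the endpoints v6.13 (good) and v6.14-rc1 (bad) are assumptions based on the kernel versions named in this report, not a record of the exact commands that were run. Because the failure is a hang at power-off, each step has to be judged by hand after booting the built kernel.

```shell
#!/bin/sh
# Sketch of a manual kernel bisection; run inside a clone of the mainline tree.
# bisect_begin and bisect_report are hypothetical helper names.

# Mark the endpoints: $1 is a known-good revision, $2 a known-bad one.
# git then checks out a commit roughly halfway between them.
bisect_begin() {
    git bisect start &&
    git bisect bad "$2" &&
    git bisect good "$1"
}

# After building, installing and booting the checked-out commit, record
# whether shutdown hung ("bad") or powered off normally ("good").
bisect_report() {
    case $1 in
        good|bad) git bisect "$1" ;;
        *) echo "usage: bisect_report good|bad" >&2; return 2 ;;
    esac
}

# Example session (endpoints are assumptions, see above):
#   bisect_begin v6.13 v6.14-rc1
#   ... build, install, boot, shut down, observe ...
#   bisect_report bad        # or: bisect_report good
#   ... repeat until git prints "<sha> is the first bad commit" ...
#   git bisect log > bisect.log   # keep a record of the session
#   git bisect reset              # return to the original checkout
```

Saving the output of git bisect log makes it possible to attach the full sequence of good and bad verdicts to the report.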
When I booted 6.14.0-0.rc1.20250207gitbb066fe812d6.19.fc43 with nomodeset on the kernel command line, so that the simpledrm kernel driver was used instead of amdgpu, the system shut down normally.

I reported this problem at https://gitlab.freedesktop.org/drm/amd/-/issues/3959
Alex Deucher wrote a patch which fixed the problem: https://gitlab.freedesktop.org/drm/amd/-/issues/3959#note_2773981