Bug 217905 - [Solved] Kernel 6.6-rc1 fails to reboot or shutdown Ryzen 5825U /Renoir
Summary: [Solved] Kernel 6.6-rc1 fails to reboot or shutdown Ryzen 5825U /Renoir
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Linux
Classification: Unclassified
Component: Kernel
Hardware: All Linux
Importance: P3 normal
Assignee: Virtual assignee for kernel bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-13 02:40 UTC by Tester47
Modified: 2023-09-20 10:49 UTC
CC List: 2 users

See Also:
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id:



Description Tester47 2023-09-13 02:40:02 UTC
The kernel stalls for a long time at boot with a drm-amdgpu message, and it fails to restart or shut down whether Secure Boot is enabled or not. The magic SysRq key works to exit. There was nothing wrong in the 6.5 kernel cycle.

sudo journalctl -b | curl -F 'file=@-' 0x0.st

http://0x0.st/HfUP.txt
Comment 1 Bagas Sanjaya 2023-09-13 07:45:43 UTC
(In reply to Tester47 from comment #0)
> The Kernel stalls at boot very long with a drm-amdgpu message, but fails to
> restart or shutdown with secure boot enabled or not. Magic key works to
> exit. Nothing wrong in the Kernel 6.5 cycle.
> 
> sudo journalctl -b | curl -F 'file=@-' 0x0.st
> 
> http://0x0.st/HfUP.txt

I don't see any journal logs from the shutdown. Can you attach one?

Also, you need to do a bisection. If you don't know how, see Documentation/admin-guide/bug-bisect.rst for instructions.
Comment 2 Tester47 2023-09-13 16:25:39 UTC
Let me be clearer: it does not shut down at all. The magic SysRq key has no effect during shutdown (o or b); the keyboard is dead. Also, $ shutdown -r now hangs too. Restart works when using Alt+PrtSc+b. The same goes for when booting stalls for a long time.

We started bisecting with the 20230903 daily kernel, and the bug was there. The 6.6-rc1 build has been removed. Please note that the next boot log after a shutdown may or may not be the same log. Also, booting now and then requires the magic SysRq key to restart, because the kernel hangs. In that case, we must press Enter twice + Esc to boot to the desktop.

It booted OK after a cold shutdown by pressing Enter twice, Esc once and Backspace. Here's the log after $ shutdown now and a cold shutdown:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

http://0x0.st/HfD7.txt
Comment 3 Tester47 2023-09-14 01:37:43 UTC
Unfortunately, the latest kernel available is the one in the previous post; they seem to keep builds for 7 days only. We have access to the drm-tip kernels, and they all work. Today's kernel (20230913) has the same issue as 20230912. The bug was introduced on Monday, Sept. 11th.

Works fine up to the 9th:
$ uname -a
Linux mm 6.5.0-060500drmtip20230907-generic #202309070203 SMP PREEMPT_DYNAMIC Thu Sep  7 02:24:17 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

The bug was introduced here:
$ uname -a
Linux mm 6.6.0-060600rc1drmtip20230912-generic #202309120203 SMP PREEMPT_DYNAMIC Tue Sep 12 02:20:48 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux


This is the best I can do. Building from source takes too long, the fan is noisy and the keyboard overheats.

https://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/

Select 20230912 and click on changes to see the commits.

For boot:
sudo cat /var/log/syslog | curl -F 'file=@-' 0x0.st

http://0x0.st/Hf5b.txt

sudo journalctl -b | curl -F 'file=@-' 0x0.st

http://0x0.st/Hf5c.txt
Comment 4 Tester47 2023-09-14 01:54:17 UTC
$ sudo cat /var/log/syslog | curl -F 'file=@-' 0x0.st

http://0x0.st/Hf5m.txt

cc@mm:~$ sudo journalctl -b | curl -F 'file=@-' 0x0.st

http://0x0.st/Hf5a.txt


In all cases, TPM and Secure Boot are enabled. If Secure Boot is disabled, the magic SysRq key works to restart when we shut down.

On Monday, we were running Arch (miffe repo) dual-booted with Manjaro (zip package from GitHub), and the same issue was present in both.
Comment 5 Tester47 2023-09-14 02:03:31 UTC
This is where it stalls on restart. Shutdown hangs at the Lenovo image:

Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 8
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring gfx_low uses VM inv eng 1 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring gfx_high uses VM inv eng 4 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 5 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 6 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 7 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 8 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 9 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 10 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 11 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 12 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 13 on hub 0
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 8
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 8
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 8
Sep 13 21:43:08 mm kernel: amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 8
Sep 13 21:43:08 mm kernel: [drm] Initialized amdgpu 3.54.0 20150101 for 0000:04:00.0 on minor 0
Sep 13 21:43:08 mm kernel: fbcon: amdgpudrmfb (fb0) is primary device
Sep 13 21:43:08 mm kernel: [drm] DSC precompute is not needed.

Operating System: Kubuntu 23.10
KDE Plasma Version: 5.27.8
KDE Frameworks Version: 5.110.0
Qt Version: 5.15.10
Kernel Version: 6.6.0-060600rc1drmtip20230912-generic (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5825U with Radeon Graphics
Memory: 13.5 GiB of RAM
Graphics Processor: AMD Radeon Graphics
Manufacturer: LENOVO
Product Name: 82RN
System Version: IdeaPad 3 15ABA7

Erratum: "If Secure Boot is disabled, the magic SysRq key works to restart when we shut down" only holds after switching to X11.
Comment 6 Bagas Sanjaya 2023-09-14 07:07:22 UTC
Can you also open an issue on the freedesktop tracker [1] (IMO amdgpu is involved)?

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues
Comment 7 Tester47 2023-09-14 13:55:08 UTC
Recap

This is exactly where kernel 6.6-rc1 also stalls on cold boot, for about 2 minutes. It is probably the same place in Arch, but the splash screen is much faster at boot there. Pressing Enter twice + Esc + Backspace reduces the delay to a few seconds.

On Monday, our issue was introduced in the drm-tip kernel. One of the commits was already in Linux-next before Sept 3rd. It shouldn't be that hard to find. I'll be waiting for a correction in Linux-next (IdeaPad 3 / Renoir).

Thanks in advance,

[Log excerpt identical to the one in comment 5, ending at "[drm] DSC precompute is not needed."]

I'm done with that bug!
Comment 8 Bagas Sanjaya 2023-09-14 14:17:27 UTC
(In reply to Tester47 from comment #7)
> Recap
> 
> This is exactly where Kernel 6.6-rc1 stalls on cold boot too for about 2
> minutes. Probably the same place in Arch, but splash screen is much faster
> at boot. Clicking enter twice + ESC + backspace reduces the delay to a few
> seconds.
> 
> On Monday, our issue was introduced in the drm tip Kernel. One of the
> commits was already in Linux-next before Sept 3rd. It shoudn't be that hard
> to find. I'll be waiting for a correction in Linux-next (Ideapad 3 / Renoir).
> 

What fix?

But you said earlier that compiling the kernel on your machine is slow and
that it (perhaps) has hardware issues. Can you build the kernel on a more
powerful computer and copy the bzImage to the machine that has this
regression instead? Sorry, but you have to complete the bisection to find the
exact culprit, or developers here will likely ignore this bug report.
Comment 9 Bagas Sanjaya 2023-09-14 14:18:08 UTC
(In reply to Tester47 from comment #3)
> Unfortunately, the latest kernel available is the one on the previous post.
> They seem to keep it for 7 days only. We have access to the drm tip kernel,
> and they all work. Today's kernel has the same issue 20230913 + 20230912.
> The bug was introduced on Monday Sept. 11th.
> 
> Works fine up to the 9th:
> $ uname -a
> Linux mm 6.5.0-060500drmtip20230907-generic #202309070203 SMP
> PREEMPT_DYNAMIC Thu Sep  7 02:24:17 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> 
> Bug was introduced there:
> $ uname -a
> Linux mm 6.6.0-060600rc1drmtip20230912-generic #202309120203 SMP
> PREEMPT_DYNAMIC Tue Sep 12 02:20:48 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> 

Exact commit range please.
Comment 10 Tester47 2023-09-17 14:37:24 UTC
The correction is in Linux-next, but we now get a stop job running for 1 min 30 s on restart and shutdown:

This kills the delay:

$ sudo systemctl stop snapd.service
Warning: Stopping snapd.service, but it can still be activated by:
  snapd.socket
$ sudo systemctl disable snapd.service
Removed "/etc/systemd/system/multi-user.target.wants/snapd.service".


$ uname -a
Linux mm 6.6.0-060600rc1daily20230917-generic #202309162200 SMP PREEMPT_DYNAMIC Sun Sep 17 02:07:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

All is good now,
Comment 11 Bagas Sanjaya 2023-09-18 00:08:53 UTC
On 17/09/2023 21:37, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217905
> 
> Tester47 (e598@gmx.com) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEW                         |RESOLVED
>          Resolution|---                         |PATCH_ALREADY_AVAILABLE
>          Regression|No                          |Yes
>             Summary|Kernel 6.6-rc1 fails to     |[Solved] Kernel 6.6-rc1
>                    |reboot or shutdown Ryzen    |fails to reboot or shutdown
>                    |5825U                       |Ryzen 5825U /Renoir
> 
> --- Comment #10 from Tester47 (e598@gmx.com) ---
> The correction is in Linux-next, but we get a stop job running for 1min 30sec
> for restart and shutdown:
> 
> This kills the delay:
> 
> $ sudo systemctl stop snapd.service
> Warning: Stopping snapd.service, but it can still be activated by:
>   snapd.socket
> $ sudo systemctl disable snapd.service
> Removed "/etc/systemd/system/multi-user.target.wants/snapd.service".
> 

Do you have many snaps running?

> 
> $ uname -a
> Linux mm 6.6.0-060600rc1daily20230917-generic #202309162200 SMP
> PREEMPT_DYNAMIC
> Sun Sep 17 02:07:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
> 

What commit?
Comment 12 Tester47 2023-09-19 01:43:09 UTC
My email address appears in the previous post; could you please erase it? Don't you have privacy rules?

This is a dead link:

admin@kernel.org
Comment 13 Artem S. Tashkinov 2023-09-20 10:49:05 UTC
(In reply to Tester47 from comment #12)
> My email appears in the previous post, could you please erase it. Don't you
> have privacy rules?
> 

Your email has already leaked to the web, and that's the "perk" of debugging Linux publicly.

https://www.google.com/search?q=%22e598%40gmx.com%22

There's no point in removing it here.

Please do not scramble the title of the bug report - it's useful for other people and search engines.
