Bug 216331

Summary: Kernel 5.18.13 freezes VT display
Product: Drivers Reporter: Jingyuan Deng (1700011628)
Component: Console/Framebuffers Assignee: James Simmons (jsimmons)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: javier, jeremy, mario.limonciello, yixxt
Priority: P1    
Hardware: All   
OS: Linux   
URL: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/341
Kernel Version: 5.18.13 Subsystem:
Regression: No Bisected commit-id:

Description Jingyuan Deng 2022-08-06 07:14:18 UTC
After updating to kernel 5.18.13, I hit a bug where the VT display randomly freezes. The tty itself keeps working, so I can type blindly and run any command. Meanwhile Xorg and Wayland work fine, so starting an xterm or kmscon is a workaround.

Similar problems occur on devices running nouveau, Nvidia, or Nvidia-open, and even on Optimus laptops that block all three of those drivers. For example, on my Optimus laptop I use only amdgpu, with nouveau blacklisted and neither Nvidia nor Nvidia-open installed, yet the problem still occurs.

Although the Nvidia driver is an out-of-tree module, nouveau is not. Since this bug breaks nouveau and even Optimus devices using only the iGPU driver, I think it is right to report the problem to kernel.org.

Thanks.


Here are some reports that may be related, although they are not quite the same:

https://bbs.archlinux.org/viewtopic.php?id=278350
https://bbs.archlinux.org/viewtopic.php?id=278365
https://bugzilla.kernel.org/show_bug.cgi?id=216303
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/341
Comment 1 Jingyuan Deng 2022-08-06 10:48:01 UTC
Forgot to mention: I do not know whether the problem comes from commit ee7a69aa38d87a3bbced7b8245c732c05ed0c6ec, because our problems do not seem to be quite the same. However, I met someone with the same problem and know that it has occurred since 5.18.13.
Comment 2 Jingyuan Deng 2022-08-06 17:30:50 UTC
I have to say I may have misdiagnosed this problem. It is possibly caused by amdgpu rather than nvidia or nouveau. Give me some time to find out.
Comment 3 Artem S. Tashkinov 2022-08-09 03:24:34 UTC
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/341
Comment 4 Jingyuan Deng 2022-08-09 06:47:08 UTC
(In reply to Artem S. Tashkinov from comment #3)
> https://github.com/NVIDIA/open-gpu-kernel-modules/issues/341

But I have this problem on an AMD+Nvidia Optimus device, and blocking nvidia makes no difference. I still have the problem when using nouveau, or even when using only amdgpu with every Nvidia driver blacklisted.

In this GitHub issue it is also reported (in the comments further down) that the tty freezes even without the Nvidia driver. Besides, I hear that there are similar problems on Loongson devices with 5.19. So I think it may be a kernel problem rather than an Nvidia problem.
Comment 5 Jingyuan Deng 2022-08-09 06:49:42 UTC
(In reply to Jingyuan Deng from comment #2)
> I have to say I may mistaken this problem. Possibly because of amdgpu rather
> than nvidia or nouveau. Wait me for some time to find it out.

I find that the TTY freezes on amdgpu, and if I use only Nvidia the problem does not happen. However, I cannot tell whether it is caused by amdgpu, since many other devices report a similar problem.
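
Since the reports above hinge on which GPU driver is actually loaded, here is a minimal sketch for recording that alongside a report. It assumes the usual /proc/modules layout (module name as the first whitespace-separated field); the function name loaded_gpu_modules and the list of module names are illustrative choices, not something taken from this thread.

```python
# Hypothetical helper: report which of the GPU kernel modules discussed in
# this bug are loaded, given the text of /proc/modules.
GPU_MODULES = ("amdgpu", "nouveau", "nvidia")


def loaded_gpu_modules(proc_modules_text):
    """Return the GPU modules from GPU_MODULES that appear in /proc/modules text."""
    # The first field of every non-empty line is the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    # Preserve the order of GPU_MODULES so the output is stable.
    return [name for name in GPU_MODULES if name in loaded]
```

On an affected machine the function would be fed the contents of /proc/modules, for example via open("/proc/modules").read(). A driver built into the kernel rather than loaded as a module will not show up there.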
Comment 6 Jingyuan Deng 2022-08-09 06:50:35 UTC
And a small clue: kmscon does not freeze. Only the kernel tty does.
Comment 7 Javier Martinez Canillas 2022-08-09 10:53:11 UTC
(In reply to Jingyuan Deng from comment #1)
> Forgot to mention: I do not know whether problem is from Commit
> ee7a69aa38d87a3bbced7b8245c732c05ed0c6ec , because seems that our problems
> are not really same, but I met someone with same problem and know this
> problem occurs after 5.18.13

Does reverting the mentioned commit make the issue go away? Otherwise, it may not be the same issue as the one reported in bug #216303.
Comment 8 Ryan H 2022-08-16 18:50:09 UTC
This has been an issue for me as well since kernel 5.18.13, using amdgpu. As of kernel 5.19.1 the issue is still ongoing.

5.18.12 is the last kernel on which this issue does not occur.
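
For anyone triaging a machine against this report, a rough sketch of comparing a kernel release string with the last known-good version named above (5.18.12). The helper names are hypothetical, and the check only says whether a kernel is newer than the last good release; it cannot tell whether a given build already carries a fix.

```python
import re

# Last release reported as unaffected in this thread.
LAST_GOOD = (5, 18, 12)


def parse_kernel_version(release):
    """Extract (major, minor, patch) from a string such as '5.19.2-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '6.0-rc2') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def newer_than_last_good(release):
    """True if the release is newer than the last known-good kernel."""
    return parse_kernel_version(release) > LAST_GOOD
```

The running kernel's release string can be obtained with platform.release() from the standard library and passed straight to newer_than_last_good.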
Comment 9 Mario Limonciello (AMD) 2022-08-17 12:12:11 UTC
There's another regression in amdgpu.
See if https://gitlab.freedesktop.org/agd5f/linux/-/commit/a6250bdb6c4677ee77d699b338e077b900f94c0c helps you. It's headed to 6.0-rc2.
Comment 10 Jingyuan Deng 2022-08-18 10:02:03 UTC
(In reply to Ryan H from comment #8)
> This has been an issue for me as well since Kernel 5.18.13 using amdgpu. As
> of Kernel 5.19.1 the issue is still ongoing.
> 
> 5.18.12 is the last kernel that this issue does not occur.

Two questions:
1. Do you have another GPU, or AMD only?
2. Which AMD graphics card do you use?
Comment 11 Ryan H 2022-08-18 15:33:50 UTC
(In reply to Jingyuan Deng from comment #10)
> (In reply to Ryan H from comment #8)
> > This has been an issue for me as well since Kernel 5.18.13 using amdgpu. As
> > of Kernel 5.19.1 the issue is still ongoing.
> > 
> > 5.18.12 is the last kernel that this issue does not occur.
> 
> Two questions:
> 1、Do you have other GPU or AMD only?
> 2、What AMD graphic cards do you use?

1. AMD only, with an AMD FX 8300 CPU, no IGP.

2. Radeon RX 570
Comment 12 Ryan H 2022-08-19 00:48:06 UTC
Just built and ran kernel 5.19.2 and hit the same problem.

Some other details: I'm always in an X session and switching over to a tty when the bug happens. Half the time the tty is just frozen, and I can switch back to X running KDE and move my mouse for a couple of seconds before the mouse and keyboard freeze. I also get green and pink outlines stuck on screen of whatever windows, text, or pictures I was looking at when the error occurred.
Comment 13 Mario Limonciello (AMD) 2022-08-19 00:49:23 UTC
Please have a try with the patch I linked above.
Comment 14 Ryan H 2022-08-22 15:26:24 UTC
(In reply to Mario Limonciello (AMD) from comment #13)
> Please have a try with the patch I linked above.

Wow! I patched kernel 5.19.2 on Friday; the patch applied cleanly and I ran it for just over 24 hours with no crash or bug with tty VT switching. I also patched kernel 5.19.3 on Sunday and I'm running it right now with no crashes or freezes. These uptimes are many hours longer than on any kernel I have run since 5.18.13 with this bug. So I guess the patch solved my issue.

Thank you very much!
Comment 15 Mario Limonciello (AMD) 2022-08-22 18:17:05 UTC
Great! Here is the commit as landed in 6.0-rc2.
https://github.com/torvalds/linux/commit/a6250bdb6c4677ee77d699b338e077b900f94c0c

It is CC'd to stable and will be backported to 5.18.y and 5.19.y as well.