Bug 218617

Summary: Linux Kernel Bug Report: "Scheduling while atomic" Kernel Panic and System Freeze on NVIDIA RTX 2000 Ada Generation Laptop GPU
Product: Drivers
Reporter: Sarah S. (sarah.salzstein)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED WILL_NOT_FIX    
Severity: blocking    
Priority: P3    
Hardware: Intel   
OS: Linux   
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Relevant dmesg output

Description Sarah S. 2024-03-20 18:44:28 UTC
Created attachment 306013
Relevant dmesg output

Dear Linux Kernel Development Team,

I am writing to report a critical issue I have been encountering with the Linux kernel, specifically related to my NVIDIA RTX 2000 Ada Generation Laptop GPU, installed in a Lenovo ThinkPad P1 Gen 6. The problem manifests as a "scheduling while atomic" kernel bug, followed by a complete system freeze.

Description of the Problem:
At seemingly random intervals and under varying system loads, the kernel logs the error message "scheduling while atomic". Some time afterwards, the system becomes unresponsive and requires a hard reboot to regain functionality.

Steps to Reproduce:
- The error occurs randomly during system operation.
- The system becomes unresponsive, leading to a complete freeze.


System Information:
- GPU: NVIDIA Corporation AD107GLM [RTX 2000 Ada Generation Laptop GPU] (rev a1)
- Linux Distribution: Gentoo Linux
- Kernel Version: 6.8.1 (Also affects kernel versions 6.7.x, 6.6.x, and 6.1.x)

Additional Information:
The issue persists across multiple kernel versions, indicating it is not specific to a particular kernel release.
I have examined the system logs and identified the "scheduling while atomic" error as the primary issue leading to the kernel panic and subsequent system freeze.
No specific system activity or workload triggers the error; it happens seemingly at random.
I have ensured that the GPU drivers are up to date and have tried reinstalling them, but the bug persists no matter which NVIDIA driver version I install.

Attached Logs:
I have attached the relevant portion of the dmesg output showing the BUG.
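
For anyone going through a log like this one, the BUG lines can be picked out mechanically. The following is only a minimal sketch, assuming the usual scheduler message format "BUG: scheduling while atomic: <comm>/<pid>/0x<preempt_count>". The function name find_atomic_bugs and the sample line are made up for illustration and are not taken from the attachment.

```python
import re

# Assumed message format: "BUG: scheduling while atomic: comm/pid/0xPREEMPT_COUNT"
PATTERN = re.compile(
    r"BUG: scheduling while atomic: "
    r"(?P<comm>.+)/(?P<pid>\d+)/0x(?P<preempt>[0-9a-fA-F]+)"
)


def find_atomic_bugs(dmesg_text):
    """Return a (comm, pid, preempt_count) tuple for each matching log line."""
    hits = []
    for line in dmesg_text.splitlines():
        match = PATTERN.search(line)
        if match:
            hits.append(
                (match["comm"], int(match["pid"]), int(match["preempt"], 16))
            )
    return hits


# Hypothetical sample line, NOT copied from attachment 306013.
SAMPLE = "[  123.456789] BUG: scheduling while atomic: example-task/1234/0x00000002"

if __name__ == "__main__":
    for comm, pid, preempt in find_atomic_bugs(SAMPLE):
        print(f"task={comm} pid={pid} preempt_count={preempt:#x}")
```

The task name and preempt count of each hit may help whoever triages the report see which process was running when the scheduler was called from atomic context.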

This issue severely affects the usability and stability of my system, and I kindly request your assistance in resolving it promptly. If there are any additional diagnostic steps or information required from my end, please let me know, and I will gladly provide it.

Thank you for your attention to this matter.

Sincerely,
Sarah S.
Comment 1 Artem S. Tashkinov 2024-03-20 20:24:33 UTC
If it's the open source NVIDIA module, report your bug here: https://github.com/NVIDIA/open-gpu-kernel-modules/issues

If it's the closed source NVIDIA driver, report your bug here: https://forums.developer.nvidia.com/c/gpu-graphics/linux/148

Unfortunately, kernel developers can't and won't help you with out-of-tree modules.
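
One way to check whether a running kernel has had such a module loaded is its taint value, readable from /proc/sys/kernel/tainted. The following is a minimal sketch, assuming the documented taint bit positions (bit 0: proprietary module, bit 12: out-of-tree module, bit 13: unsigned module). The helper names decode_taint and read_taint are illustrative, and only this subset of bits is decoded.

```python
# Subset of the kernel's taint bits that relate to externally supplied modules.
# Bit positions are assumed from the kernel's tainted-kernels documentation.
TAINT_BITS = {
    0: "proprietary module was loaded (P)",
    12: "out-of-tree module was loaded (O)",
    13: "unsigned module was loaded (E)",
}


def decode_taint(value):
    """Return descriptions of the module-related taint bits set in value."""
    return [desc for bit, desc in sorted(TAINT_BITS.items()) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value of the running kernel (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    reasons = decode_taint(read_taint())
    if reasons:
        for reason in reasons:
            print(reason)
    else:
        print("no module-related taint bits set")
```

As far as I know, a module with a proprietary license sets bit 0, while an out-of-tree module under a GPL-compatible license would set only bit 12, so the output may hint at which of the two places above is the right one.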
Comment 2 Sarah S. 2024-03-20 20:28:38 UTC
Thank you for letting me know!

Sincerely,
Sarah S.