|Summary:||NMI watchdog : BUG: soft-lock CPU#0 stuck for 22s! [iwlwifi]|
|Product:||Drivers||Reporter:||Lahfa Samy (samy)|
|Component:||network-wireless-intel||Assignee:||Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)|
|Attachments:||soft-locked cpu#0 log|
Description Lahfa Samy 2020-09-30 19:48:08 UTC
Currently on this kernel, when suspending and then resuming, this soft lockup shows up along with a call trace for the iwlwifi driver. My computer is a T495 with a Ryzen 7 3700U, Vega RX10 integrated graphics and an Intel Wireless AC-9620.
Kernel version: 5.8.12-arch1-1 #1 SMP PREEMPT Sat, 26 Sep 2020 21:42:58
Note: I'm also using the ZFS module. I don't think it is related to the issue at all; I'm just mentioning it in case.
I can't seem to get any data or logs from journalctl. However, I have taken pictures of the logs I see when trying to resume, after adding some kernel parameters that allow more practical debugging of resume issues. For now, the CPU soft lockup just keeps going forever.
I then began disabling things in the BIOS until I narrowed it down: disabling the wireless card makes the bug go away, which is why I strongly believe a driver bug is the cause.
I'd also like to mention that the ArchLinux kernel-lts 5.2.68 is affected by this bug as well.
Comment 1 Lahfa Samy 2020-09-30 19:52:15 UTC
Created attachment 292735 [details] soft-locked cpu#0 log
Comment 2 Lahfa Samy 2020-09-30 19:54:51 UTC
Erratum : The lts kernel affected by this bug is the 5.4.68-1-lts (as can be seen in the attachment) not 5.2.68.
Comment 3 Lahfa Samy 2020-10-04 15:29:22 UTC
It seems this bug is related to IRQ handling. On my computer a call trace was generated for an IRQ, and the accompanying dmesg message stated that the 'irqpoll' option should be added as a kernel option. After I added this option, the bug reported here began to affect the system on any kernel whatsoever (not just the latest). If I don't use the irqpoll option, there is no freeze when resuming from suspend. So dmesg suggests trying the option, but the option itself leads to this bug. I will later add the call trace from the 'irq nobody cared' message that shows up when 'irqpoll' is deactivated. I'm not sure anymore whether this is truly a bug.
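For anyone trying to reproduce this, it may help to confirm whether the running kernel was actually booted with 'irqpoll' before comparing resume behaviour. The following is only a minimal sketch: the helper names `has_kernel_param` and `running_cmdline` are made up for illustration, and it assumes a Linux system where the boot command line is readable from /proc/cmdline.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` is present as a bare flag or as name=value.

    `cmdline` is a kernel command line string; parameters are
    whitespace-separated. (Hypothetical helper, not part of any tool.)
    """
    for token in cmdline.split():
        # Compare only the part before the first '=' so that
        # "irqpoll" matches but "noirqpoll" does not.
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text().strip()
```

On the affected machine, `has_kernel_param(running_cmdline(), "irqpoll")` would then tell whether a given boot belongs to the "freezes on resume" case or the "irq nobody cared" case described above.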
Comment 4 Johannes Berg 2021-09-20 11:48:44 UTC
This is very strange - iwl_read32() is literally just a readl(), so this would indicate that somehow the platform is stuck? Yes, the bug is related to interrupt processing, and irqpoll would change something there, but I don't think irqpoll is what you want. But ... looks like this bug somehow got dropped, not sure why. Do you even still have this issue?