Bug 209435

Summary: NMI watchdog : BUG: soft-lock CPU#0 stuck for 22s! [iwlwifi]
Product: Drivers Reporter: Lahfa Samy (samy)
Component: network-wireless-intel Assignee: Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)
Status: CLOSED UNREPRODUCIBLE    
Severity: high CC: dion, samy
Priority: P1    
Hardware: Intel   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=204241
Kernel Version: 5.8.12-arch1-1 Tree: Mainline
Regression: No
Attachments: soft-locked cpu#0 log

Description Lahfa Samy 2020-09-30 19:48:08 UTC
Currently on this kernel, when suspending and then resuming, the soft-lockup message from the summary shows up along with a call trace for the iwlwifi driver. My computer is a T495 with a Ryzen 7 3700U, a Vega RX10 integrated graphics card, and an Intel Wireless AC-9620.

Kernel version : 5.8.12-arch1-1 #1 SMP PREEMPT Sat, 26 Sep 2020 21:42:58
Note: I'm also using the ZFS module, but I don't think it is related to the issue at all; I'm just mentioning it in case.

I can't seem to get any data or logs from journalctl. However, I have taken pictures of the logs I see when trying to resume, after adding some kernel parameters that make debugging resume issues more practical.
For now, the CPU soft lockup just goes on forever.

I then began disabling devices in the BIOS until I narrowed it down: disabling the wireless card alone solves the bug, which is why I strongly believe a driver bug is the issue.

I'd also like to mention that the ArchLinux kernel-lts 5.2.68 is affected by this bug as well.
Comment 1 Lahfa Samy 2020-09-30 19:52:15 UTC
Created attachment 292735 [details]
soft-locked cpu#0 log
Comment 2 Lahfa Samy 2020-09-30 19:54:51 UTC
Erratum: The LTS kernel affected by this bug is 5.4.68-1-lts (as can be seen in the attachment), not 5.2.68.
Comment 3 Lahfa Samy 2020-10-04 15:29:22 UTC
It seems this bug is related to IRQ handling. On my computer, a call trace was generated for an IRQ, and the dmesg output stated that the 'irqpoll' option should be added as a kernel option.

After I added this option, the bug reported here began to affect the system on any kernel whatsoever (not just the latest).

If I don't use the irqpoll option there is no freeze during a resume from suspended state.

The dmesg output suggests trying the option, but the option itself leads to this bug.
I will later add the call trace from the 'irq nobody cared' message that shows up when 'irqpoll' is not set. I'm no longer sure whether this is truly a bug.
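
Since the behaviour depends on whether 'irqpoll' is on the kernel command line, it may help to confirm what the running kernel was actually booted with before comparing suspend/resume results. Below is a minimal sketch that reads /proc/cmdline and looks for the option. The helper names has_kernel_param and read_cmdline are made up for this example, and the whitespace split is a simplification that does not handle quoted values containing spaces.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` is present as a bare flag or as name=value.

    Simplification (assumption): parameters are split on whitespace, so
    quoted values that contain spaces are not handled.
    """
    for token in cmdline.split():
        # Compare only the part before '=' so "root=/dev/sda1" matches "root".
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line the running kernel was booted with."""
    return Path(path).read_text()


if __name__ == "__main__":
    try:
        cmdline = read_cmdline()
    except OSError as exc:
        print(f"could not read kernel command line: {exc}")
    else:
        state = "set" if has_kernel_param(cmdline, "irqpoll") else "not set"
        print(f"irqpoll is {state}")
```

Running this once per boot configuration and noting the result next to each suspend/resume attempt should make it clear which outcomes belong to which setting.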
Comment 4 Johannes Berg 2021-09-20 11:48:44 UTC
This is very strange - iwl_read32() is literally just a readl(), so this would indicate that somehow the platform is stuck?

Yes, the bug is related to interrupt processing, and irqpoll would change something there, but I don't think irqpoll is what you want.

But ... looks like this bug somehow got dropped, not sure why. Do you even still have this issue?