Bug 209435 - NMI watchdog : BUG: soft-lock CPU#0 stuck for 22s! [iwlwifi]
Summary: NMI watchdog : BUG: soft-lock CPU#0 stuck for 22s! [iwlwifi]
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel
Hardware: Intel Linux
Importance: P1 high
Assignee: Default virtual assignee for network-wireless-intel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-30 19:48 UTC by Lahfa Samy
Modified: 2021-09-30 14:42 UTC
CC List: 2 users

See Also:
Kernel Version: 5.8.12-arch1-1
Tree: Mainline
Regression: No


Attachments
soft-locked cpu#0 log (1.62 MB, image/jpeg)
2020-09-30 19:52 UTC, Lahfa Samy

Description Lahfa Samy 2020-09-30 19:48:08 UTC
On this kernel, suspending and then resuming produces this message along with a call trace for the iwlwifi driver. My computer is a T495 with a Ryzen 7 3700U, integrated Radeon RX Vega 10 graphics, and an Intel Wireless AC-9620.

Kernel version : 5.8.12-arch1-1 #1 SMP PREEMPT Sat, 26 Sep 2020 21:42:58
Note: I'm also using the ZFS module. I don't think it is related to the issue at all; I'm just mentioning it in case.

I can't seem to get any data or logs from journalctl. However, I have taken pictures of the logs I see when trying to resume, after adding some kernel parameters that make debugging resume issues more practical.
So far, the CPU soft lockup just continues indefinitely.

I then began disabling things in the BIOS until I narrowed it down: disabling the wireless card makes the bug go away, which is why I strongly believe a driver bug is the cause.

I'd also like to mention that the ArchLinux kernel-lts 5.2.68 is affected by this bug as well.
Comment 1 Lahfa Samy 2020-09-30 19:52:15 UTC
Created attachment 292735
soft-locked cpu#0 log
Comment 2 Lahfa Samy 2020-09-30 19:54:51 UTC
Erratum : The lts kernel affected by this bug is the 5.4.68-1-lts (as can be seen in the attachment) not 5.2.68.
Comment 3 Lahfa Samy 2020-10-04 15:29:22 UTC
It seems this bug is related to IRQ handling. On my computer, a call trace was generated for an IRQ, and the accompanying dmesg message stated that the 'irqpoll' option should be added as a kernel option.

After I added this option, the bug reported here began to affect the system on any kernel whatsoever (not just the latest).

If I don't use the irqpoll option there is no freeze during a resume from suspended state.
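When comparing boots with and without the option, it can help to confirm what the running kernel was actually booted with. A minimal Python sketch, assuming the standard /proc/cmdline file is available; the helper name has_kernel_param is made up for illustration:

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare flag or as name=value."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    path = Path("/proc/cmdline")
    if path.exists():
        enabled = has_kernel_param(path.read_text(), "irqpoll")
        print("irqpoll enabled:", enabled)
```

Matching whole tokens rather than substrings avoids counting a different parameter that merely contains the text "irqpoll".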

The dmesg output suggests trying the option, but the option itself leads to this bug.
I will later add the call trace from the 'irq nobody cared' message that shows up when 'irqpoll' is disabled. I'm not sure anymore whether this is truly a bug.
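For whoever collects that trace: the kernel message normally has the form 'irq N: nobody cared (try booting with the "irqpoll" option)', and N can be matched against /proc/interrupts to see which handlers share that line. A rough Python sketch under those assumptions; the function names are hypothetical and the sample text is invented for illustration, not taken from this machine:

```python
import re
from typing import List, Optional

# Matches the spurious-interrupt report, with or without a dmesg timestamp prefix.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")


def spurious_irqs(dmesg_text: str) -> List[int]:
    """Return IRQ numbers reported as 'nobody cared', in first-seen order."""
    seen: List[int] = []
    for match in NOBODY_CARED.finditer(dmesg_text):
        irq = int(match.group(1))
        if irq not in seen:
            seen.append(irq)
    return seen


def handlers_for_irq(interrupts_text: str, irq: int) -> Optional[str]:
    """Return the description columns for one IRQ in /proc/interrupts-style text."""
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    ncpu = len(lines[0].split())  # header row lists one column per CPU
    prefix = f"{irq}:"
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == prefix:
            # Skip the IRQ label and the per-CPU counters.
            return " ".join(fields[1 + ncpu:])
    return None


if __name__ == "__main__":
    # Invented sample data, only to show the call pattern.
    dmesg = '[   42.0] irq 16: nobody cared (try booting with the "irqpoll" option)\n'
    interrupts = (
        "           CPU0       CPU1\n"
        " 16:        100        200   IO-APIC  16-fasteoi   example_driver\n"
    )
    for irq in spurious_irqs(dmesg):
        print(irq, handlers_for_irq(interrupts, irq))
```

On a real system the two inputs would come from the dmesg output and from /proc/interrupts; if several drivers are listed for the reported IRQ, any of them may be the one failing to handle it.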
Comment 4 Johannes Berg 2021-09-20 11:48:44 UTC
This is very strange - iwl_read32() is literally just a readl(), so this would indicate that somehow the platform is stuck?

Yes, the bug is related to interrupt processing, and irqpoll would change something there, but I don't think irqpoll is what you want.

But ... looks like this bug somehow got dropped, not sure why. Do you even still have this issue?
