Created attachment 299177 [details]
System journal logs

WiFi client device: AX210
WiFi AP: Unifi AP AC Pro

Reproduction steps:
1. Start iperf to generate network load.
2. Perform `nmcli device wifi rescan`.

Expected: no device restart.
Actual: the device restarts (see logs), and network speeds become extremely slow or unresponsive until I disconnect from and reconnect to the WiFi network.

I can reproduce this reliably. It occurs regardless of whether the connected network is 5 GHz only, 2.4 GHz only, or 2.4+5 GHz. The network is 802.11ac. However, when the network is 2.4 GHz only, the speed reduction seems to be less severe.

Since NetworkManager performs rescans periodically, this bug occurs quite readily under continuous high network load.
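For anyone trying to reproduce this, the steps above can be sketched as a small script. This is only an illustration: `IPERF_SERVER`, `LOAD_SECONDS`, `DRY_RUN`, and the helper names are hypothetical, the address is a placeholder, and the iperf options are an assumption, since the report does not say how iperf was invoked.

```shell
#!/bin/sh
# Sketch of the reproduction steps; all variable names are hypothetical.
IPERF_SERVER="${IPERF_SERVER:-192.0.2.1}"  # placeholder iperf server address
LOAD_SECONDS="${LOAD_SECONDS:-60}"         # how long to generate load (assumed)
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    # Step 1: generate network load (in the background when run for real).
    if [ "$DRY_RUN" = "1" ]; then
        run iperf -c "$IPERF_SERVER" -t "$LOAD_SECONDS"
    else
        iperf -c "$IPERF_SERVER" -t "$LOAD_SECONDS" &
        sleep 5  # arbitrary delay so traffic is flowing before the rescan
    fi

    # Step 2: trigger the rescan that NetworkManager also performs periodically.
    run nmcli device wifi rescan

    # Step 3: show recent kernel messages to check for a device restart.
    if [ "$DRY_RUN" != "1" ]; then
        sleep 5  # arbitrary delay so the log has time to fill
    fi
    run journalctl -k -n 50 --no-pager

    # Let the background iperf finish before returning.
    if [ "$DRY_RUN" != "1" ]; then
        wait
    fi
}
```

Calling `repro` with the default `DRY_RUN=1` only prints the three commands. Setting `DRY_RUN=0` and pointing `IPERF_SERVER` at a real iperf server runs them.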
Setting the modprobe config `options iwlwifi disable_11ax=Y` appears to work around the issue (note that the original problem occurs when connected to non-ax networks), but of course comes with the caveat that 802.11ax is disabled.
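A minimal sketch of applying and later removing that workaround, assuming a drop-in file under `/etc/modprobe.d/`. The file name and the function names are hypothetical; only the `options` line comes from this report.

```shell
#!/bin/sh
# Hypothetical drop-in path; any *.conf file under /etc/modprobe.d/ is read.
CONF_FILE="${CONF_FILE:-/etc/modprobe.d/iwlwifi-disable-11ax.conf}"

# Write the module option from this report to the given file.
write_workaround() {
    printf '%s\n' 'options iwlwifi disable_11ax=Y' > "$1"
}

# Remove the override again, e.g. before testing a patched kernel.
remove_workaround() {
    rm -f "$1"
}
```

Run as root, `write_workaround "$CONF_FILE"` creates the file. The option only takes effect once the iwlwifi module is loaded again, so rebooting is the simplest way to apply it.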
Created attachment 299667 [details] Fix - formatted patch. Hey Andrew, Can you please check if this fix works for you?
Hi Golan, thanks for the patch!

I applied it on top of 5.14.18, removed the iwlwifi param override workaround, and ran some tests. Results using my original reproduction steps:

1. First verified that the unpatched kernel still exhibits the issue: it does.
2. Ran the test once on the patched kernel, which resulted in a microcode error (it appears broadly the same as the original). However, network performance recovers after a second (unlike before, where performance would tank indefinitely).
3. Repeated the test multiple times and no microcode error occurred.
4. Disconnected from the WiFi network and reconnected, then re-ran (2) and (3) with the same results.

So overall it is definitely a lot better, if imperfect. A microcode error still occurs, seemingly once per connection, but recovery from the error is much improved and no longer needs manual intervention.
Thanks for the extensive tests and the update, Andrew! The patch I shared was intended to fix the slow TX after restart. As you can see, we now return to high TX rates very soon after the FW restart. As for the NMI/FW restart itself, we still need to complete that investigation.
Hi Golan, It appears I am having the same issue as Andrew and I can confirm that the provided patch reduces the poor/no throughput after a firmware restart from the previous 1-5 minutes down to a few seconds reliably. I previously posted on the Intel support forum about this issue here https://community.intel.com/t5/Wireless/Intel-AX210-Firmware-Reset-under-Load/m-p/1328150. If I can provide any additional data points or logs regarding the NMI/fw restart please let me know! - Kevin
There is not much data in the attached log, but the scan timeout could be a duplicate of another bug. Can you please try the attached patch on top of your kernel and see if it solves the issue?
Created attachment 299841 [details] iwlwifi patch to set scan timeout to 30 seconds
Hi Ayala, I am not able to reproduce the SW error with the provided patch.
Looks like both of the patches here are in 5.18, and I'm still getting lockups... Should I file a new issue?