Bug 214549 - Intel AX210 Firmware Reset and Slow Speed - regular scan timed out
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel
Hardware: Intel Linux
Importance: P1 high
Assignee: Default virtual assignee for network-wireless-intel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-09-28 01:08 UTC by Kevin Anderson
Modified: 2021-12-28 21:08 UTC

See Also:
Kernel Version: 5.13
Subsystem:
Regression: No
Bisected commit-id:


Attachments
iwl_fw_dump (924.98 KB, application/pgp-encrypted)
2021-09-28 01:08 UTC, Kevin Anderson
dmesg (5.58 KB, text/plain)
2021-09-28 01:08 UTC, Kevin Anderson
iwlwifi patch to set scan timeout to 30 seconds (1.35 KB, patch)
2021-11-29 13:10 UTC, Ilan Peer
Patch to fix low TPT after HW reconfiguration (1.36 KB, patch)
2021-11-30 15:09 UTC, Ilan Peer

Description Kevin Anderson 2021-09-28 01:08:11 UTC
Created attachment 298989
iwl_fw_dump

I am frequently hitting what appears to be a microcode software error that causes connection instability and slow speed. When the error occurs, my maximum bandwidth drops to roughly 400k and may or may not recover. If it does not recover after a few minutes, I have to restart the interface.

I've seen some theories that it could be related to UniFi Access Points, but I don't know whether that has been proven.

I have attached a firmware dump and am willing to provide any other information that is requested.


ethtool -i wlp170s0 | grep firmware
firmware-version: 63.c04f3485.0 ty-a0-gf-a0-63.uc
Comment 1 Kevin Anderson 2021-09-28 01:08:31 UTC
Created attachment 298991
dmesg
Comment 2 Ilan Peer 2021-11-29 13:10:31 UTC
Created attachment 299767
iwlwifi patch to set scan timeout to 30 seconds
Comment 3 Ilan Peer 2021-11-29 13:11:12 UTC
Hi,

The scan data in the FW dump is minimal, but based on what we have it seems that the driver triggered a fragmented scan (since the link was under high traffic) on all the 6GHz channels. In a fragmented scan, the scan over each channel can take around 420ms (as it is not continuous). Since there are 58 6GHz channels, the full scan can take up to 25 seconds. As the current scan timeout guard is 20 seconds, the timeout fired and triggered a FW reset.
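
For illustration, the arithmetic above can be checked with a minimal sketch. The function and constant names below are hypothetical and are not taken from the iwlwifi driver; the numbers are the ones quoted in this comment (58 channels, about 420ms per channel, a 20 second guard and a proposed 30 second guard).

```python
# Rough worst-case duration of a fragmented scan over all 6GHz channels.
# All names here are illustrative assumptions, not driver identifiers.

CHANNELS_6GHZ = 58          # channel count quoted in the comment
FRAGMENTED_DWELL_MS = 420   # approximate time per channel in a fragmented scan
OLD_TIMEOUT_MS = 20_000     # current scan timeout guard
NEW_TIMEOUT_MS = 30_000     # guard proposed by the attached patch


def worst_case_scan_ms(num_channels: int, per_channel_ms: int) -> int:
    """Return the total scan time if every channel takes per_channel_ms."""
    return num_channels * per_channel_ms


scan_ms = worst_case_scan_ms(CHANNELS_6GHZ, FRAGMENTED_DWELL_MS)

print(f"estimated scan time: {scan_ms / 1000:.2f} s")
print(f"exceeds 20 s guard: {scan_ms > OLD_TIMEOUT_MS}")
print(f"exceeds 30 s guard: {scan_ms > NEW_TIMEOUT_MS}")
```

The estimate comes to a little over 24 seconds, which is consistent with the "up to 25 seconds" figure: it is longer than the 20 second guard but fits within 30 seconds.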

1. I am adding a patch to change the timeout to 30 seconds. Would you be able to apply the patch on top of your kernel and try the fix?

2. The more interesting point here is to understand why we got a passive scan on all 6GHz channels. For that, can you please try to reproduce the issue with the following:

- Trace-cmd recording. See https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging.
- wpa_supplicant log + dmesg.

Thanks,

Ilan
Comment 4 Kevin Anderson 2021-11-29 23:41:20 UTC
Hi Ilan,

I gathered the logs before applying your patch. The logs are around 200MB, mainly due to the trace-cmd. I am unsure if there is anything sensitive there so I just GPG encrypted the tar.gz with the 3 keys listed on the iwlwifi debugging page. I am assuming you will be able to open them but if you need them encrypted with a different key please let me know.

Additionally due to the size I couldn't upload them directly so here is a public Google Drive link: https://drive.google.com/file/d/1nih_eJ7egsxjEtq5soNPjJ757ARfvDp-/view?usp=sharing. If you have trouble accessing it or would prefer them through a different mechanism please let me know.

I tested with your patch for 30 minutes without a crash. I intend to let it run overnight and will report back tomorrow with the results.

Thanks,
Kevin
Comment 5 Ilan Peer 2021-11-30 15:09:12 UTC
Hi Kevin,

Thanks for your input. Please keep me updated with your findings.

We have a possible root cause for the passive scan on all 6GHz channels, but we still need to verify it. I'll try to see if the debug data you provided is helpful.

As for the low TPT (throughput) encountered after the reset, I'm attaching a patch that should fix the issue. If possible, give it a try and let me know if it works for you, i.e., you get normal TPT after a FW reset.

Regards,

Ilan.
Comment 6 Ilan Peer 2021-11-30 15:09:57 UTC
Created attachment 299797
Patch to fix low TPT after HW reconfiguration
Comment 7 Kevin Anderson 2021-11-30 22:01:11 UTC
Hi Ilan,

I can confirm that the low TPT patch does help when there is a firmware restart. I had to verify that patch without the scan timeout fix.

With the scan timeout patch applied I ran iperf for a little over 12 hours without a single firmware restart!

I intend to keep running with these patches until they land in a stable kernel, but if you need more information to validate the root cause of the passive scan, I am willing to assist if I can.

Thanks,
Kevin
Comment 8 Ilan Peer 2021-12-06 15:32:08 UTC
Hi Kevin,

Thanks for sharing your inputs.

We found the reason for the long passive scan on the 6GHz channels (the one causing the long scan timeout). I'm currently working on a fix for this.

At this stage I do not require any additional input.

Thanks again for helping debug this issue.

Regards,

Ilan.
Comment 9 Kevin Anderson 2021-12-07 01:14:23 UTC
Hi Ilan,

That's great to hear! Will the patch to increase the timeout be submitted upstream or is the more appropriate fix to resolve the long passive scan issue?

Thanks,
Kevin
Comment 10 Ilan Peer 2021-12-07 06:26:49 UTC
Hi Kevin,

The patch that increases the timeout is a valid patch, as there are specific settings in which a passive scan over all 6GHz channels would indeed take 25 seconds (not contiguous). The patch will be submitted upstream.

The fix I'm working on would avoid passively scanning non-PSC channels when there is no need to scan them.
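
As a rough illustration of why skipping non-PSC channels shortens the scan, here is a minimal sketch. It assumes the usual 6GHz numbering in which 20MHz channels are 1, 5, 9, ..., 233 and the preferred scanning channels (PSC) are those whose number satisfies `channel % 16 == 5`. The helper names are hypothetical, the actual driver fix may look quite different, and the enumerated channel count may not match the 58 quoted earlier in this thread.

```python
# Illustrative only: filter a 6GHz channel list down to PSC channels.
# Names are assumptions for this sketch, not iwlwifi identifiers.

FRAGMENTED_DWELL_MS = 420  # approximate per-channel time quoted earlier


def is_psc(channel: int) -> bool:
    """Return True if a 6GHz 20MHz channel number is a PSC channel."""
    return channel % 16 == 5


def channels_to_scan(channels, need_full_scan=False):
    """Keep every channel for a full scan, otherwise only PSC channels."""
    if need_full_scan:
        return list(channels)
    return [ch for ch in channels if is_psc(ch)]


# Assumed enumeration of 20MHz channel numbers in the 6GHz band.
all_6ghz = list(range(1, 234, 4))
psc_only = channels_to_scan(all_6ghz)

print(f"PSC channels: {psc_only}")
print(f"PSC-only scan estimate: {len(psc_only) * FRAGMENTED_DWELL_MS} ms")
```

Under these assumptions only 15 channels remain, so even a fragmented passive scan at about 420ms per channel would finish in a few seconds.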

Regards,

Ilan.
