Bug 209237 - Sometimes Intel AX201 cannot connect to AP, Reason: 2=PREV_AUTH_NOT_VALID
Summary: Sometimes Intel AX201 cannot connect to AP, Reason: 2=PREV_AUTH_NOT_VALID
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Default virtual assignee for network-wireless-intel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-11 06:54 UTC by Kai-Heng Feng
Modified: 2020-12-11 07:30 UTC (History)
2 users (show)

See Also:
Kernel Version: mainline 5.9-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
journalctl -b with 2.4g connection (2.33 MB, text/plain)
2020-09-11 06:56 UTC, Kai-Heng Feng
Details
journalctl -b with 5g connection (587.62 KB, text/plain)
2020-09-11 06:56 UTC, Kai-Heng Feng
Details
devcoredump (1.43 KB, application/octet-stream)
2020-10-21 12:40 UTC, Kai-Heng Feng
Details
FW ver 62 (1.27 MB, application/octet-stream)
2020-12-07 11:24 UTC, Antara
Details

Description Kai-Heng Feng 2020-09-11 06:54:39 UTC
All linux, linux-firmware and wpa are mainline git.

After each reboot, there's a 10~20% chance that NetworkManager's automatic WiFi connection fails.

Originally thought this has something to do with reboot, but actually it can be easily reproduced with a scripts to connect/disconnect WiFi in a loop.
Comment 1 Kai-Heng Feng 2020-09-11 06:56:26 UTC
Created attachment 292461 [details]
journalctl -b with 2.4g connection
Comment 2 Kai-Heng Feng 2020-09-11 06:56:46 UTC
Created attachment 292463 [details]
journalctl -b with 5g connection
Comment 3 Antara 2020-10-05 06:58:05 UTC
Hi Kai-Heng Feng,

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging. This wiki details how to collect trace, sniffer or even FW dumps.

Please collect trace-cmd logs of iwlwifi, mac80211, cfg80211 drivers for the failure scenario for further analysis.
eg.
sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e iwlwifi_msg

Also please collect the wpa-supplicant logs with highest debugging level by using -ddd option and -f option to store it to a log file.

eg.
sudo wpa_supplicant -i wlp1s0 -c /etc/wpa_supplicant.conf -ddd -f /var/log/wpa_supplicant.log

Regards,
Antara
Comment 4 Kai-Heng Feng 2020-10-06 03:04:29 UTC
Both files are too large to attach so I use Google Drive.

journalctl-b which contains both dmesg and wpasupplicant log:
https://drive.google.com/file/d/1F1amSHG-tjRUavnbn-Rpxtb0INk58uZV/view?usp=sharing

trace.dat:
https://drive.google.com/file/d/17XmaERT1wEtirv-y7gyAb6ezmYQ_f9Pb/view?usp=sharing
Comment 5 Antara 2020-10-12 10:54:04 UTC
(In reply to Kai-Heng Feng from comment #4)
> Both files are too large to attach so I use Google Drive.
> 
> journalctl-b which contains both dmesg and wpasupplicant log:
> https://drive.google.com/file/d/1F1amSHG-tjRUavnbn-Rpxtb0INk58uZV/
> view?usp=sharing
> 
> trace.dat:
> https://drive.google.com/file/d/17XmaERT1wEtirv-y7gyAb6ezmYQ_f9Pb/
> view?usp=sharing

Hi Kai-Heng Feng

Thanks for the logs. I am checking the logs but it is difficult to corelate the supplicant logs with the trace logs since the timestamp in both the logs are different.

Can you please start wpa_supplicant with '-T' option along with the default options and then collect the trace-cmd logs so that trace logs will have both  supplicant and driver prints.

Regards,
Antara
Comment 6 Antara 2020-10-13 09:25:13 UTC
Hi Kai-Heng Feng,

From the logs I can see that m4 is being sent by supplicant and forwarded by the driver and it is getting success from the FW for the 4way handshake. But it seems AP does not receive this message. Can you please provide the FW dump and sniffer pcap log if possible for further analysis.

Please use the procedure mentioned in the below wiki link.
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging

Regards,
Antara
Comment 7 Kai-Heng Feng 2020-10-21 12:40:08 UTC
Created attachment 293121 [details]
devcoredump
Comment 9 Kai-Heng Feng 2020-10-21 12:47:34 UTC
Following packages are latest:
linux:
3e4ee0ad7a71bd308847ae36c619866e4e2e6f19 Revert "iwlwifi: remove wide_cmd_header field"

linux-firmware
4f41e9d6437d03e7ce9279076107dd08f46656f3 iwlwifi: update and add new FWs from core56-54 release

wpa
2d8a7cf3f551e427b940492499c4ec134b586b6b tests: Update dpp_controller_rx_errors to use the assigned TCP port

NetworkManager
b6d6a16b2a1c5e4593ba5ff22a2e11f23dca93c9 license: merge branch 'th/keyfile-relicense-as-lgpl'
Comment 10 Anthony Wong 2020-10-28 03:34:39 UTC
Hi Antara, do you have any update or need more information?
Comment 11 Antara 2020-10-28 08:51:04 UTC
Hi Kai-heng Feng,

Please let me know the network key you are using to analyse the sniffer PCAP logs.

Also the FW dump has not captured any data. I tried the same insturctions at my end and I was able to capture the dump with the procedure mentioned in the wiki.

Can you please try once and capture the FW dump properly. Also please provide the trace-cmd log and fw dump for the same run.

Regards,
Antara
Comment 12 Kai-Heng Feng 2020-10-29 08:55:27 UTC
> Please let me know the network key you are using to analyse the sniffer PCAP
> logs.
Sent to your email.

> Also the FW dump has not captured any data. I tried the same insturctions at
> my end and I was able to capture the dump with the procedure mentioned in the
> wiki.
Can you please try S3 and see if it works on your side?
Apparent S3 breaks it.

> Can you please try once and capture the FW dump properly. Also please provide
> the trace-cmd log and fw dump for the same run.
The issue is much more easier to reproduce with S3 or right after boot.
Do you know how to set them up?
Comment 13 Antara 2020-11-03 11:56:43 UTC
Hi Kai-Heng Feng,

I tried to reproduce with AX200 but was not able to reproduce this scenario.

I suggest you follow below steps
1. start FW dump collection using below command
echo 1 > /sys/kernel/debug/iwlwifi/0000\:0X\:00.0/iwlmvm/fw_dbg_collect
2. start trace cmd after this
sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e iwlwifi_msg
3. run the failure scenario
4. stop trace cmd to creat trace.dat 
5. collect fw dump with below command
cat /sys/devices/virtual/devcoredump/devcdY/data > iwl.dump
echo 1 > /sys/devices/virtual/devcoredump/devcdY/data

Regards,
Antara
Comment 14 Antara 2020-11-03 12:01:32 UTC
Hi Kai-Heng Feng,

It seems this scenario is passing with old FW version according to below reply from Chen, Matt. can you please confirm.

Hi all,
 
As on-site Canonical Taipei office for this issue verification, we did pass it by taking backport-iwlwifi and 55.ucode for this platform.
It did reproduce the issue in the kernel v5.6 with 48.ucode.
More detail fro https://bugs.launchpad.net/bugs/1892838:
@Kai-Heng, FYI. We did do some comparison tests while Matt was on-site and noticed a remarked difference in test results. With the driver/firmware[1] that are built in the existing image, we could consistently see the failure within 100 cycles of disconnect/connect stress test, whilst with the driver/firmware pack in comment#70, it could pass 1000+ cycles.
 
[1] v5.6 kerenl
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/iwlwifi-Qu-c0-hr-b0-48.ucode
Comment 15 Kai-Heng Feng 2020-11-03 13:54:32 UTC
(In reply to Antara from comment #13)
> cat /sys/devices/virtual/devcoredump/devcdY/data > iwl.dump
> echo 1 > /sys/devices/f/devcoredump/devcdY/data
 
The dump is always 1460 byte, not matter what I did. Should I add some debug parameter to make firmware dump work?
Comment 16 Kai-Heng Feng 2020-11-03 13:55:16 UTC
(In reply to Antara from comment #14)
> Hi Kai-Heng Feng,
> 
> It seems this scenario is passing with old FW version according to below
> reply from Chen, Matt. can you please confirm.

It's still reproducible on all kernel/firmware combination.
Comment 17 Kai-Heng Feng 2020-11-05 08:07:33 UTC
(In reply to Kai-Heng Feng from comment #15)
> (In reply to Antara from comment #13)
> > cat /sys/devices/virtual/devcoredump/devcdY/data > iwl.dump
> > echo 1 > /sys/devices/f/devcoredump/devcdY/data
>  
> The dump is always 1460 byte, not matter what I did. Should I add some debug
> parameter to make firmware dump work?

On 5.10-rc2:
1) # echo 1 > /sys/kernel/debug/iwlwifi/0000\:00\:14.3/iwlmvm/fw_dbg_collect
2) Run iperf3
3) # cat /sys/devices/virtual/devcoredump/devcd*/data > iwl.dump
4) iwl.dump is still 1460 bytes.
Comment 18 Antara 2020-11-10 08:28:35 UTC
Hi Kai-Heng Feng

Can you please share the FW you are using. I'll try to generate the dump at my end.

Regards,
Antara
Comment 19 Kai-Heng Feng 2020-11-10 09:11:37 UTC
[   37.608368] iwlwifi 0000:00:14.3: loaded firmware version 59.601f3a66.0 QuZ-a0-hr-b0-59.ucode op_mode iwlmvm
Comment 20 Antara 2020-11-18 08:38:34 UTC
Hi Kai-Heng Feng,

I do not have the a0 HW and i was not able to reproduce using b0 HW.
Although I was able to collect the FW dump easily with the instruction given in wiki.

Can you please confrm that you followed the below step mentioned in the wiki:
"First you'll need to allow DEV_COREDUMP by setting CONFIG_ALLOW_DEV_COREDUMP to Y."

I checked my driver workspace and following flags are enabled in the .config in my workspace

CPTCFG_BACKPORTED_BPAUTO_WANT_DEV_COREDUMP=y
CPTCFG_BPAUTO_WANT_DEV_COREDUMP=y
CPTCFG_COREDUMP=y
CPTCFG_WANT_DEV_COREDUMP=y
CPTCFG_ALLOW_DEV_COREDUMP=y
CPTCFG_DEV_COREDUMP=y 

Please try to enable these flags and then try to collect FW dump.

Regards,
Antara
Comment 21 Kai-Heng Feng 2020-11-18 14:11:22 UTC
It's a bit different but they are enabled:
CONFIG_COREDUMP=y
CONFIG_WANT_DEV_COREDUMP=y
CONFIG_ALLOW_DEV_COREDUMP=y
CONFIG_DEV_COREDUMP=y
Comment 22 Kai-Heng Feng 2020-11-19 17:25:30 UTC
Still empty firmware dump with backport-iwlwifi:
$ cat .config | grep COREDUMP
CPTCFG_BPAUTO_WANT_DEV_COREDUMP=y
CPTCFG_COREDUMP=y
CPTCFG_WANT_DEV_COREDUMP=y
CPTCFG_ALLOW_DEV_COREDUMP=y
CPTCFG_DEV_COREDUMP=y
CPTCFG_BACKPORTED_BPAUTO_WANT_DEV_COREDUMP=y
Comment 23 Antara 2020-12-07 11:23:25 UTC
Hi Kai-Heng Feng

Without the FW dump I won't be able to debug any further.
Please try once with the attached FW.

Regards,
Antara
Comment 24 Antara 2020-12-07 11:24:22 UTC
Created attachment 293975 [details]
FW ver 62
Comment 25 Kai-Heng Feng 2020-12-10 05:21:21 UTC
I had to bump IWL_22000_UCODE_API_MAX from 59 to 62, but this issue is still reproducible with FW 62.

And again, it still can't create FW dump.

Have you tried this FW locally and confirmed it can generate FW dump?
Comment 26 Antara 2020-12-11 07:25:13 UTC
Hi Kai-Heng Feng,

I don't have a0 HW available with me. I tried with b0 HW and was able to collect the FW dump properly. I had shared the FW from the same build so it should work.

Regards,
Antara
Comment 27 Kai-Heng Feng 2020-12-11 07:30:16 UTC
Well, what's the difference between a0 and b0 HW? Would it be possible for you to access a0 HW? It's an Intel product after all...

Note You need to log in before you can comment on or make changes to this bug.