Bug 215698 - ath9k —> Wifi randomly disconnects [Ar5B22/AR9462]
Summary: ath9k —> Wifi randomly disconnects [Ar5B22/AR9462]
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-18 08:54 UTC by SlayerProof32
Modified: 2022-04-29 14:00 UTC (History)
8 users (show)

See Also:
Kernel Version: 5.16.5 (latest)
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description SlayerProof32 2022-03-18 08:54:45 UTC
When connecting to wifi, my AR9462 wifi chip disconnects randomly. First after about 2 to 3 minutes of use, then every 1 or so minutes afterwards. This has happened on multiple wifi networks (public and private), multiple pieces of the same hardware (I once had 3 of these laying around and all exhibited this exact issue) and multiple distros (Ubuntu, Manjaro, Endeavouros, Garuda, Arch). I have noticed this bug since around 2019 and am not sure if it existed before then. 

Here is the relevant dmesg output

'''[ 2230.918630] wlp4s0: authenticate with d6:32:e5:3a:6a:a3
[ 2230.928671] wlp4s0: send auth to d6:32:e5:3a:6a:a3 (try 1/3)
[ 2230.955079] wlp4s0: authenticated
[ 2230.957070] wlp4s0: associate with d6:32:e5:3a:6a:a3 (try 1/3)
[ 2231.075076] wlp4s0: associate with d6:32:e5:3a:6a:a3 (try 2/3)
[ 2231.146174] wlp4s0: RX AssocResp from d6:32:e5:3a:6a:a3 (capab=0x1431 status=0 aid=3)
[ 2231.146259] wlp4s0: associated
[ 2231.146365] ath: EEPROM regdomain: 0x8348
[ 2231.146366] ath: EEPROM indicates we should expect a country code
[ 2231.146367] ath: doing EEPROM country->regdmn map search
[ 2231.146368] ath: country maps to regdmn code: 0x3a
[ 2231.146369] ath: Country alpha2 being used: US
[ 2231.146370] ath: Regpair used: 0x3a
[ 2231.146371] ath: regdomain 0x8348 dynamically updated by country element
[ 2231.176910] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by d6:32:e5:3a:6a:a3
[ 2234.055472] wlp4s0: authenticate with ba:95:75:6f:29:bb
[ 2234.066784] wlp4s0: send auth to ba:95:75:6f:29:bb (try 1/3)
[ 2234.118290] wlp4s0: authenticated
[ 2234.118919] wlp4s0: associate with ba:95:75:6f:29:bb (try 1/3)
[ 2234.226910] wlp4s0: associate with ba:95:75:6f:29:bb (try 2/3)
[ 2234.291656] wlp4s0: RX AssocResp from ba:95:75:6f:29:bb (capab=0x1431 status=0 aid=1)
[ 2234.291736] wlp4s0: associated
[ 2234.291804] ath: EEPROM regdomain: 0x8348
[ 2234.291805] ath: EEPROM indicates we should expect a country code
[ 2234.291806] ath: doing EEPROM country->regdmn map search
[ 2234.291806] ath: country maps to regdmn code: 0x3a
[ 2234.291807] ath: Country alpha2 being used: US
[ 2234.291808] ath: Regpair used: 0x3a
[ 2234.291808] ath: regdomain 0x8348 dynamically updated by country element
[ 2234.511544] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by ba:95:75:6f:29:bb
[ 2236.565234] kauditd_printk_skb: 1 callbacks suppressed
[ 2236.565236] audit: type=1130 audit(1647592179.050:420): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 2243.953987] wlp4s0: authenticate with d6:32:e5:3a:6a:a3
[ 2243.964052] wlp4s0: send auth to d6:32:e5:3a:6a:a3 (try 1/3)
[ 2244.130437] wlp4s0: send auth to d6:32:e5:3a:6a:a3 (try 2/3)
[ 2244.150424] wlp4s0: authenticated
[ 2244.152496] wlp4s0: associate with d6:32:e5:3a:6a:a3 (try 1/3)
[ 2244.258458] wlp4s0: associate with d6:32:e5:3a:6a:a3 (try 2/3)
[ 2244.334186] wlp4s0: RX AssocResp from d6:32:e5:3a:6a:a3 (capab=0x1431 status=0 aid=3)
[ 2244.334276] wlp4s0: associated
[ 2244.334358] ath: EEPROM regdomain: 0x8348
[ 2244.334360] ath: EEPROM indicates we should expect a country code
[ 2244.334361] ath: doing EEPROM country->regdmn map search
[ 2244.334361] ath: country maps to regdmn code: 0x3a
[ 2244.334362] ath: Country alpha2 being used: US
[ 2244.334363] ath: Regpair used: 0x3a
[ 2244.334364] ath: regdomain 0x8348 dynamically updated by country element
[ 2244.390100] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by d6:32:e5:3a:6a:a3
[ 2246.596982] audit: type=1131 audit(1647592189.083:421): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 2246.929857] wlp4s0: authenticate with ba:95:75:6f:45:68
'''

This exact same sequence of messages occurs in my dmesg 6 or 7 times after only have been booted up long enough to write this bug report. It gets worse with heavy netowrk usage.


Heres hwinfo --netcard 

0: PCI 400.0: 0282 WLAN controller
[Created at pci.386]
Unique ID: YVtp.3rLvyknhi0A
Parent ID: 0Rrv.EfzaRdvLjy5
SysFS ID: /devices/pci0000:00/0000:00:1c.7/0000:04:00.0
SysFS BusID: 0000:04:00.0
Hardware Class: network
Model: "Qualcomm Atheros AR9462 Wireless Network Adapter"
Vendor: pci 0x168c "Qualcomm Atheros"
Device: pci 0x0034 "AR9462 Wireless Network Adapter"
SubVendor: pci 0x1028 "Dell"
SubDevice: pci 0x020a
Revision: 0x01
Driver: "ath9k"
Driver Modules: "ath9k", "ath9k"
Device File: wlp4s0
Features: WLAN
Memory Range: 0xf7100000-0xf717ffff (rw,non-prefetchable)
Memory Range: 0xf7180000-0xf718ffff (ro,non-prefetchable,disabled)
IRQ: 19 (14653398 events)
HW Address: 6a:4e:f8:70:84:de
Permanent HW Address: 74:29:af:5a:26:c9
Link detected: no
WLAN channels: 1 2 3 4 5 6 7 8 9 10 11 36 40 44 48 52 56 60 64 100 104 108 112 116 120 124 128 132 136 140 149 153
WLAN frequencies: 2.412 2.417 2.422 2.427 2.432 2.437 2.442 2.447 2.452 2.457 2.462 5.18 5.2 5.22 5.24 5.26 5.28 5.3 5.32 5.5 5.52 5.54 5.56 5.58 5.6 5.62 5.64 5.66 5.68 5.7 5.745 5.765
WLAN encryption modes: WEP40 WEP104 TKIP CCMP
WLAN authentication modes: open sharedkey wpa-psk wpa-eap
Module Alias: "pci:v0000168Cd00000034sv00001028sd0000020Abc02sc80i00"
Driver Info #0:
Driver Status: ath9k is active
Driver Activation Cmd: "modprobe ath9k"
Config Status: cfg=new, avail=yes, need=no, active=unknown
Attached to: #11 (PCI bridge)
Comment 1 SlayerProof32 2022-03-18 09:12:45 UTC
Dmesg after restricting card to a specific BSSID. This doesn't fix anything.

[ 4701.065410] wlp4s0: authenticate with ba:95:75:6f:45:68
[ 4701.075319] wlp4s0: send auth to ba:95:75:6f:45:68 (try 1/3)
[ 4701.666707] wlp4s0: send auth to ba:95:75:6f:45:68 (try 2/3)
[ 4701.709739] wlp4s0: authenticated
[ 4701.710688] wlp4s0: associate with ba:95:75:6f:45:68 (try 1/3)
[ 4701.818738] wlp4s0: associate with ba:95:75:6f:45:68 (try 2/3)
[ 4701.886129] wlp4s0: RX AssocResp from ba:95:75:6f:45:68 (capab=0x1431 status=0 aid=1)
[ 4701.886240] wlp4s0: associated
[ 4701.886314] ath: EEPROM regdomain: 0x8348
[ 4701.886315] ath: EEPROM indicates we should expect a country code
[ 4701.886316] ath: doing EEPROM country->regdmn map search
[ 4701.886316] ath: country maps to regdmn code: 0x3a
[ 4701.886317] ath: Country alpha2 being used: US
[ 4701.886318] ath: Regpair used: 0x3a
[ 4701.886319] ath: regdomain 0x8348 dynamically updated by country element
[ 4702.287208] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by ba:95:75:6f:45:68
[ 4704.161277] kauditd_printk_skb: 1 callbacks suppressed
[ 4704.161280] audit: type=1130 audit(1647594646.727:672): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 4705.945121] wlp4s0: authenticate with ba:95:75:6f:45:68
[ 4705.955124] wlp4s0: send auth to ba:95:75:6f:45:68 (try 1/3)
[ 4706.011160] wlp4s0: authenticated
[ 4706.011528] wlp4s0: associate with ba:95:75:6f:45:68 (try 1/3)
[ 4706.114582] wlp4s0: associate with ba:95:75:6f:45:68 (try 2/3)
[ 4706.188402] wlp4s0: RX AssocResp from ba:95:75:6f:45:68 (capab=0x1431 status=0 aid=1)
[ 4706.188535] wlp4s0: associated
[ 4706.188623] ath: EEPROM regdomain: 0x8348
[ 4706.188625] ath: EEPROM indicates we should expect a country code
[ 4706.188626] ath: doing EEPROM country->regdmn map search
[ 4706.188627] ath: country maps to regdmn code: 0x3a
[ 4706.188628] ath: Country alpha2 being used: US
[ 4706.188629] ath: Regpair used: 0x3a
[ 4706.188630] ath: regdomain 0x8348 dynamically updated by country element
[ 4706.280681] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by ba:95:75:6f:45:68
[ 4708.819971] wlp4s0: authenticate with ba:95:75:6f:45:68
[ 4708.830015] wlp4s0: send auth to ba:95:75:6f:45:68 (try 1/3)
[ 4708.970474] wlp4s0: send auth to ba:95:75:6f:45:68 (try 2/3)
[ 4708.977999] wlp4s0: authenticated
[ 4708.978416] wlp4s0: associate with ba:95:75:6f:45:68 (try 1/3)
[ 4709.082470] wlp4s0: associate with ba:95:75:6f:45:68 (try 2/3)
[ 4709.154774] wlp4s0: RX AssocResp from ba:95:75:6f:45:68 (capab=0x1431 status=0 aid=1)
[ 4709.154875] wlp4s0: associated
[ 4709.154945] ath: EEPROM regdomain: 0x8348
[ 4709.154946] ath: EEPROM indicates we should expect a country code
[ 4709.154947] ath: doing EEPROM country->regdmn map search
[ 4709.154947] ath: country maps to regdmn code: 0x3a
[ 4709.154948] ath: Country alpha2 being used: US
[ 4709.154949] ath: Regpair used: 0x3a
[ 4709.154949] ath: regdomain 0x8348 dynamically updated by country element
[ 4709.250440] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by ba:95:75:6f:45:68
[ 4716.017853] audit: type=1101 audit(1647594658.583:673): pid=114559 uid=1000 auid=1000 ses=2 msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="family" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
[ 4716.017971] audit: type=1110 audit(1647594658.583:674): pid=114559 uid=1000 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1res=success'
[ 4716.019498] audit: type=1105 audit(1647594658.585:675): pid=114559 uid=1000 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/d
Comment 2 SlayerProof32 2022-03-18 20:01:14 UTC
Update: I tried loading with nohwcrypt=1 and that did not fix the issue
Comment 3 alexander 2022-03-21 13:01:59 UTC
The most strange thing I see here is, that you only have authentication/association messages without any de-authentication line in between. And how you can get any data transmitted with only seconds between the associations also remains an question...

Can you proved debug logs from wpa_supplicant, so we can see what it's going on here?

Assuming you use wpa_supplicant with dbus, the most common combination:
sudo dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 \
/fi/w1/wpa_supplicant1 \
org.freedesktop.DBus.Properties.Set \
string:fi.w1.wpa_supplicant1 \
string:DebugLevel \
variant:string:"msgdump"

That will write tons of debug information into the log and we should see better what's going on here.
Comment 4 gzhqyz 2022-03-21 17:36:53 UTC
Hi, @SlayerProof32 

The latest release of 5.16 kernel series should be 5.16.15 when you reported, not 5.16.5.

It could be a regression if you are using 5.16.15, see https://bugzilla.kernel.org/show_bug.cgi?id=215703
Comment 5 SlayerProof32 2022-03-22 09:35:09 UTC
I’m currently not at the computer I used to test this, however if a cause is not determined by the time I get back, I will upload the wpa_supplicant logs ASAP.
Comment 6 SlayerProof32 2022-03-22 09:36:03 UTC
Is it possible you can change the status to [needinfo] to reflect the fact that I must upload the logs before any progress can be made.
Comment 7 Esteban López 2022-03-23 22:29:49 UTC
I get the same kind of repetitive errors and disconnections in kernels 5.15.31 and 5.17.0 and using iwd. Here is one of them:

[ +31.434515] wlan0: authenticate with 98:42:65:62:5c:e4
[  +0.014544] wlan0: send auth to 98:42:65:62:5c:e4 (try 1/3)
[  +0.001078] wlan0: authenticated
[  +0.001648] wlan0: associate with 98:42:65:62:5c:e4 (try 1/3)
[  +0.001387] wlan0: RX AssocResp from 98:42:65:62:5c:e4 (capab=0x11 status=0 aid=1)
[  +0.000137] wlan0: associated
[  +0.004503] ath: EEPROM regdomain: 0x82d4
[  +0.000005] ath: EEPROM indicates we should expect a country code
[  +0.000001] ath: doing EEPROM country->regdmn map search
[  +0.000000] ath: country maps to regdmn code: 0x37
[  +0.000001] ath: Country alpha2 being used: ES
[  +0.000001] ath: Regpair used: 0x37
[  +0.000000] ath: regdomain 0x82d4 dynamically updated by country element
[  +0.000416] wlan0: Limiting TX power to 23 (23 - 0) dBm as advertised by 98:42:65:62:5c:e4

I also have an Atheros AR9462 card.
Comment 8 Esteban López 2022-03-23 22:49:07 UTC
Also, at kernel version 5.15.26 this doesn't happen, I started noticing this issue when I updated to 5.15.31 and 5.17.0
Comment 9 SlayerProof32 2022-03-23 23:58:30 UTC
Is it possible you can run 

sudo dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 \
/fi/w1/wpa_supplicant1 \
org.freedesktop.DBus.Properties.Set \
string:fi.w1.wpa_supplicant1 \
string:DebugLevel \
variant:string:"msgdump" >wpa_supplicant.log and upload?

It may help show what the issue is. I will run it once I have access to the computer I put that chip in again.

Also check out this bug 

https://bugzilla.kernel.org/show_bug.cgi?id=215703
Comment 10 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-25 09:22:48 UTC
TWIMC, the Linux developers are discussing a patch for mainline that seems to help, in case anyone wants to give it a shot:

https://lore.kernel.org/stable/871qyr9t4e.fsf@toke.dk/
Comment 11 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-27 09:41:56 UTC
TWIMC: the change that afaics is causing this regression was reverted in mainline:
https://git.kernel.org/torvalds/c/bddac7c1e02ba47f0570e494c9289acea3062cc1

The revert will likely be backported to the next (due today) or over-next kernel versions released in affected stable and long series.
Comment 12 Deposite Pirate 2022-03-28 08:00:20 UTC
I can confirm this problem with M.2 "05:00.0 Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)". I had the problem a while ago, then it disappeared for months and now it's back since about two weeks.
Comment 13 Daniel Calcoen 2022-04-28 22:30:28 UTC
my hardware : laptop Asus N751JK Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01) Subsystem: AzureWave AR9462 Wireless Network Adapter

I was runing Ubuntu 20.10, 21.04 up to Ubuntu 21.10 with Kernel 5.13.0 without any problem.
When migrated to Ubuntu 22.04 which installs Kernel 5.15 then the problems started.
I upgraded and tested unsuccessfully the following Kernels : 5.15.28, 5.15.36, 5.16.14, 5.16.20, 5.17.4, 5.17.5

I disabled the power save as explained at https://askubuntu.com/questions/1403773/22-04-wifi-keeps-disconnecting-for-a-few-seconds-frequently/1404471#1404471
without improvement in any of the Kernels tested.
Also the "intel_iommu=off" mentioned at https://groups.google.com/g/linux.debian.kernel/c/RFpPIp0cncA/m/v9ELDxLICgAJ Bug#994590 didn't help.

I downgraded and currently running without any problem using the Kernel 5.13.19-051319-generic.

There is no problem in my airpoint (router) nor my internet access, I have other different computers and devices working correctly on wifi.
In addition my N751JK works correctly when using the Ethernet adapter for the physical cable.
Comment 14 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-04-29 05:49:12 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #11)
> TWIMC: the change that afaics is causing this regression was reverted in
> mainline:
> https://git.kernel.org/torvalds/c/bddac7c1e02ba47f0570e494c9289acea3062cc1

FWIW, this ticket afaics gets really confusing. It's unclear if some of the comments here are about the problem discussed in https://bugzilla.kernel.org/show_bug.cgi?id=215703 that is fixed by the patch mentioned in the quote above and present in the latest kernels series that were affected. Then there are comments about problems with heavily modified distro kernels (like the one in Ubuntu), which often are more confusing then helpful, because the upstream developers don't know if the problem might be caused by one of the many changes the distributor applied to their kernel. That's why the front-page of this server states that issues with such kernels need to be filed in the distros bug tracker.

In the end I'd suggest to close this ticket. If there is someone that still has trouble with random disconnects on the latest vanilla kernel I'd suggest to file a new ticket and drop a link here, so those that others still affected can follow there and provide a update. It would also help to clarify in that ticket when this problem started to happen (ideally with a bisection to identify the change that causes this), as developers have to fix such identified regressions quickly.
Comment 15 Daniel Calcoen 2022-04-29 09:16:32 UTC
as suggested I open ticket https://bugzilla.kernel.org/show_bug.cgi?id=215918
dedicated to collect only for Ubuntu flavor of kernels
Comment 16 Deposite Pirate 2022-04-29 13:56:35 UTC
Hello, I can confirm that the problem is fixed with Linux 5.16.20. Thank you very much for fixing it.
Comment 17 Daniel Calcoen 2022-04-29 14:00:01 UTC
for the Ubuntu kernel 5.16.20 continue to show the problem
follow up at https://bugzilla.kernel.org/show_bug.cgi?id=215918

Note You need to log in before you can comment on or make changes to this bug.