Bug 215703 - ath9k frequent connection problems after kernel upgrade.
Summary: ath9k frequent connection problems after kernel upgrade.
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: x86-64 Linux
Importance: P1 high
Assignee: networking_wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-03-19 17:55 UTC by Adilson Dantas
Modified: 2022-04-08 18:43 UTC
CC List: 17 users

See Also:
Kernel Version: 5.16.15 and higher
Tree: Mainline
Regression: No



Description Adilson Dantas 2022-03-19 17:55:53 UTC
I have found problems with an Atheros wireless adapter since I upgraded from kernel 5.16.14 to 5.16.15. The connection became very unstable, even when I was near the Wi-Fi router. I got messages like these:

Mar 19 14:17:51 r2d2 systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Mar 19 14:19:32 r2d2 wpa_supplicant[911]: wlp1s0: CTRL-EVENT-BEACON-LOSS 
Mar 19 14:19:33 r2d2 wpa_supplicant[911]: wlp1s0: CTRL-EVENT-BEACON-LOSS 
Mar 19 14:19:34 r2d2 wpa_supplicant[911]: wlp1s0: CTRL-EVENT-BEACON-LOSS 
Mar 19 14:19:35 r2d2 wpa_supplicant[911]: wlp1s0: CTRL-EVENT-BEACON-LOSS 
Mar 19 14:19:36 r2d2 wpa_supplicant[911]: wlp1s0: CTRL-EVENT-BEACON-LOSS 
Mar 19 14:19:36 r2d2 wpa_supplicant[911]: wlp1s0: CTRL-EVENT-DISCONNECTED bssid=50:d4:f7:71:cc:dc reason=4 locally_generated=1
Mar 19 14:19:36 r2d2 wpa_supplicant[911]: wlp1s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Mar 19 14:19:36 r2d2 NetworkManager[910]: <info>  [1647710376.5175] device (wlp1s0): supplicant interface state: completed -> disconnected

The network controller is a Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter (rev 01) in a Dell notebook.

The same problem occurs with 5.16.16, but everything is normal with 5.16.14 and below.
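For anyone comparing their own logs with the ones above, the pattern here is a run of CTRL-EVENT-BEACON-LOSS events ending in CTRL-EVENT-DISCONNECTED. Below is a minimal sketch of a filter that counts the beacon losses logged before each disconnect. The function name beacon_loss_runs is hypothetical (not part of any tool), and the journalctl invocation assumes wpa_supplicant logs to the systemd journal under the identifier wpa_supplicant, as in the messages above.

```shell
# Hypothetical helper (not from this report): reads log lines on stdin and,
# for every CTRL-EVENT-DISCONNECTED line, prints how many
# CTRL-EVENT-BEACON-LOSS lines were seen since the previous disconnect.
beacon_loss_runs() {
  awk '
    /CTRL-EVENT-BEACON-LOSS/  { losses++ }
    /CTRL-EVENT-DISCONNECTED/ { print losses + 0; losses = 0 }
  '
}

# Example usage, assuming wpa_supplicant logs to the journal with the
# syslog identifier "wpa_supplicant" (as in the messages above):
#   journalctl -b -t wpa_supplicant | beacon_loss_runs
```

Applied to the log in this report, it would print 5 (five beacon losses before the disconnect). It only counts events; it says nothing about their cause.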
Comment 1 p.kerry 2022-03-19 22:12:08 UTC
It's the same issue on an HP laptop with the same ath9k AR9565, which has worked perfectly with the same setup for years.
The same change that affects 5.16.15 was also incorporated into the 5.15 tree: 5.15.28 works, whereas 5.15.29 and 5.15.30 don't.

Regards
Paul
Comment 2 jakalof 2022-03-20 17:54:12 UTC
I have the exact same issue but on LTS kernel 5.15.29

Same module ath9k

Same issue with wpa_supplicant

Not exactly sure when this started happening, but it's definitely very recent, say in the last week or so.

wpa_supplicant sometimes recovers by itself and reconnects, but overall it's a total train wreck.

It seems like heavy load/traffic triggers these events. It's noticeable with light browsing, where page loading slows down substantially, but heavy load brings a total disconnect.

I do think it may be related to signal strength: whatever changed took weak signals that used to work and turned them into signals over which connections are no longer possible.
Comment 3 gzhqyz 2022-03-21 17:30:50 UTC
Similar issue was reported here[1]. Reverting these two commits solves my problem.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
swiotlb: rework "fix info leak with DMA_FROM_DEVICE"

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
swiotlb: fix info leak with DMA_FROM_DEVICE

[1] https://bugs.gentoo.org/835513
Comment 4 Adilson Dantas 2022-03-21 19:58:15 UTC
(In reply to gzhqyz from comment #3)
> Similar issue was reported here[1]. Reverting these two commits solves my
> problem.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
> swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> ?id=ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
> swiotlb: fix info leak with DMA_FROM_DEVICE
> 
> [1] https://bugs.gentoo.org/835513

Kernel 5.17 is also affected by this issue, but applying gzhqyz's solution solves it.
Comment 5 Gerardo Exequiel Pozzi 2022-03-22 02:27:10 UTC
I have an issue that I suspect has the same root cause as this report, but my Wi-Fi card is working in AP mode. It stops responding until hostapd is restarted, then becomes responsive again for a few seconds or minutes at random. When devices try to connect, hostapd does not log anything.

PCIe card with AR9485 (TP-LINK TL-WN781ND V2.2).

Linux 5.16.15 or 5.15.30

I see this message much more frequently: "ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02000020 DMADBG_7=0x00006100"
Comment 6 SlayerProof32 2022-03-22 09:39:32 UTC
I have a similar issue with an AR5B22 [AR9462 + BT 4.0] Wi-Fi card. I haven't had much time to debug, and I don't currently have the computer with me, but I'll upload logs when I can and check whether reverting the commits changes anything.
Comment 7 jakalof 2022-03-22 11:53:48 UTC
(In reply to Adilson Dantas from comment #4)
> (In reply to gzhqyz from comment #3)
> > Similar issue was reported here[1]. Reverting these two commits solves my
> > problem.
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> > ?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
> > swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> > ?id=ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
> > swiotlb: fix info leak with DMA_FROM_DEVICE
> > 
> > [1] https://bugs.gentoo.org/835513
> 
> Kernel 5.17 is also affected by this issue. But applying gzhqyz solution
> solves it.

Thanks @gzhqyz and Adilson Dantas.

I successfully "unpatched" 5.15.30 and the issue went away for me. Much appreciation for your posts.

In case it helps anyone, here is what I did on Arch in the PKGBUILD prepare() function:

  patch -Rp1 -i ../aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13.patch
  patch -Rp1 -i ../ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.patch
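The prepare() lines above expect the two commit patches to sit one directory above the kernel source tree. A sketch of one way to get them there, assuming Arch's makepkg and the cgit patch/ view on git.kernel.org (neither is described in this comment), is to append them to the PKGBUILD source array using makepkg's name::url form. The SKIP checksums are only for brevity.

```shell
# Sketch of PKGBUILD additions (an assumption, not taken from the comment above).
# makepkg's "name::url" form saves each download under the given file name,
# so prepare() can reference it as ../<hash>.patch from inside the source tree.
source+=(
  "aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13.patch::https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13"
  "ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.patch::https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e"
)
# One checksum entry per added source; SKIP disables verification (brevity only).
sha256sums+=('SKIP' 'SKIP')
```

The reverse patches are applied newest first (the rework commit aa6f8dc, then ddbd89d), matching the order above, since the rework was made on top of the original fix. If the PKGBUILD also defines other checksum arrays (for example b2sums), each of them needs matching entries too.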


My wireless device and kernel module, from lspci:

03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
        Subsystem: Qualcomm Atheros Device 3112
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at f7e00000 (64-bit, non-prefetchable) [size=128K]
        Expansion ROM at f7e20000 [disabled] [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: ath9k
        Kernel modules: ath9k
Comment 8 SlayerProof32 2022-03-23 23:59:46 UTC
We are also documenting some of this behavior here: https://bugzilla.kernel.org/show_bug.cgi?id=215698
That bug is specific to the AR9462 but could be showing the same issues.
Comment 9 Justin M. Forbes 2022-03-24 13:09:30 UTC
I would point out that reverting CVE patches is not a long-term fix. The patches, or the driver, need to be fixed.
Comment 10 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-25 09:23:00 UTC
TWIMC, the Linux developers are discussing a patch for mainline that seems to help, in case anyone wants to give it a shot:

https://lore.kernel.org/stable/871qyr9t4e.fsf@toke.dk/
Comment 11 jakalof 2022-03-25 20:10:07 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10)
> TWIMC, the Linux developers are discussing a patch for mainline that seems
> to help, in case anyone wants to give it a shot:
> 
> https://lore.kernel.org/stable/871qyr9t4e.fsf@toke.dk/

Thanks for the efforts.  Much appreciation.


I tried this on 5.16.16 through the Arch Linux build system, with no success :(
Initially there is Wi-Fi, but as soon as I start using it the connection is lost.


$ uname -a
Linux i5 5.16.16-arch1-1-ath9k-test #1 SMP PREEMPT Fri, 25 Mar 2022 17:12:45 +0000 x86_64 GNU/Linux


I had some trouble applying the patch directly and ended up manually applying the
changes to "recv.c". I worry that there are other changes elsewhere, since I am
not working with the same branch as the kernel developers; perhaps that led to the
negative results. However, I still hope this information may be helpful. I
may try again, but since I'm learning all this as I go, it's slow going.


Still getting the same bad vibes from systemd-networkd and wpa_supplicant in
systemctl status, just as in the original report.

Mar 25 13:42:48 i5 systemd-networkd[274]: wlp3s0: Gained carrier
Mar 25 13:44:21 i5 systemd-networkd[274]: wlp3s0: Lost carrier
Mar 25 13:44:24 i5 systemd-networkd[274]: wlp3s0: Connected WiFi access point: my ssid
Mar 25 13:44:24 i5 systemd-networkd[274]: wlp3s0: Gained carrier
Mar 25 13:46:13 i5 systemd-networkd[274]: wlp3s0: Lost carrier
Mar 25 13:46:16 i5 systemd-networkd[274]: wlp3s0: Connected WiFi access point: my ssid
Mar 25 13:46:16 i5 systemd-networkd[274]: wlp3s0: Gained carrier
Mar 25 13:46:30 i5 systemd-networkd[274]: wlp3s0: Lost carrier
Mar 25 13:46:33 i5 systemd-networkd[274]: wlp3s0: Connected WiFi access point: my ssid
Mar 25 13:46:33 i5 systemd-networkd[274]: wlp3s0: Gained carrier

Mar 25 13:46:18 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
Mar 25 13:46:20 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
Mar 25 13:46:22 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
Mar 25 13:46:24 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
Mar 25 13:46:26 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
Mar 25 13:46:28 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
Mar 25 13:46:30 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
Mar 25 13:46:30 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-DISCONNECTED bssid=....

Mar 25 13:46:30 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
Comment 12 jakalof 2022-03-25 23:39:32 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10)
> TWIMC, the Linux developers are discussing a patch for mainline that seems
> to help, in case anyone wants to give it a shot:
> 
> https://lore.kernel.org/stable/871qyr9t4e.fsf@toke.dk/

Thanks for the efforts.  Much appreciation.

My comment 11 was INCORRECT -- it turns out my build (from comment 11) did not have the patch applied.

This time around I successfully applied the patch and the wireless issue is gone.

$ uname -a
Linux i5 5.15.31-1-lts-modprobe #1 SMP Fri, 25 Mar 2022 22:50:02 +0000 x86_64 GNU/Linux

Thanks so much for your work.
Comment 13 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-27 09:41:51 UTC
TWIMC: the change that afaics is causing this regression was reverted in mainline:
https://git.kernel.org/torvalds/c/bddac7c1e02ba47f0570e494c9289acea3062cc1

The revert will likely be backported to the next (due today) or over-next kernel versions released in the affected stable and longterm series.
Comment 14 Adilson Dantas 2022-03-28 14:45:33 UTC
5.17.1 still has this issue. I had to patch it to get Wi-Fi working again.

Maybe this last revert will be available in 5.17.2.
Comment 15 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-03-28 14:51:35 UTC
The revert is queued for 5.17.2 (among others):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.17/revert-swiotlb-rework-fix-info-leak-with-dma_from_device.patch
Comment 16 Nikita 2022-03-30 02:52:49 UTC
On ArchLinux in linux-zen 5.17.1.zen1-1 kernel my adapter started working fine.
Comment 17 p.kerry 2022-03-30 09:19:05 UTC
@Nikita - Arch reverted commit aa6f8dc (see above comment #3) via...
https://github.com/archlinux/linux/commits/v5.17.1-arch1
and then pushed out 5.17.1-arch1-1 and I presume it's the same for the linux-zen kernel you are using.
As far as I can see, this revert hasn't been applied to the Arch lts kernel.
https://bugs.archlinux.org/task/74187
Other distros may/may not choose to revert while waiting for the relevant fixes.
Comment 18 Lucas Lira 2022-03-30 23:26:53 UTC
On Arch Linux with the linux-zen 5.17.1.zen1-1 kernel my adapter started working fine, but after a long time online the connection gets slow...
Comment 19 IAteNoodles-Linux 2022-03-31 04:33:56 UTC
I had the same problem after upgrading to 5.17, so I used the Arch archive repo to get 5.16.14, which doesn't have the regression.
Comment 20 jann_ 2022-03-31 14:01:22 UTC
Hello @IAteNoodles-Linux, could you show the steps to install that kernel file?
Comment 21 IAteNoodles-Linux 2022-04-03 16:15:48 UTC
(In reply to jann_ from comment #20)
> Hello, @IAteNoodles-Linux you could show the steps to install said file
> kernel

Kernel 5.17.1 has this regression fixed, but you can read about the Arch archive repo on the wiki. The general form is:


pacman -U https://archive.archlinux.org/packages/path/packagename.pkg.tar.xz

So, what I did was:


pacman -U https://archive.archlinux.org/packages/l/linux/linux-5.16.14.arch1-1-x86_64.pkg.tar.zst

Sorry for this weird post, I couldn't format the commands.
Comment 22 jann_ 2022-04-03 20:46:39 UTC
@IAteNoodles-Linux thank you very much, this helped me a lot.
Comment 23 Nicolás Adamo 2022-04-08 00:13:49 UTC
Kernel 5.15.32-1-lts still shows this bug. Does anyone know which LTS version will fix that regression?
Comment 24 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-04-08 05:35:00 UTC
(In reply to Nicolás Adamo from comment #23)
> Kernel 5.15.32-1-lts still shows this bug. Does anyone know which LTS
> version will fix that regression?

See comment 13: the revert only made it into the "overnext" release of the various series. In the 5.15.y case that is 5.15.33, which should be out soon:

https://lore.kernel.org/stable/75067e12-9501-8603-b008-24284f47f8c0@w6rz.net/
Comment 25 Adilson Dantas 2022-04-08 18:43:59 UTC
No problems found with the recent 5.17.2.

Since this regression was fixed I will mark this bug as resolved.
