Bug 215703
Summary: | ath9k frequent connection problems after kernel upgrade. | ||
---|---|---|---|
Product: | Networking | Reporter: | Adilson Dantas (adilson) |
Component: | Wireless | Assignee: | networking_wireless (networking_wireless) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | 7opb228k3, abc, aivaraslaimikis, daniel.calcoen, gzhqyz, jakalof, jmforbes, kernel, kortrax11, lucasgta95, nation.koteyka, nicoadamo, p.kerry, regressions, rotering, sugey90210, vmlinuz386, webreg |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 5.16.15 and higher | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Adilson Dantas
2022-03-19 17:55:53 UTC
It's the same issue on an HP laptop with the same ath9k AR9565, which has worked perfectly with the same setup for years. The same change that affects 5.16.15 was also incorporated into the 5.15 tree: 5.15.28 works, whereas 5.15.29 and 5.15.30 don't.

Regards
Paul

I have the exact same issue, but on LTS kernel 5.15.29.
Same module: ath9k.
Same issue with wpa_supplicant.
Not exactly sure when this started happening, but it's definitely very recent; say in the last week or so. wpa_supplicant will sometimes recover and reconnect, but overall it's a total train wreck. It seems to be heavy load/traffic that triggers these events. It's noticeable with light browsing, where there are substantial slowdowns in page loading etc., but heavy load brings a total disconnect.

I do think it may be related to signal strength; whatever has changed -- it took weak signals that were working before and turned them into signals over which connections are no longer possible.

A similar issue was reported here[1]. Reverting these two commits solves my problem.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
swiotlb: rework "fix info leak with DMA_FROM_DEVICE"

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
swiotlb: fix info leak with DMA_FROM_DEVICE

[1] https://bugs.gentoo.org/835513

(In reply to gzhqyz from comment #3)
> Similar issue was reported here[1]. Reverting these two commits solves my
> problem.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
> swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
> swiotlb: fix info leak with DMA_FROM_DEVICE
>
> [1] https://bugs.gentoo.org/835513

Kernel 5.17 is also affected by this issue.
But applying gzhqyz's solution solves it.

I have an issue that I guess is related to the same root cause as this report, but my Wi-Fi card is working in AP mode. It stops responding until hostapd is restarted, then becomes responsive again for a few random seconds or minutes. When devices try to connect, hostapd does not log anything.
PCIe card with AR9485 (TP-LINK TL-WN781ND V2.2).
Linux 5.16.15 or 5.15.30.
I can see much more frequent messages of:
"ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02000020 DMADBG_7=0x00006100"

I have a similar issue with an AR5B22 [Ar9462 + BT4.0] wifi card. I didn’t have much time to debug and I currently don’t have the computer with me, but I’ll upload logs when I can and also check whether reverting the commits changes anything.

(In reply to Adilson Dantas from comment #4)
> (In reply to gzhqyz from comment #3)
> > Similar issue was reported here[1]. Reverting these two commits solves my
> > problem.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13
> > swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e
> > swiotlb: fix info leak with DMA_FROM_DEVICE
> >
> > [1] https://bugs.gentoo.org/835513
>
> Kernel 5.17 is also affected by this issue. But applying gzhqyz solution
> solves it.

Thanks @gzhqyz and Adilson Dantas. I successfully "unpatched" 5.15.30 and the issue went away for me. Much appreciation for your posts.
If this will help anyone, here is what I did on Arch in PKGBUILD prepare():

    patch -Rp1 -i ../aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13.patch
    patch -Rp1 -i ../ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e.patch

My wireless device and module (lspci):

    03:00.0 Network controller: Qualcomm Atheros AR93xx Wireless Network Adapter (rev 01)
        Subsystem: Qualcomm Atheros Device 3112
        Flags: bus master, fast devsel, latency 0, IRQ 19
        Memory at f7e00000 (64-bit, non-prefetchable) [size=128K]
        Expansion ROM at f7e20000 [disabled] [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: ath9k
        Kernel modules: ath9k

We are also documenting some of this behavior over here: https://bugzilla.kernel.org/show_bug.cgi?id=215698
That bug is specific to the Ar9462 but could be showing the same issues. I would point out that reverting CVE patches is not a long-term fix. The patches or the driver need to be fixed.

TWIMC, the Linux developers are discussing a patch for mainline that seems to help, in case anyone wants to give it a shot:

https://lore.kernel.org/stable/871qyr9t4e.fsf@toke.dk/

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10)
> TWIMC, the Linux developers are discussing a patch for mainline that seems
> to help, in case anyone wants to give it a shot:
>
> https://lore.kernel.org/stable/871qyr9t4e.fsf@toke.dk/

Thanks for the efforts. Much appreciation.

I tried this with no success :( on 5.16.16 through the Arch Linux build system. Initially there is wifi, but as soon as I start using it the connection is lost.

    $ uname -a
    Linux i5 5.16.16-arch1-1-ath9k-test #1 SMP PREEMPT Fri, 25 Mar 2022 17:12:45 +0000 x86_64 GNU/Linux

I had some trouble applying the patch directly and ended up manually applying the changes to "recv.c". I worry that there are other changes elsewhere, since I am not working with the same branch as the kernel developers; perhaps that caused the negative result. However, I'm still hoping this information may be helpful.
I may try again, but since I'm learning all this as I go, it's slow going.

Still getting the same bad vibes from the network and wpa_supplicant through systemctl status, just as in the OP.

    Mar 25 13:42:48 i5 systemd-networkd[274]: wlp3s0: Gained carrier
    Mar 25 13:44:21 i5 systemd-networkd[274]: wlp3s0: Lost carrier
    Mar 25 13:44:24 i5 systemd-networkd[274]: wlp3s0: Connected WiFi access point: my ssid
    Mar 25 13:44:24 i5 systemd-networkd[274]: wlp3s0: Gained carrier
    Mar 25 13:46:13 i5 systemd-networkd[274]: wlp3s0: Lost carrier
    Mar 25 13:46:16 i5 systemd-networkd[274]: wlp3s0: Connected WiFi access point: my ssid
    Mar 25 13:46:16 i5 systemd-networkd[274]: wlp3s0: Gained carrier
    Mar 25 13:46:30 i5 systemd-networkd[274]: wlp3s0: Lost carrier
    Mar 25 13:46:33 i5 systemd-networkd[274]: wlp3s0: Connected WiFi access point: my ssid
    Mar 25 13:46:33 i5 systemd-networkd[274]: wlp3s0: Gained carrier

    Mar 25 13:46:18 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
    Mar 25 13:46:20 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
    Mar 25 13:46:22 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
    Mar 25 13:46:24 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
    Mar 25 13:46:26 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
    Mar 25 13:46:28 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
    Mar 25 13:46:30 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-BEACON-LOSS
    Mar 25 13:46:30 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-DISCONNECTED bssid=....
    Mar 25 13:46:30 i5 wpa_supplicant[412]: wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10)
> TWIMC, the Linux developers are discussing a patch for mainline that seems
> to help, in case anyone wants to give it a shot:
>
> https://lore.kernel.org/stable/871qyr9t4e.fsf@toke.dk/

Thanks for the efforts. Much appreciation.

My comment 11 was INCORRECT -- it turns out the build described in comment 11 did not have the patch applied.
This time around I successfully applied the patch and the wireless issue is gone.

    $ uname -a
    Linux i5 5.15.31-1-lts-modprobe #1 SMP Fri, 25 Mar 2022 22:50:02 +0000 x86_64 GNU/Linux

Thanks so much for your work.

TWIMC: the change that afaics is causing this regression was reverted in mainline:

https://git.kernel.org/torvalds/c/bddac7c1e02ba47f0570e494c9289acea3062cc1

The revert will likely be backported to the next (due today) or over-next kernel versions released in affected stable and long series.

5.17.1 still has this issue. I had to patch it to get Wifi working again. Maybe this last revert will be available in 5.17.2.

The revert is queued for 5.17.2 (among others):
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.17/revert-swiotlb-rework-fix-info-leak-with-dma_from_device.patch

On ArchLinux, with the linux-zen 5.17.1.zen1-1 kernel, my adapter started working fine.

@Nikita - Arch reverted commit aa6f8dc (see above comment #3) via...

https://github.com/archlinux/linux/commits/v5.17.1-arch1

and then pushed out 5.17.1-arch1-1, and I presume it's the same for the linux-zen kernel you are using.

As far as I can see, this revert hasn't been applied to the Arch lts kernel.
https://bugs.archlinux.org/task/74187

Other distros may or may not choose to revert while waiting for the relevant fixes.

On ArchLinux, with the linux-zen 5.17.1.zen1-1 kernel, my adapter started working fine, but after a long time online the connection gets slow...

I had the same problem after upgrading to 5.17. I just used the Arch archive repo to get 5.16.14, which didn't have the regression.

Hello, @IAteNoodles-Linux, could you show the steps to install said kernel file?

(In reply to jann_ from comment #20)
> Hello, @IAteNoodles-Linux you could show the steps to install said file
> kernel

The kernel 5.17.1 has this regression fixed, but you can read about the Arch archive repo at the wiki.
    pacman -U https://archive.archlinux.org/packages/path/packagename.pkg.tar.xz

So, what I did was:

    pacman -U https://archive.archlinux.org/packages/l/linux/linux-5.16.14.arch1-1-x86_64.pkg.tar.zst

Sorry for this weird post, I couldn't format the commands.

@IAteNoodles-Linux thank you very much, this helped me a lot.

Kernel 5.15.32-1-lts still shows this bug. Does anyone know which LTS version will fix that regression?

(In reply to Nicolás Adamo from comment #23)
> Kernel 5.15.32-1-lts still shows this bug. Does anyone know which LTS
> version will fix that regression?

See comment 13; it just became the "overnext" release of various series. In the 5.15.y case that is 5.15.33, which should be out soon:
https://lore.kernel.org/stable/75067e12-9501-8603-b008-24284f47f8c0@w6rz.net/

No problems found with the recent 5.17.2. Since this regression was fixed, I will mark this bug as resolved.

At least for my hardware it is not resolved.

My hardware: laptop Asus N751JK
Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)
Subsystem: AzureWave AR9462 Wireless Network Adapter

I tested Ubuntu Mainline 5.17.4 and 5.17.5 and the problem persists.
https://bugzilla.kernel.org/show_bug.cgi?id=215698

I'm trying to bisect between 5.15.4 and 5.15.5, where the problem appears for my hardware for the first time (and I need help with that: https://askubuntu.com/questions/1411565/which-repository-should-i-use-to-build-ubuntu-kernel-and-how-should-i-build-to-b?noredirect=1#comment2453262_1411565 )

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1970965

Unfortunately, with my limited knowledge I was not able to bisect the Ubuntu kernel version, but I then installed Mainline Kernel 5.17.11 and the problem has disappeared.
https://askubuntu.com/questions/1411565/which-repository-should-i-use-to-build-ubuntu-kernel-and-how-should-i-build-to-b
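
For anyone trying to place their own kernel against the reports collected above, the version boundaries mentioned in this thread can be sketched as a small shell helper. This is only an illustration: the function name `ath9k_swiotlb_status` is made up here, the boundaries are taken solely from the comments in this bug (5.15.28 and 5.16.14 reported good; 5.15.29 to 5.15.32, 5.16.15 and later, and 5.17.0 to 5.17.1 reported bad; 5.17.2 reported good; 5.15.33 only expected to carry the revert), and distribution kernels may differ because they can carry their own reverts, as Arch did for 5.17.1-arch1. The separate AR9462 reports tracked in bug 215698 are not covered.

```shell
#!/bin/sh
# Hypothetical helper: classify an upstream kernel version string against
# the reports in this bug. Boundaries are assumptions drawn from the thread,
# not from an authoritative list of affected releases.
ath9k_swiotlb_status() {
    # Drop local suffixes such as "-arch1-1" or "-1-lts", then split fields.
    ver=${1%%-*}
    major=$(echo "$ver" | cut -d. -f1)
    minor=$(echo "$ver" | cut -d. -f2)
    patch=$(echo "$ver" | cut -d. -f3)
    patch=${patch:-0}   # "5.17" is treated as 5.17.0

    case "$major.$minor" in
        5.15)
            if [ "$patch" -le 28 ]; then echo "unaffected"
            elif [ "$patch" -le 32 ]; then echo "affected"
            else echo "fixed"   # 5.15.33 was expected to carry the revert
            fi ;;
        5.16)
            # The thread does not name a fixed 5.16.y release.
            if [ "$patch" -le 14 ]; then echo "unaffected"
            else echo "affected"
            fi ;;
        5.17)
            if [ "$patch" -le 1 ]; then echo "affected"
            else echo "fixed"
            fi ;;
        *)
            echo "unknown" ;;
    esac
}
```

It could be called as `ath9k_swiotlb_status "$(uname -r)"`. A result of "affected" only means the upstream release was reported bad here; a distribution build of that version may already include the revert.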