Created attachment 303752 [details] kernel log (dmesg) during device association

I have a “Qualcomm Technologies, Inc QCN6024/9024/9074 Wireless Network Adapter (rev 01)” connected via a Thunderbolt adapter. Running hostapd with it appears to work fine, but all client authentication requests fail. The failures coincide with I/O page faults reported in dmesg and with hostapd transmission retries and timeouts.
Created attachment 303753 [details] hostapd log (timeout) during device association
Created attachment 303754 [details] lspci -kvvv entry
Created attachment 303755 [details] lspci -tv
Created attachment 303756 [details] boltctl
Created attachment 303757 [details] hostapd.conf (uncommented lines only)
Further notes: The same problem occurs with both crypto_mode=0 and crypto_mode=1 (hardware vs software). I tried around 100 configurations (removing various 802.11ac and 802.11ax options, trying different channels, minimizing hostapd.conf, etc.), but the problem stayed exactly the same: the network is visible and can be found, but nothing can connect.
Created attachment 303758 [details] iw list (parts pertaining to this particular device)
Created attachment 303759 [details] dmesg when ath11k_pci is loaded

The device firmware comes from the package 20230210.bf4115c-1 on ArchLinux. It probably consists of the files in /lib/firmware/ath11k/QCN9074/hw1.0.
Posting the kernel and hostapd log also outside attachments (for search engines to find).

This is (what appears to be) the ath11k_pci driver bug exposed on a hostapd-based server during WiFi client association + authentication:

Feb 18 19:25:15 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0xb2a000080 flags=0x0020]
Feb 18 19:25:15 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x483040300 flags=0x0020]
Feb 18 19:25:15 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e000a0 flags=0x0020]
Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x483040340 flags=0x0020]
Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:24 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e000c0 flags=0x0020]
Feb 18 19:25:24 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x483040380 flags=0x0020]
Feb 18 19:25:24 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:36 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e000e0 flags=0x0020]
Feb 18 19:25:36 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x4830403c0 flags=0x0020]
Feb 18 19:25:36 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:56 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e00100 flags=0x0020]
Feb 18 19:26:16 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e00120 flags=0x0020]

The same event as seen by hostapd, which at some point is unable to communicate with (or get a response from) the device:

Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: authenticated
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: authenticated
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA-OPMODE-N_SS-CHANGED d4:3a:2c:b7:37:13 2
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: associated (aid 1)
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: associated (aid 1)
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-STARTED d4:3a:2c:b7:37:13
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Feb 18 19:25:18 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:25:24 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:25:36 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:25:56 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:26:16 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:26:36 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:26:36 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-TIMEOUT-FAILURE d4:3a:2c:b7:37:13
Feb 18 19:26:41 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: deauthenticated due to local deauth request
Feb 18 19:26:41 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: deauthenticated due to local deauth request

Devices can see hostapd’s SSID and (attempt to) associate with it. So the transmitters on the device do have their required extra 5V power, the device itself works in general, and it is identified as WiFi6 etc. If it weren’t for the page faults, it would most likely just work.
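To make the claimed correlation between the two logs easier to check, here is a minimal Python sketch that matches the timestamps of IO_PAGE_FAULT events against hostapd’s CTRL-EVENT-EAP-RETRANSMIT events. The function names and the year are assumptions made for illustration (syslog-style timestamps carry no year); the sketch only relies on the line format visible in the logs above.

```python
import re
from datetime import datetime

# Syslog-style prefix as seen in the logs above, e.g.
# "Feb 18 19:25:15 kernel: ..." or "Feb 18 19:25:18 hostapd[3702212]: ..."
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+?): (.*)$")


def parse_time(stamp, year=2023):
    # The logs contain no year; 2023 is an assumption for illustration only.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def event_times(lines, source_prefix, marker):
    """Collect the distinct timestamps of lines from one source containing a marker."""
    times = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(2).startswith(source_prefix) and marker in m.group(3):
            times.add(parse_time(m.group(1)))
    return times


def correlate(kernel_lines, hostapd_lines):
    """Return (retransmits that coincide with a page fault, retransmits that do not)."""
    faults = event_times(kernel_lines, "kernel", "IO_PAGE_FAULT")
    retries = event_times(hostapd_lines, "hostapd", "CTRL-EVENT-EAP-RETRANSMIT")
    return sorted(faults & retries), sorted(retries - faults)
```

Fed with the two excerpts above, five of the six retransmits share a second with a page fault; only the final one at 19:26:36 has no matching fault. Matching at one-second granularity is coarse, so this supports the correlation but does not prove causation.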
Details about the device:

https://compex.com.sg/shop/wifi-module/802-11ax-wifi-module/pn02-1-wifi6-11ax-qcn6024-qcn9024-qcn9074/

(However, this is the later version of the Compex PN02.1 with a heatsink included and with a slightly smaller PCB; not sure if that matters.)

The device has its additional 5V power supply from the host’s PSU, as required by the specs. The connection goes via a Thunderbolt -> NGFF -> M-key -> E-key “adapter chain”.

The Thunderbolt host is a desktop described (e.g.) here:

https://bbs.archlinux.org/viewtopic.php?id=261303
https://bbs.archlinux.org/viewtopic.php?id=283471

Considering this↑↑↑ context, it might be the case that a Thunderbolt-related issue is (also) to blame here and/or that this bug only occurs when the device is connected over Thunderbolt.
Tried some “no IOMMU” workaround “wisdom” from the web:

iommu=soft: no effect; AMD IOMMU works as usual
amd_iommu=off: won’t boot
amd_iommu=off iommu=soft: won’t boot
iommu=pt: won’t boot

So I’m guessing that disabling the IOMMU is not an option on this system.
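When testing boot parameters like these, it can help to confirm which IOMMU options the running kernel actually received. Below is a small Python sketch for that, assuming the usual Linux /proc/cmdline file; the function names are made up for illustration, and the set of keys it looks for is just the ones mentioned in this report plus intel_iommu.

```python
IOMMU_KEYS = ("iommu", "amd_iommu", "intel_iommu")


def iommu_params(cmdline):
    """Extract IOMMU-related key=value options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in IOMMU_KEYS:
            params[key] = value
    return params


def read_cmdline(path="/proc/cmdline"):
    # /proc/cmdline holds the command line the running kernel was booted with.
    with open(path) as f:
        return f.read().strip()
```

For example, `iommu_params(read_cmdline())` on a system booted with iommu=soft would return `{"iommu": "soft"}`. This only shows what was passed to the kernel, not whether the kernel honored it; the AMD-Vi lines in dmesg remain the authoritative check.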
What is the exact kernel version you are using?

The firmware info from the attachments is this:

Feb 18 19:16:29 kernel: ath11k_pci 0000:3d:00.0: BAR 0: assigned [mem 0xe0a00000-0xe0bfffff 64bit]
Feb 18 19:16:29 kernel: ath11k_pci 0000:3d:00.0: MSI vectors: 16
Feb 18 19:16:29 kernel: ath11k_pci 0000:3d:00.0: qcn9074 hw1.0
Feb 18 19:16:30 kernel: ath11k_pci 0000:3d:00.0: chip_id 0x0 chip_family 0x0 board_id 0xff soc_id 0xffffffff
Feb 18 19:16:30 kernel: ath11k_pci 0000:3d:00.0: fw_version 0x2506844c fw_build_timestamp 2021-07-13 10:24 fw_build_id

Unfortunately the firmware does not provide the version string, but the date looks old. Please try the latest firmware from here:

https://github.com/kvalo/ath11k-firmware/tree/master/QCN9074/hw1.0

Also, can you try on another system without Thunderbolt? This would help to determine whether it's a problem with the setup. To me this looks like an IOMMU problem.
Hopefully this patch fixes the issue: https://patchwork.kernel.org/project/linux-wireless/patch/20231212031914.47339-1-imguzh@gmail.com/ Please let us know if you are able to test it.
Sorry for the one-year delay; I just didn’t find the time to get back to this and basically gave up on the device.

With kernel 6.12.1 on a Lenovo Carbon X1 Gen10 and ArchLinux, all I can say is the following:

0. The patch no longer applies; it seems to be already applied to dp.c and just fails on hal.c.
1. When I test without the patch, just with the regular kernel, it won’t work. Here’s the dmesg after I plug the device into Thunderbolt: https://pastebin.com/xPAXQHwa
2. The output above was obtained with this firmware: https://git.codelinaro.org/clo/ath-firmware/ath11k-firmware/-/tree/main/QCN9074/hw1.0/2.9.0.1/WLAN.HK.2.9.0.1-02146-QCAHKSWPL_SILICONZ-1
   However, the default old linux-firmware version (below) yields the same final outcome.

New firmware tried:
fw_version 0x290b8862 fw_build_timestamp 2024-09-23 10:51 fw_build_id

Standard linux-firmware version (also tried):
fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

The fact that the new firmware’s build ID is not logged at the end of the log message (WLAN.HK.2.9.0.1-02146-QCAHKSWPL_SILICONZ-1 expected) *might* be an indication of yet another problem; not quite sure about that.
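For comparing firmware versions across logs, the fw_version line can be picked apart mechanically. This is a rough Python sketch based only on the line format quoted in this report; the regular expression and the function name are assumptions, not something taken from the ath11k sources. An empty build ID is reported as None so that the missing string stands out.

```python
import re

# Matches the tail of the ath11k_pci log line as quoted in this report.
FW_RE = re.compile(
    r"fw_version (0x[0-9a-fA-F]+) "
    r"fw_build_timestamp (\d{4}-\d{2}-\d{2} \d{2}:\d{2}) "
    r"fw_build_id ?(\S*)"
)


def parse_fw_line(line):
    """Return the firmware fields from one log line, or None if it is not a fw_version line."""
    m = FW_RE.search(line)
    if m is None:
        return None
    version, timestamp, build_id = m.groups()
    return {
        "fw_version": int(version, 16),
        "fw_build_timestamp": timestamp,
        # An empty build ID (as with the newer firmware here) becomes None.
        "fw_build_id": build_id or None,
    }
```

Running it over the two lines above yields a build ID for the linux-firmware version and None for the newer one, which is the discrepancy described in the report.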
The diff in dmesg output between the two firmware versions (the rest is identical, the version strings aside):

 Dec 01 13:23:37 kernel: thunderbolt 0-1: LT-LINK LT-LINK NVME
-Dec 01 13:23:51 kernel: mhi mhi0: Device failed to clear MHI Reset
-Dec 01 13:23:52 kernel: ath11k_pci 0000:22:00.0: link down error during global reset
-Dec 01 13:23:52 kernel: pci_bus 0000:22: busn_res: [bus 22] is released
-Dec 01 13:23:52 kernel: pci_bus 0000:21: busn_res: [bus 21-22] is released
 Dec 01 13:23:52 kernel: pcieport 0000:00:07.0: pciehp: Slot(3): Card present
-Dec 01 13:23:52 kernel: pcieport 0000:00:07.0: pciehp: Slot(3): Link Up
 Dec 01 13:23:52 kernel: pci 0000:20:00.0: [8086:1576] type 01 class 0x060400 PCIe Switch Upstream Port
 Dec 01 13:23:52 kernel: pci 0000:20:00.0: PCI bridge to [bus 00]

So the newer firmware yields fewer “pessimistic” error messages and doesn’t explicitly say Link Up.
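A comparison like the one above can be reproduced without timestamp noise by stripping the syslog prefix before diffing. The following Python sketch uses the standard difflib module; the labels "old-fw" and "new-fw" and the function names are placeholders chosen for illustration.

```python
import difflib
import re

# Leading syslog timestamp, e.g. "Dec 01 13:23:52 "
STAMP_RE = re.compile(r"^\w{3} +\d+ \d{2}:\d{2}:\d{2} ")


def strip_stamp(line):
    """Drop the leading timestamp so runs captured at different times compare equal."""
    return STAMP_RE.sub("", line.rstrip("\n"))


def diff_logs(old_lines, new_lines):
    """Return only the added ('+') and removed ('-') lines between two dmesg captures."""
    old = [strip_stamp(line) for line in old_lines]
    new = [strip_stamp(line) for line in new_lines]
    diff = list(difflib.unified_diff(old, new, "old-fw", "new-fw", lineterm="", n=0))
    # The first two entries are the '---'/'+++' file headers; '@@' hunk markers are skipped.
    return [d for d in diff[2:] if d[:1] in ("+", "-")]
```

Because difflib compares whole lines, entries that embed the firmware version string will still show up as changed; those would need to be filtered separately.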