Bug 217056
Summary: | ath11k: QCN9074: iommu problems with Thunderbolt | ||
---|---|---|---|
Product: | Drivers | Reporter: | Andrej Podzimek (andrej) |
Component: | network-wireless | Assignee: | drivers_network-wireless (drivers_network-wireless) |
Status: | NEEDINFO --- | ||
Severity: | high | CC: | andrej, kvalo |
Priority: | P4 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 6.1.11 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | kernel log (dmesg) during device association; hostapd log (timeout) during device association; lspci -kvvv entry; lspci -tv; boltctl; hostapd.conf (uncommented lines only); iw list (parts pertaining to this particular device); dmesg when ath11k_pci is loaded |
Created attachment 303753 [details]
hostapd log (timeout) during device association
Created attachment 303754 [details]
lspci -kvvv entry
Created attachment 303755 [details]
lspci -tv
Created attachment 303756 [details]
boltctl
Created attachment 303757 [details]
hostapd.conf (uncommented lines only)
Further notes: The same problem occurs with both crypto_mode=0 and crypto_mode=1 (hardware vs. software). I tried around 100 configurations (removing various 802.11ac and 802.11ax options, trying different channels, minimizing hostapd.conf, etc.), but the problem stayed exactly the same: the network is visible and can be found, but nothing can connect.
Created attachment 303758 [details]
iw list (parts pertaining to this particular device)
Created attachment 303759 [details]
dmesg when ath11k_pci is loaded
The device firmware comes from the package 20230210.bf4115c-1 on Arch Linux.
It probably consists of the files in /lib/firmware/ath11k/QCN9074/hw1.0.
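To pin down exactly which firmware files are installed, one option is to list them with checksums and compare the result against another copy of the firmware. The following is a minimal sketch, not a verified procedure: the directory is the one mentioned above, the helper name `firmware_checksums` is made up for illustration, and if the distribution ships the files compressed, the checksums will not match an uncompressed copy.

```python
import hashlib
from pathlib import Path

# Assumed location, taken from the comment above; adjust if it differs on your system.
FIRMWARE_DIR = Path("/lib/firmware/ath11k/QCN9074/hw1.0")


def firmware_checksums(directory):
    """Return {file name: SHA-256 hex digest} for the regular files in directory."""
    sums = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            sums[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return sums


if __name__ == "__main__":
    if FIRMWARE_DIR.is_dir():
        for name, digest in firmware_checksums(FIRMWARE_DIR).items():
            print(f"{digest}  {name}")
    else:
        print(f"{FIRMWARE_DIR} not found")
```

The output format mirrors that of `sha256sum`, so the two listings can be compared with any diff tool.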
Posting the kernel and hostapd logs outside the attachments as well (for search engines to find).

This is what appears to be the ath11k_pci driver bug, exposed on a hostapd-based server during WiFi client association and authentication:

Feb 18 19:25:15 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0xb2a000080 flags=0x0020]
Feb 18 19:25:15 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x483040300 flags=0x0020]
Feb 18 19:25:15 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e000a0 flags=0x0020]
Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x483040340 flags=0x0020]
Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:24 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e000c0 flags=0x0020]
Feb 18 19:25:24 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x483040380 flags=0x0020]
Feb 18 19:25:24 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:36 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e000e0 flags=0x0020]
Feb 18 19:25:36 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x4830403c0 flags=0x0020]
Feb 18 19:25:36 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0
Feb 18 19:25:56 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e00100 flags=0x0020]
Feb 18 19:26:16 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e00120 flags=0x0020]

The same event as seen by hostapd, which at some point is unable to communicate with (or get a response from) the device:

Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: authenticated
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: authenticated
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA-OPMODE-N_SS-CHANGED d4:3a:2c:b7:37:13 2
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: associated (aid 1)
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: associated (aid 1)
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-STARTED d4:3a:2c:b7:37:13
Feb 18 19:25:15 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=1
Feb 18 19:25:18 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:25:24 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:25:36 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:25:56 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:26:16 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:26:36 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13
Feb 18 19:26:36 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-TIMEOUT-FAILURE d4:3a:2c:b7:37:13
Feb 18 19:26:41 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: deauthenticated due to local deauth request
Feb 18 19:26:41 hostapd[3702212]: charonwifi1: STA d4:3a:2c:b7:37:13 IEEE 802.11: deauthenticated due to local deauth request

Devices can see hostapd’s SSID and (attempt to) associate with it. So the transmitters on the device do have their required extra 5V power, and the device itself works in general and is identified as WiFi6 etc. If it weren’t for the page faults, it would most likely just work.
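The timing relationship between the two logs can be checked mechanically. The sketch below counts, for each hostapd EAP retransmit, how many IO_PAGE_FAULT events the kernel logged within one second of it. The sample lines are copied from the logs above; the function names, the one-second window, and the year 2023 (syslog lines carry no year) are assumptions made for illustration only.

```python
import re
from datetime import datetime

ASSUMED_YEAR = 2023  # assumption: the syslog timestamps do not include a year

# Matches lines such as "Feb 18 19:25:18 kernel: ..." or "... hostapd[3702212]: ..."
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+?)(?:\[\d+\])?: (.*)$")

KERNEL_SAMPLE = [
    "Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e000a0 flags=0x0020]",
    "Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x483040340 flags=0x0020]",
    "Feb 18 19:25:18 kernel: ath11k_pci 0000:3d:00.0: frame rx with invalid buf_id 0",
    "Feb 18 19:25:56 kernel: ath11k_pci 0000:3d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x481e00100 flags=0x0020]",
]
HOSTAPD_SAMPLE = [
    "Feb 18 19:25:18 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13",
    "Feb 18 19:25:56 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13",
    "Feb 18 19:26:36 hostapd[3702212]: charonwifi1: CTRL-EVENT-EAP-RETRANSMIT d4:3a:2c:b7:37:13",
]


def parse(lines, year=ASSUMED_YEAR):
    """Yield (timestamp, source, message) for every line that looks like syslog output."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            yield ts, m.group(2), m.group(3)


def correlate(kernel_lines, hostapd_lines, window_s=1):
    """Return [(HH:MM:SS of retransmit, number of page faults within window_s seconds)]."""
    faults = [ts for ts, _, msg in parse(kernel_lines) if "IO_PAGE_FAULT" in msg]
    result = []
    for ts, _, msg in parse(hostapd_lines):
        if "CTRL-EVENT-EAP-RETRANSMIT" in msg:
            near = sum(1 for f in faults if abs((f - ts).total_seconds()) <= window_s)
            result.append((ts.strftime("%H:%M:%S"), near))
    return result


if __name__ == "__main__":
    for when, count in correlate(KERNEL_SAMPLE, HOSTAPD_SAMPLE):
        print(when, count)
```

Run against the full logs, this should show that every retransmit up to 19:26:16 coincides with at least one page fault, while the last one at 19:26:36 has no matching fault in the excerpt posted here.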
Details about the device:
https://compex.com.sg/shop/wifi-module/802-11ax-wifi-module/pn02-1-wifi6-11ax-qcn6024-qcn9024-qcn9074/
(However, this is the later version of the Compex PN02.1, with a heatsink included and a slightly smaller PCB; not sure if that matters.)

The device has its additional 5V power supply from the host’s PSU, as required by the specs. The connection goes via a Thunderbolt -> NGFF -> M-key -> E-key “adapter chain”.

The Thunderbolt host is a desktop described (e.g.) here:
https://bbs.archlinux.org/viewtopic.php?id=261303
https://bbs.archlinux.org/viewtopic.php?id=283471

Considering the context above, a Thunderbolt-related issue might (also) be to blame here, and/or this bug might only occur when the device is connected over Thunderbolt.

Tried some “no IOMMU” workaround “wisdom” from the web:
iommu=soft: no effect; AMD IOMMU works as usual
amd_iommu=off: won’t boot
amd_iommu=off iommu=soft: won’t boot
iommu=pt: won’t boot
So I’m guessing that disabling the IOMMU is not an option on this system.

What is the exact kernel version you are using? The firmware info from the attachments is this:

Feb 18 19:16:29 kernel: ath11k_pci 0000:3d:00.0: BAR 0: assigned [mem 0xe0a00000-0xe0bfffff 64bit]
Feb 18 19:16:29 kernel: ath11k_pci 0000:3d:00.0: MSI vectors: 16
Feb 18 19:16:29 kernel: ath11k_pci 0000:3d:00.0: qcn9074 hw1.0
Feb 18 19:16:30 kernel: ath11k_pci 0000:3d:00.0: chip_id 0x0 chip_family 0x0 board_id 0xff soc_id 0xffffffff
Feb 18 19:16:30 kernel: ath11k_pci 0000:3d:00.0: fw_version 0x2506844c fw_build_timestamp 2021-07-13 10:24 fw_build_id

Unfortunately the firmware does not provide the version string, but the date looks old. Please try the latest firmware from here:
https://github.com/kvalo/ath11k-firmware/tree/master/QCN9074/hw1.0

Also, can you try on another system without Thunderbolt? This would help determine whether it's a problem with the setup. To me this looks like an iommu problem.
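When testing boot parameters such as the iommu= and amd_iommu= options listed earlier, it can help to confirm which of them were actually passed to the running kernel, since a bootloader configuration mistake looks the same as "no effect". A minimal sketch, assuming a Linux system where /proc/cmdline is readable; the helper name `iommu_params` is made up for illustration, and the output only shows what was passed, not how the kernel acted on it:

```python
from pathlib import Path

# Parameter names taken from the workarounds tried in this report.
IOMMU_KEYS = ("iommu", "amd_iommu")


def iommu_params(cmdline):
    """Return the IOMMU-related (key, value) options found in a kernel command line."""
    found = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in IOMMU_KEYS:
            found.append((key, value))
    return found


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        print(iommu_params(cmdline_file.read_text()))
    else:
        print("/proc/cmdline not available on this system")
```

An empty list means none of these options reached the kernel, in which case the IOMMU is running with its defaults.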
Hopefully this patch fixes the issue:
https://patchwork.kernel.org/project/linux-wireless/patch/20231212031914.47339-1-imguzh@gmail.com/
Please let us know if you are able to test it.
Created attachment 303752 [details]
kernel log (dmesg) during device association
I have a “Qualcomm Technologies, Inc QCN6024/9024/9074 Wireless Network Adapter (rev 01)” connected using a Thunderbolt adapter. Running hostapd with it appears to work fine, but all client authentication requests fail. This correlates with failures and page faults reported in dmesg and with hostapd transmission retries / timeouts.