Bug 219380

Summary: ath12k: bad interaction with ath11k - neither will work out of the box
Product: Drivers
Reporter: Mihai Moldovan (ionic)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED DUPLICATE    
Severity: normal    
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: ath11k debug log on boot, initializing correctly.
ath12k being loaded after ath11k was loaded - and failing to initialize

Description Mihai Moldovan 2024-10-12 22:07:52 UTC
In a machine with an ath11k-based (QCA6390) and an ath12k-based (WCN7850) card, neither card works upon boot. Instead, both cards fail the host capability request and either can't load up their board data (because qmi-chip-id can't be queried and stays 0) or MHI state doesn't change to READY, causing a failure (albeit a different one) in both cases.

Unloading ath11k_pci, ath11k and ath12k, then waiting for a few seconds and loading exactly one of ath11k_pci or ath12k initializes that card successfully, and it works from then on.

(There is an unrelated bug in which modprobe will segfault if loading the module too quickly after unloading the modules, but that's just FYI. I haven't debugged this heavily and don't feel inclined to do so, since a segfaulting modprobe requires a proper reboot.)

However, one of the two cards will stay dead. Even loading the other module shortly after the first one will not wake up the second card, and it will interfere with the working card (the driver spits out errors, but keeps working).

The dmesg output below shows a clean boot on which both ath11k and ath12k fail, followed by a later reload on which ath12k alone initializes successfully.


Kernel: ath.git based on ath-202410111606 with a Debian base config (for the sake of simplicity) and slight modifications to enable ath11k and ath12k debugging plus certification onus

System: Debian unstable as of 2024-10-12

Host device information: desktop computer, MSI Z87-G43 mainboard

BIOS version: 1.11 (newest, although from 2015)

Reproducibility: always

uname: Linux grml 6.12.0-rc2-wt-ath-g5bf2d24e7e25-dirty #1 SMP PREEMPT_DYNAMIC Sat Oct 12 13:38:54 CEST 2024 x86_64 GNU/Linux

lspci:
===
00:00.0 "Host bridge [0600]" "Intel Corporation [8086]" "4th Gen Core Processor DRAM Controller [0c00]" -r06 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:02.0 "VGA compatible controller [0300]" "Intel Corporation [8086]" "Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller [0412]" -r06 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:03.0 "Audio device [0403]" "Intel Corporation [8086]" "Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [0c0c]" -r06 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:14.0 "USB controller [0c03]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family USB xHCI [8c31]" -r05 -p30 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:16.0 "Communication controller [0780]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family MEI Controller #1 [8c3a]" -r04 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1a.0 "USB controller [0c03]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family USB EHCI #2 [8c2d]" -r05 -p20 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1b.0 "Audio device [0403]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset High Definition Audio Controller [8c20]" -r05 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [d816]"
00:1c.0 "PCI bridge [0604]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family PCI Express Root Port #1 [8c10]" -rd5 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1c.1 "PCI bridge [0604]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family PCI Express Root Port #2 [8c12]" -rd5 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1c.2 "PCI bridge [0604]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family PCI Express Root Port #3 [8c14]" -rd5 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1c.3 "PCI bridge [0604]" "Intel Corporation [8086]" "82801 PCI Bridge [244e]" -rd5 -p01 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1c.4 "PCI bridge [0604]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family PCI Express Root Port #5 [8c18]" -rd5 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1d.0 "USB controller [0c03]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family USB EHCI #1 [8c26]" -r05 -p20 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1f.0 "ISA bridge [0601]" "Intel Corporation [8086]" "Z87 Express LPC Controller [8c44]" -r05 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1f.2 "SATA controller [0106]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8c02]" -r05 -p01 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
00:1f.3 "SMBus [0c05]" "Intel Corporation [8086]" "8 Series/C220 Series Chipset Family SMBus Controller [8c22]" -r05 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
01:00.0 "Network controller [0280]" "Qualcomm Technologies, Inc [17cb]" "WCN785x Wi-Fi 7(802.11be) 320MHz 2x2 [FastConnect 7800] [1107]" -r01 -p00 "Foxconn International, Inc. [105b]" "High Band Simultaneous Wireless Network Adapter [e0f7]"
02:00.0 "Ethernet controller [0200]" "Realtek Semiconductor Co., Ltd. [10ec]" "RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [8168]" -r06 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
03:00.0 "Unassigned class [ff00]" "Qualcomm Technologies, Inc [17cb]" "QCA6390 Wireless Network Adapter [1101]" -p00 "Qualcomm Technologies, Inc [17cb]" "Device [0108]"
04:00.0 "PCI bridge [0604]" "ASMedia Technology Inc. [1b21]" "ASM1083/1085 PCIe to PCI Bridge [1080]" -r03 -p00 "Micro-Star International Co., Ltd. [MSI] [1462]" "Device [7816]"
06:00.0 "Ethernet controller [0200]" "Intel Corporation [8086]" "82576 Gigabit Network Connection [10c9]" -r01 -p00 "Intel Corporation [8086]" "Gigabit ET Dual Port Server Adapter [a01c]"
06:00.1 "Ethernet controller [0200]" "Intel Corporation [8086]" "82576 Gigabit Network Connection [10c9]" -r01 -p00 "Intel Corporation [8086]" "Gigabit ET Dual Port Server Adapter [a01c]"
===


Firmware: kernel/git/ath/linux-firmware.git with branch ath-20241009
find /lib/firmware/ath12k/ -type f | xargs md5sum:
===
73056f1d2aff886ce9bff313f455e963  /lib/firmware/ath12k/WCN7850/hw2.0/m3.bin
d3750b67b1013fe82358d0538fb131b0  /lib/firmware/ath12k/WCN7850/hw2.0/amss.bin
bb04932570c666b9d4ce8dea0e78b70a  /lib/firmware/ath12k/WCN7850/hw2.0/board-2.bin
===

dmesg | grep -E '(ath1[12]k|mhi)':
===
[   23.117854] [    T453] ath11k_pci 0000:03:00.0: BAR 0 [mem 0xf5000000-0xf5ffffff 64bit]: assigned
[   23.118699] [    T453] ath11k_pci 0000:03:00.0: enabling device (0000 -> 0002)
[   23.119900] [    T453] ath11k_pci 0000:03:00.0: MSI vectors: 32
[   23.120721] [    T453] ath11k_pci 0000:03:00.0: qca6390 hw2.0
[   23.146297] [    T442] ath12k_pci 0000:01:00.0: BAR 0 [mem 0xf7600000-0xf77fffff 64bit]: assigned
[   23.146676] [    T442] ath12k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[   23.147170] [    T442] ath12k_pci 0000:01:00.0: MSI vectors: 16
[   23.147530] [    T442] ath12k_pci 0000:01:00.0: Hardware name: wcn7850 hw2.0
[   23.454617] [    T442] mhi mhi1: Requested to power ON
[   23.455216] [    T442] mhi mhi1: Power on setup success
[   23.553144] [    T453] mhi mhi0: Requested to power ON
[   23.553462] [    T453] mhi mhi0: Power on setup success
[   23.724559] [    T121] mhi mhi1: Wait for device to enter SBL or Mission mode
[   24.237287] [     T66] ath12k_pci 0000:01:00.0: qmi dma allocation failed (7077888 B type 1), will try later with small size
[   24.239012] [     T70] ath12k_pci 0000:01:00.0: Host capability request failed, result: 1, err: 90
[   24.240716] [     T70] ath12k_pci 0000:01:00.0: qmi failed to send host cap QMI:-22
[   24.254721] [     T66] ath12k_pci 0000:01:00.0: chip_id 0x2 chip_family 0x4 board_id 0xff soc_id 0x40170200
[   24.256371] [     T66] ath12k_pci 0000:01:00.0: fw_version 0x100301e1 fw_build_timestamp 2023-12-06 04:05 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
[   49.732554] [    T592] mhi mhi0: Device failed to enter MHI Ready
[   49.734526] [    T592] mhi mhi0: MHI did not enter READY state
[   49.736557] [    T453] ath11k_pci 0000:03:00.0: failed to power up mhi: -110
[   49.737450] [    T453] ath11k_pci 0000:03:00.0: failed to start mhi: -110
[   49.737453] [    T453] ath11k_pci 0000:03:00.0: failed to power up :-110
[   49.760590] [    T453] ath11k_pci 0000:03:00.0: failed to create soc core: -110
[   49.762237] [    T453] ath11k_pci 0000:03:00.0: failed to init core: -110
[   50.021416] [    T453] ath11k_pci 0000:03:00.0: probe with driver ath11k_pci failed with error -110
[14258.822127] [   T8922] ath12k_pci 0000:01:00.0: BAR 0 [mem 0xf7600000-0xf77fffff 64bit]: assigned
[14258.822301] [   T8922] ath12k_pci 0000:01:00.0: MSI vectors: 16
[14258.822306] [   T8922] ath12k_pci 0000:01:00.0: Hardware name: wcn7850 hw2.0
[14258.877851] [   T8922] mhi mhi0: Requested to power ON
[14258.877864] [   T8922] mhi mhi0: Power on setup success
[14258.985222] [    T592] mhi mhi0: Wait for device to enter SBL or Mission mode
[14259.383663] [   T8872] ath12k_pci 0000:01:00.0: qmi dma allocation failed (7077888 B type 1), will try later with small size
[14259.392680] [   T8872] ath12k_pci 0000:01:00.0: chip_id 0x2 chip_family 0x4 board_id 0xff soc_id 0x40170200
[14259.392688] [   T8872] ath12k_pci 0000:01:00.0: fw_version 0x100301e1 fw_build_timestamp 2023-12-06 04:05 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
[14259.633591] [   T8928] ath12k_pci 0000:01:00.0 wlp1s0: renamed from wlan0
===
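
As a reading aid for the excerpt above, here is a minimal Python sketch that pulls the failing lines out of a dmesg dump and names the negative return codes. The two errno names are the standard Linux values (22 is EINVAL, 110 is ETIMEDOUT). The regular expression and the `summarize` helper are illustrative assumptions, not part of any ath tooling, and the QMI-level `err: 90` is deliberately left undecoded.

```python
import re
from collections import defaultdict

# Linux errno values that show up in the excerpt above.
ERRNO_NAMES = {22: "EINVAL", 110: "ETIMEDOUT"}

# Matches driver lines that end in a negative code, e.g.
# "ath11k_pci 0000:03:00.0: failed to start mhi: -110"
LINE_RE = re.compile(
    r"(?P<drv>ath1[12]k_pci) (?P<dev>[0-9a-f:.]+): "
    r"(?P<msg>.*?)[:\s]\s*(?P<code>-\d+)$"
)


def summarize(dmesg_text):
    """Return {(driver, device): [(message, errno_name), ...]} for failing lines."""
    result = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = LINE_RE.search(line.strip())
        if not m:
            continue
        code = -int(m.group("code"))
        name = ERRNO_NAMES.get(code, "errno %d" % code)
        key = (m.group("drv"), m.group("dev"))
        result[key].append((m.group("msg").strip(), name))
    return dict(result)


SAMPLE = """\
[   24.240716] [     T70] ath12k_pci 0000:01:00.0: qmi failed to send host cap QMI:-22
[   49.736557] [    T453] ath11k_pci 0000:03:00.0: failed to power up mhi: -110
[   50.021416] [    T453] ath11k_pci 0000:03:00.0: probe with driver ath11k_pci failed with error -110
"""

if __name__ == "__main__":
    for (drv, dev), entries in summarize(SAMPLE).items():
        for msg, name in entries:
            print(drv, dev, msg, "->", name)
```

Read this way, the boot log splits cleanly: the ath12k card gets as far as QMI and is rejected there, while the ath11k card times out earlier, at MHI power-up.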
Comment 1 Mihai Moldovan 2024-10-23 03:14:02 UTC
Thank you for all the help, or for that matter even acknowledging the issue. Stellar.


In any case, I've since upgraded to ath-202410221503, enabled debugging for ath11k, ath11k_pci, mhi, qmi-helpers, qrtr and qrtr-mhi and also blacklisted ath12k to get a defined good state at boot up time with just ath11k(_pci) being loaded and working.

I'll include a full debug log after booting up with ath11k(_pci) only and then another log after loading up ath12k at a later time.

As I already expected, and could confirm through additional debug info, the issue seems to be that uploading the firmware to the second card fails once the card enters AMSS/Mission Mode. (In this case the second card is the one driven by ath12k, but I could also change the setup so that ath12k is loaded first and works, and ath11k(_pci) is loaded later and fails.) The new QMI FW loading service is bound to both QMI sockets instead of "the right one", so the firmware is probably sent to the "wrong" server, which means that the second card hangs in AMSS waiting for a firmware upload.
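
To make the suspected mechanism concrete, here is a toy Python model of it. This is not the kernel's qrtr or QMI code: `ToyNameService`, `add_lookup` and `announce_server` are hypothetical names chosen for illustration. The model only assumes what the debug output suggests, namely that each driver registers a lookup keyed by service and instance, and that a new-server announcement is delivered to every matching lookup regardless of which card the server lives on.

```python
class ToyNameService:
    """Toy stand-in for a name service that matches lookups by (service, instance) only."""

    def __init__(self):
        self.lookups = []  # list of (handle_name, service, instance)

    def add_lookup(self, handle_name, service, instance):
        self.lookups.append((handle_name, service, instance))

    def announce_server(self, service, instance, node, port):
        """Return the handles notified about a new server at (node, port)."""
        notified = []
        for handle_name, want_service, want_instance in self.lookups:
            # Nothing here ties a lookup to a particular card, so every
            # handle waiting for this service is told about the same server.
            if (want_service, want_instance) == (service, instance):
                notified.append((handle_name, node, port))
        return notified


ns = ToyNameService()
ns.add_lookup("ath11k handle", service=69, instance=257)
ns.add_lookup("ath12k handle", service=69, instance=257)

# One card's firmware announces its service once; both handles hear about it.
notified = ns.announce_server(service=69, instance=257, node=7, port=1)
for entry in notified:
    print(entry)
```

In this model both handles end up pointing at node 7, port 1, which is the situation in which one driver's requests could reach the other card's firmware.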

(Subsequent kernel module unloads and reloads will result in the card failing to even reach the MHI POWER_ON state, because it's still waiting for a firmware upload and a reset is not taking it out of this mode. Unfortunately, the only way to reset the card is to remove power to its PCIe socket, which requires a complete shutdown (a reboot won't do). This, however, is an unrelated bug, I'm just pointing it out for posterity.)

The interesting part is this:

===
[  T122] ath12k_pci 0000:01:00.0: mhi notify status reason MHI_CB_EE_MISSION_MODE
[  T122] [122] qcom_mhi_qrtr mhi1_IPCR: Qualcomm MHI QRTR driver probed
[  T535] [535] qrtr_ns_worker: QRTR_TYPE_NEW_SERVER: &sq = 00000000599df612, service = 69, instance = 257, node = 7, port = 1
[  T535] [535] lookup_notify: new = 1
[  T535] [535] lookup_notify: (hopefully for QRTR_TYPE_NEW_SERVER) qrtr_ns.sock = 00000000ca5b39c2, service = 69, instance = 257, node = 7, port = 1
[  T535] [535] lookup_notify: new = 1
[  T535] [535] lookup_notify: (hopefully for QRTR_TYPE_NEW_SERVER) qrtr_ns.sock = 00000000ca5b39c2, service = 69, instance = 257, node = 7, port = 1
[  T535] [535] lookup_notify: new = 1
[  T535] [535] lookup_notify: (hopefully for QRTR_TYPE_NEW_SERVER) qrtr_ns.sock = 00000000ca5b39c2, service = 69, instance = 257, node = 7, port = 1
[  T535] [535] qmi_recv_ctrl_pkt: received QRTR_TYPE_NEW_SERVER: qmi_handle = 000000000428dfc2, service = 69, instance = 257, node = 7, port = 1
[  T535] [535] qmi_recv_new_server: qmi_handle = 000000000428dfc2, service = 69, instance = 257, node = 7, port = 1
[  T535] ath11k_pci 0000:03:00.0: qmi wifi fw qmi service connected
[  T535] [535] qmi_recv_ctrl_pkt: received QRTR_TYPE_NEW_SERVER: qmi_handle = 00000000a08def4c, service = 69, instance = 257, node = 7, port = 1
[  T535] [535] qmi_recv_new_server: qmi_handle = 00000000a08def4c, service = 69, instance = 257, node = 7, port = 1
[  T535] ath12k_pci 0000:01:00.0: qmi wifi fw qmi service connected
[   T67] ath11k_pci 0000:03:00.0: qmi indication register request
[   T67] ath11k_pci 0000:03:00.0: qmi host cap request
[  T535] ath12k_pci 0000:01:00.0: no valid response from PHY capability, choose default num_phy 2
[  T535] ath12k_pci 0000:01:00.0: qmi about to send indication register request
[   T67] ath11k_pci 0000:03:00.0: host capability request failed: 1 90
[   T67] ath11k_pci 0000:03:00.0: failed to send qmi host cap: -22
===

qmi_handle = 428dfc2 is associated with the ath11k-based card, while qmi_handle = a08def4c is associated with the ath12k-based card. Unfortunately, ath11k binds the FW service before ath12k, so ath12k initialization will get stuck indefinitely.
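
The duplicate binding can also be checked mechanically. The following is a small Python sketch, assuming debug lines in the format shown in the excerpt above (these lines appear to come from added instrumentation rather than a stock kernel, and `handles_per_server` is just an illustrative helper name):

```python
import re
from collections import defaultdict

NEW_SERVER_RE = re.compile(
    r"qmi_recv_new_server: qmi_handle = (?P<handle>[0-9a-f]+), "
    r"service = (?P<service>\d+), instance = (?P<instance>\d+), "
    r"node = (?P<node>\d+), port = (?P<port>\d+)"
)


def handles_per_server(log_text):
    """Map (service, instance, node, port) to the qmi_handle values that saw it."""
    seen = defaultdict(list)
    for m in NEW_SERVER_RE.finditer(log_text):
        key = tuple(int(m.group(k)) for k in ("service", "instance", "node", "port"))
        seen[key].append(m.group("handle"))
    return dict(seen)


EXCERPT = """\
[  T535] [535] qmi_recv_new_server: qmi_handle = 000000000428dfc2, service = 69, instance = 257, node = 7, port = 1
[  T535] [535] qmi_recv_new_server: qmi_handle = 00000000a08def4c, service = 69, instance = 257, node = 7, port = 1
"""

for server, handles in handles_per_server(EXCERPT).items():
    if len(handles) > 1:
        print("server", server, "was seen by", len(handles), "handles:", handles)
```

Any server tuple that maps to more than one handle is a candidate for the cross-talk described here.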

Knowing that is great, but I still have to figure out how qrtr routes the packets to the actual QMI handles... and then find a way to send packets to the correct QMI handle only.
Comment 2 Mihai Moldovan 2024-10-23 03:15:12 UTC
Created attachment 307053 [details]
ath11k debug log on boot, initializing correctly.
Comment 3 Mihai Moldovan 2024-10-23 03:16:01 UTC
Created attachment 307054 [details]
ath12k being loaded after ath11k was loaded - and failing to initialize
Comment 4 Mihai Moldovan 2024-10-27 14:17:54 UTC
Essentially a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=218480, closing.

If you feel like it's not a duplicate, feel free to reopen.

*** This bug has been marked as a duplicate of bug 218480 ***