Bug 218276

Summary: ath11k: QCNFA765: Bug with non-standard router setting. Crashes, terrible latency and speed.
Product: Drivers Reporter: Evgenii Ilchenko (evgenii.ilchenko)
Component: network-wireless Assignee: drivers_network-wireless (drivers_network-wireless)
Status: NEW ---    
Severity: high CC: bagasdotme, evgenii.ilchenko
Priority: P3    
Hardware: AMD   
OS: Linux   
Kernel Version: 6.5.13 6.7.0rc5 Subsystem:
Regression: No Bisected commit-id:
Attachments: Dmesg output

Description Evgenii Ilchenko 2023-12-16 11:46:12 UTC
Created attachment 305615: Dmesg output

Hardware:
Lenovo Thinkpad P14s (21K5001JUS)
AMD Ryzen 7450u with Qualcomm QCNFA765 Wireless Network Adapter
Router: Huawei HG8245X6-10

Software:
Debian Testing (trixie).
Tested with kernels 6.1.0, 6.5.0, and 6.5.13.


The problem reproduces under the following condition:

802.11ax is turned off on the router.
In this case, many messages like the following are printed to the log:

ath11k_pci 0000:02:00.0: Received with invalid mcs in VHT mode 11
ath11k_pci 0000:02:00.0: Received with invalid mcs in VHT mode 10

and:

[   19.498035] ------------[ cut here ]------------
[   19.498039] Rate marked as a VHT rate but data is invalid: MCS: 10, NSS: 0
[   19.498138] WARNING: CPU: 12 PID: 3107 at net/mac80211/rx.c:5337 ieee80211_rx_list+0x2b3/0xda0 [mac80211]
.........
[   19.498631] RIP: 0010:ieee80211_rx_list+0x2b3/0xda0 [mac80211]
[   19.498684] Code: 00 00 80 3d 96 a7 07 00 00 0f 85 2d ff ff ff 0f b6 53 4a 40 0f b6 f7 48 c7 c7 e0 a4 e2 c1 c6 05 7a a7 07 00 01 e8 dd 5d b6 e3 <0f> 0b e9 0b ff ff ff 40 80 ff 0b 0f 86 26 03 00 00 80 3d 5c a7 07
.......
[   19.498724] Call Trace:
[   19.498731]  <IRQ>
[   19.498735]  ? ieee80211_rx_list+0x2b3/0xda0 [mac80211]
[   19.498785]  ? __warn+0x81/0x130
[   19.498799]  ? ieee80211_rx_list+0x2b3/0xda0 [mac80211]
[   19.498852]  ? report_bug+0x171/0x1a0
[   19.498861]  ? prb_read_valid+0x1b/0x30
[   19.498871]  ? srso_alias_return_thunk+0x5/0x7f
[   19.498882]  ? handle_bug+0x3c/0x80
[   19.498891]  ? exc_invalid_op+0x17/0x70
[   19.498897]  ? asm_exc_invalid_op+0x1a/0x20
[   19.498910]  ? ieee80211_rx_list+0x2b3/0xda0 [mac80211]
[   19.498941]  ? srso_alias_return_thunk+0x5/0x7f
[   19.498944]  ? _dev_warn+0x79/0xa0
[   19.498952]  ? srso_alias_return_thunk+0x5/0x7f
[   19.498956]  ? ath11k_peer_find_by_id+0x100/0x1c0 [ath11k]
[   19.498978]  ieee80211_rx_napi+0x53/0xe0 [mac80211]
[   19.498999]  ath11k_dp_rx_process_received_packets+0x23e/0x660 [ath11k]
[   19.499013]  ath11k_dp_process_rx+0x2cf/0x3c0 [ath11k]
[   19.499026]  ath11k_dp_service_srng+0x2e0/0x320 [ath11k]
[   19.499037]  ath11k_pcic_ext_grp_napi_poll+0x25/0x80 [ath11k]
[   19.499047]  __napi_poll+0x28/0x1b0
[   19.499055]  net_rx_action+0x2a4/0x380
[   19.499058]  ? srso_alias_return_thunk+0x5/0x7f
[   19.499060]  ? __napi_schedule+0xb0/0xc0
[   19.499065]  __do_softirq+0xc7/0x2ae
[   19.499070]  ? handle_edge_irq+0x8b/0x230
[   19.499076]  __irq_exit_rcu+0x96/0xb0
[   19.499083]  common_interrupt+0x86/0xa0
[   19.499086]  </IRQ>
[   19.499087]  <TASK>
[   19.499089]  asm_common_interrupt+0x26/0x40
.........
[   19.499179] ---[ end trace 0000000000000000 ]---
The full dmesg output is attached.

Under these conditions, packet loss is high and network throughput is very poor:
--- 8.8.8.8 ping statistics ---
1897 packets transmitted, 1868 received, 1.52873% packet loss, time 1899829ms
rtt min/avg/max/mdev = 8.361/19.235/182.594/10.237 ms
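
The loss percentage in the summary follows directly from the two packet counters. A minimal Python sketch, assuming the ping summary line format shown above (the helper name packet_loss_percent is hypothetical and only for illustration):

```python
import re


def packet_loss_percent(summary: str) -> float:
    """Return packet loss in percent, computed from a ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", summary)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets transmitted")
    # Lost packets divided by sent packets, expressed as a percentage.
    return 100.0 * (sent - received) / sent


line = "1897 packets transmitted, 1868 received, 1.52873% packet loss, time 1899829ms"
print(f"{packet_loss_percent(line):.5f}%")
```

For the run above, 29 of 1897 packets were lost, which matches the percentage that ping reports.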

Workaround:
When you enable 802.11ax in the router settings, everything becomes fine.

From my side, it looks like the router is sending an MCS setting that is not valid in 802.11ac mode, and this causes the problem.
However, many other devices (including a ThinkPad T14 Gen 2 with Intel AX201 Wi-Fi) work well with this router and this setting.
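
For context on the warning text: 802.11ac defines VHT MCS indices 0 through 9 and between 1 and 8 spatial streams, so an MCS of 10 or 11, or an NSS of 0, is outside what the standard allows. The Python sketch below only illustrates that range check. It is not the ath11k or mac80211 code, and the function name and constants are hypothetical.

```python
VHT_MCS_MAX = 9  # highest VHT MCS index defined by 802.11ac
VHT_NSS_MIN = 1  # at least one spatial stream
VHT_NSS_MAX = 8  # at most eight spatial streams


def is_standard_vht_rate(mcs: int, nss: int) -> bool:
    """Return True if (mcs, nss) lies within the ranges 802.11ac defines."""
    return 0 <= mcs <= VHT_MCS_MAX and VHT_NSS_MIN <= nss <= VHT_NSS_MAX


# Values taken from the WARNING in the log: MCS 10, NSS 0.
print(is_standard_vht_rate(10, 0))
# An ordinary 802.11ac rate for comparison.
print(is_standard_vht_rate(9, 2))
```

The pair from the log fails on both counts: the MCS is above 9 and the NSS is 0.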
Comment 1 Bagas Sanjaya 2023-12-17 05:39:29 UTC
(In reply to Evgenii Ilchenko from comment #0)
> Created attachment 305615 [details]
> Dmesg output
> 
> Hardware:
> Lenovo Thinkpad P14s (21K5001JUS)
> AMD Ryzen 7450u with Qualcomm QCNFA765 Wireless Network Adapter
> Router: Huawei HG8245X6-10
> 
> Software:
> Debian Testing (trixie).
> Testing with 6.1.0, 6.5.0, 6.5.13 kernels.
> 

Can you check current mainline (v6.7-rc5)?
Comment 2 Evgenii Ilchenko 2023-12-17 22:43:54 UTC
(In reply to Bagas Sanjaya from comment #1)
> Can you check current mainline (v6.7-rc5)?
Of course.
At first glance it seemed to be better, but the problem is still reproducible.
1800 packets transmitted, 1715 received, 4.72222% packet loss, time 1803370ms

Dmesg:
https://drive.proton.me/urls/ANXKYVSSE0#1UAg2yv5RbvD
Ping with timestamps:
https://drive.proton.me/urls/0X1YVJ0QEG#HWiaF4ZtM2YZ

There appears to be a correlation between log messages (ath11k_pci ... Received with invalid mcs) and packet loss.
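
One way to examine that correlation is to tally the driver messages by MCS value and compare the counts against the ping log. A rough Python sketch, assuming log lines in the format quoted in comment #0 (the helper name count_invalid_mcs and the sample input are illustrative, not part of any existing tool):

```python
import re
from collections import Counter

# Matches the ath11k message quoted in comment #0 and captures the MCS value.
INVALID_MCS = re.compile(
    r"ath11k_pci \S+ Received with invalid mcs in VHT mode (\d+)"
)


def count_invalid_mcs(lines):
    """Count 'invalid mcs' messages per reported MCS value."""
    counts = Counter()
    for line in lines:
        match = INVALID_MCS.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "ath11k_pci 0000:02:00.0: Received with invalid mcs in VHT mode 11",
    "ath11k_pci 0000:02:00.0: Received with invalid mcs in VHT mode 10",
    "[   19.498724] Call Trace:",
]
print(dict(count_invalid_mcs(sample)))
```

With a saved log, the same function can be fed the lines of that file. Comparing per-minute counts with the ping timestamps would be a natural next step.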
Comment 3 Bagas Sanjaya 2023-12-21 03:55:08 UTC
(In reply to Evgenii Ilchenko from comment #0)

Forwarded to LKML [1].

[1]: https://lore.kernel.org/lkml/ZYO12aX3RpWzWuDs@archie.me/