Bug 206961

Summary: iwlwifi: AX200: channel switching many times causes Tx queue alloc failed
Product: Drivers
Reporter: Jeff Schuler (jschuler)
Component: network-wireless-intel
Assignee: Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)
Status: NEW
Severity: blocking
CC: jschuler, linuxwifi, thomas.f.steeples, ZeroBeat
Priority: P1
Hardware: ARM
OS: Linux
Kernel Version: 5.1.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
- system infos, kernel logs, and reproduction bash script.
- picture of the wifi card
- firmware trace for iwlwifi
- solution patch

Description Jeff Schuler 2020-03-25 19:05:48 UTC
Created attachment 288055 [details]
system infos, kernel logs, and reproduction bash script.

Hi,

I am attempting to report a bug that I believe is in either the iwlwifi driver, iwlmvm, or the firmware provided by Intel for use with the AX200NGW WiFi card. Relevant files have been attached.

I am following the directions on this page: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging

Note: Switching to the Intel 8265 cards with this same kernel configuration DOES NOT exhibit this issue. It is Intel AX200NGW card specific in one way or another.

dmesg snippet:

Mar 24 22:14:02 chevy-sensor kernel: [  687.364749] vmap allocation for size 8192 failed: use vmalloc=<size> to increase size
Mar 24 22:14:02 chevy-sensor kernel: [  687.364785] iwlwifi 0000:01:00.0: Tx queue alloc failed
Mar 24 22:14:02 chevy-sensor kernel: [  687.373723] ------------[ cut here ]------------
Mar 24 22:14:02 chevy-sensor kernel: [  687.373748] WARNING: CPU: 0 PID: 749 at /builds/sf-kuro-bsp-1/bastille-networks/bsp/host/build/yocto/poky-jethro-2.0/builds/bastille-sensor-chevy/tmp/work-shared/bastille-sensor-chevy/kernel-source/arch/arm/mm/dma-mapping.c:882 __arm_dma_free+0x144/0x160
Mar 24 22:14:02 chevy-sensor kernel: [  687.373756] Freeing invalid buffer 05e2f7a9
Mar 24 22:14:02 chevy-sensor kernel: [  687.373759] Modules linked in: arc4 iwlmvm mac80211 iwlwifi btusb btrtl btbcm btintel a10_id(O) a10_hwmon(O) ltc6948(O) bard(O)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373791] CPU: 0 PID: 749 Comm: autochange Tainted: G           O      5.1.0 #1
Mar 24 22:14:02 chevy-sensor kernel: [  687.373795] Hardware name: Altera SOCFPGA Arria10
Mar 24 22:14:02 chevy-sensor kernel: [  687.373798] Backtrace: 
Mar 24 22:14:02 chevy-sensor kernel: [  687.373811] [<8010eba4>] (dump_backtrace) from [<8010ee9c>] (show_stack+0x20/0x24)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373818]  r7:00000009 r6:60000013 r5:80c7ad70 r4:00000000
Mar 24 22:14:02 chevy-sensor kernel: [  687.373826] [<8010ee7c>] (show_stack) from [<8087574c>] (dump_stack+0x94/0xa8)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373839] [<808756b8>] (dump_stack) from [<80124f8c>] (__warn+0x104/0x11c)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373845]  r7:00000009 r6:809f1fc0 r5:00000000 r4:ed4b5894
Mar 24 22:14:02 chevy-sensor kernel: [  687.373854] [<80124e88>] (__warn) from [<80124ffc>] (warn_slowpath_fmt+0x58/0x74)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373861]  r9:00002710 r8:6413a000 r7:00001000 r6:80c07c48 r5:809f1fa4 r4:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.373869] [<80124fa8>] (warn_slowpath_fmt) from [<80119f08>] (__arm_dma_free+0x144/0x160)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373873]  r3:f63c9000 r2:809f1fa4
Mar 24 22:14:02 chevy-sensor kernel: [  687.373877]  r5:f63c9000 r4:80c0d488
Mar 24 22:14:02 chevy-sensor kernel: [  687.373884] [<80119dc4>] (__arm_dma_free) from [<80119f7c>] (arm_dma_free+0x28/0x30)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373889]  r6:f63c9000 r5:ee884878 r4:80119f54
Mar 24 22:14:02 chevy-sensor kernel: [  687.373903] [<80119f54>] (arm_dma_free) from [<8018ab84>] (dma_free_attrs+0xcc/0xf0)
Mar 24 22:14:02 chevy-sensor kernel: [  687.373940] [<8018aab8>] (dma_free_attrs) from [<7f0c7844>] (iwl_pcie_gen2_txq_free_memory+0x54/0x90 [iwlwifi])
Mar 24 22:14:02 chevy-sensor kernel: [  687.373947]  r8:ed4b599c r7:ee884878 r6:00000000 r5:edaa0040 r4:e40ddb00
Mar 24 22:14:02 chevy-sensor kernel: [  687.373984] [<7f0c77f0>] (iwl_pcie_gen2_txq_free_memory [iwlwifi]) from [<7f0c7964>] (iwl_trans_pcie_dyn_txq_alloc_dma+0xe4/0x138 [iwlwifi])
Mar 24 22:14:02 chevy-sensor kernel: [  687.373990]  r7:00000010 r6:e40ddb00 r5:fffffff4 r4:edaa0040
Mar 24 22:14:02 chevy-sensor kernel: [  687.374025] [<7f0c7880>] (iwl_trans_pcie_dyn_txq_alloc_dma [iwlwifi]) from [<7f0c7bd8>] (iwl_trans_pcie_dyn_txq_alloc+0x98/0x12c [iwlwifi])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374032]  r10:00000000 r9:ed4b5ba4 r8:7f0c7b40 r7:0000001d r6:edaa0040 r5:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.374036]  r4:ed4b59a0 r3:00002710
Mar 24 22:14:02 chevy-sensor kernel: [  687.374105] [<7f0c7b40>] (iwl_trans_pcie_dyn_txq_alloc [iwlwifi]) from [<7f2943f4>] (iwl_mvm_tvqm_enable_txq+0x70/0x1d4 [iwlmvm])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374111]  r7:0000000f r6:00000002 r5:ed994e84 r4:edaa0040
Mar 24 22:14:02 chevy-sensor kernel: [  687.374178] [<7f294384>] (iwl_mvm_tvqm_enable_txq [iwlmvm]) from [<7f2950e0>] (iwl_mvm_enable_aux_snif_queue+0xcc/0xd8 [iwlmvm])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374184]  r8:00000000 r7:5f4bfaeb r6:00000001 r5:ed996360 r4:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.374248] [<7f295014>] (iwl_mvm_enable_aux_snif_queue [iwlmvm]) from [<7f297b08>] (iwl_mvm_add_snif_sta+0xa0/0xc4 [iwlmvm])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374253]  r7:ebcc78fc r6:ebcc78fc r5:00000000 r4:ed994e84
Mar 24 22:14:02 chevy-sensor kernel: [  687.374316] [<7f297a68>] (iwl_mvm_add_snif_sta [iwlmvm]) from [<7f280390>] (__iwl_mvm_assign_vif_chanctx+0x1b8/0x1f0 [iwlmvm])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374321]  r6:00000000 r5:ed994e84 r4:ebcc77b8
Mar 24 22:14:02 chevy-sensor kernel: [  687.374384] [<7f2801d8>] (__iwl_mvm_assign_vif_chanctx [iwlmvm]) from [<7f28063c>] (iwl_mvm_assign_vif_chanctx+0x44/0x58 [iwlmvm])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374391]  r9:ed4b5ba4 r8:00000000 r7:e40dd1b0 r6:ebcc77b8 r5:ed994e9c r4:ed994e84
Mar 24 22:14:02 chevy-sensor kernel: [  687.374514] [<7f2805f8>] (iwl_mvm_assign_vif_chanctx [iwlmvm]) from [<7f15258c>] (ieee80211_assign_vif_chanctx+0x120/0x538 [mac80211])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374520]  r7:e40dd180 r6:7f2805f8 r5:ed9944a0 r4:ebcc7000
Mar 24 22:14:02 chevy-sensor kernel: [  687.374611] [<7f15246c>] (ieee80211_assign_vif_chanctx [mac80211]) from [<7f153c3c>] (ieee80211_vif_use_channel+0x140/0x268 [mac80211])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374618]  r10:00000001 r9:ed4b5ba4 r8:00000000 r7:ed9944a0 r6:ed994c64 r5:ebcc7000
Mar 24 22:14:02 chevy-sensor kernel: [  687.374621]  r4:e40dd180
Mar 24 22:14:02 chevy-sensor kernel: [  687.374713] [<7f153afc>] (ieee80211_vif_use_channel [mac80211]) from [<7f132df4>] (ieee80211_set_monitor_channel+0xa8/0x110 [mac80211])
Mar 24 22:14:02 chevy-sensor kernel: [  687.374721]  r10:00000006 r9:ed4b5c9c r8:ed994000 r7:ebcc7000 r6:ed4b5ba4 r5:ed994b78
Mar 24 22:14:02 chevy-sensor kernel: [  687.374724]  r4:ed9941a0 r3:ebd19f40
Mar 24 22:14:02 chevy-sensor kernel: [  687.374780] [<7f132d4c>] (ieee80211_set_monitor_channel [mac80211]) from [<8084e604>] (cfg80211_set_monitor_channel+0x90/0x1d4)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374786]  r7:ed4b5ba4 r6:ed9941a0 r5:ed994000 r4:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.374802] [<8084e574>] (cfg80211_set_monitor_channel) from [<8083dd8c>] (__nl80211_set_channel+0x1c0/0x2c4)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374807]  r7:ee326000 r6:ed4b5ba4 r5:ee326508 r4:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.374816] [<8083dbcc>] (__nl80211_set_channel) from [<8083e19c>] (nl80211_set_wiphy+0x2e4/0xaec)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374823]  r10:ed994000 r9:ed4b5c9c r8:ee326000 r7:00000000 r6:ed02f014 r5:80a8fe74
Mar 24 22:14:02 chevy-sensor kernel: [  687.374826]  r4:8096ae40
Mar 24 22:14:02 chevy-sensor kernel: [  687.374840] [<8083deb8>] (nl80211_set_wiphy) from [<806b2914>] (genl_rcv_msg+0x254/0x468)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374847]  r10:ed4b5cfc r9:80c65100 r8:eb3a50c0 r7:80c07c48 r6:ed02f014 r5:80a8fe74
Mar 24 22:14:02 chevy-sensor kernel: [  687.374850]  r4:8096ae40
Mar 24 22:14:02 chevy-sensor kernel: [  687.374859] [<806b26c0>] (genl_rcv_msg) from [<806b1918>] (netlink_rcv_skb+0xec/0x120)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374866]  r10:00000000 r9:ed4b5d68 r8:ed02f000 r7:806b26c0 r6:80c07c48 r5:eb3a50c0
Mar 24 22:14:02 chevy-sensor kernel: [  687.374868]  r4:00000000
Mar 24 22:14:02 chevy-sensor kernel: [  687.374876] [<806b182c>] (netlink_rcv_skb) from [<806b26b0>] (genl_rcv+0x34/0x44)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374882]  r8:0000003c r7:ee9e6c64 r6:eb3a50c0 r5:eb3a50c0 r4:80c69170
Mar 24 22:14:02 chevy-sensor kernel: [  687.374890] [<806b267c>] (genl_rcv) from [<806b1074>] (netlink_unicast+0x1c0/0x254)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374894]  r5:eae3f800 r4:ee9e6c00
Mar 24 22:14:02 chevy-sensor kernel: [  687.374901] [<806b0eb4>] (netlink_unicast) from [<806b1484>] (netlink_sendmsg+0x2b0/0x36c)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374908]  r10:00000000 r9:00000000 r8:0000003c r7:eae3f800 r6:eb3a50c0 r5:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.374911]  r4:ed4b5f44
Mar 24 22:14:02 chevy-sensor kernel: [  687.374921] [<806b11d4>] (netlink_sendmsg) from [<80649500>] (sock_sendmsg+0x24/0x34)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374927]  r10:00000000 r9:00000000 r8:eb45ea80 r7:00000000 r6:00000000 r5:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.374930]  r4:ed4b5f44
Mar 24 22:14:02 chevy-sensor kernel: [  687.374937] [<806494dc>] (sock_sendmsg) from [<80649d9c>] (___sys_sendmsg+0x23c/0x250)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374945] [<80649b60>] (___sys_sendmsg) from [<8064b0a4>] (__sys_sendmsg+0x60/0x9c)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374952]  r10:00000128 r9:ed4b4000 r8:80101228 r7:eb45ea80 r6:00000000 r5:7ea8e260
Mar 24 22:14:02 chevy-sensor kernel: [  687.374955]  r4:80c07c48
Mar 24 22:14:02 chevy-sensor kernel: [  687.374962] [<8064b044>] (__sys_sendmsg) from [<8064b0fc>] (sys_sendmsg+0x1c/0x20)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374967]  r7:00000128 r6:76f0f000 r5:000477d0 r4:00048920
Mar 24 22:14:02 chevy-sensor kernel: [  687.374976] [<8064b0e0>] (sys_sendmsg) from [<80101000>] (ret_fast_syscall+0x0/0x54)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374980] Exception stack(0xed4b5fa8 to 0xed4b5ff0)
Mar 24 22:14:02 chevy-sensor kernel: [  687.374987] 5fa0:                   00048920 000477d0 00000003 7ea8e260 00000000 00000000
Mar 24 22:14:02 chevy-sensor kernel: [  687.374995] 5fc0: 00048920 000477d0 76f0f000 00000128 7ea8e260 00047860 76f4c000 00000000
Mar 24 22:14:02 chevy-sensor kernel: [  687.375000] 5fe0: 00000000 7ea8e1dc 76ef715f 76e93a38
Mar 24 22:14:02 chevy-sensor kernel: [  687.375027] ---[ end trace 56708eec0258f6a4 ]---
Mar 24 22:14:02 chevy-sensor kernel: [  687.425132] ------------[ cut here ]------------


SYSTEM INFORMATION:

I've provided a file in the attached tarball titled `system-info.txt` that I believe gives a decent overview of the platform being used to exercise the iwlwifi driver and AX200NGW card. Please let me know if any more information would be helpful and I'll send it right over!

Brief:
- OS: Ubuntu 16.04.4 LTS
- ARCH: ARMv7
- KERNEL: 5.1.0 (Altera SOCFPGA Fork https://github.com/altera-opensource/linux-socfpga/commit/5c21be71d13b64bc4cce9b1bd8411d41d15b1a6a)
- FIRMWARE: iwlwifi-cc-a0-46.177b3e46.0.ucode
- WLAN CARD: Intel AX200NGW


REPRODUCTION:

We are able to reliably crash the iwlwifi driver, the PCIe layer, and/or the firmware with a simple script, provided in the attached tarball as `wifi_chan_switch_causes_crash.sh`.

This script brings up the wireless interface in monitor mode using the `iw` utility, then switches channels over and over again. No data capture is necessary to reproduce the crash. Simply toggling the channels is enough to expose what seems to be some sort of DMA bug.
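The loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the attached `wifi_chan_switch_causes_crash.sh`: the interface name, channel list, dwell time, and the helper names `run`, `enter_monitor_mode`, and `hop_channels` are all assumptions. Only standard `ip` and `iw` subcommands are used.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; NOT the attached script.
# IFACE, CHANNELS, and DWELL are assumed values; adjust for the system under test.
IFACE="${IFACE:-wlan0}"
CHANNELS="${CHANNELS:-1 6 11 36 40 44 48}"
DWELL="${DWELL:-0.2}"   # seconds to stay on each channel (assumed)

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Put the interface into monitor mode with ip/iw.
enter_monitor_mode() {
    run ip link set "$IFACE" down
    run iw dev "$IFACE" set type monitor
    run ip link set "$IFACE" up
}

# Cycle through CHANNELS the given number of rounds.
hop_channels() {
    hop_rounds="$1"
    hop_i=0
    while [ "$hop_i" -lt "$hop_rounds" ]; do
        for hop_ch in $CHANNELS; do
            run iw dev "$IFACE" set channel "$hop_ch"
            run sleep "$DWELL"
        done
        hop_i=$((hop_i + 1))
    done
}
```

After sourcing the file as root, calling `enter_monitor_mode` followed by `hop_channels 1000` would exercise the channel-switch path. Setting `DRY_RUN=1` first prints the commands without touching the hardware.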

Additional comments: Power cycling the system seems to be the only solution to reset the state of the driver. Reloading the iwlwifi, iwlmvm drivers does not help, and neither does resetting the PCIe interface via sysfs.

FIRMWARE:

We are using the latest firmware (iwlwifi-cc-a0-46.177b3e46.0.ucode) from the linux-firmware repo here: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

The latest .tgz linked on the iwlwifi drivers page: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi (iwlwifi-cc-46.3cfab8da.0.tgz) was also tried, but exhibited the same problems.

KERNEL LOGGING:

I have provided dmesg output from both before and after the crash, in the attached tarball as `dmesg.pre-crash.log` and `dmesg.post-crash.log`.
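For anyone comparing the two logs, a small helper along these lines may be useful. It is only a sketch: the function name `count_failures` is hypothetical, and the two patterns are taken from the dmesg snippet above.

```shell
#!/bin/sh
# Count the two failure signatures from the dmesg snippet in a saved log file.
# count_failures is a hypothetical helper name, not part of any tool.
count_failures() {
    cf_log="$1"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    cf_vmap=$(grep -c 'vmap allocation for size [0-9]* failed' "$cf_log" || true)
    cf_txq=$(grep -c 'iwlwifi .*Tx queue alloc failed' "$cf_log" || true)
    echo "vmap_failures=$cf_vmap txq_alloc_failures=$cf_txq"
}
```

Running `count_failures dmesg.pre-crash.log` and `count_failures dmesg.post-crash.log` side by side gives a quick check of whether the signatures appear only after the crash.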

IWLWIFI DEBUGGING:

The kernel being used has CONFIG_IWLWIFI_DEBUG, CONFIG_IWLWIFI_DEBUGFS, and CONFIG_IWLWIFI_TRACING all enabled. Debug levels were not set, as per the directions on the debugging page. Please let me know which bitmap flags or debug levels to enable, and I can reproduce the issue with those and send over kernel logs.

TRACING:

DID NOT INCLUDE THIS TRACE.dat BECAUSE IT IS 32MiB COMPRESSED AND TOO LARGE TO ATTACH. Please let me know if a trace would be helpful and I can perhaps use fewer trace-cmd flags.

Performed the iwlwifi tracing using the provided command:

`sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e iwlwifi_msg`

I started the trace, kicked off my reproduction script in another terminal, and terminated the trace once the crash occurred.

Trace file provided in the attached tarball: `trace.dat`

I did not perform a firmware dump because there was not a debug firmware provided for the AX200 on this page:

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging

Please let me know if that will be necessary and I can follow the steps with any provided debug firmware.
Comment 1 Jeff Schuler 2020-03-25 19:30:38 UTC
Created attachment 288059 [details]
picture of the wifi card
Comment 2 Jeff Schuler 2020-03-25 20:11:43 UTC
Created attachment 288061 [details]
firmware trace for iwlwifi

Started the trace capture before running the reproduction script and stopped it once iwlwifi crashed. Command used: `sudo trace-cmd record -e iwlwifi`
Comment 3 Michael 2020-03-26 14:24:17 UTC
I can confirm this issue, because I received a similar issue report here:
https://github.com/ZerBea/hcxdumptool/issues/105
Switching channels here (and we are doing this really fast):
https://github.com/ZerBea/hcxdumptool/blob/master/hcxdumptool.c#L3812
causes a crash.
Comment 4 thomas.f.steeples 2020-03-26 14:42:59 UTC
(In reply to Michael from comment #3)
> I can confirm this issue, because I received a similar issue report here:
> https://github.com/ZerBea/hcxdumptool/issues/105
> Switching channels here (and we are doing this really fast):
> https://github.com/ZerBea/hcxdumptool/blob/master/hcxdumptool.c#L3812
> causes a crash.

This is me. Running Arch with kernel 5.5.11, x86_64 architecture. Same card and firmware.
Comment 5 Michael 2020-03-26 15:14:30 UTC
@thomas - race condition while reporting the issue.
Anyway, I can confirm it due to hcxdumptool debug printf().
Stay time on an empty channel before we try to set a new channel is 200000 usec (0.2 s), which is enough to crash the driver/firmware.
You can increase/decrease it here:
https://github.com/ZerBea/hcxdumptool/blob/master/include/hcxdumptool.h#L55
Stay time on a busy channel is one second.

BTW:
iw uses NETLINK (see --debug), while hcxdumptool uses pure ioctl() system calls.
Comment 6 Jeff Schuler 2020-03-26 17:12:57 UTC
Yeah, Thomas, we were not able to reproduce this on x86_64. However, we did find the bug: it's a memory leak in iwlmvm. Initial testing looks promising. Once we have completed our diligence and are happy with the solution, we will post a patch here!
Comment 7 Jeff Schuler 2020-03-30 22:54:58 UTC
Created attachment 288131 [details]
solution patch

This patch fixes the memory leak: the PCIe Tx queue is allocated over and over again and never freed! Intel should probably fix the root cause of this problem, as I doubt this is the 100% best place to do it...
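One way to watch for this kind of leak while the channel-switch test runs is to track vmalloc-area usage over time. The sketch below assumes the leaked Tx queue buffers show up as entries in `/proc/vmallocinfo`, which the `vmap allocation ... failed` message suggests for this ARM platform but which is not confirmed here. The helper name `vmalloc_usage` is hypothetical, and reading `/proc/vmallocinfo` normally requires root.

```shell
#!/bin/sh
# Print "<entries> <total_kB>" for a vmallocinfo-format file.
# Assumption: the second whitespace-separated field of each line is the
# area size in bytes, as in /proc/vmallocinfo.
vmalloc_usage() {
    awk '{ n++; sum += $2 } END { printf "%d %d\n", n, sum / 1024 }' \
        "${1:-/proc/vmallocinfo}"
}
```

Sampling `vmalloc_usage` before the test and again every few hundred channel switches should show whether the entry count and total keep growing. A steady climb without the patch and a flat line with it would be consistent with the leak described above.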