Bug 209491 - Background scans performed by Network Manager cause frequent WCN3990 firmware crashes and ath10k_snoc misbehavior
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: ARM Linux
Importance: P1 high
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-03 23:37 UTC by RussianNeuroMancer
Modified: 2021-07-04 05:15 UTC
CC List: 0 users

See Also:
Kernel Version: 5.8.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Linux 5.8.0 dmesg (95.61 KB, text/plain)
2020-10-03 23:37 UTC, RussianNeuroMancer
failed recovery dmesg (246.55 KB, text/plain)
2020-10-03 23:38 UTC, RussianNeuroMancer

Description RussianNeuroMancer 2020-10-03 23:37:13 UTC
Hello!

I found that the background scans Network Manager performs by default (more info: https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/ ) cause the WCN3990 firmware to crash. After a firmware crash, ath10k_snoc behaves in one of three ways:

1. WCN3990 recovers successfully and normal operation continues. Please see lines 1025-1068 in the attached dmesg58.log.

2. WCN3990 recovers successfully and ath10k_snoc tells the system that it is still connected, but no traffic passes through the WiFi connection. I have to reconnect or reload the driver kernel module to get WiFi working again. Please see lines 1076-1125 in the attached dmesg58.log (at line 1125 I gave up waiting and decided to reconnect).

3. WCN3990 fails to recover, the WiFi connection is dropped, and a continuous stream of the following error floods the system logs until it eats up all free space on UFS. This error consumes gigabytes of disk space (in syslog, kernel.log and the journal at the same time) within a few minutes:

[ 3845.586537] ieee80211_restart_work called with hardware scan in progress
[ 3845.586648] WARNING: CPU: 7 PID: 11742 at net/mac80211/main.c:260 ieee80211_restart_work+0xf4/0x100 [mac80211]
[ 3845.586650] Modules linked in: rfcomm fuse xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter bridge stp llc af_alg bnep q6asm_dai q6afe_dai q6routing q6asm q6adm q6dsp_common snd_soc_wsa881x regmap_sdw snd_soc_wcd934x soundwire_qcom gpio_wcd934x venus_enc wcd934x venus_dec regmap_slimbus videobuf2_dma_sg qrtr_smd msm qcom_spmi_adc5 venus_core qcom_vadc_common v4l2_mem2mem uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common hci_uart btqca btbcm snd_soc_sdm845 qcom_spmi_temp_alarm bluetooth snd_soc_rt5663 snd_soc_qcom_common videodev q6afe reset_qcom_pdc q6core ecdh_generic apr pdr_interface ti_sn65dsi86 mc snd_soc_rl6231 ecc hid_multitouch crct10dif_ce some_battery soundwire_bus drm_kms_helper qcom_rng ath10k_snoc ath10k_core ath mac80211 libarc4 cfg80211 slim_qcom_ngd_ctrl socinfo rfkill slimbus icc_osm_l3 qrtr
[ 3845.586708]  rmtfs_mem ns qnoc_sdm845 icc_rpmh icc_bcm_voter qcom_q6v5_pas ip_tables x_tables ipv6 nf_defrag_ipv6 btrfs blake2b_generic libcrc32c xor xor_neon zstd_decompress zstd_compress raid6_pq i2c_hid camcc_sdm845 i2c_qcom_geni qcom_q6v5_mss mdt_loader phy_qcom_qusb2 qcom_common qcom_glink_smem qcom_q6v5_ipa_notify qcom_sysmon qmi_helpers qcom_q6v5 panel_simple drm
[ 3845.586745] CPU: 7 PID: 11742 Comm: kworker/7:1 Not tainted 5.8.0-050800-generic #202007122030
[ 3845.586747] Hardware name: LENOVO 81JL/LNVNB161216, BIOS 9UCN33WW(V2.06) 06/ 4/2019
[ 3845.586775] Workqueue: events_freezable ieee80211_restart_work [mac80211]
[ 3845.586780] pstate: 80c00005 (Nzcv daif +PAN +UAO BTYPE=--)
[ 3845.586805] pc : ieee80211_restart_work+0xf4/0x100 [mac80211]
[ 3845.586827] lr : ieee80211_restart_work+0xf4/0x100 [mac80211]
[ 3845.586829] sp : ffff8000178c3d70
[ 3845.586831] x29: ffff8000178c3d70 x28: ffffd337f1357000 
[ 3845.586833] x27: ffff0000bdf5c748 x26: ffffd337f157ccf0 
[ 3845.586836] x25: 0000000000000000 x24: ffff0001efda1d20 
[ 3845.586838] x23: ffff0001fb8c1900 x22: ffff0001efda1d18 
[ 3845.586841] x21: ffff0001efda07e0 x20: ffff0000c8861b00 
[ 3845.586843] x19: ffff0001efda1d18 x18: 0000000000000000 
[ 3845.586845] x17: 0000000000000000 x16: ffffd337ef98b928 
[ 3845.586847] x15: ffffd337f1371000 x14: ffffd337f1581090 
[ 3845.586850] x13: 0000000000007428 x12: ffffd337f1580000 
[ 3845.586852] x11: ffffd337f1371000 x10: ffffd337f15806d8 
[ 3845.586854] x9 : 0000000000000001 x8 : 0000000000000000 
[ 3845.586856] x7 : 0000000000000004 x6 : 000000000000068a 
[ 3845.586859] x5 : 0000000000000001 x4 : 0000000000000000 
[ 3845.586861] x3 : 0000000000000001 x2 : ffff0001fb8b41f0 
[ 3845.586863] x1 : b50adb3cc93bdf00 x0 : 0000000000000000 
[ 3845.586866] Call trace:
[ 3845.586889]  ieee80211_restart_work+0xf4/0x100 [mac80211]
[ 3845.586900]  process_one_work+0x1bc/0x338
[ 3845.586903]  worker_thread+0x50/0x420
[ 3845.586906]  kthread+0x130/0x148
[ 3845.586912]  ret_from_fork+0x10/0x34

When this happens, it becomes impossible to close applications that access the network, or to reboot or shut down, because shutdown always gets stuck while stopping network-related system daemons.
dmesg_failed_recovery.log is attached; the repeating part was cut out to make the file smaller.
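
To gauge how quickly the warning quoted above floods a log, its occurrences can be counted per second of kernel uptime. The following is a minimal Python sketch, not something taken from the attachments; the function name `count_restart_warnings` is made up for illustration, and it assumes the log lines carry the usual `[ seconds.micros]` dmesg timestamp prefix.

```python
import re
from collections import Counter

# Message text copied from the trace in this report.
RESTART_WARNING = "ieee80211_restart_work called with hardware scan in progress"

# Matches a dmesg-style prefix such as "[ 3845.586537]" and captures whole seconds.
TIMESTAMP = re.compile(r"^\[\s*(\d+)\.\d+\]")


def count_restart_warnings(lines):
    """Return a Counter mapping uptime second -> number of restart warnings.

    Lines without a recognisable timestamp are counted under the key None.
    """
    per_second = Counter()
    for line in lines:
        if RESTART_WARNING not in line:
            continue
        match = TIMESTAMP.match(line)
        second = int(match.group(1)) if match else None
        per_second[second] += 1
    return per_second
```

Passing an open log file (for example the attached dmesg_failed_recovery.log) to `count_restart_warnings` and summing the values gives the total number of repeats; the per-second counts show whether the rate is steady or bursty.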

It seems that disabling Network Manager background scans avoids the WCN3990 firmware crashes, or at least works around the issues described in items 2 and 3.
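
This report does not spell out how the background scans were disabled. One approach, which as far as I understand is the one described in the blog post linked above, is to lock the connection profile to a single BSSID so that Network Manager has no roaming candidates to look for. Below is a minimal Python sketch under that assumption. It assumes `nmcli` is installed and sets the `802-11-wireless.bssid` property; the helper names, the profile name and the BSSID are hypothetical placeholders.

```python
import re
import subprocess

# Six colon-separated hex octets, e.g. AA:BB:CC:DD:EE:FF.
BSSID_PATTERN = re.compile(r"^(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def build_bssid_lock_command(connection_id, bssid):
    """Build the nmcli argv that pins a Wi-Fi profile to one access point."""
    if not BSSID_PATTERN.match(bssid):
        raise ValueError(f"not a valid BSSID: {bssid!r}")
    return [
        "nmcli", "connection", "modify", connection_id,
        "802-11-wireless.bssid", bssid.upper(),
    ]


def lock_connection_to_bssid(connection_id, bssid):
    """Run the command; raises CalledProcessError if nmcli reports failure."""
    subprocess.run(build_bssid_lock_command(connection_id, bssid), check=True)
```

For example, `lock_connection_to_bssid("HomeWiFi", "aa:bb:cc:dd:ee:ff")` would pin the placeholder profile "HomeWiFi" to that access point. The profile has to be reactivated for the change to take effect, and whether this fully suppresses scans on a given Network Manager version is something to verify rather than assume.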

Tested kernels:
https://github.com/andersson/kernel/commits/wip/c630-5.7
https://github.com/andersson/kernel/commits/wip/c630-5.8
https://github.com/steev/linux/commits/c630-5.8-rc4-inline-encryption

The issue is reproducible with the following 5 GHz access points:

    QCA9880-BR4A (wireless adapter installed into SolidRun ClearFog Pro, working in WiFi-5 mode)
    ASUS RT-N56U A1 with Ralink RT3662F wireless adapter
    ASUS RT-N56U B1 with MediaTek MT7612EN wireless adapter

The issue is reproducible with the following 2.4 GHz access points:

    MSM8996 (several different SD820-based smartphones in Access Point mode)
    ASUS RT-N56U A1 with Ralink RT3092L wireless adapter
    NanoPi-R1 with Ampak AP6212 wireless adapter

linux-firmware package version is 1.188, wlanmdsp.mbn sha256sum is 92e1501254e6de78c0f2e2cf091507d488b608d07e53acd14813a82744823ec2
board-2.bin was generated by this script: https://github.com/Celliwig/Lenovo-Yoga-c630/blob/master/yoga_fw_extract/yoga_fw_extract.sh
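
Anyone trying to reproduce this may want to confirm they are running the same wlanmdsp.mbn. The following is a minimal Python sketch that compares a file against the digest quoted above; the function names are made up for illustration, and the path to the firmware file is deliberately left to the caller because it differs between setups.

```python
import hashlib

# SHA-256 of wlanmdsp.mbn as quoted in this report.
REPORTED_SHA256 = "92e1501254e6de78c0f2e2cf091507d488b608d07e53acd14813a82744823ec2"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reported(path, expected=REPORTED_SHA256):
    """True if the file at `path` has the expected SHA-256 digest."""
    return sha256_of_file(path) == expected.strip().lower()
```

Calling `matches_reported(path_to_wlanmdsp)` with the location of the local firmware file returns `False` when a different firmware build is in use, which is worth knowing before comparing crash behavior.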

Originally reported here:
https://github.com/aarch64-laptops/build/issues/24
https://github.com/aarch64-laptops/build/issues/51
Comment 1 RussianNeuroMancer 2020-10-03 23:37:45 UTC
Created attachment 292799
Linux 5.8.0 dmesg
Comment 2 RussianNeuroMancer 2020-10-03 23:38:07 UTC
Created attachment 292801
failed recovery dmesg
Comment 3 RussianNeuroMancer 2021-07-04 05:15:13 UTC
It seems that this issue is no longer reproducible with Linux 5.12.5, so I am closing it for now.
