Bug 212059

Summary: ath11k: qca6390: firmware crashes with 160 MHz channels
Product: Drivers
Reporter: Sven (mail)
Component: network-wireless
Assignee: Kalle Valo (kvalo)
Status: RESOLVED CODE_FIX
Severity: normal
CC: jgehrcke, kvalo, mail
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.11.2
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output

Description Sven 2021-03-04 19:35:22 UTC
Created attachment 295649
dmesg output

Hi,
thank you for adding support for the ath11k_pci driver!

I have a Dell XPS 13 9310 with a Killer Wi-Fi 6 AX500-DBS PCI card running Arch Linux. Connecting to a 2.4 GHz WiFi works fine with the recent kernel version 5.11.2.

However, if I try to connect to a 5 GHz-only network, the firmware crashes with the following error:
firmware crashed: MHI_CB_SYS_ERROR

Thanks,
Sven
Comment 1 Sven 2021-03-04 19:41:47 UTC
Some additional information:

The bug occurs every time I try to connect to the 5 GHz WiFi but never on the 2.4 GHz WiFi (same WiFi access point).

I've added the workaround for the memory under 32M limit to my Linux cmdline:
memmap=12M$20M
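
For readers unfamiliar with the option: the kernel's documented `memmap=nn[KMG]$ss[KMG]` form marks the memory region from `ss` to `ss+nn` as reserved. Below is a minimal sketch of that arithmetic for the value above. The helper names `parse_size` and `reserved_range` are hypothetical and exist only for this illustration.

```python
import re

# Size suffixes as used by the memmap= option (binary multiples).
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '12M' into a number of bytes."""
    m = re.fullmatch(r"(\d+)([KMG])", text)
    if m is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def reserved_range(option):
    """Return (start, end) in bytes for a 'memmap=nn$ss' option string."""
    _, _, value = option.partition("=")
    size, start = value.split("$", 1)
    start_bytes = parse_size(start)
    return start_bytes, start_bytes + parse_size(size)


start, end = reserved_range("memmap=12M$20M")
print(f"reserved: {start >> 20} MiB .. {end >> 20} MiB")
```

For `memmap=12M$20M` this gives the range from 20 MiB to 32 MiB, which presumably is the region the "under 32M" workaround refers to. Some boot loaders (GRUB, for example) treat `$` specially in their configuration files, so the character may need escaping there.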
Comment 2 Kalle Valo 2021-03-05 13:52:20 UTC
Does the AP have 160 MHz channels enabled? There is a known crash with that; you could try reverting:

3579994476b6 wireless: fix wrong 160/80+80 MHz setting
Comment 3 Sven 2021-03-05 21:39:12 UTC
Yes, my AP has 160 MHz channels enabled. It has no option to disable 160 MHz channels, so I've switched from 802.11n+ac+ax (Wi-Fi 6) mode to 802.11a+n (Wi-Fi 4) mode and the problem was gone.

I've started recompiling the kernel without the commit 3579994476b6 and will try it tomorrow.
Comment 4 Kalle Valo 2021-03-06 05:10:52 UTC
Thanks, let me know how the revert works. Changing the title to better reflect the bug.
Comment 5 Sven 2021-03-06 09:12:31 UTC
The kernel build with the revert works fine. I've no problem connecting to the 802.11n+ac+ax 5 GHz access point anymore.
Comment 6 Dr. Jan-Philip Gehrcke 2021-08-01 17:07:30 UTC
Is this problem expected to be present in kernel 5.13.5? See https://bugzilla.kernel.org/show_bug.cgi?id=213937 which might actually be a duplicate of this ticket here.
Comment 7 Sven 2021-08-01 19:01:01 UTC
The last time I had this bug was on kernel version 5.13.4-arch2-1 (official Arch Linux kernel), where it occurred every time I connected to my 5 GHz-only network. With kernels 5.13.5-arch1-1 and 5.13.6-arch1-1 I couldn't reproduce the bug yet. However, I've encountered a CPU soft lockup with recent versions; this problem is also mentioned in bug report [1]. So I will compile a vanilla/mainline kernel and see if I can reproduce the problem with it.

$ journalctl | grep ath11k_mac_op_unassign_vif_chanctx -A 10
[...]
--
Jul 25 21:46:24 sven-xps kernel: WARNING: CPU: 2 PID: 800 at drivers/net/wireless/ath/ath11k/mac.c:5582 ath11k_mac_op_unassign_vif_chanctx+0x182/0x280 [ath11k]
Jul 25 21:46:24 sven-xps kernel: Modules linked in: ccm michael_mic xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter snd_seq_dummy snd_hrtimer snd_seq snd_seq_device xt_addrtype nft_compat nf_tables libcrc32c br_netfilter bridge stp llc overlay rfcomm cmac algif_hash algif_skcipher af_alg snd_ctl_led snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic nfnetlink bnep hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio snd_soc_dmic hid_sensor_hub cros_ec_ishtp joydev cros_ec intel_ishtp_loader mousedev qrtr_mhi intel_ishtp_hid snd_sof_pci_intel_tgl 8250_dw snd_sof_intel_hda_common iTCO_wdt soundwire_intel intel_pmc_bxt wacom hid_multitouch iTCO_vendor_support soundwire_generic_allocation intel_tcc_cooling usbhid soundwire_cadence x86_pkg_temp_thermal dell_laptop intel_powerclamp intel_pmt_telemetry coretemp mei_hdcp intel_pmt_class
Jul 25 21:46:24 sven-xps kernel:  intel_rapl_msr dell_smm_hwmon snd_sof_intel_hda kvm_intel snd_sof_pci snd_sof_xtensa_dsp kvm snd_sof snd_soc_hdac_hda snd_hda_ext_core irqbypass snd_soc_acpi_intel_match intel_cstate snd_soc_acpi intel_uncore soundwire_bus ledtrig_audio snd_soc_core qrtr ns snd_compress pcspkr psmouse ac97_bus ath11k_pci snd_pcm_dmaengine ath11k snd_hda_intel snd_intel_dspcfg dell_wmi snd_intel_sdw_acpi dell_smbios qmi_helpers dell_wmi_sysman snd_hda_codec dcdbas snd_hda_core mac80211 dell_wmi_descriptor wmi_bmof snd_hwdep snd_pcm intel_spi_pci snd_timer intel_spi i2c_i801 spi_nor snd mtd i2c_smbus soundcore cfg80211 uvcvideo mhi videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 libarc4 videobuf2_common videodev mei_me intel_lpss_pci vfat intel_lpss fat mei i915 mc idma64 intel_ish_ipc i2c_algo_bit tpm_crb intel_ishtp thunderbolt intel_pmt drm_kms_helper hci_uart cec processor_thermal_device ucsi_acpi processor_thermal_rfim intel_gtt typec_ucsi btqca processor_thermal_mbox syscopyarea btrtl
Jul 25 21:46:24 sven-xps kernel:  processor_thermal_rapl sysfillrect typec tpm_tis intel_rapl_common sysimgblt btbcm intel_soc_dts_iosf fb_sys_fops roles tpm_tis_core btintel bluetooth mac_hid ecdh_generic rfkill i2c_hid_acpi ecc i2c_hid int3403_thermal int340x_thermal_zone intel_hid sparse_keymap acpi_tad int3400_thermal acpi_thermal_rel acpi_pad drm sg crypto_user fuse agpgart bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys dm_mod trusted asn1_encoder tee tpm rng_core rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel rtsx_pci xhci_pci xhci_pci_renesas wmi i8042 serio video aesni_intel crypto_simd cryptd
Jul 25 21:46:24 sven-xps kernel: CPU: 2 PID: 800 Comm: wpa_supplicant Not tainted 5.13.4-arch2-1 #1
Jul 25 21:46:24 sven-xps kernel: Hardware name: Dell Inc. XPS 13 9310/0MRT12, BIOS 2.2.0 04/06/2021
Jul 25 21:46:24 sven-xps kernel: RIP: 0010:ath11k_mac_op_unassign_vif_chanctx+0x182/0x280 [ath11k]
Jul 25 21:46:24 sven-xps kernel: Code: 8b 83 e0 02 00 00 4c 89 e1 be 10 00 00 00 4c 89 ef 48 c7 c2 30 61 2e c1 e8 2b 6c 01 00 80 bb 98 03 00 00 00 0f 85 c3 fe ff ff <0f> 0b e9 bc fe ff ff f0 80 a5 d8 16 00 00 fe f6 05 78 98 05 00 10
Jul 25 21:46:24 sven-xps kernel: RSP: 0018:ffffa55d80f777e0 EFLAGS: 00010246
Jul 25 21:46:24 sven-xps kernel: RAX: 0000000000000000 RBX: ffff98ecd944d570 RCX: 0000000000000000
Jul 25 21:46:24 sven-xps kernel: RDX: ffff98ecc6cd4b80 RSI: ffff98ecd944d570 RDI: ffff98ecde91b658
Jul 25 21:46:24 sven-xps kernel: RBP: ffff98ecd944c940 R08: ffff98ecd944c940 R09: ffffa55d80f77628
Jul 25 21:46:24 sven-xps kernel: R10: ffffa55d80f77620 R11: ffffffff896ccde8 R12: ffff98ee088dc598
Jul 25 21:46:24 sven-xps kernel: R13: ffff98ecd6520000 R14: ffff98ecde919f40 R15: ffff98ecde91b658
Jul 25 21:46:24 sven-xps kernel: FS:  00007f0d9218b7c0(0000) GS:ffff98f42f680000(0000) knlGS:0000000000000000
Jul 25 21:46:24 sven-xps kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 25 21:46:24 sven-xps kernel: CR2: 00002b4081f66000 CR3: 000000011cc94003 CR4: 0000000000770ee0


[1] "watchdog: BUG: soft lockup - CPU#X stuck for", see comment #200069 from Bjørn Snoen (Monday, 24 May 2021, 21:20 GMT): https://bugs.archlinux.org/task/69223#comment200069
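
When comparing traces like the one above across kernel versions, it can help to pull the source location and function name out of the WARNING header line. The following is a rough sketch, assuming the journal line has the same shape as the one quoted in this comment. `WARNING_RE` and `parse_warning` are hypothetical names, not part of any kernel tooling.

```python
import re

# Matches the header line of a kernel WARNING as it appears in the journal.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_warning(log_line):
    """Return the WARNING location as a dict, or None if the line is not one."""
    m = WARNING_RE.search(log_line)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    return info


# The WARNING header line from the trace quoted in this comment.
SAMPLE = ("Jul 25 21:46:24 sven-xps kernel: WARNING: CPU: 2 PID: 800 at "
          "drivers/net/wireless/ath/ath11k/mac.c:5582 "
          "ath11k_mac_op_unassign_vif_chanctx+0x182/0x280 [ath11k]")

info = parse_warning(SAMPLE)
print(f"{info['func']} ({info['file']}:{info['line']}, module {info['module']})")
```

The line number refers to the source tree of the kernel that produced the trace (5.13.4-arch2-1 here), so it will generally not match other versions.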
Comment 8 Kalle Valo 2021-11-20 17:11:26 UTC
*** Bug 213937 has been marked as a duplicate of this bug. ***
Comment 9 Kalle Valo 2021-11-20 17:13:38 UTC
I was told that this cfg80211 commit should fix this:

https://git.kernel.org/linus/e6ed929b4140

The fix was introduced in v5.14-rc1, but I do not know whether it was backported to older stable releases. Marking the bug as fixed.
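
As a quick way to relate the kernel versions mentioned in this thread to that release, here is a small sketch that checks whether a kernel release string predates v5.14. It only compares the major and minor numbers and says nothing about stable backports, which remain unknown per the comment above. The names `FIXED_IN`, `major_minor` and `predates_mainline_fix` are hypothetical.

```python
import re

# Per the comment above, the fix first appeared in mainline v5.14-rc1.
FIXED_IN = (5, 14)


def major_minor(release):
    """Extract (major, minor) from a release string such as '5.13.4-arch2-1'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def predates_mainline_fix(release):
    """True if the release is older than the first mainline kernel with the fix."""
    return major_minor(release) < FIXED_IN


# Kernel versions reported in this thread.
for release in ("5.11.2", "5.13.4-arch2-1", "5.13.6-arch1-1"):
    status = "predates" if predates_mainline_fix(release) else "includes"
    print(f"{release}: {status} the mainline fix")
```

To check the running system, the string returned by Python's `platform.release()` can be passed to `predates_mainline_fix`. A distribution kernel older than v5.14 may still carry the fix if it was backported.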