Bug 215504

Summary: ath11k: wcn6855: kernel oops on suspend without firmware
Product: Drivers
Reporter: Mario Limonciello (AMD) (mario.limonciello)
Component: network-wireless
Assignee: Kalle Valo (kvalo)
Status: RESOLVED CODE_FIX
Severity: normal
CC: bihagkashikar, kvalo
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.16.0
Subsystem:
Regression: No
Bisected commit-id:

Description Mario Limonciello (AMD) 2022-01-18 22:37:25 UTC
With kernel 5.16.0 and no firmware in place, ath11k causes a kernel oops during suspend when a wcn6855 is present in the system.

Here is the initialization information from ath11k:

ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0xb0000000-0xb01fffff 64bit]
ath11k_pci 0000:01:00.0: wcn6855 hw2.0
ath11k_pci 0000:01:00.0: chip_id 0x2 chip_family 0xb board_id 0xff soc_id 0x400c0200
ath11k_pci 0000:01:00.0: fw_version 0x11080bbb fw_build_timestamp 2021-12-16 03:42 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2
ath11k_pci 0000:01:00.0: failed to fetch board data for bus=pci,qmi-chip-id=2,qmi-board-id=255 from ath11k/WCN6855/hw2.0/board-2.bin
ath11k_pci 0000:01:00.0: failed to fetch board-2.bin or board.bin from WCN6855/hw2.0
ath11k_pci 0000:01:00.0: qmi failed to fetch board file: -2
ath11k_pci 0000:01:00.0: failed to load board data file: -2

Here is the OOPS that occurs during suspend:

<6>[  473.653007] PM: suspend entry (s2idle)
<6>[  473.660770] Filesystems sync: 0.007 seconds
<7>[  473.660772] PM: Preparing system for sleep (s2idle)
<6>[  473.662298] Freezing user space processes ... (elapsed 0.029 seconds) done.
<6>[  473.692108] OOM killer disabled.
<6>[  473.692109] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<7>[  473.693286] PM: Suspending system (s2idle)
<6>[  473.693291] printk: Suspending console(s) (use no_console_suspend to debug)
<1>[  474.407787] BUG: unable to handle page fault for address: 0000000000002070
<1>[  474.407791] #PF: supervisor read access in kernel mode
<1>[  474.407794] #PF: error_code(0x0000) - not-present page
<6>[  474.407798] PGD 0 P4D 0 
<4>[  474.407801] Oops: 0000 [#1] PREEMPT SMP NOPTI
<4>[  474.407805] CPU: 2 PID: 2350 Comm: kworker/u32:14 Tainted: G        W         5.16.0 #248
<4>[  474.407810] Hardware name: <sanitized>
dmesg-efi-164255130402001:
Oops#1 Part2
<4>[  474.407813] Workqueue: events_unbound async_run_entry_fn
<4>[  474.407826] RIP: 0010:ath11k_dp_rx_process_mon_rings+0x6e/0x5b0 [ath11k]
<4>[  474.407847] Code: 31 c0 48 85 d2 74 0b 44 89 ee 4c 89 ff 0f ae e8 ff d2 48 98 48 89 c2 48 c1 e2 04 48 01 d0 48 c1 e0 05 49 8b 84 04 b0 82 00 00 <48> 8b 80 70 20 00 00 a8 02 0f 84 83 00 00 00 49 8b 84 24 40 07 01
<4>[  474.407849] RSP: 0018:ffffb26580c1fc40 EFLAGS: 00010256
<4>[  474.407853] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080
<4>[  474.407855] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8d19c9990718
<4>[  474.407856] RBP: ffffb26580c1fce8 R08: 0000000000000001 R09: 0000000000000001
<4>[  474.407858] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d19c9980000
<4>[  474.407859] R13: 0000000000000000 R14: 0000000000000080 R15: ffff8d19c9990718
<4>[  474.407861] FS:  0000000000000000(0000) GS:ffff8d1d0e680000(0000) knlGS:0000000000000000
<4>[  474.407864] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  474.407865] CR2: 0000000000002070 CR3: 0000000424610000 CR4: 0000000000350ee0
<4>[  474.407868] Call Trace:
<4>[  474.407870]  <TASK>
<4>[  474.407874]  ? _raw_spin_lock_irqsave+0x2a/0x60
<4>[  474.407882]  ? lock_timer_base+0x72/0xa0
<4>[  474.407889]  ? _raw_spin_unlock_irqrestore+0x29/0x3d
<4>[  474.407892]  ? try_to_del_timer_sync+0x54/0x80
<4>[  474.407896]  ath11k_dp_rx_pktlog_stop+0x49/0xc0 [ath11k]
<4>[  474.407912]  ath11k_core_suspend+0x34/0x130 [ath11k]
<4>[  474.407923]  ath11k_pci_pm_suspend+0x1b/0x50 [ath11k_pci]
<4>[  474.407928]  pci_pm_suspend+0x7e/0x170
<4>[  474.407935]  ? pci_pm_freeze+0xc0/0xc0
<4>[  474.407939]  dpm_run_callback+0x4e/0x150
<4>[  474.407947]  __device_suspend+0x148/0x4c0
<4>[  474.407951]  async_suspend+0x20/0x90
dmesg-efi-164255130401001:
Oops#1 Part1
<4>[  474.407955]  async_run_entry_fn+0x33/0x120
<4>[  474.407959]  process_one_work+0x220/0x3f0
<4>[  474.407966]  worker_thread+0x4a/0x3d0
<4>[  474.407971]  kthread+0x17a/0x1a0
<4>[  474.407975]  ? process_one_work+0x3f0/0x3f0
<4>[  474.407979]  ? set_kthread_struct+0x40/0x40
<4>[  474.407983]  ret_from_fork+0x22/0x30
<4>[  474.407991]  </TASK>
<4>[  474.407992] Modules linked in: rfcomm hid_logitech_hidpp joydev input_leds hid_logitech_dj r8152 cdc_mbim cdc_wdm cdc_ncm cdc_ether uas usbnet usb_storage mii usbhid qrtr_mhi cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc nls_iso8859_1 amdgpu snd_ctl_led ath11k_pci iommu_v2 ath11k snd_hda_codec_realtek gpu_sched snd_hda_codec_generic qmi_helpers snd_hda_codec_hdmi kvm drm_ttm_helper uvcvideo crct10dif_pclmul ttm videobuf2_vmalloc videobuf2_memops mac80211 serio_raw snd_seq_midi videobuf2_v4l2 drm_kms_helper snd_hda_intel snd_seq_midi_event snd_intel_dspcfg nvram videobuf2_common efi_pstore snd_rawmidi ledtrig_audio cec snd_hda_codec libarc4 platform_profile i2c_algo_bit videodev fb_sys_fops snd_hda_core mc snd_seq syscopyarea snd_hwdep cfg80211 sysfillrect sysimgblt snd_pcm snd_seq_device rfkill snd_timer mhi snd_rn_pci_acp3x ccp snd_pci_acp3x k10temp snd soundcore ucsi_acpi video typec_ucsi hid_sensor_accel_3d
<4>[  474.408084]  hid_sensor_magn_3d hid_sensor_gyro_3d typec hid_sensor_trigger wmi industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common acpi_tad amd_pmc industrialio sch_fq_codel drm msr ip_tables x_tables autofs4 hid_sensor_hub hid_generic crc32_pclmul crc32c_intel i2c_piix4 amd_sfh xhci_pci xhci_pci_renesas nvme nvme_core i2c_hid_acpi i2c_hid hid
<4>[  474.408117] CR2: 0000000000002070
<4>[  474.408120] ---[ end trace 86046875c44e0ef2 ]---
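
For triage, the faulting address can be cross-checked against the Code: line of the oops. The following is only an illustrative sketch and not part of the original report. It decodes just the single instruction form seen here (REX.W MOV r64, [r64+disp32], no SIB byte), and the helper names faulting_bytes and decode_mov_disp32 are hypothetical. With RAX = 0 in the register dump, the decoded displacement equals CR2, which is consistent with a NULL base pointer being read at offset 0x2070.

```python
# Illustrative sketch: decode the faulting instruction from the oops "Code:" line.
# Only handles `48 8b /r` with mod=10 (base register + 32-bit displacement).

# Bytes copied from the "Code:" line of the oops; <48> marks the faulting instruction.
CODE = (
    "31 c0 48 85 d2 74 0b 44 89 ee 4c 89 ff 0f ae e8 ff d2 48 98 48 89 c2 "
    "48 c1 e2 04 48 01 d0 48 c1 e0 05 49 8b 84 04 b0 82 00 00 <48> 8b 80 70 "
    "20 00 00 a8 02 0f 84 83 00 00 00 49 8b 84 24 40 07 01"
)

# Register names for the low eight 64-bit registers (no REX.R / REX.B extension).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes starting at the one the kernel wrapped in <...>."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:])


def decode_mov_disp32(insn: bytes):
    """Decode `mov r64, [r64 + disp32]`; raise ValueError for anything else."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W MOV r64, r/m64 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only base+disp32 without SIB is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return REGS[reg], REGS[rm], disp


if __name__ == "__main__":
    dst, base, disp = decode_mov_disp32(faulting_bytes(CODE))
    print(f"mov {dst}, [{base}+{disp:#x}]")
    rax = 0x0  # RAX value from the register dump in the oops
    print(f"fault address: {rax + disp:#018x}")
```

The result lines up with both the "BUG: unable to handle page fault for address" line and CR2. It does not identify which structure member sits at that offset.
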
Comment 1 Mario Limonciello (AMD) 2022-01-20 18:32:56 UTC
FWIW, I did have the firmware in place, but it wasn't getting loaded, perhaps because https://github.com/torvalds/linux/commit/fc95d10ac41d75c14a81afcc8722333d8b2cf80f isn't in 5.16.0. Maybe that commit is a good stable candidate?
Comment 3 Mario Limonciello (AMD) 2022-01-27 16:38:35 UTC
Great, thanks!