Created attachment 298477 [details]
mainline kernel 5.14-rc4 dmesg

On resume, ath11k reports an error: MHI is in MHI_STATE_RESET rather than M3.

[ 46.003087] ath11k_pci 0000:03:00.0: failed to set mhi state: RESUME(6)
[ 46.003796] mhi-pci-generic 0000:05:00.0: failed to resume device: -22
[ 46.003814] mhi-pci-generic 0000:05:00.0: device recovery started

The same error appears even with runtime suspend disabled.

The firmware is from https://github.com/kvalo/ath11k-firmware.git
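To check whether another machine shows the same signature, the dmesg output can be scanned for the two failure messages quoted in this report. The following is only a minimal sketch: the regular expressions are derived from the three log lines above, the function name `find_mhi_resume_failures` is made up for illustration, and other kernel versions may word these messages differently.

```python
import re

# Patterns built from the dmesg lines quoted in this report (assumption:
# the message wording is the same on the kernel being checked).
FAILURE_PATTERNS = [
    re.compile(r"ath11k_pci (?P<dev>\S+): failed to set mhi state: RESUME\(\d+\)"),
    re.compile(r"mhi-pci-generic (?P<dev>\S+): failed to resume device: -\d+"),
]


def find_mhi_resume_failures(dmesg_text):
    """Return (pci_device, line) pairs for lines matching a failure pattern."""
    hits = []
    for line in dmesg_text.splitlines():
        for pattern in FAILURE_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((match.group("dev"), line.strip()))
                break  # one hit per line is enough
    return hits


# Sample input: the excerpt from this report.
SAMPLE = """\
[ 46.003087] ath11k_pci 0000:03:00.0: failed to set mhi state: RESUME(6)
[ 46.003796] mhi-pci-generic 0000:05:00.0: failed to resume device: -22
[ 46.003814] mhi-pci-generic 0000:05:00.0: device recovery started
"""

for dev, line in find_mhi_resume_failures(SAMPLE):
    print(dev, "->", line)
```

On a live system the text would come from `dmesg` output saved to a file or captured via a pipe instead of `SAMPLE`.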
Created attachment 298479 [details] lspci
This is with wcn6855:

[ 6.243697] ath11k_pci 0000:03:00.0: wcn6855 hw2.0
[ 7.327423] ath11k_pci 0000:03:00.0: chip_id 0x2 chip_family 0xb board_id 0xff soc_id 0x400c0200
[ 7.327432] ath11k_pci 0000:03:00.0: fw_version 0x110d897f fw_build_timestamp 2021-05-27 19:36 fw_build_id QC_IMAGE_VERSION_STRING=WLAN.HSP.1.1-02431-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

Changing the title to reflect that.
I also noticed a warning in dmesg:

[ 7.315630] WARNING: CPU: 0 PID: 561 at mm/page_alloc.c:5366 __alloc_pages+0x2a9/0x320

It comes from:

[ 7.315838] ath11k_qmi_msg_mem_request_cb+0xe3/0x190 [ath11k]
See relevant mailing list discussion here: https://lore.kernel.org/all/20211118174145.GA31300@thinkpad/T/
(In reply to mani from comment #4)
> See relevant mailing list discussion here:
> https://lore.kernel.org/all/20211118174145.GA31300@thinkpad/T/

There are two separate suspend bugs in ath11k at the moment. Even if I revert commit 020d3b26c07a ("bus: mhi: Early MHI resume failure in non M3 state"), WCN6855 suspend still fails. I filed bug #215103 for handling the MHI issue.
(In reply to Kalle Valo from comment #3)
> I also noticed a warning in dmesg:
>
> [ 7.315630] WARNING: CPU: 0 PID: 561 at mm/page_alloc.c:5366
> __alloc_pages+0x2a9/0x320
>
> Comes from:
>
> [ 7.315838] ath11k_qmi_msg_mem_request_cb+0xe3/0x190 [ath11k]

There is already a patch for this warning:

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/commit/drivers/net/wireless/ath/ath11k?id=b9b5948cdd7bc8d9fa31c78cbbb04382c815587f
(In reply to Pengyu Ma from comment #0)
> Created attachment 298477 [details]
> mainline kernel 5.14-rc4 dmesg
>
> When resume, ath11k reports error, mhi is in MHI_STATE_RESET than M3.
>
> [ 46.003087] ath11k_pci 0000:03:00.0: failed to set mhi state: RESUME(6)
> [ 46.003796] mhi-pci-generic 0000:05:00.0: failed to resume device: -22
> [ 46.003814] mhi-pci-generic 0000:05:00.0: device recovery started
>
> Even disable runtime suspend, it will show the same error.
>
> fw is from https://github.com/kvalo/ath11k-firmware.git

Hi Pengyu, how did you determine that MHI is in MHI_STATE_RESET? I could not find this information in the dmesg log.
Hi Baochen,

After adding extra debug output, I confirmed the RESET state. It should be the same issue as the NUC suspend issue that Kalle and Mani discussed.
Hi Mani,

Please Cc me if you have a patch, so I can help verify it. I think recovery handling after power loss, like that in the PCI MHI driver, is necessary.

The revert patch that ignores the MHI state is only a hack; after resume, the MHI of ath11k won't work any more.
Hi Mani: Is there any proposed patch I can help verify? Or a schedule for this issue? Please let me know if I can help.
(In reply to Pengyu Ma from comment #9)
> Hi Mani,
>
> Please Cc me if you got any patch, so I can help verify it.
> I think a recovery work after power lost as PCI MHI driver is necessary.
>
> The revert patch that ignore the MHI state is only a hack, after resume the
> mhi of ath11k won't work any more.

Oh, on the NUC, if we ignore the MHI state during resume (basically the revert of 020d3b26c07a), it works just fine. But on your setup, it is not working?
Hi Mani,

The WLAN does work after reverting the patch, but I think the MHI driver is not in the right state.

The revert also introduces another regression for the LTE module [1eac:1001]: with the revert, the recovery work is delayed by 20 seconds.

So I think proper recovery handling, like that for the LTE module, is better.
(In reply to Pengyu Ma from comment #12)
> Hi Mani,
>
> The WLAN could work after revert the patch, but I think MHI driver is not in
> right condition.
>

Okay. Thanks for the confirmation. I tried to reset the device in ath11k, but so far I have not been able to bring it back to a working state afterward. During a PM resume failure:

1. Doing an MHI reset causes ath11k to crash.
2. Doing a SOC reset requires full initialization of the device.

So I'm going to apply the patch suggested by Loic that forces the resume, ignoring the MHI state of the device. I will CC you so that you can also test.

> Also it introduces another regression for LTE module [1eac:1001].
> With the revert, the recovery work will be delayed for 20 seconds.
>

Yeah, that patch is necessary for some LTE modules.

> So I think a proper recovery work like LTE module is better.

Unfortunately, that's not straightforward with WLAN devices.
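The difference between the strict resume check and the forced resume described in this comment can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel code: the names `MhiState` and `pm_resume` and the `force` parameter are invented for illustration, and only the state comparison is modelled. The `-22` return value corresponds to `-EINVAL`, which matches the "failed to resume device: -22" line in the original report.

```python
from enum import Enum

EINVAL = 22  # errno value; the log in this bug shows "-22"


class MhiState(Enum):
    """Subset of MHI device states mentioned in this thread."""
    RESET = "RESET"
    M0 = "M0"
    M3 = "M3"


def pm_resume(device_state, force=False):
    """Toy model of the resume-time state check (hypothetical names).

    Strict mode: fail early unless the device is still in M3, which is
    the behaviour this thread attributes to commit 020d3b26c07a.
    Forced mode: continue with the resume even from an unexpected state.
    """
    if device_state is not MhiState.M3:
        if not force:
            return -EINVAL
        # Forced resume: fall through despite the unexpected state.
    return 0


# A device that lost power over suspend comes back in RESET.
print(pm_resume(MhiState.RESET))              # strict check rejects it
print(pm_resume(MhiState.RESET, force=True))  # forced resume proceeds
```

Keeping the strict check as the default and making the forced path opt-in reflects the trade-off discussed above: some LTE modules rely on the early failure to start recovery, while the WLAN device cannot easily be recovered that way.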
The patch was submitted and accepted into the char-misc-linus branch. It should be available in the next -rc release.

https://lore.kernel.org/lkml/20211209131633.4168-1-manivannan.sadhasivam@linaro.org/
Hi all, This bug is fixed by firmware: https://github.com/kvalo/ath11k-firmware/tree/master/WCN6855/hw2.0/1.1/WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2 No other patch is needed.
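To see which firmware build a system is running, the build number can be read from the `fw_build_id` string that ath11k prints in dmesg (an example appears earlier in this thread). The sketch below compares it with the build named in this comment. It rests on an assumption that is not confirmed anywhere in this thread: that the five-digit field after `WLAN.HSP.1.1-` increases with newer releases and that later builds keep the fix. The function names are invented for illustration.

```python
import re

# Build number taken from the firmware directory named in this comment:
# WLAN.HSP.1.1-03003-QCAHSPSWPL_V1_V2_SILICONZ_LITE-2
FIXED_BUILD = 3003


def wlan_hsp_build(fw_build_id):
    """Extract the numeric build field from a WLAN.HSP version string."""
    match = re.search(r"WLAN\.HSP\.\d+\.\d+-(\d+)-", fw_build_id)
    return int(match.group(1)) if match else None


def has_fixed_firmware(fw_build_id):
    """Assumption: builds numbered at or above FIXED_BUILD contain the fix."""
    build = wlan_hsp_build(fw_build_id)
    return build is not None and build >= FIXED_BUILD


# The string reported earlier in this thread for the failing system.
reported = "QC_IMAGE_VERSION_STRING=WLAN.HSP.1.1-02431-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1"
print(wlan_hsp_build(reported), has_fixed_firmware(reported))
```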