Bug 212457

Summary: iwlwifi: AX200: config space inaccessible after resume from suspend; cannot transition from D3hot to D0
Product: Drivers Reporter: David Ward (david.ward)
Component: network-wireless-intel Assignee: Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: accounts, bobreid, danielpeterson, devin.tuchsen, gustavo, hamidrj, juliusvonkohout, kernel, perry_yuan, sebastian.pleschko
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.11.7 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg with out-of-tree VirtualBox drivers
dmesg with kernel 5.11.10-300.fc34.x86_64
dmesg with kernel 5.11.11-350.vanilla.1.fc34.x86_64
lspci -xxxvvv
bug seems to be still present in kernel-5.11.12-300.fc34.x86_64
bug seems to be still present in kernel-5.11.12-300.fc34.x86_64
bug seems to be still present in kernel-5.11.12-300.fc34.x86_64
bug still present in fedora kernel 5.11.19-300.fc34.x86_64
dmesg_ax200.txt

Description David Ward 2021-03-27 03:23:03 UTC
In bug 201469, several users reported an unrelated issue in iwlwifi that I am filing separately here.

They experience what appears to be a regression in commit 8954e1eb2270 ("iwlwifi: trans: Clear persistence bit when starting the FW"). This commit was introduced in v4.20-rc2 and was backported to v4.19.89.

After resume from S3 suspend, the wireless adapter is unusable, and dmesg contains this message approximately every 10 seconds:

    iwlwifi 0000:01:00.0: Error, can not clear persistence bit


Commit 44f61b5c832c ("iwlwifi: clear persistence bit according to device family") in v5.2-rc2 attempted to fix this. But at least one user (cc'd) is now reporting that the issue is still present in kernel v5.11.7.


@gustavo, can you please save the output of 'dmesg' after resuming from suspend, and then attach it to this bug?
Comment 1 gustavo 2021-03-27 11:09:56 UTC
I upgraded the computer from Fedora 33 to Fedora 34 beta, so the old kernel messages are gone... sorry.

With the new release, with kernel 5.11.9-300.fc34.x86_64, the problem has not appeared.
Comment 2 gustavo 2021-03-27 19:23:38 UTC
Created attachment 296091 [details]
dmesg with out-of-tree VirtualBox drivers

The problem appeared again.
Comment 3 David Ward 2021-03-28 21:39:55 UTC
Before the iwlwifi error appears, there are several warnings which are not related to iwlwifi. dmesg cannot even print all the messages since boot, because the printk ring buffer is not large enough.

- Can you reproduce this behavior without the VirtualBox drivers? Please see: https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html

- Can you make the printk ring buffer larger, by setting the kernel parameter 'log_buf_len' with a larger value? The default value in Fedora 34 is 256KiB. See:
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
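
To confirm after a reboot that the parameter actually took effect, the kernel command line can be inspected. The following is a minimal sketch, not an official tool: the helper names `parse_log_buf_len` and `is_power_of_two` are made up for illustration, the `4M` value is only an example, and only decimal values with an optional K/M/G suffix are handled.

```python
import re

# Multipliers for the size suffixes handled by this sketch.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_log_buf_len(cmdline):
    """Return the requested log_buf_len in bytes, or None if it is not set.

    Hypothetical helper: only decimal numbers with an optional K/M/G
    suffix are recognised here.
    """
    for token in cmdline.split():
        match = re.fullmatch(r"log_buf_len=(\d+)([kKmMgG]?)", token)
        if match:
            return int(match.group(1)) * _UNITS[match.group(2).lower()]
    return None


def is_power_of_two(n):
    """The kernel documentation asks for a power-of-two buffer size."""
    return n > 0 and n & (n - 1) == 0


# Example command line; the values are illustrative only.
example = "BOOT_IMAGE=/vmlinuz ro quiet log_buf_len=4M"
size = parse_log_buf_len(example)
print(size, is_power_of_two(size))
```

On a running system the same function could be fed the contents of `/proc/cmdline`, for example `parse_log_buf_len(open("/proc/cmdline").read())`.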
Comment 4 gustavo 2021-03-30 09:18:43 UTC
Created attachment 296149 [details]
dmesg with kernel 5.11.10-300.fc34.x86_64

This is after uninstalling the VirtualBox drivers.
I forgot to enlarge log_buf_len; I will do that for the next one.
Comment 5 gustavo 2021-03-30 11:22:59 UTC
[gustavo@casa ~]$ sudo modprobe -r iwlwifi iwlmvm
[gustavo@casa ~]$ sudo modprobe iwlwifi
[gustavo@casa ~]$ dmesg | tail
[23258.092578] wlp7s0f3u4: RX AssocResp from 28:d1:27:5a:ab:41 (capab=0x431 status=0 aid=2)
[23258.210102] wlp7s0f3u4: associated
[27101.968904] Intel(R) Wireless WiFi driver for Linux
[27101.969051] iwlwifi 0000:05:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[27101.969173] iwlwifi 0000:05:00.0: HW_REV=0xFFFFFFFF, PCI issues?
[27101.969225] iwlwifi: probe of 0000:05:00.0 failed with error -5
[27145.803583] Intel(R) Wireless WiFi driver for Linux
[27145.803638] iwlwifi 0000:05:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[27145.803740] iwlwifi 0000:05:00.0: HW_REV=0xFFFFFFFF, PCI issues?
[27145.803775] iwlwifi: probe of 0000:05:00.0 failed with error -5
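
For anyone collecting logs for this bug, the failure lines above can be picked out of a long dmesg capture automatically. This is only a rough sketch: the substrings are copied from the messages quoted in this report, while the function name `find_iwlwifi_failures` and the signature labels are invented for illustration.

```python
import re

# Substrings copied from the dmesg output quoted in this report.
# The dictionary keys are labels made up for this sketch.
SIGNATURES = {
    "config_space_inaccessible":
        "can't change power state from D3hot to D0 (config space inaccessible)",
    "hw_rev_all_ones": "HW_REV=0xFFFFFFFF",
    "probe_failed": "failed with error -5",
    "persistence_bit": "can not clear persistence bit",
}

# Matches the "[seconds.microseconds]" prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*")


def find_iwlwifi_failures(dmesg_text):
    """Return (timestamp, label, line) tuples for iwlwifi lines that match."""
    hits = []
    for line in dmesg_text.splitlines():
        if "iwlwifi" not in line:
            continue
        match = TIMESTAMP.match(line)
        timestamp = float(match.group(1)) if match else None
        for label, needle in SIGNATURES.items():
            if needle in line:
                hits.append((timestamp, label, line.strip()))
    return hits


sample = """\
[27101.968904] Intel(R) Wireless WiFi driver for Linux
[27101.969051] iwlwifi 0000:05:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[27101.969173] iwlwifi 0000:05:00.0: HW_REV=0xFFFFFFFF, PCI issues?
[27101.969225] iwlwifi: probe of 0000:05:00.0 failed with error -5
"""
for timestamp, label, _ in find_iwlwifi_failures(sample):
    print(timestamp, label)
```

The text passed in can come from a saved `dmesg` capture; the timestamps make it easier to line the failures up with the suspend/resume cycle.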
Comment 6 David Ward 2021-03-30 14:23:41 UTC
Thanks! This is much better.

Now, can you please try to reproduce this problem using the "vanilla" mainline kernel? Run the first two commands on this page to install it: https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories

Assuming it's not fixed in the mainline kernel, can you please attach dmesg for it?
Comment 7 gustavo 2021-03-30 14:53:33 UTC
Comment on attachment 296149 [details]
dmesg with kernel 5.11.10-300.fc34.x86_64

dmesg with kernel 5.11.10-300.fc34.x86_64
Comment 8 gustavo 2021-03-30 14:54:09 UTC
Comment on attachment 296149 [details]
dmesg with kernel 5.11.10-300.fc34.x86_64

dmesg with kernel 5.11.10-300.fc34.x86_64
Comment 9 gustavo 2021-03-31 17:46:13 UTC
Created attachment 296165 [details]
dmesg with kernel 5.11.11-350.vanilla.1.fc34.x86_64

dmesg showing the error with vanilla kernel 5.11.11-350.vanilla.1.fc34.x86_64
Comment 10 David Ward 2021-03-31 19:53:56 UTC
There is more than one vanilla repository; sorry for the confusion.

Could you please install and try "kernel-vanilla-mainline"? (The version will be 5.12.0-rc5 or later.)


With that kernel, could you run this command before suspend (when Wi-Fi works), and run it again after resume (when Wi-Fi is broken):

    lspci -xxxvvv

Please attach the output both times. Also please attach dmesg. Thanks!
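
For comparing the two captures, here is a rough sketch of how the hex dumps could be diffed. It assumes the usual `lspci -x` layout (a device line starting with the slot address, followed by rows such as `00: 86 80 ...`); the function names and the sample bytes are made up for illustration. A dump that reads back as all `ff` would be consistent with the "config space inaccessible" and `HW_REV=0xFFFFFFFF` messages, though that is an interpretation rather than something confirmed in this report.

```python
import re

# Device line, e.g. "05:00.0 Network controller: ..." (PCI domain optional).
HEADER = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Hex dump row, e.g. "00: 86 80 23 27 ...".
HEXROW = re.compile(r"^([0-9a-f]{2,3}): ((?:[0-9a-f]{2} ?)+)$")


def parse_config_space(lspci_text):
    """Map each slot address to the config-space bytes found in its hex dump."""
    devices = {}
    current = None
    for line in lspci_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            devices[current] = bytearray()
            continue
        row = HEXROW.match(line.rstrip())
        if row and current is not None:
            devices[current] += bytes.fromhex(row.group(2))
    return {slot: bytes(data) for slot, data in devices.items()}


def looks_inaccessible(data):
    """True if every byte reads back as 0xff."""
    return len(data) > 0 and all(byte == 0xFF for byte in data)


def changed_offsets(before, after):
    """Offsets at which the two dumps differ (over their common length)."""
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]


# Made-up sample dumps, shortened to eight bytes each.
before_text = "05:00.0 Network controller: Example\n\tFlags: example\n00: 86 80 23 27 06 04 10 00\n"
after_text = "05:00.0 Network controller: Example\n00: ff ff ff ff ff ff ff ff\n"

before = parse_config_space(before_text)["05:00.0"]
after = parse_config_space(after_text)["05:00.0"]
print(looks_inaccessible(before), looks_inaccessible(after))
print(changed_offsets(before, after))
```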
Comment 11 gustavo 2021-04-01 15:54:51 UTC
Created attachment 296175 [details]
lspci -xxxvvv

I installed the suggested kernel: kernel-5.12.0-0.rc5.20210331git2bb25b3a748a.181.vanilla.1.fc34.x86_64

output of "sudo lspci -xxxvvv" attached
Comment 12 gustavo 2021-04-09 11:22:09 UTC
the issue is no longer happening with v5.12-rc6-42-g454859c552da
Comment 13 gustavo 2021-04-12 09:27:55 UTC
Created attachment 296327 [details]
bug seems to be still present in kernel-5.11.12-300.fc34.x86_64

bug seems to be still present in kernel-5.11.12-300.fc34.x86_64
Comment 14 gustavo 2021-04-12 09:28:48 UTC
Created attachment 296329 [details]
bug seems to be still present in kernel-5.11.12-300.fc34.x86_64

bug seems to be still present in kernel-5.11.12-300.fc34.x86_64

Output of "sudo lspci -xxxvvv" before the bug occurs
Comment 15 gustavo 2021-04-12 09:29:15 UTC
Created attachment 296331 [details]
bug seems to be still present in kernel-5.11.12-300.fc34.x86_64

bug seems to be still present in kernel-5.11.12-300.fc34.x86_64

Output of "sudo lspci -xxxvvv" after the bug occurs
Comment 16 gustavo 2021-05-14 17:47:49 UTC
Created attachment 296749 [details]
bug still present in fedora kernel 5.11.19-300.fc34.x86_64

This bug may be mixed up with earlier failures from mac80211...

[gustavo@casa ~]$ dmesg | grep RIP
[ 2308.872851] RIP: 0010:ieee80211_reconfig+0x8a/0x1280 [mac80211]
[ 2312.684461] RIP: 0010:drv_remove_interface+0xde/0xf0 [mac80211]
[ 2312.684781] RIP: 0010:drv_stop+0xc8/0xd0 [mac80211]
Comment 17 David Ward 2021-05-14 19:01:07 UTC
Did this work in an earlier release of 5.11.x?

Unless it did — can you please try a vanilla build of the mainline kernel? (Is this still happening in 5.12 and/or a 5.13 snapshot?) It would need to be fixed there before any changes can be made to the stable kernel. Please see: https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

Thanks!
Comment 18 gustavo 2021-05-15 15:25:29 UTC
how can I test 5.13?

(In reply to David Ward from comment #17)
> Did this work in an earlier release of 5.11.x?
> 
> Unless it did — can you please try a vanilla build of the mainline kernel?
> (Is this still happening in 5.12 and/or a 5.13 snapshot?) It would need to
> be fixed there before any changes can be made to the stable kernel. Please
> see: https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
> 
> Thanks!
Comment 19 gustavo 2021-05-15 15:53:26 UTC
will the ones from https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories be valid?
Comment 20 gustavo 2021-07-05 08:59:48 UTC
bug is still present in kernel 5.12.13...
Comment 21 gustavo 2021-07-22 16:09:45 UTC
bug still present in kernel 5.13.4-200.fc34.x86_64
Comment 22 juliusvonkohout 2021-08-26 19:45:28 UTC
I am affected with 5.14.0.rc7
Comment 23 juliusvonkohout 2021-08-31 15:44:05 UTC
Created attachment 298547 [details]
dmesg_ax200.txt

Something is crashing completely

```
[26458.241020] ------------[ cut here ]------------
[26458.241021] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
[26458.241089] WARNING: CPU: 10 PID: 18416 at net/mac80211/util.c:2349 ieee80211_reconfig+0x8a/0x12d0 [mac80211]
[26458.241176] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bnep sunrpc vfat fat iwlmvm snd_hda_codec_realtek intel_rapl_msr snd_hda_codec_generic intel_rapl_common mac80211 ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi libarc4 edac_mce_amd snd_hda_codec uvcvideo btusb snd_hda_core snd_hwdep btrtl kvm_amd videobuf2_vmalloc btbcm snd_seq videobuf2_memops videobuf2_v4l2 videobuf2_common iwlwifi btintel kvm bluetooth snd_seq_device snd_pcm
[26458.241244]  videodev irqbypass cfg80211 rapl joydev mc pcspkr wmi_bmof snd_timer ecdh_generic snd ideapad_laptop platform_profile ucsi_acpi i2c_piix4 snd_rn_pci_acp3x sparse_keymap typec_ucsi snd_pci_acp3x soundcore typec rfkill acpi_cpufreq amd_pmc zram ip_tables amdgpu hid_sensor_hub hid_multitouch drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper crct10dif_pclmul cec crc32_pclmul sdhci_pci crc32c_intel drm cqhci nvme sdhci ghash_clmulni_intel serio_raw mmc_core nvme_core ccp sp5100_tco wmi video i2c_hid_acpi i2c_hid fuse
[26458.241298] CPU: 10 PID: 18416 Comm: kworker/u32:1 Not tainted 5.14.0-0.rc7.20210827git77dd11439b86.57.vanilla.1.fc34.x86_64 #1
[26458.241304] Hardware name: LENOVO 82L5/LNVNB161216, BIOS GSCN26WW 08/05/2021
[26458.241306] Workqueue: events_unbound async_run_entry_fn
[26458.241317] RIP: 0010:ieee80211_reconfig+0x8a/0x12d0 [mac80211]
[26458.241362] Code: 00 31 db e8 58 b1 be dd c6 85 74 06 00 00 00 48 89 ef e8 39 71 fc ff 41 89 c4 85 c0 74 4c 48 c7 c7 b8 10 57 c1 e8 44 02 6b de <0f> 0b 48 89 ef e8 bc cd ff ff e9 27 03 00 00 80 7c 24 16 00 0f 85
```
Comment 24 juliusvonkohout 2021-10-31 14:12:23 UTC
It seems to be fixed with 5.15. Please close.
Comment 25 gustavo 2021-10-31 16:56:16 UTC
let me try kernel-5.15.0-0.rc7.20211028git1fc596a56b33.56.vanilla.1.fc35.x86_64 for a few hours to close it :)
Comment 26 gustavo 2021-11-01 01:48:05 UTC
the bug is still here: "iwlwifi 0000:05:00.0: Error, can not clear persistence bit" in kernel-5.15.0-0.rc7.20211028git1fc596a56b33.56.vanilla.1.fc35.x86_64
:(
Comment 27 juliusvonkohout 2021-11-01 10:19:51 UTC
Maybe you need a new firmware too. I am using Fedora 35 with a mainline Kernel from https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories
Comment 28 gustavo 2021-11-01 10:51:18 UTC
just like me but again I hit the same bug: 
[10779.130581] iwlwifi 0000:05:00.0: Error, can not clear persistence bit

This week I will try a new card... The hardware may be broken :(
Comment 29 Felix Schnizlein 2021-11-02 21:33:22 UTC
I can confirm this bug is still present in 5.15.0-2.ge9c94fc-default
Comment 30 gustavo 2021-11-17 13:32:23 UTC
the bug is still here with another card from another vendor, also ax200, using the latest kernel from fedora 35, 5.14.17-301.fc35.x86_64.
Comment 31 Daniel Petersson 2021-11-29 15:45:41 UTC
I have the same issue. Given my system and the info here, I'm speculating it's related to a combination of AMD Ryzen 3/4/5, Lenovo and AX200.

(I have a Lenovo 13ACN05 with AMD Ryzen 5700U and ax200)

I dug around in the source code for iwlwifi, PCIe and ACPI, but I came up empty-handed...

Does anyone have any suggestions for how to debug this to pinpoint the issue more accurately?
Comment 32 gustavo 2021-11-29 16:12:21 UTC
my cpu is also an AMD Ryzen 3 3200G...
Comment 33 hj 2021-12-02 03:02:10 UTC
Same issue here: (it has nothing to do with AMD) 
- Ubuntu 21.10;
- Kernel: 5.15.5-051505-generic;
- CPU: Intel® Core™ i7;
- ASUS motherboard.
Comment 34 Sebastian Pleschko 2021-12-12 20:02:50 UTC
I can confirm the same issue with Intel CPU:

Fedora 35
CPU: Intel(R) Core(TM) i7-5600U CPU
Kernel: 5.15.6-200.fc35.x86_64
DELL XPS13
Comment 35 Daniel Pecos 2021-12-15 20:31:52 UTC
Not exactly the same scenario, but I get the same error if I start my laptop on battery while laptop-mode-tools is installed.

It does not happen if I plug in the power cord before starting. With the power cord attached, everything works fine.

Trying to reload the module once (and plugging in the power cord) does not work; I need to restart the laptop with the power cord attached.

Funnily enough, uninstalling laptop-mode-tools avoids the issue when running on battery, although then the battery does not last long.
Comment 36 Daniel Pecos 2021-12-15 21:46:39 UTC
Quick update on my previous message, disabling power save on battery for wireless interfaces in laptop-mode-tools avoids the issue:

File /etc/laptop-mode/conf.d/wireless-power.conf, set:

WIRELESS_BATT_POWER_SAVING=0

And restart.
Comment 37 Daniel Pecos 2021-12-15 22:00:59 UTC
(In reply to Daniel Pecos from comment #36)
> Quick update on my previous message, disabling power save on battery for
> wireless interfaces in laptop-mode-tools avoids the issue:
> 
> File /etc/laptop-mode/conf.d/wireless-power.conf, set:
> 
> WIRELESS_BATT_POWER_SAVING=0
> 
> And restart.

Not really :(

There is something else going on: with the same config, most of the time the interface is still not available after a restart.

I'll keep investigating...
Comment 38 Johannes Berg 2023-01-05 16:31:27 UTC
Eh, well, I guess this one kind of fell through the cracks and it's been a year now.

Anyone still seeing these issues?
Comment 39 gustavo 2023-01-05 16:38:59 UTC
working fine under newer kernels :)

On 2023-01-05 at 17:31, bugzilla-daemon@kernel.org wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=212457
>
> --- Comment #38 from Johannes Berg (johannes@sipsolutions.net) ---
> Eh, well, I guess this one kind of fell through the cracks and it's been a
> year
> now.
>
> Anyone still seeing these issues?
>
Comment 40 Bob Reid 2023-10-18 20:19:00 UTC
(In reply to Johannes Berg from comment #38)
> Eh, well, I guess this one kind of fell through the cracks and it's been a
> year now.
> 
> Anyone still seeing these issues?

Unfortunately, this is still occurring with kernels 6.2x and 6.5x (various distros tried, currently Fedora 38 Cinnamon) on a Ryzen 5 3500C in a converted ThinkPad C13 Yoga Chromebook running the latest mrchromebox.tech firmware. Many noob apologies if I am in the wrong place, but I am trying to get to the bottom of this error for resolution's sake because I LOVE this hardware.