Bug 205005

Summary: iwlwifi: AC3165: hitting "WARN_ON(!regdom && self_managed)"
Product: Drivers
Reporter: Andre Heider (a.heider)
Component: network-wireless-intel
Assignee: Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)
Status: RESOLVED CODE_FIX
Severity: normal
CC: andrey.vihrov, chaitanya.mgit, golan.ben.ami, johannes, linux, martin.stolpe
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.2.9
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg

Description Andre Heider 2019-09-26 08:37:04 UTC
Created attachment 285177
dmesg

My Lenovo B71-80 shipped with this card:

02:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3165 Plus Bluetooth (rev 99)
	Subsystem: Intel Corporation Dual Band Wireless-AC 3165 Plus Bluetooth
	Flags: bus master, fast devsel, latency 0, IRQ 127
	Memory at a1000000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [40] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number e0-94-67-ff-ff-b4-6d-bf
	Capabilities: [14c] Latency Tolerance Reporting
	Capabilities: [154] L1 PM Substates
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi

Recently I've been getting these warnings on boot:

[    3.972548] ------------[ cut here ]------------
[    3.972598] WARNING: CPU: 1 PID: 517 at net/wireless/nl80211.c:6859 nl80211_get_reg_do+0x1fc/0x210 [cfg80211]
[    3.972598] Modules linked in: intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul binfmt_misc ghash_clmulni_intel btusb arc4 btrtl btbcm btintel nls_ascii nls_cp437 bluetooth snd_hda_codec_hdmi snd_soc_skl snd_soc_skl_ipc snd_hda_codec_conexant snd_hda_codec_generic snd_soc_sst_ipc snd_soc_sst_dsp vfat fat ledtrig_audio snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi iwlmvm snd_soc_core aesni_intel mac80211 drbg snd_compress uvcvideo i915 videobuf2_vmalloc videobuf2_memops aes_x86_64 crypto_simd videobuf2_v4l2 snd_hda_intel cryptd glue_helper iwlwifi intel_cstate efi_pstore videobuf2_common sg joydev evdev drm_kms_helper ansi_cprng intel_uncore snd_hda_codec intel_wmi_thunderbolt rtsx_usb_ms iTCO_wdt videodev iTCO_vendor_support intel_rapl_perf memstick cfg80211 serio_raw media watchdog ecdh_generic drm ecc snd_hda_core snd_hwdep snd_pcm snd_timer ideapad_laptop sparse_keymap snd soundcore i2c_algo_bit pcspkr efivars
[    3.972625]  intel_pch_thermal rfkill battery pcc_cpufreq acpi_pad ac button sunrpc efivarfs ip_tables x_tables autofs4 ext4 rtsx_usb_sdmmc mmc_core rtsx_usb mfd_core crc16 mbcache jbd2 crc32c_generic sr_mod cdrom sd_mod ahci libahci libata xhci_pci xhci_hcd scsi_mod usbcore psmouse r8169 crc32c_intel realtek libphy i2c_i801 wmi usb_common video
[    3.972642] CPU: 1 PID: 517 Comm: wpa_supplicant Not tainted 5.2.0-2-amd64 #1 Debian 5.2.9-2
[    3.972642] Hardware name: LENOVO 80RJ/Lenovo B71-80, BIOS 0RCN35WW 12/03/2015
[    3.972662] RIP: 0010:nl80211_get_reg_do+0x1fc/0x210 [cfg80211]
[    3.972663] Code: ff 48 89 ef e8 c5 55 35 f7 b8 a6 ff ff ff eb 89 eb ef b8 97 ff ff ff eb 80 e8 70 b4 de f6 48 c7 c7 08 68 6d c0 e8 b2 31 e5 f6 <0f> 0b 48 89 ef e8 9a 55 35 f7 b8 ea ff ff ff e9 5b ff ff ff 0f 1f
[    3.972665] RSP: 0018:ffffaa6a81e13ac8 EFLAGS: 00010246
[    3.972666] RAX: 0000000000000024 RBX: 0000000000000000 RCX: 0000000000000000
[    3.972667] RDX: 0000000000000000 RSI: ffff8b7832497688 RDI: ffff8b7832497688
[    3.972668] RBP: ffff8b782edfe200 R08: 0000000000000347 R09: 0000000000000004
[    3.972668] R10: 0000000000000000 R11: 0000000000000001 R12: ffffaa6a81e13b50
[    3.972669] R13: ffff8b782cec7014 R14: 0000000000000001 R15: ffff8b78298a8300
[    3.972671] FS:  00007f49414af800(0000) GS:ffff8b7832480000(0000) knlGS:0000000000000000
[    3.972672] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.972672] CR2: 0000560d09d30ba0 CR3: 000000046e394003 CR4: 00000000003606e0
[    3.972673] Call Trace:
[    3.972679]  ? _cond_resched+0x15/0x30
[    3.972682]  genl_family_rcv_msg+0x1d2/0x410
[    3.972685]  ? __alloc_pages_nodemask+0x163/0x310
[    3.972687]  genl_rcv_msg+0x47/0x8c
[    3.972689]  ? __kmalloc_node_track_caller+0x1cb/0x290
[    3.972691]  ? genl_family_rcv_msg+0x410/0x410
[    3.972692]  netlink_rcv_skb+0x49/0x110
[    3.972694]  genl_rcv+0x24/0x40
[    3.972695]  netlink_unicast+0x17e/0x200
[    3.972697]  netlink_sendmsg+0x204/0x3d0
[    3.972700]  sock_sendmsg+0x4c/0x50
[    3.972702]  ___sys_sendmsg+0x29f/0x300
[    3.972704]  ? ___sys_recvmsg+0x16c/0x200
[    3.972706]  ? filemap_map_pages+0x1b9/0x390
[    3.972709]  ? __handle_mm_fault+0x10a8/0x1280
[    3.972712]  __sys_sendmsg+0x57/0xa0
[    3.972715]  do_syscall_64+0x53/0x130
[    3.972717]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    3.972719] RIP: 0033:0x7f4941801744
[    3.972721] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 48 8d 05 99 3c 0c 00 8b 00 85 c0 75 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
[    3.972722] RSP: 002b:00007ffef6fd9668 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[    3.972723] RAX: ffffffffffffffda RBX: 0000560d09fe6480 RCX: 00007f4941801744
[    3.972724] RDX: 0000000000000000 RSI: 00007ffef6fd96a0 RDI: 0000000000000004
[    3.972725] RBP: 0000560d09fe8cf0 R08: 0000000000000004 R09: 0000560d09fea9c0
[    3.972725] R10: 00007ffef6fd9774 R11: 0000000000000246 R12: 0000560d09fe6390
[    3.972726] R13: 00007ffef6fd96a0 R14: 00007ffef6fd97d0 R15: 00007ffef6fd9774
[    3.972728] ---[ end trace 3f8fc0059321ffc6 ]---

Wifi then stops working completely; the full dmesg log is attached.

This is the debian/buster 5.2.0-2-amd64 kernel, but testing 5.3~rc5 shows the same issue. firmware-iwlwifi is at 20190717-2.

This started happening recently, without any changes to the system on my part. The current kernel was installed more than a month ago, and wifi used to work with it. Between the last working state and the first broken state there were no system updates, no initrd updates, and no config changes.

Any ideas?
Comment 1 Andre Heider 2019-09-26 08:47:40 UTC
I'm not the only one seeing this:

8260:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=939986

AC9560:
https://bugzilla.redhat.com/show_bug.cgi?id=1738151
Comment 2 Andre Heider 2019-09-27 18:38:02 UTC
Enabling and then disabling rfkill (there is no physical switch, only an Fn key combo) fixes the issue.

To summarize:
Out of the blue, that WARN_ON was hit on every boot. Reboots, poweroffs, and detaching the battery (not physically, but via Fn+s+v for "shipping mode") didn't help. Kernels 4.19, 5.2, and 5.3 all showed the same issue.

Toggling rfkill twice "fixed" the issue across reboots and poweroffs.

Enabling rfkill after the "fix" doesn't hit the WARN_ON; instead I get a proper "wpa_supplicant[3226]: rfkill: WLAN soft blocked". I haven't yet managed to get it back into that broken state.

No idea how it got into this state, but I definitely didn't use that rfkill key combo. Whatever the problem was, that WARN_ON sure wasn't helpful...
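
For anyone who wants to check the rfkill state without the key combo, here is a minimal sketch in C that reads the queued events from an rfkill device node and prints the state of each WLAN entry. It is only an illustration under stated assumptions: the struct is a local mirror of the 8-byte event record (on Linux the real definition is struct rfkill_event in <linux/rfkill.h>), the function names are made up for this example, and the caller is assumed to pass a path such as /dev/rfkill.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Local mirror of the 8-byte rfkill event record (assumption: it matches
 * struct rfkill_event from <linux/rfkill.h>). */
struct rfkill_event_mirror {
    uint32_t idx;   /* rfkill device index */
    uint8_t  type;  /* 1 means WLAN */
    uint8_t  op;    /* add / delete / change */
    uint8_t  soft;  /* non-zero: blocked by software */
    uint8_t  hard;  /* non-zero: blocked by a hardware switch */
};

enum { MIRROR_RFKILL_TYPE_WLAN = 1 };

/* Returns a short state string; a hard block takes precedence over a soft block. */
const char *rfkill_state_name(const struct rfkill_event_mirror *ev)
{
    assert(ev != NULL);
    if (ev->hard)
        return "hard blocked";
    if (ev->soft)
        return "soft blocked";
    return "unblocked";
}

/* Prints the state of every WLAN rfkill device found at `path`.
 * Returns the number of WLAN devices seen, or -1 if the node cannot be opened. */
int print_wlan_rfkill(const char *path)
{
    struct rfkill_event_mirror ev;
    int count = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    /* On open, the kernel queues one "add" event per existing device;
     * the non-blocking read fails once that queue is drained. */
    while (read(fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev)) {
        if (ev.type != MIRROR_RFKILL_TYPE_WLAN)
            continue;
        printf("rfkill%u: WLAN %s\n", (unsigned)ev.idx, rfkill_state_name(&ev));
        count++;
    }
    close(fd);
    return count;
}
```

Calling print_wlan_rfkill("/dev/rfkill") before and after pressing the key combo should show whether the soft block actually changes; it returns -1 if the node is missing or not readable.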
Comment 3 Chaitanya T K 2019-11-01 14:56:26 UTC
I observed a similar bug using a fork of https://github.com/MontaVista-OpenSourceTechnology/linux-mvista-2.4/tree/mvl-4.14/msd.cgx:

[   14.846887] WARNING: CPU: 1 PID: 2257 at net/wireless/nl80211.c:6225 nl80211_get_reg_do+0x1c0/0x220
 4.14.148-rt66-yocto-standard-15185-g7d02a7c6

The driver is a non-upstreamed one that supports a custom self-managed regulatory domain, registered using regulatory_set_wiphy_regd(). So I don't see a case where the regd can be NULL unless cfg80211 overrides the regd (which it doesn't seem to do for a DRIVER-set regd).
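
To make the invariant behind the warning concrete, here is a small user-space model in C of the check named in the bug title, WARN_ON(!regdom && self_managed). It is a sketch, not the kernel code: the struct and function names are hypothetical stand-ins, the self_managed field models the REGULATORY_WIPHY_SELF_MANAGED flag, model_set_wiphy_regd() models a driver calling regulatory_set_wiphy_regd(), and the -EINVAL return value is an assumption about what the query hands back when the warning fires.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real cfg80211 structs. */
struct model_regdomain {
    char alpha2[3];
};

struct model_wiphy {
    bool self_managed;                   /* models REGULATORY_WIPHY_SELF_MANAGED */
    const struct model_regdomain *regd;  /* private regdomain, NULL until set */
};

/* Models a driver attaching its private regdomain to the wiphy. */
void model_set_wiphy_regd(struct model_wiphy *w, const struct model_regdomain *rd)
{
    assert(w != NULL);
    w->regd = rd;
}

/*
 * Models the per-wiphy regulatory query. A self-managed wiphy must already
 * have a private regdomain; if it does not, warn and fail (assumed -EINVAL).
 * The non-self-managed fallback path is out of scope for this sketch.
 */
int model_get_reg(const struct model_wiphy *w, const struct model_regdomain **out)
{
    assert(w != NULL && out != NULL);
    if (!w->regd && w->self_managed) {
        fprintf(stderr, "WARN: self-managed wiphy without a regdomain\n");
        return -EINVAL;
    }
    *out = w->regd;
    return 0;
}
```

In this model, querying a self-managed wiphy before model_set_wiphy_regd() has run returns -EINVAL, and the same query succeeds afterwards. That ordering is the only way the model can trip the warning, which is why a NULL regd on a driver that always registers one is surprising.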

Also, we see unusually long wait times during `systemctl status networking.service`, possibly RTNL issues?
Comment 4 Golan Ben Ami 2021-11-15 09:41:11 UTC
The fix was published in 5.14:
eb09ae9 iwlwifi: mvm: load regdomain at INIT stage