Bug 210733

Summary: iwlwifi #PF at IP: 0010:iwl_dbg_tlv_alloc_region+0x27/0x180 [iwlwifi]
Product: Drivers Reporter: Jeroen Jeurissen (jeroenjeurissen)
Component: network-wireless-intelAssignee: Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel)
Status: NEW ---    
Severity: blocking CC: adam, bp, build+kernel, cedric, chrisbcoutinho, epassaro15, fkrueger, krrgithub, psi-jack, tiwai
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.10.1-2.g8f3d468-default (Tumbleweed) Subsystem:
Regression: No Bisected commit-id:
Attachments: journalctl.log
Tentative fix patch
Alternative patch

Description Jeroen Jeurissen 2020-12-16 16:38:00 UTC
Created attachment 294161 [details]
journalctl.log

My Asus G14 laptop freezes on boot(AMD 4900HS)
Error message:
Dec 16 13:58:59 G14 kernel: BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 241s!
Dec 16 13:58:59 G14 kernel: Showing busy workqueues and worker pools:
Dec 16 13:58:59 G14 kernel: workqueue events: flags=0x0
Dec 16 13:58:59 G14 kernel:   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=16/256 refcnt=17
Dec 16 13:58:59 G14 kernel:     in-flight: 8810:request_firmware_work_func
Dec 16 13:58:59 G14 kernel:     pending: delayed_fput, smp_call_on_cpu_callback, kernfs_notify_workfn, free_work, kfree_rcu_monitor, drm_fb_helper_dirty_work [drm_kms_helper], pcpu_balance_workfn, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, console_callback, release_one_tty
Dec 16 13:58:59 G14 kernel:   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
Dec 16 13:58:59 G14 kernel:     pending: check_corruption, vmstat_shepherd
Dec 16 13:58:59 G14 kernel: workqueue events_unbound: flags=0x2
Dec 16 13:58:59 G14 kernel:   pwq 32: cpus=0-15 flags=0x4 nice=0 active=4/512 refcnt=13
Dec 16 13:58:59 G14 kernel:     in-flight: 107:fsnotify_connector_destroy_workfn fsnotify_connector_destroy_workfn, 110:fsnotify_mark_destroy_workfn fsnotify_mark_destroy_workfn BAR(1127) BAR(1) BAR(1197) BAR(1029) BAR(3671) BAR(3423) BAR(4904)
Dec 16 13:58:59 G14 kernel: workqueue rcu_gp: flags=0x8
Dec 16 13:58:59 G14 kernel:   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
Dec 16 13:58:59 G14 kernel:     pending: process_srcu, process_srcu
Dec 16 13:58:59 G14 kernel: workqueue mm_percpu_wq: flags=0x8
Dec 16 13:58:59 G14 kernel:   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256 refcnt=4
Dec 16 13:58:59 G14 kernel:     pending: vmstat_update, lru_add_drain_per_cpu BAR(133)
Dec 16 13:58:59 G14 kernel:   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Dec 16 13:58:59 G14 kernel:     pending: vmstat_update
Dec 16 13:58:59 G14 kernel: workqueue pm: flags=0x4
Dec 16 13:58:59 G14 kernel:   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
Dec 16 13:58:59 G14 kernel:     pending: pm_runtime_work
Dec 16 13:58:59 G14 kernel: workqueue hci0: flags=0xa0012
Dec 16 13:58:59 G14 kernel:   pwq 33: cpus=0-15 flags=0x4 nice=-20 active=1/1 refcnt=5
Dec 16 13:58:59 G14 kernel:     in-flight: 900:hci_rx_work [bluetooth] BAR(901)
Dec 16 13:58:59 G14 kernel:     delayed: hci_cmd_work [bluetooth]
Dec 16 13:58:59 G14 kernel: workqueue hci0: flags=0xa0012
Dec 16 13:58:59 G14 kernel:   pwq 33: cpus=0-15 flags=0x4 nice=-20 active=1/1 refcnt=3
Dec 16 13:58:59 G14 kernel:     in-flight: 901:hci_power_off [bluetooth]
Dec 16 13:58:59 G14 kernel: pool 2: cpus=1 node=0 flags=0x0 nice=0 hung=241s workers=3 idle: 899 19
Dec 16 13:58:59 G14 kernel: pool 32: cpus=0-15 flags=0x4 nice=0 hung=0s workers=16 idle: 108 111 114 8 109 106 118 112 117 115 113 8756 116 8755
Dec 16 13:58:59 G14 kernel: pool 33: cpus=0-15 flags=0x4 nice=-20 hung=20s workers=3 idle: 219
Comment 1 Jeroen Jeurissen 2020-12-16 16:50:42 UTC
kernel params:
splash=silent resume=/dev/disk/by-uuid/4d96025f-7d10-456b-93fd-9aa6e706f4ca quiet rd.driver.blacklist=nouveau mitigations=auto
Comment 2 Borislav Petkov 2020-12-16 17:28:28 UTC
That's iwlwifi's driver dropping the ball:

Dec 16 13:54:57 G14 kernel: iwlwifi 0000:02:00.0: api flags index 2 larger than supported by driver
Dec 16 13:54:57 G14 kernel: BUG: unable to handle page fault for address: ffffa38101b8d39f
Dec 16 13:54:57 G14 kernel: #PF: supervisor write access in kernel mode
Dec 16 13:54:57 G14 kernel: #PF: error_code(0x0003) - permissions violation
Dec 16 13:54:57 G14 kernel: PGD 100000067 P4D 100000067 PUD 1001d4067 PMD 113033067 PTE 800000013183e161
Dec 16 13:54:57 G14 kernel: Oops: 0003 [#1] SMP NOPTI
Dec 16 13:54:57 G14 kernel: CPU: 1 PID: 193 Comm: kworker/1:1 Not tainted 5.10.1-2.g8f3d468-default #1 openSUSE Tumbleweed (unreleased)
Dec 16 13:54:57 G14 kernel: Hardware name: ASUSTeK COMPUTER INC. ROG Zephyrus G14 GA401IV_GA401IV/GA401IV, BIOS GA401IV.217 09/18/2020
Dec 16 13:54:57 G14 kernel: Workqueue: events request_firmware_work_func
Dec 16 13:54:57 G14 kernel: RIP: 0010:iwl_dbg_tlv_alloc_region+0x27/0x180 [iwlwifi]
Comment 3 Jeroen Jeurissen 2020-12-18 11:33:55 UTC
I can confirm that the problem is with iwlwifi, when I blacklist iwlwifi my laptop boots up just fine.
Comment 4 Jeroen Jeurissen 2020-12-18 11:45:30 UTC
When I modprobe iwlwifi I get the following error:
[   57.752421] Intel(R) Wireless WiFi driver for Linux
[   57.752477] iwlwifi 0000:02:00.0: enabling device (0000 -> 0002)
[   57.782402] iwlwifi 0000:02:00.0: api flags index 2 larger than supported by driver
[   57.782411] BUG: unable to handle page fault for address: ffffacce04b4239f
[   57.782413] #PF: supervisor write access in kernel mode
[   57.782413] #PF: error_code(0x0003) - permissions violation
[   57.782414] PGD 100000067 P4D 100000067 PUD 1001d4067 PMD 1115c7067 PTE 8000000282ebb161
[   57.782417] Oops: 0003 [#1] SMP NOPTI
[   57.782419] CPU: 6 PID: 187 Comm: kworker/6:1 Tainted: P           OE     5.10.1-2.g8f3d468-default #1 openSUSE Tumbleweed (unreleased)
[   57.782420] Hardware name: ASUSTeK COMPUTER INC. ROG Zephyrus G14 GA401IV_GA401IV/GA401IV, BIOS GA401IV.217 09/18/2020
[   57.782424] Workqueue: events request_firmware_work_func
[   57.782430] RIP: 0010:iwl_dbg_tlv_alloc_region+0x27/0x180 [iwlwifi]
[   57.782431] Code: 00 00 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 8b 46 04 44 8b 66 10 44 8b 6e 14 83 f8 3f 0f 86 0b 01 00 00 <c6> 46 37 00 48 89 fb 48 8b 7f 38 48 89 f5 4c 8d 4e 18 31 d2 44 8d
[   57.782432] RSP: 0018:ffffacce0077bce0 EFLAGS: 00010212
[   57.782433] RAX: 0000000000000040 RBX: ffff91ffc2f20018 RCX: 0000000000000000
[   57.782434] RDX: ffffffffc32231b0 RSI: ffffacce04b42368 RDI: ffff91ffc2f20018
[   57.782434] RBP: 0000000000000000 R08: 0000000000000080 R09: 0000000000000001
[   57.782435] R10: ffffacce04b423b0 R11: 0000000000000040 R12: 000000000000000c
[   57.782436] R13: 0000000000000011 R14: ffff91fe82c4f800 R15: ffff91fed1e4d800
[   57.782437] FS:  0000000000000000(0000) GS:ffff92015f780000(0000) knlGS:0000000000000000
[   57.782437] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   57.782438] CR2: ffffacce04b4239f CR3: 00000001e9410000 CR4: 0000000000350ee0
[   57.782439] Call Trace:
[   57.782445]  iwl_dbg_tlv_alloc+0x79/0x120 [iwlwifi]
[   57.782450]  iwl_req_fw_callback+0x10f0/0x2480 [iwlwifi]
[   57.782452]  ? devres_add+0x1e/0x60
[   57.782453]  ? fw_add_devm_name.part.0+0x5c/0x80
[   57.782454]  ? assign_fw+0x6d/0x230
[   57.782455]  request_firmware_work_func+0x4d/0x90
[   57.782457]  process_one_work+0x1df/0x370
[   57.782459]  worker_thread+0x50/0x400
[   57.782460]  ? process_one_work+0x370/0x370
[   57.782460]  kthread+0x11b/0x140
[   57.782462]  ? __kthread_bind_mask+0x60/0x60
[   57.782464]  ret_from_fork+0x22/0x30
[   57.782465] Modules linked in: iwlwifi cfg80211 rfcomm hidp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iscsi_ibft iscsi_boot_sysfs iptable_filter ip_tables x_tables bpfilter vboxnetadp(OE) vboxnetflt(OE) cmac algif_hash algif_skcipher af_alg bnep vboxdrv(OE) dmi_sysfs squashfs snd_hda_codec_realtek nvidia_drm(POE) btusb snd_hda_codec_generic nvidia_modeset(POE) ledtrig_audio snd_hda_codec_hdmi btrtl btbcm snd_hda_intel btintel snd_intel_dspcfg nvidia_uvm(POE) edac_mce_amd soundwire_intel bluetooth soundwire_generic_allocation kvm_amd soundwire_cadence snd_hda_codec
[   57.782492]  snd_hda_core kvm snd_hwdep soundwire_bus ecdh_generic snd_soc_core nls_iso8859_1 snd_compress rfkill nls_cp437 snd_pcm_dmaengine vfat fat ecc nvidia(POE) loop irqbypass snd_pcm snd_rn_pci_acp3x wmi_bmof joydev efi_pstore pcspkr snd_timer sp5100_tco k10temp snd_pci_acp3x snd i2c_piix4 ucsi_acpi typec_ucsi soundcore typec thermal ac acpi_cpufreq tiny_power_button asus_wireless button fuse configfs usbhid hid_multitouch hid_generic amdgpu iommu_v2 gpu_sched ttm i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec xhci_pci xhci_pci_renesas rc_core xhci_hcd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm aesni_intel glue_helper crypto_simd usbcore nvme cryptd nvme_core ccp wmi battery video i2c_hid btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg br_netfilter bridge stp llc dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr efivarfs
[   57.782524] CR2: ffffacce04b4239f
[   57.782526] ---[ end trace 2da0ba2c9fd6c69f ]---
[   57.782530] RIP: 0010:iwl_dbg_tlv_alloc_region+0x27/0x180 [iwlwifi]
[   57.782531] Code: 00 00 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 08 8b 46 04 44 8b 66 10 44 8b 6e 14 83 f8 3f 0f 86 0b 01 00 00 <c6> 46 37 00 48 89 fb 48 8b 7f 38 48 89 f5 4c 8d 4e 18 31 d2 44 8d
[   57.782532] RSP: 0018:ffffacce0077bce0 EFLAGS: 00010212
[   57.782532] RAX: 0000000000000040 RBX: ffff91ffc2f20018 RCX: 0000000000000000
[   57.782533] RDX: ffffffffc32231b0 RSI: ffffacce04b42368 RDI: ffff91ffc2f20018
[   57.782533] RBP: 0000000000000000 R08: 0000000000000080 R09: 0000000000000001
[   57.782534] R10: ffffacce04b423b0 R11: 0000000000000040 R12: 000000000000000c
[   57.782535] R13: 0000000000000011 R14: ffff91fe82c4f800 R15: ffff91fed1e4d800
[   57.782535] FS:  0000000000000000(0000) GS:ffff92015f780000(0000) knlGS:0000000000000000
[   57.782536] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   57.782537] CR2: ffffacce04b4239f CR3: 00000001e9410000 CR4: 0000000000350ee0

and after a minute the workqueue lockup returns
Comment 5 Jeroen Jeurissen 2020-12-19 13:07:28 UTC
02:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
Comment 6 Frank Kruger 2020-12-20 21:23:34 UTC
I am experiencing the same issue with Wi-Fi AX200 and kernel 5.10.1 (openSUSE kernel:stable). Kernel version <= 5.9.X works fine.
Comment 7 Adam Stephens 2020-12-25 15:45:06 UTC
I'm experiencing this as well on openSUSE 5.10.1 and 5.10.2 kernels

00:14.3 Network controller: Intel Corporation Comet Lake PCH-LP CNVi WiFi
Comment 8 Chris Coutinho 2020-12-25 18:24:34 UTC
I'm also experiencing this bug on Tumbleweed. Had to revert back to 5.9.14-1

00:14.3 Network controller: Intel Corporation Comet Lake PCH-LP CNVi WiFi
Comment 9 Ezequiel Pássaro 2020-12-25 21:59:10 UTC
I'm experiencing this bug on openSUSE Tumbleweed with kernel 5.10.1


01:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
Comment 10 Eric Renfro 2020-12-26 06:05:08 UTC
I am experiencing this new bug on openSUSE Tumbleweed also with kernel 5.10.1. Reverting back to 5.9.14.

00:14.3 Network controller: Intel Corporation Comet Lake PCH-LP CNVi WiFi

To note, also, the workaround mentioned https://bugzilla.opensuse.org/show_bug.cgi?id=1180344 here:
options iwlwifi enable_ini=0

Managed to get around the problem for now. Seems to be the debug functions incorporated into that option.
Comment 11 Takashi Iwai 2020-12-27 11:55:05 UTC
The bug seems triggered by the modification of tlv data for adding the termination char in the name string.  The fix patch is below.
Comment 12 Takashi Iwai 2020-12-27 11:56:15 UTC
Created attachment 294357 [details]
Tentative fix patch
Comment 13 Arjen de Korte 2020-12-27 14:15:36 UTC
Created attachment 294363 [details]
Alternative patch

It might be a good idea to limit the number of charachters to be displayed to IWL_FW_INI_MAX_NAME through the precision field of the format string. See attachment. This doesn't require modification of the buffer, but takes the IWL_FW_INI_MAX_NAME as an argument instead.