Bug 206971

Summary: iwlwifi: BUG: sleeping function called from invalid context at net/core/dev.c:9919
Product: Drivers Reporter: gabriello.ramirez
Component: network-wireless-intelAssignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: ASSIGNED ---    
Severity: normal CC: asselsm, gabriello.ramirez, linuxwifi, luca, triffid.hunter
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.6.0-0.rc7 Subsystem:
Regression: No Bisected commit-id:
Attachments: BUG: sleeping function called from invalid context at net/core/dev.c:9919
BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924
partial patch
Patch, exit RCU read critical section earlier
complete logs with WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924
iwl_mvm_free_inactive_queue returns 0 or a error not a queue number
dont call iwl_mvm_free_inactive_queue, iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid under rcu_read_lock()

Description gabriello.ramirez 2020-03-26 02:20:28 UTC
Created attachment 288063 [details]
BUG: sleeping function called from invalid context at net/core/dev.c:9919

Hi, 

I have a Fedora 31 x86_64 machine with a wireless pci-express card with chipset 08:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 50)

as a wireless access point with hostapd-2.9-2.fc31.x86_64

and in the logs appear the following:

BUG: sleeping function called from invalid context at net/core/dev.c:9919
Mar 23 23:47:06 kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 372, name: kworker/0:2
Mar 23 23:47:06 kernel: INFO: lockdep is turned off.
Mar 23 23:47:06 kernel: CPU: 0 PID: 372 Comm: kworker/0:2 Not tainted 5.6.0-0.rc7.git0.2.iwlwififix3.fc31.x86_64 #1
Mar 23 23:47:06 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 23 23:47:06 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 23 23:47:06 kernel: Call Trace:
Mar 23 23:47:06 kernel:  dump_stack+0x8f/0xd0
Mar 23 23:47:06 kernel:  ___might_sleep.cold+0xb3/0xc3
Mar 23 23:47:06 kernel:  synchronize_net+0x18/0x30
Mar 23 23:47:06 kernel:  iwl_mvm_inactivity_check+0x8ac/0xf10 [iwlmvm]
Mar 23 23:47:06 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 23 23:47:06 kernel:  process_one_work+0x25d/0x570
Mar 23 23:47:06 kernel:  worker_thread+0x50/0x3b0
Mar 23 23:47:06 kernel:  kthread+0x106/0x140
Mar 23 23:47:06 kernel:  ? process_one_work+0x570/0x570
Mar 23 23:47:06 kernel:  ? kthread_park+0x90/0x90
Mar 23 23:47:06 kernel:  ret_from_fork+0x3a/0x50
Mar 23 23:47:06 kernel: BUG: scheduling while atomic: kworker/0:2/372/0x00000002
Mar 23 23:47:06 kernel: INFO: lockdep is turned off.
Mar 23 23:47:06 kernel: Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables ccm 8021q garp mrp bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat intel_rapl_msr mei_hdcp iTCO_wdt iTCO_vendor_support intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass iwlmvm mac80211 libarc4 snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic iwlwifi ledtrig_audio joydev ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg intel_cstate cfg80211 intel_uncore snd_hda_codec intel_rapl_perf snd_hda_core snd_hwdep rfkill snd_seq wmi_bmof i2c_i801 snd_seq_device mei_me snd_pcm snd_timer snd mei soundcore lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Mar 23 23:47:06 kernel:  drm_kms_helper crc32c_intel drm r8169 e1000e ums_realtek uas usb_storage wmi video
Mar 23 23:47:06 kernel: CPU: 0 PID: 372 Comm: kworker/0:2 Tainted: G        W         5.6.0-0.rc7.git0.2.iwlwififix3.fc31.x86_64 #1
Mar 23 23:47:06 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 23 23:47:06 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 23 23:47:06 kernel: Call Trace:
Mar 23 23:47:06 kernel:  dump_stack+0x8f/0xd0
Mar 23 23:47:06 kernel:  __schedule_bug.cold+0x64/0x71
Mar 23 23:47:06 kernel:  __schedule+0x822/0x990
Mar 23 23:47:06 kernel:  ? wait_for_completion+0xe9/0x150
Mar 23 23:47:06 kernel:  schedule+0x5f/0xd0
Mar 23 23:47:06 kernel:  schedule_timeout+0x265/0x3a0
Mar 23 23:47:06 kernel:  ? wait_for_completion+0x38/0x150
Mar 23 23:47:06 kernel:  ? wait_for_completion+0xe9/0x150
Mar 23 23:47:06 kernel:  wait_for_completion+0x108/0x150
Mar 23 23:47:06 kernel:  ? wake_up_q+0xa0/0xa0
Mar 23 23:47:06 kernel:  __wait_rcu_gp+0x157/0x170
Mar 23 23:47:06 kernel:  synchronize_rcu.part.0+0x38/0x40
Mar 23 23:47:06 kernel:  ? invoke_rcu_core+0xb0/0xb0
Mar 23 23:47:06 kernel:  ? __bpf_trace_rcu_utilization+0x10/0x10
Mar 23 23:47:06 kernel:  ? _cond_resched+0x15/0x30
Mar 23 23:47:06 kernel:  iwl_mvm_inactivity_check+0x8ac/0xf10 [iwlmvm]
Mar 23 23:47:06 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 23 23:47:06 kernel:  process_one_work+0x25d/0x570
Mar 23 23:47:06 kernel:  worker_thread+0x50/0x3b0
Mar 23 23:47:06 kernel:  kthread+0x106/0x140
Mar 23 23:47:06 kernel:  ? process_one_work+0x570/0x570
Mar 23 23:47:06 kernel:  ? kthread_park+0x90/0x90
Mar 23 23:47:06 kernel:  ret_from_fork+0x3a/0x50
Mar 23 23:47:06 kernel: ------------[ cut here ]------------

after several hours the machine freeze, and reboots via a watchdog

thanks,

Gabriel
Comment 1 gabriello.ramirez 2020-03-26 02:39:29 UTC
Created attachment 288065 [details]
BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

Hi,

using the emailed patch the first bug in the logs is:

hostapd[1750]: wlp8s0: STA 78:00:9e:ff:6d:3e IEEE 802.11: did not acknowledge authentication response
Mar 25 19:12:41 kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924
Mar 25 19:12:41 kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2026, name: kworker/3:0
Mar 25 19:12:41 kernel: INFO: lockdep is turned off.
Mar 25 19:12:41 kernel: CPU: 3 PID: 2026 Comm: kworker/3:0 Not tainted 5.6.0-0.rc7.git0.2.iwlwififix4.fc31.x86_64 #1
Mar 25 19:12:41 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 25 19:12:41 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 25 19:12:41 kernel: Call Trace:
Mar 25 19:12:41 kernel:  dump_stack+0x8f/0xd0
Mar 25 19:12:41 kernel:  ___might_sleep.cold+0xb3/0xc3
Mar 25 19:12:41 kernel:  iwl_trans_pcie_send_hcmd+0xdf/0x420 [iwlwifi]
Mar 25 19:12:41 kernel:  iwl_trans_send_cmd+0x79/0x150 [iwlwifi]
Mar 25 19:12:41 kernel:  ? __iwl_dbg+0x107/0x160 [iwlwifi]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd+0x28/0x70 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd_pdu+0x43/0x50 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_disable_txq.constprop.0+0x169/0x210 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_inactivity_check+0x8c6/0xf00 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 25 19:12:41 kernel:  process_one_work+0x25d/0x570
Mar 25 19:12:41 kernel:  worker_thread+0x50/0x3b0
Mar 25 19:12:41 kernel:  kthread+0x106/0x140
Mar 25 19:12:41 kernel:  ? process_one_work+0x570/0x570
Mar 25 19:12:41 kernel:  ? kthread_park+0x90/0x90
Mar 25 19:12:41 kernel:  ret_from_fork+0x3a/0x50
Mar 25 19:12:41 kernel: BUG: scheduling while atomic: kworker/3:0/2026/0x00000002
Mar 25 19:12:41 kernel: INFO: lockdep is turned off.
Mar 25 19:12:41 kernel: Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables ccm 8021q garp mrp bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm mac80211 kvm libarc4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support mei_hdcp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio joydev iwlwifi snd_hda_codec_hdmi intel_cstate snd_hda_intel snd_intel_dspcfg intel_uncore snd_hda_codec wmi_bmof snd_hda_core intel_rapl_perf cfg80211 i2c_i801 snd_hwdep rfkill snd_seq snd_seq_device snd_pcm mei_me snd_timer lpc_ich mei snd soundcore ip_tables raid1 i915 i2c_algo_bit cec
Mar 25 19:12:41 kernel:  drm_kms_helper crc32c_intel r8169 drm ums_realtek uas e1000e usb_storage wmi video
Mar 25 19:12:41 kernel: CPU: 3 PID: 2026 Comm: kworker/3:0 Tainted: G        W         5.6.0-0.rc7.git0.2.iwlwififix4.fc31.x86_64 #1
Mar 25 19:12:41 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 25 19:12:41 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 25 19:12:41 kernel: Call Trace:
Mar 25 19:12:41 kernel:  dump_stack+0x8f/0xd0
Mar 25 19:12:41 kernel:  __schedule_bug.cold+0x64/0x71
Mar 25 19:12:41 kernel:  __schedule+0x822/0x990
Mar 25 19:12:41 kernel:  schedule+0x5f/0xd0
Mar 25 19:12:41 kernel:  schedule_timeout+0x1a8/0x3a0
Mar 25 19:12:41 kernel:  ? __next_timer_interrupt+0xd0/0xd0
Mar 25 19:12:41 kernel:  iwl_trans_pcie_send_hcmd+0x258/0x420 [iwlwifi]
Mar 25 19:12:41 kernel:  ? finish_wait+0x90/0x90
Mar 25 19:12:41 kernel:  iwl_trans_send_cmd+0x79/0x150 [iwlwifi]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd_status+0x47/0xf0 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd_pdu_status+0x45/0x50 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_invalidate_sta_queue+0x17b/0x2b0 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_inactivity_check+0xce2/0xf00 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 25 19:12:41 kernel:  process_one_work+0x25d/0x570
Mar 25 19:12:41 kernel:  worker_thread+0x50/0x3b0
Mar 25 19:12:41 kernel:  kthread+0x106/0x140
Mar 25 19:12:41 kernel:  ? process_one_work+0x570/0x570
Mar 25 19:12:41 kernel:  ? kthread_park+0x90/0x90
Mar 25 19:12:41 kernel:  ret_from_fork+0x3a/0x50
Mar 25 19:12:41 kernel: ------------[ cut here ]------------
Comment 2 Mark Asselstine 2020-03-27 02:27:52 UTC
Created attachment 288083 [details]
partial patch

commit cfbc6c4c5b91 [iwlwifi: mvm: support mac80211 TXQs model] extended the RCU read critical section in iwl_mvm_inactivity_check() to encompass iwl_mvm_free_inactive_queue() to protect sta/queue_owner. However, this puts the call to synchronize_net() in an RCU read critical section:

iwl_mvm_inactivity_check()
  -> iwl_mvm_free_inactive_queue()
    -> iwl_mvm_remove_sta_queue_marking()
      -> synchronize_net()

The attached patch removes the synchronize_net() as we can't call this from a RCU read critical section. I just need to finish my analysis on the original assertion that the TX path might be busy while we are destroying it.
Comment 3 Mark Asselstine 2020-04-03 14:27:03 UTC
Created attachment 288181 [details]
Patch, exit RCU read critical section earlier
Comment 4 gabriello.ramirez 2020-04-04 02:19:08 UTC
Created attachment 288197 [details]
complete logs with WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]

(In reply to Mark Asselstine from comment #3)
> Created attachment 288181 [details]
> Patch, exit RCU read critical section earlier

Hi, 

using kernel 5.6.2-300.iwlwififix5.fc31.x86_64 with the following patches:

0001-iwlwifi-Don-t-use-non-existent-queues.patch
0001-iwlwifi-set-free_queue-to-the-loop-variable-not-a-me.patch
0001-iwlwifi-we-can-synchronize_net-while-in-a-RCU-read-c.patch (2020-04-03 14:27:03)

in the logs appears:

Apr 03 15:12:23 kernel: ------------[ cut here ]------------
Apr 03 15:12:23 kernel: Invalid TXQ id
Apr 03 15:12:23 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 15:12:23 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 15:12:23 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Not tainted 5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 15:12:23 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 15:12:23 kernel: RIP: 0010:iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Code: 7d 08 e9 74 fd ff ff 44 0f b7 74 24 18 44 8b 6c 24 20 e9 dd fb ff ff 48 c7 c7 76 3c b2 c0 c6 05 94 13 05 00 01 e8 f0 3e 5f f5 <0f> 0b e9 74 fe ff ff c7 44 24 08 08 00 00 00 e9 57 fd ff ff 0f 0b
Apr 03 15:12:23 kernel: RSP: 0018:ffffadb9c0148bc0 EFLAGS: 00010282
Apr 03 15:12:23 kernel: RAX: 000000000000000e RBX: ffff901e5a454cd0 RCX: 0000000000000002
Apr 03 15:12:23 kernel: RDX: 0000000000007788 RSI: ffff901e84678d78 RDI: 0000000000000246
Apr 03 15:12:23 kernel: RBP: ffff901e76cdb3e8 R08: 00000062feef9e44 R09: 0000000000000000
Apr 03 15:12:23 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff901e60f2eb00
Apr 03 15:12:23 kernel: R13: ffffadb9c0148c40 R14: 0000000000004288 R15: 0000000000000000
Apr 03 15:12:23 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 15:12:23 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 15:12:23 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612005 CR4: 00000000001606e0
Apr 03 15:12:23 kernel: Call Trace:
Apr 03 15:12:23 kernel:  <IRQ>
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb_sta+0x1a3/0x480 [iwlmvm]
Apr 03 15:12:23 kernel:  ? find_held_lock+0x32/0x90
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb+0x17/0x40 [iwlmvm]
Apr 03 15:12:23 kernel:  ieee80211_tx_frags+0x16d/0x230 [mac80211]
Apr 03 15:12:23 kernel:  __ieee80211_tx+0x79/0x1c0 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx+0x11f/0x150 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx_pending+0x118/0x380 [mac80211]
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  tasklet_action_common.isra.0+0x7e/0x190
Apr 03 15:12:23 kernel:  __do_softirq+0xe1/0x45d
Apr 03 15:12:23 kernel:  do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel:  </IRQ>
Apr 03 15:12:23 kernel:  ? iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  do_softirq.part.0+0x60/0x70
Apr 03 15:12:23 kernel:  __local_bh_enable_ip+0xd0/0xe0
Apr 03 15:12:23 kernel:  iwl_pcie_irq_handler+0x7b0/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  ? irq_thread+0x88/0x190
Apr 03 15:12:23 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 15:12:23 kernel:  irq_thread_fn+0x20/0x60
Apr 03 15:12:23 kernel:  irq_thread+0xe9/0x190
Apr 03 15:12:23 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 15:12:23 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 15:12:23 kernel:  kthread+0x106/0x140
Apr 03 15:12:23 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 15:12:23 kernel:  ? kthread_park+0x90/0x90
Apr 03 15:12:23 kernel:  ret_from_fork+0x3a/0x50
Apr 03 15:12:23 kernel: irq event stamp: 666353
Apr 03 15:12:23 kernel: hardirqs last  enabled at (666352): [<ffffffffb616fae5>] vprintk_emit+0x155/0x350
Apr 03 15:12:23 kernel: hardirqs last disabled at (666353): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 15:12:23 kernel: softirqs last  enabled at (666334): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel: softirqs last disabled at (666335): [<ffffffffb6c0107a>] do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel: ---[ end trace 588779a720b2e611 ]---


and at 19:18:12 appears another WARNING:

Apr 03 19:18:12 kernel: ------------[ cut here ]------------
Apr 03 19:18:12 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1728 iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 19:18:12 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 19:18:12 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Tainted: G        W         5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 19:18:12 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 19:18:12 kernel: RIP: 0010:iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Code: 0f 85 4f f7 ff ff 48 c7 c2 f8 8d b2 c0 be ba 04 00 00 48 c7 c7 c8 90 b2 c0 c6 05 35 f9 04 00 01 e8 5f 95 66 f5 e9 2b f7 ff ff <0f> 0b e9 74 f8 ff ff 0f 0b e9 3d f8 ff ff 48 63 44 24 14 44 8b 4c
Apr 03 19:18:12 kernel: RSP: 0018:ffffadb9c0537c78 EFLAGS: 00010202
Apr 03 19:18:12 kernel: RAX: 0000000000000005 RBX: ffffffffc0b1c2c0 RCX: 0000000000000001
Apr 03 19:18:12 kernel: RDX: 0000000000000004 RSI: ffffadb9c0537e00 RDI: ffff901e76cdb3e8
Apr 03 19:18:12 kernel: RBP: ffff901e76cdb3e8 R08: ffffadb9c0537e00 R09: 0000000000000000
Apr 03 19:18:12 kernel: R10: 00000003e9d18000 R11: 0000000000000000 R12: ffffadb9c0537e00
Apr 03 19:18:12 kernel: R13: ffff901e76cdc438 R14: ffff901e76cdb3e8 R15: ffff901e69d18000
Apr 03 19:18:12 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 19:18:12 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 19:18:12 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612002 CR4: 00000000001606e0
Apr 03 19:18:12 kernel: Call Trace:
Apr 03 19:18:12 kernel:  ? rcu_read_lock_sched_held+0x32/0x70
Apr 03 19:18:12 kernel:  ? __iwl_dbg+0x12d/0x160 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_mvm_rx_common+0xcd/0x2b0 [iwlmvm]
Apr 03 19:18:12 kernel:  iwl_pcie_rx_handle+0x408/0xbf0 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_pcie_irq_handler+0x79f/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel:  ? irq_thread+0x88/0x190
Apr 03 19:18:12 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 19:18:12 kernel:  irq_thread_fn+0x20/0x60
Apr 03 19:18:12 kernel:  irq_thread+0xe9/0x190
Apr 03 19:18:12 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 19:18:12 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 19:18:12 kernel:  kthread+0x106/0x140
Apr 03 19:18:12 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 19:18:12 kernel:  ? kthread_park+0x90/0x90
Apr 03 19:18:12 kernel:  ret_from_fork+0x3a/0x50
Apr 03 19:18:12 kernel: irq event stamp: 66553165
Apr 03 19:18:12 kernel: hardirqs last  enabled at (66553164): [<ffffffffb6b605b6>] _raw_spin_unlock_irqrestore+0x46/0x60
Apr 03 19:18:12 kernel: hardirqs last disabled at (66553165): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 19:18:12 kernel: softirqs last  enabled at (66553156): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: softirqs last disabled at (66553162): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: ---[ end trace 588779a720b2e612 ]---
...
...

the complete log is in the attachment.

thanks,

Gabriel
Comment 5 gabriello.ramirez 2020-04-04 03:03:26 UTC
Created attachment 288199 [details]
kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

complete logs with WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]

(In reply to Mark Asselstine from comment #3)
> Created attachment 288181 [details]
> Patch, exit RCU read critical section earlier

Hi, 

using kernel 5.6.2-300.iwlwififix5.fc31.x86_64 with the following patches:

0001-iwlwifi-Don-t-use-non-existent-queues.patch
0001-iwlwifi-set-free_queue-to-the-loop-variable-not-a-me.patch
0001-iwlwifi-we-can-synchronize_net-while-in-a-RCU-read-c.patch (2020-04-03 14:27:03)

in the logs appears:

Apr 03 15:12:23 kernel: ------------[ cut here ]------------
Apr 03 15:12:23 kernel: Invalid TXQ id
Apr 03 15:12:23 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 15:12:23 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 15:12:23 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Not tainted 5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 15:12:23 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 15:12:23 kernel: RIP: 0010:iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Code: 7d 08 e9 74 fd ff ff 44 0f b7 74 24 18 44 8b 6c 24 20 e9 dd fb ff ff 48 c7 c7 76 3c b2 c0 c6 05 94 13 05 00 01 e8 f0 3e 5f f5 <0f> 0b e9 74 fe ff ff c7 44 24 08 08 00 00 00 e9 57 fd ff ff 0f 0b
Apr 03 15:12:23 kernel: RSP: 0018:ffffadb9c0148bc0 EFLAGS: 00010282
Apr 03 15:12:23 kernel: RAX: 000000000000000e RBX: ffff901e5a454cd0 RCX: 0000000000000002
Apr 03 15:12:23 kernel: RDX: 0000000000007788 RSI: ffff901e84678d78 RDI: 0000000000000246
Apr 03 15:12:23 kernel: RBP: ffff901e76cdb3e8 R08: 00000062feef9e44 R09: 0000000000000000
Apr 03 15:12:23 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff901e60f2eb00
Apr 03 15:12:23 kernel: R13: ffffadb9c0148c40 R14: 0000000000004288 R15: 0000000000000000
Apr 03 15:12:23 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 15:12:23 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 15:12:23 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612005 CR4: 00000000001606e0
Apr 03 15:12:23 kernel: Call Trace:
Apr 03 15:12:23 kernel:  <IRQ>
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb_sta+0x1a3/0x480 [iwlmvm]
Apr 03 15:12:23 kernel:  ? find_held_lock+0x32/0x90
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb+0x17/0x40 [iwlmvm]
Apr 03 15:12:23 kernel:  ieee80211_tx_frags+0x16d/0x230 [mac80211]
Apr 03 15:12:23 kernel:  __ieee80211_tx+0x79/0x1c0 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx+0x11f/0x150 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx_pending+0x118/0x380 [mac80211]
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  tasklet_action_common.isra.0+0x7e/0x190
Apr 03 15:12:23 kernel:  __do_softirq+0xe1/0x45d
Apr 03 15:12:23 kernel:  do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel:  </IRQ>
Apr 03 15:12:23 kernel:  ? iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  do_softirq.part.0+0x60/0x70
Apr 03 15:12:23 kernel:  __local_bh_enable_ip+0xd0/0xe0
Apr 03 15:12:23 kernel:  iwl_pcie_irq_handler+0x7b0/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  ? irq_thread+0x88/0x190
Apr 03 15:12:23 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 15:12:23 kernel:  irq_thread_fn+0x20/0x60
Apr 03 15:12:23 kernel:  irq_thread+0xe9/0x190
Apr 03 15:12:23 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 15:12:23 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 15:12:23 kernel:  kthread+0x106/0x140
Apr 03 15:12:23 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 15:12:23 kernel:  ? kthread_park+0x90/0x90
Apr 03 15:12:23 kernel:  ret_from_fork+0x3a/0x50
Apr 03 15:12:23 kernel: irq event stamp: 666353
Apr 03 15:12:23 kernel: hardirqs last  enabled at (666352): [<ffffffffb616fae5>] vprintk_emit+0x155/0x350
Apr 03 15:12:23 kernel: hardirqs last disabled at (666353): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 15:12:23 kernel: softirqs last  enabled at (666334): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel: softirqs last disabled at (666335): [<ffffffffb6c0107a>] do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel: ---[ end trace 588779a720b2e611 ]---


and at 19:18:12 appears another WARNING:

Apr 03 19:18:12 kernel: ------------[ cut here ]------------
Apr 03 19:18:12 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1728 iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 19:18:12 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 19:18:12 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Tainted: G        W         5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 19:18:12 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 19:18:12 kernel: RIP: 0010:iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Code: 0f 85 4f f7 ff ff 48 c7 c2 f8 8d b2 c0 be ba 04 00 00 48 c7 c7 c8 90 b2 c0 c6 05 35 f9 04 00 01 e8 5f 95 66 f5 e9 2b f7 ff ff <0f> 0b e9 74 f8 ff ff 0f 0b e9 3d f8 ff ff 48 63 44 24 14 44 8b 4c
Apr 03 19:18:12 kernel: RSP: 0018:ffffadb9c0537c78 EFLAGS: 00010202
Apr 03 19:18:12 kernel: RAX: 0000000000000005 RBX: ffffffffc0b1c2c0 RCX: 0000000000000001
Apr 03 19:18:12 kernel: RDX: 0000000000000004 RSI: ffffadb9c0537e00 RDI: ffff901e76cdb3e8
Apr 03 19:18:12 kernel: RBP: ffff901e76cdb3e8 R08: ffffadb9c0537e00 R09: 0000000000000000
Apr 03 19:18:12 kernel: R10: 00000003e9d18000 R11: 0000000000000000 R12: ffffadb9c0537e00
Apr 03 19:18:12 kernel: R13: ffff901e76cdc438 R14: ffff901e76cdb3e8 R15: ffff901e69d18000
Apr 03 19:18:12 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 19:18:12 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 19:18:12 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612002 CR4: 00000000001606e0
Apr 03 19:18:12 kernel: Call Trace:
Apr 03 19:18:12 kernel:  ? rcu_read_lock_sched_held+0x32/0x70
Apr 03 19:18:12 kernel:  ? __iwl_dbg+0x12d/0x160 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_mvm_rx_common+0xcd/0x2b0 [iwlmvm]
Apr 03 19:18:12 kernel:  iwl_pcie_rx_handle+0x408/0xbf0 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_pcie_irq_handler+0x79f/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel:  ? irq_thread+0x88/0x190
Apr 03 19:18:12 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 19:18:12 kernel:  irq_thread_fn+0x20/0x60
Apr 03 19:18:12 kernel:  irq_thread+0xe9/0x190
Apr 03 19:18:12 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 19:18:12 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 19:18:12 kernel:  kthread+0x106/0x140
Apr 03 19:18:12 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 19:18:12 kernel:  ? kthread_park+0x90/0x90
Apr 03 19:18:12 kernel:  ret_from_fork+0x3a/0x50
Apr 03 19:18:12 kernel: irq event stamp: 66553165
Apr 03 19:18:12 kernel: hardirqs last  enabled at (66553164): [<ffffffffb6b605b6>] _raw_spin_unlock_irqrestore+0x46/0x60
Apr 03 19:18:12 kernel: hardirqs last disabled at (66553165): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 19:18:12 kernel: softirqs last  enabled at (66553156): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: softirqs last disabled at (66553162): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: ---[ end trace 588779a720b2e612 ]---
...
Apr 03 19:28:42 kernel: WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected

Apr 03 20:46:36 kernel: WARNING: CPU: 0 PID: 3126 at drivers/net/wireless/intel/iwlwifi/mvm/utils.c:635 iwl_mvm_reconfig_scd+0x153/0x160 [iwlmvm]
Apr 03 20:49:46 kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924
Apr 03 20:49:46 kernel: WARNING: CPU: 2 PID: 3191 at drivers/net/wireless/intel/iwlwifi/mvm/sta.c:861 iwl_mvm_enable_txq+0x221/0x2b0 [iwlmvm]
Apr 03 20:49:47 kernel: WARNING: CPU: 2 PID: 3142 at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1770 iwl_pcie_enqueue_hcmd+0xa46/0xc00 [iwlwifi]
...

the complete log is in the attachment.

obsoleted the previous attachment because this log contains the previous attachment and also appears:

Apr 03 20:49:46 kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

thanks,

Gabriel
Comment 6 gabriello.ramirez 2020-04-27 19:03:00 UTC
Created attachment 288779 [details]
iwl_mvm_free_inactive_queue returns 0 or a error not a queue number

Hi,

iwl_mvm_free_inactive_queue returns 0 or a error not a queue number.
I don't know if when iwl_mvm_disable_txq returns nonzero the queue is usable so better to set free_queue to -ENOSPC

maybe is better to modify the for loop in iwl_mvm_inactivity_check
to get all free queues and not only the first one
so iwl_mvm_free_inactive_queue have more opportunity to get a usable queue

but iwl_mvm_free_inactive_queue cannot be under rcu_read_lock();

so the bug is resolved for me with:

https://bugzilla.kernel.org/attachment.cgi?id=287903
https://bugzilla.kernel.org/attachment.cgi?id=287955
and this patch

I hope the patches get to all stable kernel series soon I'm currently in 5.6.7 and it don't have the first two patches and them solved a kernel NULL pointer dereference.

Mark, thanks for all your help in solving the bugs,

Gabriel
Comment 7 triffid.hunter 2020-05-15 17:34:44 UTC
I have locally applied the two patches from #206765 mentioned in comment 6 plus attachment 288779 [details], and I'm still regularly getting

[51383.047342] WARNING: CPU: 7 PID: 1065 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1500 iwl_mvm_rx_tx_cmd+0x42e/0x750 [iwlmvm]
[51383.047343] Modules linked in: bridge stp llc xfrm_user rfcomm algif_skcipher bnep xt_mac xt_connmark xt_owner xt_mark xt_MASQUERADE iptable_nat nvidia_drm(PO) binfmt_misc vboxnetflt(O) vboxnetadp(O) vboxdrv(O) uinput nf_nat_sip nf_nat_irc nf_nat_ftp nvidia_modeset(PO) xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount snd_hda_codec_hdmi xt_conntrack nct6775 hwmon_vid ppp_async crc_ccitt xt_l2tp l2tp_ppp pppox btusb ppp_generic btrtl slhc btbcm btintel l2tp_ip6 l2tp_ip bluetooth l2tp_eth l2tp_netlink mei_hdcp l2tp_debugfs l2tp_core kvm_intel ecdh_generic joydev ip6_udp_tunnel iwlmvm snd_hda_codec_realtek ecc udp_tunnel kvm snd_hda_codec_generic irqbypass fuse nvidia(PO) crc32_pclmul snd_hda_intel iwlwifi snd_intel_dspcfg i2c_i801 snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei efivarfs
[51383.047398] CPU: 7 PID: 1065 Comm: irq/144-iwlwifi Tainted: P        W  O      5.5.19-gentoo-dorellan #1
[51383.047399] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[51383.047411] RIP: 0010:iwl_mvm_rx_tx_cmd+0x42e/0x750 [iwlmvm]
[51383.047415] Code: 82 ff ff 4c 8b 54 24 08 41 88 85 f2 17 00 00 e9 05 fe ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 1e bf 3e d0 0f b6
[51383.047416] RSP: 0018:ffff979d004afcd0 EFLAGS: 00010246
[51383.047419] RAX: ffff9239a1657101 RBX: 0000000000000088 RCX: 0000000000d463e7
[51383.047421] RDX: 0000000000d463df RSI: 0000000000d463df RDI: 000000000002fad0
[51383.047422] RBP: ffff923d81e81e68 R08: 00000000000000fc R09: 0000000000000000
[51383.047424] R10: ffff923ae167ef30 R11: 0000000000000000 R12: ffff92398fa3502a
[51383.047426] R13: ffff923d81e81e68 R14: ffff923ae167ef00 R15: ffff92398fe8b000
[51383.047428] FS:  0000000000000000(0000) GS:ffff923d8edc0000(0000) knlGS:0000000000000000
[51383.047430] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51383.047432] CR2: 000033587dafe000 CR3: 00000001bf60a003 CR4: 00000000003606e0
[51383.047434] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[51383.047436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[51383.047437] Call Trace:
[51383.047449]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[51383.047458]  iwl_pcie_rx_handle+0x322/0x850 [iwlwifi]
[51383.047467]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[51383.047471]  ? irq_forced_thread_fn+0x70/0x70
[51383.047474]  irq_thread_fn+0x1b/0x60
[51383.047478]  irq_thread+0xd7/0x160
[51383.047482]  ? wake_threads_waitq+0x30/0x30
[51383.047486]  kthread+0xf6/0x130
[51383.047489]  ? irq_thread_check_affinity+0x80/0x80
[51383.047492]  ? kthread_park+0x80/0x80
[51383.047498]  ret_from_fork+0x35/0x40
[51383.047501] ---[ end trace e80778eb46da3062 ]---
Comment 8 gabriello.ramirez 2020-05-16 18:51:44 UTC
Created attachment 289159 [details]
dont call iwl_mvm_free_inactive_queue, iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid under rcu_read_lock()

(In reply to triffid.hunter from comment #7)
> I have locally applied the two patches from #206765 mentioned in comment 6
> plus attachment 288779 [details], and I'm still regularly getting
> 
> [51383.047342] WARNING: CPU: 7 PID: 1065 at
> drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1500
> iwl_mvm_rx_tx_cmd+0x42e/0x750 [iwlmvm]

The last time the above WARNING triggered in my machine was under kernel 5.6.8

in 5.6.[11-12-13] don't happen, but I'm running a different patch (attached in this comment) from comment 6 because my machine triggered:

BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

in functions iwl_mvm_unshare_queue and iwl_mvm_change_queue_tid

so I don't know if the patch attached here fixed the WARNING from comment 7 or was fixed in a kernel newer than 5.6.8

with this patch the only WARNING triggered in my machine are:

May 15 21:16:38 kernel: Invalid TXQ id
May 15 21:16:38 kernel: WARNING: CPU: 2 PID: 920 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x53f/0x560 [iwlmvm]

and

May 15 22:47:59 kernel: WARNING: CPU: 2 PID: 920 at drivers/net/wireless/intel/iwlwifi/mvm/sta.c:3732 iwl_mvm_sta_modify_sleep_tx_count+0x259/0x300 [iwlmvm] 

Gabriel
Comment 9 Luca Coelho 2020-06-08 11:27:53 UTC
We made a bunch of fixes in this area which were included in v5.7-rc3, v5.4.36 and v5.6.8.  Can you retest with one of these versions?
Comment 10 gabriello.ramirez 2020-06-09 23:21:19 UTC
(In reply to Luca Coelho from comment #9)
> We made a bunch of fixes in this area which were included in v5.7-rc3,
> v5.4.36 and v5.6.8.  Can you retest with one of these versions?

Hi,

I tested v5.6.8 and newer currently v5.7 and this bug isn't fixed.

this bug is fixed with:

https://bugzilla.kernel.org/attachment.cgi?id=289159

because iwl_mvm_free_inactive_queue returns an error not a queue number 
so when it returns an error we need set queue_number to -ENOSPC not the error returned 

or maybe instead of returning -ENOSPC returns the queue_number

And dont call iwl_mvm_free_inactive_queue, iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid under rcu_read_lock() to prevent this bug

I'm running this patch for several days under v5.6.8+, currently v5.7, and this bug isn't triggered anymore

thanks,

Gabriel
Comment 11 Luca Coelho 2020-06-10 06:12:41 UTC
Okay, thanks for the report.  Where did you send the attached email to?

We have this patch that should solve the same problem: https://patchwork.kernel.org/patch/11472099/

But I didn't apply it because there were some concerns.  I'll send it for the rc series and for fixes soon.
Comment 12 triffid.hunter 2020-06-10 06:22:08 UTC
I'm having zero kernel messages from 5.7.1 after 17 hours of testing, whereas I could reliably get tons of aforementioned warnings in earlier releases.

I'll keep an eye on it, and report back if I come across anything such as the bug mentioned in comment 10 which doesn't seem to have triggered here yet.
Comment 13 gabriello.ramirez 2020-06-10 22:09:57 UTC
(In reply to Luca Coelho from comment #11)
> Okay, thanks for the report.  Where did you send the attached email to?
> 

I didn't, only attached here, because I was testing the patch so I wasn't sure to be any side effects from moving rcu_read_unlock(); above iwl_mvm_unshare_queue and iwl_mvm_change_queue_tid. but in my testing several weeks the bug isn't triggered anymore, and no side effects happened

and the patch has the journal of the bug instances when iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid were executed under rcu_read_unlock();

> We have this patch that should solve the same problem:
> https://patchwork.kernel.org/patch/11472099/

I tried a similar version several weeks but in my testing the bug still happened when iwl_mvm_unshare_queue and/or iwl_mvm_change_queue_tid executed under rcu_read_unlock 

I reviewed https://bugzilla.kernel.org/attachment.cgi?id=289159 again and seems to be unnecessary set queue_number = -ENOSPC so I'm currently testing a patch with return ret; instead

> 
> But I didn't apply it because there were some concerns.  I'll send it for
> the rc series and for fixes soon.

Well in my computer the bug still happen with it, but may be particular to my computer workload.

Gabriel
Comment 14 triffid.hunter 2020-06-15 16:23:02 UTC
After a little more testing with 5.7.1 it turns out it's just taking longer to trigger.

[380624.367421] WARNING: CPU: 7 PID: 1019 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1495 iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[380624.367422] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic libcurve25519_generic bridge stp llc xfrm_user rfcomm algif_skcipher bnep xt_mac xt_connmark xt_owner xt_mark xt_MASQUERADE iptable_nat nvidia_drm(PO) binfmt_misc vboxnetflt(O) vboxnetadp(O) vboxdrv(O) uinput nvidia_modeset(PO) nf_nat_sip nf_nat_irc nf_nat_ftp xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount xt_conntrack nct6775 hwmon_vid ppp_async crc_ccitt xt_l2tp btusb l2tp_ppp btrtl pppox btbcm ppp_generic btintel slhc bluetooth l2tp_ip6 l2tp_ip l2tp_eth l2tp_netlink ecdh_generic ecc l2tp_debugfs joydev kvm_intel l2tp_core snd_hda_codec_realtek ip6_udp_tunnel udp_tunnel mei_hdcp snd_hda_codec_generic snd_hda_codec_hdmi fuse kvm iwlmvm snd_hda_intel irqbypass snd_intel_dspcfg nvidia(PO) crc32_pclmul snd_hda_codec snd_hwdep iwlwifi i2c_i801 snd_hda_core snd_pcm
[380624.367463]  mei_me mei efivarfs
[380624.367468] CPU: 7 PID: 1019 Comm: irq/144-iwlwifi Tainted: P        W  O      5.7.1-gentoo-dorellan #1
[380624.367469] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[380624.367478] RIP: 0010:iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[380624.367489] Code: 82 ff ff 4c 8b 54 24 08 41 88 85 ea 17 00 00 e9 ff fd ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 d8 03 b0 c1 0f b6
[380624.367495] RSP: 0018:ffffa90500433cd0 EFLAGS: 00010246
[380624.367502] RAX: ffffa1cba7097101 RBX: 0000000000000088 RCX: 00000000048aa3ff
[380624.367504] RDX: 00000000048aa3f7 RSI: 00000000048aa3f7 RDI: 000000000002fcc0
[380624.367505] RBP: ffffa1d281cf9e88 R08: 00000000000000fd R09: 0000000000000000
[380624.367507] R10: ffffa1ceb136b630 R11: 0000000000000000 R12: ffffa1d1ee30a02e
[380624.367508] R13: ffffa1d281cf9e88 R14: ffffa1ceb136b600 R15: ffffa1cb4faeb000
[380624.367510] FS:  0000000000000000(0000) GS:ffffa1d28edc0000(0000) knlGS:0000000000000000
[380624.367512] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[380624.367513] CR2: 00001af755f8b000 CR3: 00000003c080a001 CR4: 00000000003606e0
[380624.367515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[380624.367516] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[380624.367517] Call Trace:
[380624.367526]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[380624.367534]  iwl_pcie_rx_handle+0x320/0x820 [iwlwifi]
[380624.367547]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[380624.367558]  ? irq_forced_thread_fn+0x70/0x70
[380624.367567]  irq_thread_fn+0x1b/0x60
[380624.367576]  irq_thread+0xd7/0x160
[380624.367584]  ? wake_threads_waitq+0x30/0x30
[380624.367592]  kthread+0x10e/0x150
[380624.367598]  ? irq_thread_check_affinity+0x80/0x80
[380624.367601]  ? kthread_park+0x80/0x80
[380624.367604]  ret_from_fork+0x35/0x40
[380624.367607] ---[ end trace f114c08ca41e94d5 ]---

Uptime 6 days, 63 instances of this warning in the kernel buffer, and it's overflowed (dmesg | head shows half a warning message) so there may have been more.
Comment 15 triffid.hunter 2020-07-02 07:58:36 UTC
From 5.7.6:

[49246.106047] ------------[ cut here ]------------
[49246.106074] WARNING: CPU: 6 PID: 944 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1495 iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[49246.106076] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic libcurve25519_generic bridge stp llc xfrm_user rfcomm algif_skcipher bnep xt_mac xt_connmark xt_owner xt_mark xt_MASQUERADE iptable_nat nvidia_drm(PO) binfmt_misc uinput nvidia_modeset(PO) nf_nat_sip nf_nat_irc nf_nat_ftp xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount xt_conntrack nct6775 hwmon_vid btusb btrtl btbcm btintel ppp_async crc_ccitt bluetooth joydev xt_l2tp ecdh_generic ecc l2tp_ppp pppox ppp_generic slhc l2tp_ip6 l2tp_ip l2tp_eth l2tp_netlink l2tp_debugfs l2tp_core ip6_udp_tunnel udp_tunnel mei_hdcp snd_hda_codec_realtek kvm_intel snd_hda_codec_generic snd_hda_codec_hdmi iwlmvm fuse kvm snd_hda_intel irqbypass nvidia(PO) i2c_i801 crc32_pclmul snd_intel_dspcfg iwlwifi snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei efivarfs
[49246.106158] CPU: 6 PID: 944 Comm: irq/144-iwlwifi Tainted: P           O      5.7.6-gentoo-dorellan #1
[49246.106161] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[49246.106179] RIP: 0010:iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[49246.106185] Code: 82 ff ff 4c 8b 54 24 08 41 88 85 ea 17 00 00 e9 ff fd ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 98 04 41 f8 0f b6
[49246.106189] RSP: 0018:ffff9d0a006d7cd0 EFLAGS: 00010246
[49246.106193] RAX: 0000000000000000 RBX: 0000000000000088 RCX: 000000008020001f
[49246.106196] RDX: 0000000080200020 RSI: 0000000000000001 RDI: 0000000000000000
[49246.106199] RBP: ffff9991810d1e88 R08: 0000000000000001 R09: 0000000000000001
[49246.106202] R10: ffff999183cb9e30 R11: 0000000000000000 R12: ffff998fdc2fb826
[49246.106204] R13: ffff9991810d1e88 R14: ffff999183cb9e00 R15: ffff998ee2bdd000
[49246.106209] FS:  0000000000000000(0000) GS:ffff99918ed80000(0000) knlGS:0000000000000000
[49246.106212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49246.106215] CR2: 00005651d0c0ef50 CR3: 000000035480a003 CR4: 00000000003606e0
[49246.106218] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[49246.106221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[49246.106223] Call Trace:
[49246.106244]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[49246.106257]  iwl_pcie_rx_handle+0x320/0x820 [iwlwifi]
[49246.106267]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[49246.106275]  ? irq_forced_thread_fn+0x70/0x70
[49246.106278]  irq_thread_fn+0x1b/0x60
[49246.106283]  irq_thread+0xd7/0x160
[49246.106287]  ? wake_threads_waitq+0x30/0x30
[49246.106293]  kthread+0x10e/0x150
[49246.106297]  ? irq_thread_check_affinity+0x80/0x80
[49246.106301]  ? kthread_park+0x80/0x80
[49246.106306]  ret_from_fork+0x35/0x40
[49246.106311] ---[ end trace e97e4e008c682eab ]---
Comment 16 triffid.hunter 2020-11-29 11:05:41 UTC
From 5.8.18:

[159490.455557] WARNING: CPU: 6 PID: 961 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1492 iwl_mvm_rx_tx_cmd+0x43b/0x760 [iwlmvm]
[159490.455557] Modules linked in: wireguard libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic curve25519_x86_64 libcurve25519_generic bridge stp llc xfrm_user rfcomm algif_skcipher bnep zram zsmalloc xt_connmark xt_owner xt_mac xt_mark xt_MASQUERADE iptable_nat binfmt_misc nvidia_drm(PO) uinput nf_nat_sip nf_nat_irc nf_nat_ftp xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nvidia_modeset(PO) nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount xt_conntrack nct6775 hwmon_vid ppp_async crc_ccitt xt_l2tp l2tp_ppp pppox ppp_generic slhc l2tp_ip6 l2tp_ip joydev l2tp_eth l2tp_netlink l2tp_debugfs btusb btrtl btbcm btintel l2tp_core snd_hda_codec_realtek ip6_udp_tunnel udp_tunnel snd_hda_codec_hdmi snd_hda_codec_generic ee1004 bluetooth mei_hdcp fuse ecdh_generic ecc snd_hda_intel nvidia(PO) snd_intel_dspcfg iwlmvm kvm_intel snd_hda_codec kvm snd_hwdep i2c_i801 irqbypass snd_hda_core i2c_smbus iwlwifi crc32_pclmul snd_pcm mei_me
[159490.455571]  mei efivarfs
[159490.455572] CPU: 6 PID: 961 Comm: irq/144-iwlwifi Tainted: P        W  O      5.8.18-gentoo-dorellan #1
[159490.455573] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[159490.455576] RIP: 0010:iwl_mvm_rx_tx_cmd+0x43b/0x760 [iwlmvm]
[159490.455576] Code: 81 ff ff 4c 8b 54 24 08 41 88 85 e2 17 00 00 e9 01 fe ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 a1 e9 ce cb 48 63
[159490.455577] RSP: 0018:ffffb8e7003d7cd0 EFLAGS: 00010246
[159490.455578] RAX: ffff9573db82ed01 RBX: 0000000000000088 RCX: 000000000045ed16
[159490.455578] RDX: 000000000045ed0e RSI: 000000000045ed0e RDI: 000000000002f040
[159490.455579] RBP: ffff9576bb311e68 R08: 00000000000000ff R09: 0000000000000000
[159490.455579] R10: ffff9574f150cb30 R11: 0000000000000000 R12: ffff957258f975c0
[159490.455580] R13: ffff9576bb311e68 R14: ffff9574f150cb00 R15: ffff9572ef729000
[159490.455580] FS:  0000000000000000(0000) GS:ffff9576ced80000(0000) knlGS:0000000000000000
[159490.455581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[159490.455581] CR2: 00007fb27f255000 CR3: 000000020d80a006 CR4: 00000000003606e0
[159490.455582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[159490.455582] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[159490.455583] Call Trace:
[159490.455586]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[159490.455588]  iwl_pcie_rx_handle+0x34b/0x8b0 [iwlwifi]
[159490.455590]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[159490.455591]  ? irq_forced_thread_fn+0x70/0x70
[159490.455592]  irq_thread_fn+0x1b/0x60
[159490.455593]  irq_thread+0xd7/0x160
[159490.455594]  ? wake_threads_waitq+0x30/0x30
[159490.455595]  ? irq_thread_check_affinity+0x80/0x80
[159490.455596]  kthread+0x111/0x150
[159490.455597]  ? kthread_park+0x80/0x80
[159490.455598]  ret_from_fork+0x22/0x30
[159490.455599] ---[ end trace c0b50996c06f211f ]---

And it seems like it's making my connection to my router drop and reconnect..