Bug 206971 - iwlwifi: BUG: sleeping function called from invalid context at net/core/dev.c:9919
Summary: iwlwifi: BUG: sleeping function called from invalid context at net/core/dev.c...
Status: ASSIGNED
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless-intel (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-26 02:20 UTC by gabriello.ramirez
Modified: 2020-11-29 11:05 UTC (History)
5 users (show)

See Also:
Kernel Version: 5.6.0-0.rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
BUG: sleeping function called from invalid context at net/core/dev.c:9919 (297.31 KB, text/plain)
2020-03-26 02:20 UTC, gabriello.ramirez
Details
BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924 (514.61 KB, text/plain)
2020-03-26 02:39 UTC, gabriello.ramirez
Details
partial patch (2.82 KB, patch)
2020-03-27 02:27 UTC, Mark Asselstine
Details | Diff
Patch, exit RCU read critical section earlier (5.82 KB, patch)
2020-04-03 14:27 UTC, Mark Asselstine
Details | Diff
complete logs with WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm] (329.98 KB, text/plain)
2020-04-04 02:19 UTC, gabriello.ramirez
Details
kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924 (394.16 KB, text/plain)
2020-04-04 03:03 UTC, gabriello.ramirez
Details
iwl_mvm_free_inactive_queue returns 0 or a error not a queue number (1.65 KB, patch)
2020-04-27 19:03 UTC, gabriello.ramirez
Details | Diff
dont call iwl_mvm_free_inactive_queue, iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid under rcu_read_lock() (7.46 KB, application/mbox)
2020-05-16 18:51 UTC, gabriello.ramirez
Details

Description gabriello.ramirez 2020-03-26 02:20:28 UTC
Created attachment 288063 [details]
BUG: sleeping function called from invalid context at net/core/dev.c:9919

Hi, 

I have a Fedora 31 x86_64 machine with a wireless pci-express card with chipset 08:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 50)

as a wireless access point with hostapd-2.9-2.fc31.x86_64

and in the logs appear the following:

BUG: sleeping function called from invalid context at net/core/dev.c:9919
Mar 23 23:47:06 kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 372, name: kworker/0:2
Mar 23 23:47:06 kernel: INFO: lockdep is turned off.
Mar 23 23:47:06 kernel: CPU: 0 PID: 372 Comm: kworker/0:2 Not tainted 5.6.0-0.rc7.git0.2.iwlwififix3.fc31.x86_64 #1
Mar 23 23:47:06 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 23 23:47:06 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 23 23:47:06 kernel: Call Trace:
Mar 23 23:47:06 kernel:  dump_stack+0x8f/0xd0
Mar 23 23:47:06 kernel:  ___might_sleep.cold+0xb3/0xc3
Mar 23 23:47:06 kernel:  synchronize_net+0x18/0x30
Mar 23 23:47:06 kernel:  iwl_mvm_inactivity_check+0x8ac/0xf10 [iwlmvm]
Mar 23 23:47:06 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 23 23:47:06 kernel:  process_one_work+0x25d/0x570
Mar 23 23:47:06 kernel:  worker_thread+0x50/0x3b0
Mar 23 23:47:06 kernel:  kthread+0x106/0x140
Mar 23 23:47:06 kernel:  ? process_one_work+0x570/0x570
Mar 23 23:47:06 kernel:  ? kthread_park+0x90/0x90
Mar 23 23:47:06 kernel:  ret_from_fork+0x3a/0x50
Mar 23 23:47:06 kernel: BUG: scheduling while atomic: kworker/0:2/372/0x00000002
Mar 23 23:47:06 kernel: INFO: lockdep is turned off.
Mar 23 23:47:06 kernel: Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables ccm 8021q garp mrp bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat intel_rapl_msr mei_hdcp iTCO_wdt iTCO_vendor_support intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass iwlmvm mac80211 libarc4 snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic iwlwifi ledtrig_audio joydev ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg intel_cstate cfg80211 intel_uncore snd_hda_codec intel_rapl_perf snd_hda_core snd_hwdep rfkill snd_seq wmi_bmof i2c_i801 snd_seq_device mei_me snd_pcm snd_timer snd mei soundcore lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Mar 23 23:47:06 kernel:  drm_kms_helper crc32c_intel drm r8169 e1000e ums_realtek uas usb_storage wmi video
Mar 23 23:47:06 kernel: CPU: 0 PID: 372 Comm: kworker/0:2 Tainted: G        W         5.6.0-0.rc7.git0.2.iwlwififix3.fc31.x86_64 #1
Mar 23 23:47:06 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 23 23:47:06 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 23 23:47:06 kernel: Call Trace:
Mar 23 23:47:06 kernel:  dump_stack+0x8f/0xd0
Mar 23 23:47:06 kernel:  __schedule_bug.cold+0x64/0x71
Mar 23 23:47:06 kernel:  __schedule+0x822/0x990
Mar 23 23:47:06 kernel:  ? wait_for_completion+0xe9/0x150
Mar 23 23:47:06 kernel:  schedule+0x5f/0xd0
Mar 23 23:47:06 kernel:  schedule_timeout+0x265/0x3a0
Mar 23 23:47:06 kernel:  ? wait_for_completion+0x38/0x150
Mar 23 23:47:06 kernel:  ? wait_for_completion+0xe9/0x150
Mar 23 23:47:06 kernel:  wait_for_completion+0x108/0x150
Mar 23 23:47:06 kernel:  ? wake_up_q+0xa0/0xa0
Mar 23 23:47:06 kernel:  __wait_rcu_gp+0x157/0x170
Mar 23 23:47:06 kernel:  synchronize_rcu.part.0+0x38/0x40
Mar 23 23:47:06 kernel:  ? invoke_rcu_core+0xb0/0xb0
Mar 23 23:47:06 kernel:  ? __bpf_trace_rcu_utilization+0x10/0x10
Mar 23 23:47:06 kernel:  ? _cond_resched+0x15/0x30
Mar 23 23:47:06 kernel:  iwl_mvm_inactivity_check+0x8ac/0xf10 [iwlmvm]
Mar 23 23:47:06 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 23 23:47:06 kernel:  process_one_work+0x25d/0x570
Mar 23 23:47:06 kernel:  worker_thread+0x50/0x3b0
Mar 23 23:47:06 kernel:  kthread+0x106/0x140
Mar 23 23:47:06 kernel:  ? process_one_work+0x570/0x570
Mar 23 23:47:06 kernel:  ? kthread_park+0x90/0x90
Mar 23 23:47:06 kernel:  ret_from_fork+0x3a/0x50
Mar 23 23:47:06 kernel: ------------[ cut here ]------------

after several hours the machine freeze, and reboots via a watchdog

thanks,

Gabriel
Comment 1 gabriello.ramirez 2020-03-26 02:39:29 UTC
Created attachment 288065 [details]
BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

Hi,

using the emailed patch the first bug in the logs is:

hostapd[1750]: wlp8s0: STA 78:00:9e:ff:6d:3e IEEE 802.11: did not acknowledge authentication response
Mar 25 19:12:41 kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924
Mar 25 19:12:41 kernel: in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2026, name: kworker/3:0
Mar 25 19:12:41 kernel: INFO: lockdep is turned off.
Mar 25 19:12:41 kernel: CPU: 3 PID: 2026 Comm: kworker/3:0 Not tainted 5.6.0-0.rc7.git0.2.iwlwififix4.fc31.x86_64 #1
Mar 25 19:12:41 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 25 19:12:41 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 25 19:12:41 kernel: Call Trace:
Mar 25 19:12:41 kernel:  dump_stack+0x8f/0xd0
Mar 25 19:12:41 kernel:  ___might_sleep.cold+0xb3/0xc3
Mar 25 19:12:41 kernel:  iwl_trans_pcie_send_hcmd+0xdf/0x420 [iwlwifi]
Mar 25 19:12:41 kernel:  iwl_trans_send_cmd+0x79/0x150 [iwlwifi]
Mar 25 19:12:41 kernel:  ? __iwl_dbg+0x107/0x160 [iwlwifi]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd+0x28/0x70 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd_pdu+0x43/0x50 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_disable_txq.constprop.0+0x169/0x210 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_inactivity_check+0x8c6/0xf00 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 25 19:12:41 kernel:  process_one_work+0x25d/0x570
Mar 25 19:12:41 kernel:  worker_thread+0x50/0x3b0
Mar 25 19:12:41 kernel:  kthread+0x106/0x140
Mar 25 19:12:41 kernel:  ? process_one_work+0x570/0x570
Mar 25 19:12:41 kernel:  ? kthread_park+0x90/0x90
Mar 25 19:12:41 kernel:  ret_from_fork+0x3a/0x50
Mar 25 19:12:41 kernel: BUG: scheduling while atomic: kworker/3:0/2026/0x00000002
Mar 25 19:12:41 kernel: INFO: lockdep is turned off.
Mar 25 19:12:41 kernel: Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables ccm 8021q garp mrp bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel iwlmvm mac80211 kvm libarc4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support mei_hdcp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio joydev iwlwifi snd_hda_codec_hdmi intel_cstate snd_hda_intel snd_intel_dspcfg intel_uncore snd_hda_codec wmi_bmof snd_hda_core intel_rapl_perf cfg80211 i2c_i801 snd_hwdep rfkill snd_seq snd_seq_device snd_pcm mei_me snd_timer lpc_ich mei snd soundcore ip_tables raid1 i915 i2c_algo_bit cec
Mar 25 19:12:41 kernel:  drm_kms_helper crc32c_intel r8169 drm ums_realtek uas e1000e usb_storage wmi video
Mar 25 19:12:41 kernel: CPU: 3 PID: 2026 Comm: kworker/3:0 Tainted: G        W         5.6.0-0.rc7.git0.2.iwlwififix4.fc31.x86_64 #1
Mar 25 19:12:41 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Mar 25 19:12:41 kernel: Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Mar 25 19:12:41 kernel: Call Trace:
Mar 25 19:12:41 kernel:  dump_stack+0x8f/0xd0
Mar 25 19:12:41 kernel:  __schedule_bug.cold+0x64/0x71
Mar 25 19:12:41 kernel:  __schedule+0x822/0x990
Mar 25 19:12:41 kernel:  schedule+0x5f/0xd0
Mar 25 19:12:41 kernel:  schedule_timeout+0x1a8/0x3a0
Mar 25 19:12:41 kernel:  ? __next_timer_interrupt+0xd0/0xd0
Mar 25 19:12:41 kernel:  iwl_trans_pcie_send_hcmd+0x258/0x420 [iwlwifi]
Mar 25 19:12:41 kernel:  ? finish_wait+0x90/0x90
Mar 25 19:12:41 kernel:  iwl_trans_send_cmd+0x79/0x150 [iwlwifi]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd_status+0x47/0xf0 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_send_cmd_pdu_status+0x45/0x50 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_invalidate_sta_queue+0x17b/0x2b0 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_inactivity_check+0xce2/0xf00 [iwlmvm]
Mar 25 19:12:41 kernel:  iwl_mvm_add_new_dqa_stream_wk+0x176/0x920 [iwlmvm]
Mar 25 19:12:41 kernel:  process_one_work+0x25d/0x570
Mar 25 19:12:41 kernel:  worker_thread+0x50/0x3b0
Mar 25 19:12:41 kernel:  kthread+0x106/0x140
Mar 25 19:12:41 kernel:  ? process_one_work+0x570/0x570
Mar 25 19:12:41 kernel:  ? kthread_park+0x90/0x90
Mar 25 19:12:41 kernel:  ret_from_fork+0x3a/0x50
Mar 25 19:12:41 kernel: ------------[ cut here ]------------
Comment 2 Mark Asselstine 2020-03-27 02:27:52 UTC
Created attachment 288083 [details]
partial patch

commit cfbc6c4c5b91 [iwlwifi: mvm: support mac80211 TXQs model] extended the RCU read critical section in iwl_mvm_inactivity_check() to encompass iwl_mvm_free_inactive_queue() to protect sta/queue_owner. However, this puts the call to synchronize_net() in an RCU read critical section:

iwl_mvm_inactivity_check()
  -> iwl_mvm_free_inactive_queue()
    -> iwl_mvm_remove_sta_queue_marking()
      -> synchronize_net()

The attached patch removes the synchronize_net() as we can't call this from a RCU read critical section. I just need to finish my analysis on the original assertion that the TX path might be busy while we are destroying it.
Comment 3 Mark Asselstine 2020-04-03 14:27:03 UTC
Created attachment 288181 [details]
Patch, exit RCU read critical section earlier
Comment 4 gabriello.ramirez 2020-04-04 02:19:08 UTC
Created attachment 288197 [details]
complete logs with WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]

(In reply to Mark Asselstine from comment #3)
> Created attachment 288181 [details]
> Patch, exit RCU read critical section earlier

Hi, 

using kernel 5.6.2-300.iwlwififix5.fc31.x86_64 with the following patches:

0001-iwlwifi-Don-t-use-non-existent-queues.patch
0001-iwlwifi-set-free_queue-to-the-loop-variable-not-a-me.patch
0001-iwlwifi-we-can-synchronize_net-while-in-a-RCU-read-c.patch (2020-04-03 14:27:03)

in the logs appears:

Apr 03 15:12:23 kernel: ------------[ cut here ]------------
Apr 03 15:12:23 kernel: Invalid TXQ id
Apr 03 15:12:23 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 15:12:23 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 15:12:23 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Not tainted 5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 15:12:23 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 15:12:23 kernel: RIP: 0010:iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Code: 7d 08 e9 74 fd ff ff 44 0f b7 74 24 18 44 8b 6c 24 20 e9 dd fb ff ff 48 c7 c7 76 3c b2 c0 c6 05 94 13 05 00 01 e8 f0 3e 5f f5 <0f> 0b e9 74 fe ff ff c7 44 24 08 08 00 00 00 e9 57 fd ff ff 0f 0b
Apr 03 15:12:23 kernel: RSP: 0018:ffffadb9c0148bc0 EFLAGS: 00010282
Apr 03 15:12:23 kernel: RAX: 000000000000000e RBX: ffff901e5a454cd0 RCX: 0000000000000002
Apr 03 15:12:23 kernel: RDX: 0000000000007788 RSI: ffff901e84678d78 RDI: 0000000000000246
Apr 03 15:12:23 kernel: RBP: ffff901e76cdb3e8 R08: 00000062feef9e44 R09: 0000000000000000
Apr 03 15:12:23 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff901e60f2eb00
Apr 03 15:12:23 kernel: R13: ffffadb9c0148c40 R14: 0000000000004288 R15: 0000000000000000
Apr 03 15:12:23 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 15:12:23 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 15:12:23 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612005 CR4: 00000000001606e0
Apr 03 15:12:23 kernel: Call Trace:
Apr 03 15:12:23 kernel:  <IRQ>
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb_sta+0x1a3/0x480 [iwlmvm]
Apr 03 15:12:23 kernel:  ? find_held_lock+0x32/0x90
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb+0x17/0x40 [iwlmvm]
Apr 03 15:12:23 kernel:  ieee80211_tx_frags+0x16d/0x230 [mac80211]
Apr 03 15:12:23 kernel:  __ieee80211_tx+0x79/0x1c0 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx+0x11f/0x150 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx_pending+0x118/0x380 [mac80211]
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  tasklet_action_common.isra.0+0x7e/0x190
Apr 03 15:12:23 kernel:  __do_softirq+0xe1/0x45d
Apr 03 15:12:23 kernel:  do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel:  </IRQ>
Apr 03 15:12:23 kernel:  ? iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  do_softirq.part.0+0x60/0x70
Apr 03 15:12:23 kernel:  __local_bh_enable_ip+0xd0/0xe0
Apr 03 15:12:23 kernel:  iwl_pcie_irq_handler+0x7b0/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  ? irq_thread+0x88/0x190
Apr 03 15:12:23 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 15:12:23 kernel:  irq_thread_fn+0x20/0x60
Apr 03 15:12:23 kernel:  irq_thread+0xe9/0x190
Apr 03 15:12:23 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 15:12:23 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 15:12:23 kernel:  kthread+0x106/0x140
Apr 03 15:12:23 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 15:12:23 kernel:  ? kthread_park+0x90/0x90
Apr 03 15:12:23 kernel:  ret_from_fork+0x3a/0x50
Apr 03 15:12:23 kernel: irq event stamp: 666353
Apr 03 15:12:23 kernel: hardirqs last  enabled at (666352): [<ffffffffb616fae5>] vprintk_emit+0x155/0x350
Apr 03 15:12:23 kernel: hardirqs last disabled at (666353): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 15:12:23 kernel: softirqs last  enabled at (666334): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel: softirqs last disabled at (666335): [<ffffffffb6c0107a>] do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel: ---[ end trace 588779a720b2e611 ]---


and at 19:18:12 appears another WARNING:

Apr 03 19:18:12 kernel: ------------[ cut here ]------------
Apr 03 19:18:12 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1728 iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 19:18:12 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 19:18:12 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Tainted: G        W         5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 19:18:12 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 19:18:12 kernel: RIP: 0010:iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Code: 0f 85 4f f7 ff ff 48 c7 c2 f8 8d b2 c0 be ba 04 00 00 48 c7 c7 c8 90 b2 c0 c6 05 35 f9 04 00 01 e8 5f 95 66 f5 e9 2b f7 ff ff <0f> 0b e9 74 f8 ff ff 0f 0b e9 3d f8 ff ff 48 63 44 24 14 44 8b 4c
Apr 03 19:18:12 kernel: RSP: 0018:ffffadb9c0537c78 EFLAGS: 00010202
Apr 03 19:18:12 kernel: RAX: 0000000000000005 RBX: ffffffffc0b1c2c0 RCX: 0000000000000001
Apr 03 19:18:12 kernel: RDX: 0000000000000004 RSI: ffffadb9c0537e00 RDI: ffff901e76cdb3e8
Apr 03 19:18:12 kernel: RBP: ffff901e76cdb3e8 R08: ffffadb9c0537e00 R09: 0000000000000000
Apr 03 19:18:12 kernel: R10: 00000003e9d18000 R11: 0000000000000000 R12: ffffadb9c0537e00
Apr 03 19:18:12 kernel: R13: ffff901e76cdc438 R14: ffff901e76cdb3e8 R15: ffff901e69d18000
Apr 03 19:18:12 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 19:18:12 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 19:18:12 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612002 CR4: 00000000001606e0
Apr 03 19:18:12 kernel: Call Trace:
Apr 03 19:18:12 kernel:  ? rcu_read_lock_sched_held+0x32/0x70
Apr 03 19:18:12 kernel:  ? __iwl_dbg+0x12d/0x160 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_mvm_rx_common+0xcd/0x2b0 [iwlmvm]
Apr 03 19:18:12 kernel:  iwl_pcie_rx_handle+0x408/0xbf0 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_pcie_irq_handler+0x79f/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel:  ? irq_thread+0x88/0x190
Apr 03 19:18:12 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 19:18:12 kernel:  irq_thread_fn+0x20/0x60
Apr 03 19:18:12 kernel:  irq_thread+0xe9/0x190
Apr 03 19:18:12 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 19:18:12 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 19:18:12 kernel:  kthread+0x106/0x140
Apr 03 19:18:12 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 19:18:12 kernel:  ? kthread_park+0x90/0x90
Apr 03 19:18:12 kernel:  ret_from_fork+0x3a/0x50
Apr 03 19:18:12 kernel: irq event stamp: 66553165
Apr 03 19:18:12 kernel: hardirqs last  enabled at (66553164): [<ffffffffb6b605b6>] _raw_spin_unlock_irqrestore+0x46/0x60
Apr 03 19:18:12 kernel: hardirqs last disabled at (66553165): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 19:18:12 kernel: softirqs last  enabled at (66553156): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: softirqs last disabled at (66553162): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: ---[ end trace 588779a720b2e612 ]---
...
...

the complete log is in the attachment.

thanks,

Gabriel
Comment 5 gabriello.ramirez 2020-04-04 03:03:26 UTC
Created attachment 288199 [details]
kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

complete logs with WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]

(In reply to Mark Asselstine from comment #3)
> Created attachment 288181 [details]
> Patch, exit RCU read critical section earlier

Hi, 

using kernel 5.6.2-300.iwlwififix5.fc31.x86_64 with the following patches:

0001-iwlwifi-Don-t-use-non-existent-queues.patch
0001-iwlwifi-set-free_queue-to-the-loop-variable-not-a-me.patch
0001-iwlwifi-we-can-synchronize_net-while-in-a-RCU-read-c.patch (2020-04-03 14:27:03)

in the logs appears:

Apr 03 15:12:23 kernel: ------------[ cut here ]------------
Apr 03 15:12:23 kernel: Invalid TXQ id
Apr 03 15:12:23 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 15:12:23 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 15:12:23 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Not tainted 5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 15:12:23 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 15:12:23 kernel: RIP: 0010:iwl_mvm_tx_mpdu+0x580/0x5a0 [iwlmvm]
Apr 03 15:12:23 kernel: Code: 7d 08 e9 74 fd ff ff 44 0f b7 74 24 18 44 8b 6c 24 20 e9 dd fb ff ff 48 c7 c7 76 3c b2 c0 c6 05 94 13 05 00 01 e8 f0 3e 5f f5 <0f> 0b e9 74 fe ff ff c7 44 24 08 08 00 00 00 e9 57 fd ff ff 0f 0b
Apr 03 15:12:23 kernel: RSP: 0018:ffffadb9c0148bc0 EFLAGS: 00010282
Apr 03 15:12:23 kernel: RAX: 000000000000000e RBX: ffff901e5a454cd0 RCX: 0000000000000002
Apr 03 15:12:23 kernel: RDX: 0000000000007788 RSI: ffff901e84678d78 RDI: 0000000000000246
Apr 03 15:12:23 kernel: RBP: ffff901e76cdb3e8 R08: 00000062feef9e44 R09: 0000000000000000
Apr 03 15:12:23 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff901e60f2eb00
Apr 03 15:12:23 kernel: R13: ffffadb9c0148c40 R14: 0000000000004288 R15: 0000000000000000
Apr 03 15:12:23 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 15:12:23 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 15:12:23 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612005 CR4: 00000000001606e0
Apr 03 15:12:23 kernel: Call Trace:
Apr 03 15:12:23 kernel:  <IRQ>
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb_sta+0x1a3/0x480 [iwlmvm]
Apr 03 15:12:23 kernel:  ? find_held_lock+0x32/0x90
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  iwl_mvm_tx_skb+0x17/0x40 [iwlmvm]
Apr 03 15:12:23 kernel:  ieee80211_tx_frags+0x16d/0x230 [mac80211]
Apr 03 15:12:23 kernel:  __ieee80211_tx+0x79/0x1c0 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx+0x11f/0x150 [mac80211]
Apr 03 15:12:23 kernel:  ieee80211_tx_pending+0x118/0x380 [mac80211]
Apr 03 15:12:23 kernel:  ? mark_held_locks+0x2d/0x80
Apr 03 15:12:23 kernel:  tasklet_action_common.isra.0+0x7e/0x190
Apr 03 15:12:23 kernel:  __do_softirq+0xe1/0x45d
Apr 03 15:12:23 kernel:  do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel:  </IRQ>
Apr 03 15:12:23 kernel:  ? iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  do_softirq.part.0+0x60/0x70
Apr 03 15:12:23 kernel:  __local_bh_enable_ip+0xd0/0xe0
Apr 03 15:12:23 kernel:  iwl_pcie_irq_handler+0x7b0/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel:  ? irq_thread+0x88/0x190
Apr 03 15:12:23 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 15:12:23 kernel:  irq_thread_fn+0x20/0x60
Apr 03 15:12:23 kernel:  irq_thread+0xe9/0x190
Apr 03 15:12:23 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 15:12:23 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 15:12:23 kernel:  kthread+0x106/0x140
Apr 03 15:12:23 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 15:12:23 kernel:  ? kthread_park+0x90/0x90
Apr 03 15:12:23 kernel:  ret_from_fork+0x3a/0x50
Apr 03 15:12:23 kernel: irq event stamp: 666353
Apr 03 15:12:23 kernel: hardirqs last  enabled at (666352): [<ffffffffb616fae5>] vprintk_emit+0x155/0x350
Apr 03 15:12:23 kernel: hardirqs last disabled at (666353): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 15:12:23 kernel: softirqs last  enabled at (666334): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 15:12:23 kernel: softirqs last disabled at (666335): [<ffffffffb6c0107a>] do_softirq_own_stack+0x2a/0x40
Apr 03 15:12:23 kernel: ---[ end trace 588779a720b2e611 ]---


and at 19:18:12 appears another WARNING:

Apr 03 19:18:12 kernel: ------------[ cut here ]------------
Apr 03 19:18:12 kernel: WARNING: CPU: 2 PID: 887 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1728 iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Modules linked in: ccm 8021q garp mrp ebtable_filter ebtables ip6table_filter ip6_tables bridge stp llc xt_pkttype xt_state xt_conntrack iptable_filter xt_nat iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c nf_log_ipv4 nf_log_common xt_dscp xt_multiport xt_LOG xt_set xt_length xt_mark iptable_mangle ip_set_hash_net ip_set_bitmap_port ip_set_hash_ip ip_set_hash_netport ip_set nfnetlink sunrpc vfat fat iwlmvm mac80211 libarc4 intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore mei_hdcp snd_hda_codec_generic iwlwifi iTCO_wdt intel_rapl_perf snd_hda_codec_hdmi iTCO_vendor_support wmi_bmof ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec cfg80211 snd_hda_core snd_hwdep snd_seq snd_seq_device joydev rfkill snd_pcm mei_me i2c_i801 snd_timer snd soundcore mei lpc_ich ip_tables raid1 i915 i2c_algo_bit cec
Apr 03 19:18:12 kernel:  drm_kms_helper crc32c_intel drm ums_realtek uas e1000e usb_storage r8169 wmi video
Apr 03 19:18:12 kernel: CPU: 2 PID: 887 Comm: irq/42-iwlwifi Tainted: G        W         5.6.2-300.iwlwififix5.fc31.x86_64 #1
Apr 03 19:18:12 kernel: Hardware name: LENOVO 10A8S08P00/SHARKBAY, BIOS FBKTDBAUS 12/24/2019
Apr 03 19:18:12 kernel: RIP: 0010:iwl_mvm_rx_tx_cmd+0xb52/0xc80 [iwlmvm]
Apr 03 19:18:12 kernel: Code: 0f 85 4f f7 ff ff 48 c7 c2 f8 8d b2 c0 be ba 04 00 00 48 c7 c7 c8 90 b2 c0 c6 05 35 f9 04 00 01 e8 5f 95 66 f5 e9 2b f7 ff ff <0f> 0b e9 74 f8 ff ff 0f 0b e9 3d f8 ff ff 48 63 44 24 14 44 8b 4c
Apr 03 19:18:12 kernel: RSP: 0018:ffffadb9c0537c78 EFLAGS: 00010202
Apr 03 19:18:12 kernel: RAX: 0000000000000005 RBX: ffffffffc0b1c2c0 RCX: 0000000000000001
Apr 03 19:18:12 kernel: RDX: 0000000000000004 RSI: ffffadb9c0537e00 RDI: ffff901e76cdb3e8
Apr 03 19:18:12 kernel: RBP: ffff901e76cdb3e8 R08: ffffadb9c0537e00 R09: 0000000000000000
Apr 03 19:18:12 kernel: R10: 00000003e9d18000 R11: 0000000000000000 R12: ffffadb9c0537e00
Apr 03 19:18:12 kernel: R13: ffff901e76cdc438 R14: ffff901e76cdb3e8 R15: ffff901e69d18000
Apr 03 19:18:12 kernel: FS:  0000000000000000(0000) GS:ffff901e8ec00000(0000) knlGS:0000000000000000
Apr 03 19:18:12 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 19:18:12 kernel: CR2: 00007f15e1ae3000 CR3: 00000001e3612002 CR4: 00000000001606e0
Apr 03 19:18:12 kernel: Call Trace:
Apr 03 19:18:12 kernel:  ? rcu_read_lock_sched_held+0x32/0x70
Apr 03 19:18:12 kernel:  ? __iwl_dbg+0x12d/0x160 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_mvm_rx_common+0xcd/0x2b0 [iwlmvm]
Apr 03 19:18:12 kernel:  iwl_pcie_rx_handle+0x408/0xbf0 [iwlwifi]
Apr 03 19:18:12 kernel:  iwl_pcie_irq_handler+0x79f/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel:  ? irq_thread+0x88/0x190
Apr 03 19:18:12 kernel:  ? irq_forced_thread_fn+0x80/0x80
Apr 03 19:18:12 kernel:  irq_thread_fn+0x20/0x60
Apr 03 19:18:12 kernel:  irq_thread+0xe9/0x190
Apr 03 19:18:12 kernel:  ? lockdep_hardirqs_on+0xf0/0x180
Apr 03 19:18:12 kernel:  ? wake_threads_waitq+0x30/0x30
Apr 03 19:18:12 kernel:  kthread+0x106/0x140
Apr 03 19:18:12 kernel:  ? irq_thread_check_affinity+0xb0/0xb0
Apr 03 19:18:12 kernel:  ? kthread_park+0x90/0x90
Apr 03 19:18:12 kernel:  ret_from_fork+0x3a/0x50
Apr 03 19:18:12 kernel: irq event stamp: 66553165
Apr 03 19:18:12 kernel: hardirqs last  enabled at (66553164): [<ffffffffb6b605b6>] _raw_spin_unlock_irqrestore+0x46/0x60
Apr 03 19:18:12 kernel: hardirqs last disabled at (66553165): [<ffffffffb60038b2>] trace_hardirqs_off_thunk+0x1a/0x1c
Apr 03 19:18:12 kernel: softirqs last  enabled at (66553156): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: softirqs last disabled at (66553162): [<ffffffffc094906d>] iwl_pcie_irq_handler+0x77d/0xe20 [iwlwifi]
Apr 03 19:18:12 kernel: ---[ end trace 588779a720b2e612 ]---
...
Apr 03 19:28:42 kernel: WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected

Apr 03 20:46:36 kernel: WARNING: CPU: 0 PID: 3126 at drivers/net/wireless/intel/iwlwifi/mvm/utils.c:635 iwl_mvm_reconfig_scd+0x153/0x160 [iwlmvm]
Apr 03 20:49:46 kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924
Apr 03 20:49:46 kernel: WARNING: CPU: 2 PID: 3191 at drivers/net/wireless/intel/iwlwifi/mvm/sta.c:861 iwl_mvm_enable_txq+0x221/0x2b0 [iwlmvm]
Apr 03 20:49:47 kernel: WARNING: CPU: 2 PID: 3142 at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1770 iwl_pcie_enqueue_hcmd+0xa46/0xc00 [iwlwifi]
...

the complete log is in the attachment.

obsoleted the previous attachment because this log contains the previous attachment and also appears:

Apr 03 20:49:46 kernel: BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

thanks,

Gabriel
Comment 6 gabriello.ramirez 2020-04-27 19:03:00 UTC
Created attachment 288779 [details]
iwl_mvm_free_inactive_queue returns 0 or a error not a queue number

Hi,

iwl_mvm_free_inactive_queue returns 0 or a error not a queue number.
I don't know if when iwl_mvm_disable_txq returns nonzero the queue is usable so better to set free_queue to -ENOSPC

maybe is better to modify the for loop in iwl_mvm_inactivity_check
to get all free queues and not only the first one
so iwl_mvm_free_inactive_queue have more opportunity to get a usable queue

but iwl_mvm_free_inactive_queue cannot be under rcu_read_lock();

so the bug is resolved for me with:

https://bugzilla.kernel.org/attachment.cgi?id=287903
https://bugzilla.kernel.org/attachment.cgi?id=287955
and this patch

I hope the patches get to all stable kernel series soon I'm currently in 5.6.7 and it don't have the first two patches and them solved a kernel NULL pointer dereference.

Mark, thanks for all your help in solving the bugs,

Gabriel
Comment 7 triffid.hunter 2020-05-15 17:34:44 UTC
I have locally applied the two patches from #206765 mentioned in comment 6 plus attachment 288779 [details], and I'm still regularly getting

[51383.047342] WARNING: CPU: 7 PID: 1065 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1500 iwl_mvm_rx_tx_cmd+0x42e/0x750 [iwlmvm]
[51383.047343] Modules linked in: bridge stp llc xfrm_user rfcomm algif_skcipher bnep xt_mac xt_connmark xt_owner xt_mark xt_MASQUERADE iptable_nat nvidia_drm(PO) binfmt_misc vboxnetflt(O) vboxnetadp(O) vboxdrv(O) uinput nf_nat_sip nf_nat_irc nf_nat_ftp nvidia_modeset(PO) xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount snd_hda_codec_hdmi xt_conntrack nct6775 hwmon_vid ppp_async crc_ccitt xt_l2tp l2tp_ppp pppox btusb ppp_generic btrtl slhc btbcm btintel l2tp_ip6 l2tp_ip bluetooth l2tp_eth l2tp_netlink mei_hdcp l2tp_debugfs l2tp_core kvm_intel ecdh_generic joydev ip6_udp_tunnel iwlmvm snd_hda_codec_realtek ecc udp_tunnel kvm snd_hda_codec_generic irqbypass fuse nvidia(PO) crc32_pclmul snd_hda_intel iwlwifi snd_intel_dspcfg i2c_i801 snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei efivarfs
[51383.047398] CPU: 7 PID: 1065 Comm: irq/144-iwlwifi Tainted: P        W  O      5.5.19-gentoo-dorellan #1
[51383.047399] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[51383.047411] RIP: 0010:iwl_mvm_rx_tx_cmd+0x42e/0x750 [iwlmvm]
[51383.047415] Code: 82 ff ff 4c 8b 54 24 08 41 88 85 f2 17 00 00 e9 05 fe ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 1e bf 3e d0 0f b6
[51383.047416] RSP: 0018:ffff979d004afcd0 EFLAGS: 00010246
[51383.047419] RAX: ffff9239a1657101 RBX: 0000000000000088 RCX: 0000000000d463e7
[51383.047421] RDX: 0000000000d463df RSI: 0000000000d463df RDI: 000000000002fad0
[51383.047422] RBP: ffff923d81e81e68 R08: 00000000000000fc R09: 0000000000000000
[51383.047424] R10: ffff923ae167ef30 R11: 0000000000000000 R12: ffff92398fa3502a
[51383.047426] R13: ffff923d81e81e68 R14: ffff923ae167ef00 R15: ffff92398fe8b000
[51383.047428] FS:  0000000000000000(0000) GS:ffff923d8edc0000(0000) knlGS:0000000000000000
[51383.047430] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[51383.047432] CR2: 000033587dafe000 CR3: 00000001bf60a003 CR4: 00000000003606e0
[51383.047434] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[51383.047436] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[51383.047437] Call Trace:
[51383.047449]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[51383.047458]  iwl_pcie_rx_handle+0x322/0x850 [iwlwifi]
[51383.047467]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[51383.047471]  ? irq_forced_thread_fn+0x70/0x70
[51383.047474]  irq_thread_fn+0x1b/0x60
[51383.047478]  irq_thread+0xd7/0x160
[51383.047482]  ? wake_threads_waitq+0x30/0x30
[51383.047486]  kthread+0xf6/0x130
[51383.047489]  ? irq_thread_check_affinity+0x80/0x80
[51383.047492]  ? kthread_park+0x80/0x80
[51383.047498]  ret_from_fork+0x35/0x40
[51383.047501] ---[ end trace e80778eb46da3062 ]---
Comment 8 gabriello.ramirez 2020-05-16 18:51:44 UTC
Created attachment 289159 [details]
dont call iwl_mvm_free_inactive_queue, iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid under rcu_read_lock()

(In reply to triffid.hunter from comment #7)
> I have locally applied the two patches from #206765 mentioned in comment 6
> plus attachment 288779 [details], and I'm still regularly getting
> 
> [51383.047342] WARNING: CPU: 7 PID: 1065 at
> drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1500
> iwl_mvm_rx_tx_cmd+0x42e/0x750 [iwlmvm]

The last time the above WARNING triggered in my machine was under kernel 5.6.8

in 5.6.[11-12-13] don't happen, but I'm running a different patch (attached in this comment) from comment 6 because my machine triggered:

BUG: sleeping function called from invalid context at drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1924

in functions iwl_mvm_unshare_queue and iwl_mvm_change_queue_tid

so I don't know if the patch attached here fixed the WARNING from comment 7 or was fixed in a kernel newer than 5.6.8

with this patch the only WARNING triggered in my machine are:

May 15 21:16:38 kernel: Invalid TXQ id
May 15 21:16:38 kernel: WARNING: CPU: 2 PID: 920 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1144 iwl_mvm_tx_mpdu+0x53f/0x560 [iwlmvm]

and

May 15 22:47:59 kernel: WARNING: CPU: 2 PID: 920 at drivers/net/wireless/intel/iwlwifi/mvm/sta.c:3732 iwl_mvm_sta_modify_sleep_tx_count+0x259/0x300 [iwlmvm] 

Gabriel
Comment 9 Luca Coelho 2020-06-08 11:27:53 UTC
We made a bunch of fixes in this area which were included in v5.7-rc3, v5.4.36 and v5.6.8.  Can you retest with one of these versions?
Comment 10 gabriello.ramirez 2020-06-09 23:21:19 UTC
(In reply to Luca Coelho from comment #9)
> We made a bunch of fixes in this area which were included in v5.7-rc3,
> v5.4.36 and v5.6.8.  Can you retest with one of these versions?

Hi,

I tested v5.6.8 and newer currently v5.7 and this bug isn't fixed.

this bug is fixed with:

https://bugzilla.kernel.org/attachment.cgi?id=289159

because iwl_mvm_free_inactive_queue returns an error not a queue number 
so when it returns an error we need set queue_number to -ENOSPC not the error returned 

or maybe instead of returning -ENOSPC returns the queue_number

And dont call iwl_mvm_free_inactive_queue, iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid under rcu_read_lock() to prevent this bug

I'm running this patch for several days under v5.6.8+, currently v5.7, and this bug isn't triggered anymore

thanks,

Gabriel
Comment 11 Luca Coelho 2020-06-10 06:12:41 UTC
Okay, thanks for the report.  Where did you send the attached email to?

We have this patch that should solve the same problem: https://patchwork.kernel.org/patch/11472099/

But I didn't apply it because there were some concerns.  I'll send it for the rc series and for fixes soon.
Comment 12 triffid.hunter 2020-06-10 06:22:08 UTC
I'm having zero kernel messages from 5.7.1 after 17 hours of testing, whereas I could reliably get tons of aforementioned warnings in earlier releases.

I'll keep an eye on it, and report back if I come across anything such as the bug mentioned in comment 10 which doesn't seem to have triggered here yet.
Comment 13 gabriello.ramirez 2020-06-10 22:09:57 UTC
(In reply to Luca Coelho from comment #11)
> Okay, thanks for the report.  Where did you send the attached email to?
> 

I didn't, only attached here, because I was testing the patch so I wasn't sure to be any side effects from moving rcu_read_unlock(); above iwl_mvm_unshare_queue and iwl_mvm_change_queue_tid. but in my testing several weeks the bug isn't triggered anymore, and no side effects happened

and the patch has the journal of the bug instances when iwl_mvm_unshare_queue, and iwl_mvm_change_queue_tid were executed under rcu_read_unlock();

> We have this patch that should solve the same problem:
> https://patchwork.kernel.org/patch/11472099/

I tried a similar version several weeks but in my testing the bug still happened when iwl_mvm_unshare_queue and/or iwl_mvm_change_queue_tid executed under rcu_read_unlock 

I reviewed https://bugzilla.kernel.org/attachment.cgi?id=289159 again and seems to be unnecessary set queue_number = -ENOSPC so I'm currently testing a patch with return ret; instead

> 
> But I didn't apply it because there were some concerns.  I'll send it for
> the rc series and for fixes soon.

Well in my computer the bug still happen with it, but may be particular to my computer workload.

Gabriel
Comment 14 triffid.hunter 2020-06-15 16:23:02 UTC
After a little more testing with 5.7.1 it turns out it's just taking longer to trigger.

[380624.367421] WARNING: CPU: 7 PID: 1019 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1495 iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[380624.367422] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic libcurve25519_generic bridge stp llc xfrm_user rfcomm algif_skcipher bnep xt_mac xt_connmark xt_owner xt_mark xt_MASQUERADE iptable_nat nvidia_drm(PO) binfmt_misc vboxnetflt(O) vboxnetadp(O) vboxdrv(O) uinput nvidia_modeset(PO) nf_nat_sip nf_nat_irc nf_nat_ftp xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount xt_conntrack nct6775 hwmon_vid ppp_async crc_ccitt xt_l2tp btusb l2tp_ppp btrtl pppox btbcm ppp_generic btintel slhc bluetooth l2tp_ip6 l2tp_ip l2tp_eth l2tp_netlink ecdh_generic ecc l2tp_debugfs joydev kvm_intel l2tp_core snd_hda_codec_realtek ip6_udp_tunnel udp_tunnel mei_hdcp snd_hda_codec_generic snd_hda_codec_hdmi fuse kvm iwlmvm snd_hda_intel irqbypass snd_intel_dspcfg nvidia(PO) crc32_pclmul snd_hda_codec snd_hwdep iwlwifi i2c_i801 snd_hda_core snd_pcm
[380624.367463]  mei_me mei efivarfs
[380624.367468] CPU: 7 PID: 1019 Comm: irq/144-iwlwifi Tainted: P        W  O      5.7.1-gentoo-dorellan #1
[380624.367469] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[380624.367478] RIP: 0010:iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[380624.367489] Code: 82 ff ff 4c 8b 54 24 08 41 88 85 ea 17 00 00 e9 ff fd ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 d8 03 b0 c1 0f b6
[380624.367495] RSP: 0018:ffffa90500433cd0 EFLAGS: 00010246
[380624.367502] RAX: ffffa1cba7097101 RBX: 0000000000000088 RCX: 00000000048aa3ff
[380624.367504] RDX: 00000000048aa3f7 RSI: 00000000048aa3f7 RDI: 000000000002fcc0
[380624.367505] RBP: ffffa1d281cf9e88 R08: 00000000000000fd R09: 0000000000000000
[380624.367507] R10: ffffa1ceb136b630 R11: 0000000000000000 R12: ffffa1d1ee30a02e
[380624.367508] R13: ffffa1d281cf9e88 R14: ffffa1ceb136b600 R15: ffffa1cb4faeb000
[380624.367510] FS:  0000000000000000(0000) GS:ffffa1d28edc0000(0000) knlGS:0000000000000000
[380624.367512] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[380624.367513] CR2: 00001af755f8b000 CR3: 00000003c080a001 CR4: 00000000003606e0
[380624.367515] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[380624.367516] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[380624.367517] Call Trace:
[380624.367526]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[380624.367534]  iwl_pcie_rx_handle+0x320/0x820 [iwlwifi]
[380624.367547]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[380624.367558]  ? irq_forced_thread_fn+0x70/0x70
[380624.367567]  irq_thread_fn+0x1b/0x60
[380624.367576]  irq_thread+0xd7/0x160
[380624.367584]  ? wake_threads_waitq+0x30/0x30
[380624.367592]  kthread+0x10e/0x150
[380624.367598]  ? irq_thread_check_affinity+0x80/0x80
[380624.367601]  ? kthread_park+0x80/0x80
[380624.367604]  ret_from_fork+0x35/0x40
[380624.367607] ---[ end trace f114c08ca41e94d5 ]---

Uptime 6 days, 63 instances of this warning in the kernel buffer, and it's overflowed (dmesg | head shows half a warning message) so there may have been more.
Comment 15 triffid.hunter 2020-07-02 07:58:36 UTC
From 5.7.6:

[49246.106047] ------------[ cut here ]------------
[49246.106074] WARNING: CPU: 6 PID: 944 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1495 iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[49246.106076] Modules linked in: wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic libcurve25519_generic bridge stp llc xfrm_user rfcomm algif_skcipher bnep xt_mac xt_connmark xt_owner xt_mark xt_MASQUERADE iptable_nat nvidia_drm(PO) binfmt_misc uinput nvidia_modeset(PO) nf_nat_sip nf_nat_irc nf_nat_ftp xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount xt_conntrack nct6775 hwmon_vid btusb btrtl btbcm btintel ppp_async crc_ccitt bluetooth joydev xt_l2tp ecdh_generic ecc l2tp_ppp pppox ppp_generic slhc l2tp_ip6 l2tp_ip l2tp_eth l2tp_netlink l2tp_debugfs l2tp_core ip6_udp_tunnel udp_tunnel mei_hdcp snd_hda_codec_realtek kvm_intel snd_hda_codec_generic snd_hda_codec_hdmi iwlmvm fuse kvm snd_hda_intel irqbypass nvidia(PO) i2c_i801 crc32_pclmul snd_intel_dspcfg iwlwifi snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me mei efivarfs
[49246.106158] CPU: 6 PID: 944 Comm: irq/144-iwlwifi Tainted: P           O      5.7.6-gentoo-dorellan #1
[49246.106161] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[49246.106179] RIP: 0010:iwl_mvm_rx_tx_cmd+0x434/0x760 [iwlmvm]
[49246.106185] Code: 82 ff ff 4c 8b 54 24 08 41 88 85 ea 17 00 00 e9 ff fd ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 98 04 41 f8 0f b6
[49246.106189] RSP: 0018:ffff9d0a006d7cd0 EFLAGS: 00010246
[49246.106193] RAX: 0000000000000000 RBX: 0000000000000088 RCX: 000000008020001f
[49246.106196] RDX: 0000000080200020 RSI: 0000000000000001 RDI: 0000000000000000
[49246.106199] RBP: ffff9991810d1e88 R08: 0000000000000001 R09: 0000000000000001
[49246.106202] R10: ffff999183cb9e30 R11: 0000000000000000 R12: ffff998fdc2fb826
[49246.106204] R13: ffff9991810d1e88 R14: ffff999183cb9e00 R15: ffff998ee2bdd000
[49246.106209] FS:  0000000000000000(0000) GS:ffff99918ed80000(0000) knlGS:0000000000000000
[49246.106212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49246.106215] CR2: 00005651d0c0ef50 CR3: 000000035480a003 CR4: 00000000003606e0
[49246.106218] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[49246.106221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[49246.106223] Call Trace:
[49246.106244]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[49246.106257]  iwl_pcie_rx_handle+0x320/0x820 [iwlwifi]
[49246.106267]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[49246.106275]  ? irq_forced_thread_fn+0x70/0x70
[49246.106278]  irq_thread_fn+0x1b/0x60
[49246.106283]  irq_thread+0xd7/0x160
[49246.106287]  ? wake_threads_waitq+0x30/0x30
[49246.106293]  kthread+0x10e/0x150
[49246.106297]  ? irq_thread_check_affinity+0x80/0x80
[49246.106301]  ? kthread_park+0x80/0x80
[49246.106306]  ret_from_fork+0x35/0x40
[49246.106311] ---[ end trace e97e4e008c682eab ]---
Comment 16 triffid.hunter 2020-11-29 11:05:41 UTC
From 5.8.18:

[159490.455557] WARNING: CPU: 6 PID: 961 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1492 iwl_mvm_rx_tx_cmd+0x43b/0x760 [iwlmvm]
[159490.455557] Modules linked in: wireguard libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 libblake2s blake2s_x86_64 libblake2s_generic curve25519_x86_64 libcurve25519_generic bridge stp llc xfrm_user rfcomm algif_skcipher bnep zram zsmalloc xt_connmark xt_owner xt_mac xt_mark xt_MASQUERADE iptable_nat binfmt_misc nvidia_drm(PO) uinput nf_nat_sip nf_nat_irc nf_nat_ftp xt_state xt_nat nf_conntrack_sip nf_conntrack_pptp nvidia_modeset(PO) nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conncount xt_conntrack nct6775 hwmon_vid ppp_async crc_ccitt xt_l2tp l2tp_ppp pppox ppp_generic slhc l2tp_ip6 l2tp_ip joydev l2tp_eth l2tp_netlink l2tp_debugfs btusb btrtl btbcm btintel l2tp_core snd_hda_codec_realtek ip6_udp_tunnel udp_tunnel snd_hda_codec_hdmi snd_hda_codec_generic ee1004 bluetooth mei_hdcp fuse ecdh_generic ecc snd_hda_intel nvidia(PO) snd_intel_dspcfg iwlmvm kvm_intel snd_hda_codec kvm snd_hwdep i2c_i801 irqbypass snd_hda_core i2c_smbus iwlwifi crc32_pclmul snd_pcm mei_me
[159490.455571]  mei efivarfs
[159490.455572] CPU: 6 PID: 961 Comm: irq/144-iwlwifi Tainted: P        W  O      5.8.18-gentoo-dorellan #1
[159490.455573] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z270M-ITX/ac, BIOS P2.20 05/17/2017
[159490.455576] RIP: 0010:iwl_mvm_rx_tx_cmd+0x43b/0x760 [iwlmvm]
[159490.455576] Code: 81 ff ff 4c 8b 54 24 08 41 88 85 e2 17 00 00 e9 01 fe ff ff 66 81 fb 8e 00 40 0f 94 c5 eb 9c 66 81 fb 85 00 40 0f 94 c5 eb 91 <0f> 0b 41 81 4e 28 00 01 00 00 eb 83 4d 89 ee e8 a1 e9 ce cb 48 63
[159490.455577] RSP: 0018:ffffb8e7003d7cd0 EFLAGS: 00010246
[159490.455578] RAX: ffff9573db82ed01 RBX: 0000000000000088 RCX: 000000000045ed16
[159490.455578] RDX: 000000000045ed0e RSI: 000000000045ed0e RDI: 000000000002f040
[159490.455579] RBP: ffff9576bb311e68 R08: 00000000000000ff R09: 0000000000000000
[159490.455579] R10: ffff9574f150cb30 R11: 0000000000000000 R12: ffff957258f975c0
[159490.455580] R13: ffff9576bb311e68 R14: ffff9574f150cb00 R15: ffff9572ef729000
[159490.455580] FS:  0000000000000000(0000) GS:ffff9576ced80000(0000) knlGS:0000000000000000
[159490.455581] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[159490.455581] CR2: 00007fb27f255000 CR3: 000000020d80a006 CR4: 00000000003606e0
[159490.455582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[159490.455582] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[159490.455583] Call Trace:
[159490.455586]  iwl_mvm_rx_common+0xd9/0x290 [iwlmvm]
[159490.455588]  iwl_pcie_rx_handle+0x34b/0x8b0 [iwlwifi]
[159490.455590]  iwl_pcie_irq_handler+0x1a9/0x760 [iwlwifi]
[159490.455591]  ? irq_forced_thread_fn+0x70/0x70
[159490.455592]  irq_thread_fn+0x1b/0x60
[159490.455593]  irq_thread+0xd7/0x160
[159490.455594]  ? wake_threads_waitq+0x30/0x30
[159490.455595]  ? irq_thread_check_affinity+0x80/0x80
[159490.455596]  kthread+0x111/0x150
[159490.455597]  ? kthread_park+0x80/0x80
[159490.455598]  ret_from_fork+0x22/0x30
[159490.455599] ---[ end trace c0b50996c06f211f ]---

And it seems like it's making my connection to my router drop and reconnect..

Note You need to log in before you can comment on or make changes to this bug.