Bug 199967

Summary: iwlwifi: 3160: warning at intel/iwlwifi/mvm/tx.c:1369 triggers frequently
Product: Drivers Reporter: Manuel Lauss (manuel.lauss)
Component: network-wirelessAssignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED CODE_FIX    
Severity: normal CC: bugzilla, dump, linuxwifi, luca
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 4.16.14 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg and lspci
Change WARN_ON(1) to IWL_ERR()
fix

Description Manuel Lauss 2018-06-07 19:03:39 UTC
Created attachment 276371 [details]
dmesg and lspci

iwlwifi-3160, ucode iwlwifi-3160-17, used as an access point w. hostapd.
since upgrading to 4.16 (from 4.15), these warnings appear very frequently:

[   74.347192] WARNING: CPU: 0 PID: 184 at /usr/src/linux-4.16.14/drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1369 iwl_mvm_rx_tx_cmd+0x53b/0x860
[   74.347239] CPU: 0 PID: 184 Comm: irq/124-iwlwifi Not tainted 4.16.14 #1
[   74.347259] Hardware name: ZOTAC XXXXXX/XXXXXX, BIOS B301P017 04/06/2016
[   74.347285] RIP: 0010:iwl_mvm_rx_tx_cmd+0x53b/0x860
[   74.347304] RSP: 0018:ffffc9000024fcf0 EFLAGS: 00010246
[   74.347325] RAX: ffff8802763fa701 RBX: ffff880276ae1388 RCX: 000000018015000e
[   74.347344] RDX: 000000018015000f RSI: ffffea0009d8fe80 RDI: 0000000040000000
[   74.347364] RBP: 0000000000000088 R08: ffff88027fcaf630 R09: 000000018015000e
[   74.347383] R10: ffffffff81391c00 R11: 0000000000000080 R12: 0000000000000900
[   74.347402] R13: ffff8802769e6000 R14: ffff880276b20000 R15: ffff88027fcaf600
[   74.347423] FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
[   74.347444] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.347465] CR2: 00005631c7b71b1c CR3: 000000000300a000 CR4: 00000000001006f0
[   74.347484] Call Trace:
[   74.347512]  iwl_pcie_rx_handle+0x220/0x880
[   74.347537]  iwl_pcie_irq_handler+0x6c9/0xa20
[   74.347567]  ? irq_forced_thread_fn+0x60/0x60
[   74.347589]  ? irq_thread_dtor+0x90/0x90
[   74.347610]  irq_thread_fn+0x1d/0x50
[   74.347635]  ? irq_thread_dtor+0x90/0x90
[   74.347656]  irq_thread+0x11f/0x170
[   74.347679]  ? wake_threads_waitq+0x30/0x30
[   74.347704]  kthread+0x10a/0x120
[   74.347727]  ? __kthread_create_on_node+0x150/0x150
[   74.347754]  ret_from_fork+0x35/0x40
Comment 1 bugzilla 2018-08-18 02:40:22 UTC
*** Bug 200851 has been marked as a duplicate of this bug. ***
Comment 2 bugzilla 2018-08-18 02:41:08 UTC
Please fix this, or at least change the WARN_ON() to a simple printk() so it doesn't flood my logs.
Comment 3 bugzilla 2018-08-18 03:38:41 UTC
Created attachment 277925 [details]
Change WARN_ON(1) to IWL_ERR()
Comment 5 bugzilla 2018-08-18 03:55:04 UTC
According to https://osdn.net/users/richard_collins/pf/kernel/scm/commits/6ab10ff8738dfb098fd32132b7ebcf5cdb43ebde

> * when a station goes to sleep, the microcode notices
>   this and marks the station as asleep
> * when the station is marked asleep, the microcode
>   refuses to transmit to the station and rejects all
>   frames queued to it with the failure status code
>   TX_STATUS_FAIL_DEST_PS (a previous patch handled
>   this correctly)
> * when we need to send frames to the station _although_
>   it is asleep, we need to tell the ucode how many,
>   and this is asynchronous with sending so we cannot
>   just send the frames, we need to wait for all other
>   frames to be flushed, and then update the counter
>   before sending out the poll response frames. This
>   is handled partially in the driver and partially in
>   mac80211.

So this should only happen if the station goes to sleep, but I have power management turned off:

$ sudo iwconfig wlan0
wlan0     IEEE 802.11  Mode:Master  Tx-Power=22 dBm
          Retry short limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
$
Comment 6 Emmanuel Grumbach 2018-08-18 19:42:47 UTC
yeah maybe, but when your station scans, it tells the AP that it enters power save so that the AP will buffer the traffic in the meantime.
Comment 7 Emmanuel Grumbach 2018-08-19 12:03:17 UTC
Can you try this?

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 40db3bb..7ea572a 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1731,6 +1731,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx)
         */
        if (!ieee80211_hw_check(&sta->local->hw, AP_LINK_PS) &&
            !ieee80211_has_morefrags(hdr->frame_control) &&
+            !is_multicast_ether_addr(hdr->addr1) &&
            (ieee80211_is_mgmt(hdr->frame_control) ||
             ieee80211_is_data(hdr->frame_control)) &&
            !(status->rx_flags & IEEE80211_RX_DEFERRED_RELEASE) &&
Comment 8 Emmanuel Grumbach 2018-08-19 12:04:13 UTC
*** Bug 200389 has been marked as a duplicate of this bug. ***
Comment 9 Manuel Lauss 2018-08-20 08:47:32 UTC
(In reply to Emmanuel Grumbach from comment #7)
> Can you try this?
> 
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 40db3bb..7ea572a 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1731,6 +1731,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data
> *rx)
>          */
>         if (!ieee80211_hw_check(&sta->local->hw, AP_LINK_PS) &&
>             !ieee80211_has_morefrags(hdr->frame_control) &&
> +            !is_multicast_ether_addr(hdr->addr1) &&
>             (ieee80211_is_mgmt(hdr->frame_control) ||
>              ieee80211_is_data(hdr->frame_control)) &&
>             !(status->rx_flags & IEEE80211_RX_DEFERRED_RELEASE) &&

That seems to have helped. The warnings are gone, and connectivity doesn't seem to suffer either.

Thanks!
Comment 10 Emmanuel Grumbach 2018-08-20 09:10:06 UTC
Great, others?
Comment 11 Emmanuel Grumbach 2018-08-20 09:58:00 UTC
Created attachment 277981 [details]
fix

This is the fix as a normal patch.
Will be sent upstream through the regular process.
Comment 12 Emmanuel Grumbach 2018-08-20 11:02:50 UTC
Fix is now published on the linux-wireless mailing list:

https://patchwork.kernel.org/patch/10570241/


I noticed that a few reports are on 4.14 which don't have the offending patch that was first released in 4.16.
But I think we can close this bug anyway.
If someone feels he hasn't received the proper support, please re-open this bug, or create a new one and do not forget to CC linuxwifi@intel.com to the bug.

Thanks.
Comment 13 bugzilla 2018-08-20 20:45:04 UTC
Thanks. Testing this now but looks good so far.
Comment 14 bugzilla 2018-08-22 21:24:46 UTC
Looks good but I did get this once (probably unrelated)

[156136.482862] iwlwifi 0000:03:00.0: Queue 5 is active on fifo 3 and stuck for 10000 ms. SW [23, 24] HW [23, 24] FH TRB=0x0803090e1
[156136.482974] iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.
[156136.483111] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[156136.483118] iwlwifi 0000:03:00.0: Status: 0x00000100, count: 6
[156136.483124] iwlwifi 0000:03:00.0: Loaded firmware version: 29.1044073957.0
[156136.483131] iwlwifi 0000:03:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
[156136.483136] iwlwifi 0000:03:00.0: 0x000002B0 | trm_hw_status0
[156136.483142] iwlwifi 0000:03:00.0: 0x00000000 | trm_hw_status1
[156136.483147] iwlwifi 0000:03:00.0: 0x00043D54 | branchlink2
[156136.483152] iwlwifi 0000:03:00.0: 0x0004AFA6 | interruptlink1
[156136.483157] iwlwifi 0000:03:00.0: 0x000008A0 | interruptlink2
[156136.483162] iwlwifi 0000:03:00.0: 0x00000000 | data1
[156136.483167] iwlwifi 0000:03:00.0: 0x00000080 | data2
[156136.483172] iwlwifi 0000:03:00.0: 0x07030000 | data3
[156136.483177] iwlwifi 0000:03:00.0: 0x00000000 | beacon time
[156136.483182] iwlwifi 0000:03:00.0: 0x59636510 | tsf low
[156136.483188] iwlwifi 0000:03:00.0: 0x00000024 | tsf hi
[156136.483193] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
[156136.483198] iwlwifi 0000:03:00.0: 0x59636511 | time gp2
[156136.483203] iwlwifi 0000:03:00.0: 0x00000001 | uCode revision type
[156136.483208] iwlwifi 0000:03:00.0: 0x0000001D | uCode version major
[156136.483213] iwlwifi 0000:03:00.0: 0x3E3B4DE5 | uCode version minor
[156136.483218] iwlwifi 0000:03:00.0: 0x00000220 | hw version
[156136.483223] iwlwifi 0000:03:00.0: 0x00C89200 | board version
[156136.483228] iwlwifi 0000:03:00.0: 0x00E1011C | hcmd
[156136.483233] iwlwifi 0000:03:00.0: 0x00022020 | isr0
[156136.483238] iwlwifi 0000:03:00.0: 0x00000000 | isr1
[156136.483243] iwlwifi 0000:03:00.0: 0x00000002 | isr2
[156136.483248] iwlwifi 0000:03:00.0: 0x00417CC0 | isr3
[156136.483253] iwlwifi 0000:03:00.0: 0x00000000 | isr4
[156136.483258] iwlwifi 0000:03:00.0: 0x03B6001C | last cmd Id
[156136.483263] iwlwifi 0000:03:00.0: 0x00000000 | wait_event
[156136.483268] iwlwifi 0000:03:00.0: 0x00000080 | l2p_control
[156136.483273] iwlwifi 0000:03:00.0: 0x00010020 | l2p_duration
[156136.483278] iwlwifi 0000:03:00.0: 0x0000003F | l2p_mhvalid
[156136.483284] iwlwifi 0000:03:00.0: 0x00008000 | l2p_addr_match
[156136.483289] iwlwifi 0000:03:00.0: 0x00000005 | lmpm_pmg_sel
[156136.483294] iwlwifi 0000:03:00.0: 0x03041751 | timestamp
[156136.483299] iwlwifi 0000:03:00.0: 0x0034C0D0 | flow_handler
[156136.483306] ieee80211 phy0: Hardware restart was requested