Bug 202513

Summary: iwlwifi: 8265: TFD queue hang - WIFI-18132
Product: Drivers Reporter: Blanche Schaefer (1cntjhef0t)
Component: network-wirelessAssignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: CLOSED WILL_NOT_FIX    
Severity: normal CC: alexander, bugs, dikiy_evrej
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.0rc5 Subsystem:
Regression: No Bisected commit-id:
Attachments: debug firmware dump
firmware dump

Description Blanche Schaefer 2019-02-05 23:01:06 UTC
Hello,

For a couple of weeks I see following errors printed in
dmesg from time to time. Also I'm frequently disconnected from AP with
Received Deauthentication event, reason: 3, from_ap: false:
or
Received Deauthentication event, reason: 4, from_ap: false

log:

iwlwifi 0000:04:00.0: Queue 4 is active on fifo 1 and stuck for 10000 ms. SW [189, \
201] HW [189, 201] FH TRB=0x0c01040c8 iwlwifi 0000:04:00.0: Microcode SW error \
detected.  Restarting 0x2000000. iwlwifi 0000:04:00.0: Start IWL Error Log Dump:
iwlwifi 0000:04:00.0: Status: 0x00000100, count: 6
iwlwifi 0000:04:00.0: Loaded firmware version: 36.9f0a2d68.0
iwlwifi 0000:04:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
iwlwifi 0000:04:00.0: 0x200002B0 | trm_hw_status0
iwlwifi 0000:04:00.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:04:00.0: 0x0002485C | branchlink2
iwlwifi 0000:04:00.0: 0x0003A7CA | interruptlink1
iwlwifi 0000:04:00.0: 0x0003E6A4 | interruptlink2
iwlwifi 0000:04:00.0: 0x00000000 | data1
iwlwifi 0000:04:00.0: 0x00000080 | data2
iwlwifi 0000:04:00.0: 0x07830000 | data3
iwlwifi 0000:04:00.0: 0x82818D06 | beacon time
iwlwifi 0000:04:00.0: 0x4E60EC46 | tsf low
iwlwifi 0000:04:00.0: 0x00000257 | tsf hi
iwlwifi 0000:04:00.0: 0x00000000 | time gp1
iwlwifi 0000:04:00.0: 0xBF706F42 | time gp2
iwlwifi 0000:04:00.0: 0x00000001 | uCode revision type
iwlwifi 0000:04:00.0: 0x00000024 | uCode version major
iwlwifi 0000:04:00.0: 0x9F0A2D68 | uCode version minor
iwlwifi 0000:04:00.0: 0x00000230 | hw version
iwlwifi 0000:04:00.0: 0x00489000 | board version
iwlwifi 0000:04:00.0: 0x0000001C | hcmd
iwlwifi 0000:04:00.0: 0x0222204A | isr0
iwlwifi 0000:04:00.0: 0x00800000 | isr1
iwlwifi 0000:04:00.0: 0x0800180A | isr2
iwlwifi 0000:04:00.0: 0x004000C0 | isr3
iwlwifi 0000:04:00.0: 0x00000000 | isr4
iwlwifi 0000:04:00.0: 0xA138009D | last cmd Id
iwlwifi 0000:04:00.0: 0x00000000 | wait_event
iwlwifi 0000:04:00.0: 0x00000080 | l2p_control
iwlwifi 0000:04:00.0: 0x00011420 | l2p_duration
iwlwifi 0000:04:00.0: 0x0000003F | l2p_mhvalid
iwlwifi 0000:04:00.0: 0x000000CE | l2p_addr_match
iwlwifi 0000:04:00.0: 0x0000000D | lmpm_pmg_sel
iwlwifi 0000:04:00.0: 0x04120134 | timestamp
iwlwifi 0000:04:00.0: 0x0034B0C8 | flow_handler
iwlwifi 0000:04:00.0: 0x00000000 | ADVANCED_SYSASSERT
iwlwifi 0000:04:00.0: 0x00000000 | umac branchlink1
iwlwifi 0000:04:00.0: 0x00000000 | umac branchlink2
iwlwifi 0000:04:00.0: 0x00000000 | umac interruptlink1
iwlwifi 0000:04:00.0: 0x00000000 | umac interruptlink2
iwlwifi 0000:04:00.0: 0x00000000 | umac data1
iwlwifi 0000:04:00.0: 0x00000000 | umac data2
iwlwifi 0000:04:00.0: 0x00000000 | umac data3
iwlwifi 0000:04:00.0: 0x00000000 | umac major
iwlwifi 0000:04:00.0: 0x00000000 | umac minor
iwlwifi 0000:04:00.0: 0x00000000 | frame pointer
iwlwifi 0000:04:00.0: 0x00000000 | stack pointer
iwlwifi 0000:04:00.0: 0x00000000 | last host cmd
iwlwifi 0000:04:00.0: 0x00000000 | isr status reg
iwlwifi 0000:04:00.0: mcast filter cmd error. ret=-5

Recently I also saw following warning in Linux 5.0rc5 which
happened just after above:

WARNING: CPU: 0 PID: 8802 at net/mac80211/offchannel.c:158 \
ieee80211_offchannel_return+0x18f/0x1a0 [mac80211] Modules linked in: ccm algif_aead \
cbc des_generic ecb cmac md4 algif_hash hid_sensor_rotation hid_sensor_accel_3d \
nf_log_ipv4 hid_sensor_als nf_log_common hid_sensor_gyro_3d hid_sensor_magn_3d wacom \
nft_counter hid_sensor_trigger hid_sensor_iio_common xt_mark \
industrialio_triggered_buffer ipt_REJECT nf_reject_ipv4 kfifo_buf xt_LOG industrialio \
xt_addrtype xt_tcpudp snd_soc_skl xt_conntrack nf_conntrack snd_soc_hdac_hda arc4 \
snd_hda_ext_core nf_defrag_ipv4 snd_soc_acpi_intel_match libcrc32c hid_sensor_hub \
joydev snd_soc_acpi iwlmvm snd_soc_skl_ipc mousedev nft_compat intel_ishtp_hid \
snd_soc_sst_ipc intel_rapl intel_wmi_thunderbolt wmi_bmof snd_soc_sst_dsp mac80211 \
nf_tables x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi nfnetlink \
snd_soc_core coretemp snd_hda_codec_conexant snd_hda_codec_generic nls_iso8859_1 \
nls_cp437 kvm_intel snd_hda_intel vfat iwlwifi fat snd_hda_codec intel_cstate \
snd_hwdep intel_uncore snd_hda_core psmouse intel_rapl_perf snd_pcm input_leds  \
cfg80211 snd_timer rtsx_pci_ms memstick mei_me mei intel_pch_thermal intel_ish_ipc \
ucsi_acpi intel_ishtp typec_ucsi typec thinkpad_acpi wmi ledtrig_audio nvram snd \
soundcore rfkill battery tpm_crb ac i2c_hid evdev tpm_tis tpm_tis_core mac_hid tpm \
rng_core pcc_cpufreq ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 \
fscrypto algif_skcipher af_alg hid_logitech_hidpp hid_logitech_dj hid_generic usbhid \
hid dm_crypt dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel \
rtsx_pci_sdmmc serio_raw mmc_core atkbd aesni_intel libps2 aes_x86_64 crypto_simd \
cryptd glue_helper xhci_pci rtsx_pci i8042 serio xhci_hcd i915 intel_gtt i2c_algo_bit \
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgart kvmgt \
                vfio_mdev mdev vfio_iommu_type1 vfio kvm irqbypass
CPU: 0 PID: 8802 Comm: kworker/0:0 Tainted: G                T 5.0.0-rc5 #1
Workqueue: events_freezable ieee80211_restart_work [mac80211]
RIP: 0010:ieee80211_offchannel_return+0x18f/0x1a0 [mac80211]
Code: eb 98 41 83 0f 02 be 10 00 00 00 4c 89 ff e8 d8 22 ff ff e9 69 ff ff ff 31 d2 \
48 89 de 4c 89 ff e8 76 f6 04 00 e9 57 ff ff ff <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f \
                c3 0f 1f 40 00 0f 1f 44 00 00
RSP: 0018:ffff99350967fde0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8c63172b9f80 RSI: 0000000000000000 RDI: ffff8c6307d907a0
RBP: ffff8c6307d907a0 R08: 0000000000000001 R09: 000000000000038d
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8c63075a28a0
R13: ffff8c6307d907a0 R14: 0000000000000000 R15: ffff8c6307d91438
FS:  0000000000000000(0000) GS:ffff8c6319400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00006d1f075aa000 CR3: 000000020900c006 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __ieee80211_scan_completed+0x1b7/0x3f0 [mac80211]
 ieee80211_scan_cancel+0xd1/0x1f0 [mac80211]
 ieee80211_restart_work+0x99/0xf0 [mac80211]
 process_one_work+0x19b/0x3c0
 worker_thread+0x30/0x380
 ? process_one_work+0x3c0/0x3c0
 kthread+0x113/0x130
 ? kthread_park+0x80/0x80
 ret_from_fork+0x35/0x40
---[ end trace 1ba7bbe576255260 ]---

System:

04:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 88)
        Subsystem: Intel Corporation Wireless 8265 / 8275
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 132
        Region 0: Memory at ec100000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: <access denied>
        Kernel driver in use: iwlwifi
Comment 1 Emmanuel Grumbach 2019-02-06 13:15:48 UTC
We need to debug the firmware here.
Please refer to https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging

Please take the time to read the privacy notice here:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
Comment 2 Blanche Schaefer 2019-02-07 20:38:19 UTC
Created attachment 281049 [details]
debug firmware dump

@Emmanuel Grumbach 

I created debug dump after issue occurred using debug version of firmware from
https://wireless.wiki.kernel.org/_media/en/users/drivers/iwlwifi/iwlwifi-8265-36.ucode.gz
I attached encrypted blob with all three developer keys.
Comment 3 Emmanuel Grumbach 2019-02-07 21:25:55 UTC
All right, you've done your part.
Now all we need is time to look at the output...

Thanks.
Comment 4 Emmanuel Grumbach 2019-02-14 07:27:31 UTC
What I see here is that we keep hearing Rx and because of that, we can't transmit anything. I am taking this with the firmware team.

Can you please take a look at https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_platform_noise ?

Thanks.
Comment 5 Emmanuel Grumbach 2019-02-14 07:48:41 UTC
Can you please disable power save:

add power_scheme=1 as an option to iwlmvm.

This should help but will consume more power.
Comment 6 Emmanuel Grumbach 2019-02-18 08:14:18 UTC
Hi, do you have an answer regarding the power save disabled?

The firmware team is asking.

Thank you.
Comment 7 Blanche Schaefer 2019-02-18 11:36:47 UTC
I didn't saw any firmware errors since Feb 10 without touching any power save options.

I'm still seeing periodically wifi disconnection messages:
Received Deauthentication event, reason: 3, from_ap: false:
or
Received Deauthentication event, reason: 4, from_ap: false

Do you think those could be related to power save too?
Comment 8 Emmanuel Grumbach 2019-02-18 15:29:23 UTC
The disconnections you see are not related.

Can you please check if the AP changed channel?
You seemed to be on channel 7 when you provided the data.

Thanks.
Comment 9 Blanche Schaefer 2019-02-18 17:31:41 UTC
Yes, the channel has changed. My AP has set "Channel" to "auto" in wireless interface settings so I guess it periodically switches them over. I think I should set some static one.
Comment 10 Emmanuel Grumbach 2019-02-18 18:06:56 UTC
Can you do that please to try to reproduce the problem?

The bug is probably related to the channel you are on.
Comment 11 Blanche Schaefer 2019-02-19 19:34:32 UTC
After changing back to channel 7 I saw those errors again (with powersave enabled). Now I'm going to test with powersave disabled.
Comment 12 Emmanuel Grumbach 2019-02-19 20:06:55 UTC
Great thank you!
Comment 13 Blanche Schaefer 2019-02-22 10:02:52 UTC
Created attachment 281277 [details]
firmware dump

I managed to reproduce errors on channel 7 with power save disabled. This time it's from non-debug firmware version.
Comment 14 Emmanuel Grumbach 2019-02-22 10:26:17 UTC
Ok, strange, can you please attach the dmesg output?

Thanks.

Does it happen a lot?
If yes, would it be possible to reproduce with the debug firmware?
Comment 15 Blanche Schaefer 2019-02-22 11:52:50 UTC
(In reply to Emmanuel Grumbach from comment #14)
> Ok, strange, can you please attach the dmesg output?
> 

iwlwifi 0000:04:00.0: Microcode SW error detected.  Restarting 0x2000000.
iwlwifi 0000:04:00.0: Start IWL Error Log Dump:
iwlwifi 0000:04:00.0: Status: 0x00000100, count: 6
iwlwifi 0000:04:00.0: Loaded firmware version: 36.9f0a2d68.0
iwlwifi 0000:04:00.0: 0x00004425 | ADVANCED_SYSASSERT          
iwlwifi 0000:04:00.0: 0x400002F2 | trm_hw_status0
iwlwifi 0000:04:00.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:04:00.0: 0x0002485C | branchlink2
iwlwifi 0000:04:00.0: 0x0003A7CA | interruptlink1
iwlwifi 0000:04:00.0: 0x00000000 | interruptlink2
iwlwifi 0000:04:00.0: 0x00000001 | data1
iwlwifi 0000:04:00.0: 0x00000628 | data2
iwlwifi 0000:04:00.0: 0x000013B8 | data3
iwlwifi 0000:04:00.0: 0xED4163CD | beacon time
iwlwifi 0000:04:00.0: 0xEA0D7D80 | tsf low
iwlwifi 0000:04:00.0: 0x00000030 | tsf hi
iwlwifi 0000:04:00.0: 0x00000000 | time gp1
iwlwifi 0000:04:00.0: 0x3F8AF7EB | time gp2
iwlwifi 0000:04:00.0: 0x00000001 | uCode revision type
iwlwifi 0000:04:00.0: 0x00000024 | uCode version major
iwlwifi 0000:04:00.0: 0x9F0A2D68 | uCode version minor
iwlwifi 0000:04:00.0: 0x00000230 | hw version
iwlwifi 0000:04:00.0: 0x00489000 | board version
iwlwifi 0000:04:00.0: 0x0000001C | hcmd
iwlwifi 0000:04:00.0: 0x02222042 | isr0
iwlwifi 0000:04:00.0: 0x00000000 | isr1
iwlwifi 0000:04:00.0: 0x08001802 | isr2
iwlwifi 0000:04:00.0: 0x404000C0 | isr3
iwlwifi 0000:04:00.0: 0x00000000 | isr4
iwlwifi 0000:04:00.0: 0x04ED001C | last cmd Id
iwlwifi 0000:04:00.0: 0x00000000 | wait_event
iwlwifi 0000:04:00.0: 0x00006611 | l2p_control
iwlwifi 0000:04:00.0: 0x00001420 | l2p_duration
iwlwifi 0000:04:00.0: 0x00000003 | l2p_mhvalid
iwlwifi 0000:04:00.0: 0x000000EE | l2p_addr_match
iwlwifi 0000:04:00.0: 0x0000000D | lmpm_pmg_sel
iwlwifi 0000:04:00.0: 0x04120134 | timestamp
iwlwifi 0000:04:00.0: 0x00001828 | flow_handler
iwlwifi 0000:04:00.0: Start IWL Error Log Dump:
iwlwifi 0000:04:00.0: Status: 0x00000100, count: 7
iwlwifi 0000:04:00.0: 0x00000070 | NMI_INTERRUPT_LMAC_FATAL
iwlwifi 0000:04:00.0: 0x00000000 | umac branchlink1
iwlwifi 0000:04:00.0: 0xC00868A4 | umac branchlink2
iwlwifi 0000:04:00.0: 0xC0083A94 | umac interruptlink1
iwlwifi 0000:04:00.0: 0xC0083A94 | umac interruptlink2
iwlwifi 0000:04:00.0: 0x00000800 | umac data1
iwlwifi 0000:04:00.0: 0xC0083A94 | umac data2
iwlwifi 0000:04:00.0: 0xDEADBEEF | umac data3
iwlwifi 0000:04:00.0: 0x00000024 | umac major
iwlwifi 0000:04:00.0: 0x9F0A2D68 | umac minor
iwlwifi 0000:04:00.0: 0xC088628C | frame pointer
iwlwifi 0000:04:00.0: 0xC088628C | stack pointer
iwlwifi 0000:04:00.0: 0x0080014E | last host cmd
iwlwifi 0000:04:00.0: 0x00000000 | isr status reg

> Thanks.
> 
> Does it happen a lot?
> If yes, would it be possible to reproduce with the debug firmware?

It happened just once at the end of 2-day testing period. I was just going to report that I can't reproduce it :)

I can't reproduce it on-demand.
Comment 16 Emmanuel Grumbach 2019-02-22 13:25:45 UTC
I see. Problem is that w/o the dump from the debug version we can't be sure that you are seeing the same problem.

Let me know.

Thank you.
Comment 17 Blanche Schaefer 2019-03-10 18:49:31 UTC
I can't reproduce it anymore with power_scheme=1. I previously made a typo so perhaps power saving wasn't disabled properly. If the problems with power save enabled are expected the you can close this.
Comment 18 Emmanuel Grumbach 2019-03-10 19:46:45 UTC
Well, no. The problem should never happen:)

I'll let the firmware know about your observations.
Comment 19 Emmanuel Grumbach 2019-03-20 10:50:46 UTC
Hello,

I have talked to the firmware specialists.
Their conclusion is that the problem is likely to be in the hardware module.
Have you tried on Windows?
If yes, it may be possible to convince your laptop manufacturer to replace your wifi module.

If you are happy with the current workaround (disable power save), then fine, but this will have a small impact on the battery life.

Let us know.

Thank you.
Comment 20 Blanche Schaefer 2019-03-20 12:52:39 UTC
I can't try it on windows. As workaround I just switched to fixed channel in my AP (other than 7  which was flawed) and it works good for me, without losing battery life.

Thank you for your help. Feel free to close this ticket.
Comment 21 Emmanuel Grumbach 2019-03-20 12:55:11 UTC
Ok, Thank you.
Comment 22 Ilya 2019-04-06 06:32:32 UTC
Please reopen, I have exactly the same problem since some weeks, after updating (I use Mint 18.3)
Comment 23 Ilya 2019-04-06 06:33:45 UTC
I'm on kernel 4.15.0-38-generic from Ubuntu 16.04.1 and using Thinkpad T470s
Comment 24 Emmanuel Grumbach 2019-04-06 17:06:51 UTC
please open a new bug.