Bug 203709 - iwlwifi: 8260: frequently disconnects since Linux 5.1 "No beacon heard and the time event is over already" - WIFI-25906
Summary: iwlwifi: 8260: frequently disconnects since Linux 5.1 "No beacon heard and th...
Status: ASSIGNED
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Intel Linux Wireless
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-25 23:06 UTC by Denis Lisov
Modified: 2019-09-18 19:58 UTC (History)
8 users (show)

See Also:
Kernel Version: 5.1.5
Tree: Mainline
Regression: Yes


Attachments
concatenation of full dmesg and post-boot logs of kernel+NM+wpa_supplicant (377.13 KB, text/plain)
2019-05-25 23:06 UTC, Denis Lisov
Details
journal excerpt for intel 8265 NIC on iwd/networkd, iwlwifi debug=0x20 (37.00 KB, text/plain)
2019-07-26 19:24 UTC, Ronan Pigott
Details
iwlwifi trace (5.2.9 w/patch from comment 6) (1.71 MB, application/octet-stream)
2019-08-23 19:06 UTC, Denis Lisov
Details
trace-cmd: Intel Corporation Wireless 7260 (rev 83) (1.45 MB, application/x-xz)
2019-08-27 22:37 UTC, brian
Details
Intel 7260 syslog accompanying the trace (1.56 KB, text/plain)
2019-08-27 22:38 UTC, brian
Details

Description Denis Lisov 2019-05-25 23:06:34 UTC
Created attachment 282941 [details]
concatenation of full dmesg and post-boot logs of kernel+NM+wpa_supplicant

Hardware: Thinkpad P50 with Intel Corporation Wireless 8260 [8086:24f3]
Software: Gentoo Linux with vanilla kernel, NetworkManager using wpa_supplicant

In Linux 5.0 the WiFi connection to my home network is rather stable. In Linux 5.1 (5.1.5 at the moment) it's unstable with the following symptoms.

Ping check to the router looks like this
64 bytes from 192.168.183.1: icmp_seq=54 ttl=64 time=5.81 ms
64 bytes from 192.168.183.1: icmp_seq=55 ttl=64 time=1.00 ms
64 bytes from 192.168.183.1: icmp_seq=56 ttl=64 time=1.17 ms
64 bytes from 192.168.183.1: icmp_seq=57 ttl=64 time=3.31 ms
From 192.168.183.5 icmp_seq=58 Destination Host Unreachable
From 192.168.183.5 icmp_seq=59 Destination Host Unreachable
From 192.168.183.5 icmp_seq=60 Destination Host Unreachable
From 192.168.183.5 icmp_seq=65 Destination Host Unreachable
From 192.168.183.5 icmp_seq=66 Destination Host Unreachable
From 192.168.183.5 icmp_seq=67 Destination Host Unreachable
From 192.168.183.5 icmp_seq=68 Destination Host Unreachable
From 192.168.183.5 icmp_seq=69 Destination Host Unreachable
From 192.168.183.5 icmp_seq=70 Destination Host Unreachable
64 bytes from 192.168.183.1: icmp_seq=78 ttl=64 time=1.07 ms
64 bytes from 192.168.183.1: icmp_seq=79 ttl=64 time=0.940 ms
64 bytes from 192.168.183.1: icmp_seq=80 ttl=64 time=0.951 ms
64 bytes from 192.168.183.1: icmp_seq=81 ttl=64 time=0.950 ms
64 bytes from 192.168.183.1: icmp_seq=82 ttl=64 time=1.17 ms
and so on, changing from ok to host unreachable a few times per minute.

The dmesg output looks like this (full log in attachment):
[  408.580520] wlp4s0: associated
[  420.822273] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
[  423.420415] wlp4s0: authenticate with ec:43:f6:07:90:84
[  423.431798] wlp4s0: send auth to ec:43:f6:07:90:84 (try 1/3)
[  423.436016] wlp4s0: authenticated
[  423.438864] wlp4s0: associate with ec:43:f6:07:90:84 (try 1/3)
[  423.542285] wlp4s0: associate with ec:43:f6:07:90:84 (try 2/3)
[  423.550083] wlp4s0: RX AssocResp from ec:43:f6:07:90:84 (capab=0xc11 status=0 aid=3)
[  423.552862] wlp4s0: associated
[  424.045586] iwlwifi 0000:04:00.0: No beacon heard and the time event is over already...
[  424.045622] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
[  425.178388] wlp4s0: authenticate with ec:43:f6:07:90:84
[  425.188576] wlp4s0: send auth to ec:43:f6:07:90:84 (try 1/3)
[  425.191274] wlp4s0: authenticated
[  425.192183] wlp4s0: associate with ec:43:f6:07:90:84 (try 1/3)
[  425.202483] wlp4s0: RX AssocResp from ec:43:f6:07:90:84 (capab=0xc11 status=0 aid=3)
[  425.206217] wlp4s0: associated
[  425.802779] iwlwifi 0000:04:00.0: No beacon heard and the time event is over already...
[  425.802814] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
[  427.060487] wlp4s0: authenticate with ec:43:f6:07:90:84
[  427.069870] wlp4s0: send auth to ec:43:f6:07:90:84 (try 1/3)
[  427.072586] wlp4s0: authenticated
[  427.078966] wlp4s0: associate with ec:43:f6:07:90:84 (try 1/3)
[  427.185781] wlp4s0: associate with ec:43:f6:07:90:84 (try 2/3)
[  427.222630] wlp4s0: RX AssocResp from ec:43:f6:07:90:84 (capab=0xc11 status=0 aid=3)
[  427.225860] wlp4s0: associated
[  427.683876] iwlwifi 0000:04:00.0: No beacon heard and the time event is over already...
[  427.683911] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
[  440.104459] wlp4s0: authenticate with ec:43:f6:07:90:84
[  440.112495] wlp4s0: send auth to ec:43:f6:07:90:84 (try 1/3)
[  440.116197] wlp4s0: authenticated
[  440.119343] wlp4s0: associate with ec:43:f6:07:90:84 (try 1/3)
[  440.124865] wlp4s0: RX AssocResp from ec:43:f6:07:90:84 (capab=0xc11 status=0 aid=3)
[  440.129543] wlp4s0: associated
[  440.750534] IPv6: ADDRCONF(NETDEV_CHANGE): wlp4s0: link becomes ready
[  452.055008] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost

The corresponding part of NetworkManager logs is attached too.
Comment 1 Ronan Pigott 2019-07-26 19:24:22 UTC
Created attachment 283993 [details]
journal excerpt for intel 8265 NIC on iwd/networkd, iwlwifi debug=0x20
Comment 2 Ronan Pigott 2019-07-26 19:26:16 UTC
I am experiencing what seems to be the same issue on the intel 8265.

```
$ lspci -kvnn | sed -n '/Network/,/^$/ p'
02:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 78)
	Subsystem: Intel Corporation Dual Band Wireless-AC 8265 [8086:1010]
	Flags: bus master, fast devsel, latency 0, IRQ 141
	Memory at edc00000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi
```
In my experience the interface can remain stable for a long time, but occasionally will disconnect and then struggles to reassociate/authenticate for a while before becoming stable again. It sometimes records the 'no beacon heard' message and always disconnects with reason 4, being inactivity.

I'm using a different network stack, so I'll leave journal excerpts in case the iwd logging is more helpful. I've got some debugging set for iwlwifi as I was trying to debug my issue.

$ printf %#x\n $(cat /sys/module/iwlwifi/parameters/debug)
0x20

$ journalctl --boot=517d350fbd224b28a65e491ff95036db _KERNEL_DEVICE=+pci:0000:02:00.0 + _SYSTEMD_UNIT={iwd,systemd-network}.service
(attached)
Comment 3 Ronan Pigott 2019-07-26 19:35:49 UTC
forgot to mention -- in my case,

$ uname -r
5.2.2-arch1-1-ARCH
Comment 4 bugzilla.kernel.org 2019-08-01 01:57:29 UTC
Can see this occuring Intel Corporation Wireless 7260 as well on `5.1.15-arch1-1-ARCH`.
Comment 5 kurmikon 2019-08-12 20:29:36 UTC
It's happening also on my Intel 3165, even on Linux 5.0.
Comment 6 Luca Coelho 2019-08-22 05:22:33 UTC
This could be related to TX-AMSDU, which we unfortunately had to disable for older devices.  Can you try this and see if it helps you with stability?

https://patchwork.kernel.org/patch/11029027/
Comment 7 kurmikon 2019-08-22 06:25:42 UTC
Already used that patch compiling the kernel, but I had same issues.
Comment 8 Ronan Pigott 2019-08-22 06:34:28 UTC
If I'm not mistaken, patch 11029027 has been backported to the Arch core/linux since the release of version 5.2.arch2.

See here for the first appearance of the patch in Arch linux.git (v5.2, since 07/09): https://git.archlinux.org/linux.git/commit/?h=v5.2-arch2&id=f6f6b798c7330d7851fee4bbe835c3954886fc3f

Additionally see here for the inclusion of the patch in Arch linux.git (v5.2.2):  https://git.archlinux.org/linux.git/commit/?h=v5.2.2-arch1&id=d426d34276c9f17081179be6497cddd74154556a

Which means that the TX-AMSDU patch had already been applied when I encountered the error as I described above, and applied for the boot where my journal excerpt was taken and attached above.

Additionally, I have examples in my journal of what appears to be the same error (cycling disconnect with iwlwifi time event over already error) in 5.1.15-arch1, which predates the application of that patch, and another isolated example all the way back in 5.1.4-arch1 (a different AP as well), which predates the authorship of that patch. I can attach more logs if you would like.

So I have examples of the error with and without the patch.

For better or for worse, this error has somehow become much less common for me. The most recent example of the iwlwifi error for me was one spurious disconnect 1 week ago, in 5.2.7-arch1, even though I use this laptop every day.
Comment 9 Luca Coelho 2019-08-22 06:36:58 UTC
Thanks kurmikon@libero.it! But your device is very different from Denis' and Ronan's... I think it would be best to create a new bug report here and provide all the logs we recommend in our wiki[1]

[1] https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging
Comment 10 Luca Coelho 2019-08-22 06:38:56 UTC
Thanks for all the info, Ronan!

Can you also provide trace-cmd logs as explained in the wiki I linked in comment 9?
Comment 11 Ronan Pigott 2019-08-22 07:03:20 UTC
Okay, If I get the error again, I'll trace it.

I don't have a reliable way to reproduce atm, so if anyone experiencing this issue knows of a method that may trigger it, please let me know.
Comment 12 Denis Lisov 2019-08-23 09:22:21 UTC
I've tried the patch suggested in comment 6 on top of vanilla 5.2.9 and the bug seems to be still there. Anything else to try? Should I provide the trace-cmd trace too?
Comment 13 Luca Coelho 2019-08-23 09:24:58 UTC
Thanks for testing, Denis.

So, if the issue is not related, we could use some traces as mentioned in comment #9.
Comment 14 Denis Lisov 2019-08-23 19:06:20 UTC
Created attachment 284575 [details]
iwlwifi trace (5.2.9 w/patch from comment 6)

I've generated a short trace during which there were a number of lost ICMP packets and messages from wpa_supplicant in the log. The kernel was 5.2.9 + patch from comment 6. The command used was

trace-cmd record -e iwlwifi -e mac80211 -e cfg80211 -e iwlwifi_msg
Comment 15 Luca Coelho 2019-08-26 06:19:56 UTC
Thanks again.  We will assign someone internally to look into this.
Comment 16 brian 2019-08-27 22:37:12 UTC
Created attachment 284643 [details]
trace-cmd: Intel Corporation Wireless 7260 (rev 83)

Trace from Intel 7260 in ThinkPad T440s attached!
Comment 17 brian 2019-08-27 22:38:43 UTC
Created attachment 284645 [details]
Intel 7260 syslog accompanying the trace

This may help correlate timestamps in the trace.
Comment 18 brian 2019-08-27 22:40:00 UTC
Intel 7260 seems to be affected.

Linux version 5.2.9-arch1-1-ARCH (builduser@heftig-119803) (gcc version 9.1.0 (GCC)) #1 SMP PREEMPT Fri Aug 16 11:29:43 UTC 2019
Comment 19 brian 2019-08-31 19:04:25 UTC
Confirmed likely a software issue. Just installed another 7260 in same machine. Same issue.
Comment 20 Denis Lisov 2019-09-18 19:58:09 UTC
The problem is still present in 5.3.0.

Maybe they're independent, but I've also seen multiple firmware errors in GEO_TX_POWER_LIMIT (looking like https://bugzilla.kernel.org/show_bug.cgi?id=204151) on 5.3 while checking for this bug.

Note You need to log in before you can comment on or make changes to this bug.