Bug 194717
Summary: | iwlwifi: 8260: traffic loss upon initiator delba - WIFILNX-676 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Matt Berry (matthew.william.berry) |
Component: | network-wireless | Assignee: | DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | luca, matthew.william.berry, mike.cloaked, roee.zamir |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.11.6-3-ARCH | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
wpa.log with debugging on. Exhibits a 15+ period of no access after which one webpage loaded and all subsequent access was lost
dmesg from a session that included several periods of no network access Core24 with uSniffer Core28 with uSniffer fix bar fix bar fix bar do not send delBA |
Created attachment 254929 [details]
dmesg from a session that included several periods of no network access
Hi, sorry for the long delay. One of our developers has been assigned to work on this bug. Can you please record tracing of the problem? Please refer to [1] to check how to do this. This will allow us to see what is the interaction between the wpa_s and the kernel. [1] https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#tracing Apologies, I was traveling. I'm back in my proper timezone so I'll work on getting you a trace today. I've capture two traces, totaling about 280 MB compressed. What's the procedure for submitting them? I also captured the WPA Supplicant log and dmesg. Dropbox, Google Drive, everything goes :) Mail a self-addressed stamped envelope to P.O. Box 99999..... You said everything goes! But just in case, here's some links. https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-1.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-2.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-3.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/dmesg.log.2017-06-04.gz https://s3.amazonaws.com/maberry-temporary/2017-06-04/wpa.log.2017-06-04.gz Trace #1 captures the beginning of a problematic time period. Just like in the original problem description the card reported being associated with the AP but there was no network connectivity. That trace also captures me applying the only known workaround, which is to "sudo netctl switch-to <NetworkProfile>". Trace #2 was started just after a problematic time period (I ran out of room in /tmp and had to restart). Unlike the last trace, network connectivity restored itself this time after a few minutes. Trace #3 starts with the card un-associated. I then try to associate to an AP and the process fails. The WPA supplicant log makes it look like it succeeded, indicating that association and authentication worked. But I never obtained network connectivity. The netctl command hangs, I see the network name appear in the status bar, and then it disappears and I end up un-associated again. The WPA log indicates it was told to terminate, but that was not a conscious action on my part. The WPA log and dmesg links cover boot until traces 1 and 2. Trace #3 has it's own WPA log and dmesg files in the tarball. Those logs start after the end of the WPA log / dmesg from # 1 & 2. Seems Markdown concatenated my links because I didn't leave enough space between them. Here they are again. https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-1.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-2.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-3.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/dmesg.log.2017-06-04.gz https://s3.amazonaws.com/maberry-temporary/2017-06-04/wpa.log.2017-06-04.gz Thanks - I attached those to our internal tracker and the assignee will analyse them. I wanted to let you know that as of the last OS update I performed, I'm now getting a new iwlwifi related error message: "capa flags index 3 larger than supported by driver". slate% dmesg | grep iwlwifi [ 7.331132] iwlwifi 0000:04:00.0: enabling device (0000 -> 0002) [ 7.333791] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-28.ucode failed with error -2 [ 7.339846] iwlwifi 0000:04:00.0: capa flags index 3 larger than supported by driver [ 7.340242] iwlwifi 0000:04:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm [ 7.355382] iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208 [ 7.357592] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 7.358763] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 7.496753] iwlwifi 0000:04:00.0 wlp4s0: renamed from wlan0 [ 12.094184] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 12.094706] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 12.223120] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 12.223666] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled slate% uname -a Linux slate 4.11.5-1-ARCH #1 SMP PREEMPT Wed Jun 14 16:19:27 CEST 2017 x86_64 GNU/Linux slate% pacman -Qi linux-firmware Name : linux-firmware Version : 20170422.ade8332-1 Hi Matt, Can you please report this as a separate bug? I'll take a look into it ASAP. Hi Matt, seems that trace.dat file in trace-cmd-3 corrupted and can't be parsed. can you try to parse[1] it? in addition air capture[2] will really help [1] $ trace-cmd report -i trace.dat [2] https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#air_sniffing Thanks, Roee Roee, Parsing trace-cmd-3 works fine on my end. I've captured the output of `trace-cmd report -i trace-cmd-3/trace.dat` as `trace.dat.decoded.gpg` and you can download it from here: https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace.dat.decoded.gpg For the air capture, would it be more useful if you had a capture from the perspective of the problematic device or another wireless card in proximity to it (or both for the same period of time)? Thanks, Matt! I'll sync internally with Roee to help him decode the trace-cmd logs you captured. Matt, It seems that trace-3 works fine, we actually cannot decode trace-1. Can you double-check that one? And we actually need the binary .dat, because we have some extra (non-public) scripts to parse it (you can still encrypt is as usual). Figures, I ran out of room in /tmp while recording trace-1. I'll get you a new trace. For the air capture, do you want me to perform it on the problematic device itself, or would you like to see the AP and the problematic device from the perspective of another wireless client? Here's trace-cmd-4. This trace starts out unassociated, I then associate, time passes, the problem begins, I let the problem persist for 5 minutes while I fruitlessly ssh/dig/ping, and then finally I bring the network down and back up and access is restored. The timeline file records when I noticed the problem begin (not an exact time) and the time at which I cycled the network to restore access. Also included is dmesg and wpa_supplicant logs for the affected timeframe. https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-4.tar.gz.gpg Thanks! We'll look into the new logs. Regarding the air capture, it's better to do it from a third device, running in monitor mode, because the problematic may be missing something and that could be the cause of the problem. But I think in this case we may not need it, so we'll come back to you if we really need it. Hi Matt, We've looked at the new logs and our conclusions are that for some reason that we can't determine right now, we can't hear any *unicast* packet from the AP starting from a certain point in time. This point in time doesn't exhibit any particularity that could explain the lack of unicast Rx packet afterwards. We can't know *why* we can't hear any unicast packet, because the log we have right now is at the driver level, so all we can say is that the driver doesn't get any new unicast packet from the AP. We know that the driver isn't even getting those (as opposed to getting them and dropping them), because the log you provided includes all the packets at firmware / driver interface which is way before the layer in the driver that drops packet (duplicates, decryption issues and all that kind stuff). Now we need to understand why the firmware doesn't give us new unicast packets. It can be: * the AP stops sending them because it gets stuck on RTS / CTS (may be our bug: us not responding to RTS). * the AP stops sending them because of another reason * the AP does send the packets but the firmware doesn't hear them (unlikely since we do hear the multicast packets, and we'd expect the AP to retry until it reaches a low rate at which we should really hear the packet). * the AP does sent the packet and the firmware hears it, but the firmware drops it for an unknown reason. An air sniffer capture will be useful here, but maybe there is something easier that can be done and can teach us what is happening here. I'll attach a firmware with debug enabled, and I'd like you to follow the procedure described here: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging This will allow to collect a dump of the firmware. We will know what the firmware is doing. I'd like you to create the dump (echo 1 > /sys/.../fw-dbg-collect) when the pings stop working. Please create a few dumps by repeating the echo command a few times (and collecting the data either manually or by the udev rule) while waiting ~10 seconds between the dumps. 3 dumps should be more than enough. This should be easier to do than having and air sniffer and it can teach us a lot of things. Please take the time to read the privacy notice here: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects Thank you. Oh, I also see that you use -27.ucode. We published -31.ucode, it may be worthwhile to try it (check our backport based dirver here: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/core_release You should Core29. Created attachment 257233 [details]
Core24 with uSniffer
This is the firmware with debug enabled.
Thank you.
Did I get the correct attachment? I don't seem to have the requisite sysfs entries after rebooting. ``` slate% sudo openssl dgst -sha256 /usr/lib/firmware/iwlwifi-8000C-27* SHA256(/usr/lib/firmware/iwlwifi-8000C-27.ucode)= 0947146a61ce9ecffc73fb110186c3fe7cc04097aa260627c6f22ab4dd23cb46 SHA256(/usr/lib/firmware/iwlwifi-8000C-27.ucode.debug)= 0947146a61ce9ecffc73fb110186c3fe7cc04097aa260627c6f22ab4dd23cb46 SHA256(/usr/lib/firmware/iwlwifi-8000C-27.ucode.original)= a4432158038960fd9dccace083f77c050393c3a8117cccad5535bf7e24d99ea7 ``` ``` slate% sudo ls /sys/kernel/debug/iwlwifi ls: cannot access '/sys/kernel/debug/iwlwifi': No such file or directory ``` ``` slate% dmesg | grep iwlwifi | grep loaded [ 142.844963] iwlwifi 0000:04:00.0: loaded firmware version 27.546686.0 op_mode iwlmvm ``` Here's what I do have ``` slate% sudo find /sys -name '*iwlwifi*' /sys/kernel/debug/tracing/events/iwlwifi_io /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_ioread32 /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_iowrite8 /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_iowrite32 /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_iowrite64 /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_iowrite_prph32 /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_iowrite_prph64 /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_ioread_prph32 /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_irq /sys/kernel/debug/tracing/events/iwlwifi_io/iwlwifi_dev_ict_read /sys/kernel/debug/tracing/events/iwlwifi_ucode /sys/kernel/debug/tracing/events/iwlwifi_ucode/iwlwifi_dev_ucode_cont_event /sys/kernel/debug/tracing/events/iwlwifi_ucode/iwlwifi_dev_ucode_wrap_event /sys/kernel/debug/tracing/events/iwlwifi_msg /sys/kernel/debug/tracing/events/iwlwifi_msg/iwlwifi_err /sys/kernel/debug/tracing/events/iwlwifi_msg/iwlwifi_warn /sys/kernel/debug/tracing/events/iwlwifi_msg/iwlwifi_info /sys/kernel/debug/tracing/events/iwlwifi_msg/iwlwifi_crit /sys/kernel/debug/tracing/events/iwlwifi_msg/iwlwifi_dbg /sys/kernel/debug/tracing/events/iwlwifi_data /sys/kernel/debug/tracing/events/iwlwifi_data/iwlwifi_dev_tx_data /sys/kernel/debug/tracing/events/iwlwifi_data/iwlwifi_dev_tx_tso_chunk /sys/kernel/debug/tracing/events/iwlwifi_data/iwlwifi_dev_rx_data /sys/kernel/debug/tracing/events/iwlwifi /sys/kernel/debug/tracing/events/iwlwifi/iwlwifi_dev_hcmd /sys/kernel/debug/tracing/events/iwlwifi/iwlwifi_dev_rx /sys/kernel/debug/tracing/events/iwlwifi/iwlwifi_dev_tx /sys/kernel/debug/tracing/events/iwlwifi/iwlwifi_dev_ucode_error /sys/kernel/debug/tracing/events/iwlwifi/iwlwifi_dev_ucode_event /sys/bus/pci/drivers/iwlwifi /sys/module/iwlwifi /sys/module/iwlwifi/drivers/pci:iwlwifi /sys/module/cfg80211/holders/iwlwifi ``` That's because you don't have debugfs enabled. Can you compile a driver with CONFIG_IWLWIFI_DEBUGFS enabled? You can try out our backport version if you want. That will allow you not to compile a full kernel... Not sure if you're up early or late :P I just recompiled the backports tree while working on https://bugzilla.kernel.org/show_bug.cgi?id=196195 . I'll report back in a bit with some dumps for you. Would it be worth while to get dumps from the Core30 firmware now that I'm on the backports branch? If you attach it I will test it. I can't share Core30 firmware yet, but I can give you a core24 firmware. That will have to wait a few hours though. Sorry, I meant a core 28 firmware. No rush, I'm on PDT time so I'm about to head to sleep. Created attachment 257245 [details]
Core28 with uSniffer
Hi,
here is Core28 with uSniffer for the firmware debug data collection.
Thanks.
Here is a link to trace-cmd-5, which covers a period where several FW dumps were taken. This is on the backports driver using the provided -27 microcode. dmesg, iw events, and wpa_supplicant logs are also included. The most interesting bit about this round of data is that the problem recovered after some time, which has happened before but is not typical of this problem. Normally I need to take the network down to restore connectivity. https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-5.tar.gz.gpg I will try to collect more data tomorrow with the -31 microcode, as it's well past midnight at this point. No rush. I won't be able to look at this until Sunday anyway. Please make sure to produce the dumps while the traffic stalls. Thanks I am starting to look at this. Can you please check in your router if you can disable A-MSDU? It can be called aggregations or something like that. Thanks. Sadly I don't think so. This router's stock firmware has only the most minimal set of configuration options. Basically just channel width and mixed-mode/802.11ac-only-mode. I don't have any of the options my old DD-WRT or Tomato APs had. Nope. Not needed. Can you please load iwlmvm with power_scheme=1 instead? Since iwlmvm is automatically loaded, you'll need to add an option in iwlwifi.conf file. Thanks. I am checking with the firmware team. So I have spent a bunch of time on the traces and the firmware dumps. I can't see anything obvious besides the fact that the client seems to have lots its IP address. I can't be sure since I don't have all the data. At this point, I have to ask for more sensitive data. Can you send a tcpdump capture of the wlan interface? Please do the following *when you detect the traffic loss*: sudo tcpdump -w traffic.pcap -i wlp4s0 (this window will be stuck) Open a new window and in that window, run a few pings to your router then, stop the tcpdump and send the traffic.pcap file. Please don't let traffic.pcap run all the time since it will just add misleading info. I have to emphasis the fact that tcpdump will contain all the data in plaintext. So that will basically give us access to the actual data you are sending. SSL connections remain encrypted at the app layer of course. What I am looking for is the explanation of a weird pattern that I see: I see that we send a packet, and the router replies with a BCAST DATA packet. I bet that this packet is an ARP request, but I can't know with the current data you provided. Thanks! Sorry for the long delay. Took a long vacation around the 4th ('Murica, and so on). I apologize for the large size of this capture, I had a hard time getting the problem to reproduce. I still have not identified a sure-fire way to trigger the issue. Because it took so long to get a capture, I have uploaded two tarballs. One tarball (trace-cmd-6) contains the trace-cmd along with the pcaps, wpa_supplicant, and dmesg. The other tarball (pcap-only-6) contains the pcaps, wpa_supplicant, and dmesg but doesn't contain the trace-cmd due to the size. The timeline for this capture involves a good 30 minutes or so of nothing interesting (refreshing this bug report and netcat'ing /dev/urandom from another computer on the network). Then there were two events worth looking at. For each event I stopped all other (attempted) network traffic and just pinged the local gateway (10.0.0.1). Each event has it's own pcap file and there are approximate start/end timestamps in the tarballs. 422 MB: trace-cmd-6: https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-6.tar.gz.gpg 41 KB: pcap-only-6: https://s3.amazonaws.com/maberry-temporary/2017-06-04/pcap-only-6.tar.gz.gpg I'll analyse the data. I have a strong feeling that the issue is related to 11n, can you try to check what happens if you load iwlwifi with 11n_disable=1 ? Thanks. It's hard to prove a negative, but I've set `11n_disable=1`. If I don't see the problem by tomorrow night I'd say that's a significant sample size. I should note that the AP was configured for 802.11ac-only and 80 MHz bandwidth, so I had to change it to Mixed mode operation and 40 MHz bandwidth for this test. Ok. The next step will be to disable ampdu only. I hope I'll find the time to look at your logs today or tomorrow. Since disabling 802.11n yesterday I have not experienced the issue. Of course the speed is noticeably slower, but it consistently works. awesome. I haven't found the time to look at your logs yet. It will happen soon I hope. Now we can start to disable less and less features to see what is going wrong. Can you please run with 11n_disable=2 ? This will disable Tx A-MPDU, but the Rx speed will get back to what it used to be. thanks. I've adjusted 11n_disable=2 and for the sake of changing just one thing at a time left the AP at Mixed Mode and 40 MHz bandwidth. I'll report back tomorrow if I don't see any recurrences or sooner if I do. I've not noticed the problem re-occur while 11n_disable=2 All right, can you please put your AP in the same mode as before? I looked at your pcap files and indeed, we are stuck with ARP requests. Digging in the trace now... I spent quality time on your data and I am starting to fear a nasty bug on the AP side. Can you please test this patch? diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs.c b/drivers/net/wireless/intel/iwlwifi/mvm/rs.c index cb922c59eb59..23751a7e02a2 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rs.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs.c @@ -628,7 +628,7 @@ static int rs_tl_turn_on_agg_for_tid(struct iwl_mvm *mvm, IWL_DEBUG_HT(mvm, "Starting Tx agg: STA: %pM tid: %d\n", sta->addr, tid); - ret = ieee80211_start_tx_ba_session(sta, tid, 5000); + ret = ieee80211_start_tx_ba_session(sta, tid, 0); if (ret == -EAGAIN) { /* * driver and mac80211 is out of sync Thanks. Of course, when you test the patch above, I'll need you to remove the 11n_disable module parameter (or set it to 0 which is the default). thanks. BTW - Linksys's website seem to talk about firmware updates that improve wireless stability. Can you check what firmware you have on the AP? http://cache-www.belkin.com/support/dl/EA8500_1.1.4.169978_Customer_Release_Notes.txt Looks like my firmware is ahead of everything on that page. Firmware Version: 1.1.4.173728 Firmware Builddate: 2016-06-21 13:55 Product.type: production Linux: Linux version 3.4.103 (root@build-vm) (gcc version 4.6.3 20120201 (prerelease) (Linaro GCC 4.6-2012.02) ) #1 SMP PREEMPT Tue Jun 21 05:36:27 PDT 2016 Board: wraith I'm going to start with the AP settings and let that go for a bit. After I'm convinced the problem is not recurring I'll try your patch. Any chance to have (hopefully good) news before the week end? :) Apologies for the long delay. Had a rather busy week. An added complication is that one of the recent commits to release/LinuxCore30 horribly breaks my system. I ended up rolling back to the commit used for #196195, which is 8f1a7f43d199ef47f466040c3d0160d1e610979f. Also needed to account for an interface change in the kernel (https://github.com/torvalds/linux/commit/cf124db566e6b036b8bcbe8decbed740bdfac8c6). AP is setup back in the baseline configuration (802.11ac, 80MHz bw). 11n_disable back in the default configuration (0) I'll try this out for 24 hours and see if anything happens (hopefully not). Here's my current diff over 8f1a7f43 ``` diff --git a/drivers/net/wireless/intel/iwlwifi/fw/file.h b/drivers/net/wireless/intel/iwlwifi/fw/file.h index 01817510..cbdc127f 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/file.h +++ b/drivers/net/wireless/intel/iwlwifi/fw/file.h @@ -341,6 +341,7 @@ typedef unsigned int __bitwise iwl_ucode_tlv_capa_t; * @IWL_UCODE_TLV_CAPA_TX_POWER_ACK: reduced TX power API has larger * command size (command version 4) that supports toggling ACK TX * power reduction. + * @IWL_UCODE_TLV_CAPA_MLME_OFFLOAD: supports MLME offload * * @NUM_IWL_UCODE_TLV_CAPA: number of bits used */ @@ -391,6 +392,7 @@ enum iwl_ucode_tlv_capa { IWL_UCODE_TLV_CAPA_EXTEND_SHARED_MEM_CFG = (__force iwl_ucode_tlv_capa_t)80, IWL_UCODE_TLV_CAPA_LQM_SUPPORT = (__force iwl_ucode_tlv_capa_t)81, IWL_UCODE_TLV_CAPA_TX_POWER_ACK = (__force iwl_ucode_tlv_capa_t)84, + IWL_UCODE_TLV_CAPA_MLME_OFFLOAD = (__force iwl_ucode_tlv_capa_t)96, NUM_IWL_UCODE_TLV_CAPA #ifdef __CHECKER__ diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs.c b/drivers/net/wireless/intel/iwlwifi/mvm/rs.c index d5f2c83f..ba46556a 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rs.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs.c @@ -622,7 +622,7 @@ static int rs_tl_turn_on_agg_for_tid(struct iwl_mvm *mvm, IWL_DEBUG_HT(mvm, "Starting Tx agg: STA: %pM tid: %d\n", sta->addr, tid); - ret = ieee80211_start_tx_ba_session(sta, tid, 5000); + ret = ieee80211_start_tx_ba_session(sta, tid, 0); if (ret == -EAGAIN) { /* * driver and mac80211 is out of sync diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index ec282eaf..277073ac 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -1272,7 +1272,8 @@ static void ieee80211_if_setup(struct net_device *dev) ether_setup(dev); dev->priv_flags &= ~IFF_TX_SKB_SHARING; dev->netdev_ops = &ieee80211_dataif_ops; - dev->destructor = ieee80211_if_free; + dev->needs_free_netdev = true; + dev->priv_destructor = ieee80211_if_free; } static void ieee80211_if_setup_no_queue(struct net_device *dev) ``` Well that didn't take long. The problem occurred so quickly I didn't have a chance to start a trace, so trace-cmd-7 starts in the middle of an event and spans the time until recovery. As I went to go upload trace-cmd-7 another event started so again trace-cmd-8 missed the start of the event but recorded recovery. I tried to get a trace that included the beginning of an event, but after the first two occurrences it has not recurred again. I'll keep trying unless you indicate you don't need it. https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-7.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-8.tar.gz.gpg Regarding the issues you had with compilation, we are testing the merge back from upstream. When that will be fine, we'll have the destructor change supported in our tree. The fact that the traffic stall is still happening seems to mean that my guess was wrong. At this point, I can't really do much without an air sniffer. Please check https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#air_sniffing to see how to use a separate system as an air sniffer. Please make sure that you properly enter the channel definition. Thanks. I'm actively recording an air capture. The recording device is an Intel 7265. Although the Intel site claims that chip is 802.11ac, it is refusing to associate on an 80 MHz channel width (AssocResp status is 104, which isn't even in the spec). So I have the AP setup for Mixed mode, 40 MHz bandwidth. Hopefully what you need will appear in the capture as I don't have any other 802.11ac devices around. After an hour of browsing the problem has not occurred, which is definitely on the higher end of the frequency distribution for occurrences. We have seen the issue occur on 40 Mhz channel width, so it's possible I've just been very unlucky today. You mentioned the issue was observed during ARP requests? I tried to instigate the issue by periodically flushing the ARP cache on the client side to no avail. I don't think I can coerce the AP to flush its cache. Additionally, the pcap size is only 24 bytes, so I don't think it actually picked anything up. I verified the recording device reports the same channel, frequency, channel width, and center frequency on both devices. ``` # Recording device: $ iw dev wlo1 info Interface wlo1 ifindex 3 wdev 0x1 addr 64:80:99:XX:XX:XX type monitor wiphy 0 channel 40 (5200 MHz), width: 40 MHz, center1: 5190 MHz txpower 22.00 dBm # Problem device: $ iw dev wlp4s0 info | grep channel channel 40 (5200 MHz), width: 40 MHz, center1: 5190 MHz ``` The command used to record was: ``` sudo tcpdump -i wlo1 -w capture.pcap -C 2 2>&1 | tee capture.log ``` Why did you add -C2 to tcpdump? The man pages day it limits the size of the capture. I interpreted it as creating a series of size-limited output files. I tend to do this when I don't know in advance the size of the capture.
> -C file_size
Before writing a raw packet to a savefile, check whether the file is currently larger than file_size and, if so, close the current savefile and open a new one.
Created attachment 257685 [details]
fix bar
Hi,
people here identified a very rare case that can cause issues.
I doubt it is what you are suffering from since I didn't see any BAR in your captures. Can you please nevertheless give it a try?
PS: what happens when you remove the -C2 from tcpdump?
thanks.
Created attachment 257687 [details]
fix bar
fix the wrap around in the patch
Note that this patch is absolutely not tested...
Created attachment 257689 [details]
fix bar
yet another (last?) version.
Kinda late here.
Nope - this patch is absolute irrelevant. Sorry. Any news w/o the -C2 parameter to tcpdump? 2bytes is a bit short... I'm going to give it another shot right now. The -C2 was irrelevant. I don't know the root cause of why that capture didn't log anything in that one instance, but I suspect I misconfigured the frequency or channel width or something. After I had a known working capture setup I ran a simple experiment of refreshing the browser once while capturing. Both with and without -C2 captured ~4k packets each. So good news, I managed to capture an event in a reasonable amount of time. trace-cmd-9 is a mere 19 MB. Much better than that 400MB behemoth from a few weeks ago. It covers the entire span of an event. I began pinging the local gateway and taking firmware dumps approx. every 10s until 20s or so after the event self-resolved. As a reminder, I'm still on the patched backport branch which includes the patch from Comment 44. The firmware is the uSniffer-27 firmware. The AP settings are Mixed Mode, 40 MHz channel width. https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-9.tar.gz.gpg An additional point of interest: the computer recording the traffic suffered a microcode software error during the event. The dmesg of the recording device is included as capture.dmesg in the tarball for reference. Anecdotally, back in February before you picked up the bug report I attempted the air sniff procedure on my own and noticed that when the event occurred on the problem device, the entire OS on the recording device froze. It was the same recording device then as now, but it was not the same software stack. The capture device crashed because I forgot to mention this to you: https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi#about_the_monitorsniffer_mode looking at the data now. I hope that the capture has what you need then even without the amsdu_size argument. If not I can try again tomorrow. On a quasi-related note, the Intel site claims that the 7265 is 802.11ac but looking at the output of `iw phy` on the recording device I used there is no reference to `VHT` or 80 MHz. Is a firmware update needed to enable these features? I will check what I see in the capture for now. It'll take me a while. Your 7265 SKU doesn't support 11ac, but only 11n as mentioned in the dmesg. I have spent again a few hours on this trying to sync the tracing output and the air sniffer capture but, the first ping I could find in the trace is this one: ping-3887 [003] 6203.076880: iwlwifi_dev_tx: TX_CMD (0x1c) / seq 0xac1 (188 bytes) From tracing, I can see its sequence number, but this number doesn't appear in the air sniffer capture. One of the last packet I can see in the air sniffer is a deauth which means you disconnected. It feels like you first recorded an air sniffer capture, stopped it, and then ran the ping. I scrutinized the air sniffer capture, but couldn't find anything that could hint at which point you are having your traffic stall. In the firmware dumps, I can see periods of ~10 seconds without traffic, but when I correlated this with the tracing data, it seems like the driver isn't sending any data to the firmware. All that happened before the pings started running which make it hard for me to know if I am look at the actual problem, or we just don't have data to send. Can you please always leave a ping running in the background? Please use the -Q0x4f parameter. This will give a different priority to the ping packet which will make it easier for me to identify in the encrypted sniffer capture. I have the feeling that having the ping running in the background all the time will make the problem disappear but we'll see. I am sorry to make you work so hard, but I have spent quite a bit of time on this, but since I reproduce the problem on my end, I don't really have much choice. One thing I'd like to try also is the following (examples taken from my system): cat /sys/kernel/debug/ieee80211/phy0/netdev\:wlp2s0/stations/00\:90\:8f\:81\:3f\:3c/agg_status next dialog_token: 0x1 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 0 0x00 000 01 0 0x00 0x000 0 0x00 000 02 0 0x00 0x000 0 0x00 000 ... This means that Rx AMPDU is open on TID 0 (which is used for pretty much everything unless you use -Q parameter to ping e.g.). # echo tx start 0 > /sys/kernel/debug/ieee80211/phy0/netdev\:wlp2s0/stations/00\:90\:8f\:81\:3f\:3c/agg_status # cat /sys/kernel/debug/ieee80211/phy0/netdev\:wlp2s0/stations/00\:90\:8f\:81\:3f\:3c/agg_status next dialog_token: 0x3 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 1 0x02 000 Now I started A-MPDU on TX on TID 0. Now you can ping and see that everything goes fine. As long as you have pings flowing, we shouldn't stop the A-MPDU automatically. But you can do so manually: # echo tx stop 0 > /sys/kernel/debug/ieee80211/phy0/netdev\:wlp2s0/stations/00\:90\:8f\:81\:3f\:3c/agg_status # cat /sys/kernel/debug/ieee80211/phy0/netdev\:wlp2s0/stations/00\:90\:8f\:81\:3f\:3c/agg_status next dialog_token: 0x4 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 0 0x00 000 I suspect that your AP doesn't like us to stop A-MPDU while we have traffic flowing. Can you please check this sequence with normal pings (no -Q parameter) on your system? If we have a easy and short way to reproduce it will help greatly. Thanks! The sequence in which I ran the data collection steps was this: 1. Start air sniffer 2. Start trace-cmd 3. Wait for problem to begin 4. Begin pings to confirm it is not just slow DNS query 5. Dump firmware approximately every 10 seconds 6. Recovery happens 7. Take 1-2 more firmware dumps for good measure 6. Stop trace-cmd 7. Stop air-sniffer I had previously observed that having constant pings in the background greatly lowers the frequency of the issue occurring, but I couldn't confirm it with certainty so I didn't mention it in case it was a red-herring. I agree that a deterministic set of steps to reproduce would be amazingly helpful, as this normally takes a good chunk of my night to successfully record an event. Prior to starting this experiment my network traffic was pretty much just idle. I was not recording for this experiment, but it seems to have worked so I will repeat it with trace-cmd and an air sniffer soon. ``` $ sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x3 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 0 0x00 000 01 0 0x00 0x000 0 0x00 000 02 0 0x00 0x000 0 0x00 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` I began pinging with the command: `ping 10.0.0.1 -Q0x4f` ``` sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x3 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 0 0x00 000 01 1 0x01 0x007 0 0x00 000 02 0 0x00 0x000 0 0x00 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` I began aggregation on tx: ``` $ echo tx start 0 | sudo tee /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status tx start 0 $ sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x4 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 0 0x00 000 01 1 0x01 0x007 0 0x00 000 02 0 0x00 0x000 0 0x00 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` No change, so I tried it again: ``` $ echo tx start 0 | sudo tee /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status tx start 0 tee: '/sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status': Resource temporarily unavailable ``` Now it seems to have worked: ``` $ sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x6 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 1 0x04 000 01 1 0x01 0x007 0 0x00 000 02 0 0x00 0x000 1 0x05 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` So I stopped it: ``` $ echo tx stop 0 | sudo tee /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status tx stop 0 $ sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x7 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 1 0x06 000 01 1 0x01 0x007 0 0x00 000 02 0 0x00 0x000 1 0x05 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` Second time worked. (In retrospect, I think it takes a few seconds to take effect and I was being impatient). ``` $ echo tx stop 0 | sudo tee /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status tx stop 0 $ sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x7 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 0 0x00 000 01 1 0x01 0x007 0 0x00 000 02 0 0x00 0x000 1 0x05 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` I noticed this 'blip' in my pings: ``` 64 bytes from 10.0.0.1: icmp_seq=264 ttl=64 time=5.13 ms 64 bytes from 10.0.0.1: icmp_seq=265 ttl=64 time=3.57 ms 64 bytes from 10.0.0.1: icmp_seq=266 ttl=64 time=4.89 ms 64 bytes from 10.0.0.1: icmp_seq=267 ttl=64 time=4.61 ms <<<< 64 bytes from 10.0.0.1: icmp_seq=271 ttl=64 time=1018 ms <<<< 64 bytes from 10.0.0.1: icmp_seq=272 ttl=64 time=5.62 ms 64 bytes from 10.0.0.1: icmp_seq=273 ttl=64 time=3.27 ms ``` And the aggregation turned back on automatically after that: ``` $ sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x8 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 1 0x07 000 01 1 0x01 0x007 0 0x00 000 02 0 0x00 0x000 1 0x05 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` So I stopped it again: ``` $ echo tx stop 0 | sudo tee /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status tx stop 0 $ sudo cat /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status next dialog_token: 0x8 TID RX DTKN SSN TX DTKN pending 00 1 0x01 0x000 0 0x00 000 01 1 0x01 0x007 0 0x00 000 02 0 0x00 0x000 1 0x05 000 03 0 0x00 0x000 0 0x00 000 04 0 0x00 0x000 0 0x00 000 05 0 0x00 0x000 0 0x00 000 06 1 0x01 0x002 0 0x00 000 07 0 0x00 0x000 0 0x00 000 08 0 0x00 0x000 0 0x00 000 09 0 0x00 0x000 0 0x00 000 10 0 0x00 0x000 0 0x00 000 11 0 0x00 0x000 0 0x00 000 12 0 0x00 0x000 0 0x00 000 13 0 0x00 0x000 0 0x00 000 14 0 0x00 0x000 0 0x00 000 15 0 0x00 0x000 0 0x00 000 ``` And a few seconds later there was another aberration in the pings: ``` 64 bytes from 10.0.0.1: icmp_seq=387 ttl=64 time=5.08 ms 64 bytes from 10.0.0.1: icmp_seq=388 ttl=64 time=1032 ms 64 bytes from 10.0.0.1: icmp_seq=389 ttl=64 time=27.6 ms 64 bytes from 10.0.0.1: icmp_seq=390 ttl=64 time=3.77 ms 64 bytes from 10.0.0.1: icmp_seq=391 ttl=64 time=3.81 ms From 10.0.0.12 icmp_seq=398 Destination Host Unreachable From 10.0.0.12 icmp_seq=399 Destination Host Unreachable From 10.0.0.12 icmp_seq=400 Destination Host Unreachable 64 bytes from 10.0.0.1: icmp_seq=401 ttl=64 time=2031 ms 64 bytes from 10.0.0.1: icmp_seq=402 ttl=64 time=1018 ms 64 bytes from 10.0.0.1: icmp_seq=403 ttl=64 time=5.05 ms 64 bytes from 10.0.0.1: icmp_seq=404 ttl=64 time=4.80 ms ``` The last two nights have not played out as planned. I will get a recording for you ASAP, but I don't know exactly when that will be. What do you mean? What happened? I am sorry I haven't got back to you yet. Was planning to do so today I had promised you in my last update that I'd get a recording of the behavior seen when enabling / disabling the transmit aggregation. My plan was to get that recording last night but this is actually the first I've gotten any free moments since then. That's fine. No worries. I'll send exact instructions on how to do so. My plan is to get a minimal are sniffer capture of that for 2 reasons: 1) to share with our system guys so that we can try to rely less on stopping ampdu. 2) to check that the air sniffer capture is really a smoking gun before sending this to the AP manufacturer But I can't write the instructions on my phone on my way to work;) Getting back to you on this. As I said in my previous comment, I'd like to get a nice sniffer capture, now that we know how to reproduce the problem at will. First, I think that capture this sniffer capture without security will make it much easier to understand, but we are talking about ping traffic only. This will allow to see the ping request / reply and it is much easier to see (and proof) what packet should get a reply. Second, note that when you -Q parameter to ping, it changes the TID (traffic ID which corresponds to TOS) used for the ping. When you do "echo tx start 0", you open A-MPDU on TID 0. So we need to stop using -Q parameter on ping for this test. I asked for it because it makes it easier to identify the ping packet when I read an encrypted sniffer capture. But if the sniffer capture is clear (no security), then I can see the payload of all the packets, and I don't need the -Q parameter. Please close all the other networking related apps. Here are the steps. Check that A-MPDU is not on for TID 0. If it is, you can simply do: echo tx stop 0 | sudo tee /sys/kernel/debug/ieee80211/phy0/netdev:wlp4s0/stations/14:91:82:72:7c:1f/agg_status Then, 1) start the air sniffer. 2) ping <IP of the AP> 3) wait 5 seconds and leave the ping command running 3) echo tx start 0 | sudo tee /sys/... 4) wait 5 seconds 5) echo tx stop 0 | sudo tee /sys/... 6) wait 5 seconds <== here we expect a hiccup in the pings 7) stop the sniffer I'll need the air sniffer capture .pcap file and the output of the ping commend. If you want to delete certain packets from the sniffer capture because it was captured without encryption, please feel free to do so, but let me know that you did so. I'll need the ping req / reply and all the wifi management packets. Thanks! I'll come up with a patch that completely disables usage of stopping A-MPDU in our driver, but I won't be able to merge this in our code base. I need to take this issue with our system guys. We also need to involve your AP manufacturer, but let's look at the sniffer capture first to make sure that the AP is in fault here. I have three captures for you. The first two, trace-cmd-10 and trace-cmd-11, did not exhibit the problem. I did my best to keep those traces clean of all traffic except pings. For trace-cmd-10 I followed your procedure verbatim. For trace-cmd-11 I repeated the cycle of turning aggregation on and off trying to coerce the issue to occur. For trace-cmd-12 I added in some minimal web traffic. I refreshed this bug report and a few pages on wireless.wiki.kernel.org. That seemed to have dome something important. About 30 seconds after the last round of refreshing I disabled aggregation and the problem occurred. For trace-cmd-11 and trace-cmd-12, since I was not following the procedure verbatim, there is a file called 'timeline' that recorded each agg_status call I made along with the timestamp. AP Configuration: I re-configured the AP's 5GHz network to have the SSID "WIFILNX-676" and no security. It remains at 40MHz bandwidth and in mixed mode. Recording device configuration: I setup the recording device with the amsdu_size=3 argument. Problematic device configuration: Same state as in Comment 62. https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-10.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-11.tar.gz.gpg https://s3.amazonaws.com/maberry-temporary/2017-06-04/trace-cmd-12.tar.gz.gpg Hey, I was about to update this bug as well :) We had a meeting with the system guys and we agreed to move away from closing the A-MPDU session. We will proceed to implementation this week. So I hope that I'll get back to you soon with a patch to test. I'll look at your captures tomorrow or Monday. Thanks! I have new findings based on the sniffer capture. Clearly the AP is not answering our ARP requests anymore. At some point, we give up. It is very late here, but I'll write a detailed description later. Again quality time with Wireshark on your data :) This is smoking gun... You are welcome to send the description below to the AP manufacturer: * The BA agreement is opened on packet #12589 / #12617 * The BA agreement is torn down on packet #16113 The last DATA packet that was sent before the delBA in #16113 was packet #16090 and its sequence number was 2151. It was properly ACK'ed by the AP. Then the Intel device is sending packet packet with sequence 2152 at #16131, which is ACK'ed, the AP doesn't reply with the ping response. This behavior continues until we open A-MPDU again in #17083 / #17087. Then, the Intel device sends an ARP request (#17107) which is properly propagated to the BSS since it was transmitted to the BCAST address (#17109). Finally, the AP starts to react to Intel device's packet again by replying to the ARP at packet #17114. From this point on, the pings are working again. This is clearly a bad behavior from the AP side. As I said, Intel will change its behavior to avoid sending delBAs and that should stop confusing the AP, but the AP manufacturer should fix the bug. I am leaving this bug open until we send the patch to change our behavior. Phew... this was a long story... I have updated our internal ticket to remove the initiator delba frames that we send. At this point, we have no more use of all the awesome data you have provided. As far as Intel is concerned, you can remove them from the website. I strongly encourage you to send this data to the AP manufacturer along with my analysis. Feel free to let us know what they reply. This is wonderful news, both that you were able to positively root-cause the issue and that there is a path to work around the issue while Linksys/Belkin resolves it on their end. I've begun the process of informing them of our findings. The case number on their side is 08501094. I've also fully upgraded the router's firmware to revision 1.1.5.178908 and can confirm the issue is still present and reproducible via the same steps listed above. Yeah. I am sorry it took 6 months from the date you opened the ticket, but this was definitely a tough one especially because these flows are really not the ones we would suspect for such an issue. This kind of bug is always hard to narrow down: ping loss, etc... A WARNING, or a firmware assert are of course easier to fix... Without your patience and the time you spent of gathering high quality data, we could make it... Created attachment 257951 [details]
do not send delBA
Hi,
as promised, here is the patch that changes our behavior to not send delBAs which is what confuses your AP.
Please let me know how it goes.
Thanks.
Hi, the patch has been merged in the master branch of our internal tree: https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/ This is a backport based driver. The commit of the patch is afd2863606420734857c48a13253e238495e7dda Please let us know if the bug is now fixed for you. Thank you. Any news? The patch is on his way to 4.14. Last weekend was eclipse weekend. Went on a roadtrip and so I couldn't test it. I may test this weekend, depends when Linksys gets back to me and if they want more tests. Great. Let me know when we can close this issue. I am now considering this as fixed. I am closing the bug. I'd love to hear from you if you want to send any update. I'll still be notified about any comment you may add. Thank you! Just wanted to drop by to let you know I've reached level 3 Linksys support and am in contact with an engineer who is looking at the last capture. Wow... this is fascinating really :) Keep me posted :) He may have questions about the capture, feel free to route his questions to me. Our W/A has been merged and sent upstream though. So, what did they say? I am asking before I'd love to know if they fix the bug or if they claim that the bug in on Intel side. Thanks. Well the good news is that they did not attempt to blame the bug on Intel. The bad news is that their solution to the bug probably won't scale very well. About 2 weeks ago they sent me a replacement router and the replacement (which is not the same model) doesn't have the same behavior. So for anyone else reading this who has the Linksys EA8500 and Emmanuel's patch is not helping, my recommendation is to contact Linksys support and link them to this bug report. It took me 2.5 months to get in touch with an engineer, but they initially seemed very interested in solving the problem. Ok. Thanks for letting us know. The most important here is that we could understand what went wrong and that you can now have a good connectivity with Intel's device. It was fun to debug this... and for the posterity, Intel Wifi on Linux stopped sending the legit frame that it used to send and that seemed to be confusing that AP. |
Created attachment 254927 [details] wpa.log with debugging on. Exhibits a 15+ period of no access after which one webpage loaded and all subsequent access was lost I've been struggling to get an Intel 8260 Wireless card to provide a reliable network connection on a new laptop. It will work fine for 5-10 minutes and then I will suddenly be unable to send/receive any network traffic. The card still believes it is associated with the AP, but no traffic is flowing. Sometimes the only way to get back online is to bring the card down with '/usr/lib/network/ network stop ProfileName' and then bringing it back up again. The timing is random and can be as low as a minute or as long as 15-20 minutes. This happens whether I am plugged in or on battery; using my distribution's iwlwifi or using the Core24 backports driver; using kernel 4.9.6 or 4.9.11. Symptoms seem to have some correlation with the Access Point being connected to. The problem is most frequent when connecting to a Linksys EA-8500 5 GHz N network with WPA2-PSK. The problem is completely absent when connecting to a Cisco E3000 5 GHz N Network also with WPA2-PSK. Other wireless devices (including but not limited to Apple and Android phones, Windows laptops, and other Linux laptops) have not experienced this issue but two laptops with the 8260 rev. 3a do. I've taken a couple of steps so far, including turning up the debug logging in wpa_supplicant, disabling power management, and disabling Bluetooth co-existance. None of those improved reliability. The wpa_supplicant logs did provide some additional information. Just before disconnects occur, the log is quiet and then one of two sets of messages appar. One set of messages includes a number of log lines appear that mention "wpa_bss_flush_by_age" [1]. The other appear to be a WPA key-rotation [2]. dmesg is normally quiet during disconnects. When I take the network down, when using the backports driver, I get a few lines in dmesg [3]. Symptom [1] sometimes appears without losing network access. Symptom [2] always causes a loss of network access. Sometimes, but not always, they appear in succession. I should note that I run a 2.4 GHz and a 5 GHz network from the AP, with similar but different names (WI of the FIger vs. WI of the 5ger) but the same passphrase. I'm connecting to the 5GHz band. The only reason I mention it is that this card purportedly supports dual-band, but I have not looked into whether it is usable / automatic on Linux yet. It is also worth noting (but probably should be a separate bug report) that iw_get_basic_config returns non-zero for the the wlp4s0 interface after installing the backports driver. This breaks tools such as iwconfig and Xmobar's Wireless plugin. [1] Symptom "wpa_bss_flush_by_age" (preceded by at least 20 seconds of silence) 1486886349.661134: wlp4s0: BSS: Remove id 1 BSSID 14:91:82:72:7c:1e SSID 'WI of the FIger' due to wpa_bss_flush_by_age 1486886349.661174: wlp4s0: BSS: Remove id 2 BSSID 0c:51:01:e7:3c:94 SSID 'BenderNet' due to wpa_bss_flush_by_age 1486886349.661189: wlp4s0: BSS: Remove id 3 BSSID 18:a6:f7:82:1f:67 SSID 'Trash Fest 2015' due to wpa_bss_flush_by_age 1486886349.661201: wlp4s0: BSS: Remove id 4 BSSID 18:a6:f7:82:1f:66 SSID 'Trash Fest 2016' due to wpa_bss_flush_by_age 1486886349.661213: wlp4s0: BSS: Remove id 5 BSSID 0c:51:01:e7:3c:95 SSID 'BenderNet' due to wpa_bss_flush_by_age 1486886349.661225: wlp4s0: BSS: Remove id 6 BSSID 74:d0:2b:89:e7:b0 SSID 'Fantasia' due to wpa_bss_flush_by_age 1486886349.661237: wlp4s0: BSS: Remove id 7 BSSID f4:f2:6d:70:09:98 SSID 'TP-LINK_0998_5G' due to wpa_bss_flush_by_age 1486886349.661248: wlp4s0: BSS: Remove id 8 BSSID 14:22:db:08:51:c4 SSID '928d' due to wpa_bss_flush_by_age 1486886349.661260: wlp4s0: BSS: Remove id 9 BSSID 74:44:01:3e:32:c3 SSID 'NETGEAR70-5G' due to wpa_bss_flush_by_age 1486886349.661273: wlp4s0: BSS: Remove id 10 BSSID 58:8b:f3:c9:7d:95 SSID 'CenturyLink1775' due to wpa_bss_flush_by_age 1486886349.661285: wlp4s0: BSS: Remove id 11 BSSID 02:1d:d6:32:12:80 SSID '' due to wpa_bss_flush_by_age 1486886349.661309: wlp4s0: BSS: Remove id 12 BSSID 14:22:db:08:51:c6 SSID 'pimpmetro' due to wpa_bss_flush_by_age 1486886349.661328: wlp4s0: BSS: Remove id 13 BSSID ce:35:40:ca:4c:c8 SSID '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' due to wpa_bss_flush_by_age 1486886349.661340: wlp4s0: BSS: Remove id 14 BSSID f4:f2:6d:2c:d8:59 SSID 'Milo_2.4ghz' due to wpa_bss_flush_by_age 1486886349.661351: wlp4s0: BSS: Remove id 15 BSSID 14:22:db:08:51:cc SSID 'pimpmetro Guest' due to wpa_bss_flush_by_age 1486886349.661363: wlp4s0: BSS: Remove id 16 BSSID 74:44:01:3e:32:c4 SSID 'NETGEAR70' due to wpa_bss_flush_by_age 1486886349.661374: wlp4s0: BSS: Remove id 17 BSSID 0c:54:a5:74:d3:79 SSID '' due to wpa_bss_flush_by_age 1486886349.661386: wlp4s0: BSS: Remove id 18 BSSID 84:00:2d:75:a0:88 SSID 'KobeonRetirement' due to wpa_bss_flush_by_age 1486886349.661399: wlp4s0: BSS: Remove id 19 BSSID 04:bf:6d:30:c3:04 SSID 'CenturyLink3662' due to wpa_bss_flush_by_age 1486886349.661411: wlp4s0: BSS: Remove id 20 BSSID 00:25:9c:13:1a:90 SSID 'pimpcity-2' due to wpa_bss_flush_by_age 1486886349.661422: wlp4s0: BSS: Remove id 21 BSSID cc:03:fa:64:68:54 SSID 'HOME-6854' due to wpa_bss_flush_by_age 1486886349.661434: wlp4s0: BSS: Remove id 22 BSSID 2c:30:33:b2:30:03 SSID 'secret sauce' due to wpa_bss_flush_by_age 1486886349.661445: wlp4s0: BSS: Remove id 23 BSSID 24:a0:74:75:e9:bc SSID 'Booster II' due to wpa_bss_flush_by_age 1486886349.661457: wlp4s0: BSS: Remove id 24 BSSID 00:40:10:10:00:03 SSID 'TheMan' due to wpa_bss_flush_by_age 1486886349.661468: wlp4s0: BSS: Remove id 25 BSSID fa:8f:ca:73:31:d0 SSID '' due to wpa_bss_flush_by_age 1486886349.661479: wlp4s0: BSS: Remove id 26 BSSID 14:22:db:08:51:c8 SSID '' due to wpa_bss_flush_by_age 1486886349.661490: wlp4s0: BSS: Remove id 27 BSSID 0c:54:a5:74:d3:7a SSID 'xfinitywifi' due to wpa_bss_flush_by_age 1486886349.661502: wlp4s0: BSS: Remove id 28 BSSID 06:1d:d6:32:12:80 SSID 'xfinitywifi' due to wpa_bss_flush_by_age 1486886349.661514: wlp4s0: BSS: Remove id 29 BSSID 02:25:9c:13:1a:90 SSID 'pimpcity-2-guest' due to wpa_bss_flush_by_age 1486886349.661525: wlp4s0: BSS: Remove id 30 BSSID e6:3e:fc:8d:7c:f0 SSID 'xfinitywifi' due to wpa_bss_flush_by_age 1486886349.661536: wlp4s0: BSS: Remove id 31 BSSID ce:03:fa:64:68:56 SSID 'xfinitywifi' due to wpa_bss_flush_by_age 1486886349.661548: wlp4s0: BSS: Remove id 32 BSSID 00:00:00:00:00:00 SSID '' due to wpa_bss_flush_by_age [2] Symptom "State: GROUP_HANDSHAKE -> COMPLETED" 1488145955.124692: l2_packet_receive: src=14:91:82:72:7c:1f len=131 1488145955.124756: wlp4s0: RX EAPOL from 14:91:82:72:7c:1f 1488145955.124778: RX EAPOL - hexdump(len=131): 02 03 00 7f 02 13 82 00 10 00 00 00 00 00 00 00 03 59 da 17 82 e0 98 02 37 d6 b3 41 83 81 4b 03 82 85 2a 18 af 77 fa 15 a4 f9 74 b4 b7 f6 bd 6a b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 96 2e 81 37 53 b7 3f 59 be 80 62 65 8e f6 9c e1 00 20 72 47 56 1f 30 a2 1a 5d 62 b9 bd cb 75 8d 08 a9 52 6c 3d 18 8f f3 6c af 0f 8e 69 6a 9e 12 be b8 1488145955.124865: wlp4s0: IEEE 802.1X RX: version=2 type=3 length= 127 1488145955.124874: WPA: RX EAPOL-Key - hexdump(len=131): 02 03 00 7f 02 13 82 00 10 00 00 00 00 00 00 00 03 59 da 17 82 e0 98 02 37 d6 b3 41 83 81 4b 03 82 85 2a 18 af 77 fa 15 a4 f9 74 b4 b7 f6 bd 6a b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 96 2e 81 37 53 b7 3f 59 be 80 62 65 8e f6 9c e1 00 20 72 47 56 1f 30 a2 1a 5d 62 b9 bd cb 75 8d 08 a9 52 6c 3d 18 8f f3 6c af 0f 8e 69 6a 9e 12 be b8 1488145955.124956: wlp4s0: EAPOL-Key type=2 1488145955.124979: wlp4s0: key_info 0x1382 (ver=2 keyidx=0 rsvd=0 Group Ack MIC Secure Encr) 1488145955.124992: wlp4s0: key_length=16 key_data_length=32 1488145955.125000: replay_counter - hexdump(len=8): 00 00 00 00 00 00 00 03 1488145955.125011: key_nonce - hexdump(len=32): 59 da 17 82 e0 98 02 37 d6 b3 41 83 81 4b 03 82 85 2a 18 af 77 fa 15 a4 f9 74 b4 b7 f6 bd 6a b4 1488145955.125033: key_iv - hexdump(len=16): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1488145955.125048: key_rsc - hexdump(len=8): 00 00 00 00 00 00 00 00 1488145955.125058: key_id (reserved) - hexdump(len=8): 00 00 00 00 00 00 00 00 1488145955.125068: key_mic - hexdump(len=16): 96 2e 81 37 53 b7 3f 59 be 80 62 65 8e f6 9c e1 1488145955.125121: RSN: encrypted key data - hexdump(len=32): 72 47 56 1f 30 a2 1a 5d 62 b9 bd cb 75 8d 08 a9 52 6c 3d 18 8f f3 6c af 0f 8e 69 6a 9e 12 be b8 1488145955.125198: WPA: decrypted EAPOL-Key key data - hexdump(len =24): [REMOVED] 1488145955.125218: wlp4s0: WPA: RX message 1 of Group Key Handshake from 14:91:82:72:7c:1f (ver=2) 1488145955.125227: RSN: msg 1/2 key data - hexdump(len=24): dd 16 00 0f ac 01 01 00 6e 3a 08 4e d2 9d c3 2d 39 85 58 2b 3e 95 8a 51 1488145955.125247: WPA: GTK in EAPOL-Key - hexdump(len=24): [REMOVED] 1488145955.125255: RSN: received GTK in group key handshake - hexdump(len=18): [REMOVED] 1488145955.125266: wlp4s0: State: COMPLETED -> GROUP_HANDSHAKE 1488145955.125279: WPA: Group Key - hexdump(len=16): [REMOVED] 1488145955.125290: wlp4s0: WPA: Installing GTK to the driver (keyidx=1 tx=0 len=16) 1488145955.125299: WPA: RSC - hexdump(len=6): 00 00 00 00 00 00 1488145955.125387: wpa_driver_nl80211_set_key: ifindex=2 (wlp4s0) alg=3 addr=0x5369b9 key_idx=1 set_tx=0 seq_len=6 key_len=16 1488145955.125421: nl80211: KEY_DATA - hexdump(len=16): [REMOVED] 1488145955.125429: nl80211: KEY_SEQ - hexdump(len=6): 00 00 00 00 00 00 1488145955.125438: broadcast key 1488145955.126098: wlp4s0: WPA: Sending EAPOL-Key 2/2 1488145955.126159: WPA: KCK - hexdump(len=16): [REMOVED] 1488145955.126170: WPA: Derived Key MIC - hexdump(len=16): 66 f0 e5 e6 fd 78 51 14 d6 c8 b1 67 9c 80 a1 62 1488145955.126185: WPA: TX EAPOL-Key - hexdump(len=99): 01 03 00 5f 02 03 02 00 00 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 66 f0 e5 e6 fd 78 51 14 d6 c8 b1 67 9c 80 a1 62 00 00 1488145955.126421: wlp4s0: WPA: Group rekeying completed with 14:91:82:72:7c:1f [GTK=CCMP] 1488145955.126437: CTRL-DEBUG: ctrl_sock-sendmsg: sock=13 sndbuf=212992 outq=0 send_len=63 1488145955.126490: CTRL_IFACE monitor sent successfully to /tmp/wpa_ctrl_381-1\x00 1488145955.126502: wlp4s0: Cancelling authentication timeout 1488145955.126516: wlp4s0: State: GROUP_HANDSHAKE -> COMPLETED 1488145955.126530: nl80211: Set rekey offload [3] Symptom "failed to remove key from hardware" when stopping wifi (the number of failed to remove key from hardware messages is variable and ranges from 1 - 2) [ 4217.679104] wlp4s0: Limiting TX power to 27 (30 - 3) dBm as advertised by 14:91:82:72:7c:1f [ 5257.962551] wlp4s0: deauthenticating from 14:91:82:72:7c:1f by local choice (Reason: 3=DEAUTH_LEAVING) [ 5257.983312] iwlwifi 0000:04:00.0: Failed to find station [ 5257.983322] wlp4s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22) [ 5257.983409] iwlwifi 0000:04:00.0: Failed to find station [ 5257.983414] wlp4s0: failed to remove key (2, ff:ff:ff:ff:ff:ff) from hardware (-22) System Information: - Kernel Linux slate 4.9.6-1-ARCH #1 SMP PREEMPT Thu Jan 26 09:22:26 CET 2017 x86_64 GNU/Linux Linux slate 4.9.11-1-ARCH #1 SMP PREEMPT Sun Feb 19 13:45:52 UTC 2017 x86_64 GNU/Linux - wpa_supplicant invocation (via netctl): wpa_supplicant -dd -f/tmp/wpa.log -t -B -P "$pidfile" -i "$interface" -D "$driver" configuration="$STATE_DIR/wpa_supplicant_$interface.conf - Hardware (on 4.9.6 kernel) *-network description: Wireless interface product: Wireless 8260 vendor: Intel Corporation physical id: 0 bus info: pci@0000:04:00.0 logical name: wlp4s0 version: 3a serial: 34:f3:9a:60:99:24 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless configuration: broadcast=yes driver=iwlwifi driverversion=4.9.6-1-ARCH firmware=22.391740.0 ip=10.0.0.188 latency=0 link=yes multicast=yes wireless=IEEE 802.11 resources: irq:127 memory:f1000000-f1001fff (on 4.9.11 kernel w/o backports) *-network description: Wireless interface product: Wireless 8260 vendor: Intel Corporation physical id: 0 bus info: pci@0000:04:00.0 logical name: wlp4s0 version: 3a serial: 34:f3:9a:60:99:24 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list ethernet physical wireless configuration: broadcast=yes driver=iwlwifi driverversion=4.9.11-1-ARCH firmware=22.391740.0 ip=10.0.0.188 latency=0 link=yes multicast=yes wireless=IEEE 802.11 resources: irq:127 memory:f1000000-f1001fff (on 4.9.11 kernel w/ backports) *-network description: Ethernet interface product: Wireless 8260 vendor: Intel Corporation physical id: 0 bus info: pci@0000:04:00.0 logical name: wlp4s0 version: 3a serial: 34:f3:9a:60:99:24 width: 64 bits clock: 33MHz capabilities: pm msi pciexpress bus_master cap_list ethernet physical configuration: broadcast=yes driver=iwlwifi driverversion=4.9.11-1-ARCH firmware=27.455470.0 ip=10.0.0.188 latency=0 link=yes multicast=yes resources: irq:127 memory:f1000000-f1001fff - /etc/modprobe.d/iwlwifi.conf options iwlwifi bt_coex_active=0 - /sys/module/iwlwifi/parameters/bt_coex_active N - /sys/module/iwlwifi/parameters/power_save N - /sys/module/iwlwifi/parameters/power_level 0 - /sys/module/iwlwifi/parameters/swcrypto 0 - iwconfig (pre-backports) wlp4s0 IEEE 802.11 ESSID:"WI of the 5ger" Mode:Managed Frequency:5.18 GHz Access Point: 14:91:82:72:7C:1F Bit Rate=400 Mb/s Tx-Power=22 dBm Retry short limit:7 RTS thr:off Fragment thr:off Power Management:off Link Quality=70/70 Signal level=-25 dBm Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0 Tx excessive retries:0 Invalid misc:27 Missed beacon:0 - iwconfig (post-backports) wlp4s0 no wireless extensions. - iw dev wlp4s0 link (post-backports) Connected to 14:91:82:72:7c:1f (on wlp4s0) SSID: WI of the 5ger freq: 5180 RX: 1864340 bytes (2820 packets) TX: 200166 bytes (1419 packets) signal: -28 dBm tx bitrate: 400.0 MBit/s VHT-MCS 9 40MHz short GI VHT-NSS 2 bss flags: short-slot-time dtim period: 1 beacon int: 100 - Firmware (post-backports) /usr/lib/firmware/iwlwifi-8000C-13.ucode /usr/lib/firmware/iwlwifi-8000C-14.ucode /usr/lib/firmware/iwlwifi-8000C-15.ucode /usr/lib/firmware/iwlwifi-8000C-16.ucode /usr/lib/firmware/iwlwifi-8000C-17.ucode /usr/lib/firmware/iwlwifi-8000C-18.ucode /usr/lib/firmware/iwlwifi-8000C-19.ucode /usr/lib/firmware/iwlwifi-8000C-20.ucode /usr/lib/firmware/iwlwifi-8000C-21.ucode /usr/lib/firmware/iwlwifi-8000C-22.ucode /usr/lib/firmware/iwlwifi-8000C-27.ucode - dmesg boot up (pre-backports) [ 8.413762] i801_smbus 0000:00:1f.4: SPD Write Disable is set [ 8.413799] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt [ 8.416027] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode [ 8.432892] iTCO_vendor_support: vendor-support=0 [ 8.434157] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11 [ 8.434290] iTCO_wdt: Found a Intel PCH TCO device (Version=4, TCOBASE=0x0400) [ 8.434437] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0) [ 8.439859] Intel(R) Wireless WiFi driver for Linux [ 8.439861] Copyright(c) 2003- 2015 Intel Corporation [ 8.442327] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-26.ucode failed with error -2 [ 8.442337] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-25.ucode failed with error -2 [ 8.442347] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-24.ucode failed with error -2 [ 8.442356] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-23.ucode failed with error -2 [ 8.448422] iwlwifi 0000:04:00.0: loaded firmware version 22.391740.0 op_mode iwlmvm [ 8.470688] iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208 [ 8.473081] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 8.474178] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 8.609794] ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs' [ 8.611275] iwlwifi 0000:04:00.0 wlp4s0: renamed from wlan0 - dmesg boot up (post-backports) [ 7.823904] i801_smbus 0000:00:1f.4: SPD Write Disable is set [ 7.823936] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt [ 7.826091] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode [ 7.827770] compat: loading out-of-tree module taints kernel. [ 7.827924] Loading modules backported from iwlwifi [ 7.827924] iwlwifi-stack-public:release/LinuxCore24:5784:f94e9dbf [ 7.839236] iTCO_vendor_support: vendor-support=0 [ 7.839718] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11 [ 7.839890] iTCO_wdt: Found a Intel PCH TCO device (Version=4, TCOBASE=0x0400) [ 7.840004] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0) [ 7.841075] Intel(R) Wireless WiFi driver for Linux [ 7.841075] Copyright(c) 2003- 2015 Intel Corporation [ 7.844367] iwlwifi 0000:04:00.0: Direct firmware load for iwl-dbg-cfg.ini failed with error -2 [ 7.849457] iwlwifi 0000:04:00.0: capa flags index 3 larger than supported by driver [ 7.849842] iwlwifi 0000:04:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm [ 7.865153] iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208 [ 7.866736] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 7.867688] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled [ 8.007505] ieee80211 phy0: Selected rate control algorithm 'iwl-mvm-rs' [ 8.008292] thermal thermal_zone3: failed to read out thermal zone (-5) [ 8.008400] Unable to register with Frequency Manager: -22 [ 8.010746] iwlwifi 0000:04:00.0 wlp4s0: renamed from wlan0