Bug 72601

Summary: iwlwifi 7260: beacon loss which leads to disconnections
Product: Drivers Reporter: Ralf (post+kernel)
Component: network-wireless Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: bernhard.cygan, bgamari, chapaswork, dave, florian, ignacio, ilw, linville, marc.collin, mike.cloaked, nmschulte, ronnieandrew92, scott.ashford
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.13 Subsystem:
Regression: No Bisected commit-id:
Attachments: syslog from one of the bursts of connection drops (this happened while I was not even at the machine)
Kernel log from the same instance of the issue
dmesg of disconnects with latest stable kernel and firmware
syslog from the same disconnects as the last dmesg
Kernel log from the issue happening without connection security
syslog from the issue happening without connection security
dmesg after disconnection problems with kernel 3.13.7 and firmware 8
syslog of disconnection problems with firmware 8 and kernel 3.13.7
add threshold to beacon loss
dmesg+syslog of the issue happening with linux 3.14 and power_scheme=1
not able to connect to 5ghz network with kernel 3.14
dmesg.log
disable beacon filtering
trace with 3.14 + patch during a burst
trace.dat with patch + latest supplicant

Description Ralf 2014-03-21 16:33:54 UTC
I am having trouble with the wireless connection of my laptop (using an Intel 7260). Every now and then (I have not found a way to reproduce it so far, but it usually happens 2 to 5 times a day), packets just stop flowing. Sometimes it helps to quit all applications using the network. Sometimes I have to disconnect and reconnect to the wireless network, or power the card off and back on. Alternatively, I can wait ~10 min for the issue to pass.

Checking with Wireshark, I can see tons and tons of EAPOL messages and (almost) nothing else. In the syslog, I can see the following every ~5 seconds, each time followed by a connection renegotiation:
  wlan0: CTRL-EVENT-DISCONNECTED bssid=00:09:5b:96:8a:bc reason=4
According to [0], reason 4 is "WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY", but the issue also happens while I am browsing or chatting.
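The `reason=4` field in these wpa_supplicant lines is an IEEE 802.11 reason code. A minimal Python sketch for decoding such lines when scanning a syslog, assuming the `CTRL-EVENT-DISCONNECTED bssid=... reason=N` format shown above; the table only covers reason codes 1 to 8 as named in `include/linux/ieee80211.h`, and `decode_disconnect` is a hypothetical helper, not part of wpa_supplicant:

```python
import re

# Subset of enum ieee80211_reasoncode from include/linux/ieee80211.h (codes 1-8 only).
REASON_CODES = {
    1: "WLAN_REASON_UNSPECIFIED",
    2: "WLAN_REASON_PREV_AUTH_NOT_VALID",
    3: "WLAN_REASON_DEAUTH_LEAVING",
    4: "WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY",
    5: "WLAN_REASON_DISASSOC_AP_BUSY",
    6: "WLAN_REASON_CLASS2_FRAME_FROM_NONAUTH_STA",
    7: "WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA",
    8: "WLAN_REASON_DISASSOC_STA_HAS_LEFT",
}

# Assumed line format: "... CTRL-EVENT-DISCONNECTED bssid=aa:bb:cc:dd:ee:ff reason=N ..."
DISCONNECT_RE = re.compile(
    r"CTRL-EVENT-DISCONNECTED bssid=(?P<bssid>[0-9a-fA-F:]{17}) reason=(?P<reason>\d+)"
)


def decode_disconnect(line):
    """Return (bssid, reason_code, reason_name) for a disconnect line, else None."""
    m = DISCONNECT_RE.search(line)
    if m is None:
        return None
    code = int(m.group("reason"))
    return m.group("bssid"), code, REASON_CODES.get(code, f"unknown ({code})")
```

Applied to the line quoted above, this yields `("00:09:5b:96:8a:bc", 4, "WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY")`.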

I tried switching the access point, to no avail (the issue is present with both the FritzBox 3270 and the fairly old NetGear WGR614 v2).
I also tried upgrading the firmware from version 22.15.8.0 (shipped with Debian) to 22.24.8.0, which did not help either.

I will attach the syslog and dmesg output from one of these "bursts of connection drops". Here's "lspci -v" about the card:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 73)
        Subsystem: Intel Corporation Dual Band Wireless-AC 7260
        Flags: bus master, fast devsel, latency 0, IRQ 50
        Memory at ddc00000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number fc-f8-ae-ff-ff-51-88-9e
        Capabilities: [14c] Latency Tolerance Reporting
        Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 <?>
        Kernel driver in use: iwlwifi

This is with the "3.13-1-amd64 #1 SMP Debian 3.13.5-1" kernel from current Debian testing amd64.

[0]: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/include/linux/ieee80211.h
Comment 1 Ralf 2014-03-21 16:36:27 UTC
Created attachment 130171 [details]
syslog from one of the bursts of connection drops (this happened while I was not even at the machine)
Comment 2 Ralf 2014-03-21 16:36:48 UTC
Created attachment 130181 [details]
Kernel log from the same instance of the issue
Comment 3 Emmanuel Grumbach 2014-03-22 18:52:53 UTC
Can you change the security parameters of your AP?
I'd try disabling security to see what happens, and then setting it up with WPA2 only (AES).

The only thing I am sure about is that this is not related to the Intel NIC.
Comment 4 Ralf 2014-03-22 19:52:20 UTC
The NetGear AP does not support WPA2, but I could try unencrypted mode of course.
The FritzBox runs with WPA2 only.

> The only thing I am sure about is that this is not related to the Intel NIC.
Well, I had an Atheros chip in the same laptop before, and no such problems with the FritzBox.
But then, with the AP at my other place (which I cannot test currently), the issue is at least much less annoying - I would have to check again to see what's in the logs.
Comment 5 Ignacio Huerta 2014-03-29 16:15:52 UTC
I am experiencing exactly the same issue with an Intel 7260 and a Netgear CG3100D-RG AP. I have tried:
- Different security settings (WEP, WPA-PSK, WPA2-PSK, disabled)
- Disabling wifi N both in the AP and in the module (11n_disable=1)
- Additional module settings (iwlmvm power_scheme=1 / iwlwifi bt_coex_active=N swcrypto=1)

But the problem still happens: a few times a day the connection drops and reconnects immediately, losing lots of packets during some minutes, making the connection unusable.

The card is very similar to the original reporter's, but a different revision:

03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 6b)
	Subsystem: Intel Corporation Wireless-N 7260
	Flags: bus master, fast devsel, latency 0, IRQ 46
	Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [c8] Power Management version 3
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [40] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 5c-51-4f-ff-ff-c9-60-49
	Capabilities: [14c] Latency Tolerance Reporting
	Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 <?>
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi

My kernel version is 3.13.1-031301-generic. Is there anything I can do to help debug this problem?
Comment 6 Ignacio Huerta 2014-03-29 16:29:57 UTC
Ok, I just realised that for kernel 3.13 there is a newer firmware available (iwlwifi-7260-8.ucode). The connection problems are still there, but now I'm getting "Microcode SW error detected" when they happen:


Mar 29 17:25:49 cherokee2 kernel: [  312.102815] iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.
Mar 29 17:25:49 cherokee2 kernel: [  312.102819] iwlwifi 0000:03:00.0: CSR values:
Mar 29 17:25:49 cherokee2 kernel: [  312.102821] iwlwifi 0000:03:00.0: (2nd byte of CSR_INT_COALESCING is CSR_INT_PERIODIC_REG)
Mar 29 17:25:49 cherokee2 kernel: [  312.102829] iwlwifi 0000:03:00.0:        CSR_HW_IF_CONFIG_REG: 0X00489204
Mar 29 17:25:49 cherokee2 kernel: [  312.102842] iwlwifi 0000:03:00.0:          CSR_INT_COALESCING: 0X8000ff40
Mar 29 17:25:49 cherokee2 kernel: [  312.102853] iwlwifi 0000:03:00.0:                     CSR_INT: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.102866] iwlwifi 0000:03:00.0:                CSR_INT_MASK: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.102880] iwlwifi 0000:03:00.0:           CSR_FH_INT_STATUS: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.102894] iwlwifi 0000:03:00.0:                 CSR_GPIO_IN: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.102908] iwlwifi 0000:03:00.0:                   CSR_RESET: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.102922] iwlwifi 0000:03:00.0:                CSR_GP_CNTRL: 0X080403c5
Mar 29 17:25:49 cherokee2 kernel: [  312.102936] iwlwifi 0000:03:00.0:                  CSR_HW_REV: 0X00000144
Mar 29 17:25:49 cherokee2 kernel: [  312.102950] iwlwifi 0000:03:00.0:              CSR_EEPROM_REG: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.102963] iwlwifi 0000:03:00.0:               CSR_EEPROM_GP: 0X80000000
Mar 29 17:25:49 cherokee2 kernel: [  312.102977] iwlwifi 0000:03:00.0:              CSR_OTP_GP_REG: 0X803a0000
Mar 29 17:25:49 cherokee2 kernel: [  312.102991] iwlwifi 0000:03:00.0:                 CSR_GIO_REG: 0X00080042
Mar 29 17:25:49 cherokee2 kernel: [  312.103005] iwlwifi 0000:03:00.0:            CSR_GP_UCODE_REG: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103019] iwlwifi 0000:03:00.0:           CSR_GP_DRIVER_REG: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103033] iwlwifi 0000:03:00.0:           CSR_UCODE_DRV_GP1: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103046] iwlwifi 0000:03:00.0:           CSR_UCODE_DRV_GP2: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103060] iwlwifi 0000:03:00.0:                 CSR_LED_REG: 0X00000060
Mar 29 17:25:49 cherokee2 kernel: [  312.103074] iwlwifi 0000:03:00.0:        CSR_DRAM_INT_TBL_REG: 0X88402d4c
Mar 29 17:25:49 cherokee2 kernel: [  312.103088] iwlwifi 0000:03:00.0:        CSR_GIO_CHICKEN_BITS: 0X27800200
Mar 29 17:25:49 cherokee2 kernel: [  312.103102] iwlwifi 0000:03:00.0:             CSR_ANA_PLL_CFG: 0Xd55555d5
Mar 29 17:25:49 cherokee2 kernel: [  312.103116] iwlwifi 0000:03:00.0:           CSR_HW_REV_WA_REG: 0X0001001a
Mar 29 17:25:49 cherokee2 kernel: [  312.103130] iwlwifi 0000:03:00.0:        CSR_DBG_HPET_MEM_REG: 0Xffff0000
Mar 29 17:25:49 cherokee2 kernel: [  312.103131] iwlwifi 0000:03:00.0: FH register values:
Mar 29 17:25:49 cherokee2 kernel: [  312.103153] iwlwifi 0000:03:00.0:         FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X35feb800
Mar 29 17:25:49 cherokee2 kernel: [  312.103167] iwlwifi 0000:03:00.0:        FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X03c06570
Mar 29 17:25:49 cherokee2 kernel: [  312.103181] iwlwifi 0000:03:00.0:                  FH_RSCSR_CHNL0_WPTR: 0X000000d0
Mar 29 17:25:49 cherokee2 kernel: [  312.103195] iwlwifi 0000:03:00.0:         FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80801114
Mar 29 17:25:49 cherokee2 kernel: [  312.103209] iwlwifi 0000:03:00.0:          FH_MEM_RSSR_SHARED_CTRL_REG: 0X000000fc
Mar 29 17:25:49 cherokee2 kernel: [  312.103222] iwlwifi 0000:03:00.0:            FH_MEM_RSSR_RX_STATUS_REG: 0X07030000
Mar 29 17:25:49 cherokee2 kernel: [  312.103236] iwlwifi 0000:03:00.0:    FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103250] iwlwifi 0000:03:00.0:                FH_TSSR_TX_STATUS_REG: 0X07fb0001
Mar 29 17:25:49 cherokee2 kernel: [  312.103264] iwlwifi 0000:03:00.0:                 FH_TSSR_TX_ERROR_REG: 0X00000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103280] iwlwifi 0000:03:00.0: FW error in SYNC CMD SCAN_REQUEST_CMD
Mar 29 17:25:49 cherokee2 kernel: [  312.103284] CPU: 0 PID: 312 Comm: kworker/u16:4 Tainted: G        W    3.13.1-031301-generic #201401291035
Mar 29 17:25:49 cherokee2 kernel: [  312.103286] Hardware name: LENOVO 20AN0069US/20AN0069US, BIOS GLET43WW (1.18 ) 12/04/2013
Mar 29 17:25:49 cherokee2 kernel: [  312.103301] Workqueue: phy7 ieee80211_scan_work [mac80211]
Mar 29 17:25:49 cherokee2 kernel: [  312.103303]  ffff88035d107218 ffff880401117b58 ffffffff8173356d 0000000000000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103306]  ffff88035d104000 ffff880401117be8 ffffffffa0417150 00000000000000a0
Mar 29 17:25:49 cherokee2 kernel: [  312.103308]  0000000000000019 0000001300000000 0000000000000019 0000000000000000
Mar 29 17:25:49 cherokee2 kernel: [  312.103311] Call Trace:
Mar 29 17:25:49 cherokee2 kernel: [  312.103316]  [<ffffffff8173356d>] dump_stack+0x46/0x58
Mar 29 17:25:49 cherokee2 kernel: [  312.103324]  [<ffffffffa0417150>] iwl_pcie_send_hcmd_sync+0x580/0x590 [iwlwifi]
Mar 29 17:25:49 cherokee2 kernel: [  312.103328]  [<ffffffff810ad7d0>] ? __wake_up_sync+0x20/0x20
Mar 29 17:25:49 cherokee2 kernel: [  312.103333]  [<ffffffffa041858a>] iwl_trans_pcie_send_hcmd+0x2a/0x80 [iwlwifi]
Mar 29 17:25:49 cherokee2 kernel: [  312.103339]  [<ffffffffa058fe46>] iwl_mvm_send_cmd_status+0x56/0x170 [iwlmvm]
Mar 29 17:25:49 cherokee2 kernel: [  312.103344]  [<ffffffffa059681a>] iwl_mvm_scan_request+0x36a/0x460 [iwlmvm]
Mar 29 17:25:49 cherokee2 kernel: [  312.103347]  [<ffffffffa058ae53>] iwl_mvm_mac_hw_scan+0x93/0xa0 [iwlmvm]
Mar 29 17:25:49 cherokee2 kernel: [  312.103355]  [<ffffffffa04c2d5d>] __ieee80211_scan_completed+0x1bd/0x330 [mac80211]
Mar 29 17:25:49 cherokee2 kernel: [  312.103358]  [<ffffffff8109fe2a>] ? arch_vtime_task_switch+0x8a/0x90
Mar 29 17:25:49 cherokee2 kernel: [  312.103364]  [<ffffffffa04c3ac2>] ieee80211_scan_work+0xe2/0x250 [mac80211]
Mar 29 17:25:49 cherokee2 kernel: [  312.103367]  [<ffffffff8108443f>] process_one_work+0x17f/0x4c0
Mar 29 17:25:49 cherokee2 kernel: [  312.103369]  [<ffffffff8108566b>] worker_thread+0x11b/0x3d0
Mar 29 17:25:49 cherokee2 kernel: [  312.103371]  [<ffffffff81085550>] ? manage_workers.isra.21+0x190/0x190
Mar 29 17:25:49 cherokee2 kernel: [  312.103374]  [<ffffffff8108c5c9>] kthread+0xc9/0xe0
Mar 29 17:25:49 cherokee2 kernel: [  312.103376]  [<ffffffff8108c500>] ? flush_kthread_worker+0xb0/0xb0
Mar 29 17:25:49 cherokee2 kernel: [  312.103379]  [<ffffffff817489bc>] ret_from_fork+0x7c/0xb0
Mar 29 17:25:49 cherokee2 kernel: [  312.103381]  [<ffffffff8108c500>] ? flush_kthread_worker+0xb0/0xb0
Mar 29 17:25:49 cherokee2 kernel: [  312.103384] iwlwifi 0000:03:00.0: Scan failed! status 0x1 ret -5
Mar 29 17:25:49 cherokee2 kernel: [  312.103400] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Mar 29 17:25:49 cherokee2 kernel: [  312.103402] iwlwifi 0000:03:00.0: Status: 0x00000000, count: 6
Mar 29 17:25:49 cherokee2 kernel: [  312.103404] iwlwifi 0000:03:00.0: 0x000014F4 | ADVANCED_SYSASSERT          
Mar 29 17:25:49 cherokee2 kernel: [  312.103406] iwlwifi 0000:03:00.0: 0x000002A0 | uPc
Mar 29 17:25:49 cherokee2 kernel: [  312.103408] iwlwifi 0000:03:00.0: 0x00000000 | branchlink1
Mar 29 17:25:49 cherokee2 kernel: [  312.103410] iwlwifi 0000:03:00.0: 0x00000BEA | branchlink2
Mar 29 17:25:49 cherokee2 kernel: [  312.103411] iwlwifi 0000:03:00.0: 0x00015EB4 | interruptlink1
Mar 29 17:25:49 cherokee2 kernel: [  312.103413] iwlwifi 0000:03:00.0: 0x004ACCD9 | interruptlink2
Mar 29 17:25:49 cherokee2 kernel: [  312.103415] iwlwifi 0000:03:00.0: 0x00000187 | data1
Mar 29 17:25:49 cherokee2 kernel: [  312.103416] iwlwifi 0000:03:00.0: 0x00000024 | data2
Mar 29 17:25:49 cherokee2 kernel: [  312.103418] iwlwifi 0000:03:00.0: 0x000000E0 | data3
Mar 29 17:25:49 cherokee2 kernel: [  312.103420] iwlwifi 0000:03:00.0: 0x61C0F190 | beacon time
Mar 29 17:25:49 cherokee2 kernel: [  312.103422] iwlwifi 0000:03:00.0: 0x51BD5E89 | tsf low
Mar 29 17:25:49 cherokee2 kernel: [  312.103423] iwlwifi 0000:03:00.0: 0x00000004 | tsf hi
Mar 29 17:25:49 cherokee2 kernel: [  312.103425] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
Mar 29 17:25:49 cherokee2 kernel: [  312.103427] iwlwifi 0000:03:00.0: 0x0262D2CB | time gp2
Mar 29 17:25:49 cherokee2 kernel: [  312.103428] iwlwifi 0000:03:00.0: 0x00000000 | time gp3
Mar 29 17:25:49 cherokee2 kernel: [  312.103430] iwlwifi 0000:03:00.0: 0x00041618 | uCode version
Mar 29 17:25:49 cherokee2 kernel: [  312.103431] iwlwifi 0000:03:00.0: 0x00000144 | hw version
Mar 29 17:25:49 cherokee2 kernel: [  312.103433] iwlwifi 0000:03:00.0: 0x00489204 | board version
Mar 29 17:25:49 cherokee2 kernel: [  312.103435] iwlwifi 0000:03:00.0: 0x09330080 | hcmd
Mar 29 17:25:49 cherokee2 kernel: [  312.103437] iwlwifi 0000:03:00.0: 0x000220C4 | isr0
Mar 29 17:25:49 cherokee2 kernel: [  312.103439] iwlwifi 0000:03:00.0: 0x00000000 | isr1
Mar 29 17:25:49 cherokee2 kernel: [  312.103440] iwlwifi 0000:03:00.0: 0x00000002 | isr2
Mar 29 17:25:49 cherokee2 kernel: [  312.103442] iwlwifi 0000:03:00.0: 0x0041C0C2 | isr3
Mar 29 17:25:49 cherokee2 kernel: [  312.103444] iwlwifi 0000:03:00.0: 0x00000000 | isr4
Mar 29 17:25:49 cherokee2 kernel: [  312.103445] iwlwifi 0000:03:00.0: 0x01080112 | isr_pref
Mar 29 17:25:49 cherokee2 kernel: [  312.103447] iwlwifi 0000:03:00.0: 0x00000000 | wait_event
Mar 29 17:25:49 cherokee2 kernel: [  312.103449] iwlwifi 0000:03:00.0: 0x00000080 | l2p_control
Mar 29 17:25:49 cherokee2 kernel: [  312.103451] iwlwifi 0000:03:00.0: 0x00018020 | l2p_duration
Mar 29 17:25:49 cherokee2 kernel: [  312.103452] iwlwifi 0000:03:00.0: 0x0000003F | l2p_mhvalid
Mar 29 17:25:49 cherokee2 kernel: [  312.103454] iwlwifi 0000:03:00.0: 0x00000080 | l2p_addr_match
Mar 29 17:25:49 cherokee2 kernel: [  312.103456] iwlwifi 0000:03:00.0: 0x00000005 | lmpm_pmg_sel
Mar 29 17:25:49 cherokee2 kernel: [  312.103458] iwlwifi 0000:03:00.0: 0x23021719 | timestamp
Mar 29 17:25:49 cherokee2 kernel: [  312.103460] iwlwifi 0000:03:00.0: 0x0000D0E0 | flow_handler
Mar 29 17:25:49 cherokee2 kernel: [  312.103463] ieee80211 phy7: Hardware restart was requested
Mar 29 17:25:49 cherokee2 kernel: [  312.107141] iwlwifi 0000:03:00.0: Failing on timeout while stopping DMA channel 2 [0x07fb0001]
Mar 29 17:25:49 cherokee2 kernel: [  312.107492] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0S
Mar 29 17:25:49 cherokee2 kernel: [  312.107723] iwlwifi 0000:03:00.0: L1 Enabled; Disabling L0Sl
Comment 7 Emmanuel Grumbach 2014-03-29 18:53:26 UTC
(In reply to Ignacio Huerta from comment #6)
> Ok, I just realised that for kernel 3.13 there is a newer firmware available
> (iwlwifi-7260-8.ucode). The connection problems are still there, but now I'm
> getting "Microcode SW error detected" when they happen:
> 

This is another issue which has been already fixed in a later stable release of 3.13. Please update your kernel.
Comment 8 Ignacio Huerta 2014-03-29 19:54:32 UTC
Thank you very much, Emmanuel. I updated my kernel to 3.13.7; that error is gone now and my Wi-Fi looks much more stable. That should teach me to install the latest stable kernel before bothering you. Thanks a lot!
Comment 9 Ralf 2014-03-29 20:01:07 UTC
I made another very... curious observation during the last few days: the error is very location-dependent. If one of these "events" happens and I move the laptop by three meters, it immediately stops disconnecting from the wireless network. Once I move it back to the table, the disconnects happen again. I already tried removing various electronic devices from around the laptop, to no avail. I am quite at a loss, to be honest...

> Ok, I just realised that for kernel 3.13 there is a newer firmware available
> (iwlwifi-7260-8.ucode).
Which exact version are you using? The "8" here is just the microcode "generation" (or whatever it is called). I am trying version 22.24.8.0 (that's what dmesg prints on module load).

I am currently compiling kernel 3.13.7 to see if this helps. I am not seeing these error messages with the current, older kernel (the Debian kernel based on 3.13.5), though.
Comment 10 Emmanuel Grumbach 2014-03-30 07:38:57 UTC
22.24.8.0 is the latest firmware available - please use this one.
Comment 11 Ralf 2014-03-30 09:37:03 UTC
Unfortunately, that did not help - with kernel 3.13.7 and firmware 22.24.8.0, the problem just happened again.
Comment 12 Emmanuel Grumbach 2014-03-30 10:14:56 UTC
can you share your dmesg again?

did you try to disable security as suggested above?
Comment 13 Ralf 2014-03-30 10:39:14 UTC
Created attachment 131021 [details]
dmesg of disconnects with latest stable kernel and firmware

New, fresh dmesg is attached.

I did not yet try to disable encryption, as I experimented with some other things during the week. It's the next option I am pursuing though. I just wonder whether my usual "fgrep" for these error messages is still of any use now, as wpa_supplicant won't be in the loop anymore, will it?
Comment 14 Emmanuel Grumbach 2014-03-30 10:47:13 UTC
from your logs it is actually related to supplicant :)
Need to check though - I am not a supplicant expert at all.
I'll need to ask a few people here.

Can you attach the syslog again?
I see that the prints about VHT / HT being not compatible with TKIP have disappeared.

BTW - don't bother to disable security for now.
Comment 15 Ralf 2014-03-30 11:00:57 UTC
Created attachment 131031 [details]
syslog from the same disconnects as the last dmesg

> Can you attach the syslog again?
Okay.

> I see that the prints about VHT / HT being not compatible with TKIP have
> disappeared.
That's probably due to the different base station - the last logs were with the NetGear, these are with the FritzBox. The NetGear can only do WPA1, the FritzBox does WPA2.

> BTW - don't bother to disable security for now.
Well, I already disabled it (and enabled MAC checking, for whatever that helps), so I think I'll just leave it running till the evening. I don't feel good leaving it insecure for too long^^
Comment 16 Ralf 2014-03-30 12:29:39 UTC
I just had exactly the same issue (including the wpa_supplicant error) on a connection without security. Will attach dmesg and syslog shortly.
Comment 17 Ralf 2014-03-30 12:32:46 UTC
Created attachment 131051 [details]
Kernel log from the issue happening without connection security
Comment 18 Ralf 2014-03-30 12:33:13 UTC
Created attachment 131061 [details]
syslog from the issue happening without connection security
Comment 19 Ralf 2014-03-31 12:33:58 UTC
I now also have logs of the issue happening with Linux 3.14 and "iwlmvm power_scheme=1" - let me know if they are interesting for you.
Comment 20 Ignacio Huerta 2014-04-02 12:23:27 UTC
Hi,

With kernel 3.13.7 and the "8" firmware the problem just happened again: it's the first time I notice a connection drop since I updated the kernel, but the symptoms seem to be the same. I also have power_scheme=1 for iwlmvm. Please say if I can provide any additional information.
Comment 21 Emmanuel Grumbach 2014-04-02 12:25:00 UTC
(In reply to Ignacio Huerta from comment #20)
> Hi,
> 
> With kernel 3.13.7 and the "8" firmware the problem just happened again:
> it's the first time I notice a connection drop since I updated the kernel,
> but the symptoms seem to be the same. I also have power_scheme=1 for iwlmvm.
> Please say if I can provide any additional information.

Please attach your dmesg...
Comment 22 Ignacio Huerta 2014-04-02 12:30:53 UTC
Created attachment 131201 [details]
dmesg after disconnection problems with kernel 3.13.7 and firmware 8

Here comes the dmesg.
Comment 23 Emmanuel Grumbach 2014-04-02 12:34:13 UTC
This time it looks like Ralf's issue. Can you please attach your syslog?

Thanks.
Comment 24 Ignacio Huerta 2014-04-02 13:14:40 UTC
Created attachment 131211 [details]
syslog of disconnection problems with firmware 8 and kernel 3.13.7

You are very welcome. I'm attaching my syslog (only the relevant time frame, since before the disconnections until the problem went away a few minutes later).
Comment 25 Emmanuel Grumbach 2014-04-02 13:54:15 UTC
Ok - this looks like exactly the same issue as Ralf's.

Thanks.

Can you run tracing?

sudo trace-cmd record -e iwlwifi -e mac80211 -e cfg80211
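`trace-cmd` expects a subcommand such as `record` before its `-e` event options, as in the command Ignacio used later in this thread (`trace-cmd record -e iwlwifi -e iwlwifi_msg -e mac80211 -e cfg80211`). A small sketch that assembles that argument list; `trace_cmd_record` is a hypothetical helper, and the `-o` output option is only added when requested (otherwise trace-cmd writes `trace.dat` in the current directory):

```python
import shlex
from typing import Iterable, List, Optional

# Event systems requested in this thread; "iwlwifi_msg" was added in a later round.
DEFAULT_EVENTS = ("iwlwifi", "mac80211", "cfg80211")


def trace_cmd_record(events: Iterable[str] = DEFAULT_EVENTS,
                     output: Optional[str] = None) -> List[str]:
    """Build the argv for `trace-cmd record` with one -e option per event system."""
    argv = ["trace-cmd", "record"]
    for event in events:
        argv += ["-e", event]
    if output is not None:
        argv += ["-o", output]  # without -o, trace-cmd writes ./trace.dat
    return argv


if __name__ == "__main__":
    # Print the command to run as root; recording continues until Ctrl-C.
    print(shlex.join(["sudo"] + trace_cmd_record()))
```

Start the printed command before the problem shows up and interrupt it once a burst of disconnects has been captured, so the resulting trace covers the event.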
Comment 26 Ignacio Huerta 2014-04-02 14:48:00 UTC
Yes, I'll try to run tracing the next time it arises. Let's hope it doesn't take too long :).
Comment 27 Ignacio Huerta 2014-04-02 15:52:53 UTC
OK, it happened again and here's the trace: https://www.dropbox.com/s/t80ec54qfw2273j/trace1.dat
Comment 28 Emmanuel Grumbach 2014-04-03 05:34:10 UTC
Created attachment 131311 [details]
add threshold to beacon loss

Can you please test the patch attached?
This patch is included in 3.14, and I can't be sure it'll apply on 3.13.
If you confirm that this patch helps (or that 3.14 fixes the issue), I'll backport this patch to 3.13.

Thanks!
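The attached patch is titled "add threshold to beacon loss", but its contents are not reproduced here. As a purely illustrative toy model (not the patch, and not the real iwlmvm or mac80211 logic), counting consecutive missed beacons and acting only once a threshold is reached shows why short, sporadic beacon loss stops causing disconnects as the threshold grows:

```python
from typing import Iterable, List


def disconnect_points(beacons: Iterable[bool], threshold: int) -> List[int]:
    """Return the indices at which this toy model would give up on the AP.

    beacons[i] is True if the beacon expected in interval i was received.
    A disconnect is triggered only after `threshold` consecutive misses;
    the counter is then reset, as if the station reconnected immediately.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    points: List[int] = []
    consecutive = 0
    for i, received in enumerate(beacons):
        if received:
            consecutive = 0  # any received beacon clears the loss counter
            continue
        consecutive += 1
        if consecutive == threshold:
            points.append(i)
            consecutive = 0
    return points
```

For the pattern `[True, False, False, True, False, False, False, True]`, a threshold of 2 yields `[2, 5]`, a threshold of 3 yields `[6]`, and a threshold of 4 yields `[]`.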
Comment 29 Emmanuel Grumbach 2014-04-03 05:35:21 UTC
Oh, I just noticed that Ralf reproduced the issue with 3.14....
Ok... I guess I'll need traces from him too.
Comment 30 Emmanuel Grumbach 2014-04-03 07:58:49 UTC
(In reply to Ignacio Huerta from comment #27)
> OK, it happened again and here's the trace:
> https://www.dropbox.com/s/t80ec54qfw2273j/trace1.dat


you seem to have sw_crypto enabled - can you disable this please?
Thanks.
Comment 31 Mike Cloaked 2014-04-03 08:44:38 UTC
There seems to be some problem with whether or not the latest firmware microcode gets loaded by the iwlwifi module. Running Arch Linux with kernel 3.13.18, the problem shows in the following:

# pacman -Ss linux-firmware
core/linux-firmware 20140316.dec41bc-1 [installed]
Firmware files for Linux
# ls /lib/firmware/iwlwifi*
/lib/firmware/iwlwifi-1000-3.ucode /lib/firmware/iwlwifi-5000-2.ucode
/lib/firmware/iwlwifi-1000-5.ucode /lib/firmware/iwlwifi-5000-5.ucode
/lib/firmware/iwlwifi-100-5.ucode /lib/firmware/iwlwifi-5150-2.ucode
/lib/firmware/iwlwifi-105-6.ucode /lib/firmware/iwlwifi-6000-4.ucode
/lib/firmware/iwlwifi-135-6.ucode /lib/firmware/iwlwifi-6000g2a-5.ucode
/lib/firmware/iwlwifi-2000-6.ucode /lib/firmware/iwlwifi-6000g2a-6.ucode
/lib/firmware/iwlwifi-2030-6.ucode /lib/firmware/iwlwifi-6000g2b-5.ucode
/lib/firmware/iwlwifi-3160-7.ucode /lib/firmware/iwlwifi-6000g2b-6.ucode
/lib/firmware/iwlwifi-3160-8.ucode /lib/firmware/iwlwifi-6050-4.ucode
/lib/firmware/iwlwifi-3945-2.ucode /lib/firmware/iwlwifi-6050-5.ucode
/lib/firmware/iwlwifi-4965-2.ucode /lib/firmware/iwlwifi-7260-7.ucode
/lib/firmware/iwlwifi-5000-1.ucode /lib/firmware/iwlwifi-7260-8.ucode
# modinfo iwlwifi | grep firmware
firmware: iwlwifi-100-5.ucode
firmware: iwlwifi-1000-5.ucode
firmware: iwlwifi-135-6.ucode
firmware: iwlwifi-105-6.ucode
firmware: iwlwifi-2030-6.ucode
firmware: iwlwifi-2000-6.ucode
firmware: iwlwifi-5150-2.ucode
firmware: iwlwifi-5000-5.ucode
firmware: iwlwifi-6000g2b-6.ucode
firmware: iwlwifi-6000g2a-5.ucode
firmware: iwlwifi-6050-5.ucode
firmware: iwlwifi-6000-4.ucode
firmware: iwlwifi-3160-7.ucode
firmware: iwlwifi-7260-7.ucode
parm: fw_restart:restart firmware in case of error (default true) (bool)

So in /lib/firmware there are several files for which there are two versions, but the module does not always load the latest one!

7260 gets the older one, so does 3160, and so does 6000g2a (-5), but some are OK.

Could this be underlying this issue?
Comment 32 Mike Cloaked 2014-04-03 08:49:50 UTC
The kernel in my previous comment should be 3.13.8 not .18 - apologies for the typo.
Comment 33 Ralf 2014-04-03 08:57:10 UTC
Created attachment 131331 [details]
dmesg+syslog of the issue happening with linux 3.14 and power_scheme=1

Yes, the issue happened with kernel 3.14, dmesg and syslog are attached.

Unfortunately, I am no longer at my parents' place where I experienced the problem daily. I will have to see whether it happens at all here at my own place. If it does, I will provide traces.

> So in /lib/firmware there are several files for which there are two versions,
> but in the module not always the latest is loaded!
> 
> 7260 is the older one, 3160 also, 6000g2a-5 also - but some are OK. 
> 
> Could this be underlying this issue?

However, despite what modinfo says, according to dmesg, the new firmware is loaded:

  iwlwifi 0000:03:00.0: loaded firmware version 22.24.8.0 op_mode iwlmvm
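Since `modinfo` only lists the firmware file names the module declares, the dmesg line is what shows the version actually loaded. A minimal sketch that extracts it, assuming the `loaded firmware version X op_mode Y` wording shown above; `loaded_firmware` is a hypothetical helper:

```python
import re
from typing import List, Tuple

# Matches e.g. "iwlwifi 0000:03:00.0: loaded firmware version 22.24.8.0 op_mode iwlmvm"
FW_LINE = re.compile(
    r"iwlwifi (?P<dev>\S+): loaded firmware version "
    r"(?P<ver>\d+(?:\.\d+)+) op_mode (?P<op>\w+)"
)


def loaded_firmware(dmesg_text: str) -> List[Tuple[str, Tuple[int, ...], str]]:
    """Return (PCI device, version tuple, op_mode) for each firmware-load line."""
    found = []
    for m in FW_LINE.finditer(dmesg_text):
        version = tuple(int(part) for part in m.group("ver").split("."))
        found.append((m.group("dev"), version, m.group("op")))
    return found
```

Comparing the returned tuple with `(22, 24, 8, 0)` checks for the firmware that comment 10 names as the latest available at that point.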
Comment 34 Emmanuel Grumbach 2014-04-03 09:04:27 UTC
@Mike - this is totally unrelated.

@Ralf - you have latest firmware.
Comment 35 Mike Cloaked 2014-04-03 09:41:33 UTC
OK I will enter a new bug for the issue I posted in comment #31
Comment 36 Mike Cloaked 2014-04-03 09:59:56 UTC
It seems that my report was based on confusing output, and that the correct, later firmware is in fact loaded. The issue was that Intel chooses to list the oldest usable firmware even though the later firmware is loaded. Apologies for the noise.
Comment 37 Emmanuel Grumbach 2014-04-03 10:02:11 UTC
No - We load the latest available supported FW.
3.13 supports iwlwifi-7260-8.ucode and hence will load iwlwifi-7260-8.ucode.
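The behaviour described here (use the newest firmware file the driver supports, regardless of which file names `modinfo` shows) can be modelled as a search from a maximum API version downwards. This is a simplified sketch, not the driver's code; the 7260 API range of 7 to 8 for kernel 3.13 is an assumption based on the `-7` and `-8` files discussed in this thread, and `pick_firmware` is hypothetical:

```python
from typing import Iterable, Optional


def pick_firmware(available: Iterable[str], device: str = "7260",
                  api_max: int = 8, api_min: int = 7) -> Optional[str]:
    """Return the firmware file name this model would load, or None.

    Tries iwlwifi-<device>-<api>.ucode for api = api_max down to api_min
    and returns the first one present in `available`.
    """
    names = set(available)
    for api in range(api_max, api_min - 1, -1):
        candidate = f"iwlwifi-{device}-{api}.ucode"
        if candidate in names:
            return candidate
    return None
```

For example, `pick_firmware(os.listdir("/lib/firmware"))` on the Arch system listed above would return `iwlwifi-7260-8.ucode`, because both the `-7` and `-8` files are present.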
Comment 38 Ronnie Andrew 2014-04-04 04:53:12 UTC
I have connection drops and huge packet loss also with 7260 when just a couple of meters away from the router.

Kernel: 3.13.8 with iwlmvm power_scheme=1

Loaded Module: 22.24.8.0

It's probably not relevant, but the laptop is a Vostro 5470.
Comment 39 Ignacio Huerta 2014-04-04 05:15:26 UTC
@Emmanuel: I have disabled sw_crypto, upgraded to kernel 3.14.0 and currently monitoring the logs for the error to happen again. If it happens I'll send you the trace. Thanks for your support, I'll keep you updated.
Comment 40 Marc Collin 2014-04-05 01:01:37 UTC
I use kernel 3.14:

modinfo iwlwifi
filename:       /lib/modules/3.14.0-2.gfa168d7-desktop/kernel/drivers/net/wireless/iwlwifi/iwlwifi.ko
license:        GPL
author:         Copyright(c) 2003- 2014 Intel Corporation <ilw@linux.intel.com>
version:        in-tree:d
description:    Intel(R) Wireless WiFi driver for Linux
firmware:       iwlwifi-100-5.ucode
firmware:       iwlwifi-1000-5.ucode
firmware:       iwlwifi-135-6.ucode
firmware:       iwlwifi-105-6.ucode
firmware:       iwlwifi-2030-6.ucode
firmware:       iwlwifi-2000-6.ucode
firmware:       iwlwifi-5150-2.ucode
firmware:       iwlwifi-5000-5.ucode
firmware:       iwlwifi-6000g2b-6.ucode
firmware:       iwlwifi-6000g2a-5.ucode
firmware:       iwlwifi-6050-5.ucode
firmware:       iwlwifi-6000-4.ucode
firmware:       iwlwifi-3160-7.ucode
firmware:       iwlwifi-7260-7.ucode

So .8 doesn't seem to be there.

Will it be included in 3.14.1?
Comment 41 Marc Collin 2014-04-05 01:08:23 UTC
.8 seems to be there, but it's not used:
/lib/firmware/iwlwifi-7260-8.ucode

Why?
Comment 42 Mike Cloaked 2014-04-05 10:25:05 UTC
Marc: I had an answer about the apparent missing -8 microcode file in the modinfo output from an answer to my arch linux bug report at https://bugs.archlinux.org/task/39722 - see the last comment before that bug was closed.

It seems that the later microcode file is loaded but apparently the modinfo only lists the earliest usable one. It would be better if the modinfo output was a bit more logical.
Comment 43 Ralf 2014-04-05 11:01:02 UTC
Unfortunately (or not ;-), the problem does not seem to happen here at my own place. Maybe it's because the wireless AP is just 2 m from my laptop (so I usually don't even use it), but it seems I can only test this when I'm at my parents' place.
Comment 44 Marc Collin 2014-04-05 12:11:32 UTC
cat /var/log/messages | grep iwlwifi

linux-rnkx kernel: [    9.941312] iwlwifi 0000:03:00.0: loaded firmware version 22.24.8.0 op_mode iwlmvm
linux-rnkx kernel: [    9.954234] iwlwifi 0000:03:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144

My router is less than 1 m from my laptop.

I will add an attachment when I disconnect my cable and try to connect to a 5 GHz network.
Comment 45 Marc Collin 2014-04-05 12:12:07 UTC
Created attachment 131491 [details]
not able to connect to 5ghz network with kernel 3.14
Comment 46 Ralf 2014-04-05 12:17:46 UTC
@Marc: Please open a new bug for this. What makes you think that you are experiencing the bug we are discussing here? The "characteristic" error message we talked about does not appear in your log, and the symptom is completely different.
This bug is about trouble with a connection that has long been established: suddenly packet loss goes to 50% or even more (up to completely blocking the connection), and a huge number of "DISCONNECT" messages from wpa_supplicant appear in the syslog. This goes on for 5-10 min, then everything is normal again for a few hours.
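The pattern described here (disconnects roughly every 5 seconds for 5-10 minutes, then hours of normal operation) can be picked out of a log by grouping event timestamps into bursts. A minimal sketch, assuming the `CTRL-EVENT-DISCONNECTED` timestamps have already been parsed into `datetime` objects; the gap and minimum-count values are arbitrary illustrative choices, and `find_bursts` is hypothetical:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Burst = Tuple[datetime, datetime, int]  # (first event, last event, event count)


def find_bursts(times: List[datetime], max_gap: timedelta,
                min_events: int) -> List[Burst]:
    """Group event times into bursts of closely spaced events.

    Events at most max_gap apart belong to the same burst; only bursts with
    at least min_events events are returned.
    """
    bursts: List[Burst] = []
    start = prev = None
    count = 0
    for t in sorted(times):
        if prev is not None and t - prev <= max_gap:
            count += 1  # still inside the current burst
        else:
            if prev is not None and count >= min_events:
                bursts.append((start, prev, count))  # close the previous burst
            start, count = t, 1  # open a new candidate burst
        prev = t
    if prev is not None and count >= min_events:
        bursts.append((start, prev, count))
    return bursts
```

Ten disconnects 5 s apart followed by a single one three hours later come out as one burst of ten events with `max_gap=timedelta(seconds=30)` and `min_events=3`; the isolated event is ignored.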
Comment 47 Emmanuel Grumbach 2014-04-05 19:19:31 UTC
Way too many people on the same bug adding unrelated logs.
@Marc - please, this bug is not what you are having. Please open a new bug.
@Ronnie Andrew - please open a new bug and add your logs there. Worst case, I'll close it as duplicate of this bug.

Both - when you create a new bug, please CC ilw@linux.intel.com.

I wish I could delete all the unrelated noise on this bug.
Back to focus - Ralf and Ignacio have very similar logs. Thanks to Ignacio, I could see that he is suffering from beacon loss, which results in disconnections.

This is the only usable data I had until now.
So Ignacio, please reproduce the bug on 3.14 and again, add traces.
This time, I'd like to have -e iwlwifi_msg along with the rest I already asked for in the previous round.

Thanks.
Comment 48 Ignacio Huerta 2014-04-08 08:14:40 UTC
Hi Emmanuel,

I have managed to reproduce the bug with Kernel 3.14 and latest firmware. I don't have swcrypto enabled anymore, and the only module option I have is "iwlmvm power_scheme=1". This is the trace created with "trace-cmd record -e iwlwifi -e iwlwifi_msg -e mac80211 -e cfg80211":

https://www.dropbox.com/s/eop1zph852p19qp/trace_20140408.dat

Please say if I can do anything else to help.

Regards,
Ignacio
Comment 49 Emmanuel Grumbach 2014-04-13 07:01:58 UTC
I haven't forgotten you, but I doubt I'll be able to look at this in the coming 2 weeks, sorry.

Thank you for your patience.
Comment 50 Ignacio Huerta 2014-04-13 19:14:17 UTC
Don't worry Emmanuel, there's no rush. Thanks for the update!
Comment 51 Emmanuel Grumbach 2014-05-07 10:14:25 UTC
So I finally found time for this. I can't see anything bad besides the fact that we are missing beacons for a reason I can't understand.

Would you be able to apply https://git.kernel.org/cgit/linux/kernel/git/iwlwifi/iwlwifi-fixes.git/commit/?id=431031851ea72a25abb9ad4df56a0f3b997e3026 and then upgrade your firmware to this one: https://git.kernel.org/cgit/linux/kernel/git/egrumbach/linux-firmware.git/plain/iwlwifi-7260-9.ucode?id=7a67dbf9c087ef64b3d3fc9ce448c2efdb2e365f

Thanks.
Comment 52 chapaswork 2014-05-07 21:47:17 UTC
Emmanuel,

I found this bug while searching for information about why the wireless card on my notebook won't connect to my flat's router.

lspci:
02:00.0 Network controller: Intel Corporation Wireless 7260 (rev 73)

Kernel:
Linux 3.14.3-031403-generic #201405061153 SMP Tue May 6 15:54:50 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Firmware with md5 (downloaded from the driver's webpage):

1005c3b82879ecfa0d60544a78de7f92  /lib/firmware/iwlwifi-7260-9.ucode

I can't even connect to the router anymore. It is a "Fritz!Box 6340 Cable", a German brand of router, which I have no access to and cannot reconfigure if needed.

I tried previous kernels and firmwares without success. Let me know what information is useful and I can gather it.

Kind regards,

-Ciro
Comment 53 Emmanuel Grumbach 2014-05-08 05:19:10 UTC
Ciro, please open a new bug and add your logs there.

Thank you.
Comment 54 Emmanuel Grumbach 2014-05-12 08:16:43 UTC
@Ignacio:

can you please do the following:

echo "bf_enable_beacon_filter=0" > /sys/kernel/debug/iwlwifi/*/iwlmvm/netdev\:wlan0/bf_params

and tell me if it helps?

Thanks.
Comment 55 Ignacio Huerta 2014-05-12 15:16:22 UTC
Hi Emmanuel, thanks a lot for your answers. I haven't had time until now to look into this. I'm gonna first test "bf_enable_beacon_filter=0" and if that doesn't help I'll test the patch and the upgraded firmware. The issue happens one or two times a day, so I will need a few days to come up with results.

Regards,
Ignacio
Comment 56 Emmanuel Grumbach 2014-05-12 16:38:02 UTC
Hi,

only twice a day?
You seemed to see that several times a minute.
Ok - in that case, it is probably something else.

Anyway, I noticed that the debugfs line I sent would be overridden after re-association, so it is not relevant.

But this should work:

diff --git a/drivers/net/wireless/iwlwifi/mvm/power.c b/drivers/net/wireless/iwlwifi/mvm/power.c
index 4aab126..e67271c 100644
--- a/drivers/net/wireless/iwlwifi/mvm/power.c
+++ b/drivers/net/wireless/iwlwifi/mvm/power.c
@@ -857,6 +857,7 @@ int iwl_mvm_enable_beacon_filter(struct iwl_mvm *mvm,
                .bf_enable_beacon_filter = cpu_to_le32(1),
        };

+       return 0;
        return _iwl_mvm_enable_beacon_filter(mvm, vif, &cmd, flags, false);
 }
Comment 57 Ignacio Huerta 2014-05-12 16:55:38 UTC
I meant twice a day for "bursts of errors". It happens a lot of times during some minutes and then stops. Hours later it starts again for some minutes.

Thanks Emmanuel for the patch, I will test it and come back.
Comment 58 Florian Vallee 2014-05-12 21:16:34 UTC
Created attachment 135921 [details]
dmesg.log

Hi,

I'm seeing the issue (disconnection bursts) with the latest .9 firmware; dmesg.log is attached (this is a short occurrence from a couple of minutes ago; I've seen more severe ones an hour before).

The kernel I'm running does not have the latest patch you suggested (it does not apply on 3.14; I'll need to fetch a more recent kernel).

Hope it helps,

Florian
Comment 59 Emmanuel Grumbach 2014-05-13 08:34:34 UTC
can you run tracing?

Or re-compile with CFG80211_REG_DEBUG and CFG80211_DEVELOPER_WARNINGS?
Comment 60 Florian Vallee 2014-05-13 21:05:11 UTC
Hi Emmanuel,

I've built a 3.15 kernel : 3.15.0-rc5-g14186fe with your beacon filter patch and CFG80211_REG_DEBUG / CFG80211_DEVELOPER_WARNINGS enabled

I'll keep you updated. However, I'm a bit concerned about these last two options: they do not seem to have had much effect, see:

$ dmesg | grep cfg80211
[   19.550955] cfg80211: Calling CRDA to update world regulatory domain
[   20.127536] cfg80211: Ignoring regulatory request set by core since the driver uses its own custom regulatory domain
[  392.162117] cfg80211: All devices are disconnected, going to restore regulatory settings
[  392.162120] cfg80211: Restoring regulatory settings
[  392.162123] cfg80211: Kicking the queue
[  392.162127] cfg80211: Calling CRDA to update world regulatory domain

I would have expected more logs

Kernel config is here:

$ zcat /proc/config.gz  | sprunge
http://sprunge.us/cGDb

trace-cmd seems to have more useful data:

$ trace-cmd record -e iwlwifi -e iwlwifi_msg -e mac80211 -e cfg80211
[..]
$ trace-cmd report | sprunge
http://sprunge.us/CQNf

Did I miss something for the DEBUG options? I'll run the tracing command in the meantime.
Comment 61 Emmanuel Grumbach 2014-05-16 07:33:36 UTC
Ok - on another bug, someone reported that 3.15 is fine regardless of the firmware version running, can someone test this?

Thanks.
Comment 62 Florian Vallee 2014-05-16 09:22:34 UTC
Hi Emmanuel,

I did not see any occurence of the issue with 3.15 + beacon filter patch in two days.

I'll try without the patch tonight.
Comment 63 Emmanuel Grumbach 2014-05-16 09:42:08 UTC
Hi Florian,

good. Thank you. What firmware are you using?
Comment 64 Florian Vallee 2014-05-16 17:23:16 UTC
Hi Emmanuel,

iwlwifi 0000:03:00.0: loaded firmware version 23.214.9.0 op_mode iwlmvm

I'm reverting to an unpatched 3.15-rc5 right now. I'll run it over the weekend and let you know how it went by Monday :)
Comment 65 Ralf 2014-05-17 12:11:44 UTC
I am spending this weekend with the "problematic" wireless again.

I'm currently running an unpatched 3.15-rc5 and firmware 22.24.8.0. So far, I've had no disconnects for 5h. That's pretty good, but it also occasionally happened with the older kernels.
I will keep you posted.
Comment 66 Emmanuel Grumbach 2014-05-17 19:13:15 UTC
@Florian, from another bug report it seems that plain 3.15-rc5 solves the issues too... OTOH, I am pretty sure there are several issues that lead to the same effect...

Thanks for your help!
Comment 67 Emmanuel Grumbach 2014-05-18 11:02:27 UTC
Ok - I think I have a lead:

Can someone try this:

diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
index cd6ea2e..17c097d 100644
--- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
@@ -619,7 +619,7 @@ static int iwl_mvm_mac_add_interface(struct ieee80211_hw *hw,
        if (ret)
                goto out_remove_mac;

-       if (!mvm->bf_allowed_vif &&
+       if (!mvm->bf_allowed_vif && false &&
            vif->type == NL80211_IFTYPE_STATION && !vif->p2p &&
            mvm->fw->ucode_capa.flags & IWL_UCODE_TLV_FLAGS_BF_UPDATED){
                mvm->bf_allowed_vif = mvmvif;

on 3.13 / 3.14?

thanks!
Comment 68 Florian Vallee 2014-05-18 14:16:51 UTC
building now with that patch on 3.14.

I have not seen the issue since my last comment. But it's been a sunny weekend so far, so I did not spend that much time with the laptop ;)
Comment 69 Emmanuel Grumbach 2014-05-18 16:08:25 UTC
Created attachment 136601 [details]
disable beacon filtering

This is a fix candidate for 3.13 / 3.14.

Please test - thank you.
Comment 70 Ralf 2014-05-18 19:00:03 UTC
Not sure if this is still interesting, but: I have been running 3.15-rc5 the entire weekend and did not get a single disconnect.
Comment 71 Emmanuel Grumbach 2014-05-18 19:04:27 UTC
(In reply to Ralf Jung from comment #70)
> Not sure if this is still interesting, but: I have been running 3.15-rc5 the
> entire weekend and did not get a single disconnect.

:) not very interesting indeed. But thank you!

What I'd really like to see is 3.14 / 3.13 with the patch from comment 69.
Thanks!
Comment 72 Ralf 2014-05-18 19:06:36 UTC
Well, I was only using that wireless for this weekend... I can try that patch next time I'm there, which is probably in three weeks. Or whatever patch is "current" then ;-)
Comment 73 Florian Vallee 2014-05-18 19:08:13 UTC
I've been running the 3.13 / 3.14 patch for a few hours now

21:04:53 ey3ball@omnicron ~ :) $ uname -a
Linux omnicron 3.14.2-1-custom #2 SMP PREEMPT Sun May 18 16:49:31 CEST 2014 x86_64 GNU/Linux
20:50:36 ey3ball@omnicron ~ :) $ uptime 
 21:04:53 up  4:02,  2 users,  load average: 0.29, 0.22, 0.16

No issue to report so far

Will post an update later

Do you need logs of any sort?
Comment 74 Emmanuel Grumbach 2014-05-18 19:10:14 UTC
@Florian - is this an improvement compared to 3.14 without the patch?

If you don't have issues, no need for log, but it can't hurt :)
Comment 75 Florian Vallee 2014-05-18 19:17:43 UTC
Four hours without seeing the issue is nice and feels like an improvement, but I haven't moved around much.

I'm gonna use the laptop at another location in the house which is usually more problematic, this should be a good test.

Depending on "the weather", the issue can happen anywhere from multiple times per hour to once or twice every few hours, so we're not there yet ;).
Comment 76 Florian Vallee 2014-05-18 22:36:14 UTC
$ uptime 
 00:34:56 up  7:32,  2 users,  load average: 0.17, 0.16, 0.13

So far so good, it definitely looks like things have improved with that patch.
Comment 77 Florian Vallee 2014-05-18 23:30:10 UTC
After 8+ hours no sign of this issue.

I've run into a new problem, though. Emmanuel, you might want to have a look at bug #42978 (it does not look like you're CC'd there ATM, although I might be mistaken).
Comment 78 Emmanuel Grumbach 2014-05-19 10:41:39 UTC
Ok - I'll try to push this upstream.

Regarding the second bug, this is really weird; I'll try to take a look.
Comment 79 Florian Vallee 2014-05-19 21:44:13 UTC
I have not seen this bug (#72601) tonight; however, #42978 already bit me twice in a few hours. I've posted new information there.
Comment 80 Emmanuel Grumbach 2014-05-20 05:18:12 UTC
Florian says that this issue is fixed. Closing this bug now.
Please re-open if the proposed solution doesn't fix the issue for you.
Comment 81 Emmanuel Grumbach 2014-05-20 05:18:38 UTC
Also - the patch has been sent to GregKH and it will hit 3.13 / 3.14.
Comment 82 Florian Vallee 2014-05-20 17:42:05 UTC
Hi there,

I have bad news: I just arrived from work, booted the laptop (3.14 kernel with your patch), and ... disconnection bursts.

[..]
[  179.109532] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[  179.115839] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[  179.119493] wlp3s0: authenticated
[  179.120368] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[  179.133101] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=4)
[  179.136458] wlp3s0: associated
[  184.479257] cfg80211: Calling CRDA to update world regulatory domain
[  188.196057] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[  188.202696] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[  188.205962] wlp3s0: authenticated
[  188.207032] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[  188.211204] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=4)
[  188.215095] wlp3s0: associated
[  190.448612] cfg80211: Calling CRDA to update world regulatory domain
[  194.165158] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[  194.171331] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[  194.174515] wlp3s0: authenticated
[  194.176913] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[  194.180978] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=4)

Interestingly, I rebooted once into the same kernel, saw the bursts again, then switched to 3.15 and saw a "mini" burst (sadly small enough that I did not get a chance to save a trace). There is a new message in syslog, though:


[   20.491876] psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3
[   20.711555] input: TPPS/2 IBM TrackPoint as /devices/platform/i8042/serio1/serio2/input/input10
[   23.348966] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[   23.357247] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[   23.359901] wlp3s0: authenticated
[   23.360898] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[   23.364986] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=4)
[   23.368852] wlp3s0: associated
[   23.368908] IPv6: ADDRCONF(NETDEV_CHANGE): wlp3s0: link becomes ready
[   23.664309] iwlwifi 0000:03:00.0: No association and the time event is over already...
[   23.664348] wlp3s0: Connection to AP d4:ca:6d:25:f3:fc lost
[   23.691365] cfg80211: Calling CRDA to update world regulatory domain
[   27.408453] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[   27.414487] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[   27.417045] wlp3s0: authenticated
[   27.418693] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[   27.422829] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=4)
[   27.426200] wlp3s0: associated
[   27.721541] iwlwifi 0000:03:00.0: No association and the time event is over already...
[   27.721578] wlp3s0: Connection to AP d4:ca:6d:25:f3:fc lost
[   27.799266] cfg80211: Calling CRDA to update world regulatory domain
[   31.488935] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[   31.494093] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[   31.496679] wlp3s0: authenticated
[   31.499832] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[   31.504101] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=4)
[   31.507325] wlp3s0: associated

I'll send a 3.14 trace shortly
Comment 83 Florian Vallee 2014-05-20 17:44:43 UTC
Created attachment 136851 [details]
trace with 3.14 + patch during a burst
Comment 84 Emmanuel Grumbach 2014-05-20 19:13:21 UTC
what is the version of your supplicant?
Comment 85 Florian Vallee 2014-05-20 19:22:37 UTC
core/wpa_supplicant 2.1-3 [installed]
Comment 86 Emmanuel Grumbach 2014-05-20 19:30:52 UTC
This one has real issues I think.
Can you take the latest master branch?

git clone git://w1.fi/srv/git/hostap.git
Comment 87 Florian Vallee 2014-05-20 20:03:46 UTC
OK, built it from the following commit:

commit 4e0a94b7dc76db58cddbbcfe0be0bfef547f6dd0
Author: Jouni Malinen <jouni@qca.qualcomm.com>
Date:   Fri May 16 19:24:47 2014 +0300

Rebooting now to a 3.14 kernel with this supplicant
Comment 88 Florian Vallee 2014-05-26 09:48:45 UTC
I've only seen one disconnection since my last message, no big burst yet.
Comment 89 Florian Vallee 2014-05-28 23:05:10 UTC
Created attachment 137651 [details]
trace.dat with patch + latest supplicant

Hi Emmanuel,

I saw some big bursts tonight.

I can count at least 3 long series of 

[48822.626368] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[48822.632610] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[48822.635836] wlp3s0: authenticated
[48822.639236] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[48822.652132] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=3)
[48822.655948] wlp3s0: associated
[48824.246838] cfg80211: Calling CRDA to update world regulatory domain
[48827.953921] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[48827.960065] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[48827.963370] wlp3s0: authenticated
[48827.965032] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[48827.969158] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=3)
[48827.980664] wlp3s0: associated
[48830.691704] cfg80211: Calling CRDA to update world regulatory domain
[48834.393664] wlp3s0: authenticate with d4:ca:6d:25:f3:fc
[48834.400271] wlp3s0: send auth to d4:ca:6d:25:f3:fc (try 1/3)
[48834.403484] wlp3s0: authenticated
[48834.405492] wlp3s0: associate with d4:ca:6d:25:f3:fc (try 1/3)
[48834.409509] wlp3s0: RX AssocResp from d4:ca:6d:25:f3:fc (capab=0x411 status=0 aid=3)
[48834.413842] wlp3s0: associated
[48836.217129] cfg80211: Calling CRDA to update world regulatory domain


Two of them occurred while I was AFK. I was eventually able to catch the third one live and record a trace (attached).

For reference :

kernel = 3.14.2-1-custom (3.14 + beacon patch)
firmware = 23.214.9.0
wpa_supplicant = wpa_supplicant v2.2-devel (git, 2014-05-20)

Any thoughts ?

Thank you,

Florian
Comment 90 Florian Vallee 2014-05-28 23:21:33 UTC
Additional information: I was caught in a "bad burst" (it kept going for more than 5 minutes), so I eventually rebooted to 3.15 and saw exactly the same behaviour as described in comment #82 (i.e. a mini burst after reboot, then back to a stable connection). Sadly, I again failed to get a trace running fast enough under 3.15 :(
Comment 91 Emmanuel Grumbach 2014-05-29 06:32:49 UTC
Ok - thank you for that.

I can see very clearly that this is a FW / radio / environment issue.

The FW is reporting that it missed beacons. We give it a bit of time, but once 10 beacons have been missed, we report that the connection is lost.

I am trying to see what could have solved this in 3.15.

Note that there are tons of patches that I tagged for 3.14 which weren't merged even though they hit Linus's tree. Weird...
Comment 92 Emmanuel Grumbach 2014-06-08 05:18:06 UTC
I'll close this as the issue was fixed in 3.14.6.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=f47fc3c1b48dd8fc7a0a591551454459eca0ca94
Comment 93 Florian Vallee 2014-06-10 18:26:02 UTC
So no further testing necessary on 3.14.6 ? If I understand correctly the fix is only partial and no further action will be taken since 3.15 is out ?
Comment 94 Emmanuel Grumbach 2014-06-10 18:47:45 UTC
(In reply to Florian Vallee from comment #93)
> So no further testing necessary on 3.14.6 ? If I understand correctly the
> fix is only partial and no further action will be taken since 3.15 is out ?

Not entirely true. The original issues were:
1) bad uAPSD behavior with -8.ucode - uAPSD is disabled in 3.14.6
2) bad beacon filtering - beacon filtering is disabled in 3.14.6

so from my POV, 3.14.6 is fine, or at least not worse than 3.15.
There is still this other bug: bug 42978, but I think we can close the current one.

If not, let me know what I am missing here.
Comment 95 Florian Vallee 2014-06-10 20:17:32 UTC
> If not, let me know what I am missing here.

Great, thanks for the clarification. I did not notice the uAPSD patch, hence my question.