Bug 70881

Summary: iwlwifi failing to associate, "changed bandwidth" issues
Product: Networking Reporter: Nate Carlson (kernel)
Component: WirelessAssignee: networking_wireless (networking_wireless)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: ilw, kernel
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.11.0-17-generic (Ubuntu build) Subsystem:
Regression: No Bisected commit-id:
Attachments: Various logs requested in the other ticket

Description Nate Carlson 2014-02-19 22:43:14 UTC
Created attachment 126791 [details]
Various logs requested in the other ticket

Note - I initially posted a followup to https://bugzilla.kernel.org/show_bug.cgi?id=58561 on this; as requested, opening a new ticket here, since it does not appear to be the same issue.

I'm upgrading to a modern version of Cisco's WLC solution (2504 controller and 3702i APs), and having issues connecting to it from Linux with an Intel 7260 card. Sometimes I am able to successfully connect and get an IP from the DHCP server, but the connection will soon fail afterwards. Other times, it will never even get an IP. Here's the error that the connection ends up failing with:

Feb 19 16:14:12 nc-vostro kernel: [10563.257052] wlan0: AP 18:e7:28:42:5e:24 changed bandwidth, new config is 5320 MHz, width 2 (5310/0 MHz)
Feb 19 16:14:12 nc-vostro kernel: [10563.257063] wlan0: AP 18:e7:28:42:5e:24 changed bandwidth in a way we can't support - disconnect

On my primary laptop, I applied the patch noted in the other ticket:
http://p.sipsolutions.net/9d1dd0734d2c3a7a.txt

..and that lets me connect (and get full 802.11ac speeds.) The kernel does log:
Feb 19 13:50:04 knight kernel: [87613.286635] wlan1: AP 18:e7:28:42:5e:25 seems to have broken HT/VHT support, disable bandwidth tracking

..but everything seems to work.

I grabbed a spare laptop off the shelf, installed Ubuntu, and set it up for testing. As requested in the other ticket, I grabbed the following logs/debug info:

* Pre-association:
** iw wlan0 scan dump -b (filename: 2014-12-29-wlan_scan_dump_-b)
** iw wlan0 scan dump -u (filename: 2014-12-29-wlan_scan_dump_-u)
** iw list (filename: 2014-12-29-iw_list)

* While associating:
** tcpdump -i moni0 -s0 -w /tmp/assoc.pcap  (Note, I changed 'wlan0' to 'moni0'; I hope that's what was intended. If you need a pcap of wlan0, let me know. Filename: 2014-12-29-assoc-moni0.pcap)
** iw event -t -f (Filename: 2014-12-29-iw_event_-t_-f)

I've also attached the output of syslog during this period - it is in the file named '2014-12-29-syslog'.

During this test, it did successfully associate to the AP once; I disconnected it, and then let it fail a number of times. Also, it's worth noting that the same card works fine under Windows.

Here's the details on the hardware and software in use:

Access point side:
* Cisco WLC 2504 controller; software version 7.6.100.0
* Cisco 3702i lightweight access point; software version 7.6.100.0 (as loaded by the controller)

There are multiple SSIDs set up on the AP; each SSID is enabled for both 2.4ghz and 5ghz, and Band Select is enabled for the SSIDs (the AP will try to push clients to the 5ghz band.) The SSID I did the testing against is 'IB-test-alt', which is identical to 'IB-test' but with a different PSK (just in case the PSK leaks in any log files/etc.) The behavior between IB-test-alt and IB-test is the same.  I can provide many more details on the configuration of the WLC if needed. (Also, just to note - if this is not purely a kernel issue, but is a defect on Cisco's side, I can open a TAC case to try to have them resolve it - but it'd be great if we can get a kernel-side workaround too.)

Client side:

* Dell Vostro 3550
* Intel 7260 802.11ac card
* Ubuntu 12.04 with all updates
* Ubuntu's 3.11.0-17-generic kernel (see the other ticket -- on my laptop I tried many varieties of kernels, and also tried the compat wireless drivers from linux-next with the same behavior. If you want me to test other kernels on the test laptop, let me know.)

Here's details on the driver from startup:
Feb 19 13:28:05 nc-vostro kernel: [  584.675656] Copyright(c) 2003-2013 Intel Corporation
Feb 19 13:28:05 nc-vostro kernel: [  584.675805] iwlwifi 0000:09:00.0: irq 51 for MSI/MSI-X
Feb 19 13:28:05 nc-vostro kernel: [  584.677615] iwlwifi 0000:09:00.0: loaded firmware version 22.1.7.0 op_mode iwlmvm
Feb 19 13:28:05 nc-vostro kernel: [  584.714969] iwlwifi 0000:09:00.0: Detected Intel(R) Dual Band Wireless AC 7260, REV=0x144
Feb 19 13:28:05 nc-vostro kernel: [  584.715031] iwlwifi 0000:09:00.0: L1 Disabled; Enabling L0S
Feb 19 13:28:05 nc-vostro kernel: [  584.715175] iwlwifi 0000:09:00.0: RF_KILL bit toggled to disable radio.
Feb 19 13:28:05 nc-vostro kernel: [  584.715235] iwlwifi 0000:09:00.0: L1 Disabled; Enabling L0S

Please let me know if there is any more information I can provide, or more testing you'd like me to do!

(It appears I can only do one attachment at a time, so I've attached a tarball containing the files listed above.)
Comment 1 Johannes Berg 2014-02-20 07:19:40 UTC
Sorry, yes, I must have not paid attention - I meant tcpdump of moni0 of course, thanks!

I looked at this for a bit and don't yet see the issue, the AP seems to correctly and consistently advertise HT/VHT with an 80MHz channel.
Comment 2 Johannes Berg 2014-02-20 07:23:46 UTC
I have feeling this is a regdomain issue, since I see nothing really wrong with the AP information.

Could you disable CRDA for another test? The easiest way to do that is probably something like

 sudo mv /sbin/crda /sbin/crda.disabled

or so.

My hunch is that without CRDA, the regdomain allows 80 MHz, but with CRDA updating for the US database you get rules that don't allow 80 MHz, and then it disconnects. Which arguably it shouldn't, so likely my commit ddfe49b42d8ad4bfdf92d63d4a74f162660d878d ("mac80211: continue using disabled channels while connected") was incomplete.
Comment 3 Nate Carlson 2014-02-21 00:07:42 UTC
Looks like you got it!

Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Auto-activating connection 'IB-test-alt'.
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) starting connection 'IB-test-alt'
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> (wlan0): device state change: disconnected -> prepare (reason 'none') [30 40 0]
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) scheduled...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) started...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) scheduled...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) complete.
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) starting...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> (wlan0): device state change: prepare -> config (reason 'none') [40 50 0]
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0/wireless): access point 'IB-test-alt' has security, but secrets are required.
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> (wlan0): device state change: config -> need-auth (reason 'none') [50 60 0]
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete.
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) scheduled...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) started...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> (wlan0): device state change: need-auth -> prepare (reason 'none') [60 40 0]
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) scheduled...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 1 of 5 (Device Prepare) complete.
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) starting...
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> (wlan0): device state change: prepare -> config (reason 'none') [40 50 0]
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0/wireless): connection 'IB-test-alt' has security, and secrets exist.  No new secrets needed.
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Config: added 'ssid' value 'IB-test-alt'
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Config: added 'scan_ssid' value '1'
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Config: added 'key_mgmt' value 'WPA-PSK'
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Config: added 'psk' value '<omitted>'
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 2 of 5 (Device Configure) complete.
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> Config: set interface ap_scan to 1
Feb 20 17:24:32 nc-vostro NetworkManager[921]: <info> (wlan0): supplicant interface state: disconnected -> scanning
Feb 20 17:24:36 nc-vostro wpa_supplicant[4049]: Trying to authenticate with 18:e7:28:42:5e:2b (SSID='IB-test-alt' freq=2412 MHz)
Feb 20 17:24:36 nc-vostro kernel: [101290.829939] wlan0: authenticate with 18:e7:28:42:5e:2b
Feb 20 17:24:36 nc-vostro kernel: [101290.831134] wlan0: direct probe to 18:e7:28:42:5e:2b (try 1/3)
Feb 20 17:24:36 nc-vostro NetworkManager[921]: <info> (wlan0): supplicant interface state: scanning -> authenticating
Feb 20 17:24:36 nc-vostro kernel: [101291.032918] wlan0: direct probe to 18:e7:28:42:5e:2b (try 2/3)
Feb 20 17:24:36 nc-vostro kernel: [101291.236934] wlan0: direct probe to 18:e7:28:42:5e:2b (try 3/3)
Feb 20 17:24:36 nc-vostro kernel: [101291.440535] wlan0: authentication with 18:e7:28:42:5e:2b timed out
Feb 20 17:24:37 nc-vostro wpa_supplicant[4049]: Trying to authenticate with 18:e7:28:42:5e:24 (SSID='IB-test-alt' freq=5320 MHz)
Feb 20 17:24:37 nc-vostro wpa_supplicant[4049]: Trying to associate with 18:e7:28:42:5e:24 (SSID='IB-test-alt' freq=5320 MHz)
Feb 20 17:24:37 nc-vostro kernel: [101291.602277] wlan0: authenticate with 18:e7:28:42:5e:24
Feb 20 17:24:37 nc-vostro kernel: [101291.603574] wlan0: send auth to 18:e7:28:42:5e:24 (try 1/3)
Feb 20 17:24:37 nc-vostro kernel: [101291.604174] wlan0: authenticated
Feb 20 17:24:37 nc-vostro kernel: [101291.604772] wlan0: associate with 18:e7:28:42:5e:24 (try 1/3)
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> (wlan0): supplicant interface state: authenticating -> associating
Feb 20 17:24:37 nc-vostro wpa_supplicant[4049]: Associated with 18:e7:28:42:5e:24
Feb 20 17:24:37 nc-vostro kernel: [101291.605933] wlan0: RX AssocResp from 18:e7:28:42:5e:24 (capab=0x111 status=0 aid=1)
Feb 20 17:24:37 nc-vostro kernel: [101291.607029] wlan0: associated
Feb 20 17:24:37 nc-vostro kernel: [101291.607092] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Feb 20 17:24:37 nc-vostro kernel: [101291.607276] cfg80211: Calling CRDA for country: US
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> (wlan0): supplicant interface state: associating -> 4-way handshake
Feb 20 17:24:37 nc-vostro udevd[13550]: failed to execute '/sbin/crda' '/sbin/crda': No such file or directory
Feb 20 17:24:37 nc-vostro wpa_supplicant[4049]: WPA: Key negotiation completed with 18:e7:28:42:5e:24 [PTK=CCMP GTK=CCMP]
Feb 20 17:24:37 nc-vostro wpa_supplicant[4049]: CTRL-EVENT-CONNECTED - Connection to 18:e7:28:42:5e:24 completed (reauth) [id=0 id_str=]
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> (wlan0): supplicant interface state: 4-way handshake -> completed
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> Activation (wlan0/wireless) Stage 2 of 5 (Device Configure) successful.  Connected to wireless network 'IB-test-alt'.
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 3 of 5 (IP Configure Start) scheduled.
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 3 of 5 (IP Configure Start) started...
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> (wlan0): device state change: config -> ip-config (reason 'none') [50 70 0]
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Beginning DHCPv4 transaction (timeout in 45 seconds)
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> dhclient started with pid 13551
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Beginning IP6 addrconf.
Feb 20 17:24:37 nc-vostro dhclient: Internet Systems Consortium DHCP Client 4.1-ESV-R4
Feb 20 17:24:37 nc-vostro dhclient: Copyright 2004-2011 Internet Systems Consortium.
Feb 20 17:24:37 nc-vostro dhclient: All rights reserved.
Feb 20 17:24:37 nc-vostro dhclient: For info, please visit https://www.isc.org/software/dhcp/
Feb 20 17:24:37 nc-vostro dhclient: 
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> Activation (wlan0) Stage 3 of 5 (IP Configure Start) complete.
Feb 20 17:24:37 nc-vostro NetworkManager[921]: <info> (wlan0): DHCPv4 state changed nbi -> preinit
Feb 20 17:24:37 nc-vostro dhclient: Listening on LPF/wlan0/fc:f8:ae:6d:12:62
Feb 20 17:24:37 nc-vostro dhclient: Sending on   LPF/wlan0/fc:f8:ae:6d:12:62
Feb 20 17:24:37 nc-vostro dhclient: Sending on   Socket/fallback
Feb 20 17:24:37 nc-vostro dhclient: DHCPDISCOVER on wlan0 to 255.255.255.255 port 67 interval 3
Feb 20 17:24:38 nc-vostro avahi-daemon[712]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::fef8:aeff:fe6d:1262.
Feb 20 17:24:38 nc-vostro avahi-daemon[712]: New relevant interface wlan0.IPv6 for mDNS.
Feb 20 17:24:38 nc-vostro avahi-daemon[712]: Registering new address record for fe80::fef8:aeff:fe6d:1262 on wlan0.*.
Feb 20 17:24:40 nc-vostro dhclient: DHCPDISCOVER on wlan0 to 255.255.255.255 port 67 interval 5
Feb 20 17:24:45 nc-vostro dhclient: DHCPDISCOVER on wlan0 to 255.255.255.255 port 67 interval 9
Feb 20 17:24:45 nc-vostro dhclient: DHCPREQUEST of 10.0.7.202 on wlan0 to 255.255.255.255 port 67
Feb 20 17:24:45 nc-vostro dhclient: DHCPOFFER of 10.0.7.202 from 10.0.6.254
Feb 20 17:24:45 nc-vostro dhclient: DHCPACK of 10.0.7.202 from 10.0.6.254
Feb 20 17:24:45 nc-vostro dhclient: bound to 10.0.7.202 -- renewal in 42215 seconds.


No errors/etc. I then moved crda back, and tried reconnecting - same error as before.

I'm currently running crda version 1.1.2-1ubuntu1 (as shipped with Ubuntu Precise), if that makes a difference - but from your comments it sounds like it's likely an issue with the driver, and not crda?
Comment 4 Nate Carlson 2014-02-21 16:46:06 UTC
@Johannes -- I re-read what you said above: "My hunch is that without CRDA, the regdomain allows 80 MHz, but with CRDA updating for the US database you get rules that don't allow 80 MHz, and then it disconnects."

..and then realized that crda must get the regulatory info from somewhere, and found the pages on the wireless-regdb stuff. Short version of the story - I updated to regdb 2013.11.27, and everything connects fine with CRDA enabled. Hurray!

So I guess this is still a valid bug in the sense that a working connection breaks when CRDA gets called, which is what I believe you were alluding to when you said "likely my commit ddfe49b42d8ad4bfdf92d63d4a74f162660d878d ("mac80211: continue using disabled channels while connected") was incomplete."

From my point of view on how to 'fix' this, it'd be great if there was a way to:
* Keep the connection up (even if it involved forcing negotiations down to whatever the regulatory domain does allow -- although that could be a crazy amount of work, I really don't know the ins and outs of wireless link negotiations!)
* If possible, check to see if the reason for the link issue is indeed a regulatory issue, and if so, log a message that very clearly states that. If that isn't possible (ie, this codepath doesn't have any knowledge of what regulatory domain is in place, just what channels it is allowed to use), maybe log the error, along with a note that there _could_ be a regulatory issue?

I do think that it would be important to try to make it clear in the logs that the link is attempting to do something that is outside of what is allowed by the loaded regulatory restrictions, even if the wireless link does stay up.. otherwise it's hard to tell what the issue could be!

Appreciate it!
Comment 5 Johannes Berg 2014-02-21 18:47:50 UTC
Hi, sorry for the delay, timezone seems to work against us.

Right, so what you did confirms my hunch, and you're also correct in that what I meant by "my commit ddfe... was incomplete" - it seems it should allow the connection to be used if the original regdomain let us. There's no down-negotiation involved, I think we should just rely on the AP 'knowing better' (although sometimes they really don't, but hey... at least we're not the ones initiating the breakage)

The problem with the logging is that there are so many different reasons for this to go wrong and most aren't related to regulatory, that I'm not sure we can distinguish this correctly. What might be useful would be to print out some regulatory information when we connect/in this case so you could compare that with a single glance, I'll have to think about that.
Comment 6 Nate Carlson 2014-02-21 19:07:29 UTC
No worries on delay!

I totally understand what you mean about too many different ways it can go wrong - that's kind of what I was guessing, but never hurts to ask. Regulatory info when connecting (or erroring) would help for sure; but totally understand if it doesn't make sense to add it to the output (more noise in an already verbose log); at the minimum, hopefully people getting the same error I did will find this ticket as one of the possibilities for the error when searching for it.  ;)

Relying on the AP to be correct sounds good to me! Do you happen to have a rough timeframe of when you may be able to look at what's missing from the patch? If it will be in a relatively short amount of time (next week or two), I'll keep my test machine set up for this; otherwise, when there is something to be tested, I can always set up another one.

Thanks!

-Nate
Comment 7 Johannes Berg 2014-02-21 19:09:40 UTC
I'm taking a look at it now, but it's difficult to tell the difference between getting disabled for regulatory and other reasons ... but it shouldn't take a massive amount of time. However, I can probably easily set up a test case for this as well, now that we know what's going on.
Comment 8 Johannes Berg 2014-02-21 19:13:03 UTC
Hm, here's something I'd be interested in. You get this message:

        sdata_info(sdata,
                   "AP %pM changed bandwidth, new config is %d MHz, width %d (%d/%d MHz)\n",
                   ifmgd->bssid, chandef.chan->center_freq, chandef.width,
                   chandef.center_freq1, chandef.center_freq2);

(I hope I got that right)

maybe you could print out the old chandef, say by adding

        sdata_info(sdata,
                   "old config is %d MHz, width %d (%d/%d MHz)\n",
                   sdata->vif.bss_conf.chandef.chan->center_freq,
                   sdata->vif.bss_conf.chandef.width,
                   sdata->vif.bss_conf.chandef.center_freq1,
                   sdata->vif.bss_conf.chandef.center_freq2);

immediately after the other message?

That might help me set up this scenario. The part about changing confuses me a little bit.
Comment 9 Johannes Berg 2014-02-21 19:15:28 UTC
Oh, no, never mind. I get it. The 'chandef' that's printed out is width=2 (40 MHz) while the one when connecting was 80 MHz. Forget it, sorry!
Comment 10 Nate Carlson 2014-02-21 19:19:25 UTC
OK; let me know if you do run into something you want me to test/enable debug on/whatever! :)
Comment 11 Johannes Berg 2014-02-21 19:33:34 UTC
Try this patch please: http://p.sipsolutions.net/56ad5637d91e130e.txt
Comment 12 Johannes Berg 2014-02-21 19:33:58 UTC
(with the old regulatory database, obviously)
Comment 13 Nate Carlson 2014-02-21 21:57:52 UTC
On my primary laptop (the one where I did my initial testing), I:
* Built the wireless drivers from linux-next compat without any patches, and validated that I still get the old error when connecting
* Applied patch 56ad5637d91e130e.txt from above, rebuilt the drivers, removed and reloaded them
* Tried connecting again - and it seems to work without issues!

I'll keep running on the driver with this patch to ensure that I don't see any other oddness, but so far so good!
Comment 14 Johannes Berg 2014-02-24 09:12:34 UTC
Ok, thanks. I'll submit this patch through the regular channels.