Bug 16462 - unable to connect to hidden SSID AP on legal channel 13
Summary: unable to connect to hidden SSID AP on legal channel 13
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Johannes Berg
URL:
Keywords:
: 16619 (view as bug list)
Depends on:
Blocks: 16055
  Show dependency tree
 
Reported: 2010-07-25 17:06 UTC by Daniel J Blueman
Modified: 2010-10-11 20:13 UTC (History)
8 users (show)

See Also:
Kernel Version: 2.6.35-rc6
Tree: Mainline
Regression: Yes


Attachments
debug attaching to AP on channel 11 (success) (136.68 KB, text/plain)
2010-07-27 13:51 UTC, Daniel J Blueman
Details
debug attaching to AP on channel 13 (failure) (237.81 KB, text/plain)
2010-07-27 13:52 UTC, Daniel J Blueman
Details
Trace from attaching to channel 13 AP with 2.6.35 (534.05 KB, application/x-bzip)
2010-08-02 15:02 UTC, Daniel J Blueman
Details
debug attaching to AP on channel 13 (failure) (98.96 KB, text/plain)
2010-08-02 15:37 UTC, Daniel J Blueman
Details
Trace from attaching to channel 13 AP with 2.6.35 (530.62 KB, application/x-bzip)
2010-08-02 15:40 UTC, Daniel J Blueman
Details
test script (400 bytes, application/octet-stream)
2010-08-06 18:44 UTC, Aaron
Details
trace and dmesg logs with failing and working AP setup (596.32 KB, application/x-compressed-tar)
2010-08-19 14:13 UTC, andreas.neiser
Details
patch for 2.6.35.2 (11.73 KB, patch)
2010-08-20 08:46 UTC, Johannes Berg
Details | Diff
failing AP config for ath9k (41.32 KB, application/octet-stream)
2010-09-28 17:18 UTC, Daniel J Blueman
Details
patch to enable probe-after-rx on 2.4 GHz (4.34 KB, patch)
2010-09-29 08:24 UTC, Johannes Berg
Details | Diff

Description Daniel J Blueman 2010-07-25 17:06:11 UTC
My Intel 5300AGN hardware is unable to connect access points running on channel 12 or 13, though these channels are available according to various subsystems, and legal.

In one UK-purchased AP I have control over, changing the channel to 11 allows the 5300AGN to connect, moving it back to 12 or 13 prevents connection (though it is always visible in the scan and the channel is reported correctly). It transmits the correct GB country IE also.

The laptop is a Dell Studio 15 model 1557 purchased in the UK, with the iwlagn EEPROM set to a GB regdomain all as expected. Kernel is stock mainline; userspace is stock Ubuntu 10.10 pre-release, ie wireless-crda 1.12 and wireless-tools 30~pre9.

Relevant dmesg output:

iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:

iwlagn: Copyright(c) 2003-2010 Intel Corporation

iwlagn 0000:05:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17

iwlagn 0000:05:00.0: setting latency timer to 64

iwlagn 0000:05:00.0: Detected Intel(R) Ultimate N WiFi Link 5300 AGN, REV=0x24

iwlagn 0000:05:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels

iwlagn 0000:05:00.0: irq 54 for MSI/MSI-X

iwlagn 0000:05:00.0: loaded firmware version 8.24.2.12

phy0: Selected rate control algorithm 'iwl-agn-rs'

cfg80211: Calling CRDA for country: GB

cfg80211: Received country IE:

cfg80211: Regulatory domain: GB

    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)

    (2402000 KHz - 2494000 KHz @ 40000 KHz), (10000 mBi, 1600 mBm)   

cfg80211: CRDA thinks this should applied:

cfg80211: Regulatory domain: GB

    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)

    (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)

    (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)

    (5250000 KHz - 5330000 KHz @ 40000 KHz), (N/A, 2000 mBm)

    (5490000 KHz - 5710000 KHz @ 40000 KHz), (N/A, 2700 mBm)

cfg80211: We intersect both of these and get:

cfg80211: Regulatory domain: 98

    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)

    (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 1600 mBm)

cfg80211: Leaving channel 5180 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5200 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5220 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5240 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5260 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5280 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5300 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5320 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5500 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5520 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5540 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5560 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5580 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5600 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5620 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5640 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5660 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5680 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5700 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5745 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5765 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5785 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5805 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Leaving channel 5825 MHz intact on phy0 - no rule found in band on Country IE

cfg80211: Current regulatory domain updated by AP to: GB

    (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)

    (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 1600 mBm)


$ iw reg get
country 98:
	(2402 - 2482 @ 40), (N/A, 16)

$ iw list
Wiphy phy0
	Band 1:
		Capabilities: 0x872
			HT20/HT40
			Static SM Power Save
			RX Greenfield
			RX HT20 SGI
			RX HT40 SGI
			No RX STBC
			Max AMSDU length: 3839 bytes
			No DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: 4 usec (0x05)
		HT TX/RX MCS rate indexes supported: 0-23, 32
		Frequencies:
			* 2412 MHz [1] (15.0 dBm)
			* 2417 MHz [2] (15.0 dBm)
			* 2422 MHz [3] (15.0 dBm)
			* 2427 MHz [4] (15.0 dBm)
			* 2432 MHz [5] (15.0 dBm)
			* 2437 MHz [6] (15.0 dBm)
			* 2442 MHz [7] (15.0 dBm)
			* 2447 MHz [8] (15.0 dBm)
			* 2452 MHz [9] (15.0 dBm)
			* 2457 MHz [10] (15.0 dBm)
			* 2462 MHz [11] (15.0 dBm)
			* 2467 MHz [12] (15.0 dBm) (passive scanning, no IBSS)
			* 2472 MHz [13] (15.0 dBm) (passive scanning, no IBSS)
		Bitrates (non-HT):
			* 1.0 Mbps
			* 2.0 Mbps (short preamble supported)
			* 5.5 Mbps (short preamble supported)
			* 11.0 Mbps (short preamble supported)
			* 6.0 Mbps
			* 9.0 Mbps
			* 12.0 Mbps
			* 18.0 Mbps
			* 24.0 Mbps
			* 36.0 Mbps
			* 48.0 Mbps
			* 54.0 Mbps
	Band 2:
		Capabilities: 0x872
			HT20/HT40
			Static SM Power Save
			RX Greenfield
			RX HT20 SGI
			RX HT40 SGI
			No RX STBC
			Max AMSDU length: 3839 bytes
			No DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: 4 usec (0x05)
		HT TX/RX MCS rate indexes supported: 0-23, 32
		Frequencies:
			* 5180 MHz [36] (15.0 dBm) (passive scanning, no IBSS)
			* 5200 MHz [40] (15.0 dBm) (passive scanning, no IBSS)
			* 5220 MHz [44] (15.0 dBm) (passive scanning, no IBSS)
			* 5240 MHz [48] (15.0 dBm) (passive scanning, no IBSS)
			* 5260 MHz [52] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5280 MHz [56] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5300 MHz [60] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5320 MHz [64] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5500 MHz [100] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5520 MHz [104] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5540 MHz [108] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5560 MHz [112] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5580 MHz [116] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5600 MHz [120] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5620 MHz [124] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5640 MHz [128] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5660 MHz [132] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5680 MHz [136] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5700 MHz [140] (15.0 dBm) (passive scanning, no IBSS, radar detection)
			* 5745 MHz [149] (15.0 dBm) (passive scanning, no IBSS)
			* 5765 MHz [153] (15.0 dBm) (passive scanning, no IBSS)
			* 5785 MHz [157] (15.0 dBm) (passive scanning, no IBSS)
			* 5805 MHz [161] (15.0 dBm) (passive scanning, no IBSS)
			* 5825 MHz [165] (15.0 dBm) (passive scanning, no IBSS)
		Bitrates (non-HT):
			* 6.0 Mbps
			* 9.0 Mbps
			* 12.0 Mbps
			* 18.0 Mbps
			* 24.0 Mbps
			* 36.0 Mbps
			* 48.0 Mbps
			* 54.0 Mbps
	max # scan SSIDs: 20
	Supported interface modes:
		 * IBSS
		 * managed
		 * monitor
	Supported commands:
		 * new_interface
		 * set_interface
		 * new_key
		 * new_beacon
		 * new_station
		 * new_mpath
		 * set_mesh_params
		 * set_bss
		 * authenticate
		 * associate
		 * deauthenticate
		 * disassociate
		 * join_ibss
		 * Unknown command (55)
		 * Unknown command (57)
		 * Unknown command (59)
		 * set_wiphy_netns
		 * Unknown command (65)
		 * connect
		 * disconnect

Other wireless devices I have associate as expected with the AP on channel eg 13, however I see repeated:

wpa_supplicant[1438]: Trying to associate with 00:02:cf:ea:2b:e0 (SSID='satellite' freq=2472 MHz)
NetworkManager: <info>  (wlan0): supplicant connection state:  scanning -> associating
kernel: [29114.994159] wlan0: direct probe to 00:02:cf:ea:2b:e0 (try 1)
kernel: [29115.186137] wlan0: direct probe to 00:02:cf:ea:2b:e0 (try 2)
kernel: [29115.385942] wlan0: direct probe to 00:02:cf:ea:2b:e0 (try 3)
kernel: [29115.588242] wlan0: direct probe to 00:02:cf:ea:2b:e0 timed out
wpa_supplicant[1438]: Authentication with 00:02:cf:ea:2b:e0 timed out.   

What further information is useful here?
Comment 1 John W. Linville 2010-07-26 13:28:36 UTC
Based on the provided logs, I would say this is _not_ a case of any regulatory enforcement by the kernel -- it appears to be changing to the requested channel and simply not authenticating.  Hopefully the iwlwifi people have some ideas?
Comment 2 wey-yi.w.guy 2010-07-26 14:32:24 UTC
we will look into this, we usually don't test channel 12/13 on our regular testcases, but we do have one AP can beacon on channel 12/13 and we will try to re-pro the problem here.

Thanks
Wey
Comment 3 Daniel J Blueman 2010-07-26 14:38:36 UTC
Thanks Wey. To simplify matters, I can reproduce the issue with this kernel:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.35-rc6-maverick/linux-image-2.6.35-020635rc6-generic_2.6.35-020635rc6_amd64.deb

If it fails to reproduce there, I can quickly gather debugging as you prefer.
Comment 4 wey-yi.w.guy 2010-07-26 14:41:30 UTC
yes and please, if you could run load the module with debug=0x43fff, then send us the log, it will be very helpful for us to understand what the problem is.

Thanks
Wey
Comment 5 Daniel J Blueman 2010-07-27 13:49:59 UTC
Ok, comparing the debug goodness from the AP on channel 11 (working) and channel 13 (failing), it is clear that in the latter, iwlagn is sending commands to the MAC controller with channel 1 specified, ie:

* without RXON_FILTER_ASSOC_MSK
* channel = 1
* bssid = 00:00:00:00:00:00
iwl_print_rx_config_cmd u16 channel: 0x1

This directly explains why we see the timeouts, as the radio is tuned to the wrong channel.

Attaching full logs in working and non-working case.
Comment 6 Daniel J Blueman 2010-07-27 13:51:32 UTC
Created attachment 27268 [details]
debug attaching to AP on channel 11 (success)
Comment 7 Daniel J Blueman 2010-07-27 13:52:07 UTC
Created attachment 27269 [details]
debug attaching to AP on channel 13 (failure)
Comment 8 wey-yi.w.guy 2010-07-27 14:12:44 UTC
the following are normal behavior and you ca nsee on both success and failure case:

* without RXON_FILTER_ASSOC_MSK
* channel = 1
* bssid = 00:00:00:00:00:00
iwl_print_rx_config_cmd u16 channel: 0x1

Thanks for the log, we are working on it.

Wey
Comment 9 Daniel J Blueman 2010-07-27 14:26:39 UTC
Indeed these messages are normal and expected behaviour, but in the working case (AP on channel 11), we consistently see channel 11 being specified and in the failing case (AP on channel 13), we consistently see channel 1 being specified.

Thanks for your help guys! All the better if the fix for this regression is suitable for inclusion into 2.6.35-final, as the impact is significant in many countries.
Comment 10 wey-yi.w.guy 2010-07-27 14:29:12 UTC
Hi Daniel,

Thanks for the help, it is good information and wea are looking into it now and we might need your help for testing once we have the the fix for it.

Wey
Comment 11 Johannes Berg 2010-07-30 07:09:06 UTC
I was able to reproduce this once, but then not again. Any idea why that may be?

FWIW, going back to channel 1 is normal, but it should still go to channel 13 obviously. This puts a different light on it though, I was suspecting a bug elsewhere.

Daniel, can you try to gather tracing data? To do that see if you can compile iwlwifi with CONFIG_IWLWIFI_DEVICE_TRACING, then obtain trace-cmd from git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git and run "trace-cmd record -e iwlwifi" as root/sudo.
Comment 12 Daniel J Blueman 2010-07-30 08:30:05 UTC
Thanks Johannes. I no longer have access to an AP with channels 12 and 13 to test with, so was trying to run my Debian AP at home on channel 13.

I'll try to get around the world regulatory domain set in it's EEPROM (despite being purchased in the UK), preventing legal use of these channels. If I succeed, I'll grab the trace. Intel will be trying to reproduce this anyway.

Note that the interesting thing with channel 1 being erroneously used, rather than 13, is: 13 % len(US channels) == 1.
Comment 13 Johannes Berg 2010-07-30 10:55:22 UTC
Yeah, in case it wasn't clear, I do work for Intel, but now I can no longer reproduce this for some reason even with the same hardware and microcode you use. :-) Will have to go back to your kernel version too I guess.
Comment 14 Johannes Berg 2010-07-30 11:39:59 UTC
Hm. I can't even reproduce with that kernel :-(
Comment 15 Daniel J Blueman 2010-08-02 15:01:24 UTC
I reproduced this against my ath9k AP on channel 13 with 2.6.35-final.

Trace attached; let me know if you need further testing/information.
Comment 16 Daniel J Blueman 2010-08-02 15:02:28 UTC
Created attachment 27316 [details]
Trace from attaching to channel 13 AP with 2.6.35
Comment 17 Johannes Berg 2010-08-02 15:06:16 UTC
Thanks. I see no evidence of it even trying to connect in this log, which is very odd. It did print out the "timed out" messages again, right?
Comment 18 Daniel J Blueman 2010-08-02 15:37:45 UTC
Created attachment 27317 [details]
debug attaching to AP on channel 13 (failure)
Comment 19 Daniel J Blueman 2010-08-02 15:40:00 UTC
Created attachment 27318 [details]
Trace from attaching to channel 13 AP with 2.6.35

updated trace and iwlagn debug messages, captured at the time of tracing
Comment 20 Daniel J Blueman 2010-08-04 14:19:27 UTC
Johannes: thanks for your time so far. Do the last attachments provide the information you're looking for?

Just let me know if you need a different pattern of debug or capture etc.

Thanks, Daniel
Comment 21 Johannes Berg 2010-08-06 10:02:24 UTC
I can't quite figure out the issue, and I can only reproduce it exactly once every day :-(

I have a vague feeling this could be related to the RTS/CTS mess, could you try this change?

--- a/drivers/net/wireless/iwlwifi/iwl-agn-hcmd.c
+++ b/drivers/net/wireless/iwlwifi/iwl-agn-hcmd.c
@@ -212,7 +212,11 @@ static void iwlagn_chain_noise_reset(struct iwl_priv *priv)
 static void iwlagn_rts_tx_cmd_flag(struct ieee80211_tx_info *info,
 			__le32 *tx_flags)
 {
-	*tx_flags |= TX_CMD_FLG_RTS_CTS_MSK;
+	if ((info->control.rates[0].flags & IEEE80211_TX_RC_USE_RTS_CTS) ||
+	    (info->control.rates[0].flags & IEEE80211_TX_RC_USE_CTS_PROTECT))
+		*tx_flags |= TX_CMD_FLG_RTS_CTS_MSK;
+	else
+		*tx_flags &= ~TX_CMD_FLG_RTS_CTS_MSK;
 }
 
 /* Calc max signal level (dBm) among 3 possible receivers */
Comment 22 Daniel J Blueman 2010-08-06 10:24:25 UTC
Hi Johannes, this patch doesn't help. The debug still shows the RX_CONFIG_COMMAND as having channel 1 in it's payload, so the radio will be incorrectly listening on that channel for a reply from the AP on channel 13.
Comment 23 Johannes Berg 2010-08-06 10:35:17 UTC
Thanks for testing.

I think you mean RXON? But I'm still not convinced that is the error, since I do see it go to channel 13 for the assoc. And it will normally go to channel 1 again after trying (which isn't really necessary, but still).

BTW, your second trace data doesn't contain the association sequence, it only seems to contain scanning?
Comment 24 Johannes Berg 2010-08-06 10:37:15 UTC
Oh also, did this work on 2.6.34, if you ever tried? We occasionally seem to be running into a similar bug on 5 GHz channels after going from 2.6.34 to 2.6.35.
Comment 25 Johannes Berg 2010-08-06 13:08:24 UTC
Still poking at straws ... but I found another stupid bug in the driver, can you try this patch?


diff --git a/drivers/net/wireless/iwlwifi/iwl-power.c b/drivers/net/wireless/iwlwifi/iwl-power.c
index da1f2ae..a444d0e 100644
--- a/drivers/net/wireless/iwlwifi/iwl-power.c
+++ b/drivers/net/wireless/iwlwifi/iwl-power.c
@@ -284,13 +284,12 @@ int iwl_power_update_mode(struct iwl_priv *priv, bool force)
 		 priv->hw->conf.flags & IEEE80211_CONF_IDLE)
 		iwl_static_sleep_cmd(priv, &cmd, IWL_POWER_INDEX_5, 20);
 	else if (priv->cfg->ops->lib->tt_ops.lower_power_detection &&
-		 priv->cfg->ops->lib->tt_ops.tt_power_mode) {
-		if (priv->cfg->ops->lib->tt_ops.lower_power_detection(priv)) {
-			/* in thermal throttling low power state */
-			iwl_static_sleep_cmd(priv, &cmd,
-			    priv->cfg->ops->lib->tt_ops.tt_power_mode(priv),
-			    dtimper);
-		}
+		 priv->cfg->ops->lib->tt_ops.tt_power_mode &&
+		 priv->cfg->ops->lib->tt_ops.lower_power_detection(priv)) {
+		/* in thermal throttling low power state */
+		iwl_static_sleep_cmd(priv, &cmd,
+			priv->cfg->ops->lib->tt_ops.tt_power_mode(priv),
+			dtimper);
 	} else if (!enabled)
 		iwl_power_sleep_cam_cmd(priv, &cmd);
 	else if (priv->power_data.debug_sleep_level_override >= 0)
Comment 26 Johannes Berg 2010-08-06 13:10:11 UTC
Oh wait, I think that's not relevant to 2.6.35?
Comment 27 Daniel J Blueman 2010-08-06 14:03:10 UTC
Yes, I meant RXON. Trying 2.6.34, this does not work either.

There is no member 'lower_power_detection', so indeed I can't apply this patch against 2.6.35.

The trace and dmesg output were captured together; this suggests the MAC wasn't sent a command perhaps? Or should I capture both again?

U iwl_mac_config enter to channel 13 changed 0x100
U iwl_power_sleep_cam_cmd Sleep command for CAM
U iwl_mac_config Not re-sending same RXON configuration.
U iwl_mac_config leave
wlan0: direct probe to 00:25:d3:5e:ef:62 (try 1)
ieee80211 phy1: I iwl_mac_tx enter
ieee80211 phy1: I iwl_mac_tx leave
wlan0: direct probe to 00:25:d3:5e:ef:62 (try 2)
ieee80211 phy1: I iwl_mac_tx enter
ieee80211 phy1: I iwl_mac_tx leave
wlan0: direct probe to 00:25:d3:5e:ef:62 (try 3)
ieee80211 phy1: I iwl_mac_tx enter
ieee80211 phy1: I iwl_mac_tx leave
wlan0: direct probe to 00:25:d3:5e:ef:62 timed out
Comment 28 Aaron 2010-08-06 18:40:07 UTC
Unable to reproduce the problem consistently in the lab.

Configurations:
-----------------
Intel(R) Ultimate N WiFi Lin 5300 AGN, REV-0x24
Kernel: 2.6.35-rc6
uCode: 8.24.2.12
Platform: Calpella 
AP: CISCO 1200


Test
----------------
Repeatedly switch between two networks (one of them is on channel 12). See attached scripts. Enough wait time is given between commands.


Result
----------------

Ran for a 100 times. Only 2% of test runs failed to associate with unspecified errors. Reproducibility is very low.
Comment 29 Aaron 2010-08-06 18:44:05 UTC
Created attachment 27367 [details]
test script

Test script used to reproduce the issue. Result: cannot reproduce the issue (only 2% failures)
Comment 30 John W. Linville 2010-08-18 20:45:44 UTC
*** Bug 16619 has been marked as a duplicate of this bug. ***
Comment 31 andreas.neiser 2010-08-19 14:13:27 UTC
Created attachment 27510 [details]
trace and dmesg logs with failing and working AP setup

Hope this helps in debugging the problem!
After 4 tries the connection works if AP with Channel 3 is used, AP with channel 12 fails 4 out of 4 (so all "failing" files show failed attempts, I think). trace-cmd (see file trace.dat) was run during ./euro_loop.sh. dmesg with debug info supplied after each connection try. leading numbers under logs/ show which try of euro_loops.sh (which was modified for my needs). 
No WPA encryption was used. Please don't hesitate if you need further information.
Comment 32 wey-yi.w.guy 2010-08-19 14:34:03 UTC
Andreas, thanks for the information.
Aaron, could we re-run the test with euro_loop.sh script provided by Andreas?

Thanks
Wey
Comment 33 andreas.neiser 2010-08-19 14:51:35 UTC
@Wey,Aaron: I didn't change anything in euro_loop.sh concerning reproducing the bug. I just added some lines to collect the logs (and provided so you can see when the logs are created). Hopefully someone can analyze the trace.
Comment 34 wey-yi.w.guy 2010-08-19 16:36:49 UTC
Hi Andreas, 
looks like you have 2 APs
AP1: 00:21:29:b5:e9:7b, channel #3 
AP2: 00:04:0e:6b:80:e6, channel #12

all the success cases are connect to AP1, but never AP2. Correct? what if you change AP2 to different channel (say channel #6), and ran the same test, is that works?

Thanks
Wey
Comment 35 andreas.neiser 2010-08-19 16:49:43 UTC
Yes, exactly, I worked with 2 APs and AP1 was the working one. It's a bit tedious to setup the two APs again, do you really need that information? AFAIK it's quite obvious since when I change AP2 to channel 11 (the current setup), wireless works flawlessly (not one "probe" line in dmesg). Tell me if you need a definitive answer, then I will reconstruct the setup.
Comment 36 wey-yi.w.guy 2010-08-19 17:04:08 UTC
no problem, I just need to make sure it is the setting you have. Thanks for the help, I do not see anything stand-out in the log file, but I will try re-produce the issues here.

Wey
Comment 37 wey-yi.w.guy 2010-08-19 17:38:25 UTC
btw, what kernel version you are using?

Wey
Comment 38 Johannes Berg 2010-08-19 17:45:29 UTC
Can you please pick up this patch and see if it helps?

http://git.kernel.org/?p=linux/kernel/git/linville/wireless-2.6.git;a=patch;h=94597ab23ea10b3bdcba534be00a9f7b35791c07
Comment 39 andreas.neiser 2010-08-19 18:06:20 UTC
I'm using ArchLinux Kernel 2.6.35.2 (mostly vanilla sources), but the patch does not apply:

patching file drivers/net/wireless/iwlwifi/iwl-1000.c
Hunk #1 succeeded at 252 (offset -8 lines).
patching file drivers/net/wireless/iwlwifi/iwl-3945.c
Hunk #1 succeeded at 915 (offset 146 lines).
Hunk #2 succeeded at 2849 (offset 148 lines).
patching file drivers/net/wireless/iwlwifi/iwl-4965.c
Hunk #1 succeeded at 2178 (offset -45 lines).
patching file drivers/net/wireless/iwlwifi/iwl-5000.c
Hunk #1 succeeded at 489 (offset -17 lines).
Hunk #2 succeeded at 549 (offset 12 lines).
Hunk #3 succeeded at 611 (offset 14 lines).
Hunk #4 FAILED at 628.
Hunk #5 FAILED at 659.
2 out of 5 hunks FAILED -- saving rejects to file drivers/net/wireless/iwlwifi/iwl-5000.c.rej
patching file drivers/net/wireless/iwlwifi/iwl-6000.c
Hunk #1 succeeded at 431 (offset 50 lines).
Hunk #2 succeeded at 578 (offset 89 lines).
Hunk #3 succeeded at 650 (offset 87 lines).
Hunk #4 FAILED at 637.
Hunk #5 FAILED at 714.
Hunk #6 FAILED at 821.
Hunk #7 FAILED at 859.
Hunk #8 FAILED at 933.
5 out of 8 hunks FAILED -- saving rejects to file drivers/net/wireless/iwlwifi/iwl-6000.c.rej
patching file drivers/net/wireless/iwlwifi/iwl-agn-hcmd.c
Hunk #1 FAILED at 211.
Hunk #2 succeeded at 266 (offset -2 lines).
1 out of 2 hunks FAILED -- saving rejects to file drivers/net/wireless/iwlwifi/iwl-agn-hcmd.c.rej
patching file drivers/net/wireless/iwlwifi/iwl-agn-tx.c
Hunk #1 succeeded at 376 (offset -3 lines).
Hunk #2 succeeded at 450 with fuzz 2 (offset -3 lines).
patching file drivers/net/wireless/iwlwifi/iwl-agn.c
Hunk #1 succeeded at 200 (offset -2 lines).
Hunk #2 succeeded at 1577 (offset -34 lines).
Hunk #3 FAILED at 3421.
Hunk #4 FAILED at 3486.
2 out of 4 hunks FAILED -- saving rejects to file drivers/net/wireless/iwlwifi/iwl-agn.c.rej
patching file drivers/net/wireless/iwlwifi/iwl-core.c
Hunk #1 succeeded at 400 (offset -1 lines).
Hunk #2 succeeded at 1953 (offset 67 lines).
patching file drivers/net/wireless/iwlwifi/iwl-core.h
Hunk #1 succeeded at 102 (offset -2 lines).
Hunk #2 succeeded at 241 (offset -9 lines).
Hunk #3 succeeded at 310 (offset -9 lines).
Hunk #4 succeeded at 376 (offset -15 lines).
patching file drivers/net/wireless/iwlwifi/iwl3945-base.c
Hunk #1 succeeded at 434 (offset -1 lines).
Comment 40 wey-yi.w.guy 2010-08-19 19:08:16 UTC
if it's ok for you, you can get the latest build from
http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=summary
which has the patch already.

Wey
Comment 41 Daniel J Blueman 2010-08-19 20:37:13 UTC
Previously, I tested against APs on channel 13. Re-testing on channel 12, I can connect as expected on APs on channel 12. Tested with 2.6.34.4, 2.6.35.2 and 2.6.36-rc1. Thus the title of this bug is incorrect.

Thus the issue I experience is only with connecting to APs on channel 13. Johannes, were you testing against an AP on channel 12 or 13?
Comment 42 andreas.neiser 2010-08-19 20:44:34 UTC
Well, but for me it's channel 12 which is not working. I can test tomorrow if channel 13 works for me or not.
Comment 43 Johannes Berg 2010-08-20 07:40:38 UTC
Yeah, it's not consistent here. It typically reproduces exactly once in the morning for me, and after that starts working. I have no idea why :-)

I'll provide a patch against 2.6.35 since it needs to go there anyway.
Comment 44 andreas.neiser 2010-08-20 07:59:10 UTC
@Johannes: Maybe this helps in reproducing the bug:
Well, if I was connected to the WLAN successfully on channel 11, then change the AP to channel 12/13 and then reconnect, it works, but not flawlessly:

ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
wlan0: no IPv6 routers present
wlan0: deauthenticated from 00:04:0e:6b:80:e6 (Reason: 1)
iwlagn 0000:03:00.0: ACTIVATE a non DRIVER active station id 0 addr 00:04:0e:6b:80:e6
cfg80211: Calling CRDA to update world regulatory domain
wlan0: authenticate with 00:04:0e:6b:80:e6 (try 1)
wlan0: authenticate with 00:04:0e:6b:80:e6 (try 2)
wlan0: authenticate with 00:04:0e:6b:80:e6 (try 3)
wlan0: authentication with 00:04:0e:6b:80:e6 timed out
wlan0: direct probe to 00:04:0e:6b:80:e6 (try 1)
wlan0: direct probe to 00:04:0e:6b:80:e6 (try 2)
wlan0: direct probe responded
wlan0: authenticate with 00:04:0e:6b:80:e6 (try 1)
wlan0: authenticated
wlan0: associate with 00:04:0e:6b:80:e6 (try 1)
wlan0: associate with 00:04:0e:6b:80:e6 (try 2)
wlan0: RX AssocResp from 00:04:0e:6b:80:e6 (capab=0x451 status=0 aid=1)
wlan0: associated

You see that there are some "tries"...

BUT: If you do `rmmod iwlagn && modprobe iwlagn` and try to reconnect on channel 12 or 13 (@Daniel: both channels don't work for me), it fails constantly. After the failing tries on ch12/13, changing the AP to ch11 brings the wireless back working by just reconnecting (no reload of iwlagn needed).

It's good that you prepare a working patch against 2.6.35, I downloaded ~300 MB of wireless-testing and I don't know how to handle it. Is there a good howto how to compile just iwlagn and it's dependencies?
Comment 45 Johannes Berg 2010-08-20 08:26:38 UTC
Andreas, check out http://wireless.kernel.org/en/users/Download. Yesterday's release should work fine.
Comment 46 Johannes Berg 2010-08-20 08:46:17 UTC
Created attachment 27518 [details]
patch for 2.6.35.2

Please try this patch on 2.6.35.2.
Comment 47 andreas.neiser 2010-08-20 11:19:22 UTC
@Johannes: Well, now with your patch the channel 12 works (YES!), but when I switch to ch13, the WLAN is not discovered (`sudo iw wlan0 scan` doesn't show it). When I switch back from ch13 to ch12, I have to reload the module in order to see my ssid again!

Before the patch, the WLAN appeared on both channels (12&13) but a connection timed out. Do you need some debug info?
Comment 48 Johannes Berg 2010-08-20 11:35:40 UTC
Uh oh. But this doesn't change scanning, how often did you try to scan? Just once? Scanning on channels 12/13 isn't as reliable as on other channels since they're scanned passively.
Comment 49 Johannes Berg 2010-08-20 12:10:36 UTC
Also, it may be worthwhile to actually turn off your AP or your wireless for more than 15 seconds when you switch channel from 12 to 13 or vice versa, because otherwise I'm not sure what happens with the cached scan entries.
Comment 50 andreas.neiser 2010-08-21 22:36:15 UTC
Well, I tried the scan several times, and I'm quite convinced of the fact that ch13 does not even show up (as described). Unfortunately, I don't have time to concentrate on this problem further for now (in three weeks things look different, if the problem still persists;) since switching to channel 11 solves the problem for me (at least temporarily). I hope you can figure out what made the regression, there seems to be a lot changes between .34 and .35 concerning iwlagn... I wished I was a better kernel hacker, it's really interesting to dig into... but now other priorities came up. If there's something which could be really important to test, don't hesitate, maybe I find some time nevertheless.
Comment 51 Daniel J Blueman 2010-09-01 17:59:46 UTC
Excellent - retesting on channel 13 on 2.6.36-rc3 shows the issue is resolved.

Fantastic work Johannes et al!

Daniel
Comment 52 Daniel J Blueman 2010-09-01 18:02:15 UTC
Excellent - retesting on channel 13 on 2.6.36-rc3 shows the issue is resolved.

Fantastic work Johannes et al!

Daniel
Comment 53 Rafael J. Wysocki 2010-09-12 18:57:04 UTC
Fixed in mainline by commit 94597ab23ea10b3bdcba534be00a9f7b35791c07 .
Comment 54 Daniel J Blueman 2010-09-13 12:46:04 UTC
I spoke too soon, and thought it did work once previously once, I have not been able to connect other times. The behaviour is the same with -rc4.

Again, changing the AP channel to 12, I can connect immediately; moving back to channel 13, I cannot connect.
Comment 55 Johannes Berg 2010-09-13 13:34:16 UTC
Can you clarify how long you waited between switching the AP channel and trying again, and if maybe it works with a module reload?
Comment 56 Daniel J Blueman 2010-09-13 13:56:14 UTC
Hi Johannes - I went further to include powering off the client to ensure there was no state after the AP was set to channel 13. When the client could not connect, I verified my Android handset could.

If there is any further technical/debug etc information needed (including from the AP), just let me know and thanks for your help! Daniel
Comment 57 Rafael J. Wysocki 2010-09-20 21:10:44 UTC
On Monday, September 20, 2010, Daniel J Blueman wrote:
> 
> On 20 September 2010 20:57, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.34 and 2.6.35.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.34 and 2.6.35.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=16462
> > Subject         : unable to connect to AP on legal channel 13
> > Submitter       : Daniel J Blueman <daniel.blueman@gmail.com>
> > Date            : 2010-07-25 17:06 (58 days old)
> 
> I have verified this issue still exists in 2.6.36-rc4, alas.
Comment 58 Daniel J Blueman 2010-09-27 22:28:26 UTC
The issue still occurs on 2.6.36-rc5.
Comment 59 Johannes Berg 2010-09-28 07:27:54 UTC
Thanks, and sorry about the delay. I need to find time to reproduce this again, but it's been really busy for a while now. Still strange that you can use 12 but not 13 :-(
Comment 60 Johannes Berg 2010-09-28 15:53:22 UTC
Darn, I can't reproduce this, it associates just fine on channel 13:


$ iw wlan0 scan dump
BSS 00:24:01:38:ac:0a (on wlan0) -- associated
	TSF: 2576791452 usec (0d, 00:42:56)
	freq: 2472
	beacon interval: 100
	capability: ESS Privacy ShortPreamble ShortSlotTime (0x0431)
	signal: -22.00 dBm
	last seen: 301460 ms ago
	SSID: jo2ghz
	Supported rates: 1.0* 2.0* 5.5* 11.0* 
	DS Parameter set: channel 13
	ERP: <no flags>
	Extended supported rates: 6.0 9.0 12.0 18.0 24.0 36.0 48.0 54.0 
	RSN:	 * Version: 1
		 * Group cipher: TKIP
		 * Pairwise ciphers: TKIP CCMP
		 * Authentication suites: PSK
		 * Capabilities: (0x0000)
	WPA:	 * Version: 1
		 * Group cipher: TKIP
		 * Pairwise ciphers: TKIP CCMP
		 * Authentication suites: PSK
	WMM:	 * Parameter version 1
		 * BE: CW 15-1023, AIFSN 3
		 * BK: CW 15-1023, AIFSN 7
		 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
		 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
Comment 61 Johannes Berg 2010-09-28 15:54:13 UTC
You do see the AP in your scan results, right? Can you paste the output of "iw wlan0 scan dump" for the correct AP, like I did above?
Comment 62 Daniel J Blueman 2010-09-28 17:16:55 UTC
Bingo. With ignore_broadcast_ssid=1 or 2 in hostapd.conf on the AP, I can't connect when the AP is on channel 13.

When I advertise the AP with ignore_broadcast_ssid=0, I connect as expected. This would explain the behaviour I saw when I though it was fixed (since I was fiddling with the configuration).

Testing multiple times, this does relate to the SSID being zerod or the SSID length being 0.

With ignore_broadcast_ssid=0 (can connect):

$ iw wlan0 scan dump
BSS 00:25:d3:5e:ef:62 (on wlan0)
	TSF: 91033984 usec (0d, 00:01:31)
	freq: 2472
	beacon interval: 100
	capability: ESS Privacy ShortPreamble ShortSlotTime (0x0431)
	signal: -34.00 dBm
	last seen: 4970 ms ago
	SSID: An error has occurred
	Supported rates: 1.0* 2.0* 5.5* 11.0* 6.0 9.0 12.0 18.0 
	DS Parameter set: channel 13
	ERP: <no flags>
	Extended supported rates: 24.0 36.0 48.0 54.0 
	RSN:	 * Version: 1
		 * Group cipher: CCMP
		 * Pairwise ciphers: CCMP
		 * Authentication suites: PSK PSK/SHA-256
		 * Capabilities: MFP-capable (0x0080)
		 * bogus tail data (6): 00 00 00 0f ac 06
	HT capabilities:
		Capabilities: 0x11ce
			HT20/HT40
			SM Power Save disabled
			RX HT40 SGI
			TX STBC
			RX STBC 1-stream
			Max AMSDU length: 7935 bytes
			DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: No restriction (0x00)
		HT RX MCS rate indexes supported: 0-15
		HT TX MCS rate indexes are undefined

With ignore_broadcast_ssid=1 (can't connect):

$ iw wlan0 scan dump
BSS 00:25:d3:5e:ef:62 (on wlan0)
	TSF: 10547584 usec (0d, 00:00:10)
	freq: 2472
	beacon interval: 100
	capability: ESS Privacy ShortPreamble ShortSlotTime (0x0431)
	signal: -35.00 dBm
	last seen: 640 ms ago
	SSID: 
	Supported rates: 1.0* 2.0* 5.5* 11.0* 6.0 9.0 12.0 18.0 
	DS Parameter set: channel 13
	ERP: <no flags>
	Extended supported rates: 24.0 36.0 48.0 54.0 
	RSN:	 * Version: 1
		 * Group cipher: CCMP
		 * Pairwise ciphers: CCMP
		 * Authentication suites: PSK PSK/SHA-256
		 * Capabilities: MFP-capable (0x0080)
		 * bogus tail data (6): 00 00 00 0f ac 06
	HT capabilities:
		Capabilities: 0x11ce
			HT20/HT40
			SM Power Save disabled
			RX HT40 SGI
			TX STBC
			RX STBC 1-stream
			Max AMSDU length: 7935 bytes
			DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: No restriction (0x00)
		HT RX MCS rate indexes supported: 0-15
		HT TX MCS rate indexes are undefined

With ignore_broadcast_ssid=2 (can't connect):

$ iw wlan0 scan dump
BSS 00:25:d3:5e:ef:62 (on wlan0)
	TSF: 66355583 usec (0d, 00:01:06)
	freq: 2472
	beacon interval: 100
	capability: ESS Privacy ShortPreamble ShortSlotTime (0x0431)
	signal: -32.00 dBm
	last seen: 8180 ms ago
	SSID: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
	Supported rates: 1.0* 2.0* 5.5* 11.0* 6.0 9.0 12.0 18.0 
	DS Parameter set: channel 13
	ERP: <no flags>
	Extended supported rates: 24.0 36.0 48.0 54.0 
	RSN:	 * Version: 1
		 * Group cipher: CCMP
		 * Pairwise ciphers: CCMP
		 * Authentication suites: PSK PSK/SHA-256
		 * Capabilities: MFP-capable (0x0080)
		 * bogus tail data (6): 00 00 00 0f ac 06
	HT capabilities:
		Capabilities: 0x11ce
			HT20/HT40
			SM Power Save disabled
			RX HT40 SGI
			TX STBC
			RX STBC 1-stream
			Max AMSDU length: 7935 bytes
			DSSS/CCK HT40
		Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
		Minimum RX AMPDU time spacing: No restriction (0x00)
		HT RX MCS rate indexes supported: 0-15
		HT TX MCS rate indexes are undefined

I hope that helps to reproduce it. Thanks, Daniel
Comment 63 Daniel J Blueman 2010-09-28 17:18:40 UTC
Created attachment 31702 [details]
failing AP config for ath9k
Comment 64 Johannes Berg 2010-09-28 17:23:30 UTC
Ah, hidden SSID... that could be an issue, I'm surprised it works on channel 12! But I'll try to reproduce this.
Comment 65 Johannes Berg 2010-09-28 17:29:00 UTC
Ok, it now reproduces here! Let's see channel 12...
Comment 66 Johannes Berg 2010-09-28 17:31:07 UTC
Same on channel 12 though, but oh well :-)

I'll have to see if our scan is set up correctly.
Comment 67 Johannes Berg 2010-09-28 19:24:29 UTC
So the problem clearly is that you're on a channel that we mark passive, and we don't actively scan there, but connecting to a hidden SSID requires active scanning ...

We try to actually be better in iwlwifi and send an active probe when we receive a beacon, but either there's something wrong with that logic (I'll investigate) or it's just not reliable enough because of timing issues etc.

In my test, it connected after a long time of scanning (maybe 10 minutes later), which would point to a timing issue since there's no way it can connect if it never sent probe requests.

I'm not sure what to do about this. I can try to see if there's anything about the passive-to-active logic we can do, but for a passive channel you can't really expect hidden SSIDs to work well.
Comment 68 Johannes Berg 2010-09-28 19:57:42 UTC
Try this ...


--- linux-2.6.orig/drivers/net/wireless/iwlwifi/iwl-agn-lib.c	2010-09-28 21:56:19.000000000 +0200
+++ linux-2.6/drivers/net/wireless/iwlwifi/iwl-agn-lib.c	2010-09-28 21:56:31.000000000 +0200
@@ -1290,7 +1290,7 @@ void iwlagn_request_scan(struct iwl_priv
 			rate = IWL_RATE_1M_PLCP;
 			rate_flags = RATE_MCS_CCK_MSK;
 		}
-		scan->good_CRC_th = IWL_GOOD_CRC_TH_DISABLED;
+		scan->good_CRC_th = IWL_GOOD_CRC_TH_DEFAULT;
 		break;
 	case IEEE80211_BAND_5GHZ:
 		rate = IWL_RATE_6M_PLCP;
Comment 69 Daniel J Blueman 2010-09-28 22:17:50 UTC
Fantastic - this works consistently across the matrix of all unhidden, hidden #1, hidden #2 and passive and initiated connections, on channel 12 and 13.

Thank you very much Johannes!

It's probably a good idea to CC stable@kernel.org on your patch too, so distros releasing with 2.6.35.x will get this fix.
Comment 70 Johannes Berg 2010-09-29 07:17:27 UTC
Yeah, but wanted to test it out first -- helped here but I wasn't sure if it'd help generally :-)

I'll go submit the fix with your reported-by and tested-by, thanks!
Comment 71 Johannes Berg 2010-09-29 07:27:57 UTC
Hmmm.

I can't see how this can be a regression. I just traced the code all the way back to the original code drop of iwlwifi (b481de9c) and the value has always been zero.

Or maybe I'm confused and this particular thing wasn't even a regression...

Anyway, we'll need to test this patch more carefully, since the active-after-passive thing is more intended to be used for 5 GHz scanning.
Comment 72 Johannes Berg 2010-09-29 08:24:56 UTC
Created attachment 31862 [details]
patch to enable probe-after-rx on 2.4 GHz

Daniel, can you try this patch please? I hope it applies. It should be equivalent to your other patch, but I want to make totally sure before I add your tested-by tag.
Comment 73 Daniel J Blueman 2010-09-29 10:22:30 UTC
Great - the new patch is tested and works as expected on 2.6.36-rc6. Thanks, Daniel
Comment 74 Rafael J. Wysocki 2010-10-03 21:20:48 UTC
Patch : https://bugzilla.kernel.org/attachment.cgi?id=31862
Handled-By : Johannes Berg <johannes.berg@intel.com>
Comment 75 Rafael J. Wysocki 2010-10-11 20:07:40 UTC
On Monday, October 11, 2010, Berg, Johannes wrote:
> Rafael,
> 
> I believe that we have fixed the part of this bug that was a regression,
> namely being unable to connect to a non-hidden SSID, in 2.6.35 commit
> 3ee378d7e0dfdb9bb. Being able to connect to a hidden SSID I believe isn't a
> regression, and a patch for that is on its way upstream for 2.6.37. I think
> you should stop tracking this as a regression.

Note You need to log in before you can comment on or make changes to this bug.