Bug 194713 - iwlwifi: 3160: client can't associate to the device when running to AP mode - WIFILNX-675
Summary: iwlwifi: 3160: client can't associate to the device when running to AP mode -...
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: Intel Linux
: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-26 16:47 UTC by Jarek Kamiński
Modified: 2017-03-05 17:44 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.9
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
tentative fix (873 bytes, patch)
2017-02-27 08:37 UTC, Johannes Berg
Details | Diff
crashing firmware dmesg (143.52 KB, text/plain)
2017-02-28 17:53 UTC, Jarek Kamiński
Details

Description Jarek Kamiński 2017-02-26 16:47:05 UTC
Hello,

I have an accesspoint running on Intel 3160 (ZBOX RI-323). It was running just fine on 3.16 from Debian, but I recently tried upgrading it to 4.9 and noticed strange issues with wireless clients.

With the recent versions of iwlwifi, when a client disconnects, it is not noticed by the accesspoint. The following log comes from backport-iwlwifi master (7ae529a404a30fdb78d7dcc1a7e5883a014c6552) on 3.16 (similar results on 4.9 as well):
#v+
lut 21 22:12:04 Wintermute hostapd[2419]: wlan0: STA xx:xx:xx:xx:xx:xx RADIUS: VLAN ID 2
lut 21 22:12:04 Wintermute hostapd[2419]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
lut 21 22:12:04 Wintermute hostapd[2419]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 3)
lut 21 22:12:04 Wintermute hostapd[2419]: wlan0: STA xx:xx:xx:xx:xx:xx RADIUS: starting accounting session 58ACAD18-00000002
lut 21 22:12:04 Wintermute hostapd[2419]: wlan0: STA xx:xx:xx:xx:xx:xx WPA: pairwise key handshake completed (RSN)
[disconnect: nothing happens]
[try to reconnect: nothing happens, client doesn't connect]
#v-
Only running iw wlan0.2 station del xx:xx:xx:xx:xx:xx helps and allows the client to connect.

To compare, on iwlwifi from Debian 3.16.39-1-amd64 everything works fine:
#v+
lut 21 22:03:46 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx RADIUS: VLAN ID 2
lut 21 22:03:46 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
lut 21 22:03:46 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 2)
lut 21 22:03:49 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx RADIUS: starting accounting session 58A9650D-00000084
lut 21 22:03:49 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx WPA: pairwise key handshake completed (RSN)

lut 21 22:04:16 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: disassociated
lut 21 22:04:16 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: disassociated
lut 21 22:04:17 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)

lut 21 22:04:24 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx RADIUS: VLAN ID 2
lut 21 22:04:24 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: authenticated
lut 21 22:04:24 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx IEEE 802.11: associated (aid 2)
lut 21 22:04:26 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx RADIUS: starting accounting session 58A9650D-00000085
lut 21 22:04:26 Wintermute hostapd[12719]: wlan0: STA xx:xx:xx:xx:xx:xx WPA: pairwise key handshake completed (RSN)
#v-

I bisected it using the backport-iwlwifi tree on 4.5 down to 4061dda503118ef5c70e0afe77798c94a48a79b9 (iwlwifi: mvm: pass station to mac80211 RX where known, [1]). I was using the following test procedure:
1) Build, load, start hostapd.
2) Connect with an Android mobile phone (which I had at hand).
3) Disable Wi-Fi on Android.
4) Reenable Wi-Fi. Connected -> git bisect good, didn't connect -> git bisect bad.
I was using 4.5 because that was the lowest version against which all intermediate commits were compiling without errors.

To verify, I patched the backports-iwlwifi master with the following patch (on 3.16) and the Android phone started reconnecting.
#v+
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rx.c b/drivers/net/wireless/intel/iwlwifi/mvm/rx.c
index b02ad68..70edf42 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rx.c
@@ -144,7 +144,7 @@ static void iwl_mvm_pass_packet_to_mac80211(struct iwl_mvm *mvm,
 				fraglen, rxb->truesize);
 	}
 
-	ieee80211_rx_napi(mvm->hw, sta, skb, napi);
+	ieee80211_rx_napi(mvm->hw, NULL, skb, napi);
 }
 
 /*
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
index 66d237c..93c2542 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
@@ -209,7 +209,7 @@ static void iwl_mvm_pass_packet_to_mac80211(struct iwl_mvm *mvm,
 	if (iwl_mvm_check_pn(mvm, skb, queue, sta))
 		kfree_skb(skb);
 	else
-		ieee80211_rx_napi(mvm->hw, sta, skb, napi);
+		ieee80211_rx_napi(mvm->hw, NULL, skb, napi);
 }
 
 static void iwl_mvm_get_signal_strength(struct iwl_mvm *mvm,
#v-
I assume that's not a correct fix, but maybe it can help to pinpoint the root cause.

I'm running hostapd in a fairly normal configuration. One notable thing is dynamic_vlan=1 with accept_mac_file and individual PSKs set in wpa_psk_file.

What might be the cause? Is there anything I can do to help debug or fix this?

Thanks in advance!

[1] http://git.kernel.org/cgit/linux/kernel/git/iwlwifi/backport-iwlwifi.git/commit/?id=4061dda503118ef5c70e0afe77798c94a48a79b9
Comment 1 Johannes Berg 2017-02-27 08:37:13 UTC
Created attachment 254955 [details]
tentative fix

The dynamic_vlan=1 hint proved to be crucial, thanks for that.

Here's a tentative fix, can you please try it?
Comment 2 Jarek Kamiński 2017-02-27 22:19:58 UTC
Thank you for so quick response!

I confirm the patch fixed the issue I reported. I let it running for a couple of hours and encountered another issue though, possibly unrelated. The connection dropped with errors like:
#v+
Feb 27 22:18:40 Wintermute kernel: iwlwifi 0000:01:00.0: RX BA Session failed stopping, status 0x8
Feb 27 22:18:40 Wintermute kernel: wlan0.2: HW problem - can not stop rx aggregation for xx:xx:xx:xx:xx:xx tid 5
Feb 27 22:18:40 Wintermute kernel: iwlwifi 0000:01:00.0: RX BA Session failed stopping, status 0x8
Feb 27 22:18:40 Wintermute kernel: wlan0.2: HW problem - can not stop rx aggregation for xx:xx:xx:xx:xx:xx tid 6
Feb 27 22:18:40 Wintermute kernel: iwlwifi 0000:01:00.0: Couldn't drain frames for staid 4
#v-

All stations dropped eventually (iw station dump showing nothing) and weren't able to reconnect. Also, there are microcode SW errors every now and then:
#v+
Feb 27 23:09:06 Wintermute kernel: iwlwifi 0000:01:00.0: Microcode SW error detected.  Restarting 0x2000000.
Feb 27 23:09:06 Wintermute kernel: iwlwifi 0000:01:00.0: Start IWL Error Log Dump:
Feb 27 23:09:06 Wintermute kernel: iwlwifi 0000:01:00.0: Status: 0x00000000, count: 6
Feb 27 23:09:06 Wintermute kernel: iwlwifi 0000:01:00.0: Loaded firmware version: 17.459231.0
#v-
(I tried firmware 17.352738.0 as well)

I'm not sure if the above might be related or not. I'll try to investigate it further on Friday.
Comment 3 Emmanuel Grumbach 2017-02-27 22:51:19 UTC
Please attach the dmesg output with the firmware crashes.

Thank you.
Comment 4 Jarek Kamiński 2017-02-28 17:53:18 UTC
Created attachment 254991 [details]
crashing firmware dmesg

Please find the dmesg attached.
Comment 5 Emmanuel Grumbach 2017-02-28 18:07:55 UTC
OK, please open a new bug with this dmesg output.

Note You need to log in before you can comment on or make changes to this bug.