Bug 92451
Summary: | iwlwifi: 7260: WiFi connected but no data transferred after rekeying - MWG100235318 | ||
---|---|---|---|
Product: | Drivers | Reporter: | alexander |
Component: | network-wireless | Assignee: | drivers_network-wireless (drivers_network-wireless) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | ilw, linville |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
URL: | http://www.spinics.net/lists/linux-wireless/msg136515.html | ||
Kernel Version: | 3.18.5-gentoo | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Wifi capture from monitor station
iwlwifi client log router log iwlwifi trace during the problem capture showing packets are recived and decrypted but then lost workaround Capture showing old IV for new key |
Description
alexander
2015-02-01 16:02:39 UTC
Created attachment 165531 [details]
iwlwifi client log
Created attachment 165541 [details]
router log
Created attachment 165551 [details]
iwlwifi trace during the problem
what security are you using? Are you using TKIP by chance? You seem to have security problems. Can you try without security? First, the Wlan router is supporting CCMP only and the wpa_supplicant client log messages are confirming that. But that's probably not very interesing any more: I did run quite some tests around security with your tip and it looks like this is indeed the problem and unrelated to iwlwifi: - I did not disable security, but using WPA2-PSK instead of EAP-TLS fixes the problem - The problem always happens nearly exactly 1h (deriviation is often around 5s, worst was 25s) after connecting to the AP. But only when some download job was running. (Tested a nearly idle connection for 70min, no connection problem.) - I tested iwlwifi options swcrypto=1 and for no good reason also (seperatly) wd_disable=0. Both options change nothing, the problem is still reproduceable with these. - I can reproduce the EAP connection problem using a kubuntu 14.10 boot CD on the same hardware AND on other notebook using the iwl3945 driver (Intel Pro Wireless 3945) instead of iwlwifi So it looks like using EAP-TLS is causing the issue. I now think the problem is in wpa_supplicant, probably in the Openwrt 14.07 wpad binary. I'll run some further tests, but from a kernel point of view this bug can probably be marked as invalid and be closed. It seems a bit unlikely to me that the driver/kernel can do anything special for EAP-TLS (compared to PSK) which would cause problems for iwlwifi and iwl3945... Ok. I just saw that something seemed wrong from a security perspective. I close the bug. Please reopen it if you have data that points to the driver. Thanks I'm now very sure this is a kernel issue and reopen this ticket. I can reproduce the issue with the following cards assume that it is a generic problem in the mac80211 or related code. These are the network cards I tested and know they are affected: Intel Corporation Centrino Ultimate-N 6300 (rev 3e) Intel Corporation Wireless 7260 (rev 83) Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02) All of the above are using the mac80211 Softmac from the kernel. I could not reproduce the issue with an USB Wlan adapter: 0bda:8172 Realtek Semiconductor Corp. RTL8191SU 802.11n WLAN Adapter This usb card is using the the driver r8712u from staging with an own softmac implementation. It also has connection problems after a re-key, but recovers within a few seconds. Here I should also mention that Windows7 is having the issue with my test setup. Win7 tests ware run not on the main station but on a system with the Intel Wireless 7260 card. But Using Ubuntu on the same system reproduced the known problems to 100%. (I'm primary debugging this on the system with the Ultimate-N 6300 cards, the other ones were just hit with some quick test.) Here the summary of what I have found out so far: The issue is unrelated to EAP-TLS, the real problem are the eapol re-keys. When the 4-way-handshake is occurring while the wlan is "loaded" the rx path from a "ethernet" point of view stops while tx continues to work. A "ping -f 172.16.1.1 -s 1472" has a good chance to trigger the issue, but I'm using "ping -f 172.16.1.1 -s 10000" or tcp transfers in most tests. The communication is reestablished after the next rekey. When the ping is still running the load is back and the next rekey will fail again. And so on, without any exceptions. (With default rekey values hardly a user will wait. Disconnecting and reconnecting also fixes the issue in this case.) The interface RX counters of the wlan interface is claiming, that no packets are received any longer. But the interesting point here is, that a monitor interface running on the station sees the missing packets. They are even correctly decrypted! I've attached you a filtered capture, see below for details how this was created. A remote monitor station can also decrypt the packets in both direction and confirms, that the communication over the air is fine. (If you want to test that you need a patch for wireshark to support rekeys. It's currently for review in the wireshark gerrit.) Here a manual how you should be able to reproduce the issue: Most important of course: You need an AP with WPA-PSK encryption doing frequent rekeys. CCMP or TKIP can both be used, I prefer CCMP. If you have a AP running linux you can force the rekeys by setting "wpa_ptk_rekey=120" for wpa_supplicant to rekey all 2min. For my openwrt router I do it with this command: "uci set wireless.@wifi-iface[2].wpa_pair_rekey=120; wifi" Enable wpa_supplicant debugging on the client, so you can see if and when the rekey occures. Again here the command I'm using: dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 org.freedesktop.DBus.Properties.Set string:fi.w1.wpa_supplicant1 string:DebugLevel variant:string:"msgdump" If you also want to see the real keys - makes probably only sense if you need the keys for wireshark - you can also run: dbus-send --system --print-reply --dest=fi.w1.wpa_supplicant1 /fi/w1/wpa_supplicant1 org.freedesktop.DBus.Properties.Set string:fi.w1.wpa_supplicant1 string:DebugShowKeys variant:boolean:true Now start a flood ping to a local system with fast connectivity ( Data rates ~1MB/s should trigger the issue and have seen it also with half of the load.) As soon as wpa_supplicant logs a rekey your connection should freeze and a bit later the arp times out, Till the next rekey, when it will recover. (You can e.g. look for the text: "State: COMPLETED -> 4WAY_HANDSHAKE" in your log to find the rekeys.) I've captured how this looks on my system. Additional to the above test setup I've also added a sub-interface on the station to capture packets in monitor mode. This is working a bit strange on the card, it looks like I have connect to a wlan, try the commands, disconnect again and try again. (Still fighting each time again, needs more testing) iw phy phy0 interface add mon0 type monitor flags none iw dev mon0 set monitor none # Verify, if not in monitor mode keep playing around. iw mon0 info Once we are in monitor mode download some data or use the flood ping again. Since in tcp we can see the error much better I prefer tcp connections here. I've attached you a sample pcap, captured this way. Once you set "IEEE 302.11 -> Ignore the Protection bit" to "Yes - with IV" wireshark will decode the traffic and we can look at the tcp connection. (Without decryption in wireshark, the packets are already decrypted.) And here we see the "missing" packets.... The wireshark filter is then nicely showing when the connection is lost: tcp.seq == 3591761481 or tcp.nxtseq==3591761481 or tcp.ack>=3591761481 or eapol Exactly after the last packet of the 4-way handshake. (We also seem to lose some packets around the key change, but that's probably normal) Created attachment 176151 [details]
capture showing packets are recived and decrypted but then lost
One critical word is missing above: ... Here I should also mention that Windows7 is NOT having the issue with my test setup. ... I can reproduce the issue up to 4.0.1-gentoo, which is the newest version I checked. I also had a test with the oldest kernel provided in gentoo, 3.4.105-gentoo. The problem is present on all. With some printk's I was able to track the issue down and even finally prevent the freezes. I've attached the patch "fixing" the issue for me, but this is probably not the correct fix. The packets are dropped in the function "ieee80211_crypto_ccmp_decrypt", at the lines I commented out in the patch. It looks indeed like a fundamental problem in mac80211. Here my current understanding of the issue: It looks like this is happening: * we rekey and are changing the used encryption key with "ieee80211_key_replace" * this correctly replaces the key and we start with a fresh counter "rx_pn" for the new key * But the rx handler still gets packets encrypted with the old key after the key change is completed (we just send out the 4th packet of the eapol handshake and the router still has to switch over to the new key, I assume) * the commented out code accepts that, since the old counter is >>0 * but some lines below we set rx_pn" for the new key according to the value packet (WRONG!!) * some dozen packets later we finally get packets encrypted with the new key, and with "rx_pn" values belonging to the new key * now the commented out code drops the packets, since we set the "rx_pn" for the new key according to the values of old key packets With the check disabled my system now rekeys quite well. But this makes handling of packet numbers probably completely useless. Created attachment 176911 [details]
workaround
Yes :) What you are doing is simple to disable the replay attack detection. Not good :) But your thorough analysis will help us continue from here. Apparently, your AP is really problematic. It keeps sending lots of packets with the old key even after it has rekeyed. I'll add your observation to the internal bug I opened. Created attachment 176941 [details]
Capture showing old IV for new key
I'm was sure that my AP is changing the key immediately after the 4th packet of the handshake, so your remark let me look at an older capture (EAP-TLS) of the problem again with the new knowledge. (Captured with a third station in monitor mode.)
There is something quite curious there:
With my wireshark patch (it's in wireshark master now) I can not only follow the rekeys, but I also see the used decryption keys, both TK and PMK/PSK per packet.
The encryption key indeed changes immediately after the last packet of the handshake, but the Initialization Vector is still counting up against the old value.
So this is a bug in openwrt after all, at least if the packet counter must reset to zero again with the first packet encrypted with the new key.
I've attached export of that, using the filter "eapol.type==3 || (ip.src == 10.16.2.1 && (frame.number >= 66340) &&frame.number <=66465 )"
When you configure the following PSK's in a very recent git version of wireshark you can look at the issue:
87c6958afa7085c2d6e7ce4fbd2d22cca6996ce08ec56ac89b8810f8740b1618
5a9c725b361733feb9e1490bf530aa3c297841140ddbc82971134eefd20f7330
64040b460406ebc484b84c6f272ce0720ef04cd309a6440fa5ff163b1ab248f8
The Frame 19 is the first one using the new key. Even filtering out all retransmits with "!(wlan.fc.retry == 1)" we have at least one packet with the new key and Initialization Vector based on the old one.
Unfortunately this capture has quite some packet loss, on the "air" and also during capture. (Channel was set prior to the capture and the monitor station was maybe 50cm from the station.)
I may have to sort that out and do it again, but already the capture is clearly showing that wrong Initialization Vector's are set.
yes - it seems that your AP is using the same key index *before* and *after* the re-keying. This means that mac80211 can't differentiate between a replay attack and a jump in the PN because the AP started using the new key index. Do you have a way to contact openWRT folks? I can (re)open the ticket I have there. I'll for sure start looking into the rekey code on openwrt and poke around there a bit more.
(The NIC driver on openwrt is also using mac80211, though.)
But you are indicating above, that openwrt (hostapd) could/should use a different key index for the pairwise key when rekeying?
Now I've only started looking at WPA security to track down my connection issue and I'm as from an expert here.
That said, my current understanding is, that the pairwise key must always have key index set to zero.
IEEE802.11-2012 has this statement in "11.6.11.3 Authenticator state machine variables":
> EAPOL-Key frames with a key type of Pairwise and a nonzero key index should
> be ignored.
So I assume the key index "0" is the only way how it can be and the problem is, that we get frames encrypted with the new key but old frame counter values for a while, bumping "rx_pn" to the value used by the old key and then triggering the reply prevention code when the AP is switching over the packet counter for the new key.
If I'm right, that is bringing up an interesting point to keep in mind: Why does it work on Windows?
The reply detection should be triggered there also.
Now I did notice a short hang and with a (normal rate, one packet each second) ping during the rekey, one ping was always lost. I suspect that thix is also caused by the AP issue and that a "proper" rekey would be seamless.
Now somehow Windows recovers while it should not - at least for my understanding. So potentially Windows is only recovering due to an bug/feature in their implementation or I'm got something major wrong here.
I have started a thread to plead your cause with the experts :) I just added you to it. How can you know what key has been used to encrypt the data? Do you decrypt the data (in Wireshark) and check what key needs to be used to avoid corruption? exactly. But as mentioned you need a very recent version of wireshark for that. Any stable version will stop decrypting at the rekey and also not include the key information to identify which key was used for the decryption. You should use a version of the master branch of wireshark from March 10 or newer. You can find working binaries at the moment only here: https://www.wireshark.org/download/automated/ or have to build wireshark from source. Ok - I think it is fair to close this bug on the kernel. The scenario you are hitting was diagnosed to be a problem in openWRT running an atheros device. Anyone hitting this issue, please see the thread in the URLs above. |