Bug 12589 - Random No ProbeResp - assume out of range
Summary: Random No ProbeResp - assume out of range
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: All Linux
Importance: P1 high
Assignee: networking_wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-01-31 14:51 UTC by Jan Dvorak
Modified: 2009-05-25 12:56 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.29-rc3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Kernel configuration (23.54 KB, application/gzip)
2009-01-31 14:56 UTC, Jan Dvorak
lspci on my machine (2.54 KB, text/plain)
2009-01-31 14:57 UTC, Jan Dvorak

Description Jan Dvorak 2009-01-31 14:51:36 UTC
Latest working kernel version: 2.6.26
Earliest failing kernel version: 2.6.28
Distribution: Debian Sid
Hardware Environment: iwl3945 (Thinkpad R61)
Software Environment: wpa_supplicant 0.6.4-3, wireless-tools 29-1.1
Problem Description:

After some time (completely random, ranging from several minutes to several hours) the following kernel message appears and I am disconnected from the AP:

[31719.537037] wlan0: No ProbeResp from current AP 00:1b:fc:57:a9:90 - assume out of range

After that, I can no longer connect and have to remove the whole mac80211 stack from the kernel with:

ifdown --force wlan0
rmmod iwl3945 mac80211 lib80211 cfg80211

Then I load the modules again and reconnect.

I am using the following wpa_supplicant options:

  key_mgmt=WPA-PSK
  proto=RSN
  pairwise=CCMP
  group=CCMP

Googling suggests that I am not alone in this and that several other users are experiencing it with 2.6.27, but I cannot confirm that. Some suggest increasing IEEE80211_MONITORING_INTERVAL in net/mac80211/mlme.c to more than 2HZ, which I am going to try right after this report.


Steps to reproduce:

Connect to an AP with the wpa_supplicant configuration above and wait. I will not be able to test with an open wifi in the near future.
Comment 1 Jan Dvorak 2009-01-31 14:56:44 UTC
Created attachment 20051
Kernel configuration

Attached kernel configuration. I built the kernel with make-kpkg.
Comment 2 Jan Dvorak 2009-01-31 14:57:40 UTC
Created attachment 20052
lspci on my machine

Attached lspci output showing the exact hardware configuration.
Comment 3 Jan Dvorak 2009-02-01 02:04:15 UTC
> Some suggest
> increasing IEEE80211_MONITORING_INTERVAL in net/mac80211/mlme.c to more than
> 2HZ, which I am going to try right after this report.

Did not help.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/200500 suggests that this bug is not iwl3945 specific and is probably also present in the kernel I reported as "Latest working".
Comment 4 Jan Dvorak 2009-02-01 02:38:06 UTC
http://lkml.indiana.edu/hypermail/linux/kernel/0807.0/1712.html seems to describe the very same problem yet the only solution he suggests is to disable the problematic section.

By the way, this is my `iwlist wlan0 scan` output:

          Cell 01 - Address: 00:1B:FC:57:A9:90
                    ESSID:""
                    Mode:Master
                    Channel:1
                    Frequency:2.412 GHz (Channel 1)
                    Quality=66/100  Signal level:-67 dBm  Noise level=-127 dBm
                    Encryption key:on
                    IE: Unknown: 0003000000
                    IE: Unknown: 010C82848B8C12969824B048606C
                    IE: Unknown: 030101
                    IE: Unknown: 050400010000
                    IE: Unknown: 2A0100
                    IE: Unknown: 2F0100
                    IE: IEEE 802.11i/WPA2 Version 1
                        Group Cipher : CCMP
                        Pairwise Ciphers (1) : CCMP
                        Authentication Suites (1) : PSK
                    IE: Unknown: DD060010180200F0
                    Bit Rates:1 Mb/s; 2 Mb/s; 5.5 Mb/s; 6 Mb/s; 9 Mb/s
                              11 Mb/s; 12 Mb/s; 18 Mb/s; 24 Mb/s; 36 Mb/s
                              48 Mb/s; 54 Mb/s
                    Extra:tsf=0000031e2722e188
                    Extra: Last beacon: 72ms ago

and this is `iwconfig wlan0`:

wlan0     IEEE 802.11abg  ESSID:"PJK"  
          Mode:Managed  Frequency:2.412 GHz  Access Point: 00:1B:FC:57:A9:90   
          Bit Rate=54 Mb/s   Tx-Power=15 dBm   
          Retry min limit:7   RTS thr:off   Fragment thr=2352 B   
          Encryption key:XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX [2]   Security mode:open
          Power Management:off
          Link Quality=74/100  Signal level:-60 dBm  Noise level=-127 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

(the latter taken after a fresh reconnect)
Comment 5 John W. Linville 2009-02-03 11:53:11 UTC
We have seen some drivers filtering beacons that triggered this sort of problem, but I don't recall iwl3945 being one of them.  Anyway, we probably should react better to this just in case the AP really has moved out of range...

I'll Cc Johannes and Reinette in case they have other/better insight.
Comment 6 Johannes Berg 2009-02-04 13:58:51 UTC
We're a little over-zealous here: if we send a probe request and never get an answer, we simply assume the AP is out of range. We don't try again or anything.

Kalle's beacon filtering work will clean up some parts of this and potentially remove this code entirely.
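
For illustration only, the difference between giving up after a single unanswered probe request and retrying a few times can be modelled in a few lines of Python. This is a toy model and not the mac80211 code; `link_state`, `probe_answers` and `max_tries` are hypothetical names chosen for the sketch.

```python
def link_state(probe_answers, max_tries=1):
    """Toy model of a probe-based link check (not the mac80211 code).

    probe_answers: iterable of booleans, True if that probe request
                   was answered by the AP.
    max_tries:     how many probe requests may be sent before giving up.
    """
    for attempt, answered in enumerate(probe_answers, start=1):
        if answered:
            return "associated"
        if attempt >= max_tries:
            break
    return "assume out of range"


# One lost probe response followed by a good one.
answers = [False, True]
print(link_state(answers, max_tries=1))  # gives up after the first loss
print(link_state(answers, max_tries=3))  # the second probe is answered
```

With `max_tries=1` a single lost frame is enough to drop the association, which is the over-zealous case; any retry count above one tolerates an isolated loss.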
Comment 7 Reinette Chatre 2009-03-27 21:37:30 UTC
The beacon filtering work mentioned by Johannes has been merged into wireless-testing. Could you please test with this new code?
Comment 8 Johannes Berg 2009-03-27 21:39:38 UTC
Doesn't actually help -- I see this bug on my system too now (only with hidden SSIDs!) and the new code will say "beacon loss detected" and then behave very similarly.
Comment 9 Jan Dvorak 2009-03-29 20:37:22 UTC
I'm cloning wireless-testing 8f2487d3f1b445e20aebba2cb7b20f1896b94f6f right now and will test with both visible and hidden SSID after.
Comment 10 Sanjoy Mahajan 2009-04-04 17:11:48 UTC
I see the same problem using iwl3945 (Thinkpad T60 with Intel wireless) on vanilla 2.6.29.  Sometimes every 10 minutes, sometimes once every 2 hours or so, there's a message like the following in syslog or dmesg:

wlan0: No ProbeResp from current AP 00:18:3a:95:d2:72 - assume out of range

and then the wireless connection gets dropped.  My router is an open router (no encryption, has a public SSID).

At first I thought maybe the router got moved behind a brick wall but moving the router even nearer (at maximum it's only 20 feet away) made no difference.

I don't see the problem with 2.6.27.4 (haven't tried any kernel in between).
Comment 11 Sanjoy Mahajan 2009-04-04 19:20:40 UTC
Correction: I just saw the problem occur with 2.6.27.4.  So it's not
strictly a regression for me, but it happens far more often on 2.6.29
than on 2.6.27.4.
Comment 12 Jan Dvorak 2009-04-06 23:27:28 UTC
I apologize for such a late response, but I found myself unexpectedly busy later. I see the same problem on wireless-testing (as of 8f2487d3f1b445e20aebba2cb7b20f1896b94f6f).

I see some "beacon loss from AP 00:..." and then "no probe response from AP 00:..." followed by "deauthenticated (Reason: 7)" or sometimes "Reason: 3" about 10 seconds later. Is there anything else to try?
Comment 13 Jan Dvorak 2009-04-06 23:28:44 UTC
To be exact:

[ 6035.013039] wlan0: beacon loss from AP 00:1b:fc:57:a9:90 - sending probe request
[ 6037.013044] wlan0: no probe response from AP 00:1b:fc:57:a9:90 - disassociating
[ 6047.829310] wlan0: deauthenticated (Reason: 7)

And this happens both with hidden and visible SSID.
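
To see how long the stack waits between the probe request and the disassociation, the timestamps can be pulled out of the log with a short script. This is only a sketch: the regular expression is written against the two message formats quoted above, and `probe_timeouts` is a hypothetical helper name.

```python
import re

# Matches the kernel timestamp and the two messages quoted above.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<iface>\w+): "
    r"(?P<event>beacon loss|no probe response) from AP (?P<ap>[0-9a-f:]{17})"
)


def probe_timeouts(lines):
    """Yield (ap, seconds) for each beacon loss followed by a probe timeout."""
    pending = {}  # AP address -> timestamp of the last "beacon loss"
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # e.g. the "deauthenticated" line
        ts, ap = float(m.group("ts")), m.group("ap")
        if m.group("event") == "beacon loss":
            pending[ap] = ts
        elif ap in pending:
            yield ap, ts - pending.pop(ap)


log = """\
[ 6035.013039] wlan0: beacon loss from AP 00:1b:fc:57:a9:90 - sending probe request
[ 6037.013044] wlan0: no probe response from AP 00:1b:fc:57:a9:90 - disassociating
[ 6047.829310] wlan0: deauthenticated (Reason: 7)
""".splitlines()

for ap, delay in probe_timeouts(log):
    print(f"{ap}: probe timed out after {delay:.3f} s")
```

For the three lines above the delay comes out at about 2 seconds. Feeding the script a full `dmesg` dump would show whether that interval is the same for every drop.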
Comment 14 Reinette Chatre 2009-04-17 23:05:09 UTC
This could be similar to the issue discussed in http://marc.info/?l=linux-wireless&m=123983768518133&w=2 - could you please try to increase the value of IEEE80211_MONITORING_INTERVAL found in net/mac80211/mlme.c ? One user reported that 10 seconds worked. Could you please try that also?

#define IEEE80211_MONITORING_INTERVAL (10 * HZ)
Comment 15 Reinette Chatre 2009-04-24 21:14:07 UTC
There is now a patch in wireless-testing that addresses the problem. Could you please try with:

commit 3b6dc5a431e4fef35717cba53544a95209f49b68
Author: Kalle Valo <kalle.valo@iki.fi>
Date:   Sun Apr 19 08:47:19 2009 +0300

    mac80211: fix beacon loss detection after scan
    
    Currently beacon loss detection triggers after a scan. A probe request
    is sent and a message like this is printed to the log:
    
    wlan0: beacon loss from AP 00:12:17:e7:98:de - sending probe request
    
    But in fact there is no beacon loss, the beacons are just not received
    because of the ongoing scan. Fix it by updating last_beacon after
    the scan has finished.
    
    Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
    Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
    Acked-by: Johannes Berg <johannes@sipsolutions.net>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
Comment 16 Sanjoy Mahajan 2009-04-25 00:45:25 UTC
> Currently beacon loss detection triggers after a scan. A probe request
> is sent and a message like this is printed to the log:

No scans happen on my laptop (I don't run Network Manager) -- unless
dhcp scans itself? -- and I don't use any encryption, but I still get
"No ProbeResp" messages:

wlan0: No ProbeResp from current AP 00:18:3a:95:d2:72 - assume out of range

That msg is always associated with this 'iwconfig wlan0' output:

wlan0     IEEE 802.11abg  ESSID:"CoppsHillTerrace"  
          Mode:Managed  Frequency:2.412 GHz  Access Point: Not-Associated   
          Tx-Power=15 dBm   
          Retry min limit:7   RTS thr:off   Fragment thr=2352 B   
          Encryption key:off
          Power Management:off
          Link Quality:0  Signal level:0  Noise level:0
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:0   Missed beacon:0

Could this be a separate problem?
Comment 17 Reinette Chatre 2009-05-01 20:43:45 UTC
Could somebody more familiar with mac80211 please look at this issue?

Summary: Associations are being dropped with high frequency (2.6.27.4 did so too, but not as often as 2.6.29). The symptom can be summarized with the familiar "No ProbeResp from current AP" message even if the AP is 20 feet away. The original reporter of this bug has not responded to requests for testing of recent mac80211 changes in this area.

Thank you very much
Comment 18 Jan Dvorak 2009-05-04 11:16:50 UTC
Please accept, once again, my apologies for such a late response.

I am running linux-2.6.30-rc3-wl and _I_am_not_sure_ whether this problem still occurs in its original form. I will investigate further tonight and return with dmesg output.

With 2.6.28 I had to remove the whole stack from the kernel in order to get the wifi running again. Now it only takes a reconfiguration of the interface. I got used to running a ping-gw loop that reconfigures the interface if this happens, and since a reconfig takes about 5 seconds, I don't usually notice any problems.
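
A ping-gw loop of this kind might look roughly like the sketch below. It is not the script actually used here; the gateway address, the failure threshold and the use of `ifdown --force` followed by `ifup` are assumptions, and `ping_ok`, `reconfigure` and `watch` are hypothetical names.

```python
import subprocess
import time


def ping_ok(host):
    """Return True if one ICMP echo to host is answered within 2 seconds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reconfigure(iface):
    """Take the interface down and bring it up again (Debian ifupdown)."""
    subprocess.run(["ifdown", "--force", iface])
    subprocess.run(["ifup", iface])


def watch(check, fix, max_failures=3, interval=5.0, rounds=None, sleep=time.sleep):
    """Call fix() whenever check() fails max_failures times in a row.

    rounds=None loops forever; a number limits the loop (useful for testing).
    Returns how many times fix() was called.
    """
    failures = fixes = done = 0
    while rounds is None or done < rounds:
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                fix()
                fixes += 1
                failures = 0
        done += 1
        sleep(interval)
    return fixes


# Example (runs forever and needs root for ifdown/ifup); the address is
# an assumption, replace it with the real gateway:
# watch(lambda: ping_ok("192.168.1.1"), lambda: reconfigure("wlan0"))
```

Requiring several consecutive failures before reconfiguring keeps a single dropped ping from bouncing the interface.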
Comment 19 Jan Dvorak 2009-05-05 20:48:17 UTC
Alright, with 2.6.30-rc3-wl I was not able to reproduce the problem. The AP was not hiding its ESSID, so I can't say the problem is solved, but it definitely seems that way.
Comment 20 John W. Linville 2009-05-07 17:59:04 UTC
Closing on basis of comment 19 -- please reopen if the issue reappears...thanks!
Comment 21 Sanjoy Mahajan 2009-05-13 02:38:13 UTC
> Closing on basis of comment 19 -- please reopen if the issue
> reappears...thanks!

I just tested 2.6.30-rc5-wl (commit bf2c6a38af60), and I see the same
problem as with 2.6.29 (and 2.6.27.4, though less often):

  wlan0: no probe response from AP 00:18:3a:95:d2:72 - disassociating

There's no encryption, the SSID is not hidden, and the router is about
20 feet away, so the signal strength is high.  I don't think it's due to
the router.  As further evidence in that direction (though not
conclusive), I had the same 'no probe response' problem with a router at
MIT (with public SSID and no encryption) -- though that was with vanilla
-rc5, not with the wireless-testing changes.

I don't run Network Manager or other programs that regularly scan.
There's just a scan once when the interface is brought up to find all
the nearby SSIDs.

The laptop is a Thinkpad T60 with Intel wireless and graphics.  Here is
the lspci -v output for the wireless card:

03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)
	Subsystem: Intel Corporation Device 1010
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Memory at edf00000 (32-bit, non-prefetchable) [size=4K]
	Capabilities: [c8] Power Management version 2
	Capabilities: [d0] MSI: Mask- 64bit+ Count=1/1 Enable+
	Capabilities: [e0] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 22-2f-25-ff-ff-02-13-00
	Kernel driver in use: iwl3945
Comment 22 Johannes Berg 2009-05-13 08:47:38 UTC
(In reply to comment #21)
> > Closing on basis of comment 19 -- please reopen if the issue
> > reappears...thanks!
> 
> I just tested 2.6.30-rc5-wl (commit bf2c6a38af60), and I see the same
> problem as with 2.6.29 (and 2.6.27.4, though less often):
> 
>   wlan0: no probe response from AP 00:18:3a:95:d2:72 - disassociating

There's no reason to believe that to be an actual bug or regression though. Wireless links are not perfect, so occasionally they will drop enough packets for us to assume that the AP has died. Sometimes APs even do interrupt beaconing for a short period of time.

One point to take away from this is that we should try to reassociate in that case. However, a large percentage of users simply use NM with wpa_supplicant for all their networks, in which case wpa_supplicant handles such reassociations and everything just works _despite_ the occasional hiccup in the connection. Therefore, despite the fact that this has been on our todo list for ages, nobody has bothered to fix it in the kernel.

This bug was concerned with that happening all the time, and then the reassociation failing, presumably due to a bug in the driver that caused the hardware to stop functioning, thereby causing _both_ the probe response timeout _and_ the reconnection failure.

I don't consider an occasional probe response timeout as you are reporting to be a bug -- at best it's a feature request to reconnect afterwards, but since there's an easy workaround (run wpa_supplicant) I'm not inclined to work on that.
Comment 23 Sanjoy Mahajan 2009-05-25 12:56:49 UTC
> I don't consider an occasional probe response timeout as you are
> reporting to be a bug -- at best it's a feature request to reconnect
> afterwards, but since there's an easy workaround (run wpa_supplicant)
> I'm not inclined to work on that.

Prompted by that suggestion, I finally learnt how to use wpa_supplicant,
and installed a minimal wpa_supplicant configuration (without using NM).
It seems to be working and reassociates as needed with the AP.
