Created attachment 25292 [details] Commented dmesg (ipw2200 debug=255) Overview: Since long time (at least 2.6.24), with some networks the connection gets systematically dropped when a scan is requested (i.e. by NetworkManager). Steps to reproduce: I could always reproduce this issue by stopping NetworkManager and manually configuring a wireless connection to an affected network in /etc/network/interfaces (ifupdown uses wpa_supplicant, like NM does): when the connection is active, a iwlist ethx scan command always causes ap disassociation and subsequent connection drop. If no scans are requested the connection is very reliable and fast and lasts hours (no disassociations happen). Actual results: The card gets disassociated from the ap and the connection drops. Expected results: The connection shouldn't drop when scanning is requested. Build date and platform: Debian Sid i386 current (kernel 2.6.32-9) on an Acer TravelMate 435LMi + Intel ProWireless 2915 (ipw2200). Additional info: - wireless device lspci -vvv: 02:02.0 Network controller: Intel Corporation PRO/Wireless 2915ABG [Calexico2] Network Connection (rev 05) Subsystem: Intel Corporation Device 1001 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (750ns min, 6000ns max), Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 18 Region 0: Memory at d0200000 (32-bit, non-prefetchable) [size=4K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME- Kernel driver in use: ipw2200 - note: The logfile I'm attaching refers to a connection to a Linksys WAG325N: the ap was 3mt far from the laptop, no matter whether the 802.11n features were enabled or not, whether the ESSID was broadcasted or not and regardless of used encryption (RSN or WPA). The ipw2200 module was loaded with no additional parameters. Thanks Maurizio
This is likely caused by the scan spending too much time away from the operational channel. It looks like the scanning is done in the firmware on the device, so I'm not sure how much can be done. Perhaps the dwell time can be tweaked a bit. Does the behaviour change if you using "iwlist eth1 scan essid blah" (where blah is a specific ssid in your area)?
> ipw2200: U ipw_wx_get_essid Getting essid: 'sayonara' > ipw2200: U ipw_wx_get_wap Getting WAP BSSID: 00:1a:70:a2:d7:94 > ipw2200: U ipw_wx_get_nick Getting nick > ipw2200: U ipw_wx_get_rate GET Rate -> 54000000 > ipw2200: U ipw_wx_get_mode Get MODE -> 2 > ipw2200: U ipw_deinit Aborting scan during shutdown. > ipw2200: U ipw_deinit Disassociating during shutdown. > ipw2200: U ipw_send_disassociate Disassocation attempt from 00:1a:70:a2:d7:94 > on channel 13. Looks like the disassociation is caused by shutdown. I didn't find where it is triggered by scan. BTW, is it a regression?
(In reply to comment #1) > This is likely caused by the scan spending too much time away from the > operational channel. It looks like the scanning is done in the firmware on > the > device, so I'm not sure how much can be done. Perhaps the dwell time can be > tweaked a bit. Does the behaviour change if you using "iwlist eth1 scan > essid > blah" (where blah is a specific ssid in your area)? Unfortunately it seems that specifying an existing essid doesn't change things.
(In reply to comment #2) > Looks like the disassociation is caused by shutdown. I didn't find where it > is > triggered by scan. BTW, is it a regression? As I wrote I'm experiencing frequent disconnects with this card at least since kernel 2.6.24 but I could find a possible cause only recently. Furthermore, Network Manager might have implemented roaming support at that time, thus making evident an already existing issue. Please also notice that this issue doesn't show up with any network/environment: I can reliably and steadily connect to some other APs by using Network Manager (i.e. a Linksys WRT-54GL AP running DD-WRT v24, RSN/PSK hidden ESSID); the AP used in the example works flawlessly with other hardware (tested with some Atheros and Broadcom based laptop cards and with some wireless capable Nokia phones).
Excluding 802.11a with 'iwpriv eth1 set_mode 6' improves connection reliability with NetworkManager, at least with cited WAG325N ap: as far as I can see disconnects are far less frequent, but still happen from time to time. This seems to confirm John's hypothesis, but doesn't explain why this happens only with some aps. I noticed that iwspy often shows the same 'Scan request completed' message many times after a single 'iwlist eth1 scan' command: is this behaviour expected? [ifup eth1] 18:57:51.668521 eth1 Scan request completed 18:57:51.669416 eth1 Set Mode:Managed 18:57:51.669465 eth1 Set Frequency:2.472 GHz (Channel 13) 18:57:51.689680 eth1 New Access Point/Cell address:00:1A:70:A2:D7:94 [iwlist eth1 scan] 18:58:28.053119 eth1 Scan request completed 18:58:33.427150 eth1 Scan request completed [iwlist eth1 scan] 18:59:12.212311 eth1 Scan request completed 18:59:17.615095 eth1 Scan request completed 18:59:23.618718 eth1 Scan request completed Is there something more I can do to help investigating this issue?
NM can initiate scan as well. Do you have a dmesg with the disassociate reason?
(In reply to comment #6) > NM can initiate scan as well. Do you have a dmesg with the disassociate > reason? Unfortunately, NM does request a background scan every 2 min. max (pls. see https://bugzilla.gnome.org/show_bug.cgi?id=513820). In our case, this means a connection drop every 2 minutes. Please tell me how to get a more detailed dmesg.
(In reply to comment #7) > Please tell me how to get a more detailed dmesg. Just capture the dmesg into a file _while_ the STA disassociated (due to scan). The one you provided looks like it STA is disassociated due to network shutdown.
(In reply to comment #8) > The one you provided looks like it STA is disassociated due to network > shutdown. _No_ network shutdowns happened during the dmesg capture I attached (you mean the AP, which I manage by myself, went down, right? Well, when I captured that log at least another client was connected to the AP, and its connection didn't drop). The only commands I sent were the ones you can find in the attached log, and no parts of the dmesg have been omitted. In the very same conditions, by using ifupdown instead of NetworkManager, the connection with that AP stays up many hours, until I shut my laptop down, without a single drop (since I found the trick, everytime I need to connect to that AP I disable NM and manually configure the connection with ifupdown; as soon as I try to send an 'iwlist eth1 scan' command the connection drops and after a few seconds is automatically reestablished). Since now, all the logs I could obtain were very similar to the one I attached; I'll get some more logs, in bg only mode too, and I'll post them along with the command which produced them.
From comment #2, ipw_down is called. That means someone (ie. NM) is closing the interface. To avoid NM's misleading of this problem, can you disable it and attach the dmesg with only iwconfig and iwlist scan?
Adding some more logs - with _no_ encryption the scan requests didn't cause any connection drop, therefore I didn't attach any log about such configuration. In the attached tgz you'll find: - the ipw2200_log.sh script, whose output has been redirected to the dmesg_x.log files; - the test.conf used with wpa_supplicant - 2 dirs containing both dmesg and wpa_supplicant -dd outputs
Created attachment 25449 [details] 2 more logs
I see your problem. The scan has been reset because the driver *thinks* it takes too long (5 seconds). You should be able to see the log message with debug=0x800 (IPW_DL_SCAN). Please try to increase IPW_SCAN_CHECK_WATCHDOG defined value in ipw2200.c and see if the problem is moderated.
Created attachment 25481 [details] logs [debug=0x800 + IPW_SCAN_CHECK_WATCHDOG (10 * HZ)] By looking at the logs, it seems that the scan remains sometimes stuck on a channel: increasing IPW_SCAN_CHECK_WATCHDOG value didn't help. I found an old patch which seems to address a similar - if not the same - issue: http://article.gmane.org/gmane.linux.drivers.ipw2100.devel/8343 (didn't test it yet).
oops... I see, that patch has been already applied. Applying the discarded part (aborting scan instead of restarting the adapter) allows me at least not to have my connection dropped every 2 mins when using NM, but the scans always stop after the first 4 channels.
What's your AP's beacon interval?
(In reply to comment #16) > What's your AP's beacon interval? 100 msec (default setting)
Created attachment 25543 [details] a patch to try Please try if this patch. It doesn't restart the adapter unless abort_scan doesn't work. BTW, from your log, the firmware doesn't _always_ stuck at channel 52. Maybe change the passive channel dwell in ipw_passive_dwell_time() from (priv->assoc_network->beacon_interval - 10) to (priv->assoc_network->beacon_interval - 25) help?
(In reply to comment #18) > BTW, from your log, the firmware doesn't _always_ stuck at channel 52. Maybe > change the passive channel dwell in ipw_passive_dwell_time() from > (priv->assoc_network->beacon_interval - 10) to > (priv->assoc_network->beacon_interval - 25) help? No, unfortunately that didn't help. > Created an attachment (id=25543) [details] > a patch to try > > Please try if this patch. It doesn't restart the adapter unless abort_scan > doesn't work. The patch works: the connection doesn't drop anymore upon scan requests. I'm waiting to verify its behaviour when SCAN_ABORT command fails (it happened only once when I was testing the first part of Helmut Schaa's first patch, aborting the scan instead of resetting the adapter: if I'm remembering correctly, dmesg reported 'Failed to send SCAN_ABORT: Command timed out', the adapter was completely locked until I forced a reload of the driver).
(In reply to comment #19) > [...] I'm > waiting to verify its behaviour when SCAN_ABORT command fails (it happened > only > once when I was testing the first part of Helmut Schaa's first patch, > aborting > the scan instead of resetting the adapter: if I'm remembering correctly, > dmesg > reported 'Failed to send SCAN_ABORT: Command timed out', the adapter was > completely locked until I forced a reload of the driver). The behaviour seems correct: [...] Mar 29 16:46:42 tm430 kernel: ipw2200: Failed to send SCAN_ABORT: Command timed out. Mar 29 16:46:44 tm430 kernel: ipw2200: Failed to send ASSOCIATE: Command timed out. Mar 29 16:46:45 tm430 kernel: ipw2200: Failed to send CARD_DISABLE: Command timed out. Mar 29 16:46:47 tm430 wpa_supplicant[2555]: Associated with xx:xx:xx:xx:xx:xx [...] The connection is then reestablished and the device works again. This behaviour is not optimal but definitely better than dropping the connection everytime the scan stalls.
commit e65054b64ff6b89380a7f546c9eebf862e679646 Author: Zhu Yi <yi.zhu@intel.com> Date: Tue Mar 30 09:36:16 2010 +0800 ipw2200: restart adapter only when abort_scan doesn't work When a scan watchdog is fired, try to send abort scan command first before restarting the adapter. This avoids reconnection for some users when scan hang is detected. This fixed bug https://bugzilla.kernel.org/show_bug.cgi?id=15419 Reported-by: Maurizio Avogadro <mavoga@gmail.com> Tested-by: Maurizio Avogadro <mavoga@gmail.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>