Bug 11055

Summary: iwl4965 - connection doesn't work more than 10 seconds
Product: Networking Reporter: François Valenduc (francoisvalenduc)
Component: WirelessAssignee: Zhu Yi (yi.zhu)
Status: CLOSED CODE_FIX    
Severity: normal CC: francoisvalenduc, marcus, rjw, yi.zhu
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26 Subsystem:
Regression: --- Bisected commit-id:
Attachments: dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac reverted
dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac reverted and debug enabled
dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac applied and debug enabled
dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac reverted and debug enabled
Syslog output with commit 66b5004d85164a6439d3ba1e7757734472ee2cac applied

Description François Valenduc 2008-07-08 13:29:38 UTC
Latest working kernel version: 2.6.26-rc8
Earliest failing kernel version: probably 2.6.26-rc9
Distribution: Gentoo
Hardware Environment: Packard Bell easynote MB86, intel core 2 duo
Software Environment:
Problem Description: after a bit more than 10 seconds, there is no connection anymore between the computer and the router. When I ping my router, I don't get answer after 9 or 10 pings. However, when I run iwconfig, it reports that it's still associated. I am using WPA-PSK with CCMP encryption. I doesn't see message in the kernel logs when this problem occurs.
Does anybody have any idea about this very annoying problem ?

Thanks in advance.

Steps to reproduce: boot the computer and ping the router. Notice that after 9 or 10 pings, there is no answer and that the wireless connection is stopped.
Comment 1 Rafael J. Wysocki 2008-07-08 15:34:08 UTC
This entry is being used for tracking a regression from 2.6.25.  Please don't
close it until the problem is fixed in the mainline.
Comment 2 Zhu Yi 2008-07-08 18:36:27 UTC
Are you sure this problem is between rc8 and rc9? I only find 3 patches are wireless related. But none of them seems have anything to do with this problem.

commit 23976efedd5ecb420b87455787c537eb4aed1981
mac80211: don't accept WEP keys other than WEP40 and WEP104

commit 00eb7fe77eb455f807c396f9917f0f623d4c84bb
mac80211: fix an oops in several failure paths in key allocation

commit 66b5004d85164a6439d3ba1e7757734472ee2cac
iwlwifi: improve scanning band selection management
Comment 3 François Valenduc 2008-07-09 12:53:22 UTC
In fact, it seems that the problem occurs somewhere between rc6 and rc7. After having tried git-bisect, it appears that the first bad commit is the following:

commit 66b5004d85164a6439d3ba1e7757734472ee2cac
Author: Ron Rindjunsky <ron.rindjunsky@intel.com>
Date:   Wed Jun 25 16:46:31 2008 +0800

    iwlwifi: improve scanning band selection management

    This patch modifies the band selection management when scanning, so
    bands are now scanned according to HW band support.

If I revert it, the problem doesn't occur.
Comment 4 Zhu Yi 2008-07-09 18:32:08 UTC
This commit is the last one I listed. So it is between rc8 and rc9, right? I couldn't see why this change introduced this problem. Can you attach the dmesg with debug=0x8ff for both with this commit and with the commit reverted?
Comment 5 François Valenduc 2008-07-10 00:07:52 UTC
Created attachment 16781 [details]
dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac reverted

Sorry for the confusion, the problem occurs well between rc8 and rc9. However, when I used the git-bisect, I end up with a 2.6.26-rc6 release, that's what made me doubt a little.
I suppose you wanted that I used the debug parameter for the iwl4965 module. The dmesg I attach is obtained with this parameter but it seems there is not more lines related to iwlwifi.
Comment 6 Zhu Yi 2008-07-10 00:16:14 UTC
Did you have CONFIG_IWLWIFI_DEBUG? Please also attach the dmesg after the commit.
Comment 7 François Valenduc 2008-07-10 00:30:15 UTC
Created attachment 16782 [details]
dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac reverted and debug enabled 

So, here is the dmesg output you asked. It's obtained with CONFIG_IWLWIFI_DEBUG enabled.
Comment 8 François Valenduc 2008-07-10 00:40:25 UTC
Created attachment 16783 [details]
dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac applied and debug enabled 

This dmesg output is well obtained when commit 66b5004d85164a6439d3ba1e7757734472ee2cac is applied and when the problem as occured.
Comment 9 François Valenduc 2008-07-10 00:43:18 UTC
Created attachment 16784 [details]
 dmesg with commit 66b5004d85164a6439d3ba1e7757734472ee2cac reverted and debug enabled

This dmesg is obtained with debug enabled and when this commit is reverted.
Comment 10 François Valenduc 2008-07-14 07:06:15 UTC
So, the 2.6.26 kernel as been released with this regression unresolved. 
Comment 11 François Valenduc 2008-07-14 07:10:42 UTC
Created attachment 16812 [details]
Syslog output with commit 66b5004d85164a6439d3ba1e7757734472ee2cac applied

Maybe this syslog output is more clear about the problem. The problem starts at the line 1274 of this file.
Comment 12 Zhu Yi 2008-07-16 18:26:13 UTC
> Jul 14 15:49:01 pc-francois dhcpcd[8245]: wlan0: adding default route via
> 192.168.1.1 metric 2000
> Jul 14 15:49:01 pc-francois dhcpcd[8245]: wlan0: adding route to
> 169.254.0.0/16 metric 2000

Is the ping stopped due to the route table change? Can you post the ping output?

BTW, do you have CONFIG_IWL4965_HT enabled?
Comment 13 François Valenduc 2008-07-17 00:17:15 UTC
Here is the ping output:

PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=1.11 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=1.09 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=1.16 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=1.20 ms
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=1.13 ms
64 bytes from 192.168.1.1: icmp_seq=6 ttl=64 time=1.09 ms
64 bytes from 192.168.1.1: icmp_seq=7 ttl=64 time=1.16 ms
64 bytes from 192.168.1.1: icmp_seq=8 ttl=64 time=1.15 ms
64 bytes from 192.168.1.1: icmp_seq=9 ttl=64 time=1.14 ms
64 bytes from 192.168.1.1: icmp_seq=10 ttl=64 time=1.09 ms
64 bytes from 192.168.1.1: icmp_seq=11 ttl=64 time=1.16 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
^C
--- 192.168.1.1 ping statistics ---
24 packets transmitted, 11 received, 54% packet loss, time 42020ms
rtt min/avg/max/mdev = 1.091/1.136/1.200/0.051 ms

I had not yet seen the message "no buffer space avaiblable". In fact, after the last received ping, there is a pause (when the route table changes) and after around 30 seconds, these other messages appears.

I don't have CONFIG_IWL4965_HT enabled. Should I enable it ?
Comment 14 François Valenduc 2008-07-17 00:24:59 UTC
I also notice this kind of message in dmesg (without debugging enabled):

iwl4965: Intel(R) Wireless WiFi Link 4965AGN driver for Linux, 1.2.26k
iwl4965: Copyright(c) 2003-2008 Intel Corporation
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
PCI: Setting latency timer of device 0000:02:00.0 to 64
iwl4965: Detected Intel Wireless WiFi Link 4965AGN
iwl4965: Tunable channels: 13 802.11bg, 19 802.11a channels
phy1: Selected rate control algorithm 'iwl-4965-rs'
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
firmware: requesting iwlwifi-4965-1.ucode
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1b:11:5d:2c:d9
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: RX authentication from 00:1a:70:9a:3c:be (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1a:70:9a:3c:be
wlan0: RX AssocResp from 00:1a:70:9a:3c:be (capab=0x411 status=0 aid=2)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1a:70:9a:3c:be)
fbcondecor: console 11 using theme 'livecd'
fbcondecor: switched decor state to 'on' on console 11
fbcondecor: console 11 using theme 'livecd'
fbcondecor: switched decor state to 'on' on console 11
wlan0: deauthenticate(reason=3)
ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1b:11:5d:2c:d9
wlan0: authenticate with AP 00:1b:11:5d:2c:d9
wlan0: authenticate with AP 00:1b:11:5d:2c:d9
wlan0: authentication with AP 00:1b:11:5d:2c:d9 timed out
wlan0: Failed to config new BSSID to the low-level driver
wlan0: Failed to config new BSSID to the low-level driver
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1b:11:5d:2c:d9
wlan0: Failed to config new BSSID to the low-level driver
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: RX authentication from 00:1a:70:9a:3c:be (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1a:70:9a:3c:be
wlan0: authentication frame received from 00:1a:70:9a:3c:be, but not in authenticate state - ignored
wlan0: RX AssocResp from 00:1a:70:9a:3c:be (capab=0x411 status=0 aid=2)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1a:70:9a:3c:be)
wlan0: disassociate(reason=3)
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: RX authentication from 00:1a:70:9a:3c:be (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1a:70:9a:3c:be
wlan0: RX ReassocResp from 00:1a:70:9a:3c:be (capab=0x411 status=0 aid=2)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1a:70:9a:3c:be)
wlan0: disassociate(reason=3)
fbcondecor: console 11 using theme 'livecd'
fbcondecor: switched decor state to 'on' on console 11
wlan0: No ProbeResp from current AP 00:1a:70:9a:3c:be - assume out of range
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: RX authentication from 00:1a:70:9a:3c:be (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1a:70:9a:3c:be
wlan0: RX ReassocResp from 00:1a:70:9a:3c:be (capab=0x411 status=0 aid=2)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1a:70:9a:3c:be)
wlan0: disassociate(reason=3)
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: RX authentication from 00:1a:70:9a:3c:be (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1a:70:9a:3c:be
wlan0: RX ReassocResp from 00:1a:70:9a:3c:be (capab=0x411 status=0 aid=2)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1a:70:9a:3c:be)
wlan0: disassociate(reason=3)
wlan0: No ProbeResp from current AP 00:1a:70:9a:3c:be - assume out of range
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: RX authentication from 00:1a:70:9a:3c:be (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1a:70:9a:3c:be
wlan0: RX ReassocResp from 00:1a:70:9a:3c:be (capab=0x411 status=0 aid=2)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1a:70:9a:3c:be)
wlan0: disassociate(reason=3)
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:1a:70:9a:3c:be
wlan0: RX authentication from 00:1a:70:9a:3c:be (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:1a:70:9a:3c:be
wlan0: RX ReassocResp from 00:1a:70:9a:3c:be (capab=0x411 status=0 aid=2)
wlan0: associated
wlan0: switched to short barker preamble (BSSID=00:1a:70:9a:3c:be)

It seems that it associates and disassociates all the time.
Comment 15 François Valenduc 2008-07-18 05:50:58 UTC
If I enable CONFIG_IWL4965_HT, the connection works correctly. So this option seems to be necessary in my case. Is this option necessary with some routers ?
This is supposed "enables IEEE 802.11n High Throughput features for the iwl4965 driver" but my router is a Linksys WRT54G  which doesn't support the 802.11n norm.
Comment 16 Zhu Yi 2008-07-24 22:07:26 UTC
(In reply to comment #15)
> This is supposed "enables IEEE 802.11n High Throughput features for the
> iwl4965
> driver" but my router is a Linksys WRT54G  which doesn't support the 802.11n
> norm.

Yes, it is a bug. Can you confirm this problem on 2.6.26?
Comment 17 François Valenduc 2008-07-25 01:11:57 UTC
I can indeed confirm the problem. In order to make the connection work correctly under kernel 2.6.26, I need to enable this option (CONFIG_IWL4965_HT) or to revert the commit mentionned earlier (commit 66b5004d85164a6439d3ba1e7757734472ee2cac,iwlwifi: improve scanning band selection management).
Comment 18 Zhu Yi 2008-08-14 19:15:02 UTC
The problem should be fixed in 2.6.27-rc2 after merging the IWL4965_HT removing patch.
Comment 19 François Valenduc 2008-08-16 14:39:26 UTC
The problem is indeed solved in 2.6.27-rc2. For kernel 2.6.26 or 2.6.25, I simply need to enable CONFIG_IWL4965_HT to make the connection work correctly.
So can we consider this bug is solved ? Or maybe it's not a bug.