Bug 35932 - iwl4965 connection problems after moving to iwlegacy
Summary: iwl4965 connection problems after moving to iwlegacy
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Stanislaw Gruszka
URL:
Keywords:
Depends on:
Blocks: 32012
  Show dependency tree
 
Reported: 2011-05-26 17:01 UTC by Luca Mattiello
Modified: 2011-06-07 08:05 UTC (History)
5 users (show)

See Also:
Kernel Version: 2.6.38-rc1-git
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Kernel logs (169.31 KB, application/x-bzip)
2011-05-29 16:27 UTC, Luca Mattiello
Details
iwlegacy_test.patch (2.37 KB, text/plain)
2011-05-31 07:28 UTC, Stanislaw Gruszka
Details
iwl4965 logs (4.72 KB, text/x-config)
2011-06-01 03:04 UTC, Marko
Details

Description Luca Mattiello 2011-05-26 17:01:39 UTC
I recently installed the Linux kernel version 2.6.39 just to discover that wireless networking is not functioning properly: it takes minutes to connect to the network (both in 802.11g and 802.11n mode), and often it doesn't even connect in the end; if it connects, the bit rate stays fixed ad 1Mb/s, connection speed doesn't exceed 30KiB/s, and after some minutes it disconnects again.

iwconfig reports a link quality around 30/70, however at the same position and with the same link quality I am able to connect and surf full speed with both an old “good” kernel and Windows.

I bisected the problem down to commit be663ab67077fac8e23eb8e231a8c1c94cb32e54 (iwlwifi: split the drivers for agn and legacy devices 3945/4965)

I suspect the problem might be in one of drivers/net/wireless/iwlegacy/*-tx.c, since I noticed that iwconfig reports high “Tx excessive retries” in the affected kernels, however I'm not able to dig further into it since I'm not experienced in kernel code.


Some more info (from the last “good” kernel):

localhost ~ # uname -a
Linux localhost 2.6.38-rc1-bsct-00448-g4bc85c1 #17 SMP Thu May 26 18:03:26 CEST 2011 x86_64 Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz GenuineIntel GNU/Linux

localhost ~ # lspci | grep -i wireless
02:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)

localhost linux-2.6-bsct # git bisect log
git bisect start
# good: [e40152ee1e1c7a63f4777791863215e3faa37a86] Linus 2.6.34
git bisect good e40152ee1e1c7a63f4777791863215e3faa37a86
# bad: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect bad 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# good: [520045db940a381d2bee1c1b2179f7921b40fb10] Merge branches 'upstream/xenfs' and 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
git bisect good 520045db940a381d2bee1c1b2179f7921b40fb10
# good: [fc8fe1e992ae0326a88edbe4d6793e840bbdd4ff] PCI / ACPI: Fix build of the AER driver for CONFIG_ACPI unset
git bisect good fc8fe1e992ae0326a88edbe4d6793e840bbdd4ff
# skip: [399a40c92d744feebf89b1f07208407c26e0aaf5] KVM: emulator: Fix permission checking in io permission bitmap
git bisect skip 399a40c92d744feebf89b1f07208407c26e0aaf5
# good: [c370e594efe2993620d24d41a78f325102e99d1c] ARM: PL08x: fix locking between prepare function and submit function
git bisect good c370e594efe2993620d24d41a78f325102e99d1c
# bad: [776e58ea3d3735f85678155398241d2513afa67a] KVM: unbreak userspace that does not sets tss address
git bisect bad 776e58ea3d3735f85678155398241d2513afa67a
# good: [79d8a8f736151b12129984b1250fd708440e742c] Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
git bisect good 79d8a8f736151b12129984b1250fd708440e742c
# good: [6445ced8670f37cfc2c5e24a9de9b413dbfc788d] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
git bisect good 6445ced8670f37cfc2c5e24a9de9b413dbfc788d
# good: [1b0db64fb7693c1a131b7bcc38f7bc63fb396850] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
git bisect good 1b0db64fb7693c1a131b7bcc38f7bc63fb396850
# bad: [08704bcbf022786532b5f188935ab6619906049f] net: Create union flowi_uli
git bisect bad 08704bcbf022786532b5f188935ab6619906049f
# bad: [3c0afdca44af795dd315c20cc525927a459abe30] ipv4: Get peer more cheaply in rt_init_metrics().
git bisect bad 3c0afdca44af795dd315c20cc525927a459abe30
# good: [29546a6404e3a4b5d13f0a9586eb5cf1c3b25167] ipv6: Use ERR_CAST in addrconf_dst_alloc.
git bisect good 29546a6404e3a4b5d13f0a9586eb5cf1c3b25167
# bad: [892c05c093858086d808aeb366b2e11106dd96c6] rtlwifi: Let rtlwifi build when PCI is not enabled
git bisect bad 892c05c093858086d808aeb366b2e11106dd96c6
# bad: [36bcce430657e6fece0e8dd91557f35dbb69ec67] ath9k_htc: Handle storage devices
git bisect bad 36bcce430657e6fece0e8dd91557f35dbb69ec67
# good: [409622ecc2a3b618b31b1894ed6360fbdca95d62] wl12xx: AP mode - fix bug in cleanup of wl1271_op_sta_add()
git bisect good 409622ecc2a3b618b31b1894ed6360fbdca95d62
# bad: [9077f2189e54215a690048ab14ad6f7d8b5a5af7] Merge branch 'wireless-next-2.6' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-2.6
git bisect bad 9077f2189e54215a690048ab14ad6f7d8b5a5af7
# good: [73b78a22720087d2d384bdd49e9c25500ba73edd] iwlwifi: enable 2-wire bt coex support for non-combo device
git bisect good 73b78a22720087d2d384bdd49e9c25500ba73edd
# good: [4bc85c1324aaa4a8bb0171e332ff762b6230bdfe] Revert "iwlwifi: split the drivers for agn and legacy devices 3945/4965"
git bisect good 4bc85c1324aaa4a8bb0171e332ff762b6230bdfe
# bad: [be663ab67077fac8e23eb8e231a8c1c94cb32e54] iwlwifi: split the drivers for agn and legacy devices 3945/4965
git bisect bad be663ab67077fac8e23eb8e231a8c1c94cb32e54

Let me know if I can do something else, or provide more info.

Luca Mattiello
Comment 1 Rafael J. Wysocki 2011-05-27 22:54:43 UTC
First-Bad-Commit : be663ab67077fac8e23eb8e231a8c1c94cb32e54
Comment 2 Stanislaw Gruszka 2011-05-29 12:43:43 UTC
Unfortunately first bad commit info is not much helpful here.

Please provide information about your network, i.e: type, encryption, channel, anything that you think could be useful for reproduce the problem.

Also please provide verbose debug messages on bad kernel and also for working kernel for compare. To do this you have to configure syslog deamon to log debug messages into some file, let say /var/log/kernel. For rsyslogd this mean adding:

kern.*    /var/log/kernel

line in /etc/rsyslog.conf and restart service "/etc/init.d/rsyslog restart"

Then to get debug messages:

modprobe -r iwl4965 # or modprobe -r iwlagn on old kernel
echo > /var/log/kernel
modprobe iwl4965 debug=0x47ffffff # or modprobe iwlagn debug=...
#connect reproduce the problem
cp /var/log/kernel ~/kernel.bad # or ~/kernel.good

and attach kernel.bad and kernel.good here (compressed if big).

Note you have to compiled kernel with CONFIG_IWLWIFI_LEGACY_DEBUG on new and CONFIG_IWLWIFI_DEBUG=y on old kernel, to make driver generate debug messages.
Comment 3 Luca Mattiello 2011-05-29 16:21:45 UTC
Network type is 802.11G, encryption WPA2-PSK [AES], channel 07 (2.442GHz), however I experienced the same problem with any other channel I could try (01-13), and with 802.11N too.

I have to clarify that if I am nearer to the router (link quality ~50/70) then the network becomes usable even in the bad kernel, the difference is with little lower signal levels (~45-25/70), where the good kernel performs excellently, while the bad refuses to connect.

I'll attach the compressed logs.
Comment 4 Luca Mattiello 2011-05-29 16:27:12 UTC
Created attachment 59982 [details]
Kernel logs
Comment 5 Stanislaw Gruszka 2011-05-30 14:52:08 UTC
Indeed we can not transmit data - don't get response from firmware for tx command.

Seems that is caused by bad tx gain calibration, because of wrong temperature value (-15 Celsius degrees :-) 

Bad kernel:

> May 29 17:55:03 localhost kernel: [ 3191.889918] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl curr volt 0 eeprom volt -3 volt comp 0
> May 29 17:55:03 localhost kernel: [ 3191.889925] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl chain = 0
> May 29 17:55:03 localhost kernel: [ 3191.889933] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry tmp 32, curr tmp -14, comp -15 steps
> May 29 17:55:03 localhost kernel: [ 3191.889940] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry idx 32, fctry pwr 27
> May 29 17:55:03 localhost kernel: [ 3191.889947] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl chain = 1
> May 29 17:55:03 localhost kernel: [ 3191.889955] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry tmp 32, curr tmp -14, comp -15 steps
> May 29 17:55:03 localhost kernel: [ 3191.889962] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry idx 32, fctry pwr 26
> May 29 17:55:03 localhost kernel: [ 3191.889971] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl rate 0 sat 34 reg 30 usr 30 tgt 30

Good kernel:

> May 29 18:02:35 localhost kernel: [  162.112045] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl curr volt 0 eeprom volt -3 volt comp 0
> May 29 18:02:35 localhost kernel: [  162.112052] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl chain = 0
> May 29 18:02:35 localhost kernel: [  162.112058] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry tmp 32, curr tmp 60, comp 9 steps
> May 29 18:02:35 localhost kernel: [  162.112066] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry idx 32, fctry pwr 27
> May 29 18:02:35 localhost kernel: [  162.112072] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl chain = 1
> May 29 18:02:35 localhost kernel: [  162.112078] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry tmp 32, curr tmp 60, comp 9 steps
> May 29 18:02:35 localhost kernel: [  162.112085] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl fctry idx 32, fctry pwr 26
> May 29 18:02:35 localhost kernel: [  162.112093] ieee80211 phy2: U
> iwl4965_fill_txpower_tbl rate 0 sat 34 reg 30 usr 30 tgt 30

I'm going to find the reason of bad temperature value, and prepare a patch.
Comment 6 Stanislaw Gruszka 2011-05-31 07:28:32 UTC
Created attachment 60202 [details]
iwlegacy_test.patch

Please test this patch, it include possible fix (+other one that is not yet applied), and some debug messages change.

If it does not fix the problem, please load module with debug=0x4303 option, reproduce and attach logs.
Comment 7 Luca Mattiello 2011-05-31 14:04:42 UTC
Yes, the patch fixed the problem.

Thank you very much.
Comment 8 Marko 2011-06-01 03:02:58 UTC
I have the same issue with 2.6.39. I applied the patch but it did not fix the issue: connection is slow (lots of lost packages, some times 90% loss).

ping www.google.com
...
--- www.google.com ping statistics ---
9 packets transmitted, 8 received, 11% packet loss, time 8008ms
rtt min/avg/max/mdev = 48.856/52.062/57.012/2.885 ms

iwconfig:

wlan0     IEEE 802.11abgn  ESSID:"sid123"  
          Mode:Managed  Frequency:2.462 GHz  Access Point: xxxx   
          Bit Rate=28.9 Mb/s   Tx-Power=14 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:off
          Link Quality=56/70  Signal level=-54 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:555  Invalid misc:405   Missed beacon:0


If I use 2.6.37 everything goes back to normal (good speed).
I captured the driver logs for debug=0x4303 option.
Comment 9 Marko 2011-06-01 03:04:47 UTC
Created attachment 60312 [details]
iwl4965 logs
Comment 10 Stanislaw Gruszka 2011-06-01 08:29:48 UTC
Mark, your issue is a different one, please open a new bug report for it, assign it to me, and provide logs as described in Comment #2.
Comment 11 Florian Mickler 2011-06-06 10:55:53 UTC
A patch referencing this bug report has been merged in v3.0-rc2:

commit dfe21582ac5ebc460dda98c67e8589dd506d02cd
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Wed Jun 1 17:17:57 2011 +0200

    iwl4965: correctly validate temperature value

Note You need to log in before you can comment on or make changes to this bug.