Bug 33852

Summary: Regression of AR2413 802.11bg in 2.6.38.4
Product: Process Management
Reporter: Boris Popov (popov.b)
Component: Other
Assignee: process_other
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: florian, linville, lucio.pinese, maciej.rutecki, me, mickflemm, rjw, tj, zajec5
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.38
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 27352
Attachments: /var/log/messages
lspci and iwlist in kernel 2.6.38.4
lspci and iwlist in kernel 2.6.37.6
first bad commit
log of bisecting
first bad commit (new)
log of bisecting (new)

Description Boris Popov 2011-04-23 12:12:24 UTC
Created attachment 55122 [details]
/var/log/messages

AR2413 works fine in my notebook with kernels up to and including 2.6.37.6,
but doesn't work with kernel 2.6.38.4.
Comment 1 John W. Linville 2011-04-25 13:42:20 UTC
I don't see anything unusual in your /var/log/messages output.  Can you describe the problem more precisely?  What "doesn't work"?
Comment 2 Boris Popov 2011-04-26 09:47:17 UTC
(In reply to comment #1)
> I don't see anything unusual in your /var/log/messages output.  Can you
> describe the problem more precisely?  What "doesn't work"?

I think the problem is this line:
Apr 23 15:39:27 laptop kernel: [   15.668598] ADDRCONF(NETDEV_UP): wlan0: link is not ready

I can't connect to my router.

What info do you need?
Comment 3 Boris Popov 2011-04-26 13:19:25 UTC
iwconfig wlan0 essid router.popov.net key XXXXXXXXXXXXXXXXXXXXXXXXXX

iwconfig wlan0

wlan0     IEEE 802.11bg  ESSID:"router.popov.net"  
          Mode:Managed  Frequency:2.452 GHz  Access Point: Not-Associated   
          Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XX
          Power Management:off
Comment 4 Boris Popov 2011-04-26 13:26:45 UTC
iwconfig wlan0 essid router.popov.net key XXXXXXXXXXXXXXXXXXXXXXXXXX

iwconfig wlan0

wlan0     IEEE 802.11bg  ESSID:"router.popov.net"  
          Mode:Managed  Frequency:2.452 GHz  Access Point: Not-Associated   
          Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:XXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XX
          Power Management:off
Comment 5 Boris Popov 2011-04-26 13:27:54 UTC
kernel 2.6.37.6:

boris@laptop:~$ /sbin/iwconfig wlan0
wlan0     IEEE 802.11bg  ESSID:"router.popov.net"  
          Mode:Managed  Frequency:2.452 GHz  Access Point: 00:18:E7:F7:B3:9D   
          Bit Rate=54 Mb/s   Tx-Power=20 dBm   
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Power Management:off
          Link Quality=68/70  Signal level=-42 dBm  
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:53   Missed beacon:0
Comment 6 John W. Linville 2011-04-26 15:07:10 UTC
Still not much information...perhaps we could see the output of "lspci -n" and "iwlist wlan0 scan"?
Comment 7 Boris Popov 2011-04-27 03:28:46 UTC
Created attachment 55592 [details]
lspci and iwlist in kernel 2.6.38.4
Comment 8 Boris Popov 2011-04-27 03:30:07 UTC
Created attachment 55602 [details]
lspci and iwlist in kernel 2.6.37.6
Comment 9 Boris Popov 2011-04-27 03:40:00 UTC
(In reply to comment #6)
> Still not much information...perhaps we could see the output of "lspci -n"
> and "iwlist wlan0 scan"?

OK John, please see the attachments.
Can I provide anything else (debug info, etc.)?
Comment 10 John W. Linville 2011-04-27 19:08:52 UTC
So, it looks like you aren't receiving any scan information (and possibly nothing at all).

Could you do a git bisect between 2.6.37 and 2.6.38.4?
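
[For anyone repeating this: the requested bisection could be started roughly as follows. This is only a sketch. It assumes the current directory is a clone of a tree that carries both tags, such as linux-stable, and the function name start_ath5k_bisect is made up for illustration.]

```shell
# Hypothetical helper: start a bisection between the last known-good and
# the first known-bad kernel. Assumes the current directory is a kernel
# git tree that has both the v2.6.37 and v2.6.38.4 tags.
start_ath5k_bisect() {
    git bisect start &&
    git bisect bad v2.6.38.4 &&
    git bisect good v2.6.37
}

# After each build-and-boot cycle, record the result with one of:
#   git bisect good    # wlan0 works on this kernel
#   git bisect bad     # wlan0 does not work on this kernel
```

Each step then means building and booting the kernel that git checked out, testing wlan0, and reporting the result until git names the first bad commit.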
Comment 11 Boris Popov 2011-04-28 12:15:35 UTC
I will try.
Comment 12 Boris Popov 2011-04-30 04:19:12 UTC
> Could you do a git bisect between 2.6.37 and 2.6.38.4?

I am running the bisect now.

At commit 42c025f3de9042d9c9abd9a6f6205d1a0f4bcadf:

iwlist wlan0 scan works well, but kern.log shows:

Apr 30 07:58:24 laptop kernel: [   67.073537] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 30 07:58:25 laptop kernel: [   67.258316] wlan0: authenticate with 00:18:e7:f7:b3:9d (try 1)
Apr 30 07:58:25 laptop kernel: [   67.261474] wlan0: authenticated
Apr 30 07:58:25 laptop kernel: [   67.263641] wlan0: associate with 00:18:e7:f7:b3:9d (try 1)
Apr 30 07:58:25 laptop kernel: [   67.274361] wlan0: RX AssocResp from 00:18:e7:f7:b3:9d (capab=0x431 status=0 aid=1)
Apr 30 07:58:25 laptop kernel: [   67.274369] wlan0: associated
Apr 30 07:58:25 laptop kernel: [   67.275894] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Apr 30 07:58:25 laptop kernel: [   67.619988] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 07:58:27 laptop kernel: [   69.328244] cfg80211: Calling CRDA to update world regulatory domain
Apr 30 07:58:31 laptop kernel: [   73.637083] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 07:58:35 laptop kernel: [   77.768144] wlan0: no IPv6 routers present
Apr 30 07:59:10 laptop kernel: [  112.276479] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 07:59:10 laptop kernel: [  112.624093] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 07:59:10 laptop kernel: [  112.626185] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 30 07:59:10 laptop kernel: [  113.102671] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 07:59:11 laptop kernel: [  113.608067] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:00:31 laptop kernel: [  193.276325] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:00:31 laptop kernel: [  193.623698] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:00:31 laptop kernel: [  193.625725] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 30 08:00:31 laptop kernel: [  194.102689] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:00:32 laptop kernel: [  194.617677] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:03:50 laptop kernel: [  392.757340] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:03:50 laptop kernel: [  393.104117] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:03:50 laptop kernel: [  393.106053] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 30 08:03:51 laptop kernel: [  393.577207] ath5k phy0: gain calibration timeout (2452MHz)
Apr 30 08:03:51 laptop kernel: [  394.077608] ath5k phy0: gain calibration timeout (2452MHz)
Comment 13 Boris Popov 2011-04-30 07:16:47 UTC
Created attachment 55932 [details]
first bad commit
Comment 14 Boris Popov 2011-04-30 07:17:39 UTC
Created attachment 55942 [details]
log of bisecting
Comment 15 Rafael J. Wysocki 2011-04-30 20:08:25 UTC
First-Bad-Commit : 42c025f3de9042d9c9abd9a6f6205d1a0f4bcadf
Comment 16 Tejun Heo 2011-05-01 13:48:21 UTC
Umm... I'm sorry, but that bisection has got to be incorrect.  The only thing the commit does is add a comment.

 kernel/workqueue.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 930c239..11869fa 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -768,7 +768,11 @@ static inline void worker_clr_flags(struct worker *worker, unsigned int flags)
 
        worker->flags &= ~flags;
 
-       /* if transitioning out of NOT_RUNNING, increment nr_running */
+       /*
+        * If transitioning out of NOT_RUNNING, increment nr_running.  Note
+        * that the nested NOT_RUNNING is not a noop.  NOT_RUNNING is mask
+        * of multiple flags, not a single flag.
+        */
        if ((flags & WORKER_NOT_RUNNING) && (oflags & WORKER_NOT_RUNNING))
                if (!(worker->flags & WORKER_NOT_RUNNING))
                        atomic_inc(get_gcwq_nr_running(gcwq->cpu));
Comment 17 Florian Mickler 2011-05-01 14:42:37 UTC
Boris, can you base your good/bad decision only on whether scanning works? It's expected that other issues crop up and get fixed while bisecting through a range of commits. If they don't interfere with the scanning, ignore them; if you can't test the wifi scanning (maybe a commit does not compile), just skip that commit.

[You can always check where you're at with the command 'git bisect visualize', which will show you all commits you have not yet eliminated]
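
[The "judge only the scanning" rule could be written down as a small check like the one below. It is a sketch under assumptions: the helper name scan_verdict is hypothetical, and it relies only on the "Cell NN - Address:" lines that iwlist prints for each access point it finds.]

```shell
# Hypothetical helper: turn `iwlist wlan0 scan` output (read from stdin)
# into a bisect verdict. Only scanning is judged: at least one "Cell"
# entry means good, anything else means bad. Association failures and
# unrelated log noise are ignored on purpose.
scan_verdict() {
    if grep -q 'Cell [0-9][0-9]* - Address:'; then
        echo good
    else
        echo bad
    fi
}

# Possible use at each bisect step:
#   git bisect "$(iwlist wlan0 scan | scan_verdict)"
# If the commit does not build or boot, do not guess:
#   git bisect skip
```

Keeping the verdict tied to one symptom is what stops unrelated breakage in the middle of the range from steering the bisection to the wrong commit.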
Comment 18 Boris Popov 2011-05-07 18:12:01 UTC
I repeated the bisect and have a new first bad commit:

573cfde7aaeaadb0fd356ff2a14bdf9238967661 is the first bad commit
commit 573cfde7aaeaadb0fd356ff2a14bdf9238967661
Author: Nick Kossifidis <mickflemm@gmail.com>
Date:   Fri Feb 4 01:41:02 2011 +0200

    ath5k: Fix fast channel switching
    
    Fast channel change fixes:
    
    a) Always set OFDM timings
    b) Don't re-activate PHY
    c) Enable only NF calibration, not AGC
    
    https://bugzilla.kernel.org/show_bug.cgi?id=27382
    
    Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

:040000 040000 47e60fb64921ea8eb2a91f5baa60b8fcd699a39e c9676b82bdca218cce2988c4952f2fcaac935d36 M      drivers
Comment 19 Boris Popov 2011-05-07 18:14:21 UTC
Created attachment 56922 [details]
first bad commit (new)
Comment 20 Boris Popov 2011-05-07 18:15:10 UTC
Created attachment 56932 [details]
log of bisecting (new)
Comment 21 John W. Linville 2011-05-10 17:13:45 UTC
Did you try reverting that commit?  Does that resolve the issue?
Comment 22 Bob Copeland 2011-05-11 04:14:28 UTC
Perhaps just turning off fast channel switching would help; it's good but nonessential, and a few people have reported problems with it.
Comment 23 Boris Popov 2011-05-11 06:29:37 UTC
(In reply to comment #21)
> Did you try reverting that commit?  Does that resolve the issue?

Reverting that commit resolves _only_ the iwlist scan problem.
The access point is still "Not-Associated", as in comment #3 (https://bugzilla.kernel.org/show_bug.cgi?id=33852#c3).

Is there anything else I can help with?
Comment 24 Nick Kossifidis 2011-05-13 00:19:24 UTC
O.K., let's turn fast channel switching into a module parameter. The question is whether we should use a blacklist or a whitelist approach (enabled by default or disabled by default). Also, we still don't have any reports of failures on AR5413 hw; could we at least limit that to AR2413?
Comment 25 Nick Kossifidis 2011-05-14 14:25:55 UTC
Anyway I'll send a patch later today that disables fast channel switching by default and adds a module parameter to enable it...
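
[With a patch like that applied, fast channel switching would be turned back on when the module is loaded. A sketch only: the parameter name fastchanswitch is an assumption, since this thread does not name it, so check the list printed by modinfo before relying on it. Both function names are hypothetical.]

```shell
# Hypothetical helper: list the parameters the installed ath5k module
# actually accepts, to confirm the name before using it.
list_ath5k_params() {
    modinfo -p ath5k
}

# Hypothetical helper: reload ath5k with fast channel switching enabled.
# Needs root. "fastchanswitch" is an ASSUMED parameter name.
reload_ath5k_fast() {
    modprobe -r ath5k &&
    modprobe ath5k fastchanswitch=1
}
```

Users hit by this bug would simply leave the parameter at its default, which, per the comment above, keeps fast channel switching off.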
Comment 27 Boris Popov 2011-05-15 07:09:01 UTC
(In reply to comment #26)
> Try this out...
>
> http://www.kernel.org/pub/linux/kernel/people/mickflemm/01-fast-chan-switch-modparm

It works great! Thanks so much!
Comment 28 Rafał Miłecki 2011-05-15 07:52:43 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > Try this out...
> >
> > http://www.kernel.org/pub/linux/kernel/people/mickflemm/01-fast-chan-switch-modparm
> 
> It works great! Thanks so much!

Boris: what about your association problem? Does it still occur? Is this also a regression between 2.6.37 and 2.6.38?

Could you bisect between:
a) GOOD: 2.6.37
b) BAD: One commit before "ath5k: Fix fast channel switching"
and perhaps create a new bug report?
Comment 29 Boris Popov 2011-05-15 10:40:00 UTC
(In reply to comment #28)

> Boris: what about your association problem? Does it still occur? Is this also
> regression between 2.6.37 and 2.6.38?

I applied the patch on top of the latest commit in Linus' kernel tree and have no problems.


> Could you do bisecting between:
> a) GOOD: 2.6.37
> b) BAD: One commit before "ath5k: Fix fast channel switching"
> and perhaps create new bug report.

I checked the commit before "fix fast..." (b5f737...) and it is working well at the moment.
Comment 30 Florian Mickler 2011-05-23 17:23:53 UTC
So do I understand correctly that this issue is now resolved and the 
Patch: http://www.kernel.org/pub/linux/kernel/people/mickflemm/01-fast-chan-switch-modparm 

resolves this issue?
Comment 31 Boris Popov 2011-05-24 03:21:29 UTC
(In reply to comment #30)
> So do I understand correctly that this issue is now resolved and the 
> Patch:
>
> http://www.kernel.org/pub/linux/kernel/people/mickflemm/01-fast-chan-switch-modparm 
> 
> resolves this issue?

Yes of course.
Comment 32 lucio.pinese 2011-06-07 13:36:48 UTC
Sorry for the newbie question: in which kernel release will this patch be included? I am new to Linux and I don't know how to patch manually!


Thanks a lot!
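
[On patching manually: the usual pattern is to download the patch file from the URL above and apply it inside the kernel source tree. The sketch below assumes the patch uses git-style a/ and b/ path prefixes, hence -p1. The helper name apply_kernel_patch is hypothetical, and the patch file should be given as an absolute path.]

```shell
# Hypothetical helper: apply a downloaded patch file to a kernel source tree.
#   $1 = path to the kernel source tree
#   $2 = absolute path to the patch file
# Assumes git-style a/ and b/ path prefixes in the patch, hence -p1.
apply_kernel_patch() {
    tree=$1
    patchfile=$2
    # First check that every hunk applies, without touching any file.
    (cd "$tree" && patch -p1 --dry-run < "$patchfile") || return 1
    # Then apply it for real.
    (cd "$tree" && patch -p1 < "$patchfile")
}
```

After that, the kernel (or at least the ath5k module) has to be rebuilt and installed before the change takes effect.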