Bug 10320

Summary: rt2x00 does not associate or give scan results
Product: Networking
Reporter: Marcus Better (marcus)
Component: Wireless
Assignee: networking_wireless (networking_wireless)
Status: CLOSED CODE_FIX
Severity: normal
CC: bunk, frederic.coiffier, IvDoorn
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc6, 2.6.25
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9832
Attachments: Debugfs register dump script
Updated regdump script
Register dump for 2.6.24
Register dump for 2.6.25-rc6 (working)
Register dump for 2.6.25-rc6 (not working)
Add MAC/BSSID configuration debugging
Add MAC/BSSID configuration debugging
Register dump for wireless-testing 2.6.25-rc7 (not working)
Traffic capture when taking down interface (2.6.25-rc7 wireless-testing)

Description Marcus Better 2008-03-25 06:04:44 UTC
Latest working kernel version: 2.6.24
Earliest failing kernel version: 2.6.25-rc6
Distribution: Debian testing/unstable
Hardware Environment: LG LE50 Express laptop (i386)
Software Environment: wpa_supplicant 0.6.3, wireless-tools 29
Problem Description:

With 2.6.25-rc6 I can no longer associate with my office access point. I am using WEP with wpa_supplicant. "iwlist scan" is not showing any scan results, or sometimes shows one cell IIRC, but with 2.6.24 I see plenty of networks.

Nothing special in the logs:

Mar 25 10:48:49 better kernel: phy0 -> rt2x00_set_chip: Info - Chipset detected - rt: 0201, rf: 0003, rev: 00000004.
Mar 25 10:48:49 better kernel: phy0 -> rt2x00mac_conf_tx: Info - Configured TX ring 0 - CWmin: 4, CWmax: 10, Aifs: 2.
Mar 25 10:48:49 better kernel: phy0 -> rt2x00mac_conf_tx: Info - Configured TX ring 1 - CWmin: 4, CWmax: 10, Aifs: 2.
Mar 25 10:48:49 better kernel: phy0 -> rt2x00mac_conf_tx: Info - Configured TX ring 7 - CWmin: 5, CWmax: 10, Aifs: 2.
Comment 1 Ivo van Doorn 2008-03-25 07:20:26 UTC
Created attachment 15425
Debugfs register dump script

Could you enable debugfs in mac80211 and rt2x00 and use the attached script to create a full dump of the registers?
Please create a dump of the last working and the first non-working kernel.

Thanks.
Comment 2 Marcus Better 2008-03-25 08:31:05 UTC
> ------- Comment #1 from IvDoorn@gmail.com  2008-03-25 07:20 -------
> Please create a dump of the last working and the first non-working kernel.

The script didn't work. There is no "register" directory. Am I missing 
some kernel option?

~$ sudo ./rt2x00-debugfsdump
2.6.24-lg
cat: /sys/kernel/debug/ieee80211/phy*/rt*/register/../driver: No such file or directory
dev_flags:cat: /sys/kernel/debug/ieee80211/phy*/rt*/register/../dev_flags: No such file or directory
cat: /sys/kernel/debug/ieee80211/phy*/rt*/register/../chipset: No such file or directory
grep: /sys/kernel/debug/ieee80211/phy*/rt*/register/../chipset: No such file or directory

~$ ls /sys/kernel/debug/ieee80211/phy0/rt2500pci/
bbp_offset  chipset     csr_value  driver         eeprom_value  rf_value
bbp_value   csr_offset  dev_flags  eeprom_offset  rf_offset

~$ zcat /proc/config.gz |egrep "(RT2|MAC802)"
CONFIG_MAC80211=m
CONFIG_MAC80211_RCSIMPLE=y
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
CONFIG_MAC80211_DEBUG=y
# CONFIG_MAC80211_VERBOSE_DEBUG is not set
# CONFIG_MAC80211_LOWTX_FRAME_DUMP is not set
# CONFIG_MAC80211_DEBUG_COUNTERS is not set
# CONFIG_MAC80211_IBSS_DEBUG is not set
# CONFIG_MAC80211_VERBOSE_PS_DEBUG is not set
CONFIG_RT2X00=m
CONFIG_RT2X00_LIB=m
CONFIG_RT2X00_LIB_PCI=m
CONFIG_RT2X00_LIB_RFKILL=y
# CONFIG_RT2400PCI is not set
CONFIG_RT2500PCI=m
CONFIG_RT2500PCI_RFKILL=y
# CONFIG_RT2500USB is not set
CONFIG_RT2X00_LIB_DEBUGFS=y
CONFIG_RT2X00_DEBUG=y
Comment 3 Ivo van Doorn 2008-03-25 08:57:18 UTC
Created attachment 15427
Updated regdump script

Ah, the location of some files was moved after 2.6.24.
I have attached an updated script which should run cleanly on both kernels. :)
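The attached script itself is not reproduced here. As a rough illustration of what running on both kernels involves, the Python sketch below looks for the register files in both layouts seen in this report: directly under phy*/rt*/ (the 2.6.24 ls output in comment 2) and under phy*/rt*/register/ (the path the first script expected). It assumes, as the file names suggest, that writing an offset to <kind>_offset selects a register and reading <kind>_value returns its contents; the function names are hypothetical.

```python
import glob
import os


def find_register_dir(debugfs_root="/sys/kernel/debug"):
    """Return the directory holding the *_offset/*_value files, or None.

    Checks the newer layout (phy*/rt*/register/) first, then the 2.6.24
    layout where the files sit directly under phy*/rt*/.
    """
    base = os.path.join(debugfs_root, "ieee80211", "phy*", "rt*")
    for pattern in (os.path.join(base, "register"), base):
        for candidate in sorted(glob.glob(pattern)):
            if os.path.isfile(os.path.join(candidate, "csr_offset")):
                return candidate
    return None


def dump_registers(regdir, prefix, offsets):
    """Read registers of one kind ("csr", "bbp", "rf" or "eeprom").

    Assumption (not confirmed in this report): writing an offset to
    <prefix>_offset selects the register, and reading <prefix>_value
    returns its contents.
    """
    values = {}
    for off in offsets:
        with open(os.path.join(regdir, f"{prefix}_offset"), "w") as f:
            f.write(f"{off}\n")
        with open(os.path.join(regdir, f"{prefix}_value")) as f:
            values[off] = f.read().strip()
    return values
```

Run as root, something like dump_registers(find_register_dir(), "csr", range(0, 0x100, 4)) would read 64 CSR words; the valid offset range depends on the chipset and is not given in this report.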
Comment 4 Marcus Better 2008-03-25 09:20:11 UTC
Created attachment 15428
Register dump for 2.6.24
Comment 5 Marcus Better 2008-03-25 12:15:15 UTC
Hrmph. It started working when I was about to test. :( Will report back if it happens again.
Comment 6 Ivo van Doorn 2008-03-25 12:20:15 UTC
hehe :)
Now that it is working, you could create a register dump of the working setup and attach that in advance. Next time it breaks, you only need to add the broken dump and reopen this bug.

That makes comparison easier: with 2 valid dumps and 1 broken one, it will definitely help in determining which register changes made the difference.
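A minimal sketch of that comparison, assuming each dump has already been parsed into a dict mapping register offset to value (the dump file format is not shown in this report). Registers that already differ between the two working dumps are treated as likely noise and skipped; only registers that agree in both working dumps but differ in the broken one are reported. The function name is hypothetical.

```python
def suspicious_registers(working_a, working_b, broken):
    """Return sorted offsets where both working dumps agree but the broken one differs."""
    result = []
    for off in sorted(broken):
        if off in working_a and off in working_b:
            # Chained comparison: working_a == working_b and working_b != broken.
            if working_a[off] == working_b[off] != broken[off]:
                result.append(off)
    return result
```

For example, with offsets 0, 4 and 8, a register whose value changes between the two working dumps (offset 4 below) is ignored, while one that is stable there but cleared in the broken dump (offset 8) is reported.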
Comment 7 Marcus Better 2008-03-25 12:23:07 UTC
Created attachment 15429
Register dump for 2.6.25-rc6 (working)

This is from a working state, for comparison if it breaks later.
Comment 8 Marcus Better 2008-03-25 13:07:47 UTC
Created attachment 15430
Register dump for 2.6.25-rc6 (not working)

Hah! It broke again. After taking the interface down, I couldn't bring it up anymore. And no scan results whatsoever.
Comment 9 Marcus Better 2008-03-25 13:41:31 UTC
It works again after reloading the rt2500pci module.
Comment 10 Ivo van Doorn 2008-03-25 15:11:10 UTC
I noticed something odd in the register dumps which is a good clue about what is going on. :)

Does the log indicate anything when the link stops?
The register dump indicates that the association registers were cleared for an unknown reason...
Comment 11 Marcus Better 2008-03-26 01:50:25 UTC
> ------- Comment #10 from IvDoorn@gmail.com  2008-03-25 15:11 -------
> Does the log indicate anything when the link stops?

No, nothing.

Marcus
Comment 12 Ivo van Doorn 2008-03-26 09:13:32 UTC
Created attachment 15449
Add MAC/BSSID configuration debugging

Could you try attached patch?
It adds extra debugging for MAC and BSSID configuration; this can help determine whether mac80211/rt2x00 is resetting the BSSID or whether the hardware issued a reset of some sort.
Comment 13 Marcus Better 2008-03-26 14:10:56 UTC
> ------- Comment #12 from IvDoorn@gmail.com  2008-03-26 09:13 -------
> Add MAC/BSSID configuration debugging
> 
> Could you try attached patch?

It doesn't seem to apply against nearly-current mainline (commit
a4083c9271e0a697278e089f2c0b9a95363ada0a). There is no function named
rt2x00lib_config_intf in that file.

Marcus
Comment 14 Ivo van Doorn 2008-03-26 14:24:32 UTC
Ah right, that was a 2.6.26 patch.. :S
I'll respin and create a new patch for 2.6.25-rcX tomorrow.
Comment 15 Ivo van Doorn 2008-03-27 08:47:41 UTC
Created attachment 15458
Add MAC/BSSID configuration debugging

Here is the updated patch. It uses a BUG_ON() statement, so you will see a complete stack trace when somebody attempts to reset the BSSID.
Note that this also means that the trace occurs during ifconfig wlan0 down... But that is expected behavior. ;)
Comment 16 Marcus Better 2008-03-27 12:22:42 UTC
> ------- Comment #15 from IvDoorn@gmail.com  2008-03-27 08:47 -------
> Add MAC/BSSID configuration debugging
> 
> Here is the updated patch, this uses a BUG_ON() statement, so you will see a
> complete stacktrace when somebody attempts to reset the BSSID.

The BUG happened predictably in wpa_supplicant at boot when the 
interface was brought up. But immediately after that the system slowed 
down to a crawl so I couldn't do more testing. There was not much disk 
activity, the boot process continued, but it took a minute or so to run 
each init script.

I'll try to test some more when bringing the interface up manually.

Marcus
Comment 17 Ivo van Doorn 2008-03-27 12:38:13 UTC
Well, that could mean that wpa_supplicant isn't detecting any traffic anymore and assumes the AP is out of reach (after which it will reset the BSSID and rescan).

Could you use Wireshark to see which frames are coming in around the time the device stops? (You can revert the previous patch; it will only add noise to your log.)

Perhaps you could try out the latest wireless-testing git tree (http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=summary).
There is a major code overhaul between 2.6.25 and 2.6.26, including interface handling and queue handling, which fixes dozens of bugs.
Perhaps this bug is among those. ;)
Comment 18 Marcus Better 2008-03-27 12:41:50 UTC
> ------- Comment #17 from IvDoorn@gmail.com  2008-03-27 12:38 -------
> Well that could mean that wpa_supplicant isn't detecting any traffic anymore
> and assumes the AP is out of reach (after which it will reset the BSSID and
> rescan).

You mean it doesn't normally do this at startup? This is the first time 
the interface is brought up, that worked fine before the patch. It seems 
to work now too, just that it slows down the system.

> Perhaps you could try out the latest wireless-testing git tree

Will try.

Marcus
Comment 19 Ivo van Doorn 2008-03-27 12:47:12 UTC
Well, directly after startup, either wpa_supplicant or mac80211 could send a 00:00:00:00:00:00 address to rt2x00.

When association starts, it will send the correct address, and when it deassociates, the address will be cleared again.
So under normal behavior the BUG() should be triggered directly at ifup() and at ifdown().
But if the BUG() is triggered while the interface is running and associated with the AP, it might be because the beacons from the AP aren't getting through and wpa_supplicant thinks the AP is gone.
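The reasoning above can be restated as a small log check. This is a hypothetical sketch: it assumes a made-up event log of ("ifup", None), ("ifdown", None) and ("bssid", address) entries, not the actual output of the debugging patch. A zero BSSID right after ifup or ifdown is treated as normal; one that arrives while associated, outside those transitions, is flagged as a possible sign that beacons from the AP are being lost.

```python
ZERO_BSSID = "00:00:00:00:00:00"


def unexpected_bssid_clears(events):
    """Return indexes of zero-BSSID writes that happen while associated
    and are not part of an ifup/ifdown transition."""
    associated = False
    clear_expected = False
    flagged = []
    for i, (kind, value) in enumerate(events):
        if kind == "ifup":
            associated = False
            clear_expected = True   # zero BSSID right after ifup is normal
        elif kind == "ifdown":
            clear_expected = True   # zero BSSID during ifdown is normal
        elif kind == "bssid":
            if value.lower() != ZERO_BSSID:
                associated = True   # association set a real BSSID
                clear_expected = False
            else:
                if associated and not clear_expected:
                    flagged.append(i)
                associated = False
                clear_expected = False
    return flagged
```

In a log of ifup, clear, associate, clear, associate, ifdown, clear, only the middle clear (index 3) is flagged.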
Comment 20 Marcus Better 2008-03-28 02:57:57 UTC
> ------- Comment #17 from IvDoorn@gmail.com  2008-03-27 12:38 -------
> Perhaps you could try out the latest wireless-testing git tree

Just tried it; the bug is still present. Besides, throughput seems to be
much worse when it works.
Comment 21 Marcus Better 2008-03-28 03:02:23 UTC
Created attachment 15475
Register dump for wireless-testing 2.6.25-rc7 (not working)
Comment 22 Ivo van Doorn 2008-03-28 15:44:02 UTC
Could you use Wireshark to see which frames (if any) are coming in around the time the device stops?
Comment 23 Marcus Better 2008-03-31 01:20:26 UTC
Created attachment 15526
Traffic capture when taking down interface (2.6.25-rc7 wireless-testing)
Comment 24 Ivo van Doorn 2008-04-04 11:50:28 UTC
Ehm, you state this is the dump while taking down the interface, but I meant the dump around the time rt2x00 breaks and stops transmitting any frames.
The point is to see whether the AP is sending a deauth message, or whether it is still sending beacons of any kind.
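To tell those two cases apart in a capture, it is enough to look at the Frame Control field of each 802.11 frame: management frames have type 0, and beacon, disassociation and deauthentication frames are subtypes 8, 10 and 12. The Python sketch below assumes the raw 802.11 header starts at byte 0 of each frame, i.e. any radiotap or similar capture header has already been stripped; the function names are hypothetical.

```python
from collections import Counter

# 802.11 management frame subtypes (type 0) relevant to this check.
MGMT_SUBTYPES = {8: "beacon", 10: "disassoc", 12: "deauth"}


def classify_frame(frame: bytes) -> str:
    """Classify a raw 802.11 frame by the first byte of its Frame Control field.

    Bits 0-1 hold the protocol version, bits 2-3 the type, bits 4-7 the subtype.
    """
    if not frame:
        return "empty"
    fc0 = frame[0]
    ftype = (fc0 >> 2) & 0x3
    subtype = (fc0 >> 4) & 0xF
    if ftype == 0:
        return MGMT_SUBTYPES.get(subtype, "other-mgmt")
    return {1: "control", 2: "data"}.get(ftype, "reserved")


def summarize(frames):
    """Count frame classes, e.g. to see whether beacons keep arriving after the link dies."""
    return Counter(classify_frame(f) for f in frames)
```

A capture window in which beacons keep arriving but no deauth or disassoc frames appear would point away from the AP dropping the station.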
Comment 25 Marcus Better 2008-04-06 11:24:11 UTC
> ------- Comment #24 from IvDoorn@gmail.com  2008-04-04 11:50 -------
> Ehm, you state this is the dump while taking down the interface,
> but I meant the dump around the time rt2x00 breaks and stops transmitting any
> frames.

I don't follow. It doesn't break unless I take down the interface. Then
it doesn't associate when I try to bring it up again.
Comment 26 Ivo van Doorn 2008-04-09 11:35:52 UTC
OK, sorry, I am confusing different bugs. :S

At the start of this bug you mentioned the link died after X minutes, so at least there is progress in that it now only occurs after an ifdown && ifup. I have to look it up, but I think I saw a mac80211 bug about that...

Just to be sure this is still a regression: does this bug (ifup && ifdown && ifup leaves the device in an unusable state) also occur in 2.6.24?
Comment 27 Marcus Better 2008-04-09 11:40:53 UTC
> ------- Comment #26 from IvDoorn@gmail.com  2008-04-09 11:35 -------
> At the start of this bug you mentioned the link died after X minutes,

I may not have realised what triggered it the first time. It hasn't died
on me except after ifdown/ifup, AFAICT.

> Just to be sure this is still a regression: does this bug (ifup && ifdown
> && ifup leaves the device in an unusable state) also occur in 2.6.24?

The bug is not present in 2.6.24.

Please tell me what steps to take if you need a packet trace etc.
Comment 28 Ivo van Doorn 2008-04-10 08:09:57 UTC
After the ifdown & ifup commands, did you use the 'iwconfig wlan0 ap <bssid>' command?
There was a bug in mac80211 that made that mandatory after an ifdown-ifup cycle. It should be fixed now, but your version most likely doesn't have that fix yet.
Comment 29 Marcus Better 2008-04-10 23:54:26 UTC
> ------- Comment #28 from IvDoorn@gmail.com  2008-04-10 08:09 -------
> After the ifdown & ifup command, did you use the 'iwconfig wlan0 ap <bssid>'
> command?

No, but it seems to help somewhat. I was able to get some packets
through by doing that. Then it broke down again after a short while.
Comment 30 Ivo van Doorn 2008-04-12 03:43:32 UTC
OK, then we are probably facing a mac80211 bug rather than an rt2x00 one.
You did that test with the latest wireless-testing, right?
Comment 31 Marcus Better 2008-04-13 01:01:46 UTC
(In reply to comment #30)
> OK, then we are probably facing a mac80211 bug rather than an rt2x00 one.
> You did that test with latest wireless-testing right?

That last test was with -rc8 mainline. Comments 20-23 apply to wireless-testing.

I can try with an updated wireless-testing in a couple of days.
Comment 32 Marcus Better 2008-04-22 04:52:54 UTC
Bug confirmed on 2.6.25.
Comment 33 Marcus Better 2008-04-25 05:12:40 UTC
(In reply to comment #30)
> You did that test with latest wireless-testing right?

Finally had a chance to test. The bug is not present in wireless-testing (2.6.25-rc9, commit 37bfd4f9703be5de4f632b08431127c2c1263353).
Comment 34 Ivo van Doorn 2008-09-04 00:45:51 UTC
Is the bug present in 2.6.26?

If not, I'll close this bug. :)
Comment 35 Marcus Better 2008-09-04 01:51:20 UTC
> ------- Comment #34 from IvDoorn@gmail.com  2008-09-04 00:45 -------
> Is the bug present in 2.6.26?

Sorry, I've changed laptops and cannot test it.
Comment 36 Ivo van Doorn 2008-09-04 02:46:38 UTC
OK, this bug can be closed then, since I haven't heard of problems with the driver on 2.6.26 from anybody else (other than from Fedora kernel users, who use a different version of rt2x00).