Latest working kernel version: 2.6.24
Earliest failing kernel version: 2.6.25-rc6
Distribution: Debian testing/unstable
Hardware Environment: LG LE50 Express laptop (i386)
Software Environment: wpa_supplicant 0.6.3, wireless-tools 29

Problem Description:
With 2.6.25-rc6 I can no longer associate with my office access point. I am using WEP with wpa_supplicant. "iwlist scan" is not showing any scan results, or sometimes shows one cell IIRC, but with 2.6.24 I see plenty of networks.

Nothing special in the logs:

Mar 25 10:48:49 better kernel: phy0 -> rt2x00_set_chip: Info - Chipset detected - rt: 0201, rf: 0003, rev: 00000004.
Mar 25 10:48:49 better kernel: phy0 -> rt2x00mac_conf_tx: Info - Configured TX ring 0 - CWmin: 4, CWmax: 10, Aifs: 2.
Mar 25 10:48:49 better kernel: phy0 -> rt2x00mac_conf_tx: Info - Configured TX ring 1 - CWmin: 4, CWmax: 10, Aifs: 2.
Mar 25 10:48:49 better kernel: phy0 -> rt2x00mac_conf_tx: Info - Configured TX ring 7 - CWmin: 5, CWmax: 10, Aifs: 2.
Created attachment 15425 [details] Debugfs register dump script Could you enable debugfs in mac80211 and rt2x00 and use attached script to create a full dump of the registers? Please create a dump of the last working and the first non-working kernel. Thanks.
> ------- Comment #1 from IvDoorn@gmail.com 2008-03-25 07:20 -------
> Please create a dump of the last working and the first non-working kernel.

The script didn't work. There is no "register" directory. Am I missing some kernel option?

~$ sudo ./rt2x00-debugfsdump 2.6.24-lg
cat: /sys/kernel/debug/ieee80211/phy*/rt*/register/../driver: No such file or directory
dev_flags:cat: /sys/kernel/debug/ieee80211/phy*/rt*/register/../dev_flags: No such file or directory
cat: /sys/kernel/debug/ieee80211/phy*/rt*/register/../chipset: No such file or directory
grep: /sys/kernel/debug/ieee80211/phy*/rt*/register/../chipset: No such file or directory

~$ ls /sys/kernel/debug/ieee80211/phy0/rt2500pci/
bbp_offset  chipset     csr_value  driver         eeprom_value  rf_value
bbp_value   csr_offset  dev_flags  eeprom_offset  rf_offset

~$ zcat /proc/config.gz |egrep "(RT2|MAC802)"
CONFIG_MAC80211=m
CONFIG_MAC80211_RCSIMPLE=y
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_DEBUGFS=y
CONFIG_MAC80211_DEBUG=y
# CONFIG_MAC80211_VERBOSE_DEBUG is not set
# CONFIG_MAC80211_LOWTX_FRAME_DUMP is not set
# CONFIG_MAC80211_DEBUG_COUNTERS is not set
# CONFIG_MAC80211_IBSS_DEBUG is not set
# CONFIG_MAC80211_VERBOSE_PS_DEBUG is not set
CONFIG_RT2X00=m
CONFIG_RT2X00_LIB=m
CONFIG_RT2X00_LIB_PCI=m
CONFIG_RT2X00_LIB_RFKILL=y
# CONFIG_RT2400PCI is not set
CONFIG_RT2500PCI=m
CONFIG_RT2500PCI_RFKILL=y
# CONFIG_RT2500USB is not set
CONFIG_RT2X00_LIB_DEBUGFS=y
CONFIG_RT2X00_DEBUG=y
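For anyone repeating this check, the "am I missing some kernel option?" question can be answered with a small helper. This is only a sketch: the two option names are the debugfs ones shown in the config listing above, while the function name check_debugfs_config and its "ok:/missing:" output format are made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the debugfs-related options quoted in this thread
# are built in (=y) in a kernel config file.
# check_debugfs_config and its output format are hypothetical helpers.
check_debugfs_config() {
    config_file=$1
    rc=0
    for opt in CONFIG_MAC80211_DEBUGFS CONFIG_RT2X00_LIB_DEBUGFS; do
        # Match only an exact "OPTION=y" line, not "# OPTION is not set".
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok: $opt"
        else
            echo "missing: $opt"
            rc=1
        fi
    done
    return $rc
}
```

It could be run as: zcat /proc/config.gz > /tmp/config && check_debugfs_config /tmp/config. With the config shown above, both options would be reported as ok, which suggests the missing "register" directory is not a configuration problem.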
Created attachment 15427 [details]
Updated regdump script

Ah, the location of some files was moved post-2.6.24. I have attached an updated script which should run cleanly on both kernels. :)
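The attached script itself is not reproduced here. As a rough sketch of the idea, a dump script can cope with both layouts by probing for the "register" subdirectory first. The assumption, taken only from the output earlier in this thread, is that 2.6.24 keeps csr_value and friends directly in the device directory while later kernels place them under register/; the function name rt2x00_register_dir is hypothetical.

```shell
#!/bin/sh
# Sketch: print the directory that holds the register files (csr_offset,
# csr_value, ...) for one rt2x00 debugfs device directory.
# Assumed layouts (from this thread, not from the attached script):
#   2.6.24:        <dev_dir>/csr_value
#   later kernels: <dev_dir>/register/csr_value
rt2x00_register_dir() {
    dev_dir=$1
    if [ -d "$dev_dir/register" ]; then
        echo "$dev_dir/register"
    elif [ -e "$dev_dir/csr_value" ]; then
        echo "$dev_dir"
    else
        echo "no register files under $dev_dir" >&2
        return 1
    fi
}
```

For example: rt2x00_register_dir /sys/kernel/debug/ieee80211/phy0/rt2500pci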
Created attachment 15428 [details] Register dump for 2.6.24
Hrmph. It started working when I was about to test. :( Will report back if it happens again.
hehe :) Now that it is working, you could create a register dump of the working setup and attach that in advance. Next time it breaks, you only need to add the broken dump and reopen this bug. That makes comparison easier, since there are 2 valid dumps and 1 broken one, which will definitely help in determining which register changes made the difference.
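The "2 valid dumps and 1 broken" comparison can be sketched as follows. The dump format is an assumption: one register per line, with the register name or offset in the first whitespace-separated field and its value in the second. The real dumps attached to this bug may look different, and suspect_registers is a hypothetical name.

```shell
#!/bin/sh
# Sketch: list registers whose value is identical in both working dumps
# but different in the broken dump.
# ASSUMED dump format (not confirmed by this thread): "<register> <value>"
# per line, whitespace separated. The three file names must be distinct.
suspect_registers() {
    good_a=$1
    good_b=$2
    broken=$3
    awk '
        FILENAME == ARGV[1] { a[$1] = $2; next }   # first working dump
        FILENAME == ARGV[2] { b[$1] = $2; next }   # second working dump
        # broken dump: report only registers the working dumps agree on
        ($1 in a) && ($1 in b) && a[$1] == b[$1] && $2 != a[$1] {
            print $1, a[$1], "->", $2
        }
    ' "$good_a" "$good_b" "$broken"
}
```

Registers that already differ between the two working dumps (counters, timers and the like) are skipped, which should cut down the noise compared with a plain diff of one working dump against the broken one.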
Created attachment 15429 [details] Register dump for 2.6.25-rc6 (working) This is from a working state, for comparison if it breaks later.
Created attachment 15430 [details] Register dump for 2.6.25-rc6 (not working) Hah! It broke again. After taking the interface down, I couldn't bring it up anymore. And no scan results whatsoever.
It works again after reloading the rt2500pci module.
I noticed something odd in the register dumps which is a good clue about what is going on. :) Does the log indicate anything when the link stops? Because the register dump indicates the association registers were cleared for an unknown reason...
> ------- Comment #10 from IvDoorn@gmail.com 2008-03-25 15:11 -------
> Does the log indicate anything when the link stops?

No, nothing.

Marcus
Created attachment 15449 [details] Add MAC/BSSID configuration debugging Could you try attached patch? It adds additional debugging for MAC and BSSID configuration, this can help in determining if mac80211/rt2x00 is resetting the BSSID or if the hardware issued a reset of some sort.
> ------- Comment #12 from IvDoorn@gmail.com 2008-03-26 09:13 -------
> Add MAC/BSSID configuration debugging
>
> Could you try attached patch?

It doesn't seem to apply against nearly-current mainline (commit a4083c9271e0a697278e089f2c0b9a95363ada0a). There is no function named rt2x00lib_config_intf in that file.

Marcus
Ah right, that was a 2.6.26 patch.. :S I'll respin and create a new patch for 2.6.25-rcX tomorrow.
Created attachment 15458 [details] Add MAC/BSSID configuration debugging Here is the updated patch, this uses a BUG_ON() statement, so you will see a complete stacktrace when somebody attempts to reset the BSSID. Note that this also means that the trace occurs during ifconfig wlan0 down... But that is expected behavior. ;)
> ------- Comment #15 from IvDoorn@gmail.com 2008-03-27 08:47 -------
> Add MAC/BSSID configuration debugging
>
> Here is the updated patch, this uses a BUG_ON() statement, so you will see a
> complete stacktrace when somebody attempts to reset the BSSID.

The BUG happened predictably in wpa_supplicant at boot when the interface was brought up. But immediately after that the system slowed down to a crawl so I couldn't do more testing. There was not much disk activity, the boot process continued, but it took a minute or so to run each init script.

I'll try to test some more when bringing the interface up manually.

Marcus
Well, that could mean that wpa_supplicant isn't detecting any traffic anymore and assumes the AP is out of reach (after which it will reset the BSSID and rescan). Could you try to see what frames are coming in with wireshark around the time the device stops? (You can revert the previous patch, because it will only cause noise in your log.)

Perhaps you could try out the latest wireless-testing git tree (http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=summary). There is a major code overhaul between 2.6.25 and 2.6.26, including interface handling and queue handling, which fixes dozens of bugs. Perhaps this bug is among those. ;)
> ------- Comment #17 from IvDoorn@gmail.com 2008-03-27 12:38 -------
> Well that could mean that wpa_supplicant isn't detecting any traffic anymore
> and assumes the AP is out of reach (after which it will reset the BSSID and
> rescans).

You mean it doesn't normally do this at startup? This is the first time the interface is brought up, that worked fine before the patch. It seems to work now too, just that it slows down the system.

> Perhaps you could try out the latest wireless-testing git tree

Will try.

Marcus
Well directly after startup either wpa_supplicant or mac80211 could send a 00:00:00:00:00:00 address to rt2x00. When association starts it will send the correct address and when it deassociates it will be cleared again. So under normal behavior the BUG() should be triggered directly at ifup() and at ifdown(). But if the BUG() is triggered while the interface is running and it was associated to the AP, then it might be because the beacons from the AP aren't getting through and wpa_supplicant thinks the AP is gone.
> ------- Comment #17 from IvDoorn@gmail.com 2008-03-27 12:38 -------
> Perhaps you could try out the latest wireless-testing git tree

Just tried it, the bug is still present. Besides, throughput seems to be much worse when it works.
Created attachment 15475 [details] Register dump for wireless-testing 2.6.25-rc7 (not working)
Could you try to see what frames (if any) are coming in with wireshark around the time the device stops?
Created attachment 15526 [details] Traffic capture when taking down interface (2.6.25-rc7 wireless-testing)
Ehm, you state this is the dump while taking down the interface, but I meant the dump around the time rt2x00 breaks and stops transmitting any frames. It is to see if the AP is sending a deauth message, or if it is still sending any form of beacons.
> ------- Comment #24 from IvDoorn@gmail.com 2008-04-04 11:50 -------
> Ehm, you state this is the dump while taking down the interface,
> but I meant the dump around the time rt2x00 breaks and stops transmitting any
> frames.

I don't follow. It doesn't break unless I take down the interface. Then it doesn't associate when I try to bring it up again.
Ok, sorry, I am confusing different bugs.. :S At the start of this bug you mentioned the link died after X minutes, so at least there is progress in that it now only occurs after an ifdown && ifup. I have to look it up, but I think I saw a mac80211 bug about that...

Just to be sure this is still a regression: does this bug (that is, ifup && ifdown && ifup leaves the device in an unusable state) also occur in 2.6.24?
> ------- Comment #26 from IvDoorn@gmail.com 2008-04-09 11:35 -------
> At the start of this bug you mentioned the link died after X minutes,

I may not have realised what triggered it the first time. It hasn't died on me except after ifdown/ifup, AFAICT.

> Just to be sure this is still a regression does this bug, (thus ifup &&
> ifdown && ifup leaves the device in a unusable state) also occur in 2.6.24?

The bug is not present in 2.6.24.

Please tell me what steps to take if you need a packet trace etc.
After the ifdown & ifup command, did you use the 'iwconfig wlan0 ap <bssid>' command? There was a bug in mac80211 that made that mandatory after an ifdown-ifup cycle. That should be fixed now, but your version most likely doesn't have that fix yet.
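The sequence being asked about can be written down as a small dry-run helper. This is a sketch under assumptions: it uses ifconfig for the down/up step (as mentioned earlier in this thread) rather than Debian's ifdown/ifup, the function name cycle_and_reassociate and the RUN switch are invented for illustration, and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: interface down/up cycle followed by the 'iwconfig ... ap' workaround
# and a scan. Prints the commands by default; set RUN="" (as root) to execute.
# cycle_and_reassociate and RUN are hypothetical, not part of any tool.
cycle_and_reassociate() {
    iface=$1
    bssid=$2
    run=${RUN-echo}          # unset RUN -> "echo" (dry run); RUN="" -> execute
    $run ifconfig "$iface" down
    $run ifconfig "$iface" up
    $run iwconfig "$iface" ap "$bssid"   # force the BSSID after the cycle
    $run iwlist "$iface" scan            # check whether scan results return
}
```

Calling cycle_and_reassociate wlan0 <bssid> without RUN set prints the four commands, which makes it easy to paste the exact reproduction steps into a bug comment.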
> ------- Comment #28 from IvDoorn@gmail.com 2008-04-10 08:09 -------
> After the ifdown & ifup command, did you use the 'iwconfig wlan0 ap <bssid>'
> command?

No, but it seems to help somewhat. I was able to get some packets through by doing that. Then it broke down again after a short while.
Ok, then we are probably facing a mac80211 bug rather than rt2x00. You did that test with latest wireless-testing, right?
(In reply to comment #30)
> Ok then we are probably facing a mac80211 bug rather then rt2x00.
> You did that test with latest wireless-testing right?

That last test was with -rc8 mainline. Comments 20-23 apply to wireless-testing. I can try with an updated wireless-testing in a couple of days.
Bug confirmed on 2.6.25.
(In reply to comment #30)
> You did that test with latest wireless-testing right?

Finally had a chance to test. The bug is not present in wireless-testing (2.6.25-rc9, commit 37bfd4f9703be5de4f632b08431127c2c1263353).
Is the bug present in 2.6.26? If not, I'll close this bug. :)
> ------- Comment #34 from IvDoorn@gmail.com 2008-09-04 00:45 -------
> Is the bug present in 2.6.26?

Sorry, I've changed laptops and cannot test it.
Ok, this bug can be closed then, since I haven't heard of problems from anybody else with the driver on 2.6.26 (other than from the Fedora kernel users, who use a different version of rt2x00).