Bug 17471 - Ath5k dropouts and lockups
Summary: Ath5k dropouts and lockups
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-30 15:02 UTC by Chris Adams
Modified: 2011-01-21 20:23 UTC

See Also:
Kernel Version: 2.6.35.4-12.fc14
Subsystem:
Regression: No
Bisected commit-id:


Attachments
ath5k disconnect log (121.87 KB, text/plain)
2010-09-10 19:36 UTC, Oliver
phy0 info (2.76 KB, text/plain)
2010-09-11 10:49 UTC, Oliver
phy2 info (1.11 KB, text/plain)
2010-09-11 10:50 UTC, Oliver

Description Chris Adams 2010-08-30 15:02:48 UTC
I have an old Thinkpad Z60m that came with an Atheros (AR5212, 168c:1014) card.  I switched it out with the Intel 2200 at the time because I didn't want to mess with madwifi (and got a good deal on the Intel NIC).  I recently had a need for an AP, so I put the Atheros back in to run hostapd.

Since then, running normal client mode (no hostapd running, just NM), I have had intermittent wireless network dropouts, and a couple of system lockups (screensaver was active and wouldn't wake so I don't know if anything was printed, but the speaker was a solid beep and the caps-lock LED was blinking).  When I saw dropouts, the wireless LED came on solid for several seconds, and then went back to normal when the network came back.  It did not appear to lose the association and dmesg didn't show anything logged.

The system is running Fedora 12, with the latest F12 kernel (kernel-PAE-2.6.32.19-163.fc12.i686).  After the first lockup, I tried upgrading to the F14 kernel-PAE-2.6.35.4-12.fc14.i686, but it doesn't appear to have changed anything (I got another lockup yesterday).
Comment 1 John W. Linville 2010-08-30 17:41:40 UTC
There isn't much real information here.  Perhaps you could reproduce the hang from a pseudo terminal or use netconsole or a serial console to capture a backtrace?  Maybe you can narrow down a way to reproduce the hang on demand?
Comment 2 Chris Adams 2010-08-30 17:49:34 UTC
I'm sorry about that.  The system hangs have been when the system was idle (and idle for hours; I had to get up in the middle of the night to figure out where the beep was coming from).  I'm going to have to switch back to the Intel NIC for the next week or so (I need it working for some stuff), but then I'll put the Atheros back in and set up netconsole and maybe kdump to try to gather some info.

Is there any debugging I can do when the network just drops out?  It only does it for a few seconds (enough to be aggravating to an SSH session).

I haven't needed to debug wireless issues before, so I'm not sure how to proceed.
Comment 3 John W. Linville 2010-08-30 17:52:10 UTC
An occasional disconnect isn't necessarily unexpected, and it could be caused by any number of things.  It might be useful to attach /var/log/messages from after one of these drops.
Comment 4 Oliver 2010-09-10 13:33:59 UTC
I have 2 ath5k devices in an IBM T42 that have similar behavior.

One is a 3Com PCMCIA card (details to follow if needed) that only does b/g; I use it because it is far more stable. The drops are noticeable, but very infrequent and short.

On the internal mini-PCI NIC, however, the drops are so bad that the card is not really usable. They happen about every 2-5 minutes and last a good 10 seconds. With the classic madwifi driver it all worked fine, and it also works as expected in Windows, so it's not interference or anything like that.

I'll upload a debug file I obtained a few months ago when I get home.

As for the crashes, I also notice them when using the onboard NIC. At first I blamed Firefox, as it happened while surfing, so I switched to Chrome for a while, but it has happened there too. Since my entire X session hangs, I can do absolutely nothing, and since the hangs happen so unpredictably, they are very hard to reproduce.
Comment 5 Bob Copeland 2010-09-10 15:11:08 UTC
10 seconds every 2 minutes sounds like background scanning to me.  Is the 3Com device 2.4 GHz only and the mini-pci dual band?

2.6.36 should fix some hangs caused by concurrent resets, but netconsole or a serial console to capture any kernel OOPSes would be very helpful.
Comment 6 Oliver 2010-09-10 19:35:13 UTC
02:02.0 Ethernet controller: Atheros Communications Inc. AR5212 802.11abg NIC (rev 01)
	Subsystem: Phillips Components Device 8331
	Flags: bus master, medium devsel, latency 168, IRQ 11
	Memory at c0210000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: ath5k
	Kernel modules: ath5k

03:00.0 Ethernet controller: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter (rev 01)
	Subsystem: 3Com Corporation Device 6801
	Flags: bus master, medium devsel, latency 168, IRQ 10
	Memory at c4000000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: [44] Power Management version 2
	Kernel driver in use: ath5k
	Kernel modules: ath5k

Those are the two devices. The thing is, the stuttering differs a lot between them. The 3Com is quite usable, whereas the onboard mini-PCI is quite unusable because its stutters are much more frequent and far longer. NetworkManager, by the way, doesn't disconnect; it claims to be connected the whole time.

By the way, I have had this wireless issue for about 2 years now, so it's not something from a recent update. I also reinstall every 6 months, whenever there's a new Ubuntu release. A few months ago, maybe 2 or 3, I did try a self-compiled 2.6.35, but with the same result.
Comment 7 Oliver 2010-09-10 19:36:24 UTC
Created attachment 29521
ath5k disconnect log
Comment 8 Bob Copeland 2010-09-10 21:56:59 UTC
Ok the minipci is definitely dual band, but is the 3com a single band card or not?  (i.e. what is output of 'iw phy phy0 info' and 'iw phy phy1 info'?)  If it's scanning, that could explain all of your stuttering right there.
Comment 9 Oliver 2010-09-11 10:49:56 UTC
Created attachment 29572
phy0 info

output of iw phy phy0 info
Comment 10 Oliver 2010-09-11 10:50:33 UTC
Created attachment 29582
phy2 info

output of iw phy phy2 info
Comment 11 Oliver 2010-09-11 10:53:32 UTC
Well, the minipci is abg, so it makes sense that it's dual band. As for the 3com, it's only bg, so single band?

Anyway, how can I prevent it from scanning? I'm sure not a lot of people have abg cards, most have bg, but with N now being popular, people are starting to see abgn cards which are dualband no?

I'm using a stock Ubuntu install and have had this since forever; I assumed it was the driver not playing nice. I can see why a wireless device would occasionally spend time scanning in the background, to keep the list of wifi networks current, but this behavior can be quite annoying, so how do I disable it? :)
Comment 12 Bob Copeland 2010-09-11 13:23:54 UTC
Well there's a couple of things we can do from the driver POV:

- try to reduce channel change time; iirc the time to configure channel is not accounted in the off-channel time, and that by itself takes a bit of time in ath5k.  We are working on that but it will be a while before a patch is ready.
- make sure we implement flush etc, I'll add a patch for that today

As for what you can do right now: network manager queues the background scans and it uses wpa_supplicant behind the scenes.  You can use wpa_supplicant directly with the nl80211 driver and ask it to scan only certain frequencies.  I last looked at this some time ago when NM didn't support the nl80211 driver, and now it does, so you may be able to configure NM to send the correct option to wpa_supplicant.  [FYI, nl80211 is the preferred way to configure wireless devices today (used by iw) while wext is the legacy way (used by iwconfig).]
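A minimal sketch of the "scan only certain frequencies" idea above, assuming wpa_supplicant is run directly with the nl80211 driver. `scan_freq` is a wpa_supplicant network-block option that takes a space-separated list of frequencies in MHz. The helper names, the SSID and the choice of channels 1 to 11 are illustrative assumptions, not taken from this report, and whether this setting also limits the scans that NetworkManager requests is not confirmed here. The Python below only generates the text of such a block.

```python
def channel_to_mhz(channel: int) -> int:
    """Centre frequency in MHz of a 2.4 GHz channel (1-14)."""
    if channel == 14:
        return 2484  # channel 14 is the one irregular case
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    raise ValueError(f"not a 2.4 GHz channel: {channel}")


def scan_freq_line(channels) -> str:
    """Build a 'scan_freq=' line listing the given 2.4 GHz channels in MHz."""
    return "scan_freq=" + " ".join(str(channel_to_mhz(c)) for c in channels)


def network_block(ssid: str, channels) -> str:
    """Hypothetical helper: render a bare network block for wpa_supplicant.conf.

    A real block would also need the security settings for the network
    (for example key_mgmt and psk); they are left out of this sketch.
    """
    return "\n".join([
        "network={",
        f'    ssid="{ssid}"',
        f"    {scan_freq_line(channels)}",
        "}",
    ])


if __name__ == "__main__":
    # "example-ssid" and channels 1-11 are placeholders.
    print(network_block("example-ssid", range(1, 12)))
```

Restricting the list to 2.4 GHz frequencies keeps a dual-band card from visiting 5 GHz channels when it searches for that network, which is the part of the scan the single-band card never performs.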
Comment 13 Bob Copeland 2010-09-11 13:25:52 UTC
(In reply to comment #11)
> Well, the minipci is abg, so it makes sense that it's dual band. As for the
> 3com, it's only bg, so single band?

Oh, and to answer this: yes, the 3com is single band; there are no 5 GHz channels in the phy info output.  Scan time is some factor * the number of channels available.
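As a rough illustration of "some factor * the number of channels", the sketch below treats the factor as per-channel dwell time plus channel-change time. Every number in it is an assumption chosen for the arithmetic (the dwell and switch times, and the channel counts for each card); none of them comes from the attached phy info or from ath5k measurements.

```python
def estimated_scan_ms(num_channels: int, dwell_ms: float, switch_ms: float) -> float:
    """Rough time for one full scan pass, in milliseconds.

    Modelled as: channels * (time spent listening + time spent retuning).
    """
    return num_channels * (dwell_ms + switch_ms)


# Illustrative assumptions only, not measured values.
DWELL_MS = 100.0            # assumed time spent listening on each channel
SWITCH_MS = 50.0            # assumed time to reconfigure the channel
SINGLE_BAND_CHANNELS = 13   # assumed 2.4 GHz channel count
DUAL_BAND_CHANNELS = 13 + 24  # assumed 2.4 GHz plus 5 GHz channel count

if __name__ == "__main__":
    for label, count in (("single band", SINGLE_BAND_CHANNELS),
                         ("dual band", DUAL_BAND_CHANNELS)):
        seconds = estimated_scan_ms(count, DWELL_MS, SWITCH_MS) / 1000
        print(f"{label}: {count} channels, about {seconds:.2f} s per scan")
```

Under these assumed figures the dual-band pass takes close to three times as long as the single-band one, which is consistent in direction with the longer stalls reported on the abg card.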
Comment 14 Oliver 2010-09-11 16:13:03 UTC
Looks like I've got some heavy digging into NM to do :S and can only hope I get that sorted out. I'm surprised, however, that this isn't the default, but I guess that's a different bug report.

Thanks for the info so far.
Comment 15 Oliver 2010-09-11 18:41:02 UTC
After reading this post I went searching, now that I knew what to search for, and found some interesting things.

https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/373680
http://nilvec.com/disable-scanning-in-networkmanager-when-connected/

Bug report and workarounds. Hopefully this will finally work with 10.04 and I can use my wifi normally.

One last P.S. that I wanted to mention, and that now makes perfect sense: while the background scan is happening, my wifi LED is almost always fully on. Not flashing, solid on. Back to testing now.

Thanks,

oliver
Comment 16 John W. Linville 2011-01-21 18:25:05 UTC
Closing on basis of OP seeming to have disappeared and Oliver's issue seeming to relate to NetworkManager...
Comment 17 Oliver 2011-01-21 20:23:04 UTC
I switched from NetworkManager to wicd and now everything is perfect (wicd doesn't do background scanning), so it's definitely the background scanning on dual band that trips something up.
