Latest working kernel version: 2.6.27.10 (vanilla) and 2.6.27-gentoo-r7
Earliest failing kernel version: 2.6.28 (vanilla) and 2.6.28-gentoo
Distribution: Gentoo
Hardware Environment: amd64, Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter (rev 01)
Software Environment: ath5k, wpa_supplicant, wireless-tools

Problem Description:
With the ath5k driver in kernel 2.6.28 it is not possible to connect to any access point. From time to time, but not always, there are some messages in the kernel log:

ath5k: unsupported jumbo
ath5k: can't reset hardware (-11)
ath5k phy0: noise floor calibration timeout (2412) MHz
phy0: failed to restore operational channel after scan

Scanning for access points with "wpa_cli scan && wpa_cli scan_results" works perfectly.
Scanning for access points with "iwlist wlan0 scan" doesn't work. Error message:
print_scanning_info: Allocation failed.

Important: With 2.6.27.10 neither of these two problems occurs.

Steps to reproduce:
Update to a 2.6.28 kernel and use the ath5k driver...

Hint: I don't know if this is important, but I have only tested this with WEP and WPA networks. I have not tried to connect to an open network.
Is SSID hidden?
(In reply to comment #1)
> Is SSID hidden?

No, and I have now tested it with an open network: no difference.
Can you post output of 'strace iwlist wlan0 scan' ?
Created attachment 19786 [details]
strace iwlist wlan0 scan

strace iwlist wlan0 scan &> strace_iwlist_wlan0_scan

New info: It seems to me that the "iwlist wlan0 scan" failure

print_scanning_info: Allocation failed

is also a bug that happens only from time to time: sometimes it scans, sometimes it doesn't. But the "association bug" remains...

Greetings,
Jan
Very weird. Which version of wireless-tools?

Here is where things look broken:

> ioctl(3, SIOCGIWSCAN, 0x7fff75ae7cb0) = -1 E2BIG (Argument list too long)
> mremap(0x7fc65d2e2000, 134221824, 268439552, MREMAP_MAYMOVE) = 0x7fc64d2e1000
> ioctl(3, SIOCGIWSCAN, 0x7fff75ae7cb0) = -1 E2BIG (Argument list too long)
[...]

So we're asking for scan results with a 134 *meg* buffer; it fails, so we reallocate with 268 megs.

> mremap(0x7fc5ed2df000, 1073745920, 18446744071562072064, MREMAP_MAYMOVE) = -1 EFAULT (Bad address)
> mmap(NULL, 18446744071562072064, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

We keep doubling until we wrap a 32-bit int, then it goes negative so you get ENOMEM.

Looking at mac80211 (net/mac80211/scan.c) ieee80211_scan_results, I don't see right away how you would get -E2BIG with any of those sizes, unless ieee80211_scan_result is hosed. But there weren't major changes to it in 2.6.28.

As for association, can you turn on CONFIG_MAC80211_DEBUG_MENU and CONFIG_MAC80211_VERBOSE_DEBUG then post whatever shows up in 'dmesg' (if anything) when you try to associate?
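The wrap described above can be checked with a bit of arithmetic. What follows is only a sketch in Python, not the actual wireless-tools code. It assumes the buffer length is kept in a signed 32-bit int and doubled after each E2BIG, and that the allocator adds one 4096-byte page on top of each large request (an assumption inferred from the sizes in the strace, e.g. 134221824 = 2^27 + 4096). The helper names are made up for illustration.

```python
PAGE = 4096  # assumed allocator overhead, inferred from the strace sizes


def to_int32(value):
    """Interpret value as a signed 32-bit integer (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def as_size_t(value):
    """Interpret a (possibly negative) value as an unsigned 64-bit size."""
    return value & 0xFFFFFFFFFFFFFFFF


def doubled_lengths(start, limit=64):
    """Yield successive buffer lengths, doubling as a signed 32-bit int."""
    buflen = start
    for _ in range(limit):
        yield buflen
        if buflen < 0:
            return  # once the length is negative, no allocation can succeed
        buflen = to_int32(buflen * 2)


# Start at 128 MiB, the first size visible in the strace excerpt.
lengths = list(doubled_lengths(1 << 27))
for buflen in lengths:
    # Left: the int the program holds. Right: the size the kernel is asked for.
    print(buflen, as_size_t(buflen) + PAGE)
```

Under these assumptions the sequence passes through 134221824, 268439552 and 1073745920, and the step after that comes out as 18446744071562072064, the same value that mremap and mmap are called with in the strace. That is consistent with the length going negative once 2^30 is doubled.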
Created attachment 19938 [details]
with debugging enabled and some tests

Sorry, I was busy with an exam...

(In reply to comment #5)
> Very weird. Which version of wireless-tools?

I use wireless-tools 29.

> As for association, can you turn on CONFIG_MAC80211_DEBUG_MENU and
> CONFIG_MAC80211_VERBOSE_DEBUG then post whatever shows up in 'dmesg'
> (if anything) when you try to associate?

Steps I took:
0) Enabled debugging (CONFIG_MAC80211_VERBOSE_DEBUG and ath5k) in 2.6.28 vanilla (not 2.6.28.1).
1) Fresh reboot with this kernel.
   Effect: no connection to any access point.
NEW:
2) Disabled the wireless LAN with the keyboard button (it seems to be hardware based; things like rfkill are disabled).
3) Waited some seconds and enabled the WLAN again.
   Effect: I get a connection to an access point but lose it shortly afterwards, especially if I scan with "iwlist wlan0 scan" (sometimes it scans correctly).
4) Repeated steps 2) and 3): it is reproducible.

You can see this in the dmesg output.
New info: The first failing kernel version is 2.6.28-rc1.

Is it safe to do a (git) bisect between 2.6.27 and 2.6.28-rc1? I mean, can it damage my hardware if I start my system with such a kernel? (And these are 3800 patches; is there an easy way?)

New info: I told you in my previous comment that the card connects to an access point if I disable and enable it. If I start a ping to a website, the connection doesn't break.
Bisecting won't hurt your hardware, and really it's the only thing I can think of at the moment. You can try restricting it to changes in net/ via:

$ git bieect start -- net
(^typo, should be bisect)
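On the worry about the roughly 3800 patches: a bisect halves the candidate range at each step, so the number of kernels to build and test grows only with the logarithm of the number of commits. A rough sketch of that estimate (the function name is made up for illustration; real histories with merges, and any skipped commits, can add a few steps):

```python
import math


def bisect_steps(num_commits):
    """Approximate number of kernels to test when bisecting num_commits."""
    return math.ceil(math.log2(num_commits))


print(bisect_steps(3800))
```

For 3800 commits this gives about a dozen builds, and restricting the bisect to a subtree such as net/ reduces the number of candidate commits further.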
Any news on this one?
(In reply to comment #10)
> Any news on this one?

I'm sorry, I ran into trouble with the bisect, but I'm hanging in there. This is what I have found out so far:

First, the kernels in the bisect don't compile, something with

drivers/built-in.o: In function `rtl8169_gset_xmii':
r8169.c:(.text+0x7e4b8): undefined reference to `mii_ethtool_gset'
make: *** [.tmp_vmlinux1] Error 1

and the modules do not build. After skipping some of these kernels I decided to do the bisect without the Realtek 8169 driver and without modules (everything built in).

But after testing 2.6.27 and 2.6.28-rc1 again, I found out that both problems are not reproducible every time. At the university the problems appear more often than at home: at the university there are up to 80 access points, at home up to ten. Maybe this is one reason.

Next problem: after making sure that 2.6.28-rc1 has the bug and 2.6.27 does not, the first kernel between them in the bisect panics at boot. More exactly: I tested it at the university, and the kernel panic appears if and only if I activate the chip. So far I have skipped some kernels, but all of them panic (tested at the university). At home some of them boot and I can test. Maybe it is the same problem: too many access points within the tuning range.

So it is tedious to have to reboot the laptop every time, and I will need to spend more time on it.
(In reply to comment #8)
> Bisecting won't hurt your hardware,

Why are you so sure?
http://www.phoronix.com/scan.php?page=news_item&px=Njc0Nw

This can happen every time...
But I'll do the bisect, if there are no more unexpected problems.
> Why are you so sure?
> http://www.phoronix.com/scan.php?page=news_item&px=Njc0Nw

Well, because there are no known ath5k bugs that brick the device. If there are any unknown ones, then you might as well hit them using a stable kernel :) Of course if you have e1000, that's another story.

> This can happen every time...
> But I'll do the bisect, if there are no more unexpected problems.

Actually I believe the issue has to do with large information elements in the scan results, combined with the fact that ath5k exports lots of channels so scans take a considerable time. This can interrupt normal function of the card. There are some changes in the pipeline to address some of this. Though I don't think either of those issues is a regression, so there may be something else.
1) The problem connecting to access points:
I believe bisecting is not useful, because the bug is not reliably reproducible (at friends' places I get a connection every time) and the behavior of the bug changes during the bisect. I get this result:

a40c24a13366e324bc0ff8c3bb107db89312c984 is first bad commit
commit a40c24a13366e324bc0ff8c3bb107db89312c984
Author: David S. Miller <davem@davemloft.net>
Date:   Thu Sep 11 04:51:14 2008 -0700

    net: Add SKB DMA mapping helper functions.

    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 2ab13c7cac689f67d97cb8f7ca42343713c53ca0 15a1e0f81f6e8f7eb7e6659a0f7b6b983eeda420 M include
:040000 040000 ff3568bfc0848c00927e97f7c6005a7857f9c0af c877f9af828cab1c62785ead7cf3571202ab27a7 M net
2) For the "iwlist bug" I have to do a second bisect. The bug has split (I get a connection to an access point, but iwlist wlan0 scan fails).

3) I'll test what happens if I revert the commit above in 2.6.28-rc1.
Created attachment 20304 [details]
log of the bisect for the "ap connection bug"

only the log
> > 3) I'll test what happen if I revert the commit above in 2.6.28-rc1

It is not possible to revert this commit in 2.6.28-rc1 (too many dependencies).

After testing this commit again (booting with this kernel), I could connect to an AP. As I said, it is not reproducible all the time.

But: I use wpa_supplicant and I have all networks disabled by default. It seems to me that the faster I enable the network with wpa_cli after a reboot, the more likely I am to connect to an AP.
(In reply to comment #15)
> 2) For the "iwlist bug" I have to do a second bisect. The bug split (I get a
> connection to an access point, but iwlist wlan0 scan fails)

I will wait for 2.6.29 now and test both problems then; maybe they are gone. I will do a new bisect if and only if the problems are still present then.
Both are still present with 2.6.29 (gentoo-sources).
Please post the dmesg of the attempt to associate with AP.

You can also try this patch in the meantime:

http://marc.info/?l=linux-wireless&m=123841474910111&w=2
(In reply to comment #20)
> Please post the dmesg of the attempt to associate with AP.

It shows nothing... (tested with gentoo-sources-2.6.29-r1, with CONFIG_MAC80211_DEBUG_MENU and CONFIG_MAC80211_VERBOSE_DEBUG turned on)

> You can also try this patch in the meantime:
>
> http://marc.info/?l=linux-wireless&m=123841474910111&w=2

I will do this next.
Oh, I forgot: only wpa_cli repeats "CTRL-EVENT-SCAN-RESULTS" regularly.

Happy Easter!
Today I installed Gentoo on an old desktop system. I have an external "Siemens Gigaset 54 Usb" adapter. I installed the 2.6.27-r8 and 2.6.29-r5 kernels (gentoo-sources).

And now some important new info: I thought this bug was a problem with ath5k, but I have the same bug with p54usb: everything is fine with 2.6.27, but with 2.6.29 wpa_cli does not connect to the AP (an AP with WPA2)!

My friend has a bcm4318 and uses the b43 module. He is not affected by this bug.

Do ath5k and p54usb share any dependencies or code that the b43 module does not have or use? Maybe we can narrow down the regression / patch now.

I have not tested this bug with 2.6.30 yet.
(In reply to comment #23)
> I don't test this bug with 2.6.30 yet.

Tested it on both systems (with gentoo-sources-2.6.30-r1). It seems to be FIXED.