Bug 16633

Summary: ath5k + 2.6.35 x86 + hostapd - Failed to set channel
Product: Networking Reporter: NiTr0 (nitr0)
Component: Wireless Assignee: networking_wireless (networking_wireless)
Status: CLOSED DOCUMENTED    
Severity: high CC: florian, linville, maciej.rutecki, mcgrof, rjw, trenn
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16055    
Attachments: Config of faulty 2.6.35 kernel

Description NiTr0 2010-08-19 19:37:31 UTC
I decided to upgrade my router distro kernel from 2.6.32 to 2.6.35 and noticed that Wi-Fi no longer works with it. hostapd reports:

nl80211: Failed to set channel (freq=2442): -518219808 (Unknown error 518219808)

On a Gentoo box (x86_64) everything works with the native Gentoo binaries. The same hostapd binary run in 32-bit mode with the same libraries (checked with ldd: it pulls the libraries from the build sandbox), the same card (DLink DWA-520) and the same hostapd config also works perfectly.

I tried recompiling the kernel using the config from the Gentoo box (I only disabled KVM & DRI support and, of course, changed the CPU to Pentium II): no effect. So the trouble seems to be somewhere in the kernel (it looks like stack corruption or a variable overflow). The channel is actually changed: iwconfig shows the new frequency.
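
One observation that fits the corruption guess: error codes handed back by the kernel are small negative numbers (the kernel reserves -1 through -4095 for them), and the value hostapd printed is far outside that range. A minimal sketch, assuming bash with 64-bit arithmetic; the helper names `to_u32_hex` and `is_plausible_errno` are made up for illustration and are not part of hostapd or the kernel:

```shell
#!/bin/bash
# Hypothetical helper: reinterpret a (possibly negative) integer as an
# unsigned 32-bit value and print it in hex.
to_u32_hex() {
    printf '0x%08x\n' $(( $1 & 0xFFFFFFFF ))
}

# Hypothetical helper: succeeds if $1 lies in the -4095..-1 range that the
# kernel uses for error returns.
is_plausible_errno() {
    [ "$1" -lt 0 ] && [ "$1" -ge -4095 ]
}

to_u32_hex -518219808
is_plausible_errno -518219808 || echo "not a plausible errno"
```

Reinterpreted as an unsigned 32-bit value, -518219808 is 0xe11c97e0. On 32-bit x86 with the usual 3G/1G split that would fall in the kernel address range, so it looks more like stray data than an error code. This is only a hint, not a diagnosis.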

Compiler: GCC 3.3.3; binutils: 2.16.1; C library: uClibc 0.9.30.3
Kernel config is attached. Updating the kernel to 2.6.35.2 changes nothing.
Comment 1 NiTr0 2010-08-19 19:38:51 UTC
Created attachment 27514 [details]
Config of faulty 2.6.35 kernel
Comment 2 John W. Linville 2010-08-20 14:12:37 UTC
Please attach the output of dmesg after getting the failure.
Comment 3 NiTr0 2010-08-20 21:36:12 UTC
There is nothing interesting in dmesg.
After modprobing ath5k:

[85534.274422] cfg80211: Calling CRDA to update world regulatory domain
[85534.301306] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 19
[85534.301367]   alloc irq_desc for 19 on node -1
[85534.301370]   alloc kstat_irqs on node -1
[85534.301382] ath5k 0000:02:08.0: PCI INT A -> Link[LNKC] -> GSI 19 (level, low) -> IRQ 19
[85534.301536] ath5k 0000:02:08.0: registered as 'phy0'
[85534.884169] ath: EEPROM regdomain: 0x30
[85534.884172] ath: EEPROM indicates we should expect a direct regpair map
[85534.884176] ath: Country alpha2 being used: AM
[85534.884178] ath: Regpair used: 0x30
[85534.897816] phy0: Selected rate control algorithm 'minstrel'
[85534.898649] ath5k phy0: Atheros AR2414 chip found (MAC: 0x79, PHY: 0x45)
[85534.898841] cfg80211: Calling CRDA for country: AM

After running hostapd:
[85536.905073] netconsole: network logging stopped, interface mon.wlan0 unregistered

That's all...
Do I need to enable debugging somewhere?
Comment 4 Rafael J. Wysocki 2010-08-20 23:03:09 UTC
If possible, please verify if 2.6.34 works on this hardware.
Comment 5 NiTr0 2010-08-21 14:47:15 UTC
I've rebuilt a 2.6.34.5 kernel and it works perfectly. I just copied the 2.6.35 config to 2.6.34.5, ran make oldconfig, and then rebuilt the distro with this kernel. On the router I replaced only the initrd (with base libraries like uClibc), the kernel and the modules. libnl & hostapd are from the build with the 2.6.35 kernel, so it looks very much like the trouble really is in the kernel.
Comment 6 John W. Linville 2010-08-23 13:54:51 UTC
Your dmesg output doesn't show any response from CRDA.  Do you have it installed?
Comment 7 NiTr0 2010-08-23 14:03:18 UTC
No, I don't have it. Does the 2.6.35 kernel strictly require CRDA + udev for wireless communication?
Also, there is no CRDA on my Gentoo x86_64 box, but Wi-Fi works fine on it even with 2.6.35.
Comment 8 John W. Linville 2010-08-23 14:37:32 UTC
There is no new requirement.  As it has been for some time, without something to provide regulatory rules to the kernel you will experience limited functionality regarding channel selection, transmit power, beaconing, etc.  Whether or not this affects your situation depends on what channels you are trying to use, etc.  Still, for the most part the lack of both CRDA and its alternative, the CONFIG_CFG80211_INTERNAL_REGDB config option, is a telltale sign of a misconfigured system and is usually worth investigating.
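
A quick way to see which of the two a given kernel build relies on is to look for the option in its .config. A minimal sketch; `kconfig_enabled` is a hypothetical helper name, and the default `.config` path is an assumption about where the kernel tree lives:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if option $2 is set to y or m in the
# kernel config file $1.
kconfig_enabled() {
    grep -Eq "^$2=(y|m)$" "$1"
}

# Example usage; prints nothing if the config file is not found.
cfg=${1:-.config}
if [ -f "$cfg" ]; then
    if kconfig_enabled "$cfg" CONFIG_CFG80211_INTERNAL_REGDB; then
        echo "internal regulatory database is built in"
    else
        echo "no internal regdb; CRDA must answer the kernel's requests"
    fi
fi
```

In a kernel .config, a disabled option shows up as a `# CONFIG_... is not set` comment line, which is why the helper matches only `=y` or `=m`.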

Are you selecting channel 7 (i.e. 2.442 GHz)?  Please attach your hostapd configuration.
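
For reference, the mapping between 2.4 GHz channel numbers and the frequency in the hostapd error message can be sketched as follows. `chan_to_freq` is a hypothetical helper name, not a hostapd or iw function:

```shell
#!/bin/bash
# Hypothetical helper: centre frequency in MHz for a 2.4 GHz channel.
# Channels 1-13 are spaced 5 MHz apart starting at 2412 MHz; channel 14
# (2484 MHz) does not follow that spacing and is handled separately.
chan_to_freq() {
    local ch=$1
    if [ "$ch" -ge 1 ] && [ "$ch" -le 13 ]; then
        echo $(( 2407 + 5 * ch ))
    elif [ "$ch" -eq 14 ]; then
        echo 2484
    else
        return 1
    fi
}

chan_to_freq 7   # prints 2442, matching freq=2442 in the report
```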

Could you do a git bisect from 2.6.34 to 2.6.35 in an attempt to narrow down the potential cause of this failure?
Comment 9 NiTr0 2010-08-24 09:48:50 UTC
With CONFIG_CFG80211_INTERNAL_REGDB disabled, everything works on the x86_64 Gentoo box and on older kernels without CRDA; with CONFIG_CFG80211_INTERNAL_REGDB enabled on the x86 kernel, it still doesn't work.

I used channel 7, but I get the same error even with channel 1.

I can try to bisect, but on the full tree it will take a lot of time. I'm also not sure where the trouble actually is (the driver or the wireless subsystem, i.e. cfg80211 etc.), so I can't limit the bisection to part of the kernel. It would be good to check whether the trouble is present with other wireless cards. Unfortunately, the only other one I have is an ath9k (in my Asus 1001p), which for an unknown reason doesn't want to run in AP mode (switching to AP mode fails with error 132).

Here is my config:

interface=wlan0
driver=nl80211
logger_syslog=-1
logger_syslog_level=2
logger_stdout=-1
logger_stdout_level=2
dump_file=/tmp/hostapd.dump
ctrl_interface=/var/run/hostapd
ctrl_interface_group=0
ssid=NiTr0
hw_mode=g
channel=7
beacon_int=100
dtim_period=2
max_num_sta=255
rts_threshold=2347
fragm_threshold=2346
supported_rates=10 20 55 110 60 90 120 180 240  360 480 #540 1080
preamble=1
macaddr_acl=0
auth_algs=3
ignore_broadcast_ssid=0
wme_enabled=1
wme_ac_bk_cwmin=4
wme_ac_bk_cwmax=10
wme_ac_bk_aifs=7
wme_ac_bk_txop_limit=0
wme_ac_bk_acm=0
wme_ac_be_aifs=3
wme_ac_be_cwmin=4
wme_ac_be_cwmax=10
wme_ac_be_txop_limit=0
wme_ac_be_acm=0
wme_ac_vi_aifs=2
wme_ac_vi_cwmin=3
wme_ac_vi_cwmax=4
wme_ac_vi_txop_limit=94
wme_ac_vi_acm=0
wme_ac_vo_aifs=2
wme_ac_vo_cwmin=2
wme_ac_vo_cwmax=3
wme_ac_vo_txop_limit=47
wme_ac_vo_acm=0
eapol_key_index_workaround=0
eap_server=0
wpa=3
wpa_passphrase=longpassphraze
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP CCMP

I can upload my distro somewhere (its full size is 24M) if somebody can check how it works with different wireless cards. Or I can try to ask somebody from this distro's community to help bisect the full tree and check everything on my hardware, because I don't have much free time right now.
Comment 10 John W. Linville 2010-08-24 13:09:10 UTC
FWIW, error 132 is ERFKILL.
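
ERFKILL means the operation was refused because an rfkill switch is blocking the radio. A minimal sketch for listing blocked switches through sysfs, assuming a kernel that exposes the `soft` and `hard` attributes under /sys/class/rfkill; `rfkill_blocked` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: list rfkill switches that currently block a radio.
# $1 is the sysfs directory to scan; it defaults to /sys/class/rfkill.
rfkill_blocked() {
    local root=${1:-/sys/class/rfkill} dev soft hard
    for dev in "$root"/rfkill*; do
        [ -d "$dev" ] || continue
        soft=$(cat "$dev/soft" 2>/dev/null)
        hard=$(cat "$dev/hard" 2>/dev/null)
        # A value of 1 means the switch is blocking the device
        if [ "$soft" = 1 ] || [ "$hard" = 1 ]; then
            echo "$(cat "$dev/name"): soft=$soft hard=$hard"
        fi
    done
}

rfkill_blocked
```

A soft block can be lifted from software, while a hard block usually reflects a physical switch or a firmware setting.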

If you want to limit your bisection, net/wireless net/mac80211 and drivers/net/wireless/ath seem like good candidates.
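
A path-limited bisection along those lines could be started as sketched below. The helper `bisect_start_cmd` is hypothetical and only prints the command; the sketch assumes a mainline checkout in which the release tags v2.6.35 and v2.6.34 exist:

```shell
#!/bin/bash
# Hypothetical helper: print the 'git bisect start' command for a bad and a
# good revision, optionally limited to the given paths. Nothing is executed.
bisect_start_cmd() {
    local bad=$1 good=$2
    shift 2
    if [ $# -gt 0 ]; then
        echo "git bisect start $bad $good -- $*"
    else
        echo "git bisect start $bad $good"
    fi
}

bisect_start_cmd v2.6.35 v2.6.34 net/wireless net/mac80211 drivers/net/wireless/ath
```

After starting, each step is built and tested, then marked with `git bisect good` or `git bisect bad`, and `git bisect reset` returns to the original checkout. A path-limited bisection only visits commits that touch the listed paths, so it cannot find a cause that lies elsewhere in the tree.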
Comment 11 NiTr0 2010-09-07 19:32:38 UTC
The same kernel with the same config, recompiled with GCC 4.4.3, works OK. It seems the trouble is GCC compatibility; possibly it's related to the many warnings during compilation with the older gcc (gcc-3.3.3 in my case) (bug 16506).

Unfortunately, I still don't have enough free time to bisect and find the broken commit; it looks like I need to bisect the full tree, not just the wireless-related commits. I asked the community to help with bisecting; possibly somebody will do it, or I will bisect later when I have free time.
Comment 12 Florian Mickler 2010-09-13 06:50:56 UTC
Well, if you have a build warning that may be related (the cpufeature.h one comes to mind), you can write yourself a script and use 'git bisect run' to have git do the bisection without user intervention.

It's a bit rougher in this case because you probably have to apply the build-fix from #16506 at most steps. 

To run the automated bisection use:

$ git bisect run ../runscript.sh

where ../runscript.sh could look like this (fingers crossed, I didn't actually test it; you need to copy your .config to backup_of_your_config in the directory above the linux source tree):

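#!/bin/bash
# Build fix from bug 16506; older commits need it cherry-picked before they build.
NEEDED_COMMIT=30246557a06bb20618bed906a06d1e1e0faa8bb4
# Commit that git bisect checked out; restored before the script exits.
CURRENT_HEAD=$(git rev-parse HEAD)

if ! git log HEAD --pretty=oneline | grep -q "$NEEDED_COMMIT"; then
     echo "$CURRENT_HEAD needs build fix applied..."
     git cherry-pick "$NEEDED_COMMIT" #let's hope this does not fail
fi

cp ../backup_of_your_config .config
make silentoldconfig
# gcc prints warnings on stderr, so capture both streams in the log
make -j4 > build_log.txt 2>&1
BUILD_RESULT=$?

grep -q "warning: asm operand 1 probably doesn't match constraints" \
      build_log.txt
TEST_RESULT=$?

# Drop the cherry-picked fix again so bisect sees the original commit
git reset --hard "$CURRENT_HEAD"

if [ $BUILD_RESULT -ne 0 ]; then
    exit 125 #skips this commit (build failure)
fi

if [ $TEST_RESULT -ne 0 ]; then
    exit 0 #good
else
    exit 1 #bad
fi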
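
The exit codes used in the script above follow the convention that `git bisect run` applies to the script's exit status. As a small illustration (the function name `bisect_verdict` is made up for this sketch):

```shell
#!/bin/bash
# Hypothetical helper: how 'git bisect run' interprets a script's exit status.
bisect_verdict() {
    local rc=$1
    if [ "$rc" -eq 0 ]; then
        echo good      # commit is marked good
    elif [ "$rc" -eq 125 ]; then
        echo skip      # commit cannot be tested and is skipped
    elif [ "$rc" -ge 1 ] && [ "$rc" -le 127 ]; then
        echo bad       # commit is marked bad
    else
        echo abort     # any other status aborts the bisection
    fi
}

for rc in 0 1 125 128; do
    echo "$rc -> $(bisect_verdict "$rc")"
done
```

This is why a build failure should exit with 125 rather than with make's own status, which would be read as "bad".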
Comment 13 Florian Mickler 2010-09-30 05:56:37 UTC
The minimum requirements for the build chain have been bumped up, see bug 16506.
I close this as 'Documented' then.
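
For anyone hitting the same problem, the installed compiler can be compared against a minimum before building. A minimal sketch, assuming GNU `sort -V` is available; the helper names are hypothetical, and the minimum version is passed in by the caller rather than assumed here (the kernel tree lists its requirements in Documentation/Changes):

```shell
#!/bin/bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to
# version $2 (relies on GNU 'sort -V' for version ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compare the installed gcc against the minimum in $1.
# Returns 2 if gcc cannot be run at all.
check_gcc() {
    local min=$1 have
    have=$(gcc -dumpversion 2>/dev/null) || return 2
    version_ge "$have" "$min"
}

# The two compiler versions mentioned in this report, compared with each other:
version_ge 4.4.3 3.3.3 && echo "4.4.3 is at least 3.3.3"
version_ge 3.3.3 4.4.3 || echo "3.3.3 is older than 4.4.3"
```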