This bug was initially reported at the Archlinux Bugtracker (http://bugs.archlinux.org/task/16413).

The Atheros ath9k module is not working with [testing]-kernel 2.6.31.1 on an Acer Extensa 7630EZ. Downgrading to 2.6.30.x solves this problem.

#################################################
#### Log with 2.6.30 Kernel (Wifi is working) ###
#################################################
[bernhard@wallaby ~]$ lsmod |grep ath
ath9k 356404 0
mac80211 211488 1 ath9k
cfg80211 78152 2 ath9k,mac80211
rfkill 13108 4 ath9k,acer_wmi
led_class 5112 2 ath9k,acer_wmi
[bernhard@wallaby ~]$ dmesg |grep ath
ath9k 0000:05:00.0: enabling device (0000 -> 0002)
ath9k 0000:05:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath9k 0000:05:00.0: setting latency timer to 64
phy0: Selected rate control algorithm 'ath9k_rate_control'
Registered led device: ath9k-phy0::radio
Registered led device: ath9k-phy0::assoc
Registered led device: ath9k-phy0::tx
Registered led device: ath9k-phy0::rx
#################################################

#####################################################
#### Log with 2.6.31 Kernel (Wifi is not working) ###
#####################################################
[bernhard@wallaby ~]$ dmesg |grep ath
ath9k 0000:05:00.0: enabling device (0000 -> 0002)
ath9k 0000:05:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath9k 0000:05:00.0: setting latency timer to 64
ath9k 0000:05:00.0: PCI INT A disabled
#####################################################

Steps to reproduce:
When I upgrade to a 2.6.31.x kernel, the wlan0 interface is detected and disabled instantly so I can't do something like iwlist wlan0 scan.

If any further information would be useful, I will happily provide it.
I forgot to mention that the loaded modules are the same when running 2.6.30.x and 2.6.31.x kernels.
I suspect this relates to rfkill...there were a couple of acer-wmi rfkill fixes in the 2.6.31, possibly not enough of them...
Hi John. Thx for commenting on this one. Is there anything I can try so that we can confirm this depends on acer-wmi?
git rev-list v2.6.30..v2.6.31 drivers/platform/x86/acer-wmi.c
ed5c8ef3bb2de277b7885072e0e981c41a022be5
a878417cc576720d3c9ff5399522d06f226bad7d
b3fa1329eaf2a7b97124dacf5b663fd51346ac19
19d337dff95cbf76edd3ad95c0cee2732c3e1ec5
621cac85297de5ba655e3430b007dd2e0da91da6

You could try reverting those patches? Looks like it won't be a clean set of reverts. Not sure what else to suggest as a quick test...?
I don't think acer-wmi is essential. If it is doing something wrong, you should be able to prevent it from loading echo "blacklist acer-wmi" >> /etc/modprobe.d/acer-wmi-blacklist.conf (or move the module file elsewhere, or disable it in your kernel config...)
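One small caveat with the `echo ... >>` form above is that running it twice appends the line twice. A minimal sketch of an idempotent variant follows; the function name `blacklist_module` is made up for illustration, and the file path is passed in so it can be tried on a scratch file first.

```shell
# Append "blacklist <module>" to a modprobe.d style file unless that exact
# line is already present. blacklist_module is a hypothetical helper name.
blacklist_module() {
    conf_file=$1
    module=$2
    line="blacklist $module"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF "$line" "$conf_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$conf_file"
    fi
}
```

As root, `blacklist_module /etc/modprobe.d/acer-wmi-blacklist.conf acer-wmi` would then have the same effect as the one-liner, however often it is run.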
I'm having a similar problem with asus_laptop. At first, I could connect for ~20min, and then it would be killed (still probably rfkill related). When unable to connect, I get "SIOCSIFFLAGS: Unknown error 132"
I don't think it is related to acer-wmi though. The two fixes in 2.6.31 should cover everything. I find it very suspicious that the led devices disappear. They should really not be affected by rfkill, and as far as I can see this is confirmed by the code. I don't know anything about ath9k or what has been changed recently. But perhaps you should try CONFIG_ATH9K_DEBUG. Check the help text, you will need to set a module parameter to get any output.
(In reply to comment #6) > I'm having a similar problem with asus_laptop. At first, I could connect for > ~20min, and then it would be killed (still probably rfkill related). > > When unable to connect, I get "SIOCSIFFLAGS: Unknown error 132" Yes, that is the new RFKILL error code. However asus-laptop does not expose any rfkill device, so your problem is more likely to lie with the wireless driver. (asus-laptop does appear to export a custom wireless toggle interface, but I don't see any recent changes there). Please confirm which kernel versions you are talking about (last known good version, first known bad version).
Adam: I think your bug is something different. In my case, the device is ALWAYS immediately disabled again. so no chance to even connect.
Might be worth probing the rfkill subsystem using the rfkill utility: http://git.sipsolutions.net/rfkill.git Re: Unknown error 132 (include/asm-generic/errno.h) #define ERFKILL 132 /* Operation not possible due to RF-kill */ Eeek...ath9k, not ath5k -- sorry Bob!
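To make the mapping between the error text and the errno definition explicit, here is a small sketch. It only looks at the message string; `explain_errno` is a hypothetical helper, and the value 132 is taken from the header line quoted above.

```shell
# Pull the numeric errno out of a message such as
# "SIOCSIFFLAGS: Unknown error 132" and name it if it is ERFKILL.
ERFKILL=132   # from include/asm-generic/errno.h, as quoted above

explain_errno() {
    msg=$1
    # Print only the digits following "Unknown error", if present.
    num=$(printf '%s\n' "$msg" | sed -n 's/.*Unknown error \([0-9][0-9]*\).*/\1/p')
    if [ "$num" = "$ERFKILL" ]; then
        echo "ERFKILL: operation not possible due to RF-kill"
    elif [ -n "$num" ]; then
        echo "errno $num"
    else
        echo "no errno found"
    fi
}
```

Older C libraries print "Unknown error 132" simply because they have no message string for the new code yet.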
(In reply to comment #5) > I don't think acer-wmi is essential. If it is doing something wrong, you > should be able to prevent it from loading > > echo "blacklist acer-wmi" >> /etc/modprobe.d/acer-wmi-blacklist.conf > > (or move the module file elsewhere, or disable it in your kernel config...) I just blacklisted acer-wmi and booted 2.6.31.4 The result is exactly the same as in #1. The device is detected and immediately disabled.
2.6.31.4 from Arch Linux here. I mentioned asus-laptop since I thought it was similar to acer-wmi. Some people in the arch forums were saying that the problem seemed to change based on if bluetooth was turned on or off. If I have bluetooth off, I receive the error every time with wireless (same as Bernhard). asus-laptop does expose custom interface toggles which are the ones I use when enabling and disabling radios.
for what i know, my laptop (extensa 7630ez) does not even have a bluetooth device which i could disable....
(In reply to comment #12) > 2.6.31.4 from Arch Linux here. > > I mentioned asus-laptop since I thought it was similar to acer-wmi. > > Some people in the arch forums were saying that the problem seemed to change > based on if bluetooth was turned on or off. If I have bluetooth off, I > receive > the error every time with wireless (same as Bernhard). > > asus-laptop does expose custom interface toggles which are the ones I use > when > enabling and disabling radios. Right. That last sounds very much like a bug in asus-laptop, please do report it separately. Since it's not a core ACPI driver, you're more likely to get a response if you email the maintainer (To: Corentin Chary <corentincj@iksaif.net>, CC: acpi4asus-user@lists.sourceforge.net). The "20 minute" problem is a bit of a mystery. I would tend to blame it on ath9k, but I suppose it could be due to some core ACPI change. In any case, it would be much appreciated if you could open a new bug describing the 20-minute problem instead of commenting here. It can always be merged back if it is found to be the same problem.
Yep, will do once I get home and have access to my laptop.
Comment 12 suggests a link to bluetooth coexistence?
This might be rfkill related, and is similar to bug report: http://bugzilla.kernel.org/show_bug.cgi?id=13581 A copy and paste from my comment there, perhaps we should merge the two bug reports? Note that rfkill was completely rewritten for the 2.6.31 kernel. Please try out the new rfkill userspace application to see if you can query the rfkill status: http://wireless.kernel.org/en/users/Documentation/rfkill I think there is support for a command: rfkill unblock all
(In reply to comment #14) > (In reply to comment #12) > Right. That last sounds very much like a bug in asus-laptop, please do > report > it separately. Since it's not a core ACPI driver, you're more likely to get > a > response if you email the maintainer (To: Corentin Chary > <corentincj@iksaif.net>, CC: acpi4asus-user@lists.sourceforge.net). Please disregard. Since the disabling bluetooth also causes this problem on an EeePC (and therefore with eeepc-laptop), it's pretty unlikely that asus-laptop is the problem.
Ahhh, sorry I always get that mixed up when I'm not at my machine. Yes, it's an EeePC.
i just installed the rfkill usertool (git version) and realized, that when i boot 2.6.31.x wlan was softblocked. using rfkill unblock all i double checked, that the device was neither soft- nor hardblocked. rmmod ath9k and modprobe ath9k led to the exactly same error message as in msg #1 (according to dmesg). the device is detected and disabled again. no chance to bring it up using ifconfig wlan0 up or something like this :(
Odd -- I know nothing of rfkill, but can you try building with rfkill disabled? Posting your kernel messages and userspace output of rfkill might help here. From what I understand from what you describe you did have your device rfkill'd and then you tried to unblock it using the rfkill userspace app but still were unable to bring the interface up, even after an rmmod/modprobe. Does your laptop have rfkill buttons? Can you try to lift the blocking state without using the rfkill userspace app? You can use the userspace app to query the state. How about the BIOS?
(In reply to comment #21) > Odd -- I know nothing of rfkill, but can you try building with rfkill > disabled? > > Posting your kernel messages and userspace output of rfkill might help here. > From what I understand from what you describe you did have your device > rfkill'd > and then you tried to unblock it using the rfkill userspace app but still > were > unable to bring the interface up, even after an rmmod/modprobe. > > Does your laptop have rfkill buttons? Can you try to lift the blocking state > without using the rfkill userspace app? You can use the userspace app to > query > the state. > > How about the BIOS? 1) i do not know anything about rfkill too, unfortunately. 2) yes, exactly. I booted 2.6.31, checked the rfkill state and realized that the wlan device was softblocked. I unblocked it using the rfkill tool and checked that afterwards it was not softblocked and not hardblocked. After that step I tried to get the device up by rmmod and modprobe and got exactly the same kernel-message as I have posted in #1 3) An rfkill button is to disable/enable wlan, right? yes, i do have such a button, but clicking it doesn't change anything when booting 2.6.31. I checked that with rfkill tool. (rfkill list).
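For anyone repeating the soft/hard block check described above, the output of `rfkill list` can be condensed to one line per device. This is only a sketch: no `rfkill list` output was posted in this thread, so the input format shown in the comment is an assumption, and `rfkill_summary` is a made-up name.

```shell
# Read `rfkill list` output on stdin and print one line per device:
#   <name> soft=<yes|no> hard=<yes|no>
# Assumed input format (not shown in this thread):
#   0: phy0: Wireless LAN
#           Soft blocked: yes
#           Hard blocked: no
rfkill_summary() {
    awk '
        /^[0-9]+:/      { name = $2; sub(/:$/, "", name) }
        /Soft blocked:/ { soft = $NF }
        /Hard blocked:/ { print name " soft=" soft " hard=" $NF }
    '
}
```

Typical use would be `rfkill list | rfkill_summary` before and after `rfkill unblock all`, to confirm that both states read "no".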
Can you try 'rfkill unblock all' after bootup and then try to bring the interface up -- that is, do not rmmod ath9k and modprobe it afterwards.
Hi Luis. Adding to #22: Suddenly the rfkill-button does work in a sense that it changes the softblock-state of wlan according to the output of rfkill list. But: Installing 2.6.31, rfkill unblock all and ifconfig wlan0 up tells me that wlan0 does not exist (since it was disabled when booting). strange... especially since no one else is experiencing this as it seems...
Can you provide output from dmesg when you try all these things?
i did a tail -f /var/log/dmesg but when i ran all these commands, nothing (absolutely nothing) was added to dmesg :(
OK then I'm afraid I cannot be of any more help at this point, as Alan pointed out you may want to communicate this to the platform driver maintainer for your laptop.
(In reply to comment #27) > OK then I'm afraid I cannot be of any more help at this point, as Alan > pointed > out you may want to communicate this to the platform driver maintainer for > your > laptop. Ok, thx anyway a lot for trying to help. One question, though. What do you mean by "platform driver maintainer"? Some guy at Acer? Packager at Arch?
Bernhard: The "platform driver" means acer-wmi in your case (and the contact addresses can be found in the MAINTAINERS file). But the evidence in _your_ case should rule it out, because the same problem happened when you blacklisted acer-wmi. I think it's just been too confusing in here with both you and Adam + John, me and Luis :-/.

The key is this:

> #####################################################
> #### Log with 2.6.31 Kernel (Wifi is not working) ###
> #####################################################
> [bernhard@wallaby ~]$ dmesg |grep ath
> ath9k 0000:05:00.0: enabling device (0000 -> 0002)
> ath9k 0000:05:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> ath9k 0000:05:00.0: setting latency timer to 64

[working kernels print led messages here, but the non-working kernel does this:]

> ath9k 0000:05:00.0: PCI INT A disabled
> #####################################################

It strongly suggests that the ath9k driver failed to initialize the device. It didn't get as far as creating the LEDs before it gave up and disabled the PCI device again.

[Platform rfkill could potentially do evil things to cause that, but we've ruled it out. And this is _not_ the expected behaviour for ath9k rfkill. If ath9k sees that the device is rfkilled, it should not fail to create the wlan0 device. It should create wlan0, but generate the new RFKILL error code (132) on "ifconfig wlan0 up".]

As I say, I don't know anything about ath9k or its recent development. All _I_ can suggest is that you check out CONFIG_ATH9K_DEBUG and the associated module option. Hopefully that will narrow down where the failure occurs in the initialisation sequence.
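The reasoning above (LED lines present means the probe got through, "PCI INT A disabled" without them means it gave up) can be written down as a quick check on `dmesg | grep ath` output. This is a sketch that matches only the strings quoted in this report; `ath9k_probe_state` is a hypothetical name.

```shell
# Classify `dmesg | grep ath` output read from stdin.
# Prints "initialized" if LED registration lines are present,
# "probe-failed" if the PCI interrupt was disabled without them,
# and "unknown" otherwise.
ath9k_probe_state() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'Registered led device: ath9k'; then
        echo initialized
    elif printf '%s\n' "$log" | grep -q 'PCI INT A disabled'; then
        echo probe-failed
    else
        echo unknown
    fi
}
```

Usage would be `dmesg | grep ath | ath9k_probe_state`. It says nothing about why the probe failed, only how far it got.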
Thanks Alan -- I missed the PCI INT A disabled line.

Odd, given that ath_pci_probe() prints any error messages using KERN_ERR, so you should see them. CONFIG_ATH9K_DEBUG will only help if you at least get past device initialization; in this case that point is not even reached, so it won't help. Although we do use KERN_ERR for errors on probe, to be safe set your default kernel print log level to print everything:

dmesg -n 8

If you don't get much out of that, you could add printk's to the PCI probe to see at least how far it got.

Another idea is to try the latest ath9k, either directly from wireless-testing [1], which requires a whole kernel compile, or from compat-wireless [2], which just requires the wireless modules to be compiled. compat-wireless is based on wireless-testing so it's bleeding edge. If you want to try what is in 2.6.32 you could try the compat-wireless stable release as well [3].

[1] http://wireless.kernel.org/en/developers/Documentation#wireless-testing.git
[2] http://wireless.kernel.org/en/users/Download
[3] http://wireless.kernel.org/en/users/Download/stable
Unfortunately there's one failure that doesn't have a printk in v2.6.31.1. I think it's the only one, so at least it narrows it down :-).

        if (ath_attach(id->device, sc) != 0) {
                ret = -ENODEV;
                goto bad3;
        }
And there are a fair number of DPRINTF's under ath_attach. I assume they will show up in dmesg if one enables CONFIG_ATH9K_DEBUG & do "modprobe -r ath9k; modprobe ath9k debug=0xffffffff".
Heh good catch -- I was looking at wireless-testing code, and there ath_pci_probe() always prints something out upon failure. Only in recent kernels do we propagate the exact error during hardware initialization, so it would be good for Bernhard to try a later version to see where the hardware initialization fails (assuming that is where this hits). Using wireless-testing or compat-wireless would be good. In bleeding edge you should actually no longer need to enable CONFIG_ATH9K_DEBUG to get debug prints, but it's still a good idea. Either way, as Alan suggests, please make sure to use:

modprobe ath9k debug=0xffffffff

On compat-wireless you can enable CONFIG_ATH9K_DEBUG by setting it in config.mk; it's already there, just remove the # which currently makes it a comment.
Hi everybody. I'll try to compile compat-testing and will report back if I succeed.
I did not manage to compile and run compat-testing nor compiling a custom kernel with CONFIG_ATH9K_DEBUG enabled. However, perhaps this is helpful. I realized that if I boot any 2.6.31 kernel with acpi=off, everything works fine. wlan0 device is created and I can connect just fine.
Please report back -- I'd like to see the output of loading ath9k with debugging completely enabled on the latest ath9k driver. What issues are you having?
Hi Luis. First off, the situation is still the same with any 2.6.31.x kernel as described in #1. Unfortunately, I don't have any clue how to compile a kernel with debugging enabled. I am sorry, but I have not yet compiled any kernel by myself and I am pretty helpless in this case. What I still experience (also with 2.6.31.5) is that the device is enabled and working if I boot with acpi=off (if this is of any help). The debug messages in this case are the same as for the (working) 2.6.30.x kernels.
You can easily enable ath9k debugging if you just use compat-wireless and edit config.mk to enable CONFIG_ATH9K_DEBUG.

http://wireless.kernel.org/en/users/Download

That is, look for ATH9K in config.mk and make sure this is set:

CONFIG_ATH9K=m
CONFIG_ATH9K_DEBUG=y

Then run:

make
sudo make install
reboot
Oh you also need CONFIG_ATH_DEBUG enabled
Hi Luis. I compiled compat-wireless (the latest) and managed to get the following output: 1) relevant parts of dmesg after make unload && modprobe ath9k http://pastebin.com/f2b5d59f7 2) after rebooting the laptop with the latest drivers http://pastebin.com/f26cfc51b I hope this is of any help. Bernhard
What changes did you make to config.mk ? Can you paste the diff here.
I ran ./scripts/driver-select ath9k and then just uncommented CONFIG_ATH9K_DEBUG=y in config.mk. That's the only change I made. However, I just now realized that I should have uncommented CONFIG_ATH_DEBUG. I will try that now and report back.
Yeah you also need CONFIG_ATH_DEBUG.
OK, I seem to have another problem. I did the following:

1) Installed kernel 2.6.31.5 from arch-repositories and rebooted (wlan device not detected as in #1)
2) downloaded and untarred bleeding edge compat-wireless
3) ran ./scripts/driver-select ath9k
4) make
5) changed config.mk (diff to original: http://pastebin.com/f52bb030)
6) make install -> here something strange happened: I noticed two warning messages about ath_print not being defined (in ath9k_hw.ko and ath9k.ko), but the install went through
7) make unload
8) modprobe ath9k -> dmesg-output: http://pastebin.com/f61a00fab
9) rebooted -> dmesg-output after reboot: http://pastebin.com/f218d1390

So it seems that due to enabling CONFIG_ATH_DEBUG all wlan-specific output is gone from dmesg....
4) make
5) changed config.mk (diff to original: http://pastebin.com/f52bb030)
6) make install

This order is incorrect. You should do it in this order:

4) changed config.mk (diff to original: http://pastebin.com/f52bb030)
5) make
6) make install
sorry, I made an error copy and pasting. Of course I edited config.mk before make && make install. sorry for the confusion.
So ath_print comes from ath.ko -- can you run modprobe -l ath What do you get? We need to ensure you are building ath.ko on compat-wireless, you can ensure this by having: CONFIG_ATH_COMMON=m CONFIG_ATH_DEBUG=y
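Since the debug options are already present in config.mk as comments, uncommenting them can be scripted so that none is forgotten. The following is a sketch under the assumption that the commented lines look like `# CONFIG_ATH_DEBUG=y`; `enable_option` is a hypothetical helper, not part of compat-wireless.

```shell
# Uncomment OPTION=... in a config.mk style file.
# Returns 0 if the option is active afterwards, 1 if it was not found at all.
enable_option() {
    file=$1
    option=$2
    # Already active: nothing to do.
    if grep -q "^[[:space:]]*$option=" "$file"; then
        return 0
    fi
    # Not present even as a comment: report failure instead of guessing.
    if ! grep -q "^[[:space:]]*#[[:space:]]*$option=" "$file"; then
        echo "$option not found in $file" >&2
        return 1
    fi
    # Strip the leading "#" (and any blanks after it), keep the indentation.
    sed "s/^\([[:space:]]*\)#[[:space:]]*\($option=\)/\1\2/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

For example, `enable_option config.mk CONFIG_ATH9K_DEBUG && enable_option config.mk CONFIG_ATH_DEBUG`, run before `make`, covers both debug options mentioned in this thread.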
Hi, I have now enabled everything suggested here:

CONFIG_ATH9K=m
CONFIG_ATH9K_DEBUG=y
CONFIG_ATH_COMMON=m
CONFIG_ATH_DEBUG=y

While doing make (building modules, stage 2), the following warnings occurred:

WARNING: "ath_print" [/home/bernhard/compat/compat-wireless-2009-10-27/drivers/net/wireless/ath/ath9k/ath9k_hw.ko] undefined!
WARNING: "ath_print" [/home/bernhard/compat/compat-wireless-2009-10-27/drivers/net/wireless/ath/ath9k/ath9k.ko] undefined!

After make install I unloaded the modules

[bernhard@wallaby compat-wireless-2009-10-27]$ sudo make unload
Unloading ath...
Unloading mac80211...

which seemed to work fine. However, modprobing ath9k resulted in:

[bernhard@wallaby compat-wireless-2009-10-27]$ sudo modprobe ath9k
FATAL: Error inserting ath9k (/lib/modules/2.6.31-ARCH/updates/drivers/net/wireless/ath/ath9k/ath9k.ko): Unknown symbol in module, or unknown parameter (see dmesg)

The corresponding dmesg-output is:

cfg80211: Calling CRDA to update world regulatory domain
ath9k_hw: Unknown symbol ath_print

To make sure ath is built on compat-wireless:

[bernhard@wallaby compat-wireless-2009-10-27]$ sudo modprobe -l ath
updates/drivers/net/wireless/ath/ath.ko

After reboot I find (only) the following line in dmesg relating to ath:

[bernhard@wallaby ~]$ dmesg |grep ath
ath9k_hw: Unknown symbol ath_print

The device is still not created.

Thx for your interest in this bug and guiding me through all this stuff btw!

Bernhard
This is just odd. This is from ath/debug.h:

#ifdef CONFIG_ATH_DEBUG
void ath_print(struct ath_common *common, int dbg_mask, const char *fmt, ...);
#else
static inline void ath_print(struct ath_common *common, int dbg_mask,
                             const char *fmt, ...)
{
}
#endif /* CONFIG_ATH_DEBUG */

Then the ath/Makefile has:

ath-$(CONFIG_ATH_DEBUG) += debug.o

and ath/debug.c has ath_print implemented and exported:

void ath_print(struct ath_common *common, int dbg_mask, const char *fmt, ...)
{
        va_list args;

        if (likely(!(common->debug_mask & dbg_mask)))
                return;

        va_start(args, fmt);
        printk(KERN_DEBUG "ath: ");
        vprintk(fmt, args);
        va_end(args);
}
EXPORT_SYMBOL(ath_print);

I suspect you might be doing something wrong but I cannot tell what. Do you also have CONFIG_ATH9K_HW=m ?
Hi Luis. This is the config.mk file I have used http://pastebin.com/f441d160e I downloaded compat-wireless from here http://wireless.kernel.org/download/compat-wireless-2.6/compat-wireless-2.6.tar.bz2
Well ok, but you ran ./scripts/driver-select ath9k

I am thinking I never tested this with debug, so it's possible that the ath/Makefile gets fuckered up for debug. Do me a favor and download that tarball again, and do not run driver-select; instead just make sure you uncomment the config options for debugging as discussed. Then compile everything (make) and install.
Ok, I did as you suggested, getting the following dmesg-output after unloading and re-modprobing ath9k:

-------------------
ath9k: Driver unloaded
cfg80211: Calling CRDA to update world regulatory domain
ath9k 0000:05:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ath9k 0000:05:00.0: setting latency timer to 64
ath: timeout (100000 us) on reg 0x7044: 0xdccebfdb & 0x0000000f != 0x00000002
ath: Couldn't reset chip
ath: Unable to initialize hardware; initialization status: -5
ath9k 0000:05:00.0: failed to initialize device
ath9k 0000:05:00.0: PCI INT A disabled
ath9k: probe of 0000:05:00.0 failed with error -5
And this used to work on 2.6.30, so a regression must obviously have been introduced. Can you do a git bisect between the two kernel releases? It'll take a while but it would help. At this point I don't see anything obvious, and since you mentioned that booting with acpi=off makes the device work, this might not even be related to the driver but to something that changed in ACPI. I think a bisect between 2.6.30 and 2.6.31 would therefore be very useful and valuable.
Adding Len Brown -- perhaps he might be aware of some ACPI regression on 2.6.31 pending resolution.
What's your lspci output for the Atheros device?
Adding some notes on the ath9k front -- this seems to fail at the very first register write to the hardware. The first write to the hardware happens when we try to initialize the hardware through ath9k_hw_init(). The first call to access the hardware is done in that routine as follows:

        if (!ath9k_hw_set_reset_reg(ah, ATH9K_RESET_POWER_ON)) {
                ath_print(common, ATH_DBG_FATAL, "Couldn't reset chip\n");
                return -EIO;
        }

This is small enough so I'll just paste the code here:

static bool ath9k_hw_set_reset_reg(struct ath_hw *ah, u32 type)
{
        REG_WRITE(ah, AR_RTC_FORCE_WAKE,
                  AR_RTC_FORCE_WAKE_EN | AR_RTC_FORCE_WAKE_ON_INT);

        switch (type) {
        case ATH9K_RESET_POWER_ON:
                return ath9k_hw_set_reset_power_on(ah);
        case ATH9K_RESET_WARM:
        case ATH9K_RESET_COLD:
                return ath9k_hw_set_reset(ah, type);
        default:
                return false;
        }
}

static bool ath9k_hw_set_reset_power_on(struct ath_hw *ah)
{
        REG_WRITE(ah, AR_RTC_FORCE_WAKE,
                  AR_RTC_FORCE_WAKE_EN | AR_RTC_FORCE_WAKE_ON_INT);

        if (!AR_SREV_9100(ah))
                REG_WRITE(ah, AR_RC, AR_RC_AHB);

        REG_WRITE(ah, AR_RTC_RESET, 0);
        udelay(2);

        if (!AR_SREV_9100(ah))
                REG_WRITE(ah, AR_RC, 0);

        REG_WRITE(ah, AR_RTC_RESET, 1);

        if (!ath9k_hw_wait(ah, AR_RTC_STATUS, AR_RTC_STATUS_M,
                           AR_RTC_STATUS_ON, AH_WAIT_TIMEOUT)) {
                ath_print(ath9k_hw_common(ah), ATH_DBG_RESET,
                          "RTC not waking up\n");
                return false;
        }

        ath9k_hw_read_revisions(ah);

        return ath9k_hw_set_reset(ah, ATH9K_RESET_WARM);
}

So the issue comes from the RTC_STATUS not coming back happy. I do see one thing worth trying... I'll post a patch for that.
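The timeout line reported earlier in this thread has the form "VALUE & MASK != EXPECTED", which is the comparison that the wait loop keeps retrying until it times out. A minimal sketch that redoes the arithmetic from such a line follows; `check_reg` is a hypothetical helper, and it only recomputes the numbers, it does not touch hardware.

```shell
# Reproduce the comparison from a log line of the form
#   timeout (...) on reg R: VALUE & MASK != EXPECTED
# Prints the masked value in hex; returns 0 only if it equals EXPECTED.
check_reg() {
    # $1 = value read from the register, $2 = mask, $3 = expected value
    masked=$(( $1 & $2 ))
    printf '0x%x\n' "$masked"
    [ "$masked" -eq $(( $3 )) ]
}
```

With the values from the log (`check_reg 0xdccebfdb 0x0000000f 0x00000002`) this prints 0xb and returns non-zero, i.e. the low four status bits read back as 0xb where 0x2 was expected.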
Created attachment 23551 [details] remove duplicate rtc reset This removes the duplicate rtc reset setting. Please give this a try on either compat-wireless or on wireless-testing.
(In reply to comment #55) > What's your lspci output for the Atheros device? [bernhard@wallaby ~]$ lspci |grep Network 05:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
(In reply to comment #57) > Created an attachment (id=23551) [details] > remove duplicate rtc reset > > This removes the duplicate rtc reset setting. Please give this a try on > either > compat-wireless or on wireless-testing. I just tested with your patch applied and getting exactly the same output as before in #52. The device is still not initialized unfortunately.
Then yeah, I cannot see what could be wrong with the driver code at this point, as that is a very basic initial write of settings to the hardware. Something is preventing the write to the card from being issued correctly. I'd recommend a bisect.
Hm, from what I have now read about git-bisect: do I understand correctly that after each step the kernel (with the source depending on the bisect step) needs to be compiled?
(In reply to comment #61) > Hm, from what I read now about git-bisect. Do I understand it correctly that > after each step the kernel (with the source depending on the bitsect-step) > needs to be compiled? Compiled & tested during each step, and tell git-bisect whether you're seeing the bad behaviour or not so it can decide what the next step should be.
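To get a feel for the good/bad loop before spending hours on kernel builds, the mechanics can be tried on a throwaway repository. The sketch below is a toy: the repository, the `state` file and the function name `demo_bisect` are all made up, and `git bisect run` stands in for the manual "compile, boot, then `git bisect good` or `git bisect bad`" step that a kernel bisect needs.

```shell
# Build a six-commit toy repository in which commit 4 introduces a
# "regression", bisect it automatically, and print the subject of the
# first bad commit. Runs in a subshell so cd and trap stay local.
demo_bisect() (
    set -e
    repo=$(mktemp -d)
    trap 'rm -rf "$repo"' EXIT
    cd "$repo"
    git init -q
    for i in 1 2 3 4 5 6; do
        if [ "$i" -ge 4 ]; then echo fails > state; else echo works > state; fi
        echo "$i" > counter
        git add state counter
        git -c user.name=demo -c user.email=demo@example.invalid \
            commit -q -m "commit $i"
    done
    good=$(git rev-list --max-parents=0 HEAD)   # the root commit is known good
    git bisect start HEAD "$good" >/dev/null    # arguments: <bad> <good>
    # Exit status 0 marks a step good, 1 marks it bad.
    git bisect run sh -c 'grep -q "^works$" state' >/dev/null
    git log -1 --format=%s refs/bisect/bad
)
```

For the real case the start would be `git bisect start v2.6.31 v2.6.30`, followed by one build, boot and verdict per step, and `git bisect reset` at the end.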
Just for info: On my way to bisect the kernel to find the problematic revision, I tried to find a minimal kernel config and somehow managed to compile my first kernel, 2.6.32-rc5. Surprisingly, the wireless device now works out of the box again. Is it safe to assume that - since I have already tested the device with compat-wireless under 2.6.31, which didn't make the device work - another part of the kernel introduced the regression? If so, how should I proceed? Still try to bisect, or just report this bug elsewhere?
Ok, finally I managed to do the bisect, sorry for the delay bisect-log output: http://pastebin.com/f53977eda first bad commit: http://pastebin.com/f376cdb50 Bernhard
Whoa, interesting, can you now just try to use wireless-testing and revert that single patch and see if indeed reverting it fixes your issue?
Well, you don't need to use wireless-testing; whatever git tree you use, just revert that one commit sha1 and see if that indeed fixes the issue. And please confirm that with it applied you do get the issue you describe in this bug report (ath9k probe fails due to the RTC reset failing, which is the first thing we do for ath9k).
I will for sure do this if anyone can tell me how to revert a single patch....
Sure, in your case it's:

git revert 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90

But please be very sure you are done with the bisect:

git bisect reset
It seems that I am too stupid to revert the patch. I checked out the latest 2.6.31.x kernel, ran git revert 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90 and ended up with:

[bernhard@wallaby linux-2.6.31.y]$ git revert 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90
warning: too many files (created: 1571 deleted: 336), skipping inexact rename detection
Automatic revert failed. After resolving the conflicts, mark the corrected paths with 'git add <paths>' or 'git rm <paths>' and commit the result.

Now I have absolutely no clue how to resolve the conflict and provide what you asked for....
git reset --hard origin

Then compile and confirm your issue is present. Then:

git revert 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90

Compile and verify your issue is gone.
Compiling 2.6.31.5 (after git reset) results in the device not being detected. However, I am unable to build the kernel after the revert; make fails with the following error (error messages translated from German)...

  CC      arch/x86/kernel/e820.o
arch/x86/kernel/e820.c:1368: error: expected identifier or '(' before '<<' token
arch/x86/kernel/e820.c:1388: error: expected identifier or '(' before '==' token
arch/x86/kernel/e820.c:1389:9: error: invalid suffix 'fbe3e...' on integer constant
arch/x86/kernel/e820.c:1423:9: error: invalid suffix 'fbe3e...' on integer constant
make[2]: *** [arch/x86/kernel/e820.o] Error 1
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2

I think the problem is that I cannot properly revert the patch, given the error message in #69.
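The compiler errors above (a stray `<<` token, a stray `==` token, and an invalid suffix `fbe3e...` on an integer constant) are consistent with git conflict markers left behind in e820.c by the failed revert. A quick way to check a file for such markers is sketched below; `find_conflict_markers` is a hypothetical helper name.

```shell
# List lines in a file that look like leftover git conflict markers
# ("<<<<<<<", "=======", ">>>>>>>" at the start of a line).
# Prints matching lines with line numbers; returns 0 if any were found.
find_conflict_markers() {
    grep -n -E '^(<<<<<<<|=======|>>>>>>>)( |$)' "$1"
}
```

Running `find_conflict_markers arch/x86/kernel/e820.c` in the tree would show whether the reported line numbers (1368, 1388, ...) are marker lines. If they are, the file still holds both sides of the conflict and needs manual resolution, or a `git reset --hard` to start over.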
That patch is very small, it should not cause so many issues. Most likely you never git bisect reset properly so a revert during a rebase (bisect) could have confused git. Let me try to see if I run into the same compile issue as you do with a revert of the same patch.
Created attachment 23638 [details] 0001-Revert-x86-e820-pci-reserve-extra-free-space-nea.patch Try applying this instead of the doing the 'git revert...' yourself...
Don't forget to git reset --hard origin before applying
OK, I see the revert issue you were describing. The problem is that there was a patch after that one that touched the same code area, so reverting just that commit sha1 is a bit trickier. Trying to see if I can resolve the conflicts manually.
Oh wait, John you resolved that didn't you.
Indeed, sorry for the confusion, please just: git reset --hard origin and apply John's patch. If that does fix your issue then please try to apply it then to ensure you do get the issue *with* the patch applied.
Bernhard, sorry for the trouble but it seems I gave you instructions on something that will probably have also not gotten you a pristine 2.6.31.5. When you run: git reset --hard origin it will checkout a fresh tree from origin/master which if you're using Linus' tree today is based on 2.6.32 and not 2.6.31. So to get your 2.6.31 please trash your older branch and create a fresh one based on 2.6.31.5. What tree are you using? To be sure you can check the top level Makefile to ensure you are on 2.6.31.5 Sorry for any confusion.
First off, I checked out the latest 2.6.31.y tree, built the kernel and faced the problem described in #1. Then I applied the patch (thx John), built the kernel again, and the problem was not solved (still no wlan interface).

BUT: Something came to my mind. Whenever I face the problem that the wlan device is disabled, I get a warning about a failed phy (??) from my ethernet device (tg3), which according to lspci is a

09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5764M Gigabit Ethernet PCIe (rev 10)

With any working kernel (2.6.30.x, 2.6.32-rc5 for example) I don't get this warning. Perhaps this is the reason....
Well, you can compile your kernel without that driver to verify your theory, or maybe disable it in the BIOS.

Anyway -- you did a git bisect and managed to find a culprit commit, but it did not seem to lead to a proper culprit patch. I noticed you did a git bisect skip in your history; do you recall doing this or was this automatically done somehow? Let's try to take advantage of your git bisect history though.

Your latest good sha1 is 134cbf35c739bf89c51fd975a33a6b87507482c4, which is right after 2.6.30-rc5. Can you double check that this is a good commit with:

git checkout -b x86-mm-patch-01 134cbf35c739bf89c51fd975a33a6b87507482c4

and compiling and loading that kernel. If that is good then it's probably the end limit of what we need to test, and if your git bisect log is correct the next bad is 5d423ccd7ba4285f1084e91b26805e1d0ae978ed, so please test to confirm that is bad with:

git checkout -b x86-mm-patch-03 5d423ccd7ba4285f1084e91b26805e1d0ae978ed

Test that to confirm it is bad. If so then we have only 1 patch to test:

git checkout -b x86-mm-patch-02 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90

Test that.

So if you do have some time to test, please test those sha1sums. Hopefully one of them will be good and one of them bad.
(In reply to comment #80) > Well you can compile your kernel without that driver to verify your theory, > or > maybe disable it on the BIOS. I tried this, it seems that tg3 is not the reason for the observed behaviour since I compiled the kernel completely without ethernet support and experienced the same behaviour. > Lets try to take advantage of your git bisect history though. > Your latest good sha1 is 134cbf35c739bf89c51fd975a33a6b87507482c4 which is > right after 2.6.30-rc5. Can you ensure double check this is a good commit > with: > > git checkout -b x86-mm-patch-01 134cbf35c739bf89c51fd975a33a6b87507482c4 > > and compiling and loading that kernel. I just booted this kernel and can confirm that this is indeed a good one. Will now compile the remaining sha's and report back...
(In reply to comment #80)
> ... if your git bisect log is correct the
> next bad is 5d423ccd7ba4285f1084e91b26805e1d0ae978ed so please test to confirm
> that is bad with:
> git checkout -b x86-mm-patch-03 5d423ccd7ba4285f1084e91b26805e1d0ae978ed
>
> Test that to confirm it is bad.
Yes, this one is bad.
> If so then we have only 1 patches to test:
>
> git checkout -b x86-mm-patch-02 45fbe3ee01b8e463b28c2751b5dcc0cbdc142d90
>
> Test that..
This one is good.
> So if you do have some time to test please test those sha1sums. Hopefully one
> of them will be good and one of them bad.
hope is the poor man's bread :-)
OK great, now please checkout a clean 2.6.31.5 branch:
git checkout -b linux-2.6.31.y v2.6.31.5
compile and load and confirm that kernel has an issue. Then run:
git revert 5d423ccd7ba4285f1084e91b26805e1d0ae978ed
compile and load and confirm the issue is resolved.
(In reply to comment #83)
> OK great, now please checkout a clean 2.6.31.5 branch:
>
> git checkout -b linux-2.6.31.y v2.6.31.5
>
> compile and load and confirm that kernel has an issue.
confirmed. wlan-device is disabled on boot
> Then run:
>
> git revert 5d423ccd7ba4285f1084e91b26805e1d0ae978ed
>
> compile and load and confirm the issue is resolved.
confirm, issue is resolved by reverting this commit.
bernhard
Created attachment 23657: Ram align hack
Bernhard, get a clean 2.6.31.5 branch and then apply the attached patch with:
git am ram-align-hack.patch
Compile and see if that also resolves your woes with ath9k.
Also, please post the full log (dmesg > log.txt) of the bootup with the new patch applied.
Bernhard, please also append "debug" to the kernel parameters line. On grub this means editing /boot/grub/menu.lst and adding debug to the line for your specific kernel.
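As a rough sketch of that edit, the following appends debug to the kernel line of a GRUB legacy menu.lst. It only touches a temporary sample file, not /boot/grub/menu.lst, and every value in the sample entry (title, paths, root device) is made up for illustration and will differ on a real system. add_debug is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: append "debug" to the kernel line(s) of a GRUB legacy menu.lst.
# Works on a throwaway sample file; all values in the sample entry are hypothetical.
menu=$(mktemp)
cat > "$menu" <<'EOF'
title  Example Linux
root   (hd0,0)
kernel /vmlinuz26 root=/dev/sda1 ro
initrd /kernel26.img
EOF

# add_debug (made-up helper): print the file with " debug" added to the end
# of every line that starts with the "kernel" keyword. The input file itself
# is not modified.
add_debug() {
    sed '/^[[:space:]]*kernel[[:space:]]/ s/[[:space:]]*$/ debug/' "$1"
}

result=$(add_debug "$menu")
printf '%s\n' "$result"
rm -f "$menu"
```

On a real system the output would be reviewed first and then written back by hand, since a broken menu.lst can leave the machine unbootable.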
(In reply to comment #87)
> Bernhard also please attach "debug" the the kernel parameters line. On grub
> this would be editing /boot/grub/menu.lst and for your specific kernel line
> add debug.
you may need CONFIG_PCI_DEBUG=y
From 2.6.30:
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000d2000 - 00000000000d4000 (reserved)
BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 00000000b5aa1000 (usable)
BIOS-e820: 00000000b5aa1000 - 00000000b5aa7000 (reserved)
BIOS-e820: 00000000b5aa7000 - 00000000b5bba000 (usable)
BIOS-e820: 00000000b5bba000 - 00000000b5c0f000 (reserved)
BIOS-e820: 00000000b5c0f000 - 00000000b5d08000 (usable)
BIOS-e820: 00000000b5d08000 - 00000000b5f0f000 (reserved)
BIOS-e820: 00000000b5f0f000 - 00000000b5f19000 (usable)
BIOS-e820: 00000000b5f19000 - 00000000b5f1f000 (reserved)
BIOS-e820: 00000000b5f1f000 - 00000000b5f64000 (usable)
BIOS-e820: 00000000b5f64000 - 00000000b5f9f000 (ACPI NVS)
BIOS-e820: 00000000b5f9f000 - 00000000b5fe2000 (usable)
BIOS-e820: 00000000b5fe2000 - 00000000b5fff000 (ACPI data)
BIOS-e820: 00000000b5fff000 - 00000000b6000000 (usable)
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
Allocating PCI resources starting at b8000000 (gap: b6000000:4a000000)
pci 0000:00:1f.3: reg 10 64bit mmio: [0x000000-0x0000ff]
pci 0000:00:1f.3: reg 20 io port: [0x1c00-0x1c1f]
pci 0000:00:1c.0: bridge io port: [0x00-0xfff]
pci 0000:00:1c.0: bridge 32bit mmio: [0x000000-0x0fffff]
pci 0000:00:1c.0: bridge 64bit mmio pref: [0x000000-0x0fffff]
pci 0000:05:00.0: reg 10 64bit mmio: [0x000000-0x00ffff]
pci 0000:05:00.0: supports D1
pci 0000:05:00.0: PME# supported from D0 D1 D3hot
pci 0000:05:00.0: PME# disabled
pci 0000:00:1c.1: bridge io port: [0x00-0xfff]
pci 0000:00:1c.1: bridge 32bit mmio: [0x000000-0x0fffff]
pci 0000:00:1c.1: bridge 64bit mmio pref: [0x000000-0x0fffff]
pci 0000:00:1c.2: bridge io port: [0x00-0xfff]
pci 0000:00:1c.2: bridge 32bit mmio: [0x000000-0x0fffff]
pci 0000:00:1c.2: bridge 64bit mmio pref: [0x000000-0x0fffff]
pci 0000:09:00.0: reg 10 64bit mmio: [0x000000-0x00ffff]
pci 0000:09:00.0: PME# supported from D3hot D3cold
pci 0000:09:00.0: PME# disabled
pci 0000:00:1c.5: bridge io port: [0x00-0xfff]
pci 0000:00:1c.5: bridge 32bit mmio: [0x000000-0x0fffff]
pci 0000:00:1c.5: bridge 64bit mmio pref: [0x000000-0x0fffff]
With the patch, the increase of the alignment to 64M should work around the problem.
Does this notebook look like it is from the same vendor as the one in the sky2 driver bug?
It looks like they are using [0xb6000000, 0xb8000000) for video RAM? We may need to find a way to read that range from the hardware.
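The numbers in the 2.6.30 log can be checked by hand. The sketch below assumes the PCI allocation start is simply the gap start rounded up to the alignment; that is an illustrative reading of the log, not the kernel's actual code, and round_up is a made-up helper name.

```shell
#!/bin/bash
# Illustrative arithmetic only; round_up is a hypothetical helper, not kernel code.
# Rounds an address up to the next multiple of a power-of-two alignment
# and prints the result in hex.
round_up() {
    printf '%x\n' $(( ($1 + $2 - 1) & ~($2 - 1) ))
}

gap_start=0xb6000000                 # from "gap: b6000000:4a000000"
gap_size=0x4a000000
align_64m=$(( 64 * 1024 * 1024 ))    # 0x4000000

round_up "$gap_start" "$align_64m"           # prints b8000000
printf '%x\n' $(( gap_start + gap_size ))    # prints 100000000
```

Under that assumption, a 64M alignment puts the start at b8000000, which agrees with the "Allocating PCI resources starting at b8000000" line and leaves [0xb6000000, 0xb8000000) unused. The gap end, 0x100000000, is where the last usable e820 region begins.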
Hi. Sorry for not providing feedback sooner, but I am just moving into another flat. Anyway, the patch provided makes the device work. Below are the dmesg logs after boot with CONFIG_PCI_DEBUG=y.
dmesg from 2.6.31.5 stable (unfortunately I forgot to set the "debug" kernel option): http://pastebin.com/f68658b89
dmesg from patched 2.6.31.5 kernel with "debug" set in grub: http://pastebin.com/f739fdf6b
Bernhard
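When comparing two such logs, it can help to look only at the lines for the wireless device at 0000:05:00.0. A minimal sketch, using a few lines from the 2.6.30 excerpt earlier in this thread as sample input; device_lines is a hypothetical helper name, and on a real system the input would be a saved log (dmesg > log.txt).

```shell
#!/bin/sh
# Sketch: keep only log lines that mention one PCI device address.
# device_lines is a made-up helper; the sample lines come from the
# 2.6.30 log excerpt quoted earlier in this thread.
device_lines() {
    # -F treats the address as a literal string, so the dots are not wildcards.
    grep -F -- "$1" "$2"
}

log=$(mktemp)
cat > "$log" <<'EOF'
pci 0000:00:1c.0: bridge io port: [0x00-0xfff]
pci 0000:05:00.0: reg 10 64bit mmio: [0x000000-0x00ffff]
pci 0000:05:00.0: supports D1
pci 0000:05:00.0: PME# supported from D0 D1 D3hot
pci 0000:05:00.0: PME# disabled
pci 0000:09:00.0: reg 10 64bit mmio: [0x000000-0x00ffff]
EOF

wifi=$(device_lines 0000:05:00.0 "$log")
printf '%s\n' "$wifi"
rm -f "$log"
```

Running the same filter over both saved logs and comparing the results with diff is one way to see where the good and bad boots diverge for this device.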
Someone please mark this bug as fixed and close it -- the fix is now in 2.6.31.6.
Closing.