Bug 15862
Summary: | 2.6.34-rc4/5: iwlagn unusable until reload | ||
---|---|---|---|
Product: | Drivers | Reporter: | Maciej Rutecki (maciej.rutecki) |
Component: | network-wireless | Assignee: | Shanyu Zhao (shanyu.zhao) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | florian, johannes, linville, maciej.rutecki, nico-bugzilla.kernel.org, reinette.chatre, rjw, shanyu.zhao |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34-rc4 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 15310 | ||
Attachments: |
dmesg before reload
2.6.34-rc6-00106-gd93ac51-initial works complete dmesg series from load, reload, etc. patch to make work items match properly After suspend, no packets can be received dmesg output after reload After restarting wpa_supplicant |
Description
Maciej Rutecki
2010-04-27 18:43:37 UTC
Hi Nico, Does this problem only happen after reboot? Can you please compile the driver with debug options: CONFIG_IWLWIFI_DEBUG=y CONFIG_IWLWIFI_DEBUGFS=y CONFIG_IWLWIFI_DEVICE_TRACING=y And load the module with parameter "debug=0x63fff", then send us the whole debug log? If this problem only happens after reboot, you may need to set the module parameter in /etc/modprobe.d/ Thanks, Shanyu Hey I also needed CONFIG_MAC80211_DEBUGFS=y for CONFIG_IWLWIFI_DEBUGFS=y. Regarding CONFIG_IWLWIFI_DEVICE_TRACING=y, I found the dependency to EVENT_TRACING, which I couldn't trace down to its menu item, because the search for it returned only Symbol: EVENT_TRACING [=n] │ Selects: CONTEXT_SWITCH_TRACER [=n] │ Selected by: TRACING [=n] Which again has other dependencies...heck, is there no way just to force CONFIG_IWLWIFI_DEVICE_TRACING=y in make menuconfig (probably this should go to lkml...) bugzilla-daemon@bugzilla.kernel.org [Wed, Apr 28, 2010 at 03:46:29AM +0000]: > Hi Nico, > > Does this problem only happen after reboot? I'm a bit confused: If it's the first time the module is loaded, it must always be after a reboot? If you mean if it also happens after a cold boot, the answer is yes. > Can you please compile the driver with debug options: > CONFIG_IWLWIFI_DEBUG=y > CONFIG_IWLWIFI_DEBUGFS=y Enabled thosee two for now and will recompile and test. Nico -- New PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0 Please resign, if you signed 9885188C or 8D0E27A4. Currently moving *.schottelius.org to http://www.nico.schottelius.org/ ... Hi Nico, Sorry I should mention that CONFIG_IWLWIFI_DEVICE_TRACING depends on TRACING, which basically is based on ftrace. I was originally interested in seeing some ftrace output. Yes, enablying: CONFIG_IWLWIFI_DEBUG=y CONFIG_IWLWIFI_DEBUGFS=y should be good for now. Basically I'm asking if the problem (as seen in your trace "Error sending REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.") can be reproduced randomly, or you have to reboot the machine so that the first time loading module it failed with the above log, and after reloading module the problem goes away? How frequently you see this problem? In your log, I saw the problem happened about 2 minutes after the module gets loaded, do you have network connection in that 2 minutes? Or it never connects to an AP? Thanks, Shanyu Hey Shanyu, bugzilla-daemon@bugzilla.kernel.org [Wed, Apr 28, 2010 at 04:31:45PM +0000]: > --- Comment #3 from Shanyu Zhao <shanyu.zhao@intel.com> 2010-04-28 16:31:26 > --- > Hi Nico, > > Sorry I should mention that CONFIG_IWLWIFI_DEVICE_TRACING depends on TRACING, > which basically is based on ftrace. I was originally interested in seeing > some > ftrace output. Yes, enablying: > CONFIG_IWLWIFI_DEBUG=y > CONFIG_IWLWIFI_DEBUGFS=y > should be good for now. Attached that output. > Basically I'm asking if the problem (as seen in your trace "Error sending > REPLY_CT_KILL_CONFIG_CMD: time out after 500ms.") can be reproduced randomly, > or you have to reboot the machine so that the first time loading module it > failed with the above log, and after reloading module the problem goes away? > How frequently you see this problem? I always saw it exactly after reboot. Currently I am using 2.6.34-rc5-00340-g79dba2e on which it works straight after boot. > In your log, I saw the problem happened about 2 minutes after the module gets > loaded, do you have network connection in that 2 minutes? Or it never > connects > to an AP? It never connects, but an AP is in range and it also connects when iwlagn is reloaded. Cheers, Nico -- New PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0 Please resign, if you signed 9885188C or 8D0E27A4. Currently moving *.schottelius.org to http://www.nico.schottelius.org/ ... Created attachment 26213 [details]
dmesg before reload
Hey Shanyu, an update regarding that issue, now running on 2.6.34-rc5-00340-g79dba2e: Iwlagn stops working after some hours of activity, displaying the following messages (sorry, debug not loaded, have to add it to modprobe.conf, otherwise I'll miss it always: [root@kr ~]# dmesg -c [49672.525816] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [49672.725554] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [49672.925195] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [49673.124839] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [49685.755245] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [49685.954616] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [49686.154321] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [49686.223975] wlan0: direct probe responded [49686.228184] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [49686.426841] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [49686.626491] wlan0: authenticate with 00:05:4e:4d:82:6d (try 3) [49686.629633] wlan0: direct probe responded [49686.634364] wlan0: associate with 00:05:4e:4d:82:6d (try 1) [49686.639832] wlan0: direct probe responded [49686.639837] wlan0: invalid aid value 0; bits 15:14 not set [49845.070636] No probe response from AP 00:05:4e:4d:82:6d after 500ms, disconnecting. [49845.075606] cfg80211: Calling CRDA to update world regulatory domain [49889.387359] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [49889.586325] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [49889.785969] wlan0: authenticate with 00:05:4e:4d:82:6d (try 3) [49889.985610] wlan0: authentication with 00:05:4e:4d:82:6d timed out [49910.765129] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [49910.850148] wlan0: direct probe responded [49910.854953] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [49911.054596] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [49911.254237] wlan0: authenticate with 00:05:4e:4d:82:6d (try 3) [49911.453877] wlan0: authentication with 00:05:4e:4d:82:6d timed out [49948.655205] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [49948.854403] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [49949.054047] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [49949.253687] wlan0: direct probe to 00:05:4e:4d:82:6d timed out -------------------------------------------------------------------------------- [50005.268634] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [50042.498197] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50042.697095] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50042.896731] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [50043.096370] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [50063.867627] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50064.066563] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50064.266198] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [50064.465832] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [50085.265894] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50085.464955] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50085.545574] wlan0: direct probe responded [50085.549801] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [50085.749444] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [50085.852226] wlan0: direct probe responded [50085.859710] wlan0: associate with 00:05:4e:4d:82:6d (try 1) [50086.058886] wlan0: associate with 00:05:4e:4d:82:6d (try 2) [50086.258527] wlan0: associate with 00:05:4e:4d:82:6d (try 3) [50086.458170] wlan0: association with 00:05:4e:4d:82:6d timed out [50098.496705] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50098.696089] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50098.895728] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [50099.095370] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [50128.062109] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50128.261741] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50128.461380] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [50128.661035] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [50141.242950] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50141.245944] wlan0: direct probe responded [50141.249284] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [50141.250445] wlan0: direct probe responded [50141.252207] wlan0: associate with 00:05:4e:4d:82:6d (try 1) [50141.451964] wlan0: associate with 00:05:4e:4d:82:6d (try 2) [50141.460088] wlan0: direct probe responded [50141.460093] wlan0: invalid aid value 0; bits 15:14 not set [50273.298104] No probe response from AP 00:05:4e:4d:82:6d after 500ms, disconnecting. [50273.305819] cfg80211: Calling CRDA to update world regulatory domain [50276.593576] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [50276.792794] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [50276.992444] wlan0: authenticate with 00:05:4e:4d:82:6d (try 3) [50277.192079] wlan0: authentication with 00:05:4e:4d:82:6d timed out [50297.995540] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50298.195197] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50298.394833] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [50298.594471] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [50319.329553] wlan0: direct probe to 00:05:4e:4d:82:6d (try 1) [50319.528703] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50319.728341] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [50319.927989] wlan0: direct probe to 00:05:4e:4d:82:6d timed out [50324.897028] PM: Removing info for No Bus:wlan0 [50324.905037] PM: Removing info for No Bus:rfkill0 [50324.912961] PM: Removing info for No Bus:phy0 [50324.916364] iwlagn 0000:02:00.0: PCI INT A disabled [50327.438822] iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:d [50327.438824] iwlagn: Copyright(c) 2003-2010 Intel Corporation [50327.438914] iwlagn 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 [50327.438952] iwlagn 0000:02:00.0: setting latency timer to 64 [50327.440241] iwlagn 0000:02:00.0: Detected Intel Wireless WiFi Link 6000 Series 3x3 AGN REV=0x74 [50327.449528] iwlagn 0000:02:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels [50327.449642] iwlagn 0000:02:00.0: irq 29 for MSI/MSI-X [50327.449975] iwlagn 0000:02:00.0: firmware: requesting iwlwifi-6000-4.ucode [50327.449993] PM: Adding info for No Bus:0000:02:00.0 [50327.480437] PM: Removing info for No Bus:0000:02:00.0 [50327.480682] iwlagn 0000:02:00.0: loaded firmware version 9.176.4.1 [50327.481092] PM: Adding info for No Bus:phy1 [50327.481184] PM: Adding info for No Bus:rfkill4 [50327.481818] phy1: Selected rate control algorithm 'iwl-agn-rs' [50327.481926] PM: Adding info for No Bus:wlan0 [50327.795617] ADDRCONF(NETDEV_UP): wlan0: link is not ready [root@kr ~]# After reload: -------------------------------------------------------------------------------- [50332.233788] wlan0: direct probe to 00:05:4e:4d:82:6d (try 2) [50332.433414] wlan0: direct probe to 00:05:4e:4d:82:6d (try 3) [50332.509068] wlan0: direct probe responded [50332.512514] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [50332.711866] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [50332.815723] wlan0: direct probe responded [50332.826366] wlan0: associate with 00:05:4e:4d:82:6d (try 1) [50333.025350] wlan0: associate with 00:05:4e:4d:82:6d (try 2) [50333.225031] wlan0: associate with 00:05:4e:4d:82:6d (try 3) [50333.424573] wlan0: association with 00:05:4e:4d:82:6d timed out [50353.397696] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [50353.597266] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [50353.796913] wlan0: authenticate with 00:05:4e:4d:82:6d (try 3) [50353.996493] wlan0: authentication with 00:05:4e:4d:82:6d timed out [50382.986593] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [50383.185940] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [50383.385505] wlan0: authenticate with 00:05:4e:4d:82:6d (try 3) [50383.585187] wlan0: authentication with 00:05:4e:4d:82:6d timed out [50396.210912] wlan0: authenticate with 00:05:4e:4d:82:6d (try 1) [50396.409966] wlan0: authenticate with 00:05:4e:4d:82:6d (try 2) [50396.499577] wlan0: authenticated [50396.507249] wlan0: associate with 00:05:4e:4d:82:6d (try 1) [50396.706518] wlan0: associate with 00:05:4e:4d:82:6d (try 2) [50396.906164] wlan0: associate with 00:05:4e:4d:82:6d (try 3) [50397.010888] wlan0: RX AssocResp from 00:05:4e:4d:82:6d (capab=0x1 status=0 aid=1) [50397.010893] wlan0: associated [50397.016546] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready -------------------------------------------------------------------------------- Reloading now with debug flags, but we now know at leas that it does not only happen after reboot! Cheers, Nico -- New PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0 Please resign, if you signed 9885188C or 8D0E27A4. Currently moving *.schottelius.org to http://www.nico.schottelius.org/ ... Created attachment 26219 [details]
2.6.34-rc6-00106-gd93ac51-initial works
With this kernel (2.6.34-rc6-00106-gd93ac51) iwlagn in debug mode works straight away. I'm keeping it running and hope it will stop to work soon, so I can send you the debug information.
Happy for you, i encounter the problem again, this time with debug on and many less nerves (took me three reloads until I could connect again and thus open bugzilla....). I'll add a tar.gz, here's the timeline for it: [root@kr iwlagn]# ls -lt 2.6.34-rc6-NS-00106-gd93ac51* -rw-r--r-- 1 root root 103137 4. Mai 23:28 2.6.34-rc6-NS-00106-gd93ac51.modprobe3_connected -rw-r--r-- 1 root root 1052 4. Mai 23:27 2.6.34-rc6-NS-00106-gd93ac51.rmmod3 -rw-r--r-- 1 root root 254680 4. Mai 23:27 2.6.34-rc6-NS-00106-gd93ac51.wpa_cli_reassociate_again -rw-r--r-- 1 root root 27880 4. Mai 23:09 2.6.34-rc6-NS-00106-gd93ac51.iwconfig.wlan0.essid.lyni -rw-r--r-- 1 root root 56815 4. Mai 23:09 2.6.34-rc6-NS-00106-gd93ac51.wpa_cli_reassociate2 -rw-r--r-- 1 root root 22588 4. Mai 23:08 2.6.34-rc6-NS-00106-gd93ac51.modprobe2 -rw-r--r-- 1 root root 5553 4. Mai 23:08 2.6.34-rc6-NS-00106-gd93ac51.rmmod2 -rw-r--r-- 1 root root 254599 4. Mai 23:08 2.6.34-rc6-NS-00106-gd93ac51.iwlagn-no-responses-iwconfig-connected -rw-r--r-- 1 nico root 254338 4. Mai 23:03 2.6.34-rc6-NS-00106-gd93ac51.iwlagn-no-connect -rw-r--r-- 1 nico root 254335 4. Mai 22:46 2.6.34-rc6-NS-00106-gd93ac51.wpa_cli_reassociate-worked -rw-r--r-- 1 nico root 230723 4. Mai 22:42 2.6.34-rc6-NS-00106-gd93ac51.reload.iwlagn -rw-r--r-- 1 nico root 254626 4. Mai 22:40 2.6.34-rc6-NS-00106-gd93ac51.rmmod.iwlagn -rw-r--r-- 1 nico root 81359 4. Mai 22:40 2.6.34-rc6-NS-00106-gd93ac51.no-connect-after-boot -rw------- 1 nico nico 252427 4. Mai 17:40 2.6.34-rc6-NS-00106-gd93ac51.dmesg-works-fine -rw------- 1 nico nico 252064 4. Mai 17:39 2.6.34-rc6-NS-00106-gd93ac51 -NS is just "built by me" so i can distinguish it from distro kernels Created attachment 26221 [details]
complete dmesg series from load, reload, etc.
Nico, Thank you for providing the log files! The message you provided in comment #6 is quite a different problem than the original one. I didn't see "Device reached critical temperature - ucode going to sleep!" (which showed up in the original logs) in your new set of debug logs. What you encountered in comment #6 is probably a general connectivity problem, where you got a noisy environment and for some reason the AP doesn't reply to the direct probe request from your PC therefore the PC can't connect or lost connection. There's probably a race condition in mac80211 layer from log file "iwlagn-no-responses-iwconfig-connected", and afterwards in "wpa_cli_reassociate2" and "wpa_cli_reassociate_again" mac80211 layer never actually did an association. We're looking into this now. Thanks, Shanyu Created attachment 26331 [details]
patch to make work items match properly
Nico, please try this patch.
Created attachment 26336 [details]
After suspend, no packets can be received
After resuming, iwconfig shows a connection, the wirelees led blinks quite regular & often, but no packets are being received. New kernel version this time.
Created attachment 26337 [details]
dmesg output after reload
Connection still does not work
Created attachment 26338 [details]
After restarting wpa_supplicant
The connection works again.
Thanks. I'm kinda lost though. I can see that the work issue is fixed, we no longer have the invalid AID thing in the log. Unless that depended on some other circumstances you didn't hit now? I can also see why the connection doesn't work after resume, but after resume it should work if you only restart wpa_supplicant, not reload the module? Do we have three or four bugs here? (In reply to comment #15) Reinette, Shanyu: > I can also see why the connection doesn't work after resume, The reason for this is that we absolutely must send an unassociated rxon with the right BSSID, then add the station, and finally send an associated RXON with the right BSSID. We get the right mostly by accident normally, but apparently not during reconfig. There's an internal firmware reason for this, and some newer firmware might even sysassert when we don't do this. Hey Jo, bugzilla-daemon@bugzilla.kernel.org [Tue, May 11, 2010 at 10:05:02AM +0000]: > https://bugzilla.kernel.org/show_bug.cgi?id=15862 > > --- Comment #15 from Johannes Berg <johannes@sipsolutions.net> 2010-05-11 > 10:05:01 --- > Thanks. I'm kinda lost though. I can see that the work issue is fixed, we no > longer have the invalid AID thing in the log. Unless that depended on some > other circumstances you didn't hit now? Well, there's at least the other issue in the kernel that resume does not work reliable and I'm not sure whether this was the first or the second time I resumed. In the latter case it was the first time after a long kernel series that came up at all (so there may be more bugs, too). > I can also see why the connection doesn't work after resume, but after resume > it should work if you only restart wpa_supplicant, not reload the module? Oh, I did not try to restart wpa_supplicant only, but tried wpa_cli reassociate only. Sometimes after resuming a new connection is setup almost instantly without restarting wpa_supplicant - was that only luck or should iwlagn/wpa_supplicant work like this? Cheers, Nico -- New PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0 Please resign, if you signed 9885188C or 8D0E27A4. Currently moving *.schottelius.org to http://www.nico.schottelius.org/ ... (In reply to comment #17) > Well, there's at least the other issue in the kernel that > resume does not work reliable and I'm not sure whether this was > the first or the second time I resumed. > > In the latter case it was the first time after a long kernel series > that came up at all (so there may be more bugs, too). Fun. > > I can also see why the connection doesn't work after resume, but after > resume > > it should work if you only restart wpa_supplicant, not reload the module? > > Oh, I did not try to restart wpa_supplicant only, but tried wpa_cli > reassociate > only. Oh, I didn't see that -- that should have worked as well. > Sometimes after resuming a new connection is setup almost instantly without > restarting wpa_supplicant - was that only luck or should > iwlagn/wpa_supplicant > work like this? It should work like that. Well, actually, it should just pretend it is still associated, and then when the AP kicks it off it will re-associate. Or if you just suspended for a few seconds, the connection just lost a few frames. Handled-By : Johannes Berg <johannes@sipsolutions.net> Works on 2.6.35. Sorry for the late reply, have been on holidays. Cheers, Nico Thanks, but I'm getting lost now. Which part works/doesn't work on .35? Is there something left that we should maybe try to fix on .34 that was fixed in .35? .35 after resume the connection is still tied to the previous access point, but issuing "reassociate" to wpa_cli makes it work without the need to reload iwlagn. Hi! What is the status of this issue? Is this problem still present in 2.6.37-rc* ? Regards, Flo Hey, bugzilla-daemon@bugzilla.kernel.org [Wed, Dec 22, 2010 at 08:19:21AM +0000]: > --- Comment #23 from Florian Mickler <florian@mickler.org> 2010-12-22 > 08:19:18 --- > Hi! > > What is the status of this issue? > Is this problem still present in 2.6.37-rc* ? Running 2.6.37-rc7, IT WORKS! Happy to say, please close the bugreport! Cheers, Nico -- PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0 Well, that's a christmas gift, isn't it? :) I guess we don't know which commit fixed it, so I'm closing this as unreproducible. Thanks for the update. |