Bug 15692
Summary: | Oopses in conjunction with wireless, ath9k | ||
---|---|---|---|
Product: | Drivers | Reporter: | Sebastian Kemper (sebastian_ml) |
Component: | network-wireless | Assignee: | Luis Chamberlain (mcgrof) |
Status: | CLOSED CODE_FIX | ||
Severity: | low | CC: | ath9k-devel, greg, linville, mcgrof, sebastian_ml, tom.leiming |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32.11, 2.6.33.2 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
Kernel configuration
dmesg from April 3rd dmesg from April 4th |
Description
Sebastian Kemper
2010-04-04 18:43:56 UTC
Created attachment 25850 [details]
Kernel configuration
I didn't change the configuration when upgrading from .10 to .11
Created attachment 25851 [details]
dmesg from April 3rd
Created attachment 25852 [details]
dmesg from April 4th
Could be a problem with one of the backported ath9k patches...perhaps you could try reverting them in order? Or perhaps bisecting? /home/linville/git/linux-2.6.32.y [linville-t400.local]:> git log v2.6.32.10..v2.6.32.11 drivers/net/wireless/ath/ath9k/ commit 948dc19cef84be418aadc6b3ade94207c0888937 Author: Vivek Natarajan <vnatarajan@atheros.com> Date: Thu Mar 11 13:03:01 2010 -0800 ath9k: Enable IEEE80211_HW_REPORTS_TX_ACK_STATUS flag for ath9k commit 05df49865be08b30e7ba91b9d3d94d7d52dd3033 upstream. Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> commit 525ad995580f1d760cbf4a3b42df1cba3df6821b Author: Senthil Balasubramanian <senthilkumar@atheros.com> Date: Thu Mar 11 11:50:54 2010 -0800 ath9k: Enable TIM timer interrupt only when needed. commit 3f7c5c10e9dc6bf90179eb9f7c06151d508fb324 upstream. The TIM timer interrupt is enabled even before the ACK of nullqos is received which is unnecessary. Also clean up the CONF_PS part of config callback properly for better readability. Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> commit fa8e8558a6874707d8e30c85bb386f43cb61a6e1 Author: Felix Fietkau <nbd@openwrt.org> Date: Fri Mar 12 04:02:43 2010 +0100 ath9k: fix BUG_ON triggered by PAE frames commit 4fdec031b9169b3c17938b9c4168f099f457169c upstream. When I initially stumbled upon sequence number problems with PAE frames in ath9k, I submitted a patch to remove all special cases for PAE frames and let them go through the normal transmit path. Out of concern about crypto incompatibility issues, this change was merged instead: commit 6c8afef551fef87a3bf24f8a74c69a7f2f72fc82 Author: Sujith <Sujith.Manoharan@atheros.com> Date: Tue Feb 9 10:07:00 2010 +0530 ath9k: Fix sequence numbers for PAE frames After a lot of testing, I'm able to reliably trigger a driver crash on rekeying with current versions with this change in place. It seems that the driver does not support sending out regular MPDUs with the same TID while an A-MPDU session is active. This leads to duplicate entries in the TID Tx buffer, which hits the following BUG_ON in ath_tx_addto_baw(): index = ATH_BA_INDEX(tid->seq_start, bf->bf_seqno); cindex = (tid->baw_head + index) & (ATH_TID_MAX_BUFS - 1); BUG_ON(tid->tx_buf[cindex] != NULL); I believe until we actually have a reproducible case of an incompatibility with another AP using no PAE special cases, we should simply get rid of this mess. This patch completely fixes my crash issues in STA mode and makes it stay connected without throughput drops or connectivity issues even when the AP is configured to a very short group rekey interval. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> commit 0dcc9985f34aef3c60bffab3dfc7f7ba3748f35a Author: Ming Lei <tom.leiming@gmail.com> Date: Sun Feb 28 00:56:24 2010 +0800 ath9k: fix lockdep warning when unloading module commit a9f042cbe5284f34ccff15f3084477e11b39b17b upstream. Since txq->axq_lock may be hold in softirq context, it must be acquired with spin_lock_bh() instead of spin_lock() if softieq is enabled. The patch fixes the lockdep warning below when unloading ath9k modules. ================================= [ INFO: inconsistent lock state ] 2.6.33-wl #12 --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. rmmod/3642 [HC0[0]:SC0[0]:HE1:SE1] takes: (&(&txq->axq_lock)->rlock){+.?...}, at: [<ffffffffa03568c3>] ath_tx_node_cl {IN-SOFTIRQ-W} state was registered at: [<ffffffff8107577d>] __lock_acquire+0x2f6/0xd35 [<ffffffff81076289>] lock_acquire+0xcd/0xf1 [<ffffffff813a7486>] _raw_spin_lock_bh+0x3b/0x6e [<ffffffffa0356b49>] spin_lock_bh+0xe/0x10 [ath9k] [<ffffffffa0358ec7>] ath_tx_tasklet+0xcd/0x391 [ath9k] [<ffffffffa0354f5f>] ath9k_tasklet+0x70/0xc8 [ath9k] [<ffffffff8104e601>] tasklet_action+0x8c/0xf4 [<ffffffff8104f459>] __do_softirq+0xf8/0x1cd [<ffffffff8100ab1c>] call_softirq+0x1c/0x30 [<ffffffff8100c2cf>] do_softirq+0x4b/0xa3 [<ffffffff8104f045>] irq_exit+0x4a/0x8c [<ffffffff813acccc>] do_IRQ+0xac/0xc3 [<ffffffff813a7d53>] ret_from_intr+0x0/0x16 [<ffffffff81302d52>] cpuidle_idle_call+0x9e/0xf8 [<ffffffff81008be7>] cpu_idle+0x62/0x9d [<ffffffff81391c1a>] rest_init+0x7e/0x80 [<ffffffff818bbd38>] start_kernel+0x3e8/0x3f3 [<ffffffff818bb2bc>] x86_64_start_reservations+0xa7/0xab [<ffffffff818bb3b8>] x86_64_start_kernel+0xf8/0x107 irq event stamp: 42037 hardirqs last enabled at (42037): [<ffffffff813a7b21>] _raw_spin_unlock_irq hardirqs last disabled at (42036): [<ffffffff813a72f4>] _raw_spin_lock_irqsa softirqs last enabled at (42000): [<ffffffffa0353ea6>] spin_unlock_bh+0xe/0 softirqs last disabled at (41998): [<ffffffff813a7463>] _raw_spin_lock_bh+0x other info that might help us debug this: 4 locks held by rmmod/3642: #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8132c10d>] rtnl_lock+0x17/0x19 #1: (&wdev->mtx){+.+.+.}, at: [<ffffffffa01e53f2>] cfg80211_netdev_notifie #2: (&ifmgd->mtx){+.+.+.}, at: [<ffffffffa0260834>] ieee80211_mgd_deauth+0 #3: (&local->sta_mtx){+.+.+.}, at: [<ffffffffa025a381>] sta_info_destroy_a stack backtrace: Pid: 3642, comm: rmmod Not tainted 2.6.33-wl #12 Call Trace: [<ffffffff81074469>] valid_state+0x178/0x18b [<ffffffff81014f94>] ? save_stack_trace+0x2f/0x4c [<ffffffff81074e08>] ? check_usage_backwards+0x0/0x88 [<ffffffff8107458f>] mark_lock+0x113/0x230 [<ffffffff810757f1>] __lock_acquire+0x36a/0xd35 [<ffffffff8101018d>] ? native_sched_clock+0x2d/0x5f [<ffffffffa03568c3>] ? ath_tx_node_cleanup+0x62/0x180 [ath9k] [<ffffffff81076289>] lock_acquire+0xcd/0xf1 [<ffffffffa03568c3>] ? ath_tx_node_cleanup+0x62/0x180 [ath9k] [<ffffffff810732eb>] ? trace_hardirqs_off+0xd/0xf [<ffffffff813a7193>] _raw_spin_lock+0x36/0x69 [<ffffffffa03568c3>] ? ath_tx_node_cleanup+0x62/0x180 [ath9k] [<ffffffffa03568c3>] ath_tx_node_cleanup+0x62/0x180 [ath9k] [<ffffffff810749ed>] ? trace_hardirqs_on+0xd/0xf [<ffffffffa0353950>] ath9k_sta_remove+0x22/0x26 [ath9k] [<ffffffffa025a08f>] __sta_info_destroy+0x1ad/0x38c [mac80211] [<ffffffffa025a394>] sta_info_destroy_addr+0x3e/0x5e [mac80211] [<ffffffffa02605d6>] ieee80211_set_disassoc+0x175/0x180 [mac80211] [<ffffffffa026084d>] ieee80211_mgd_deauth+0x58/0x17e [mac80211] [<ffffffff813a60c1>] ? __mutex_lock_common+0x37f/0x3a4 [<ffffffffa01e53f2>] ? cfg80211_netdev_notifier_call+0x28d/0x46d [cfg80211] [<ffffffffa026786e>] ieee80211_deauth+0x1e/0x20 [mac80211] [<ffffffffa01f47f9>] __cfg80211_mlme_deauth+0x130/0x13f [cfg80211] [<ffffffffa01e53f2>] ? cfg80211_netdev_notifier_call+0x28d/0x46d [cfg80211] [<ffffffff810732eb>] ? trace_hardirqs_off+0xd/0xf [<ffffffffa01f7eee>] __cfg80211_disconnect+0x111/0x189 [cfg80211] [<ffffffffa01e5433>] cfg80211_netdev_notifier_call+0x2ce/0x46d [cfg80211] [<ffffffff813aa9ea>] notifier_call_chain+0x37/0x63 [<ffffffff81068c98>] raw_notifier_call_chain+0x14/0x16 [<ffffffff81322e97>] call_netdevice_notifiers+0x1b/0x1d [<ffffffff8132386d>] dev_close+0x6a/0xa6 [<ffffffff8132395f>] rollback_registered_many+0xb6/0x2f4 [<ffffffff81323bb8>] unregister_netdevice_many+0x1b/0x66 [<ffffffffa026494f>] ieee80211_remove_interfaces+0xc5/0xd0 [mac80211] [<ffffffffa02580a2>] ieee80211_unregister_hw+0x47/0xe8 [mac80211] [<ffffffffa035290e>] ath9k_deinit_device+0x7a/0x9b [ath9k] [<ffffffffa035bc26>] ath_pci_remove+0x38/0x76 [ath9k] [<ffffffff8120940a>] pci_device_remove+0x2d/0x51 [<ffffffff8129d797>] __device_release_driver+0x7b/0xd1 [<ffffffff8129d885>] driver_detach+0x98/0xbe [<ffffffff8129ca7a>] bus_remove_driver+0x94/0xb7 [<ffffffff8129ddd6>] driver_unregister+0x6c/0x74 [<ffffffff812096d2>] pci_unregister_driver+0x46/0xad [<ffffffffa035bae1>] ath_pci_exit+0x15/0x17 [ath9k] [<ffffffffa035e1a2>] ath9k_exit+0xe/0x2f [ath9k] [<ffffffff8108050a>] sys_delete_module+0x1c7/0x236 [<ffffffff813a7df5>] ? retint_swapgs+0x13/0x1b [<ffffffff810749b5>] ? trace_hardirqs_on_caller+0x119/0x144 [<ffffffff8109b9f6>] ? audit_syscall_entry+0x11e/0x14a [<ffffffff81009bb2>] system_call_fastpath+0x16/0x1b wlan1: deauthenticating from 00:23:cd:e1:f9:b2 by local choice (reason=3) PM: Removing info for No Bus:wlan1 cfg80211: Calling CRDA to update world regulatory domain PM: Removing info for No Bus:rfkill2 PM: Removing info for No Bus:phy1 ath9k 0000:16:00.0: PCI INT A disabled Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Hi all, I think it's the last one you mentioned: 0dcc9985f34aef3c60bffab3dfc7f7ba3748f35a I've added Ming to the CC list of this report. I've found that the trace can be reproduced by simply shutting down the access point (I cold-started it). So I reversed patch #1, found that the trace still occurred, applied patch #1 again, then reversed patch #2 and so on. Whenever I reverse this patch I don't get the trace: diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c index 9009bac..5184af2 100644 --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -2210,7 +2210,7 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an) if (ATH_TXQ_SETUP(sc, i)) { txq = &sc->tx.txq[i]; - spin_lock(&txq->axq_lock); + spin_lock_bh(&txq->axq_lock); list_for_each_entry_safe(ac, ac_tmp, &txq->axq_acq, list) { @@ -2231,7 +2231,7 @@ void ath_tx_node_cleanup(struct ath_softc *sc, struct ath_node *an) } } - spin_unlock(&txq->axq_lock); + spin_unlock_bh(&txq->axq_lock); } } } Whenever I apply it again the trace shows up again (when switching off the Access Point). Kind regards Sebastian Kernel 2.6.33.2 also shows the issue, see this mailing list thread: http://marc.info/?t=127083441400007&r=1&w=2 Hello all, issue was fixed with commits 4ee20bc28f74a77307091b15d498e44c7f54645e and 0afe7e4c1136dda6486587be78c5d277f7aac999 for 2.6.32 and .33. Thanks! Sebastian Thanks for the report. |