Bug 43217 - Null pointer dereference in ath9k causing kernel hang
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All Linux
Importance: P1 high
Assigned To: drivers_network-wireless@kernel-bugs.osdl.org
Depends on:
Reported: 2012-05-08 18:35 UTC by Andrej Podzimek
Modified: 2012-09-06 11:39 UTC
CC List: 3 users

See Also:
Kernel Version: 3.3.4
Tree: Mainline
Regression: No

lspci output from the machine where the bug was observed (2.36 KB, text/plain)
2012-05-08 18:35 UTC, Andrej Podzimek
BUG messages extracted from kernel log files. (373.21 KB, text/plain)
2012-05-08 18:37 UTC, Andrej Podzimek
older fix (2.61 KB, patch)
2012-05-14 15:05 UTC, shafi

Description Andrej Podzimek 2012-05-08 18:35:26 UTC
Created attachment 73222
lspci output from the machine where the bug was observed

This bug has existed for some time, but it was benign and went unnoticed. Since kernel 3.3.4, the problem has become much more serious: the kernel hangs forever once the BUG message appears in dmesg.

The machine:
[    0.000000] DMI: BenQ Joybook S32/Joybook S32, BIOS 01.17  12/10/2007

The original WiFi card has been replaced with 802.11n Atheros:
[   12.286471] ath: EEPROM regdomain: 0x6a
[   12.286475] ath: EEPROM indicates we should expect a direct regpair map
[   12.286479] ath: Country alpha2 being used: 00
[   12.286480] ath: Regpair used: 0x6a
[   12.339126] ieee80211 phy0: Selected rate control algorithm 'ath9k_rate_control'
[   12.340324] Registered led device: ath9k-phy0
[   12.340344] ieee80211 phy0: Atheros AR9300 Rev:3 mem=0xffffc90023500000, irq=16

Full lspci output is attached.

BUG messages, including backtraces, extracted from the kernel logs are also attached.
Caution: these messages come from *multiple* different kernel versions.

There is a *workaround*:
1) Disable WiFi using the RFkill switch, so that the machine boots without hanging.
2) Add the line "options ath9k nohwcrypt=1" to /etc/modprobe.d/modprobe.conf.
3) Unload and reload ath9k (modprobe -r ath9k; modprobe ath9k).
4) Re-enable WiFi using the RFkill switch.
5) Restart the WiFi connection. From now on, the machine should work and (re)boot normally.

Surprisingly, disabling hardware encryption hides the problem.
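After step 3 it can be useful to confirm that the reloaded module actually picked up the option. The C sketch below is one way to check, under the assumption that the loaded module exposes the parameter as /sys/module/ath9k/parameters/nohwcrypt (readable module parameters normally appear under /sys/module/<module>/parameters/). The helper names read_module_param and nohwcrypt_active are hypothetical, not part of any driver or tool.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sketch: read an integer module parameter from its sysfs file.
 * Assumption: ath9k exposes nohwcrypt at
 * /sys/module/ath9k/parameters/nohwcrypt while the module is loaded.
 * Returns 0 and stores the value in *value on success, -1 on failure.
 */
static int read_module_param(const char *path, int *value)
{
    assert(path != NULL && value != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* module not loaded or parameter not exposed */

    int ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: returns 1 if hardware crypto is disabled
 * (workaround active), 0 if it is enabled, -1 if the value is unreadable.
 */
static int nohwcrypt_active(const char *path)
{
    int value;
    if (read_module_param(path, &value) != 0)
        return -1;
    return value != 0;
}
```

For example, nohwcrypt_active("/sys/module/ath9k/parameters/nohwcrypt") would be expected to return 1 once the modprobe.conf entry has taken effect; running cat on the same file from a shell gives the same answer without any code.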
Comment 1 Andrej Podzimek 2012-05-08 18:37:38 UTC
Created attachment 73223
BUG messages extracted from kernel log files.

These are the BUG messages related to the ath9k driver. This problem is only exposed when hardware encryption is on. Setting nohwcrypt=1 hides the problem and the machine works normally.
Comment 2 shafi 2012-05-14 15:04:42 UTC
This should be fixed by

commit bff2ec2b916cc85628f3025e08660c0350f03650
Author: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Date:   Mon Mar 12 07:51:07 2012 +0530

    ath9k: Fix BTCOEX shutdown
    Flush MCI profiles only if MCI is being actually used.
    This fixes a panic on driver unload when non-MCI devices
    are being used and btcoex_enable is set.
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffffa06296d2>] ath_mci_flush_profile+0x12/0x100 [ath9k]
    Call Trace:
    [<ffffffffa061befe>] ath9k_stop_btcoex+0x5e/0x80 [ath9k]
    [<ffffffffa061ed57>] ath9k_stop+0xb7/0x230 [ath9k]
    [<ffffffffa0533f30>] ieee80211_stop_device+0x50/0x180 [mac80211]
    [<ffffffffa051f0cf>] ieee80211_do_stop+0x2af/0x6a0 [mac80211]
    [<ffffffffa051f4da>] ieee80211_stop+0x1a/0x20 [mac80211]
    [<ffffffff81365d96>] __dev_close_many+0x86/0xe0
    [<ffffffff81365ee0>] dev_close_many+0xa0/0x110
    [<ffffffff81366038>] rollback_registered_many+0xe8/0x260
    [<ffffffff813661cb>] unregister_netdevice_many+0x1b/0x80
    [<ffffffffa051e950>] ieee80211_remove_interfaces+0xd0/0x110 [mac80211]
    [<ffffffffa050c133>] ieee80211_unregister_hw+0x53/0x120 [mac80211]
    [<ffffffffa061d5a4>] ath9k_deinit_device+0x44/0x70 [ath9k]
    [<ffffffffa062c1d4>] ath_pci_remove+0x54/0xa0 [ath9k]
    [<ffffffff81267c46>] pci_device_remove+0x46/0x110
    [<ffffffff8131021c>] __device_release_driver+0x7c/0xe0
    [<ffffffff81310960>] driver_detach+0xd0/0xe0
    [<ffffffff81310078>] bus_remove_driver+0x88/0xe0
    [<ffffffff81311122>] driver_unregister+0x62/0xa0
    [<ffffffff81268004>] pci_unregister_driver+0x44/0xc0
    [<ffffffffa062c8b5>] ath_pci_exit+0x15/0x20 [ath9k]
    [<ffffffffa063205d>] ath9k_exit+0x15/0x31 [ath9k]
    [<ffffffff810b92cc>] sys_delete_module+0x18c/0x270
    [<ffffffff814373dd>] ? retint_swapgs+0x13/0x1b
    [<ffffffff8124828e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [<ffffffff81437de9>] system_call_fastpath+0x16/0x1b
    Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>

Attaching the patch so you can test it on Arch Linux.
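The commit message above describes the fix as flushing MCI profiles only when MCI is actually in use. The C sketch below models that guard in userspace. It is illustrative only: struct btcoex_dev, mci_in_use, mci_flush_profiles and stop_btcoex are hypothetical stand-ins, not the real ath9k code or the contents of the attached patch. It assumes, for illustration, that the profile list is only initialised on MCI hardware, so on other hardware its pointers stay NULL and walking it would dereference NULL, similar in kind to the crash in ath_mci_flush_profile shown in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

static void node_list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void node_list_append(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical MCI state: the profile list is initialised only on MCI hardware. */
struct mci_state {
    struct list_node profiles;
    int flushed;                /* profiles removed by the last flush */
};

/* Hypothetical device state. */
struct btcoex_dev {
    bool mci_in_use;            /* true only when MCI is actually used */
    struct mci_state mci;
};

static void mci_flush_profiles(struct mci_state *mci)
{
    /* Precondition: the list was initialised. On a zero-filled list,
     * next is NULL and the walk below would dereference NULL. */
    assert(mci->profiles.next != NULL);

    struct list_node *pos = mci->profiles.next;
    mci->flushed = 0;
    while (pos != &mci->profiles) {
        struct list_node *next = pos->next;  /* save before unlinking */
        pos->next = NULL;
        pos->prev = NULL;
        mci->flushed++;
        pos = next;
    }
    node_list_init(&mci->profiles);          /* list is now empty */
}

/* The guard described in the commit: flush only if MCI is in use. */
static void stop_btcoex(struct btcoex_dev *dev)
{
    if (dev->mci_in_use)
        mci_flush_profiles(&dev->mci);
}
```

With the guard in place, stopping a zero-initialised non-MCI device skips the flush entirely, while an MCI device with queued profiles has its list emptied. Without the mci_in_use check, the non-MCI path would reach the uninitialised list.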
Comment 3 shafi 2012-05-14 15:05:13 UTC
Created attachment 73291
older fix
Comment 4 shafi 2012-05-14 15:06:10 UTC
Please let us know if this does not fix your issue. It would also be a good idea to test with the latest compat-wireless / wireless-testing tree.
