Created attachment 97511 [details] kernel log leading up to and including the failed state Starting with Debian Wheezy, install 3.9.0-rc5 (technically commit b6a9b7f6b1f21735a7456d534dc0e68e61359d2c since that's what was the most current upstream when I grabbed the git repo). Use the Wheezy kernel config as a starting point. Boot the kernel and let it run for a few hours (sometimes days for other kernel I've tried). In this particular instance I was away from home and there was only basic link maintenance going on (ie. no file transfers or anything like that). Eventually, the wireless connection drops and there are thousands of messages in the dmesg output. First a hundred or so of: Apr 5 11:00:44 debian kernel: [57208.372531] ath: phy0: Failed to stop TX DMA, queues=0x004! Followed by the repeating pattern of: Apr 5 17:52:52 debian kernel: [81964.567757] ath: phy0: Failed to wakeup in 500us Apr 5 17:52:52 debian kernel: [81964.639866] ath: phy0: Failed to stop TX DMA, queues=0x10f! Apr 5 17:52:52 debian kernel: [81964.651217] ath: phy0: DMA failed to stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff Apr 5 17:52:52 debian kernel: [81964.651255] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up Apr 5 17:52:52 debian kernel: [81964.765575] ath: phy0: Chip reset failed Apr 5 17:52:52 debian kernel: [81964.765579] ath: phy0: Unable to reset channel, reset status -22 Nothing terribly interesting about the wireless setup. It is managed by NetworkManager and connects to a WPA2-Personal network to an Apple Airport Extreme base station. The wireless card details from lspci look like garbage after the failure: 02:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev ff) 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Reran with ath9k.debug=0xffffffff and it spews out a massive amount of info. I'll attach an edited version of that run, but here I'll just highlight the few lines leading up to the failure: Apr 8 09:47:09 debian kernel: [ 7644.814981] ath: phy0: NETWORK SLEEP -> AWAKE Apr 8 09:47:09 debian kernel: [ 7644.815507] ath: phy0: AWAKE -> NETWORK SLEEP Apr 8 09:47:09 debian kernel: [ 7644.909517] ath: phy0: received PCI FATAL interrupt Apr 8 09:47:09 debian kernel: [ 7644.909522] ath: phy0: received PCI PERR interrupt Apr 8 09:47:09 debian kernel: [ 7644.909524] ath: phy0: AR_INTR_SYNC_LOCAL_TIMEOUT Apr 8 09:47:09 debian kernel: [ 7644.909528] ath: phy0: disable IER Apr 8 09:47:09 debian kernel: [ 7644.909625] ath: phy0: NETWORK SLEEP -> AWAKE Apr 8 09:47:09 debian kernel: [ 7644.920406] ath: phy0: Failed to wakeup in 500us Apr 8 09:47:09 debian kernel: [ 7644.920410] ------------[ cut here ]------------ Apr 8 09:47:09 debian kernel: [ 7644.920421] WARNING: at drivers/net/wireless/ath/ath9k/hw.c:2208 ath9k_hw_setpower+0x446/0x499 [ath9k_hw]() Apr 8 09:47:09 debian kernel: [ 7644.920422] Hardware name: Inspiron One 2020 Apr 8 09:47:09 debian kernel: [ 7644.920424] Modules linked in: bnep rfcomm binfmt_misc loop hid_generic usbhid hid usb_storage ath3k btusb bluetooth coretemp kvm_intel kvm ehci_pci ehci_hcd snd_hda_codec_realtek snd_hda_intel usbcore snd_hda_codec arc4 ath9k ath9k_common ath9k_hw i915 ath mac80211 drm_kms_helper drm i2c_i801 iTCO_wdt cfg80211 usb_common i2c_algo_bit iTCO_vendor_support lpc_ich mfd_core i2c_core acpi_cpufreq mperf snd_hwdep evdev psmouse crc32c_intel snd_pcm snd_page_alloc snd_timer snd ghash_clmulni_intel sparse_keymap rfkill wmi dcdbas serio_raw pcspkr cryptd processor video button soundcore microcode ext4 crc16 jbd2 mbcache sg sr_mod sd_mod cdrom crc_t10dif ata_generic thermal fan thermal_sys ata_piix libata r8169 mii scsi_mod Apr 8 09:47:09 debian kernel: [ 7644.920476] Pid: 0, comm: swapper/0 Not tainted 3.9.0-rc5ath+ #2 Apr 8 09:47:09 debian kernel: [ 7644.920477] Call Trace: Apr 8 09:47:09 debian kernel: [ 7644.920479] <IRQ> [<ffffffff8103d20c>] ? warn_slowpath_common+0x76/0x8c Apr 8 09:47:09 debian kernel: [ 7644.920492] [<ffffffffa03de1c8>] ? ath9k_hw_setpower+0x446/0x499 [ath9k_hw] Apr 8 09:47:09 debian kernel: [ 7644.920498] [<ffffffffa043dc63>] ? ath9k_ps_wakeup+0x4c/0xa9 [ath9k] Apr 8 09:47:09 debian kernel: [ 7644.920503] [<ffffffffa043f32e>] ? ath9k_tasklet+0x24/0x131 [ath9k] Apr 8 09:47:09 debian kernel: [ 7644.920507] [<ffffffff81043cff>] ? tasklet_action+0x73/0xc2 Apr 8 09:47:09 debian kernel: [ 7644.920510] [<ffffffff81043942>] ? __do_softirq+0xc3/0x1f8 Apr 8 09:47:09 debian kernel: [ 7644.920513] [<ffffffff81043b42>] ? irq_exit+0x3f/0x82 Apr 8 09:47:09 debian kernel: [ 7644.920517] [<ffffffff8102c301>] ? x2apic_send_IPI_allbutself+0x6/0x13 Apr 8 09:47:09 debian kernel: [ 7644.920520] [<ffffffff8100f701>] ? do_IRQ+0x81/0x97 Apr 8 09:47:09 debian kernel: [ 7644.920524] [<ffffffff8138512d>] ? common_interrupt+0x6d/0x6d Apr 8 09:47:09 debian kernel: [ 7644.920526] <EOI> [<ffffffff8129f1a0>] ? arch_local_irq_enable+0x4/0x8 Apr 8 09:47:09 debian kernel: [ 7644.920534] [<ffffffff8129f824>] ? cpuidle_wrap_enter+0x3c/0x71 Apr 8 09:47:09 debian kernel: [ 7644.920537] [<ffffffff8129f556>] ? cpuidle_enter_state+0xa/0x2f Apr 8 09:47:09 debian kernel: [ 7644.920540] [<ffffffff8129f624>] ? cpuidle_idle_call+0xa9/0xfb Apr 8 09:47:09 debian kernel: [ 7644.920544] [<ffffffff81014c79>] ? cpu_idle+0x9c/0xe6 Apr 8 09:47:09 debian kernel: [ 7644.920548] [<ffffffff816aed23>] ? start_kernel+0x3b8/0x3c3 Apr 8 09:47:09 debian kernel: [ 7644.920551] [<ffffffff816ae781>] ? repair_env_string+0x57/0x57 Apr 8 09:47:09 debian kernel: [ 7644.920554] [<ffffffff816ae59c>] ? x86_64_start_kernel+0xf2/0xfd Apr 8 09:47:09 debian kernel: [ 7644.920556] ---[ end trace 2ee6ffcecf2cb6d5 ]--- Apr 8 09:47:09 debian kernel: [ 7644.920610] ath: phy0: enable IER Apr 8 09:47:09 debian kernel: [ 7644.920616] ath: phy0: AR_IMR 0xffffffff IER 0xffffffff Apr 8 09:47:09 debian kernel: [ 7644.920696] ath: phy0: AWAKE -> NETWORK SLEEP Apr 8 09:47:09 debian kernel: [ 7644.920706] ath: phy0: NETWORK SLEEP -> AWAKE
Created attachment 97761 [details] failure with ath9k.debug=0xffffffff ... including approximately 50 lines before and after the failure
Created attachment 100061 [details] Synced up to 3.9-rc8 and saw a slightly different call trace around the failure Don't know if this matters, but this time with 3.9-rc8, the stack trace after the failed messages was slightly different: [14287.560144] ath: phy0: Failed to stop TX DMA, queues=0x005! [14287.574102] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02000020 DMADBG_7=0x0000a400 [14287.574137] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up [14287.574139] ------------[ cut here ]------------ [14287.574152] WARNING: at drivers/net/wireless/ath/ath9k/recv.c:487 ath_stoprecv+0xed/0x110 [ath9k]() [14287.574154] Hardware name: Inspiron One 2020 [14287.574155] Modules linked in: bnep rfcomm binfmt_misc loop hid_generic snd_hda_codec_realtek usbhid hid i915 usb_storage ath3k btusb snd_hda_intel snd_hda_codec bluetooth coretemp arc4 ath9k snd_hwdep kvm_intel snd_pcm kvm snd_page_alloc snd_timer snd ath9k_common ath9k_hw drm_kms_helper drm ath mac80211 cfg80211 ehci_pci soundcore ehci_hcd usbcore iTCO_wdt iTCO_vendor_support lpc_ich psmouse acpi_cpufreq sparse_keymap rfkill mfd_core serio_raw pcspkr evdev i2c_algo_bit dcdbas crc32c_intel mperf ghash_clmulni_intel cryptd i2c_i801 i2c_core processor usb_common video button microcode wmi ext4 crc16 jbd2 mbcache sg sr_mod sd_mod cdrom crc_t10dif ata_generic thermal fan thermal_sys ata_piix libata scsi_mod r8169 mii [14287.574216] Pid: 5903, comm: kworker/u:1 Not tainted 3.9.0-rc8ath+ #3 [14287.574217] Call Trace: [14287.574226] [<ffffffff8103d500>] ? warn_slowpath_common+0x76/0x8c [14287.574233] [<ffffffffa04557d1>] ? ath_stoprecv+0xed/0x110 [ath9k] [14287.574239] [<ffffffffa04512fc>] ? ath_prepare_reset+0x56/0x64 [ath9k] [14287.574245] [<ffffffffa045301d>] ? ath_reset_internal+0x8e/0x197 [ath9k] [14287.574250] [<ffffffffa045314c>] ? ath_reset+0x26/0x8a [ath9k] [14287.574254] [<ffffffff81053546>] ? process_one_work+0x19f/0x2b8 [14287.574258] [<ffffffff81053fed>] ? worker_thread+0x123/0x1ca [14287.574261] [<ffffffff81053eca>] ? manage_workers+0x219/0x219 [14287.574265] [<ffffffff81057d54>] ? kthread+0x81/0x89 [14287.574270] [<ffffffff81057cd3>] ? __kthread_parkme+0x5d/0x5d [14287.574273] [<ffffffff8138ad3c>] ? ret_from_fork+0x7c/0xb0 [14287.574277] [<ffffffff81057cd3>] ? __kthread_parkme+0x5d/0x5d [14287.574280] ---[ end trace cf68276854f9f65c ]--- [17324.273314] ath: phy0: Failed to stop TX DMA, queues=0x004!
After getting some help on ath9k-devel, it turns out that disabling power saving mode (iw dev wlan0 set power_save off) seems to have resolved the problem. System has been running for almost 5 days straight without the failure.
Relevant thread from ath9k-devel: https://lists.ath9k.org/pipermail/ath9k-devel/2013-May/011338.html
I've posted a patch disabling PowerSave: https://patchwork.kernel.org/patch/2646451/ This bug can be closed.