Bug 56301

Summary: AR9485 ath9k: wireless connection stops after hours of uptime when power saving mode enabled
Product: Drivers Reporter: Joel Diaz (hundred17)
Component: network-wirelessAssignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: ath9k-devel, linville, mcgrof, sujith
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.9.0-rc5 Subsystem:
Regression: No Bisected commit-id:
Attachments: kernel log leading up to and including the failed state
failure with ath9k.debug=0xffffffff ... including approximately 50 lines before and after the failure
Synced up to 3.9-rc8 and saw a slightly different call trace around the failure

Description Joel Diaz 2013-04-06 00:15:51 UTC
Created attachment 97511 [details]
kernel log leading up to and including the failed state

Starting with Debian Wheezy, install 3.9.0-rc5 (technically commit b6a9b7f6b1f21735a7456d534dc0e68e61359d2c since that's what was the most current upstream when I grabbed the git repo).  Use the Wheezy kernel config as a starting point.

Boot the kernel and let it run for a few hours (sometimes days for other kernel I've tried).  In this particular instance I was away from home and there was only basic link maintenance going on (ie. no file transfers or anything like that).  Eventually, the wireless connection drops and there are thousands of messages in the dmesg output.

First a hundred or so of: 

Apr  5 11:00:44 debian kernel: [57208.372531] ath: phy0: Failed to stop TX DMA, queues=0x004!

Followed by the repeating pattern of:

Apr  5 17:52:52 debian kernel: [81964.567757] ath: phy0: Failed to wakeup in 500us
Apr  5 17:52:52 debian kernel: [81964.639866] ath: phy0: Failed to stop TX DMA, queues=0x10f!
Apr  5 17:52:52 debian kernel: [81964.651217] ath: phy0: DMA failed to stop in 10 ms AR_CR=0xffffffff AR_DIAG_SW=0xffffffff DMADBG_7=0xffffffff
Apr  5 17:52:52 debian kernel: [81964.651255] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
Apr  5 17:52:52 debian kernel: [81964.765575] ath: phy0: Chip reset failed
Apr  5 17:52:52 debian kernel: [81964.765579] ath: phy0: Unable to reset channel, reset status -22

Nothing terribly interesting about the wireless setup.  It is managed by NetworkManager and connects to a WPA2-Personal network to an Apple Airport Extreme base station.

The wireless card details from lspci look like garbage after the failure:
02:00.0 Network controller: Atheros Communications Inc. AR9485 Wireless Network Adapter (rev ff)
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Comment 1 Joel Diaz 2013-04-08 23:09:15 UTC
Reran with ath9k.debug=0xffffffff and it spews out a massive amount of info.  I'll attach an edited version of that run, but here I'll just highlight the few lines leading up to the failure:

Apr  8 09:47:09 debian kernel: [ 7644.814981] ath: phy0: NETWORK SLEEP -> AWAKE
Apr  8 09:47:09 debian kernel: [ 7644.815507] ath: phy0: AWAKE -> NETWORK SLEEP
Apr  8 09:47:09 debian kernel: [ 7644.909517] ath: phy0: received PCI FATAL interrupt
Apr  8 09:47:09 debian kernel: [ 7644.909522] ath: phy0: received PCI PERR interrupt
Apr  8 09:47:09 debian kernel: [ 7644.909524] ath: phy0: AR_INTR_SYNC_LOCAL_TIMEOUT
Apr  8 09:47:09 debian kernel: [ 7644.909528] ath: phy0: disable IER
Apr  8 09:47:09 debian kernel: [ 7644.909625] ath: phy0: NETWORK SLEEP -> AWAKE
Apr  8 09:47:09 debian kernel: [ 7644.920406] ath: phy0: Failed to wakeup in 500us
Apr  8 09:47:09 debian kernel: [ 7644.920410] ------------[ cut here ]------------
Apr  8 09:47:09 debian kernel: [ 7644.920421] WARNING: at drivers/net/wireless/ath/ath9k/hw.c:2208 ath9k_hw_setpower+0x446/0x499 [ath9k_hw]()
Apr  8 09:47:09 debian kernel: [ 7644.920422] Hardware name: Inspiron One 2020
Apr  8 09:47:09 debian kernel: [ 7644.920424] Modules linked in: bnep rfcomm binfmt_misc loop hid_generic usbhid hid usb_storage ath3k btusb bluetooth coretemp kvm_intel kvm ehci_pci ehci_hcd snd_hda_codec_realtek snd_hda_intel usbcore snd_hda_codec arc4 ath9k ath9k_common ath9k_hw i915 ath mac80211 drm_kms_helper drm i2c_i801 iTCO_wdt cfg80211 usb_common i2c_algo_bit iTCO_vendor_support lpc_ich mfd_core i2c_core acpi_cpufreq mperf snd_hwdep evdev psmouse crc32c_intel snd_pcm snd_page_alloc snd_timer snd ghash_clmulni_intel sparse_keymap rfkill wmi dcdbas serio_raw pcspkr cryptd processor video button soundcore microcode ext4 crc16 jbd2 mbcache sg sr_mod sd_mod cdrom crc_t10dif ata_generic thermal fan thermal_sys ata_piix libata r8169 mii scsi_mod
Apr  8 09:47:09 debian kernel: [ 7644.920476] Pid: 0, comm: swapper/0 Not tainted 3.9.0-rc5ath+ #2
Apr  8 09:47:09 debian kernel: [ 7644.920477] Call Trace:
Apr  8 09:47:09 debian kernel: [ 7644.920479]  <IRQ>  [<ffffffff8103d20c>] ? warn_slowpath_common+0x76/0x8c
Apr  8 09:47:09 debian kernel: [ 7644.920492]  [<ffffffffa03de1c8>] ? ath9k_hw_setpower+0x446/0x499 [ath9k_hw]
Apr  8 09:47:09 debian kernel: [ 7644.920498]  [<ffffffffa043dc63>] ? ath9k_ps_wakeup+0x4c/0xa9 [ath9k]
Apr  8 09:47:09 debian kernel: [ 7644.920503]  [<ffffffffa043f32e>] ? ath9k_tasklet+0x24/0x131 [ath9k]
Apr  8 09:47:09 debian kernel: [ 7644.920507]  [<ffffffff81043cff>] ? tasklet_action+0x73/0xc2
Apr  8 09:47:09 debian kernel: [ 7644.920510]  [<ffffffff81043942>] ? __do_softirq+0xc3/0x1f8
Apr  8 09:47:09 debian kernel: [ 7644.920513]  [<ffffffff81043b42>] ? irq_exit+0x3f/0x82
Apr  8 09:47:09 debian kernel: [ 7644.920517]  [<ffffffff8102c301>] ? x2apic_send_IPI_allbutself+0x6/0x13
Apr  8 09:47:09 debian kernel: [ 7644.920520]  [<ffffffff8100f701>] ? do_IRQ+0x81/0x97
Apr  8 09:47:09 debian kernel: [ 7644.920524]  [<ffffffff8138512d>] ? common_interrupt+0x6d/0x6d
Apr  8 09:47:09 debian kernel: [ 7644.920526]  <EOI>  [<ffffffff8129f1a0>] ? arch_local_irq_enable+0x4/0x8
Apr  8 09:47:09 debian kernel: [ 7644.920534]  [<ffffffff8129f824>] ? cpuidle_wrap_enter+0x3c/0x71
Apr  8 09:47:09 debian kernel: [ 7644.920537]  [<ffffffff8129f556>] ? cpuidle_enter_state+0xa/0x2f
Apr  8 09:47:09 debian kernel: [ 7644.920540]  [<ffffffff8129f624>] ? cpuidle_idle_call+0xa9/0xfb
Apr  8 09:47:09 debian kernel: [ 7644.920544]  [<ffffffff81014c79>] ? cpu_idle+0x9c/0xe6
Apr  8 09:47:09 debian kernel: [ 7644.920548]  [<ffffffff816aed23>] ? start_kernel+0x3b8/0x3c3
Apr  8 09:47:09 debian kernel: [ 7644.920551]  [<ffffffff816ae781>] ? repair_env_string+0x57/0x57
Apr  8 09:47:09 debian kernel: [ 7644.920554]  [<ffffffff816ae59c>] ? x86_64_start_kernel+0xf2/0xfd
Apr  8 09:47:09 debian kernel: [ 7644.920556] ---[ end trace 2ee6ffcecf2cb6d5 ]---
Apr  8 09:47:09 debian kernel: [ 7644.920610] ath: phy0: enable IER
Apr  8 09:47:09 debian kernel: [ 7644.920616] ath: phy0: AR_IMR 0xffffffff IER 0xffffffff
Apr  8 09:47:09 debian kernel: [ 7644.920696] ath: phy0: AWAKE -> NETWORK SLEEP
Apr  8 09:47:09 debian kernel: [ 7644.920706] ath: phy0: NETWORK SLEEP -> AWAKE
Comment 2 Joel Diaz 2013-04-08 23:11:04 UTC
Created attachment 97761 [details]
failure with ath9k.debug=0xffffffff ... including approximately 50 lines before and after the failure
Comment 3 Joel Diaz 2013-04-27 12:29:19 UTC
Created attachment 100061 [details]
Synced up to 3.9-rc8 and saw a slightly different call trace around the failure

Don't know if this matters, but this time with 3.9-rc8, the stack trace after the failed messages was slightly different:

[14287.560144] ath: phy0: Failed to stop TX DMA, queues=0x005!
[14287.574102] ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02000020 DMADBG_7=0x0000a400
[14287.574137] ath: phy0: Could not stop RX, we could be confusing the DMA engine when we start RX up
[14287.574139] ------------[ cut here ]------------
[14287.574152] WARNING: at drivers/net/wireless/ath/ath9k/recv.c:487 ath_stoprecv+0xed/0x110 [ath9k]()
[14287.574154] Hardware name: Inspiron One 2020
[14287.574155] Modules linked in: bnep rfcomm binfmt_misc loop hid_generic snd_hda_codec_realtek usbhid hid i915 usb_storage ath3k btusb snd_hda_intel snd_hda_codec bluetooth coretemp arc4 ath9k snd_hwdep kvm_intel snd_pcm kvm snd_page_alloc snd_timer snd ath9k_common ath9k_hw drm_kms_helper drm ath mac80211 cfg80211 ehci_pci soundcore ehci_hcd usbcore iTCO_wdt iTCO_vendor_support lpc_ich psmouse acpi_cpufreq sparse_keymap rfkill mfd_core serio_raw pcspkr evdev i2c_algo_bit dcdbas crc32c_intel mperf ghash_clmulni_intel cryptd i2c_i801 i2c_core processor usb_common video button microcode wmi ext4 crc16 jbd2 mbcache sg sr_mod sd_mod cdrom crc_t10dif ata_generic thermal fan thermal_sys ata_piix libata scsi_mod r8169 mii
[14287.574216] Pid: 5903, comm: kworker/u:1 Not tainted 3.9.0-rc8ath+ #3
[14287.574217] Call Trace:
[14287.574226]  [<ffffffff8103d500>] ? warn_slowpath_common+0x76/0x8c
[14287.574233]  [<ffffffffa04557d1>] ? ath_stoprecv+0xed/0x110 [ath9k]
[14287.574239]  [<ffffffffa04512fc>] ? ath_prepare_reset+0x56/0x64 [ath9k]
[14287.574245]  [<ffffffffa045301d>] ? ath_reset_internal+0x8e/0x197 [ath9k]
[14287.574250]  [<ffffffffa045314c>] ? ath_reset+0x26/0x8a [ath9k]
[14287.574254]  [<ffffffff81053546>] ? process_one_work+0x19f/0x2b8
[14287.574258]  [<ffffffff81053fed>] ? worker_thread+0x123/0x1ca
[14287.574261]  [<ffffffff81053eca>] ? manage_workers+0x219/0x219
[14287.574265]  [<ffffffff81057d54>] ? kthread+0x81/0x89
[14287.574270]  [<ffffffff81057cd3>] ? __kthread_parkme+0x5d/0x5d
[14287.574273]  [<ffffffff8138ad3c>] ? ret_from_fork+0x7c/0xb0
[14287.574277]  [<ffffffff81057cd3>] ? __kthread_parkme+0x5d/0x5d
[14287.574280] ---[ end trace cf68276854f9f65c ]---
[17324.273314] ath: phy0: Failed to stop TX DMA, queues=0x004!
Comment 4 Joel Diaz 2013-05-13 21:09:18 UTC
After getting some help on ath9k-devel, it turns out that disabling power saving mode (iw dev wlan0 set power_save off) seems to have resolved the problem.  System has been running for almost 5 days straight without the failure.
Comment 5 Joel Diaz 2013-05-13 21:16:34 UTC
Relevant thread from ath9k-devel: https://lists.ath9k.org/pipermail/ath9k-devel/2013-May/011338.html
Comment 6 Sujith 2013-06-03 01:52:17 UTC
I've posted a patch disabling PowerSave: https://patchwork.kernel.org/patch/2646451/

This bug can be closed.