Bug 42673

Summary: ath9k: Fatal PCI error and 'failed to stop TX DMA'
Product: Networking
Reporter: Ben Greear (greearb)
Component: Wireless
Assignee: networking_wireless (networking_wireless)
Status: NEW
Severity: normal
CC: boris.ilpossente, greearb, kerneldotorg, linville, psyberbits, shafi.wireless, szg00000
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.0.18+
Subsystem:
Regression: No
Bisected commit-id:
Attachments: system configuration and crash log

Description Ben Greear 2012-01-27 21:33:43 UTC
This was reported by Sune Molgaard sune@molgaard.org on the ath9k-devel list.
Other folks are hitting similar problems, but maybe not for the same reason.

The PCI error below is caused by:
AR_INTR_SYNC_HOST1_FATAL, which evidently means:

"pissed off bus glue."
You'll get that error whenever the PCI/PCIe interface glue errors out.

"host1_fatal signal from PCI core during a DMA transfer"
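
For anyone triaging similar reports, here is a minimal Python sketch of how the two PCI-related bits of the sync-cause word could be tested. The bit values are written from memory of ath9k's reg.h and should be treated as assumptions to verify against your tree; `describe_sync_cause` is a hypothetical helper, not a driver function.

```python
# ASSUMED bit values (recalled from ath9k's reg.h, not taken from this report).
AR_INTR_SYNC_HOST1_FATAL = 0x00000020
AR_INTR_SYNC_HOST1_PERR = 0x00000040

# Names of the bits this sketch knows about, in ascending bit order.
PCI_SYNC_BITS = [
    (AR_INTR_SYNC_HOST1_FATAL, "AR_INTR_SYNC_HOST1_FATAL"),
    (AR_INTR_SYNC_HOST1_PERR, "AR_INTR_SYNC_HOST1_PERR"),
]


def describe_sync_cause(sync_cause):
    """Return the names of the PCI-related bits set in a sync-cause word."""
    return [name for bit, name in PCI_SYNC_BITS if sync_cause & bit]


# Example: a word with only the (assumed) HOST1_FATAL bit set.
print(describe_sync_cause(0x00000020))
```

If the assumed values are right, a word with the HOST1_FATAL bit set corresponds to the "received PCI FATAL interrupt" message in the log.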



The kernel is Ben Greear's 3.0.18+ kernel, with no significant changes to
ath9k. The problem has also been reported with official kernels.


Jan 27 19:06:38 jadis kernel: ath: received PCI FATAL interrupt
Jan 27 19:06:38 jadis kernel: ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x00006040
Jan 27 19:06:38 jadis kernel: ath: Could not stop RX, we could be confusing the DMA engine when we start RX up
Jan 27 19:06:38 jadis kernel: ------------[ cut here ]------------
Jan 27 19:06:38 jadis kernel: WARNING: at drivers/net/wireless/ath/ath9k/recv.c:528 ath_stoprecv+0xbc/0xe0 [ath9k]()
Jan 27 19:06:38 jadis kernel: Hardware name: 8366-8233
Jan 27 19:06:38 jadis kernel: Modules linked in: cryptd aes_i586 aes_generic ip6table_filter tun xt_CHECKSUM xt_TCPMSS act_police cls_flow cls_fw cls_u32 
sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_time xt_connlimit xt_realm xt_addrtype iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REDIRECT 
ipt_NETMAP ipt_MASQUERADE ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp 
nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip 
nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc 
nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_tproxy_core ip6_tables nf_defrag_ipv6 xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log 
xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt
Jan 27 19:06:38 jadis kernel: _connmark xt_CLASSIFY xt_AUDIT ipt_LOG iptable_nat nf_nat iptable_mangle nfnetlink sit tunnel4 reiserfs bridge ipv6 stp llc arc4 
snd_cs4236 snd_wss_lib snd_via82xx snd_ac97_codec snd_opl3_lib ppdev snd_hwdep ac97_bus ath9k snd_pcm mac80211 snd_timer ath9k_common ath9k_hw snd_page_alloc 
snd_mpu401 i2c_viapro snd_mpu401_uart snd_rawmidi ath via_ircc snd_seq_device irda snd cfg80211 soundcore crc_ccitt ns558 gameport parport_pc hwmon_vid hwmon lp 
parport 8139too ata_generic pata_acpi 8139cp sundance pata_via sata_sil mii floppy raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy 
async_tx raid1 raid0 multipath linear
Jan 27 19:06:38 jadis kernel: Pid: 4298, comm: java Not tainted 3.0.18-sune-1 #3
Jan 27 19:06:38 jadis kernel: Call Trace:
Jan 27 19:06:38 jadis kernel: [<c0431086>] warn_slowpath_common+0x63/0x78
Jan 27 19:06:38 jadis kernel: [<f851eaab>] ? ath_stoprecv+0xbc/0xe0 [ath9k]
Jan 27 19:06:38 jadis kernel: [<c04310aa>] warn_slowpath_null+0xf/0x13
Jan 27 19:06:38 jadis kernel: [<f851eaab>] ath_stoprecv+0xbc/0xe0 [ath9k]
Jan 27 19:06:38 jadis kernel: [<f851d782>] ath_reset+0x64/0x17c [ath9k]
Jan 27 19:06:38 jadis kernel: [<c07771e1>] ? _raw_spin_unlock_irqrestore+0x12/0x15
Jan 27 19:06:38 jadis kernel: [<f851d9ef>] ath9k_tasklet+0x27/0x133 [ath9k]
Jan 27 19:06:38 jadis kernel: [<c0435308>] tasklet_action+0x65/0xa7
Jan 27 19:06:38 jadis kernel: [<c04355e0>] __do_softirq+0x6c/0xe6
Jan 27 19:06:38 jadis kernel: [<c0435574>] ? local_bh_enable+0xa/0xa
Jan 27 19:06:38 jadis kernel: <IRQ>  [<c04357b8>] ? irq_exit+0x35/0x84
Jan 27 19:06:38 jadis kernel: [<c0403a78>] ? do_IRQ+0x6c/0x80
Jan 27 19:06:38 jadis kernel: [<c077be69>] ? common_interrupt+0x29/0x30
Jan 27 19:06:38 jadis kernel: ---[ end trace 46bdcea34024800a ]---
Jan 27 19:06:38 jadis kernel: ath: Failed to stop TX DMA!
Jan 27 19:06:38 jadis kernel: ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x00006040
Jan 27 19:06:38 jadis kernel: ath: Could not stop RX, we could be confusing the DMA engine when we start RX up
Jan 27 19:06:38 jadis kernel: ath: Failed to stop TX DMA!
....
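
To compare register dumps across reports, the "DMA failed to stop" lines can be parsed mechanically. The following is a rough Python sketch; `parse_dma_regs` and `diff_regs` are hypothetical helper names, and the sketch only extracts and XORs the values without assuming what any individual bit means.

```python
import re

# Matches NAME=0xHEX pairs such as AR_CR=0x00000024.
REG_RE = re.compile(r"\b([A-Z][A-Z0-9_]*)=0x([0-9a-fA-F]+)")


def parse_dma_regs(line):
    """Return {register_name: int_value} for one 'DMA failed to stop' line."""
    return {name: int(value, 16) for name, value in REG_RE.findall(line)}


def diff_regs(a, b):
    """Return {name: a ^ b} for registers present in both dumps that differ."""
    return {n: a[n] ^ b[n] for n in a if n in b and a[n] != b[n]}


# Line from the log above, and the corresponding line from the later
# AR9280 report in this bug.
first = parse_dma_regs(
    "ath: DMA failed to stop in 10 ms AR_CR=0x00000024 "
    "AR_DIAG_SW=0x42000020 DMADBG_7=0x00006040"
)
second = parse_dma_regs(
    "ath: DMA failed to stop in 10 ms AR_CR=0x00000024 "
    "AR_DIAG_SW=0x42000020 DMADBG_7=0x000062c0"
)
for name, bits in diff_regs(first, second).items():
    print(f"{name} differs in bits 0x{bits:08x}")
```

For the two dumps quoted in this bug, AR_CR and AR_DIAG_SW are identical and only DMADBG_7 differs.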


The system runs a slowish AMD processor and 32-bit Ubuntu 11.


Atheros NIC:

00:06.0 Network controller: Atheros Communications Inc. AR922X Wireless Network Adapter (rev 01)
	Subsystem: D-Link System Inc Device 3a7c
	Flags: bus master, 66MHz, medium devsel, latency 168, IRQ 17
	Memory at e3000000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: ath9k
	Kernel modules: ath9k


The original report, which includes the full dmesg and some other logs, can be found here:
https://lists.ath9k.org/pipermail/ath9k-devel/2012-January/007891.html

Other lspci info:

00:00.0 Host bridge: VIA Technologies, Inc. VT8375 [KM266/KL266] Host Bridge
	Subsystem: VIA Technologies, Inc. VT8375 [KM266/KL266] Host Bridge
	Flags: bus master, 66MHz, medium devsel, latency 8
	Memory at d0000000 (32-bit, prefetchable) [size=128M]
	Capabilities: <access denied>
	Kernel driver in use: agpgart-via

00:01.0 PCI bridge: VIA Technologies, Inc. VT8633 [Apollo Pro266 AGP] (prog-if 00 [Normal decode])
	Flags: bus master, 66MHz, medium devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Memory behind bridge: e0000000-e1ffffff
	Prefetchable memory behind bridge: d8000000-dfffffff
	Capabilities: <access denied>

00:05.0 RAID bus controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
	Subsystem: Silicon Image, Inc. Device 7114
	Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 16
	I/O ports at 9000 [size=8]
	I/O ports at 9400 [size=4]
	I/O ports at 9800 [size=8]
	I/O ports at 9c00 [size=4]
	I/O ports at a000 [size=16]
	Memory at e3010000 (32-bit, non-prefetchable) [size=1K]
	[virtual] Expansion ROM at 80000000 [disabled] [size=512K]
	Capabilities: <access denied>
	Kernel driver in use: sata_sil
	Kernel modules: sata_sil


00:07.0 Ethernet controller: Sundance Technology Inc / IC Plus Corp IC Plus IP100A Integrated 10/100 Ethernet MAC + PHY (rev 31)
	Subsystem: Sundance Technology Inc / IC Plus Corp Device 0201
	Flags: bus master, medium devsel, latency 32, IRQ 18
	I/O ports at a400 [size=128]
	Memory at e3011000 (32-bit, non-prefetchable) [size=512]
	[virtual] Expansion ROM at 80080000 [disabled] [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: sundance
	Kernel modules: sundance

00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
	Flags: bus master, medium devsel, latency 32, IRQ 17
	I/O ports at a800 [size=256]
	Memory at e3012000 (32-bit, non-prefetchable) [size=256]
	[virtual] Expansion ROM at 80090000 [disabled] [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: 8139too
	Kernel modules: 8139too, 8139cp

00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
	Subsystem: VIA Technologies, Inc. Device 3074
	Flags: bus master, stepping, medium devsel, latency 0
	Capabilities: <access denied>
	Kernel modules: i2c-viapro, via-ircc

00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
	Subsystem: VIA Technologies, Inc. VT82C586/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE
	Flags: bus master, medium devsel, latency 32, IRQ 23
	[virtual] Memory at 000001f0 (32-bit, non-prefetchable) [size=8]
	[virtual] Memory at 000003f0 (type 3, non-prefetchable) [size=1]
	[virtual] Memory at 00000170 (32-bit, non-prefetchable) [size=8]
	[virtual] Memory at 00000370 (type 3, non-prefetchable) [size=1]
	I/O ports at ac00 [size=16]
	Capabilities: <access denied>
	Kernel driver in use: pata_via
	Kernel modules: pata_via

00:11.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])
	Subsystem: First International Computer, Inc. VA-502 Mainboard
	Flags: bus master, medium devsel, latency 32, IRQ 21
	I/O ports at b000 [size=32]
	Capabilities: <access denied>
	Kernel driver in use: uhci_hcd

00:11.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) (prog-if 00 [UHCI])
	Subsystem: First International Computer, Inc. VA-502 Mainboard
	Flags: bus master, medium devsel, latency 32, IRQ 21
	I/O ports at b400 [size=32]
	Capabilities: <access denied>
	Kernel driver in use: uhci_hcd

00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 40)
	Subsystem: Micro-Star International Co., Ltd. Device 3900
	Flags: medium devsel, IRQ 22
	I/O ports at b800 [size=256]
	Capabilities: <access denied>
	Kernel driver in use: VIA 82xx Audio
	Kernel modules: snd-via82xx

01:00.0 VGA compatible controller: S3 Inc. VT8375 [ProSavage8 KM266/KL266] (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. Device 3908
	Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 11
	Memory at e1000000 (32-bit, non-prefetchable) [size=512K]
	Memory at d8000000 (32-bit, prefetchable) [size=128M]
	[virtual] Expansion ROM at e0000000 [disabled] [size=64K]
	Capabilities: <access denied>
	Kernel modules: savagefb
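
In the listing above, the AR922X at 00:06.0 and the RTL-8139 at 00:0e.0 both report IRQ 17. That is not offered as the cause of this bug, but shared interrupt lines are worth noting when comparing affected systems. Below is a small Python sketch that groups devices by IRQ from `lspci -v` style text; the helper names are hypothetical, and it assumes slot addresses without a PCI domain prefix, as in this report.

```python
import re
from collections import defaultdict

# Device header lines start at column 0, e.g. "00:06.0 Network controller: ...".
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
IRQ_RE = re.compile(r"\bIRQ (\d+)")


def devices_by_irq(lspci_text):
    """Map IRQ number -> list of device slots, based on each 'Flags:' line."""
    result = defaultdict(list)
    slot = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            slot = header.group(1)
            continue
        if slot is not None and line.lstrip().startswith("Flags:"):
            irq = IRQ_RE.search(line)
            if irq:
                result[int(irq.group(1))].append(slot)
    return dict(result)


def shared_irqs(lspci_text):
    """Return only the IRQs that more than one device reports."""
    return {irq: slots
            for irq, slots in devices_by_irq(lspci_text).items()
            if len(slots) > 1}
```

Calling `shared_irqs()` on the saved output of `lspci -v` returns a dictionary such as `{17: [...]}` listing the slots on each shared line.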
Comment 1 Ben Greear 2012-01-30 17:44:52 UTC
Paul Farrow reported a similar problem, but this time there are no obvious PCI errors.  His NIC is:

Apple Macbook Airport A1181, AR9280 AR5BXB92 300m N card

The PC is a Jetway NF93R with an Intel Core 2 Duo P8700 @ 2.53 GHz; the PCI slot is empty and the AR9280 card is in the PCIe slot.

OS is 64-bit Fedora 16.

The NIC is used in AP mode.


ath: Failed to stop TX DMA!
ath: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x42000020 DMADBG_7=0x000062c0
ath: Could not stop RX, we could be confusing the DMA engine when we start RX up
------------[ cut here ]------------
WARNING: at drivers/net/wireless/ath/ath9k/recv.c:528 ath_stoprecv+0xd5/0xfb [ath9k]()
Hardware name: OEM
Modules linked in: cryptd aes_x86_64 aes_generic ipt_LOG iptable_nat nf_nat lockd nf_conntrack_ftp ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 f71882fg coretemp hwmon ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm arc4 ath9k mac80211 snd_timer snd ath9k_common ath9k_hw r8169 soundcore snd_page_alloc ath cfg80211 iTCO_wdt i2c_i801 mii pcspkr iTCO_vendor_support microcode joydev serio_raw sunrpc ipv6 autofs4 ata_generic pata_acpi pata_jmicron i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: scsi_wait_scan]
Pid: 0, comm: kworker/0:0 Not tainted 3.0.17+ #2
Call Trace:
 <IRQ>  [<ffffffff810343b2>] warn_slowpath_common+0x7e/0x96
 [<ffffffff810343df>] warn_slowpath_null+0x15/0x17
 [<ffffffffa024e90a>] ath_stoprecv+0xd5/0xfb [ath9k]
 [<ffffffffa024d24f>] ath_reset+0x7e/0x1cb [ath9k]
 [<ffffffffa02485de>] ath_beacon_tasklet+0xe9/0x73c [ath9k]
 [<ffffffff812d45be>] ? uhci_free_td+0x8b/0x90
 [<ffffffffa02499cb>] ? ath9k_ioread32+0x64/0x71 [ath9k]
 [<ffffffffa01ace84>] ? ar9002_hw_get_isr+0x181/0x3cd [ath9k_hw]
 [<ffffffff810397fb>] ? __tasklet_schedule+0x46/0x4b
 [<ffffffffa024b03f>] ? tasklet_schedule+0x15/0x17 [ath9k]
 [<ffffffffa024c8e8>] ? ath_isr+0x1bb/0x1e5 [ath9k]
 [<ffffffff81039018>] tasklet_action+0x7a/0xcb
 [<ffffffff81039370>] __do_softirq+0x89/0x115
 [<ffffffff81077f92>] ? handle_irq_event+0x47/0x5d
 [<ffffffff813f34dc>] call_softirq+0x1c/0x26
 [<ffffffff810038d7>] do_softirq+0x41/0x7e
 [<ffffffff810395ad>] irq_exit+0x44/0x9e
 [<ffffffff810035f4>] do_IRQ+0x89/0xa0
 [<ffffffff813ed4d3>] common_interrupt+0x13/0x13
 <EOI>  [<ffffffff8100816c>] ? mwait_idle+0x5a/0x63
 [<ffffffff8100815f>] ? mwait_idle+0x4d/0x63
 [<ffffffff810011dc>] cpu_idle+0x58/0x93
 [<ffffffff813def4e>] start_secondary+0x190/0x195
---[ end trace 036819f004727a68 ]---

lspci info:

00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M-E LPC Interface Controller (rev 03)
00:1f.2 IDE interface: Intel Corporation ICH9M/M-E 2 port SATA IDE Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
00:1f.5 IDE interface: Intel Corporation ICH9M/M-E 2 port SATA IDE Controller (rev 03)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
03:00.0 Network controller: Atheros Communications Inc. AR928X Wireless Network Adapter (PCI-Express) (rev 01)
04:00.0 IDE interface: JMicron Technology Corp. JMB368 IDE controller



hostapd-conf:

logger_syslog=-1
logger_syslog_level=2
ctrl_interface=/var/run/hostapd
ctrl_interface_group=0

# Some usable default settings...
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0 
# Uncomment these for base WPA & WPA2 support with a pre-shared key
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP

# DO NOT FORGET TO SET A WPA PASSPHRASE!!
wpa_passphrase=somepassword

# Most modern wireless drivers in the kernel need driver=nl80211
driver=nl80211


wme_enabled=1
ieee80211n=1
ht_capab=[HT40+][SHORT-GI-40][DSSS_CCK-40]

# Customize these for your local configuration...
interface=wlan0
hw_mode=g
channel=1
ssid=The Planetarium
own_ip_addr=192.168.1.15
Comment 2 boris_il_forte 2013-10-19 22:14:32 UTC
This bug is still an issue on the 3.11 kernel.

I'm using the Debian experimental amd64 kernel.


lspci -vv:

04:00.0 Network controller: Qualcomm Atheros AR928X Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Foxconn International, Inc. T77H047.31 802.11bgn Wireless Half-size Mini PCIe Card [AR9283]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 17
        Region 0: Memory at f0500000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: ath9k

This bug causes several issues in my case:
- kernel panic at startup
- IRQ 17 gets disabled, often followed by a system freeze
- ath9k crashes at startup or randomly after a while
- the module fails to reload with modprobe after removal


I've also tried disabling ACPI to rule out a power management issue, and
loading the driver with nohwcrypt=1; neither helps.

This bug behaves nondeterministically: sometimes the driver works fine for a long time, even under network stress; sometimes it causes a kernel panic at startup; and sometimes it cannot be loaded at all.

I can provide more information if you tell me what you need.

In the meantime, I will attach some basic system configuration information and the dmesg output of the error.
Comment 3 boris_il_forte 2013-10-19 22:17:26 UTC
Created attachment 111641
system configuration and crash log

Basic information about the configuration of the system where the bug occurred:

lspci
lsmod
dmesg
Comment 4 oblique 2015-05-09 18:17:18 UTC
Hi,

I use OpenWRT with kernel 3.18.9 and this is still an issue [1].
A DD-WRT user managed to find a way to reproduce the problem reliably [2].

It looks like, if a client is connected on a 2.4 GHz channel with HT40 and another client that is HT40 intolerant (e.g. Mac OS X, iOS) then tries to connect, the transition from HT40 to HT20 is not done correctly and the AP becomes unusable.

To recover from this I have to reboot the router. As a workaround I configured my router to create the AP with HT20 only.

[1] https://dev.openwrt.org/ticket/11862
[2] http://svn.dd-wrt.com/ticket/2952#comment:110
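
For setups driven by a plain hostapd configuration like the one in comment 1, the HT20-only workaround could be sketched as below. This is a Python sketch under stated assumptions: it assumes that removing the `[HT40+]`/`[HT40-]` tokens from `ht_capab` is what restricts hostapd to 20 MHz operation, `force_ht20` is a hypothetical helper, and OpenWRT generates its hostapd configuration from its own settings, so this is only illustrative there. Other 40 MHz related flags such as `[SHORT-GI-40]` are left untouched and may be worth reviewing separately.

```python
import re

# Tokens that request a 40 MHz channel in hostapd's ht_capab.
# ASSUMPTION: with neither token present the AP stays at 20 MHz (HT20).
HT40_TOKEN_RE = re.compile(r"\[HT40[+-]\]")


def force_ht20(conf_text):
    """Return conf_text with [HT40+]/[HT40-] removed from any ht_capab line."""
    out = []
    for line in conf_text.splitlines(keepends=True):
        if line.startswith("ht_capab="):
            key, sep, value = line.partition("=")
            line = key + sep + HT40_TOKEN_RE.sub("", value)
        out.append(line)
    return "".join(out)


# Example using the relevant lines of the configuration from comment 1.
example = "ieee80211n=1\nht_capab=[HT40+][SHORT-GI-40][DSSS_CCK-40]\nchannel=1\n"
print(force_ht20(example), end="")
```

All other lines of the configuration, including comments and blank lines, pass through unchanged.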