Bug 119151

Summary: [regression] ath10k no longer authenticates and freezes system
Product: Drivers
Component: network-wireless
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.7-rc0
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Mike Lothian (mike)
Assignee: drivers_network-wireless (drivers_network-wireless)
CC: kvalo, linville, regressions
Attachments: cal-pci-0000:3c:00.0.bin, journalctl output

Description Mike Lothian 2016-05-27 12:36:11 UTC
Since the updates for 4.7 landed in Linus's tree, my system freezes and I get a continuous stream of errors in my logs. I always have to hard-reset my machine.

I'll attach the firmware I'm using too.

3c:00.0 Network controller [0280]: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter [168c:003e] (rev 32)
Comment 1 Mike Lothian 2016-05-27 12:40:58 UTC
Created attachment 217881

The calibration binary
Comment 2 Mike Lothian 2016-05-27 12:42:05 UTC
Created attachment 217891

The board.bin
Comment 3 Mike Lothian 2016-05-27 12:42:56 UTC
Created attachment 217901

The firmware-5.bin file
Comment 4 Mike Lothian 2016-05-27 12:43:50 UTC
I haven't been able to capture the output so far: I can't ssh in, and the system freezes before I can save anything. I'll keep trying.
Comment 5 Mike Lothian 2016-05-27 13:29:56 UTC
Hmm, if I compile ath10k as a module rather than building it into my kernel, it now works:

fireburn@axion ~ $ dmesg | grep ath
[    0.000000] Kernel command line: root=/dev/nvme0n1p2 usbcore.autosuspend=1 rootfstype=ext4 libahci.ignore_sss=1 init=/usr/lib/systemd/systemd ath10k_core.skip_otp=y amdgpu.runpm=1 amdgpu.powerplay=1 ignore_loglevel 
[    1.189693] ath10k_pci 0000:3c:00.0: enabling device (0000 -> 0002)
[    1.190562] ath10k_pci 0000:3c:00.0: pci irq msi-x interrupts 8 irq_mode 0 reset_mode 0
[    1.391761] ath10k_pci 0000:3c:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:1535
[    1.391766] ath10k_pci 0000:3c:00.0: kconfig debug 1 debugfs 0 tracing 0 dfs 0 testmode 0
[    1.392187] ath10k_pci 0000:3c:00.0: firmware ver WLAN.RM.2.0-00180-QCARMSWPZ-1 api 5 features wowlan,ignore-otp,no-4addr-pad crc32 75dee6c5
[    1.454222] ath10k_pci 0000:3c:00.0: Direct firmware load for ath10k/QCA6174/hw3.0/board-2.bin failed with error -2
[    1.454229] ath10k_pci 0000:3c:00.0: board_file api 1 bmi_id N/A crc32 ed5f849a
[    3.508537] ath10k_pci 0000:3c:00.0: htt-ver 3.26 wmi-op 4 htt-op 3 cal file max-sta 32 raw 0 hwcrypto 1
[    3.566046] ath: EEPROM regdomain: 0x0
[    3.566047] ath: EEPROM indicates default country code should be used
[    3.566047] ath: doing EEPROM country->regdmn map search
[    3.566048] ath: country maps to regdmn code: 0x3a
[    3.566048] ath: Country alpha2 being used: US
[    3.566048] ath: Regpair used: 0x3a

Any idea why?
Comment 6 Mike Lothian 2016-05-27 13:39:18 UTC
Sorry, that was my old 4.6 kernel that booted up, with the firmware still compiled in.

The system still freezes with 4.7-rc0 even when ath10k is built as a module; it doesn't freeze if ath10k isn't built at all.
Comment 7 Mike Lothian 2016-05-30 20:15:34 UTC
Created attachment 218281
journalctl output

I managed to capture this by loading the modules manually with insmod.
Comment 8 Mike Lothian 2016-05-31 22:31:43 UTC
I managed to bisect using:

git bisect start net/mac80211/ drivers/net/wireless/ath/ath10k/
git bisect bad
git bisect good v4.6

Most of the builds seem to come from 4.5-rc5

And this was the one pinpointed:

5c86d97bcc1d42ce7f75685a61be4dad34ee8183 is the first bad commit
commit 5c86d97bcc1d42ce7f75685a61be4dad34ee8183
Author: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Date:   Tue Mar 22 17:22:19 2016 +0530

ath10k: combine txrx and replenish task

Since tx completion and rx indication processing are moved out
of txrx tasklet and rx ring lock contention also removed from txrx
for rx_ind messages, it would be efficient to combine both replenish
and txrx tasks. Refill threshold is adjusted for both AP135 and AP148
(low and high end systems). With this adjustment in AP135, TCP DL is
improved from 603 Mbps to 620 Mbps and UDP DL is improved from 758 Mbps
to 803 Mbps. Also no watchdog are observed on UDP BiDi.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

:040000 040000 4b4efb5e5fbd3690e87aced0c4460cd70d446c0e 048e41895952e18d09703fdd24b45d58699c1ac4 M      drivers

Reverting this commit gets everything working again :D
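
The path-limited bisect and the revert described in this comment can be reproduced in miniature. What follows is only a sketch on a throwaway repository, not the kernel tree: the file names `state` and `notes`, the tag `good-release`, the helper `commit`, and the `grep` check standing in for "does the system freeze" are all assumptions made for illustration.

```shell
# Sketch only: a scratch repository stands in for the kernel tree.
# File names, tag name and the grep-based test are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"

drv=drivers/net/wireless/ath/ath10k
mkdir -p "$drv" docs

# commit FILE CONTENT MESSAGE: write one file and commit it.
commit() {
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit "$drv/state" ok     "driver: baseline"
git tag good-release                    # plays the role of v4.6
commit docs/readme  one    "docs: unrelated change"
commit "$drv/notes" first  "driver: harmless change"
commit "$drv/state" broken "driver: introduce the regression"
culprit=$(git rev-parse HEAD)           # remembered only to check the result
commit docs/readme  two    "docs: another unrelated change"
commit "$drv/notes" second "driver: later harmless change"

# Limit the bisect to commits touching the driver directory, as in the
# report. Arguments: bad revision, good revision, then the paths.
git bisect start HEAD good-release -- "$drv/"

# bisect run treats exit status 0 as good and 1 as bad.
git bisect run grep -q ok "$drv/state"

# Once the search ends, refs/bisect/bad points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset

# Revert the culprit on top of the current branch and re-run the check.
git revert --no-edit "$first_bad" >/dev/null
if grep -q ok "$drv/state"; then after_revert=good; else after_revert=bad; fi

echo "first bad commit: $first_bad"
echo "after revert: $after_revert"
```

Passing paths to `git bisect start` makes git consider only commits that touch those paths, which keeps the number of kernel builds down when the suspect area is already known.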
Comment 9 Kalle Valo 2016-06-02 13:58:07 UTC
Thanks for the bisect, I reported the issue on the list:


Mike, can I CC you on the thread? I was not sure if it's ok to use your email or not.
Comment 10 Mike Lothian 2016-06-02 14:11:55 UTC
That's fine, saves me subscribing.
Comment 11 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-06-17 10:33:57 UTC
@Kalle, @Mike: What's the latest status of this bug? I see no recent ath10k-commits in mainline, so I assume this is still unfixed.
 Sincerely, your regression tracker for Linux 4.7 (http://bit.ly/28JRmJo )
Comment 12 Kalle Valo 2016-06-20 14:56:19 UTC
The fix is currently in wireless-drivers.git; from there it will go to net.git and then to Linus' tree.

ath10k: fix deadlock while processing rx_in_ord_ind
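
To see which tree a fix has reached, one option is to search a branch's history for the commit subject. The sketch below does that in a throwaway repository. The helper name `has_fix` and the branch names `wireless-demo` and `mainline-demo` are made up for illustration and do not correspond to real refs.

```shell
# has_fix BRANCH SUBJECT: succeed if a commit reachable from BRANCH
# contains SUBJECT in its message (literal match, not a regex).
has_fix() {
    [ -n "$(git log --oneline --fixed-strings --grep="$2" "$1" --)" ]
}

subject="ath10k: fix deadlock while processing rx_in_ord_ind"

# Scratch repository with two hypothetical branches.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"

git commit -q --allow-empty -m "baseline"
git branch mainline-demo                # stays at the baseline, no fix yet
git checkout -q -b wireless-demo
git commit -q --allow-empty -m "$subject"

if has_fix wireless-demo "$subject"; then in_wireless=yes; else in_wireless=no; fi
if has_fix mainline-demo "$subject"; then in_mainline=yes; else in_mainline=no; fi

echo "wireless-demo: $in_wireless, mainline-demo: $in_mainline"
```

Against a real kernel checkout, the same helper could be pointed at whichever remote-tracking branches are configured locally.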

Comment 13 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-07-02 06:07:38 UTC
The fix is in mainline now. Can somebody close this please? I'm lacking permission to do so :-/