Bug 217549

Summary: Dell XPS 13 ath10k_pci firmware crashed!
Product: Drivers
Reporter: Garry Williams (gtwilliams)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED INVALID
Severity: normal
CC: bagasdotme, gtwilliams, pmenzel+bugzilla.kernel.org
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Fedora patch that I applied to 6.3.3 to reproduce my error

Description Garry Williams 2023-06-14 00:39:33 UTC
Beginning with kernel 6.2.15-300.fc38.x86_64 and continuing through 6.3.7-200.fc38.x86_64, the wifi connection fails periodically with these log messages:

ath10k_pci 0000:02:00.0: firmware crashed! (guid 6c545da0-593c-4a0e-b5ad-3ef2b91cdebf)
ath10k_pci 0000:02:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:143a
ath10k_pci 0000:02:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp crc32 bf907c7c
ath10k_pci 0000:02:00.0: board_file api 2 bmi_id N/A crc32 d2863f91
ath10k_pci 0000:02:00.0: htt-ver 3.87 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
ath10k_pci 0000:02:00.0: failed to get memcpy hi address for firmware address 4: -16
ath10k_pci 0000:02:00.0: failed to read firmware dump area: -16
ath10k_pci 0000:02:00.0: Copy Engine register dump:
ath10k_pci 0000:02:00.0: [00]: 0x00034400  12  12   3   3
ath10k_pci 0000:02:00.0: [01]: 0x00034800  14  14 347 348
ath10k_pci 0000:02:00.0: [02]: 0x00034c00   8   2   0   1
ath10k_pci 0000:02:00.0: [03]: 0x00035000  16  15  16  14
ath10k_pci 0000:02:00.0: [04]: 0x00035400 2995 2987  22 214
ath10k_pci 0000:02:00.0: [05]: 0x00035800   0   0  64   0
ath10k_pci 0000:02:00.0: [06]: 0x00035c00   0   0  18  18
ath10k_pci 0000:02:00.0: [07]: 0x00036000   1   1   1   0
ath10k_pci 0000:02:00.0: could not request stats (-108)
ath10k_pci 0000:02:00.0: could not request peer stats info: -108
ath10k_pci 0000:02:00.0: failed to read hi_board_data address: -28
ieee80211 phy0: Hardware restart was requested
ath10k_pci 0000:02:00.0: could not request stats (-108)
ath10k_pci 0000:02:00.0: device successfully recovered


If I disconnect and reconnect using NetworkManager, the connection is restored.  However, the same failure recurs repeatedly, at intervals ranging from a few minutes to a few hours.

This is a regression.  The error did not occur with any kernel up to and including 6.2.14-300.fc38.x86_64.
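
Note on the error codes in the log above: the trailing negative numbers (-16, -108, -28) look like negated Linux errno values, which is the usual kernel convention, though the report itself does not say so. The following is a minimal sketch under that assumption. The function name `decode_errors` and the small lookup table are illustrative only and are not part of ath10k or any kernel tool.

```python
import re

# Linux errno values for the codes seen in the log (assumed to be negated errnos).
LINUX_ERRNO = {
    16: ("EBUSY", "Device or resource busy"),
    28: ("ENOSPC", "No space left on device"),
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
}

# Matches a trailing negative code written either as ": -16" or as "(-108)".
ERR_RE = re.compile(r"(?::\s*|\()-(\d+)\)?\s*$")


def decode_errors(log_text):
    """Return (errno_name, log_line) pairs for lines that end in a negative code."""
    results = []
    for line in log_text.splitlines():
        match = ERR_RE.search(line)
        if match:
            code = int(match.group(1))
            # Fall back to the raw number for codes missing from the table.
            name, _description = LINUX_ERRNO.get(code, (f"errno {code}", ""))
            results.append((name, line.strip()))
    return results
```

Run against the log above, this would tag the "failed to get memcpy hi address" and "failed to read firmware dump area" lines as EBUSY, the "could not request stats" lines as ESHUTDOWN, and the "failed to read hi_board_data address" line as ENOSPC. The Copy Engine register dump lines are ignored because they do not end in a negative number.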
Comment 1 Bagas Sanjaya 2023-06-14 00:53:56 UTC
(In reply to Garry Williams from comment #0)
Can you perform bisection please?
Comment 2 Garry Williams 2023-06-14 01:02:33 UTC
Sorry, but I am not able to do that.  I only test and run Fedora kernels.  The last good one is 6.2.14-300.
Comment 3 Bagas Sanjaya 2023-06-14 01:07:57 UTC
Please see Documentation/admin-guide/bug-bisect.rst in the kernel
sources for how to do bisection. And because you're about to compile
your own custom kernel when bisecting, see
Documentation/admin-guide/quickly-build-trimmed-linux.rst for building
instructions.

See you in your bisection report!
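
For anyone following the same procedure: each bisection step needs a good/bad verdict for the kernel that was just built and booted. A minimal sketch of one way to derive that verdict from the kernel log is shown below. The helper name `bisect_verdict` is hypothetical, and the marker strings are simply taken from the log in the description.

```python
# Strings taken from the crash log in the bug description.
DRIVER = "ath10k_pci"
CRASH_MARKER = "firmware crashed!"


def bisect_verdict(kernel_log):
    """Return 'bad' if the log shows an ath10k_pci firmware crash, else 'good'."""
    for line in kernel_log.splitlines():
        if DRIVER in line and CRASH_MARKER in line:
            return "bad"
    return "good"
```

The input could be the text saved from `journalctl -k -b` after booting the candidate kernel, and the result then decides between `git bisect good` and `git bisect bad`. Because the crash was reported to take anywhere from a few minutes to a few hours to appear, a "good" result is only meaningful after the kernel has been running for a while.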
Comment 4 Garry Williams 2023-06-20 03:15:14 UTC
OK, I built 6.3.3, thinking I'd start there since the Fedora 6.3.3
kernel fails. I was surprised to find that the failure did not occur
on my local build.  So I checked out the Fedora kernel
package for 6.3.3 and noted that the spec file applies a patch before
building the kernel to ship as a Fedora rpm.  So I applied that patch
to the 6.3.3 kernel sources and rebuilt.  Now my bug appears.

It's bisected, but it is not a kernel problem.  I will take this to my
distribution and report the patch that causes the error.  I am sorry
for the noise.  (But at least I can perform a git bisect now.  :-))
Comment 5 Paul Menzel 2023-06-20 14:31:51 UTC
For the record, it’d be great if you added the URL for Fedora’s patch. If you report it to Fedora, please also document it here.
Comment 6 Garry Williams 2023-06-20 14:44:48 UTC
Created attachment 304467
Fedora patch that I applied to 6.3.3 to reproduce my error
Comment 7 Garry Williams 2023-06-20 14:45:45 UTC
Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=2210435
Comment 8 Garry Williams 2023-06-20 14:48:15 UTC
Fedora git repo for kernel RPM (contains attached patch file): 
https://src.fedoraproject.org/rpms/kernel.git
Comment 9 Garry Williams 2023-07-08 22:41:38 UTC
For the record, Fedora 6.3.11-200 fixes the problem.  I closed the downstream bug.
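
For other Fedora users checking whether their kernel falls in the range reported in this bug, here is a rough sketch. It compares only the numeric version and release fields, which is a simplification of real RPM version ordering, and the function names are illustrative. The boundaries come solely from this report: 6.2.14-300 last good, 6.2.15-300 through 6.3.7-200 affected, 6.3.11-200 fixed. Releases between those points were not reported on, and the result applies to Fedora builds only, since an unpatched 6.3.3 did not reproduce the crash.

```python
import re


def parse_fedora_kernel(release):
    """Parse e.g. '6.3.7-200.fc38.x86_64' into the tuple (6, 3, 7, 200)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(group) for group in match.groups())


# Boundaries as stated in this bug report (Fedora 38 builds).
LAST_GOOD = parse_fedora_kernel("6.2.14-300")
FIRST_BAD = parse_fedora_kernel("6.2.15-300")
LAST_REPORTED_BAD = parse_fedora_kernel("6.3.7-200")
FIXED = parse_fedora_kernel("6.3.11-200")


def classify(release):
    """Classify a Fedora kernel release against the ranges reported here."""
    version = parse_fedora_kernel(release)
    if version <= LAST_GOOD:
        return "reported good"
    if FIRST_BAD <= version <= LAST_REPORTED_BAD:
        return "reported affected"
    if version >= FIXED:
        return "reported fixed"
    return "unknown"  # e.g. releases between 6.3.7-200 and 6.3.11-200
```

The string to pass in is what `uname -r` prints, for example `classify("6.3.3-200.fc38.x86_64")`.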
Comment 10 Kalle Valo 2023-07-11 09:22:23 UTC
> For the record, Fedora 6.3.11-200 fixes the problem.  I closed the downstream
> bug.
Thanks for the update, this always helps us a lot. That way it's easier
for us to help other Fedora users.