Bug 208251

Summary: ath9k: broken after changes in 5.4.47
Product: Drivers
Reporter: Roman Mamedov (rm+bko)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED CODE_FIX
Severity: normal
CC: alexander.konotop, aliakc, anenbupt, carnil, fmscott1, iyanmv, jkoderu, jpisaniello+kernel, jwrdegoede, lacov, muhammetk, radek, rm+bko, rtentser, santi-hernandez, ucelsanicin, viktor_jaegerskuepper, vmlinuz386, ZeroBeat
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.4.47
Subsystem:
Regression: Yes
Bisected commit-id:

Description Roman Mamedov 2020-06-19 14:32:12 UTC
Hello,

I have:

Bus 001 Device 004: ID 13d3:3327 IMC Networks AW-NU137 802.11bgn Wireless Module [Atheros AR9271]

On 5.4.47 it no longer works at all:

[   37.104168] usb 1-2.2: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[   37.104991] usbcore: registered new interface driver ath9k_htc
[   37.412791] usb 1-2.2: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[   37.693289] ath9k_htc 1-2.2:1.0: ath9k_htc: HTC initialized with 33 credits
[  148.871980] ath9k_htc: Failed to initialize the device
[  148.872155] usb 1-2.2: ath9k_htc: USB layer deinitialized

Rolling back to 5.4.44 makes it work again:

[   37.294753] usb 1-2.2: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[   37.295356] usbcore: registered new interface driver ath9k_htc
[   37.589345] usb 1-2.2: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[   37.841053] ath9k_htc 1-2.2:1.0: ath9k_htc: HTC initialized with 33 credits
[   38.121058] ath9k_htc 1-2.2:1.0: ath9k_htc: FW Version: 1.4
[   38.121160] ath9k_htc 1-2.2:1.0: FW RMW support: On
[   38.121237] ath: EEPROM regdomain: 0x60
[   38.121239] ath: EEPROM indicates we should expect a direct regpair map
[   38.121243] ath: Country alpha2 being used: 00
[   38.121245] ath: Regpair used: 0x60

It is not practical for me to bisect right now, as it is a remote machine. But releases 5.4.45 and 5.4.46 did not have ath9k changes, while 5.4.47 had quite a lot.
Comment 1 Roman Mamedov 2020-06-19 14:34:27 UTC
Dear Qiujun Huang, could you please take a look? Thanks.
Comment 2 Viktor Jägersküpper 2020-06-19 15:45:17 UTC
This also affects the latest stable kernel 5.7.4. I am bisecting this to find the bad commit, but I am using an old Core 2 Duo, so this will take some time.
Comment 3 Adrian Bassett 2020-06-19 16:59:25 UTC
(In reply to Viktor Jägersküpper from comment #2)
> This also affects the latest stable kernel 5.7.4. I am bisecting this to
> find the bad commit, but I am using an old Core 2 Duo, so this will take
> some time.

Confirmed for 5.7.3 (where ath9k changes were made) and 5.6.19 (didn't try 5.6.18 where equivalent changes to 5.7.3 occurred).
Comment 4 Alexander Konotop 2020-06-20 08:56:51 UTC
For me ath9k also does not work in 5.7.3.
Works fine after downgrading to 5.7.2.

I didn't file a separate bug, though it may be a different problem, because I couldn't find any useful information. There are no error messages in the systemd log. After I plug in the USB adapter, the last message says that the firmware was sent successfully, and no errors appear. But no device is created after plugging the adapter in, as if udev ignored it somehow.
Comment 5 Roman Mamedov 2020-06-20 09:16:32 UTC
@Alexander: It is probably the same problem. If you look closely at dmesg in the bug report, you see that the failure message appears only almost 2 minutes after the firmware message. Maybe it's the same for you too (and you didn't wait for it long enough to recheck the logs)?
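
To make that delay easy to check in your own logs, here is a minimal sketch in Python. It takes the dmesg excerpt from the bug description and computes the time between the "Transferred FW" line and the "Failed to initialize the device" line. The function names (`parse_dmesg`, `init_failure_delay`) are made up for this illustration and are not part of any kernel tooling. The matched message strings are simply the ones shown in this report.

```python
import re

# Matches the "[   37.104168] message" prefix that dmesg prints.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Return a list of (timestamp_seconds, message) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def init_failure_delay(text):
    """Seconds between the firmware transfer and the init failure, or None."""
    transferred = None
    for stamp, message in parse_dmesg(text):
        if "Transferred FW" in message:
            transferred = stamp
        elif "Failed to initialize the device" in message:
            if transferred is not None:
                return stamp - transferred
    return None


# Log lines copied from the bug description (kernel 5.4.47).
SAMPLE = """\
[   37.104168] usb 1-2.2: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[   37.104991] usbcore: registered new interface driver ath9k_htc
[   37.412791] usb 1-2.2: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[   37.693289] ath9k_htc 1-2.2:1.0: ath9k_htc: HTC initialized with 33 credits
[  148.871980] ath9k_htc: Failed to initialize the device
[  148.872155] usb 1-2.2: ath9k_htc: USB layer deinitialized
"""

delay = init_failure_delay(SAMPLE)
print(f"failure reported {delay:.1f} s after the firmware transfer")
```

For the log above, this gives a delay of roughly 111 seconds. If the function returns None for your log, either the failure line has not appeared yet or the messages differ from the ones in this report.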
Comment 6 Viktor Jägersküpper 2020-06-20 16:11:00 UTC
The first bad commit according to my bisection (using the linux-5.7.y branch) is:

6602f080cb28745259e2fab1a4cf55eeb5894f93 (ath9k: Fix general protection fault in ath9k_hif_usb_rx_cb)

This relates to commit 2bbcaaee1fcbd83272e29f31e2bb7e70d8c49e05 upstream.

I will test if reverting the commit helps (again on the linux-5.7.y branch, currently at the 5.7.4 release).
Comment 7 Ali Akcaagac 2020-06-20 20:53:28 UTC
Hi,

I can confirm this issue. I bisected it myself and ended up with the same commit.

I reported it on bugzilla.redhat.com:

https://bugzilla.redhat.com/show_bug.cgi?id=1848631

Hope I could help.
Comment 8 Ali Akcaagac 2020-06-20 20:59:26 UTC
Also affected: 5.6.19 and 5.8.0-rc1/2
Comment 9 Roman Mamedov 2020-06-21 13:12:56 UTC
> I will test if reverting the commit helps

Yes, I confirm that reverting the mentioned commit restores the device operation on 5.4.47. Thanks.
Comment 10 Viktor Jägersküpper 2020-06-21 14:10:53 UTC
Reverting the commit also solves the problem for 5.7.4.
Comment 11 Adrian Bassett 2020-06-21 14:11:10 UTC
Unsurprisingly, reverting commit 6602f080cb28745259e2fab1a4cf55eeb5894f93 unbreaks this driver for 5.7.4
Comment 13 Michael 2020-06-30 10:17:08 UTC
I can confirm it for TP-Link TL-WN722N:
Bus 001 Device 007: ID 0cf3:9271 Qualcomm Atheros Communications AR9271 802.11n

Reverting the commit solves the problem for 5.7.6-arch1-1.
Comment 14 lacov 2020-06-30 15:45:06 UTC
I really tried to find the answer by myself, but...
What does reverting the commit mean?
Reverting the changes, I believe, but does that mean that I can update the kernel and revert the changes only for Atheros?

Currently I have my kernel downgraded to the last version that worked for me, so I am on 5.4.46.
Comment 15 Michael 2020-07-01 08:16:05 UTC
@lacov@o2.pl Remove the commit that caused the trouble. Then compile it against your kernel and replace the broken code with the newly compiled one.
Comment 16 jkoderu 2020-07-01 14:17:41 UTC
Is this going to be fixed and released for everyone, or do we have to manually revert the commit and compile the kernel if we have this problem? There was no change in yesterday's kernel update.
Comment 17 Viktor Jägersküpper 2020-07-01 14:26:13 UTC
I am preparing a patch which reverts the mentioned commit because I don't see any fix on the mailing lists. The patch will land in the mainline kernel first and then be picked up for the stable and longterm kernel releases.
Comment 18 jkoderu 2020-07-02 21:20:28 UTC
(In reply to Viktor Jägersküpper from comment #17)
> I am preparing a patch which reverts the mentioned commit because I don't
> see any fix on the mailing lists. The patch will land in the mainline kernel
> first and then be picked up for the stable and longterm kernel releases.

Thanks.

I asked because the mentioned commit fixes something but breaks something else. Reverting it isn't a universal fix.
Comment 19 Viktor Jägersküpper 2020-07-03 07:57:11 UTC
Roman reported this bug in the email thread that proposed the broken patch, and a kernel developer asked for a patch to revert the mentioned commit if there is no proper fix. See here:

https://lore.kernel.org/linux-wireless/87lfkff9qe.fsf@codeaurora.org/
Comment 20 Iyan Mendez 2020-07-13 07:32:47 UTC
Any news about this? It's been broken for so long, and if the solution is just a patch to revert one commit, I don't see why it should take that long.
Comment 21 Viktor Jägersküpper 2020-07-13 10:38:44 UTC
If Kalle Valo (ath9k maintainer) doesn't reply today, I will ask David Miller (networking maintainer who merged Kalle's tree) to revert the commit or to forward this issue to Linus Torvalds. Any of them should deal with this in the end.
Comment 22 rtentser 2020-07-13 19:47:32 UTC
There was a report that it was fixed in 5.7.8. Please check:

https://bugs.archlinux.org/task/67041#comment190998
Comment 23 Roman Mamedov 2020-07-13 19:50:30 UTC
Arch Linux just picked up Viktor's revert by themselves, not waiting for mainline: https://git.archlinux.org/linux.git/commit/?h=v5.7.8-arch1&id=1a32e7b57b0b37cab6845093920b4d1ff94d3bf4

There are no ath9k changes in the mainline 5.7.8.
Comment 24 Hans de Goede 2020-07-14 10:03:15 UTC
First of all, thank you to everyone involved for the bisecting and for submitting the revert upstream.

Tip for next time: for stable regressions, please send an email to stable@vger.kernel.org with a subject of "Please revert "patch subject" from the stable kernels". That is the quickest way to get troublesome patches dropped from the stable series.

I've sent an email to Greg Kroah-Hartman / stable@vger.kernel.org asking for the offending commit to be reverted. I've pointed him to:
https://git.archlinux.org/linux.git/commit/?h=v5.7.8-arch1&id=1a32e7b57b0b37cab6845093920b4d1ff94d3bf4

So that he can cherry-pick that and credit those involved in figuring out the problem.

I expect the revert to show up in 5.7.9 and other upcoming stable releases.
Comment 25 Viktor Jägersküpper 2020-07-16 08:27:18 UTC
Someone posted a patch to fix this properly (i.e. via a revert), which fixes the issue for me. Please test the fix if you can (ideally with the mainline kernel from Linus Torvalds's branch):

https://patchwork.kernel.org/patch/11657669/

You can then add a "Tested-by" line in your reply to the corresponding e-mail.
Comment 26 Viktor Jägersküpper 2020-07-16 08:28:23 UTC
> Someone posted a patch to fix this properly (i.e. via a revert)

This should read "not via a revert" of course.
Comment 27 Viktor Jägersküpper 2020-07-16 08:37:19 UTC
I forgot that the revert indeed landed in the 5.7.9, 5.4.52 and 4.19.133 releases (it is also queued for the older longterm kernels), so you would have to test with an older release of the stable (i.e. <=5.7.8) and longterm kernels, but that is not the recommended way to test this fix. So you should really test with the 5.8 pre-release mainline kernel.
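
As a rough aid for picking a test kernel, the release numbers above can be turned into a small lookup. This is only a sketch: the table `REVERT_LANDED` and the function `has_revert` are names invented here, the table covers only the three series named in this comment, and distribution kernels may differ (Arch Linux, for example, carried the revert in its 5.7.8 build, see comment 23).

```python
import re

# First release in each series that carries the revert, as listed in this
# comment. Other series are deliberately left out because this comment does
# not give exact numbers for them.
REVERT_LANDED = {(5, 7): 9, (5, 4): 52, (4, 19): 133}


def has_revert(release):
    """True/False for a known series, None if the series is not in the table."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    first_with_revert = REVERT_LANDED.get((major, minor))
    if first_with_revert is None:
        return None
    return patch >= first_with_revert


for release in ("5.4.47", "5.4.52", "5.7.8", "5.7.9", "5.6.19"):
    print(release, has_revert(release))
```

A result of None means "unknown from this comment", not "unaffected". The check looks only at the upstream version number, so it cannot see patches that a distribution applied on top.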
Comment 28 Jordan Pisaniello 2020-07-21 12:21:08 UTC
5.7.9 landed in Fedora 32 recently and I can confirm that the patch has restored functionality:

[   44.574618] usb 3-1: ath9k_htc: Firmware ath9k_htc/htc_9271-1.4.0.fw requested
[   44.577810] usbcore: registered new interface driver ath9k_htc
[   44.859691] usb 3-1: ath9k_htc: Transferred FW: ath9k_htc/htc_9271-1.4.0.fw, size: 51008
[   45.111509] ath9k_htc 3-1:1.0: ath9k_htc: HTC initialized with 33 credits
[   45.375128] ath9k_htc 3-1:1.0: ath9k_htc: FW Version: 1.4
[   45.375129] ath9k_htc 3-1:1.0: FW RMW support: On

Thanks to all that made this possible.
Comment 29 Ali Akcaagac 2020-07-22 18:43:54 UTC
Confirming that 5.7.9 solved the issue for me.
Comment 30 Viktor Jägersküpper 2020-07-31 17:26:53 UTC
The original patch (which caused the problems) and the fix by Mark O'Donovan are now included in the stable and longterm kernels (5.7.11, 5.4.54, 4.19.135, 4.14.190, 4.9.232, 4.4.232 and higher).
Comment 31 Hans de Goede 2020-08-01 13:43:13 UTC
As mentioned in comment 30, this is fixed now, so let's close it.