Bug 206329

Summary: [iwlwifi, v5.5 regression, bisected] Wireless-AC 3168NGW no longer initializes
Product: Drivers
Reporter: Dmitry (nrndda)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: high
CC: belegdol, bugs-a21, bugzilla, bugzilla, DeathTBO, dmoulding, dobmec, jmandawg, joanbrugueram, labadens.pierre, libcg, memantere, Michaelnussbaum08, noodles, noonetinone, paulocoghi, pbrobinson, randisitohang02, stevepoppers, thomasesr, unico
Priority: P1
Hardware: All
OS: Linux
URL: https://lore.kernel.org/linux-wireless/20200128093107.9740-1-dmoulding@me.com/
Kernel Version: 5.5
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: Kernel log with initialization error.
Kernel bisect log.
Motile m142 wifi error

Description Dmitry 2020-01-27 22:54:06 UTC
Created attachment 286999
Kernel log with initialization error.

Bisected to commit b3f20e098293892388d6a0491d6bbb2efb46fbff. Reverting makes wifi work again.
Comment 1 Dmitry 2020-01-27 22:54:39 UTC
Created attachment 287001
Kernel bisect log.
Comment 2 joanbrugueram 2020-01-28 09:37:52 UTC
Same problem here with a laptop with a 3168NGW. Working fine in 5.4.15 (Arch) and 5.5-rc7 (Mainline) but broken on 5.5 (Arch). Will build with the mentioned commit reverted and report back.
Comment 3 joanbrugueram 2020-01-28 10:54:07 UTC
Potential fix already committed here:

https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/linux-wireless/0.git/commit/?id=86ef7e6b1507a4bcf17a425f71bd1d7027136223

+The logic for checking required NVM sections was recently fixed in
+commit b3f20e098293 ("iwlwifi: mvm: Do not require PHY_SKU NVM section
+for 3168 devices"). However, with that fixed the else is now taken for
+3168 devices and within the else clause there is a mandatory check for
+the PHY_SKU section. This causes the parsing to fail for 3168 devices.
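
The failure described in the quoted commit message can be illustrated with a toy model. This is only a sketch, not the driver code: the function `nvm_sections_ok`, the labels "legacy", "sdp" and "ext", the section sets, and the exact shape of the fix (demanding PHY_SKU only for "ext" devices) are assumptions made for illustration.

```python
def nvm_sections_ok(nvm_type, present, patched):
    """Return True if the required-section check passes in this toy model.

    nvm_type: "legacy", "sdp" (assumed here for 3168) or "ext" (hypothetical labels)
    present:  set of section names the device actually provides
    patched:  whether the mailing-list fix is applied
    """
    if nvm_type == "legacy":
        # Legacy branch: no PHY_SKU requirement in this model.
        return {"SW", "HW"} <= present

    # The "else" branch, which 3168 devices take since b3f20e098293.
    if not {"SW", "REGULATORY"} <= present:
        return False
    if patched:
        # Assumed shape of the fix: PHY_SKU is demanded only for "ext" devices.
        phy_sku_required = nvm_type == "ext"
    else:
        # Before the fix: PHY_SKU is mandatory for everything in this branch.
        phy_sku_required = True
    return "PHY_SKU" in present or not phy_sku_required


sections_3168 = {"SW", "REGULATORY"}  # assumed: a 3168 has no PHY_SKU section
print(nvm_sections_ok("sdp", sections_3168, patched=False))  # False
print(nvm_sections_ok("sdp", sections_3168, patched=True))   # True
```

In this model the unpatched check rejects a device that lacks PHY_SKU as soon as it is routed into the else branch, which matches the parsing failure the message describes.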
Comment 4 joanbrugueram 2020-01-28 12:16:38 UTC
Both reverting b3f20e098293 and the patch from the mailing list fix the problem for me.

You can find the properly formatted mail + patch here:
https://lore.kernel.org/linux-wireless/20200128093107.9740-1-dmoulding@me.com/T
Comment 5 Dmitry 2020-01-28 22:48:45 UTC
Can confirm that patch helps. Mark resolved?
Comment 6 Duane Robertson 2020-01-29 01:27:08 UTC
I think I've got the same issue on my Motile m142. However, the patch did not help. Not only didn't the wifi load, I think it killed the boot record on the laptop. The system booted once from the patched kernel, and the wifi failed. When I rebooted, I got the bios screen repeatedly. I had to reinstall grub from a live usb. I'm a bit nervous about trying the patched kernel again.
Comment 7 Duane Robertson 2020-01-29 01:29:07 UTC
Created attachment 287013
Motile m142 wifi error

Attached the initial error on the Motile.
Comment 8 Duane Robertson 2020-01-29 02:15:29 UTC
I can't attach this, for some reason. This is the error after the above patch is applied on the Motile.

http://dpaste.com/0VC14XX
Comment 9 joanbrugueram 2020-01-29 11:33:57 UTC
Weird... if you're still in the mood for building kernels, can you try reverting b3f20e098293 instead?
Comment 10 Duane Robertson 2020-01-29 20:51:29 UTC
Ok, disregard everything I said about the uefi problem -- I've managed to reproduce that with the 5.4.15 kernel. Whatever it is, it's not recent.

However, reverting b3f20e098293 did not make the wifi work. Neither did using rc7. It looks like the same message to my untrained eye. 5.4.15 continues to work perfectly.
Comment 11 Duane Robertson 2020-01-29 20:56:46 UTC
The error after reverting b3f20e098293892388d6a0491d6bbb2efb46fbff:

http://dpaste.com/1GSNZRA

The error using rc-7:

http://dpaste.com/2ZCRNBV
Comment 12 Duane Robertson 2020-01-30 03:23:41 UTC
If I've done the bisection correctly, the first bad commit for my Motile is 39c1a9728f93: refactor the SAR tables from mvm to acpi. That's the point at which the driver crashes every time. I can't authenticate with my router well before that, but iwlwifi loads.
Comment 13 joanbrugueram 2020-02-01 12:41:20 UTC
With the patch from the mailing list applied, I've been using and frequently rebooting my PC for the past few days, and so far there have been zero problems.

I can't comment on Duane's problem but it may very well be a different problem (or rather two chained problems). I never got a BAD_COMMAND or stack trace and also it worked with 5.5-rc7 for me. Unfortunately there are quite a few open iwlwifi reports with similar messages floating around so it's hard to tell for an untrained eye.

Also unfortunately the patch from the mailing list is not yet in the recently released 5.5.1 so affected users still need to recompile the kernel.
Comment 14 Clément Guérin 2020-02-03 05:47:34 UTC
I can confirm that the patch fixes the problem for me with the Intel 3168NGW on the Asrock X370 Killer SLI/AC motherboard.
Comment 15 Clément Guérin 2020-02-03 05:48:27 UTC
*** Bug 206333 has been marked as a duplicate of this bug. ***
Comment 16 joanbrugueram 2020-02-04 13:12:57 UTC
The commit causing the problem (b3f20e...) has been introduced to 5.4.18-rc3 ( https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/log/?h=linux-5.4.y&id=a59b851019bc15226d5c7c31ac4e0452e9a57d13 ), but the fix has not. Unless I'm missing something, this doesn't look good for 5.4.x users if it goes through like this...
Comment 17 Dmitry 2020-02-05 03:04:09 UTC
Raising importance to get attention. Backporting the bug to stable is not good.
Comment 18 joanbrugueram 2020-02-05 21:48:30 UTC
Looks like the culprit commit (b3f20e...) didn't get into 5.4.18 in the end, so there should be no problem there after all.
Comment 19 Anthony Jagers 2020-02-10 00:53:25 UTC
The patch in comment 4 works. I'm looking for this fix to be pushed to
Kroah-Hartman's stable branch. If Luca Coelho isn't aware of this bug,
then someone should let this individual know.
Comment 20 Duane Robertson 2020-02-10 02:03:55 UTC
The BAD_COMMAND error on the Motile seems to be a separate issue, so I've started bug #206479 for that.
Comment 21 Clément Guérin 2020-02-12 18:03:12 UTC
linux 5.5.3 works fine for me, 3168NGW on the Asrock X370 Killer SLI/AC.
Comment 22 joanbrugueram 2020-02-12 18:25:30 UTC
@Clément Guérin Are you sure you're not receiving a patch through distro channels (e.g. Arch/Gentoo patchsets)? AFAIK this issue affects all 3168 devices and is not fixed in the mainline tree.
Comment 23 Duane Robertson 2020-02-12 18:50:50 UTC
Gentoo has already patched this issue with 5.5.2-gentoo-r1.
Comment 24 Clément Guérin 2020-02-12 18:57:42 UTC
@Joan you're right, it was patched on Arch Linux as of v5.5.2-arch2 https://git.archlinux.org/linux.git/log/?h=v5.5.2-arch2
Comment 25 Dan Moulding 2020-02-13 15:20:16 UTC
Patch has been accepted to the wireless-drivers tree. Should eventually land in the mainline. Once that happens, I will ping the stable mailing list to get a back-port to the 5.5 series.
Comment 26 randisitohang02 2020-02-15 07:22:47 UTC
@DanMoulding This bug hasn't been patched in upstream kernel v5.5.4.
Comment 27 John 2020-02-24 10:28:49 UTC
Just tested on 5.6-rc3 (Ubuntu) and the wifi still doesn't work.
Will this regression be fixed before the final 5.6 ?
Comment 28 Julian Sikorski 2020-02-24 12:34:55 UTC
Fedora's kernel-5.5.5-200.fc31.x86_64 is affected too on ASRock Fatal1ty B450 Gaming-ITX/ac.
Comment 29 thomasesr 2020-02-25 21:58:32 UTC
I am also experiencing the same issue with the 3168NGW wireless adapter on the ASRock X370 Taichi. The same error appears in the log after updating the kernel core on Fedora 31 Silverblue:

kernel: iwlwifi 0000:08:00.0: Failed to run INIT ucode: -61
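
For readers unfamiliar with the numbers in these messages: the kernel reports failures as negated Linux errno values, so `-61` is ENODATA and the `-2` seen in firmware-load messages is ENOENT. A minimal sketch of the lookup follows; the helper name `decode_kernel_error` is hypothetical and the table covers only the two codes that appear in this thread. ENODATA would be consistent with the NVM parsing failure described in comment 3, although nothing in this thread confirms where the driver returns it.

```python
# Linux errno values (asm-generic numbering); only the codes seen in this thread.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    61: ("ENODATA", "No data available"),
}


def decode_kernel_error(code):
    """Translate a negative kernel return code such as -61 into its errno name."""
    name, text = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in this table"))
    return f"{name} ({text})"


print(decode_kernel_error(-61))  # ENODATA (No data available)
print(decode_kernel_error(-2))   # ENOENT (No such file or directory)
```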
Comment 30 DeathTBO 2020-03-06 21:10:14 UTC
Looks like the x470 Taichi uses the same adapter. Still broken on Fedora 32 and Kernel 5.6 rc4.
Comment 31 Sami Mannila 2020-03-08 04:08:13 UTC
I'm experiencing the same issue with ASRock X399 Taichi (it also has the 3168NGW adapter). The latest Kernel I have tried is Kernel 5.6.0-rc4.
Comment 32 John 2020-03-10 13:54:39 UTC
Asrock X470 Taichi Ultimate is also affected by this.
Comment 33 Kris Karas 2020-03-13 06:43:58 UTC
Just chiming in with yet another "Me Too."
I have an ASRock X470 Taichi that, as others have reported, is affected.
I bisected the kernel, searched for the commit hash in bugzilla, and landed here.
Comment 34 Dan Moulding 2020-03-13 15:20:22 UTC
(In reply to John from comment #27)
> Just tested on 5.6-rc3 (Ubuntu) and the wifi still doesn't work.
> Will this regression be fixed before the final 5.6 ?

The patch has been merged to Linus's tree, so yes, this will be fixed in the final 5.6.

Now that the fix is in the mainline, it is also eligible for backport to the stable trees that need it. I have pinged the stable mailing list requesting inclusion in the 5.5.x stable series. Fingers crossed that it will be cherry-picked in time for the next 5.5.x release (which will be v5.5.10).
Comment 35 John 2020-03-13 16:40:24 UTC
Thank you very much Dan, I'm very happy to hear that it will be fixed in the final 5.6.
Seeing the RC numbers increase without this problem being fixed, I was afraid that it might be released without this fix.
I'm glad that it's all sorted out!
Comment 36 Paulo Coghi 2020-03-17 19:15:30 UTC
Exactly.

I tested both 5.5.9 and 5.6-rc5 on an MSI B450I Gaming Plus AC with a Dual Band Wireless-AC 3168NGW, and neither worked.

I returned to 5.4.25 and will test 5.6-rc6 today to see if the fix is already there.
Comment 37 Dmitry 2020-03-18 10:15:18 UTC
Commit "iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices" landed in 5.5.10, 5.4.26 and 4.19.111, as well as in 5.6-rc6.
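
As a rough aid, the release numbers in this comment can be turned into a quick check of whether a given upstream version string should contain the fix. This is a sketch only: the helper `contains_fix` is hypothetical, the thresholds are taken from this comment alone, and distro kernels (which comments 23 and 24 show may carry the patch earlier) are not handled. A result of False means only that the fix is absent, not that the kernel is broken; 5.4.25, for instance, was reported working in comment 36.

```python
import re

# First release in each stable series that contains the fix, per comment 37.
FIXED_IN = {(4, 19): 111, (5, 4): 26, (5, 5): 10}
MAINLINE_SERIES = (5, 6)  # the fix is in 5.6-rc6 and later
MAINLINE_RC = 6


def contains_fix(version):
    """Return True/False when known, or None for series not covered here."""
    m = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", version)
    if not m:
        raise ValueError(f"unrecognised version string: {version!r}")
    series = (int(m.group(1)), int(m.group(2)))
    patch = int(m.group(3) or 0)
    rc = int(m.group(4)) if m.group(4) else None

    if series in FIXED_IN:
        return patch >= FIXED_IN[series]
    if series == MAINLINE_SERIES:
        # A final release (no -rc suffix) comes after every release candidate.
        return rc is None or rc >= MAINLINE_RC
    if series > MAINLINE_SERIES:
        return True
    return None  # older series: this comment says nothing about them


for v in ("5.5.9", "5.5.10", "5.6-rc5", "5.6-rc6"):
    print(v, contains_fix(v))
```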
Comment 38 TomRZ 2020-03-25 14:32:05 UTC
Same problem on fedora 32 beta, fresh install on a Dell Precision Mobile 5540
uname -a
Linux DESKTOP-75L32M2-1.home 5.6.0-0.rc7.git0.2.fc32.x86_64 #1 SMP Mon Mar 23 18:38:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

lspci -nnk | grep -iA3 "Network"
3b:00.0 Network controller [0280]: Intel Corporation Wireless-AC 9260 [8086:2526] (rev 29)
	Subsystem: Intel Corporation Device [8086:4010]
	Kernel modules: iwlwifi

dmesg|grep -i firmware
[    0.415253] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
[    2.811435] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[   14.032463] iwlwifi 0000:3b:00.0: Direct firmware load for (null)39.ucode failed with error -2
[   14.032497] iwlwifi 0000:3b:00.0: Direct firmware load for (null)38.ucode failed with error -2
[   14.032506] iwlwifi 0000:3b:00.0: Direct firmware load for (null)37.ucode failed with error -2
[   14.032514] iwlwifi 0000:3b:00.0: Direct firmware load for (null)36.ucode failed with error -2
[   14.032523] iwlwifi 0000:3b:00.0: Direct firmware load for (null)35.ucode failed with error -2
[   14.032531] iwlwifi 0000:3b:00.0: Direct firmware load for (null)34.ucode failed with error -2
[   14.032543] iwlwifi 0000:3b:00.0: Direct firmware load for (null)33.ucode failed with error -2
[   14.032552] iwlwifi 0000:3b:00.0: Direct firmware load for (null)32.ucode failed with error -2
[   14.032589] iwlwifi 0000:3b:00.0: Direct firmware load for (null)31.ucode failed with error -2
[   14.032602] iwlwifi 0000:3b:00.0: Direct firmware load for (null)30.ucode failed with error -2
[   14.032613] iwlwifi 0000:3b:00.0: Direct firmware load for (null)29.ucode failed with error -2
[   14.032622] iwlwifi 0000:3b:00.0: Direct firmware load for (null)28.ucode failed with error -2
[   14.032630] iwlwifi 0000:3b:00.0: Direct firmware load for (null)27.ucode failed with error -2
[   14.032661] iwlwifi 0000:3b:00.0: Direct firmware load for (null)26.ucode failed with error -2
[   14.032687] iwlwifi 0000:3b:00.0: Direct firmware load for (null)25.ucode failed with error -2
[   14.032695] iwlwifi 0000:3b:00.0: Direct firmware load for (null)24.ucode failed with error -2
[   14.032704] iwlwifi 0000:3b:00.0: Direct firmware load for (null)23.ucode failed with error -2
[   14.033022] iwlwifi 0000:3b:00.0: Direct firmware load for (null)22.ucode failed with error -2
[   14.033035] iwlwifi 0000:3b:00.0: Direct firmware load for (null)21.ucode failed with error -2
[   14.033047] iwlwifi 0000:3b:00.0: Direct firmware load for (null)20.ucode failed with error -2
[   14.033056] iwlwifi 0000:3b:00.0: Direct firmware load for (null)19.ucode failed with error -2
[   14.033066] iwlwifi 0000:3b:00.0: Direct firmware load for (null)18.ucode failed with error -2
[   14.033076] iwlwifi 0000:3b:00.0: Direct firmware load for (null)17.ucode failed with error -2
[   14.033085] iwlwifi 0000:3b:00.0: Direct firmware load for (null)16.ucode failed with error -2
[   14.033093] iwlwifi 0000:3b:00.0: Direct firmware load for (null)15.ucode failed with error -2
[   14.033102] iwlwifi 0000:3b:00.0: Direct firmware load for (null)14.ucode failed with error -2
[   14.033110] iwlwifi 0000:3b:00.0: Direct firmware load for (null)13.ucode failed with error -2
[   14.033119] iwlwifi 0000:3b:00.0: Direct firmware load for (null)12.ucode failed with error -2
[   14.033448] iwlwifi 0000:3b:00.0: Direct firmware load for (null)11.ucode failed with error -2
[   14.033474] iwlwifi 0000:3b:00.0: Direct firmware load for (null)10.ucode failed with error -2
[   14.033483] iwlwifi 0000:3b:00.0: Direct firmware load for (null)9.ucode failed with error -2
[   14.033492] iwlwifi 0000:3b:00.0: Direct firmware load for (null)8.ucode failed with error -2
[   14.033501] iwlwifi 0000:3b:00.0: Direct firmware load for (null)7.ucode failed with error -2
[   14.033510] iwlwifi 0000:3b:00.0: Direct firmware load for (null)6.ucode failed with error -2
[   14.033518] iwlwifi 0000:3b:00.0: Direct firmware load for (null)5.ucode failed with error -2
[   14.033526] iwlwifi 0000:3b:00.0: Direct firmware load for (null)4.ucode failed with error -2
[   14.033535] iwlwifi 0000:3b:00.0: Direct firmware load for (null)3.ucode failed with error -2
[   14.033544] iwlwifi 0000:3b:00.0: Direct firmware load for (null)2.ucode failed with error -2
[   14.033925] iwlwifi 0000:3b:00.0: Direct firmware load for (null)1.ucode failed with error -2
[   14.033934] iwlwifi 0000:3b:00.0: Direct firmware load for (null)0.ucode failed with error -2
[   14.033935] iwlwifi 0000:3b:00.0: no suitable firmware found!
[   14.033937] iwlwifi 0000:3b:00.0: check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
Comment 39 joanbrugueram 2020-03-25 15:03:16 UTC
@TomRZ This is a completely different issue, and the issue in this bug report has already been fixed in your kernel version.
Comment 40 TomRZ 2020-03-25 19:30:05 UTC
Oh sorry then :(
Comment 41 Paulo Coghi 2020-03-26 14:17:21 UTC
@TomRZ, we appreciate the information anyway. Please open a separate issue with all the same details, since other users may be affected as well.
Comment 42 TomRZ 2020-03-26 15:35:39 UTC
I've opened another bug, don't worry, and thanks a lot for your great work :)

https://bugzilla.redhat.com/show_bug.cgi?id=1817373