Bug 202399

Summary: iwlwifi: 8260: wireless module found on cold boot but not on reboot
Product: Drivers
Reporter: Sylvain Leroux (sylvain)
Component: network-wireless-intel
Assignee: DO NOT USE - assign "network-wireless-intel" component instead (linuxwifi)
Status: ASSIGNED
Severity: normal
CC: bjorn, hkallweit1, luca, sylvain
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.9
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg after a cold-boot
dmesg after reboot
hwinfo after a cold-boot
hwinfo after a reboot
lspci -xxxvvv after coldboot
lspci -xxxvvv after reboot
acpidump (after coldboot FWIW)

Description Sylvain Leroux 2019-01-24 14:38:41 UTC
Created attachment 280719 [details]
dmesg after a cold-boot

Hi linuxwifi@intel.com and iwlwifi team,

On an ASRock J4105-ITX motherboard with an Intel 8260 Wireless-AC module, the 8260 is detected and works properly after a cold boot. After a reboot, however, the module is no longer detected. Manually (re)loading the iwlwifi module does not change anything. Only a power off/power on cycle seems to bring the module back.

Attached
Comment 1 Sylvain Leroux 2019-01-24 14:40:03 UTC
Created attachment 280721 [details]
dmesg after reboot
Comment 2 Sylvain Leroux 2019-01-24 14:40:48 UTC
Created attachment 280723 [details]
hwinfo after a cold-boot
Comment 3 Sylvain Leroux 2019-01-24 14:41:17 UTC
Created attachment 280725 [details]
hwinfo after a reboot
Comment 4 Sylvain Leroux 2019-01-24 14:49:06 UTC
Attached above are the dmesg/hwinfo gathered after a cold- and warm-boot.

The Intel 8260 is on /devices/pci0000:00/0000:00:13.1.
I am running the 4.9 kernel on a Debian Stretch distribution:

> $ uname -a
> Linux nosferapti 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux



Let me know if you need more info.
Regards,
- Sylvain
Comment 5 Emmanuel Grumbach 2019-01-24 15:11:07 UTC
Can you please add the output of sudo lspci -xxxvvv after a cold boot and after reboot?

Thanks.
Comment 6 Sylvain Leroux 2019-01-24 15:19:17 UTC
Created attachment 280727 [details]
lspci -xxxvvv after coldboot
Comment 7 Sylvain Leroux 2019-01-24 15:19:44 UTC
Created attachment 280729 [details]
lspci -xxxvvv after reboot
Comment 8 Sylvain Leroux 2019-01-24 15:20:53 UTC
Thanks for your interest in this issue, Emmanuel.


I attached the `lspci -xxxvvv` output above.
Comment 9 Sylvain Leroux 2019-01-24 16:37:53 UTC
FWIW, as a workaround, adding the `reboot=pci` kernel boot option ensures that the 8260 is consistently detected after a reboot.
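
For context, here is a minimal sketch of the reset sequence that `reboot=pci` is understood to select on x86: two writes to the Reset Control register at I/O port 0xCF9. The bit names, the helper name `cf9_reset_sequence`, and the idea that the "full reset" bit power-cycles the slot are assumptions for illustration and are not confirmed anywhere in this report. The function only computes the two byte values. It performs no port I/O.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit layout of the Reset Control register at I/O port 0xCF9. */
#define CF9_SYS_RST   0x02  /* request a hard (system) reset */
#define CF9_RST_CPU   0x04  /* trigger the reset */
#define CF9_FULL_RST  0x08  /* additionally power-cycle the platform */

/*
 * Hypothetical helper: given the current register value, compute the two
 * values written in sequence. out[0] arms the hard reset, out[1] fires it.
 * A cold reboot (warm == false) also sets the full-reset bit.
 */
void cf9_reset_sequence(uint8_t current, bool warm, uint8_t out[2])
{
    uint8_t reboot_code = warm ? (CF9_RST_CPU | CF9_SYS_RST)
                               : (CF9_FULL_RST | CF9_RST_CPU | CF9_SYS_RST);
    /* Preserve every bit that is not part of the reset request. */
    uint8_t kept = (uint8_t)(current & ~reboot_code);

    assert(out != NULL);
    out[0] = (uint8_t)(kept | CF9_SYS_RST);
    out[1] = (uint8_t)(kept | reboot_code);
}
```

If the assumption about the full-reset bit holds, a cold `reboot=pci` restart would be closer to a power cycle than the default reboot path, which would be consistent with the card reappearing.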
Comment 10 Emmanuel Grumbach 2019-01-24 17:16:03 UTC
Adding the PCI maintainer.
Comment 11 Emmanuel Grumbach 2019-02-14 06:55:48 UTC
There have been fixes in PCI lately. Those fixes aren't included in mainline yet.
Could you please test mainline + patches from:

https://bugzilla.kernel.org/show_bug.cgi?id=201469#c48

Thanks.
Comment 12 Bjorn Helgaas 2019-02-14 15:21:32 UTC
I doubt the ASPM fixes from bug 201469 are related, but this does seem to have some PCI wrinkle to it.

This system has:

  00:13.1 bridge to [bus 02]
  02:00.0 iwlwifi NIC

After the reboot, the 00:13.1 bridge has its secondary bus number programmed (02), but we didn't enumerate 02:00.0 and the bridge memory window is left disabled:

-       Memory behind bridge: a1300000-a13fffff
+       Memory behind bridge: fff00000-000fffff
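
Why the second range reads as "disabled" can be sketched as follows. This is an illustrative decoder for the bridge's 16-bit Memory Base and Memory Limit registers (config offsets 0x20 and 0x22), not code from the kernel or from lspci. The struct and function names are made up, and the raw register values used in the note after the block are inferred from the two lspci lines above rather than read from the attachments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of a PCI-to-PCI bridge's non-prefetchable memory window. */
struct bridge_window {
    uint32_t base;    /* first address forwarded to the secondary bus */
    uint32_t limit;   /* last address forwarded to the secondary bus */
    bool     enabled; /* false when base > limit: nothing is forwarded */
};

/*
 * The upper 12 bits of each register hold address bits 31:20. The low
 * 20 address bits are implied: all zeros for the base, all ones for the
 * limit, so the window is always 1 MiB aligned.
 */
void decode_bridge_window(uint16_t mem_base, uint16_t mem_limit,
                          struct bridge_window *out)
{
    assert(out != NULL);
    out->base = (uint32_t)(mem_base & 0xfff0u) << 16;
    out->limit = ((uint32_t)(mem_limit & 0xfff0u) << 16) | 0x000fffffu;
    out->enabled = out->base <= out->limit;
}
```

With base and limit both 0xa130 this yields a1300000-a13fffff (enabled). With base 0xfff0 and limit 0x0000 it yields fff00000-000fffff, where the base lies above the limit, so the bridge forwards no memory accesses.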

After the reboot, BIOS would enumerate PCI devices and configure things, then Linux would enumerate everything again.  My guess is that the NIC isn't responding to config reads after the reboot.

That would mean the BIOS wouldn't find the NIC, so it would leave the bridge window disabled, and Linux also wouldn't find the NIC.

If the NIC doesn't respond to config accesses, Linux doesn't know it even exists, so the possibilities for a workaround are somewhat limited.

You might be able to fiddle with this theory by attempting another reset of the NIC by asserting the bridge's Secondary Bus Reset bit, e.g.,

  # setpci -s00:13.1 BRIDGE_CONTROL.w=0x40
  # setpci -s00:13.1 BRIDGE_CONTROL.w=0x00
  # echo 1 > /sys/bus/pci/rescan
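
The same reset pulse can be sketched in C against the bridge's sysfs config file, assuming it lives at /sys/bus/pci/devices/0000:00:13.1/config and the program runs as root. Unlike the setpci commands above, this variant does a read-modify-write so the other Bridge Control bits are preserved. Whether that matters on this system is not known. The function names are hypothetical, and the delays are parameters because the thread does not state suitable values.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define BRIDGE_CONTROL_OFFSET 0x3e   /* 16-bit Bridge Control register */
#define BRIDGE_CTL_BUS_RESET  0x0040 /* Secondary Bus Reset bit */

/* Read the little-endian 16-bit register; 0 on success, -1 on error. */
static int read_bridge_control(int fd, uint16_t *val)
{
    uint8_t raw[2];

    if (pread(fd, raw, sizeof raw, BRIDGE_CONTROL_OFFSET) != (ssize_t)sizeof raw)
        return -1;
    *val = (uint16_t)(raw[0] | (raw[1] << 8));
    return 0;
}

/* Write the little-endian 16-bit register; 0 on success, -1 on error. */
static int write_bridge_control(int fd, uint16_t val)
{
    uint8_t raw[2] = { (uint8_t)(val & 0xff), (uint8_t)(val >> 8) };

    if (pwrite(fd, raw, sizeof raw, BRIDGE_CONTROL_OFFSET) != (ssize_t)sizeof raw)
        return -1;
    return 0;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };

    nanosleep(&ts, NULL);
}

/*
 * Assert Secondary Bus Reset for assert_ms, deassert it, then wait
 * settle_ms before returning. Returns 0 on success, -1 on any I/O error.
 */
int reset_behind_bridge(const char *config_path, long assert_ms, long settle_ms)
{
    uint16_t ctl;
    int rc = -1;
    int fd;

    assert(config_path != NULL);
    fd = open(config_path, O_RDWR);
    if (fd < 0)
        return -1;
    if (read_bridge_control(fd, &ctl) == 0 &&
        write_bridge_control(fd, (uint16_t)(ctl | BRIDGE_CTL_BUS_RESET)) == 0) {
        sleep_ms(assert_ms);
        if (write_bridge_control(fd, (uint16_t)(ctl & ~BRIDGE_CTL_BUS_RESET)) == 0) {
            sleep_ms(settle_ms);
            rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

The rescan step (writing 1 to /sys/bus/pci/rescan) would still have to follow, exactly as in the shell version.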
Comment 13 Emmanuel Grumbach 2019-02-15 06:39:25 UTC
If that's the case, then it may be an "integration problem". I don't know much about all this, but I have heard that the device needs the platform/BIOS to write to a specific location on the device during certain flows.
I don't have more details about this, but when we see such bugs on self-made systems, it may be related.
Comment 14 Bjorn Helgaas 2019-02-15 16:33:21 UTC
The platform/BIOS has no way to touch the NIC itself if it's not responding to config reads.  There could certainly be something else in the chipset, e.g., in the ICH, that is relevant.

We do have several DMI quirks that use set_pci_reboot() to essentially do "reboot=pci" automatically.  We could add a similar quirk for this system.  I don't think that's *ideal* because presumably Windows reboots cleanly without such a quirk, and there may be many systems with this issue and we may be adding such quirks frequently.  But maybe that's the only option, since we don't know any other way to fix this.

ACPI provides information about how to reboot, so it'd be nice to have an acpidump attached here just in case we can figure out a more generic fix in the future.
Comment 15 Emmanuel Grumbach 2019-02-25 21:03:57 UTC
Sylvain, do you want to provide the required information here?
I am not sure we'll be able to do much without this.
If not, I'll close.
Comment 16 Bjorn Helgaas 2019-02-25 21:24:23 UTC
I think the dmesg log has enough information to write the quirk.  The acpidump is a "nice to have" for possible future improvements.  If we do go the quirk route, who wants to write it?  It's not hard and I think there are existing ones we can copy, but it would take me a couple days before I have a chance.
Comment 17 Sylvain Leroux 2019-02-25 21:28:07 UTC
Emmanuel, I do not have access to the device for now. Probably next week. Sorry for the delay.
Comment 18 Emmanuel Grumbach 2019-02-26 06:37:37 UTC
(In reply to Bjorn Helgaas from comment #16)
> I think the dmesg log has enough information to write the quirk.  The
> acpidump is a "nice to have" for possible future improvements.  If we do go
> the quirk route, who wants to write it?  It's not hard and I think there are
> existing ones we can copy, but it would take me a couple days before I have
> a chance.

I don't really know how to do that. I can learn, sure, but it'll have to wait since I am busy as well.
Another thing is that I am not even sure we need to go there. After all, this is a self-made system, and we probably don't want to add a quirk for every system a user may build.
Comment 19 Sylvain Leroux 2019-02-26 08:53:19 UTC
From Emmanuel Grumbach in comment 18:
> Another thing is that I am not even sure we need to go there. After all, this
> system is a self made system [...]

Well, it all depends on your definition of a self-made system. It is not some exotic system made of brandless parts. The motherboard is an ASRock with embedded Intel CPU. The only real change compared to the stock motherboard was the addition of a couple of RAM and WiFi modules.

From Bjorn Helgaas in comment 14:
>  I don't think that's *ideal* because presumably Windows reboots cleanly
>  without such a quirk, and there may be many systems with this issue and we
>  may be adding such quirks frequently.  But maybe that's the only option,
>  since we don't know any other way to fix this.
>
> ACPI provides information about how to reboot, so it'd be nice to have an
> acpidump attached here just in case we can figure out a more generic fix in
> the future.

Certainly, a generic fix would be better. I will do what is necessary to send you the `acpidump` you requested ASAP. That being said, if the only solution is to add a system-specific quirk, well, at least that would make the system work "out of the box".
Comment 20 Sylvain Leroux 2019-02-26 09:07:48 UTC
Created attachment 281353 [details]
acpidump (after coldboot FWIW)
Comment 21 Luca Coelho 2019-08-21 08:11:23 UTC
We need to double-check whether there's anything we can do about this.
Comment 22 Heiner Kallweit 2020-11-28 21:42:50 UTC
I have the same issue with an AX210 card on a Zotac ZBOX CI327 nano (N3450 CPU, linux-next kernel from 11/27/2020). The card isn't listed by lspci after a reboot, and reboot=pci helps. So far I am using the following private change; however, it may be unfair to blame the system if the root cause is the card's behavior.

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index db115943e..9991c5920 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -477,6 +477,15 @@ static const struct dmi_system_id reboot_dmi_table[] __initconst = {
 		},
 	},
 
+	{	/* PCIe Wifi card isn't detected after reboot otherwise */
+		.callback = set_pci_reboot,
+		.ident = "Zotac ZBOX CI327 nano",
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "NA"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "ZBOX-CI327NANO-GS-01"),
+		},
+	},
+
 	/* Sony */
 	{	/* Handle problems with rebooting on Sony VGN-Z540N */
 		.callback = set_bios_reboot,
-- 
2.29.2
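
How such a table entry takes effect can be modelled in user space. This is a simplified sketch, not the kernel's implementation: the struct, enum, and function names are invented, and the only behaviour carried over is that `DMI_MATCH` compares by substring, so every listed string must occur within the corresponding firmware-reported field. The vendor and product strings are the ones from the patch above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's choice of reboot method. */
enum reboot_choice { REBOOT_CHOICE_DEFAULT, REBOOT_CHOICE_PCI };

/* Simplified stand-in for one reboot_dmi_table[] entry. */
struct reboot_quirk {
    const char *ident;         /* human-readable label only */
    const char *sys_vendor;    /* substring expected in DMI_SYS_VENDOR */
    const char *product_name;  /* substring expected in DMI_PRODUCT_NAME */
    enum reboot_choice choice; /* what the entry's callback would select */
};

static const struct reboot_quirk quirks[] = {
    { "Zotac ZBOX CI327 nano", "NA", "ZBOX-CI327NANO-GS-01", REBOOT_CHOICE_PCI },
};

/* Return the reboot method of the first quirk whose strings all match. */
enum reboot_choice pick_reboot_choice(const char *sys_vendor,
                                      const char *product_name)
{
    assert(sys_vendor != NULL && product_name != NULL);
    for (size_t i = 0; i < sizeof quirks / sizeof quirks[0]; i++) {
        if (strstr(sys_vendor, quirks[i].sys_vendor) != NULL &&
            strstr(product_name, quirks[i].product_name) != NULL)
            return quirks[i].choice;
    }
    return REBOOT_CHOICE_DEFAULT;
}
```

Because the comparison is by substring, a vendor string as short as "NA" also matches longer vendor names that merely contain it, so the product name carries most of the specificity here. The kernel also offers `DMI_EXACT_MATCH` for cases where a substring is too loose.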