Created attachment 213481 [details]
pci info from lspci -vvv

System: Dell Constantine (Intel Skylake-H)
Failing device: Marvell 8897

Reproduce procedure:
1. Enter S3 by clicking the suspend button in the start menu.
2. Wait for the system to suspend, then press a key to wake it up.
3. Observe that the wifi device has disappeared from network management.

Analysis with the devmem command.

Device info:

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at d0100000 (64-bit, prefetchable) [disabled] [size=1M]
	Region 2: Memory at d0000000 (64-bit, prefetchable) [disabled] [size=1M]
	Capabilities: <access denied>
	Kernel modules: mwifiex_pcie

03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
	Subsystem: Dell I210 Gigabit Network Connection
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at df000000 (32-bit, non-prefetchable) [size=512K]
	Region 2: I/O ports at e000 [size=32]
	Region 3: Memory at df080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: igb
	Kernel modules: igb

Memory access before suspend works:

root@bdx00-Embedded-Box-PC-5000:/home/bdx00# devmem2 0xd0000c44 w
/dev/mem opened.
Memory mapped at address 0x7ffaa45f2000.
Value at address 0xD0000C44 (0x7ffaa45f2c44): 0xFEDCBA00
root@bdx00-Embedded-Box-PC-5000:/home/bdx00#

dmesg of resume:

Memory access after resume fails:

root@bdx00-Embedded-Box-PC-5000:/home/bdx00# devmem2 0xd0000c44 w
/dev/mem opened.
Memory mapped at address 0x7fd0317df000.
Value at address 0xD0000C44 (0x7fd0317dfc44): 0xFFFFFFFF
root@bdx00-Embedded-Box-PC-5000:/home/bdx00#

From a PCI analyzer, no PCI transaction shows on the bus.

A PCI bus rescan (echo 1 > /sys/bus/pci/rescan) can bring memory access back:

root@bdx00-Embedded-Box-PC-5000:/home/bdx00# echo 1 > /sys/bus/pci/rescan
root@bdx00-Embedded-Box-PC-5000:/home/bdx00#
root@bdx00-Embedded-Box-PC-5000:/home/bdx00# devmem2 0xd0000c44 w
/dev/mem opened.
Memory mapped at address 0x7fefcd0c1000.
Value at address 0xD0000C44 (0x7fefcd0c1c44): 0xFEDCBA00
root@bdx00-Embedded-Box-PC-5000:/home/bdx00#
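For reference on how to read the devmem2 output above: the tool mmap()s the page of /dev/mem containing the requested physical address and dereferences the in-page offset, which is why the low 12 bits of the mapped address match the target. A small Python sketch of that arithmetic (mmap_offsets is a hypothetical helper name for illustration, not part of devmem2):

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, as on x86

def mmap_offsets(phys_addr, page_size=PAGE_SIZE):
    """Split a physical address into the page base that would be
    mmap()ed from /dev/mem and the offset within that page where
    the value is actually read."""
    page_base = phys_addr & ~(page_size - 1)
    in_page = phys_addr & (page_size - 1)
    return page_base, in_page

# The register probed throughout this report:
base, off = mmap_offsets(0xD0000C44)
# base = 0xD0000000 (start of the Marvell device's BAR2 window),
# off  = 0xC44 -- matching the "...c44" in the mapped addresses above.
```

A read of all-ones (0xFFFFFFFF) at that offset is the classic signature of an MMIO read that nothing on the bus claimed.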
Analysis with PCI analyzer:

(1) Captured an Intel WLAN card's suspend/resume on Constantine. Reviewed and compared the trace. The config cycles on both the Intel and the Marvell 8897 are similar. For suspend, after the host changes the device power state to D3, it sends a PME TurnOff message; KF sends PME ACK, followed by PM EnterL23 request, host ACK, and the PCIe link shuts down. For resume, it follows the standard bus enumeration process: assigning BAR addresses, clearing Device Control/Status, setting Link Control, and resetting the correctable and uncorrectable error status registers.

(2) Set up a working platform (ATOM based, with L1 ASPM enabled). Performed a bus capture. The suspend process is the same as Constantine, except the host turns off the link before we get a chance to send the PM EnterL23 request. The resume process is the same as Constantine.

(3) Did a PCIe capture to find out what the rescan command does. The bus trace shows the rescan didn't do anything to our PCI config registers (no read or write access). The PCIe config dump of the bridge controller (1c.0) before and after the rescan command also shows the kernel didn't touch the bridge controller at all.
A dmesg message shown during the rescan relates to a different (failing) device -- does this indicate any fix-up of BAR allocation?

[   91.975646] i915_bpo 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
Please perform the following experiments:

1. echo "freeze" > /sys/power/state (instead of echo "mem"), wake up the system, and verify that the device still works.

If it fails the same way, then the bug is in Linux, likely the device driver. If it works, then we've learned only that the driver is not killing the device -- as it follows the same flow for "mem" and "freeze".

2. Unload the driver, suspend to mem, resume, re-load the driver.

If this fails, then the problem is likely in the HW/BIOS, or something very strange about the interaction between PCI and resources on this platform with Linux.
Created attachment 214311 [details] Freeze log
So the adapter works after resume from "freeze", right?
1. With upstream 4.6.0-rc2, the system successfully goes to freeze and resumes; see the log attached. The adapter is also functional and can scan APs.

2. Unload driver --> suspend --> resume --> load driver: doesn't work.

3. Unload driver --> suspend --> resume --> "echo 1 > /sys/bus/pci/rescan" --> load driver: works.
OK, thanks!

We have PCIe port PM patches under discussion:

https://patchwork.kernel.org/patch/8924981/
https://patchwork.kernel.org/patch/8924961/
https://patchwork.kernel.org/patch/8924921/
https://patchwork.kernel.org/patch/8924941/

They may help here in theory. Any chance to try these (on top of 4.6-rc)?
I've tried these patches with v4.6-rc5; they didn't fix the issue. I've also tried the new kernel parameter (pcie_port_pm) with both options; it doesn't seem to make any difference.
Can you also run 'lspci -vv' as root before and after the failure? Also please attach an acpidump of the machine.
Created attachment 214341 [details]
lspci -vv results

This file contains lspci -vv output from the Ubuntu 4.4.0-18 kernel and the patched v4.6-rc5 kernel, both before and after S3.
We will provide acpidump in a private channel.
Enabled some more debugging info.

There was a question earlier whether resource assignment errors seen with another device during resume might have caused issues resuming the bus the Marvell WLAN card hangs off, and whether that eventually caused the Marvell driver (mwifiex_pcie) to be unable to access the MMIO region on the device to wake up the FW for a successful wake-up.

Please see the attached log; it seems the Marvell card (02:00.0) failure started during bus resume. Only after all bus resume had completed, towards the end of the wake process, did the OS get an ACPI notification for a bus check and start to assign resources for device 04:00.0, printing those resource assignment error messages. So they don't seem to be related.

<snip>
[  157.115455] e1000e 0000:00:1f.6: bus resume
[  157.115462] button LNXPWRBN:00: driver resume
[  157.115468] system 00:00: bus resume
[  157.115474] serial 00:01: bus resume
[  157.115525] mwifiex_pcie 0000:02:00.0: bus resume
[  157.115554] igb 0000:03:00.0: bus resume
[  157.115599] mwifiex_pcie 0000:02:00.0: Reading fw_status register failed
[  157.116152] e1000e 0000:00:1f.6: System wakeup disabled by ACPI
[  157.116164] e1000e 0000:00:1f.6: PME# disabled
[  157.116168] e1000e 0000:00:1f.6: enabling bus mastering
[  157.116254] igb 0000:03:00.0: System wakeup disabled by ACPI
[  157.116269] igb 0000:03:00.0: PME# disabled
[  157.116565] pci 0000:04:00.0: bus resume
<snip>
[  158.134451] PM: Adding info for mei:mei::bb875e12-cb58-4d14-ae93-8566183c66c7:01
[  158.281941] [drm] RC6 on
[  160.121710] mwifiex_pcie 0000:02:00.0: Firmware wakeup failed
[  160.128476] rfkill rfkill1: class resume
[  160.133449] rfkill rfkill2: class resume
[  160.174007] PM: resume of devices complete after 3059.632 msecs
<snip>
[  160.243624] sunrisepoint-pinctrl INT345D:00: completing power domain resume
[  160.251232] pci 0000:04:00.0: completing bus resume
[  160.256755] igb 0000:03:00.0: completing bus resume
[  160.262266] mwifiex_pcie 0000:02:00.0: completing bus resume
[  160.268581] e1000e 0000:00:1f.6: completing bus resume
[  160.274361] pci 0000:00:1f.4: completing bus resume
<snip>
[  160.360199] pcieport 0000:00:01.0: completing bus resume
[  160.366060] pci 0000:00:00.0: completing bus resume
[  160.371567] PM: Finishing wakeup.
[  160.371570] PM: Moving serio:serio1 to end of list
[  160.371571] serio serio1: Retrying from deferred list
[  160.371592] bus: 'serio': driver_probe_device: matched device serio1 with driver atkbd
[  160.371592] bus: 'serio': really_probe: probing driver atkbd with device serio1
[  160.371594] atkbd serio1: no default pinctrl state
[  160.371599] devices_kset: Moving serio1 to end of list
[  160.371769] pci 0000:04:00.0: scanning [bus 05-05] behind bridge, pass 0
[  160.371775] pci_bus 0000:05: scanning bus
[  160.371917] pci_bus 0000:05: bus scan returning with max=05
[  160.371925] pci 0000:04:00.0: scanning [bus 05-05] behind bridge, pass 1
<snip>
[  160.372026] pci 0000:04:00.0: BAR 14: no space for [mem size 0x00200000]
[  160.372026] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x00200000]
[  160.372027] pci 0000:04:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[  160.372028] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[  160.372028] pci 0000:04:00.0: BAR 13: no space for [io size 0x1000]
[  160.372029] pci 0000:04:00.0: BAR 13: failed to assign [io size 0x1000]
[  160.372030] pci 0000:04:00.0: BAR 14: no space for [mem size
<snip>
Created attachment 214381 [details] Marvel resume log
Re: comment 6 > unload driver --> suspend--> resume --> "echo 1 > /sys/bus/pci/rescan" --> > load driver, works. Is it possible to learn from Marvell how much time must pass in order for the PCI bus scan to succeed?
We don't expect any time delay to be required for the PCI rescan to succeed. We hooked up an ICE through JTAG to make sure the FW is in a healthy condition when the issue happens. Canonical also tried adding a PCI rescan to the Marvell driver before the driver accesses memory, and that proved both FW and driver work normally as expected.
Checking with the PCI bus analyzer, during the PCI rescan the PCI controller/bridge does not issue any PCI bus handshake or re-init with our device.

A few tests we have done so far:

1. Captured an Intel WLAN card's suspend/resume on Constantine. Reviewed and compared the trace with Kevin. The config cycles on both the Intel and the Marvell 8897 are similar. For suspend, after the host changes the device power state to D3 it sends a PME TurnOff message; KF sends PME ACK, followed by PM EnterL23 request, host ACK, and the PCIe link shuts down. For resume, it follows the standard bus enumeration process: assigning BAR addresses, clearing Device Control/Status, setting Link Control, and resetting correctable and uncorrectable error status registers.

2. Set up Ford (ATOM based, with L1 ASPM enabled). Performed a bus capture. The suspend process is the same as Constantine, except the host turns off the link before we get a chance to send the PM EnterL23 request. The resume process is the same as Constantine.

3. Did a PCIe capture to find out what the rescan command does. The bus trace shows this rescan didn't do anything to our PCI config registers (no read or write access). The PCIe config dump of the bridge controller (1c.0) before and after the rescan command also shows the kernel didn't touch the bridge controller at all. (PCI config space dump as attached.)

4. Tested the Marvell module on the PCIe x16 slot (North bridge PCIe port) instead of the South bridge port 1c.0 or 1c.1; suspend/resume works without any issues.
The last point indicates that this isn't really a kernel problem IMO.
The problem has been isolated down to the bridge controller's particular behavior, and a single PCI config register change to the bridge controller solves the suspend/resume issue ("setpci -s 1c.0 20.l=d010d000").

• Windows 10 setup is ready. Attached are the PCIe transactions and the config space dumps for Win 10 and Linux.
  o Compared the Linux PCI config settings with the Win 10 setup: no difference in PCI config in either the bridge or the MRVL device between Linux and Win10. The config space dump also lists the Intel Gigabit Ethernet controller (bus #3) and its connected bridge controller (1c.2) as an illustration of a non-prefetchable memory base/limit setting.

• Further testing by Laser and me revealed a difference in the bridge controller's behavior before and after suspend/resume, as shown in the following test log:
  o Before suspend, the bridge controller uses 0x24 as the upper address limit; changes in this register can block memory accesses from being forwarded to its secondary bus.
  o After resume, the bridge seems to incorrectly treat the secondary memory as non-prefetchable and uses 0x20 as the base address. Only setting this register to the proper offset re-enables memory access to the Marvell device.
  o After issuing the bus rescan command there is no change in the bridge controller's PCI config, and yet the secondary memory base/limit computation switches back to prefetchable mode.

It seems the bridge controller hardware uses the address setting in 0x24 to decode memory requests of the prefetchable memory type and 0x20 to decode the non-prefetchable memory type.

For the normal prefetchable memory setting:
0x20 = 0x0000fff0
0x24 = 0xd011d001

For the non-prefetchable memory setting:
0x20 = 0xd010d000
0x24 = 0x000ffff1

After the PCI register workaround, the settings can accommodate both prefetchable and non-prefetchable modes of the bridge controller's address decode and allow memory requests to be forwarded to the Marvell device:
0x20 = 0xd010d000
0x24 = 0xd011d001

We will need to find an explanation of the bridge controller's behavior difference before and after suspend/resume.
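For readers decoding the register values above: in a type-1 (bridge) header, each 16-bit base/limit field carries address bits [31:20] in its top 12 bits, and the limit's low 20 address bits are implied as all-ones. A small Python sketch of that decode (decode_bridge_window is a hypothetical helper, shown only to check the numbers quoted in this comment):

```python
def decode_bridge_window(reg32):
    """Decode a 32-bit read of a bridge base/limit register pair
    (e.g. offset 0x20 = Memory Base/Limit, 0x24 = Prefetchable
    Memory Base/Limit).  Low 16 bits hold the base field, high 16
    bits the limit field; bits [15:4] of each field map to address
    bits [31:20].  The low nibble of the prefetchable fields encodes
    32- vs 64-bit addressing and is ignored here."""
    base_field = reg32 & 0xFFFF
    limit_field = (reg32 >> 16) & 0xFFFF
    base = (base_field & 0xFFF0) << 16
    limit = ((limit_field & 0xFFF0) << 16) | 0xFFFFF
    return base, limit

# Prefetchable window "0x24 = 0xd011d001" decodes to
# (0xD0000000, 0xD01FFFFF) -- matching lspci's
# "Prefetchable memory behind bridge: d0000000-d01fffff".
# The workaround value written to 0x20 ("0xd010d000") decodes to
# the very same window, which is why programming it into the
# non-prefetchable registers lets the accesses through either way.
```

Note that the "normal" 0x20 = 0x0000fff0 decodes to base > limit, which is the standard encoding for a disabled window.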
It might be that certain init steps done under Win10 were missed by the Linux bus driver during the resume process. But the PCI config dumps of the bridge controller before and after the bus rescan didn't show any difference (attached). It might help to set up kernel debug or DCI tools to break on accesses of the bridge controller's config space during the bus rescan command, to find out what particular access corrects the bridge controller's memory decode.

bdx00@bdx00-Embedded-Box-PC-5000:~$ sudo bash
[sudo] password for bdx00:
root@bdx00-Embedded-Box-PC-5000:~# devmem2 0xd0000c44
/dev/mem opened.
Memory mapped at address 0x7f193e287000.
Value at address 0xD0000C44 (0x7f193e287c44): 0xFEDCBA00
(** Before suspend, memory read of the firmware signature is OK)

root@bdx00-Embedded-Box-PC-5000:~# lspci -xxv -s 1c.0
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root Port #2 (rev f1) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 17
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	Prefetchable memory behind bridge: 00000000d0000000-00000000d01fffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: Dell Sunrise Point-H PCI Express Root Port
	Capabilities: [a0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Access Control Services
	Capabilities: [220] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00: 86 80 11 a1 07 01 10 00 f1 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 02 02 00 f0 00 00 20
20: f0 ff 00 00 01 d0 11 d0 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 02 12 00

root@bdx00-Embedded-Box-PC-5000:~# setpci -s 1c.0 24.l=e900e900
(** BAD address in register 0x24 blocks memory access)
root@bdx00-Embedded-Box-PC-5000:~# devmem2 0xd0000c44
/dev/mem opened.
Memory mapped at address 0x7fc0c165a000.
Value at address 0xD0000C44 (0x7fc0c165ac44): 0xFFFFFFFF
(** FF means memory access failure)

root@bdx00-Embedded-Box-PC-5000:~# setpci -s 1c.0 24.l=d011d001
(** Memory access is OK after restoring the correct setting for prefetchable mode)
root@bdx00-Embedded-Box-PC-5000:~# devmem2 0xd0000c44
/dev/mem opened.
Memory mapped at address 0x7f53e60b7000.
Value at address 0xD0000C44 (0x7f53e60b7c44): 0xFEDCBA00

root@bdx00-Embedded-Box-PC-5000:~# echo mem > /sys/power/state
(** SUSPEND the system and then RESUME)
root@bdx00-Embedded-Box-PC-5000:~# devmem2 0xd0000c44
/dev/mem opened.
Memory mapped at address 0x7fb4da495000.
Value at address 0xD0000C44 (0x7fb4da495c44): 0xFFFFFFFF
(** Memory access failure as usual)

root@bdx00-Embedded-Box-PC-5000:~# setpci -s 1c.0 20.l=d010d000
(** Change 0x20 to a non-prefetchable memory type address setting)
root@bdx00-Embedded-Box-PC-5000:~# devmem2 0xd0000c44
/dev/mem opened.
Memory mapped at address 0x7ff83ef09000.
Value at address 0xD0000C44 (0x7ff83ef09c44): 0xFEDCBA00
(** Memory access is now working)

root@bdx00-Embedded-Box-PC-5000:~# setpci -s 1c.0 24.l=e900e900
(** The same BAD setting in 0x24 no longer has an effect; memory access still gets through)
root@bdx00-Embedded-Box-PC-5000:~# devmem2 0xd0000c44
/dev/mem opened.
Memory mapped at address 0x7fc11100a000.
Value at address 0xD0000C44 (0x7fc11100ac44): 0xFEDCBA00
root@bdx00-Embedded-Box-PC-5000:~#
Created attachment 215001 [details] Win 10 - PCI config space dump of bridge controller and Marvell WLAN device
Created attachment 215021 [details]
Ubuntu 16.04 - PCI config at suspend, resume and after bus rescan

The bus rescan command after the resume failure (echo 1 > /sys/bus/pci/rescan) did not produce any difference in bridge controller 1c.0, but the bridge switched back to prefetchable memory mode and re-enabled memory access to the Marvell device.
> It seems the bridge controller hardware uses the address setting in
> 0x24 to decode memory request for Prefetchable memory type and 0x20
> to decode non-Prefetchable memory type).

That's standard PCI bridge behavior. Sounds like 00:1c.0 isn't working correctly with the Marvell device after resume. This could be some Linux PCI resume issue, though I'm not aware of a problem like this, and the path should be fairly well-exercised. It could also be some interaction between the Intel bridge and the Marvell device related to bringing the link back up after resume, though I don't know why rescan would affect this.

Experiment #1:

1) Instrument raw_pci_read() and raw_pci_write() to log all config accesses for 00:1c.0 and 02:00.0 (device, config offset, data read/written).
2) Confirm that the Marvell device works ("devmem2 0xd0000c44").
3) Collect "lspci -vvxxx" output for 00:1c.0 and 02:00.0.
4) Collect entire dmesg log ("before-suspend").
5) Suspend.
6) Resume.
7) Confirm that the Marvell device doesn't work ("devmem2 0xd0000c44").
8) Collect "lspci -vvxxx" output for 00:1c.0 and 02:00.0.
9) Collect entire dmesg log ("after-resume").
10) Rescan ("echo 1 > /sys/bus/pci/rescan").
11) Confirm that the Marvell device works ("devmem2 0xd0000c44").
12) Collect "lspci -vvxxx" output for 00:1c.0 and 02:00.0.
13) Collect entire dmesg log ("after-rescan").

Experiment #2:

1) Replace the Marvell device with something else, e.g., an Intel NIC, so the same 00:1c.0 bridge leads to a different device.
2) Confirm that the replacement device works.
3) Suspend, resume.
4) See whether the replacement device works.
Rafael said experiment #2 has already been done, per comment #16, and the result was that with the Intel bridge leading to an Intel WLAN device, suspend/resume worked fine. With the same Intel bridge leading to a Marvell 8897, the Marvell device did not work after resume. Rescan apparently "fixes" the Marvell device after resume. Also per comment #16, a PCIe analyzer showed no activity to the Marvell device during rescan. This is what I would expect from the code, because the rescan calls pci_scan_single_device() to enumerate children of 00:1c.0, and pci_get_slot(02:00.0) finds the device we already know about, so we don't try to read any config space on bus 02. But the rescan *should* touch the 00:1c.0 bridge, and the dmesg diff between steps 9 and 13 of experiment #1 above should show us exactly what we did. Looking at pci_scan_bridge(), it looks like we will write PCI_BRIDGE_CONTROL (to disable Master Abort reporting), probably PCI_EXP_RTCAP (to turn on CRS), and maybe others. You could also use setpci to read & write these registers manually (without doing a rescan), and you might be able to figure out which one makes things start working.
The Intel team has been able to reproduce this issue on their Skylake validation platform and root-caused it. For all devices with prefetchable memory, the platform BIOS is required to restore the related prefetchable memory settings (0x24-0x2C) upon S3 resume.

The final analysis by Intel's Juswin, including my question to them, is listed below. In summary, Intel has updated the reference BIOS code and recommended that system vendors work with their preferred BIOS vendor to implement this fix. No kernel change is needed, as the fix will be implemented in the BIOS's S3 boot script. Case closed.

"- Confirm it's the real fix from BIOS change, not a WA
  o This is the fix in BIOS under platform code. Our CRB SKL BIOS did not restore the register for the root port. Intel has already added this in our CRB test BIOS (not RC) and it's proved working. We would suggest the customer work with their IBV for the change based on each specific platform design.

- Final fix on platform SBIOS code change
  o On the SKL platforms with a prefetchable-memory device attached on the root port, the customer BIOS will need to save these registers in the S3 boot script at S3 entry. The registers in question:

    Offset  Length (bytes)  Name
    0x24    2               Prefetchable Memory Base
    0x26    2               Prefetchable Memory Limit
    0x28    4               Prefetchable Base Upper 32 Bits
    0x2c    4               Prefetchable Limit Upper 32 Bits

- Q: The sequence described by Jamie involves several register writes. Does BIOS provide a higher-level function call for assigning the memory resource that includes all these writes? It seems the bridge controller just needs the writes to the upper address registers (64-bit type) for the prefetchable-mode address decoding to take effect. I am wondering whether the kernel bus driver needs to be updated in addition to the BIOS.

  A: No. We don't have a higher-level function for assigning this memory resource. This issue happens in S3. We just need to add the restore implementation in S3 resume, as S3 resume does not do the PCI enumeration. The bus driver design in Linux doesn't do that even at system boot, which is different from the Windows driver."
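Conceptually, the BIOS-side fix described above boils down to saving the four prefetchable-window registers at S3 entry and replaying them on resume. A minimal Python sketch of that idea (not actual boot-script code; read_config/write_config stand in for whatever config-space accessors the firmware uses):

```python
# Offsets/lengths from Intel's table above (PCI-to-PCI bridge, type-1 header).
PREF_WINDOW_REGS = [
    (0x24, 2),  # Prefetchable Memory Base
    (0x26, 2),  # Prefetchable Memory Limit
    (0x28, 4),  # Prefetchable Base Upper 32 Bits
    (0x2C, 4),  # Prefetchable Limit Upper 32 Bits
]

def save_pref_window(read_config):
    """Snapshot the window registers at S3 entry.
    read_config(offset, size) -> int."""
    return {off: read_config(off, size) for off, size in PREF_WINDOW_REGS}

def restore_pref_window(saved, write_config):
    """Replay the saved values, as the S3 boot script would on resume.
    write_config(offset, size, value)."""
    for off, size in PREF_WINDOW_REGS:
        write_config(off, size, saved[off])
```

The point of the replay is that the write itself (to the upper-address registers in particular) re-arms the bridge's prefetchable decode, even when the values read back unchanged.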
This issue has struck again in bug #201069, where we are finally going to work around it in Linux. Some notes from reading the history above:

1. Comment #18 suggests that rewriting reg 0x20 (PCI_MEMORY_BASE) solves the problem (which is not true in my case). But then comment #23 focuses on the need to rewrite register range 0x24-0x2c instead, and also mentions that just the "upper bits" registers are the ones that really need to be rewritten (which agrees with my observations).

2. It was observed above that the PCI config space contents are the same before suspend and after resume. But it's not clear above if the value-restoring upon resume was done by the hardware, the BIOS, or pci_restore_config_space() being called during early resume and restoring some of the values (it reads the values first and only writes the ones that have changed).

3. Comment #23 suggests that Intel published BIOS changes to save and restore these registers. Asus BIOS engineers told us that their BIOS does do this, and pointed to this bug report as reference. However, testing on an Asus X542UQ here, I can see that (before pci_restore_config_space() is called) the words at 0x24 and 0x26 do not have their pre-suspend values. So it seems Intel's BIOS solution is somehow ineffective or has not been correctly propagated to vendors (we also see this on an HP laptop).
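For point 2, the read-first, write-only-if-changed behavior can be sketched like this (a simplified illustration of the idea, not the actual pci_restore_config_space() kernel code):

```python
def restore_config_space(saved_dwords, read_dword, write_dword):
    """saved_dwords: {config offset: dword value saved before suspend}.
    Read each dword first and write it back only if the current value
    differs from the saved one.  Returns the offsets actually written,
    which is what makes this restore invisible when the values already
    match (e.g. because the BIOS or hardware restored them)."""
    written = []
    for off, want in saved_dwords.items():
        if read_dword(off) != want:
            write_dword(off, want)
            written.append(off)
    return written
```

The consequence for this bug: if the register contents survive resume unchanged, this style of restore performs no write at all, so a bridge that needs the write itself to re-arm its decode never gets one.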