Bug 208891 - Thunderbolt hotplug fails on HP x360 13t-aw000/86FA with HP Thunderbolt 3 Dock
Summary: Thunderbolt hotplug fails on HP x360 13t-aw000/86FA with HP Thunderbolt 3 Dock
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P1 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-12 22:17 UTC by Matt Turner
Modified: 2020-11-01 16:20 UTC
CC List: 2 users

See Also:
Kernel Version: 5.8.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel .config (133.43 KB, text/plain)
2020-08-12 22:17 UTC, Matt Turner
coldplugged-lspci (2.00 KB, text/plain)
2020-08-12 22:18 UTC, Matt Turner
unplugged-lspci (1.58 KB, text/plain)
2020-08-12 22:18 UTC, Matt Turner
hotplugged-lspci (2.00 KB, text/plain)
2020-08-12 22:18 UTC, Matt Turner
coldplugged-iomem (4.30 KB, text/plain)
2020-08-12 22:19 UTC, Matt Turner
unplugged-iomem (3.89 KB, text/plain)
2020-08-12 22:19 UTC, Matt Turner
hotplugged-iomem (4.54 KB, text/plain)
2020-08-12 22:19 UTC, Matt Turner
coldplugged-dmesg (83.21 KB, text/plain)
2020-08-12 22:20 UTC, Matt Turner
unplugged-dmesg (85.23 KB, text/plain)
2020-08-12 22:20 UTC, Matt Turner
hotplugged-dmesg (98.99 KB, text/plain)
2020-08-12 22:20 UTC, Matt Turner
hotplugged-not-previously-attached-dmesg (82.74 KB, text/plain)
2020-08-12 22:34 UTC, Matt Turner
hotplugged-not-previously-attached-lspci (72.04 KB, text/plain)
2020-08-12 22:34 UTC, Matt Turner
Kernel .config diff enabling CONFIG_INTEL_IOMMU=y (1.93 KB, text/plain)
2020-08-13 17:32 UTC, Matt Turner
dmesg hotplug with CONFIG_PCI_DEBUG=y (210.26 KB, text/plain)
2020-08-14 07:09 UTC, Matt Turner
sudo lspci -vv after hotplug (72.02 KB, text/plain)
2020-08-14 07:09 UTC, Matt Turner
dmesg coldplug with CONFIG_PCI_DEBUG=y (272.11 KB, text/plain)
2020-08-14 07:15 UTC, Matt Turner
sudo lspci -vv after coldplug (72.09 KB, text/plain)
2020-08-14 07:15 UTC, Matt Turner
Do not skip resource assignment (565 bytes, patch)
2020-08-19 14:00 UTC, Mika Westerberg
hotplugged-dmesg failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24 (174.53 KB, text/plain)
2020-08-27 19:57 UTC, Matt Turner
hotplugged-dmesg success with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24 (186.77 KB, text/plain)
2020-08-27 19:58 UTC, Matt Turner
sudo lspci -vv failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24 (72.85 KB, text/plain)
2020-08-27 19:59 UTC, Matt Turner
sudo lspci -vv success with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24 (72.93 KB, text/plain)
2020-08-27 20:00 UTC, Matt Turner
Save PCI bridge state right after setup (526 bytes, patch)
2020-08-31 13:25 UTC, Mika Westerberg
hotplugged-dmesg failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #33 (183.57 KB, text/plain)
2020-08-31 19:30 UTC, Matt Turner
sudo lspci -vv failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #33 (72.85 KB, text/plain)
2020-08-31 19:30 UTC, Matt Turner
Disable runtime PM from xHCI and PCI ports (2.16 KB, patch)
2020-09-01 07:06 UTC, Mika Westerberg
hotplugged-dmesg failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #38 (371.23 KB, text/plain)
2020-09-29 19:32 UTC, Matt Turner
sudo lspci -vv failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #38 (72.85 KB, text/plain)
2020-09-29 19:33 UTC, Matt Turner

Description Matt Turner 2020-08-12 22:17:16 UTC
Created attachment 290843 [details]
Kernel .config

My HP Spectre x360 (Ice Lake) laptop doesn't successfully hotplug with an HP Thunderbolt 3 dock. I'm using 5.8.0-rc7-next-20200729. The dock's firmware has been updated from a system running Windows. The HP laptop is using the latest BIOS as of last week, and nvm_version is "80.0":

% cat /sys/devices/pci0000:00/0000:00:0d.2/domain0/0-0/nvm_version
80.0

Cold booting the system with the dock attached provides working ethernet, USB hub, etc. Unplugging and replugging the dock does not work, leaving it only providing power.

Attached are dmesg, lspci -vvnnt output, and /proc/iomem captured (1) at coldboot with the dock attached, (2) after unplugging the dock, (3) after hotplugging the dock, and (4) after hotplugging the dock when it had not been previously attached; and my kernel .config.

For search engines, the most apparent failure in dmesg is:

xhci_hcd 0000:2e:00.0: enabling device (0000 -> 0002)
xhci_hcd 0000:2e:00.0: xHCI Host Controller
xhci_hcd 0000:2e:00.0: new USB bus registered, assigned bus number 5
xhci_hcd 0000:2e:00.0: Host halt failed, -19
xhci_hcd 0000:2e:00.0: can't setup: -19
xhci_hcd 0000:2e:00.0: USB bus 5 deregistered
xhci_hcd 0000:2e:00.0: init 0000:2e:00.0 fail, -19
tg3 0000:2f:00.0: enabling device (0000 -> 0002)
tg3 0000:2f:00.0: phy probe failed, err -19
tg3 0000:2f:00.0: Problem fetching invariants of chip, aborting
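
For readers decoding the log above: the -19 in these messages is a negated errno value. A minimal Python sketch, assuming a Linux host (errno numbering is platform-specific) and using a helper name that is purely illustrative, maps it to its symbolic name:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel return code (e.g. -19) to 'NAME (message)'."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


print(describe_kernel_error(-19))
```

On Linux, 19 is ENODEV ("no such device"), which would be consistent with the drivers being unable to reach the device registers.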
Comment 1 Matt Turner 2020-08-12 22:18:00 UTC
Created attachment 290845 [details]
coldplugged-lspci
Comment 2 Matt Turner 2020-08-12 22:18:23 UTC
Created attachment 290847 [details]
unplugged-lspci
Comment 3 Matt Turner 2020-08-12 22:18:48 UTC
Created attachment 290849 [details]
hotplugged-lspci
Comment 4 Matt Turner 2020-08-12 22:19:11 UTC
Created attachment 290851 [details]
coldplugged-iomem
Comment 5 Matt Turner 2020-08-12 22:19:31 UTC
Created attachment 290853 [details]
unplugged-iomem
Comment 6 Matt Turner 2020-08-12 22:19:55 UTC
Created attachment 290855 [details]
hotplugged-iomem
Comment 7 Matt Turner 2020-08-12 22:20:15 UTC
Created attachment 290857 [details]
coldplugged-dmesg
Comment 8 Matt Turner 2020-08-12 22:20:35 UTC
Created attachment 290859 [details]
unplugged-dmesg
Comment 9 Matt Turner 2020-08-12 22:20:51 UTC
Created attachment 290861 [details]
hotplugged-dmesg
Comment 10 Matt Turner 2020-08-12 22:34:04 UTC
Created attachment 290863 [details]
hotplugged-not-previously-attached-dmesg
Comment 11 Matt Turner 2020-08-12 22:34:24 UTC
Created attachment 290865 [details]
hotplugged-not-previously-attached-lspci
Comment 12 Matt Turner 2020-08-12 22:39:18 UTC
With some help from Ben Widawsky, we noticed that PCI device 2d:04.0 (Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578]) doesn't get IO space allocated correctly on hotplug:

mattst88@hp-x360 ~ % head working 
2d:04.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] [8086:1578] (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 144
	Bus: primary=2d, secondary=32, subordinate=32, sec-latency=0
	I/O behind bridge: 00005000-00005fff [size=4K]
	Memory behind bridge: [disabled]
	Prefetchable memory behind bridge: [disabled]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
mattst88@hp-x360 ~ % head broken    
2d:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015] (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 169
	Bus: primary=2d, secondary=32, subordinate=56, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: 68400000-741fffff [size=190M]
	Prefetchable memory behind bridge: 0000006000400000-000000601bffffff [size=444M]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-


Additionally, when I do 'echo 1 > /sys/bus/pci/rescan' I see this in dmesg:

[Aug12 15:37] pcieport 0000:2d:04.0: bridge window [io  0x1000-0x0fff] to [bus 32-56] add_size 1000
[  +0.000006] pcieport 0000:2d:04.0: BAR 7: no space for [io  size 0x1000]
[  +0.000001] pcieport 0000:2d:04.0: BAR 7: failed to assign [io  size 0x1000]
[  +0.000001] pcieport 0000:2d:04.0: BAR 7: no space for [io  size 0x1000]
[  +0.000000] pcieport 0000:2d:04.0: BAR 7: failed to assign [io  size 0x1000]
[  +0.079302] pci_bus 0000:2e: Allocating resources
[  +0.000017] pci_bus 0000:2f: Allocating resources
[  +0.000016] pci_bus 0000:30: Allocating resources
[  +0.000009] pci_bus 0000:31: Allocating resources
[  +0.000085] pci_bus 0000:2e: Allocating resources
[  +0.000015] pci_bus 0000:2f: Allocating resources
[  +0.000015] pci_bus 0000:30: Allocating resources
[  +0.000010] pci_bus 0000:31: Allocating resources

which seems to corroborate that point.

And 2d:04 appears to be a critical device in the tree, according to lspci -t:

           +-07.1-[2c-56]----00.0-[2d-56]--+-00.0-[2e]----00.0  ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142]
           |                               +-01.0-[2f]----00.0  Broadcom Inc. and subsidiaries NetXtreme BCM57762 Gigabit Ethernet PCIe [14e4:1682]
           |                               +-02.0-[30]--
           |                               +-03.0-[31]--
           |                               \-04.0-[32-56]--
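
The working/broken comparison above can be automated when the lspci captures are long. The following is a minimal sketch, assuming lspci -v/-vv style text in which device headers are unindented and attribute lines are indented; the function name and the abbreviated sample strings are illustrative only:

```python
import re

# Matches the three bridge window lines printed by lspci -v / -vv.
WINDOW_RE = re.compile(
    r"^\s*(I/O|Memory|Prefetchable memory) behind bridge:\s*(.+?)\s*$"
)


def bridge_windows(lspci_text: str, device: str) -> dict:
    """Return the 'behind bridge' values for one device, keyed by window type."""
    windows = {}
    in_device = False
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Device header lines are unindented and start with the bus address.
            in_device = line.startswith(device + " ")
            continue
        if in_device:
            match = WINDOW_RE.match(line)
            if match:
                windows[match.group(1)] = match.group(2)
    return windows


# Sample text abbreviated from the two captures quoted in this comment.
working = """2d:04.0 PCI bridge [0604]: Intel Corporation DSL6540 Thunderbolt 3 Bridge
\tI/O behind bridge: 00005000-00005fff [size=4K]
\tMemory behind bridge: [disabled]
\tPrefetchable memory behind bridge: [disabled]
"""
broken = """2d:04.0 PCI bridge: Intel Corporation DSL6540 Thunderbolt 3 Bridge
\tI/O behind bridge: [disabled]
\tMemory behind bridge: 68400000-741fffff [size=190M]
\tPrefetchable memory behind bridge: 0000006000400000-000000601bffffff [size=444M]
"""

before = bridge_windows(working, "2d:04.0")
after = bridge_windows(broken, "2d:04.0")
for kind in ("I/O", "Memory", "Prefetchable memory"):
    if before.get(kind) != after.get(kind):
        print(f"{kind}: {before.get(kind)} -> {after.get(kind)}")
```

Filtering on the three window lines keeps the comparison focused on resource assignment and ignores unrelated differences such as IRQ numbers.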
Comment 13 Mika Westerberg 2020-08-13 08:18:53 UTC
IO space is not necessary with PCIe devices so that should be fine and expected.

However, the MMIO resources below the dock's PCIe switch upstream port look odd, and I can't see any failures in the logs.

One thing to try is enabling the IOMMU. ICL more or less expects it to be enabled, so in theory, if the BIOS leaves the IOMMU configured, the problem could manifest like this. Can you set CONFIG_INTEL_IOMMU=y in your .config and try again?
Comment 14 Matt Turner 2020-08-13 17:32:31 UTC
Created attachment 290885 [details]
Kernel .config diff enabling CONFIG_INTEL_IOMMU=y

I enabled CONFIG_INTEL_IOMMU=y and tried again, but with the same results. :(
Comment 15 Mika Westerberg 2020-08-14 06:54:41 UTC
Can you attach the full dmesg and the output of 'sudo lspci -vv' here with the IOMMU enabled? Please follow the same steps: connect the dock only after the system has booted, then capture the dmesg and lspci output.
Comment 16 Mika Westerberg 2020-08-14 06:56:08 UTC
While you are at it, can you also enable CONFIG_PCI_DEBUG=y before you take the dmesg, so we can hopefully see some additional messages?
Comment 17 Matt Turner 2020-08-14 07:09:01 UTC
Created attachment 290899 [details]
dmesg hotplug with CONFIG_PCI_DEBUG=y
Comment 18 Matt Turner 2020-08-14 07:09:29 UTC
Created attachment 290901 [details]
sudo lspci -vv after hotplug
Comment 19 Matt Turner 2020-08-14 07:15:01 UTC
Created attachment 290903 [details]
dmesg coldplug with CONFIG_PCI_DEBUG=y
Comment 20 Matt Turner 2020-08-14 07:15:36 UTC
Created attachment 290905 [details]
sudo lspci -vv after coldplug
Comment 21 Mika Westerberg 2020-08-14 12:51:17 UTC
Thanks for the logs. For some reason the two downstream PCIe ports (2d:00.0 and 2d:01.0) that lead to the xHCI and the NIC get their bridge windows reset to 0 and this prevents drivers from accessing their MMIO registers. I also see that you are not running the mainline kernel so can you take v5.8 vanilla kernel and try that and add "pcie_port_pm=off" to the kernel command line to disable runtime PM of those ports.
Comment 22 Matt Turner 2020-08-14 21:36:33 UTC
(In reply to Mika Westerberg from comment #21)
> Thanks for the logs. For some reason the two downstream PCIe ports (2d:00.0
> and 2d:01.0) that lead to the xHCI and the NIC get their bridge windows
> reset to 0 and this prevents drivers from accessing their MMIO registers. I
> also see that you are not running the mainline kernel so can you take v5.8
> vanilla kernel and try that and add "pcie_port_pm=off" to the kernel command
> line to disable runtime PM of those ports.

Tried with v5.8.1. Was previously using 5.8.0-rc7-next-20200729 because I expected to be asked to test linux-next.

Anyway, pcie_port_pm=off didn't help. Neither did pcie_ports=native.

I also tried adding

+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,  0x1578, quirk_no_bus_reset);

to drivers/pci/quirks.c on a whim because of something I saw in a google search, but of course that didn't help either.

Is the fact that it works if I attach the dock and then boot the system not indicative of something? Is the BIOS/EFI setup tasked with programming something that the Thunderbolt driver might be failing to program?

I just noticed something odd. Coldplugged with the dock working, I can suspend and resume and it will continue working. But if I unplug and replug the dock while the system is suspended, it fails to work after resume.

Doesn't that indicate that the thunderbolt firmware is doing something wrong?
Comment 23 Mika Westerberg 2020-08-17 10:43:44 UTC
When you boot the system with device connected it is the BIOS that configures the PCIe devices. When you hot-plug the device to the running system it is the kernel PCI stack that does the configuration (no Thunderbolt driver is even involved here, it is just plain PCIe).

The Linux PCI stack should be able to do this, but for some reason it does not work as expected on your particular system. It configures everything successfully, but immediately afterwards the two downstream PCIe ports lose the values programmed into their bridge window registers. I suspected that runtime PM kicks in here, but apparently that is not the case.
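
One way to watch for runtime PM activity on the two ports while reproducing the problem is to poll their sysfs power attributes. This is a minimal sketch, assuming the standard power/control and power/runtime_status attributes under /sys/bus/pci/devices; the function name is illustrative and the device addresses are the ones from this report:

```python
from pathlib import Path


def runtime_pm_state(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read the runtime PM attributes of one PCI device.

    Returns None for an attribute that is not present (for example when
    the device is not enumerated).
    """
    power_dir = Path(sysfs_root) / bdf / "power"
    state = {}
    for attr in ("control", "runtime_status"):
        path = power_dir / attr
        state[attr] = path.read_text().strip() if path.exists() else None
    return state


# The two downstream ports leading to the xHCI controller and the NIC.
for bdf in ("0000:2d:00.0", "0000:2d:01.0"):
    print(bdf, runtime_pm_state(bdf))
```

A control value of "auto" means the device may be runtime suspended; "on" keeps it active.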
Comment 24 Mika Westerberg 2020-08-19 14:00:48 UTC
Created attachment 292029 [details]
Do not skip resource assignment

Probably does not help but just in case, can you try the attached patch and see if it makes any difference? There is one device without PCI class in that system and it should not affect resource allocation of devices behind TBT but better to check.
Comment 25 Matt Turner 2020-08-23 19:02:50 UTC
Nope, no such luck :(
Comment 26 Mika Westerberg 2020-08-24 11:33:43 UTC
OK, can you then add "initcall_debug" to the command line and try again (with CONFIG_PCI_DEBUG=y as well). Then attach full dmesg.
Comment 27 Matt Turner 2020-08-27 19:57:33 UTC
Created attachment 292189 [details]
hotplugged-dmesg failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24
Comment 28 Matt Turner 2020-08-27 19:58:11 UTC
Created attachment 292191 [details]
hotplugged-dmesg success with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24
Comment 29 Matt Turner 2020-08-27 19:59:36 UTC
Created attachment 292193 [details]
sudo lspci -vv failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24
Comment 30 Matt Turner 2020-08-27 20:00:05 UTC
Created attachment 292195 [details]
sudo lspci -vv success with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #24
Comment 31 Matt Turner 2020-08-27 20:06:40 UTC
Unexpectedly, hotplug worked a few times.

I've attached dmesg and sudo lspci -vv output from two hotplug attempts with v5.8.3, CONFIG_PCI_DEBUG=y, initcall_debug, and the patch from comment #24 applied -- one that succeeded and one that failed.

Again we see the same pattern in lspci -vv output:

--- lspci-patched-failure 2020-08-27 12:54:22.300504263 -0700
+++ lspci-patched-success 2020-08-27 12:47:04.525430133 -0700
@@ -654,9 +654,9 @@
 	Latency: 0
 	Interrupt: pin A routed to IRQ 167
 	Bus: primary=2d, secondary=2e, subordinate=2e, sec-latency=0
-	I/O behind bridge: 00000000-00000fff [size=4K]
-	Memory behind bridge: 00000000-000fffff [size=1M]
-	Prefetchable memory behind bridge: 0000000000000000-00000000000fffff [size=1M]
+	I/O behind bridge: 00005000-00005fff [size=4K]
+	Memory behind bridge: 68000000-680fffff [size=1M]
+	Prefetchable memory behind bridge: 0000006000000000-00000060000fffff [size=1M]
 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Comment 32 Mika Westerberg 2020-08-31 13:24:26 UTC
Thanks for the logs. I think I know what is going on. From the failure log:

Registers get saved:
[   80.292442] pci 0000:2d:00.0: saving config space at offset 0x1c (reading 0x101)
[   80.292446] pci 0000:2d:00.0: saving config space at offset 0x20 (reading 0x0)
[   80.292450] pci 0000:2d:00.0: saving config space at offset 0x24 (reading 0x10001)
[   80.292454] pci 0000:2d:00.0: saving config space at offset 0x28 (reading 0x0)
[   80.292458] pci 0000:2d:00.0: saving config space at offset 0x2c (reading 0x0)

Resources are assigned:
[   80.293725] pci 0000:2d:00.0: BAR 8: assigned [mem 0x68000000-0x680fffff]
[   80.293727] pci 0000:2d:00.0: BAR 9: assigned [mem 0x6000000000-0x60000fffff 64bit pref]
[   80.293752] pci 0000:2d:00.0: BAR 7: assigned [io  0x5000-0x5fff]
...
[   80.293803] pci 0000:2d:00.0: PCI bridge to [bus 2e]
[   80.293807] pci 0000:2d:00.0:   bridge window [io  0x5000-0x5fff]
[   80.293816] pci 0000:2d:00.0:   bridge window [mem 0x68000000-0x680fffff]
[   80.293823] pci 0000:2d:00.0:   bridge window [mem 0x6000000000-0x60000fffff 64bit pref]

Note that there is no save happening here. Then shortly after there is register restore:

[   80.294748] pcieport 0000:2d:00.0: runtime IRQ mapping not provided by arch
[   80.294830] pcieport 0000:2d:00.0: restoring config space at offset 0x2c (was 0x60, writing 0x0)
[   80.294835] pcieport 0000:2d:00.0: restoring config space at offset 0x28 (was 0x60, writing 0x0)
[   80.294839] pcieport 0000:2d:00.0: restoring config space at offset 0x24 (was 0x10001, writing 0x10001)
[   80.294844] pcieport 0000:2d:00.0: restoring config space at offset 0x20 (was 0x68006800, writing 0x0)
                                                                                 ^^^^^^^^^^          ^^^
[   80.294848] pcieport 0000:2d:00.0: restoring config space at offset 0x1c (was 0x5151, writing 0x101)

This ends up clearing the bridge window registers of the 2d:00.0 downstream port. I guess this does not always happen because it depends on timing.
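
To make the register values in the restore log easier to read, here is a minimal sketch that decodes them into bridge windows. It assumes the standard PCI-to-PCI bridge (type 1) header layout, assumes the I/O window fits in 16 bits (the upper I/O registers at offset 0x30 are ignored), and uses illustrative function names:

```python
def decode_io_window(reg_1c: int) -> tuple:
    """Decode the I/O window from the dword at config offset 0x1c."""
    io_base = reg_1c & 0xFF          # byte 0x1c: I/O Base
    io_limit = (reg_1c >> 8) & 0xFF  # byte 0x1d: I/O Limit
    # The upper nibble of each byte holds address bits 15:12.
    return (io_base & 0xF0) << 8, ((io_limit & 0xF0) << 8) | 0xFFF


def decode_mem_window(reg_20: int) -> tuple:
    """Decode the memory window from the dword at config offset 0x20."""
    mem_base = reg_20 & 0xFFFF
    mem_limit = (reg_20 >> 16) & 0xFFFF
    # The upper 12 bits of each field hold address bits 31:20.
    return (mem_base & 0xFFF0) << 16, ((mem_limit & 0xFFF0) << 16) | 0xFFFFF


def decode_pref_window(reg_24: int, reg_28: int, reg_2c: int) -> tuple:
    """Decode the 64-bit prefetchable window from offsets 0x24, 0x28, 0x2c."""
    # The low dword uses the same field layout as the memory window;
    # 0x28 and 0x2c carry the upper 32 bits of base and limit.
    base, limit = decode_mem_window(reg_24)
    return (reg_28 << 32) | base, (reg_2c << 32) | limit


# "was" values (just assigned) versus the stale values written back.
samples = (
    ("before restore", 0x5151, 0x68006800, 0x10001, 0x60, 0x60),
    ("after restore", 0x101, 0x0, 0x10001, 0x0, 0x0),
)
for label, r1c, r20, r24, r28, r2c in samples:
    io = decode_io_window(r1c)
    mem = decode_mem_window(r20)
    pref = decode_pref_window(r24, r28, r2c)
    print(label)
    print(f"  io   {io[0]:#06x}-{io[1]:#06x}")
    print(f"  mem  {mem[0]:#010x}-{mem[1]:#010x}")
    print(f"  pref {pref[0]:#012x}-{pref[1]:#012x}")
```

The "was" values decode to the windows the kernel had just assigned (io 0x5000-0x5fff, mem 0x68000000-0x680fffff, pref 0x6000000000-0x60000fffff), while the restored values decode to windows based at address 0.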
Comment 33 Mika Westerberg 2020-08-31 13:25:57 UTC
Created attachment 292259 [details]
Save PCI bridge state right after setup

Can you try the attached hack patch and see if it makes the issue go away? At least then we know that the theory is correct.
Comment 34 Matt Turner 2020-08-31 19:30:05 UTC
Created attachment 292269 [details]
hotplugged-dmesg failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #33
Comment 35 Matt Turner 2020-08-31 19:30:39 UTC
Created attachment 292271 [details]
sudo lspci -vv failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #33
Comment 36 Matt Turner 2020-08-31 19:37:29 UTC
Dang, doesn't work, but the lspci output looks like we're getting the right memory addresses (diffing against the attached lspci -vv success output).

--- lspci	2020-08-31 12:16:11.919502424 -0700
+++ lspci-patched-success	2020-08-27 12:47:04.525430133 -0700
@@ -1025,13 +1025,14 @@
 
 2e:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142] (prog-if 30 [XHCI])
 	Subsystem: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller [1b21:1142]
-	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
+	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+	Latency: 0, Cache Line Size: 64 bytes
 	Interrupt: pin A routed to IRQ 16
-	Region 0: Memory at 68000000 (64-bit, non-prefetchable) [virtual] [size=32K]
+	Region 0: Memory at 68000000 (64-bit, non-prefetchable) [size=32K]
 	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
 		Address: 0000000000000000  Data: 0000
-	Capabilities: [68] MSI-X: Enable- Count=8 Masked-
+	Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
 		Vector table: BAR=0 offset=00002000
 		PBA: BAR=0 offset=00002080
 	Capabilities: [78] Power Management version 3
@@ -1071,14 +1072,16 @@
 			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
 			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
 			Status:	NegoPending- InProgress-
+	Kernel driver in use: xhci_hcd
 
 2f:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM57762 Gigabit Ethernet PCIe [14e4:1682] (rev 01)
 	Subsystem: Broadcom Inc. and subsidiaries NetXtreme BCM57762 Gigabit Ethernet PCIe [14e4:1682]
-	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
+	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+	Latency: 0
 	Interrupt: pin A routed to IRQ 17
-	Region 0: Memory at 6000100000 (64-bit, prefetchable) [virtual] [size=64K]
-	Region 2: Memory at 6000110000 (64-bit, prefetchable) [virtual] [size=64K]
+	Region 0: Memory at 6000100000 (64-bit, prefetchable) [size=64K]
+	Region 2: Memory at 6000110000 (64-bit, prefetchable) [size=64K]
 	Capabilities: [48] Power Management version 3
 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-


And the diff against the lspci -vv hotplug failure attachment:

--- lspci	2020-08-31 12:16:11.919502424 -0700
+++ lspci-patched-failure	2020-08-27 12:54:22.300504263 -0700
@@ -654,9 +654,9 @@
 	Latency: 0
 	Interrupt: pin A routed to IRQ 167
 	Bus: primary=2d, secondary=2e, subordinate=2e, sec-latency=0
-	I/O behind bridge: 00005000-00005fff [size=4K]
-	Memory behind bridge: 68000000-680fffff [size=1M]
-	Prefetchable memory behind bridge: 0000006000000000-00000060000fffff [size=1M]
+	I/O behind bridge: 00000000-00000fff [size=4K]
+	Memory behind bridge: 00000000-000fffff [size=1M]
+	Prefetchable memory behind bridge: 0000000000000000-00000000000fffff [size=1M]
 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
 	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-


So the memory addresses look right, but we're missing "[virtual]". Hopefully that indicates only a small remaining problem :)
Comment 37 Matt Turner 2020-08-31 19:40:31 UTC
(In reply to Matt Turner from comment #36)
> So the memory addresses look right, but we're missing "[virtual]". Hopefully
> that indicates only a small remaining problem :)

Sorry, I got this backwards. In the lspci output after hotplug success we *don't* have "[virtual]" and with the patch in #33 applied (and hotplug failure) we *do* have "[virtual]".
Comment 38 Mika Westerberg 2020-09-01 07:06:22 UTC
Created attachment 292273 [details]
Disable runtime PM from xHCI and PCI ports

Can you try this patch instead? It should disable all runtime PM for the affected drivers.

Please remove any previous patch and apply this directly on top of the mainline or stable.
Comment 39 Matt Turner 2020-09-04 20:28:04 UTC
No luck, didn't work. The lspci output again has

> I/O behind bridge: 00000000-00000fff [size=4K]
> Memory behind bridge: 00000000-000fffff [size=1M]
> Prefetchable memory behind bridge: 0000000000000000-00000000000fffff
> [size=1M]

and dmesg didn't look appreciably different. Should I bother posting them?
Comment 40 Mika Westerberg 2020-09-07 08:37:14 UTC
Yes please.
Comment 41 Matt Turner 2020-09-29 19:32:54 UTC
Created attachment 292713 [details]
hotplugged-dmesg failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #38
Comment 42 Matt Turner 2020-09-29 19:33:58 UTC
Created attachment 292715 [details]
sudo lspci -vv failure with CONFIG_PCI_DEBUG=y, initcall_debug, and patch from comment #38

Sorry for the delay. Please find attached the dmesg and lspci output you requested.
Comment 43 Matt Turner 2020-10-12 18:30:58 UTC
Do those logs show anything interesting?

I just updated to v5.9 and it looks like the same behavior to me. :(

This laptop is my development machine at Intel, and I'm leaving Intel in a few weeks. I'd love to see this fixed before I return the laptop. Perhaps if that can't be accomplished we can ship you the laptop when I leave.
Comment 44 Mika Westerberg 2020-10-20 07:33:22 UTC
Sorry for the delay from my side. I was on vacation last week.

From the logs I can see that the ports runtime suspend and resume, so the patch together with "pcie_port_pm=off" on the kernel command line should, in theory, work around the problem.

It would help if you can ship the device to me to our Finland office and if the dock is also Intel I suggest to ship that too so I can replicate the issue.
Comment 45 Matt Turner 2020-11-01 16:20:27 UTC
> It would help if you can ship the device to me to our Finland office and if
> the dock is also Intel I suggest to ship that too so I can replicate the
> issue.

Unfortunately my manager rejected that as an option, and I'm no longer in possession of the laptop.
