Bug 216000 - TBT storage hotplug fail when connect via thunderbolt dock
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: Intel Linux
Importance: P1 high
Assignee: drivers_pci@kernel-bugs.osdl.org
Reported: 2022-05-19 14:24 UTC by Chris Chiu
Modified: 2024-02-07 00:44 UTC

Kernel Version: 5.17 and later
Regression: No


Attachments
output of lspci -vvv (117.29 KB, text/plain)
2022-05-19 14:24 UTC, Chris Chiu
dmesg output when power-on with dock/tbt3 storage connected (110.59 KB, text/plain)
2022-05-19 14:26 UTC, Chris Chiu
acpidump output (3.09 MB, text/plain)
2022-05-23 07:10 UTC, Chris Chiu
Patch to extend PCI resources on initial scan (2.56 KB, patch)
2022-05-23 11:53 UTC, Mika Westerberg
dmesg output of patched kernel (108.63 KB, text/plain)
2022-05-24 02:53 UTC, Chris Chiu
verbose dmesg output of patched pci (217.21 KB, text/plain)
2022-05-26 01:28 UTC, Chris Chiu
Patch to extend PCI resources on initial scan with added debug (5.02 KB, patch)
2022-05-27 10:51 UTC, Mika Westerberg
dmesg output of patched kernel - 2 (216.43 KB, text/plain)
2022-05-30 12:32 UTC, Chris Chiu
Patch to extend PCI resources on initial scan v2 (2.08 KB, patch)
2022-06-03 13:49 UTC, Mika Westerberg
dmesg output of kernel with v2 scan (213.93 KB, text/plain)
2022-06-07 12:47 UTC, Chris Chiu
v2 patched kernel lspci output when power-on with dock connected (27.80 KB, text/plain)
2022-06-08 12:33 UTC, Chris Chiu
v2 patched kernel lspci output (connect dock after boot) (27.81 KB, text/plain)
2022-06-08 12:34 UTC, Chris Chiu
Patch to extend PCI resources on initial scan v3 (both bus numbers and memory/IO resources) (11.26 KB, patch)
2022-07-01 10:34 UTC, Mika Westerberg
v3 dmesg output when power-on with dock connected (27.83 KB, text/plain)
2022-07-04 08:41 UTC, Chris Chiu
PCI Reallocation fix for thunderbolt usecase (2.62 KB, patch)
2023-06-20 04:56 UTC, Sanath S
dmesg output with hotpluging an nvme thunderbolt enclosure. (143.34 KB, text/plain)
2023-08-07 22:05 UTC, Shmerl
dmesg output with pcie=realloc and etc. (150.01 KB, text/plain)
2023-08-07 22:07 UTC, Shmerl
lspci -vv (206.82 KB, text/plain)
2023-08-07 22:08 UTC, Shmerl
Probable fix - PCI bridges (2.78 KB, patch)
2023-08-09 06:23 UTC, Sanath S
dmesg logs and pci topology (36.34 KB, application/zstd)
2024-02-06 01:15 UTC, Shmerl

Description Chris Chiu 2022-05-19 14:24:55 UTC
Created attachment 300996 [details]
output of lspci -vvv

When I power on the ADL-HX laptop with the Thunderbolt dock (Dell WD22TB4) connected, the TBT storage is never detected if I connect it via the dock. The kernel log shows "No bus number available for hot-added bridge" as follows.

[  102.073815] pcieport 0000:3a:01.0: pciehp: Slot(1-1): Card present
[  102.073825] pcieport 0000:3a:01.0: pciehp: Slot(1-1): Link Up
[  102.210491] pci 0000:3c:00.0: [8086:15da] type 01 class 0x060400
[  102.210702] pci 0000:3c:00.0: enabling Extended Tags
[  102.211176] pci 0000:3c:00.0: supports D1 D2
[  102.211179] pci 0000:3c:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[  102.211510] pci 0000:3c:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:03:03.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[  102.211732] pci 0000:3c:00.0: Adding to iommu group 30
[  102.212093] pcieport 0000:3a:01.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[  102.212172] pci 0000:3c:00.0: No bus number available for hot-added bridge

The problem goes away if I boot with the kernel parameter "pci=realloc,assign-busses,hpbussize=0x33", but we expect `pciehp` to handle it without problems. Which part of the system should reserve the PCIe bus numbers for the hotplug device?
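
For triaging logs like the one above, a minimal Python sketch can list every bridge that hit this error and decode the `hpbussize` value from a kernel command line. This is not kernel code: the function names are made up for illustration, and the sketch assumes the dmesg text uses the same line format as quoted in this report.

```python
import re

# PCI address in domain:bus:device.function form, as printed by the kernel.
BDF = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"

# Matches the failure line quoted in this report (timestamp prefix is ignored).
NO_BUS_RE = re.compile(
    r"pci (?P<bdf>" + BDF + r"): No bus number available for hot-added bridge"
)


def bridges_without_bus_numbers(dmesg_text):
    """Return the PCI addresses of hot-added bridges that got no bus number."""
    return [m.group("bdf") for m in NO_BUS_RE.finditer(dmesg_text)]


def parse_hpbussize(cmdline):
    """Return the hpbussize value from a pci= kernel parameter, or None."""
    m = re.search(r"\bpci=\S*?hpbussize=(0x[0-9a-fA-F]+|\d+)", cmdline)
    return int(m.group(1), 0) if m else None


sample = (
    "[  102.211732] pci 0000:3c:00.0: Adding to iommu group 30\n"
    "[  102.212172] pci 0000:3c:00.0: No bus number available for hot-added bridge\n"
)
print(bridges_without_bus_numbers(sample))  # ['0000:3c:00.0']
print(parse_hpbussize("pci=realloc,assign-busses,hpbussize=0x33"))  # 51
```

The value `0x33` is 51 in decimal, so the workaround asks for 51 bus numbers per hotplug bridge.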
Comment 1 Chris Chiu 2022-05-19 14:26:56 UTC
Created attachment 300997 [details]
dmesg output when power-on with dock/tbt3 storage connected
Comment 2 Chris Chiu 2022-05-19 14:44:46 UTC
Forgot to mention, the tbt3 storage will be detected if I unplug the dock and replug it.
Comment 3 Mika Westerberg 2022-05-19 16:15:25 UTC
There are these ACPI related errors:

[    0.212467] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.RP13.PXSX._DSD], AE_ALREADY_EXISTS (20211217/dswload2-326)
[    0.212473] fbcon: Taking over console
[    0.212478] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20211217/psobject-220)
[    0.212488] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.RP21.PXSX._DSD], AE_ALREADY_EXISTS (20211217/dswload2-326)
[    0.212491] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20211217/psobject-220)
[    0.212500] ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.RP25.PXSX._DSD], AE_ALREADY_EXISTS (20211217/dswload2-326)
[    0.212503] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20211217/psobject-220)

and then the BIOS has not allocated the resources for all the Maple Ridge hotplug ports:

[    0.391195] pci 0000:39:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.391369] pci 0000:3a:00.0: [8086:0b26] type 01 class 0x060400
[    0.391475] pci 0000:3a:00.0: enabling Extended Tags
[    0.391716] pci 0000:3a:00.0: supports D1 D2
[    0.391716] pci 0000:3a:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.392007] pci 0000:3a:01.0: [8086:0b26] type 01 class 0x060400
[    0.392112] pci 0000:3a:01.0: enabling Extended Tags
[    0.392355] pci 0000:3a:01.0: supports D1 D2
[    0.392355] pci 0000:3a:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.392628] pci 0000:3a:02.0: [8086:0b26] type 01 class 0x060400
[    0.392733] pci 0000:3a:02.0: enabling Extended Tags
[    0.392976] pci 0000:3a:02.0: supports D1 D2
[    0.392976] pci 0000:3a:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.393260] pci 0000:3a:03.0: [8086:0b26] type 01 class 0x060400
[    0.393365] pci 0000:3a:03.0: enabling Extended Tags
[    0.393608] pci 0000:3a:03.0: supports D1 D2
[    0.393608] pci 0000:3a:03.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.393880] pci 0000:3a:04.0: [8086:0b26] type 01 class 0x060400
[    0.393985] pci 0000:3a:04.0: enabling Extended Tags
[    0.394225] pci 0000:3a:04.0: supports D1 D2
[    0.394226] pci 0000:3a:04.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.394515] pci 0000:39:00.0: PCI bridge to [bus 3a-6b]
[    0.394527] pci 0000:39:00.0:   bridge window [io  0x0000-0x0fff]
[    0.394532] pci 0000:39:00.0:   bridge window [mem 0x00000000-0x000fffff]
[    0.394544] pci 0000:39:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[    0.394547] pci 0000:3a:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.394561] pci 0000:3a:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.394575] pci 0000:3a:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.394589] pci 0000:3a:03.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.394603] pci 0000:3a:04.0: bridge configuration invalid ([bus 00-00]), reconfiguring

The OS expects these to be properly assigned by the BIOS, so I would start by contacting the vendor and asking for a BIOS update.
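
To get a quick overview of which bridges the BIOS left unconfigured and how many bus numbers each configured bridge covers, the log lines above can be summarized with a small script. The following is a rough Python sketch under the assumption that the log format matches the excerpt; `summarize` and `bus_count` are illustrative names, not kernel functions.

```python
import re

BDF = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"

# "bridge configuration invalid ([bus 00-00])" marks a bridge the firmware did not set up.
INVALID_RE = re.compile(
    r"pci (?P<bdf>" + BDF + r"): bridge configuration invalid \(\[bus 00-00\]\)"
)
# "PCI bridge to [bus 3a-6b]" or "PCI bridge to [bus 43]" (numbers are hexadecimal).
RANGE_RE = re.compile(
    r"pci (?P<bdf>" + BDF + r"): PCI bridge to "
    r"\[bus (?P<sec>[0-9a-f]{2})(?:-(?P<sub>[0-9a-f]{2}))?\]"
)


def bus_count(secondary, subordinate):
    """Number of bus numbers in the inclusive range secondary..subordinate."""
    return subordinate - secondary + 1


def summarize(dmesg_text):
    """Return (unconfigured bridges, {bridge: number of buses it covers})."""
    unconfigured = [m.group("bdf") for m in INVALID_RE.finditer(dmesg_text)]
    ranges = {}
    for m in RANGE_RE.finditer(dmesg_text):
        sec = int(m.group("sec"), 16)
        sub = int(m.group("sub") or m.group("sec"), 16)
        ranges[m.group("bdf")] = bus_count(sec, sub)
    return unconfigured, ranges


log = (
    "[    0.391195] pci 0000:39:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring\n"
    "[    0.394515] pci 0000:39:00.0: PCI bridge to [bus 3a-6b]\n"
    "[    0.394547] pci 0000:3a:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring\n"
)
print(summarize(log))
```

For the excerpt above, the range `[bus 3a-6b]` that Linux assigned to 0000:39:00.0 covers 0x6b - 0x3a + 1 = 50 bus numbers.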
Comment 4 Lukas Wunner 2022-05-19 19:48:44 UTC
(In reply to Mika Westerberg from comment #3)
> and then the BIOS has not allocated the resources for all the Maple Ridge
> hotplug ports:
> 
> [    0.391195] pci 0000:39:00.0: bridge configuration invalid ([bus 00-00]),
> reconfiguring

Hm, there are no errors for the built-in Maple Ridge (device IDs 8086:1136, 1137, 1138), but there *are* errors for the Goshen Ridge and Alpine Ridge below that.

I suppose the Maple Ridge is built-in and the two other ones are external. Perhaps the vendor only tested this without any external Thunderbolt devices attached?
Comment 5 Max Lee 2022-05-20 06:59:59 UTC
(In reply to Lukas Wunner from comment #4)
> (In reply to Mika Westerberg from comment #3)
> > and then the BIOS has not allocated the resources for all the Maple Ridge
> > hotplug ports:
> > 
> > [    0.391195] pci 0000:39:00.0: bridge configuration invalid ([bus 00-00]),
> > reconfiguring
> 
> Hm, there are no errors for the built-in Maple Ridge (device IDs 8086:1136,
> 1137, 1138), but there *are* errors for the Goshen Ridge and Alpine Ridge
> below that.
> 
> I suppose the Maple Ridge is built-in and the two other ones are external.
> Perhaps the vendor only tested this without any external Thunderbolt devices
> attached?

Yes, the host is connected to two daisy-chained TBT devices:
Host (ADL + dTBT MR design)
TBT Device: 1st layer: Dell WD22TB4 (GR dock)
            2nd layer: Dell Portable SSD (AR storage)
Comment 6 Mika Westerberg 2022-05-20 08:55:04 UTC
Right, the BIOS is supposed to configure the devices when they are connected on boot. It might be that the ACPI errors for the PCIe root ports contribute here (at least the one leading to the Maple Ridge).

I suppose it works if you boot with no devices connected and plug them in when the OS has started?
Comment 7 Mika Westerberg 2022-05-20 10:02:09 UTC
Also, can you attach an ACPI dump from that system?
Comment 8 Chris Chiu 2022-05-23 07:09:51 UTC
Yes. As you said, it works if I plug them in after I power on the laptop.
Please refer to the attached acpidump.
Comment 9 Chris Chiu 2022-05-23 07:10:29 UTC
Created attachment 301014 [details]
acpidump output
Comment 10 Mika Westerberg 2022-05-23 09:16:37 UTC
Thanks! The ACPI error comes from this (in ssdt1.dat):

    Scope (\_SB.PC00.RP21)
    {
        Name (_S0W, Zero)  // _S0W: S0 Device Wake State
        Scope (\_SB.PC00.RP21.PXSX)
        {
            Name (_S0W, 0x03)  // _S0W: S0 Device Wake State
            Name (_DSD, Package (0x02)  // _DSD: Device-Specific Data
            {
                ToUUID ("5025030f-842f-4ab4-a561-99a5189762d0") /* Unknown UUID */,
                Package (0x01)
                {
                    Package (0x02)
                    {
                        "StorageD3Enable",
                        One
                    }
                }
            })
        }
    }

The _DSD for RP21 (and the others) is already defined in dsdt.dat, so this second definition triggers AE_ALREADY_EXISTS. This is definitely something that should be fixed in the BIOS, but I don't think it has anything to do with the issue you see.
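
When reporting this to the vendor, it can help to list every ACPI object that the firmware defines twice. The following Python sketch extracts the object paths from the "Failure creating named object" lines; it assumes the dmesg wording shown in comment 3, and `duplicate_acpi_objects` is a made-up helper name.

```python
import re

# Captures the namespace path from lines such as:
# ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.RP21.PXSX._DSD], AE_ALREADY_EXISTS
DUP_RE = re.compile(
    r"Failure creating named object \[(?P<path>[^\]]+)\], AE_ALREADY_EXISTS"
)


def duplicate_acpi_objects(dmesg_text):
    """Return the ACPI paths reported as already existing, in log order."""
    return [m.group("path") for m in DUP_RE.finditer(dmesg_text)]


log = (
    r"[    0.212467] ACPI BIOS Error (bug): Failure creating named object "
    r"[\_SB.PC00.RP13.PXSX._DSD], AE_ALREADY_EXISTS (20211217/dswload2-326)" "\n"
    r"[    0.212478] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog "
    r"(20211217/psobject-220)" "\n"
    r"[    0.212488] ACPI BIOS Error (bug): Failure creating named object "
    r"[\_SB.PC00.RP21.PXSX._DSD], AE_ALREADY_EXISTS (20211217/dswload2-326)" "\n"
)
for path in duplicate_acpi_objects(log):
    print(path)
```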
Comment 11 Mika Westerberg 2022-05-23 11:53:11 UTC
Created attachment 301016 [details]
Patch to extend PCI resources on initial scan

Even though the BIOS should really configure the devices, we can try to improve the resource allocation by using the same functions in the initial scan as we do on hotplug. Can you try the attached patch and see if it makes any difference in the case when you boot up with devices connected?
Comment 12 Chris Chiu 2022-05-24 02:53:32 UTC
Created attachment 301022 [details]
dmesg output of patched kernel
Comment 13 Chris Chiu 2022-05-24 02:53:50 UTC
The same "No bus number available for hot-added bridge" error is still there with the patched kernel, even though the log now shows "scanning root, available 225" when the child buses are scanned in advance. Please refer to the attached dmesg. Thanks
Comment 14 Mika Westerberg 2022-05-25 15:02:16 UTC
Thanks for the logs. Can you add CONFIG_PCI_DEBUG=y in .config so we can see more verbose logging wrt resource allocation?
Comment 15 Chris Chiu 2022-05-26 01:28:33 UTC
Created attachment 301049 [details]
verbose dmesg output of patched pci

Please refer to the attached log with PCI DEBUG on. Thanks.
Comment 16 Mika Westerberg 2022-05-27 10:51:08 UTC
Created attachment 301057 [details]
Patch to extend PCI resources on initial scan with added debug

Can you try this patch instead of the previous one (and keep CONFIG_PCI_DEBUG=y)? This does not fix the issue but it adds even more debugging so we hopefully see what is going on in the bus allocation.
Comment 17 Chris Chiu 2022-05-30 12:32:13 UTC
Created attachment 301074 [details]
dmesg output of patched kernel - 2

Please refer to the attached log. Thanks
Comment 18 Chris Chiu 2022-06-02 06:38:48 UTC
Gentle ping. @Mika, any update for this?
Comment 19 Mika Westerberg 2022-06-02 09:03:51 UTC
Hi, sorry, I've been busy with other things so I haven't had a chance to look at this. I'll try to check this tomorrow. In the meantime, can you report this to the vendor so they can check and hopefully fix the BIOS side?
Comment 20 Mika Westerberg 2022-06-03 13:49:28 UTC
Created attachment 301097 [details]
Patch to extend PCI resources on initial scan v2

Hi, can you try the attached patch?
Comment 21 Chris Chiu 2022-06-07 12:47:50 UTC
Created attachment 301118 [details]
dmesg output of kernel with v2 scan

Thanks for the patch. It works on my system w/o "No bus number available for hot-added bridge". Please refer to the attached log file (w/ PCI_DEBUG on) for the v2 patch.
Comment 22 Mika Westerberg 2022-06-08 04:37:25 UTC
Thanks for testing. Can you attach 'lspci -vv' output from this (when you boot) and then another when you plug in the device chain after boot? Just want to check that they don't have any differences.
Comment 23 Chris Chiu 2022-06-08 12:33:29 UTC
Created attachment 301125 [details]
v2 patched kernel lspci output when power-on with dock connected

Please refer to the 2 lspci logs attached.
Comment 24 Chris Chiu 2022-06-08 12:34:34 UTC
Created attachment 301126 [details]
v2 patched kernel lspci output (connect dock after boot)
Comment 25 Mika Westerberg 2022-06-09 05:41:05 UTC
Okay, thanks! It looks like the bus numbers are now fine but the memory resources are not, so if you replace the TBT storage with something else, like another TBT dock, it will not fit there. I'll see if I can make a v3 of the patch that takes this into consideration too.
Comment 26 Chris Chiu 2022-06-14 03:33:30 UTC
@Mika, when will you have a v3 patch that I can help verify? Thanks
Comment 27 Mika Westerberg 2022-06-14 06:13:00 UTC
Hi, unfortunately I'm again busy with other things. I will try to look at this hopefully still this week.
Comment 28 Chris Chiu 2022-06-21 02:31:52 UTC
Hi Mika,
    ODM has a new BIOS which fixes the ACPI error you mentioned in #3. Unfortunately the "No bus number available for hot-added bridge" error is still there. Could you help when time permits? Thanks
Comment 29 Mika Westerberg 2022-07-01 10:34:22 UTC
Created attachment 301319 [details]
Patch to extend PCI resources on initial scan v3 (both bus numbers and memory/IO resources)

Hi, sorry for the delay. Can you try the attached patch and see if the mem/io resources get extended too? You can compare these with the case where you plug in the device after boot (please attach the lspci -vv outputs too). The idea is that the hotplug downstream port resources need to be extended so that when you add yet another device in the chain it too gets enough resources.

I will be on vacation for the next 4 weeks, so I can continue the investigation only after that.
Comment 30 Chris Chiu 2022-07-04 08:41:05 UTC
Created attachment 301328 [details]
v3 dmesg output when power-on with dock connected

Hi Mika, thanks for your time and effort on this. The attached patch works well. The `lspci -vv` output for booting with the Thunderbolt dock connected is now identical to the one for plugging in the dock after boot. Please refer to the attached lspci output file. Is it possible to get this version upstream? It seems the same problem also happens on other platforms, although they're all due to BIOS regressions. I'll keep pushing for a BIOS fix. This patch would help a lot in working around such BIOS problems.
Comment 31 Mika Westerberg 2022-08-01 14:47:49 UTC
Thanks for testing! I will clean this patch up and send it upstream once v5.20-rc1 (or v6.0-rc1) is released.
Comment 32 Mika Westerberg 2022-08-16 10:53:17 UTC
Sent out the series now:

https://lore.kernel.org/linux-pci/20220816100740.68667-1-mika.westerberg@linux.intel.com/
Comment 33 Sanath S 2023-06-20 04:56:25 UTC
Created attachment 304456 [details]
PCI Reallocation fix for thunderbolt usecase

Hi Mika,

A few days back, I hit the same issue on AMD Thunderbolt platforms on the latest mainline.

Is this series (https://lore.kernel.org/linux-pci/20220816100740.68667-1-mika.westerberg@linux.intel.com/) specific to Intel platforms?

I came up with this patch (attached) that worked.
Please let me know your comments.
Comment 34 Mika Westerberg 2023-06-20 06:49:23 UTC
Hi, it is not Intel specific. Note that you should not use ->is_thunderbolt, as this is not really a Thunderbolt issue and should not be limited to that. It can happen in any system where the BIOS does not allocate enough resources, and we should have a generic solution. My series tries to be generic but I see something is still missing.

I suggest starting a discussion on linux-pci ML to figure out what would be the correct way to deal with this.
Comment 35 Shmerl 2023-08-06 21:39:47 UTC
Was this patch merged or not yet? I hit something similar on X670E Taichi (latest BIOS 1.28), when trying to hot plug a Thunderbolt enclosure for NVMe drive (after system is already running).

I get this error in the log:

```
[21732.747582] pcieport 0000:02:03.0: pciehp: Slot(3): Card present
[21732.747586] pcieport 0000:02:03.0: pciehp: Slot(3): Link Up
[21732.891556] pci 0000:45:00.0: [8086:15ef] type 01 class 0x060400
[21732.891637] pci 0000:45:00.0: enabling Extended Tags
[21732.891841] pci 0000:45:00.0: supports D1 D2
[21732.891842] pci 0000:45:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[21732.892077] pci 0000:45:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:02:03.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[21732.892262] pci 0000:45:00.0: Adding to iommu group 0
[21732.907487] pci 0000:45:00.0: No bus number available for hot-added bridge
[21732.907491] pcieport 0000:02:03.0: PCI bridge to [bus 45]
[21732.907494] pcieport 0000:02:03.0:   bridge window [io  0xd000-0xdfff]
[21732.907498] pcieport 0000:02:03.0:   bridge window [mem 0xc0300000-0xc04fffff]
[21732.907501] pcieport 0000:02:03.0:   bridge window [mem 0xf820300000-0xf8204fffff 64bit pref]
[21733.844394] thunderbolt 0-3: new device found, vendor=0x25 device=0x1
[21733.844397] thunderbolt 0-3: ACASIS TBU401E
```
As a result, the drive is not readable.
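
The log shows why the hot-added bridge fails: port 0000:02:03.0 is a "PCI bridge to [bus 45]", so the new bridge 0000:45:00.0 sits on bus 45 and there is no higher bus number left inside the range for its own secondary bus. A minimal Python sketch of that check follows; the helper names are hypothetical and the range strings are taken from the kernel's `[bus xx-yy]` output.

```python
def parse_bus_range(text):
    """Parse '45' or '03-42' (hexadecimal, as printed by the kernel)
    into a (secondary, subordinate) tuple of integers."""
    first, _, last = text.partition("-")
    secondary = int(first, 16)
    subordinate = int(last, 16) if last else secondary
    return secondary, subordinate


def free_buses_below(text):
    """Bus numbers left for bridges plugged in behind a port.

    The secondary bus itself is used by the devices directly behind the
    port, so only the numbers above it (up to subordinate) remain.
    """
    secondary, subordinate = parse_bus_range(text)
    return subordinate - secondary


print(free_buses_below("45"))     # 0: a hot-added bridge cannot get a bus
print(free_buses_below("03-42"))  # 63
```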
Comment 36 Shmerl 2023-08-06 21:45:54 UTC
Relevant part of lspci -tv


[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 14d8
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 14d9
           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 14da
           +-01.2-[01-45]----00.0-[02-45]--+-00.0-[03-42]----00.0  Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020]
           |                               +-01.0-[43]--
           |                               +-02.0-[44]----00.0  Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020]
           |                               \-03.0-[45]----00.0--
           +-01.3-[46-48]----00.0-[47-48]----00.0-[48]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX]
           |                                            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 14da
           .
           .
           .
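
The bus ranges in the tree above can be tallied to see how the 01-45 range is split between the downstream ports. This is a small Python sketch that assumes the `-[xx-yy]-` notation of `lspci -tv`; `bus_ranges` is an illustrative name.

```python
import re

# Matches "-[03-42]-" as well as single-bus entries such as "-[43]-".
RANGE_RE = re.compile(r"-\[([0-9a-f]{2})(?:-([0-9a-f]{2}))?\]-")


def bus_ranges(tree_text):
    """Return (range, bus count) for every bridge found in lspci -tv output."""
    result = []
    for first, last in RANGE_RE.findall(tree_text):
        sec = int(first, 16)
        sub = int(last, 16) if last else sec
        result.append((f"{sec:02x}-{sub:02x}", sub - sec + 1))
    return result


tree = """\
 +-01.2-[01-45]----00.0-[02-45]--+-00.0-[03-42]----00.0  Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020]
 |                               +-01.0-[43]--
 |                               +-02.0-[44]----00.0  Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020]
 |                               \\-03.0-[45]----00.0--
"""
for name, count in bus_ranges(tree):
    print(name, count)
```

On this topology the port leading to the NHI (03-42) holds 64 bus numbers, while each of the remaining downstream ports (43, 44, 45) has exactly one.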
Comment 37 Mika Westerberg 2023-08-07 07:46:38 UTC
The patch that was attached to this bugzilla is not merged AFAIK. Can you attach full dmesg and output of "sudo lspci -vv" here?
Comment 38 Mika Westerberg 2023-08-07 07:48:55 UTC
By the looks it seems that the BIOS did not reserve enough PCI resources for the hotplug PCIe ports of Maple Ridge Thunderbolt controller.
Comment 39 Mario Limonciello (AMD) 2023-08-07 11:41:19 UTC
> By the looks it seems that the BIOS did not reserve enough PCI resources for
> the hotplug PCIe ports of Maple Ridge Thunderbolt controller.

To confirm that's the cause, does `pci=realloc,assign-busses,hpbussize=0x33` work to avoid the issue?
Comment 40 Shmerl 2023-08-07 22:04:37 UTC
With realloc etc. it gets a bit further, but causes its own problems, like disabling the keyboard unless it's plugged into another USB port. And even after that, lsblk doesn't show the NVMe device.

It all works properly if it's plugged in before boot time.
Comment 41 Shmerl 2023-08-07 22:05:51 UTC
Created attachment 304792 [details]
dmesg output with hotpluging an nvme thunderbolt enclosure.

Dmesg with hotplugging.
Comment 42 Shmerl 2023-08-07 22:07:28 UTC
Created attachment 304793 [details]
dmesg output with pcie=realloc and etc.

dmesg with hotplugging when enabling `pci=realloc,assign-busses,hpbussize=0x33`
Comment 43 Shmerl 2023-08-07 22:08:18 UTC
Created attachment 304794 [details]
lspci -vv

Output of

sudo lspci -vv
Comment 44 Mario Limonciello (AMD) 2023-08-08 18:05:07 UTC
In a similar vein to https://bugzilla.kernel.org/show_bug.cgi?id=216000#c33
something I'm wondering is if perhaps we should try to detect which PCIe port is linked to an NHI and treat that one specially when allocating resources.
Comment 45 Mika Westerberg 2023-08-09 06:17:36 UTC
(In reply to Mario Limonciello (AMD) from comment #44)
> In a similar vein to https://bugzilla.kernel.org/show_bug.cgi?id=216000#c33
> something I'm wondering is if perhaps we should try to detect which PCIe
> port is linked to an NHI and treat that one specially when allocating
> resources.

I don't think we should special case any one port. The PCI stack should just be able to deal with it regardless of whether it is behind a tunneled link or not.
Comment 46 Sanath S 2023-08-09 06:23:33 UTC
Created attachment 304801 [details]
Probable fix - PCI bridges

Hi,

I've attached the patch.

I've tested here on my setup and this patch fixes the issues.
What are your views? I can send this as an upstream request.


Thanks,
Sanath S
Comment 47 Mika Westerberg 2023-08-09 06:24:28 UTC
Looking at the logs, there are a couple of ACPI-related errors in the dmesg too, even though they do not seem to relate directly to the issue at hand:

[    0.266077] ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.LPC0.EC0], AE_NOT_FOUND (20230331/dswload2-162)
[    0.266081] ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20230331/psobject-220)
[    0.266083] ACPI: Skipping parse of AML opcode: OpcodeName unavailable (0x0010)
[    0.266793] ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP0._PRW], AE_ALREADY_EXISTS (20230331/dswload2-326)
[    0.266795] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20230331/psobject-220)
[    0.266798] ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.GPP2._PRW], AE_ALREADY_EXISTS (20230331/dswload2-326)
[    0.266800] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20230331/psobject-220)
[    0.266803] ACPI BIOS Error (bug): Failure creating named object [\_GPE._L08], AE_ALREADY_EXISTS (20230331/dswload2-326)
[    0.266804] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20230331/psobject-220)
...
[    0.285700] acpi PNP0A08:00: [Firmware Info]: MMCONFIG for domain 0000 [bus 00-7f] only partially covers this bridge

This is slightly concerning.

[    0.287355] pci 0000:01:00.0: [8086:1136] type 01 class 0x060400
[    0.287443] pci 0000:01:00.0: supports D1 D2
[    0.287444] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.287587] pci 0000:00:01.2: PCI bridge to [bus 01-45]
[    0.287589] pci 0000:00:01.2:   bridge window [io  0xc000-0xdfff]
[    0.287591] pci 0000:00:01.2:   bridge window [mem 0xc0000000-0xee2fffff]
[    0.287593] pci 0000:00:01.2:   bridge window [mem 0xf820000000-0xf869ffffff 64bit pref]

Root port is still fine

[    0.287632] pci 0000:02:00.0: [8086:1136] type 01 class 0x060400
[    0.287722] pci 0000:02:00.0: supports D1 D2
[    0.287723] pci 0000:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.287847] pci 0000:02:01.0: [8086:1136] type 01 class 0x060400
[    0.287940] pci 0000:02:01.0: supports D1 D2
[    0.287941] pci 0000:02:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.288065] pci 0000:02:02.0: [8086:1136] type 01 class 0x060400
[    0.288151] pci 0000:02:02.0: supports D1 D2
[    0.288151] pci 0000:02:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.288265] pci 0000:02:03.0: [8086:1136] type 01 class 0x060400
[    0.288358] pci 0000:02:03.0: supports D1 D2
[    0.288359] pci 0000:02:03.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.288487] pci 0000:01:00.0: PCI bridge to [bus 02-45]
[    0.288491] pci 0000:01:00.0:   bridge window [io  0xc000-0xdfff]
[    0.288493] pci 0000:01:00.0:   bridge window [mem 0xc0000000-0xee2fffff]
[    0.288496] pci 0000:01:00.0:   bridge window [mem 0xf820000000-0xf869ffffff 64bit pref]

and the upstream port.

[    0.288539] pci 0000:03:00.0: [8086:1137] type 00 class 0x0c0340
[    0.288555] pci 0000:03:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit pref]
[    0.288565] pci 0000:03:00.0: reg 0x18: [mem 0x00000000-0x00000fff 64bit pref]
[    0.288657] pci 0000:03:00.0: supports D1 D2
[    0.288657] pci 0000:03:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.288795] pci 0000:02:00.0: PCI bridge to [bus 03-42]
[    0.288798] pci 0000:02:00.0:   bridge window [io  0x0000-0x0fff]
[    0.288800] pci 0000:02:00.0:   bridge window [mem 0x00000000-0x000fffff]
[    0.288803] pci 0000:02:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]

But the downstream port leading to the NHI is not; it gets most of the buses.

[    0.288828] pci 0000:02:01.0: PCI bridge to [bus 43]
[    0.288831] pci 0000:02:01.0:   bridge window [io  0x0000-0x0fff]
[    0.288833] pci 0000:02:01.0:   bridge window [mem 0x00000000-0x000fffff]
[    0.288836] pci 0000:02:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]

The hotplug downstream port does not get enough resources.

[    0.288877] pci 0000:44:00.0: [8086:1138] type 00 class 0x0c0330
[    0.288891] pci 0000:44:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit]
[    0.288962] pci 0000:44:00.0: PME# supported from D3hot D3cold
[    0.289022] pci 0000:02:02.0: PCI bridge to [bus 44]
[    0.289025] pci 0000:02:02.0:   bridge window [io  0x0000-0x0fff]
[    0.289027] pci 0000:02:02.0:   bridge window [mem 0x00000000-0x000fffff]
[    0.289030] pci 0000:02:02.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[    0.289055] pci 0000:02:03.0: PCI bridge to [bus 45]
[    0.289058] pci 0000:02:03.0:   bridge window [io  0x0000-0x0fff]
[    0.289060] pci 0000:02:03.0:   bridge window [mem 0x00000000-0x000fffff]
[    0.289063] pci 0000:02:03.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]

Neither is this.
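
The window sizes in the boot log above can be compared numerically. The following Python sketch computes the size of a "bridge window" line; it assumes the dmesg format shown here, and `window_size` is a made-up helper name.

```python
import re

# Matches e.g. "bridge window [mem 0xc0000000-0xee2fffff]" or
# "bridge window [io  0x0000-0x0fff]"; trailing flags such as "64bit pref" are ignored.
WINDOW_RE = re.compile(
    r"bridge window \[(?P<kind>io|mem)\s+"
    r"0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)[^\]]*\]"
)


def window_size(line):
    """Return (kind, size in bytes) for a bridge window line, or None."""
    m = WINDOW_RE.search(line)
    if m is None:
        return None
    start = int(m.group("start"), 16)
    end = int(m.group("end"), 16)
    return m.group("kind"), end - start + 1


root = "pci 0000:00:01.2:   bridge window [mem 0xc0000000-0xee2fffff]"
hotplug = "pci 0000:02:03.0:   bridge window [mem 0x00000000-0x000fffff]"

for line in (root, hotplug):
    kind, size = window_size(line)
    print(kind, size // 2**20, "MiB")
```

The root port's 32-bit memory window spans 739 MiB, while the hotplug downstream port 0000:02:03.0 shows only a 1 MiB window at this point of the boot.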

The BIOS should really allocate resources for this properly. I suppose
there is no BIOS option related to USB4 or Thunderbolt that makes
it work better?

The reason the fix we added does not work in this case is that on those systems the BIOS did not configure all the ports, so Linux had to reconfigure them. Here it has configured them badly and Linux leaves them alone. We should find a way to "reconfigure" the resources when things like this arise.

One way to work around it is to force a rescan, although I'm not entirely sure whether this ends up allocating the resources the same way as the native hotplug case, but you could try:

# echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
# echo 1 > /sys/bus/pci/devices/0000:00:01.2/rescan

Does that make any difference?
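
The same remove/rescan sequence can be scripted. The following is a minimal Python sketch, assuming root privileges and the device addresses of this particular system; the `remove_and_rescan` helper and its `sysfs_root` parameter are made up for illustration (the parameter exists only so the function can be tried against a scratch directory).

```python
from pathlib import Path


def remove_and_rescan(device, parent, sysfs_root="/sys"):
    """Remove a PCI device via sysfs, then rescan below its parent bridge.

    Equivalent to:
        echo 1 > <sysfs_root>/bus/pci/devices/<device>/remove
        echo 1 > <sysfs_root>/bus/pci/devices/<parent>/rescan
    """
    devices = Path(sysfs_root) / "bus" / "pci" / "devices"
    (devices / device / "remove").write_text("1")
    (devices / parent / "rescan").write_text("1")


# Example for the system in this report (must run as root):
# remove_and_rescan("0000:01:00.0", "0000:00:01.2")
```

Removing 0000:01:00.0 detaches the whole Thunderbolt controller hierarchy below the root port, so anything connected through it disappears until the rescan completes.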
Comment 48 Shmerl 2023-08-09 20:08:00 UTC
Rescan seems to work OK, and it shows me this in the end:

```
[93242.085969] thunderbolt 0-3: new device found, vendor=0x25 device=0x1
[93242.085972] thunderbolt 0-3: ACASIS TBU401E
```

But it still doesn't come up in lsblk or as visible storage.

Full snippet of what happens:

```
[93241.307999] pci 0000:01:00.0: [8086:1136] type 01 class 0x060400
[93241.308104] pci 0000:01:00.0: supports D1 D2
[93241.308105] pci 0000:01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[93241.308266] pci 0000:01:00.0: Adding to iommu group 0
[93241.308345] pci 0000:02:00.0: [8086:1136] type 01 class 0x060400
[93241.308440] pci 0000:02:00.0: supports D1 D2
[93241.308440] pci 0000:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[93241.308568] pci 0000:02:00.0: Adding to iommu group 0
[93241.308599] pci 0000:02:01.0: [8086:1136] type 01 class 0x060400
[93241.308704] pci 0000:02:01.0: supports D1 D2
[93241.308704] pci 0000:02:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[93241.308820] pci 0000:02:01.0: Adding to iommu group 0
[93241.308853] pci 0000:02:02.0: [8086:1136] type 01 class 0x060400
[93241.308946] pci 0000:02:02.0: supports D1 D2
[93241.308947] pci 0000:02:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[93241.309065] pci 0000:02:02.0: Adding to iommu group 0
[93241.309106] pci 0000:02:03.0: [8086:1136] type 01 class 0x060400
[93241.309219] pci 0000:02:03.0: supports D1 D2
[93241.309220] pci 0000:02:03.0: PME# supported from D0 D1 D2 D3hot D3cold
[93241.309344] pci 0000:02:03.0: Adding to iommu group 0
[93241.309382] pci 0000:01:00.0: PCI bridge to [bus 02-45]
[93241.309387] pci 0000:01:00.0:   bridge window [io  0xc000-0xdfff]
[93241.309389] pci 0000:01:00.0:   bridge window [mem 0xc0000000-0xee2fffff]
[93241.309393] pci 0000:01:00.0:   bridge window [mem 0xf820000000-0xf869ffffff 64bit pref]
[93241.309445] pci 0000:03:00.0: [8086:1137] type 00 class 0x0c0340
[93241.309464] pci 0000:03:00.0: reg 0x10: [mem 0xf820000000-0xf82003ffff 64bit pref]
[93241.309476] pci 0000:03:00.0: reg 0x18: [mem 0xf820040000-0xf820040fff 64bit pref]
[93241.309575] pci 0000:03:00.0: supports D1 D2
[93241.309576] pci 0000:03:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[93241.309713] pci 0000:03:00.0: Adding to iommu group 0
[93241.309741] pci 0000:02:00.0: PCI bridge to [bus 03-42]
[93241.309750] pci 0000:02:00.0:   bridge window [mem 0xf820000000-0xf8200fffff 64bit pref]
[93241.309779] pci 0000:02:01.0: PCI bridge to [bus 43]
[93241.309782] pci 0000:02:01.0:   bridge window [io  0xc000-0xcfff]
[93241.309785] pci 0000:02:01.0:   bridge window [mem 0xc0000000-0xc01fffff]
[93241.309788] pci 0000:02:01.0:   bridge window [mem 0xf820100000-0xf8202fffff 64bit pref]
[93241.309834] pci 0000:44:00.0: [8086:1138] type 00 class 0x0c0330
[93241.309850] pci 0000:44:00.0: reg 0x10: [mem 0xc0200000-0xc020ffff 64bit]
[93241.309932] pci 0000:44:00.0: PME# supported from D3hot D3cold
[93241.309998] pci 0000:44:00.0: Adding to iommu group 0
[93241.310025] pci 0000:02:02.0: PCI bridge to [bus 44]
[93241.310031] pci 0000:02:02.0:   bridge window [mem 0xc0200000-0xc02fffff]
[93241.310111] pci 0000:45:00.0: [8086:15ef] type 01 class 0x060400
[93241.310355] pci 0000:45:00.0: supports D1 D2
[93241.310356] pci 0000:45:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[93241.310576] pci 0000:45:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:02:03.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[93241.310721] pci 0000:45:00.0: Adding to iommu group 0
[93241.310749] pci 0000:02:03.0: PCI bridge to [bus 45]
[93241.310752] pci 0000:02:03.0:   bridge window [io  0xd000-0xdfff]
[93241.310754] pci 0000:02:03.0:   bridge window [mem 0xc0300000-0xc04fffff]
[93241.310758] pci 0000:02:03.0:   bridge window [mem 0xf820300000-0xf8204fffff 64bit pref]
[93241.310761] pci 0000:45:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[93241.310830] pci_bus 0000:46: busn_res: [bus 46-48] end is updated to 48
[93241.310837] pci 0000:45:00.0: devices behind bridge are unusable because [bus 46-48] cannot be assigned for them
[93241.310840] pci 0000:02:03.0: bridge has subordinate 45 but max busn 48
[93241.384072] pci 0000:01:00.0: BAR 14: assigned [mem 0xc0000000-0xee2fffff]
[93241.384078] pci 0000:01:00.0: BAR 15: assigned [mem 0xf820000000-0xf869ffffff 64bit pref]
[93241.384080] pci 0000:01:00.0: BAR 13: assigned [io  0xc000-0xdfff]
[93241.384082] pci 0000:02:00.0: BAR 15: assigned [mem 0xf820000000-0xf8200fffff 64bit pref]
[93241.384084] pci 0000:02:01.0: BAR 14: assigned [mem 0xc0000000-0xc01fffff]
[93241.384085] pci 0000:02:01.0: BAR 15: assigned [mem 0xf820100000-0xf8202fffff 64bit pref]
[93241.384086] pci 0000:02:02.0: BAR 14: assigned [mem 0xc0200000-0xc02fffff]
[93241.384087] pci 0000:02:03.0: BAR 14: assigned [mem 0xc0300000-0xc04fffff]
[93241.384088] pci 0000:02:03.0: BAR 15: assigned [mem 0xf820300000-0xf8204fffff 64bit pref]
[93241.384089] pci 0000:02:01.0: BAR 13: assigned [io  0xc000-0xcfff]
[93241.384090] pci 0000:02:03.0: BAR 13: assigned [io  0xd000-0xdfff]
[93241.384092] pci 0000:03:00.0: BAR 0: assigned [mem 0xf820000000-0xf82003ffff 64bit pref]
[93241.384105] pci 0000:03:00.0: BAR 2: assigned [mem 0xf820040000-0xf820040fff 64bit pref]
[93241.384113] pci 0000:02:00.0: PCI bridge to [bus 03-42]
[93241.384119] pci 0000:02:00.0:   bridge window [mem 0xf820000000-0xf8200fffff 64bit pref]
[93241.384123] pci 0000:02:01.0: PCI bridge to [bus 43]
[93241.384125] pci 0000:02:01.0:   bridge window [io  0xc000-0xcfff]
[93241.384128] pci 0000:02:01.0:   bridge window [mem 0xc0000000-0xc01fffff]
[93241.384131] pci 0000:02:01.0:   bridge window [mem 0xf820100000-0xf8202fffff 64bit pref]
[93241.384135] pci 0000:44:00.0: BAR 0: assigned [mem 0xc0200000-0xc020ffff 64bit]
[93241.384145] pci 0000:02:02.0: PCI bridge to [bus 44]
[93241.384148] pci 0000:02:02.0:   bridge window [mem 0xc0200000-0xc02fffff]
[93241.384153] pci 0000:02:03.0: PCI bridge to [bus 45]
[93241.384155] pci 0000:02:03.0:   bridge window [io  0xd000-0xdfff]
[93241.384158] pci 0000:02:03.0:   bridge window [mem 0xc0300000-0xc04fffff]
[93241.384160] pci 0000:02:03.0:   bridge window [mem 0xf820300000-0xf8204fffff 64bit pref]
[93241.384164] pci 0000:01:00.0: PCI bridge to [bus 02-45]
[93241.384166] pci 0000:01:00.0:   bridge window [io  0xc000-0xdfff]
[93241.384169] pci 0000:01:00.0:   bridge window [mem 0xc0000000-0xee2fffff]
[93241.384171] pci 0000:01:00.0:   bridge window [mem 0xf820000000-0xf869ffffff 64bit pref]
[93241.384524] pcieport 0000:02:01.0: pciehp: Slot #1 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[93241.384873] pcieport 0000:02:03.0: pciehp: Slot #3 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[93241.755084] xhci_hcd 0000:44:00.0: xHCI Host Controller
[93241.755089] xhci_hcd 0000:44:00.0: new USB bus registered, assigned bus number 1
[93241.756200] xhci_hcd 0000:44:00.0: hcc params 0x20007fc1 hci version 0x110 quirks 0x0000000200009810
[93241.756402] xhci_hcd 0000:44:00.0: xHCI Host Controller
[93241.756403] xhci_hcd 0000:44:00.0: new USB bus registered, assigned bus number 2
[93241.756405] xhci_hcd 0000:44:00.0: Host supports USB 3.1 Enhanced SuperSpeed
[93241.756455] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.05
[93241.756459] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[93241.756461] usb usb1: Product: xHCI Host Controller
[93241.756462] usb usb1: Manufacturer: Linux 6.5.0-rc5 xhci-hcd
[93241.756463] usb usb1: SerialNumber: 0000:44:00.0
[93241.756586] hub 1-0:1.0: USB hub found
[93241.756594] hub 1-0:1.0: 2 ports detected
[93241.756696] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.05
[93241.756698] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[93241.756700] usb usb2: Product: xHCI Host Controller
[93241.756701] usb usb2: Manufacturer: Linux 6.5.0-rc5 xhci-hcd
[93241.756702] usb usb2: SerialNumber: 0000:44:00.0
[93241.756777] hub 2-0:1.0: USB hub found
[93241.756790] hub 2-0:1.0: 2 ports detected
[93242.085969] thunderbolt 0-3: new device found, vendor=0x25 device=0x1
[93242.085972] thunderbolt 0-3: ACASIS TBU401E
```
Comment 49 Mario Limonciello (AMD) 2023-08-09 20:17:43 UTC
> Rescan seems to work OK, and it shows me this in the end:

You mean in this case that you didn't have kernel command line options added, right?

> But it still doesn't come up in lsblk and visible storage.

Did you authorize it after it enumerated?  Default kernel policy won't authorize it.  You can do that from sysfs directly or by using software like boltctl.
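
A minimal sketch of the sysfs check, assuming the usual layout where each device under /sys/bus/thunderbolt/devices exposes an `authorized` attribute that reads 0 while the device is not authorized. The function name and the `root` parameter are made up for illustration; entries without that attribute (domains, for instance) are simply skipped.

```python
import os

def unauthorized_devices(root="/sys/bus/thunderbolt/devices"):
    """Return names of Thunderbolt devices whose 'authorized' attribute reads 0."""
    pending = []
    if not os.path.isdir(root):
        return pending  # no Thunderbolt bus registered, or a different sysfs layout
    for name in sorted(os.listdir(root)):
        attr = os.path.join(root, name, "authorized")
        try:
            with open(attr) as f:
                value = f.read().strip()
        except OSError:
            continue  # entry has no 'authorized' attribute or it is unreadable
        if value == "0":
            pending.append(name)
    return pending

if __name__ == "__main__":
    for dev in unauthorized_devices():
        # Only prints a hint; writing the attribute needs root.
        print(f"{dev}: not authorized, try: echo 1 > /sys/bus/thunderbolt/devices/{dev}/authorized")
```

If this prints nothing, every device that exposes the attribute is already authorized, and the problem is more likely on the PCI side.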
Comment 50 Shmerl 2023-08-09 20:24:50 UTC
(In reply to Mario Limonciello (AMD) from comment #49)
> > Rescan seems to work OK, and it shows me this in the end:
> 
> You mean in this case that you didn't have kernel command line options
> added, right?
> 
> > But it still doesn't come up in lsblk and visible storage.
> 
> Did you authorize it after it enumerated?  Default kernel policy won't
> authorize it.  You can do that from sysfs directly or by using software like
> boltctl.

Yeah, this time I'm trying it without pci=realloc,...

I didn't authorize it, but I don't see anything that isn't authorized:

This gives me nothing:

   cd /sys/devices
   cat $(fd authorize | xargs) | rg 0
Comment 51 Mika Westerberg 2023-08-10 04:55:45 UTC
Try:

# echo 1 > /sys/bus/thunderbolt/devices/0-3/authorized

(It could also be 0-1, depending on which port you connect the device to.)
Comment 52 Shmerl 2023-08-10 05:14:47 UTC
After a remove / rescan the device already looks authorized, so there is no need to do it manually, but the drive still doesn't show up:

```
cat /sys/bus/thunderbolt/devices/0-3/vendor_name 
ACASIS
cat /sys/bus/thunderbolt/devices/0-3/authorized
1
```
Comment 53 Sanath S 2023-09-06 03:46:53 UTC
Comment on attachment 304801 [details]
Probable fix - PCI bridges

Hi,


Can anyone test this on their setup to confirm that this patch solves the issue?

Thanks,
Sanath S
Comment 54 Shmerl 2024-02-04 00:35:55 UTC
Just to add to the above, a similar issue happens when hotplugging a Sonnet Solo10G SFP+ network adapter (also on a USB 4 port on an Asrock X670E Taichi in my case):


I see these errors:

```
[  222.819703] pcieport 0000:00:01.2: PME: Spurious native interrupt!
[  222.979885] pcieport 0000:00:01.2: PME: Spurious native interrupt!
[  223.140075] pcieport 0000:00:01.2: PME: Spurious native interrupt!
[  224.103994] pcieport 0000:02:03.0: pciehp: Slot(3): Card present
[  224.103997] pcieport 0000:02:03.0: pciehp: Slot(3): Link Up
[  224.241597] pci 0000:45:00.0: [8086:15da] type 01 class 0x060400
[  224.241675] pci 0000:45:00.0: enabling Extended Tags
[  224.241845] pci 0000:45:00.0: supports D1 D2
[  224.241846] pci 0000:45:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[  224.242000] pci 0000:45:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:02:03.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[  224.242171] pci 0000:45:00.0: Adding to iommu group 41
[  224.257554] pci 0000:45:00.0: No bus number available for hot-added bridge
[  224.257560] pcieport 0000:02:03.0: PCI bridge to [bus 45]
[  224.257563] pcieport 0000:02:03.0:   bridge window [io  0xd000-0xdfff]
[  224.257568] pcieport 0000:02:03.0:   bridge window [mem 0xb0300000-0xb04fffff]
[  224.257572] pcieport 0000:02:03.0:   bridge window [mem 0xf820300000-0xf8204fffff 64bit pref]
[  224.954944] thunderbolt 0-3: new device found, vendor=0x8 device=0x36
[  224.954948] thunderbolt 0-3: Sonnet Technologies, Inc Solo 10G SFP+ Thunderbolt 3 Edition
```

It's not showing up after that. When plugged when booting it works OK. Interestingly it works OK even if you unplug and re-plug it, as long as it was present during boot.
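
To spot this failure quickly in a longer log, here is a rough sketch that pulls out the bridges for which the kernel reported that no bus number was available, together with the bus range last reported for each bridge. It only pattern-matches the message formats quoted above; the function name and the returned structure are illustrative, not anything the kernel provides.

```python
import re
import sys

NO_BUS = re.compile(r"pci (\S+): No bus number available for hot-added bridge")
BRIDGE = re.compile(
    r"(?:pci|pcieport) (\S+): PCI bridge to \[bus ([0-9a-f]{2})(?:-([0-9a-f]{2}))?\]"
)

def find_bus_exhaustion(dmesg_text):
    """Return (starved, ranges) parsed from dmesg text.

    starved: bridges that logged "No bus number available", in order of appearance.
    ranges:  {bridge: (first_bus, last_bus)} from the latest "PCI bridge to" line.
    """
    starved, ranges = [], {}
    for line in dmesg_text.splitlines():
        m = NO_BUS.search(line)
        if m:
            if m.group(1) not in starved:
                starved.append(m.group(1))
            continue
        m = BRIDGE.search(line)
        if m:
            first = int(m.group(2), 16)  # bus numbers are printed in hex
            last = int(m.group(3), 16) if m.group(3) else first
            ranges[m.group(1)] = (first, last)
    return starved, ranges

if __name__ == "__main__":
    starved, ranges = find_bus_exhaustion(sys.stdin.read())
    for dev in starved:
        print(f"{dev}: no bus number available")
    for dev, (first, last) in sorted(ranges.items()):
        print(f"{dev}: bus {first:02x}-{last:02x} ({last - first + 1} bus numbers)")
```

Fed the log above (for example via `dmesg | python3 script.py`), this should list 0000:45:00.0 as starved and show 0000:02:03.0 covering only bus 45, which is the bus the hot-added bridge itself sits on.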
Comment 55 Mario Limonciello (AMD) 2024-02-04 01:43:25 UTC
Can you please see if you still reproduce this behavior with the patches in Mika's next tree?

https://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git/log/?h=next

You can cherry pick them into something older if necessary.
Comment 56 Shmerl 2024-02-04 02:27:53 UTC
I built 6.7.3 with these changes applied:

* https://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git/commit/?h=next&id=01da6b99d49f60b1edead44e33569b1a2e9f49b7
* https://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git/commit/?h=next&id=b35c1d7b11da8c08b14147bbe87c2c92f7a83f8b
* https://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git/commit/?h=next&id=ec8162b3f0683ae08a21f20517cf49272b07ee0b
* https://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git/commit/?h=next&id=59a54c5f3dbde00b8ad30aef27fe35b1fe07bf5c

It didn't help in my case for either of the devices. I'm still getting errors:

```
[   59.597027] pcieport 0000:00:01.2: PME: Spurious native interrupt!
[   59.756962] pcieport 0000:00:01.2: PME: Spurious native interrupt!
[   59.918343] pcieport 0000:00:01.2: PME: Spurious native interrupt!
[   60.869451] pcieport 0000:02:03.0: pciehp: Slot(3): Card present
[   60.869455] pcieport 0000:02:03.0: pciehp: Slot(3): Link Up
[   61.007595] pci 0000:45:00.0: [8086:15da] type 01 class 0x060400
[   61.007679] pci 0000:45:00.0: enabling Extended Tags
[   61.007855] pci 0000:45:00.0: supports D1 D2
[   61.007857] pci 0000:45:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[   61.008014] pci 0000:45:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:02:03.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[   61.008202] pci 0000:45:00.0: Adding to iommu group 41
[   61.023543] pci 0000:45:00.0: No bus number available for hot-added bridge
[   61.023549] pcieport 0000:02:03.0: PCI bridge to [bus 45]
[   61.023553] pcieport 0000:02:03.0:   bridge window [io  0xd000-0xdfff]
[   61.023557] pcieport 0000:02:03.0:   bridge window [mem 0xb0300000-0xb04fffff]
[   61.023561] pcieport 0000:02:03.0:   bridge window [mem 0xf820300000-0xf8204fffff 64bit pref]
[   61.722705] thunderbolt 0-3: new device found, vendor=0x8 device=0x36
[   61.722708] thunderbolt 0-3: Sonnet Technologies, Inc Solo 10G SFP+ Thunderbolt 3 Edition
[   95.315319] pcieport 0000:02:03.0: pciehp: Slot(3): Card not present
[   95.315328] pcieport 0000:45:00.0: Unable to change power state from D3hot to D0, device inaccessible
[   95.319194] thunderbolt 0-3: device disconnected
[  112.320659] pcieport 0000:00:01.2: PME: Spurious native interrupt!
[  112.320676] pcieport 0000:02:03.0: pciehp: Slot(3): Card present
[  112.320681] pcieport 0000:02:03.0: pciehp: Slot(3): Link Up
[  112.455018] pci 0000:45:00.0: [8086:15ef] type 01 class 0x060400
[  112.455100] pci 0000:45:00.0: enabling Extended Tags
[  112.455310] pci 0000:45:00.0: supports D1 D2
[  112.455312] pci 0000:45:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[  112.455538] pci 0000:45:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:02:03.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[  112.455742] pci 0000:45:00.0: Adding to iommu group 41
[  112.455886] pcieport 0000:02:03.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[  112.470969] pci 0000:45:00.0: No bus number available for hot-added bridge
[  112.470975] pcieport 0000:02:03.0: PCI bridge to [bus 45]
[  112.470979] pcieport 0000:02:03.0:   bridge window [io  0xd000-0xdfff]
[  112.470983] pcieport 0000:02:03.0:   bridge window [mem 0xb0300000-0xb04fffff]
[  112.470987] pcieport 0000:02:03.0:   bridge window [mem 0xf820300000-0xf8204fffff 64bit pref]
[  113.122637] thunderbolt 0-3: new device found, vendor=0x25 device=0x1
[  113.122640] thunderbolt 0-3: ACASIS TBU401E
[  116.150932] pcieport 0000:02:03.0: pciehp: Slot(3): Card not present
[  116.150939] pcieport 0000:45:00.0: Unable to change power state from D3hot to D0, device inaccessible
[  116.155624] thunderbolt 0-3: device disconnected
```
Comment 57 Mario Limonciello (AMD) 2024-02-05 22:43:39 UTC
OK, thanks for checking.  That means it's not the same root cause as the issue that Sanath fixed.

> It's not showing up after that. When plugged when booting it works OK.
> Interestingly it works OK even if you unplug and re-plug it, as long as it
> was present during boot.

I think we should contrast resource allocations for these two cases at bootup. 
With Sanath's patch series applied, please share a dmesg for both cases.  It /sounds/ to me like the BIOS is more conservative with resource allocations unless the device is there at bootup, and Linux doesn't cope well when the resource allocations are that low.
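
One way to do that comparison, as a rough sketch: collect the most recently reported bus range and bridge windows per bridge from each dmesg, then print the bridges that differ. It relies only on the "PCI bridge to [bus ...]" and "bridge window [...]" line formats seen in this report; the function names are placeholders, and the two log files are whatever you pass on the command line.

```python
import re
import sys

LINE = re.compile(
    r"(?:pci|pcieport) (\S+):\s+(PCI bridge to|bridge window) (\[[^\]]+\])"
)

def bridge_resources(dmesg_text):
    """Map each bridge to its most recently logged bus range and windows."""
    res = {}
    for line in dmesg_text.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        dev, kind, value = m.groups()
        if kind == "PCI bridge to":
            # A new "PCI bridge to" line starts a fresh report for this bridge.
            res[dev] = {"bus": value, "windows": []}
        elif dev in res:
            res[dev]["windows"].append(value)
    return res

def diff_bridges(a_text, b_text):
    """Return {bridge: (resources_in_a, resources_in_b)} for bridges that differ."""
    a, b = bridge_resources(a_text), bridge_resources(b_text)
    return {
        dev: (a.get(dev), b.get(dev))
        for dev in sorted(set(a) | set(b))
        if a.get(dev) != b.get(dev)
    }

if __name__ == "__main__":
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for dev, (in_a, in_b) in diff_bridges(fa.read(), fb.read()).items():
            print(dev)
            print("  first log: ", in_a)
            print("  second log:", in_b)
```

A bridge that shows a wider bus range in the boot-with-device log than in the hotplug log would point at the firmware reserving less when nothing is attached.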
Comment 58 Shmerl 2024-02-06 01:15:45 UTC
Created attachment 305837 [details]
dmesg logs and pci topology

Attaching a bundle of logs and lspci -tv output (for the case when the device works) for kernel 6.7.4 with the above patches applied.

Used device: Sonnet Technologies, Inc Solo 10G SFP+ Thunderbolt 3 Edition (shows up as Aquantia Corp. Device 80b1 in lspci).

* dmesg_device_after_boot.log - log with device hotplugged after boot and then removed.
* dmesg_device_before_boot.log - log with device attached before boot (it works OK then) and then removed afterwards.
* topology.txt - pci topology when device works.
Comment 59 Sanath S 2024-02-06 17:42:50 UTC
Hi Shmerl,

Could you please check if this patch helps your problem ?

https://lore.kernel.org/linux-pci/20240123185548.1040096-1-alex.williamson@redhat.com/
Comment 60 Shmerl 2024-02-07 00:43:52 UTC
(In reply to Sanath S from comment #59)
> Hi Shmerl,
> 
> Could you please check if this patch helps your problem ?
> 
> https://lore.kernel.org/linux-pci/20240123185548.1040096-1-alex.williamson@redhat.com/

Nope, no difference with that patch (applied it on top of the same previous 4).

I'm still getting:

[  115.159926] pci 0000:45:00.0: No bus number available for hot-added bridge
Comment 61 Shmerl 2024-02-07 00:44:47 UTC
Note, I'm not using any boot overrides when testing this.
