Created attachment 301741 [details] lspci -vv
(In reply to Kai-Heng Feng from comment #1)
> Created attachment 301741 [details]
> dmesg

Sorry, this is lspci.
Created attachment 301742 [details] dmesg
Created attachment 301743 [details] dmesg, wake bridges after switch is added
Created attachment 301752 [details] dmesg with dyndbg enabled, and authorized the tunnel
Hi Kai-Heng, can you expand on this a little bit?

- This is not a regression, right?
- This has nothing to do with the ath11k_pci warnings, right?
- Does this have something to do with 64:00.5 or 64:00.6?
- What is the external PCIe device? Where is it connected?
- Is this a hot-add issue?
- Does it work if the device is present at boot?
(In reply to Bjorn Helgaas from comment #6)
> Hi Kai-Heng, can you expand on this a little bit?
>
> - This is not a regression, right?

No, it's not a regression.

> - This has nothing to do with the ath11k_pci warnings, right?

This issue has nothing to do with ath11k_pci.

> - Does this have something to do with 64:00.5 or 64:00.6?

Yes, it seems the PCIe tunnels have to be up in order to toggle the Card Present bit.

> - What is the external PCIe device? Where is it connected?

TBT NVMe/TBT dock.

> - Is this a hot-add issue?

Yes.

> - Does it work if the device is present at boot?

Yes, it works if the TBT devices are plugged in at preboot.
Created attachment 301766 [details]
AMD's attempt at reproducing it

One of my colleagues tried to reproduce this issue on the same hardware model with the same BIOS, but couldn't. His dmesg log is attached.
Created attachment 301767 [details]
Debugging patch

Patch that adds debug output showing whether the DHP bit is set.
In the kernel thread matching this bugzilla I left a few comments to try to narrow down why this fails for KH but works for AMD on the same HP HW:

1) How did you flash the 01.02.01 firmware? In Anson's check, he used dediprog. Is it possible that some stateful data used by HP's BIOS was still on the SPI after the upgrade and didn't get set/cleared properly from an earlier pre-release BIOS?
2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM?
3) If you explicitly reset to HP's "default BIOS settings", does that resolve it?
4) Can you double check ADP_CS_5 bit 31? If it was for some reason set by the Pre-OS CM in your BIOS/settings combination, we might need to have the Linux CM undo it.
5) Are you changing any of the default runtime PM policies for any of the USB4 routers or root ports used for tunneling, using software like TLP?
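For question 5, the runtime PM policy of each device can be read from its `power/control` sysfs attribute. The following is only a minimal sketch: the helper name `runtime_pm_policies` is hypothetical, and it assumes the usual sysfs layout in which every entry under a bus `devices` directory has a `power/control` file containing `auto` or `on`.

```python
from pathlib import Path


def runtime_pm_policies(devices_dir):
    """Map each device name under devices_dir to its power/control value.

    Assumes the layout <devices_dir>/<device>/power/control, as found
    under sysfs bus directories. Devices without that file are skipped.
    """
    policies = {}
    for control in sorted(Path(devices_dir).glob("*/power/control")):
        device = control.parent.parent.name
        policies[device] = control.read_text().strip()
    return policies
```

Calling it on `/sys/bus/pci/devices` and `/sys/bus/thunderbolt/devices` and comparing the result against a machine that has no tool like TLP installed would show whether any policy was changed.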
BTW, you can read the ADP_CS_5 and any other bit through debugfs too so no need to patch the kernel. The adapter register space is under /sys/kernel/debug/thunderbolt/ROUTER/portX/regs.
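As a rough illustration of checking the bit from that debugfs file, here is a minimal sketch. It assumes that each non-comment line of `regs` starts with the register offset in hex and ends with the 32-bit value in hex, and that ADP_CS_5 sits at offset 5 of the adapter configuration space. The helper names are hypothetical.

```python
ADP_CS_5_OFFSET = 5      # assumed dword offset of ADP_CS_5
ADP_CS_5_DHP = 1 << 31   # bit 31, the DHP bit discussed in this report


def read_adp_cs_5(regs_text):
    """Return the ADP_CS_5 value found in a port 'regs' dump, or None."""
    for line in regs_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the assumed '#' header line
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            # Assumed layout: first field is the offset, last is the value.
            offset = int(fields[0], 16)
            value = int(fields[-1], 16)
        except ValueError:
            continue
        if offset == ADP_CS_5_OFFSET:
            return value
    return None


def dhp_is_set(regs_text):
    """Return True if bit 31 of ADP_CS_5 is set in the given dump."""
    value = read_adp_cs_5(regs_text)
    if value is None:
        raise ValueError("ADP_CS_5 not found in register dump")
    return bool(value & ADP_CS_5_DHP)
```

With root privileges, the text would come from reading `/sys/kernel/debug/thunderbolt/ROUTER/portX/regs`, with ROUTER and portX replaced by the router and adapter in question.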
Created attachment 301774 [details] dmesg with debug patch
> BTW, you can read the ADP_CS_5 and any other bit through debugfs too so no
> need to patch the kernel. The adapter register space is under
> /sys/kernel/debug/thunderbolt/ROUTER/portX/regs.

Thanks for the reminder on that. So there is not really a point in adding this patch in the future for checking DHP-related problems.

> dmesg with debug patch

At this point I suspect that the firmware upgrade path you took didn't match what dediprog would have done.

First check:
# grep SMC /sys/kernel/debug/dri/0/amdgpu_firmware_info

If it's not 0x04453200, then the fault is that HP's firmware upgrade didn't upgrade all components properly. If it is 0x04453200, then let's take a full SPI dump and compare it with the dediprog-flashed image.
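The same check can be scripted. This is a minimal sketch, assuming the SMC line in `amdgpu_firmware_info` begins with `SMC` and contains `firmware version: 0x...`; the function names are hypothetical and the expected value is the one quoted above.

```python
import re

EXPECTED_SMC = 0x04453200  # value quoted in this report

# Assumed line shape: "SMC ... firmware version: 0x<hex> ..."
_SMC_RE = re.compile(
    r"^SMC\b.*?firmware version:\s*(0x[0-9a-fA-F]+)", re.MULTILINE
)


def smc_firmware_version(info_text):
    """Return the SMC firmware version as an int, or None if not found."""
    match = _SMC_RE.search(info_text)
    return int(match.group(1), 16) if match else None


def smc_matches_expected(info_text, expected=EXPECTED_SMC):
    """Return True if the reported SMC firmware version equals expected."""
    return smc_firmware_version(info_text) == expected
```

The input would be the contents of `/sys/kernel/debug/dri/0/amdgpu_firmware_info`, read as root.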
Comparing an upgraded BIOS against a dediprog-flashed BIOS shows that this is caused by a bug in HP's firmware upgrade process, which will need to be addressed by HP.
(In reply to Mario Limonciello (AMD) from comment #14)
> By comparing an upgraded BIOS versus dediprog flashed BIOS shows that this
> is caused by a bug in the HP's firmware upgrade process that will need to be
> addressed by HP.

So is there any possible way to flash back?
(In reply to Mario Limonciello (AMD) from comment #14)
> By comparing an upgraded BIOS versus dediprog flashed BIOS shows that this
> is caused by a bug in the HP's firmware upgrade process that will need to be
> addressed by HP.

I mean, if I use dediprog to flash back, will it cause TPM problems?
The specific upgrade problem outlined in this bug can be avoided by resetting the BIOS to its default settings after the upgrade.