Bug 216448

Summary: AMD USB4 pciehp can't detect external PCIe devices
Product: Drivers
Reporter: Kai-Heng Feng (kai.heng.feng)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: REJECTED DOCUMENTED
Severity: normal
CC: 1700011628, bjorn, mario.limonciello, mika.westerberg
Priority: P1
Hardware: AMD
OS: Linux
Kernel Version: mainline, linux-next
Subsystem:
Regression: No
Bisected commit-id:
Attachments: lspci -vv
dmesg
dmesg, wake bridges after switch is added
dmesg with dyndbg enabled, and authorized the tunnel
AMD's attempt at reproducing it
Debugging patch
dmesg with debug patch

Description Kai-Heng Feng 2022-09-05 06:51:52 UTC

    
Comment 1 Kai-Heng Feng 2022-09-05 06:52:18 UTC
Created attachment 301741 [details]
lspci -vv
Comment 2 Kai-Heng Feng 2022-09-05 06:53:30 UTC
(In reply to Kai-Heng Feng from comment #1)
> Created attachment 301741 [details]
> dmesg

Sorry, this is lspci
Comment 3 Kai-Heng Feng 2022-09-05 06:53:52 UTC
Created attachment 301742 [details]
dmesg
Comment 4 Kai-Heng Feng 2022-09-05 06:54:27 UTC
Created attachment 301743 [details]
dmesg, wake bridges after switch is added
Comment 5 Kai-Heng Feng 2022-09-06 14:27:02 UTC
Created attachment 301752 [details]
dmesg with dyndbg enabled, and authorized the tunnel
Comment 6 Bjorn Helgaas 2022-09-06 14:48:49 UTC
Hi Kai-Heng, can you expand on this a little bit?

  - This is not a regression, right?

  - This has nothing to do with the ath11k_pci warnings, right?

  - Does this have something to do with 64:00.5 or 64:00.6?

  - What is the external PCIe device?  Where is it connected?

  - Is this a hot-add issue?

  - Does it work if the device is present at boot?
Comment 7 Kai-Heng Feng 2022-09-07 05:43:41 UTC
(In reply to Bjorn Helgaas from comment #6)
> Hi Kai-Heng, can you expand on this a little bit?
> 
>   - This is not a regression, right?

No, it's not a regression.

> 
>   - This has nothing to do with the ath11k_pci warnings, right?

This issue has nothing to do with ath11k_pci.

> 
>   - Does this have something to do with 64:00.5 or 64:00.6?

Yes, it seems the PCIe tunnels have to be up before the Card Present bit toggles.

> 
>   - What is the external PCIe device?  Where is it connected?

TBT NVMe/TBT dock.

> 
>   - Is this a hot-add issue?

Yes.

> 
>   - Does it work if the device is present at boot?

Yes, it works if TBT devices are plugged at preboot.
Comment 8 Mario Limonciello (AMD) 2022-09-07 15:54:25 UTC
Created attachment 301766 [details]
AMD's attempt at reproducing it

One of my colleagues tried to repro this issue on the same model hardware and same BIOS, but couldn't.  Attached his dmesg log.
Comment 9 Mario Limonciello (AMD) 2022-09-07 15:59:23 UTC
Created attachment 301767 [details]
Debugging patch

Patch to add debugging output showing whether the DHP bit is set.
Comment 10 Mario Limonciello (AMD) 2022-09-07 16:34:18 UTC
In the kernel thread matching this bugzilla I left a few comments to try to narrow down why this fails for KH but works for AMD on the same HP HW:

1) How did you flash the 01.02.01 firmware?  In Anson's check, he used dediprog.
Is it possible there was some stateful stuff used by HP's BIOS still on the SPI from the
upgrade that didn't get set/cleared properly from an earlier pre-release BIOS?

2) Did you change any BIOS settings?  Particularly anything to do with Pre-OS CM?

3) If you explicitly reset to HP's "default BIOS settings" does it resolve?

4) Can you double check ADP_CS_5 bit 31?  If it was for some reason set by Pre-OS CM in your BIOS/settings
combination, we might need to clear it from the Linux CM.

5) Are you changing any of the default runtime PM policies for any of the USB4 routers or
root ports used for tunneling using software like TLP?
Comment 11 Mika Westerberg 2022-09-07 17:04:25 UTC
BTW, you can read the ADP_CS_5 and any other bit through debugfs too so no need to patch the kernel. The adapter register space is under /sys/kernel/debug/thunderbolt/ROUTER/portX/regs.
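
As a rough illustration of the debugfs route, the sketch below scans the text of an adapter's regs file for ADP_CS_5 and reports whether bit 31 is set. The line layout it expects (offset, relative offset, cap id, vs cap id, value) and the helper name dhp_bit_set are assumptions rather than anything stated in this report, so adjust the parsing to whatever your kernel actually prints.

```python
ADP_CS_5_OFFSET = 5   # ADP_CS_5 is dword 5 of the adapter configuration space
DHP_BIT = 1 << 31     # bit 31, the bit comment 10 asks to double check


def dhp_bit_set(regs_text):
    """Return True/False for ADP_CS_5 bit 31, or None if ADP_CS_5 is not found.

    Assumed line layout (hypothetical, verify on your system):
        <offset> <relative_offset> <cap_id> <vs_cap_id> <value>
    """
    for line in regs_text.splitlines():
        line = line.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            offset = int(fields[0], 16)
            cap_id = int(fields[2], 16)
            value = int(fields[4], 16)
        except ValueError:
            continue
        # Assumption: basic adapter registers are listed with cap_id 0.
        if offset == ADP_CS_5_OFFSET and cap_id == 0:
            return bool(value & DHP_BIT)
    return None
```

To use it, read /sys/kernel/debug/thunderbolt/ROUTER/portX/regs as root (with ROUTER and portX replaced by the router and adapter in question) and pass the contents to dhp_bit_set(). A result of None means the assumed layout did not match.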
Comment 12 Kai-Heng Feng 2022-09-08 14:02:22 UTC
Created attachment 301774 [details]
dmesg with debug patch
Comment 13 Mario Limonciello (AMD) 2022-09-08 17:36:50 UTC
> BTW, you can read the ADP_CS_5 and any other bit through debugfs too so no
> need to patch the kernel. The adapter register space is under
> /sys/kernel/debug/thunderbolt/ROUTER/portX/regs.

Thanks for the reminder.  So there won't really be a point in adding this patch in the future for checking DHP-related problems.

> dmesg with debug patch

At this point I suspect that the firmware upgrade path you took didn't match what dediprog would have done.

First check
# grep SMC /sys/kernel/debug/dri/0/amdgpu_firmware_info

If it's not 0x04453200, then the fault is that HP's firmware upgrade didn't upgrade all components properly.

If it is, then let's take a full SPI dump so we can compare it with the dediprog-flashed image.
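
The version check above can also be scripted. This is a minimal sketch, assuming the SMC line in amdgpu_firmware_info contains a "firmware version: 0x..." field. That layout and the helper name smc_firmware_matches are assumptions; only the expected value 0x04453200 comes from this comment.

```python
import re

EXPECTED_SMC_VERSION = 0x04453200  # value quoted in this comment


def smc_firmware_matches(firmware_info_text, expected=EXPECTED_SMC_VERSION):
    """Return True/False for the SMC version match, or None if no SMC line is found."""
    for line in firmware_info_text.splitlines():
        if not line.startswith("SMC"):
            continue
        # Assumed field layout: "... firmware version: 0x<hex> ..."
        match = re.search(r"firmware version:\s*(0x[0-9a-fA-F]+)", line)
        if match:
            return int(match.group(1), 16) == expected
    return None
```

Pass it the contents of /sys/kernel/debug/dri/0/amdgpu_firmware_info (read as root). False points at the incomplete firmware upgrade described above, and None means the assumed line layout did not match.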
Comment 14 Mario Limonciello (AMD) 2022-09-15 14:00:36 UTC
Comparing an upgraded BIOS against a dediprog-flashed BIOS shows that this is caused by a bug in HP's firmware upgrade process, which will need to be addressed by HP.
Comment 15 Jingyuan Deng 2022-10-27 00:34:57 UTC
(In reply to Mario Limonciello (AMD) from comment #14)
> By comparing an upgraded BIOS versus dediprog flashed BIOS shows that this
> is caused by a bug in the HP's firmware upgrade process that will need to be
> addressed by HP.

So any possible way to flash back?
Comment 16 Jingyuan Deng 2022-10-27 00:37:11 UTC
(In reply to Mario Limonciello (AMD) from comment #14)
> By comparing an upgraded BIOS versus dediprog flashed BIOS shows that this
> is caused by a bug in the HP's firmware upgrade process that will need to be
> addressed by HP.

I mean, if I use dediprog to flash back, will it cause tpm problems?
Comment 17 Mario Limonciello (AMD) 2022-10-27 00:38:36 UTC
The specific upgrade problem outlined in this bug can be avoided by resetting the BIOS to its default settings after the upgrade.