Created attachment 278801 [details]
I recently got a new Lenovo ThinkPad T480s with the ThinkPad Thunderbolt 3 Dock. The USB ports on the dock (and probably also audio and Ethernet) never work after resume from suspend on up-to-date Fedora 29 with kernel-4.18.9-300.fc29.x86_64. The HDMI port on the dock seems to work (but with some delay). The problem also occurs with the latest available kernel-4.19.0-0.rc5.git0.1.fc30.x86_64 from rawhide. Replugging the dock usually fixes the issue.
Some probably relevant lines from dmesg after resume:
[ 6528.075126] xhci_hcd 0000:0b:00.0: Refused to change power state, currently in D3
[ 6528.075127] xhci_hcd 0000:09:00.0: Refused to change power state, currently in D3
[ 6528.075139] xhci_hcd 0000:0b:00.0: WARN: xHC restore state timeout
[ 6528.075140] xhci_hcd 0000:09:00.0: WARN: xHC restore state timeout
[ 6528.075140] xhci_hcd 0000:0b:00.0: PCI post-resume error -110!
[ 6528.075141] xhci_hcd 0000:09:00.0: PCI post-resume error -110!
[ 6528.075141] xhci_hcd 0000:0b:00.0: HC died; cleaning up
[ 6528.075142] xhci_hcd 0000:09:00.0: HC died; cleaning up
[ 6528.075150] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 6528.075153] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 6528.075155] PM: Device 0000:0b:00.0 failed to resume async: error -110
[ 6528.075157] PM: Device 0000:09:00.0 failed to resume async: error -110
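To pull lines like the ones above out of a long kernel log, a small filter can help. This is only a sketch: the function name `resume_errors` is made up for illustration, and the patterns are taken from the messages quoted in this report, so other failure modes may print different text.

```shell
# Print only the kernel log lines that point at a failed resume.
# Reads log text on stdin, so it works with live dmesg output or a saved file.
# The patterns come from the messages quoted in this report.
resume_errors() {
    grep -E 'failed to resume|HC died|post-resume error|Refused to change power state'
}

# Usage: dmesg | resume_errors
```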
The T480s has the latest available BIOS version, 1.25. Not sure what firmware version is in the dock, because I don't know how to check that on Linux.
I'm attaching dmesg output for the following sequence: boot (dock works ok), suspend&resume (dock doesn't work ok), replug (dock works ok).
I've initially reported this downstream, but they pointed me here:
Is there anything else that I can provide?
Since this is related to XHCI, it might be better to send the message to usb mailing list for usb maintainer's help.
Please send this information to email@example.com
I'm also affected, but in my case replugging does not help. Only a reboot does.
(In reply to Chen Yu from comment #2)
> Please send this information to firstname.lastname@example.org
Can you check firmware version of the dock and the laptop? You can do it like:
# cat /sys/bus/thunderbolt/devices/0-0/nvm_version
# cat /sys/bus/thunderbolt/devices/0-1/nvm_version
(assuming the dock is connected to port A, otherwise it is 0-3).
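If it is unclear which port the dock is on, the same attribute can be read for every Thunderbolt device in one go. The following is a minimal sketch: the function name `tb_nvm_versions` and its optional directory argument are assumptions added here for illustration (the argument only exists so the loop can be tried against a test directory).

```shell
# Print "<device>: <nvm_version>" for every Thunderbolt device that exposes one.
# The optional argument overrides the sysfs directory; it defaults to the
# path used in the commands above.
tb_nvm_versions() {
    dir="${1:-/sys/bus/thunderbolt/devices}"
    for f in "$dir"/*/nvm_version; do
        # Skip entries without a readable nvm_version (also covers "no match").
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Usage: tb_nvm_versions
```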
$ cat /sys/bus/thunderbolt/devices/0-0/nvm_version
$ cat /sys/bus/thunderbolt/devices/0-1/nvm_version
Can you also attach output of 'sudo lspci -vv' before and after the failure?
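One possible way to capture and compare the two listings is sketched below. The function names and the file names in the usage comment are placeholders chosen for illustration, not anything this tracker requires.

```shell
# Save the verbose PCI listing to the file named by the first argument.
snapshot_lspci() {
    sudo lspci -vv > "$1"
}

# Show what changed between two snapshots. diff exits non-zero when the
# files differ, which is the expected case here.
compare_snapshots() {
    diff -u "$1" "$2"
}

# Usage:
#   snapshot_lspci lspci-before.txt     # while the dock works
#   (suspend, resume, reproduce the failure)
#   snapshot_lspci lspci-after.txt
#   compare_snapshots lspci-before.txt lspci-after.txt
```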
Created attachment 278895 [details]
lspci -vv before
Created attachment 278897 [details]
lspci -vv after
Hi, looking at your lspci after, I can see the xHCI devices. Is this now from the case where resume failed and the xHCI is not functional anymore?
I've tried again now to be sure, but yes, the output is the same apart from some addresses and pin numbers. USB mouse and keyboard don't react and audio plays from the laptop instead of the docking station.
OK, thanks. It looks like the PCIe tunnels are not up during resume (or something like that), which causes xHCI and others to fail. Can you attach acpidump from the system as well? Maybe there is something in the ACPI tables that Linux fails to handle.
Created attachment 278913 [details]
Based on the acpidump this is a so-called "RTD3" (Runtime D3) system where the Thunderbolt controller is always present unless the option is switched to "BIOS assisted" in the BIOS. Have you changed any Thunderbolt settings in the BIOS? These systems typically ship with the latest Windows and RTD3 enabled, and it may be that the "BIOS assisted" flow was not tested much by Lenovo.
I did make some changes, yes: I enabled "Support in Pre Boot Environment" so I can unlock my disks using the external keyboard, and I also disabled "Wake by Thunderbolt(TM) 3" to save battery. However, I am convinced that I did not touch "Thunderbolt BIOS Assist Mode", and "Setup Defaults" also leaves it "Enabled". But changing it to "Disabled" seems to fix my issue! Thanks!
IIRC at least some of the Lenovo systems that were shipped to Red Hat were configured to use BIOS assisted mode at the factory. The reason is that when the controller is always present it also consumes more power, and currently Linux does not fully support powering it down when idle. There is a patch series in the PCI subsystem tree heading to v4.20 that should enable it.
On my HP system the relevant option seems to be called "Thunderbolt Native PCIe hotplug". I had this enabled, and indeed disabling fixes this issue for me as well.
I guess non-native means BIOS assisted.
Additionally, disabling native PCIe hotplug enables S4 suspend. When it's enabled, the BIOS says something like "S4 suspend is not available if native PCIe hotplug is active".
So what's the takeaway? Should we just use BIOS assisted mode/non-native PCIe hotplug? Is this a defect in Linux? I read https://lwn.net/Articles/767885/ and there seem to be improvements landing in Linux 4.20+ with regard to TB3 and PCIe hotplug.
I don't know about HP, but Lenovo introduced BIOS assisted mode to make Linux and other OSes work in case they do not support RTD3 (native PCIe everything). Now, I'm not sure how much that mode was actually tested, which probably explains why we are seeing issues on the Linux side. If anyone has Windows installed there, it would be interesting to see whether it has the same issues.
So my recommendation is to stick with whatever mode the system was shipped with :)
In case of "Native PCIe / RTD3" it should work fine in Linux but it consumes more power because the controller is not powered down (in BIOS assisted mode the controller is hot-removed when nothing is connected). Starting from v4.20 Linux also knows how to power manage the controller and the associated PCIe ports.
I'm also seeing this error on a Thinkpad X1 Extreme.
I'm having to use 4.19.73-1-lts due to this issue: https://bbs.archlinux.org/viewtopic.php?pid=1858969
I've tried both with "Thunderbolt BIOS Assist Mode" enabled and disabled, and it doesn't seem to make a difference.
I'll attach dmesg + lspci (before/after) + acpidump. Let me know if anything else would be helpful.
Created attachment 285153 [details]
lspci before dock has been replugged
Created attachment 285155 [details]
lspci after dock has been replugged
Created attachment 285157 [details]
Created attachment 285159 [details]
dmesg (including initial connection, and replug)
Can you try the patch here: https://bugzilla.kernel.org/attachment.cgi?id=284551
Mika Westerberg: Finally managed to compile and test that patch, thanks. Unfortunately it doesn't seem to resolve the issue. I still get this when I plug the dock in after boot and/or unplug and replug it:
[37463.943865] usb 1-4: new high-speed USB device number 8 using xhci_hcd
[37466.654729] pcieport 0000:00:1b.4: PME: Spurious native interrupt!
[37466.894168] pcieport 0000:00:1b.4: PME: Spurious native interrupt!
[37466.894173] pcieport 0000:00:1b.4: PME: Spurious native interrupt!
[37468.091655] thunderbolt 0-1: new device found, vendor=0x108 device=0x1720
[37468.091656] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
[37469.512595] thinkpad_acpi: undocked from hotplug port replicator
[37469.836926] typec_displayport port0-partner.0: failed to enter mode
[37471.793780] usb usb4-port1: Cannot enable. Maybe the USB cable is bad?
[37475.853804] usb usb4-port1: Cannot enable. Maybe the USB cable is bad?
[37475.853892] usb usb4-port1: attempt power cycle
Does the port work regardless of the above warnings?
No it doesn't. So, the display plugged into the dock works, even after replug, but none of the usb stuff does, so no peripherals, and no sound.
What's the content of /sys/bus/thunderbolt/devices/domain0/security and does boltctl say it has authorized the device after replug?
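Both pieces of information can also be read straight from sysfs. The sketch below assumes the `security` attribute on each domain (as named in the question above) and the per-device `authorized` attribute; the function name `tb_auth_status` and its optional directory argument are made up for illustration.

```shell
# Print the security level of each Thunderbolt domain and the
# authorization state (0 = not authorized) of each device that has one.
# The optional argument overrides the sysfs directory.
tb_auth_status() {
    dir="${1:-/sys/bus/thunderbolt/devices}"
    for d in "$dir"/domain*; do
        if [ -r "$d/security" ]; then
            printf '%s security: %s\n' "$(basename "$d")" "$(cat "$d/security")"
        fi
    done
    for f in "$dir"/*/authorized; do
        [ -r "$f" ] || continue
        printf '%s authorized: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}

# Usage: tb_auth_status
# "boltctl list" shows the state as tracked by the bolt daemon.
```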
Ahh hah! That might be the issue. After doing a "boltctl authorize" on the device, all the USB devices start working :)
It does appear I have to do that every time I replug the device, since I also did a "boltctl enroll" on it. I'll see if I need to mess with the udev rules to get this to happen automatically.
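Until an automatic setup is in place, the authorization can also be done by hand through sysfs, where writing 1 to a device's `authorized` attribute authorizes it. The sketch below is an assumption-laden illustration, not a recommendation: the function name `tb_authorize_all` and its directory argument are hypothetical, it needs root on a real system, and it authorizes every attached device without asking, which defeats the purpose of the "user" security level.

```shell
# Authorize every Thunderbolt device that is currently unauthorized by
# writing 1 to its sysfs "authorized" attribute.
# The optional argument overrides the sysfs directory (added here only so
# the function can be tried against a test directory).
tb_authorize_all() {
    dir="${1:-/sys/bus/thunderbolt/devices}"
    for f in "$dir"/*/authorized; do
        # Skip devices without a writable attribute (also covers "no match").
        [ -w "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            echo 1 > "$f"
            printf 'authorized %s\n' "$(basename "$(dirname "$f")")"
        fi
    done
}

# Usage (as root): tb_authorize_all
```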
I'm seconding Nick here. I'm on a Lenovo ThinkPad X1C with kernel 5.3.11-arch1-1 and am experiencing the same problems. Doing `boltctl authorize ...` works.