Bug 201255

Summary: USB ports on Thunderbolt 3 Dock never work after resume from suspend
Product: Drivers Reporter: Ondrej Holy (oholy)
Component: USB Assignee: Greg Kroah-Hartman (greg)
Status: NEEDINFO ---    
Severity: normal CC: a3at.mail, andreas, heikki.krogerus, kugel, mika.westerberg, nick.lanham.nexus+kbugz, yu.c.chen
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.18, 4.19 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg
lspci -vv before
lspci -vv after
acpidump
lspci before dock has been replugged
lspci after dock has been replugged
acpidump
dmesg (including initial connection, and replug)

Description Ondrej Holy 2018-09-27 08:01:56 UTC
Created attachment 278801 [details]
dmesg

I recently got a new Lenovo ThinkPad T480s with the ThinkPad Thunderbolt 3 Dock. The USB ports on the dock (and probably also audio and Ethernet) never work after resume from suspend on an up-to-date Fedora 29 with kernel-4.18.9-300.fc29.x86_64. The HDMI port on the dock seems to work (but with some delay). The problem persists even with the latest available kernel-4.19.0-0.rc5.git0.1.fc30.x86_64 from rawhide. Replugging the dock usually fixes the issue.

Some probably relevant lines from dmesg after resume:
[ 6528.075126] xhci_hcd 0000:0b:00.0: Refused to change power state, currently in D3
[ 6528.075127] xhci_hcd 0000:09:00.0: Refused to change power state, currently in D3
[ 6528.075139] xhci_hcd 0000:0b:00.0: WARN: xHC restore state timeout
[ 6528.075140] xhci_hcd 0000:09:00.0: WARN: xHC restore state timeout
[ 6528.075140] xhci_hcd 0000:0b:00.0: PCI post-resume error -110!
[ 6528.075141] xhci_hcd 0000:09:00.0: PCI post-resume error -110!
[ 6528.075141] xhci_hcd 0000:0b:00.0: HC died; cleaning up
[ 6528.075142] xhci_hcd 0000:09:00.0: HC died; cleaning up
[ 6528.075150] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 6528.075153] dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[ 6528.075155] PM: Device 0000:0b:00.0 failed to resume async: error -110
[ 6528.075157] PM: Device 0000:09:00.0 failed to resume async: error -110
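The devices that failed to resume can be pulled out of a longer kernel log automatically. The following is a minimal sketch, assuming the log lines keep the "PM: Device ... failed to resume" form shown above; the function name failed_resume_devices is made up for illustration. Error -110 is -ETIMEDOUT, which matches the "xHC restore state timeout" warnings.

```shell
#!/bin/sh
# List the PCI addresses that the PM core reported as failing to resume.
# Reads kernel log text on stdin and matches lines of the form
#   "PM: Device 0000:0b:00.0 failed to resume async: error -110"
# (hypothetical helper name; the pattern is taken from the excerpt above).
failed_resume_devices() {
    sed -n 's/.*PM: Device \([0-9a-fA-F:.]*\) failed to resume.*/\1/p' |
        LC_ALL=C sort -u
}
```

It could be used as `dmesg | failed_resume_devices`, and each address it prints can then be passed to `lspci -s` to see which device it is.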

The T480s has the latest available BIOS version, 1.25. Not sure what firmware version is in the dock, because I don't know how to check that on Linux.

I'm attaching dmesg output for the following sequence: boot (dock works OK), suspend and resume (dock doesn't work), replug (dock works OK).

I've initially reported this downstream, but they pointed me here:
https://bugzilla.redhat.com/show_bug.cgi?id=1633109

Is there anything else I can provide?
Comment 1 Chen Yu 2018-09-27 16:08:23 UTC
Since this is related to xHCI, it might be better to send this report to the USB mailing list to get help from the USB maintainers.
Comment 2 Chen Yu 2018-09-27 16:10:25 UTC
Please send this information to linux-usb@vger.kernel.org
Comment 3 Thomas Martitz 2018-09-29 06:17:47 UTC
I'm also affected, but in my case replugging does not help. Only a reboot does.
Comment 4 Ondrej Holy 2018-09-29 06:27:25 UTC
(In reply to Chen Yu from comment #2)
> Please send this information to linux-usb@vger.kernel.org

Done.
Comment 5 Mika Westerberg 2018-10-01 08:07:47 UTC
Can you check the firmware version of the dock and the laptop? You can do it like this:

  # cat /sys/bus/thunderbolt/devices/0-0/nvm_version
  # cat /sys/bus/thunderbolt/devices/0-1/nvm_version

(assuming the dock is connected to port A, otherwise it is 0-3).
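To avoid guessing whether the dock is 0-1 or 0-3, the same attribute can be read for every Thunderbolt device in one go. This is a minimal sketch; the function name print_nvm_versions and its optional root-directory argument are made up for illustration, and only the nvm_version path comes from the commands above.

```shell
#!/bin/sh
# Print "<device>: <nvm_version>" for every Thunderbolt device that exposes
# an nvm_version attribute. The optional first argument overrides the sysfs
# directory (hypothetical parameter, handy for trying the function out).
print_nvm_versions() {
    tb_root="${1:-/sys/bus/thunderbolt/devices}"
    for f in "$tb_root"/*/nvm_version; do
        # Skip the unexpanded glob and unreadable attributes.
        [ -r "$f" ] || continue
        dev=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling `print_nvm_versions` with no argument reads the live sysfs tree; reading the attribute may require root on some systems.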
Comment 6 Ondrej Holy 2018-10-01 08:41:26 UTC
$ cat /sys/bus/thunderbolt/devices/0-0/nvm_version
14.0
$ cat /sys/bus/thunderbolt/devices/0-1/nvm_version
15.0
Comment 7 Mika Westerberg 2018-10-02 07:41:04 UTC
Can you also attach output of 'sudo lspci -vv' before and after the failure?
Comment 8 Ondrej Holy 2018-10-02 07:59:26 UTC
Created attachment 278895 [details]
lspci -vv before
Comment 9 Ondrej Holy 2018-10-02 07:59:54 UTC
Created attachment 278897 [details]
lspci -vv after
Comment 10 Mika Westerberg 2018-10-03 08:56:02 UTC
Hi, looking at your lspci after, I can see the xHCI devices. Is this now from the case where resume failed and the xHCI is not functional anymore?
Comment 11 Ondrej Holy 2018-10-04 06:03:35 UTC
I've tried again now to be sure, but yes, the output is the same apart from some addresses and pin numbers. USB mouse and keyboard don't react and audio plays from the laptop instead of the docking station.
Comment 12 Mika Westerberg 2018-10-04 12:22:47 UTC
OK, thanks. It looks like the PCIe tunnels are not up during resume (or something like that), which causes xHCI and the other devices to fail. Can you attach acpidump from the system as well? Maybe there is something in the ACPI tables that Linux fails to handle.
Comment 13 Ondrej Holy 2018-10-04 12:49:48 UTC
Created attachment 278913 [details]
acpidump
Comment 14 Mika Westerberg 2018-10-05 08:12:43 UTC
Based on the acpidump this is a so-called "RTD3" (Runtime D3) system where the Thunderbolt controller is always present unless the option is switched to "BIOS assisted" in the BIOS. I wonder if you have changed any Thunderbolt settings in the BIOS? These systems typically ship with the latest Windows and RTD3 enabled, and it may be that the "BIOS assisted" flow was not tested much by Lenovo.
Comment 15 Ondrej Holy 2018-10-05 10:12:15 UTC
I did make some changes, yes: I enabled "Support in Pre Boot Environment" so I can unlock my disks using the external keyboard, and I also disabled "Wake by Thunderbolt(TM) 3" to save battery. However, I am convinced that I did not touch "Thunderbolt BIOS Assist Mode", and "Setup Defaults" also leaves it "Enabled". But changing it to "Disabled" seems to fix my issue! Thanks!
Comment 16 Mika Westerberg 2018-10-05 10:21:18 UTC
IIRC at least some of the Lenovo systems that were shipped to Red Hat were configured to use BIOS assisted mode in the factory. The reason is that when the controller is always present it also consumes more power, and currently Linux does not fully support powering it down when idle. There is a patch series in the PCI subsystem tree heading to v4.20 that should enable it.
Comment 17 Thomas Martitz 2018-10-13 21:09:13 UTC
On my HP system the relevant option seems to be called "Thunderbolt Native PCIe hotplug". I had this enabled, and indeed disabling fixes this issue for me as well.

I guess non-native means BIOS assisted.

Additionally, disabling native PCIe hotplug enables S4 suspend. When it's enabled the BIOS says something like S4 suspend is not available if native PCIe hotplug is active.

So what's the takeaway? Should we just use BIOS assisted mode/non-native PCIe hotplug? Is this a defect in Linux? I read https://lwn.net/Articles/767885/ and there seem to be improvements landing in Linux 4.20+ with regard to TB3 and PCIe hotplug.
Comment 18 Mika Westerberg 2018-10-15 07:26:16 UTC
I don't know about HP, but Lenovo introduced BIOS assisted mode to make Linux and other OSes work in case they do not support RTD3 (native PCIe everything). Now, I'm not sure how much that mode was actually tested, which probably explains why we are seeing issues on the Linux side. If anyone has Windows installed there, it would be interesting to see whether it has the same issues.

So my recommendation is to stick with whatever mode the system was shipped with :)

In case of "Native PCIe / RTD3" it should work fine in Linux but it consumes more power because the controller is not powered down (in BIOS assisted mode the controller is hot-removed when nothing is connected). Starting from v4.20 Linux also knows how to power manage the controller and the associated PCIe ports.
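Whether the controller and its PCIe ports are actually being power managed can be checked through the standard PCI runtime PM attributes in sysfs (power/control and power/runtime_status). The following is only a sketch: the function name pci_runtime_status and the PCI_SYSFS_ROOT override are hypothetical, and the PCI addresses to pass depend on the machine. A device that is reported as not present would be consistent with the controller having been hot-removed in BIOS assisted mode.

```shell
#!/bin/sh
# Print the runtime PM policy and state of each PCI device given as an
# argument, e.g. the Thunderbolt controller or its upstream PCIe port.
# PCI_SYSFS_ROOT is a hypothetical override used only for experimenting.
pci_runtime_status() {
    pci_root="${PCI_SYSFS_ROOT:-/sys/bus/pci/devices}"
    for addr in "$@"; do
        d="$pci_root/$addr/power"
        if [ -r "$d/runtime_status" ] && [ -r "$d/control" ]; then
            printf '%s: control=%s status=%s\n' \
                "$addr" "$(cat "$d/control")" "$(cat "$d/runtime_status")"
        else
            # No sysfs node: the device is not on the bus right now.
            printf '%s: not present\n' "$addr"
        fi
    done
}
```

For example, `pci_runtime_status 0000:00:1b.4` would query one root port; that address is just an example and has to be replaced with the ones `lspci` shows on the system in question.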
Comment 19 Nick Lanham 2019-09-24 19:49:13 UTC
I'm also seeing this error on a ThinkPad X1 Extreme.

I'm having to use 4.19.73-1-lts due to this issue: https://bbs.archlinux.org/viewtopic.php?pid=1858969

I've tried with "Thunderbolt BIOS Assist Mode" both enabled and disabled, and it doesn't seem to make a difference.

I'll attach dmesg + lspci (before/after) + acpidump. Let me know if anything else would be helpful.

Thanks!
Comment 20 Nick Lanham 2019-09-24 19:51:00 UTC
Created attachment 285153 [details]
lspci before dock has been replugged
Comment 21 Nick Lanham 2019-09-24 19:51:22 UTC
Created attachment 285155 [details]
lspci after dock has been replugged
Comment 22 Nick Lanham 2019-09-24 19:51:39 UTC
Created attachment 285157 [details]
acpidump
Comment 23 Nick Lanham 2019-09-24 19:52:15 UTC
Created attachment 285159 [details]
dmesg (including initial connection, and replug)
Comment 24 Mika Westerberg 2019-09-25 09:54:17 UTC
Can you try the patch here: https://bugzilla.kernel.org/attachment.cgi?id=284551
Comment 25 Nick Lanham 2019-09-27 17:27:11 UTC
Mika Westerberg: Finally managed to compile and test that patch, thanks. Unfortunately it doesn't seem to resolve the issue. I still get this when I plug the dock in after boot and/or unplug and replug it:

[37463.943865] usb 1-4: new high-speed USB device number 8 using xhci_hcd
[37466.654729] pcieport 0000:00:1b.4: PME: Spurious native interrupt!
[37466.894168] pcieport 0000:00:1b.4: PME: Spurious native interrupt!
[37466.894173] pcieport 0000:00:1b.4: PME: Spurious native interrupt!
[37468.091655] thunderbolt 0-1: new device found, vendor=0x108 device=0x1720
[37468.091656] thunderbolt 0-1: Lenovo ThinkPad Thunderbolt 3 Dock
[37469.512595] thinkpad_acpi: undocked from hotplug port replicator
[37469.836926] typec_displayport port0-partner.0: failed to enter mode
[37471.793780] usb usb4-port1: Cannot enable. Maybe the USB cable is bad?
[37475.853804] usb usb4-port1: Cannot enable. Maybe the USB cable is bad?
[37475.853892] usb usb4-port1: attempt power cycle
Comment 26 Mika Westerberg 2019-09-30 07:44:45 UTC
Does the port work regardless of the above warnings?
Comment 27 Nick Lanham 2019-09-30 17:12:21 UTC
No, it doesn't. The display plugged into the dock works, even after replug, but none of the USB devices do, so no peripherals and no sound.
Comment 28 Mika Westerberg 2019-09-30 17:50:53 UTC
What's the content of /sys/bus/thunderbolt/devices/domain0/security and does boltctl say it has authorized the device after replug?
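Both pieces of information are also visible directly in sysfs: each domain has a security attribute, and each device has an authorized attribute that reads 0 until PCIe tunnels have been allowed for it. A minimal sketch that prints both follows; the function name tb_auth_report and its optional root-directory argument are made up for illustration.

```shell
#!/bin/sh
# Report the Thunderbolt security level per domain and the authorization
# state per device. The optional first argument overrides the sysfs
# directory (hypothetical parameter, handy for trying the function out).
tb_auth_report() {
    tb_root="${1:-/sys/bus/thunderbolt/devices}"
    for d in "$tb_root"/domain*; do
        if [ -r "$d/security" ]; then
            printf '%s: security=%s\n' "$(basename "$d")" "$(cat "$d/security")"
        fi
    done
    for f in "$tb_root"/*/authorized; do
        # Skip the unexpanded glob and unreadable attributes.
        [ -r "$f" ] || continue
        printf '%s: authorized=%s\n' \
            "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

A dock that shows up with authorized=0 under a security level of user or secure would explain working display output with dead USB, since the PCIe tunnel the dock's xHCI controllers sit behind has not been created yet.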
Comment 29 Nick Lanham 2019-10-04 17:50:48 UTC
Ahh hah! That might be the issue. After doing a "boltctl authorize" on the device, all the USB devices start working :)

It does appear I have to do that every time I replug the device, since I also did a "boltctl enroll" on it.  I'll see if I need to mess with the udev rules to get this to happen automatically.

Thanks!
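For debugging the replug case without boltctl, the kernel's own sysfs interface can be used: with the security level set to user, writing 1 to a device's authorized attribute allows its PCIe tunnels. The sketch below does that for every device that still reads 0. It is an assumption-laden illustration, not a replacement for bolt: the function name authorize_pending and its optional root-directory argument are hypothetical, it needs root, it does not handle the secure level (which needs a key), and it authorizes every pending device blindly.

```shell
#!/bin/sh
# Authorize every Thunderbolt device whose "authorized" attribute reads 0.
# Only meaningful for security level "user"; intended for debugging only.
# The optional first argument overrides the sysfs directory (hypothetical).
authorize_pending() {
    tb_root="${1:-/sys/bus/thunderbolt/devices}"
    for f in "$tb_root"/*/authorized; do
        # Skip the unexpanded glob and attributes we cannot write.
        [ -w "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            echo 1 > "$f" &&
                printf 'authorized %s\n' "$(basename "$(dirname "$f")")"
        fi
    done
}
```

If USB starts working after running `authorize_pending` as root following a replug, that points at missing authorization rather than at the xHCI resume path.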
Comment 30 Andreas Lindhé 2019-12-16 19:41:56 UTC
I'm seconding Nick here. I'm on a Lenovo ThinkPad X1C with kernel 5.3.11-arch1-1 and am experiencing the same problems. Doing `boltctl authorize ...` works.