Created attachment 301465 [details] dell-7390_freeze-thunderbolt.html This performance issue was found running sleepgraph on linux 5.19.0-rc7 on the Dell XPS 13 7390 2-in-1. For reasons unknown, the thunderbolt device takes a full 8 seconds to suspend and a full 8 seconds to resume even though no physical devices are connected. The issue seems to occur in the both the dpm_suspend and dpm_resume phase. There are 4 thunderbolt devices: 2 pcieports and 2 thunderbolt NHIs. The suspend & resume callbacks for these devices spawn async threads that take 8 seconds to complete. The time is shown at the end of dpm_suspend/dpm_resume when async_synchronize_full is called. The time appears to be spent in the ACPI calls: \_SB.PCIO.TBT0._ON and \_SB.PCIO.TBT1._ON. If I disable the 4 thunderbolt devices and run another suspend/resume, the issue goes away. So if thunderbolt is not needed, you can avoid this issue by running this script at boot: #!/bin/sh if [ $USER != "root" ]; then echo "Can only be run as root" exit fi THUNPCI1="/sys/bus/pci/devices/0000:00:07.2" THUNPCI2="/sys/bus/pci/devices/0000:00:07.0" THUNDEV="/sys/bus/pci/drivers/thunderbolt" if [ -e $THUNDEV/0000:00:0d.3 ]; then echo 0000:00:0d.3 > $THUNDEV/unbind fi if [ -e $THUNDEV/0000:00:0d.2 ]; then echo 0000:00:0d.2 > $THUNDEV/unbind fi if [ -e $THUNPCI1 ]; then echo 1 > $THUNPCI1/remove fi if [ -e $THUNPCI2 ]; then echo 1 > $THUNPCI2/remove fi I'm attaching two sleepgraph timelines. The first is the problem timeline run with thunderbolt enabled. The second is run with thunderbolt disabled with the above script. dell-7390_freeze-thunderbolt.html dell-7390_freeze-thunderbolt-removed.html The dmesg logs and system info are in the timelines themselves. Just click the "dmesg" and "log" buttons in the upper right hand corner of the timelines.
Created attachment 301466 [details] dell-7390_freeze-thunderbolt-removed.html
Created attachment 301467 [details] dell-9300_freeze-thunderbolt.html
Created attachment 301468 [details] dell-9300_freeze-thunderbolt-removed.html
I just confirmed the exact same issue on the dell XPS 13 9300. Both this as the 7390 are Ice Lake machines. Attached are the before and after thunderbolt timelines.
Created attachment 301469 [details] otcpl-dell-9300-icl_freeze-thunderbolt.html
Created attachment 301470 [details] otcpl-dell-9300-icl_freeze-thunderbolt-removed.html
This issue is still with us, and is still reproducible.
Created attachment 306727 [details] issue.def
Created attachment 306768 [details] Dell 9300 ICL acpidump
Created attachment 306769 [details] Dell 9310 TGL acpidump
On the 9300, the first (chronological) challenge is the 348ms _ON method. PCI0.TBT0._ON, which results in a bunch of msleep(5)'s that that take 15ms each; with _ON taking 348ms total. If (ITRT) { PowerResource (TBT0, 0x05, 0x0001) { Method (_STA, 0, NotSerialized) // _STA: Status { Return (\_SB.PCI0.TDM0.STAT) } Method (_ON, 0, NotSerialized) // _ON_: Power On { # Dunno what "ITIM" is but we would skip all of this if it were not set... If (ITIM) { TG0N () } } Method (_OFF, 0, NotSerialized) // _OFF: Power Off { If (ITIM) { If ((\_SB.PCI0.TDM0.SD3C == Zero)) { TG0F () } } } } ... Method (TG0N, 0, NotSerialized) { If ((\_SB.PCI0.TDM0.VDID != 0xFFFFFFFF)) { If ((\_SB.PCI0.TDM0.STAT == Zero)) { \_SB.PCI0.TDM0.D3CX () If (CondRefOf (\_SB.PCI1)) { If ((\_SB.PCI1.TRP0.VDID != 0xFFFFFFFF)) { \_SB.PCI1.TRP0.D3CX () } If ((\_SB.PCI1.TRP1.VDID != 0xFFFFFFFF)) { \_SB.PCI1.TRP1.D3CX () } } Else { If ((\_SB.PCI0.TRP0.VDID != 0xFFFFFFFF)) { \_SB.PCI0.TRP0.D3CX () } If ((\_SB.PCI0.TRP1.VDID != 0xFFFFFFFF)) { \_SB.PCI0.TRP1.D3CX () } } If ((\_SB.PCI0.TDM0.ALCT == One)) { If ((CTP0 == One)) { \_SB.PCI0.TDM0.CNTP () \_SB.PCI0.TDM0.WACT = One CTP0 = Zero } \_SB.PCI0.TDM0.ALCT = Zero } } Else { } # presumably ITRE is 5ms # changing it will not help much, because we loop many times Sleep (ITRE) } }
On the 9300, the 2nd (chronological, but much larger) problem is 1,707 ms delay in \_SB.PCI0.TRP0._PS0 Method (_PS0, 0, Serialized) // _PS0: Power State 0 { # this loop requests a sleep for 5ms 100 times, # as EVT0 apparently never magically changes from One # and each sleep actually takes almost 16ms, this loop costs 1,600 ms # EVT0 is initialized to Zero # EVT0 is set to One in _PS3 # EVT0 is set to Zero in HPME(), which is called below, *after* # the delay loop that depends on it. Local0 = Zero While ((EVT0 == One)) { If ((Local0 == 0x64)) { Break } Sleep (0x05) Local0++ } HPEV () If ((HPEX == One)) { HPEX = Zero } HPME () If ((PMEX == One)) { PMEX = Zero } # this hard-coded 100ms sleep takes 107ms, bringing the total # delay for the _PS0 method to 1,707 ms Sleep (0x64) If (((TUID == Zero) || (TUID == One))) { If ((\_SB.PCI0.TDM0.WACT == One)) { \_SB.PCI0.TDM0.WACT = 0x02 \_SB.PCI0.TDM0.WFCC (ITCT) \_SB.PCI0.TDM0.WACT = Zero } ElseIf ((\_SB.PCI0.TDM0.WACT == 0x02)) { While ((\_SB.PCI0.TDM0.WACT != Zero)) { Sleep (0x05) } } } In summary, the _PS0 and the _ON delays are hard-coded and the only way to not suffer from them with this AML appears to be to not invoke _PS0 and _ON.
The Dell 9310 (TGL) has the same 348ms TBT0._ON delay as the 9300. The AML looks the same, though the 9310 adds debug statements, which make the code easier to understand: Scope (\_SB.PC00) ... If (ITRT) { PowerResource (TBT0, 0x05, 0x0001) { Method (_STA, 0, NotSerialized) // _STA: Status { ADBG (Concatenate ("DMA0 D3C STAT -", ToHexString (\_SB.PC00.TDM0.STAT))) Return (\_SB.PC00.TDM0.STAT) } Method (_ON, 0, NotSerialized) // _ON_: Power On { ADBG ("TBT0 Group ON Start") If (ITIM) { TG0N () } Else { ADBG ("TBT IMR is not valid") } ADBG ("TBT0 Group ON End") } ... Method (TG0N, 0, NotSerialized) { ADBG ("TG0N Start") ADBG (Concatenate ("DMA0 VDID -", ToHexString (\_SB.PC00.TDM0.VDID))) ADBG (Concatenate ("DMA0 PMST -", ToHexString (\_SB.PC00.TDM0.PMST))) ADBG (Concatenate ("DMA0 PMEE -", ToHexString (\_SB.PC00.TDM0.PMEE))) ADBG (Concatenate ("DMA0 PMES -", ToHexString (\_SB.PC00.TDM0.PMES))) ADBG (Concatenate ("DMA0 STAT -", ToHexString (\_SB.PC00.TDM0.STAT))) If ((\_SB.PC00.TDM0.VDID != 0xFFFFFFFF)) { If ((\_SB.PC00.TDM0.STAT == Zero)) { \_SB.PC00.TDM0.D3CX () ADBG ("Let\'s bring TBT RPs out of D3Cold") If (CondRefOf (\_SB.PC01)) { If ((\_SB.PC01.TRP0.VDID != 0xFFFFFFFF)) { \_SB.PC01.TRP0.D3CX () } If ((\_SB.PC01.TRP1.VDID != 0xFFFFFFFF)) { \_SB.PC01.TRP1.D3CX () } } Else { If ((\_SB.PC00.TRP0.VDID != 0xFFFFFFFF)) { \_SB.PC00.TRP0.D3CX () } If ((\_SB.PC00.TRP1.VDID != 0xFFFFFFFF)) { \_SB.PC00.TRP1.D3CX () Local0 = One } } If ((\_SB.PC00.TDM0.ALCT == One)) { If ((CTP0 == One)) { If ((ITCT != Zero)) { \_SB.PC00.TDM0.CNTP () \_SB.PC00.TDM0.WACT = One CTP0 = Zero } } \_SB.PC00.TDM0.ALCT = Zero } } Else { ADBG ("Drop TG0N due to it is already exit D3 cold") } # same sleep as on the 9300: Sleep (ITRE) } ADBG ("TG0N End") }
The Dell 9310 AML is very similar to the 9300, except the hard-coded 1600 ms delay at the start of _PS0 is simply missing. As a result _PS0 endures only a 100ms delay, rather than 1,600 + 100ms. ... Scope (\_SB.PC00) { Device (TRP0) { ... Method (_PS0, 0, Serialized) // _PS0: Power State 0 { # the 1,600ms "EVT0 loop" from the 9300's _PS0 is simply missing on the 9310! ADBG (Concatenate ("TBT _PS0 Start RP ", ToHexString (TUID))) ADBG (Concatenate ("TBT RP VDID -", ToHexString (VDID))) HPEV () If ((HPEX == One)) { ADBG ("Disable Hot Plug SCI") HPEX = Zero } HPME () If ((PMEX == One)) { ADBG ("Disable PME SCI") PMEX = Zero } # the 100ms delay from the 9300 _PS0 is still present Sleep (0x64) If (((TUID == Zero) || (TUID == One))) { If ((\_SB.PC00.TDM0.WACT == One)) { \_SB.PC00.TDM0.WACT = 0x02 \_SB.PC00.TDM0.WFCC (ITCT) \_SB.PC00.TDM0.WACT = Zero } ElseIf ((\_SB.PC00.TDM0.WACT == 0x02)) { ADBG ("Wait until other _PS0 get response") While ((\_SB.PC00.TDM0.WACT != Zero)) { Sleep (0x05) } ADBG ("Other _PS0 got response") } } ElseIf ((\_SB.PC00.TDM1.WACT == One)) { \_SB.PC00.TDM1.WACT = 0x02 \_SB.PC00.TDM1.WFCC (ITCT) \_SB.PC00.TDM1.WACT = Zero } ElseIf ((\_SB.PC00.TDM1.WACT == 0x02)) { ADBG ("Wait until other _PS0 get response") While ((\_SB.PC00.TDM1.WACT != Zero)) { Sleep (0x05) } ADBG ("Other _PS0 got response") } ADBG (Concatenate ("TBT _PS0 End RP ", ToHexString (TUID))) } ... # HPEV is annotated in this ASL to mean "HotPlug Event" Method (HPEV, 0, Serialized) { If (((VDID != 0xFFFFFFFF) && HPSX)) { ADBG (Concatenate ("HotPlug Event Start for ITBT Port - ", ToHexString (TUID))) If (((PDCX == One) && (DLSC == One))) { PDCX = One HPSX = One Notify (^, Zero) // Bus Check } Else { HPSX = One } ADBG (Concatenate ("HotPlug Event End for ITBT Port - ", ToHexString (TUID))) } } ... Method (HPME, 0, Serialized) { ADBG ("TBT HPME") If (((VDID != 0xFFFFFFFF) && (PMSX == One))) { ADBG ("TBT RP exists") Notify (PXSX, 0x02) // Device Wake PMSX = One PSPX = One Return (One) } Return (Zero) }
Created attachment 306771 [details] Dell 7390 ICL acpidump
Linux-6.12 reduces msleep() bloat by 1 jiffy: 4381b895f544 ("timers: Remove historical extra jiffie for timeout in msleep()") This reduces Dell XPS-9300 resume from 2460 to 1943 ms (21%) After that change, replacing msleep() with usleep_range() in acpi_os_sleep() reduces resume from 1943 to 1127ms (42%).