Bug 217915 - System fails to resume correctly after Thunderbolt dock disconnected while sleeping
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P3 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-15 23:11 UTC by Mark Blakeney
Modified: 2023-10-10 23:23 UTC
CC List: 3 users

See Also:
Kernel Version: 6.4, 6.5
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg output with thunderbolt.dyndbg=+p (27.49 KB, application/gzip)
2023-09-15 23:29 UTC, Mark Blakeney
dmesg output with CONFIG_PCI_DEBUG and thunderbolt.dyndbg=+p (19.70 KB, application/gzip)
2023-09-16 01:22 UTC, Mark Blakeney
Mark devices as disconnected if resume fails (807 bytes, patch)
2023-09-17 06:57 UTC, Mika Westerberg

Description Mark Blakeney 2023-09-15 23:11:53 UTC
After my comments on bug #216728, Mika Westerberg suggested I raise this new bug.

I have a Dell XPS13 laptop and a Dell WD19TB Thunderbolt dock, and I normally work on two dock-connected screens + keyboard + mouse with the laptop lid closed. If my laptop suspends in this state and I then unplug the dock (e.g. to take the laptop away in the morning), the screen stays blank when I try to resume. This started when Arch Linux updated the kernel from 6.3.9 (good) to 6.4.1 (bad). I have confirmed it is bad on every 6.4 point release up to 6.4.12, and it is also bad on the current 6.5.3. I have generally been using the LTS kernel (currently 6.1.53) to avoid this bug.

If you wait about 60 to 70 seconds, the screen does switch on. On 6.4 kernels I had to wait about 2.5 minutes, i.e. the delay is shorter on 6.5 kernels.
Comment 1 Mark Blakeney 2023-09-15 23:23:53 UTC
Mika, in bug #216728 you were concerned that I am using the "mem_sleep_default=deep" option, but please note that it was the very first thing I removed (and I have repeated that test a few times, including today) to confirm that the problem still occurs, i.e. that option is irrelevant to this issue. Note that I have used that option for the 3 years I have had the laptop because without it my laptop battery dies overnight when simply suspended. I will remove it, enable thunderbolt.dyndbg=+p, and then capture a new dmesg output (compared to what I did for bug #216728) to attach here.
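
For anyone repeating this check, the suspend mode actually in effect can be read back from /sys/power/mem_sleep, which lists the available modes with the selected one in square brackets (for example "s2idle [deep]"). A minimal sketch of parsing that file follows; the function names parse_mem_sleep and read_mem_sleep are made up for illustration and are not part of any tool mentioned in this bug.

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Return (active_mode, available_modes) from /sys/power/mem_sleep content.

    The kernel marks the selected mode with square brackets,
    e.g. "s2idle [deep]".
    """
    active = None
    available = []
    for mode in text.split():
        if mode.startswith("[") and mode.endswith("]"):
            mode = mode[1:-1]  # strip the brackets around the active mode
            active = mode
        available.append(mode)
    return active, available


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read and parse the sysfs file (hypothetical helper, Linux only)."""
    return parse_mem_sleep(Path(path).read_text())
```

Calling read_mem_sleep() before and after removing mem_sleep_default=deep from the kernel command line should show whether the default mode really changed.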

Note that I use fwupd often to ensure my BIOS, dock, etc. are always running the latest firmware. Dell updates both quite frequently and I am always running the latest. It was mentioned in the email thread at https://www.spinics.net/lists/linux-pci/msg142902.html that changing the BIOS Thunderbolt setting to "No security" helps alleviate this issue, but I don't have any security options like that in my BIOS.
Comment 2 Mark Blakeney 2023-09-15 23:29:12 UTC
Created attachment 305117
dmesg output with thunderbolt.dyndbg=+p
Comment 3 Mark Blakeney 2023-09-15 23:40:36 UTC
Note the sequence I did to capture that dmesg output was:

1. Added thunderbolt.dyndbg=+p to boot and rebooted.
2. Had the laptop lid closed and was working with 2 screens + keyboard + mouse on the dock.
3. Suspended my laptop.
4. Unplugged dock.
5. Opened lid on laptop and witnessed the 60s blank screen delay.
6. Captured the dmesg output and attached here.
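
When reading a capture like this, the delay usually shows up as one large jump between consecutive dmesg timestamps. The following is a minimal sketch for locating that jump, assuming the usual "[ seconds.microseconds]" prefix on each line; the function name largest_gap is hypothetical, and the sample lines in the usage example are placeholders, not real kernel output.

```python
import re

# dmesg lines normally start with a "[ seconds.microseconds]" timestamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the biggest time jump."""
    best = (0.0, None, None)
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # ignore lines without a timestamp prefix
        now = float(match.group(1))
        if prev_time is not None and now - prev_time > best[0]:
            best = (now - prev_time, prev_line, line)
        prev_time, prev_line = now, line
    return best


# Usage with made-up placeholder lines (not taken from the attachments):
sample = [
    "[  100.000000] placeholder message A",
    "[  100.500000] placeholder message B",
    "[  161.250000] placeholder message C",
]
gap, before, after = largest_gap(sample)
print(f"{gap:.2f}s between:\n  {before}\n  {after}")
```

The two lines around the largest gap are a reasonable place to start looking for whichever device the kernel was waiting on.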

Unlike the dmesg capture I did for bug #216728, this time I did not reconnect the dock because I wish to point out clearly that this must be a bug in Linux given that at the time the bug occurs (i.e. the screen does not recover after resume) the dock is physically disconnected.

I should also reiterate on this bug that user Matt L on the original bug has a Dell XPS laptop and Thunderbolt dock and reports this exact same issue as me, so I have CC'd him here.
Comment 4 Mark Blakeney 2023-09-16 01:22:12 UTC
Created attachment 305118
dmesg output with CONFIG_PCI_DEBUG and thunderbolt.dyndbg=+p
Comment 5 Mark Blakeney 2023-09-16 01:23:45 UTC
Recompiled Arch kernel 6.5.3.arch1-1 with CONFIG_PCI_DEBUG=y and then repeated above steps to create the attached dmesg output.
Comment 6 Mario Limonciello (AMD) 2023-09-16 17:14:42 UTC
Since you've already identified two relatively close targets, 6.3.9 and 6.4.1, would you be able to bisect between them?
Comment 7 Mark Blakeney 2023-09-16 22:07:42 UTC
Mario, we already know that commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e8b908146d44310473e43b3382eca126e12d279c caused this issue but Mika says that change (which is merely a timeout increase from 1 sec to 60 sec) is valid and has merely exposed this latent problem.

What I don't get is that this bug occurs after the Thunderbolt dock is disconnected and the system is then resumed so surely it has got to be a straight out logical error that the kernel sits around for 60 secs waiting for a device that is no longer connected?
Comment 8 Mika Westerberg 2023-09-17 06:57:25 UTC
Created attachment 305120
Mark devices as disconnected if resume fails

I can reproduce this myself now. The reason it takes so long is indeed that we try to resume the devices that are behind the port whose link did not come up after suspend, and since they are gone we end up waiting for them for ~60s each.

Can you try the attached patch? On my system this makes the issue go away.
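
The idea described above can be illustrated with a small simulation. This is only a sketch of the reasoning, not the kernel code or the attached patch: the Device class, the function names, the example device tree, and the assumption that the per-device waits happen one after another are all hypothetical.

```python
from dataclasses import dataclass, field

READY_TIMEOUT_S = 60  # approximate per-device wait mentioned in this thread


@dataclass
class Device:
    name: str
    children: list = field(default_factory=list)
    disconnected: bool = False


def mark_disconnected(dev):
    """Flag a device and everything below it as gone."""
    dev.disconnected = True
    for child in dev.children:
        mark_disconnected(child)


def resume_wait(dev, link_up):
    """Simulated seconds spent waiting while resuming dev and its subtree."""
    if dev.disconnected:
        return 0  # skipped entirely, nothing to wait for
    wait = 0 if link_up else READY_TIMEOUT_S  # a gone device never answers
    return wait + sum(resume_wait(child, link_up) for child in dev.children)


def resume_behind_port(devices, link_up, mark_on_failure):
    """Resume all devices behind one port and return the total simulated wait."""
    if not link_up and mark_on_failure:
        # The fix in spirit: the link did not come up, so stop expecting
        # anything behind this port to respond.
        for dev in devices:
            mark_disconnected(dev)
    return sum(resume_wait(dev, link_up) for dev in devices)


def make_dock():
    """Build a made-up three-device tree standing in for a dock."""
    return [Device("bridge", [Device("nic")]), Device("usb-controller")]


print(resume_behind_port(make_dock(), link_up=False, mark_on_failure=False))
print(resume_behind_port(make_dock(), link_up=False, mark_on_failure=True))
```

With three devices behind the dead link, the first call adds up three full timeouts, while the second returns immediately because every device was flagged before any wait started.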
Comment 9 Mark Blakeney 2023-09-17 11:26:47 UTC
Mika, yes I applied that patch to Arch kernel 6.5.3 and it does fix the issue.

So it seems this has nothing to do with Thunderbolt and would occur when any USB dock is disconnected while the laptop is suspended? I am surprised there were not heaps of users complaining about this issue since 6.4 and 6.5. Anyhow, thanks very much for your time here and the rapid fix.

At what next kernel version are we likely to see this fix included?
Comment 10 Matt L 2023-09-17 19:01:25 UTC
Hi there,

I also commented on bug #216728 and can confirm that I am experiencing this bug. My hardware is a Dell 5430 Rugged plugging into a Dell WD22TB4. My Dell XPS 13 Plus exhibits this issue too when plugging into the same dock.

I can also confirm that the proposed patch fixes the issue on both machines.

Thanks,
Matt
Comment 11 Mika Westerberg 2023-09-18 05:36:24 UTC
Thanks for testing!

Yes, this can happen on any dock with PCIe devices (typically that's Thunderbolt/USB4). I'm also surprised that we did not see this earlier, because this is a pretty common use case, especially with laptops. Sorry about that.

I've submitted the patch upstream now:

https://lore.kernel.org/linux-pci/20230918053041.1018876-1-mika.westerberg@linux.intel.com/

It is up to the PCI maintainer to decide when it lands in mainline and the stable trees, but in this case I would expect it to be sooner rather than later.
Comment 12 Mark Blakeney 2023-10-10 23:23:51 UTC
This bug can be closed since the patch above is now included in Linux 6.5.7, which has been released on Arch.
