After my comments on bug #216728, Mika Westerberg suggested I raise this new bug. I have a Dell XPS13 laptop and a Dell WD19TB Thunderbolt dock, and I normally work on two dock-connected screens plus keyboard and mouse with the laptop lid closed. If my laptop suspends in this state and I then unplug the dock (e.g. to take the laptop away in the morning), the screen stays blank when I try to resume. This started when Arch Linux updated the kernel from 6.3.9 (good) to 6.4.1 (bad). I have tested every point release of 6.4 since then, up to 6.4.12, and all of them are bad, as is the current 6.5.3. I have generally been using the LTS kernel (currently 6.1.53) to avoid this bug. If you wait about 60 to 70 seconds, the screen does switch on. On 6.4 kernels I had to wait about 2.5 minutes, i.e. the delay is shorter on 6.5 kernels.
Mika, in bug #216728 you were concerned that I am using the "mem_sleep_default=deep" option, but please note that it was the very first thing I removed (and I have repeated this a few times, including today) to confirm that the problem still occurs without it. I.e. that option is irrelevant to this issue. Note that I have used that option for the 3 years I have had the laptop, because without it my laptop battery dies overnight when simply suspended. I will remove it, enable thunderbolt.dyndbg=+p, and then capture a new dmesg output (compared to what I did for bug #216728) to attach here. Note that I use fwupd often to ensure my BIOS, dock etc. are always running the latest firmware. Dell updates both quite frequently and I am always running the latest. It was mentioned in the email thread at https://www.spinics.net/lists/linux-pci/msg142902.html that changing the BIOS Thunderbolt setting to "No security" helps alleviate this issue, but I don't have any security options like that in my BIOS.
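For anyone repeating the test without mem_sleep_default=deep, it may help to confirm which suspend mode is actually in effect. The kernel lists the supported modes in /sys/power/mem_sleep and puts the active one in square brackets (e.g. "s2idle [deep]"). The following is a minimal sketch that reads it; the function names are mine and purely illustrative.

```python
import re
from pathlib import Path


def active_mem_sleep(contents):
    """Return the bracketed (active) mode from mem_sleep contents, or None."""
    match = re.search(r"\[(\w+)\]", contents)
    return match.group(1) if match else None


def read_active_mem_sleep(path="/sys/power/mem_sleep"):
    """Read the sysfs file; return None if it is missing or unreadable."""
    try:
        return active_mem_sleep(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("active suspend mode:", read_active_mem_sleep())
```

If this reports "deep" after the option has been removed from the boot parameters, something else (such as firmware defaults) is selecting it.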
Created attachment 305117: dmesg output with thunderbolt.dyndbg=+p
Note the sequence I followed to capture that dmesg output:
1. Added thunderbolt.dyndbg=+p to the kernel boot options and rebooted.
2. Closed the laptop lid and worked with 2 screens + keyboard + mouse on the dock.
3. Suspended the laptop.
4. Unplugged the dock.
5. Opened the laptop lid and witnessed the 60s blank screen delay.
6. Captured the dmesg output and attached it here.
Unlike the dmesg capture I did for bug #216728, this time I did not reconnect the dock, because I want to make clear that this must be a bug in Linux: at the time the bug occurs (i.e. the screen does not recover after resume) the dock is physically disconnected. I should also reiterate here that user Matt L on the original bug has a Dell XPS laptop and Thunderbolt dock and reports exactly the same issue as me, so I have CC'd him here.
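When reading a capture like this, one quick way to locate the stall is to look for the largest jump between consecutive dmesg timestamps. Below is a rough sketch, assuming the default "[  seconds.micros] message" line format; the file name dmesg.txt is just a placeholder. Whether the timestamps advance while the machine is suspended can vary, so treat the result as a pointer to where to look rather than a measurement.

```python
import re

# Matches the default dmesg prefix, e.g. "[  123.456789] some message"
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the biggest
    jump between consecutive timestamped lines, or None if fewer than two."""
    previous = None
    best = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        stamp, message = float(match.group(1)), match.group(2)
        if previous is not None:
            gap = stamp - previous[0]
            if best is None or gap > best[0]:
                best = (gap, previous[1], message)
        previous = (stamp, message)
    return best


if __name__ == "__main__":
    try:
        with open("dmesg.txt") as handle:  # placeholder file name
            print(largest_gap(handle))
    except FileNotFoundError:
        print("save the dmesg output to dmesg.txt first")
```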
Created attachment 305118: dmesg output with CONFIG_PCI_DEBUG and thunderbolt.dyndbg=+p
I recompiled Arch kernel 6.5.3.arch1-1 with CONFIG_PCI_DEBUG=y and then repeated the steps above to create the attached dmesg output.
Since you've already identified two relatively close versions, 6.3.9 (good) and 6.4.1 (bad), would you be able to bisect between them?
Mario, we already know that commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e8b908146d44310473e43b3382eca126e12d279c caused this issue, but Mika says that change (which is merely a timeout increase from 1 sec to 60 sec) is valid and has merely exposed a latent problem. What I don't get is that this bug occurs after the Thunderbolt dock is disconnected and the system is then resumed, so surely it has to be a straight-out logic error that the kernel sits around for 60 secs waiting for a device that is no longer connected?
Created attachment 305120: Mark devices as disconnected if resume fails
I can reproduce this myself now. The reason it takes so long is indeed that we try to resume the devices behind the port whose link did not come up after suspend, and since they are gone we end up waiting ~60s for each of them. Can you try the attached patch? On my system this makes the issue go away.
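To illustrate the explanation above, here is a toy model of the resume path. It is not the kernel code or the attached patch, only a sketch of the idea as described in this comment. The 60 second per-device timeout comes from this thread; the 1 second link check cost and all names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

DEVICE_TIMEOUT_S = 60  # per-device wait, taken from the ~60s figure in this thread
LINK_CHECK_S = 1       # assumed cost of noticing the link is down (illustrative)


@dataclass
class Device:
    name: str
    disconnected: bool = False  # True once we know the device is gone


@dataclass
class Port:
    link_up: bool
    devices: list = field(default_factory=list)


def resume_port(port, mark_on_link_failure):
    """Return the simulated seconds spent waiting while resuming a port."""
    waited = LINK_CHECK_S
    if not port.link_up and mark_on_link_failure:
        # The idea of the fix: the link did not come back, so flag
        # everything behind the port as gone before resuming it.
        for device in port.devices:
            device.disconnected = True
    for device in port.devices:
        if device.disconnected:
            continue  # known to be gone, nothing to wait for
        if not port.link_up:
            waited += DEVICE_TIMEOUT_S  # device never answers, full timeout
    return waited


def make_port(device_count, link_up):
    """Build a port with the given number of hypothetical devices behind it."""
    devices = [Device(f"dev{i}") for i in range(device_count)]
    return Port(link_up=link_up, devices=devices)


if __name__ == "__main__":
    print("without marking:", resume_port(make_port(2, False), False), "s")
    print("with marking:   ", resume_port(make_port(2, False), True), "s")
```

In this model the wait grows with the number of devices behind the dead link when nothing is marked, and stays at the cost of the link check when the devices are marked as disconnected first.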
Mika, yes, I applied that patch to Arch kernel 6.5.3 and it does fix the issue. So it seems this has nothing to do with Thunderbolt and would occur when any USB dock is disconnected while the laptop is suspended? I am surprised there have not been heaps of users complaining about this issue since 6.4 and 6.5. Anyhow, thanks very much for your time here and for the rapid fix. In which kernel version are we likely to see this fix included?
Hi there, I also commented on bug #216728 and can confirm that I am experiencing this bug. My hardware is a Dell 5430 Rugged plugging into a Dell WD22TB4. My Dell XPS 13 Plus exhibits this issue too when plugging into the same dock. I can also confirm that the proposed patch fixes the issue on both machines. Thanks, Matt
Thanks for testing! Yes, this can happen on any dock with PCIe devices (typically that's Thunderbolt/USB4). I'm also surprised that we did not see this earlier, because this is a pretty common use case, especially with laptops. Sorry about that. I've submitted the patch upstream now: https://lore.kernel.org/linux-pci/20230918053041.1018876-1-mika.westerberg@linux.intel.com/ It is up to the PCI maintainer to decide when it lands in mainline and the stable trees, but in this case I would expect it to be sooner rather than later.
This bug can be closed since the patch above is now included in linux 6.5.7 which has been released on Arch.