Bug 218546

Summary: xhci_hcd prevents hibernate/S4 suspend from working on several systems
Product: Drivers Reporter: Todd Brandt (todd.e.brandt)
Component: USB Assignee: Default virtual assignee for Drivers/USB (drivers_usb)
Status: NEW ---    
Severity: normal CC: corngood
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: 6.8.0-rc5 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 178231    
Attachments: lenb-Dell-XPS-13-9300_disk.html
otcpl-dell-7390-cmlu_disk.html
otcpl-hp-spectre-tgl_disk.html
issue.def
callgraph-for-otcpl-hp-spectre-disk-usb3-xhci_hcd-fail.html
otcpl-hp-spectre-tgl_disk_dmesg.txt

Description Todd Brandt 2024-03-01 22:19:49 UTC
Created attachment 305940 [details]
lenb-Dell-XPS-13-9300_disk.html

When running disk suspend on certain platforms, the xhci_hcd USB 3.0 host controller device prevents hibernation by aborting the suspend. The test is initiated with this command:

$> sudo sleepgraph -m disk-platform -rtcwake 60 -dev
This essentially performs these two commands to issue the suspend:
$> echo platform > /sys/power/disk
$> echo disk > /sys/power/state
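
For anyone scripting the reproduction without sleepgraph, the two writes above can be sketched in Python as follows. This is only an illustration: the function name and the sysfs_power parameter are hypothetical (the parameter exists so the logic can be tried against a scratch directory), and it does not arm an RTC wake alarm the way -rtcwake 60 does. On a real system it must run as root, and a failed freeze would be expected to surface as an OSError from the second write.

```python
from pathlib import Path


def request_hibernate(sysfs_power="/sys/power", mode="platform"):
    """Select the hibernation mode, then request suspend-to-disk.

    sysfs_power is a hypothetical parameter: pointing it at a scratch
    directory lets the sequence be exercised without hibernating.
    """
    power = Path(sysfs_power)
    # Equivalent of: echo platform > /sys/power/disk
    (power / "disk").write_text(mode + "\n")
    # Equivalent of: echo disk > /sys/power/state
    # On real sysfs this write starts hibernation and returns after resume
    # or after the attempt has been aborted.
    (power / "state").write_text("disk\n")
```

Calling request_hibernate() with no arguments targets the real /sys/power files.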

The dmesg log shows the error where the suspend is aborted:

xhci_hcd 0000:00:14.0: PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x20 returns -16
xhci_hcd 0000:00:14.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -16
xhci_hcd 0000:00:14.0: PM: failed to freeze async: error -16
usb usb1: root hub lost power or was reset
usb usb2: root hub lost power or was reset
PM: hibernation: Basic memory bitmaps freed
PM: hibernation: hibernation exit
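
The -16 in the lines above is a negative errno value, and errno 16 is EBUSY on Linux. A minimal sketch for pulling the failing callbacks out of a dmesg dump is shown below; the function name and the regular expression are illustrative assumptions based only on the line format quoted here, not part of sleepgraph.

```python
import errno
import re

# Matches lines shaped like:
#   xhci_hcd 0000:00:14.0: PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x20 returns -16
PM_RETURN = re.compile(
    r"(?P<driver>\S+) (?P<device>\S+): PM: (?P<caller>\w+)\(\): "
    r"(?P<callback>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+ returns (?P<code>-\d+)"
)


def find_pm_failures(lines):
    """Return (device, caller, callback, errno_name) for each matching line."""
    failures = []
    for line in lines:
        match = PM_RETURN.search(line)
        if match is None:
            continue
        code = -int(match.group("code"))  # "-16" becomes 16
        # Fall back to the raw number if the platform has no name for it.
        name = errno.errorcode.get(code, str(code))
        failures.append(
            (match.group("device"), match.group("caller"),
             match.group("callback"), name)
        )
    return failures
```

Run against the log excerpt above, this reports two callbacks on device 0000:00:14.0, both returning EBUSY; the "failed to freeze async" line has a different shape and is not matched.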

I'm attaching three sleepgraph timelines from three different production machines where this happened, so it's not specific to just one manufacturer. The dmesg logs are in the timelines themselves; just click the dmesg button in the upper left corner. The log button in the upper left corner shows all kinds of system details.
Comment 1 Todd Brandt 2024-03-01 22:20:27 UTC
Created attachment 305941 [details]
otcpl-dell-7390-cmlu_disk.html
Comment 2 Todd Brandt 2024-03-01 22:20:56 UTC
Created attachment 305942 [details]
otcpl-hp-spectre-tgl_disk.html
Comment 3 Todd Brandt 2024-03-01 22:30:13 UTC
Created attachment 305943 [details]
issue.def
Comment 4 Todd Brandt 2024-03-11 16:53:21 UTC
Created attachment 305977 [details]
callgraph-for-otcpl-hp-spectre-disk-usb3-xhci_hcd-fail.html

Includes a callgraph over the failing USB device showing what kernel source functions were called during the hibernate failure. Created with "sleepgraph -m disk-platform -rtcwake 60 -f". The dmesg error timestamp is 323.208. The ftrace time is the same and you can see the timestamps in the usb3 suspend callgraph. The error occurs around the "hrtimer_try_to_cancel" function.
Comment 5 Todd Brandt 2024-03-11 20:13:21 UTC
Created attachment 305978 [details]
otcpl-hp-spectre-tgl_disk_dmesg.txt

dmesg log with dynamic debug enabled for xhci_hcd when the error occurred.
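
For reference, dynamic debug for a module is normally switched on by writing a query to the dynamic_debug control file. The sketch below assumes debugfs is mounted at /sys/kernel/debug and uses the plain "+p" flag; the exact flags used for this attachment are not stated, and the function name and its control parameter are hypothetical.

```python
from pathlib import Path


def enable_dynamic_debug(module, flags="+p",
                         control="/sys/kernel/debug/dynamic_debug/control"):
    """Write a 'module <name> <flags>' query to the dynamic debug control file.

    The control path is an assumption about where debugfs is mounted;
    writing to the real file requires root.
    """
    query = f"module {module} {flags}\n"
    Path(control).write_text(query)
    return query
```

For example, enable_dynamic_debug("xhci_hcd") turns on the driver's pr_debug/dev_dbg output, and passing flags="-p" turns it back off.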
Comment 6 David McFarland 2024-03-15 21:02:51 UTC
I'm getting the same symbol on a Dell 7430, but I don't see the xhci_hcd errors you listed.  I used dynamic debug 'module xhci_hcd +pflm'.

It started occurring with:

0c4cae1bc00d PM: hibernate: Avoid missing wakeup events during hibernation

However, that's presumably just exposing the underlying problem.
Comment 7 David McFarland 2024-03-15 21:03:34 UTC
Sorry, 'symbol' -> 'symptom'.
Comment 8 David McFarland 2024-03-17 00:20:51 UTC
I think I tracked mine down to this happening during hibernation:

[   37.511141] ACPI: PM: Waking up from system sleep state S4
[   37.529656] ACPI: button: The lid device is not compliant to SW_LID.
[   37.529657] ACPI: \_SB_.LID0: ACPI LID open

So it's probably unrelated.