Bug 218546 - xhci_hcd prevents hibernate/S4 suspend from working on several systems
Summary: xhci_hcd prevents hibernate/S4 suspend from working on several systems
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: USB
Hardware: All Linux
Importance: P3 normal
Assignee: Default virtual assignee for Drivers/USB
URL:
Keywords:
Depends on:
Blocks: 178231
Reported: 2024-03-01 22:19 UTC by Todd Brandt
Modified: 2024-03-17 00:20 UTC

See Also:
Kernel Version: 6.8.0-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lenb-Dell-XPS-13-9300_disk.html (594.79 KB, text/html)
2024-03-01 22:19 UTC, Todd Brandt
otcpl-dell-7390-cmlu_disk.html (403.85 KB, text/html)
2024-03-01 22:20 UTC, Todd Brandt
otcpl-hp-spectre-tgl_disk.html (449.85 KB, text/html)
2024-03-01 22:20 UTC, Todd Brandt
issue.def (402 bytes, text/plain)
2024-03-01 22:30 UTC, Todd Brandt
callgraph-for-otcpl-hp-spectre-disk-usb3-xhci_hcd-fail.html (509.15 KB, text/html)
2024-03-11 16:53 UTC, Todd Brandt
otcpl-hp-spectre-tgl_disk_dmesg.txt (48.21 KB, text/plain)
2024-03-11 20:13 UTC, Todd Brandt

Description Todd Brandt 2024-03-01 22:19:49 UTC
Created attachment 305940 [details]
lenb-Dell-XPS-13-9300_disk.html

When running disk suspend on certain platforms, the xhci_hcd USB 3.0 host controller device prevents hibernation by aborting the suspend. The test is initiated with this command:

$> sudo sleepgraph -m disk-platform -rtcwake 60 -dev
This essentially performs these two commands to issue the suspend:
$> echo platform > /sys/power/disk
$> echo disk > /sys/power/state
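
For anyone reproducing this without sleepgraph, the two writes above can be sketched in Python. This is only an illustration, not what sleepgraph itself runs: the function name request_hibernate and its power_dir parameter are made up here, it does not set up the RTC wake alarm that -rtcwake 60 requests, and on a real system it needs root. The demo at the bottom writes to a scratch directory, so nothing actually hibernates.

```python
import tempfile
from pathlib import Path


def request_hibernate(power_dir="/sys/power", mode="platform"):
    """Select the hibernation mode, then ask the kernel to hibernate.

    power_dir is a hypothetical parameter that exists only so the
    function can be pointed at a scratch directory instead of sysfs.
    """
    power = Path(power_dir)
    (power / "disk").write_text(mode + "\n")   # echo platform > /sys/power/disk
    (power / "state").write_text("disk\n")     # echo disk > /sys/power/state


# Dry run against a temporary directory (no real suspend is triggered).
scratch = tempfile.mkdtemp()
request_hibernate(scratch)
print(Path(scratch, "disk").read_text().strip())
print(Path(scratch, "state").read_text().strip())
```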

The dmesg log shows the error where the suspend is aborted:

xhci_hcd 0000:00:14.0: PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x20 returns -16
xhci_hcd 0000:00:14.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -16
xhci_hcd 0000:00:14.0: PM: failed to freeze async: error -16
usb usb1: root hub lost power or was reset
usb usb2: root hub lost power or was reset
PM: hibernation: Basic memory bitmaps freed
PM: hibernation: hibernation exit
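
For reference, the -16 in these lines is a negative errno value, which on Linux is -EBUSY. A small sketch for pulling the codes out of a saved dmesg log and naming them follows; the helper name decode_pm_errors and the regular expression are illustrative assumptions, not part of sleepgraph or the kernel.

```python
import errno
import re

# Matches a trailing "returns -N" or "error -N", as printed in the log above.
_PM_ERR = re.compile(r"(?:returns|error) -(\d+)\s*$")


def decode_pm_errors(log_text):
    """Return (errno name, log line) for every line ending in a negative code."""
    results = []
    for line in log_text.splitlines():
        match = _PM_ERR.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        results.append((errno.errorcode.get(code, "unknown"), line.strip()))
    return results


sample = """\
xhci_hcd 0000:00:14.0: PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x20 returns -16
xhci_hcd 0000:00:14.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -16
xhci_hcd 0000:00:14.0: PM: failed to freeze async: error -16
usb usb1: root hub lost power or was reset
"""

for name, line in decode_pm_errors(sample):
    print(f"{name}: {line}")
```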

I'm attaching three sleepgraph timelines from three different production machines where this happened, so it's not specific to one manufacturer. The dmesg logs are included in the timelines themselves; just click the dmesg button in the upper left corner. The log button, also in the upper left corner, shows further system details.
Comment 1 Todd Brandt 2024-03-01 22:20:27 UTC
Created attachment 305941 [details]
otcpl-dell-7390-cmlu_disk.html
Comment 2 Todd Brandt 2024-03-01 22:20:56 UTC
Created attachment 305942 [details]
otcpl-hp-spectre-tgl_disk.html
Comment 3 Todd Brandt 2024-03-01 22:30:13 UTC
Created attachment 305943 [details]
issue.def
Comment 4 Todd Brandt 2024-03-11 16:53:21 UTC
Created attachment 305977 [details]
callgraph-for-otcpl-hp-spectre-disk-usb3-xhci_hcd-fail.html

Includes a callgraph for the failing USB device showing which kernel source functions were called during the hibernate failure. Created with "sleepgraph -m disk-platform -rtcwake 60 -f". The dmesg error timestamp is 323.208. The ftrace time is the same, and you can see the timestamps in the usb3 suspend callgraph. The error occurs around the "hrtimer_try_to_cancel" function.
Comment 5 Todd Brandt 2024-03-11 20:13:21 UTC
Created attachment 305978 [details]
otcpl-hp-spectre-tgl_disk_dmesg.txt

dmesg log with dynamic debug enabled for xhci_hcd when the error occurred.
Comment 6 David McFarland 2024-03-15 21:02:51 UTC
I'm getting the same symbol on a Dell 7430, but I don't see the xhci_hcd errors you listed.  I used dynamic debug 'module xhci_hcd +pflm'.

It started occurring with:

0c4cae1bc00d PM: hibernate: Avoid missing wakeup events during hibernation

However that's presumably just exposing the underlying problem.
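
For readers unfamiliar with the dynamic debug query quoted above: in "+pflm", p enables the debug prints and f, l and m prefix each message with the function name, line number and module name. The query is normally written to the dynamic debug control file in debugfs. The sketch below assumes the usual debugfs mount point; enable_module_debug is a hypothetical helper, and the demo writes to a scratch file rather than the real control file (which needs root).

```python
import tempfile
from pathlib import Path

# Usual location when debugfs is mounted at /sys/kernel/debug (an assumption).
CONTROL = "/sys/kernel/debug/dynamic_debug/control"


def enable_module_debug(module, flags="+pflm", control=CONTROL):
    """Write a 'module <name> <flags>' query to the control file and return it."""
    query = f"module {module} {flags}\n"
    Path(control).write_text(query)
    return query


# Dry run: send the query to a temporary file instead of debugfs.
with tempfile.TemporaryDirectory() as scratch:
    sent = enable_module_debug("xhci_hcd", control=Path(scratch) / "control")
    print(sent, end="")
```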
Comment 7 David McFarland 2024-03-15 21:03:34 UTC
Sorry, 'symbol' -> 'symptom'.
Comment 8 David McFarland 2024-03-17 00:20:51 UTC
I think I tracked mine down to this happening during hibernation:

[   37.511141] ACPI: PM: Waking up from system sleep state S4
[   37.529656] ACPI: button: The lid device is not compliant to SW_LID.
[   37.529657] ACPI: \_SB_.LID0: ACPI LID open

So it's probably unrelated.
