Bug 219762 - [regression] power management broke in 6.12 for ECS BAT-I(1.2)
Status: NEEDINFO
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake
Hardware: Intel Linux
Importance: P3 normal
Assignee: acpi_power-sleep-wake
URL: https://www.ecs.com.tw/en/Product/Mot...
Reported: 2025-02-08 16:12 UTC by David Flater
Modified: 2025-02-21 15:03 UTC
Kernel Version: >= 6.12
Regression: Yes
Bisected commit-id: 948ce83fbb7df85bc930a5c0d6b133481be05c0b


Description David Flater 2025-02-08 16:12:56 UTC
Affected hardware and firmware:
ECS BAT-I(1.2)/J2900 (mini-ITX with embedded CPU)
DMI: ECS BAT-I/BAT-I, BIOS 5.6.5 09/18/2018

Misbehavior:
Since kernel 6.12, neither S3 (suspend) nor S5 (power off) works.  For S5, the final shutdown message is printed but the PC does not turn off.  For S3, the display (integrated graphics) goes black, but the case fan keeps spinning, the power LED stays solid instead of blinking, and wakeup events have no effect.  Nothing useful shows up in the log after a power cycle.  The problem persists in kernel 6.13.1.  Reverting to kernel 6.11 immediately cures it.
Comment 1 Artem S. Tashkinov 2025-02-09 21:19:57 UTC
Please bisect.

https://docs.kernel.org/admin-guide/bug-bisect.html
Comment 2 David Flater 2025-02-15 15:48:35 UTC
I bisected by testing whether each kernel could shut the machine down, and got a wrong answer:
    c26cee817f8bd9a22bfade20f739ec2fc6f20221 is the first bad commit
    commit c26cee817f8bd9a22bfade20f739ec2fc6f20221
    Date:   Sat Aug 10 20:00:05 2024 -0400
    usb: gadget: f_fs: add capability for dfu functional descriptor

The patch to revert that commit applied cleanly to the latest (a64dcfb451e254085a7daee5fe51bf22959d52d3) except for comments, but the resulting kernel still could not shut down.

Theories:

1. The failure is not 100% and some kernel with the fault accidentally succeeded during bisection.

2. The following error, which appeared after two of the failed shutdowns, indicated a second mode of failure that ruined the bisection:
    systemd-shutdown[1]: Failed to enumerate /proc/: Invalid argument.
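A rough way to put numbers on theory 1, as a sketch only: if a faulty kernel fails to shut down with some probability p per attempt (p is an assumption here, it has not been measured in this report), the chance that it passes n independent tests and gets marked "good" is (1 - p)^n. The function names below are hypothetical helpers, not part of any tool.

```python
def chance_all_pass(failure_rate: float, trials: int) -> float:
    """Probability that a faulty kernel shuts down cleanly in every trial.

    failure_rate is the assumed per-attempt probability that the fault
    shows up; trials is how many shutdowns are tested per bisection step.
    Trials are assumed to be independent.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if trials < 0:
        raise ValueError("trials must not be negative")
    return (1.0 - failure_rate) ** trials


def trials_needed(failure_rate: float, max_false_good: float) -> int:
    """Smallest number of trials per step so that a faulty kernel is
    wrongly marked good with probability at most max_false_good."""
    if not 0.0 < failure_rate <= 1.0:
        raise ValueError("failure_rate must be in (0, 1]")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be in (0, 1)")
    trials = 0
    while chance_all_pass(failure_rate, trials) > max_false_good:
        trials += 1
    return trials


if __name__ == "__main__":
    # Illustrative values only: a fault that shows up half the time.
    print(chance_all_pass(0.5, 3))
    print(trials_needed(0.5, 0.125))
```

If the fault were intermittent, repeating the shutdown test several times per bisection step before marking a kernel good would lower the odds of being sent down the wrong branch.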

I will keep poking at it.
Comment 3 David Flater 2025-02-18 22:17:27 UTC
On the second try, with different starting points, I landed on
first bad commit: [948ce83fbb7df85bc930a5c0d6b133481be05c0b] xhci: Add USB4 tunnel detection for USB3 devices on Intel hosts

It's in that big block of USB changes again.  The revert patch does not apply cleanly.
Comment 4 David Flater 2025-02-19 22:41:16 UTC
I checked out v6.12 ("bad"), copied in the config used for testing (which was based on an Arch kernel), stripped the USB device drivers config down to the minimum needed for keyboard and mouse on the PC in question, built and tested it.

It totally worked.  Shut down clean, suspend/resume clean.

So, now I'm narrowing down USB device drivers rather than kernel versions.
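For narrowing down by config rather than by commit, a small script can list which USB options differ between the working (stripped) config and the failing one. This is a sketch, not what was actually used for the testing above; the function names are hypothetical, and an option that is missing from a file is treated the same as "is not set".

```python
import re

# "CONFIG_FOO=y", "CONFIG_FOO=m", "CONFIG_FOO=123", ...
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
# "# CONFIG_FOO is not set"
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name to value for the contents of a kernel .config file."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(working, failing, prefix="CONFIG_USB"):
    """Return (name, working_value, failing_value) for each option starting
    with prefix whose value differs. Absent options count as "n"."""
    a = parse_config(working)
    b = parse_config(failing)
    names = sorted(n for n in set(a) | set(b) if n.startswith(prefix))
    return [
        (n, a.get(n, "n"), b.get(n, "n"))
        for n in names
        if a.get(n, "n") != b.get(n, "n")
    ]


if __name__ == "__main__":
    # Made-up config fragments for illustration only.
    working = "# CONFIG_USB_XHCI_HCD is not set\nCONFIG_USB_EHCI_HCD=y\n"
    failing = "CONFIG_USB_XHCI_HCD=m\nCONFIG_USB_EHCI_HCD=y\n"
    for name, good, bad in diff_configs(working, failing):
        print(f"{name}: working={good} failing={bad}")
```

Re-enabling the differing options a few at a time in the working config, and retesting shutdown after each build, would then point at the driver responsible.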
Comment 5 David Flater 2025-02-20 11:29:38 UTC
v6.12 and v6.14-rc3 are both cured by removing CONFIG_USB_XHCI_HCD.  The result of the second bisection is looking more credible.

Relevant settings in AMI setup that have remained constant through this testing:
All USB Devices = Enabled
Legacy USB Support = Enabled
XHCI Mode = Auto (choices are Enabled, Auto, and Smart Auto)
Comment 6 David Flater 2025-02-21 15:03:01 UTC
Patching xhci_port_is_tunneled() in xhci-hub.c (6.14-rc3) to always return USB_LINK_NATIVE did not fix it.

Changing XHCI Mode from Auto to Enabled in AMI setup did fix it.
