Affected hardware and firmware:
ECS BAT-I(1.2)/J2900 (mini-ITX with embedded CPU)
DMI: ECS BAT-I/BAT-I, BIOS 5.6.5 09/18/2018
Misbehavior:
Since kernel 6.12, neither S3 nor S5 works. For S5, the final shutdown message is printed but the PC does not turn off. For S3, the display (integrated graphics) goes black, but the case fan keeps spinning, the power LED stays solid instead of blinking, and wakeup events have no effect. Nothing useful appears in the log after a power cycle. The problem persists in kernel 6.13.1. Reverting to kernel 6.11 immediately cures the problem.
Please bisect. https://docs.kernel.org/admin-guide/bug-bisect.html
I bisected by testing whether the machine would actually power off at shutdown, and got a wrong answer:
c26cee817f8bd9a22bfade20f739ec2fc6f20221 is the first bad commit
commit c26cee817f8bd9a22bfade20f739ec2fc6f20221
Date: Sat Aug 10 20:00:05 2024 -0400
    usb: gadget: f_fs: add capability for dfu functional descriptor
The patch to revert that commit applied cleanly to the latest (a64dcfb451e254085a7daee5fe51bf22959d52d3) except for comments, but the resulting kernel still could not shut down.
Theories:
1. The failure is not 100% reproducible, and some kernel with the fault accidentally succeeded during bisection.
2. The following error, which appeared after two of the failed shutdowns, indicates a second mode of failure that ruined the bisection:
   systemd-shutdown[1]: Failed to enumerate /proc/: Invalid argument.
I will keep poking at it.
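If theory 1 holds, each bisection step needs more than one clean shutdown before a kernel is marked "good". The sketch below is a rough way to estimate how many. It assumes every shutdown attempt on a faulty kernel fails independently with the same probability, which this thread has not established. The function name and the example rates are hypothetical, not measurements from this machine.

```python
import math


def runs_needed(failure_rate: float, max_false_good: float) -> int:
    """Smallest n such that a faulty kernel passes n trials in a row with
    probability at most max_false_good, i.e. (1 - failure_rate)**n <= max_false_good.

    Assumes independent trials with a constant failure rate (an assumption,
    not something measured in this report).
    """
    if not 0.0 < failure_rate <= 1.0:
        raise ValueError("failure_rate must be in (0, 1]")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be in (0, 1)")
    if failure_rate == 1.0:
        # A fault that always shows up is caught by a single trial.
        return 1
    return math.ceil(math.log(max_false_good) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical failure rates, for illustration only.
    for rate in (0.9, 0.5, 0.2):
        print(rate, runs_needed(rate, 0.05))
```

With a fault that only shows up half the time, a single successful shutdown says very little, so marking a commit good after one test can easily send the bisection into an unrelated range.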
Second try with different starting points. I landed on this first bad commit:
[948ce83fbb7df85bc930a5c0d6b133481be05c0b] xhci: Add USB4 tunnel detection for USB3 devices on Intel hosts
It's in that big block of USB changes again. The revert patch does not apply cleanly.
I checked out v6.12 ("bad"), copied in the config used for testing (which was based on an Arch kernel), stripped the USB device drivers config down to the minimum needed for keyboard and mouse on the PC in question, built and tested it. It totally worked. Shut down clean, suspend/resume clean. So, now I'm narrowing down USB device drivers rather than kernel versions.
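When narrowing by config rather than by commit, it helps to list exactly which options differ between the failing config and the working stripped-down one. The following is a minimal sketch that assumes the usual .config syntax ("CONFIG_FOO=y" and "# CONFIG_FOO is not set") and treats a missing option as "n". The function names and the sample option values are hypothetical. The kernel tree also ships scripts/diffconfig for the same purpose.

```python
import re

# Patterns for the two line forms found in a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return a dict mapping option name to value; unset options map to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(bad_text, good_text, prefix="CONFIG_USB"):
    """List (option, bad_value, good_value) for options under prefix that differ."""
    bad = parse_config(bad_text)
    good = parse_config(good_text)
    result = []
    for name in sorted(set(bad) | set(good)):
        if not name.startswith(prefix):
            continue
        b, g = bad.get(name, "n"), good.get(name, "n")
        if b != g:
            result.append((name, b, g))
    return result


if __name__ == "__main__":
    # Illustrative fragments only, not the configs used in this report.
    bad = "CONFIG_USB_XHCI_HCD=m\nCONFIG_USB_STORAGE=m\nCONFIG_USB_HID=y\n"
    good = "# CONFIG_USB_XHCI_HCD is not set\nCONFIG_USB_HID=y\n"
    for name, b, g in diff_configs(bad, good):
        print(f"{name}: {b} -> {g}")
```

The resulting list is the set of candidates to re-enable one at a time, or half at a time, until the shutdown failure returns.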
v6.12 and v6.14-rc3 are both cured by removing CONFIG_USB_XHCI_HCD. The result of the second bisection is looking more credible.
Relevant settings in AMI setup that have remained constant through this testing:
All USB Devices = Enabled
Legacy USB Support = Enabled
XHCI Mode = Auto (choices are Enabled, Auto, and Smart Auto)
Patching xhci_port_is_tunneled() in xhci-hub.c in 6.14-rc3 to always return USB_LINK_NATIVE did not fix it. Changing XHCI Mode from Auto to Enabled in AMI setup did fix it.