Bug 217720

Summary: System powers down when running particular programs
Product: ACPI
Reporter: Schrodinger Yifan ZHU (yifanzhu)
Component: Power-Processor
Assignee: acpi_power-processor
Status: RESOLVED INVALID    
Severity: high
CC: yifanzhu
Priority: P3    
Hardware: AMD   
OS: Linux   
URL: https://github.com/SchrodingerZhu/paguroidea/tree/acpi-cpupower-bug
Kernel Version: 6.4.3-1-cachyos
Subsystem:
Regression: No
Bisected commit-id:

Description Schrodinger Yifan ZHU 2023-07-27 17:24:06 UTC
I have a server with an AMD EPYC 7773X 64-Core Processor (ucode: 20230625.ee91452d-5) on a Supermicro H12SSL-i motherboard.

I mistakenly enabled the ACPI-based driver instead of amd-pstate. When I run a particular program (the source code is at https://github.com/SchrodingerZhu/paguroidea/tree/acpi-cpupower-bug, invoked with `cargo bench`), the whole system powers down immediately: the video output is cut and the power LED goes off. The BMC seems to stay alive, since the fan keeps running, but I cannot get any log from the SEL. I reproduced this under both the schedutil and performance governors (others were not tested).

I have now switched to amd_pstate=guided, and the problem is gone, so I think this is a problem with the ACPI driver.
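
To confirm which frequency-scaling driver and governor are actually in effect before and after such a switch, the cpufreq sysfs files can be read directly. The following is a minimal sketch, assuming the standard Linux layout under /sys/devices/system/cpu; the helper name `read_cpufreq_policy` and its `sysfs_root` parameter are hypothetical and exist only for illustration.

```python
from pathlib import Path


def read_cpufreq_policy(cpu: int = 0,
                        sysfs_root: str = "/sys/devices/system/cpu") -> dict:
    """Return the cpufreq driver and governor reported for one CPU.

    Hypothetical helper: it only reads two sysfs files and returns None
    for any file that is missing (for example, when cpufreq is disabled).
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    result = {}
    for name in ("scaling_driver", "scaling_governor"):
        path = base / name
        # Each file holds a single line such as "acpi-cpufreq" or "schedutil".
        result[name] = path.read_text().strip() if path.exists() else None
    return result


if __name__ == "__main__":
    print(read_cpufreq_policy())
```

On a machine like the one described here, `scaling_driver` would be expected to name the ACPI-based driver before the switch and an amd-pstate variant afterwards, though the exact strings depend on the kernel version and configuration.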
Comment 1 Artem S. Tashkinov 2023-07-29 11:45:53 UTC
The ACPI driver is extremely unlikely to cause this.

This looks more like a HW error, either in your CPU or motherboard.