Bug 201965

Summary: Power off / Shutdown hangs after echoing "Reboot: Power down" on DELL / WYSE Thin Client - AMD G-T56N
Product: ACPI
Reporter: Sebastian (sebastian486)
Component: Power-Off
Assignee: acpi_power-off
Status: NEW
Severity: normal
CC: adamoa, biergaizi2009, lenb, marcos, sawbona
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64
Subsystem:
Regression: No
Bisected commit-id:

Description Sebastian 2018-12-12 03:09:34 UTC
Overview:
I run an up-to-date Debian 9 / Stretch (amd64) on a DELL / WYSE Thin Client Z90D7 with AMD G-T56N CPU (1.65GHz), 4GB RAM and 16GB SSD. Except for keyboard and mouse (both USB) no custom / additional hardware is used. 

The system runs fine except for a power-off issue:

The kernel is apparently unable to power off the machine. The OS shuts down and everything looks fine. Even USB seems to be shut down, as the keyboard and optical mouse turn off and no longer react.

The last line printed is:
"Reboot: Power down"

Then the screen stays that way indefinitely. The device stays powered on and I have to "cold-power-off" by pressing the power button for a few seconds.

Steps to reproduce:
/sbin/poweroff -p

Actual Results:
System hangs after printing "Reboot: Power down" (while still showing that console output). Then it must be powered off manually by pressing the power button of the device for a few seconds.

Expected Results:
The system is powered off automatically without having to force a device power-off using the power button.

Unsuccessful solution attempts:

I have tried different solution approaches but nothing worked: 

A) Kernel boot parameters:
- acpi=force
- acpi=noirq
- noapic irqpoll
- reboot=pci
- reboot=acpi
- reboot=bios
- reboot=efi
- reboot=hard,acpi,force

B) Switching to APM shutdown (kernel parameter + /etc/modules setting "apm power_off=1")

C) Adding a custom init script, run at shutdown, to rmmod the snd_hda_intel module before power-off
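
When trying boot parameters like the ones in A) and B), it can help to confirm that each one actually reached the kernel before judging its effect. The following is a minimal sketch, not part of the original report: it splits a kernel command line into parameters and keeps the shutdown-related ones. The function names and the list of "interesting" parameters are assumptions chosen for illustration, and quoted values containing spaces are not handled.

```python
from pathlib import Path
from typing import Dict, List

# Parameter names taken from the attempts listed above (an illustrative selection).
SHUTDOWN_PARAMS = ("acpi", "reboot", "noapic", "irqpoll", "apm")


def parse_cmdline(cmdline: str) -> Dict[str, List[str]]:
    """Map each parameter name to its comma-separated values.

    Flags without '=' (for example 'noapic') map to an empty list.
    """
    params: Dict[str, List[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        values = params.setdefault(name, [])
        if sep:
            values.extend(value.split(","))
    return params


def shutdown_related(params: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the parameters relevant to the power-off experiments."""
    return {k: v for k, v in params.items() if k in SHUTDOWN_PARAMS}


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line the running kernel was booted with."""
    return Path(path).read_text()
```

On a running system, `shutdown_related(parse_cmdline(read_cmdline()))` would show, for example, whether `reboot=hard,acpi,force` was really passed to the kernel.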

Additional information:

/proc/cpuinfo
vendor_id       : AuthenticAMD
cpu family      : 20
model           : 2
model name      : AMD G-T56N Processor
stepping        : 0
microcode       : 0x5000101
cpu MHz         : 1646.542
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt hw_pstate vmmcall arat npt lbrv svm_lock nrip_save pausefilter
bugs            : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips        : 3293.08
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

ACPI related dmesg entries:
dmesg | grep acpi
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
[    0.263854] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.307440] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.308317] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME AER PCIeCapability]
[    0.308333] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
[    0.309444] acpi PNP0A08:00: ignoring host bridge window [mem 0x000ce000-0x000cffff window] (conflicts with Video ROM [mem 0x000c0000-0x000ce1ff])
[    0.412327] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    4.341600] acpi device:2d: registered as cooling_device6
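
Note that `grep acpi` is case-sensitive, so a kernel message that contains only the uppercase form "ACPI" is not matched (the LAPIC_NMI lines above appear only because they also contain "acpi_id"). A minimal sketch of a case-insensitive filter follows; the function name and the default pattern list are assumptions for illustration, with the extra patterns taken from the shutdown messages quoted in this report.

```python
import re
from typing import Iterable, List, Sequence

# Assumed default patterns: ACPI lines plus the shutdown messages seen in this report.
DEFAULT_PATTERNS = ("acpi", "sleep state", "power down")


def shutdown_relevant(lines: Iterable[str],
                      patterns: Sequence[str] = DEFAULT_PATTERNS) -> List[str]:
    """Return the lines that match any pattern, ignoring case."""
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]
```

The input could be the saved output of `dmesg`, read line by line. For the ACPI pattern alone, `dmesg | grep -i acpi` has the same effect.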

Kernel version:
4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64
Comment 1 Sebastian 2018-12-12 03:20:32 UTC
Output of acpitool -e:

  Kernel version : 4.9.0-8-amd64   -    ACPI version : 20160831
  -----------------------------------------------------------
  Battery status : <not available>

  AC adapter     : online
  Fan            : <not available>

  CPU type               : AMD G-T56N Processor
  CPU speed              : 0x5000101 MHz
  Cache size             : 1646.542 KB
  Bogomips               : 3293.08
  Bogomips               : 3293.08
  Function Show_CPU_Info : could not read directory /proc/acpi/processor/
  Make sure your kernel has ACPI processor support enabled.

  Thermal info   : <not available>

   Device       S-state   Status   Sysfs node
  ---------------------------------------
  1. PB4          S4    *disabled
  2. PB5          S4    *disabled
  3. PB6          S4    *disabled
  4. PB7          S4    *disabled
  5. OHC1         S4    *enabled   pci:0000:00:12.0
  6. EHC1         S4    *enabled   pci:0000:00:12.2
  7. OHC2         S4    *enabled   pci:0000:00:13.0
  8. EHC2         S4    *enabled   pci:0000:00:13.2
  9. OHC3         S4    *enabled   pci:0000:00:16.0
  10. EHC3        S4    *enabled   pci:0000:00:16.2
  11. OHC4        S4    *enabled   pci:0000:00:14.5
  12. SBAZ        S4    *disabled  pci:0000:00:14.2
  13. KBC0        S3    *enabled   pnp:00:02
  14. MSE0        S3    *disabled  pnp:00:03
  15. P2P         S5    *disabled  pci:0000:00:14.4
  16. SPB0        S4    *disabled  pci:0000:00:15.0
  17. SPB1        S0    *disabled  pci:0000:00:15.1
  18. SPB2        S4    *disabled  pci:0000:00:15.2
  19. SPB3        S4    *disabled
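
The wakeup table above shows which devices are armed as wakeup sources; here all USB controllers (OHC*/EHC*) and the keyboard controller are enabled. Whether this relates to the hang is not established in this report. The following minimal sketch only parses the table as printed by acpitool so that the enabled devices can be listed; the class and function names are assumptions for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class WakeDevice(NamedTuple):
    name: str
    s_state: str
    enabled: bool
    sysfs_node: Optional[str]


# Matches rows such as "  5. OHC1         S4    *enabled   pci:0000:00:12.0"
ROW_RE = re.compile(
    r"^\s*\d+\.\s+(\S+)\s+(S\d)\s+\*(enabled|disabled)(?:\s+(\S+))?\s*$"
)


def parse_wakeup_table(text: str) -> List[WakeDevice]:
    """Parse the numbered device rows; header and separator lines are skipped."""
    devices = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            name, s_state, status, node = match.groups()
            devices.append(WakeDevice(name, s_state, status == "enabled", node))
    return devices
```

For example, `[d.name for d in parse_wakeup_table(output) if d.enabled]` gives the names of the devices currently allowed to wake the system.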
Comment 2 Tom Li 2019-02-05 05:58:21 UTC
This could possibly be a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=199349. See the latest comment there for a solution.
Comment 3 Julius Henry Marx 2019-03-14 14:45:06 UTC
I have what seems to be a similar issue to the one posted here by the OP.

This is on a Sun Microsystems Ultra 24 Workstation (BIOS v1.56) running under Linux devuan 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux and an Intel(R) Core(TM)2 Quad CPU Q9550.

Basically, on shutdown the machine will do one of two things:

1. shut down properly
2. freeze during the shutdown at this point ...

[code]
e1000e: EEE Tx LPI Timer
Preparing to enter sleep state S5
Reboot: Power Down
[/code]

... with the fans blowing at full speed.

Whenever it occurs, the last line before the freeze is the same:

[code]
Reboot: Power Down
[/code]

Unfortunately, I have not been able to reproduce it or link it to anything in particular. It just happens every so many shutdowns, and the number can go from 3 to 20 with no discernible (at least to me) pattern.

None of the /var/log files (syslog, kern.log, faillog, messages) show anything relevant, which is not a surprise as in that state (ie: Reboot: Power Down) filesystems are in a RO state.     

I have tried the same approaches as the OP ... 

1) various kernel boot parameters
2) a custom init script run at shutdown to rmmod the e1000e driver module

...but the issue eventually occurs again.

I have also tried a shutdown script using /proc/sysrq-trigger to no avail: eventually the issue occurs again, and in the same manner.

The problem would seem to be distribution agnostic, as it also happens in an emergency skeleton TCLinux installation (with an older kernel) that I boot from a USB memory stick, and in a Mint installation I was using until a couple of years ago, before I moved to Devuan.

Please advise or ask for additional information if required.

Thanks in advance.

JHM
Comment 4 Adam Alves 2024-02-26 18:16:10 UTC
Hi, I had a similar problem for several months until I decided to focus on analyzing what it could be. I switched ON ACPI_DEBUG and saw that the system hung after the ACPI registers were set to sleep (after the "Reboot: Power down" message).
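
For readers unfamiliar with it: kernels built with CONFIG_ACPI_DEBUG expose the `acpi.debug_layer` and `acpi.debug_level` parameters, which are also visible under /sys/module/acpi/parameters/. The comment above does not say which settings were used, so the following is only a minimal sketch that checks whether these parameters are available and reports their current contents; the function name is an assumption for illustration.

```python
from pathlib import Path
from typing import Dict, Optional

ACPI_PARAMS_DIR = Path("/sys/module/acpi/parameters")


def read_acpi_debug(params_dir: Path = ACPI_PARAMS_DIR) -> Dict[str, Optional[str]]:
    """Return the contents of debug_layer and debug_level.

    A value of None means the file could not be read, which typically
    indicates a kernel built without CONFIG_ACPI_DEBUG (or a permission problem).
    """
    result: Dict[str, Optional[str]] = {}
    for name in ("debug_layer", "debug_level"):
        try:
            result[name] = (params_dir / name).read_text().strip()
        except OSError:
            result[name] = None
    return result
```

If both values come back as None, the running kernel most likely lacks ACPI debug support and a differently configured kernel would be needed to repeat this analysis.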

My motherboard has a TPM configured, but its ownership belongs to my dual-boot Windows installation. When I disabled the TPM, the problem vanished. It seems that my mainboard expects the TPM to be enabled in order to power down successfully; please check if this is the case for you.

Best,
Adam
Comment 5 Julius Henry Marx 2024-02-26 20:19:16 UTC
Hello:

Thank you very much for taking the time to write here about this.
Much obliged.

> ... similar problem ...
> ... focus on analyzing what it could be.
A thankless endeavour, no doubt.

> ... switched ON ACPI_DEBUG ...
Never thought of that, basically because it was unknown to me.

> ... system hung after the ACPI registers were set to sleep ...
The problem with my box (Sun Ultra 24) is that I have never been able to reliably reproduce the fault.

It started when I formatted the original HDD and installed Linux on it.

Save for the ones that would make the box go absolutely berserk (needing to remove the battery and reset everything back) no change in BIOS settings made any difference whatsoever.

Eventually (from one to sixty days) it would rear its ugly head again.

In the end I have learned to live with this absolute POS BIOS that Sun Microsystems wrote into an otherwise very well built box, ahead of its time in many respects.

But I digress ...

The TPM settings in the BIOS have always been set to [Disabled] so I have taken up on your suggestion and changed the TPM settings to [Enabled] according to the instructions here:

https://docs.oracle.com/cd/E19591-01/html/E23171/z40013591297501.html
  
What I have not done is reset the TPM as it is [Unowned] as I fear some sort of 'retribution' if I fiddle with that.

The only difference is that I now have a TPM related entry in /var/log/dmesg:

[code]
 tpm tpm0: [Hardware Error]: Adjusting reported timeouts: A 750->750000us B 2000->2000000us C 750->750000us D 750->750000us
[/code]
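
Besides the dmesg entry, the TPM devices registered by the kernel appear as entries under /sys/class/tpm/ (for example tpm0). The following minimal sketch, with a function name chosen for illustration, lists them so that the effect of toggling the BIOS setting can be confirmed from the running system.

```python
from pathlib import Path
from typing import List

TPM_CLASS_DIR = Path("/sys/class/tpm")


def list_tpm_devices(class_dir: Path = TPM_CLASS_DIR) -> List[str]:
    """Return the names of TPM devices known to the kernel, sorted.

    An empty list means no TPM device is registered, for instance because
    the TPM is disabled in the BIOS or no driver is loaded.
    """
    if not class_dir.is_dir():
        return []
    return sorted(entry.name for entry in class_dir.iterdir())
```

With the TPM set to [Enabled], the result would be expected to include `tpm0`, matching the device name in the log line above.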
  
I now have to wait and see if the fault occurs again.
If and when it does, I will post here again.

Once again, thank you for taking the time to write here about this.

Best,

JHM
Comment 6 Julius Henry Marx 2024-02-28 12:19:10 UTC
Hello:

An update of sorts, but maybe relevant.

> ... wait and see if the fault occurs again.
Setting TPM to enabled has yet to produce the fault.
But this is with my box running Devuan Linux Beowulf with a backported kernel:

[code]
~$ uname -a
Linux devuan 5.10.0-0.deb10.16-amd64 #1 SMP Debian 5.10.127-2~bpo10+1 (2022-07-28) x86_64 GNU/Linux
~$
[/code]
ie: Debian 5.10.127 kernel 

As I mentioned earlier, I have never found a way to reproduce it, so it is a matter of waiting to see 'if and when' it crops up again. It has been that way since I installed Linux on this box (2015).

But then, while testing a Linux Devuan 5.0.0 (Daedalus) Live from a USB stick the fault happened at shutdown, for the first time after enabling TPM in the BIOS.

[code]
~$ uname -a
Linux devuan 6.1.0-10-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.38-1 (2023-07-14) x86_64 GNU/Linux
~$
[/code]

Rather disappointed at this being so, I went back to working on other things with my usual Beowulf (Debian 5.10 kernel) installation, booting and shutting down (as usual) a few times during a day's work without incident.

When finished, I went back to booting the USB stick to continue testing the latest Devuan Daedalus release and the fault occurred yet again.

TL;DR:
With the Debian 5.10.xxx (and all previous) kernels, the fault has always occurred albeit in the usual unpredictable/non-reproducible manner.

But with the Debian 6.1.38 kernel in Devuan Daedalus, if the TPM feature is enabled in the BIOS it happens *every* time.
 
The Sun Ultra 24 is ca.2007 hardware and the last available BIOS (1.56) is from late 2011 so I expect that the on-board 1.1/1.2 TPM [i]thing[/i] it carries is probably as much a POS as the board's BIOS itself.

Given my limited experience with all this, I can only speculate that whatever bugs exist within my mainboard's BIOS *may* have been partly worked around in Linux kernel v. 6.1.38.

ie: the shutdown problem solved, albeit not entirely, by disabling TPM in the BIOS.

But I will have to update my rig to Devuan Daedalus to find out and I'm not quite there yet.

I'd appreciate any comments on this.

Best,

JHM