Bug 187061

Summary: Poweroff and reboot hang the PC with screen off and LEDs blinking on Latitude E7250
Product: ACPI
Reporter: Gianpaolo (gianpaoloc)
Component: Power-Off
Assignee: Ocean He (hehy1)
Status: CLOSED CODE_FIX
Severity: normal
CC: dennyvatwork, hector.jerezano, jmaibaum, lenb, rjw, rui.zhang, szg00000, tiwoc
Priority: P1
Hardware: Intel
OS: Linux
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=151631
Kernel Version: 4.9.0-rc4
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  Screenshot just before hanging
  dmesg of the last working kernel
  dmesg from kernel 4.9.0-rc4
  Screenshot before hanging if sata write caching is disabled
  Result of bisection
  Patch reverting commit 2c85025c75dfe7ddc2bb33363a998dad59383f94

Description Gianpaolo 2016-11-06 10:55:48 UTC
Created attachment 243731 [details]
Screenshot just before hanging

When doing a poweroff or a reboot, the PC stops for a few seconds, apparently at the very end of the process, and then hangs. The screen is blank but the LEDs (disk, power and wifi) blink.

I compile the vanilla kernel myself (and the attached files refer to my kernel), but the same problem affects all the kernels I tested, including the original 4.8.0 delivered with Debian sid and the 4.8 kernel delivered with Ubuntu 16.10.

I am attaching a screenshot of the console during the seconds of waiting just before the screen goes blank and the leds start blinking. I am also attaching dmesg result with the last running kernel (4.7.10) and with the latest 4.9.0-rc4 kernel.

Any help on how to further analyze the source of the bug is welcome.

Gianpaolo
Comment 1 Gianpaolo 2016-11-06 10:56:57 UTC
Created attachment 243741 [details]
dmesg of the last working kernel
Comment 2 Gianpaolo 2016-11-06 10:57:35 UTC
Created attachment 243751 [details]
dmesg from kernel 4.9.0-rc4
Comment 3 Gianpaolo 2016-11-06 20:10:50 UTC
I do not know if this is related, but if I issue the following commands before rebooting:

sync; hdparm -W0 /dev/sda; sleep 1; hdparm -F /dev/sda ; sleep 1 ; hdparm -f /dev/sda

then the reboot process advances a bit further. I am attaching the new screenshot, which ends (just before the screen goes off and the LEDs start blinking) with a SATA link error.
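
For reference, the one-liner above can be unpacked into a small script with each step annotated. This is a minimal sketch, not a recommended fix: the disk is assumed to be /dev/sda as in the comment, and the `run` helper, the `flush_disk_before_reboot` function and the `DRY_RUN` and `DISK` variables are hypothetical names introduced here for illustration. By default the script only prints the commands; running them for real requires root.

```shell
#!/bin/sh
# Sketch of the pre-reboot workaround described in this comment.
# DISK and DRY_RUN are illustrative names; /dev/sda is the device from the report.
DISK="${DISK:-/dev/sda}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

flush_disk_before_reboot() {
    run sync                  # write dirty pages out to the block devices
    run hdparm -W0 "$DISK"    # disable the drive's write caching
    run sleep 1
    run hdparm -F "$DISK"     # flush the drive's on-board write cache
    run sleep 1
    run hdparm -f "$DISK"     # sync and flush the kernel buffer cache for the device
}

flush_disk_before_reboot
```

Setting `DRY_RUN=0` before invoking the script executes the same sequence instead of printing it.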
Comment 4 Gianpaolo 2016-11-06 20:12:03 UTC
Created attachment 243761 [details]
Screenshot before hanging if sata write caching is disabled
Comment 5 Gianpaolo 2016-11-06 20:28:16 UTC
Is this related
Comment 6 Gianpaolo 2016-11-07 18:40:16 UTC
Created attachment 243861 [details]
Result of bisection

I bisected the kernel to find the source of the problem. 

This is my first bisection, so I hope I did everything right. I cloned from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git and ran git bisect between v4.7.0 and v4.8-rc1.

This is the result. Apparently, the bad commit is: 2c85025c75dfe7ddc2bb33363a998dad59383f94 - ACPI: Execute _PTS before system reboot
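
For readers who have not bisected before, the procedure described in this comment has roughly the following shape. This is a sketch under assumptions, not the reporter's actual session: it assumes it is run inside a clone of the linux-stable tree, that the tag `v4.7` corresponds to the "v4.7.0" endpoint named above, and that each step is judged by building the kernel, booting it and trying a reboot. The `run`, `start_bisect` and `mark_result` helpers and the `GOOD`, `BAD` and `DRY_RUN` variables are hypothetical names; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch of a bisection between a known-good and a known-bad kernel tag.
# GOOD, BAD and DRY_RUN are illustrative names, not taken from the report.
GOOD="${GOOD:-v4.7}"
BAD="${BAD:-v4.8-rc1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Tell git which endpoint hangs and which one reboots cleanly.
start_bisect() {
    run git bisect start
    run git bisect bad "$BAD"
    run git bisect good "$GOOD"
}

# After building and booting the revision git checked out, report the outcome.
mark_result() {
    case "$1" in
        hang) run git bisect bad ;;
        ok)   run git bisect good ;;
        *)    echo "usage: mark_result hang|ok" >&2; return 1 ;;
    esac
}

start_bisect
```

Once git reports the first bad commit, `git bisect log` prints the session (suitable for attaching to a report) and `git bisect reset` returns the tree to where it started.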
Comment 7 Szőgyényi Gábor 2016-11-07 18:46:46 UTC
Can you manually revert this commit and build a kernel?
Does the problem still persist?
Comment 8 Gianpaolo 2016-11-08 08:09:29 UTC
Created attachment 243881 [details]
Patch reverting commit 2c85025c75dfe7ddc2bb33363a998dad59383f94

The attached patch reverts the commit mentioned above, and it works for me. I tested it with both kernel v4.9-rc4 and the latest stable v4.8.6. This seems to confirm the source of the problem.
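
A revert patch of this kind can also be produced directly with git. The following is a sketch, assuming a clone of the linux-stable tree and that the revert applies cleanly on the chosen tag; it is not necessarily how the attached patch was made. The branch name `revert-pts`, the `run` and `revert_on_tag` helpers, and the `BASE`, `COMMIT` and `DRY_RUN` variables are hypothetical names; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: revert the bisected commit on top of a release tag and export a patch.
# BASE, COMMIT, DRY_RUN and the branch name are illustrative.
COMMIT="${COMMIT:-2c85025c75dfe7ddc2bb33363a998dad59383f94}"
BASE="${BASE:-v4.8.6}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

revert_on_tag() {
    run git checkout -b revert-pts "$BASE"   # work on a branch starting at the tag
    run git revert --no-edit "$COMMIT"       # create a commit undoing the change
    run git format-patch -1 HEAD             # write the revert out as a patch file
}

revert_on_tag
```

The resulting patch file can then be applied to a matching source tree before building the test kernel.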
Comment 9 Daniele Viganò 2016-11-09 19:18:36 UTC
I'm also affected by this bug on a Dell Latitude E5450 with BIOS A13 running Fedora 24 with v4.8.6 (but it can be reproduced on F25 too).

I can confirm that disabling the cache before a reboot or using the patch mentioned above, on top of v4.8.6, fixes the issue.

https://bugzilla.redhat.com/show_bug.cgi?id=1393513
Comment 10 Daniele Viganò 2016-11-10 07:01:50 UTC
The full story of this patch breaking ACPI on reboots/shutdown is here: https://patchwork.kernel.org/patch/9041141/. Damn Lenovo.
Comment 11 Zhang Rui 2016-11-14 05:31:04 UTC
Hi, Ocean,
please take a look at this issue, which has been bisected to this commit
commit 2c85025c75dfe7ddc2bb33363a998dad59383f94
Author: Ocean He <hehy1@lenovo.com>
Date:   Mon Jun 27 14:50:16 2016 +0000

    ACPI: Execute _PTS before system reboot
    
    The _PTS control method is defined in the section 7.4.1 of acpi 6.0
    spec. The _PTS control method is executed by the OS during the sleep
    transition process for S1, S2, S3, S4, and for orderly S5 shutdown.
    
    The _PTS control method provides the BIOS a mechanism for performing
    some housekeeping, such as writing the sleep type value to the embedded
    controller, before entering the system sleeping state. Note that some
    Lenovo Server BIOS use this mechanism to detect reboot event and
    prompt user by popped dialog box.
    
    According to section 7.5 of acpi 6.0 spec, _PTS should run after _TTS.
    Add a _PTS evaulation to the existing _TTS reboot notifier and change
    the notifier name to reflect the fact that it's not for _TTS only any
    more.
    
    Signed-off-by: Ocean He <hehy1@lenovo.com>
    Signed-off-by: Nagananda Chumbalkar <nchumbalkar@lenovo.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Comment 12 Daniele Viganò 2016-11-18 11:22:12 UTC
The bug affects v4.8.7 too.
Comment 13 Daniele Viganò 2016-11-18 12:23:22 UTC
Some more reports are available here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1594023

It also seems to be related to the installed SSD: almost every reporter has a Samsung 850 SSD.
Comment 14 Gianpaolo 2016-11-18 21:40:01 UTC
Commit 2c85025c75dfe7ddc2bb33363a998dad59383f94 is still included in kernel 4.8.8 and 4.9-rc5 and indeed both kernels are affected by the bug. 

Reverting the buggy commit fixes the bug on both kernels.
Comment 15 Rafael J. Wysocki 2016-11-21 13:24:34 UTC
On Friday, November 18, 2016 09:40:01 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=187061
> 
> --- Comment #14 from Gianpaolo <gianpaoloc@gmail.com> ---
> Commit 2c85025c75dfe7ddc2bb33363a998dad59383f94 is still included in kernel
> 4.8.8 and 4.9-rc5 and indeed both kernels are affected by the bug. 
> 
> Reverting the buggy commit fixes the bug on both kernels.

OK, I'm queuing up a revert of commit 2c85025c75df for 4.9-rc7.
Comment 16 Daniel Seither 2016-11-28 10:08:50 UTC
I can confirm that 4.9-rc7 fixes the issue.

For the record: This bug also affected Dell Latitude E7450 (BIOS A13) with the Samsung 850 EVO mSATA drive (firmware version EMT41B6Q), tested with Fedora 24, Fedora 25 and Ubuntu 16.10. The blinking LEDs mean "A possible processor failure has occurred" (see http://www.dell.com/support/article/de/de/debsdt1/SLN155342/en#E-series_Diagnostic_LEDs).
Comment 17 Rafael J. Wysocki 2016-11-28 21:42:28 UTC
Commit 2c85025c75df was reverted in 4.9-rc7.
Comment 18 Daniele Viganò 2016-12-01 06:35:48 UTC
I confirm the bug has been resolved for me in 4.9-rc7. Thanks for the quick support.
Comment 19 Jerezano 2016-12-01 21:33:53 UTC
(In reply to Daniel Seither from comment #16)
> I can confirm that 4.9-rc7 fixes the issue.
> 
> For the record: This bug also affected Dell Latitude E7450 (BIOS A13) with
> the Samsung 850 EVO mSATA drive (firmware version EMT41B6Q), tested with
> Fedora 24, Fedora 25 and Ubuntu 16.10. The blinking LEDs mean "A possible
> processor failure has occurred" (see
> http://www.dell.com/support/article/de/de/debsdt1/SLN155342/en#E-series_Diagnostic_LEDs).

Daniel,
I also have a Dell E7450 with the same BIOS version, running Ubuntu 16.10, and I see the same results: kernel 4.8 has this bug, and with 4.9-rc7 the bug is fixed.

Apart from this bug, do you have any problems with your setup? Do you have thermal issues? On mine, the temperature reaches around 100 °C during a stress test, and sometimes under high load as well.

I have GRUB configured with these options:

GRUB_CMDLINE_LINUX="reboot=b intremap=no_x2apic_optout acpi_osi=Linux initcall_blacklist=pcc_init"
Comment 20 Johannes Maibaum 2016-12-06 14:25:38 UTC
Could/will the revert of commit 2c85025c75dfe7ddc2bb33363a998dad59383f94 be backported to 4.8?

Since 4.8 came out in the Arch Linux repos, I have had a similar problem with a magnetic disk that powers off too early during shutdown/reboot. I did a test build of 4.8.12 with the revert patch applied, and it resolves the issue for me.