Bug 185521 - Bisected: bogus power button event triggered by commit e6cbd7f2efb4 ("mm, page_alloc: remove fair zone allocation policy"), results in continious suspend/resume cycles - HP Spectre Notebook
Summary: Bisected: bogus power button event triggered by commit e6cbd7f2efb4 ("mm, pag...
Status: CLOSED DOCUMENTED
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend (show other bugs)
Hardware: Intel Linux
: P1 high
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-28 18:38 UTC by Chris Murphy
Modified: 2017-01-24 01:35 UTC (History)
7 users (show)

See Also:
Kernel Version: 4.8
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
cpuinfo (1.11 KB, text/plain)
2016-10-28 18:38 UTC, Chris Murphy
Details
dmidecode (19.63 KB, text/plain)
2016-10-28 18:38 UTC, Chris Murphy
Details
lspci (11.21 KB, text/plain)
2016-10-28 18:38 UTC, Chris Murphy
Details
dmesg (172.27 KB, text/plain)
2016-10-28 21:45 UTC, Chris Murphy
Details
journal.log (580.69 KB, application/octet-stream)
2016-10-28 21:54 UTC, Chris Murphy
Details
/proc/acpi/wakeup (1.17 KB, text/plain)
2016-11-07 16:54 UTC, Chris Murphy
Details
dmesg 4.8.6 (180.70 KB, text/plain)
2016-11-07 16:57 UTC, Chris Murphy
Details
dmesg 4.8.7 using XHC (89.73 KB, text/x-log)
2016-11-14 05:41 UTC, Chris Murphy
Details
dmesg for kernel HEAD detached at e6cbd7f (160.64 KB, text/x-log)
2016-11-14 06:29 UTC, Chris Murphy
Details
acpidump 4.9-rc5 (929.48 KB, application/octet-stream)
2016-11-15 06:25 UTC, Chris Murphy
Details
dmesg acpi.debug_level=0x0400000F acpi.debug_layer=0x00000002 (1.94 MB, text/x-log)
2016-11-16 17:52 UTC, Chris Murphy
Details
dump gpe/fix event status/enable reisters after resumed back from s3 (2.86 KB, text/plain)
2016-11-17 09:08 UTC, Chen Yu
Details
dmesg with c36 patch (149.00 KB, text/x-log)
2016-11-18 06:18 UTC, Chris Murphy
Details
dmesg with c36 patch, extra info (209.34 KB, text/x-log)
2016-11-18 06:30 UTC, Chris Murphy
Details
[PATCH] ACPICA: Add new API to query wakeup reasons (11.96 KB, patch)
2016-12-23 07:57 UTC, Lv Zheng
Details | Diff
dmesg with c41 patch and c44 boot param (115.92 KB, text/plain)
2016-12-27 06:15 UTC, Chris Murphy
Details
[PATCH] Obtain wakeup GPE number via procfs (877 bytes, patch)
2017-01-16 04:36 UTC, Lv Zheng
Details | Diff
acpiwakeup c48 (1.40 KB, text/plain)
2017-01-17 01:16 UTC, Chris Murphy
Details
acpiwakeup c51 (1.43 KB, text/plain)
2017-01-17 03:37 UTC, Chris Murphy
Details
dmesg c63 (151.01 KB, text/plain)
2017-01-18 03:40 UTC, Chris Murphy
Details
[PATCH] Revert bisected culprit (7.05 KB, patch)
2017-01-18 06:29 UTC, Lv Zheng
Details | Diff
lsmod (7.10 KB, text/plain)
2017-01-18 07:05 UTC, Chris Murphy
Details
lsusb (245 bytes, text/plain)
2017-01-18 07:05 UTC, Chris Murphy
Details
make failure 252261 (3.04 KB, text/plain)
2017-01-18 07:48 UTC, Chris Murphy
Details
dmesg good for c70 (124.64 KB, text/plain)
2017-01-18 18:09 UTC, Chris Murphy
Details
dmesg c88 (77.34 KB, text/plain)
2017-01-20 18:14 UTC, Chris Murphy
Details
dmesg c88 2 (106.59 KB, text/plain)
2017-01-21 03:26 UTC, Chris Murphy
Details

Description Chris Murphy 2016-10-28 18:38:01 UTC
Summary:

HP Spectre (skylake i7-6500U) upon lid closure does not remain suspended, it wakes, sleeps, wakes, sleeps, until battery death or indefinitely if plugged in.

Detail:

First kernel with problem, 4.8.0-0.rc0.git3.1.fc25 Linux v4.7-6438-gc624c86
Last kernel without problem 4.8.0-0.rc0.git2.1.fc25 Linux v4.7-5906-g194dc87; also tested as far back as kernel 4.5.5 and none of those exhibit the problem, so it is a regression.

Switching to mainline source for bisection, git bisect says e6cbd7f2efb433d717af72aa8510a9db6f7a7e05 is the first bad commit.

$ git checkout e6cbd7f2efb433d717af72aa8510a9db6f7a7e05~
$ git status
HEAD detached at e5146b1

make, and test; this kernel doesn't have the problem.
Comment 1 Chris Murphy 2016-10-28 18:38:21 UTC
Created attachment 243011 [details]
cpuinfo
Comment 2 Chris Murphy 2016-10-28 18:38:41 UTC
Created attachment 243021 [details]
dmidecode
Comment 3 Chris Murphy 2016-10-28 18:38:54 UTC
Created attachment 243031 [details]
lspci
Comment 4 Chris Murphy 2016-10-28 21:45:56 UTC
Created attachment 243051 [details]
dmesg

Monotonic time. Lid is closed at 37.371478, remains closed through 313.847882. There are 10 wake then sleep cycles in between those times, which should not happen (and don't happen when the compiling before the commit bisect blames.)
Comment 5 Chris Murphy 2016-10-28 21:54:51 UTC
Created attachment 243061 [details]
journal.log

Real time.
Oct 28 12:13:08 f25h.localdomain systemd-logind[734]: Lid closed.
Oct 28 12:18:16 f25h.localdomain systemd-logind[734]: Lid opened.

I can't tell what the trigger is for resume. Once resumed, services go back to work for 30 seconds and then there's a trigger (I guess by systemd) to suspend again.
Comment 6 Chris Murphy 2016-10-29 01:19:20 UTC
Problem is present in 4.9.0-0.rc2.git2.1.fc26.x86_64, Linux v4.9-rc2-138-g14970f2.
Comment 7 Chen Yu 2016-11-07 05:35:30 UTC
(In reply to Chris Murphy from comment #4)
> Created attachment 243051 [details]
> dmesg
> 
> Monotonic time. Lid is closed at 37.371478, remains closed through
> 313.847882. There are 10 wake then sleep cycles in between those times,
> which should not happen (and don't happen when the compiling before the
> commit bisect blames.)

Could you please provide /proc/acpi/wakeup after bootup? Let's check what the possible wakeup source is? 
Not sure why this is related to
commit e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
Author: Mel Gorman <mgorman@techsingularity.net>
Date:   Thu Jul 28 15:46:50 2016 -0700

    mm, page_alloc: remove fair zone allocation policy

Add Mel here.
Comment 8 Chen Yu 2016-11-07 05:36:24 UTC
Also please append 'no_console_suspend ignore_loglevel' to get more info across suspend/resume
Comment 9 Chris Murphy 2016-11-07 16:54:36 UTC
Created attachment 243831 [details]
/proc/acpi/wakeup
Comment 10 Chris Murphy 2016-11-07 16:57:24 UTC
Created attachment 243841 [details]
dmesg 4.8.6

4.8.6-300.fc25.x86_64 with no_console_suspend ignore_loglevel
Computer was booted with these options, lid closed for ~13 minutes, opened, and then captured this dmesg attachment.
Comment 11 Zhang Rui 2016-11-14 05:03:17 UTC
please check if the problem still exists after "echo XHC > /proc/acpi/wakeup"
Comment 12 Zhang Rui 2016-11-14 05:07:00 UTC
(In reply to Chris Murphy from comment #0)
> Summary:
> 
> HP Spectre (skylake i7-6500U) upon lid closure does not remain suspended, it
> wakes, sleeps, wakes, sleeps, until battery death or indefinitely if plugged
> in.
> 
> Detail:
> 
> First kernel with problem, 4.8.0-0.rc0.git3.1.fc25 Linux v4.7-6438-gc624c86
> Last kernel without problem 4.8.0-0.rc0.git2.1.fc25 Linux
> v4.7-5906-g194dc87; also tested as far back as kernel 4.5.5 and none of
> those exhibit the problem, so it is a regression.
> 
> Switching to mainline source for bisection, git bisect says
> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05 is the first bad commit.
> 
> $ git checkout e6cbd7f2efb433d717af72aa8510a9db6f7a7e05~
> $ git status
> HEAD detached at e5146b1
> 
BTW, does the problem exist using HEAD e6cbd7f2efb433d717af72aa8510a9db6f7a7e05?
Comment 13 Chris Murphy 2016-11-14 05:41:07 UTC
Created attachment 244381 [details]
dmesg 4.8.7 using XHC

(In reply to Zhang Rui from comment #11)
> please check if the problem still exists after "echo XHC > /proc/acpi/wakeup"

Problem still exists.
Comment 14 Chris Murphy 2016-11-14 05:43:26 UTC
(In reply to Zhang Rui from comment #12)
> (In reply to Chris Murphy from comment #0)
> > Switching to mainline source for bisection, git bisect says
> > e6cbd7f2efb433d717af72aa8510a9db6f7a7e05 is the first bad commit.
> > 
> > $ git checkout e6cbd7f2efb433d717af72aa8510a9db6f7a7e05~
> > $ git status
> > HEAD detached at e5146b1
> > 
> BTW, does the problem exist using HEAD
> e6cbd7f2efb433d717af72aa8510a9db6f7a7e05?

Untested. Should I do
$ git checkout e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
$ make
And then test?
Comment 15 Chris Murphy 2016-11-14 06:29:04 UTC
Created attachment 244391 [details]
dmesg for kernel HEAD detached at e6cbd7f

Problem exists.
Comment 16 Chen Yu 2016-11-14 15:53:43 UTC
Please disable both XHC and PWRB
by 
echo XHC > /proc/acpi/wakeup
echo PWRB > /proc/acpi/wakeup
then 
echo mem > /sys/power/state
does it resume automatically or remain suspended?
Comment 17 Chris Murphy 2016-11-14 16:54:24 UTC
(In reply to Chen Yu from comment #16)
> Please disable both XHC and PWRB
> by 
> echo XHC > /proc/acpi/wakeup
> echo PWRB > /proc/acpi/wakeup
> then 
> echo mem > /sys/power/state
> does it resume automatically or remain suspended?

Remains suspended. Pressing spacebar, dmesg | grep CPU, there is one sleep wake cycle.

[ 3501.399896] Disabling non-boot CPUs ...
[ 3501.401324] smpboot: CPU 1 is now offline
[ 3501.413117] smpboot: CPU 2 is now offline
[ 3501.433046] smpboot: CPU 3 is now offline
[ 3501.456070] Enabling non-boot CPUs ...
Comment 18 Chris Murphy 2016-11-14 20:25:51 UTC
I've also tried it with lid and the problem doesn't happen, so it seems that either echo PWRB > /proc/acpi/wakeup or both echo PWRB > /proc/acpi/wakeup and echo XHC > /proc/acpi/wakeup make the difference.
Comment 19 Lv Zheng 2016-11-15 05:58:03 UTC
(In reply to Chen Yu from comment #7)
> (In reply to Chris Murphy from comment #4)
> > Created attachment 243051 [details]
> > dmesg
> > 
> > Monotonic time. Lid is closed at 37.371478, remains closed through
> > 313.847882. There are 10 wake then sleep cycles in between those times,
> > which should not happen (and don't happen when the compiling before the
> > commit bisect blames.)
> 
> Could you please provide /proc/acpi/wakeup after bootup? Let's check what
> the possible wakeup source is? 
> Not sure why this is related to
> commit e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
> Author: Mel Gorman <mgorman@techsingularity.net>
> Date:   Thu Jul 28 15:46:50 2016 -0700
> 
>     mm, page_alloc: remove fair zone allocation policy
> 
> Add Mel here.

Are you sure the problem can be avoided after reverting this commit on top of recent kernel?
Comment 20 Lv Zheng 2016-11-15 05:58:39 UTC
(In reply to Chris Murphy from comment #18)
> I've also tried it with lid and the problem doesn't happen, so it seems that
> either echo PWRB > /proc/acpi/wakeup or both echo PWRB > /proc/acpi/wakeup
> and echo XHC > /proc/acpi/wakeup make the difference.

It sounds the system is woken up the PWRB, right?
Comment 21 Lv Zheng 2016-11-15 06:00:17 UTC
Could someone upload an acpidump output here if it is related to PWRB?
Comment 22 Chris Murphy 2016-11-15 06:15:51 UTC
(In reply to Lv Zheng from comment #19) 
> Are you sure the problem can be avoided after reverting this commit on top
> of recent kernel?

Uncertain, it won't cleanly revert and I'm not skilled enough with git to make it happen.
[chris@f25h linux]$ git checkout tags/v4.9-rc5
HEAD is now at a25f094... Linux 4.9-rc5
[chris@f25h linux]$ git checkout -b 4.9rc5_reve6cbd7
Switched to a new branch '4.9rc5_reve6cbd7'
[chris@f25h linux]$ git revert e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
error: could not revert e6cbd7f... mm, page_alloc: remove fair zone allocation policy
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

(In reply to Lv Zheng from comment #20)

> It sounds the system is woken up the PWRB, right?
That's asking me a conclusion I'm unqualified to answer. I have no idea what writing PWRB to /proc/acpi/wakeup is doing.

(In reply to Lv Zheng from comment #21)
> Could someone upload an acpidump output here if it is related to PWRB?

Yeah, should I reboot with 4.9-rc5 so it's all in a clean state first?
Comment 23 Chris Murphy 2016-11-15 06:25:10 UTC
Created attachment 244541 [details]
acpidump 4.9-rc5
Comment 24 Zhang Rui 2016-11-15 06:27:26 UTC
I think it is sufficient to say that the problem is introduced by commit e6cbd7f2efb433d717af72aa8510a9db6f7a7e05, if
   kernel does not have the problem with
      $ git checkout e6cbd7f2efb433d717af72aa8510a9db6f7a7e05~
   which you have already verified in comment #0
AND
   kernel indeed has the problem with
      $ git checkout e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
   which you have confirmed in comment #15

TBH, I can not explain why this commit would introduce this problem, but let's see what Mel says.
Comment 25 Zhang Rui 2016-11-15 06:28:15 UTC
BTW, it's always nice if you can double check if the problem still exists in the latest upstream kernel.
Comment 26 Mel Gorman 2016-11-15 10:42:44 UTC
(In reply to Zhang Rui from comment #24)
> TBH, I can not explain why this commit would introduce this problem, but
> let's see what Mel says.

Unfortunately, in itself it doesn't explain much. The patch alters the PFN of pages returned by the allocator to bias towards higher addresses but that's about it. It's unclear why suspend/resume would care if more ZONE_NORMAL pages were used than ZONE_DMA32.
Comment 27 Lv Zheng 2016-11-16 03:08:59 UTC
(In reply to Chris Murphy from comment #9)
> Created attachment 243831 [details]
> /proc/acpi/wakeup

I cannot see the GPE# from this file.
To Rui:
Is there any other way for us to obtain the _PRW result?

I checked the code:
DefinitionBlock ("dsdt.aml", "DSDT", 2, "HPQOEM", "SLIC-MPC", 0x00000000)
{
    OperationRegion (PNVA, SystemMemory, 0x3CF6DA18, 0x0204)
    Field (PNVA, AnyAcc, Lock, Preserve)
    {
        Offset (0x6C), 
        GPEM,   32, 
    }
    Scope (_SB)
    {
        Method (GGRP, 1, Serialized)
        {
            ShiftRight (And (Arg0, 0x00FF0000), 0x10, Local0)
            Return (Local0)
        }

        Method (GNMB, 1, Serialized)
        {
            Return (And (Arg0, 0xFFFF))
        }

        Method (GGPE, 1, NotSerialized)
        {
            Store (GGRP (Arg0), Local0)
            Store (GNMB (Arg0), Local1)
            Store (And (ShiftRight (GPEM, Multiply (Local0, 0x02)), 0x03
                ), Local2)
            If (LEqual (Local2, Zero))
            {
                Return (0x6F)
            }
            Else
            {
                Return (Add (Multiply (Subtract (Local2, One), 0x20), Local1
                    ))
            }
        }

        Device (PWRB)
        {
            Name (_HID, EisaId ("PNP0C0C"))  // _HID: Hardware ID
            Method (PKG2, 2, Serialized)
            {
                Name (PKG, Package (0x02)
                {
                    Zero, 
                    Zero
                })
                Store (Arg0, Index (PKG, Zero))
                Store (Arg1, Index (PKG, One))
                Return (PKG)
            }

            Method (_PRW, 0, NotSerialized)  // _PRW: Power Resources for Wake
            {
                Name (GPEB, Zero)
                Store (GGPE (0x02070003), GPEB)
                Return (PKG2 (GPEB, 0x04))
            }
        }
    }
}
It's not possible to obtain it remotely.

To Chris:
Is that possible for you to enable aml debugger and obtain \_SB.PWRB._PRW for us:
1. configure ACPI_DEBUGGER_USER to m (depends on DEBUG_FS, ACPI_DEBUG, ACPI_DEBUGGER)
2. rebuild and boot the new kernel
3. following the instruction in Documentation/acpi/aml-debugger.txt to load the debugger module
4. run acpidbg -b "ex \_SB.PWRB._PRW", and post the result here

Thanks in advance.
Comment 28 Lv Zheng 2016-11-16 03:11:56 UTC
(In reply to Mel Gorman from comment #26)
> (In reply to Zhang Rui from comment #24)
> > TBH, I can not explain why this commit would introduce this problem, but
> > let's see what Mel says.
> 
> Unfortunately, in itself it doesn't explain much. The patch alters the PFN
> of pages returned by the allocator to bias towards higher addresses but
> that's about it. It's unclear why suspend/resume would care if more
> ZONE_NORMAL pages were used than ZONE_DMA32.

To Mel:

I'm not sure if it's related:
Linux seems to have implemented ACPICA allocation OSLs in the following way:
static inline void *acpi_os_allocate(acpi_size size)
{
	return kmalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_KERNEL);
}

static inline void *acpi_os_allocate_zeroed(acpi_size size)
{
	return kzalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_KERNEL);
}
It looks the acpi_os_execute() is the only one using GFP_ATOMIC, and it will be invoked for GPE handlings, including wakeup GPE code.
What's your opinion?
Comment 30 Chris Murphy 2016-11-16 06:20:41 UTC
(In reply to Lv Zheng from comment #27)
> 4. run acpidbg -b "ex \_SB.PWRB._PRW", and post the result here

[chris@f25h ~]$ sudo ./linux/tools/power/acpi/tools/acpidbg/acpidbg -b "ex \_SB.PWRB._PRW"
Evaluating \_SB.PWRB._PRW
Evaluation of \_SB.PWRB._PRW returned object ffff892927160000, external buffer length 48
 [Package] Contains 2 Elements:
  [Integer] = 000000000000006F
  [Integer] = 0000000000000004
Comment 31 Lv Zheng 2016-11-16 07:06:37 UTC
To Chris:

So the wakeup GPE for the power button is gpe6f and the deepest sleep state is S4.
Next we should check the timing when the STS bit of GPE #6F is set.
We need 0x04000000 (ACPI_LV_IO) for acpi.debug_level and 0x00000002 (ACPI_HARDWARE) for acpi.debug_layer.
Possibly boot the kernel with:
acpi.debug_level=0x0400000F acpi.debug_layer=0x00000002
You need ACPI_DEBUG enabled (to enable the 2 acpi module parameters), and ACPI_DEBUGGER disabled (to avoid bugs).

With this kernel, please reproduce the issue and upload the dmesg for the problematic suspend/resume processes.
Hope we can find the GPE status read result in the dmesg output.

Thanks in advance.
Comment 32 Lv Zheng 2016-11-16 07:09:50 UTC
(In reply to Chris Murphy from comment #30)
> (In reply to Lv Zheng from comment #27)
> > 4. run acpidbg -b "ex \_SB.PWRB._PRW", and post the result here
> 
> [chris@f25h ~]$ sudo ./linux/tools/power/acpi/tools/acpidbg/acpidbg -b "ex
> \_SB.PWRB._PRW"
> Evaluating \_SB.PWRB._PRW
> Evaluation of \_SB.PWRB._PRW returned object ffff892927160000, external
> buffer length 48
>  [Package] Contains 2 Elements:
>   [Integer] = 000000000000006F
>   [Integer] = 0000000000000004

A persuasive example that the AML debugger is very useful for our remote support business.
Thanks! :)
Comment 33 Chris Murphy 2016-11-16 17:52:35 UTC
Created attachment 244811 [details]
dmesg acpi.debug_level=0x0400000F acpi.debug_layer=0x00000002

Looks like the dmesg buffer ran out of memory, as it's incomplete. So this is the output from 'sudo journalctl -b -k -o short-monotonic > journalkmsg.log' which ought to have all the same stuff.

Command line: BOOT_IMAGE=/vmlinuz-4.9.0-rc5 root=UUID=c45caf39-a048-4c44-90c9-535dc8003c71 rw rootflags=subvol=root elevator=noop LANG=en_US.UTF-8 acpi.debug_level=0x0400000F acpi.debug_layer=0x00000002 no_console_suspend ignore_loglevel


Monotonic times : description
238-712 : lid closed

712-1055 : unrelated (setup for next test)

1061-end  : initiate sleep a different way, via GNOME (it has a GUI power button icon that becomes a pause button if the user holds down alt, and mouse clicking that pause button makes it suspend to RAM); in this case it only goes through one sleep wake cycle, rather than repeating as it does with the lid closed.
Comment 34 Lv Zheng 2016-11-17 07:04:58 UTC
According to FADT:
[050h 0080   4]           GPE0 Block Address : 00001880
[054h 0084   4]           GPE1 Block Address : 00000000
[05Ch 0092   1]            GPE0 Block Length : 20
[05Dh 0093   1]            GPE1 Block Length : 00
[05Eh 0094   1]             GPE1 Base Offset : 10

GPE6F's STS register should be 0x188D, bit value should be 0x80.

I couldn't find this GPE flagged in the log (searching hw_read, 188D).

If S3 is triggered, I'm not sure if this GPE will be flagged after returning from BIOS.
Comment 35 Lv Zheng 2016-11-17 07:43:55 UTC
To: Yu
Maybe in drivers/acpi/sleep.c, after "Low-level resume complete".
We should add code to dump the fixed event status and GPE status between:
acpi_write_bit_register(ACPI_BITREG_SCI_ENABLE, 1)
and
acpi_leave_sleep_state_prep(acpi_state)
to obtain the wakeup reason.

To: Rui
What's your opinion?

Thanks
Comment 36 Chen Yu 2016-11-17 09:08:22 UTC
Created attachment 244861 [details]
dump gpe/fix event status/enable reisters after resumed back from s3

Trial patch to dump gpe/fix event status/enable registers after resumed back from s3. However I doubt if this make any sense, as we already know the wakeup source should be gpe6f, which is either connected to power button or lid.
Comment 37 Chen Yu 2016-11-18 04:44:44 UTC
Hi Christ, let's figure it out firstly what the wakeup source is. 
please apply this patch and rebuild kernel with CONFIG_ACPI_DEBUG,
then before suspend the system:
echo 0x08000000 > /sys/module/acpi/parameters/debug_level
echo 0x00000004 > /sys/module/acpi/parameters/debug_layer
and reproduce the issue,
after resumed back, provide dmesg log. thx
Comment 38 Chris Murphy 2016-11-18 06:18:29 UTC
Created attachment 245001 [details]
dmesg with c36 patch
Comment 39 Chris Murphy 2016-11-18 06:30:40 UTC
Created attachment 245011 [details]
dmesg with c36 patch, extra info

Comment 38, I closed the lid and left it closed for a little over a minute so it went through several cycles. This ends at time [  144.787238] when I open the lid.

Is there any chance that running GNOME for all of these tests affects the results? Let me know if I should boot runlevel 3 / multi-user.target instead of 5 / graphical.target, if that eliminates gnome-shell as a factor. I can use 'echo mem > /sys/power/state' to suspend.

This comment 39 is the same boot and dmesg as 38, but starting at monotonic time [  440.592226] I pushed the power button once. It's configured by default in gnome-shell to "suspend". The effect of the button being pushed goes through [  449.455185].

Next I then went to the gnome-shell utility pull down menu where there's a power button icon, clicked alt to make it turn to a pause icon, then clicked on that to make it suspend. That causes entries in the attached log starting at monotonic time [  754.322559]  until [  763.882067].

Maybe superfluous information but I've seen multilayered bugs before :D so maybe something else is also causing confusion.
Comment 40 Lv Zheng 2016-11-21 03:04:11 UTC
Interesting result here:
[   77.345674]   evevent-0233 ev_fixed_event_detect : Fixed Event Block: Enable 00000120 Status 00000001
[   77.345809]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 00-07: RunEnable=00, WakeEnable=00
[   77.345933]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 08-0F: RunEnable=00, WakeEnable=00
[   77.346055]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 10-17: RunEnable=00, WakeEnable=00
[   77.346177]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 18-1F: RunEnable=00, WakeEnable=00
[   77.346299]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 20-27: RunEnable=00, WakeEnable=00
[   77.346421]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 28-2F: RunEnable=00, WakeEnable=00
[   77.346542]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 30-37: RunEnable=00, WakeEnable=00
[   77.346663]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 38-3F: RunEnable=00, WakeEnable=00
[   77.346784]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 40-47: RunEnable=00, WakeEnable=00
[   77.346905]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 48-4F: RunEnable=00, WakeEnable=00
[   77.347041]     evgpe-0477 ev_gpe_detect         : Read registers for GPE 50-57: Status=00, Enable=01, RunEnable=01, WakeEnable=00
[   77.347173]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 58-5F: RunEnable=00, WakeEnable=00
[   77.347305]     evgpe-0477 ev_gpe_detect         : Read registers for GPE 60-67: Status=40, Enable=46, RunEnable=46, WakeEnable=00

Lv: Status & Enable is not 0 here.

[   77.347464]     evgpe-0477 ev_gpe_detect         : Read registers for GPE 68-6F: Status=00, Enable=80, RunEnable=80, WakeEnable=00
[   77.347659]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 70-77: RunEnable=00, WakeEnable=00
[   77.347845]     evgpe-0447 ev_gpe_detect         : Ignore disabled registers for GPE 78-7F: RunEnable=00, WakeEnable=00

It looks the GPE66, not GPE6F should be responsible for this issue...

Thanks
Lv
Comment 41 Lv Zheng 2016-12-23 07:57:10 UTC
Created attachment 248411 [details]
[PATCH] ACPICA: Add new API to query wakeup reasons

Hi,

Please try this patch, which captures SCI register enable/status values before suspending/after resuming, so that we can check if something wrong with the GPE management functionalities.

The boot parameter is:
  acpi.debug_level=0x80000000 acpi.debug_layer=0x00000006 log_buf_len=16M
You may need to enlarge log buffer according to your situation.

Please try a suspend/resume and upload the dmesg here.

Thanks and best regards
Lv
Comment 42 Chris Murphy 2016-12-23 19:39:42 UTC
Which config options to enable? DEBUG_FS, ACPI_DEBUG, ACPI_DEBUGGER? Or the ACPI_DEBUGGER_USER?
Comment 43 Chris Murphy 2016-12-23 21:44:26 UTC
Used 'git apply' to apply the patch, no errors, and checked some of the patched files and it looks like they've been applied. Built with DEBUG_FS, ACPI_DEBUG, ACPI_DEBUGGER enabled, but not ACPI_DEBUGGER_USER. Kernel command line is BOOT_IMAGE=/vmlinuz-4.9.0+ root=UUID=c45caf39-a048-4c44-90c9-535dc8003c71 ro rootflags=subvol=root elevator=noop i915.enable_guc_loading=-1 i915.enable_guc_submission=-1 acpi.debug_level=0x80000000 acpi.debug_layer=0x00000006 log_buf_len=16M no_console_suspend ignore_loglevel

But there are no debug messages at all in dmesg following suspend/resume.
Comment 44 Lv Zheng 2016-12-26 02:36:25 UTC
(In reply to Chris Murphy from comment #43)
> Used 'git apply' to apply the patch, no errors, and checked some of the
> patched files and it looks like they've been applied. Built with DEBUG_FS,
> ACPI_DEBUG, ACPI_DEBUGGER enabled, but not ACPI_DEBUGGER_USER. Kernel
> command line is BOOT_IMAGE=/vmlinuz-4.9.0+
> root=UUID=c45caf39-a048-4c44-90c9-535dc8003c71 ro rootflags=subvol=root
> elevator=noop i915.enable_guc_loading=-1 i915.enable_guc_submission=-1
> acpi.debug_level=0x80000000 acpi.debug_layer=0x00000006 log_buf_len=16M
> no_console_suspend ignore_loglevel

Should be CONFIG_ACPI_DEBUG=y.

> 
> But there are no debug messages at all in dmesg following suspend/resume.

Seems my debug_level is wrong, that's my fault.
Let me correct it:
acpi.debug_level=0x08000000 acpi.debug_layer=0x00000006 

Thanks
Lv
Comment 45 Chris Murphy 2016-12-27 06:15:18 UTC
Created attachment 248721 [details]
dmesg with c41 patch and c44 boot param

Pressed physical power button at ~48s monotonic time, laptop goes to sleep briefly then woke up.
Comment 46 Lv Zheng 2016-12-27 08:08:26 UTC
So this is what we are interested in:
[   53.072377]   hwsleep-0074 hw_detect_wakeup_event: Detecting wakup GPEs/Events before suspending    hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 00-07: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072416]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 08-0F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072431]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 10-17: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072450]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 18-1F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072465]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 20-27: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072484]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 28-2F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072502]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 30-37: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072521]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 38-3F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072535]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 40-47: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072554]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 48-4F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072568]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 50-57: Status=00, Enable=00, RunEnable=01, WakeEnable=00
[   53.072587]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 58-5F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072598]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 60-67: Status=00, Enable=00, RunEnable=46, WakeEnable=00
[   53.072609]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 68-6F: Status=00, Enable=A0, RunEnable=A0, WakeEnable=A0
[   53.072625]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 70-77: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072636]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 78-7F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072652]   evevent-0313 ev_save_wokenup_fixed_: Wakeup Fixed Event Block: Enable 00000120 Status 00000000
[   53.072165] ACPI: Low-level resume complete
[   53.072173]   hwsleep-0074 hw_detect_wakeup_event: Detecting wakup GPEs/Events after resuming    hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 00-07: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072192]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 08-0F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072200]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 10-17: Status=04, Enable=00, RunEnable=00, WakeEnable=00
[   53.072208]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 18-1F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072217]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 20-27: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072226]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 28-2F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072235]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 30-37: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072243]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 38-3F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072252]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 40-47: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072261]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 48-4F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072270]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 50-57: Status=00, Enable=01, RunEnable=01, WakeEnable=00
[   53.072278]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 58-5F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072283]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 60-67: Status=00, Enable=00, RunEnable=46, WakeEnable=00
[   53.072289]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 68-6F: Status=88, Enable=00, RunEnable=A0, WakeEnable=A0
[   53.072294]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 70-77: Status=00, Enable=04, RunEnable=00, WakeEnable=00
[   53.072300]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 78-7F: Status=00, Enable=00, RunEnable=00, WakeEnable=00
[   53.072306]   evevent-0313 ev_save_wokenup_fixed_: Wakeup Fixed Event Block: Enable 00000000 Status 00008000

Let me check it and get back to you later.

Thanks
Lv
Comment 47 Lv Zheng 2016-12-28 07:50:33 UTC
Before suspending:
[   53.072568]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 50-57: Status=00, Enable=00, RunEnable=01, WakeEnable=00
[   53.072598]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 60-67: Status=00, Enable=00, RunEnable=46, WakeEnable=00
[   53.072609]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 68-6F: Status=00, Enable=A0, RunEnable=A0, WakeEnable=A0

Runtime GPEs:
 50 61 62 66
Wakeup GPEs:
 6D 6F
Enabled GPEs:
 6D 6F

After resuming:
[   53.072200]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 10-17: Status=04, Enable=00, RunEnable=00, WakeEnable=00
[   53.072270]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 50-57: Status=00, Enable=01, RunEnable=01, WakeEnable=00
[   53.072283]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 60-67: Status=00, Enable=00, RunEnable=46, WakeEnable=00
[   53.072289]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 68-6F: Status=88, Enable=00, RunEnable=A0, WakeEnable=A0
[   53.072294]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 70-77: Status=00, Enable=04, RunEnable=00, WakeEnable=00

Runtime GPEs:
 50 61 62 66
Wakeup GPEs:
 6D 6F
Enabled GPEs:
 6D 6F 72
Raised GPEs:
 12 67 6F

The GPE 6F seems to be the cause of the wakeup.
It's not possible to obtain the related device to GPE-6F, let's think about a debugging mean.

Thanks and best regards
Lv
Comment 48 Lv Zheng 2017-01-16 04:36:08 UTC
Created attachment 251831 [details]
[PATCH] Obtain wakeup GPE number via procfs

Sorry for the late response.
Please apply this patch and rebuild the kernel.
Boot the kernel and upload /proc/acpi/wakeup output here.

Thanks
Lv
Comment 49 Chris Murphy 2017-01-17 01:16:40 UTC
Created attachment 252041 [details]
acpiwakeup c48

With patch c48 applied.
Comment 50 Chris Murphy 2017-01-17 01:18:19 UTC
FWIW HP released a firmware update recently, which I have applied prior to comment 49 test.

[    9.054526] Hardware name: HP HP Spectre Notebook/81A0, BIOS F.30 12/15/2016
Comment 51 Lv Zheng 2017-01-17 02:05:50 UTC
The GPE6F seems to be the power button.
May I ask:
Did you press the power button before suspending?
Did you press the power button to wake the system up?

Thanks
Lv
Comment 52 Lv Zheng 2017-01-17 02:09:01 UTC
Maybe it's a mechanical design failure, the lid closure may press the power button...
You can try to disable kernel GPIO related configurations and try again.
Comment 53 Chris Murphy 2017-01-17 03:21:11 UTC
(In reply to Lv Zheng from comment #51)
> The GPE6F seems to be the power button.
> May I ask:
> Did you press the power button before suspending?
> Did you press the power button to wake the system up?

No, I booted and did only cat > /proc/acpi/wakeup; so should I reboot and first use the power button to suspend and then cat > /proc/acpi/wakeup?
Comment 54 Chris Murphy 2017-01-17 03:33:42 UTC
(In reply to Lv Zheng from comment #52)
> Maybe it's a mechanical design failure, the lid closure may press the power
> button...
a. systemd knows the difference
Jan 16 20:30:02 f25h systemd-logind[818]: Power key pressed.

Jan 16 20:31:01 f25h systemd-logind[818]: Lid closed.

The problem of immediate wakeup happens whether I initiate with power button or lid closure. The difference is with the lid closed, of course it goes back to sleep again, then wakes up, in a constant cycle.


b. I'm not sure I follow a flawed design hypothesis, both power button and lid triggered suspend worked find with kernels prior to 4.8. It's definitely a regression. And then the same hardware suspends OK with Windows 10, also using either power button or closing the lid.

> You can try to disable kernel GPIO related configurations and try again.

I don't know what this means.
Comment 55 Chris Murphy 2017-01-17 03:37:41 UTC
Created attachment 252051 [details]
acpiwakeup c51

Capture /proc/acpi/wakeup with kernel containing patch in c48 after using power button to sleep.
Comment 56 Lv Zheng 2017-01-17 04:06:25 UTC
Let me ask again since communication between you and me seems to be in a very dumb way, we don't have any unwritten understanding between us. That makes the communication very difficult, almost impossible for us who is not a native english speaker.

By asking this question.

> The GPE6F seems to be the power button.
> May I ask:
> Did you press the power button before suspending?
> Did you press the power button to wake the system up?

I meant the test you've done to attachment 248721 [details].
I didn't mean the test you've done to attachment 252041 [details].

I think we both knew that test done to attachment 252041 [details] needn't any power button press...

Thanks
Lv
Comment 57 Lv Zheng 2017-01-17 04:06:58 UTC
(In reply to Chris Murphy from comment #55)
> Created attachment 252051 [details]
> acpiwakeup c51
> 
> Capture /proc/acpi/wakeup with kernel containing patch in c48 after using
> power button to sleep.

Please stop doing this before solving the communication problem...
Comment 58 Lv Zheng 2017-01-17 04:11:13 UTC
(In reply to Lv Zheng from comment #56)
> Let me ask again since communication between you and me seems to be in a
> very dumb way, we don't have any unwritten understanding between us. That
> makes the communication very difficult, almost impossible for us who is not
> a native english speaker.
> 
> By asking this question.
> 
> > The GPE6F seems to be the power button.
> > May I ask:
> > Did you press the power button before suspending?
> > Did you press the power button to wake the system up?
> 
> I meant the test you've done to attachment 248721 [details].
> I didn't mean the test you've done to attachment 252041 [details].
> 

I meant the test you've done in comment 45, not the test you've done in comment 49.

> I think we both knew that test done to attachment 252041 [details] needn't
> any power button press...
> 
> Thanks
> Lv
Comment 59 Lv Zheng 2017-01-17 04:19:04 UTC
(In reply to Chris Murphy from comment #54)
> (In reply to Lv Zheng from comment #52)
> > Maybe it's a mechanical design failure, the lid closure may press the power
> > button...
> a. systemd knows the difference
> Jan 16 20:30:02 f25h systemd-logind[818]: Power key pressed.
> 
> Jan 16 20:31:01 f25h systemd-logind[818]: Lid closed.
> 
> The problem of immediate wakeup happens whether I initiate with power button
> or lid closure. The difference is with the lid closed, of course it goes
> back to sleep again, then wakes up, in a constant cycle.
> 
> 
> b. I'm not sure I follow a flawed design hypothesis, both power button and
> lid triggered suspend worked find with kernels prior to 4.8. It's definitely
> a regression. And then the same hardware suspends OK with Windows 10, also
> using either power button or closing the lid.
> 
> > You can try to disable kernel GPIO related configurations and try again.
> 
> I don't know what this means.

Let me discuss later after solving the communication problem.
Comment 60 Chris Murphy 2017-01-17 04:21:51 UTC
(In reply to Lv Zheng from comment #58)
> (In reply to Lv Zheng from comment #56)
> > Let me ask again since communication between you and me seems to be in a
> > very dumb way, we don't have any unwritten understanding between us. That
> > makes the communication very difficult, almost impossible for us who is not
> > a native english speaker.
> > 
> > By asking this question.
> > 
> > > The GPE6F seems to be the power button.
> > > May I ask:
> > > Did you press the power button before suspending?
> > > Did you press the power button to wake the system up?
> > 
> > I meant the test you've done to attachment 248721 [details].
> > I didn't mean the test you've done to attachment 252041 [details].
> > 
> 
> I meant the test you've done in comment 45, not the test you've done in
> comment 49.


The test I did in comment 45's attachement, I pressed the power button to suspend. The laptop woke up on its own 2-3 seconds later, I did not put the power button a 2nd time.
Comment 61 Chen Yu 2017-01-17 05:18:03 UTC
@Christ OK please let me confirm:
1. by echo mem > /sys/power/state, the system will suspend normally, however it will be woken up automatically 2-3 seconds later, then the system keeps alive.
2. by closing the lid, the system will suspend normally, however it will be woken up automatically 2-3 seconds later, then suspend again, then 2-3 seconds later woken up, and so on.


According to discussion above, it might the power button event wake up the system, and it looks very likely to be a firmwate issue. I really don' know why power button would trigger that wakeup event after commit in #Comment7 has been applied. 

@Lv, I'm thinking of another question,  maybe not strictly related to this thread, what if 0x6f does not map to power button but to other devices? That is to say, although the linux see the event 0x6f as a  power button, however the actual 0x6f is mapped to another wakeup device, say, AC adaptor. If the commit in #Comment 7 has impact the behavior of AC adaptor driver, say, become a wake device after patch applied, then we will see what we encountered now.

Is there a way for us to confirm if the GPE 0x6f is correctly mapped to the power button?
Comment 62 Chris Murphy 2017-01-17 21:46:08 UTC
(In reply to Chen Yu from comment #61)
> @Christ OK please let me confirm:
> 1. by echo mem > /sys/power/state, the system will suspend normally, however
> it will be woken up automatically 2-3 seconds later, then the system keeps
> alive.
> 2. by closing the lid, the system will suspend normally, however it will be
> woken up automatically 2-3 seconds later, then suspend again, then 2-3
> seconds later woken up, and so on.

Both are correct; but only since kernel 4.8. Also the Windows 10 test is a generic Microsoft Windows 10 Enterprise build, not a manufacturer specific build, nor are manufacturer provided drivers installed.
Comment 63 Lv Zheng 2017-01-18 02:03:15 UTC
(In reply to Chris Murphy from comment #62)
> (In reply to Chen Yu from comment #61)
> > @Christ OK please let me confirm:
> > 1. by echo mem > /sys/power/state, the system will suspend normally,
> however
> > it will be woken up automatically 2-3 seconds later, then the system keeps
> > alive.

The system shouldn't be woken up automatically, this is a regression.

> > 2. by closing the lid, the system will suspend normally, however it will be
> > woken up automatically 2-3 seconds later, then suspend again, then 2-3
> > seconds later woken up, and so on.

Without considering the auto wakeup, this looks normal IMO.
We needn't track this issue in this regression report.

> 
> Both are correct; but only since kernel 4.8. Also the Windows 10 test is a
> generic Microsoft Windows 10 Enterprise build, not a manufacturer specific
> build, nor are manufacturer provided drivers installed.

So let me ask for a test without pressing any power button:

1. apply attachment 248411 [details] and build the kernel with CONFIG_ACPI_DEBUG=y.
2. boot the kernel with "acpi.debug_level=0x08000000 acpi.debug_layer=0x00000006 log_buf_len=16M dyndbg="file ec.c +p".
3. echo mem > /sys/power/state, and wait until the system is woken up.
4. obtain the dmesg output and upload it here.

Thanks in advance
Lv
Comment 64 Chris Murphy 2017-01-18 03:40:27 UTC
Created attachment 252241 [details]
dmesg c63
Comment 65 Lv Zheng 2017-01-18 03:50:29 UTC
GPE 0x6F is still the only problem wakes the system up:
[   80.303111]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 68-6F: Status=00, Enable=A0, RunEnable=A0, WakeEnable=A0

"Status" is not set before handing the controlship from Linux to BIOS.

...

[   80.302655]     hwgpe-0614 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 68-6F: Status=88, Enable=00, RunEnable=A0, WakeEnable=A0

"Status" is set after re-obtaining the controlship.
Comment 66 Lv Zheng 2017-01-18 03:53:24 UTC
IMO, the only way to make sure the commit is the true culprit is to revert it on top of the bad kernel.

So could you try:
$ git revert e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
Instead of:
$ git checkout e6cbd7f2efb433d717af72aa8510a9db6f7a7e05~
And test it again to see if this is reproducible.

Thanks
Lv
Comment 67 Lv Zheng 2017-01-18 03:54:24 UTC
Please also upload lsmod/lspci/lsusb output here.

Thanks
Lv
Comment 68 Chris Murphy 2017-01-18 04:01:06 UTC
[chris@f25h linux]$ git reset --hard
HEAD is now at 69973b8 Linux 4.9
[chris@f25h linux]$ git revert e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
error: could not revert e6cbd7f... mm, page_alloc: remove fair zone allocation policy
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'

[chris@f25h linux]$ git checkout tags/v4.8
Note: checking out 'tags/v4.8'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at c8d2bc9... Linux 4.8
[chris@f25h linux]$ git revert e6cbd7f2efb433d717af72aa8510a9db6f7a7e05
error: could not revert e6cbd7f... mm, page_alloc: remove fair zone allocation policy
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
Comment 69 Lv Zheng 2017-01-18 05:28:50 UTC
I'll paste a reversion patch for HEAD=69973b8 here.

Thanks
Comment 70 Lv Zheng 2017-01-18 05:40:25 UTC
Is it possible to apply attachment 248411 [details] to a latest known good kernel.
And do the following test so that we can learn what's the difference between the bad and the good kernels:

1. reset kernel head to a known good kernel
2. apply attachment 248411 [details] and build the kernel with CONFIG_ACPI_DEBUG=y.
3. boot the kernel with "acpi.debug_level=0x08000000 acpi.debug_layer=0x00000006 log_buf_len=16M dyndbg="file ec.c +p".
4. echo mem > /sys/power/state, and wait until the system is woken up.
5. obtain the dmesg output (dmesg-good.log) and upload it here.

Thanks
Lv
Comment 71 Lv Zheng 2017-01-18 06:29:23 UTC
Created attachment 252261 [details]
[PATCH] Revert bisected culprit

Please try to apply this patch instead of reverting the culprit on top of v4.9 tag.
Then build and boot the kernel to see if the problem persistes.

Thanks
Lv
Comment 72 Lv Zheng 2017-01-18 06:31:45 UTC
So I just asked for 2 tests:
c70: GPE suspend/resume info for a good kernel.
c71: Confirm if the bisected one is the culprit.
I'm sorry if my requests are not clear enough. :)

Thanks in advance and best regards
Lv
Comment 73 Chris Murphy 2017-01-18 06:56:16 UTC
HEAD is now at 523d939... Linux 4.7
[chris@f25h linux]$ git apply ~/a248411.patch
error: patch failed: drivers/acpi/acpica/aclocal.h:497
error: drivers/acpi/acpica/aclocal.h: patch does not apply
[chris@f25h linux]$
Comment 74 Chris Murphy 2017-01-18 07:03:17 UTC
c71: kernel is building with patch 252261 now
c67: lspci already attached, will attach lsmod and lsusb
Comment 75 Chris Murphy 2017-01-18 07:05:06 UTC
Created attachment 252271 [details]
lsmod
Comment 76 Chris Murphy 2017-01-18 07:05:33 UTC
Created attachment 252281 [details]
lsusb
Comment 77 Lv Zheng 2017-01-18 07:27:22 UTC
(In reply to Chris Murphy from comment #73)
> HEAD is now at 523d939... Linux 4.7
> [chris@f25h linux]$ git apply ~/a248411.patch
> error: patch failed: drivers/acpi/acpica/aclocal.h:497
> error: drivers/acpi/acpica/aclocal.h: patch does not apply
> [chris@f25h linux]$

OK, I'll check it.

Thanks
Lv
Comment 78 Chris Murphy 2017-01-18 07:48:08 UTC
Created attachment 252301 [details]
make failure 252261

make fails using c71 patch 252261 (make works after using git apply --reverse)
Comment 79 Lv Zheng 2017-01-18 08:03:56 UTC
(In reply to Lv Zheng from comment #77)
> (In reply to Chris Murphy from comment #73)
> > HEAD is now at 523d939... Linux 4.7
> > [chris@f25h linux]$ git apply ~/a248411.patch
> > error: patch failed: drivers/acpi/acpica/aclocal.h:497
> > error: drivers/acpi/acpica/aclocal.h: patch does not apply
> > [chris@f25h linux]$
> 
> OK, I'll check it.
> 
> Thanks
> Lv

Couldn't see any problems in applying this patch via quilt.
So one of the following commands should work:
$ patch -p1 < <patch file>
or
$ git apply --ignore-witespace <patch file>
Please give it a try.
Comment 80 Lv Zheng 2017-01-18 08:07:44 UTC
(In reply to Chris Murphy from comment #78)
> Created attachment 252301 [details]
> make failure 252261
> 
> make fails using c71 patch 252261 (make works after using git apply
> --reverse)

The error doesn't seem to be relevant.
For c71 test, please

$ git reset 69973b8 --hard
$ patch -p1 < <attachment 252261 [details]>
$ make -j64 | tee build.log 2>&1
$ sudo make modules_install install

I've already confirmed locally that the kernel can be booted by doing the above before posting the patch.
So it should work.
If it's still not working, then:
$ git reset 69973b8 --hard
$ patch -p1 < <attachment 252261 [details]>
$ make clean
$ make -j64 | tee build.log 2>&1
$ sudo make modules_install install

Thanks
Lv
Comment 81 Chris Murphy 2017-01-18 10:05:48 UTC
c71/c80 test, kernel with patch 252261 does not have the problem. Laptop suspends with power button or laptop lid close, remains suspended. Comes out of suspension either with space bar key press, power button press, or if lid closed by opening the lid.

I'll do c70 test tomorrow.
Comment 82 Chris Murphy 2017-01-18 18:09:57 UTC
Created attachment 252361 [details]
dmesg good for c70

Per c70 instructions.
Comment 83 Lv Zheng 2017-01-19 01:59:47 UTC
(In reply to Chris Murphy from comment #81)
> c71/c80 test, kernel with patch 252261 does not have the problem. Laptop
> suspends with power button or laptop lid close, remains suspended. Comes out
> of suspension either with space bar key press, power button press, or if lid
> closed by opening the lid.

This proves that the bisected commit is really the culprit.
It may just have made something non-worked working now.
I'll check c70 result.

> 
> I'll do c70 test tomorrow.

Thanks
Lv
Comment 84 Lv Zheng 2017-01-19 02:09:06 UTC
For the c70 test, the power button wakeup GPE is not flagged:
[   74.904029]     hwgpe-0601 hw_save_wokenup_gpe_bl: Read registers for wakeup GPE 68-6F: Status=08, Enable=00, RunEnable=A0, WakeEnable=A0

And I couldn't see what was the cause of the wakeup.
How did you wake the system up?
By opening lid or RTC?

Thanks
Lv
Comment 85 Lv Zheng 2017-01-19 02:30:00 UTC
Linking original commit here:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=81c0a2bb
Comment 86 Chris Murphy 2017-01-19 02:58:04 UTC
Used 'echo mem > /sys/power/state' to suspend with lid left open. Pressed space bar to wake. I did separately test closing the lid and opening the lid, both work as expected; and I also tested power button to suspend and wake, both of those worked as expected.
Comment 87 Zhang Rui 2017-01-19 03:40:15 UTC
Summary:
currently, the problem is that, with the commit applied, there is a bogus power button event fired when the platform enters S3, and this results in an immediately system wakeup, and systemd puts the system into S3 again because lid is closed, and results in following bogus power button event... (Note that user never presses the power button)

So, we need to make clear what difference this commit brings, and why firmware generates the bogus power button event...

cc @ying.huang@intel.com, @fengguang.wu@intel.com to see if we have more ideas
Comment 88 Chen Yu 2017-01-20 03:40:00 UTC
@Christ, please help check on the problematic kernel with patch c70 by:
echo freeze > /sys/power/state
and wait to see if the system would be woken up automatically 2-3s later?

Since the driver hooks are almost the same for suspend to idle and suspend to ram, we can narrow down whether the bad commit has impact the driver, or the ACPI/BIOS power management suspend process.
Comment 89 Chris Murphy 2017-01-20 18:14:31 UTC
Created attachment 252651 [details]
dmesg c88

Different behavior: this command makes the backlight turn off, the power button remains illuminated and does not flash, pressing any keyboard keys does not resume. Resume only happened once I pressed power button.
Comment 90 Chris Murphy 2017-01-21 03:26:41 UTC
Created attachment 252681 [details]
dmesg c88 2

Replacing the earlier one due to a boot param typo relating to the debugging; described behavior remains the same.
Comment 91 Len Brown 2017-01-24 01:14:09 UTC
> HP HP Spectre Notebook/81A0, BIOS F.26 08/11/2016
> HP HP Spectre Notebook/81A0, BIOS F.30 12/15/2016

This sure looks like a BIOS bug to me.

ie. a random change in Linux has made a difference
in memory footprint that should have nothing to do with SMM,
but for some reason, it confuses SMM after we enter
or resume from S3.

Looks like a new machine, which is still getting BIOS updates.
Try the latest.  Complain to HP.  See if another
random change, either in system configuration or BIOS SETUP
will make the problem at hand go away.

from comment #16, the most reasonable workaround
on this system may be to not use the power button
as a wakeup source:

echo PWRB > /proc/acpi/wakeup
Comment 92 Chris Murphy 2017-01-24 01:35:43 UTC
(In reply to Len Brown from comment #91)
> > HP HP Spectre Notebook/81A0, BIOS F.26 08/11/2016
> > HP HP Spectre Notebook/81A0, BIOS F.30 12/15/2016
> 
> This sure looks like a BIOS bug to me.

That sounds speculative rather than conclusive.


> ie. a random change in Linux has made a difference
> in memory footprint that should have nothing to do with SMM,
> but for some reason, it confuses SMM after we enter
> or resume from S3.

Should not affect SMM, or does not affect SMM?

> 
> Looks like a new machine, which is still getting BIOS updates.
> Try the latest.

F.30 is the latest.

>  Complain to HP.  See if another
> random change, either in system configuration or BIOS SETUP
> will make the problem at hand go away.

It's a laptop, the configuration options are pretty limited, so far the problem happens whether power is connect or not, USB flash drive is connect or not. IT happens with and without UEFI Secure Boot enabled, and regardless of boot order. With or without TPM enabled. The firmware setup options are rather limited.

Of course I can open up a ticket with HP, and point to this long bug thread. Color me shocked if they don't say they don't support Linux on this hardware, and by the way the laptop doesn't have this problem whether it runs HP provided Windows blob, or installed with a stock Microsoft Windows installer, and it's obviously a Linux kernel regression.

So if someone has a really clear summary of exactly what the problem is in the firmware, that would give me a lot more leverage when going to HP asking for a firmware bug fix.

Note You need to log in before you can comment on or make changes to this bug.