Bug 59401 - Atom-based machine resumes immediately from S3 suspend
Summary: Atom-based machine resumes immediately from S3 suspend
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake
Hardware: All Linux
Importance: P1 high
Assignee: acpi_power-sleep-wake
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-06-07 08:00 UTC by Zoltan Boszormenyi
Modified: 2014-10-28 05:28 UTC

See Also:
Kernel Version: v3.7+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (44.20 KB, text/plain)
2013-06-07 08:07 UTC, Zoltan Boszormenyi
Kernel .config (78.24 KB, application/octet-stream)
2013-06-07 08:08 UTC, Zoltan Boszormenyi
Verbose lspci with device names (103.60 KB, text/plain)
2013-06-07 08:08 UTC, Zoltan Boszormenyi
Verbose lspci with vendor:device IDs (101.77 KB, text/plain)
2013-06-07 08:08 UTC, Zoltan Boszormenyi
acpidump after boot (145.03 KB, application/octet-stream)
2013-06-07 15:23 UTC, Zoltan Boszormenyi
acpidump after suspend (145.03 KB, application/octet-stream)
2013-06-07 15:23 UTC, Zoltan Boszormenyi
dmesg after two suspend/resume cycles (69.41 KB, text/plain)
2013-07-15 12:52 UTC, Zoltan Boszormenyi
kernel log from 3.13-rc3+ with very verbose ACPI messages (1.86 MB, text/plain)
2013-12-10 14:14 UTC, Zoltan Boszormenyi

Description Zoltan Boszormenyi 2013-06-07 08:00:47 UTC
We are working on an Intel Atom-based embedded PC and have to make suspend-to-RAM work but can't seem to succeed.

The symptom is that the machine quite often resumes immediately after pm-suspend, sometimes in more than 20 out of 50 attempts.
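
A failure rate like this can be counted with a small loop. The following is only a minimal sketch, not the procedure actually used for this report: count_immediate_resumes is a hypothetical helper, and it assumes that the suspend command returns only after the machine has resumed and that a wake cycle shorter than a chosen number of seconds counts as an immediate resume.

```shell
#!/bin/sh
# Sketch: run a suspend command repeatedly and count the cycles that come
# back sooner than expected. count_immediate_resumes is a hypothetical
# helper name, not an existing tool.
#
# usage: count_immediate_resumes ATTEMPTS MIN_SECONDS COMMAND [ARGS...]
count_immediate_resumes() {
    attempts=$1
    min_seconds=$2
    shift 2
    immediate=0
    n=0
    while [ "$n" -lt "$attempts" ]; do
        start=$(date +%s)
        # The suspend command itself; its output is not interesting here.
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        # A cycle shorter than MIN_SECONDS is treated as an immediate resume.
        # Note that a command that fails right away is counted the same way.
        if [ $((end - start)) -lt "$min_seconds" ]; then
            immediate=$((immediate + 1))
        fi
        n=$((n + 1))
    done
    echo "$immediate"
}
```

One possible invocation, as root, would be "count_immediate_resumes 50 30 rtcwake -m mem -s 60", assuming rtcwake from util-linux is installed and the RTC alarm is left enabled as a wake source on the board.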

We tried 3.7.10, 3.9.4, 3.10-rc[234] and the linux-next branch from the
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
repository.

The attached dmesg is from linux-pm/linux-next (commit 01adc9fcbd94b71f69681913baed802f94e2f293) plus the latest drm-fixes patchset posted by Dave Airlie in http://marc.info/?l=linux-kernel&m=137049641300788&w=2 and was taken after I had disabled all wakeup devices via their sysfs files:

# find /sys -name "wakeup" | while read -r i ; do \
    DEV="$i" ; ENAB="$(cat "$i")" ; \
    if [ "$ENAB" = "enabled" ]; then \
        echo "disabling $DEV" ; \
        echo "disabled" >"$DEV" ; \
    fi ; \
done

However, the attached dmesg still has:

pcieport 0000:00:1c.1: System wakeup enabled by ACPI

This device is disabled by default in /sys/devices/pci0000\:00/0000\:00\:1c.1/power/wakeup so I have tried explicitly enabling then disabling it. Still, the next suspend resulted in an immediate resume.

We have cross-checked suspend-resume using Windows XP and Windows 7
and these OSs are able to properly suspend the machine 50 times out of
50 attempts with Intel's official driver for the GMA3150.

How can I make S3 suspend work reliably? Is something missing from our kernel .config?
Comment 1 Zoltan Boszormenyi 2013-06-07 08:07:44 UTC
Created attachment 103771 [details]
dmesg

dmesg after an immediate resume. The first suspend after booting up resulted in immediate resuming.
Comment 2 Zoltan Boszormenyi 2013-06-07 08:08:05 UTC
Created attachment 103781 [details]
Kernel .config
Comment 3 Zoltan Boszormenyi 2013-06-07 08:08:33 UTC
Created attachment 103791 [details]
Verbose lspci with device names
Comment 4 Zoltan Boszormenyi 2013-06-07 08:08:57 UTC
Created attachment 103801 [details]
Verbose lspci with vendor:device IDs
Comment 5 Aaron Lu 2013-06-07 08:58:12 UTC
(In reply to comment #0)
> However, the attached dmesg still has:
> 
> pcieport 0000:00:1c.1: System wakeup enabled by ACPI
> 
> This device is disabled by default in
> /sys/devices/pci0000\:00/0000\:00\:1c.1/power/wakeup so I have tried
> explicitly enabling then disabling it. Still, the next suspend resulted in
> an immediate resume.

From the dmesg, 1c.1 is the PCI bridge that connects to the Ethernet controller. Is it possible that this is related to Wake-on-LAN? Is there a BIOS setup option for WOL? Thanks.
Comment 6 Zoltan Boszormenyi 2013-06-07 09:53:01 UTC
Yes, there is a Wake-on-LAN option in the BIOS and it is currently enabled.
However, I previously tested with it disabled and that didn't help.

I will now try with both this BIOS setting disabled and all devices disabled via the sysfs wakeup files.
Comment 7 Zoltan Boszormenyi 2013-06-07 09:57:24 UTC
No, it didn't help either.

Also, after resume, some devices regain their wakeup-enabled state:

# find /sys -name "wakeup" | while read -r i ; do DEV="$i" ; ENAB="$(cat "$i")" ; if [ "$ENAB" = "enabled" ]; then echo "$DEV" ; fi ; done
/sys/devices/pci0000:00/0000:00:1c.1/0000:02:00.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/usb3/3-2/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.2/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.3/power/wakeup
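
To see exactly which devices flip back, the enabled set can be recorded before suspend and compared with the set after resume. This is a minimal sketch under the assumption that every relevant wakeup attribute is a regular file named "wakeup" under /sys; list_enabled_wakeups is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print every wakeup file whose content is "enabled", sorted, so
# that two snapshots can be compared with diff. list_enabled_wakeups is a
# hypothetical helper name.
#
# usage: list_enabled_wakeups [ROOT]    (ROOT defaults to /sys)
list_enabled_wakeups() {
    find "${1:-/sys}" -type f -name wakeup 2>/dev/null | sort |
    while IFS= read -r f; do
        # Unreadable files are skipped silently.
        state=$(cat "$f" 2>/dev/null)
        if [ "$state" = "enabled" ]; then
            echo "$f"
        fi
    done
}
```

Running "list_enabled_wakeups >before.txt" before the suspend, "list_enabled_wakeups >after.txt" after the resume and then "diff before.txt after.txt" would show the devices whose state changed across the cycle.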
Comment 8 Zoltan Boszormenyi 2013-06-07 10:28:15 UTC
The previous dmesg contains a kernel command-line parameter left over from testing:

acpi_sleep=nonvs,old_ordering

but it doesn't make any difference if it's there or not.
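
Whether such a parameter is active in a given boot can be checked from /proc/cmdline rather than from memory. A minimal sketch follows; cmdline_has is a hypothetical helper name, and the optional second argument exists only so that a saved copy of the command line can be checked as well.

```shell
#!/bin/sh
# Sketch: succeed if a kernel command-line parameter is present, either as
# a bare word ("quiet") or with a value ("acpi_sleep=..."). cmdline_has is
# a hypothetical helper name.
#
# usage: cmdline_has PARAM [FILE]    (FILE defaults to /proc/cmdline)
cmdline_has() {
    awk -v p="$1" '
        {
            for (i = 1; i <= NF; i++)
                if ($i == p || index($i, p "=") == 1)
                    found = 1
        }
        END { exit !found }
    ' "${2:-/proc/cmdline}"
}
```

For example, "cmdline_has acpi_sleep && echo present" prints "present" only when the running kernel was booted with an acpi_sleep= option.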
Comment 9 Lan Tianyu 2013-06-07 11:25:50 UTC
Please provide the output of "cat /proc/acpi/wakeup".
Comment 10 Zoltan Boszormenyi 2013-06-07 12:02:32 UTC
After booting up:

# cat /proc/acpi/wakeup 
Device	S-state	  Status   Sysfs node
P0P1	  S4	*disabled  pci:0000:00:1e.0
PS2K	  S4	*enabled   pnp:00:03
USB0	  S4	*enabled   pci:0000:00:1d.0
USB1	  S4	*enabled   pci:0000:00:1d.1
USB2	  S4	*enabled   pci:0000:00:1d.2
USB3	  S4	*enabled   pci:0000:00:1d.3
EUSB	  S4	*enabled   pci:0000:00:1d.7
P0P4	  S4	*disabled  pci:0000:00:1c.0
P0P5	  S4	*disabled  pci:0000:00:1c.1
P0P6	  S4	*disabled
P0P7	  S4	*disabled  pci:0000:00:1c.3
P0P8	  S4	*disabled
P0P9	  S4	*disabled

After touching the sysfs wakeup files with the script below:

# cat /proc/acpi/wakeup
Device	S-state	  Status   Sysfs node
P0P1	  S4	*disabled  pci:0000:00:1e.0
PS2K	  S4	*disabled  pnp:00:03
USB0	  S4	*disabled  pci:0000:00:1d.0
USB1	  S4	*disabled  pci:0000:00:1d.1
USB2	  S4	*disabled  pci:0000:00:1d.2
USB3	  S4	*disabled  pci:0000:00:1d.3
EUSB	  S4	*disabled  pci:0000:00:1d.7
P0P4	  S4	*disabled  pci:0000:00:1c.0
P0P5	  S4	*disabled  pci:0000:00:1c.1
P0P6	  S4	*disabled
P0P7	  S4	*disabled  pci:0000:00:1c.3
P0P8	  S4	*disabled
P0P9	  S4	*disabled
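
The two listings can also be reduced to just the device names that are still enabled. This is a minimal sketch assuming the four-column layout shown above (device, S-state, status, sysfs node, with one header line); acpi_enabled_wakeups is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the ACPI device names marked "*enabled" in a listing with
# the /proc/acpi/wakeup layout shown above. acpi_enabled_wakeups is a
# hypothetical helper name.
#
# usage: acpi_enabled_wakeups [FILE]    (FILE defaults to /proc/acpi/wakeup)
acpi_enabled_wakeups() {
    # NR > 1 skips the header line; the status is the third column.
    awk 'NR > 1 && $3 == "*enabled" { print $1 }' "${1:-/proc/acpi/wakeup}"
}
```

On the boot-time listing above this would print PS2K, USB0, USB1, USB2, USB3 and EUSB, one per line. Writing one of these names back to /proc/acpi/wakeup (for example "echo PS2K >/proc/acpi/wakeup") toggles that device's state.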

The initially enabled devices in sysfs:

# find /sys -name "wakeup" | while read -r i ; \
do \
    DEV="$i" ; \
    ENAB="$(cat "$i")" ; \
    if [ "$ENAB" = "enabled" ]; then \
        echo "$DEV" ; \
        echo "disabled" >"$DEV" ; \
    fi ; \
done
/sys/devices/pnp0/00:03/power/wakeup
/sys/devices/pci0000:00/0000:00:1c.1/0000:02:00.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/usb3/3-2/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.2/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.3/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.7/power/wakeup
/sys/devices/LNXSYSTM:00/LNXPWRBN:00/power/wakeup

All devices with a wakeup file:

# find /sys -name "wakeup"
/sys/devices/pnp0/00:03/power/wakeup
/sys/devices/pnp0/00:06/tty/ttyS0/power/wakeup
/sys/devices/pnp0/00:07/tty/ttyS1/power/wakeup
/sys/devices/pnp0/00:08/tty/ttyS2/power/wakeup
/sys/devices/pnp0/00:09/tty/ttyS3/power/wakeup
/sys/devices/pnp0/00:0a/tty/ttyS4/power/wakeup
/sys/devices/pci0000:00/0000:00:1b.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1c.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1c.1/power/wakeup
/sys/devices/pci0000:00/0000:00:1c.1/0000:02:00.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1c.3/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.0/usb2/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/usb3/3-1/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/usb3/3-2/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/usb3/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.1/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.2/usb4/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.2/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.3/usb5/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.3/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.7/usb1/power/wakeup
/sys/devices/pci0000:00/0000:00:1d.7/power/wakeup
/sys/devices/pci0000:00/0000:00:1e.0/power/wakeup
/sys/devices/pci0000:00/0000:00:1f.2/power/wakeup
/sys/devices/platform/serial8250.3/tty/ttyS5/power/wakeup
/sys/devices/LNXSYSTM:00/LNXPWRBN:00/power/wakeup
Comment 11 Lan Tianyu 2013-06-07 13:32:39 UTC
Please attach the output of acpidump.
Judging by the previous output, all wakeup sources should be disabled.
Comment 12 Zoltan Boszormenyi 2013-06-07 15:23:02 UTC
Created attachment 103831 [details]
acpidump after boot
Comment 13 Zoltan Boszormenyi 2013-06-07 15:23:54 UTC
Created attachment 103841 [details]
acpidump after suspend
Comment 14 Zoltan Boszormenyi 2013-06-07 15:29:09 UTC
The two attached acpidump outputs differ from each other.

Whether the wakeup sources in sysfs are disabled or not makes no difference to their md5sums.

There is one more difference, though. When acpidump is run after boot, it prints this message once:

# ./acpidump >acpidump-after-boot-1.out
Wrong checksum for generic table!

After suspend/resume the message appears twice:

# acpidump >acpidump-after-suspend.out
Wrong checksum for generic table!
Wrong checksum for generic table!
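
For repeating this comparison across several runs, a byte-wise check is enough. The sketch below uses cmp instead of md5sum, which is a swapped-in technique and not what was used for the statement above; dumps_match is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report whether two acpidump output files are byte-for-byte the
# same. Uses cmp rather than md5sum; dumps_match is a hypothetical helper
# name.
#
# usage: dumps_match FILE_A FILE_B
dumps_match() {
    if cmp -s "$1" "$2"; then
        echo identical
    else
        echo different
    fi
}
```

For example, "dumps_match acpidump-after-boot-1.out acpidump-after-suspend.out" would be expected to print "different" for the two attachments.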
Comment 15 Zoltan Boszormenyi 2013-06-26 16:34:00 UTC
Ping.

I have provided the acpidump output almost 3 weeks ago. Is it the proper one?
I used the kernel acpidump tool from the kernel GIT, in the
linux/tools/power/acpi directory.
Comment 16 Aaron Lu 2013-06-27 00:51:01 UTC
(In reply to comment #15)
> Ping.
> 
> I have provided the acpidump output almost 3 weeks ago. Is it the proper one?
> I used the kernel acpidump tool from the kernel GIT, in the
> linux/tools/power/acpi directory.

Yes, it is the proper one, thanks. At the moment I have no idea why this happens, sorry.
Comment 17 Aaron Lu 2013-07-09 02:08:01 UTC
Can you please try building the kernel with CONFIG_PM_RUNTIME not set? Thanks.
Comment 18 Zoltan Boszormenyi 2013-07-10 07:01:42 UTC
(In reply to Aaron Lu from comment #17)
> Can you please try building the kernel with CONFIG_PM_RUNTIME not set?
> Thanks.

I can do that next week. ATM I am on holiday and away from my office desktop.
Comment 19 Zoltan Boszormenyi 2013-07-15 12:46:01 UTC
I have tested the released 3.10.1 plus an extra patch from

https://lkml.org/lkml/2013/7/11/661

to avoid the suspend regression discussed on LKML in

http://marc.info/?t=137380792100003&r=1&w=2

That happened with this machine, too.

I tried both with and without PM_RUNTIME being set and both kernels gave me the same result: the first suspend was successful but the second one resulted in an immediate resume.
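
Which of the two kernels is being tested can be confirmed from its .config. A minimal sketch, assuming the usual .config syntax ("CONFIG_X=value" or "# CONFIG_X is not set"); kconfig_state is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the value of a Kconfig symbol from a .config style file,
# "not set" if it is explicitly disabled, or "absent" if it is not
# mentioned at all. kconfig_state is a hypothetical helper name.
#
# usage: kconfig_state SYMBOL CONFIG_FILE
kconfig_state() {
    sym=$1
    file=$2
    if grep -q "^$sym=" "$file"; then
        grep "^$sym=" "$file" | head -n 1 | cut -d= -f2
    elif grep -q "^# $sym is not set" "$file"; then
        echo "not set"
    else
        echo "absent"
    fi
}
```

For example, "kconfig_state CONFIG_PM_RUNTIME .config" run in the build directory prints "y" or "not set" for the two builds compared here.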

dmesg for the "PM_RUNTIME not set" run will be attached. It contains the warning messages mentioned previously in https://bugs.freedesktop.org/show_bug.cgi?id=65497 and I think the two problems share a common root cause.

I have also tested (or tried to test) the following, with and without CONFIG_PM_RUNTIME:

- 3.11-rc1 plus the ext4 fixes landed in GIT very soon after rc1
  plus the above patch
- linux-next branch from the linux-pm repository from
  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
  As of writing this, this branch contains 3.11-rc1 plus some fixes including
  the above patch.

The result in each configuration is similar to the regression described in the mail above. After "echo -n 'mem' >/sys/power/state" the machine seems to go into suspend, but the power LED stays lit and the machine cannot be resumed by pressing the power button. Holding the power button for 4 seconds turns the computer off.
Comment 20 Zoltan Boszormenyi 2013-07-15 12:52:57 UTC
Created attachment 106888 [details]
dmesg after two suspend/resume cycles

dmesg for 3.10.1
Comment 21 Zoltan Boszormenyi 2013-12-10 14:14:43 UTC
Created attachment 117961 [details]
kernel log from 3.13-rc3+ with very verbose ACPI messages

I have just tested kernel version 3.13.0-rc3-00157-g17b2112-dirty. The "dirty" suffix is due to the one-line patch that increases the LVDS timeout from 1000 ms to 2000 ms. I added the kernel command-line options acpi.debug_layer=0xffffffff acpi.debug_level=0xffffffff.

Unfortunately, a lot of log messages were lost before klogd actually started, which is why the log starts so strangely. This is with CONFIG_LOG_BUF_SHIFT=21, the maximum.

The logs should contain traces of two suspend-resume cycles; the second suspend resumed immediately.
Comment 22 Zoltan Boszormenyi 2013-12-10 14:20:38 UTC
The above-mentioned patch is at
https://bugs.freedesktop.org/show_bug.cgi?id=65497#c5
Comment 23 Zoltan Boszormenyi 2013-12-10 18:33:32 UTC
It seems the patches in the linux-pm/linux-next tree make suspend/resume work reliably. The tree is at https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/log/?h=linux-next , currently at commit efef6dba52777eac8bf2a866caa2d8d80f4e84b0.

I have tested this machine using AHCI mode, and 50 out of 50 suspend attempts were successful without an immediate resume. I will also test this kernel using IDE mode (both "Enhanced" and "Compatible") to see whether there is any difference when using the ata_piix driver in Linux.

As far as I know, the linux-next tree is targeted for 3.14, which won't be released before May 2014, but we can use the patches over 3.13.
Comment 24 Len Brown 2014-10-28 05:28:20 UTC
> I have tested this machine using AHCI mode and 50 attempts out of 50
> suspends  were successful without immediate resuming.

Great!
Unfortunately, it is unclear from the above which patch helped.
If it is still an issue with the upstream kernel, please re-open.
