Bug 195897 - S3 isn't supported and hangs, mem_sleep should be mapped to s2idle by default - Dell Latitude 7275
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake
Hardware: Intel Linux
Importance: P1 normal
Assignee: Rafael J. Wysocki
URL: http://en.community.dell.com/techcent...
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-28 15:36 UTC by Jérôme de Bretagne
Modified: 2017-11-12 21:05 UTC

See Also:
Kernel Version: 4.13 4.12 4.11 4.10 4.9 3.16
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump for Dell Latitude 7275 with BIOS 1.1.31 (954.20 KB, text/plain)
2017-05-28 15:43 UTC, Jérôme de Bretagne
dmesg after pm_tracing kernel 4.12-rc2 (51.01 KB, text/plain)
2017-05-28 15:49 UTC, Jérôme de Bretagne
dmesg on normal boot - kernel 4.12-rc2 (69.70 KB, text/plain)
2017-05-28 16:53 UTC, Jérôme de Bretagne
output of lspci for Dell Latitude 7275 - kernel 4.12-rc2 (10.97 KB, text/plain)
2017-05-28 17:03 UTC, Jérôme de Bretagne
acpidump for Dell Latitude 7275 with BIOS 1.1.20 (960.07 KB, text/plain)
2017-06-07 23:33 UTC, Jérôme de Bretagne

Description Jérôme de Bretagne 2017-05-28 15:36:56 UTC
Dell Latitude 7275 is a 2-in-1 detachable laptop with an Intel Core-M processor (6th gen "Skylake"). There is a similar consumer product branded XPS 12 (9250). 

Attempting to suspend and resume this system fails, reproducibly 100%.

At first, the suspend step looked successful, the built-in screen and an external display both switching off, when triggered by:
# echo mem > /sys/power/state

However, comparing the behavior with a suspend from Windows shows differences: no LED flashing on Linux (LED remains blank); a USB-based network adapter would still remain powered after a suspend from Windows but not from Linux.

Trying to resume by pressing the power button or the Windows button (on the tablet side), or by using the attached keyboard, doesn't work. Only a very long press on the power button lets the device leave that state (forced shutdown?), and then another press on the power button reboots the device from the BIOS.

pm-hibernate works reliably on the other hand.

I've tried many different kernel versions, 3.16, 4.9, 4.11 (all 3 from Debian) and an upstream 4.12-rc2 that I've compiled with PM_TRACE_RTC and PM_DEBUG enabled.

With pm_trace enabled, here is the Magic number output that dmesg reliably shows on the next forced restart:

[    3.010473]   Magic number: 1:782:177
[    3.010567] acpi device:0d: hash matches


I've also tried to reduce the interaction with most dynamic modules by rmmoding most of them and blacklisting a few other ones. Here is the minimal set I got:

$ lsmod
Module                  Size  Used by
nls_ascii              16384  1
nls_cp437              20480  1
vfat                   20480  1
fat                    65536  1 vfat
evdev                  24576  1
autofs4                40960  2
ext4                  585728  1
crc16                  16384  1 ext4
jbd2                  102400  1 ext4
fscrypto               28672  1 ext4
mbcache                16384  1 ext4
crc32c_intel           24576  2
nvme                   28672  3
nvme_core              40960  5 nvme

but the behavior is exactly the same as when fully loaded. Here are a few other outputs:

# cat /sys/power/state
freeze mem disk

# cat /sys/power/mem_sleep 
s2idle [deep]

I've installed the latest Dell BIOS for this machine, 1.1.31 from mid-May, and I've tried reverting to 2 older versions (down to 1.1.26) without any difference at all.

What other outputs would be useful to help investigate this issue? Let me know and I'll do my best to provide them.

Thanks,
Jerome
Comment 1 Jérôme de Bretagne 2017-05-28 15:43:44 UTC
Created attachment 256749
acpidump for Dell Latitude 7275 with BIOS 1.1.31

Here is the result of acpidump.
Comment 2 Jérôme de Bretagne 2017-05-28 15:49:21 UTC
Created attachment 256751
dmesg after pm_tracing kernel 4.12-rc2

Showing the Magic number section:

[    2.866144]   Magic number: 5:463:177
[    2.866237] acpi device:0d: hash matches

This run was done with the minimal set of modules loaded, on kernel 4.12-rc2.
Comment 3 Jérôme de Bretagne 2017-05-28 16:53:07 UTC
Created attachment 256753
dmesg on normal boot - kernel 4.12-rc2

And here is the output of dmesg on a normal boot with kernel 4.12-rc2.

There are quite a few error messages, some of them ACPI-related:

$ dmesg | grep -i "error\|exception\|warning" | grep -v "load for iwlwifi"
[    0.068078] ACPI Error: [\_SB_.PCI0.SAT1] Namespace lookup failure, AE_NOT_FOUND (20170303/dswload-210)
[    0.068093] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20170303/psobject-241)
[    0.068261] ACPI Exception: AE_NOT_FOUND, (SSDT:IdeTable) while loading table (20170303/tbxfload-228)
[    0.087414] ACPI Error: 1 table load failures, 7 successful (20170303/tbxfload-246)
[    0.585932] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM
[    2.631369] i8042: Warning: Keylock active
[    4.056735] EXT4-fs (nvme0n1p6): re-mounted. Opts: errors=remount-ro
[    4.202120] ACPI Warning: SystemMemory range 0x00000000FE028000-0x00000000FE0281FF conflicts with OpRegion 0x00000000FE028000-0x00000000FE028207 (\_SB.PCI0.GEXP.BAR0) (20170303/utaddress-247)
[    4.204558] ACPI Warning: \_SB.IETM._ART: Return Package type mismatch at index 0 - found Integer, expected Reference (20170303/nspredef-297)
[    4.204587] ACPI Warning: \_SB.IETM._TRT: Return Package has no elements (empty) (20170303/nsprepkg-130)
[    4.228131] intel-lpss: probe of INT3446:00 failed with error -16
[    4.247849] soc_button_array: probe of INT33D2:00 failed with error -2
[    4.309380] Error: Driver 'pcspkr' is already registered, aborting...


I don't know if one of them could give a hint about the source of the issue.
Comment 4 Jérôme de Bretagne 2017-05-28 17:03:10 UTC
Created attachment 256755
output of lspci for Dell Latitude 7275 - kernel 4.12-rc2
Comment 5 Jérôme de Bretagne 2017-05-30 21:41:28 UTC
Identical behavior tested and reproduced on kernel 4.12-rc3, giving the exact same Magic number after pm_tracing:

[    2.916076]   Magic number: 1:782:177
[    2.916177] acpi device:0d: hash matches
Comment 6 Chen Yu 2017-06-05 09:24:34 UTC
Hi,
could you please boot into a minimal shell by appending "init=/bin/bash" to the kernel command line, and test with the different pm_test modes?

# cat /sys/power/pm_test
[none] core processors platform devices freezer
# echo freezer > /sys/power/pm_test
# echo mem > /sys/power/state
Wait for 5 seconds.
If that succeeds, try the next pm_test mode:
# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state
and so on, up to the "none" mode, to see if it works.

Please do not enable pm_trace when testing.
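
The sequence above can also be scripted. The following is only a minimal sketch, not something taken from the kernel tree: the function names and the POWER_DIR variable are made up for illustration, and POWER_DIR exists only so the loop can be dry-run against a scratch directory instead of the real /sys/power.

```shell
#!/bin/sh
# Sketch only: POWER_DIR is a hypothetical override for dry runs; it defaults
# to the real /sys/power.
POWER_DIR="${POWER_DIR:-/sys/power}"

# Run one suspend cycle at the given pm_test level. With a level other than
# "none", the kernel stops at that stage, waits a few seconds, and resumes.
pm_test_cycle() {
    echo "$1" > "$POWER_DIR/pm_test" || return 1
    echo mem > "$POWER_DIR/state"
}

# Walk the levels from shallowest to deepest and stop at the first failure.
pm_test_walk() {
    for level in freezer devices platform processors core none; do
        echo "pm_test level: $level"
        pm_test_cycle "$level" || { echo "failed at: $level"; return 1; }
    done
}
```

Calling pm_test_walk as root runs every level in turn. The final "none" level is a real suspend, so on an affected machine the script will simply never return from that step.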
Comment 7 Jérôme de Bretagne 2017-06-05 10:58:20 UTC
Hi,

Thanks for these instructions. Here are the results when booting 4.12-rc3 into a minimal shell (with pm_trace not enabled as you suggested):

# echo freezer > /sys/power/pm_test
# echo mem > /sys/power/state
Success

# echo devices > /sys/power/pm_test
# echo mem > /sys/power/state
Success

# echo platform > /sys/power/pm_test
# echo mem > /sys/power/state
Success

# echo processors > /sys/power/pm_test
# echo mem > /sys/power/state
Success

# echo core > /sys/power/pm_test
# echo mem > /sys/power/state
Success

# echo none > /sys/power/pm_test
# echo mem > /sys/power/state
Failure, the device can't be resumed. The power button only allows a forced halt of the system on a long press (about 6 s, it seems); the system can then be rebooted with another, medium-long press (about 1 s) that gives a small vibration.
Comment 8 Jérôme de Bretagne 2017-06-05 21:46:06 UTC
Hi,

I've investigated other parts, with the intuition that the behavior of the power button / EC may be the source of the issue. So here is some more feedback, hoping it can help.

When suspending in "s2idle" mode instead of the default "deep" value with:
 # echo freeze > /sys/power/state
the system can resume reliably but only with a *very* long press of the power button, for about 6 to 7 seconds.

It was quite surprising at first, especially since the system wakes up instantaneously from sleep on Windows with a usual short press.

Could the same long press also be what is expected to wake the system from the "deep" sleep state, except that in that case it ends up being interpreted as a forced shutdown?

Are there other ways to trigger a resume from "deep" sleep, to check whether it is the power button wake-up event that has an issue, and not the suspend step as I've suspected so far?

Thanks,
Jérome


P.S. I have tried to resume using an RTC alarm, but it doesn't seem to wake the system from sleep (either "s2idle" or "deep"), while it works fine from the suspend-to-disk state:
   rtc_cmos 00:01: RTC can wake from S4
Comment 9 Jérôme de Bretagne 2017-06-05 23:14:06 UTC
Hi again,

I've also seen the recent s2idle-dell-test branch created by Rafael J. Wysocki and it reminded me of changes I've seen in past BIOS updates for this system. In BIOS version 1.1.25 in particular, Dell had updated some behaviors as described here: https://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=FKMC9

Fixes: [...]
- Fix the system being reset if carrying in bag.

Enhancements: [...]
- Enhance power button behavior to avoid system reset triggered while putting in the bag.

There is no specific description of the actual modifications though. Since it seemed related to my assumption that there may be an issue with the power button resume event, I've done some tests by adapting Rafael's patch for the Latitude 7275 DMI values to see its effects.

I've then compared the behaviors when running the latest BIOS 1.1.31 and when running 1.1.20 (which was the previous version before 1.1.25 that introduced the above changes). I had to apply 1.1.20 from a recovery USB key btw as the downgrade from 1.1.25 was not supported using the official .exe... Here are the results:

* s2idle tests - BIOS 1.1.20

- kernel 4.12-rc3
Wake up on power long press of about 6-7s

- kernel 4.12-rc3 including the s2idle-dell-test patch (modified for the Latitude 7275)
Wake up on power short press   <-- main change detected


* s2idle tests - BIOS 1.1.31

- kernel 4.12-rc3
Wake up on power long press of about 6-7s

- kernel 4.12-rc3 including the s2idle-dell-test patch (modified for the Latitude 7275)
Wake up on power long press of about 6-7s


So no visible change with/without the patch on BIOS 1.1.31, but there was a positive difference on 1.1.20, as the patch made the s2idle wake-up trigger on a more usual short press. Too bad it doesn't work as-is on the latest BIOS revision.

Now, coming back to the original bug report, I wonder if there are ways to update / adapt the EC-based wakeup from ("deep") suspend-to-RAM, as this suspend state never worked in any of the 4 scenarios above.

Are there any logs / other inputs that would be useful to investigate this assumption?

Yu, should Rafael or someone else be CCed on that bug report maybe?

Thanks,
Jérome
Comment 10 Jérôme de Bretagne 2017-06-06 20:27:59 UTC
Same issue on the Latitude 7275 confirmed by another user on the Dell community forum: "it hangs pretty badly if you suspend it".

He got the same Magic number result when testing with pm_trace. URL added for reference.
Comment 11 Jérôme de Bretagne 2017-06-06 20:42:27 UTC
CCing Rafael to share the feedback in #9 about the 's2idle-dell-test' branch (modified for this Latitude 7275 of course) and maybe to get some other ideas/directions about how to investigate this Suspend-to-RAM issue.

The more interesting feedback so far seems to be the one in #8 about the s2idle power button behavior but that may be a wrong lead.

I'm willing to provide any other useful inputs, just let me know! The best would be for me to go through a git-bisect session but I haven't found a single working kernel version up to now, and I've tried many...

Thanks,
Jérome
Comment 12 Jérôme de Bretagne 2017-06-06 22:43:15 UTC
Hi,

I've finally found this similar thread for the Dell XPS 13 9365: https://bugzilla.kernel.org/show_bug.cgi?id=192591 , after reading Rafael's comment in his recent EC-based wakeup patch set for some Dell systems stating:

"on the 9365 ACPI S3 (suspend-to-RAM) is not expected to be used at all (the OS these systems ship with never exercises the ACPI S3 path) and suspend-to-idle is the only viable system suspend mechanism in there."

In reference to #82 in this 9365 bug report, I can confirm that the ACPI_S0_LOW_POWER_IDLE FADT flag is also set in this 7275 system:

   # grep "Low Power" facp.dsl
                      Low Power S0 Idle (V5) : 1

indicating that S0 should (must?) be used instead of S3 on the Latitude 7275 model. One main difference though is that S3 still seems to work properly on Dell 9365 while it hangs the Dell 7275.
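
For anyone who wants to repeat the flag check above on another machine, here is a minimal sketch. It assumes the FADT has already been disassembled to facp.dsl (for instance with acpidump -b followed by iasl -d facp.dat from the ACPICA tools); the function names are made up for illustration.

```shell
#!/bin/sh
# Print the value of the "Low Power S0 Idle" FADT flag from a disassembled
# FADT listing (facp.dsl). Prints nothing if the field is absent.
fadt_low_power_s0() {
    sed -n 's/^.*Low Power S0 Idle[^:]*:[[:space:]]*\([01]\)[[:space:]]*$/\1/p' "$1"
}

# Succeed only when the flag is set to 1.
supports_low_power_s0() {
    [ "$(fadt_low_power_s0 "$1")" = "1" ]
}
```

Something like "supports_low_power_s0 facp.dsl && echo flag set" then gives a yes/no answer. Note that the flag only says what the firmware advertises, not whether S3 actually works.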

Srinivas, could you maybe try to get the same confirmation for the Dell 7275 system as you've shared in #80 for the Dell 9365? That would be great.

If S3 is indeed not supported and never going to work on that model, what about changing the /sys/power/mem_sleep value to "[s2idle] deep" instead of "s2idle [deep]"? 

What about even making s2idle the new default for devices setting the ACPI_S0_LOW_POWER_IDLE FADT flag perhaps?

Thanks,
Jérôme
Comment 13 Srinivas Pandruvada 2017-06-06 23:06:28 UTC
Please test Rafael's patch on the Dell 7275 to see if suspend-to-idle works. If the low power S0 idle bit is set, we should also add this platform to the list.

Ultimately all platforms can rely on this bit, but till we have test data from several devices, it may not be safe.
Comment 14 Jérôme de Bretagne 2017-06-06 23:39:23 UTC
Hi Srinivas, I did already in fact, please see all the details in comment #9 above.

S3 is not safe currently on this system, while suspend-to-idle works fine. It just needs a very long press of about 6-7s to trigger the wake-up. This may be a good idea in some scenarios btw (when closing the lid, for example) on this 2-in-1 model since it has the power button on the tablet-portion side, so still on the outside even when the lid is closed. The default behavior should still be the usual short-press one, as is the case on Windows.

To sum up my testing (based on 4.12rc3 at the time), Rafael's patch made no difference on this Dell 7275 when running the latest BIOS 1.1.31 but worked as intended when running an older BIOS 1.1.20 (= short press to wakeup)! I'll try to test again on rc4 in the coming days.

Let me know what kind of other logs / inputs would be useful to better understand the events triggered by the power button on this system, and I'll share them.
Comment 15 Srinivas Pandruvada 2017-06-07 00:18:20 UTC
Can you comment directly on the mailing list with the result? There are some folks from Dell there, who may comment on why this platform has to be different.
Comment 16 Jérôme de Bretagne 2017-06-07 23:16:38 UTC
Done, here it is for reference:
http://marc.info/?l=linux-pm&m=149687680603166&w=2 

It gives exactly the same result on 4.12rc4 as it did with 4.12rc3 in comment #9.
Comment 17 Jérôme de Bretagne 2017-06-07 23:33:11 UTC
Created attachment 256913
acpidump for Dell Latitude 7275 with BIOS 1.1.20

Here is the result of acpidump once downgraded to BIOS 1.1.20.

The previous acpidump was taken with the current BIOS 1.1.31.
Comment 18 Jérôme de Bretagne 2017-06-17 15:45:52 UTC
For those facing the same issue, here is the reference to the interesting follow-up email discussion: http://marc.info/?l=linux-pm&m=149705422318487&w=2

To sum up:

- Mario Limonciello was able to get the confirmation that this system is only officially supporting Connected Standby / Modern Standby "CS/MS" / suspend-to-idle (low power S0 idle) but not suspend-to-RAM (S3)

- its latest BIOS 1.1.31 exposes both Low Power S0 Idle and S3 through ACPI

- when both are declared in ACPI, Linux currently defaults to "deep" suspend-to-RAM (S3) as visible in /sys/power/mem_sleep, as opposed to Windows 8.1 / 10 which defaults to low power S0 idle mode

- when trying to use this default suspend-to-RAM (= S3) on Linux, the system hangs and can not resume. This happens with various (all?) BIOS up to the current 1.1.31 version.

- an internal Dell inquiry has been triggered to see if the S3 ACPI declaration could be removed in a future BIOS update, which would fix the issue by h
Comment 19 Jérôme de Bretagne 2017-06-17 16:08:32 UTC
(previous comment sent too quickly)

... which would fix the issue by making Linux default to suspend-to-idle on that system, which works overall (with the issue that a long-press on the power button is needed to wake-up).

- Another possible fix in the medium term, within the kernel this time, would be to default to suspend-to-idle instead of suspend-to-RAM when the ACPI_S0_LOW_POWER_IDLE FADT flag is set. It could be implemented either for all systems by default at some point or at least for systems known to have major issues with suspend-to-RAM: "Eventually this will change, but for now first we need to sort out the problems with the systems that do S2I."


In the meantime, a user-side fix is to manually change the default Linux behavior by setting /sys/power/mem_sleep to "[s2idle] deep" instead of "s2idle [deep]" with:
 # echo s2idle > /sys/power/mem_sleep

or to suspend manually from the command line with:
 # echo freeze > /sys/power/state
which is using the suspend-to-idle mode.
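
For those who want to script this workaround (for example at boot), here is a minimal sketch. The function names and the POWER_DIR variable are made up for illustration; POWER_DIR only exists so the logic can be tried against a scratch directory and defaults to the real /sys/power.

```shell
#!/bin/sh
# Sketch only: POWER_DIR is a hypothetical override for testing; it defaults
# to the real /sys/power.
POWER_DIR="${POWER_DIR:-/sys/power}"

# Print the currently selected mem_sleep mode, i.e. the bracketed word
# in a line such as "s2idle [deep]".
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$POWER_DIR/mem_sleep"
}

# Select s2idle if the kernel offers it and it is not already selected.
prefer_s2idle() {
    grep -qw 's2idle' "$POWER_DIR/mem_sleep" || return 1
    [ "$(active_mem_sleep)" = "s2idle" ] && return 0
    echo s2idle > "$POWER_DIR/mem_sleep"
}
```

Run as root, prefer_s2idle leaves the setting alone when s2idle is already selected and fails without writing anything when the kernel does not list s2idle at all. The setting does not survive a reboot, so it has to be reapplied on each boot.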
Comment 20 Chen Yu 2017-06-18 14:25:37 UTC
After suspending to idle (with Rafael's patch modified for the Latitude 7275) on top of BIOS 1.1.31, the power button has to be held for 6 seconds to wake up. I think it is still a problem that resuming takes so long. Another test might be: how about waking the system up from s2idle with rtcwake?
Comment 21 Chen Yu 2017-06-18 14:28:49 UTC
(In reply to Chen Yu from comment #20)
> After suspending to idle (with Rafael's patch modified for the Latitude
> 7275) on top of BIOS 1.1.31, the power button has to be held for 6 seconds
> to wake up. I think it is still a problem that resuming takes so long.
> Another test might be: how about waking the system up from s2idle with
> rtcwake?
AFAIK, the BIOS is not involved during s2idle. Is it possible that the system has already resumed but the display did not come back? Maybe this can be verified by pinging the 7275 across an s2idle test cycle.
Comment 22 Jérôme de Bretagne 2017-06-18 15:59:47 UTC
Hi Chen,

Having to hold the power button for 6 seconds has been discussed at length in the thread mentioned in comment #18; good news is we start to have a good idea of the root cause.

I will open a separate bug entry about this to avoid mixing 2 totally different issues, since defaulting to Suspend-to-RAM / S3 on this machine remains a real issue for end users.
Comment 23 Len Brown 2017-06-19 22:38:26 UTC
We should blacklist this machine as not supporting S3.
Comment 24 Jérôme de Bretagne 2017-07-09 14:27:23 UTC
I've searched a little bit but I haven't found code yet implementing a similar blacklisting for other models not supporting S3.

Where would be a good place to implement this blacklisting? When loading the compatible Sx ACPI modes maybe?

In case that would be useful, here are some DMI values on this machine:
DMI_SYS_VENDOR   : "Dell Inc."
DMI_PRODUCT_NAME : "Latitude 7275"
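
Until a kernel-side blacklist exists, the same two values can be matched from user space, since the kernel exports them under /sys/class/dmi/id. This is only a sketch of a user-side check, not the kernel blacklist being discussed; the function name and the DMI_DIR variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: DMI_DIR is a hypothetical override for testing; it defaults to
# the sysfs directory where the kernel exports DMI strings.
DMI_DIR="${DMI_DIR:-/sys/class/dmi/id}"

# Succeed if this machine reports the vendor and product name from this bug.
is_latitude_7275() {
    [ "$(cat "$DMI_DIR/sys_vendor" 2>/dev/null)" = "Dell Inc." ] &&
    [ "$(cat "$DMI_DIR/product_name" 2>/dev/null)" = "Latitude 7275" ]
}
```

A boot script could then run "is_latitude_7275 && echo s2idle > /sys/power/mem_sleep" so that only this model is switched away from S3.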
Comment 25 Jérôme de Bretagne 2017-08-04 13:19:49 UTC
I can confirm that the following commit:
   "ACPI / PM: Prefer suspend-to-idle over S3 on some systems"
proposed for linux-next in the linux-pm tree fixes this issue on the Dell Latitude 7275 system.

Indeed mem_sleep is now properly mapped to s2idle by default: 
   $ cat /sys/power/mem_sleep
   [s2idle] deep

Thanks again to Rafael. I'll mark this bug as RESOLVED once it makes it into Linus' tree.
Comment 26 Zhang Rui 2017-08-07 03:07:32 UTC
Thanks for testing.
This is the Bugzilla process we're using: the bug should be marked as resolved once a patch is proposed for upstream and has been confirmed to solve the problem, and it will be marked as closed once the patch goes upstream.

So mark the bug as resolved.
Comment 27 Jérôme de Bretagne 2017-11-12 21:05:18 UTC
The commit mentioned in comment #25 is now upstream, part of Linux 4.14, here for reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e870c6c87cf9484090d28f2a68aa29e008960c93

So mark this bug as closed.
