With commit f007cad (Revert "firmware: add sanity check on shutdown/suspend") from Linus’ master branch, resuming the Dell XPS 13 9360 from suspend does not work. For whatever reason, *s2idle* seems to be the default again. The same problem was reported in the past [1], and therefore the default was changed back to *deep*. So I wonder whether this was tested thoroughly. Please switch back to *deep* as the default until *s2idle* has really been tested.

Bug 192591 (Suspend to idle & ram issues on Dell XPS 13 9365) [2] seems to be a similar issue, and deals with *s2idle* versus *deep* too.

[1] https://lkml.org/lkml/2017/1/17/609
[2] https://bugzilla.kernel.org/show_bug.cgi?id=192591
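To see which variant is currently in effect, one can read `/sys/power/mem_sleep`; the kernel marks the selected entry with square brackets (for example `s2idle [deep]`). A minimal sketch follows; the function name `mem_sleep_selected` is hypothetical, and the file path is a parameter only so the helper can be tried on a copy of the file.

```shell
# Hypothetical helper: print the sleep variant selected in a mem_sleep-style
# file, i.e. the entry in square brackets ("s2idle [deep]" -> "deep").
mem_sleep_selected() {
    file=${1:-/sys/power/mem_sleep}
    # One word per line, then keep only the bracketed word, without brackets.
    tr ' ' '\n' < "$file" | sed -n 's/^\[\(.*\)]$/\1/p'
}
```

Run without arguments on the affected machine, it would print `s2idle` if the new default is active.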
What's the status of this? FWIW: My XPS13 (9360) with rc2+ uses deep by default, and s2idle seems to work as well; but yes, I saw a few hiccups during the early devel phase of 4.14, too.
(In reply to Thorsten Leemhuis from comment #1)
> What's the status of this? FWIW: My XPS13 (9360) with rc2+ uses deep by
> default

Not for me. With v4.14-rc2-165-g770b782 it is s2idle by default.

```
commit e870c6c87cf9484090d28f2a68aa29e008960c93
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Jul 31 23:43:18 2017 +0200

    ACPI / PM: Prefer suspend-to-idle over S3 on some systems

    Modify the ACPI system sleep support setup code to select suspend-to-idle as the default system sleep state if (1) the ACPI_FADT_LOW_POWER_S0 flag is set in the FADT and (2) the Low Power Idle S0 _DSM interface has been discovered and (3) the default sleep state was not selected from the kernel command line.

    The main motivation for this change is that systems where the (1) and (2) conditions are met typically ship with OSes that don't exercise the S3 path in the platform firmware which remains untested and turns out to be non-functional at least in some cases.

    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Mario Limonciello <mario.limonciello@dell.com>
```

Again, although I have asked for this before, the commit message does *not* note which system this was tested on. Mario, Rafael, could you please include this information in commit messages in the future?

> and s2idle seems to work as well; but yes, I saw a few hiccups
> during the early devel phase of 4.14, too.

I tested it again, and contrary to your results, *s2idle* does *not* work; *deep* works. (And it is slow as hell, as Dell refuses to fix the firmware, as documented in bug 185611 [1].)

[1] https://bugzilla.kernel.org/show_bug.cgi?id=185611
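The three conditions in the quoted commit message can be restated as a small decision function. This is an illustration only, not the kernel's implementation: the name `default_mem_sleep` is hypothetical, and it assumes the platform supports S3 at all (otherwise *deep* would not be offered).

```shell
# Illustration of the selection rule from commit e870c6c87cf9 (not kernel code).
# $1: ACPI_FADT_LOW_POWER_S0 flag set in the FADT (1/0)
# $2: Low Power Idle S0 _DSM interface discovered (1/0)
# $3: value of mem_sleep_default= on the kernel command line ("" if absent)
# Assumes S3 is supported, so "deep" is available as the fallback.
default_mem_sleep() {
    if [ -n "$3" ]; then
        # (3) An explicit choice on the command line always wins.
        echo "$3"
    elif [ "$1" = 1 ] && [ "$2" = 1 ]; then
        # (1) and (2) hold: prefer suspend-to-idle.
        echo s2idle
    else
        echo deep
    fi
}
```

For example, `default_mem_sleep 1 1 ""` prints `s2idle`, while `default_mem_sleep 1 1 deep` prints `deep`. A value written to `/sys/power/mem_sleep` after boot (say, from a start-up script) replaces whichever default was picked.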
Created attachment 258695 [details] Linux configuration
(In reply to Paul Menzel from comment #3) > Created attachment 258695 [details] > Linux configuration So I have a 9360 here and it works for me as of -rc2 (I'll test -rc3 later today). What exactly doesn't work for you in s2idle?
BTW, on my 9360 "deep" works too (AFAICS) and it is not "slow as hell".
Created attachment 258697 [details] XPS13 9360 config Here's my .config for the XPS13 9360, FWIW.
I also saw suspend regressions (with deep) early in 4.14 (sometime between rc1 and rc2), but I have not observed any other regressions since those were fixed. I would also like to hear what "doesn't work" in s2idle for you, Paul.
(In reply to Rafael J. Wysocki from comment #4)
> (In reply to Paul Menzel from comment #3)
> > Created attachment 258695 [details]
> > Linux configuration
>
> So I have a 9360 here and it works for me as of -rc2 (I'll test -rc3 later
> today).

Mine has a touch screen, but that shouldn’t matter. Thank you for attaching your configuration in comment 6.

> What exactly doesn't work for you in s2idle?

After I press the power button to “resume”, the screen stays black. I then have to hold the button for ten seconds to power off the system.

(In reply to Rafael J. Wysocki from comment #5)
> BTW, on my 9360 "deep" works too (AFAICS) and it is not "slow as hell".

Well, three to six seconds for resume is very slow compared to Google Chromebooks or Apple devices. When I open the lid, I want to continue working, not stare at a black screen.
(In reply to Paul Menzel from comment #8) > (In reply to Rafael J. Wysocki from comment #4) > > (In reply to Paul Menzel from comment #3) > > > Created attachment 258695 [details] > > > Linux configuration > > > > So I have a 9360 here and it works for me as of -rc2 (I'll test -rc3 later > > today). > > Mine is with touch screen, but that shouldn’t matter. Thank you for > attaching your configuration in comment 6. > > > What exactly doesn't work for you in s2idle? > > After pressing the power button to “resume”, the screen stays black. I have > to hold it for then seconds then to power off the system. Something seems to be missing in your configuration. Please check if the intel_hid module is loaded, for one (that's needed for the power button wakeups to work). > (In reply to Rafael J. Wysocki from comment #5) > > BTW, on my 9360 "deep" works too (AFAICS) and it is not "slow as hell". > > Well, three to six seconds for resume is very slow compared to Google > Chromebooks or Apple devices. Opening the screen, I want to continue working > and not stare on a black screen. Mine resumes within a second or so from "deep". Suspicious ...
(In reply to Rafael J. Wysocki from comment #9) > (In reply to Paul Menzel from comment #8) > > (In reply to Rafael J. Wysocki from comment #4) > > > (In reply to Paul Menzel from comment #3) > > > > Created attachment 258695 [details] > > > > Linux configuration > > > > > > So I have a 9360 here and it works for me as of -rc2 (I'll test -rc3 > later > > > today). > > > > Mine is with touch screen, but that shouldn’t matter. Thank you for > > attaching your configuration in comment 6. > > > > > What exactly doesn't work for you in s2idle? > > > > After pressing the power button to “resume”, the screen stays black. I have > > to hold it for then seconds then to power off the system. > > Something seems to be missing in your configuration. > > Please check if the intel_hid module is loaded, for one (that's needed for > the power button wakeups to work). Well, please check intel_vbtn too. Both load here and I don't quite remember which one handles the events on this machine.
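One quick way to answer the intel_hid/intel_vbtn question is to look for the module names in `/proc/modules`, whose first column is the module name. A minimal sketch with a hypothetical `modules_loaded` helper; it assumes both drivers are built as modules, since a driver built into the kernel (`=y`) never shows up in `/proc/modules`.

```shell
# Hypothetical helper: report whether each named module appears in a
# /proc/modules-style listing (first field = module name).
# Returns non-zero if any of them is missing.
modules_loaded() {
    list=$1
    shift
    status=0
    for mod in "$@"; do
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$list"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
            status=1
        fi
    done
    return $status
}
```

Usage: `modules_loaded /proc/modules intel_hid intel_vbtn`.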
BTW, you know that you can use the keyboard to wake up from s2idle?
> BTW, you know that you can use the keyboard to wake up from s2idle?

You need to configure it as a wakeup device, though, right?
(In reply to Rafael J. Wysocki from comment #9)
> (In reply to Paul Menzel from comment #8)
> > (In reply to Rafael J. Wysocki from comment #4)
> > > (In reply to Paul Menzel from comment #3)
> > > > Created attachment 258695 [details]
> > > > Linux configuration
> > >
> > > So I have a 9360 here and it works for me as of -rc2 (I'll test -rc3 later
> > > today).
> >
> > Mine is with touch screen, but that shouldn’t matter. Thank you for
> > attaching your configuration in comment 6.
> >
> > > What exactly doesn't work for you in s2idle?
> >
> > After pressing the power button to “resume”, the screen stays black. I have
> > to hold it for then seconds then to power off the system.
>
> Something seems to be missing in your configuration.

Comparing your configuration with mine, I do not see anything that is missing on my side. It’d be great if you built a Linux image with my configuration, just to rule out any problems in that regard.

> Please check if the intel_hid module is loaded, for one (that's needed for
> the power button wakeups to work).

Well, the power button press is recognized, and the LED in the power button turns on after I press it. The screen stays black, though.

The modules *intel_hid* and *intel_vbtn* are both loaded.

I tested it again. Now the screen comes back, but nothing else works: I see the status bar at the top, but the password dialog does not pop up, the mouse cursor is not there, and pressing keys does nothing. Using the function key to control the keyboard backlight works, though.

(In reply to Mario Limonciello from comment #12)
> > BTW, you know that you can use the keyboard to wake up from s2idle?
>
> You need to configure it as a wakeup device though right?

Right. According to `grep enabled /proc/acpi/wakeup`, only *XHC*, *LID0* and *PBTN* are enabled.

> > (In reply to Rafael J. Wysocki from comment #5)
> > > BTW, on my 9360 "deep" works too (AFAICS) and it is not "slow as hell".
> >
> > Well, three to six seconds for resume is very slow compared to Google
> > Chromebooks or Apple devices. Opening the screen, I want to continue working
> > and not stare on a black screen.
>
> Mine resumes within a second or so from "deep". Suspicious ...

So you mean that after one second you can enter your password to unlock the screen? What distribution do you use? I am on Ubuntu 16.04.3 LTS (Xenial Xerus).
On Monday, October 2, 2017 3:47:22 PM CEST bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=196907 > > --- Comment #13 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) > --- > (In reply to Rafael J. Wysocki from comment #9) > > (In reply to Paul Menzel from comment #8) > > > (In reply to Rafael J. Wysocki from comment #4) > > > > (In reply to Paul Menzel from comment #3) > > > > > Created attachment 258695 [details] > > > > > Linux configuration > > > > > > > > So I have a 9360 here and it works for me as of -rc2 (I'll test -rc3 > > later > > > > today). > > > > > > Mine is with touch screen, but that shouldn’t matter. Thank you for > > > attaching your configuration in comment 6. > > > > > > > What exactly doesn't work for you in s2idle? > > > > > > After pressing the power button to “resume”, the screen stays black. I > have > > > to hold it for then seconds then to power off the system. > > > > Something seems to be missing in your configuration. > > Comparing my configuration with mine, I do not see anything that is missing > on > my side. It’d be great if you built a Linux image with my configuration just > to > rule out any problems in that regard. > > > Please check if the intel_hid module is loaded, for one (that's needed for > > the power button wakeups to work). > > Well, the power button press is recognized, and the light in the power button > goes on after pressing it. OK, so the wakeup actually works. > The screen stays black though. > > The modules *intel_hid* and *intel_vbtn* are both loaded. > > I tested it again. Now the screen comes back but nothing else works, that > means > I see the status bar at the top, but the password dialog does not pop up, the > mouse cursor is not there, and pressing the keyboard does nothing. Using the > function key to control the keyboard light works though. Well, that's unusual. It looks like something crashes during resume, then. 
Can you set /sys/power/pm_test to "devices", try to suspend and see what happens? [It will resume automatically if it works.]

> (In reply to Mario Limonciello from comment #12)
> > > BTW, you know that you can use the keyboard to wake up from s2idle?
> >
> > You need to configure it as a wakeup device though right?
>
> Right. According to `grep enabled /proc/acpi/wakeup` only *XHC*, *LID0* and
> *PBTN* are enabled.

/proc/acpi/wakeup has nothing to do with that. You need to do

    # echo enabled > /sys/devices/platform/i8042/serio0/power/wakeup

(as root) for it to work.

> > > (In reply to Rafael J. Wysocki from comment #5)
> > > > BTW, on my 9360 "deep" works too (AFAICS) and it is not "slow as hell".
> > >
> > > Well, three to six seconds for resume is very slow compared to Google
> > > Chromebooks or Apple devices. Opening the screen, I want to continue working
> > > and not stare on a black screen.
> >
> > Mine resumes within a second or so from "deep". Suspicious ...
>
> So you mean, after one second you can enter your password to unlock the
> screen?

Yes, after a second or so.

> What distribution do you use? I am on Ubuntu 16.04.3 LTS (Xenial Xerus).

I use openSUSE Leap 42.3 ATM, but some kind of Ubuntu was installed on it before, and it didn't have problems with s2idle or S3 either.
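The `pm_test` experiment can be scripted so that the test level is always reset afterwards. A minimal sketch, run as root; `pm_test_cycle` is a hypothetical name, and the optional second argument (a replacement for `/sys`) exists only so the function can be tried against a scratch directory. The levels accepted here are the ones documented for `/sys/power/pm_test`: `freezer`, `devices`, `platform`, `processors` and `core` (plus `none` to turn testing off).

```shell
# Hypothetical helper: run one suspend test cycle at the given pm_test level.
# The kernel stops the suspend at that stage and resumes by itself after a
# short delay. $2 replaces /sys (useful for a dry run on a scratch directory).
pm_test_cycle() {
    level=$1
    root=${2:-/sys}
    case $level in
        freezer|devices|platform|processors|core) ;;
        *) echo "unknown pm_test level: $level" >&2; return 2 ;;
    esac
    echo "$level" > "$root/power/pm_test" || return 1
    # On a real system this write returns only after the test cycle.
    if echo mem > "$root/power/state"; then status=0; else status=1; fi
    # Restore normal suspend behaviour.
    echo none > "$root/power/pm_test"
    return $status
}
```

Usage (as root): `pm_test_cycle devices`.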
/me puts his "regression tracker hat" down and puts on his "just a user hat"

TWIMC: I have no idea why my machine (XPS13, 9360, without touch) uses S3 by default; I can try to investigate. I recently installed the latest firmware (Version: 2.2.1; Release Date: 08/18/2017), but I doubt that's the reason. I can enable S2I manually and it works. I will give it a shot to see how reliably it works.

To add to the confusion, another data point: S3 suddenly became unreliable again; maybe it's the BIOS update, maybe 4.14 (4.13 seems to work reliably so far). And I have trouble with wifi as well (http://lists.infradead.org/pipermail/ath10k/2017-October/010189.html). #sigh
(In reply to Thorsten Leemhuis from comment #15)
> /me puts his "regression tracker hat" down and takes his "just a user hat"
>
> TWIMC: I have no idea why my machine (XPS13, 9360, without touch) uses S3 by
> default; I can try to investigate. I recently installed the latest firmware
> (Version: 2.2.1; Release Date: 08/18/2017), but I doubt that's the reason. I
> can enable S2I manually and it works. I will give it a shot to see how
> reliable it works.

I am still using firmware 1.3.5 (05/08/2017). I’ll test the update too. Thank you for the information.

> To add to the confusion another data point: S3 suddenly became unreliable
> again; maybe it's the BIOS update, maybe 4.14 (4.13 seems to work reliable
> so far). And I have trouble with wifi as well
> (http://lists.infradead.org/pipermail/ath10k/2017-October/010189.html). #sigh

Please add me to CC of that discussion. I am also having wireless trouble: NetworkManager is unable to connect, but connecting manually using `wpa_supplicant` works. I am also seeing the messages below.

```
ath10k_pci 0000:3a:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:3a:00.0.bin failed with error -2
ath10k_pci 0000:3a:00.0: Direct firmware load for ath10k/cal-pci-0000:3a:00.0.bin failed with error -2
```

I do not see `failed to extract amsdu: -11`, but that might only be because I haven’t used the wireless that extensively.
(In reply to Rafael J. Wysocki from comment #14)
> On Monday, October 2, 2017 3:47:22 PM CEST
> bugzilla-daemon@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=196907
> >
> > --- Comment #13 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
> > (In reply to Rafael J. Wysocki from comment #9)
> > > (In reply to Paul Menzel from comment #8)
> > > > (In reply to Rafael J. Wysocki from comment #4)
> > > > > (In reply to Paul Menzel from comment #3)
[…]
> > I tested it again. Now the screen comes back but nothing else works, that means
> > I see the status bar at the top, but the password dialog does not pop up, the
> > mouse cursor is not there, and pressing the keyboard does nothing. Using the
> > function key to control the keyboard light works though.
>
> Well, that's unusual.
>
> It looks like something crashes during resume, then.
>
> Can you set /sys/power/pm_test to "devices", try to suspend and see what
> happens? [It will resume automatically if it works.]

Thank you. That worked. Does

It also looks like the i915 module doesn’t find the DMC firmware, despite it being present in `/lib/firmware/i915` and `/lib/firmware/4.12.0-rc2+`.

```
kernel: i915 0000:00:02.0: Direct firmware load for i915/kbl_dmc_ver1_01.bin failed with error -2
kernel: i915 0000:00:02.0: Failed to load DMC firmware [https://01.org/linuxgraphics/downloads/firmware], disabling runtime power management
```
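Error -2 is `-ENOENT`: the loader did not find the file where it looked. Assuming the default search path of the kernel's firmware loader (no `firmware_class.path=` override), it tries `/lib/firmware/updates/<release>`, `/lib/firmware/updates`, `/lib/firmware/<release>` and `/lib/firmware`, where `<release>` is the running kernel's `uname -r` string, so a copy under `/lib/firmware/4.12.0-rc2+` is only seen by a kernel with exactly that release. If i915 is loaded from the initramfs, the file would also have to be in the initramfs; that is one possible explanation, not confirmed here. The hypothetical `fw_lookup` helper below lists those candidate locations for one firmware name; its optional third argument is a root prefix for inspecting another tree.

```shell
# Hypothetical helper: list the locations the firmware loader is assumed to
# search (default path, no firmware_class.path= override) and mark which ones
# actually contain the requested file.
fw_lookup() {
    name=$1
    release=${2:-$(uname -r)}
    prefix=${3:-}   # optional root prefix, e.g. a mounted image
    for dir in "/lib/firmware/updates/$release" "/lib/firmware/updates" \
               "/lib/firmware/$release" "/lib/firmware"; do
        if [ -f "$prefix$dir/$name" ]; then
            echo "found:   $dir/$name"
        else
            echo "missing: $dir/$name"
        fi
    done
}
```

Usage: `fw_lookup i915/kbl_dmc_ver1_01.bin` on the running kernel.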
(In reply to Paul Menzel from comment #17)
[…]
> It also looks like the module i915 doesn’t find the DMC firmware despite it
> being present in `/lib/firmware/i915` and `/lib/firmware/4.12.0-rc2+`.
>
> ```
> kernel: i915 0000:00:02.0: Direct firmware load for i915/kbl_dmc_ver1_01.bin failed with error -2
> kernel: i915 0000:00:02.0: Failed to load DMC firmware [https://01.org/linuxgraphics/downloads/firmware], disabling runtime power management
> ```

Now I built Linux 4.14-rc3, and the firmware errors are gone. (Maybe because the `+` is not in there?) Anyway, when resuming from S0ix this time, I could even see the password window appear, but it froze right away: the mouse cursor doesn’t move anymore, and key presses do nothing. Pressing the power button does not help either; only holding it for ten seconds does.
I guess I could see it because, during suspend (which took ten seconds or so), I moved the cursor, causing the password dialog to appear. Hints for debugging are much appreciated. But right now it’s a regression, which, if not fixed before the 4.14 release, should result in the commit enabling s2idle by default being reverted.
Since you're running Ubuntu 16.04 userspace, are you also running the accompanying X stack that goes with 4.13ish? It should be the one from Artful. Normally, an out-of-sync userspace and kernel don't matter significantly, but I'd start there.
(In reply to Thorsten Leemhuis from comment #15) > /me puts his "regression tracker hat" down and takes his "just a user hat" > > TWIMC: I have no idea why my machine (XPS13, 9360, without touch) uses S3 by > default; I can try to investigate. I recently installed the latest firmware > (Version: 2.2.1; Release Date: 08/18/2017), but I doubt that's the reason. I > can enable S2I manually and it works. I will give it a shot to see how > reliable it works. I'd look into the UEFI platform firmware setup. It looks like the "Low Power S0 Idle" bit is not set in the ACPI tables on your machine and there is a switch for that in the platform firmware setup IIRC.
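Whether that bit is set can also be read directly from the FADT, which Linux exposes as `/sys/firmware/acpi/tables/FACP` (readable by root only). A minimal sketch, assuming the FADT layout from the ACPI specification, in which the 32-bit little-endian `Flags` field starts at byte offset 112 and `LOW_POWER_S0_IDLE_CAPABLE` is bit 21 (the bit the kernel calls `ACPI_FADT_LOW_POWER_S0`); `fadt_low_power_s0` is a hypothetical name.

```shell
# Hypothetical helper: print 1 if the LOW_POWER_S0_IDLE_CAPABLE flag is set in
# the given FADT image, 0 if it is clear.
# Assumed layout: Flags is a little-endian 32-bit field at offset 112, so
# bit 21 is bit 5 of the byte at offset 114.
fadt_low_power_s0() {
    table=${1:-/sys/firmware/acpi/tables/FACP}
    byte=$(od -An -t u1 -j 114 -N 1 "$table" 2>/dev/null | tr -d ' ')
    # An empty result means the table is too short to contain the field.
    [ -n "$byte" ] || return 1
    echo $(( (byte >> 5) & 1 ))
}
```

A `0` would mean condition (1) of the commit discussed above is not met, in which case the kernel keeps *deep* as the default.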
@Paul: What happens if you suspend to idle and wake up using the keyboard? Does it still behave as described in comment #18?
Also please set /sys/power/pm_test to "platform", try to suspend and see what happens.
Moreover, please try to disable the strong stack protection and retest.
Created attachment 258703 [details]
XPS13 9360 kernel config (based on the one from Paul)

@Paul: I built the kernel using your .config, but I had to disable the "strong stack protection" feature because my compiler doesn't support it, and that might result in further changes, so the .config actually used for the build is attached.

According to ./scripts/diffconfig, the difference is minimal, though.

Everything works for me with this .config; care to try it?
(In reply to Rafael J. Wysocki from comment #20)
> (In reply to Thorsten Leemhuis from comment #15)
> > TWIMC: I have no idea why my machine (XPS13, 9360, without touch) uses S3 by
> > default; I can try to investigate.
> I'd look into the UEFI platform firmware setup.

FWIW: Nothing there, but I found it's a layer 8 error: I still had "echo deep > /sys/power/mem_sleep" in rc.local from the old days (4.10 or something), when s2idle was first introduced but didn't work properly yet (and was not yet disabled by default). Oops. Sorry for the noise.

/me grumbles at himself
(In reply to Rafael J. Wysocki from comment #24)
> Created attachment 258703 [details]
> XPS13 9360 kernel config (based on the one from Paul)
>
> @Paul: I built the kernel using your .config, but I had to disable the
> "strong stack protection" feature, because my compiler doesn't support it,
> and that might result in further changes, so the .config actually used for
> the build is attached.
>
> According to ./scripts/diffconfig the difference is minimum, though.
>
> Everything works for me with this .config, care to try it?

Thank you, I used that configuration (4.4-rc2+-config-9360-alternative), and here are the results.

1. Booting that system and then running

   ```
   $ echo s2idle | sudo tee /sys/power/mem_sleep
   $ systemctl suspend
   ```

   causes the system to go off quickly. When I press the power button, I see the same desktop background as before, meaning the screen lock didn’t kick in. Unfortunately, the system is frozen.

2. After a reboot, running

   ```
   $ echo s2idle | sudo tee /sys/power/mem_sleep
   $ echo enabled | sudo tee /sys/devices/platform/i8042/serio0/power/wakeup
   $ echo platform | sudo tee /sys/power/pm_test
   $ systemctl suspend
   ```

   the system suspends, and shortly afterwards the power button LED goes off for half a second or less and turns on again. The screen stays black, but after waiting and pressing keys, the password dialog is shown. So it more or less worked.

3. After that, during the same boot, running

   ```
   $ echo platform | sudo tee /sys/power/pm_test
   $ systemctl suspend
   ```

   the screen stays black.
(In reply to Mario Limonciello from comment #19)
> Since you're running on Ubuntu 16.04 userspace, are you also running the
> accompanied X stack that goes with 4.13ish? It should be the one from
> Artful.
>
> Normally an out of sync userspace and kernel don't matter significantly, but
> I'd start there.

No, I just use the shipped userspace. The Linux kernel’s no-regression policy demands that it keep working.
> No, I just use the shipped userspace. The Linux kernel’s no regression policy
> demands, that it keeps working.

Whether or not anyone wants to admit it, there are userspace dependencies on the graphics stack, and some fixes for freezes that occurred in the kernel stack happen to land in userspace. There is a reason that Canonical backports upgraded versions of the X stack with their HWE releases of new kernels.

Anyway, I know this policy exists, and if we don't have this figured out in time for 4.14, I would much rather see a commit that quirks your system to deep than one that reverts the default behavior based on the FADT low power idle bit. There are other systems that this change fixes.

Anyway, that discussion aside, let's keep digging for now.

Can you switch to a VT before going into s2idle and see if there is anything informative, like a panic or stack trace?

Depending on how hard it's hung, maybe you can SSH into the box at that point to look at what's going on.

Furthermore, have you confirmed that with rc3, Rafael's config, and the sysfs attribute mem_sleep set to "deep", things actually work when you suspend? Do you encounter any other freezes on the system?
Created attachment 258713 [details]
Linux messages for ACPI S3 (deep)

(In reply to Mario Limonciello from comment #28)
> > No, I just use the shipped userspace. The Linux kernel’s no regression policy
> > demands, that it keeps working.
> Whether or not anyone will want to admit it there are userspace dependencies
> on the graphics stack, and fixes that happen to land in userspace that fix
> freezes that occurred in the kernel stack. There is a reason that Canonical
> backports upgraded versions of the X stack with their HWE releases of new
> kernels.

If you mean the LTS enablement stacks [1], these are already installed.

```
$ LANG=C sudo apt-get install --install-recommends linux-generic-hwe-16.04 xserver-xorg-hwe-16.04
Reading package lists... Done
Building dependency tree
Reading state information... Done
linux-generic-hwe-16.04 is already the newest version (4.10.0.35.37).
xserver-xorg-hwe-16.04 is already the newest version (1:7.7+16ubuntu3~16.04.1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
```

> Anyway I know this policy exists, and if we don't have this figured out in
> time for 4.14 I would much rather see a commit that quirks your system to
> deep than one that reverts the default behavior on FADT low power idle bit.
> There are other systems that this fixes.
>
> Anyway that discussion aside for now let's keep digging.
>
> Can you VT switch before going into s2idle and see if there is anything
> informative with a panic or stack trace?

Switching to a VT beforehand indeed changes something. The screen comes back on, and I can read what was written there before, but I cannot type anything. Interestingly, I am able to switch VTs, but still nothing happens when I type. When I reach the “X VT”, it stops working, and I cannot switch back (or the screen is not updated).

> Depending how hard it's down, maybe you can SSH into the box during this time
> to look what's going on.

Due to network trouble with this Linux kernel, NetworkManager does not assign an IP address, so I have to stop NetworkManager and WPA Supplicant and assign the IP address manually with `dhclient en…`. After resume, the system loses the IP address.

> Furthermore, have you confirmed that with rc3, rafael's config and adjusting
> sysfs attribute mem_sleep to "deep" things are actually resolved when you go
> down?

Yes, normal ACPI S3 (deep) always works. Please find the Linux messages (`journalctl -k`) attached.

> Do you encounter any other freezes on the system?

No.

[1] https://wiki.ubuntu.com/Kernel/LTSEnablementStack
Just an update: I was still able to ping the system, but I am unable to log in via SSH.
> If you mean the LTS enablement stacks, these are already installed.

OK, great to hear. You're in much better shape then; I'm not as worried about this.

> Due to network trouble with this Linux kernel, NetworkManager does not assign
> an IP address, I have to stop NetworkManager and WPA Supplicant, and assign
> the IP address manually with `dhclient en…`. After resume, the system loses
> the IP address.

According to your log, it looks like you might have a USB Ethernet dongle. Maybe you'll have more luck with that staying up (or coming back up) after resume.

> Switching to a VT beforehand indeed changes something. The screen comes back
> on, and I can read what was written there before. But I cannot insert
> anything. Interestingly, I am able to switch VTs, but still no dice when
> entering anything. When reaching the “X VT”, it stops working, and I cannot
> switch back (or the screen is not updated).

So the system is still somewhat alive; this feels like a regression in runtime PM for the graphics stack somewhere. The S2I code path exercises the same code paths that runtime PM does.

Looking at your log, I think you might actually have a DA200 HDMI/VGA/Ethernet/USB Type-C dongle. Am I right? Do you normally keep it plugged in all the time? Does unplugging it before going into s2idle help?
(In reply to Mario Limonciello from comment #31) […] > Looking at your log, I think you might actually have a DA200 > HDMI/VGA/ethernet/USB type-C dongle. > Am I right? Yes, you are. > Do you normally keep it plugged in all the time? Does not > having it plugged in while going into s2idle help? No, I only plugged it in to get debug information. The problem is also there without the dongle plugged in.
> No, I only plugged it in to get debug information. The problem is also there
> without the dongle plugged in.

OK. Since you were able to switch VTs, I suspect that the system was able to log things after the s2idle wakeup. Can you share those logs too? Anything interesting about the state of i915 would be good to see.
(In reply to Paul Menzel from comment #27) > (In reply to Mario Limonciello from comment #19) > > Since you're running on Ubuntu 16.04 userspace, are you also running the > > accompanied X stack that goes with 4.13ish? It should be the one from > > Artful. > > > > Normally an out of sync userspace and kernel don't matter significantly, > but > > I'd start there. > > No, I just use the shipped userspace. The Linux kernel’s no regression > policy demands, that it keeps working. Suspend-to-idle doesn't really depend on user space, so it shouldn't matter.
> Suspend-to-idle doesn't really depend on user space, so it shouldn't matter.

My point was that there are potentially userspace fixes related to i915 that might help with this freeze if Paul were on the original 16.04 X stack. That doesn't apply here, though.
(In reply to Paul Menzel from comment #26)
> (In reply to Rafael J. Wysocki from comment #24)
[cut]
> 3. After that, during the same boot
>
> ```
> $ echo platform | sudo tee /sys/power/pm_test
> $ systemctl suspend
> ```
>
> the screen stays black.

This means that suspend/resume quite fundamentally doesn't work for you at all if the target system state is S0. IOW, the problem is not with the default setting, but with the whole thing being non-functional on your machine.

Which is basically limited to your machine ATM, as all of the 9360 users I know about (including myself), except for you, don't see any s2idle problems on it.

Anyway, please try:

    $ sudo su -
    # echo s2idle > /sys/power/mem_sleep
    # echo enabled > /sys/devices/platform/i8042/serio0/power/wakeup
    # echo 1 > /sys/power/pm_debug_messages
    # echo mem > /sys/power/state

and then press any key on the keyboard to wake up.

If it does wake up and you are able to get anything out of it then, please save the output of dmesg and attach it here.
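The same sequence can be wrapped so that the kernel log is written to disk as soon as the `state` write returns, which may help when the network is gone after resume. A minimal sketch, run as root; `s2idle_debug_cycle` and its default output file `/var/tmp/dmesg-after-s2idle.txt` are hypothetical, and the optional first argument replaces `/sys` only to allow a dry run on a scratch directory. Whether the log can still be saved depends on how much of the system survives the resume.

```shell
# Hypothetical helper for the s2idle debug sequence (run as root).
# $1 replaces /sys (for a dry run); $2 is where the kernel log is saved.
s2idle_debug_cycle() {
    root=${1:-/sys}
    out=${2:-/var/tmp/dmesg-after-s2idle.txt}
    echo s2idle > "$root/power/mem_sleep" || return 1
    echo enabled > "$root/devices/platform/i8042/serio0/power/wakeup" || return 1
    echo 1 > "$root/power/pm_debug_messages" || return 1
    # On a real system this write returns only after resume (key press).
    if echo mem > "$root/power/state"; then status=0; else status=1; fi
    # Save the kernel log right after resume and flush it to disk.
    dmesg > "$out" 2>&1 || :
    sync
    return $status
}
```

After the next boot (if the machine had to be powered off), the saved file can be attached here.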
(In reply to Mario Limonciello from comment #35)
> > Suspend-to-idle doesn't really depend on user space, so it shouldn't matter.
> I was getting at potentially there are userspace fixes related to i915 to
> maybe help with this freeze if Paul was on the original 16.04 X stack. This
> is N/A though.

I still suspect a graphics stack issue that triggers when X starts talking to the GPU after resume, and I suspect that it happens because different pieces of the stack don't match. S3 probably helps because it resets the hardware or similar.
@Paul: If you still have a 4.13 or 4.13-rc kernel installed, it would be good to check whether s2idle works for you with that.
(In reply to Rafael J. Wysocki from comment #39)
> @Paul: If you have a 4.13 4.13-rc still installed, it would be good to check
> if s2idle works for you with that.

So, S0ix does not work with Linux 4.13-rc1+ (Linus’ master branch from July 18, 2017). During suspend, the image flashes; on resume, the image comes back, but the system is frozen. No password dialog is shown.

It also does not work with Linux 4.13+ (“4.14-rc0”) from Linus’ master branch from September 11th, 2017. Suspending takes quite long until the power LED turns off. On resume, the screen stays black.

PS: OT, but Dell doesn’t share which bugs are fixed in their firmware updates [1].

[1] https://twitter.com/DellCares/status/915510808158855168
(In reply to Rafael J. Wysocki from comment #37)
> Which basically is limited to your machine ATM as all of the users of 9360 I
> know about (including myself) except for you don't see any s2idle problems
> on it.
>
> Anyway, please try:
>
> $ sudo su -
> # echo s2idle > /sys/power/mem_sleep
> # echo enabled > /sys/devices/platform/i8042/serio0/power/wakeup
> # echo 1 > /sys/power/pm_debug_messages
> # echo mem > /sys/power/state
>
> and then press any key on the keyboard to wake up.
>
> If it does wake up and if you are able to get anything out of it then,
> please save the output of dmesg and attach it here.

OK, I tested with Linux 4.14-rc3, with LightDM stopped, on a VT. The system suspends and resumes, and I can press keys, which are shown, but it hangs. No Linux messages are visible at all, though.

```
$ echo mem | sudo tee /sys/power/state
mem
[my enter keys]
niaedtuniae^C^C^C^C^C^C^C^C^C^C^C
uniaend
```
If you have an idea how I can access or print out the logs without a working network connection after resume, I am all ears. (I guess they need to be printed to the console.)
(In reply to Paul Menzel from comment #41)
> (In reply to Rafael J. Wysocki from comment #37)
> > Which basically is limited to your machine ATM as all of the users of 9360 I know about (including myself) except for you don't see any s2idle problems on it.
> >
> > Anyway, please try:
> >
> > $ sudo su -
> > # echo s2idle > /sys/power/mem_sleep
> > # echo enabled > /sys/devices/platform/i8042/serio0/power/wakeup
> > # echo 1 > /sys/power/pm_debug_messages
> > # echo mem > /sys/power/state
> >
> > and then press any key on the keyboard to wake up.
> >
> > If it does wake up and if you are able to get anything out of it then, please save the output of dmesg and attach it here.
>
> Ok, test with Linux 4.14.-rc3, LightDM stopped, and on VT. System suspend and resumes, and I can press keys, which are show, but it hangs. No Linux messages are visible at all though.

And under X it just hangs, right?

Please try to disable the C8-C10 idle states via

/sys/devices/system/cpu/cpu[0-3]/cpuidle/state[6-8]/disable

and see if the problem is still there with that.
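A minimal sketch of that step, assuming a POSIX shell and that states 6-8 really are C8-C10 on this machine (the function prints each state's `name` so that can be checked). `set_deep_cstates` and the `SYSFS_ROOT` override are hypothetical names used only for illustration; the per-state attribute it writes is `disable`.

```shell
# Hypothetical helper: toggle cpuidle states 6-8 (C8-C10 on this machine,
# per the comment above) on CPUs 0-3. Pass 1 to disable them, 0 to
# re-enable them. SYSFS_ROOT is a hypothetical override for dry runs and
# defaults to /sys; writing to the real /sys requires root.
set_deep_cstates() {
    value=$1
    root=${SYSFS_ROOT:-/sys}
    for cpu in 0 1 2 3; do
        for state in 6 7 8; do
            dir="$root/devices/system/cpu/cpu$cpu/cpuidle/state$state"
            # Skip states this kernel does not expose instead of failing.
            [ -d "$dir" ] || continue
            # Show the state name so you can check it really is C8, C9 or C10.
            printf 'cpu%s state%s (%s) -> disable=%s\n' \
                "$cpu" "$state" "$(cat "$dir/name" 2>/dev/null)" "$value"
            echo "$value" > "$dir/disable" || return 1
        done
    done
}
```

Saved as, say, `cstates.sh`, it could be run with `sudo sh -c '. ./cstates.sh && set_deep_cstates 1'` before the suspend test, and with `0` afterwards to restore the defaults.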
I should have mentioned that there is a relatively easy workaround for non-working s2idle: make the boot loader append "mem_sleep_default=deep" to the kernel command line.
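One way to make that workaround persistent, sketched under the assumption of an Ubuntu-style GRUB setup where the default command line lives in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`. `add_mem_sleep_deep` is a hypothetical helper; it leaves the file alone if a `mem_sleep_default=` option is already set there.

```shell
# Hypothetical helper: append mem_sleep_default=deep to
# GRUB_CMDLINE_LINUX_DEFAULT in the given file (default /etc/default/grub).
# Does nothing if a mem_sleep_default= option is already present.
add_mem_sleep_deep() {
    file=${1:-/etc/default/grub}
    if grep -q '^GRUB_CMDLINE_LINUX[A-Z_]*=.*mem_sleep_default=' "$file"; then
        return 0
    fi
    # Insert the option just before the closing quote of the value.
    sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 mem_sleep_default=deep"/' \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    # Fail if there was no GRUB_CMDLINE_LINUX_DEFAULT line to edit.
    grep -q 'mem_sleep_default=deep' "$file"
}
```

After running it as root (ideally on a backed-up file), `update-grub` regenerates the GRUB configuration; after a reboot, `cat /sys/power/mem_sleep` should show `deep` in brackets, e.g. `s2idle [deep]`.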
Which reminds me of one more thing: please attach the output of dmidecode from your system.
So I spent some time trying to replicate Paul's setup to reproduce his problem.

1) Brand new XPS 9360 with Ubuntu 16.04 factory image.
2) Ran all regular Ubuntu updates.
3) Ran BIOS update to 2.2.1 using fwupd.
4) Compiled kernel 4.14-rc3 using the most recent config posted by Rafael.
5) Installed, rebooted
[ 0.000000] Linux version 4.14.0-rc3 (test@test-XPS-13-9360) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)) #1 SMP Thu Oct 5 19:04:05 EDT 2017
6) systemctl suspend
7) Power button to wake up. No problems.
[ 97.551459] PM: suspend entry (s2idle)
[ 97.551460] PM: Syncing filesystems ... done.
[ 97.581561] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 97.583183] OOM killer disabled.
[ 97.583184] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 97.584281] Suspending console(s) (use no_console_suspend to debug)
[ 1488.148940] OOM killer enabled.
[ 1488.148942] Restarting tasks ... done.
[ 1488.150759] audit: type=1400 audit(1507258475.839:123): apparmor="DENIED" operation="create" profile="/usr/sbin/cups-browsed" pid=960 comm="cups-browsed" family="unix" sock_type="stream" protocol=0 requested_mask="create" denied_mask="create"
[ 1488.161360] [drm] RC6 on
[ 1488.200752] PM: suspend exit
8) I noticed that I'm *not* using the latest Xorg HWE stack, just the regular one that came with 16.04.1 so I upgraded to try to get closer to Paul's config.
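To confirm which variant a given test actually exercised, the `PM: suspend entry (...)` lines can be pulled out of the kernel log, and the current selection read from `/sys/power/mem_sleep`, where the active entry is shown in brackets. Both functions below are hypothetical helpers, sketched with plain `sed`.

```shell
# Print the state named in each "PM: suspend entry (...)" line of a kernel
# log read from stdin, e.g.: dmesg | suspend_entry_states
suspend_entry_states() {
    sed -n 's/.*PM: suspend entry (\([^)]*\)).*/\1/p'
}

# Print the currently selected mem sleep variant, i.e. the bracketed entry
# in /sys/power/mem_sleep (for example "deep" for "s2idle [deep]").
current_mem_sleep() {
    sed -n 's/.*\[\([a-z0-9]*\)].*/\1/p' "${1:-/sys/power/mem_sleep}"
}
```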
Before:
ii xorg 1:7.7+13ubuntu3 amd64 X.Org X Window System
ii xorg-docs-core 1:1.7.1-1ubuntu1 all Core documentation for the X.org X Window System
ii xorriso 1.4.2-4ubuntu1 amd64 command line ISO-9660 and Rock Ridge manipulation tool
ii xserver-common 2:1.18.4-0ubuntu0.4 all common files used by various X servers
ii xserver-xorg 1:7.7+13ubuntu3 amd64 X.Org X server
ii xserver-xorg-core 2:1.18.4-0ubuntu0.4 amd64 Xorg X server - core server
ii xserver-xorg-input-all 1:7.7+13ubuntu3 amd64 X.Org X server -- input driver metapackage
ii xserver-xorg-input-evdev 1:2.10.1-1ubuntu2 amd64 X.Org X server -- evdev input driver
ii xserver-xorg-input-synaptics 1.8.2-1ubuntu3 amd64 Synaptics TouchPad driver for X.Org server
ii xserver-xorg-input-vmmouse 1:13.1.0-1ubuntu2 amd64 X.Org X server -- VMMouse input driver to use with VMWare
ii xserver-xorg-input-wacom 1:0.32.0-0ubuntu3 amd64 X.Org X server -- Wacom input driver
ii xserver-xorg-video-all 1:7.7+13ubuntu3 amd64 X.Org X server -- output driver metapackage
ii xserver-xorg-video-amdgpu 1.1.2-0ubuntu0.16.04.1 amd64 X.Org X server -- AMDGPU display driver
ii xserver-xorg-video-ati 1:7.7.0-1 amd64 X.Org X server -- AMD/ATI display driver wrapper
ii xserver-xorg-video-fbdev 1:0.4.4-1build5 amd64 X.Org X server -- fbdev display driver
ii xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1.2 amd64 X.Org X server -- Intel i8xx, i9xx display driver
ii xserver-xorg-video-nouveau 1:1.0.12-1build2 amd64 X.Org X server -- Nouveau display driver
ii xserver-xorg-video-qxl 0.1.4-3ubuntu3 amd64 X.Org X server -- QXL display driver
ii xserver-xorg-video-radeon 1:7.7.0-1 amd64 X.Org X server -- AMD/ATI Radeon display driver
ii xserver-xorg-video-vesa 1:2.3.4-1build2 amd64 X.Org X server -- VESA display driver
ii xserver-xorg-video-vmware 1:13.1.0-2ubuntu3 amd64 X.Org X server -- VMware display driver

After:
ii xorg 1:7.7+13ubuntu3 amd64 X.Org X Window System
ii xorg-docs-core 1:1.7.1-1ubuntu1 all Core documentation for the X.org X Window System
ii xorriso 1.4.2-4ubuntu1 amd64 command line ISO-9660 and Rock Ridge manipulation tool
ii xserver-common 2:1.18.4-0ubuntu0.4 all common files used by various X servers
rc xserver-xorg 1:7.7+13ubuntu3 amd64 X.Org X server
rc xserver-xorg-core 2:1.18.4-0ubuntu0.4 amd64 Xorg X server - core server
ii xserver-xorg-core-hwe-16.04 2:1.19.3-1ubuntu1~16.04.2 amd64 Xorg X server - core server
ii xserver-xorg-hwe-16.04 1:7.7+16ubuntu3~16.04.1 amd64 X.Org X server
ii xserver-xorg-input-all-hwe-16.04 1:7.7+16ubuntu3~16.04.1 amd64 X.Org X server -- input driver metapackage
ii xserver-xorg-input-evdev-hwe-16.04 1:2.10.5-1ubuntu1~16.04.1 amd64 X.Org X server -- evdev input driver
ii xserver-xorg-input-synaptics-hwe-16.04 1.9.0-1ubuntu1~16.04.1 amd64 Synaptics TouchPad driver for X.Org server
ii xserver-xorg-input-wacom-hwe-16.04 1:0.34.0-0ubuntu2~16.04.1 amd64 X.Org X server -- Wacom input driver
ii xserver-xorg-legacy-hwe-16.04 2:1.19.3-1ubuntu1~16.04.2 amd64 setuid root Xorg server wrapper
ii xserver-xorg-video-all-hwe-16.04 1:7.7+16ubuntu3~16.04.1 amd64 X.Org X server -- output driver metapackage
ii xserver-xorg-video-amdgpu-hwe-16.04 1.3.0-0ubuntu1~16.04.1 amd64 X.Org X server -- AMDGPU display driver
ii xserver-xorg-video-ati-hwe-16.04 1:7.9.0-0ubuntu1~16.04.1 amd64 X.Org X server -- AMD/ATI display driver wrapper
ii xserver-xorg-video-fbdev-hwe-16.04 1:0.4.4-1build6~16.04.1 amd64 X.Org X server -- fbdev display driver
rc xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1.2 amd64 X.Org X server -- Intel i8xx, i9xx display driver
ii xserver-xorg-video-intel-hwe-16.04 2:2.99.917+git20170309-0ubuntu1~16.04.1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
ii xserver-xorg-video-nouveau-hwe-16.04 1:1.0.14-0ubuntu1~16.04.1 amd64 X.Org X server -- Nouveau display driver
ii xserver-xorg-video-qxl-hwe-16.04 0.1.5-2build1~16.04.1 amd64 X.Org X server -- QXL display driver
ii xserver-xorg-video-radeon-hwe-16.04 1:7.9.0-0ubuntu1~16.04.1 amd64 X.Org X server -- AMD/ATI Radeon display driver
ii xserver-xorg-video-vesa-hwe-16.04 1:2.3.4-1build3~16.04.1 amd64 X.Org X server -- VESA display driver
rc xserver-xorg-video-vmware 1:13.1.0-2ubuntu3 amd64 X.Org X server -- VMware display driver
ii xserver-xorg-video-vmware-hwe-16.04 1:13.2.1-1build1~16.04.1 amd64 X.Org X server -- VMware display driver

9) Rebooted, tried systemctl suspend again.
10) Wake up from power button. No problems again.
[ 69.715546] PM: suspend entry (s2idle)
[ 69.715549] PM: Syncing filesystems ... done.
[ 69.726396] Freezing user space processes ... (elapsed 0.002 seconds) done.
[ 69.728961] OOM killer disabled.
[ 69.728962] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 69.730254] Suspending console(s) (use no_console_suspend to debug)
[ 69.934204] psmouse serio1: Failed to disable mouse on isa0060/serio1
[ 75.933261] ACPI: button: The lid device is not compliant to SW_LID.
[ 76.177890] OOM killer enabled.
[ 76.177896] Restarting tasks ... done.
[ 76.197786] [drm] RC6 on
[ 76.224637] PM: suspend exit

I'm not sure what else about my setup can be any different than Paul's to lead to his problems. What else have you changed from the factory Ubuntu image?

Funky DKMS packages? TLP configuration?
Created attachment 258733 [details] full dmesg output from working s2idle after 4.14-rc3 + xorg-hwe package
Created attachment 258735 [details] packages installed on Ubuntu
(In reply to Rafael J. Wysocki from comment #43)
> (In reply to Paul Menzel from comment #41)
> > (In reply to Rafael J. Wysocki from comment #37)
> > > Which basically is limited to your machine ATM as all of the users of 9360 I know about (including myself) except for you don't see any s2idle problems on it.
> > >
> > > Anyway, please try:
> > >
> > > $ sudo su -
> > > # echo s2idle > /sys/power/mem_sleep
> > > # echo enabled > /sys/devices/platform/i8042/serio0/power/wakeup
> > > # echo 1 > /sys/power/pm_debug_messages
> > > # echo mem > /sys/power/state
> > >
> > > and then press any key on the keyboard to wake up.
> > >
> > > If it does wake up and if you are able to get anything out of it then, please save the output of dmesg and attach it here.
> >
> > Ok, test with Linux 4.14.-rc3, LightDM stopped, and on VT. System suspend and resumes, and I can press keys, which are show, but it hangs. No Linux messages are visible at all though.
>
> And under X it just hangs, right?
>
> Please try to disable the C8-C10 idle states via
>
> /sys/devices/system/cpu/cpu[0-3]/cpuidle/state[6-8]/disabled
>
> and see if the problem is still there with that.

Yes, the problem is still there with that. I stopped LightDM beforehand and, from a VT, executed `echo mem | sudo tee /sys/power/state`.

One more thing: on the VT where there was a login screen (TTY2 in my case), the key presses were not shown. Switching TTYs was still possible, though.
(In reply to Rafael J. Wysocki from comment #44)
> I should have mentioned that there is a relatively easy workaround for non-working s2idle which is to make the boot loader append "mem_sleep_default=deep" to the kernel command line.

Yes, that is what I currently use to work around the problem.
I booted with `splash quiet` removed from the Linux command line and `nomodeset` added. Unfortunately, that didn’t change a thing (LightDM stopped, then issued `echo mem | sudo tee /sys/power/state`).
Created attachment 258743 [details]
Output of `dmidecode`

(In reply to Rafael J. Wysocki from comment #45)
> Which reminds me of one more thing: please attach the output of dmidecode from your system.

Please find it attached.
(In reply to Mario Limonciello from comment #46)
> So I spent some time trying to replicate Paul's setup to reproduce his problem.

[…]

Here is the list of installed packages with the prefix `xserver-xorg`.

```
$ LANG=C dpkg -l xserver-xorg* | grep ^ii
ii xserver-xorg-core-hwe-16.04 2:1.19.3-1ubuntu1~16.04.2 amd64 Xorg X server - core server
ii xserver-xorg-hwe-16.04 1:7.7+16ubuntu3~16.04.1 amd64 X.Org X server
ii xserver-xorg-input-all-hwe-16.04 1:7.7+16ubuntu3~16.04.1 amd64 X.Org X server -- input driver metapackage
ii xserver-xorg-input-evdev-hwe-16.04 1:2.10.5-1ubuntu1~16.04.1 amd64 X.Org X server -- evdev input driver
ii xserver-xorg-input-synaptics-hwe-16.04 1.9.0-1ubuntu1~16.04.1 amd64 Synaptics TouchPad driver for X.Org server
ii xserver-xorg-input-wacom-hwe-16.04 1:0.34.0-0ubuntu2~16.04.1 amd64 X.Org X server -- Wacom input driver
ii xserver-xorg-legacy 2:1.18.4-0ubuntu0.4 amd64 setuid root Xorg server wrapper
ii xserver-xorg-video-all-hwe-16.04 1:7.7+16ubuntu3~16.04.1 amd64 X.Org X server -- output driver metapackage
ii xserver-xorg-video-amdgpu-hwe-16.04 1.3.0-0ubuntu1~16.04.1 amd64 X.Org X server -- AMDGPU display driver
ii xserver-xorg-video-ati-hwe-16.04 1:7.9.0-0ubuntu1~16.04.1 amd64 X.Org X server -- AMD/ATI display driver wrapper
ii xserver-xorg-video-fbdev-hwe-16.04 1:0.4.4-1build6~16.04.1 amd64 X.Org X server -- fbdev display driver
ii xserver-xorg-video-intel-hwe-16.04 2:2.99.917+git20170309-0ubuntu1~16.04.1 amd64 X.Org X server -- Intel i8xx, i9xx display driver
ii xserver-xorg-video-nouveau-hwe-16.04 1:1.0.14-0ubuntu1~16.04.1 amd64 X.Org X server -- Nouveau display driver
ii xserver-xorg-video-qxl-hwe-16.04 0.1.5-2build1~16.04.1 amd64 X.Org X server -- QXL display driver
ii xserver-xorg-video-radeon-hwe-16.04 1:7.9.0-0ubuntu1~16.04.1 amd64 X.Org X server -- AMD/ATI Radeon display driver
ii xserver-xorg-video-vesa-hwe-16.04 1:2.3.4-1build3~16.04.1 amd64 X.Org X server -- VESA display driver
ii xserver-xorg-video-vmware-hwe-16.04 1:13.2.1-1build1~16.04.1 amd64 X.Org X server -- VMware display driver
```
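Package lists like these are hard to compare by eye. Here is a small sketch that reduces `dpkg -l` output to sorted `name version` pairs for installed (`ii`) packages, so the lists from two machines can be fed to `diff`. `installed_pkgs` is a hypothetical helper name.

```shell
# Reduce `dpkg -l` output (stdin) to sorted "name version" pairs for
# packages in the "ii" (installed) state. Packages in the "rc" state
# (removed, config files left behind) are deliberately dropped.
installed_pkgs() {
    awk '$1 == "ii" { print $2, $3 }' | LC_ALL=C sort
}
```

For example, `LANG=C dpkg -l 'xserver-xorg*' | installed_pkgs > pkgs-a.txt` on one machine, the same into `pkgs-b.txt` on the other, then `diff pkgs-a.txt pkgs-b.txt`.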
Created attachment 258745 [details]
Difference in Linux messages from non-working vs. working

Note: the non-working example is captured from *deep*, as *s2idle* does not work.

> I'm not sure what else about my setup can be any different than Paul's to lead to his problems. What else have you changed from the factory Ubuntu image?
>
> Funky DKMS packages? TLP configuration?

Please find the differences in the Linux messages attached. Here are the differences I see.

1. I still use the firmware 1.3.5. Mario uses 2.2.1. (I won’t use that without a published change-log.)
2. Despite Mario using the *latest* firmware, I have the newer microcode updates. (Why doesn’t Dell’s *latest* firmware ship them?)
3. I have Thunderbolt messages in my Linux messages. (Could that be from having the DA200 adapter plugged in?)
4. Oh, and I forgot: the drive is LUKS encrypted here. I thought that’s standard by default.

Maybe you spot more differences.
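A sketch for collecting the items from that list (firmware version, microcode revision, running kernel) in one place, assuming the standard `/sys/class/dmi/id/bios_version` and `/proc/cpuinfo` interfaces. `summarize_platform` and the `ROOT` prefix (present only so it can be dry-run against a scratch directory) are hypothetical.

```shell
# Hypothetical helper: print BIOS version, CPU microcode revision and
# running kernel as "key: value" lines. ROOT defaults to "" (the real
# /sys and /proc); bios_version is normally readable without root.
summarize_platform() {
    root=${ROOT:-}
    printf 'bios_version: %s\n' "$(cat "$root/sys/class/dmi/id/bios_version")"
    # /proc/cpuinfo repeats "microcode : 0x..." per CPU; take the first one.
    printf 'microcode: %s\n' \
        "$(awk -F': *' '/^microcode/ { print $2; exit }' "$root/proc/cpuinfo")"
    printf 'kernel: %s\n' "$(uname -r)"
}
```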
Well, I can make a few changes to get closer to your config. My goal is to reproduce.

> 1. I still use the firmware 1.3.5. Mario uses 2.2.1. (I won’t use that without a published change-log.)

I downgraded to 1.3.5.

> 2. Despite Mario using the *latest* firmware, I have the newer microcode updates. (Why doesn’t Dell *latest* firmware ship them?)

I installed intel-microcode from xenial-updates/restricted.

> 3. I have Thunderbolt messages in my Linux messages. (That could be from when having the DA200 adapter plugged in?)

I don't have a DA200 handy, but I did my S2I test with a WD15 plugged in.

> 4. Oh, and I forgot, the drive is LUKS encrypted here. I thought, that’s standard by default.

Well, this would require a full reinstall. I don't suspect LUKS to be the cause, as S4 (hibernation) isn't in use.

With those configuration changes I still can't reproduce.
Created attachment 258747 [details] Mario Downgraded to 1.3.5, installed microcode, plugged in WD15
I checked all DKMS packages on the system; none are built for 4.14-rc3. I purged TLP to make sure it isn't interfering with entering suspend. Still no problems with S2I for me.

The other thing I can think of would be the versions of the graphics firmware. Did you hand-modify anything?

$ apt policy linux-firmware
linux-firmware:
  Installed: 1.157.12
  Candidate: 1.157.12
  Version table:
 *** 1.157.12 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
        100 /var/lib/dpkg/status

The OEM image does make the following other modifications relative to stock Ubuntu, but I don't think they're relevant.
* diversion of /lib/firmware/ath10k/QCA6174/hw3.0/board.bin to /lib/firmware/ath10k/QCA6174/hw3.0/board.bin.wifi-qca6174 by wifi-qca6174-killer
* diversion of /lib/udev/rules.d/50-bluetooth-hci-auto-poweron.rules to /lib/udev/rules.d/50-bluetooth-hci-auto-poweron.rules.oem-bluez-autoenable by oem-bluez-autoenable
* diversion of /lib/firmware/ath10k/QCA6174/hw3.0/board-2.bin to /lib/firmware/ath10k/QCA6174/hw3.0/board-2.bin.wifi-qca6174 by wifi-qca6174-killer
* diversion of /etc/bluetooth/main.conf to /etc/bluetooth/main.conf.oem-bluez-autoenable by oem-bluez-autoenable
Regarding network problems, I did notice that NetworkManager and dhclient aren't getting along with 4.14-rc3:

[ 21.634528] audit: type=1400 audit(1507295458.955:58): apparmor="DENIED" operation="create" profile="/sbin/dhclient" pid=1969 comm="dhclient" family="unix" sock_type="stream" protocol=0 requested_mask="create" denied_mask="create"
[ 21.637698] audit: type=1400 audit(1507295458.959:59): apparmor="DENIED" operation="create" profile="/usr/lib/NetworkManager/nm-dhcp-helper" pid=1970 comm="nm-dhcp-helper" family="unix" sock_type="stream" protocol=0 requested_mask="create" denied_mask="create"

I would recommend doing the following before S2I:

# /etc/init.d/apparmor teardown
# /etc/init.d/apparmor stop

Hopefully your network will then come back up after leaving S2I.
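To see which AppArmor profiles produce such denials after a kernel change, a small filter over the kernel log can help. This is a sketch; `apparmor_denied_profiles` is a hypothetical name, and it only relies on the `apparmor="DENIED"` and `profile="..."` fields visible in the audit lines above.

```shell
# List the distinct AppArmor profiles named in "DENIED" audit lines of a
# kernel log read from stdin, e.g.: dmesg | apparmor_denied_profiles
apparmor_denied_profiles() {
    sed -n 's/.*apparmor="DENIED".*profile="\([^"]*\)".*/\1/p' | LC_ALL=C sort -u
}
```

For the two log lines above, it prints `/sbin/dhclient` and `/usr/lib/NetworkManager/nm-dhcp-helper`.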
Looking at your diff more closely, I notice that you *do* have different firmware than what is in the linux-firmware package in Ubuntu.

i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
-[drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)
+i915 0000:00:02.0: Direct firmware load for i915/kbl_dmc_ver1_01.bin failed with error -2
+i915 0000:00:02.0: Failed to load DMC firmware [https://01.org/linuxgraphics/downloads/firmware], disabling runtime power management.

So I fetched https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/i915/kbl_dmc_ver1_01.bin?id=9facc31d772a3ed399760d2168d644f9de84d6db and put it in place on my system.

[ 1.267409] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_01.bin (v1.1)

I still can't reproduce.
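For the record, error -2 in `Direct firmware load ... failed with error -2` is `-ENOENT`, i.e. the file was not found. A quick way to check either machine is to look for `/lib/firmware/i915/kbl_dmc_ver1_01.bin` and to scan the kernel log. The hypothetical helper below sketches the log check; it reports the last DMC load message it sees.

```shell
# Report whether the i915 DMC firmware was loaded, based on a kernel log
# read from stdin (e.g. dmesg | dmc_firmware_status). Prints "loaded",
# "failed" or "unknown"; the last matching message wins.
dmc_firmware_status() {
    awk '
        /Finished loading DMC firmware/ { status = "loaded" }
        /Failed to load DMC firmware/   { status = "failed" }
        END { print (status == "" ? "unknown" : status) }
    '
}
```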
OK. Thanks, Mario, for doing all this work!

Paul, please let us know what else we can do to reproduce the problem that you are seeing.
(In reply to Rafael J. Wysocki from comment #60)
> OK
>
> Thanks Mario for doing all this work!

Seconded. Thank you, Mario, also for the AppArmor help that fixed networking.

> Paul, please let us know what else we can do to reproduce the problem that you are seeing.

Well, I am at a loss too as to what is going on. You have my Linux messages, so I have no idea what other difference there is besides the LUKS-encrypted system.

Removing the wireless modules before suspend didn’t help either. Even though networking works, I am unable to log in after resume.

Any hints on how to print something to the screen (`no_console_suspend` didn’t help) would be really great. I do have a BeagleBone Black, which supposedly works as a USB debug dongle, but I have no idea how to build a Linux kernel for this.
Do you have anything else on your system that you've added that adjusts tunables in sysfs?
Any options that you've put in /etc/modprobe.d that could relate to module options?
Anything you've put in /etc/sysctl.conf or /etc/sysctl.d?
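A sketch for answering these questions quickly: print only the lines that actually take effect (non-blank, non-comment) from the `*.conf` files in the given directories. `effective_conf_lines` is a hypothetical helper; it treats lines starting with `#` or `;` as comments, as sysctl.d does (modprobe.d files only use `#`).

```shell
# Print the non-blank, non-comment lines of every *.conf file in the given
# directories, prefixed with the file name.
effective_conf_lines() {
    for dir in "$@"; do
        for f in "$dir"/*.conf; do
            # Skip the unexpanded glob when a directory has no *.conf files.
            [ -f "$f" ] || continue
            grep -H -v -e '^[[:space:]]*#' -e '^[[:space:]]*;' \
                -e '^[[:space:]]*$' "$f"
        done
    done
    return 0
}
```

For example, `effective_conf_lines /etc/modprobe.d /etc/sysctl.d`; on Ubuntu the `99-sysctl.conf` symlink in `/etc/sysctl.d` also covers `/etc/sysctl.conf`.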
(In reply to Mario Limonciello from comment #62)
> Do you have anything else on your system that you've added that adjusts tunables in sysfs?

Not that I know of.

> Any options that you've put in /etc/modprobe.d that could related to module options?

```
$ ls -l --full-time /etc/modprobe.d/
insgesamt 56
-rw-r--r-- 1 root root 2507 2015-07-31 05:42:17.000000000 +0200 alsa-base.conf
-rw-r--r-- 1 root root 325 2016-03-13 14:36:35.000000000 +0100 blacklist-ath_pci.conf
-rw-r--r-- 1 root root 1603 2016-03-13 14:36:35.000000000 +0100 blacklist.conf
-rw-r--r-- 1 root root 210 2016-03-13 14:36:35.000000000 +0100 blacklist-firewire.conf
-rw-r--r-- 1 root root 697 2016-03-13 14:36:35.000000000 +0100 blacklist-framebuffer.conf
-rw-r--r-- 1 root root 156 2015-07-31 05:42:17.000000000 +0200 blacklist-modem.conf
lrwxrwxrwx 1 root root 41 2017-03-20 18:30:19.027495898 +0100 blacklist-oss.conf -> /lib/linux-sound-base/noOSS.modprobe.conf
-rw-r--r-- 1 root root 583 2016-03-13 14:36:35.000000000 +0100 blacklist-rare-network.conf
-rw-r--r-- 1 root root 1077 2016-03-13 14:36:35.000000000 +0100 blacklist-watchdog.conf
-rw-r--r-- 1 root root 127 2015-03-11 19:00:32.000000000 +0100 dkms.conf
-rw-r--r-- 1 root root 390 2016-04-12 12:06:37.000000000 +0200 fbdev-blacklist.conf
-rw-r--r-- 1 root root 154 2015-11-10 06:13:21.000000000 +0100 intel-microcode-blacklist.conf
-rw-r--r-- 1 root root 347 2016-03-13 14:36:35.000000000 +0100 iwlwifi.conf
-rw-r--r-- 1 root root 104 2016-03-13 14:36:35.000000000 +0100 mlx4.conf
-rw-r--r-- 1 root root 30 2017-01-26 19:31:13.000000000 +0100 vmwgfx-fbdev.conf
$ md5sum /etc/modules
8b5141eea356759244831b4ed2df9aee /etc/modules
```

> Anything you've put in /etc/sysctl.conf or /etc/sysctl.d?
```
$ ls -l --full-time /etc/sysctl.conf
-rw-r--r-- 1 root root 2084 2015-09-06 07:30:20.000000000 +0200 /etc/sysctl.conf
$ md5sum /etc/sysctl.conf
76c1d8285c578d5e827c3e07b9738112 /etc/sysctl.conf
$ ls -l /etc/sysctl.d/
insgesamt 40
-rw-r--r-- 1 root root 77 Jan 13 2016 10-console-messages.conf
-rw-r--r-- 1 root root 490 Jan 13 2016 10-ipv6-privacy.conf
-rw-r--r-- 1 root root 726 Jan 13 2016 10-kernel-hardening.conf
-rw-r--r-- 1 root root 257 Jan 13 2016 10-link-restrictions.conf
-rw-r--r-- 1 root root 1184 Jan 13 2016 10-magic-sysrq.conf
-rw-r--r-- 1 root root 509 Jan 13 2016 10-network-security.conf
-rw-r--r-- 1 root root 1292 Jan 13 2016 10-ptrace.conf
-rw-r--r-- 1 root root 506 Jan 13 2016 10-zeropage.conf
-rw-r--r-- 1 root root 69 Okt 24 2015 30-tracker.conf
lrwxrwxrwx 1 root root 14 Jul 19 01:56 99-sysctl.conf -> ../sysctl.conf
-rw-r--r-- 1 root root 519 Jan 13 2016 README
```

```
$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Device [8086:5904] (rev 02)
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5916] (rev 02)
00:04.0 Signal processing controller [1180]: Intel Corporation Skylake Processor Thermal Subsystem [8086:1903] (rev 02)
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21)
00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
00:15.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller [8086:9d60] (rev 21)
00:15.1 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller [8086:9d61] (rev 21)
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI [8086:9d3a] (rev 21)
00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1)
00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d14] (rev f1)
00:1c.5 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d15] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation Device [8086:9d18] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:9d58] (rev 21)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
00:1f.3 Audio device [0403]: Intel Corporation Device [8086:9d71] (rev 21)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
3a:00.0 Network controller [0280]: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter [168c:003e] (rev 32)
3b:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
3c:00.0 Non-Volatile memory controller [0108]: Device [1c5c:1284]
```
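To compare NVMe controllers across machines, the PCI vendor:device ID can be pulled out of `lspci -nn` output; class `0108` is the "Non-Volatile memory controller" class shown above, and vendor `1c5c` is SK hynix. `nvme_pci_ids` is a hypothetical helper, sketched with `sed`.

```shell
# Print the vendor:device PCI ID of each NVMe controller (class 0108)
# found in `lspci -nn` output read from stdin, e.g.: lspci -nn | nvme_pci_ids
nvme_pci_ids() {
    sed -n 's/.*\[0108].*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)].*/\1/p'
}
```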
(In reply to Paul Menzel from comment #61)
> (In reply to Rafael J. Wysocki from comment #60)
> > OK
> >
> > Thanks Mario for doing all this work!
>
> Seconded. Thank you Mario. Also for the AppArmor help to fix networking.
>
> > Paul, please let us know what else we can do to reproduce the problem that you are seeing.
>
> Well, I am at a loss too, what is going on. You have my Linux messages, so no idea, what other difference there is besides the LUKS encrypted system.
>
> Removing the wireless modules before suspend, also didn’t help. Despite the working networking, I am unable to login after resume.
>
> Any hints on how to print something to the screen (`no_console_suspend` didn’t help) would be really great. I do have a BeagleBone Black, which supposedly works as a USB debug dongle. But no idea, how to build a Linux kernel to fix this.

Wasn’t there also a way to save something in EFI variables?
Thanks for sharing all that. I don't see anything different in the module options.

I do, however, notice that we have different NVMe disks. Yours is a Hynix, mine is a Toshiba.

Given that your system still misbehaves even when you tear out pieces of the graphics stack, I'm wondering if perhaps we're looking at an APST (Autonomous Power State Transition) issue with that particular disk. My explanation would be that at runtime the disk typically doesn't end up resident in PS4, but during S2I it does.

Try booting with nvme_core's default_ps_max_latency_us set to 0 (nvme_core.default_ps_max_latency_us=0 on the kernel command line). In that boot, share your output from:

# nvme id-ctrl /dev/nvme0n1

Another way to check whether it's APST would be to break into your initramfs and invoke S2I from there. The root on /dev/nvme0n1 won't be mounted, so if it's a problem with the disk, hopefully it won't bring userspace down with it.
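As I understand the kernel's APST setup around 4.14: only non-operational power states are candidates, a state is used only if its entry plus exit latency (`enlat` + `exlat`, in microseconds) does not exceed `nvme_core.default_ps_max_latency_us` (default 100000), and a value of 0 turns APST off entirely. The sketch below applies that rule to the `ps` lines of `nvme id-ctrl` output. It illustrates the rule only; it is not the kernel's code, and `apst_candidates` is a hypothetical name.

```shell
# For each non-operational power state in `nvme id-ctrl` output (stdin),
# print its entry+exit latency and whether it fits under the latency
# budget given in microseconds (argument 1, default 100000).
apst_candidates() {
    max_us=${1:-100000}
    awk -v max="$max_us" '
        $1 == "ps" && /non-operational/ {
            enlat = exlat = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^enlat:/) enlat = substr($i, 7) + 0
                if ($i ~ /^exlat:/) exlat = substr($i, 7) + 0
            }
            total = enlat + exlat
            printf "ps%s total=%dus %s\n", $2, total, (total <= max ? "allowed" : "skipped")
        }
    '
}
```

For example, `sudo nvme id-ctrl /dev/nvme0n1 | apst_candidates 100000`. For a drive whose PS3 has `enlat:1000 exlat:1000` and whose PS4 has `enlat:1000 exlat:5000`, both states fit under the default budget, so PS4 would be reachable.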
(In reply to Mario Limonciello from comment #65)
> Thanks for sharing all that, I don't see anything different in module options.
>
> I do however notice that we have different NVMe disks. Yours is a Hynix, mine is a Toshiba.
>
> Given state of your system not being so great even when you tear out pieces of graphics, I'm wondering if perhaps we're looking at an APST issue with that particular disk. My explanation would be that you aren't going resident to PS4 at runtime typically, but S2I that does happen.
>
> Try to boot up with setting nvme core's default_ps_max_latency to 0.

Setting `nvme.default_ps_max_latency` on the Linux command line didn’t work. I guess it’s nvme-core, but I have no time to check now. Setting it on the running system via sysfs worked (with `_us` at the end), but resume didn’t work either.

> Share your output from: #nvme id-ctrl /dev/nvm0n1 in that boot.

```
NVME Identify Controller:
vid : 0x1c5c
ssvid : 0x1c5c
sn : FS6BN01071010B54P
mn : PC300 NVMe SK hynix 512GB
fr : 20004A00
rab : 1
ieee : ace42e
cmic : 0
mdts : 5
cntlid : 0
ver : 10200
rtd3r : 90f560
rtd3e : ea60
oaes : 0
oacs : 0x16
acl : 3
aerl : 3
frmw : 0x16
lpa : 0x2
elpe : 254
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 361
cctemp : 363
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1e
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
ps 0 : mp:5.87W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:2.40W operational enlat:30 exlat:30 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:1.90W operational enlat:100 exlat:100 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.1000W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0060W non-operational enlat:1000 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
```

Bingo! (I tried below first.)
> Another possibility to look at this if it's APST would be break into your initramfs and to invoke S2I from the initramfs. The root on /dev/nvmen0 won't be mounted, so hopefully if it's a problem with the disk it shouldn't bring userspace down with it.

I removed the options and booted with `ro init=/bin/sh no_console_suspend`. That gave the following output (without the timestamps).

```
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
nvme 0000:3c:00.0: Refused to change power state, currently in D3
nvme 0000:3c:00.0: Refused to change power state, currently in D3
nvme 0000:3c:00.0: Refused to change power state, currently in D3
nvme 0000:3c:00.0: Refused to change power state, currently in D3
nvme 0000:3c:00.0: Refused to change power state, currently in D3
nvme nvme0: Removing after probe failure status: -19
nvme0n1: detected capacity change from 512… to 0
ACPI: button: The lid device is not compliant to SW_LID.
psmouse serio1: synaptics: queried max coordinates: x [..5666], y [..4734]
psmouse sorio1: synaptics: queried min coordinates: x [1276..], y [1118..]
```

After that, several stack traces appeared.

Thorsten, Len, Rafael, what devices do you have?

Thank you, Mario. The latest theory looks promising. I am off over the weekend, though.
(In reply to Paul Menzel from comment #66)
> Thorsten, Len, Rafael, what devices do you have?

3c:00.0 Non-Volatile memory controller [0108]: Device [1c5c:1283]

For details see below.

FWIW: After using S2Idle with mainline for a few days now (earlier I had used S3), I got the impression that both S2Idle and S3 work unreliably on my machine since I switched to 4.14. Most of the time it resumes, but sometimes it doesn't (the screen stays blank). I could not find a way to reproduce it (but I haven't tried hard yet) :-/

FWIW: S3 worked fine with 4.13. Will keep an eye on it.

$ sudo nvme id-ctrl /dev/nvme0n1
NVME Identify Controller:
vid : 0x1c5c
ssvid : 0x1c5c
sn : FJ6BN19201CYCC32A
mn : PC300 NVMe SK hynix 256GB
fr : 20005A00
rab : 1
ieee : ace42e
cmic : 0
mdts : 5
cntlid : 0
ver : 10200
rtd3r : 90f560
rtd3e : ea60
oaes : 0
oacs : 0x16
acl : 3
aerl : 3
frmw : 0x16
lpa : 0x2
elpe : 254
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 361
cctemp : 363
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1e
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:5.87W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:2.40W operational enlat:30 exlat:30 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:1.90W operational enlat:100 exlat:100 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.1000W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0060W non-operational enlat:1000 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
Rafael, Mario, hopefully you have everything you need from my side and can come up with a fix. It would also be great if one of you could get the same drive that is built into the device I have access to here. In any case, please tell me if you need anything else or if I can test patches.
I don't think we have all the details needed yet.

Paul, I'm in particular having a hard time understanding your comment 66. Do you mean that switching the APST latency on the kernel command line fixed resume, or not?

From the past few comments I have to suspect that there was possibly a regression in the NVMe stack related to the Hynix SSD. Within the last few releases there were changes to this code, and there are now a bunch of special-cased quirks. Examples:
https://github.com/torvalds/linux/blob/master/drivers/nvme/host/core.c#L1698
https://github.com/torvalds/linux/blob/master/drivers/nvme/host/pci.c#L2287

It's entirely possible a similar quirk will be needed for the Hynix SSD.

Paul, can you please do the following bisection on your setup?
1) Start with 4.11-rc1.
2) Set the mem sleep default to s2idle using sysfs. If that doesn't exist in a given kernel, echo "freeze" into /sys/power/state.
3) Set keyboard wakeup.
4) Go to s2idle.
5) Wake up using the keyboard (the power button won't easily work for some kernels).

If you can bisect down to a given commit, that would point closer to where the regression in the NVMe stack was. If 4.11-rc1 also reproduces it, then it's not a regression in the NVMe stack, but it might still be a problem that needs to be quirked.
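Steps 2-4 can be sketched as one shell function that works on kernels both with and without `/sys/power/mem_sleep`. `enter_s2idle` and the `SYSFS_ROOT` override are hypothetical; on a real system it has to run as root, and the final write blocks until wakeup.

```shell
# Hypothetical helper for steps 2-4 above. SYSFS_ROOT defaults to /sys and
# exists only so the function can be dry-run against a scratch directory.
enter_s2idle() {
    root=${SYSFS_ROOT:-/sys}
    # Step 3: allow the i8042 keyboard to wake the system.
    echo enabled > "$root/devices/platform/i8042/serio0/power/wakeup" || return 1
    if [ -f "$root/power/mem_sleep" ] && grep -q s2idle "$root/power/mem_sleep"; then
        # Newer kernels: select s2idle as the "mem" variant, then suspend.
        echo s2idle > "$root/power/mem_sleep" || return 1
        echo mem > "$root/power/state"
    else
        # Older kernels: "freeze" is the name for suspend-to-idle.
        echo freeze > "$root/power/state"
    fi
}
```

Between kernels, `git bisect start v4.14-rc3 v4.11-rc1` (bad revision first, then good), followed by `git bisect good` or `git bisect bad` after each test, would narrow the range, assuming 4.11-rc1 turns out to be good.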
(In reply to Mario Limonciello from comment #69)
> I don't think we have all the details needed yet.
>
> Paul I'm in particular having a hard time understanding your comment 66.
> You mean that switching the APST latency on kernel command line did work to fix resume or not?

No, it did *not* work. Today, I tried it again by setting `nvme.default_ps_max_latency_us=0` on the Linux kernel command line, and verified that this value is stored in `/sys`.

> From the past few comments I have to suspect that there was possibly a regression in the NVMe stack with relation to the Hynix SSD.

At least with my particular model (512 GB), as it works with Thorsten’s 256 GB device.

> Within the last few releases there was changes to this stuff, and there are now a bunch of special cased quirks.
> Examples:
> https://github.com/torvalds/linux/blob/master/drivers/nvme/host/core.c#L1698
> https://github.com/torvalds/linux/blob/master/drivers/nvme/host/pci.c#L2287
>
> It's entirely possible a similar quirk will be needed for Hynix SSD.

Well, it works with Microsoft Windows, doesn’t it?

> Paul can you please do the following bisection on your setup?
> 1) Start with 4.11-rc1
> 2) Set the mem sleep default using sysfs to s2idle. If it doesn't exist in a given kernel, echo "freeze" into sys/power/state.
> 3) Set keyboard wakeup.
> 4) Go to s2idle
> 5) Wakeup using keyboard (power button won't easily work for some kernels).
>
> If you can bisect down to a given commit that would point closer to where the regression in NVMe stack was. If 4.11-rc1 also reproduces it then it's not a regression in the NVMe stack, but it might still be a problem that needs to be quirked.

I’ll do that test, but it’d be great if you got yourself such a Hynix SSD to test with.
> No, it did *not* work. Today, I tried it again by setting
> `nvme.default_ps_max_latency_us=0` on the Linux kernel command line, and
> verifying that this value is stored in `/sys`.

OK, that's unfortunate. So it's not APST, but it is something else NVMe related.

> Well, it works with Microsoft Windows, doesn’t it?

So? What does that have to do with a problem in the Linux kernel? It doesn't mean that there aren't quirks like those I linked above in the drivers used with Windows. Also, the exact code paths and timing used in Windows definitely DO vary. The Windows inbox NVMe stack runs power management differently, and Intel's RAID driver (Rapid Storage Technology) makes different policy decisions than the Linux NVMe stack does. I don't think comparing this to Windows is useful.

> I’ll do that test, but it’d be great if you got yourself such a Hynix SSD to
> test with.

I'm looking around, but so far all of the SSDs I've got access to are Toshiba and Samsung.

@Rafael, can you add some of the NVMe folks to this issue? I think they would be able to get to the bottom of it more effectively than we can.
Created attachment 258775
Linux 4.11-rc1 messages (dmesg) from two suspends (s2idle)

(In reply to Paul Menzel from comment #70)
> (In reply to Mario Limonciello from comment #69)
[…]
> > Paul can you please do the following bisection on your setup?
> > 1) Start with 4.11-rc1
> > 2) Set the mem sleep default using sysfs to s2idle. If it doesn't exist in
> > a given kernel, echo "freeze" into sys/power/state.
> > 3) Set keyboard wakeup.
> > 4) Go to s2idle
> > 5) Wakeup using keyboard (power button won't easily work for some kernels).
> >
> > If you can bisect down to a given commit that would point closer to where
> > the regression in NVMe stack was. If 4.11-rc1 also reproduces it then it's
> > not a regression in the NVMe stack, but it might still be a problem that
> > needs to be quirked.
>
> I’ll do that test, […]

I built Linux 4.11-rc1 and tested s2idle as described in your steps. The power button LED doesn’t turn off, but after a flicker the screen goes black, and pressing a key wakes the system up. According to the attached Linux messages, the system actually slept, but I am not sure, because the LED didn’t turn off.
LED not turning off with 4.11-rc1 is to be expected. Some code added between then and now adjusts the behavior of the EC. So that does mean there is an NVMe regression between 4.11-rc1 and now that caused this behavior. Can you proceed with the bisect?
The SSD in my 9360 is a Samsung one.
@Paul: Since we already know that it is failing in 4.13, I would bisect the non-merge changes in drivers/nvme/ between 4.11-rc1 and 4.13 to start with. That's around 240 commits, so it should take just a few steps.
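As a rough check of "just a few steps": each bisection round halves the candidate range, so for about N = 240 commits (the estimate quoted above) the number of kernels to build and test is about the ceiling of log2 N. Restricting the search to the directory could be done with something like `git bisect start v4.13 v4.11-rc1 -- drivers/nvme/`, which makes git consider only commits touching that path; this is an illustration, not the command used in this report.

```latex
\[
  \text{steps} \approx \lceil \log_2 N \rceil, \qquad
  2^7 = 128 < 240 \le 256 = 2^8
  \;\Longrightarrow\; \lceil \log_2 240 \rceil = 8
\]
```

So roughly eight builds, possibly a few more if some revisions have to be skipped.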
Keith (CCed) is advising me that default_ps_max_latency_us is a parameter of nvme_core, not nvme. @Paul: Can you please test again with nvme_core.default_ps_max_latency_us=0 and see if that makes any difference?
(In reply to Rafael J. Wysocki from comment #76)
> Keith (CCed) is advising me that default_ps_max_latency_us is a parameter of
> nvme_core, not nvme.

Yes, I mentioned that in my comment.

> @Paul: Can you please test again with nvme_core.default_ps_max_latency_us=0
> and see if that makes any difference?

Sorry, my bad: I copied it from the laptop screen by typing it on a different computer. In comment #70 I actually did use `nvme_core`, so it should have read:

> No, it did *not* work. Today, I tried it again by setting
> `nvme_core.default_ps_max_latency_us=0` on the Linux kernel command line,
> and verifying that this value is stored in `/sys`.

Sorry for the confusion.
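To double-check which value the running kernel actually uses, the parameter can be read back from sysfs. A minimal sketch, assuming the parameter is exposed as /sys/module/nvme_core/parameters/default_ps_max_latency_us (module parameters with non-zero permissions appear under /sys/module/<module>/parameters/). The module name in that path is the same `nvme_core` prefix that has to be used on the kernel command line. `parse_ulong_param()` and `read_ulong_param()` are hypothetical helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a numeric module parameter as read from sysfs, for
 * example "0\n". Returns 0 and stores the value on success, -1 otherwise.
 */
int parse_ulong_param(const char *text, unsigned long *out)
{
	char *end;
	unsigned long v;

	assert(text != NULL && out != NULL);
	if (*text == '-')
		return -1;	/* strtoul() would silently wrap negatives */
	errno = 0;
	v = strtoul(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	/* Allow only the trailing newline/whitespace that sysfs appends. */
	while (*end == ' ' || *end == '\t' || *end == '\n')
		end++;
	if (*end != '\0')
		return -1;
	*out = v;
	return 0;
}

/* Read and parse a parameter file such as
 * /sys/module/nvme_core/parameters/default_ps_max_latency_us. */
int read_ulong_param(const char *path, unsigned long *out)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	char *line;

	if (!f)
		return -1;
	line = fgets(buf, sizeof(buf), f);
	fclose(f);
	return line ? parse_ulong_param(buf, out) : -1;
}
```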
Here's my test result: I have a TOSHIBA 256 GB NVMe SSD, and s2idle works well on top of 4.14-rc4.

NVME Identify Controller:
vid : 0x1179
ssvid : 0x1179
sn : Y6TS10UGT18T
mn : THNSN5256GPUK NVMe TOSHIBA 256GB
fr : 5KDA4101
rab : 1
ieee : 00080d
cmic : 0
mdts : 0
cntlid : 0
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
oacs : 0x17
acl : 3
aerl : 3
frmw : 0x2
lpa : 0x2
elpe : 127
npss : 4
avscc : 0
apsta : 0x1
wctemp : 351
cctemp : 355
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1e
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 0
acwu : 0
sgls : 0
ps 0 : mp:6.00W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:2.40W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:1.90W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0120W non-operational enlat:5000 exlat:25000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0060W non-operational enlat:100000 exlat:70000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-
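The power-state table above also shows what APST can do with this drive. The sketch below models the heuristic that, to my understanding, nvme_configure_apst() in drivers/nvme/host/core.c applied around 4.14: only non-operational states whose exit latency does not exceed the maximum latency (default_ps_max_latency_us, 100000 µs by default as far as I know) are candidates, the controller idles for (enlat + exlat) / 20 ms before entering one (about 2% transition overhead), and a limit of 0 disables APST. It is a simplified approximation: the real code builds a per-state transition table and honours quirks such as NVME_QUIRK_NO_DEEPEST_PS. `struct nvme_ps` and `apst_deepest_state()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One Identify Controller power-state descriptor (latencies in microseconds). */
struct nvme_ps {
	bool non_operational;
	unsigned long enlat_us;
	unsigned long exlat_us;
};

/*
 * Return the deepest state APST could autonomously enter under this
 * simplified heuristic, or -1 if none (or if APST is disabled). On success,
 * *idle_ms receives the idle time before the transition: (enlat + exlat) / 20
 * milliseconds, rounded up, so that transitions cost roughly 2% of the time.
 */
int apst_deepest_state(const struct nvme_ps *ps, size_t nps,
		       unsigned long max_latency_us, unsigned long *idle_ms)
{
	assert(ps != NULL && idle_ms != NULL);
	if (max_latency_us == 0)
		return -1;	/* a limit of 0 turns APST off */
	/* Walk from the lowest-power (highest-numbered) state upwards. */
	for (size_t i = nps; i-- > 0;) {
		if (!ps[i].non_operational || ps[i].exlat_us > max_latency_us)
			continue;
		*idle_ms = (ps[i].enlat_us + ps[i].exlat_us + 19) / 20;
		return (int)i;
	}
	return -1;
}
```

For the Toshiba table above, a 100000 µs limit selects ps 4 after 8500 ms of idle time, while a 50000 µs limit falls back to ps 3 after 1500 ms.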
It's a little weird that the 'devices' and 'platform' modes work, as real suspend-to-idle does mostly the same thing as 'platform', at least for device drivers.
Okay, thanks for confirming the param. I thought it was likely a mistake, since you mentioned it was confirmed in sysfs. I can't think of anything else particularly obvious that changed in the nvme driver that could account for this. Maybe something in PCIe changed? That might make the bisect take more steps if we need to account for that.
(In reply to Chen Yu from comment #79)
> It's a little weird that 'devices' and 'platforms' mode works, as the real
> suspend-to-idle did mostly the same thing as 'platform', at lease for device
> drivers.

"platform" only worked once in a row, however.
@Paul: We have a regression that broke s2idle for you between 4.11-rc1 and 4.13, and it appears to be related to NVMe. I guess at this point the most straightforward way to find the problematic change is to carry out a bisection, especially since you seem to be able to reproduce the issue 100% of the time.
(In reply to Rafael J. Wysocki from comment #82)
> @Paul: We have a regression that broke s2idle for you between 4.11-rc1 and
> 4.13 and it appears to be related to NVMe. I guess at this point the most
> straightforward way to find the problematic change is to carry out a
> bisection, especially that you seem to be able to reproduce the issue 100%
> of the time.

Sorry, debugging this has already put me behind schedule with other tasks, so I won’t be able to get to this until the end of next week. If you know of a service that builds a kernel from an uploaded config and revision and provides it through a repository, that would be great.
(In reply to Paul Menzel from comment #83)
> (In reply to Rafael J. Wysocki from comment #82)
> > @Paul: We have a regression that broke s2idle for you between 4.11-rc1 and
> > 4.13 and it appears to be related to NVMe. I guess at this point the most
> > straightforward way to find the problematic change is to carry out a
> > bisection, especially that you seem to be able to reproduce the issue 100%
> > of the time.
>
> Sorry, debugging this already brought me behind schedule with other tasks,
> so I won’t be able to get to this until the end of next week.

That's OK, please take your time.

> If you know a service, building me uploaded configs and a revision, and
> providing that over a repository, that would be great.

You can look at SUSE's Open Build Service (OBS), but (disclaimer) I haven't used it myself for quite a while, so I'm not sure how suitable it really is.
(In reply to Rafael J. Wysocki from comment #84)
> (In reply to Paul Menzel from comment #83)
> > (In reply to Rafael J. Wysocki from comment #82)
> > > @Paul: We have a regression that broke s2idle for you between 4.11-rc1 and
> > > 4.13 and it appears to be related to NVMe. I guess at this point the most
> > > straightforward way to find the problematic change is to carry out a
> > > bisection, especially that you seem to be able to reproduce the issue 100%
> > > of the time.
> >
> > Sorry, debugging this already brought me behind schedule with other tasks,
> > so I won’t be able to get to this until the end of next week.
>
> That's OK, please take your time.

Just a small update: the regression happened between 4.12 and 4.13-rc1.

> > If you know a service, building me uploaded configs and a revision, and
> > providing that over a repository, that would be great.
>
> You can look at the SUSE's Open Build Service (OBS), but (disclaimer) I
> haven't used it myself for quite a while, so not sure how suitable that is
> really.

Also, Ubuntu builds each Linux kernel and puts it in some repository, doesn’t it? It’d be great if somebody knew how to access these packages easily.
(In reply to Paul Menzel from comment #85)
> (In reply to Rafael J. Wysocki from comment #84)
> > (In reply to Paul Menzel from comment #83)
> > > (In reply to Rafael J. Wysocki from comment #82)
> > > > @Paul: We have a regression that broke s2idle for you between 4.11-rc1 and
> > > > 4.13 and it appears to be related to NVMe. I guess at this point the most
> > > > straightforward way to find the problematic change is to carry out a
> > > > bisection, especially that you seem to be able to reproduce the issue 100%
> > > > of the time.
> > >
> > > Sorry, debugging this already brought me behind schedule with other tasks,
> > > so I won’t be able to get to this until the end of next week.
> >
> > That's OK, please take your time.
>
> Just a small update. The regression happened between 4.12 and 4.13-rc1.

Thanks!

> > > If you know a service, building me uploaded configs and a revision, and
> > > providing that over a repository, that would be great.
> >
> > You can look at the SUSE's Open Build Service (OBS), but (disclaimer) I
> > haven't used it myself for quite a while, so not sure how suitable that is
> > really.
>
> Also, Ubuntu builds each Linux kernel in some repository, don’t they. It’d
> be great if somebody knew, how to access these packages easily.

OBS could build Ubuntu packages too the last time I looked at it, though.
> Also, Ubuntu builds each Linux kernel in some repository, don’t they. It’d
> be great if somebody knew, how to access these packages easily.

As noted on the Ubuntu kernel team wiki's page:
https://wiki.ubuntu.com/Kernel/MainlineBuilds

There is an archive with debs here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D

I suppose it would be good to confirm with those builds that 4.12 works and 4.13-rc1 fails, since they use a different kernel config than the one you are using.
Here is the result.

```
$ git bisect bad
8110dd281e155e5010ffd657bba4742ebef7a93f is the first bad commit
commit 8110dd281e155e5010ffd657bba4742ebef7a93f
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Fri Jun 23 15:24:32 2017 +0200

    ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems

    Some recent Dell laptops, including the XPS13 model numbers 9360 and 9365, cannot be woken up from suspend-to-idle by pressing the power button which is unexpected and makes that feature less usable on those systems.

    Moreover, on the 9365 ACPI S3 (suspend-to-RAM) is not expected to be used at all (the OS these systems ship with never exercises the ACPI S3 path in the firmware) and suspend-to-idle is the only viable system suspend mechanism there.

    The reason why the power button wakeup from suspend-to-idle doesn't work on those systems is because their power button events are signaled by the EC (Embedded Controller), whose GPE (General Purpose Event) line is disabled during suspend-to-idle transitions in Linux.

    That is done on purpose, because in general the EC tends to be noisy for various reasons (battery and thermal updates and similar, for example) and all events signaled by it would kick the CPUs out of deep idle states while in suspend-to-idle, which effectively might defeat its purpose.

    Of course, on the Dell systems in question the EC GPE must be enabled during suspend-to-idle transitions for the button press events to be signaled while suspended at all, but fortunately there is a way out of this puzzle.

    First of all, those systems have the ACPI_FADT_LOW_POWER_S0 flag set in their ACPI tables, which means that the OS is expected to prefer the "low power S0 idle" system state over ACPI S3 on them. That causes the most recent versions of other OSes to simply ignore ACPI S3 on those systems, so it is reasonable to expect that it should not be necessary to block GPEs during suspend-to-idle on them.

    Second, in addition to that, the systems in question provide a special firmware interface that can be used to indicate to the platform that the OS is transitioning into a system-wide low-power state in which certain types of activity are not desirable or that it is leaving such a state and that (in principle) should allow the platform to adjust its operation mode accordingly.

    That interface is a special _DSM object under a System Power Management Controller device (PNP0D80). The expected way to use it is to invoke function 0 from it on system initialization, functions 3 and 5 during suspend transitions and functions 4 and 6 during resume transitions (to reverse the actions carried out by the former).

    In particular, function 5 from the "Low-Power S0" device _DSM is expected to cause the platform to put itself into a low-power operation mode which should include making the EC less verbose (so to speak). Next, on resume, function 6 switches the platform back to the "working-state" operation mode.

    In accordance with the above, modify the ACPI suspend-to-idle code to look for the "Low-Power S0" _DSM interface on platforms with the ACPI_FADT_LOW_POWER_S0 flag set in the ACPI tables. If it's there, use it during suspend-to-idle transitions as prescribed and avoid changing the GPE configuration in that case. [That should reflect what the most recent versions of other OSes do.]

    Also modify the ACPI EC driver to make it handle events during suspend-to-idle in the usual way if the "Low-Power S0" _DSM interface is going to be used to make the power button events work while suspended on the Dell machines mentioned above

    Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

:040000 040000 ed00fcd01bf8bb1db1026b16701996e5af257cf7 ca1507d67f8ac3b4d5e2fe9e0aba466cb52a1f15 M	drivers
```

Here is the bisection log.
```
$ git bisect log
# bad: [5771a8c08880cdca3bfb4a3fc6d309d6bba20877] Linux v4.13-rc1
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect start 'v4.13-rc1' 'v4.12'
# bad: [e5f76a2e0e84ca2a215ecbf6feae88780d055c56] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
git bisect bad e5f76a2e0e84ca2a215ecbf6feae88780d055c56
# bad: [1849f800fba32cd5a0b647f824f11426b85310d8] Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect bad 1849f800fba32cd5a0b647f824f11426b85310d8
# good: [cbcd4f08aa637b74f575268770da86a00fabde6d] Merge tag 'staging-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good cbcd4f08aa637b74f575268770da86a00fabde6d
# bad: [408c9861c6979db974455b9e7a9bcadd60e0934c] Merge tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 408c9861c6979db974455b9e7a9bcadd60e0934c
# good: [650fc870a2ef35b83397eebd35b8c8df211bff78] Merge tag 'docs-4.13' of git://git.lwn.net/linux
git bisect good 650fc870a2ef35b83397eebd35b8c8df211bff78
# good: [d62eb5edf6643ede7e48b4d03ba972c0e8949acc] Merge tag 'regulator-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
git bisect good d62eb5edf6643ede7e48b4d03ba972c0e8949acc
# bad: [8f8e5c3e2796eaf150d6262115af12707c2616dd] Merge branch 'acpi-pm'
git bisect bad 8f8e5c3e2796eaf150d6262115af12707c2616dd
# good: [301f8d7463b1f3d1fdb56ee1cb4abb674094531d] Merge branch 'pm-sleep'
git bisect good 301f8d7463b1f3d1fdb56ee1cb4abb674094531d
# good: [9a5f2c871af4cf6bd63ddb20061faa7049103350] Merge branches 'pm-domains', 'pm-avs' and 'powercap'
git bisect good 9a5f2c871af4cf6bd63ddb20061faa7049103350
# good: [d07ff6523b1ed24d636365f8479b0db70946dc14] Merge branch 'uuid-types'
git bisect good d07ff6523b1ed24d636365f8479b0db70946dc14
# bad: [a1a66393e39a97433bcc1737133ba7478993d247] ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags
git bisect bad a1a66393e39a97433bcc1737133ba7478993d247
# good: [ef884112e55c60d9e208b6524ae1841ae7e2fb2c] platform: x86: intel-hid: Wake up the system from suspend-to-idle
git bisect good ef884112e55c60d9e208b6524ae1841ae7e2fb2c
# bad: [8110dd281e155e5010ffd657bba4742ebef7a93f] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems
git bisect bad 8110dd281e155e5010ffd657bba4742ebef7a93f
# first bad commit: [8110dd281e155e5010ffd657bba4742ebef7a93f] ACPI / sleep: EC-based wakeup from suspend-to-idle on recent systems
```
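The commit message quoted above describes a fixed calling sequence for the "Low-Power S0" _DSM. A minimal C sketch of that sequence, assuming the usual _DSM convention that function 0 returns a bitmask of supported functions (bit n set means function n is implemented) and that resume undoes suspend in reverse order (6, then 4). The enum names for functions 3 and 4 follow the "screen off/on" naming I remember from the kernel's ACPI_LPS0_* constants; the enum and `lps0_sequence()` are illustrative, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* "Low-Power S0" _DSM function indices, as listed in the commit message. */
enum lps0_func {
	LPS0_QUERY = 0,		/* evaluated at init; returns the supported-function mask */
	LPS0_SCREEN_OFF = 3,
	LPS0_SCREEN_ON = 4,
	LPS0_ENTRY = 5,
	LPS0_EXIT = 6,
};

/*
 * Store in `out` the _DSM functions to evaluate for one transition, in
 * order, skipping any that `mask` (bit n set = function n supported) does
 * not advertise. Returns how many were stored (0 to 2).
 */
size_t lps0_sequence(unsigned int mask, int suspending, int out[2])
{
	static const int on_suspend[2] = { LPS0_SCREEN_OFF, LPS0_ENTRY };
	static const int on_resume[2] = { LPS0_EXIT, LPS0_SCREEN_ON };
	const int *seq = suspending ? on_suspend : on_resume;
	size_t n = 0;

	assert(out != NULL);
	for (size_t i = 0; i < 2; i++)
		if (mask & (1u << seq[i]))
			out[n++] = seq[i];
	return n;
}
```

Checking the mask before each call mirrors the idea that a platform may implement only some of the functions.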
Thanks, I'm glad you did a bisect. Frankly, that's really surprising to me (as I'm sure it will be to Rafael as well). That particular commit is used to exercise an AML code path that puts the EC into low power mode, and when this happens the EC turns off the power LED. The EC doesn't do anything to the NVMe disk, so this result is fairly odd to me.

Did you by chance purchase your XPS 9360 with Windows and set up a dual boot? If so, it would be interesting to know whether Windows also fails in this area (which would most likely indicate failing NVMe hardware). From an ACPI and EC perspective, the AML code path exercised by this commit is actually the same one Windows uses when the machine enters the low power idle resiliency phase.
Well, it may be surprising, but it doesn't leave a lot of room for what can be done to address it. It looks like the 9360 should be blacklisted from using the "Low Power Idle S0" _DSM by default, which will also make it use S3 instead of suspend-to-idle by default. That actually shouldn't hurt people, as we know that the 9360 *can* do S3. I'll prepare a patch for that tomorrow.
Thanks Rafael, I agree that's the right solution for 4.14. I'd still like to get to the bottom of this on the 9360, though, so can we keep this open to try to root-cause it?
Unfortunately, reverting that patch is not easy on current master. Could you please provide a patch reverting it?

Also, skimming the commit [1], I do not see how it only affects the embedded controller, as the _DSM stuff seems to be a generic mechanism affecting all components [2].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8110dd281e155e5010ffd657bba4742ebef7a93f
[2] http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf (No HTTPS.)
I don't believe reverting that patch is the proper solution. As Rafael said, we can just blacklist all XPS 9360s. I wish we could blacklist only your SSD + system combination, as everyone else with a 9360 has had success, but I don't believe that's easily doable. That particular patch *does* fix (at least) 3 other systems that I know about. So until we know why your SSD + system combination has problems, that's the best solution.

You'd have to walk down your ASL execution path to see the connection on affected devices. The _DSM method on the PEPD device for the 9360, when called with the arguments in that patch, will cause the GUAM method to be executed. The GUAM method will send a message to the EC to go into or leave low power mode. Low power mode is also when the power LED is turned off.
Created attachment 260361
ACPI / PM: Prevent some Dells XPS13 9360 from using Low Power S0 Idle

Reverting commits is not the only way to make things work. :-)

Can you please check if this causes your machine to go back to using S3? If so, suspend-to-idle should work again, too.
(In reply to Paul Menzel from comment #92)
> Unfortunately, reverting that patch is not easy on current master. Could you
> please provide a patch reverting this?
>
> Also, skimming the commit [1], I do not see how it is just affecting the
> embedded controller, as the _DSM stuff seems to be a generic mechanism
> affecting all components [2].

That actually depends on how the _DSM is implemented, and Mario explained what it really does in comment #89.
(In reply to Rafael J. Wysocki from comment #94)
> Created attachment 260361
> ACPI / PM: Prevent some Dells XPS13 9360 from using Low Power S0 Idle
>
> Reverting commits is not the only way to make things work. :-)

I wanted to revert the patch on master to confirm that the bisected commit is really the culprit.

> Can you please check if this causes your machine go back to using S3? If
> so, suspend-to-idle should work again, too.

Sure, I’ll try to get to it as soon as possible.

Additionally, it looks like there is some ACPI debug message switch I need to toggle to get more information about what happens in the culprit commit.
Yes, you probably mean /sys/power/pm_debug_messages
Here is the log with your patch. Despite `echo 1 | sudo tee /sys/power/pm_debug_messages`, there do not seem to be any debug messages written. The power button LED doesn’t turn off, but that seems to be expected.

```
$ journalctl -k
[…]
Okt 25 13:14:55 Ixpees kernel: PM: suspend entry (s2idle)
Okt 25 13:14:55 Ixpees kernel: PM: Syncing filesystems ... done.
Okt 25 13:14:55 Ixpees kernel: PM: Preparing system for sleep (s2idle)
Okt 25 13:15:41 Ixpees kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
Okt 25 13:15:41 Ixpees kernel: OOM killer disabled.
Okt 25 13:15:41 Ixpees kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Okt 25 13:15:41 Ixpees kernel: PM: Suspending system (s2idle)
Okt 25 13:15:41 Ixpees kernel: Suspending console(s) (use no_console_suspend to debug)
Okt 25 13:15:41 Ixpees kernel: psmouse serio1: Failed to disable mouse on isa0060/serio1
Okt 25 13:15:41 Ixpees kernel: PM: suspend of devices complete after 2357.524 msecs
Okt 25 13:15:41 Ixpees kernel: PM: late suspend of devices complete after 22.002 msecs
Okt 25 13:15:41 Ixpees kernel: PM: suspend-to-idle
Okt 25 13:15:41 Ixpees kernel: PM: noirq suspend of devices complete after 39.717 msecs
Okt 25 13:15:41 Ixpees kernel: PM: Timekeeping suspended for 9.019 seconds
Okt 25 13:15:41 Ixpees kernel: PM: Timekeeping suspended for 15.999 seconds
Okt 25 13:15:41 Ixpees kernel: PM: Timekeeping suspended for 14.999 seconds
Okt 25 13:15:41 Ixpees kernel: PM: Timekeeping suspended for 2.999 seconds
Okt 25 13:15:41 Ixpees kernel: PM: noirq resume of devices complete after 45.780 msecs
Okt 25 13:15:41 Ixpees kernel: PM: resume from suspend-to-idle
Okt 25 13:15:41 Ixpees kernel: PM: early resume of devices complete after 2.958 msecs
Okt 25 13:15:41 Ixpees kernel: ACPI: button: The lid device is not compliant to SW_LID.
Okt 25 13:15:41 Ixpees kernel: usb 1-3: reset full-speed USB device number 2 using xhci_hcd
Okt 25 13:15:41 Ixpees kernel: PM: resume of devices complete after 336.913 msecs
Okt 25 13:15:41 Ixpees kernel: usb 1-3:1.0: rebind failed: -517
Okt 25 13:15:41 Ixpees kernel: usb 1-3:1.1: rebind failed: -517
Okt 25 13:15:41 Ixpees kernel: PM: Finishing wakeup.
Okt 25 13:15:41 Ixpees kernel: OOM killer enabled.
Okt 25 13:15:41 Ixpees kernel: Restarting tasks ...
Okt 25 13:15:41 Ixpees kernel: audit: type=1400 audit(1508930141.379:127): apparmor="DENIED" operation="create" profile="/usr/sbin/cups-browsed" pid=948 comm="cups-browsed" fam
Okt 25 13:15:41 Ixpees kernel: done.
Okt 25 13:15:41 Ixpees kernel: [drm] RC6 on
Okt 25 13:15:41 Ixpees kernel: PM: suspend exit
[…]
```

Two remarks regarding the commit.

1. Is it possible to add a message to the log when the quirk is applied?
2. Please do not make the quirk depend on the firmware version. Keep in mind that if s2idle doesn’t work on a system, it can cause data loss when the user didn’t save their work before suspending, something that worked before.
3. Also, it’d be great to be able to configure at run time whether the quirk is applied. That way, people who don’t need the quirk can easily override it without having to recompile their Linux kernel.
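One way to quantify the `journalctl -k` excerpt in comment #98: the four "PM: Timekeeping suspended for X seconds" lines add up to about 43.0 s, and together with the 2.36 s device-suspend time reported in the same log that is close to the 46 s between the 13:14:55 and 13:15:41 journal timestamps. A plausible reading, offered as an assumption rather than a conclusion of this thread, is that much of that gap was time spent suspended rather than time spent suspending, since journald can only pick up kernel messages again after user space is thawed. The hypothetical helper `timekeeping_suspended_total()` sums those lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the X values from "PM: Timekeeping suspended for X seconds" lines.
 * Each such line reports one interval during which timekeeping was
 * suspended; lines without the phrase are ignored.
 */
double timekeeping_suspended_total(const char *const *lines, size_t n)
{
	static const char key[] = "Timekeeping suspended for ";
	double total = 0.0;

	assert(lines != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const char *p = strstr(lines[i], key);

		if (p)
			total += strtod(p + sizeof(key) - 1, NULL);
	}
	return total;
}
```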
(In reply to Rafael J. Wysocki from comment #97)
> Yes, you probably mean /sys/power/pm_debug_messages

Are you sure that is what gets messages like the one below printed?

```
acpi_handle_debug(adev->handle, "_DSM function mask: 0x%x\n", bitmask);
```
(In reply to Paul Menzel from comment #98)
> Here is the log with your patch. Despite `echo 1 | sudo tee
> /sys/power/pm_debug_messages`, there do not seem to be any debug messages
> written. The power button LED doesn’t turn off, but that seems to be
> expected.
>
> ```
> $ journalctl -k
> […]
> Okt 25 13:14:55 Ixpees kernel: PM: suspend entry (s2idle)
> Okt 25 13:14:55 Ixpees kernel: PM: Syncing filesystems ... done.
> Okt 25 13:14:55 Ixpees kernel: PM: Preparing system for sleep (s2idle)
> Okt 25 13:15:41 Ixpees kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
> Okt 25 13:15:41 Ixpees kernel: OOM killer disabled.
> Okt 25 13:15:41 Ixpees kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> Okt 25 13:15:41 Ixpees kernel: PM: Suspending system (s2idle)
> Okt 25 13:15:41 Ixpees kernel: Suspending console(s) (use no_console_suspend to debug)
> Okt 25 13:15:41 Ixpees kernel: psmouse serio1: Failed to disable mouse on isa0060/serio1
> Okt 25 13:15:41 Ixpees kernel: PM: suspend of devices complete after 2357.524 msecs

Just another note: it took the system quite a while to suspend. Here you can see that almost a minute passed after 13:14:55.

> […]
> ```
Created attachment 260399
Picture of Linux messages with Linux 4.14-rc6 built from Ubuntu

Here is a picture of a boot with Linux 4.14-rc6 built by the Ubuntu Linux kernel team (that means *without* the revert commit), with `no_console_suspend init=/bin/bash` on the Linux command line, followed by the two commands below.

```
# echo 1 > /sys/power/pm_debug_messages
# echo mem > /sys/power/state
```

Note that the power button only woke the system up after I pressed it for several seconds. :(

The next picture shows the rest of the messages.
Created attachment 260401
Picture of Linux messages with Linux 4.14-rc6 built from Ubuntu [2/2]
(In reply to Paul Menzel from comment #98)
> Here is the log with your patch. Despite `echo 1 | sudo tee
> /sys/power/pm_debug_messages`, there do not seem to be any debug messages
> written. The power button LED doesn’t turn off, but that seems to be
> expected.
>
> ```
> $ journalctl -k
> […]
> Okt 25 13:14:55 Ixpees kernel: PM: suspend entry (s2idle)
> Okt 25 13:14:55 Ixpees kernel: PM: Syncing filesystems ... done.
> Okt 25 13:14:55 Ixpees kernel: PM: Preparing system for sleep (s2idle)

Well, this is confusing, because this means suspend-to-idle is in use. Did you trigger this with "echo mem > /sys/power/state"? If so, can you please check what's in /sys/power/mem_sleep after a fresh boot with the patch applied? [I'm assuming that you tested 4.14-rc6 with the patch on top.]

> Okt 25 13:15:41 Ixpees kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
> Okt 25 13:15:41 Ixpees kernel: OOM killer disabled.
> Okt 25 13:15:41 Ixpees kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> […]
> Okt 25 13:15:41 Ixpees kernel: [drm] RC6 on
> Okt 25 13:15:41 Ixpees kernel: PM: suspend exit
> […]
> ```

OK, so it didn't crash and the power LED was not off, so the blacklisting seems to be effective, but then S3 should be the default suspend method too.

> Two remarks regarding the commit.
>
> 1. Is it possible to add a message to the log, that the quirk is applied?

Yes.

> 2. Please do not make the quirk depend on the firmware version. Keep in
> mind, that s2idle doesn’t work on a system, it can cause data loss, when the
> user didn’t save the work before suspend, which worked before.

At this time I have no reason to believe that upgrading the firmware will not make the issue go away on your system. At the same time, I want the quirk to only affect systems that need to be quirked.
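The firmware-version question comes down to how a quirk entry matches a machine. A minimal userspace sketch of such a match, assuming DMI strings like those in /sys/class/dmi/id/sys_vendor, product_name and bios_version, substring matching for vendor and product (as the kernel's DMI_MATCH() does), and an optional exact BIOS version where NULL means "any firmware". The table contents and `lps0_quirked()` are hypothetical and are not the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One quirk entry; a NULL bios_version matches every firmware version. */
struct lps0_quirk {
	const char *sys_vendor;
	const char *product_name;
	const char *bios_version;
};

/* Hypothetical table holding the machine discussed in this report. */
static const struct lps0_quirk no_lps0_quirks[] = {
	{ "Dell Inc.", "XPS 13 9360", NULL },
};

/*
 * Return true if the LPS0 _DSM should be skipped on this machine. Vendor and
 * product are substring matches (like DMI_MATCH()); a BIOS version, when an
 * entry names one, must match exactly.
 */
bool lps0_quirked(const char *vendor, const char *product, const char *bios)
{
	size_t n = sizeof(no_lps0_quirks) / sizeof(no_lps0_quirks[0]);

	assert(vendor && product && bios);
	for (size_t i = 0; i < n; i++) {
		const struct lps0_quirk *q = &no_lps0_quirks[i];

		if (!strstr(vendor, q->sys_vendor) || !strstr(product, q->product_name))
			continue;
		if (q->bios_version && strcmp(bios, q->bios_version) != 0)
			continue;
		return true;	/* comment #98 asked for a log message at this point */
	}
	return false;
}
```

Substring matching also tolerates the trailing newline that the sysfs DMI files contain.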
(In reply to Paul Menzel from comment #99)
> 3. Also, it’d be great to be able to run-time configure, if the quirk should
> be applied or not. That way, people, where the quirk is not needed can
> easily override it without having to recompile their Linux kernel.

OK, but let's first make sure that it actually works.
(In reply to Paul Menzel from comment #100)
> (In reply to Rafael J. Wysocki from comment #97)
> > Yes, you probably mean /sys/power/pm_debug_messages
>
> Are you sure, that is for getting messages like below printed?
>
> ```
> acpi_handle_debug(adev->handle, "_DSM function mask: 0x%x\n",
> bitmask);
> ```

No, it isn't. For that to be printed, you need to enable dynamic debug in sleep.c.
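For reference, dynamic debug is driven by queries such as `file drivers/acpi/sleep.c +p` written to /sys/kernel/debug/dynamic_debug/control, assuming CONFIG_DYNAMIC_DEBUG=y and debugfs mounted at /sys/kernel/debug. As far as I can tell, the "_DSM function mask" line is printed when the LPS0 device is attached during boot, so passing the same query as `dyndbg="file drivers/acpi/sleep.c +p"` on the kernel command line is probably the more useful route; the C helpers (hypothetical names) only cover the runtime case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumes CONFIG_DYNAMIC_DEBUG=y and debugfs mounted at /sys/kernel/debug. */
#define DYNDBG_CONTROL "/sys/kernel/debug/dynamic_debug/control"

/*
 * Format a query that enables the debug messages of one source file.
 * Returns the query length, or -1 if it does not fit in `buf`.
 */
int dyndbg_enable_query(char *buf, size_t len, const char *file)
{
	int n = snprintf(buf, len, "file %s +p", file);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write that query to the control file (needs root). Returns 0 or -1. */
int dyndbg_enable(const char *file)
{
	char query[256];
	FILE *f;
	int ok;

	if (dyndbg_enable_query(query, sizeof(query), file) < 0)
		return -1;
	f = fopen(DYNDBG_CONTROL, "w");
	if (!f)
		return -1;
	ok = fputs(query, f) >= 0;
	/* The kernel parses the query when the buffer is flushed at fclose(). */
	return (fclose(f) == 0 && ok) ? 0 : -1;
}
```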
Created attachment 260403
ACPI / PM: Prevent some Dells XPS13 9360 from using Low Power S0 Idle

This updated version of the patch will print a message if the quirk is in effect.
(In reply to Paul Menzel from comment #102)
> Created attachment 260399
> Picture of Linux messages with Linux 4.14-rc6 built from Ubuntu
>
> Here is a picture of a boot with Linux 4.14-rc6 built by the Ubuntu Linux
> kernel team, that means *without* the revert commit,

What do you mean by "revert commit"? The patch that I attached previously?

> `no_console_suspend init=/bin/bash` on the Linux command line, and then the
> two commands below.
>
> ```
> # echo 1 > /sys/power/pm_debug_messages
> # echo mem > /sys/power/state
> ```
>
> Note, that the power button only woke up the system after pressing it
> several seconds. :(

What state was it in after the wakeup?

> The next picture shows the rest of the messages.

Well, this clearly shows a problem with the NVMe device, whose power state cannot be changed back to D0, apparently during a device resume transition. Unfortunately, one kernel configuration option is not set in the Ubuntu build, so the messages from drivers/base/power/main.c don't show up.
Please apply the patch from comment #107 on top of 4.14-rc6, boot it and verify the following:

(1) Whether or not the "Low Power S0 Idle interface disabled" message (preceded by the device path) is present in the dmesg output.
(2) Whether or not you see "deep" in /sys/power/mem_sleep.

If (1) the message is present in the dmesg output and (2) you see "deep" in that file, please try to trigger system suspend with "echo mem > /sys/power/state" and let me know the outcome. Otherwise, I will need to look into the system I have access to, but that will only be possible when I'm back home from the conference I'm attending (that should be on Saturday).

@Mario: Can you please check the patch from comment #107 with the BIOS version line removed on your 9360?
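Check (2) amounts to reading which entry of /sys/power/mem_sleep is bracketed, since the kernel marks the current selection that way (for example "s2idle [deep]"). A minimal sketch with hypothetical helper names; check (1) is reduced to a substring search for the message text, and the exact dmesg prefix is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the currently selected entry of /sys/power/mem_sleep, which the kernel
 * shows in brackets (e.g. "s2idle [deep]"), into `out`. Returns 0 on success,
 * -1 if there is no bracketed entry or it does not fit in `len` bytes.
 */
int mem_sleep_selected(const char *contents, char *out, size_t len)
{
	const char *lb, *rb;
	size_t n;

	assert(contents != NULL && out != NULL);
	lb = strchr(contents, '[');
	rb = lb ? strchr(lb, ']') : NULL;
	if (!rb)
		return -1;
	n = (size_t)(rb - lb - 1);
	if (n + 1 > len)
		return -1;
	memcpy(out, lb + 1, n);
	out[n] = '\0';
	return 0;
}

/* True if a dmesg line contains the message printed by the patch under test. */
int lps0_disabled_logged(const char *line)
{
	return strstr(line, "Low Power S0 Idle interface disabled") != NULL;
}
```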
@Rafael,

Sure. Just tested on a 9360 with that patch on top of 4.14-rc6.
1) I confirmed this line is in the dmesg output.
2) /sys/power/mem_sleep has the value "deep".
3) Running that command does put it into S3.
@Keith,

You're still on CC. Any thoughts about the nvme errors that were cropping up in OP's tests? It's really suspicious that in s2idle it takes nearly a minute to go down:

Okt 25 13:15:41 Ixpees kernel: PM: suspend of devices complete after 2357.524 msecs

and the system log has this continually:

nvme 000:3c:00.0: Refused to change power state, currently in D3

Is runtime PM busted on this particular drive?
Well, because the problem does not appear to be reproducible without the LPS0 _DSM, it looks like the platform does something to the SSD during the execution of that _DSM that confuses the SSD firmware, after which the OS cannot put it back into D0 in the usual way. That might have been fixed in the new platform firmware (BIOS) version, however.
Created attachment 260405 [details]
ACPI / PM: Prevent Dell XPS13 9360 from using Low Power S0 Idle

Paul, this is the patch tested by Mario.

Can you please test it and confirm that you see the same results?
(In reply to Rafael J. Wysocki from comment #113)
> Created attachment 260405 [details]
> ACPI / PM: Prevent Dell XPS13 9360 from using Low Power S0 Idle
>
> Paul, this is the patch tested by Mario.
>
> Can you please test it and confirm that you see the same results?

Just a small note: I won’t have access to the device until next Monday.

Also, if somebody provides the change-log for the newer firmware versions, then I’d be happy to update. Otherwise, I’d boycott Dell’s practices and stay with the firmware, and somebody can test.
> Also, if somebody provides the change-log for the newer firmware versions,
> then I’d be happy to update. Otherwise, I’d boycott Dell’s practices and stay
> with the firmware, and somebody can test.

Sure, it's your own decision whether to update or not. The changelog is typically posted under "Fixes and Enhancements". Here's a recent example:
http://www.dell.com/support/home/us/en/19/drivers/driversdetails?driverId=JGCWT
Created attachment 260495 [details]
Linux 4.14-rc7+ with patch messages (dmesg) with one suspend (deep)

(In reply to Rafael J. Wysocki from comment #113)
> Created attachment 260405 [details]
> ACPI / PM: Prevent Dell XPS13 9360 from using Low Power S0 Idle
>
> Paul, this is the patch tested by Mario.
>
> Can you please test it and confirm that you see the same results?

Yes, I see the same results. I have to re-read the other comments next week.

```
$ dmesg | grep disabled
[ 0.000000] 3 disabled
[ 0.000000] 4 disabled
[ 0.000000] 5 disabled
[ 0.000000] 6 disabled
[ 0.000000] 7 disabled
[ 0.000000] 8 disabled
[ 0.000000] 9 disabled
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.038488] AppArmor: AppArmor disabled by boot time parameter
[ 0.514822] ACPI: \_SB_.PEPD: Low Power S0 Idle interface disabled
[ 1.221256] audit: initializing netlink subsys (disabled)
[ 300.962574] OOM killer disabled.
[ 305.285188] thunderbolt 0000:03:00.0: 0:3: disabled by eeprom
[ 305.285190] thunderbolt 0000:03:00.0: 0:4: disabled by eeprom
[ 305.285191] thunderbolt 0000:03:00.0: 0:5: disabled by eeprom
[ 305.285290] thunderbolt 0000:03:00.0: 0:8: disabled by eeprom
[ 305.285291] thunderbolt 0000:03:00.0: 0:9: disabled by eeprom
[ 305.285346] thunderbolt 0000:03:00.0: 0:b: disabled by eeprom
```

Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
OK, thanks.

I'm going to apply this in the spirit of not breaking existing setups, even though I have some serious doubts about the idea of running outdated BIOSes. :-)
(In reply to Rafael J. Wysocki from comment #118)
> OK, thanks.
>
> I'm going to apply this in the spirit of not breaking existing setups, even
> though I have some serious doubts about the idea of running outdated BIOSes. :-)

Thank you to you and Mario for following through with this.

Now I tried to update the firmware, but the fwupd shipped with Ubuntu 16.04.3 LTS, the release used by Dell, fails for me [1].

[1] https://github.com/hughsie/fwupd/issues/303
The problem is still reproducible with firmware version 2.3.1.

> DMI: Dell Inc. XPS 13 9360/0839Y6, BIOS 2.3.1 10/03/2017
It looks like testing s2idle has now become cumbersome, as even with `echo s2idle | sudo tee /sys/power/mem_sleep` the “real” s2idle is not used and certain features(?) are deactivated. Is there a way to override that? Also, should this bug be used to track the real issue, or only the regression, with a new bug created for the real issue?
The regression as reported here was addressed by commit 71630b7a832f (ACPI / PM: Blacklist Low Power S0 Idle _DSM for Dell XPS13 9360).

The underlying problem is firmware-related, as it can only be reproduced when the LP_S0 _DSM interface is used, so the commit above prevents your machine from using that _DSM. However, that also causes s2idle to work in the "legacy" mode, in which EC events are not enabled over suspend/resume, so the power button doesn't wake up the machine from it (the keyboard should still wake it up if enabled).

If you want to bypass the blacklist entry, apply this patch https://patchwork.kernel.org/patch/10058593/ and run the kernel with acpi_sleep=nobl in the command line, but I don't recommend doing that, as the machine will most likely fail to resume from s2idle then.

I don't think it is useful to track this any more, as it is not likely to be addressed in the firmware (it hasn't been so far) and there is not enough information to address it in the kernel, except by blacklisting the affected system(s), which has been done already.
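A minimal sketch for checking whether a running kernel was actually booted with that bypass. It assumes only that `acpi_sleep=` takes a comma-separated flag list (as in `acpi_sleep=s3_bios,s3_mode`); `nobl` itself is only understood by kernels carrying the patch referenced above. The `CMDLINE` override and the function name are hypothetical.

```shell
# Sketch: was the kernel booted with a given acpi_sleep= flag (e.g. "nobl")?
# CMDLINE and the function name are hypothetical; /proc/cmdline is the default.
CMDLINE="${CMDLINE:-/proc/cmdline}"

has_acpi_sleep_flag() {
    local want="$1" word
    # /proc/cmdline is one line of space-separated parameters.
    for word in $(cat "$CMDLINE"); do
        case "$word" in
            acpi_sleep=*)
                # Wrap the flag list in commas so that a flag only matches as a
                # whole entry: ",s3_bios,nobl," contains ",nobl,".
                case ",${word#acpi_sleep=}," in
                    *",$want,"*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}
```

For example, `has_acpi_sleep_flag nobl && echo "blacklist bypass requested"`.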
One late follow-up. I think it works with Microsoft Windows, so is it really a firmware issue?
(In reply to Paul Menzel from comment #124)
> One late follow-up. I think it works with Microsoft Windows, so is it really
> a firmware issue?

It is related to the firmware's interaction with the NVMe device, and we don't know exactly what that interaction is or how Windows deals with it, so the only way to address this is to avoid the issue altogether.
Mario pointed me to bug 112121 [1], where the NVMe device also goes missing. In bug 112121, comment 10 [2], it says that patch [3] fixes that issue. Unfortunately, after applying it on top of Linux 4.16-rc1+ (commit 178e834c (Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)) and building the kernel, the SK Hynix NVMe device still doesn’t come back up.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=112121 "Some PCIe options cause devices to be removed after suspend"
[2] https://bugzilla.kernel.org/show_bug.cgi?id=112121#c10
[3] https://patchwork.kernel.org/patch/10212201/
@Paul Menzel,

In a non-public forum I heard of some other issues with a newer Hynix SSD (in a newer system) that reminded me of this. The issues are being worked around by disabling runtime PM for the SSD but leaving APST enabled. Would you be able to try that on your failing system in S2I to see if it helps?
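For anyone trying this by hand, a minimal sketch of "runtime PM off, APST on" using generic kernel knobs; it is not the patch series referred to in this thread. Writing `on` to a PCI device's `power/control` attribute forbids runtime suspend, while APST is governed separately by the `nvme_core.default_ps_max_latency_us` module parameter (0 disables it). The `SYSFS` override and the function names are hypothetical, and the PCI address has to be looked up on the affected machine, e.g. with `lspci -D`.

```shell
# Sketch: keep runtime PM off for one NVMe controller while leaving APST alone.
# SYSFS and the function names are hypothetical; SYSFS lets the sketch be
# tried on a scratch tree. Writing to the real sysfs needs root.
SYSFS="${SYSFS:-/sys}"

# "on" forbids runtime suspend for the device; "auto" hands it back to
# runtime PM.
nvme_runtime_pm() {
    local pci_addr="$1" mode="$2"   # mode: on | auto
    echo "$mode" > "$SYSFS/bus/pci/devices/$pci_addr/power/control"
}

# APST is not disabled globally as long as the nvme_core parameter is
# non-zero (it is the maximum allowed power-state exit latency in us).
apst_allowed() {
    local lat
    lat="$(cat "$SYSFS/module/nvme_core/parameters/default_ps_max_latency_us")"
    [ "$lat" != 0 ]
}
```

Usage: find the controller with `lspci -D`, run `nvme_runtime_pm <address> on` as root, confirm `apst_allowed` succeeds, and then test s2idle.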
Here's the series:
https://patchwork.kernel.org/patch/10669517/
https://patchwork.kernel.org/patch/10669519/
Thanks for sharing.

@Paul, if you can confirm that this helps your drive too, that would be great. It sounds like those patches might not be applied, and if it fixes the issue for you too, that's a good reason for them to be applied.

> We generally quirk for grave issues that make devices unusable.
> To me working around a D3 mode that consumes more power than D0
> does not quite fit that bill.
@Paul Menzel,

Can you please test Linus' tree with 71630b7a832f reverted? There is a new patch series, merged into 5.3, that adjusts the behavior of NVMe disks over suspend-to-idle. The merged series includes commit d916b1be94b6dc8d293abed2451f3062f6af7551. I expect it to fix your disk problem and would appreciate your confirmation so that we can remove this blacklist, which is now hopefully unnecessary.
@Paul Menzel, Ping, I would like to follow up. Can you please test 5.3 final with 71630b7a832f reverted?
Hi! My XPS 9360 now uses s2idle, but does not reach s0ix, causing a significant battery drain. Maybe the blacklist removal should be reverted.
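For reports like this one, a minimal sketch of checking whether an s2idle cycle reached S0ix at all. It assumes an Intel platform where the intel_pmc_core driver exposes the cumulative SLP_S0 residency counter at /sys/kernel/debug/pmc_core/slp_s0_residency_usec (reading it normally requires root). The function names and the `RESIDENCY` override are hypothetical. If the counter does not increase across a suspend/resume cycle, the platform never entered S0ix.

```shell
# Sketch: did the last s2idle cycle reach S0ix?
# Assumes intel_pmc_core's debugfs counter; RESIDENCY and the function
# names are hypothetical.
RESIDENCY="${RESIDENCY:-/sys/kernel/debug/pmc_core/slp_s0_residency_usec}"

# Cumulative time (in microseconds) the platform has spent in SLP_S0.
read_residency() {
    cat "$RESIDENCY"
}

# Compare readings taken before and after a suspend cycle; an unchanged
# counter means no S0ix residency was accumulated.
reached_s0ix() {
    local before="$1" after="$2"
    [ "$after" -gt "$before" ]
}
```

Typical use, as root with s2idle selected in /sys/power/mem_sleep: store `read_residency` in a variable, run `echo mem > /sys/power/state`, read the counter again after resume, and pass both values to `reached_s0ix`.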
(In reply to Yorick van Pelt from comment #134)
> Hi! My XPS 9360 now uses s2idle, but does not reach s0ix, causing a
> significant battery drain. Maybe the blacklist removal should be reverted.

Yorick, thank you for the report, but please create a new ticket and upload all the details for your device. This issue is for the specific problem with the NVMe SSD in the device, which was fixed according to my test.