Bug 199689
Summary: | s2idle does not work in Dell XPS 9370 | ||
---|---|---|---|
Product: | Power Management | Reporter: | James Roper (jroper2) |
Component: | Hibernation/Suspend | Assignee: | Srinivas Pandruvada (srinivas.pandruvada) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | adam.caldwell, ahung, alejandro, andersonkw2, andy, cedric.bellegarde, daniel, dblack, dopey, erik, jonnylamb, kernel, leho, linux-kernel-bugs, marcodirect, noodles, ondrej, pmenzel+bugzilla.kernel.org, rjw, rui.zhang, ryan, sean, sin.pecado, spi, srinivas.pandruvada, stuff, superm1, timur.kristof, tytso, wendy.wang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.14 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Blacklist XPS 13 9370 from s2idle
turbostat output on 2018-06-18, after sleeping 1 min (systemctl suspend)
turbostat output (systemctl suspend)
turbostat output (echo mem > /sys/power/state)
turbostat output (echo mem > /sys/power/state)
attachment-24371-0.html
Turbostat output after running LTR script
Turbostat output after running LTR script w/o Satachi power monitor
smime.p7s
attachment-29801-0.html
attachment-26080-0.html
attachment-4576-0.html |
Description
James Roper
2018-05-11 04:54:19 UTC
Created attachment 275913 [details]
Blacklist XPS 13 9370 from s2idle
Here's a patch that implements the same fix as for the XPS 13 9360. I've tested it on my machine, and it works for me.
Here's an additional discussion from other XPS 13 9370 users who have experienced the same problem: https://www.reddit.com/r/Dell/comments/8b6eci/xp_13_9370_battery_drain_while_suspended/

@James, At least locally feel free to change the policy to "deep", but it's actually intentional to be using s2idle on this machine with the latest upstream kernel. Rather than swing the giant hammer around to swap back to S3, I would prefer that we find the problems in the kernel preventing you from getting into deep enough C states to not burn too much battery.
Can you please start with the following:
1) run powertop --autotune
This will reconfigure many of the defaults from the kernel to "better" values for power management purposes. See if that helps in a measurable way. If it's not helping in a significant way then this will require some more debugging.

Can you please note whether you have an NVMe SSD or SATA SSD in your XPS 9370?

Same problem here:
- TLP enabled so I guess equivalent to powertop --autotune
- NVMe SSD
Give me any command to run on this laptop to help you debug.

Although TLP does many similar things to powertop autotune, some of its defaults are not adequate. For example I filed this as a result: https://github.com/linrunner/TLP/issues/344
That means that TLP will behave incorrectly both on AC and battery. So please explicitly check with powertop autotune.

After doing the steps suggested in comment #3, what is the difference in these counts before and after wakeup from suspend to idle?
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
Also do this before suspending, so that we get more PM debug messages in dmesg:
# echo 1 > /sys/power/pm_debug_messages

ping...

Hi, I also have an XPS 13 9370 so I hope I can help with investigating this. The system has an NVMe SSD (at least the device node is called /dev/nvme0).
powertop --autotune says: powertop: unrecognized option '--autotune' (using powertop-2.9-8.fc28.x86_64 here).
So, at the beginning it looks like this:
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us is 0
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us is 0
Then I put the laptop to sleep for ~10 minutes. After that:
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us is 0
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us is 80717
Then I enabled tlp and put the laptop to sleep for ~20 minutes. After that:
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us is 0
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us is still 80717
Not sure exactly what those numbers mean, but it looks suspicious that it stayed the same.

The option is
--auto-tune
not --autotune

(In reply to Srinivas Pandruvada from comment #9)
> The option is
> --auto-tune
>
> not --autotune
Sorry! I did check powertop after enabling tlp, and all powersaving options were enabled (except the VM writeback timeout), so I think we should be seeing the same result with both. But, if you want, I can re-run the numbers with powertop --auto-tune - would that help?

Those numbers are very low. When you say sleep, I think you did suspend (echo mem > /sys/power/state or similar using some tools). So for 10 minutes of suspend it spent only about 80 ms (80717 us) in the low power state. So please try with powertop --auto-tune. You can just let it sleep for 1 min and see what you get.

I have a Dell XPS 13 9370 (2018), also with an NVMe SSD.
This is the output from powertop --auto-tune:
$ sudo powertop --auto-tune
modprobe cpufreq_stats failedLoaded 5 prior measurements
Cannot load from file /var/cache/powertop/saved_parameters.powertop
File will be loaded after taking minimum number of measurement(s) with battery only
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask f
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask f
Devfreq not enabled
glob returned GLOB_ABORTED
Cannot load from file /var/cache/powertop/saved_parameters.powertop
File will be loaded after taking minimum number of measurement(s) with battery only
To show power estimates do 304 measurement(s) connected to battery only
Leaving PowerTOP
I checked the Tunables tab in powertop to confirm that all tunables were "Good".
My laptop was then suspended overnight with s2idle. These are the numbers resulting from that:
$ date; cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
Thu Jun 14 03:06:01 PDT 2018
0
0
$ date; cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
Thu Jun 14 11:55:14 PDT 2018
0
0
The battery was 100% charged prior to suspending. It was down to 26% when resuming.

@Alyssa Hung, Thanks, can you please confirm which kernel you tested and got that result?

$ uname -a
Linux xeli 4.16.13-2-ARCH #1 SMP PREEMPT Sat Jun 9 02:32:29 PDT 2018 x86_64 GNU/Linux

First see if you see any error for download of firmware
# dmesg | grep -i i915
If you don't see any error try this:
# for i in {0..32}; do echo $i > ltr_ignore; done
# turbostat
# echo mem > /sys/power/state
Wait for 1 minute and wake up the system, then wait for a few sample updates on screen for turbostat output. Attach output of turbostat.

Created attachment 276649 [details]
turbostat output on 2018-06-18, after sleeping 1 min (systemctl suspend)
I wasn't able to produce useful output exactly as requested.
# echo mem > /sys/power/state
would put the laptop to sleep (screen off, power indicator off) for about a second, after which it would re-wake itself. In order to make the laptop stay asleep, I had to use
# systemctl suspend
The attached output reflects that scenario.
Additionally, I have made some changes to the system since the last time I commented. This is the current kernel:
$ uname -a
Linux xeli 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux
This boot parameter was added:
i915.enable_guc=1
And TLP was installed. powertop --auto-tune was _not_ run prior to the attached output. powertop Tunables tab indicated that all tunables were set to "Good" (by TLP) except for "VM writeback timeout".
This is updated output from the files I cat-ed in a previous comment:
$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
14699
0
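For anyone repeating this before/after comparison, here is a minimal sketch of reading the counters around one suspend cycle and printing the difference. The function names and the RESIDENCY_DIR override are illustrative and not from this thread; the sysfs paths and the rtcwake invocation are the ones quoted in the comments.

```shell
#!/bin/bash
# Sketch only: compare a low-power-idle residency counter before and after a suspend.
# RESIDENCY_DIR is a hypothetical override so the logic can be tried without real sysfs files.
RESIDENCY_DIR="${RESIDENCY_DIR:-/sys/devices/system/cpu/cpuidle}"

read_residency() {
    # $1 is "cpu" or "system"; prints the counter value in microseconds
    cat "$RESIDENCY_DIR/low_power_idle_${1}_residency_us"
}

residency_delta() {
    # $1 = value before suspend, $2 = value after resume (both in microseconds)
    echo $(( $2 - $1 ))
}

# Usage (as root, on a machine that exposes these counters):
#   before=$(read_residency cpu)
#   rtcwake -m freeze -s 60
#   after=$(read_residency cpu)
#   echo "cpu residency gained: $(residency_delta "$before" "$after") us"
```

A delta close to the suspend duration (60 s is 60000000 us) would indicate the package reached its lowest power state for most of the suspend; a delta of 0 means it never got there.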
Sorry, I was not clear. You need to change folder to
# cd /sys/kernel/debug/pmc_core
# for i in {0..32}; do echo $i > ltr_ignore; done
You mean you couldn't run turbostat?
You don't need to enable_guc=1.
That residency is quite low.

turbostat output was in the previous attachment, but it wasn't clear if that was run across the suspend-to-idle cycle or not.
So Alyssa when you respond and do the LTR adjustment command from that directory please start turbostat in one terminal tab, enter S2I in another, sleep for 1 minute, wait a few seconds after wakeup for some more turbostat collection and then attach that.

Created attachment 276671 [details]
turbostat output (systemctl suspend)
Sorry, I meant to annotate the output I attached to indicate when the suspend-and-resume happened, but forgot to save changes to the file before uploading it.
I can't suspend the laptop using the command:
# echo mem > /sys/power/state
because that only causes the screen to flicker briefly off, then back on. The laptop doesn't stay asleep.
The output attached to this comment was produced by putting the laptop to sleep with the command:
# systemctl suspend
Search for the string "--- suspend and resume ---" to see where the break happened.
Created attachment 276673 [details]
turbostat output (echo mem > /sys/power/state)
For completeness's sake, I repeated the process using the command
# echo mem > /sys/power/state
by re-running the command every time the laptop woke itself back up. Each time I had to re-run the command, that is annotated with "--- suspend and resume ---".
Alyssa, could you please report a new bug for S2I not working for you with 4.17.2, and reference it here. Please attach the Linux messages (that is, the output of `dmesg`) there.

Could this bug report please be renamed to *Battery drained during s2idle on Dell XPS 13 9370*?

@Alyssa: Can you please do (a) reboot the system and then (b)
# echo 1 > /sys/power/pm_debug_messages
# echo mem > /sys/power/state
as root and attach a dmesg output after that?

I think the attachment 276671 [details] is after LTR adjustment.
I see 92.68% residency in CPU's lowest power in 1 minute. This is a good number.
Did you do?
# cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
They should have good numbers. If they do, then we can identify which device's LTR is the problem.
I just updated my 9370 to the latest BIOS and 4.18-rc1. In a minute of suspend, both the CPU and the total system were at the lowest power for 59.7 seconds.

OK, thanks!
The problem doesn't affect all of the 9370's then, so basically we need to find out what the differences between them are and why they matter.

@srinivas,
So you had no LTR adjustments needed and are seeing good power state residency on your configuration?

@Mario,
I didn't do any LTR adjustment.
I want to know from the reporters of this issue: is any device connected to a Type-C port (mouse, kb, usb-ethernet etc.)?

@Rafael: I'm out of town all week, and can't seem to reproduce the problem (where echo mem > /sys/power/state can't make the laptop stay asleep) tonight. There may be something in my home environment that is a factor. I'll try to reproduce and provide the requested output next week.
@Srinivas: No devices were connected to any of the laptop's ports when I witnessed the overnight battery drain while suspended.

Hi,
@Srinivas: No, none of the Type-C ports were plugged in when I did the test. The device was running on battery. Are you suggesting that this might be fixed on 4.18? I'm also running on kernel 4.17.2 (Fedora), and I haven't seen a problem with 'echo mem > /sys/power/state'.

Created attachment 276787 [details]
turbostat output (echo mem > /sys/power/state)
I don't know what it is about my environment that changed, but 'echo mem >/sys/power/state' is working as expected now.
I repeated the test with 'for i in {0..32}; do echo $i >ltr_ignore; done'. The turbostat output is attached.
Residency numbers are much higher than before:
# cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
62693152
0
If I do the test without first doing the ltr_ignore thing, then residency numbers are both 0. What can I do to help figure out which LTR(s) are causing problems?
Kernel and boot params in use:
[ 0.000000] Linux version 4.17.2-1-ARCH (builduser@heftig-9574) (gcc version 8.1.1 20180531 (GCC)) #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018
[ 0.000000] Command line: initrd=\intel-ucode.img initrd=\initramfs-linux.img root=PARTUUID=6d264517-7c51-4454-8b8c-2efff2bd878e rw i915.enable_guc=3
Having the same issue with XPS 13 9365 (2-in-1). Resume from suspend (s2idle) never works, including by pressing the power button for 6sec+. Machine requires a hard reboot every time.

Actually please ignore above post. Machine would go to s3 after an extended suspend duration, which caused the problem. Forcing to stay at s2 only solved it. I guess the problem on the 9365 is inverse to the 9370.

Marco,
9365 can't be woken up from S3, so it has to be suspend to idle only. Also unrelated to this issue, please keep this issue specifically around 9370 and s2idle power consumption. Anything around a different system or a different behavior should be a different issue.

Alyssa Hung,
We need to find the device which is causing this issue.
You can run powertop in another window and see if some device is keeping system busy.
Try this. After fresh boot and powertop --auto-tune, you can try to put in a script something like this:
#!/bin/bash
counter=0
until [ $counter -gt 32 ]
do
    echo $counter > /sys/kernel/debug/pmc_core/ltr_ignore
    echo "LTR ignore for" $counter
    rtcwake -m freeze -s 10
    residency=$(cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us)
    echo "residency is" $residency
    if [ $residency -eq 0 ]; then
        echo "Residency is non zero!"
        break
    fi
    ((counter++))
    sleep 2
done

There's a minor typo in the above script.
if [ $residency -eq 0 ]; then
Should be
if [ $residency -gt 0 ]; then
At least on my configuration that previously wasn't showing residency, after powertop autotune, configuring "0" and "1" I started to show residency.

Thanks Mario.
The corrected script:
#!/bin/bash
counter=0
until [ $counter -gt 32 ]
do
    echo $counter > /sys/kernel/debug/pmc_core/ltr_ignore
    echo "LTR ignore for" $counter
    rtcwake -m freeze -s 10
    residency=$(cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us)
    echo "residency is" $residency
    if [ $residency -gt 0 ]; then
        echo "Residency is non zero!"
        break
    fi
    ((counter++))
    sleep 2
done

Hi, sorry again for taking so long to respond. Here is the result of running the corrected version of the script:
# sh ltr-test.sh
LTR ignore for 0
rtcwake: wakeup from "freeze" using /dev/rtc0 at Sun Jul 1 06:29:16 2018
residency is 0
LTR ignore for 1
rtcwake: wakeup from "freeze" using /dev/rtc0 at Sun Jul 1 06:29:30 2018
residency is 9763589
Residency is non zero!
Just to confirm the results, I re-ran it again after rebooting:
# sh ltr-test.sh
LTR ignore for 0
rtcwake: wakeup from "freeze" using /dev/rtc0 at Sun Jul 1 06:35:37 2018
residency is 0
LTR ignore for 1
rtcwake: wakeup from "freeze" using /dev/rtc0 at Sun Jul 1 06:35:50 2018
residency is 9231413
Residency is non zero!
LTR 1 both times.

Alyssa,
Can you please try to go into BIOS setup and disable Thunderbolt? After doing this please re-run the test script to confirm if it's helped.

Disabling Thunderbolt support does seem to have helped. Script output with Thunderbolt completely disabled in the firmware setup:
LTR ignore for 0
rtcwake: wakeup from "freeze" using /dev/rtc0 at Wed Jul 4 02:15:36 2018
residency is 9436585
Residency is non zero!
With Thunderbolt enabled, but Thunderbolt Boot Support disabled, residency was lower at 8627152.
Disabling Thunderbolt completely is not something I'm able to do long-term, as I have docks and dongles that work only with Thunderbolt.

I see Thunderbolt being turned off as a debugging tactic; once we have this fully comprehended I believe we should be leaving it turned on.
Do you mean that if you have thunderbolt boot support turned off but thunderbolt on you are still seeing residency without running the LTR ignore script?
The other thing that I would like to know is if the power consumption seems reasonable to you when in this configuration (as this issue was originally about).

Yes, I meant that having Thunderbolt turned on but Thunderbolt boot support turned off results in non-zero residency without running the LTR ignore script.
With that configuration, I saw battery decrease 7% over 121 minutes while sleeping with s2idle. Subjectively, that seems unreasonable to me. I think the drain was closer to 1% per hour with deep sleep.

> With that configuration, I saw battery decrease 7% over 121 minutes while
> sleeping with s2idle. Subjectively, that seems unreasonable to me. I think
> the drain was closer to 1% per hour with deep sleep.
I suspect there is a second issue then here for your configuration. Are you sure that you had run --auto-tune with powertop (or used TLP to effect the same changes) in that test with TBT on but TBT boot off?
We know for sure right now that TBT boot in BIOS setup causes problems with LTR.

On a system with SATA I was able to use these two patches to make sure SATA got to deepest sleep state when --auto-tune was used:
https://patchwork.kernel.org/patch/10502285/
https://patchwork.kernel.org/patch/10502287/
And then confirmed across a 7 hour span to have a 4% drop in battery. This was with one of the 4.18-rcX kernels (Sorry I forget if it was RC2 or RC5 and have both installed right now).

Mario Limonciello from Dell suggested that I join this bug if I still had problems with s2idle. I was using 4.18-rc2 (about to upgrade to 4.18-rc6 if that's going to make a difference). I have a Dell XPS model 9370, with NVMe 1TB flash attached. I do have TBT boot turned off. I did run "powertop --auto-tune" before suspending. (In fact I trigger it out of a systemd unit at boot, and I double-checked that all of the powertop settings were "Good" before I did the suspend.)
After an 11 hour (668 minute) suspend, the batteries declined from 6486 mAh to 3331 mAh. That works out to roughly 2.3 W of drain, and Mario suggested that if it was more than 1 W, that I join this bug.
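The drain arithmetic in the comment above can be reproduced with a small sketch like the following. The function names are illustrative, and the battery pack voltage is not stated anywhere in this thread, so it is left as a parameter the caller has to supply (for example from the battery's own reported voltage).

```shell
#!/bin/bash
# Sketch only: estimate average drain over a suspend from two charge readings.

avg_drain_ma() {
    # $1 = charge before suspend (mAh), $2 = charge after resume (mAh), $3 = minutes suspended
    # Prints the average current in mA, rounded to a whole number.
    awk -v a="$1" -v b="$2" -v m="$3" 'BEGIN { printf "%.0f\n", (a - b) * 60 / m }'
}

drain_watts() {
    # $1 = average current in mA, $2 = pack voltage in volts (assumption: supplied by the caller)
    awk -v i="$1" -v v="$2" 'BEGIN { printf "%.2f\n", i / 1000 * v }'
}

# Usage with the figures reported above:
#   avg_drain_ma 6486 3331 668     # about 283 mA on average
#   drain_watts 283 "$PACK_VOLTS"  # PACK_VOLTS is a placeholder for the real pack voltage
```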
# cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
0
0
Since this is a NVMe system, there shouldn't be any SATA issues....

Created attachment 277585 [details]
attachment-24371-0.html
Hello,
I am currently out on Paternity leave.
@Ted,
Did you have anything plugged into USB ports over the suspend to idle run? Particularly of interest would be if anything was plugged into the Thunderbolt port.
If you did - can you please compare results with nothing plugged in?

Nothing was plugged in; there aren't enough USB ports, alas, for me to use a USB-C Nano Yubikey or anything like that. (My two USB-A Yubikeys are attached to a Hootoo mini-USB C dock that looks like a massively oversized dongle, and which is *not* plugged in when my laptop is in transit, for obvious reasons.)

OK, thanks for confirming. Can you please try the LTR ignore script that was shared above? Comment 38.
See if you get any different results.

Created attachment 277879 [details]
Turbostat output after running LTR script
I tried using
# for i in {0..32}; do echo $i > ltr_ignore; done
... and it didn't seem to help. After running the above and collecting the data, I also tried this:
# cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
0
0
I was also measuring power utilization (with the battery fully charged) using a Satechi USB-C power monitor, and while doing a mem sleep, the Dell XPS (9370) was pulling 0.12-0.13 amps at 20 volts. This is compared to the 0.02 amps at 20V when in a deep sleep. I don't mind burning 2.5 Watts while the lid is closed when walking to a conference room, but if I'm taking my laptop home, and leaving it unconnected to power for ~12 hours, this is highly unfortunate....
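As a quick check of the figures in the paragraph above, input power is just volts times amps. The helper name below is illustrative; the readings are the ones quoted from the USB-C power monitor, so they describe power drawn at the port and not necessarily what the system itself consumes.

```shell
#!/bin/bash
# Sketch only: convert a USB-C power monitor reading (volts, amps) to watts.

watts() {
    # $1 = volts, $2 = amps; prints power in watts with one decimal place
    awk -v v="$1" -v a="$2" 'BEGIN { printf "%.1f\n", v * a }'
}

watts 20 0.12   # s2idle, low end:  2.4
watts 20 0.13   # s2idle, high end: 2.6
watts 20 0.02   # deep sleep:       0.4
```

That puts the s2idle draw at roughly six times the deep-sleep draw for these readings.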
P.S. All of this was running 4.18 plus the ext4.git and random.git patches that have been submitted to Linus for the 4.19 merge window:
# uname -a
Linux cwcc 4.18.0-00043-gfdade4115840 #35 SMP Sun Aug 12 21:33:40 EDT 2018 x86_64 GNU/Linux

> I was also measuring power utilization (with the battery fully charged) using
> a Satechi USB-C power monitor
When you say also, does that mean it was a separate test? Or that it was also connected during the LTR ignore run?
According to your turbostat output you're not getting past PC2/PC3. Are you able to read debugfs pmc_core output (IIRC /sys/kernel/debug/pmc_core/pch_ip_gating or something similar)? Can you cat that to see what it claims is gating the PMC?

I had the Satechi attached while I was doing the LTR ignore run. I can do a run without the Satechi if you think it might be what was causing it to not drop into lower states. (I doubt it, because my laptop has been draining without the Satechi attached.)
Side note: one of the annoying things about the Dell XPS is that I can't tell if it is suspended while it is closed. One of the nice things about the Thinkpad is that there is an LED which is on solid when the laptop is running, and slowly blinks when it is suspended, and is totally off when the laptop is powered down. With the Dell XPS, there is no way to tell the status of the laptop (which is why I use the Satechi power monitor most of the time when I'm at work --- what's especially annoying is when I try to suspend the laptop, and sometimes when the networking is up and chrome is running, something will cause the kernel to fail to suspend, and the only way I can tell is by watching the power utilization meter --- or by checking the temperature of my laptop bag when I get home. :-P )

# cat /sys/kernel/debug/pmc_core/pch_ip_power_gating_status
PCH IP: 0 - PMC State: On
PCH IP: 1 - OPI-DMI State: On
PCH IP: 2 - SPI / eSPI State: On
PCH IP: 3 - XHCI State: On
PCH IP: 4 - SPA State: On
PCH IP: 5 - SPB State: Off
PCH IP: 6 - SPC State: Off
PCH IP: 7 - GBE State: Off
PCH IP: 8 - SATA State: Off
PCH IP: 9 - HDA-PGD0 State: Off
PCH IP: 10 - HDA-PGD1 State: Off
PCH IP: 11 - HDA-PGD2 State: Off
PCH IP: 12 - HDA-PGD3 State: Off
PCH IP: 13 - RSVD State: Off
PCH IP: 14 - LPSS State: Off
PCH IP: 15 - LPC State: Off
PCH IP: 16 - SMB State: Off
PCH IP: 17 - ISH State: Off
PCH IP: 18 - P2SB State: Off
PCH IP: 19 - DFX State: Off
PCH IP: 20 - SCC State: Off
PCH IP: 21 - RSVD State: Off
PCH IP: 22 - FUSE State: On
PCH IP: 23 - CAMERA State: Off
PCH IP: 24 - RSVD State: Off
PCH IP: 25 - USB3-OTG State: Off
PCH IP: 26 - EXI State: Off
PCH IP: 27 - CSE State: Off
PCH IP: 28 - CSME_KVM State: Off
PCH IP: 29 - CSME_PMT State: Off
PCH IP: 30 - CSME_CLINK State: Off
PCH IP: 31 - CSME_PTIO State: Off
PCH IP: 32 - CSME_USBR State: Off
PCH IP: 33 - CSME_SUSRAM State: Off
PCH IP: 34 - CSME_SMT State: Off
PCH IP: 35 - RSVD State: Off
PCH IP: 36 - CSME_SMS2 State: Off
PCH IP: 37 - CSME_SMS1 State: Off
PCH IP: 38 - CSME_RTC State: Off
PCH IP: 39 - CSME_PSF State: Off

Created attachment 277881 [details]
Turbostat output after running LTR script w/o Satachi power monitor
Here's a turbostat / LTR ignore run without the Satechi power monitor. (The laptop was powered via an Apple USB-C power adapter at the time, though).
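To make the interesting rows of the pch_ip_power_gating_status dump easier to spot, a small filter along these lines lists only the PCH IP blocks that are still "On". The function name is illustrative, and the parsing assumes the line shape shown in the output quoted in this thread ("PCH IP: <n> - <name> State: <On|Off>").

```shell
#!/bin/bash
# Sketch only: print the names of PCH IP blocks whose power gating state is "On".

pch_blocks_on() {
    # Reads pch_ip_power_gating_status-style text on stdin.
    awk '/State:[ \t]*On[ \t]*$/ {
        sub(/^PCH IP:[ \t]*[0-9]+[ \t]*-[ \t]*/, "")   # drop the "PCH IP: <n> - " prefix
        sub(/[ \t]*State:.*$/, "")                     # drop the trailing "State: On"
        print
    }'
}

# Usage (as root, with debugfs mounted):
#   pch_blocks_on < /sys/kernel/debug/pmc_core/pch_ip_power_gating_status
```

Run against the dump posted earlier in this thread, it would print PMC, OPI-DMI, SPI / eSPI, XHCI, SPA and FUSE.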
> PCH IP: 3 - XHCI State: On
The part standing out to me is that XHCI is "On". Without your power adapter plugged in, is that the same result in the PMC debugging read?
Yes, XHCI appears to be always on. I tried doing a reboot and then unplugged the power, and it's still on:
<tytso.root@cwcc> {/usr/projects/linux/ext4-fsverity}, level 2 (master)
998# cat /sys/kernel/debug/pmc_core/pch_ip_power_gating_status | grep XHCI
PCH IP: 3 - XHCI State: On
<tytso.root@cwcc> {/usr/projects/linux/ext4-fsverity}, level 2 (master)
998# uname -a
Linux cwcc 4.18.0-00043-gfdade4115840 #35 SMP Sun Aug 12 21:33:40 EDT 2018 x86_64 GNU/Linux
<tytso.root@cwcc> {/usr/projects/linux/ext4-fsverity}, level 2 (master)
999# fwupdmgr get-devices
XPS 13 9370 System Firmware
  DeviceId: 8a21cacfb0a8d2b30c5ee9290eb71db021619f8b
  Guid: 7ceaf7a8-0611-4480-9e30-64d8de420c7c
  Guid: 230c8b18-8d9b-53ec-838b-6cfc0383493a
  Plugin: uefi
  Flags: internal|updatable|require-ac|supported|registered|needs-reboot
  Version: 0.1.4.0
  VersionLowest: 0.1.4.0
  Icon: computer
  Created: 2018-08-16
XPS 9370 Thunderbolt Controller
  DeviceId: 40ed9997af3dc4b0fda197fd2e4f1243afa74c5b
  Guid: 4eeb9d07-a96c-56d6-92d3-4a23ee7a6e4a
  Summary: Unmatched performance for high-speed I/O
  Plugin: thunderbolt
  Flags: internal|updatable|supported|registered
  Vendor: Dell
  VendorId: TBT:0x00D4
  Version: 28.00
  Icon: computer
  Created: 2018-08-16

Ping? Is there anything else you'd like me to try?

Is this system with NVMe or SATA SSD?
One of the PCI bridges is On. I wonder if there is a problem with APST on this card.

My system has a NVMe SSD:
# nvme list
Node          SN            Model                             Namespace  Usage              Format       FW Rev
------------  ------------  --------------------------------  ---------  -----------------  -----------  --------
/dev/nvme0n1  Y77S10C8TYAT  KXG50ZNV1T02 NVMe TOSHIBA 1024GB  1          1.02 TB / 1.02 TB  512 B + 0 B  AADA4102
I can attach the dmidecode output if that would be helpful.
I think
# nvme id-ctrl
output would be more useful

Here you go:
# nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x1179
ssvid : 0x1179
sn : Y77S10C8TYAT
mn : KXG50ZNV1T02 NVMe TOSHIBA 1024GB
fr : AADA4102
rab : 3
ieee : 00080d
cmic : 0
mdts : 9
cntlid : 0
ver : 10201
rtd3r : 186a0
rtd3e : 7a120
oaes : 0
ctratt : 0
rrls : 0
oacs : 0x17
acl : 3
aerl : 7
frmw : 0x14
lpa : 0x2
elpe : 127
npss : 4
avscc : 0
apsta : 0x1
wctemp : 351
cctemp : 355
mtfa : 20
hmpre : 0
hmmin : 0
tnvmcap : 1024209543168
unvmcap : 0
rpmbs : 0
edstt : 36
dsto : 1
fwug : 0
kas : 0
hctma : 0
mntmt : 0
mxtmt : 0
sanicap : 0
hmminds : 0
hmmaxd : 0
nsetidmax : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0x5f
fuses : 0x1
fna : 0
vwc : 0x1
awun : 31
awupf : 0
nvscc : 0
acwu : 31
sgls : 0
subnqn : nqn.2017-03.jp.co.toshiba:KXG50ZNV1T02 NVMe TOSHIBA 1024GB:Y77S10C8TYAT
ioccsz : 0
iorcsz : 0
icdoff : 0
ctrattr : 0
msdbd : 0
ps 0 : mp:6.00W operational enlat:0 exlat:0 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:2.40W operational enlat:0 exlat:0 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:1.90W operational enlat:0 exlat:0 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0500W non-operational enlat:1500 exlat:1500 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0030W non-operational enlat:50000 exlat:80000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-

Did you install a new NVMe yourself, or was this system shipped with this disk? Can you still boot Windows?

It's the original NVMe, and I installed Debian instead of Windows. I had tried to transfer the Windows to a USB drive, but it didn't quite work and the Windows installation on the USB toasted itself after an update got confused because it was on the USB drive. I do have a Windows on a USB drive that was originally installed on a Lenovo laptop, and for which I had downloaded the Dell XPS 13 drivers.
I was using this to update the XPS 13 BIOS before I had managed to get fwupdmgr working. (short version: fwupdmgr is unhappy if you boot in Legacy BIOS mode, but the Debian installer was not able to install UEFI boot on the XPS 13. So I had to do a Legacy BIOS mode installation, and then manually set up the UEFI boot partition and set up UEFI boot by hand.) UEFI boot and fwupdmgr are working now and I've done at least one or two BIOS updates using fwupdmgr, so I don't think it's related to my current issue with power management --- especially since S3 suspend works just fine.
I can try booting the Windows from a USB flash drive, but that's Windows 10 booting in Legacy mode. Microsoft doesn't seem to like people booting off of external USB devices, so between the fact that it's an external boot device, and it's in Legacy mode, some things might not work. But if you want me to perform an experiment using the external Windows 10 system, I can give it a try. I haven't needed to boot Windows in months, though, so there's a possibility that some Microsoft auto-update will end up trashing the Windows on a USB flash disk setup. At which point I can pull the Lenovo T470 out of storage, update Windows on it, and then transfer the Windows to the USB stick, and then copy over the Dell drivers..... what a mess. I don't miss Windows. :-)

It is the original NVMe, so I guess Windows would have entered low power states.
After powertop auto-tune, do you ever get
PCH IP: 4 - SPA State: On
as
PCH IP: 4 - SPA State: Off
when you run the following multiple times, with some wait between calls?
cat /sys/kernel/debug/pmc_core/pch_ip_power_gating_status

# powertop --auto-tune
modprobe cpufreq_stats failedLoaded 750 prior measurements
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask f
RAPL device for cpu 0
RAPL Using PowerCap Sysfs : Domain Mask f
Devfreq not enabled
glob returned GLOB_ABORTED
Leaving PowerTOP
# grep SPA /sys/kernel/debug/pmc_core/pch_ip_power_gating_status
PCH IP: 4 - SPA State: On
# grep SPA /sys/kernel/debug/pmc_core/pch_ip_power_gating_status
PCH IP: 4 - SPA State: On
# grep SPA /sys/kernel/debug/pmc_core/pch_ip_power_gating_status
PCH IP: 4 - SPA State: On
#
I've tried waiting a while and it's always "On", and never "Off".

Ping, is there any more feedback that can aid in getting to the bottom of this. I see ~0.7-0.8 Wh drain in sleep which is really unsustainable.

BTW, I've tried checking the PCH IP gating check and in my case I do see it switching:
$ while true; do sudo grep SPA /sys/kernel/debug/pmc_core/pch_ip_power_gating_status; sleep 10s; done
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: On
PCH IP: 4 - SPA State: Off
PCH IP: 4 - SPA State: On

Pall:
You may have a different issue. Did you do powertop --auto-tune and run the script in comment 38?
Tso:
In your case I think NVMe is keeping the root bridge busy? I don't have such NVMe card.
Mario,
Do you happen to have such card?

(In reply to Srinivas Pandruvada from comment #69)
> Tso:
> In your case I think NVMe is keeping the root bridge busy? I don't have such
> NVMe card.
> Mario,
> Do you happen to have such card?
Could this be the same issue as bug #196907, where there are problems with a PC300 NVMe SK hynix 512GB?
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=196907

(In reply to Srinivas Pandruvada from comment #68)
> Pall:
> You may have different issue. Did you do powertop --auto-tune and run script
> in comment 38?
Yes and no. I run powertop --auto-tune at startup, so that's been done. When it comes to the low_power_idle_cpu_residency_us, I get a non-zero value reported. Not entirely sure whether that makes my issue different?
BTW, I have the XPS 13 9370 with the same NVMe drive as T. Tso, just 512 GB in size (model KXG50ZNV512G NVMe TOSHIBA 512GB).

Paul Menzel:
I can't say, as I don't see turbostat output in that bug to check whether the system has a residency lower than PC3 while not doing suspend to idle. But it won't hurt to try blacklisting. Can anybody try?

Drive falling off the bus over s2idle and a drive staying awake are two different problems to me. Blacklisting s2idle will certainly work around high power consumption, but there appears to still be at least one (maybe two) real cases of higher power consumption with these particular NVMe SSDs.
With NVMe, the expectation is that APST (Autonomous Power State Transition) is used to put the SSD into lower power states.
If you look at Ted's output you'll notice two "non-operational" states that have a much lower power consumption (ps3 and ps4):
ps 3 : mp:0.0500W non-operational enlat:1500 exlat:1500 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0030W non-operational enlat:50000 exlat:80000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-

In order for the PCH to show s0 residency, the SSD needs to be spending enough time idle to automatically enter those states (entry latency, enlat, in microseconds).

APST has been supported since kernel 4.11 with this commit:
https://github.com/torvalds/linux/commit/c5552fde102fcc3f2cf9e502b8ac90e3500d8fdf

The kernel did adjust the max latency it would allow to enter these states with this commit in 4.12:
https://github.com/torvalds/linux/commit/9947d6a09cd71937dade2fc14640e4843ae19802

Once configured, the drives are supposed to work autonomously. If they're idle long enough, they stop using power. If something prods them, they wake up. So I would wonder if something is causing periodic activity on those disks even over s2idle?

We have a 9370 with the same NVMe as Szilárd Páll. This system can go to low power and shows both counts > 0:
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
and
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
Here the system has 4.19-rc1, but it used to work even with older kernels. So please check if you get both counts during s2idle. Remove any devices connected to the system to avoid dependency on peripherals.

The power you are measuring is wall power, I guess, so this is not the power the system is consuming during s2idle. The power brick also consumes power and may be charging too. So your issue is not the same as Theodore Tso's.

I'm having a seriously hard time following the discussion and matching replies to the messages they reply to (bugzilla ftw), so sorry if I sound confused.
$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
0
0
$ systemctl suspend
$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us; cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
6408462
6365970

I have nothing connected -- never had during the experimentation and reporting here. I seem to get both counters to show residency, but I still get serious battery drain during sleep.

> The power you are measuring is wall power, I guess, so this is not the power
> the system is consuming during s2idle. The power brick also consumes power
> and may be charging too.

Not sure if this is to me, but I'm confused: the energy spent during idle on battery has nothing to do with the charger. I simply look at the change in the energy field reported in /org/freedesktop/UPower/devices/battery_BAT0 and calculate the energy decrease per unit of time.

> So your issue is not same as Theodore Tso.

I don't see why, but fine, it may well be. Should I file a separate bug report then?

Any suggestions for short to mid-term mitigation -- this issue takes me back to the state of Linux on laptops from 15 years ago and I'd really like to snap back to the present and be able to use my laptop as it's intended to be used. :)

Created attachment 278367 [details]
smime.p7s

On 09/07/18 12:48, bugzilla-daemon@bugzilla.kernel.org wrote:

> I don't see why, but fine, it may well be. Should I file a separate
> bug report than?

Yes, please, and document the bug number here.

> Any suggestions for short to mid-term mitigation -- this issue takes
> me back to the state of Linux on laptops from 15 years ago and I'd
> really like to snap back to the present and be able to use my laptop
> as it's intended to be used. :)

Can’t you just disable s2idle, and enable ACPI S3 by setting `mem_sleep_default=deep` on the Linux kernel command line?
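
Before changing the kernel command line it can help to see which suspend variants the machine offers and which one is selected. `/sys/power/mem_sleep` lists the available modes and marks the active one with square brackets, for example `[s2idle] deep`. The following is a minimal sketch for reading that file; the helper name `active_mem_sleep` and the `MEM_SLEEP_FILE` override are assumptions made for illustration, not something used elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the bracketed (currently selected) entry from a
# string in the format of /sys/power/mem_sleep, e.g. "[s2idle] deep".
active_mem_sleep() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# MEM_SLEEP_FILE is an illustrative override; the default is the real sysfs path.
MEM_SLEEP_FILE=${MEM_SLEEP_FILE:-/sys/power/mem_sleep}
if [ -r "$MEM_SLEEP_FILE" ]; then
    modes=$(cat "$MEM_SLEEP_FILE")
    echo "available: $modes"
    echo "active:    $(active_mem_sleep "$modes")"
else
    echo "$MEM_SLEEP_FILE not readable on this system"
fi
```

If `deep` appears in the list, it can also be selected for the current boot by writing `deep` to the same file as root, while `mem_sleep_default=deep` makes the choice persistent. If only `s2idle` is listed, the firmware does not offer ACPI S3 and neither approach will help.
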
I agree that's the proper short term mitigation, and we probably have two different (but similar) issues happening here.

*** Bug 201523 has been marked as a duplicate of this bug. ***

If you have the SK Hynix SSD this could be the cause of high power draw:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1801875

Just for the record, my system was suffering from this issue of high power drain during s2idle. My XPS 13 9370 has NVMe model KXG50ZNV512G, shipped with firmware AADA4102, which is mentioned as problematic on ArchWiki [https://wiki.archlinux.org/index.php/Dell_XPS_13_(9370)#Storage].

Yesterday, I upgraded the firmware to AADA4106, released by Dell on Jan 22, 2019, and switched back to s2idle sleep. Overnight, it drained around 10 percent, so I guess this issue is fixed for me by the NVMe firmware upgrade.

As I don't have Windows, I managed to extract the firmware from the .exe file supplied by Dell and do the NVMe upgrade using nvme-cli on Linux:
https://gist.github.com/klingtnet/22ab0b907e2d9d20f98c72c93ea5dd37#gistcomment-2830279

Created attachment 281035 [details]
attachment-29801-0.html

Should I assume whatever is going on here is also why my 9380 experiences high power consumption during sleep (maybe 5-10% battery per hour instead of the 1-2% I'd expect)? Just to clarify, I have no issues entering or resuming from sleep, just the high power consumption during sleep. I am using powertop --auto-tune at boot.

@Adam, without digging into the details it's impossible to know if it's the same root cause for your particular issue. Would you please open a separate bug and attach the same kind of information that was requested from various folks in this bug, so we can see? Also, if you can, please make sure you are checking with the latest kernel release.

I can also confirm that the high power drain is caused by the SSD that the 9370 comes with. A little while ago I replaced the original Toshiba SSD with a Samsung 970 EVO 1TB model, and now the power drain in s2idle is significantly lower.

I think Ondřej's solution should also work for those using the original SSD. Since Ondřej was able to upgrade the SSD firmware with nvme-cli, it looks like it is upgradeable from Linux. Is there a possibility to release the updated SSD firmware through fwupd?

Hi, I have an HP Spectre x360 13t-ap000 with the exact same problem. I also have NVMe and it drains lots of power under s2idle, and the only option I have in /sys/power/mem_sleep is that one. I am running Arch Linux.

$ uname -a
Linux behemoth 5.1.8-arch1-1-ARCH #1 SMP PREEMPT Sun Jun 9 20:28:28 UTC 2019 x86_64 GNU/Linux

fyi

$ sudo lshw -class storage
  *-storage
       description: Non-Volatile memory controller
       product: SK hynix
       vendor: SK hynix
       physical id: 0
       bus info: pci@0000:6d:00.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: storage pm pciexpress msix nvm_express bus_master cap_list
       configuration: driver=nvme latency=0
       resources: irq:16 memory:a0000000-a0003fff

I am having battery drains on the Dell XPS13 9370 as well. That's why some time ago I changed the kernel command line to mem_sleep_default=deep.
That has lately been causing some issues on Ubuntu 18.04 LTS with gdm3, as unlocking after suspending the notebook makes gdm3 freeze.

I am on kernel 4.18.0-25 and have an NVMe model SSDPEKKF512G8 NVMe INTEL 512GB.

(In reply to Sebastian from comment #87)
> I am having battery drains on the Dell XPS13 9370 as well. That's why some
> time ago I changed Kernel command line to mem_sleep_default=deep. That
> creates now lately some issues on Ubuntu 18.04 LTS with gdm3 as unlocking
> after suspending the notebook let gdm3 freeze.
>
> I am on Kernel 4.18.0-25 and have a NVMe model SSDPEKKF512G8 NVMe INTEL
> 512GB.

Your issue is unrelated to this bug report. Please report your issue to the Ubuntu bug tracker [1]. You might want to try the latest Linux kernel [2] as a data point on whether this is Linux kernel related and has been fixed in the meantime.

[1]: https://bugs.launchpad.net/
[2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

I am not talking here about the gdm3 issue but about the battery drain in S2 on an XPS13 9370. Why is this unrelated to this bug report?

(In reply to Sebastian from comment #89)
> I am not talking here about the gdm3 issue but the battery drain in S2 on a
> XPS13 9370. Why is this unrelated to this bug report?

Sorry, I misunderstood that then. Please check whether the battery drain issue with s2idle still exists with the latest Linux version. (You should also contact Dell support if you bought the device with a GNU/Linux distribution.)

@All,

I'd like to mention that regarding battery drain on 9370 there is a patch series for putting NVMe into the proper sleep state that should be merged into 5.3. This should hopefully help the remaining power drain issues on 9370 that have been seen in S2I.

You can either test this branch:
https://git.kernel.org/pub/scm/linux/kernel/git/kbusch/linux.git/log/?h=nvme-power
or wait for the first 5.3rc to be cut.

The patches are in Linus' tree now; you can test from there, and battery drain from NVMe should be resolved in S2I.
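
Since the NVMe sleep-state patches are described above as landing in 5.3, a quick first check when comparing reports is whether the running kernel is at least that version. The following is only a sketch: `kernel_at_least` is a hypothetical helper name, it compares just the leading major.minor numbers of `uname -r`, and it cannot tell whether a distribution kernel older than 5.3 carries a backport of the series.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 is at least major.minor $2.
# Only the leading "major.minor" numbers are compared; suffixes such as
# "-rc4" or "-arch1-1-ARCH" are ignored, so a 5.3 release candidate counts as 5.3.
kernel_at_least() {
    have_major=${1%%.*}
    rest=${1#*.}
    have_minor=${rest%%[!0-9]*}
    want_major=${2%%.*}
    want_minor=${2#*.}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 5.3; then
    echo "kernel $(uname -r): 5.3 or later"
else
    echo "kernel $(uname -r): older than 5.3"
fi
```

For the kernels mentioned in this thread, `4.18.0-25` and `5.1.8-arch1-1-ARCH` would be reported as older than 5.3, while `5.5.0-rc5` would pass.
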

I'm seeing a possibly related issue with my new XPS 13 9380. I haven't measured loss on s2idle because I can't get it to stay sleeping. I have both a yubikey 5c nano, and whenever it's installed, the XPS 13 9380 wakes up within a couple of minutes of suspending and stays awake. The kernel logs seem to show that it's attempting to sleep every few minutes but wakes up immediately (or doesn't actually sleep, I haven't been able to tell).

Also, as soon as I plug in power (USB-C power) it wakes up and doesn't go to sleep. The BIOS option of waking up on AC power is disabled.

Setting the sleep mode to deep instead of s2idle seems to solve this problem.

Is this related, or something entirely separate?

Created attachment 284329 [details]
attachment-26080-0.html

@Andy Wang:

With a USB device plugged in, the CPU package will not go into as deep a state, but it should still be using less power than most "active" use cases.

I believe, however, that a plugged-in USB device causing certain behaviors is a separate case from those being reported on this bug.

Those in this bug I believe have issues with one of two things:
1) NVMe not going into the proper state (which is resolved in kernel 5.3)
2) ASPM not configured properly (which can be caused by using TLP 1.1 or less, or by configuring the kernel ASPM policy to anything but "default").

@All others:
(Btw) It would be good if anyone who was affected by this bug could confirm using 5.3rc3 or later that they don't have any remaining issues so we can close this bug.

(In reply to Mario Limonciello from comment #95)
> @Andy Wang:
>
> With a USB device plugged in the CPU package will not go into as deep of a
> state, but it should still be using less power than most "active" use cases.
>
> I believe that USB device plugged causing certain behaviors in is a separate
> case than those reporting on this bug however.
>
> Those in this bug I believe have issues with one of two things:
> 1) NVME not going into proper state (which is resolved in kernel 5.3)
> 2) ASPM not configured properly (Which can be caused by using TLP 1.1 or
> less or configuring the kernel ASPM policy to anything but "default").
>
> @All others:
> (Btw) It would be good if anyone who was affected by this bug could confirm
> using 5.3rc3 or later that they don't have any remaining issues so we can
> close this bug.

I can confirm that this problem is fixed on my 9370 with kernel version 5.3rc4.

@Srinivas, can you close this based on #96?

I can confirm that S2 sleep is considerably better with 5.3 than previously (though I don't use the original SSD that came with the 9370 anymore). The 9370 can now actually go a couple of days in S2.

(In reply to Mario Limonciello from comment #95)
>
> Those in this bug I believe have issues with one of two things:
> 1) NVME not going into proper state (which is resolved in kernel 5.3)
> 2) ASPM not configured properly (Which can be caused by using TLP 1.1 or
> less or configuring the kernel ASPM policy to anything but "default").
>
> @All others:
> (Btw) It would be good if anyone who was affected by this bug could confirm
> using 5.3rc3 or later that they don't have any remaining issues so we can
> close this bug.

Hi @Mario. Does this patchset have a chance of getting backported to 4.19?

At least right now, not via 4.19.y. Distros certainly can backport it. I know that ChromeOS has done this for their 4.19, so you can reference that if you want to try.

Created attachment 285229 [details]
attachment-4576-0.html

On a 9370 with 5.5.0-rc5 I do get proper s2idle (i.e. non-zero /sys/devices/system/cpu/cpuidle/low_power_idle_{cpu,system}_residency_us), unless my Yubikey (ID 1050:0200) is plugged in. With the Yubikey plugged in, low_power_idle_* stays at zero and the battery is empty within a day.

powertop shows all tunables as Good, TLP is version 1.2.2.

(In reply to Daniel Albers from comment #102)
> On a 9370 with 5.5.0-rc5 I do get proper s2idle (i.e. non-zero
> /sys/devices/system/cpu/cpuidle/low_power_idle_{cpu,system}_residency_us),
> unless my Yubikey (ID 1050:0200) is plugged in.
> With the Yubikey plugged in, low_power_idle_* stays at zero and battery is
> empty within a day.
>
> powertop shows all tunables as Good, TLP is version 1.2.2.

Please create a new issue for this, and reference it here.

(In reply to Daniel Albers from comment #102)
> On a 9370 with 5.5.0-rc5 I do get proper s2idle (i.e. non-zero
> /sys/devices/system/cpu/cpuidle/low_power_idle_{cpu,system}_residency_us),
> unless my Yubikey (ID 1050:0200) is plugged in.
> With the Yubikey plugged in, low_power_idle_* stays at zero and battery is
> empty within a day.
>
> powertop shows all tunables as Good, TLP is version 1.2.2.

I have opened a new issue for this at https://bugzilla.kernel.org/show_bug.cgi?id=216556
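
The check that comes up repeatedly in this thread, comparing the two `low_power_idle_*_residency_us` counters before and after a suspend cycle, can be wrapped in a small script. This is a minimal sketch under stated assumptions: the helper names `read_residency` and `residency_verdict` and the `CPUIDLE_DIR` override are invented for illustration, and the suspend step itself is left as a comment so the snippet is safe to run as written.

```shell
#!/bin/sh
# Directory holding the counters named in this thread; CPUIDLE_DIR is an
# illustrative override for testing.
CPUIDLE_DIR=${CPUIDLE_DIR:-/sys/devices/system/cpu/cpuidle}

# Hypothetical helper: print one counter ("cpu" or "system"), or 0 if unreadable.
read_residency() {
    f="$CPUIDLE_DIR/low_power_idle_${1}_residency_us"
    if [ -r "$f" ]; then cat "$f"; else echo 0; fi
}

# Hypothetical helper: classify a before/after pair of counter values.
residency_verdict() {
    delta=$(( $2 - $1 ))
    if [ "$delta" -gt 0 ]; then
        echo "ok (+${delta} us)"
    else
        echo "no low-power residency"
    fi
}

cpu_before=$(read_residency cpu)
sys_before=$(read_residency system)
# ... suspend and resume here, e.g. with `systemctl suspend` ...
cpu_after=$(read_residency cpu)
sys_after=$(read_residency system)
echo "cpu:    $(residency_verdict "$cpu_before" "$cpu_after")"
echo "system: $(residency_verdict "$sys_before" "$sys_after")"
```

With the values reported earlier (0 before, 6408462 and 6365970 after) both counters would be classified as ok. As that same report shows, non-zero residency does not by itself rule out a high battery drain, so the result is a data point and not a verdict.
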