Bug 211441 - S0ix: high battery drain - Dell 9500
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: Run-Time-PM
Hardware: All Linux
Importance: P1 normal
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-27 15:01 UTC by kostadin.karaivanov
Modified: 2022-06-30 09:56 UTC
CC List: 5 users

See Also:
Kernel Version: 5.10.9-201.fc33.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
3 minutes turbostat output (5.81 KB, text/plain)
2021-03-04 08:07 UTC, kostadin.karaivanov
ts-running.out (44.40 KB, text/plain)
2021-05-24 06:44 UTC, kostadin.karaivanov
dmesg with pm debug enabled (119.57 KB, text/plain)
2021-08-13 19:45 UTC, kostadin.karaivanov
lspci -vvv output (67.35 KB, text/plain)
2021-08-13 19:49 UTC, kostadin.karaivanov

Description kostadin.karaivanov 2021-01-27 15:01:55 UTC
The system reaches S0ix but it burns about 2.1 Wh per hour while idle, which translates to ~2.5% of battery life per hour. That's less than 2 days, which is not quite OK.

Following some troubleshooting steps from https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting 

I find that 
PCH IP: 51 - CNVI                            	State: On

which has to be Off for S0ix to work as per the document above.

# grep -h . /sys/kernel/debug/pmc_core/pch_ip_power_gating_status
PCH IP: 0  - PMC                             	State: On
PCH IP: 1  - OPI-DMI                         	State: On
PCH IP: 2  - SPI/eSPI                        	State: On
PCH IP: 3  - XHCI                            	State: Off
PCH IP: 4  - SPA                             	State: Off
PCH IP: 5  - SPB                             	State: Off
PCH IP: 6  - SPC                             	State: Off
PCH IP: 7  - GBE                             	State: Off
PCH IP: 8  - SATA                            	State: Off
PCH IP: 9  - HDA_PGD0                        	State: Off
PCH IP: 10 - HDA_PGD1                        	State: Off
PCH IP: 11 - HDA_PGD2                        	State: Off
PCH IP: 12 - HDA_PGD3                        	State: Off
PCH IP: 13 - SPD                             	State: Off
PCH IP: 14 - LPSS                            	State: Off
PCH IP: 15 - LPC                             	State: Off
PCH IP: 16 - SMB                             	State: Off
PCH IP: 17 - ISH                             	State: Off
PCH IP: 18 - P2SB                            	State: Off
PCH IP: 19 - NPK_VNN                         	State: On
PCH IP: 20 - SDX                             	State: Off
PCH IP: 21 - SPE                             	State: Off
PCH IP: 22 - Fuse                            	State: On
PCH IP: 23 - SBR8                            	State: Off
PCH IP: 24 - CSME_FSC                        	State: Off
PCH IP: 25 - USB3_OTG                        	State: Off
PCH IP: 26 - EXI                             	State: Off
PCH IP: 27 - CSE                             	State: Off
PCH IP: 28 - CSME_KVM                        	State: Off
PCH IP: 29 - CSME_PMT                        	State: Off
PCH IP: 30 - CSME_CLINK                      	State: Off
PCH IP: 31 - CSME_PTIO                       	State: Off
PCH IP: 32 - CSME_USBR                       	State: Off
PCH IP: 33 - CSME_SUSRAM                     	State: Off
PCH IP: 34 - CSME_SMT1                       	State: Off
PCH IP: 35 - CSME_SMT4                       	State: Off
PCH IP: 36 - CSME_SMS2                       	State: Off
PCH IP: 37 - CSME_SMS1                       	State: Off
PCH IP: 38 - CSME_RTC                        	State: Off
PCH IP: 39 - CSME_PSF                        	State: Off
PCH IP: 40 - SBR0                            	State: On
PCH IP: 41 - SBR1                            	State: On
PCH IP: 42 - SBR2                            	State: On
PCH IP: 43 - SBR3                            	State: Off
PCH IP: 44 - SBR4                            	State: On
PCH IP: 45 - SBR5                            	State: On
PCH IP: 46 - CSME_PECI                       	State: Off
PCH IP: 47 - PSF1                            	State: On
PCH IP: 48 - PSF2                            	State: On
PCH IP: 49 - PSF3                            	State: On
PCH IP: 50 - PSF4                            	State: On
PCH IP: 51 - CNVI                            	State: On
PCH IP: 52 - UFS0                            	State: Off
PCH IP: 53 - EMMC                            	State: Off
PCH IP: 54 - SPF                             	State: Off
PCH IP: 55 - SBR6                            	State: On
PCH IP: 56 - SBR7                            	State: On
PCH IP: 57 - NPK_AON                         	State: On
PCH IP: 58 - HDA_PGD4                        	State: Off
PCH IP: 59 - HDA_PGD5                        	State: Off
PCH IP: 60 - HDA_PGD6                        	State: Off
PCH IP: 61 - PSF6                            	State: On
PCH IP: 62 - PSF7                            	State: On
PCH IP: 63 - PSF8                            	State: On
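
To pick out only the blocks that are still powered from a listing like the one above, a small filter may help. This is a sketch, not part of the original report: the function name pch_ip_on is made up for illustration, and it assumes the "PCH IP: N - NAME State: On|Off" line format shown above.

```shell
#!/bin/sh
# List the names of PCH IP blocks reported as "On".
# Reads pmc_core-style lines ("PCH IP: 51 - CNVI   State: On") from stdin.
# pch_ip_on is a hypothetical helper name, not an existing tool.
pch_ip_on() {
    awk '$1 == "PCH" && $2 == "IP:" && $NF == "On" { print $5 }'
}

# Usage (as root, with debugfs mounted):
#   pch_ip_on < /sys/kernel/debug/pmc_core/pch_ip_power_gating_status
```

Checking whether CNVI is still listed after a change is then a matter of piping the result through grep.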
Comment 1 kostadin.karaivanov 2021-01-27 15:13:23 UTC
In a non-idle state with almost no user activity, upower reports:
# upower -i /org/freedesktop/UPower/devices/battery_BAT0
  native-path:          BAT0
  vendor:               SMP
  model:                DELL 70N2F95
  serial:               58
  power supply:         yes
  updated:              27.01.2021 (ср) 17:09:48 (9 seconds ago)
  has history:          yes
  has statistics:       yes
  battery
    present:             yes
    rechargeable:        yes
    state:               discharging
    warning-level:       none
    energy:              48,564 Wh
    energy-empty:        0 Wh
    energy-full:         78,0216 Wh
    energy-full-design:  84,2916 Wh
    energy-rate:         5,0046 W
    voltage:             11,508 V
    time to empty:       9,7 hours
    percentage:          62%
    capacity:            92,5615%
    technology:          lithium-polymer
    icon-name:          'battery-full-symbolic'
  History (rate):
    1611760188	5,005	discharging

The discharge rate while in S0ix is only about half of the 5 W I see when not in S0ix.
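
For reference, the arithmetic behind the "~2.5% per hour" figure in the description can be reproduced as below. This is a sketch that assumes the 2.1 W-equivalent drain from the description is measured against the 84.2916 Wh design capacity from the upower output; the function name drain_estimate is made up for illustration.

```shell
#!/bin/sh
# Percent of battery capacity drained per hour, and hours until empty,
# for a given discharge rate (W) and capacity (Wh).
# LC_ALL=C keeps awk on "." as the decimal separator regardless of locale.
drain_estimate() {
    LC_ALL=C awk -v rate="$1" -v cap="$2" 'BEGIN {
        printf "%.2f %%/h, %.1f h to empty\n", rate / cap * 100, cap / rate
    }'
}

# Values taken from this report: 2.1 W drain, 84.2916 Wh design capacity.
drain_estimate 2.1 84.2916
```

Note that upower prints decimal commas in this locale, so the values have to be converted to decimal points before being passed in.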
Comment 2 Zhang Rui 2021-03-04 06:17:38 UTC
Please run "turbostat -o ts-freeze-3m.out rtcwake -m freeze -s 180" and attach ts-freeze-3m.out to this bug report. Let's see how much S0ix residency we can get during the sleep.
Note that this command will bring the system back after 3 minutes, so please be patient until the system wakes up.
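
As a cross-check on the turbostat figure, the kernel's cumulative low-power-idle counter can be sampled before and after the same sleep. This is only a sketch: it assumes the platform exposes /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us (not every machine does), and the helper name residency_pct is made up for illustration.

```shell
#!/bin/sh
# Residency percentage from two readings of a cumulative microsecond counter
# and the elapsed wall-clock time in seconds.
# residency_pct is a hypothetical helper name.
residency_pct() {
    LC_ALL=C awk -v before="$1" -v after="$2" -v secs="$3" 'BEGIN {
        printf "%.2f\n", (after - before) / (secs * 1000000) * 100
    }'
}

# Usage (as root); the counter path is an assumption about this platform:
#   f=/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
#   before=$(cat "$f"); rtcwake -m freeze -s 180; after=$(cat "$f")
#   residency_pct "$before" "$after" 180
```

A result near 0 would agree with a SYS%LPI of 0 in the turbostat output.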
Comment 3 kostadin.karaivanov 2021-03-04 08:07:34 UTC
Created attachment 295631
3 minutes turbostat output
Comment 4 kostadin.karaivanov 2021-03-04 08:09:47 UTC
turbostat attached
Comment 5 Zhang Rui 2021-03-04 13:47:00 UTC
Ok, from the turbostat output attached, you can get 98% PC10 residency and 0 S0ix residency.

CC Wendy.
Comment 6 kostadin.karaivanov 2021-05-24 06:44:30 UTC
Created attachment 296965
ts-running.out
Comment 7 kostadin.karaivanov 2021-05-24 06:47:32 UTC
(In reply to Zhang Rui from comment #5)
> Ok, from the turbostat output attached, you can get 98% PC10 residency and 0
> S0ix residency.
> 
> CC Wendy.

The strange thing is that leaving the device idle with just turbostat running shows some SYS%LPI residency (ts-running.out attached),

while "turbostat -o ts-freeze-3m.out rtcwake -m freeze -s 180" shows a SYS%LPI of 0.

Both were done with the 5.12.5-300.fc34.x86_64 kernel.
Comment 8 wendy.wang 2021-05-24 13:20:31 UTC
This means you do get opportunistic (runtime) S0ix, which is good, but the residency is not good enough: ~31.06%.
Of course, you can check whether you get good residency over a larger interval, e.g. 10 minutes:
turbostat -o tc.out -i 600
Comment 9 kostadin.karaivanov 2021-05-25 00:30:56 UTC
(In reply to wendy.wang from comment #8)
> This means you do get opportunistic (runtime) S0ix, which is good, but the
> residency is not good enough: ~31.06%.
> Of course, you can check whether you get good residency over a larger
> interval, e.g. 10 minutes:
> turbostat -o tc.out -i 600

Well, not really. Running turbostat with -i 600 gives a SYS%LPI of 0.5, perhaps because suspend kicks in halfway through and I get no S0ix residency there.
Comment 10 kostadin.karaivanov 2021-08-08 22:09:27 UTC
Ping. Is there any information I can provide to get this worked on?
Comment 11 David Box 2021-08-13 00:43:43 UTC
I can suggest to look at some of the common issues that blocked s0ix on those early platforms. 

1. First we need to see dmesg logs. Please set /sys/power/pm_debug_messages to 1 and provide the debug log for your suspend. Also enable PCI PM debug with this command (as root):

echo -n "file pci-driver.c +p" > /sys/kernel/debug/dynamic_debug/control

We can check the suspend state of devices as well as look for device errors, wakeups, or other issues.

2. Set /sys/module/acpi/parameters/sleep_no_lps0 to Y. This disables calls to an ACPI method which may (among many things it does) be enforcing extra S0ix requirements that some platforms cannot achieve.

3. Provide lspci -vvv (run as root).

4. Since you are getting 98% PC10 the issue is unlikely to be Embedded Controller (EC) related. But you can try anyway to disable EC wakeup by writing Y to /sys/module/acpi/parameters/ec_no_wakeup. After doing this, only the power button will wake your system from suspend. This option prevents EC interrupts that could block s0ix (but this would typically also block most package c-state too).

If these do not work you may consider using s3 instead if the BIOS supports it. To check you need to cat /sys/power/state to see if mem is supported. To use it you need to set /sys/power/mem_sleep to deep.

David
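
The logging setup from step 1 can be scripted roughly as follows. This is a sketch only: the function name enable_s0ix_debug and its optional path-prefix argument are made up for illustration (the prefix exists so the function can be tried against a scratch directory), and it must run as root on a real system.

```shell
#!/bin/sh
# Enable PM debug messages and PCI PM dynamic debug (step 1 above).
# $1 is an optional prefix for the sysfs/debugfs tree; leave it empty on a
# real system. enable_s0ix_debug is a hypothetical helper name.
enable_s0ix_debug() {
    root=${1:-}
    echo 1 > "$root/sys/power/pm_debug_messages" &&
    # printf without a trailing newline mirrors the "echo -n" in step 1.
    printf '%s' 'file pci-driver.c +p' \
        > "$root/sys/kernel/debug/dynamic_debug/control"
}

# Usage (as root): enable logging, suspend once, then save the log.
#   enable_s0ix_debug
#   rtcwake -m freeze -s 180
#   dmesg > dmesg-suspend.txt
```

Steps 2 and 4 are left out on purpose, since they change suspend behavior and are better tried one at a time.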
Comment 12 kostadin.karaivanov 2021-08-13 19:45:30 UTC
Created attachment 298317
dmesg with pm debug enabled
Comment 13 kostadin.karaivanov 2021-08-13 19:49:21 UTC
Created attachment 298319
lspci -vvv output
Comment 14 kostadin.karaivanov 2021-08-13 20:15:36 UTC
(In reply to David Box from comment #11)
> I can suggest to look at some of the common issues that blocked s0ix on
> those early platforms. 
> 
> 1. First we need to see dmesg logs. Please set /sys/power/pm_debug_messages
> to 1 and provide the debug log for your suspend. Also enable PCI PM debug
> with this command (as root):
> 
> echo -n "file pci-driver.c +p" > /sys/kernel/debug/dynamic_debug/control
> 
> We can check the suspend state of devices as well as look for device errors,
> wakeups, or other issues.

dmesg output with both options attached. 
The last suspend entry with debug enabled starts at line 1573

> 
> 2. Set /sys/module/acpi/parameters/sleep_no_lps0 to Y. This disables calls
> to an ACPI method which may (among many things it does) be enforcing extra
> S0ix requirements that some platforms cannot achieve.
> 


No change in behavior.
turbostat --show SYS%LPI echo mem > /sys/power/state 

...
86.760754 sec
SYS%LPI
0.00
0.00 


> 3. Provide lspci -vvv (run as root).

attached

> 
> 4. Since you are getting 98% PC10 the issue is unlikely to be Embedded
> Controller (EC) related. But you can try anyway to disable EC wakeup by
> writing Y to /sys/module/acpi/parameters/ec_no_wakeup. After doing this,
> only the power button will wake your system from suspend. This option
> prevents EC interrupts that could block s0ix (but this would typically also
> block most package c-state too).
>

With this one I was unable to wake up the system with the power button.
I had to hold it for some (long) time and then press it again to cold boot the laptop.
 
> If these do not work you may consider using s3 instead if the BIOS supports
> it. To check you need to cat /sys/power/state to see if mem is supported. To
> use it you need to set /sys/power/mem_sleep to deep.

S3 is not an option. The BIOS supports it but the platform does not, as confirmed in https://bugzilla.kernel.org/show_bug.cgi?id=208603

> 
> David
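
The S3 availability check from the quoted text can be scripted along these lines. This is a sketch that assumes the usual /sys/power/mem_sleep format, where the selected variant is shown in square brackets (for example "[s2idle] deep"); the function names are made up for illustration.

```shell
#!/bin/sh
# Print the sleep variant currently selected in a mem_sleep-style string,
# i.e. the entry in square brackets ("[s2idle] deep" -> "s2idle").
# Both function names here are hypothetical helpers.
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Succeed if "deep" (S3) is offered at all, whether selected or not.
offers_deep() {
    grep -qw deep
}

# Usage:
#   active_mem_sleep < /sys/power/mem_sleep
#   offers_deep < /sys/power/mem_sleep && echo "S3 is offered by the firmware"
```

As noted above, firmware offering "deep" does not guarantee that S3 actually works on this platform.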
Comment 15 kostadin.karaivanov 2021-08-13 20:41:17 UTC
Not sure if it helps, but here is an experiment I did.
I left this running in a terminal:

sudo turbostat -q -S -s GFX%rc6,Pkg%pc2,Pkg%pc8,Pkg%pc9,CPU%LPI,SYS%LPI
GFX%rc6	Pkg%pc2	Pkg%pc8	Pkg%pc9	CPU%LPI	SYS%LPI
79.91	8.38	32.70	0.00	7.35	5.56
92.53	9.36	35.41	0.00	30.05	22.52
86.83	10.17	35.80	0.00	18.31	17.52 <-- suspend here
6349.58	1.52	0.22	0.00	67.22	0.00  <-- first line after resume
57.27	4.60	0.00	0.00	0.00	0.00
^C82.47	7.22	0.00	0.00	0.00	0.00
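
To follow a single counter over time, one column can be pulled out of output like the above by header name. This is a sketch that assumes turbostat's tab-separated layout as pasted here; the function name turbostat_col is made up for illustration.

```shell
#!/bin/sh
# Print one column, selected by header name, from tab-separated turbostat
# output on stdin. Lines before the header, repeated header lines, and
# lines too short to contain the column are skipped.
# turbostat_col is a hypothetical helper name.
turbostat_col() {
    awk -F'\t' -v name="$1" '
        col == 0 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        NF >= col && $col != name { print $col }
    '
}

# Usage:
#   turbostat_col 'SYS%LPI' < ts-running.out
```

Hand-added annotations such as "<-- suspend here" would end up in the last column, so they should be removed before parsing.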
Comment 16 kostadin.karaivanov 2021-09-27 09:48:55 UTC
Does this still need more info, and if so, what would that be?
Comment 17 kostadin.karaivanov 2022-06-29 06:23:32 UTC
With the update to kernel 5.18.6-200.fc36.x86_64 I am finally able to achieve S0ix residency.

I'm not sure what change between 5.17.7-300.fc36.x86_64 and 5.18 fixed it, but now it is OK.
Comment 18 Zhang Rui 2022-06-29 07:49:01 UTC
Good to know.
But now that it works, at least we can git bisect if the problem comes back.

Bug closed. Marked as unreproducible, as we don't know exactly what the fix is for now.
Comment 19 kostadin.karaivanov 2022-06-30 05:33:06 UTC
I can still try to bisect it as time permits. I will post an update here if appropriate.
Comment 20 kostadin.karaivanov 2022-06-30 09:56:33 UTC
Booting with vanilla 5.17.0 also gets S0ix working, so it must have been the firmware upgrade to version 1.14.0 that fixed it.
Today's upgrade to BIOS firmware 1.15.1 with Fedora kernel 5.18.7-200 is good too.
