Bug 201465 - Increased power consumption when using runtime-pm to suspend nvidia gpu
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-18 11:19 UTC by Maik Freudenberg
Modified: 2019-07-01 06:13 UTC
CC List: 2 users

See Also:
Kernel Version: 4.17
Subsystem:
Regression: No
Bisected commit-id:


Description Maik Freudenberg 2018-10-18 11:19:07 UTC
Test-setup:
Using runtime PM to suspend (turn off) the discrete gpu of this system:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller [8086:0c04] (rev 06)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller [8086:0c05] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086:0c0c] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 05)
00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086:8c3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 05)
00:1b.0 Audio device [0403]: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller [8086:8c20] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 [8086:8c16] (rev d5)
00:1c.4 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 [8086:8c18] (rev d5)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation HM86 Express LPC Controller [8086:8c49] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086:8c03] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 05)
07:00.0 3D controller [0302]: NVIDIA Corporation GK208M [GeForce GT 740M] [10de:1292] (rev a1)
08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
09:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 73)

by issuing
echo "auto" > /sys/bus/pci/devices/<nvidia-pci-id>/power/control
and checking the status of the gpu through
cat /sys/bus/pci/devices/<nvidia-pci-id>/power/runtime_status
which reports 'suspended'.
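
The two steps above can be wrapped in a small helper. This is only a minimal sketch and not part of the original report: the function names runpm_enable and runpm_status are hypothetical, the optional second argument (an alternative sysfs root) exists purely for dry runs, and the address 0000:07:00.0 in the usage comment assumes the default PCI domain 0000 in front of the 07:00.0 shown in the listing.

```shell
#!/bin/sh
# Sketch: enable runtime PM for one PCI device and report its state.
# runpm_enable / runpm_status are hypothetical helper names.
# $1 = PCI address, $2 = sysfs root (optional, defaults to the real one).

runpm_enable() {
    dev_dir="${2:-/sys/bus/pci/devices}/$1"
    # "auto" allows the kernel to suspend the device when idle; "on" keeps it awake.
    echo auto > "$dev_dir/power/control"
}

runpm_status() {
    dev_dir="${2:-/sys/bus/pci/devices}/$1"
    # Prints the kernel's view of the device, e.g. "active" or "suspended".
    cat "$dev_dir/power/runtime_status"
}

# Usage (as root; domain 0000 is an assumption):
#   runpm_enable 0000:07:00.0 && sleep 5 && runpm_status 0000:07:00.0
```

The sleep in the usage line just gives the device a moment to become idle before the status is read; the exact delay needed is not something the report states.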

Power consumption was taken from the battery discharge rate displayed by powertop, which was previously cross-checked with an external power meter and found to be pretty accurate.
All data points were taken in idle state with X/Gnome running. Without X, consumption is about 0.4W lower.
1) Proprietary driver loaded, for completeness only.
a. Nvidia gpu used, idle: 14.7W
b. Intel gpu used, Nvidia gpu ON: 14.4W

2) No driver loaded, Intel gpu used
a. Nvidia gpu ON: 15.2W
b. bbswitch used to turn Nvidia gpu off: 12.7W
c. runpm set to 'auto', state 'suspended': 16.3W
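
To make the size of the discrepancy explicit, the case 2 readings can be compared pairwise. The snippet below is a minimal sketch of that arithmetic; power_delta is a hypothetical helper name and the numbers are the ones listed above.

```shell
#!/bin/sh
# Sketch: signed difference between two idle readings in watts.
# power_delta is a hypothetical helper; awk does the decimal arithmetic.
power_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f\n", a - b }'
}

# Readings from case 2 (no driver loaded, Intel gpu used):
#   power_delta 16.3 12.7   # runpm 'suspended' vs. bbswitch off -> +3.6
#   power_delta 16.3 15.2   # runpm 'suspended' vs. Nvidia gpu ON -> +1.1
#   power_delta 12.7 15.2   # bbswitch off vs. Nvidia gpu ON      -> -2.5
```

In other words, the 'suspended' state draws 3.6W more than the bbswitch-off state and 1.1W more than simply leaving the gpu on.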

Any advice on how to investigate this?
Comment 1 Tolga Cakir 2018-12-18 14:37:11 UTC
I've posted about this in a bug report on freedesktop bugzilla (https://bugs.freedesktop.org/show_bug.cgi?id=108058#c1). But it seems to be a kernel issue. My dmesg / logfile can be found there.

Pretty much facing the same issue on a Dell M3800. It features a Haswell processor (i7-4702HQ) and Nvidia K1100M (Kepler, NVE7), which is very similar to the GT740M.

Powertop reports ~7W idle after a fresh boot, with the system entering package C-state 6 (PC6). After S3 suspend / resume, the system doesn't enter PC6 anymore and idle power consumption is around 15W.

Using the nouveau driver, echoing the lowest p-state to /sys/kernel/debug/dri/*/pstate gets my power consumption back to ~7W.
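
For anyone trying to reproduce this workaround, a rough sketch follows. It rests on an assumption that should be verified with cat on the real file first: that the pstate file lists one performance level per line in the form "<id>: ..." and that the first line is the lowest level. The helper names first_pstate and set_first_pstate are hypothetical.

```shell
#!/bin/sh
# Sketch: select the first performance level listed in nouveau's pstate file.
# ASSUMPTION: one level per line as "<id>: ...", lowest level on line 1.
# first_pstate / set_first_pstate are hypothetical helper names.

first_pstate() {
    # Print the text before the first colon on line 1 (the level id).
    sed -n '1s/:.*//p' "$1"
}

set_first_pstate() {
    id=$(first_pstate "$1") || return 1
    # Refuse to write anything if no id could be parsed.
    [ -n "$id" ] || return 1
    echo "$id" > "$1"
}

# Usage (as root, debugfs mounted; the DRI index differs between systems):
#   for f in /sys/kernel/debug/dri/*/pstate; do set_first_pstate "$f"; done
```

If the listing on a given machine is ordered differently, the id has to be picked by hand instead.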

Blacklisting nouveau and using bbswitch instead fixes this behavior. The power consumption is as expected in that case.
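
A minimal sketch of the bbswitch step, for comparison with the runtime PM path. It uses the /proc/acpi/bbswitch control file provided by the bbswitch module; the helper names bbswitch_off and bbswitch_state are hypothetical, and the optional path argument exists only so the functions can be tried against an ordinary file.

```shell
#!/bin/sh
# Sketch: switch the discrete gpu off through bbswitch and read back the state.
# bbswitch_off / bbswitch_state are hypothetical helper names.
# $1 = control file (optional, defaults to /proc/acpi/bbswitch).

bbswitch_off() {
    echo OFF > "${1:-/proc/acpi/bbswitch}"
}

bbswitch_state() {
    # The file reads back as "<pci address> ON" or "<pci address> OFF";
    # print only the last field.
    awk '{ print $NF }' "${1:-/proc/acpi/bbswitch}"
}

# Usage (as root, bbswitch loaded, nouveau blacklisted):
#   bbswitch_off && bbswitch_state
```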

I wasn't able to reproduce this issue on the 4.14 LTS kernel. I've tested 4.19.2, where this issue is present. It was also present in 4.20-rc5.

I'm not entirely sure, though, whether this really is a regression or whether something else triggers it.

I will re-upload my files from freedesktop here when I get back home.

Please let me know how I can help narrow this issue down. I have experience in kernel patching / building / testing.

Cheers,
Tolga
Comment 2 Zhang Rui 2019-07-01 06:13:27 UTC
(In reply to Tolga Cakir from comment #1)
> I've posted about this in a bug report on freedesktop bugzilla
> (https://bugs.freedesktop.org/show_bug.cgi?id=108058#c1). But it seems to be
> a kernel issue. My dmesg / logfile can be found there.
> 
Reassigning this to the graphics experts, but my understanding is that kernel graphics bugs are also tracked in the freedesktop bugzilla.
