Bug 201465

Summary: Increased power consumption when using runtime-pm to suspend nvidia gpu
Product: Drivers
Reporter: Maik Freudenberg (hhfeuer)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: cevelnet, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.17
Subsystem:
Regression: No
Bisected commit-id:

Description Maik Freudenberg 2018-10-18 11:19:07 UTC
Test-setup:
Using runtime PM to suspend (turn off) the discrete GPU of this system:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor DRAM Controller [8086:0c04] (rev 06)
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller [8086:0c01] (rev 06)
00:01.1 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller [8086:0c05] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller [8086:0c0c] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI [8086:8c31] (rev 05)
00:16.0 Communication controller [0780]: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 [8086:8c3a] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 [8086:8c2d] (rev 05)
00:1b.0 Audio device [0403]: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller [8086:8c20] (rev 05)
00:1c.0 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 [8086:8c16] (rev d5)
00:1c.4 PCI bridge [0604]: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #5 [8086:8c18] (rev d5)
00:1d.0 USB controller [0c03]: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 [8086:8c26] (rev 05)
00:1f.0 ISA bridge [0601]: Intel Corporation HM86 Express LPC Controller [8086:8c49] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] [8086:8c03] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller [8086:8c22] (rev 05)
07:00.0 3D controller [0302]: NVIDIA Corporation GK208M [GeForce GT 740M] [10de:1292] (rev a1)
08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
09:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev 73)

by issuing
echo "auto" > /sys/bus/pci/devices/<nvidia-pci-id>/power/control
and checking the status of the GPU through
cat /sys/bus/pci/devices/<nvidia-pci-id>/power/runtime_status
which reports 'suspended'.
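
The two sysfs steps above can also be scripted. The following is a minimal sketch and not part of the original report: the helper names and the `sysfs_root` parameter are made up for illustration, and a concrete address such as `0000:07:00.0` is an assumption derived from the `07:00.0` entry in the listing above (PCI domain 0000 assumed). Writing to `power/control` normally requires root.

```python
from pathlib import Path

# Default location of PCI devices in sysfs; it can be overridden so the
# helpers can be tried against a scratch directory first.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def enable_runtime_pm(pci_addr, sysfs_root=SYSFS_PCI):
    """Write 'auto' to power/control so the device may runtime-suspend."""
    control = Path(sysfs_root) / pci_addr / "power" / "control"
    control.write_text("auto\n")


def runtime_status(pci_addr, sysfs_root=SYSFS_PCI):
    """Return the content of power/runtime_status, e.g. 'active' or 'suspended'."""
    status = Path(sysfs_root) / pci_addr / "power" / "runtime_status"
    return status.read_text().strip()
```

On the system described here, calling `enable_runtime_pm("0000:07:00.0")` followed by `runtime_status("0000:07:00.0")` would correspond to the `echo` and `cat` commands above, assuming that address is correct.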

Power consumption is the battery discharge rate displayed by powertop, which was previously cross-checked against an external power meter and found to be fairly accurate.
All data points were taken in idle state with X/Gnome running. Without X, consumption is about 0.4W lower.
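
For reference, a discharge rate of this kind can also be read directly from the kernel's power_supply sysfs class. This is only a sketch and not necessarily how powertop arrives at its figure; the battery name `BAT0` and the helper name are assumptions. Per the kernel's sysfs ABI, `power_now` is given in µW, `current_now` in µA and `voltage_now` in µV, and which of them is exposed depends on the battery driver.

```python
from pathlib import Path


def discharge_watts(supply_dir="/sys/class/power_supply/BAT0"):
    """Return the instantaneous battery power in watts.

    BAT0 is an assumed battery name; check /sys/class/power_supply/ for
    the name used on a given machine.
    """
    supply = Path(supply_dir)
    power_now = supply / "power_now"
    if power_now.exists():
        # power_now is reported in microwatts
        return abs(int(power_now.read_text())) / 1e6
    # Fall back to current (microamps) times voltage (microvolts).
    # abs() is used because some drivers report a negative current
    # while discharging.
    microamps = int((supply / "current_now").read_text())
    microvolts = int((supply / "voltage_now").read_text())
    return abs(microamps * microvolts) / 1e12
```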
1) Proprietary driver loaded, for completeness only.
a. Nvidia gpu used, idle: 14.7W
b. Intel gpu used, Nvidia gpu ON: 14.4W

2) No driver loaded, Intel gpu used
a. Nvidia gpu ON: 15.2W
b. bbswitch used to turn Nvidia gpu off: 12.7W
c. runpm set to 'auto', state 'suspended': 16.3W
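
To make the comparison in case 2 explicit, the differences can be worked out from the figures above. The snippet only restates the reported numbers; the variable names are illustrative.

```python
# Idle power draw in watts, as reported above for case 2 (no driver loaded)
gpu_on = 15.2
bbswitch_off = 12.7
runpm_suspended = 16.3

# Difference of each "off" method relative to leaving the GPU on
bbswitch_delta = round(bbswitch_off - gpu_on, 1)
runpm_delta = round(runpm_suspended - gpu_on, 1)
# Gap between the two ways of switching the GPU off
gap = round(runpm_suspended - bbswitch_off, 1)

print(f"bbswitch vs GPU on:          {bbswitch_delta:+.1f} W")
print(f"runpm suspended vs GPU on:   {runpm_delta:+.1f} W")
print(f"runpm suspended vs bbswitch: {gap:+.1f} W")
```

With these numbers, the 'suspended' state draws 1.1W more than leaving the GPU on and 3.6W more than turning it off with bbswitch, while bbswitch saves 2.5W.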

Any advice on how to investigate this?
Comment 1 Tolga Cakir 2018-12-18 14:37:11 UTC
I've posted about this in a bug report on freedesktop bugzilla (https://bugs.freedesktop.org/show_bug.cgi?id=108058#c1). But it seems to be a kernel issue. My dmesg / logfile can be found there.

I'm facing pretty much the same issue on a Dell M3800. It features a Haswell processor (i7-4702HQ) and an Nvidia K1100M (Kepler, NVE7), which is very similar to the GT 740M.

Powertop reports ~7W idle after fresh boot, with the system entering package C-State 6. After S3 suspend / resume, the system doesn't enter PC6 anymore and idle power consumption is at around ~15W.

Using the nouveau driver, echoing the lowest p-state to /sys/kernel/debug/dri/*/pstate brings my power consumption back to ~7W.

Blacklisting nouveau and using bbswitch instead fixes this behavior. The power consumption is as expected in that case.

I wasn't able to reproduce this issue on the 4.14 LTS kernel. I've tested 4.19.2, where the issue is present. It was also present in 4.20-rc5.

I'm not entirely sure, though, whether this really is a regression or whether something else triggers it.

I will re-upload my files from freedesktop here when I get back home.

Please let me know how I can help narrow this issue down. I have experience in kernel patching / building / testing.

Cheers,
Tolga
Comment 2 Zhang Rui 2019-07-01 06:13:27 UTC
(In reply to Tolga Cakir from comment #1)
> I've posted about this in a bug report on freedesktop bugzilla
> (https://bugs.freedesktop.org/show_bug.cgi?id=108058#c1). But it seems to be
> a kernel issue. My dmesg / logfile can be found there.
> 
Reassigning this to the graphics experts. My understanding, though, is that kernel graphics bugs are also tracked in the freedesktop bugzilla.