Bug 52381 - Updating from kernel 3.5 to 3.6 doubles idle power consumption due to acpi_call laptop mode bug
Status: CLOSED INVALID
Alias: None
Product: Power Management
Classification: Unclassified
Component: Run-Time-PM
Hardware: All Linux
Importance: P1 normal
Assignee: Huang Ying
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-06 15:46 UTC by da_fox
Modified: 2013-01-28 23:58 UTC
CC List: 3 users

See Also:
Kernel Version: 3.6
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Dmesg of a fresh boot of kernel 3.7, without the patch (57.08 KB, application/download)
2013-01-06 15:58 UTC, da_fox
Dmesg of a fresh boot of kernel 3.7, with the patch (57.02 KB, application/download)
2013-01-06 15:59 UTC, da_fox
Full output of lspci -vv (33.37 KB, text/plain)
2013-01-06 16:00 UTC, da_fox

Description da_fox 2013-01-06 15:46:02 UTC
Some time ago I noticed that my laptop's battery life had significantly decreased. Some investigation revealed that the idle power consumption had doubled, because the built-in nvidia graphics card would not turn off any more (I have an optimus-enabled laptop, which has both an intel and an nvidia graphics card).

I do not use the nvidia card at all (I don't even have any drivers for it installed). I rely on the acpi_call module to make an acpi call to disable the nvidia card at boot. I.e. something like the following snippet from my on/off script, which is based on a script from the bumblebee project:
---8<---------
			try_call "\_SB.PCI0.PEG0.PEGP._DSM" "{0xF8,0xD8,0x86,0xA4,0xDA,0x0B,0x1B,0x47,0xA7,0x2B,0x60,0x42,0xA6,0xB5,0xBE,0xE0}" "0x100 0x1A {0x1,0x0,0x0,0x3}" 
			# ok to turn off: Buffer {0x59 0x0 0x0 0x11}
			# is already off: Buffer {0x41 0x0 0x0 0x11}
			try_call "\_SB.PCI0.PEG0.PEGP._PS3"
--->8---------



--- Steps to reproduce---
1) Fully charge laptop, then disconnect from charger (so that you are running on battery power).
2) Boot into kernel 3.5
3) Make sure system is idle
4) cat /sys/class/power_supply/BAT0/current_now # outputs a value around 10000 (10mW)
5) (Re)boot into kernel 3.6 (or later)
6) Make sure system is idle
7) cat /sys/class/power_supply/BAT0/current_now # outputs a value around 20000 (20mW)



--- Expected result ---
The reported values for 'current_now' should be the same for kernel 3.5 and 3.6



--- Actual result ---
'current_now' is doubled for kernel 3.6, compared to kernel 3.5.



--- Hardware ---
I have a 'Dell XPS 15 (L502x)' laptop. lspci reports the following for the nvidia card:
lspci -v -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 540M] (rev ff) (prog-if ff)
	!!! Unknown header type 7f



--- Bisection results ---
The following commit was identified as the first bad commit:
---8<---------
commit 71a83bd727cc31c5fe960c3758cb396267ff710e
Author: Zheng Yan <zheng.z.yan@intel.com>
Date:   Sat Jun 23 10:23:49 2012 +0800

    PCI/PM: add runtime PM support to PCIe port
    
    This patch adds runtime PM support to PCIe port.  This is needed by
    PCIe D3cold support, where PCIe device without ACPI node may be
    powered on/off by PCIe port.
    
    Because runtime suspend is broken for some chipsets, a black list is
    used to disable runtime PM support for these chipsets.
    
    Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
    Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
    Signed-off-by: Huang Ying <ying.huang@intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
--->8---------



I have verified that the following patch fixes the issue for me on kernel 3.7:
---8<---------
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 0761d90..0a11bc5 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -207,8 +207,8 @@ static int __devinit pcie_portdrv_probe(struct pci_dev *dev,
         * it by default.
         */
        dev->d3cold_allowed = false;
-       if (!pci_match_id(port_runtime_pm_black_list, dev))
-               pm_runtime_put_noidle(&dev->dev);
+       //if (!pci_match_id(port_runtime_pm_black_list, dev))
+               //pm_runtime_put_noidle(&dev->dev);
 
        return 0;
 }
--->8---------

I think the problem is that the return value from pci_match_id() is not being interpreted correctly: the exclamation mark should not be present. At present the blacklist is empty, and hence, presumably, pm_runtime_put_noidle() should not get called at all. However, as the code stands, pm_runtime_put_noidle() is always called.

I will attach dmesgs shortly.
Please let me know if any additional information is required.
Comment 1 da_fox 2013-01-06 15:58:56 UTC
Created attachment 90531
Dmesg of a fresh boot of kernel 3.7, without the patch
Comment 2 da_fox 2013-01-06 15:59:22 UTC
Created attachment 90541
Dmesg of a fresh boot of kernel 3.7, with the patch
Comment 3 da_fox 2013-01-06 16:00:09 UTC
Created attachment 90551
Full output of lspci -vv
Comment 4 Rafael J. Wysocki 2013-01-08 11:51:12 UTC
You're right that the pci_match_id() check in pcie_portdrv_probe() is wrong.

I'll post a patch to fix this later today.
Comment 5 Rafael J. Wysocki 2013-01-09 21:38:56 UTC
On second thought, the check really is as intended (i.e. we only want to _put ports that aren't blacklisted, but the blacklist is empty, so we do that for all of them).
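
For readers following the thread, the behaviour described above can be modelled outside the kernel. The following is only a minimal userspace sketch, not the kernel code: `struct port_id`, `match_id()`, `port_gets_runtime_pm()`, and both tables are hypothetical stand-ins for `struct pci_device_id`, `pci_match_id()`, the condition in `pcie_portdrv_probe()`, and `port_runtime_pm_black_list`, and the device ID in the sample table is purely illustrative. It assumes a table terminated by a zeroed entry and shows that, with an empty blacklist, the negated match is true for every port.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pci_device_id. */
struct port_id {
    unsigned int vendor;
    unsigned int device;
};

/* Empty blacklist: it holds only the zeroed terminator entry. */
static const struct port_id empty_blacklist[] = {
    { 0, 0 },
};

/* Illustrative blacklist with one made-up entry, for contrast. */
static const struct port_id sample_blacklist[] = {
    { 0x8086, 0x0101 },
    { 0, 0 },
};

/* Simplified analogue of pci_match_id(): returns the matching entry,
 * or NULL when the device is not in the table. */
static const struct port_id *match_id(const struct port_id *list,
                                      unsigned int vendor,
                                      unsigned int device)
{
    for (; list->vendor != 0; list++) {
        if (list->vendor == vendor && list->device == device)
            return list;
    }
    return NULL;
}

/* Mirrors the probe condition: a port gets the pm_runtime_put_noidle()
 * treatment only when it is NOT found in the blacklist. */
static bool port_gets_runtime_pm(const struct port_id *blacklist,
                                 unsigned int vendor,
                                 unsigned int device)
{
    return !match_id(blacklist, vendor, device);
}
```

Under these assumptions, `port_gets_runtime_pm(empty_blacklist, ...)` is true for any vendor/device pair, which matches the observation that the _put happens for all ports; only a port listed in a non-empty table such as `sample_blacklist` would be skipped.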
Comment 6 Rafael J. Wysocki 2013-01-09 21:40:16 UTC
Assigned to Ying.
Comment 7 Huang Ying 2013-01-10 00:46:03 UTC
Could you try the latest bbswitch?  It appears that they have resolved a similar issue recently.

Or could you try running your nvidia graphics card turn-off script before your runtime PM enable script (usually laptop-mode-tools, if you did not write it yourself)?
Comment 8 da_fox 2013-01-14 17:45:23 UTC
Ok, so I finally got around to testing the bbswitch module. This does seem to work. I no longer use the acpi_call script (or module).

Since bbswitch resolves the issue for me, I did not test running the acpi_call script before laptop-mode starts; I hope this is ok.

The question that remains is, is this then a bug in acpi_call, or not a bug at all? What is the reason that this commit makes switching the nvidia card on/off through the normal ACPI call fail? I'd like to point out that the setup with acpi_call has worked for almost two years for me... although I'm fine with using bbswitch instead :)
Comment 9 Len Brown 2013-01-15 00:54:56 UTC
acpi_call is not upstream, and never will be.

If acpi_call shipped with a distro, then a bug should be filed against that distro asking that it be removed.

Closed.
