Bug 48981 (cybercyst)

Summary: commit breaks the ability to turn on and off my nvidia optimus card
Product: Power Management
Reporter: Forrest Loomis (cybercyst)
Component: Run-Time-PM
Assignee: Huang Ying (ying.huang)
Status: CLOSED CODE_FIX
Severity: normal
CC: florian, kam1kaz3, lenb, peter, rui.zhang, ying.huang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.6 and up
Subsystem:
Regression: No
Bisected commit-id:

Description Forrest Loomis 2012-10-17 22:54:48 UTC
I recently upgraded to linux 3.6 in the testing repo in Arch Linux and to my dismay found that bbswitch was no longer able to power on and off my NVIDIA optimus card.

I bisected the kernel and found that it stopped working at commit 71a83bd727cc31c5fe960c3758cb396267ff710e 

I posted a bug report with the programmers behind bumblebee / bbswitch at https://github.com/Bumblebee-Project/bbswitch/issues/35 and we could use some help finding out why we are seeing this error!

Thanks so much!
Comment 1 Len Brown 2012-10-18 00:49:54 UTC
There have been some changes recently.
Can you verify that this is still a problem in 3.7-rc1?
Comment 2 Huang Ying 2012-10-18 02:06:41 UTC
(In reply to comment #0)
> I recently upgraded to linux 3.6 in the testing repo in Arch Linux and to my
> dismay found that bbswitch was no longer able to power on and off my NVIDIA
> optimus card.
> 
> I bisected the kernel and found that it stopped working at commit
> 71a83bd727cc31c5fe960c3758cb396267ff710e 
> 
> I posted a bug report with the programmers behind bumblebee / bbswitch at
> https://github.com/Bumblebee-Project/bbswitch/issues/35 and we could use some
> help finding out why we are seeing this error!

Can you try disabling runtime PM for the PCIe bridge of the NVIDIA card?
Comment 3 Forrest Loomis 2012-10-18 02:07:23 UTC
... Could you provide instructions to do that?
Comment 4 Forrest Loomis 2012-10-18 02:20:25 UTC
Linux 3.4-rc1 also does not allow bbswitch to turn on and off my NVIDIA card.
Comment 5 Forrest Loomis 2012-10-18 02:27:31 UTC
Whoops, I mean 3.7-rc1...
Comment 6 Forrest Loomis 2012-10-18 02:39:37 UTC
FIXED: 
I changed /etc/laptop-mode/conf.d/runtime-pm.conf and changed the line

CONTROL_RUNTIME_PM="auto"

to:

CONTROL_RUNTIME_PM="0"

Everything works as expected now.

Laptop-mode-tools' default settings were to blame here...
Thanks Huang Ying, your post helped me dig up the solution!
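
For a single device, the same effect can presumably be had without laptop-mode-tools by writing "on" to the device's power/control attribute in sysfs, which is the standard runtime PM switch ("auto" turns runtime PM back on). What follows is a minimal userspace sketch in C, not something taken from this report: the helper names pm_control_path and set_runtime_pm are made up for illustration, and the bridge address used in the note below is an assumption that has to be checked with lspci on the machine in question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of a PCI device's runtime PM control file.
 * bdf is the full address, e.g. "0000:00:01.0".
 * Returns 0 on success, -1 if the buffer is too small. */
int pm_control_path(const char *bdf, char *out, size_t len)
{
    int n;

    assert(bdf != NULL && out != NULL);
    n = snprintf(out, len, "/sys/bus/pci/devices/%s/power/control", bdf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "on" (runtime PM disabled) or "auto" (runtime PM allowed) to the
 * control file. Needs root. Returns 0 on success, -1 on any error. */
int set_runtime_pm(const char *bdf, const char *mode)
{
    char path[128];
    FILE *f;
    int rc;

    /* Only the two values accepted by power/control are let through. */
    if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
        return -1;
    if (pm_control_path(bdf, path, sizeof(path)) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;
    rc = (fputs(mode, f) == EOF) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Called as root, set_runtime_pm("0000:00:01.0", "on") would keep that bridge out of runtime suspend until something writes "auto" again or the machine reboots. The address 0000:00:01.0 is the root port from comment 8 and may differ on other systems.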
Comment 7 Huang Ying 2012-10-18 02:48:57 UTC
(In reply to comment #6)
> FIXED: 
> I changed /etc/laptop-mode/conf.d/runtime-pm.conf and changed the line
> 
> CONTROL_RUNTIME_PM="auto"
> 
> to:
> 
> CONTROL_RUNTIME_PM="0"
> 
> Everything works as expected now.
> 
> Laptop-mode-tools' default settings were to blame here...
> Thanks Huang Ying, your post helped me dig up the solution!

I still think this may be a bumblebee / bbswitch issue.  It needs to resume the device before operating on it.  Can you suggest that?
Comment 8 Peter Wu 2012-10-18 10:40:27 UTC
> I still think this may be a bumblebee / bbswitch issue.  It needs to resume the
> device before operating on it.  Can you suggest that?

bbswitch (the kernel module, bumblebee is just a user) does not claim a device, it just does a one-shot power on/off action.

The device is woken by calling the _PS0 ACPI method. The code that brings the card back to life is:

    pci_set_power_state(pdev, PCI_D0);
    pci_restore_state(pdev);
    pci_enable_device(pdev);
    pci_set_master(pdev);

And off:

    pci_save_state(pdev);
    pci_clear_master(pdev);
    pci_disable_device(pdev);
    pci_set_power_state(pdev, PCI_D3hot);

(In my unpushed repo I have changed this to PCI_D3cold, since the device actually sleeps that deeply. Is that a sensible thing to do?)

My Nvidia video card is connected through the PCIe port, according to sysfs:

    0000:01:00.0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0

and lspci:

    00:01.0 PCI bridge [0604]: Intel Corporation Core Processor PCI Express x16 Root Port [8086:0045] (rev 02)
    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF108 [GeForce GT 425M] [10de:0df0] (rev a1)

The following holds regardless of having the nvidia card powered on or off.
Once I enable runtime PM for 00:01.0 (the PCIe port), `lspci` reports the correct values, but /proc/bus/pci/01/00 reports all bits on. bbswitch tries to determine whether the card is on or off by reading the PCI config space: if it returns all bits on, the nvidia card is assumed to be off. The code for that is:

    pci_read_config_dword(dis_dev, 0, &cfg_word);

Apparently, this fails when run-time PM is enabled for the PCIe port. The bbswitch module does not really operate the device; it only turns the card on and off. When a driver like nouveau/nvidia is loaded, bbswitch refuses to disable the nvidia card. Hence I do not attempt to register as a driver, as that would make it impossible for other drivers to register, right? Any suggestions?

When I load the nouveau driver, reading the PCI config space works fine (that is, not all bits are on).
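
The all-bits-on test described above can be reproduced from userspace by reading the first dword of a config file under /proc or /sys. The following is only a sketch under that assumption and is not bbswitch code: config_dword_all_ones and read_config_dword are hypothetical helper names, and a short or empty read is reported as an error instead of being taken as "card off".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A config read that returns all ones usually means no device answered. */
int config_dword_all_ones(uint32_t cfg_word)
{
    return cfg_word == 0xFFFFFFFFu;
}

/* Read the first config dword (vendor ID in the low 16 bits, device ID in
 * the high 16 bits) from a PCI config file.
 * Returns 0 on success, -1 if the file cannot be opened or is too short. */
int read_config_dword(const char *path, uint32_t *out)
{
    unsigned char b[4];
    FILE *f;
    size_t n;

    assert(path != NULL && out != NULL);
    f = fopen(path, "rb");
    if (!f)
        return -1;
    n = fread(b, 1, sizeof(b), f);
    fclose(f);
    if (n != sizeof(b))
        return -1;

    /* PCI config space is little-endian; assemble the value portably. */
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}
```

For the card listed above ([10de:0df0]), a working read would be expected to yield 0x0df010de, so config_dword_all_ones would return 0 for it.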
Comment 9 Huang Ying 2012-10-19 05:13:56 UTC
(In reply to comment #8)
> > I still think this may be a bumblebee / bbswitch issue.  It needs to resume the
> > device before operating on it.  Can you suggest that?
> 
> bbswitch (the kernel module, bumblebee is just a user) does not claim a
> device,
> it just does a one-shot power on/off action.
> 
> The device is woken by calling the _PS0 ACPI method. The code that brings the
> card back to life is:
> 
>     pci_set_power_state(pdev, PCI_D0);
>     pci_restore_state(pdev);
>     pci_enable_device(pdev);
>     pci_set_master(pdev);
> 
> And off:
> 
>     pci_save_state(pdev);
>     pci_clear_master(pdev);
>     pci_disable_device(pdev);
>     pci_set_power_state(pdev, PCI_D3hot);

Why not use pm_runtime_suspend/pm_runtime_resume for the device?  If that is impossible, you need to suspend/resume the parent of pdev (bridge) manually too.
Comment 10 Peter Wu 2012-10-19 15:46:25 UTC
I believe that pm_runtime_{suspend,resume} is not possible for this driver because it binds the device. Is that observation correct? I went the manual way by calling pm_runtime_{get,put}_sync on the bus device (assuming that is the parent).

So, to make this bug more relevant to the Linux kernel instead of my project, this runtime PM behavior breaks /proc/bus/pci/??/??.?. Before 3.6, this file could be read even if there is no driver available. After this commit, this file reads empty. Note that /sys/bus/pci/devices/0000:??:??.?/config returns the correct values.
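
One way to check the discrepancy described above is to read the same device through both interfaces and compare the bytes. The sketch below assumes PCI domain 0000 and the usual path layouts (/proc/bus/pci/BB/DD.F and /sys/bus/pci/devices/0000:BB:DD.F/config); the function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path of the procfs config file for a device in domain 0000 (assumed). */
int proc_config_path(unsigned bus, unsigned dev, unsigned fn,
                     char *out, size_t len)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, len, "/proc/bus/pci/%02x/%02x.%x", bus, dev, fn);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Path of the sysfs config file for the same device. */
int sysfs_config_path(unsigned bus, unsigned dev, unsigned fn,
                      char *out, size_t len)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, len, "/sys/bus/pci/devices/0000:%02x:%02x.%x/config",
                 bus, dev, fn);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Compare the first `count` bytes (at most 64) of two config files.
 * Returns 1 if they match, 0 if they differ or either read is short
 * (for example an empty file), and -1 if a file cannot be opened. */
int config_prefix_matches(const char *path_a, const char *path_b, size_t count)
{
    unsigned char a[64], b[64];
    FILE *fa, *fb;
    size_t na, nb;

    if (count > sizeof(a))
        count = sizeof(a);
    fa = fopen(path_a, "rb");
    if (!fa)
        return -1;
    fb = fopen(path_b, "rb");
    if (!fb) {
        fclose(fa);
        return -1;
    }
    na = fread(a, 1, count, fa);
    nb = fread(b, 1, count, fb);
    fclose(fa);
    fclose(fb);
    if (na != count || nb != count)
        return 0;
    return memcmp(a, b, count) == 0;
}
```

With the behavior reported here, comparing the two paths for bus 01, device 00, function 0 would be expected to return 0 while the bridge is runtime suspended, since the procfs file reads empty and the sysfs file does not.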
Comment 11 Huang Ying 2012-10-24 07:12:23 UTC
(In reply to comment #10)
> I believe that pm_runtime_{suspend,resume} is not possible for this driver
> because it binds the device. Is that a correct observation from me? I went
> the
> manual way by calling pm_runtime_{get,put}_sync on the bus device (assuming
> that is the parent).

Yes.  That should work.

> So, to make this bug more relevant to the Linux kernel instead of my project,
> this runtime PM behavior breaks /proc/bus/pci/??/??.?. Before 3.6, this file
> could be read even if there is no driver available. After this commit, this
> file reads empty. Note that /sys/bus/pci/devices/0000:??:??.?/config returns
> the correct values.

Oh, yes, that is a bug.  I will fix it.  But I think you need to open another bugzilla item?
Comment 12 Peter Wu 2012-10-28 11:13:36 UTC
I see you have fixed that /proc/bus/pci bug: http://www.spinics.net/lists/linux-pci/msg18282.html

This issue can be closed as FIXED!
Comment 13 Rafael J. Wysocki 2012-11-02 09:52:22 UTC
*** Bug 49031 has been marked as a duplicate of this bug. ***
Comment 14 Florian Mickler 2012-11-11 18:44:27 UTC
A patch referencing this bug report has been merged in Linux v3.7-rc5:

commit b3c32c4f9565f93407921c0d8a4458042eb8998e
Author: Huang Ying <ying.huang@intel.com>
Date:   Thu Oct 25 09:36:03 2012 +0800

    PCI/PM: Fix proc config reg access for D3cold and bridge suspending
Comment 15 Zhang Rui 2012-11-13 06:37:49 UTC
Ying, can this bug be closed?
Comment 16 Huang Ying 2012-11-13 07:01:15 UTC
(In reply to comment #15)
> Ying, can this bug be closed?

Yes.  I think so.