Bug 42696 - Wrong ACPI handle is being detected for NVIDIA graphics card on Lenovo Ideapad Y470/Y570
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Video
Hardware: All Linux
Importance: P1 normal
Assignee: Aaron Lu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-30 18:48 UTC by Peter Wu
Modified: 2013-10-06 08:59 UTC

See Also:
Kernel Version: 3.3.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
acpidump for Lenovo Ideapad Y470 (356.62 KB, text/plain)
2012-01-30 18:49 UTC, Peter Wu
acpidump for Toshiba Satellite P870 (367.02 KB, text/plain)
2012-06-03 14:23 UTC, Peter Wu
ACPI-return-first-_ADR-match-for-acpi_get_child.patch (2.65 KB, patch)
2012-11-11 22:25 UTC, Peter Wu
PCI / ACPI: Rework ACPI device node objects lookup (4.45 KB, patch)
2012-12-26 21:42 UTC, Rafael J. Wysocki
PCI / ACPI: Rework ACPI device node objects lookup (4.44 KB, patch)
2012-12-26 21:44 UTC, Rafael J. Wysocki

Description Peter Wu 2012-01-30 18:48:09 UTC
On the Lenovo Ideapad Y470 and Lenovo Ideapad Y570, the kernel assigns the \_SB.PCI0.PEG0.VGA handle to the PCI device (possibly because the _DOS method is found on the handle?)

However, the correct _DSM, _ROM, _PS0 and _PS3 methods are on the \_SB.PCI0.PEG0.PEGP handle. _PSx on the VGA handle is basically a no-op: it only gets/sets the _PSC variable. _ROM does not exist there, and its _DSM method is not useful compared to the PEGP._DSM method.

Due to this issue, the nouveau driver fails to load the vbios (because _ROM does not exist on VGA). Similarly, the proprietary nvidia driver fails to load. As for the _PS0 and _PS3 methods, calling pci_set_power_state() does not really disable the PCI device, because the no-op methods on the VGA handle are the ones that get used.
Comment 1 Peter Wu 2012-01-30 18:49:01 UTC
Created attachment 72236
acpidump for Lenovo Ideapad Y470
Comment 2 Peter Wu 2012-01-30 18:51:26 UTC
dmidecode information for the affected systems:

system-manufacturer   : LENOVO
system-product-name   : 20090                           
system-version        : Lenovo IdeaPad Y470             
baseboard-manufacturer: LENOVO
baseboard-product-name: Base Board Product Name
baseboard-version     : Base Board Version
bios-vendor           : LENOVO
bios-version          : 47CN30WW(V2.08)
bios-release-date     : 08/01/2011

system-manufacturer   : LENOVO
system-product-name   : 20091                           
system-version        : Lenovo IdeaPad Y570             
baseboard-manufacturer: LENOVO
baseboard-product-name: Base Board Product Name
baseboard-version     : Base Board Version
bios-vendor           : LENOVO
bios-version          : 47CN30WW(V2.08)
bios-release-date     : 08/01/2011
Comment 3 wliment 2012-02-05 08:06:07 UTC
@Peter, I think the Y460 with an Optimus NVIDIA card has a similar problem. I have reported something in your bbswitch project, but I don't know how to describe my problem.

So how can I find out whether I have the same problem you mention?
Comment 4 Peter Wu 2012-02-05 10:00:24 UTC
If you have the IdeaPad Y470 or IdeaPad Y570, you are affected for sure. Until
this bug is fixed, you can use the ugly hack mentioned at
https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-3797568
Comment 5 Peter Wu 2012-06-03 14:23:04 UTC
Created attachment 73502
acpidump for Toshiba Satellite P870

Another affected machine, a Toshiba Satellite P870 this time, running on 3.3.7.
https://github.com/Bumblebee-Project/Bumblebee/issues/173

Mapping of PCI Bus ID to their ACPI handles:

0000:00:01.0 060400 \_SB_.PCI0.PEG0
0000:00:02.0 030000 \_SB_.PCI0.GFX0
0000:01:00.0 030000 \_SB_.PCI0.PEG0.VGA_

The correct _ROM handle for the nvidia device exists on \_SB.PCI0.PEG0.PEGP.
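
To reproduce a listing like the one above on another machine, the mapping can be read from sysfs. The sketch below is a hedged illustration in Python, not the tool used in this report. It assumes the running kernel exposes a firmware_node link for each PCI device that has a bound ACPI node, and a path attribute on that node; the function name pci_acpi_map is made up for this example.

```python
import os

def pci_acpi_map(pci_root="/sys/bus/pci/devices"):
    """Return (bus_id, class_code, acpi_path) for PCI devices that have an ACPI node."""
    rows = []
    if not os.path.isdir(pci_root):
        return rows  # no sysfs PCI tree here (for example, not running on Linux)
    for bus_id in sorted(os.listdir(pci_root)):
        dev = os.path.join(pci_root, bus_id)
        path_file = os.path.join(dev, "firmware_node", "path")
        class_file = os.path.join(dev, "class")
        if not (os.path.isfile(path_file) and os.path.isfile(class_file)):
            continue  # no ACPI node is bound to this PCI device
        with open(path_file) as f:
            acpi_path = f.read().strip()
        with open(class_file) as f:
            # sysfs prints the class as e.g. "0x030000"; reformat it without the prefix
            class_code = "%06x" % int(f.read().strip(), 16)
        rows.append((bus_id, class_code, acpi_path))
    return rows
```

Calling pci_acpi_map() with no argument reads the live system; printing each tuple gives one line per device in the same column order as the listing above. On an affected machine the entry for 0000:01:00.0 would be expected to show the VGA_ node rather than PEGP.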

dmidecode details:
baseboard-manufacturer: TOSHIBA
baseboard-product-name: Portable PC
baseboard-version     : MP
system-manufacturer   : TOSHIBA
system-product-name   : SATELLITE P870
system-version        : PSPLBE-01V00HFR
bios-vendor           : Insyde Corp.
bios-version          : 1.10
bios-release-date     : 03/21/2012
Comment 6 Peter Wu 2012-11-11 00:31:56 UTC
So let's check this again (the below is against 3.7).

- acpi_scan_init
 - acpi_bus_scan
  - acpi_bus_check_add
   - acpi_add_single_object
    - acpi_device_set_id
     - if (acpi_is_video_device(device))
        acpi_add_id(device, ACPI_VIDEO_HID) // add "LNXVIDEO" to PNP ids

The following video devices are detected (incomplete list):
- \_SB.PCI0.PEG0.PEGP due to _ROM
- \_SB.PCI0.PEG0.VGA due to _DOS
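
A toy model of the check described above, written in Python rather than kernel C. It only encodes what this comment states (a node that has _ROM or _DOS is tagged as a video device); the real acpi_is_video_device() may test further methods. The method inventory is an illustrative guess based on comment 0, not a dump of the actual tables.

```python
# Methods whose presence, per the list above, marks an ACPI node as a video device.
VIDEO_MARKER_METHODS = {"_ROM", "_DOS"}

def video_markers(methods):
    """Return the marker methods that make a node look like a video device."""
    return sorted(VIDEO_MARKER_METHODS & set(methods))

# Hypothetical method inventory for the two nodes (assumption, see lead-in).
NAMESPACE = {
    r"\_SB.PCI0.PEG0.PEGP": {"_ADR", "_DSM", "_ROM", "_PS0", "_PS3"},
    r"\_SB.PCI0.PEG0.VGA": {"_ADR", "_DSM", "_DOS", "_PS0", "_PS3"},
}

# Every node with at least one marker method would receive the "LNXVIDEO" ID.
VIDEO_NODES = {path: video_markers(m) for path, m in NAMESPACE.items() if video_markers(m)}
```

Both nodes end up in VIDEO_NODES, which illustrates why the "LNXVIDEO" ID alone cannot tell the node that carries _ROM apart from the one that only has _DOS.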

Now, the part that is responsible for setting archdata.acpi_handle (drivers/acpi/glue.c). Let's assume the nvidia PCI device at 01:00.0.

- ...
 - pci_bus_add_device
  - device_add
   - platform_notify points to acpi_platform_notify on x86 and ia64
     - type = acpi_get_bus_type() (returns acpi_pnp_bus)
     - type->find_device() calls acpi_pnp_find_device
      - loop through all ACPI bus things?
       - acpi_pnp_match: find the first (?) unbound device with its PNP id list containing "LNXVIDEO"
     - if a handle was found, call acpi_bind_one which basically sets dev->archdata.acpi_handle
(Please correct me if I'm wrong. Having both a "struct acpi_bus_type" and an "acpi_bus_type" of type "struct bus_type" (in drivers/acpi/scan.c) is not helpful.)

Looking at a log from https://lists.launchpad.net/bumblebee/msg00069.html, the "video" module seems to be able to find the correct handle:
[    8.935976] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/device:2f/LNXVIDEO:00/input/input6
[    8.936011] ACPI: Video Device [PEGP] (multi-head: no  rom: yes  post: no)
[    8.938801] acpi device:3d: registered as cooling_device11
[    8.939038] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:01/input/input7
[    8.939091] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[    8.939127] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

Request for owners of these laptops (notably the IdeaPad Y570): please attach the output of the commands below, without applying the hack (one affected machine is enough):

(cd /sys/devices/LNXSYSTM:00/device:00;grep --include=path -r .)
ls -l /sys/bus/acpi/drivers/video
dmesg


If this does not give useful information, then the next step would be getting a kernel log with ACPI debugging enabled.
Comment 7 Peter Wu 2012-11-11 22:25:51 UTC
Created attachment 86101
ACPI-return-first-_ADR-match-for-acpi_get_child.patch

Looking at some logs that I have received, this has nothing to do with PNP which gets detected properly. So, it must be PCI. Digging further on that and with feedback from a Lenovo Ideapad Y570 user, I managed to track down the issue.

Please test this patch, comments are included.
Comment 8 Peter Wu 2012-11-12 22:29:47 UTC
Confirmed to work for a Lenovo Ideapad Y480
https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-10273482

I have forwarded the patch with an updated commit message to the ACPI maintainers.
Comment 9 Zhang Rui 2012-11-13 08:35:29 UTC
To me, the problem is that pnp_bus_type should not bind a device by just checking the PNP ID.

I think we should set pnp_dev->dev.acpi_handle directly when creating the PNP devices, and in acpi_pnp_match() we should compare the acpi_handle rather than the PNP IDs.
Rafael, what do you think?
Comment 10 Peter Wu 2012-11-13 10:09:27 UTC
@Zhang, the PNP ID<->ACPI handle mapping is correctly performed in this bug, but the PCI Bus ID <-> ACPI handle gets misdetected.
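
For reference, a PCI function is matched against ACPI nodes through the _ADR value, which for PCI devices holds the device number in the high word and the function number in the low word. The helper below is a small Python illustration of that encoding; the name pci_adr is hypothetical and the bus IDs are the ones listed in comment 5.

```python
def pci_adr(bus_id):
    """Encode the device/function part of 'dddd:bb:dd.f' as an ACPI _ADR value."""
    _domain, _bus, devfn = bus_id.split(":")
    device, function = devfn.split(".")
    # High 16 bits: device number; low 16 bits: function number.
    return (int(device, 16) << 16) | int(function, 16)

# The bus number is not part of _ADR: the lookup is scoped to the children
# of the ACPI node that belongs to the parent bridge.
ADR_VIDEO = pci_adr("0000:01:00.0")  # NVIDIA device behind the root port
ADR_PEG0 = pci_adr("0000:00:01.0")   # PCI Express root port
```

With this encoding the NVIDIA device is looked up with address 0 among the children of PEG0, so any two children that both report _ADR 0 are indistinguishable by address alone.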
Comment 11 Zhang Rui 2012-11-21 01:54:26 UTC
(In reply to comment #0)
> On the Lenovo Ideapad Y470 and Lenovo Ideapad Y570, the kernel assigns the
> \_SB.PCI0.PEG0.VGA handle to the PCI device (possibly because the _DOS method
> is found on the handle?)
> 
> However, the correct _DSM, _ROM, _PS0 and _PS3 can be found on the 
> \_SB.PCI0.PEG0.PEGP handle. _PSx on the VGA handle is basically a no-op: it
> only gets/sets the _PSC variable. _ROM does not exist and the _DSM method is
> not useful compared to the PEGP._DSM method.
> 
> Due to this issue, the nouveau driver fails to load the vbios (because _ROM
> does not exist on VGA). 

For this Lenovo laptop, what is the device node that the nouveau driver binds to?
Say, /sys/bus/pci/...
Comment 12 Peter Wu 2012-11-21 08:50:23 UTC
What device node are you referring to, the PCI device 0000:01:00.0 that nouveau tries to use?
Comment 13 Zhang Rui 2012-11-23 07:24:39 UTC
Yes.
Please attach the output of lspci and "ls /sys/bus/pci/drivers/nouveau/".
Comment 14 Giorgio 2012-11-23 08:16:26 UTC
Seems to work for my Y580.
Comment 15 Rafael J. Wysocki 2012-12-26 21:42:22 UTC
Created attachment 89721
PCI / ACPI: Rework ACPI device node objects lookup

Can you please check if the attached patch makes a difference?
Comment 16 Rafael J. Wysocki 2012-12-26 21:43:23 UTC
Sorry, wrong patch.
Comment 17 Rafael J. Wysocki 2012-12-26 21:44:29 UTC
Created attachment 89731
PCI / ACPI: Rework ACPI device node objects lookup

This one should apply on top of v3.8-rc1 (or current Linus' tree).
Comment 18 Peter Wu 2012-12-28 18:39:36 UTC
The patch from comment 17 has been confirmed to work [1]:

TheSiege wrote:
> I tested the patch mentioned in c17 from the bug with rc-1
> it still leaves me with a working optirun

Also interesting to note is that the machine does not boot at all without the patch [1]:
> yes this kernel has no previous patches or hack; and when
> I revert the patch my kernel can't even boot

 [1]: https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-11711656
Comment 19 Rafael J. Wysocki 2012-12-28 19:49:18 UTC
Thanks for testing.

Well, OK.  Let's try to push it, then.
Comment 20 Aaron Lu 2013-03-06 06:30:01 UTC
According to https://lkml.org/lkml/2013/1/23/451, this patch needs more discussion.
Comment 21 Philip 2013-03-10 18:34:16 UTC
I have a new Toshiba P870 and it seems to have a similar problem.

How may I help confirm that this problem with systems released a year and a half ago is still a problem with new systems?

I find that Bumblebee's optirun works if I use Gary Gatling's patched kernels, but not with the Fedora packaged ones.

BIOS Information
        Vendor: Insyde Corp.
        Version: 6.30
        Release Date: 01/17/2013

System Information
        Manufacturer: TOSHIBA
        Product Name: Satellite P870
        Version: PSPLFU-039011

Base Board Information
        Manufacturer: TOSHIBA
        Product Name: Portable PC
        Version: MP

What other info would be helpful?
Comment 22 Peter Wu 2013-03-15 12:42:58 UTC
Rafael, it seems that you have pushed a change that fixes this issue (reported by a user of 3.9-rc2[1]), essentially doing the same as comment 7:

commit 33f767d767e9a684e9cd60704d4c049a2014c8d5
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Thu Jan 10 13:13:49 2013 +0100

    ACPI: Rework acpi_get_child() to be more efficient
    
    Observe that acpi_get_child() doesn't need to use the helper
    struct acpi_find_child structure and change it to work without it.
    Also, using acpi_get_object_info() to get the output of _ADR for the
    given device is overkill, because that function does much more than
    just evaluating _ADR (let alone the additional memory allocation
    done by it).
    
    Moreover, acpi_get_child() doesn't need to loop any more once it has
    found a matching handle, so make it stop in that case.  To prevent
    the results from changing, make it use do_acpi_find_child() as
    a post-order callback.
    
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

I cite a mail from Len:
On Friday 16 November 2012 11:25:47 Len Brown wrote:
> Peter,
> 
> It is great that you debugged this issue
> and proved where the problem is.
> 
> However, this patch can't possibly be the right way to go --
> as it is just as broken as the code it replaces.
> Were I to bet, I'd say that it will break as many machines
> as it fixes.  And when it does, where are we?
> 
> Clearly we need to be using a more clever search algorithm.
> 
> thanks,
> Len Brown, Intel Open Source Technology Center

So, apparently the bug is fixed in a correct way now? If another user can confirm it here, I'll mark it as resolved.

 [1]: https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-14939587
Comment 23 Aaron Lu 2013-03-26 07:59:20 UTC
Hi Philip,

If you have the same problem, can you please verify if 3.9-rc2 fixed your problem as suggested by Peter? Thanks.
Comment 24 Peter Wu 2013-03-29 09:21:17 UTC
I haven't verified it myself, but it has been confirmed by two people.

Fixed in Linux 3.9-rc2 and 3.8.5.
Comment 25 Aaron Lu 2013-03-29 14:59:47 UTC
Thanks Peter for the update.

The commit below fixed the problem and has entered Linus' tree as of v3.9-rc1.

commit 33f767d767e9a684e9cd60704d4c049a2014c8d5
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Thu Jan 10 13:13:49 2013 +0100

    ACPI: Rework acpi_get_child() to be more efficient
Comment 26 Rafael J. Wysocki 2013-07-24 01:15:10 UTC
(In reply to Peter from comment #7)
> Created attachment 86101
> ACPI-return-first-_ADR-match-for-acpi_get_child.patch
> 
> Looking at some logs that I have received, this has nothing to do with PNP
> which gets detected properly. So, it must be PCI. Digging further on that
> and with feedback from a Lenovo Ideapad Y570 user, I managed to track down
> the issue.

I don't think you've ever explained what exactly you tracked the issue down to, which is kind of important in the context of bug #60561, so can you please tell me?
Comment 27 Peter Wu 2013-07-24 09:30:12 UTC
The original issue was that, while iterating over the list of child devices, the last match was returned. It seemed logical to me that the first result should be returned immediately for efficiency reasons. Hence I suggested returning on the first match instead of continuing the iteration and ending up at the last device.

A dummy module that walked the children of the parent of the PCI video device 0000:01:00.0 (acpi_walk_namespace(ACPI_TYPE_DEVICE, parent, 1, find_child, NULL, NULL, NULL);) produced the following:
[  364.003582] walk: Walking through all handles...
[  364.003679] walk: Address: 00000000 (valid); handle: \_SB_.PCI0.PEG0.PEGP
[  364.003784] walk: Address: 00000001 (valid); handle: \_SB_.PCI0.PEG0.VGA1
[  364.003872] walk: Address: 00000000 (valid); handle: \_SB_.PCI0.PEG0.VGA_
[  364.003882] walk: Walked through all handles

Here, there are valid _ADR methods unlike in bug 60561.
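
The difference between the two strategies can be replayed on the three handles from the log above. This is a Python simulation for illustration only, not the kernel code or the attached patch; the function names first_match and last_match are invented for this sketch.

```python
# (address, handle) pairs in the order reported by the walk log above.
CHILDREN = [
    (0x00000000, r"\_SB_.PCI0.PEG0.PEGP"),
    (0x00000001, r"\_SB_.PCI0.PEG0.VGA1"),
    (0x00000000, r"\_SB_.PCI0.PEG0.VGA_"),
]

def last_match(children, adr):
    """Keep iterating after a hit, so a later duplicate overwrites the result."""
    found = None
    for child_adr, handle in children:
        if child_adr == adr:
            found = handle
    return found

def first_match(children, adr):
    """Stop at the first child whose address matches."""
    for child_adr, handle in children:
        if child_adr == adr:
            return handle
    return None
```

For address 0 (the video device at 0000:01:00.0), last_match returns the VGA_ handle and first_match returns the PEGP handle. For address 1 both return VGA1, since only one child reports that address.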
Comment 28 Peter Wu 2013-07-24 09:37:05 UTC
Clarification: 0000:01:00.0 is the PCI video device; its parent (PCI Express Root port PEG0) is at 0000:00:01.0.
Comment 29 xanm 2013-10-06 04:13:12 UTC
Looks like I have this problem on:

MacBookPro3,1
Processor  2.4 GHz Intel Core 2 Duo
Memory  4 GB 667 MHz DDR2 SDRAM
Graphics  NVIDIA GeForce 8600M GT 256 MB

With the nvidia driver, at first I had:

[  163.212919] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
[  163.224747] NVRM: failed to copy vbios to system memory.
[  163.224919] NVRM: RmInitAdapter failed! (0x30:0xffffffff:720)
[  163.224925] NVRM: rm_init_adapter(0) failed

and after some research I found this:

nv_acpi_rom_method: failed to evaluate _ROM method!

The nouveau driver doesn't work either.
Comment 30 Peter Wu 2013-10-06 08:59:31 UTC
@xanm Please file a new bug and post logs of the nouveau driver instead of the closed-source nvidia driver. Your machine is so old that it is unlikely to have anything to do with this bug (apart from a regression).

While creating a new bug, please include:

- full dmesg with nouveau
- The file generated by: sudo acpidump > acpidump.txt
