|Summary:||Wrong ACPI handle is being detected for NVIDIA graphics card on Lenovo Ideapad Y470/Y570|
|Product:||ACPI||Reporter:||Peter Wu (peter)|
|Component:||Power-Video||Assignee:||Aaron Lu (aaron.lu)|
|Severity:||normal||CC:||aaron.lu, acpi-bugzilla, alan, anonymousmmn, gjxxx, lenb, pv.bugzilla+kernel, rjw, robert, rui.zhang, wliment, xanm|
acpidump for Lenovo Ideapad Y470
acpidump for Toshiba Satellite P870
PCI / ACPI: Rework ACPI device node objects lookup
PCI / ACPI: Rework ACPI device node objects lookup
Description Peter Wu 2012-01-30 18:48:09 UTC
On the Lenovo Ideapad Y470 and Lenovo Ideapad Y570, the kernel assigns the \_SB.PCI0.PEG0.VGA handle to the PCI device (possibly because the _DOS method is found on the handle?)

However, the correct _DSM, _ROM, _PS0 and _PS3 can be found on the \_SB.PCI0.PEG0.PEGP handle. _PSx on the VGA handle is basically a no-op; it only gets/sets the _PSC variable. _ROM does not exist there, and the VGA _DSM method is not useful compared to the PEGP._DSM method.

Due to this issue, the nouveau driver fails to load the vbios (because _ROM does not exist on VGA). Similarly, the proprietary nvidia driver fails to load. As for the _PS0 and _PS3 methods, calling pci_set_power_state does not really disable the PCI device.
Comment 1 Peter Wu 2012-01-30 18:49:01 UTC
Created attachment 72236 [details] acpidump for Lenovo Ideapad Y470
Comment 2 Peter Wu 2012-01-30 18:51:26 UTC
dmidecode information for the affected systems:

system-manufacturer   : LENOVO
system-product-name   : 20090
system-version        : Lenovo IdeaPad Y470
baseboard-manufacturer: LENOVO
baseboard-product-name: Base Board Product Name
baseboard-version     : Base Board Version
bios-vendor           : LENOVO
bios-version          : 47CN30WW(V2.08)
bios-release-date     : 08/01/2011

system-manufacturer   : LENOVO
system-product-name   : 20091
system-version        : Lenovo IdeaPad Y570
baseboard-manufacturer: LENOVO
baseboard-product-name: Base Board Product Name
baseboard-version     : Base Board Version
bios-vendor           : LENOVO
bios-version          : 47CN30WW(V2.08)
bios-release-date     : 08/01/2011
Comment 3 wliment 2012-02-05 08:06:07 UTC
@peter I think the Y460 with an Optimus NVIDIA card has a similar problem. I have reported some things in your bbswitch project, but I don't know how to describe my problem. How can I find out whether I have the same problem you are describing?
Comment 4 Peter Wu 2012-02-05 10:00:24 UTC
If you have the IdeaPad Y470 or IdeaPad Y570, you are affected for sure. Until this bug is fixed, you can use the ugly hack mentioned at https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-3797568
Comment 5 Peter Wu 2012-06-03 14:23:04 UTC
Created attachment 73502 [details] acpidump for Toshiba Satellite P870

Another affected machine, a Toshiba Satellite P870 this time, running on 3.3.7.
https://github.com/Bumblebee-Project/Bumblebee/issues/173

Mapping of PCI Bus ID to their ACPI handles:

0000:00:01.0 060400 \_SB_.PCI0.PEG0
0000:00:02.0 030000 \_SB_.PCI0.GFX0
0000:01:00.0 030000 \_SB_.PCI0.PEG0.VGA_

The correct _ROM handle for the nvidia device exists on \_SB.PCI0.PEG0.PEGP.

dmidecode details:

baseboard-manufacturer: TOSHIBA
baseboard-product-name: Portable PC
baseboard-version     : MP
system-manufacturer   : TOSHIBA
system-product-name   : SATELLITE P870
system-version        : PSPLBE-01V00HFR
bios-vendor           : Insyde Corp.
bios-version          : 1.10
bios-release-date     : 03/21/2012
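
A mapping like the one above can also be collected with a small script. The following is only a sketch: it assumes that each PCI device directory in sysfs has a firmware_node link whose path attribute holds the bound ACPI handle, which may not be the case on every kernel. The function name pci_acpi_handles and its sysfs_root parameter are made up for illustration.

```python
import os


def pci_acpi_handles(sysfs_root="/sys"):
    """Return {PCI bus ID: ACPI path} for PCI devices bound to an ACPI node.

    Assumption: <sysfs_root>/bus/pci/devices/<id>/firmware_node/path exists
    for bound devices. Devices without it are skipped silently.
    """
    devices_dir = os.path.join(sysfs_root, "bus", "pci", "devices")
    mapping = {}
    if not os.path.isdir(devices_dir):
        return mapping
    for bus_id in sorted(os.listdir(devices_dir)):
        path_file = os.path.join(devices_dir, bus_id, "firmware_node", "path")
        try:
            with open(path_file) as fh:
                mapping[bus_id] = fh.read().strip()
        except OSError:
            # No ACPI companion (or not readable): leave the device out.
            continue
    return mapping


if __name__ == "__main__":
    for bus_id, handle in pci_acpi_handles().items():
        print(bus_id, handle)
```

On an affected machine, the entry for 0000:01:00.0 would be expected to show the VGA handle rather than PEGP.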
Comment 6 Peter Wu 2012-11-11 00:31:56 UTC
So let's check this again (the below is against 3.7).

- acpi_scan_init
- acpi_bus_scan
- acpi_bus_check_add
- acpi_add_single_object
- acpi_device_set_id
- if (acpi_is_video_device(device)) acpi_add_id(device, ACPI_VIDEO_HID) // add "LNXVIDEO" to PNP ids

The following video devices are detected (incomplete list):

- \_SB.PCI0.PEG0.PEGP due to _ROM
- \_SB.PCI0.PEG0.VGA due to _DOS

Now, the part that is responsible for setting archdata.acpi_handle (drivers/acpi/glue.c). Let's assume the nvidia PCI device at 01:00.0.

- ...
- pci_bus_add_device
- device_add
- platform_notify points to acpi_platform_notify on x86 and ia64
- type = acpi_get_bus_type() (returns acpi_pnp_bus)
- type->find_device() calls acpi_pnp_find_device
- loop through all ACPI bus things?
- acpi_pnp_match: find the first (?) unbound device with its PNP id list containing "LNXVIDEO"
- if a handle was found, call acpi_bind_one which basically sets dev->archdata.acpi_handle

(please correct me if I'm wrong, having a "struct acpi_bus_type" and a "acpi_bus_type" of type "struct bus_type" (in drivers/acpi/scan.c) is not helpful)

Looking at a log from https://lists.launchpad.net/bumblebee/msg00069.html, the "video" module seems to be able to find the correct handle:

[ 8.935976] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/device:2f/LNXVIDEO:00/input/input6
[ 8.936011] ACPI: Video Device [PEGP] (multi-head: no rom: yes post: no)
[ 8.938801] acpi device:3d: registered as cooling_device11
[ 8.939038] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A08:00/LNXVIDEO:01/input/input7
[ 8.939091] ACPI: Video Device [GFX0] (multi-head: yes rom: no post: no)
[ 8.939127] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0

Request for owners of these laptops (notably the IdeaPad Y570): please attach the output of the commands below, without applying the hack (one affected machine is enough):

(cd /sys/devices/LNXSYSTM:00/device:00;grep --include=path -r .)
ls -l /sys/bus/acpi/drivers/video
dmesg

If this does not give useful information, then the next step would be getting a kernel log with ACPI debugging enabled.
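
For those who prefer a script over the grep invocation, the same listing can be sketched in Python. This is a rough equivalent only: the function name acpi_paths is hypothetical, and it assumes the ACPI device directories under the given root carry a path attribute as the grep command implies.

```python
import os


def acpi_paths(root="/sys/devices/LNXSYSTM:00/device:00"):
    """Yield (sysfs directory relative to root, ACPI namespace path) pairs.

    Walks the tree without following symlinks and reads every file that is
    literally named "path", similar to `grep --include=path -r .`.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if "path" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "path")) as fh:
                handle = fh.read().strip()
        except OSError:
            continue  # unreadable attribute, skip it
        yield os.path.relpath(dirpath, root), handle


if __name__ == "__main__":
    for rel, handle in acpi_paths():
        print(rel, handle)
```

The output pairs each sysfs node (for example a LNXVIDEO:00 directory) with its ACPI namespace path, which is the information needed to see which handles were registered as video devices.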
Comment 7 Peter Wu 2012-11-11 22:25:51 UTC
Created attachment 86101 [details] ACPI-return-first-_ADR-match-for-acpi_get_child.patch Looking at some logs that I have received, this has nothing to do with PNP which gets detected properly. So, it must be PCI. Digging further on that and with feedback from a Lenovo Ideapad Y570 user, I managed to track down the issue. Please test this patch, comments are included.
Comment 8 Peter Wu 2012-11-12 22:29:47 UTC
Confirmed to work for a Lenovo Ideapad Y480: https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-10273482

I have forwarded the patch with an updated commit message to the ACPI maintainers.
Comment 9 Zhang Rui 2012-11-13 08:35:29 UTC
To me, the problem is that pnp_bus_type should not bind a device by just checking the pnpid. I think we should set pnp_dev->dev.acpi_handle directly when creating the pnp devices, and in acpi_pnp_match() we should compare the acpi_handle rather than the pnp_ids. Rafael, what do you think?
Comment 10 Peter Wu 2012-11-13 10:09:27 UTC
@Zhang, the PNP ID<->ACPI handle mapping is correctly performed in this bug, but the PCI Bus ID <-> ACPI handle gets misdetected.
Comment 11 Zhang Rui 2012-11-21 01:54:26 UTC
(In reply to comment #0)
> On the Lenovo Ideapad Y470 and Lenovo Ideapad Y570, the kernel assigns the
> \_SB.PCI0.PEG0.VGA handle to the PCI device (possibly because the _DOS method
> is found on the handle?)
>
> However, the correct _DSM, _ROM, _PS0 and _PS3 can be found on the
> \_SB.PCI0.PEG0.PEGP handle. _PSx on the VGA handle is basically a no-op, it
> only gets/sets the _PSC variable. _ROM does not exist and the _DSM method is
> not useful comparing it to the PEGP._DSM method.
>
> Due to this issue, the nouveau driver fails to load the vbios (because _ROM
> does not exist on VGA).

For this Lenovo laptop, what is the device node that the nouveau driver binds? Say, /sys/bus/pci/...
Comment 12 Peter Wu 2012-11-21 08:50:23 UTC
What device node are you referring to, the PCI device 0000:01:00.0 that nouveau tries to use?
Comment 13 Zhang Rui 2012-11-23 07:24:39 UTC
yes. please attach the output of lspci and "ls /sys/bus/pci/drivers/nouveau/".
Comment 14 Giorgio 2012-11-23 08:16:26 UTC
seems to work for my y580
Comment 15 Rafael J. Wysocki 2012-12-26 21:42:22 UTC
Created attachment 89721 [details] PCI / ACPI: Rework ACPI device node objects lookup Can you please check if the attached patch makes a difference?
Comment 16 Rafael J. Wysocki 2012-12-26 21:43:23 UTC
Sorry, wrong patch.
Comment 17 Rafael J. Wysocki 2012-12-26 21:44:29 UTC
Created attachment 89731 [details] PCI / ACPI: Rework ACPI device node objects lookup This one should apply on top of v3.8-rc1 (or current Linus' tree).
Comment 18 Peter Wu 2012-12-28 18:39:36 UTC
The patch from comment 17 has been confirmed to work. TheSiege wrote:
> I tested the patch mentioned in c17 from the bug with rc-1
> it still leaves me with a working optirun

Also interesting to note is that the machine does not boot at all without the patch:
> yes this kernel has no previous patches or hack; and when
> I revert the patch my kernel can't even boot

Source: https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-11711656
Comment 19 Rafael J. Wysocki 2012-12-28 19:49:18 UTC
Thanks for testing. Well, OK. Let's try to push it, then.
Comment 20 Aaron Lu 2013-03-06 06:30:01 UTC
According to https://lkml.org/lkml/2013/1/23/451, this patch needs more discussion.
Comment 21 Philip 2013-03-10 18:34:16 UTC
I have a new Toshiba P870 and it seems to have a similar problem. How may I help confirm that this problem with systems released a year and a half ago is still a problem with new systems? I find that bumblebee's optirun works if I use Gary Gatling's patched kernels, but not the fedora packaged ones.

BIOS Information
Vendor: Insyde Corp.
Version: 6.30
Release Date: 01/17/2013

System Information
Manufacturer: TOSHIBA
Product Name: Satellite P870
Version: PSPLFU-039011

Base Board Information
Manufacturer: TOSHIBA
Product Name: Portable PC
Version: MP

What other info would be helpful?
Comment 22 Peter Wu 2013-03-15 12:42:58 UTC
Rafael, it seems that you have pushed a change that fixes this issue (reported by a user of 3.9-rc2), essentially doing the same as comment 7:

commit 33f767d767e9a684e9cd60704d4c049a2014c8d5
Author: Rafael J. Wysocki <email@example.com>
Date: Thu Jan 10 13:13:49 2013 +0100

    ACPI: Rework acpi_get_child() to be more efficient

    Observe that acpi_get_child() doesn't need to use the helper struct acpi_find_child structure and change it to work without it. Also, using acpi_get_object_info() to get the output of _ADR for the given device is overkill, because that function does much more than just evaluating _ADR (let alone the additional memory allocation done by it).

    Moreover, acpi_get_child() doesn't need to loop any more once it has found a matching handle, so make it stop in that case. To prevent the results from changing, make it use do_acpi_find_child() as a post-order callback.

    Signed-off-by: Rafael J. Wysocki <firstname.lastname@example.org>

I cite a mail from Len:

On Friday 16 November 2012 11:25:47 Len Brown wrote:
> Peter,
>
> It is great that you debugged this issue
> and proved where the problem is.
>
> However, this patch can't possibly be the right way to go --
> as it is just as broken as the code it replaces.
> Were I to bet, I'd say that it will break as many machines
> as it fixes. And when it does, where are we?
>
> Clearly we need to be using a more clever search algorithm.
>
> thanks,
> Len Brown, Intel Open Source Technology Center

So, apparently the bug is fixed in a correct way now? If another user can confirm it here, I'll mark it as resolved.

https://github.com/Bumblebee-Project/bbswitch/issues/2#issuecomment-14939587
Comment 23 Aaron Lu 2013-03-26 07:59:20 UTC
Hi Philip, If you have the same problem, can you please verify if 3.9-rc2 fixed your problem as suggested by Peter? Thanks.
Comment 24 Peter Wu 2013-03-29 09:21:17 UTC
I haven't verified it myself, but it has been confirmed by two people. Fixed in Linux 3.9-rc2 and 3.8.5.
Comment 25 Aaron Lu 2013-03-29 14:59:47 UTC
Thanks Peter for the update. The commit below fixed the problem and has entered Linus' tree as of v3.9-rc1.

commit 33f767d767e9a684e9cd60704d4c049a2014c8d5
Author: Rafael J. Wysocki <email@example.com>
Date: Thu Jan 10 13:13:49 2013 +0100

    ACPI: Rework acpi_get_child() to be more efficient
Comment 26 Rafael J. Wysocki 2013-07-24 01:15:10 UTC
(In reply to Peter from comment #7)
> Created attachment 86101 [details]
> ACPI-return-first-_ADR-match-for-acpi_get_child.patch
>
> Looking at some logs that I have received, this has nothing to do with PNP
> which gets detected properly. So, it must be PCI. Digging further on that
> and with feedback from a Lenovo Ideapad Y570 user, I managed to track down
> the issue.

I don't think you've ever explained what exactly you tracked the issue down to, which is kind of important in the context of bug #60561, so can you please tell me?
Comment 27 Peter Wu 2013-07-24 09:30:12 UTC
The original issue was that while iterating the list of devices, the last match would be returned. It seemed logical to me that the first result is immediately returned for efficiency reasons. Hence I suggested to return on the first match instead of continuing the iteration and ending up at the last device.

Using a dummy module that walked on the parent of the PCI video device 0000:01:00.0,

(acpi_walk_namespace(ACPI_TYPE_DEVICE, parent, 1, find_child, NULL, NULL, NULL);)

resulted in the following:

[ 364.003582] walk: Walking through all handles...
[ 364.003679] walk: Address: 00000000 (valid); handle: \_SB_.PCI0.PEG0.PEGP
[ 364.003784] walk: Address: 00000001 (valid); handle: \_SB_.PCI0.PEG0.VGA1
[ 364.003872] walk: Address: 00000000 (valid); handle: \_SB_.PCI0.PEG0.VGA_
[ 364.003882] walk: Walked through all handles

Here, there are valid _ADR methods unlike in bug 60561.
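
To make the difference concrete, here is a small userspace model of the two lookup strategies, fed with the three children from the walk log above. It is only an illustration and not the kernel code: the names CHILDREN, pci_adr, last_match and first_match are invented for this sketch. The one outside fact it relies on is the ACPI convention that _ADR for a PCI device holds the device number in the high word and the function number in the low word.

```python
# (handle, _ADR) pairs in the order the namespace walk reported them.
CHILDREN = [
    ("\\_SB_.PCI0.PEG0.PEGP", 0x00000000),
    ("\\_SB_.PCI0.PEG0.VGA1", 0x00000001),
    ("\\_SB_.PCI0.PEG0.VGA_", 0x00000000),
]


def pci_adr(device, function):
    """Encode a PCI device/function pair the way _ADR does for PCI."""
    return (device << 16) | function


def last_match(children, address):
    """Model of the old behaviour: keep iterating, so the last match wins."""
    found = None
    for handle, adr in children:
        if adr == address:
            found = handle
    return found


def first_match(children, address):
    """Model of the proposed behaviour: stop at the first matching _ADR."""
    for handle, adr in children:
        if adr == address:
            return handle
    return None


if __name__ == "__main__":
    adr = pci_adr(0, 0)  # 0000:01:00.0 is device 0, function 0 on its bus
    print("last match: ", last_match(CHILDREN, adr))
    print("first match:", first_match(CHILDREN, adr))
```

With this input, the last-match variant picks VGA_ and the first-match variant picks PEGP, which matches the wrong and the wanted handle described in this report. It also shows why the result depends purely on enumeration order when two nodes share an _ADR.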
Comment 28 Peter Wu 2013-07-24 09:37:05 UTC
Clarification, 0000:01:00.0 is the PCI video device, its parent (PCI Express Root port PEG0) is at 0000:00:01.0.
Comment 29 xanm 2013-10-06 04:13:12 UTC
Looks like I have this problem on:

MacBookPro3,1
Processor 2.4 GHz Intel Core 2 Duo
Memory 4 GB 667 MHz DDR2 SDRAM
Graphics NVIDIA GeForce 8600M GT 256 MB

With the nvidia driver, at first I had:

[ 163.212919] nvidia 0000:01:00.0: irq 48 for MSI/MSI-X
[ 163.224747] NVRM: failed to copy vbios to system memory.
[ 163.224919] NVRM: RmInitAdapter failed! (0x30:0xffffffff:720)
[ 163.224925] NVRM: rm_init_adapter(0) failed

and after some research I found this:

nv_acpi_rom_method: failed to evaluate _ROM method!

The nouveau driver doesn't work either.
Comment 30 Peter Wu 2013-10-06 08:59:31 UTC
@xanm Please file a new bug and post logs of the nouveau driver instead of the closed-source nvidia driver. Your machine is so old that it is unlikely to have something to do with this bug (apart from a regression). When creating the new bug, please include:

- full dmesg with nouveau
- the file generated by: sudo acpidump > acpidump.txt