Bug 194889
Summary: | Bisected: BBSwitch issue - gets incorrect acpi handle returned for nvida card | ||
---|---|---|---|
Product: | ACPI | Reporter: | mike |
Component: | Config-Other | Assignee: | Rafael J. Wysocki (rjw) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | jmatic, kernel, luis.zaldivar, lv.zheng, rjw, rui.zhang |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.10.1 4.10.2 4.10.3 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
dmesg for 4.10.2
dmesg for 4.9.14 - working acpidump lspci -vv lspci -vvx grep . /sys/bus/acpi/devices/*/modalias /sys/bus/acpi/devices/*/path lspci -vvx PATCH: Introduce new flag for devices with _HID/_CID Debug patch against glue.c |
Created attachment 255257 [details]
dmesg for 4.9.14 - working
Created attachment 255259 [details]
acpidump
Created attachment 255261 [details]
lspci -vv
I should say that obviously this breaks BBSwitch and it's ability to switch off the nvidia card. Also, though possibly unrelated, Bumblebee does not function with 4.10.1/2. While it works fine in 4.9.14 and before. There are a few other reports on ARCH forums and BUG Tracker with similar issues Marking this as duplicate due to: [ 6.093918] lenovoU430 kernel: bbswitch: failed to evaluate \_SB_.PCI0.RP05.PXSX._DSM {0xF8,0xD8,0x86,0xA4,0xDA,0x0B,0x1B,0x47,0xA7,0x2B,0x60,0x42,0xA6,0xB5,0xBE,0xE0} 0x100 0x0 {0x00,0x00,0x00,0x00}: AE_NOT_FOUND [ 6.093925] lenovoU430 kernel: bbswitch: failed to evaluate \_SB_.PCI0.RP05.PXSX._DSM {0xA0,0xA0,0x95,0x9D,0x60,0x00,0x48,0x4D,0xB3,0x4D,0x7E,0x5F,0xEA,0x12,0x9F,0xD4} 0x102 0x0 {0x00,0x00,0x00,0x00}: AE_NOT_FOUND [ 6.093935] lenovoU430 kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160930/nsarguments-95) *** This bug has been marked as a duplicate of bug 194431 *** Not a duplicate as far as I can tell. My issue isn't the _DSM calls, it's that the wrong acpi handle gets returned to bbswitch prior to this. When the correct handle is returned and used, as with 4.9.x, bbswitch functions correctly. An error is correctly thrown for \_SB.PCI0.GFX0._DSM as that's the Intel card. It is also correct for \_SB_.PCI0.RP05.PXSX._DSM as this is an invalid handle. The PXSX should be PEGP. If I manually make that call with acpi_call with the correct handle, I get no error and a good response. OK. Reopening and checking the logs. I'm having similar problems on acer aspire v5. Possibly related to https://bugzilla.kernel.org/show_bug.cgi?id=60829 ? Is that possible to bisect the culprit out? I checked the decompiled tables. PEGP is defined in ssdt5: DefinitionBlock ("", "SSDT", 1, "LENOVO", "CB-01 ", 0x00000001) { ... Scope (\_SB.PCI0.RP05) { Device (PEGP) { Name (_ADR, Zero) // _ADR: Address Method (_PRW, 0, NotSerialized) // _PRW: Power Resources for Wake { Return (GPRW) /* External reference */ 0x09 0x04 } } } It seems in DSDT, every RPxx device contains a PXSX as follows: Device (PXSX) { Name (_ADR, Zero) // _ADR: Address Method (_PRW, 0, NotSerialized) // _PRW: Power Resources for Wake { Return (GPRW (0x69, 0x04)) } } However I don't know whether BBSwitch chose PXSX rather than PEGP. It looks it's BBSwitch specific logic: [ 6.093893] lenovoU430 kernel: bbswitch: version 0.8 [ 6.093899] lenovoU430 kernel: bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0 [ 6.093906] lenovoU430 kernel: bbswitch: Found discrete VGA device 0000:09:00.0: \_SB_.PCI0.RP05.PXSX [ 6.093918] lenovoU430 kernel: bbswitch: failed to evaluate \_SB_.PCI0.RP05.PXSX._DSM {0xF8,0xD8,0x86,0xA4,0xDA,0x0B,0x1B,0x47,0xA7,0x2B,0x60,0x42,0xA6,0xB5,0xBE,0xE0} 0x100 0x0 {0x00,0x00,0x00,0x00}: AE_NOT_FOUND [ 6.093925] lenovoU430 kernel: bbswitch: failed to evaluate \_SB_.PCI0.RP05.PXSX._DSM {0xA0,0xA0,0x95,0x9D,0x60,0x00,0x48,0x4D,0xB3,0x4D,0x7E,0x5F,0xEA,0x12,0x9F,0xD4} 0x102 0x0 {0x00,0x00,0x00,0x00}: AE_NOT_FOUND [ 6.093935] lenovoU430 kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160930/nsarguments-95) [ 6.093978] lenovoU430 kernel: bbswitch: No suitable _DSM call found. We don't have BBSwitch in the Linux kernel upstream. So why don't you ask support from BBSwitch community? Thanks Lv I think bbswitch just goes through pci devices with right pci class and uses ACPI_HANDLE to get the handle. I'm guessing ACPI_HANDLE returns the wrong handle. https://github.com/Bumblebee-Project/bbswitch/blob/master/bbswitch.c line 390 (In reply to Lv Zheng from comment #9) > Is that possible to bisect the culprit out? I'm going to have a look at doing this. Might be a little while to get through it on my laptop. (In reply to Matic from comment #8) > I'm having similar problems on acer aspire v5. Possibly related to > https://bugzilla.kernel.org/show_bug.cgi?id=60829 ? Certainly looks like it is to me. Interestingly, by going through git commits makes me suspect commit "c2a6bbaf0c5f90463a7011a295bbdb7e33c80b51 ACPI / scan: Prefer devices without _HID/_CID for _ADR matching" which is only one that seems to directly affect the patch from your bug since 4.9.x. Can confirm that commit "c2a6bbaf0c5f90463a7011a295bbdb7e33c80b51 ACPI / scan: Prefer devices without _HID/_CID for _ADR matching" is the cause of this. Not fully bisected, but took apportity between builds to rebuild arch mainline kernel with this reverted by commenting out the relevant change in glue.c line 108/9. return sta_present && list_empty(&adev->pnp.ids) ? FIND_CHILD_MAX_SCORE : FIND_CHILD_MIN_SCORE; becomes return sta_present /* && list_empty(&adev->pnp.ids) */ ? FIND_CHILD_MAX_SCORE : FIND_CHILD_MIN_SCORE; Then both bbswitch and bumblebee function correctly. The proper handle is returned. For completeness I tried return ( sta_present && list_empty(&adev->pnp.ids) ) ? FIND_CHILD_MAX_SCORE : FIND_CHILD_MIN_SCORE; in case this was being evaluated incorrectly, but this resulted in the original error. At the first glance, I think the problem is probably related to the device enumeration order. Say, there may be multiple devices (acpi handles) that match the logic in bbswitch_init(), and the first one matched will be returned. In this case, if kernel changes the device' enumeration order, it may result in different acpi handle got by bbswitch. anyway, please attach the lspci -vvx output. Scope (\_SB.PCI0.RP05) { Device (PEGP) { Name (_ADR, Zero) // _ADR: Address Method (_PRW, 0, NotSerialized) // _PRW: Power Resources for Wake { Return (GPRW) /* External reference */ 0x09 0x04 } } } Device (RP05) { ... Device (PXSX) { Name (_ADR, Zero) // _ADR: Address Method (_PRW, 0, NotSerialized) // _PRW: Power Resources for Wake { Return (GPRW (0x69, 0x04)) } } ... } First of all, both PXSX and PEGP are under RP05, but they have the same _ADR, this is clearly a violation of ACPI spec. Only \_SB.PCI0.RP05.PEGP has _DSM method, so PEGP is the right handle we should return, in order to make the bbswitch work. please attach the output of "grep . /sys/bus/acpi/devices/*/modalias /sys/bus/acpi/devices/*/path" > Both PXSX and PEGP are under RP05, but they have the same _ADR, this is
> clearly a violation of ACPI spec.
We need to think about a possiblity that the _ADR is not such critical in practice.
It looks this is not an isolated case related to _ADR.
Created attachment 255635 [details]
lspci -vvx
lspci -vvx
Kernel Arch 4.10.6
bbswitch not loaded, so nvidia card in default and powered state
Created attachment 255637 [details]
grep . /sys/bus/acpi/devices/*/modalias /sys/bus/acpi/devices/*/path
Created attachment 255639 [details]
lspci -vvx
kernel 4.10.6 Arch mainline
bbswitch not working and nvidia/nouveau modules not loaded. Card powered.
Redone with sudo.
/sys/bus/acpi/devices/LNXVIDEO:00/modalias:acpi:LNXVIDEO: /sys/bus/acpi/devices/LNXVIDEO:00/path:\_SB_.PCI0.RP05.PEGP This explains why the commit "c2a6bbaf0c5f90463a7011a295bbdb7e33c80b51 ACPI / scan: Prefer devices without _HID/_CID for _ADR matching" breaks bbswitch. Before the commit, return sta_present ? FIND_CHILD_MAX_SCORE : FIND_CHILD_MIN_SCORE; FIND_CHILD_MAX_SCORE is returned for PEGP. After the commit return sta_present && list_empty(&adev->pnp.ids) ? FIND_CHILD_MAX_SCORE : FIND_CHILD_MIN_SCORE; FIND_CHILD_MIN_SCORE is returned for PEGP because, as a video bus device, a fake pnpid "LNXVIDEO" is assigned to PEGP. So the problem is that we should not use list_empty(&adev->pnp.ids) to check the existence of _HID/_CID. A debug patch will attached later. Created attachment 255641 [details]
PATCH: Introduce new flag for devices with _HID/_CID
Please apply this patch on top of the bogus commit and see if it helps.
Patch works for me. Using arch 4.10.6 kernel. Doesn't seem to have caused any other issues so far. (In reply to Zhang Rui from comment #22) > Created attachment 255641 [details] > PATCH: Introduce new flag for devices with _HID/_CID > > Please apply this patch on top of the bogus commit and see if it helps. It is not bogus, but overly coarse grained. Anyway, thanks for looking into this. Created attachment 255645 [details]
Debug patch against glue.c
I'm wondering, though, if the new flag is really needed.
Something like this patch should suffice, but of course it needs to be tested on the system with the issue that the problematic commit attempted to address.
(In reply to Rafael J. Wysocki from comment #25) > Created attachment 255645 [details] > Debug patch against glue.c > > I'm wondering, though, if the new flag is really needed. I don't know the background of c2a6bbaf0c5f ("ACPI / scan: Prefer devices without _HID/_CID for _ADR matching"), is there a particular problem that this fixes? If checking _HID is sufficient for that case, surely we can make use of platform_id flag. (In reply to Zhang Rui from comment #26) > (In reply to Rafael J. Wysocki from comment #25) > > Created attachment 255645 [details] > > Debug patch against glue.c > > > > I'm wondering, though, if the new flag is really needed. > > I don't know the background of c2a6bbaf0c5f ("ACPI / scan: Prefer devices > without _HID/_CID for _ADR matching"), is there a particular problem that > this fixes? Yes, it does. Hence the Tested-by: tag. :-) (In reply to Rafael J. Wysocki from comment #25) > Created attachment 255645 [details] > Debug patch against glue.c > > I'm wondering, though, if the new flag is really needed. > > Something like this patch should suffice, but of course it needs to be > tested on the system with the issue that the problematic commit attempted to > address. This patch is working too. The patch from comment #25 has been submitted as https://patchwork.kernel.org/patch/9657331/ |
Created attachment 255255 [details] dmesg for 4.10.2 Hardware lenovo optimus laptop with 3D controller: NVIDIA Corporation GK208M [GeForce GT 730M]. Running Arch On kernel 4.9.x and before BBSwitch returns bbswitch: Found discrete VGA device 0000:09:00.0: \_SB_.PCI0.RP05.PEGP on kernel 4.10.1/2 BBSwitch returns bbswitch: Found discrete VGA device 0000:09:00.0: \_SB_.PCI0.RP05.PXSX Using acpi_call on 4.10.x shows that \_SB_.PCI0.RP05.PEGP is correct.