The bbswitch kernel module from https://github.com/Bumblebee-Project turns off hybrid optimus nvidia cards when they are not being used. However, after commit 1cc0c998fdf2cb665d625fb565a0d6db5c81c639 (ACPI: Fix D3hot v D3cold confusion) bbswitch can no longer turn off the NVIDIA 540M video card on my Dell XPS15 (L502x). Reverting the commit fixes the problem. http://lists.opensuse.org/opensuse-kernel/2012-05/msg00039.html looks like the same issue on another PC.
There are three commits between rc1 and rc6. Commit 37239978778806ecba54da60676abb46870acebb removes setting the valid flag when _PSx is found (where x is a state between 0 and 3). 3ebc81b8937d2bc1d0d0064bed29434dfce490aa is not particularly interesting, it updates the validation for a state (states greater than D3cold are nonsense) and would add a check that is removed in the next commit. Part of the next commit 1cc0c998fdf2cb665d625fb565a0d6db5c81c639 diff (indentation removed for formatting) - /* State is valid if we have some power control */ - if (ps->resources.count || ps->flags.explicit_set) + /* + * State is valid if there are means to put the device into it. + * D3hot is only valid if _PR3 present. + */ + if (ps->resources.count || + (ps->flags.explicit_set && i < ACPI_STATE_D3_HOT)) I think that the last line is wrong. Shouldn't it be "i <= ACPI_STATE_D3_HOT"? (note the equality sign). With the current code, D3cold is ignored (okay I guess), but D3hot is ignored too for all devices without a power resource for the given state. Can you try changing "<" to "<=" as described?
Nicely spotted! Changing it to "i <= ACPI_STATE_D3_HOT" also fixes the problem. I wonder if the ACPI devs see this bug mailing list? I think subsystems use bugzilla, but others don't.
See Documentation/SubmittingPatches and send a correcting fix.
Created attachment 73262 [details] acpi: fix regression on D3hot detection Well uhh, I'm not sure how exactly the mail should look like, so I'll put the patch here if you don't mind. @rocko, I've put your name and address in the patch, assuming you do not mind.
Created attachment 73303 [details] PCI / ACPI / PM: Restore the old behavior of acpi_pci_set_power_state() Does the attached patch fix the problem too?
The patch in comment #5 does *not* fix the problem (ie after I reverted the patch in #4).
Rafael, the nvidia card can be disabled through the _PS3 method and is done with `pci_set_power_state(dis_dev, PCI_D3hot);` as you can see on https://github.com/Bumblebee-Project/bbswitch/blob/d12ce73a0d042fce3738aea800f55f8a23874e8c/bbswitch.c#L240 switcheroo does something similar. I think I have found the real issue. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=drivers/acpi/scan.c;h=7417267e88fa9763617e0cc7e36fde6154ff4335;hb=1cc0c998fdf2cb665d625fb565a0d6db5c81c639#l906 905 /* Set defaults for D0 and D3 states (always valid) */ 906 device->power.states[ACPI_STATE_D0].flags.valid = 1; 907 device->power.states[ACPI_STATE_D0].power = 100; 908 device->power.states[ACPI_STATE_D3].flags.valid = 1; 909 device->power.states[ACPI_STATE_D3].power = 0; With the change, ACPI_STATE_D3 has become "4": include/acpi/actypes.h:502:#define ACPI_STATE_D3_HOT (u8) 3 include/acpi/actypes.h:503:#define ACPI_STATE_D3 (u8) 4 include/acpi/actypes.h:504:#define ACPI_STATE_D3_COLD ACPI_STATE_D3 @rocko, can you revert the patches and try ACPI_STATE_D3_HOT instead of ACPI_STATE_D3 for those lines?
Created attachment 73308 [details] another patch that fixes the issue @Peter, yes changing to ACPI_STATE_D3_HOT as you suggested does fix the problem. I've attached the output of git diff just to be clear what I changed.
Well, this is not a correct fix. I would prefer it if you didn't guess things, but tried to understand what really happened instead. :-) acpi_bus_get_power_flags() works as intended, ie. it only sets the "valid" flag for ACPI_STATE_D3_HOT if _PR3 is present, so the problem has to be elsewhere. So, let's see why it still doesn't work with the patch from comment #5 applied. * pci_set_power_state(dis_dev, PCI_D3hot) is called. * __pci_complete_power_transition(dev, PCI_D3hot) is called. * pci_platform_power_transition(dev, PCI_D3hot) is called. * platform_pci_set_power_state(dev, PCI_D3hot) is called. This calls pci_platform_pm->set_state() with the same arguments, which for ACPI points to acpi_pci_set_power_state(). * acpi_pci_set_power_state(dev, PCI_D3hot) is called. With the patch from comment #5 applied it choses ACPI_STATE_D3 as the argument for acpi_bus_set_power(), which calls __acpi_bus_set_power(device, ACPI_STATE_D3), where ACPI_STATE_D3 == 4 as you have observed already. * Thus object_name[] contains the string "_PS4", which obviously doesn't represent anything meaningful, right? So, in addition to the patch in comment #5 we need the following change in __acpi_bus_set_power(): Index: linux/drivers/acpi/bus.c =================================================================== --- linux.orig/drivers/acpi/bus.c +++ linux/drivers/acpi/bus.c @@ -250,6 +250,10 @@ static int __acpi_bus_set_power(struct a return -ENODEV; } + /* _PS3 should be executed for transitions into D3cold. */ + if (state == ACPI_STATE_D3_COLD) + object_name[3] = '3'; + /* * Transition Power * ---------------- I'll attach an updated patch in a while.
Created attachment 73314 [details] ACPI / PCI / PM: Fix device PM regression related to D3cold Updated patch from comment #5, please test.
Created attachment 73315 [details] ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold Small modification: it only is necessary to check for state == D3cold if we're going into a deeper state. Please test!
Created attachment 73316 [details] Fix for calling _PS3 for D3cold Rafael, that was indeed a guess. Before trying to cook another patch I started understanding what you were exactly trying to achieve with your patch by looking at the spec and the related subsystems. Funny enough my idea for a fix was the same as yours, setting the state to '3' explicitly. Unfortunately, this did not work. Looking further, this makes sense since there is also a check before the _PSx method is executed: if (device->power.states[state].flags.explicit_set) { explicit_set is 1 if _PSx was detected in acpi_bus_get_power_flags. The attached patch is tested and fixes this (three lines to satisfy the kernel coding style). A quick grep showed that PCI_D3hot is used where "D3" is meant, i.e. "do not care about D3hot or D3cold, just save power by putting the device off". So yes, for compatibility reasons, it's probably better to keep the D3hot state as it for now.
Created attachment 73317 [details] ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold Right. As it turns out, I had one more ACPI patch applied and it fixed that last problem. I'd prefer to do it like in this attached patch. Can you please test if it works for you still?
I wish I could say yes, but it fails with: ACPI: Device [PEGP] failed to transition to D4 Looking at your changes, it became clear that code in __acpi_bus_set_power is wrongly placed. It should be placed in the statement following the "state < device->power.state" condition, not the else-statement after. Otherwise, it causes the function to call _PS4 (fails). Now this fixes it at boot, but on suspend/resume I get the error again if bbswitch tries to put the device back in D3 state. bbswitch: 1. Before suspend, it turns the card back on: bbswitch_on(). 2. bbswitch_on() calls pci_set_power_state(dis_dev, PCI_D0); somewhere 3. After resume, it turns the card back off if necessary: bbswitch_off(). 4. bbswitch_off() calls pci_set_power_state(dis_dev, PCI_D3hot); Before the D3 changes, I had no issues with suspend/resume. Perhaps there are more places where ACPI_STATE_D3 (4) should become .._HOT (3)? Do you want be to test the suspend/resume process while setting object_name[3] = '3' earlier (i.e. right before the if/else as in my last patch)?
Hmm, have I been drunk or something? Putting it in the if is wrong as it should indeed be called only "if target state is higher than current state". Perhaps there is a missing validation somewhere, something that assumed only one D3 state (comparison with ACPI_STATE_D3?). I'll dig again, any ideas / patches are appreciated.
I see one possibly (unrelated) error: acpi_bus_update_power first calls acpi_bus_get_power and then tries to acpi_bus_set_power with the state returned by the getter. But acpi_bus_get_power calls _PSC which returns at most 3 for D3, it does not make a distinction between D3hot and D3cold. On the 258 acpidumps I checked, there is no \_SB._OSC method that could advertise both D3hot and D3 support, not does it have any _PR3 method as stated in table 6-147 of the ACPI Spec v5: > Field name: _PR3 support > This bit is set if OSPM supports reading _PR3and using power > resources to switch power. Note this handshake translates to an > operating model that the platform and OSPM supports both the power > model containing both D3hot and D3. From sect 2.4 Sleeping State Definitions of the ACPI spec v5: > The D3hot state differs from the D3 state in two distinct parameters; the > main power rail is present and software can access a device in D3hot. For > devices that support both D3hot and D3 exposed to OSPM via _PR3, device > software/drivers must always assume OSPM will target D3and must > assume device context will be lost. Can't we stick to one D3 state then and add a flag for support of D3hot?
(In reply to comment #14) > I wish I could say yes, but it fails with: > > ACPI: Device [PEGP] failed to transition to D4 > I suppose that the source of the error message is acpi_power_transition(), which does some wrong things for D3cold (state == 4). I'll attach an updated patch in a little while.
Created attachment 73320 [details] ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold Please test this one.
(In reply to comment #16) > I see one possibly (unrelated) error: > acpi_bus_update_power first calls acpi_bus_get_power and then tries to > acpi_bus_set_power with the state returned by the getter. > But acpi_bus_get_power calls _PSC which returns at most 3 for D3, it does not > make a distinction between D3hot and D3cold. I don't think this is a big deal at the moment, but we need to fix this. Thanks for spotting it! > On the 258 acpidumps I checked, there is no \_SB._OSC method that could > advertise both D3hot and D3 support, not does it have any _PR3 method as > stated > in table 6-147 of the ACPI Spec v5: > > Field name: _PR3 support > > This bit is set if OSPM supports reading _PR3and using power > > resources to switch power. Note this handshake translates to an > > operating model that the platform and OSPM supports both the power > > model containing both D3hot and D3. > > From sect 2.4 Sleeping State Definitions of the ACPI spec v5: > > The D3hot state differs from the D3 state in two distinct parameters; the > > main power rail is present and software can access a device in D3hot. For > > devices that support both D3hot and D3 exposed to OSPM via _PR3, device > > software/drivers must always assume OSPM will target D3and must > > assume device context will be lost. > Can't we stick to one D3 state then and add a flag for support of D3hot? We might, but for that we'd need to revert some commits that have dependencies on top. Besides, D3hot and D3cold are supposed to be different _states_. My goal for now is to make pci_set_power_state(PCI_D3hot) work as before and we can take care of the other things (like comments printing D4 etc.) later.
It still does not work, not on boot nor after suspend/resume. Still the same erorr. In case you are interested, I have checked the property of the PCI device (a nvidia VGA controller) that I am trying to turn off (after trying to bring it in D3hot which is at the moment an alias to D3cold): struct pci_dev *pdev; /* assume that pdev contains VGA controller dev */ pdev->current_state == 3 I can add some debugging messages arround, what would you like to know?
Created attachment 73321 [details] ACPI: Debug acpi_bus_set_power() Please apply this patch on top of the previous one and grep dmesg for the messages added by it.
I added a bit more debugging, the "transitioning from D..." message comes right before the if condition "state < device->power.state", acpi.debug_layer=0x550000: [ 16.968478] ACPI: transitioning from D255 to D4 [ 16.968485] ACPI: Executed _PS4 for device [PEGP], status: 5 [ 16.968487] ACPI: Device [PEGP] failed to transition to D4 [ 17.007702] utils-0286 [00] evaluate_integer : Return value [0] [ 17.123579] utils-0286 [00] evaluate_integer : Return value [1] [ 17.175566] utils-0286 [00] evaluate_integer : Return value [28] The same message occurs after resume. It seems that initially the power state is (unsigned char)-1 and therefore it thinks that it's greater than the target state D3. Since machines with _PR3 are uncommon (I haven't seen one so far), this issue could be hidden for a long time (without power resources, the end result is the same). Again, I'm happy to test any patches :)
(In reply to comment #22) > I added a bit more debugging, the "transitioning from D..." message comes > right > before the if condition "state < device->power.state", > acpi.debug_layer=0x550000: > > [ 16.968478] ACPI: transitioning from D255 to D4 Ah. > [ 16.968485] ACPI: Executed _PS4 for device [PEGP], status: 5 So the case is that the device is being transitioned directly from ACPI_STATE_UNKNOWN to D3cold, which isn't going to work. This is a result of an initialization failure. Most likely, __acpi_bus_get_power() fails for your device for some reason and I'd like to figure out why. In the meantime, I'll attach another patch for you to try.
Created attachment 73322 [details] ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold In this patch the method name for D3cold is always set to _PS3, regardless of the direction of the transition.
Thanks, your new patch fixes the bugs. The bbswitch module works properly on boot/resume. Compared to 3.3.6, I found one difference when grepping for err|warn, previously I had the next line: ACPI: Video Device [PEGP] (multi-head: no rom: yes post: no) now it shows: video: probe of LNXVIDEO:00 failed with error 5 On the 3.3.6 kernel, /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/LNXVIDEO:00/uevent shows: DRIVER=video MODALIAS=acpi:LNXVIDEO: on the new 3.4-rc7 kernel, this is missing. The ACPI handle as shown through the path property is \_SB_.PCI0.P0P2.PEGP. acpidump is available at https://bugs.launchpad.net/lpbugreporter/+bug/752542/comments/488 I am not sure if it is related to this bug, but do not notice any missing functionality as a result of this error either.
(In reply to comment #25) > Thanks, your new patch fixes the bugs. The bbswitch module works properly on > boot/resume. Good, thanks for your patience! > Compared to 3.3.6, I found one difference when grepping for err|warn, > previously I had the next line: > > ACPI: Video Device [PEGP] (multi-head: no rom: yes post: no) > now it shows: > > video: probe of LNXVIDEO:00 failed with error 5 > > On the 3.3.6 kernel, > /sys/devices/LNXSYSTM:00/device:00/PNP0A08:00/device:01/LNXVIDEO:00/uevent > shows: > > DRIVER=video > MODALIAS=acpi:LNXVIDEO: > on the new 3.4-rc7 kernel, this is missing. The ACPI handle as shown through > the path property is \_SB_.PCI0.P0P2.PEGP. acpidump is available at > https://bugs.launchpad.net/lpbugreporter/+bug/752542/comments/488 > > I am not sure if it is related to this bug, but do not notice any missing > functionality as a result of this error either. This is a different problem, but I think it's worth looking into. Can you please create a separate bug entry for it and let me know the number?
Wow, that made it into the kernel git tree quickly! The patches fixes the issue on my laptop too, thanks.
@Rafael, seems that it's caused by: commit ea9f8856bd6d4ed45885b06a338f7362cd6c60e5 Author: Igor Murzov <intergalactic.anonymous@gmail.com> Date: Fri Mar 30 21:32:08 2012 +0400 ACPI video: Harden video bus adding. It is always better to check return values, so add some new checks and correct existing ones. v2: Be consistent and don't mix errors from -E* and AE_* namespaces. Signed-off-by: Igor Murzov <e-mail@date.by> Signed-off-by: Len Brown <len.brown@intel.com> In particular: @@ -1343,15 +1343,17 @@ static int acpi_video_bus_get_devices(struct acpi_video_bus *video, struct acpi_device *device) { - int status = 0; + int status; struct acpi_device *dev; - acpi_video_device_enumerate(video); + status = acpi_video_device_enumerate(video); + if (status) + return status; And indeed, previously I got an ACPI exception and now I still do, but with addition of the probe failure: ACPI Exception: AE_NOT_FOUND, Evaluating _DOD (20120320/video-1147) video: probe of LNXVIDEO:00 failed with error 5 So I think this is the expected behaviour?
Yes, it is, but I still wonder why acpi_bus_init_power() doesn't get you the correct power state.
Where do you conclude that from? Note that the other files (lspci.txt) in that tarball may not be that accurate as devices may have been disabled manually.
5c7dd710f691d1b44c39e32d2f05b4286ff51f99 (ACPI / PCI / PM: Fix device PM regression related to D3hot/D3cold) shipped in linux-3.4 closed. the video discussion should go to another forum or bug report.