Kernel Bug Tracker – Bug 6892
ACPI loses PCI wakeup requests
Last modified: 2009-10-25 10:40:32 UTC
Most recent kernel where this bug did not occur: None
As discussed at OLS, on my Intel-based system the EHCI PCI controller generates
a PCI PME (Power Management Event) signal if a USB device is plugged in while
the controller is in D3. Presumably the PME signal causes an ACPI interrupt and
it should register as a PCI wakeup event, but nothing seems to happen. Attached
below is a commented transcript illustrating the problem.
If needed, I can attach this computer's DSDT table.
Created attachment 8608 [details]
Session showing lack of response to wakeup request
This transcript was made under vanilla 2.6.18-rc2 with CONFIG_ACPI_DEBUG set.
Yeah, I've seen this for a long time ... ever since I first tried remote
wakeup with USB, circa 2.6.10 ... it's not specific to EHCI, same thing
happens with OHCI. (And ISTR with UHCI, at least on VIA hardware, where
the PCI PM extensions are available; I don't think Intel UHCI does that.)
It's not entirely clear to me how PCI "wants" to handle runtime wakeup
events, but it's also clear that ACPI isn't doing anything useful with
them; the PME# indicator in the config space clearly indicates the wake
One obvious suggestion is to call the pci_driver.resume() method ... but
another approach would be to have a pci_driver.wakeup() method, and that
has the virtue that resume() would not be overloaded. However, I can't
really think of something a driver should do differently in a wakeup()
versus a resume() method, so I'd suggest just using resume().
RELATED: for every device capable of issuing wakeup events, ACPI should
be issuing a device_init_wakeup(&pdev->dev, 1) to set the driver model
Please cat /proc/acpi/wakeup.
$ cat /proc/acpi/wakeup
Device  Sleep state  Status
P0P1    4            disabled
UAR1    4            disabled
USB0    3            disabled
USB1    3            disabled
USB2    3            disabled
USB3    3            disabled
AC97    4            disabled
SLPB    4            * enabled
I'm not sure whether this is relevant. In principle any PCI device can raise a
PME# signal, whether it's on the motherboard or not. These signals need to be
handled, regardless of what ACPI knows about the device that raised them.
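For anyone who wants to check what ACPI lists as wakeup-capable on their own machine, the listing above is easy to parse. This is only a minimal sketch: it assumes the three-column layout quoted in this report (device, sleep state, status), the function name parse_acpi_wakeup is made up for illustration, and other kernel versions may print the file differently.

```python
def parse_acpi_wakeup(text):
    """Parse the /proc/acpi/wakeup layout quoted in this report.

    Returns {device: (sleep_state, enabled)}.
    """
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Device Sleep state Status" header.
        if len(fields) < 3 or fields[0] == "Device":
            continue
        # Accept both "4" and "S4" spellings of the sleep state.
        sleep_state = int(fields[1].lstrip("S"))
        # The status may carry a "*" marker, attached or as its own field.
        status = next((f.lstrip("*") for f in fields[2:]
                       if f.lstrip("*") in ("enabled", "disabled")), None)
        if status is None:
            continue
        entries[fields[0]] = (sleep_state, status == "enabled")
    return entries
```

Feeding it the text of /proc/acpi/wakeup from the machine above would report SLPB as the only enabled entry. It says nothing about add-on PCI cards, which is the point made in the comment above.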
Could you clarify ... I thought /proc/acpi/wakeup was only supposed
to apply during system sleep states (standby/S1, STR/S3, swsusp/S4)
not during runtime/S0 ... since only during system sleep states has
acpi_pm_enter() been called, and thus ACPI told to use those wakeup
So I don't see why /proc/acpi/wakeup would affect runtime wakeup
events at all... (Of course, maybe that's part of the problem ...)
ISTR seeing the same problem with a boot script that activates wakeup
events for every device listed in that file, and if I get a chance I'll
try to redo that experiment soon.
Is this still a problem in linux-2.6.22.stable or later?
I have a question about runtime wakeup.
Who is responsible for a device's runtime wakeup: ACPI or the device's native driver?
The current code disables all of these wakeup GPEs, so ACPI is not aware of any
wakeup events during runtime, and only enables them if needed before entering
a system sleep state.
So I think runtime wakeup events should be handled by the native device driver, right?
To Len: There is no longer any simple way to test the problem, since there is no user interface for doing a runtime suspend of a PCI device. I'll try adding such an interface for testing purposes and let you know what happens.
To Rui: ACPI is responsible. When a PCI device is in D3 it can't send any signals to the CPU except for PME#, which is routed to ACPI, not to the PCI driver. From what you say, it sounds like ACPI is wrong in disabling wakeup events while the system is running.
Created attachment 12992 [details]
Add PCI sysfs method for manual suspend/resume
Here's a quick patch adding a sysfs attribute file called "suspend" for PCI devices. Writing 1 to the file suspends the device and writing 0 to the file resumes it.
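With that patch applied, the test can be driven from a small script. The sketch below is an assumption-laden helper, not part of the patch: the function name set_pci_suspend is hypothetical, it assumes the attribute appears under the usual /sys/bus/pci/devices/<address>/ directory, and the "suspend" file exists only on a kernel carrying this test patch.

```python
import os

def set_pci_suspend(address, suspend, sysfs_root="/sys/bus/pci/devices"):
    """Write 1 (suspend) or 0 (resume) to the test patch's "suspend" attribute.

    The attribute is provided only by the patch attached to this report;
    a stock kernel does not have it.
    """
    path = os.path.join(sysfs_root, address, "suspend")
    with open(path, "w") as attr:
        attr.write("1" if suspend else "0")
    return path
```

For example, set_pci_suspend("0000:00:1d.7", True) run as root would suspend the EHCI controller used in the tests here, and passing False would resume it.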
Created attachment 12993 [details]
Result of my testing
The attached file shows the result of my testing. When the PCI device is suspended with remote wakeup enabled, plugging in a USB device does activate the PME# wakeup signal. But the kernel does not respond and the controller remains suspended.
Now I don't know if this is really an ACPI problem or not. The controller generating PME# is not on the motherboard; it is part of an add-on PCI card. But the PME# signal belongs to the PCI bus and it should be handled properly no matter who generates it.
We could add a workaround in the ACPI code to handle PME-based GPEs. We already have a way to find a native device from its ACPI device, so the GPE handler could do something for the native device.
The question is whether we need a framework in the device model for this kind of thing. For example, each driver could provide a .wakeup_notify method, and the GPE handler would call .wakeup_notify to notify the driver.
PCIE native PME handling requires a similar mechanism.
I think adding a new pci_driver method -- wakeup(), pme(), or whatever -- is the right approach to handling runtime PME# signaling. That would be called by whatever handles the PME# signal.
So for example on bare hardware the PME# handler would scan PCI to see which devices report PME status, then call those devices' drivers. (Intel has various IXP platforms which are used that way with Linux, potentially handy for development...) That would need to wake up any parent devices which are in low power states.
And ACPI would probably need two mechanisms, if I understand correctly. They'd both key on whether the relevant devices have wakeup() methods. Devices in the southbridge probably have individual GPEs; if they can handle runtime wake events, their GPEs would be enabled when putting the device into a low power state. I found the ACPI spec unclear-to-silent about add-on cards, but I think the story there is that PCI bridges have a GPE for their PME# signal. If that GPE triggers, then the handler would do the same scanning done on bare hardware for non-ACPI systems.
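The scan-and-dispatch idea described above can be modelled in a few lines of userspace code. This is a toy sketch, not kernel code: the function names and the callback table are hypothetical stand-ins for a pci_driver wakeup() entry, the register values would in practice come from each device's PMCSR, and waking parent bridges is left out. The bit positions are the standard PMCSR ones (PME_Status is bit 15).

```python
# PMCSR bit layout from the PCI power management capability.
PM_STATE_MASK = 0x0003   # bits 1:0, value 3 means D3hot
PME_ENABLE = 0x0100      # bit 8
PME_STATUS = 0x8000      # bit 15

def devices_signalling_pme(pmcsr_by_device):
    """Return the devices whose PMCSR reports PME status."""
    return [dev for dev, pmcsr in pmcsr_by_device.items()
            if (pmcsr & PME_STATUS) != 0]

def dispatch_pme(pmcsr_by_device, wakeup_callbacks):
    """Call the (hypothetical) driver wakeup callback for each such device."""
    notified = []
    for dev in devices_signalling_pme(pmcsr_by_device):
        callback = wakeup_callbacks.get(dev)
        # Devices without a callback are skipped, mirroring the idea that
        # only drivers providing a wakeup() method take part.
        if callback is not None:
            callback(dev)
            notified.append(dev)
    return notified
```

On bare hardware the PME# handler would run this scan directly; under ACPI the same scan would run from the handler for the bridge's GPE.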
The /proc/acpi/wakeup stuff should be irrelevant.
Some of the PCI/ACPI wakeup patches I've submitted, but which have not been merged (why??), would need to kick in here.
Created attachment 13257 [details]
native PME driver
I haven't tried ACPI GPE based PME wakeup, but as an experiment I implemented a PCIE native PME driver. With the driver, if a PCIE device signals a PME we get a notification (an interrupt). It appears to work in a WOL test with E1000.
Currently I just print something in the notification handler. PM should have a generic policy and a PME notify policy, and the policy decides what should be done.
Considering that Linux doesn't have a runtime power management framework, do you think this is a good time to discuss this issue, Alan & David?
Interesting example. Now I want to see it for normal PCI, which is all I have locally, and coexisting with ACPI. :)
Re: PM should have a generic policy ... that's what the current wakeup bits do; those policies are per-device, and based on what they can do in a given system. Drivers are responsive to that policy.
PCI drivers can use pci_enable_wake() in conjunction with device-specific capabilities such as which WOL mechanisms work, and system-specific ones such as what's wired up at what suspend levels on a given board. Non-PCI drivers, like platform drivers on many SOC chips, would do the relevant stuff for their environment.
Re: PME# notify policy: that's PCI-specific. It's what I referred to above as a wakeup() or pme() driver entry. Obviously a PCI driver wouldn't receive such notifications if it hasn't issued a valid pci_enable_wake() call and put the device in a low power state (since otherwise the PME# wouldn't be issued by that device).
I don't see much of a need for a runtime PM framework here. PCI would need some PCI-specific support, but that's nothing generic. There are SOC platforms which already have this sort of mechanism; it's driven by IRQs not magic driver entry points, which demonstrates that Linux doesn't need any more generic infrastructure for this to happen. Note that PME# is atypical in that it's an out-of-band mechanism. More usual is, as I commented, just having the normal IRQ handling framework, with enable_irq_wake() etc, report wakeup events.
I'm fine with the wakeup() or pme() solution at the current stage.
The ACPI GPE based PME is a little complex, so it needs some time. But first we need pci_enable_wake to hook into ACPI to enable the GPE. David, I thought you had a patch for this; can you repost it to the list and get it merged?
>Re: PM should have a generic policy ... that's what the current wakeup bits do;
There is one thing I'm concerned about.
When the wakeup bit is set, ACPI enables the wakeup GPE, so that PME can
wakeup the device at runtime, but it can also wakeup the system when the
system is in sleep state.
If a user wants to enable runtime wakeup support, he may find that the system also wakes up from a sleep state, which is not expected,
so this is a regression. And that's why David's patch was dropped from the -mm tree.
IMO, runtime wakeup and system sleep wakeup are semantically different, yet they are controlled by the same flag in Linux.
If we have a look at the Windows device manager, we can see that two config
options are available for a wakeup-aware device:
1. "Allow the computer to turn off this device to save power"
2. "Allow this device to bring the computer out of standby"
Apparently they stand for runtime wakeup and system sleep wakeup. So I'm wondering if we need another wakeup flag for the system sleep wakeup control.
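To make the two-flag proposal in this comment concrete, here is a rough model. It is only an illustration of the suggestion as stated: the class, field, and function names are all hypothetical, and nothing like this exists in the kernel as described in this report, where a single wakeup flag is used.

```python
from dataclasses import dataclass

@dataclass
class WakeupPolicy:
    # Device may signal wakeup while the system stays in S0.
    runtime_wakeup: bool = False
    # Device may bring the whole system out of a sleep state.
    sleep_wakeup: bool = False

def gpe_should_be_armed(policy, system_sleeping):
    """Pick the flag that applies to the current system state."""
    return policy.sleep_wakeup if system_sleeping else policy.runtime_wakeup
```

With runtime_wakeup set and sleep_wakeup clear, the GPE would be armed while the system runs and disarmed on the way into a sleep state, which is the separation being asked for here.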
>The ACPI GPE based PME is a little complex, so need some time.
>But first we need pci_enable_wake hook into ACPI to enable GPE.
>David, I thought you have a patch for this, can you repost to the list and get it merged?
In the ACPI GPE based PME case, drivers should call pci_enable_wake once the wakeup bit is set, and the corresponding ACPI platform hook should enable the wakeup GPE as a result rather than only setting the wakeup flags for the corresponding ACPI device.
I was wondering why all those patches all got dropped. It's customary to let people know _why_ that's done...
It seems like one issue was a behavioral change: making devices wake by default. That would be easy to change, if necessary, though it would be IMO inadvisable. (And that would only affect _one_ of the several patches which were dropped.) Note also that you're wrong about this being a "regression". It could not be a "regression" unless a bug was introduced. In this case, any such bug would necessarily be a latent one in some PCI device driver which couldn't handle the wake events it requested ... which previously would never arrive. I don't recall any bugs being reported to me (and you didn't mention any in your comment).
Another incorrect comment there is that "when the wakeup bit is set, ACPI enables the wakeup GPE so that PME can wakeup the device at runtime". Untrue, as evidenced by this bug report: there's no runtime wakeup support in ACPI. The only ACPI code enabling wakeup GPEs kicks in during system suspend. The point of the bug report is that the PCI mechanism to support runtime wakeups just doesn't work with ACPI.
I can't see a semantic difference in terms of the device. In both cases, the device issues a wake event. When a user expects a wake event to have an effect, the system state should be irrelevant. (The user should not need to know or care about the system state when using a peripheral. They may not even be able to know that information...) Expecting that to matter is needless complexity, and would violate the "principle of least surprise" as well as several other principles of good system design.
Re what MS-Windows does ... that should not be a significant issue for Linux. And I'll observe that "turn off device to save power" says nothing at all about wakeup, or about using *functional* device low power states. At best, that's a comment that MS-Windows is confused about runtime PM. I read it as just permission to put devices into ACPI D3 states, which might not be desirable in some cases because of concerns like "only BIOS knows how to turn it back on". Those concerns are not part of the design center for Linux-based open systems; at best they're bugs to work around.
Re "drivers should call pci_enable_wake" ... for now, only during system suspend state transitions. In the future, as part of entering runtime low power states. NEVER as a direct consequence of enabling wakeup through the driver model flag; for example, enabling wake from PCI_D0 could cause much confusion. Agreed that those ACPI flags should vanish, but the entire reason they exist seems to be because of Bug #6892 (this!) whereby the current ACPI stack ignores PME# except optionally during system sleep states.
Created attachment 13307 [details]
I have a patch set for the topic, which supports native PME and GPE based PME. Still not finished, but good enough for test. I tested two devices, EHCI which uses GPE and E1000 which uses native PME, both are ok.
Question: UHCI host controllers don't have PME capability, but I saw that some GPEs are for such devices. How could one of them invoke a GPE?
Intel motherboards have a platform-specific mechanism for the onboard UHCI controllers to signal a GPE; it is enabled by certain bits in the PCI configuration space. This is documented in the data sheet for the motherboard's chipset. uhci-hcd ignores those bits, but it's possible that the ACPI tables and AML routines refer to them.
UHCI controllers from other manufacturers either do include PME support or else don't have any remote wakeup capability at all.
Created attachment 13426 [details]
acpi uses driver model flags
Current version of the patch dating from January or so. This basically initializes the driver model wakeup flags for ACPI devices which can source wakeup events, and uses those driver model flags during system suspend transitions.
I've not checked this patch recently without a handful of other patches also applied, but the most notable limitation will be that it only applies to devices which are found in ACPI tables. So it excludes, for example, add-on PCI cards.
Created attachment 13452 [details]
Updated patches to correctly implement GPE based PME; this version should work for all devices, including UHCI.
Currently I use the /proc/acpi/wakeup interface. When the ACPI driver model flags patch is in, I'll refresh the series and then push it.
I tried applying the patch series in comment #21. The result wasn't as good as it should have been. First I did "echo USB3 >/proc/acpi/wakeup", where USB3 is my EHCI controller. Then I suspended the controller:
[ 243.180204] ACPI: PCI interrupt for device 0000:00:1d.7 disabled
[ 243.193205] ehci_hcd 0000:00:1d.7: --> PCI D3/wakeup
Next I plugged in a USB device, which should cause a wakeup:
[ 256.647136] PCI device 0000:00:1d.7 invokes PME
[ 256.648159] Device 0000:00:1d.7 invoke GPE wakeup event
[ 256.648377] Device 0000:00:1d.7 invoke GPE wakeup event
... 95 repetitions elided ...
[ 256.658454] Device 0000:00:1d.7 invoke GPE wakeup event
At this point lspci showed that PME was on and PME-enable was off.
Then I resumed the controller by hand:
[ 357.135507] ehci_hcd 0000:00:1d.7: resume from PCI D3
[ 357.149740] PCI: Enabling device 0000:00:1d.7 (0000 -> 0002)
[ 357.149810] ACPI: PCI Interrupt 0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 17
[ 357.149933] PCI: Setting latency timer of device 0000:00:1d.7 to 64
[ 357.150015] PM: Writing back config space on device 0000:00:1d.7 at offset f (was 400, writing 40a)
[ 357.150120] PM: Writing back config space on device 0000:00:1d.7 at offset 4 (was 0, writing ffa7fc00)
[ 357.150206] PM: Writing back config space on device 0000:00:1d.7 at offset 1 (was 2900006, writing 2900106)
So there was some progress, but not enough. Were there any other patches I should have used?
Those ACPI patches don't seem to hook up to pci_enable_wake() or the driver model flags; much less provide an EHCI callback to activate the device!
You should not need to touch /proc/acpi/wakeup on an ACPI system, any more than you'd need to touch it for a non-ACPI system...
Things I noticed:
- Obviously, a hundred "invoke GPE wakeup" messages is excessive ... as if the PME-enable should have been disabled first.
- It's odd that the "PCI device X invokes PME" appears _after_ wakeup_event() was called, an oddity shared with other code ... it seems to only report errors (nonzero return) too, and is not a unique message (two files print the same text).
- The messaging should use driver model dev_*() calls pretty much everywhere.
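On the first point, one way to read "PME-enable should have been disabled first" is that the handler writes the PMCSR back with PME_En cleared and PME_Status acknowledged before doing anything else. The sketch below only computes that register value; it is an interpretation of the remark, not a description of what the patch set does, and the function name is made up. PME_Status is a write-1-to-clear bit, which is why it is set in the value written.

```python
PME_ENABLE = 0x0100   # PMCSR bit 8
PME_STATUS = 0x8000   # PMCSR bit 15, write 1 to clear

def pmcsr_quiesce_value(pmcsr):
    """Value to write to PMCSR to stop PME# and acknowledge a pending event.

    Clearing PME_En stops the device from asserting PME#; writing 1 to
    PME_Status clears the latched status. The result is kept to 16 bits.
    """
    return (pmcsr & ~PME_ENABLE & 0xFFFF) | PME_STATUS
```

For a device in D3 with PME enabled and pending (0x8103), the value written would be 0x8003, leaving the power state bits untouched.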
I'll try to give it a try; with my current setup it'd go the ACPI-native route.
Yes, it doesn't hook into pci_enable_wake; I mentioned that in comment #21 too.
>Device 0000:00:1d.7 invoke GPE wakeup event
I saw a similar message, but just two or three times. I did disable PME and clear PME status, but I need to dig into it more to find the root cause. If everything were correct, this message shouldn't appear. In my tests, PCIE native PME seems more reliable.
>The messaging should use driver model dev_*() calls pretty much everywhere.
Ideally we should call something like .resume; I just print something out at the current stage.
The GPE flood issue is fixed by a patch in acpi-test tree. When the patch is in base, I'll try to send updated patches out.
Shaohua, what's the state of this bug?
I assume whatever patch was in acpi-test in January is in mainline now?
Yes, that patch is in the base kernel. The real issue here is what the real usage model of this feature is, considering we don't have runtime device suspend/resume support.
Not clear what you mean by "usage model". See for example Comment #2 re PCI; drivers need some kind of notification that their device(s) issued PME# (or other non-PCI wakeup signals, like the out-of-band scheme used by Intel's UHCI). It's not clear why that notification wouldn't just be calling the driver's resume() method; for now I'll assume that's what is used.
Lots of PCI PM stuff has changed recently. I've yet to make time to understand all the changes. Shaohua, do you mean to say that ACPI will now detect PME# events issued at runtime, clear them, and notify the device's driver?
IMO the standard parts of the model should not change: driver decides to enter low power state with wakeup enabled, enables the wakeup, changes the power state, then later gets a resume(). The same model *should* be used if the system is staying in S0 as if it's going into S3 (or any other S-state). And it shouldn't matter if ACPI is in use or not...
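The sequence in that paragraph can be written down as a toy state machine. This is purely a userspace model with hypothetical names (ToyDevice and its methods); it stands in for a driver that arms wakeup, drops to a low power state, and later gets resume() when the wake event arrives. Note that nothing in it looks at the system state, which is the point being argued.

```python
class ToyDevice:
    """Toy model: enable wakeup, enter low power, resume on a wake event."""

    def __init__(self):
        self.state = "D0"
        self.wake_enabled = False
        self.log = []

    def runtime_suspend(self):
        # Same order as in the text: arm wakeup first, then change state.
        self.wake_enabled = True
        self.state = "D3"
        self.log.append("suspend")

    def wake_event(self):
        # A wake event only has an effect in low power with wakeup armed.
        if self.state != "D0" and self.wake_enabled:
            self.resume()
            return True
        return False

    def resume(self):
        self.wake_enabled = False
        self.state = "D0"
        self.log.append("resume")
```

Whether the event is delivered by ACPI, by a native PME# handler, or during a system resume, the driver-side path in this model is the same resume() call.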
If you're talking about how/why a device enters a low power state, when it wasn't just told to suspend() as part of a system power state transition, that's a different question and is best left to the driver. Last I looked, network and USB drivers were the main ones in a position to leverage runtime wakeup events.
One reason this bug was filed is that USB has been in a position to use such mechanisms for some years now, but the PCI-over-ACPI infrastructure hasn't been there yet. (I noticed it wasn't working when I was testing the first USB PM support back on 2.6.9 kernels...) Today, a number of USB device drivers can handle runtime PM, so it's possible that entire trees of idle non-hub USB devices can enter low power states. Root hubs have, for some time now, suspended themselves when they have no active USB peripherals downstream.
When a USB root hub suspends, its HCD could choose to suspend its upstream bus link too (in this context PCI, but in other cases a SOC's platform bus) to save more power. It needs to enable wakeup, so that it can re-activate when a USB peripheral reactivates itself (remote wakeup) or is plugged/unplugged. If PCI is now ready to support this through ACPI, we could start filling in some of those missing pieces...
One open question is what we should do on receiving a wakeup request: just call .resume?
>IMO the standard parts of the model should not change:
Yes, this is the usage model, but I assume no device can enter suspend mode in runtime, right? This is why I said there is no usage model currently.
My vote is still what I suggested in comment #2: just call resume(). Nobody has come up with a reason why that would be inappropriate ...
Re: "I assume no device can enter suspend mode in runtime" ... bad assumption!!! Even just on PCs, but especially with embedded Linux hardware.
I already gave the example of USB, where a number of drivers support it today even on PCs. The "autosuspend" mechanisms handle that, and root hubs (in particular) did it for a long time before that infrastructure existed.
Linux drivers for embedded hardware fairly routinely adopt strategies where they enter low power modes ("suspend") whenever they don't actually need to be active. You wouldn't get over a week of battery life from a Nokia N810 tablet without aggressive runtime PM ... those little cell phone batteries can't store much power.
(And there's upcoming OMAP3 stuff that's very interesting ... not yet merged to mainline. When the last clock in a power domain is disabled, and drivers support it, those power domains can be automatically switched off to eliminate leakage current. As you probably know, that's become more important with recent generations of process technology.)
I guess this is fixed in the current mainline (2.6.32-rc5), closing.