Bug 197323 - cpu device PM QoS not behaving as expected
Summary: cpu device PM QoS not behaving as expected
Alias: None
Product: Power Management
Classification: Unclassified
Component: Run-Time-PM
Hardware: Intel Linux
Importance: P1 normal
Assignee: Rafael J. Wysocki
Depends on:
Reported: 2017-10-19 16:46 UTC by Reinette Chatre
Modified: 2018-08-08 03:38 UTC
4 users

See Also:
Kernel Version: v4.14-rc2
Tree: Mainline
Regression: No

PM / QoS: Fix device resume latency PM QoS (9.49 KB, patch)
2017-10-20 00:54 UTC, Rafael J. Wysocki
Details | Diff
PM / QoS: Fix device resume latency PM QoS (9.88 KB, patch)
2017-10-20 10:59 UTC, Rafael J. Wysocki
Details | Diff

Description Reinette Chatre 2017-10-19 16:46:21 UTC
I am experimenting with dynamic per-core latency tolerance and find that it does not behave as I expected. Modifying per-core latency tolerance through the sysfs API (/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us) works for me, as reported by turbostat: I can write a new resume latency tolerance to this sysfs file and turbostat shows that the CPU's C-states change to accommodate the request. Unfortunately, I am not able to achieve the same result programmatically using the dev_pm_qos_add_request() API. In addition, the device PM QoS sysfs interface does not appear to support multiple latency tolerance requests.

Please consider my usage and analysis below.

The problem seems to originate with the latency tolerance value of "0" meaning "no restriction", combined with the use of a priority list (plist) to store latency tolerance requests. If I use the dev_pm_qos_add_request() API to add a new latency request larger than zero for a particular CPU, it is added after the default "0" entry in the priority list. The plist considers "0" smaller than the new value and therefore does not update the constraint's target; but since "0" has the special meaning, "no restriction" remains in effect even though a tighter constraint exists.

To elaborate on what I am seeing:

The "no restriction" latency requirement is established at the time the cpu is registered:

	dev_pm_qos_expose_latency_limit(&cpu->dev, 0);

On boot, the above causes the default "no constraint" requests to be created:

       swapper/0-1     [000] ....     0.344931: dev_pm_qos_add_request: device=cpu0 type=DEV_PM_QOS_RESUME_LATENCY new_value=0
       swapper/0-1     [000] ....     0.345067: pm_qos_update_target: action=ADD_REQ prev_value=0 curr_value=0
       swapper/0-1     [000] ....     0.345305: dev_pm_qos_add_request: device=cpu1 type=DEV_PM_QOS_RESUME_LATENCY new_value=0
       swapper/0-1     [000] ....     0.345375: pm_qos_update_target: action=ADD_REQ prev_value=0 curr_value=0
       swapper/0-1     [000] ....     0.345581: dev_pm_qos_add_request: device=cpu2 type=DEV_PM_QOS_RESUME_LATENCY new_value=0
       swapper/0-1     [000] ....     0.345650: pm_qos_update_target: action=ADD_REQ prev_value=0 curr_value=0
       swapper/0-1     [000] ....     0.345849: dev_pm_qos_add_request: device=cpu3 type=DEV_PM_QOS_RESUME_LATENCY new_value=0
       swapper/0-1     [000] ....     0.345917: pm_qos_update_target: action=ADD_REQ prev_value=0 curr_value=0

Sysfs is connected to the latency tolerance created on boot (this is the reason why the sysfs API works: it modifies the original latency requirement in place). 

# grep . /sys/devices/system/cpu/cpu?/power/pm_qos_resume_latency_us

Now, if somebody attempts to add a new latency tolerance using dev_pm_qos_add_request():
           runit-508   [002] ....   416.622465: dev_pm_qos_add_request: device=cpu2 type=DEV_PM_QOS_RESUME_LATENCY new_value=2
           runit-508   [002] ....   416.626314: pm_qos_update_target: action=ADD_REQ prev_value=0 curr_value=0
           runit-508   [002] ....   416.628220: dev_pm_qos_add_request: device=cpu3 type=DEV_PM_QOS_RESUME_LATENCY new_value=2
           runit-508   [002] ....   416.632141: pm_qos_update_target: action=ADD_REQ prev_value=0 curr_value=0

As you can see, the latency tolerance constraint is not updated since 2 > 0 even though 2 is a much tighter constraint than "0", which means "no constraint".

There is also a second area that does not behave as expected. In dev_pm_qos_expose_latency_limit() a single latency tolerance request is created and associated with the device. When the user writes to the sysfs file, it is this single latency tolerance request that is modified in place using dev_pm_qos_update_request(). The confusing part is that when the user reads the sysfs file again, the contents of this single latency tolerance request are presented, not the value determined to be the current latency tolerance based on all requests. Sysfs thus does not reflect the current resume latency target (that is, it would not if it were possible to add a target larger than zero).

I looked around in this code and there appear to be a few ways these issues could be addressed. For example:
- The system-wide default latency tolerance values are not zero; could the device PM QoS defaults be similar? The problem here is that "0" meaning "no restriction" is now part of the user space API and may not be changeable.
- If zero is kept as the default "no restriction" value, could it be ignored when other entries are present?
- Could sysfs perhaps be modified to reflect the current latency tolerance based on all requests?
Comment 1 Rafael J. Wysocki 2017-10-20 00:54:26 UTC
Created attachment 260299 [details]
PM / QoS: Fix device resume latency PM QoS

Please test the attached patch and let me know if it works for you.
Comment 2 Rafael J. Wysocki 2017-10-20 10:59:11 UTC
Created attachment 260303 [details]
PM / QoS: Fix device resume latency PM QoS

This is a slightly improved version of the patch attached previously.

I'm going to test it a bit and submit to linux-pm, but please test it too.
Comment 3 Reinette Chatre 2017-10-20 22:08:40 UTC
I tested the version that was posted to the mailing list and provided feedback via email in response to the patch.

Feedback duplicated here for the record:

With this patch I am now able to dynamically modify the latency requirements via the kernel API.

Below I provide the details of what I tested.

On a four-core system I wanted to dynamically constrain two of the cores from entering deeper C-states.
# grep . /sys/devices/system/cpu/cpuidle/current_*

On boot I now see (from the cpu online code):
       swapper/0-1     [000] ....     0.346148: dev_pm_qos_add_request: device=cpu0 type=DEV_PM_QOS_RESUME_LATENCY new_value=2147483647
       swapper/0-1     [000] ....     0.346247: pm_qos_update_target: action=ADD_REQ prev_value=2147483647 curr_value=2147483647
       swapper/0-1     [000] ....     0.346519: dev_pm_qos_add_request: device=cpu1 type=DEV_PM_QOS_RESUME_LATENCY new_value=2147483647
       swapper/0-1     [000] ....     0.346593: pm_qos_update_target: action=ADD_REQ prev_value=2147483647 curr_value=2147483647
       swapper/0-1     [000] ....     0.346794: dev_pm_qos_add_request: device=cpu2 type=DEV_PM_QOS_RESUME_LATENCY new_value=2147483647
       swapper/0-1     [000] ....     0.346867: pm_qos_update_target: action=ADD_REQ prev_value=2147483647 curr_value=2147483647
       swapper/0-1     [000] ....     0.347080: dev_pm_qos_add_request: device=cpu3 type=DEV_PM_QOS_RESUME_LATENCY new_value=2147483647
       swapper/0-1     [000] ....     0.347153: pm_qos_update_target: action=ADD_REQ prev_value=2147483647 curr_value=2147483647

At this time turbostat shows me that all four of the cores are in C6 more than 99% of the time.

Starting my code, I request a 2 us constraint on the resume latency of cores #2 and #3, and this goes smoothly:
           runit-700   [002] ....   933.722548: dev_pm_qos_add_request: device=cpu2 type=DEV_PM_QOS_RESUME_LATENCY new_value=2
           runit-700   [002] ....   933.722956: pm_qos_update_target: action=ADD_REQ prev_value=2147483647 curr_value=2
           runit-700   [002] ....   933.723161: dev_pm_qos_add_request: device=cpu3 type=DEV_PM_QOS_RESUME_LATENCY new_value=2
           runit-700   [002] ....   933.723407: pm_qos_update_target: action=ADD_REQ prev_value=2147483647 curr_value=2

Running turbostat in parallel, I can immediately observe that the cores are indeed not entering deeper C-states (they stay in C1). I also tried requesting a 0 us constraint, in which case turbostat showed that only C1 is entered, and only minimally (0.09%).

Removing the constraint went just as smoothly: the previous "no constraint" value was restored and turbostat shows we are spending most of the time in C6 again.
           rmdir-757   [002] ....  1050.146021: dev_pm_qos_remove_request: device=cpu3 type=DEV_PM_QOS_RESUME_LATENCY new_value=-1
           rmdir-757   [002] ....  1050.146383: pm_qos_update_target: action=REMOVE_REQ prev_value=2 curr_value=2147483647
           rmdir-757   [002] ....  1050.146512: dev_pm_qos_remove_request: device=cpu2 type=DEV_PM_QOS_RESUME_LATENCY new_value=-1
           rmdir-757   [002] ....  1050.146812: pm_qos_update_target: action=REMOVE_REQ prev_value=2 curr_value=2147483647

Through all three stages every core kept reporting a resume_latency requirement of zero. Though this does not block anything on my side, it is still of concern to me that there is no way to query the effective constraint from user space.
# grep . /sys/devices/system/cpu/cpu?/power/pm_qos_resume_latency_us
Comment 4 Zhang Rui 2017-12-18 03:24:40 UTC
Patch merged in 4.15-rc1.

commit 0759e80b84e34a84e7e46e2b1adb528c83d84a47
Author:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AuthorDate: Tue Nov 7 11:33:49 2017 +0100
Commit:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CommitDate: Wed Nov 8 12:14:51 2017 +0100

    PM / QoS: Fix device resume latency framework

    The special value of 0 for device resume latency PM QoS means
    "no restriction", but there are two problems with that.

    First, device resume latency PM QoS requests with 0 as the
    value are always put in front of requests with positive
    values in the priority lists used internally by the PM QoS
    framework, causing 0 to be chosen as an effective constraint
    value.  However, that 0 is then interpreted as "no restriction"
    effectively overriding the other requests with specific
    restrictions which is incorrect.

    Second, the users of device resume latency PM QoS have no
    way to specify that *any* resume latency at all should be
    avoided, which is an artificial limitation in general.

    To address these issues, modify device resume latency PM QoS to
    use S32_MAX as the "no constraint" value and 0 as the "no
    latency at all" one and rework its users (the cpuidle menu
    governor, the genpd QoS governor and the runtime PM framework)
    to follow these changes.

    Also add a special "n/a" value to the corresponding user space I/F
    to allow user space to indicate that it cannot accept any resume
    latencies at all for the given device.

    Fixes: 85dc0b8a4019 (PM / QoS: Make it possible to expose PM QoS latency constraints)
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=197323
    Reported-by: Reinette Chatre <reinette.chatre@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Tested-by: Reinette Chatre <reinette.chatre@intel.com>
    Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Tested-by: Tero Kristo <t-kristo@ti.com>
    Reviewed-by: Ramesh Thomas <ramesh.thomas@intel.com>
Comment 5 youling257 2018-08-08 03:38:26 UTC
Is somebody online? Please take a look at this: https://bugzilla.kernel.org/show_bug.cgi?id=198631
