Bug 57141

Summary: CONFIG_X86_INTEL_PSTATE disables CPU frequency transition stats and many governors
Product: ACPI
Reporter: Artem S. Tashkinov (aros)
Component: Power-Processor
Assignee: Rafael J. Wysocki (rjwysocki)
Status: CLOSED INVALID
Severity: high
CC: aaron.lu, dirk.brandewie, kernel, lenb, rjw, rui.zhang, torvalds
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.9
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: 3.9 configuration

Description Artem S. Tashkinov 2013-04-26 14:31:18 UTC
Created attachment 100011 [details]
3.9 configuration

Without CONFIG_X86_INTEL_PSTATE

1) I can see frequency transition stats
2) I can use userspace, ondemand and conservative governors
3) I can see the current core frequencies using /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

This option/driver disables *everything* mentioned above.

Is this an intended behavior?

Of course, it's nice to finally see Turbo Frequencies in the ACPI driver but what about the rest?

This option also breaks all user space utilities.
Comment 1 Artem S. Tashkinov 2013-04-26 14:39:26 UTC
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave

grep FREQ .config
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
# CONFIG_X86_PCC_CPUFREQ is not set
CONFIG_X86_ACPI_CPUFREQ=y
# CONFIG_X86_ACPI_CPUFREQ_CPB is not set
# CONFIG_X86_CPUFREQ_NFORCE2 is not set
# CONFIG_PM_DEVFREQ is not set

[root@localhost cpufreq]# pwd
/sys/devices/system/cpu/cpu0/cpufreq

[root@localhost cpufreq]# ls -la
total 0
drwxr-xr-x 2 root root    0 Apr 26 20:11 .
drwxr-xr-x 9 root root    0 Apr 27  2013 ..
-r--r--r-- 1 root root 4096 Apr 26 20:14 affected_cpus
-r-------- 1 root root 4096 Apr 26 20:14 cpuinfo_cur_freq
-r--r--r-- 1 root root 4096 Apr 26 20:14 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Apr 26 20:14 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Apr 26 20:14 cpuinfo_transition_latency
-r--r--r-- 1 root root 4096 Apr 26 20:14 related_cpus
-r--r--r-- 1 root root 4096 Apr 26 20:14 scaling_available_governors
-r--r--r-- 1 root root 4096 Apr 26 20:14 scaling_driver
-rw-r--r-- 1 root root 4096 Apr 26 20:11 scaling_governor
-rw-r--r-- 1 root root 4096 Apr 26 20:27 scaling_max_freq
-rw-r--r-- 1 root root 4096 Apr 26 20:14 scaling_min_freq
-rw-r--r-- 1 root root 4096 Apr 26 20:14 scaling_setspeed
Comment 2 Artem S. Tashkinov 2013-04-26 14:46:34 UTC
Without this option:

[root@localhost cpufreq]# pwd
/sys/devices/system/cpu/cpu0/cpufreq

[root@localhost cpufreq]# ls -la
total 0
drwxr-xr-x 3 root root    0 Apr 26 20:44 .
drwxr-xr-x 9 root root    0 Apr 27  2013 ..
-r--r--r-- 1 root root 4096 Apr 26 20:44 affected_cpus
-r--r--r-- 1 root root 4096 Apr 26 20:44 bios_limit (+)
-r-------- 1 root root 4096 Apr 26 20:44 cpuinfo_cur_freq
-r--r--r-- 1 root root 4096 Apr 26 20:44 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Apr 26 20:44 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Apr 26 20:44 cpuinfo_transition_latency
-r--r--r-- 1 root root 4096 Apr 26 20:44 related_cpus
-r--r--r-- 1 root root 4096 Apr 26 20:44 scaling_available_frequencies (+)
-r--r--r-- 1 root root 4096 Apr 26 20:44 scaling_available_governors
-r--r--r-- 1 root root 4096 Apr 26 20:44 scaling_cur_freq (+)
-r--r--r-- 1 root root 4096 Apr 26 20:44 scaling_driver
-rw-r--r-- 1 root root 4096 Apr 26 20:44 scaling_governor
-rw-r--r-- 1 root root 4096 Apr 26 20:44 scaling_max_freq
-rw-r--r-- 1 root root 4096 Apr 26 20:44 scaling_min_freq
-rw-r--r-- 1 root root 4096 Apr 26 20:44 scaling_setspeed
drwxr-xr-x 2 root root    0 Apr 26 20:44 stats (+)
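
The stats directory marked (+) above is where the cpufreq core publishes the transition statistics this report is about. As a minimal sketch, assuming the usual time_in_state layout of one "frequency_khz time" pair per line, a reader that tolerates the directory being absent could look like this (read_time_in_state is a hypothetical helper name, not an existing tool):

```python
from pathlib import Path


def read_time_in_state(stats_dir):
    """Return {frequency_khz: time_units} parsed from <stats_dir>/time_in_state.

    Returns an empty dict when the file does not exist, which is the
    situation described in this report when intel_pstate is in use.
    """
    path = Path(stats_dir) / "time_in_state"
    if not path.is_file():
        return {}
    table = {}
    for line in path.read_text().splitlines():
        fields = line.split()
        # Skip anything that is not a "frequency time" pair.
        if len(fields) == 2:
            table[int(fields[0])] = int(fields[1])
    return table
```

Calling read_time_in_state("/sys/devices/system/cpu/cpu0/cpufreq/stats") should return a populated dict with acpi-cpufreq and an empty one when the stats directory is missing.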
Comment 3 Artem S. Tashkinov 2013-04-26 14:47:26 UTC
And of course all the governors are available:

[root@localhost cpufreq]# cat scaling_available_governors
ondemand userspace powersave conservative performance
Comment 4 Rafael J. Wysocki 2013-04-27 14:16:10 UTC
(In reply to comment #0)
> Created an attachment (id=100011) [details]
> 3.9 configuration
> 
> Without CONFIG_X86_INTEL_PSTATE
> 
> 1) I can see frequency transition stats
> 2) I can use userspace, ondemand and conservative governors
> 3) I can see the current cores frequencies using
> /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
> 
> This option/driver disables *everything* mentioned above.
> 
> Is this an intended behavior?

Yes, it is.

> This option also breaks all user space utilities.

Which user space utilities does it break?
Comment 5 Artem S. Tashkinov 2013-04-27 15:13:02 UTC
(In reply to comment #4)
> (In reply to comment #0)
> > Created an attachment (id=100011) [details]
> > 3.9 configuration
> > 
> > Without CONFIG_X86_INTEL_PSTATE
> > 
> > 1) I can see frequency transition stats
> > 2) I can use userspace, ondemand and conservative governors
> > 3) I can see the current cores frequencies using
> > /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
> > 
> > This option/driver disables *everything* mentioned above.
> > 
> > Is this an intended behavior?
> 
> Yes, it is.

How can I select the governor then? How can I set the constant desired frequency? How can I run at maximum speed? How can I run at minimum speed? How can I limit maximum CPU frequency?

> 
> > This option also breaks all user space utilities.
> 
> Which user space utilities does it break?

Every utility which reports CPU frequency - in Gnome/XFCE/KDE/etc - they all read the scaling_cur_freq file.
Comment 6 Rafael J. Wysocki 2013-04-27 20:12:12 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #0)
> > > Created an attachment (id=100011) [details]
> > > 3.9 configuration
> > > 
> > > Without CONFIG_X86_INTEL_PSTATE
> > > 
> > > 1) I can see frequency transition stats
> > > 2) I can use userspace, ondemand and conservative governors
> > > 3) I can see the current cores frequencies using
> > > /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
> > > 
> > > This option/driver disables *everything* mentioned above.
> > > 
> > > Is this an intended behavior?
> > 
> > Yes, it is.
> 
> How can I select the governor then?

You can't.  The governor is built into intel_pstate and cannot be changed.

> How can I set the constant desired frequency?
> How can I run at maximum speed? How can I run at minimum speed? How
> can I limit maximum CPU frequency?

There are three files under /sys/devices/system/cpu/intel_pstate/:

      max_perf_pct: limits the maximum P state that will be requested by
      the driver, stated as a percentage of the available performance.

      min_perf_pct: limits the minimum P state that will be requested by
      the driver, stated as a percentage of the available performance.

      no_turbo: limits the driver to selecting P states below the turbo
      frequency range.

So set both max_perf_pct and min_perf_pct to 100 and either set or unset no_turbo depending on whether or not you want turbo to kick in.
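
As a rough sketch of the advice above, the three files can be written from a small script. The helper name set_limits and its base parameter are assumptions made for illustration; only the directory and the three file names come from this comment. Writing to the real directory needs root, and the kernel may clamp the values it is given.

```python
from pathlib import Path

INTEL_PSTATE_DIR = "/sys/devices/system/cpu/intel_pstate"


def set_limits(min_pct, max_pct, allow_turbo, base=INTEL_PSTATE_DIR):
    """Write the intel_pstate limits (hypothetical helper, not a kernel API)."""
    if not 0 <= min_pct <= max_pct <= 100:
        raise ValueError("need 0 <= min_pct <= max_pct <= 100")
    base = Path(base)
    # Write the ceiling before the floor.
    (base / "max_perf_pct").write_text(f"{max_pct}\n")
    (base / "min_perf_pct").write_text(f"{min_pct}\n")
    # no_turbo is inverted: 1 keeps the driver below the turbo range.
    (base / "no_turbo").write_text("0\n" if allow_turbo else "1\n")
```

set_limits(100, 100, allow_turbo=True) corresponds to running at maximum speed, while something like set_limits(0, 50, allow_turbo=False) would limit the maximum instead.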

> > > This option also breaks all user space utilities.
> > 
> > Which user space utilities does it break?
> 
> Every utility which reports CPU frequency - in Gnome/XFCE/KDE/etc - they all
> read the scaling_cur_freq file.

And what happens if that file is not present?  Do they crash?
Comment 7 Artem S. Tashkinov 2013-04-27 21:15:26 UTC
(In reply to comment #6)
> > 
> > Every utility which reports CPU frequency - in Gnome/XFCE/KDE/etc - they all
> > read the scaling_cur_freq file.
> 
> And what happens if that file is not present?  Do they crash?

No, they simply report no data. Is it possible to export scaling_cur_freq as a copy of cpuinfo_cur_freq? I guess it would be a trivial change, and since the two files report much the same information, it seems safe.

Anyway, thank you for the clarification.

What troubles me about this option is that its Kconfig description is very incomplete. I guess most users would appreciate a few words about the changes it brings to the table.
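
Until something like that exists, a user space utility could fall back by itself. This is only a sketch under that assumption (current_freq_khz is a hypothetical helper); note that cpuinfo_cur_freq is readable by root only in the listings above, so the fallback can still come back empty for an ordinary user.

```python
from pathlib import Path


def current_freq_khz(policy_dir):
    """Return the current frequency in kHz, or None if nothing is readable."""
    for name in ("scaling_cur_freq", "cpuinfo_cur_freq"):
        try:
            return int((Path(policy_dir) / name).read_text().strip())
        except (FileNotFoundError, PermissionError, ValueError):
            # Missing, unreadable, or unparsable: try the next candidate.
            continue
    return None
```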

Like: 

"If you enable this option then:

1) You cannot use any governors other than performance and powersave
2) The scaling_setspeed interface becomes unavailable; you have to set options in /sys/devices/system/cpu/intel_pstate/* instead
3) scaling_cur_freq is no longer exported; you have to fetch that information from cpuinfo_cur_freq
4) CPU frequency transition stats are gone."

BTW, about item number 4 - why isn't it possible to see/export them?
Comment 8 Rafael J. Wysocki 2013-04-27 21:38:10 UTC
On Saturday, April 27, 2013 09:15:26 PM bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=57141
> 
> 
> Artem S. Tashkinov <t.artem@mailcity.com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|NEEDINFO                    |ASSIGNED
> 
> 
> 
> 
> --- Comment #7 from Artem S. Tashkinov <t.artem@mailcity.com>  2013-04-27
> 21:15:26 ---
> BTW, about item number 4 - why isn't it possible to see/export them?

Because with intel_pstate we can't really follow the transitions the way we
can with "traditional" cpufreq drivers.

That said, Dirk Brandewie who is the author of intel_pstate can give you
more details regarding that.

Dirk, care to comment?
Comment 9 Dirk Brandewie 2013-04-29 15:17:06 UTC
scaling_cur_freq is available when the scaling driver implements the target() method, and it relies on the scaling driver setting policy->cur to the most recently requested frequency.

intel_pstate does not implement target(), so the cpufreq core code does not export the file.

With an internal governor, the call chain that updates policy->cur is never invoked. The external governors (ondemand et al.) call __cpufreq_driver_target() in the core to request a change from the scaling driver.

The transition stats are collected in a notifier called from cpufreq_notify_transition() in the core code, which is not used by scaling drivers with internal governors.
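
The relationships described above can be illustrated with a toy model. This is not the kernel's code: ToyCore and its members are invented for illustration, and the model folds the driver's update of policy->cur and the stats notifier into a single step for brevity.

```python
class ToyCore:
    """Toy stand-in for the cpufreq core; not the kernel's code."""

    def __init__(self, driver_has_target):
        self.driver_has_target = driver_has_target
        self.policy_cur = None   # models policy->cur
        self.total_trans = 0     # models the counter kept by the stats notifier

    def exports_scaling_cur_freq(self):
        # The file is exported only for drivers that implement target().
        return self.driver_has_target

    def driver_target(self, freq_khz):
        """Models __cpufreq_driver_target(), as called by external governors."""
        if not self.driver_has_target:
            raise RuntimeError("driver has no target(); it uses setpolicy()")
        if freq_khz != self.policy_cur:
            self.notify_transition(freq_khz)

    def notify_transition(self, freq_khz):
        """Models cpufreq_notify_transition(): record cur, count the change."""
        self.policy_cur = freq_khz
        self.total_trans += 1


# A target()-style driver sees its requests recorded and counted.
acpi_like = ToyCore(driver_has_target=True)
acpi_like.driver_target(1600000)
acpi_like.driver_target(3400000)

# A setpolicy()-style driver changes P states internally, so the core
# never learns about them: no policy_cur, no transition count.
pstate_like = ToyCore(driver_has_target=False)
```

In this model pstate_like never reaches notify_transition(), which is why both scaling_cur_freq and the transition stats have nothing to report for a driver with an internal governor.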
Comment 10 Artem S. Tashkinov 2013-04-29 15:57:29 UTC
What I understand from your explanation is that the Intel PState "governor" doesn't belong in the CPU frequency subsystem, as it diverges from all known conventions in a major way.

I guess it would be best to make it separate, both in terms of Kconfig help and filesystem hierarchy.

Something like

/sys/devices/system/cpu/cpuNN/intel_pstate/*
Comment 11 Dirk Brandewie 2013-04-29 16:39:15 UTC
This is not completely accurate. The interface that my driver uses, where it implements setpolicy() instead of target(), has been in the kernel for a long time.

The ability for a scaling driver to have an internal governor was introduced to support the Transmeta CPU (cpufreq/longrun.c).

I did not define which sysfs files would or wouldn't be exposed, or their semantics. This has been part of the cpufreq core for a while.

Early in its development my driver was standalone and would actually call cpufreq_disable(). As part of the discussions to get the driver upstream, it was pointed out that this would likely encounter significant resistance (likely from Linus) and that the setpolicy() interface was added to support this type of driver.
Comment 12 Artem S. Tashkinov 2013-04-29 19:18:43 UTC
I'd be glad if you updated the Kconfig description for your driver, taking my input (comment 7) as a starting point.

I've got no more questions.