Bug 197153 - Constant "cpu MHz" in /proc/cpuinfo
Status: CLOSED DOCUMENTED
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: All Linux
Importance: P1 normal
Assignee: Len Brown
Reported: 2017-10-07 14:07 UTC by Artem S. Tashkinov
Modified: 2017-10-17 01:39 UTC

Kernel Version: 4.13
Regression: Yes

Description Artem S. Tashkinov 2017-10-07 14:07:02 UTC
$ grep MHz /proc/cpuinfo 
cpu MHz		: 2400.000
cpu MHz		: 2400.000
cpu MHz		: 2400.000
cpu MHz		: 2400.000

$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
1429867
1209696
594559
600065

This looks like a regression. The /proc values were still being updated two hours ago, but then something odd happened and they got stuck.
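
To make the mismatch easier to see per CPU, here is a minimal sketch that prints the "cpu MHz" field next to scaling_cur_freq (converted from kHz to MHz). The function name compare_freqs, its two optional path arguments and the output format are made up for illustration; they are not part of any existing tool.

```shell
# Sketch only: compare_freqs and its output format are illustrative assumptions.
# $1: cpuinfo file (default /proc/cpuinfo)
# $2: CPU sysfs root (default /sys/devices/system/cpu)
compare_freqs() (
    cpuinfo="${1:-/proc/cpuinfo}"
    sysroot="${2:-/sys/devices/system/cpu}"
    # Pair each "processor" number with the "cpu MHz" value that follows it.
    awk -F': *' '/^processor/ { p = $2 } /^cpu MHz/ { print p, $2 }' "$cpuinfo" |
    while read -r n mhz; do
        f="$sysroot/cpu$n/cpufreq/scaling_cur_freq"
        if [ -r "$f" ]; then
            khz=$(cat "$f")
            # scaling_cur_freq is reported in kHz; integer-divide to get MHz.
            printf 'cpu%s proc=%s MHz sysfs=%s MHz\n' "$n" "$mhz" "$((khz / 1000))"
        else
            printf 'cpu%s proc=%s MHz sysfs=n/a\n' "$n" "$mhz"
        fi
    done
)
```

Calling compare_freqs with no arguments reads the live /proc and /sys paths; passing two paths lets it run against saved copies.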
Comment 1 Doug Smythies 2017-10-07 15:13:38 UTC
This is a duplicate of bug #197009.
The change was intentional.
Comment 2 Artem S. Tashkinov 2017-10-07 15:21:06 UTC
This breaks a lot of user space applications which report frequencies.

CC'ing Linus Torvalds.
Comment 3 Doug Smythies 2017-10-07 15:41:48 UTC
Back on bug #197009, I couldn't find the related e-mail thread. Now I did:
https://marc.info/?t=149766883400002&r=1&w=2
Comment 4 Artem S. Tashkinov 2017-10-07 15:53:45 UTC
(In reply to Doug Smythies from comment #3)
> Back on bug #197009, I couldn't find the related e-mail thread. Now I did:
> https://marc.info/?t=149766883400002&r=1&w=2

Thanks!

> Users who have been consulting /proc/cpuinfo to
> track changing CPU frequency will be disappointed that
> it no longer wiggles -- perhaps being unaware of the
> limitations of the information they have been consuming.
> 
> Yes, they can change their scripts to look in sysfs
> cpufreq/scaling_cur_frequency.  Here they will find the same
> data of dubious quality here removed from /proc/cpuinfo.
> The value in sysfs will be addressed in a subsequent patch
> to address issues 1-3, above.
> 
> Issue 4 will remain -- users that really care about
> accurate frequency information should not be using either
> proc or sysfs kernel interfaces.
> They should be using turbostat(8), or a similar
> purpose-built analysis tool.

That sounds like a dozen Linux utilities need to be rewritten to include the turbostat source (how do you even see that working? What about updates to this code?), or does Len Brown have some other idea in mind? Does turbostat work without insmod'ing the msr module first? Most distros don't preload it.

You break the user space interfaces and don't offer any realistic alternatives at all. I for one refuse to run turbostat in a console just to know what frequencies my CPU cores are running at.

This is not how it should work.
Comment 5 Artem S. Tashkinov 2017-10-07 16:02:06 UTC
While we're at it I don't understand why I see four different values at 

/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

My CPU has just two physical cores (the other two logical CPUs are HT siblings). I would be fine with Value0 Value0 Value1 Value1, but not with four different values. I understand that frequencies are dynamic, but not so dynamic that cat, which completes in under 0.001 seconds, could see four quite different values.
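
To check which of the four entries belong to the same physical core, the frequency can be listed next to each CPU's topology/thread_siblings_list. This is a minimal sketch; the function name freq_by_sibling and its output format are assumptions made for illustration.

```shell
# Sketch only: freq_by_sibling is a hypothetical helper, not an existing tool.
# $1: CPU sysfs root (default /sys/devices/system/cpu)
freq_by_sibling() (
    sysroot="${1:-/sys/devices/system/cpu}"
    for d in "$sysroot"/cpu[0-9]*; do
        # Skip CPUs that expose no cpufreq data.
        [ -r "$d/cpufreq/scaling_cur_freq" ] || continue
        # thread_siblings_list names the logical CPUs sharing this core.
        sib=$(cat "$d/topology/thread_siblings_list" 2>/dev/null || echo '?')
        printf '%s siblings=%s khz=%s\n' "${d##*/}" "$sib" "$(cat "$d/cpufreq/scaling_cur_freq")"
    done
)
```

Lines that show the same siblings= value are hyperthreads of one core, so their khz= values can be compared directly.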
Comment 6 Artem S. Tashkinov 2017-10-07 22:33:31 UTC
The /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq interface is also broken in 4.13.

This one-liner:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq | awk '{if (max<$0) max=$0}END{printf "%.2fGHz",max/1000000}'

prints values from 0.50GHz to 2.55GHz even though my laptop is more or less completely idle (no running applications aside from XFCE itself).
Comment 7 Doug Smythies 2017-10-08 03:37:21 UTC
@Artem: Please keep in mind that, since this bug is unassigned, emails are going to the entire linux-pm e-mail list.

> The /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq interface is also
> broken in 4.13.

It is not broken, and the numbers you are getting are most probably correct.
If you are running a desktop computer with a GUI, then this statement:

"my laptop is more or less completely idle"

simply is not true. You would have to disable the GUI and a bunch of other
services to get a more idle system (which would still not be completely idle).

If I use my test Ubuntu 16.04.3 server (no GUI) and disable several services, i.e.:

$ cat set_cpu_turn_off_services
#! /bin/bash
# Turn off some services to try to get "idle" to be more "idle"
sudo systemctl stop mysql.service
sudo systemctl stop apache2.service
sudo systemctl stop nmbd.service
sudo systemctl stop smbd.service
sudo systemctl stop cron.service
sudo systemctl stop winbind.service
sudo systemctl stop apt-daily.timer

then for your command I usually get 1.61 GHz, which is the minimum for my processor.

To avoid influencing the system under test, I ran your command at a very slow rate of once every 2 seconds for 6112 samples. 5970 of them (97.68%) reported the minimum frequency for my 8-CPU processor.
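
A sampling run of this kind can be sketched roughly as follows. The function names max_ghz and sample_share, their arguments and the report format are assumptions made for illustration; this is not the script that was actually used for the numbers above.

```shell
# Sketch only: max_ghz and sample_share are hypothetical helpers.

# Highest scaling_cur_freq across all CPUs, in GHz with two decimals
# (same formatting as the one-liner from comment 6).
max_ghz() (
    sysroot="${1:-/sys/devices/system/cpu}"
    cat "$sysroot"/cpu[0-9]*/cpufreq/scaling_cur_freq |
        awk '$1 + 0 > max { max = $1 + 0 } END { printf "%.2f\n", max / 1000000 }'
)

# $1: target in GHz, written with two decimals (e.g. 1.61)
# $2: number of samples  $3: seconds between samples (default 2)
# $4: CPU sysfs root (default /sys/devices/system/cpu)
sample_share() (
    target="$1"; count="$2"; interval="${3:-2}"
    sysroot="${4:-/sys/devices/system/cpu}"
    if [ "$count" -le 0 ]; then exit 1; fi
    hits=0; i=0
    while [ "$i" -lt "$count" ]; do
        # String comparison: the sample counts only if it prints exactly as target.
        if [ "$(max_ghz "$sysroot")" = "$target" ]; then hits=$((hits + 1)); fi
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
    pct=$(awk -v h="$hits" -v n="$count" 'BEGIN { printf "%.2f", 100 * h / n }')
    printf '%s of %s samples (%s%%) at %s GHz\n' "$hits" "$count" "$pct" "$target"
)
```

For example, sample_share 1.61 6112 2 would take 6112 samples two seconds apart and report how many of them showed 1.61 GHz as the highest frequency.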
Comment 8 Zhang Rui 2017-10-11 05:49:12 UTC
*** Bug 197017 has been marked as a duplicate of this bug. ***
Comment 9 Len Brown 2017-10-17 01:39:27 UTC
There are multiple reasons, detailed in the commit message below,
why /proc/cpuinfo "cpu MHz" is constant (again) starting in Linux-4.13.

Note that up through 2005, "cpu MHz" was constant on all of x86_64.
However, it was variable on some kernels and some hardware configurations.
Some consistency was added when this patch https://lwn.net/Articles/162548/
made the field return the last cpufreq requested value on most systems.
But much of that consistency was lost starting in 2013 when intel_pstate
supplied a more precise, but less accurate variable value.
As intel_pstate adoption grew, so did user confusion.

Most notably, some who support Linux distributions complained,
and asked to return to the original scheme, where the result was a constant
that doesn't cause their phone to ring.

And so the current policy restores constant "cpu MHz" in /proc/cpuinfo.
It comes from the kernel's "cpu_khz" aka, processor "base frequency",
and it does not change.

Utilities that are unable to calculate frequency for themselves
can get what they are looking for from
/sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
which exists solely to answer this question.

Indeed, starting in Linux-4.13, scaling_cur_freq has been improved
on modern Intel hardware such that its result should be sufficiently
accurate, precise and consistent for all reasonable user-space
visual observations.


commit 51204e0639c49ada02fd823782ad673b6326d748
Author: Len Brown <len.brown@intel.com>
Date:   Fri Jun 16 20:03:11 2017 -0700

    x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
    
    cpufreq_quick_get() allows cpufreq drivers to over-ride cpu_khz
    that is otherwise reported in x86 /proc/cpuinfo "cpu MHz".
    
    There are four problems with this scheme,
    any of them is sufficient justification to delete it.
    
     1. Depending on which cpufreq driver is loaded, the behavior
        of this field is different.
    
     2. Distros complain that they have to explain to users
        why and how this field changes.  Distros have requested a constant.
    
     3. The two major providers of this information, acpi_cpufreq
        and intel_pstate, both "get it wrong" in different ways.
    
        acpi_cpufreq lies to the user by telling them that
        they are running at whatever frequency was last
        requested by software.
    
        intel_pstate lies to the user by telling them that
        they are running at the average frequency computed
        over an undefined measurement.  But an average computed
        over an undefined interval, is itself, undefined...
    
     4. On modern processors, user space utilities, such as
        turbostat(1), are more accurate and more precise, while
        supporting concurrent measurement over arbitrary intervals.
    
    Users who have been consulting /proc/cpuinfo to
    track changing CPU frequency will be disappointed that
    it no longer wiggles -- perhaps being unaware of the
    limitations of the information they have been consuming.
    
    Yes, they can change their scripts to look in sysfs
    cpufreq/scaling_cur_frequency.  Here they will find the same
    data of dubious quality here removed from /proc/cpuinfo.
    The value in sysfs will be addressed in a subsequent patch
    to address issues 1-3, above.
    
    Issue 4 will remain -- users that really care about
    accurate frequency information should not be using either
    proc or sysfs kernel interfaces.
    They should be using turbostat(8), or a similar
    purpose-built analysis tool.
    
    Signed-off-by: Len Brown <len.brown@intel.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
