Bug 197945 - cpu frequency does not scale - Dell Lattitude E5530, Core i7
Summary: cpu frequency does not scale - Dell Lattitude E5530, Core i7
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: intel_pstate (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-21 01:27 UTC by Steven Haigh
Modified: 2017-11-30 20:23 UTC (History)
4 users (show)

See Also:
Kernel Version: 4.13.13-300.fc27.x86_64
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg output of Dell Lattitude E5530 (65.37 KB, text/plain)
2017-11-21 17:30 UTC, Steven Haigh
Details
dmesg with added dyndbg=file intel_pstate.c +p (67.38 KB, text/plain)
2017-11-22 10:40 UTC, Steven Haigh
Details
intel_pstate: A couple of debug printk()s (661 bytes, patch)
2017-11-22 14:11 UTC, Rafael J. Wysocki
Details | Diff

Description Steven Haigh 2017-11-21 01:27:25 UTC
model name      : Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz

When sitting idle, the cpu frequency does not decrease. This seems to lead to increased heat, battery usage, and a constantly running fan.

$ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
3461159
3446653
3515235
3496802

$ cd /sys/devices/system/cpu/cpufreq/policy0/
$ for i in *; do echo $i: $(cat $i); done
affected_cpus: 0
cpuinfo_max_freq: 3600000
cpuinfo_min_freq: 1200000
cpuinfo_transition_latency: 4294967295
related_cpus: 0
scaling_available_governors: performance powersave
scaling_cur_freq: 3511442
scaling_driver: intel_pstate
scaling_governor: powersave
scaling_max_freq: 3600000
scaling_min_freq: 1200000
scaling_setspeed: <unsupported>

Am happy to try and provide any further details.
Comment 1 Steven Haigh 2017-11-21 01:28:31 UTC
output of turbostat:

Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ     SMI     C1      C1E     C3      C6      C7      C1%     C1E%    C3%     C6%     C7%     CPU%c1  CPU%c3  CPU%c6  CPU%c7  CoreTmp PkgTmp  GFX%rc6 GFXMHz  Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7      PkgWatt CorWatt GFXWatt
-       -       215     6.24    3466    2891    4746    0       252     2034    481     0       10186   1.63    18.54   0.79    0.00    72.69   44.65   1.11    0.00    48.00   56      57      0.00    1250    0.00    0.00    0.00    0.0018.11    8.33    6.04
0       0       227     6.54    3473    2891    1010    0       58      673     93      0       2164    1.09    28.04   0.60    0.00    63.59   49.26   1.21    0.00    42.98   56      57      0.00    1250    0.00    0.00    0.00    0.0018.11    8.33    6.04
0       2       207     5.99    3470    2891    1206    0       51      465     154     0       2875    1.27    15.64   0.94    0.00    76.05   49.82
1       1       229     6.65    3460    2891    1566    0       103     407     125     0       2787    2.90    13.46   1.05    0.00    75.82   39.32   1.00    0.00    53.03   53
1       3       199     5.76    3462    2891    964     0       40      489     109     0       2360    1.28    16.99   0.57    0.00    75.30   40.20
Comment 2 Rafael J. Wysocki 2017-11-21 14:08:36 UTC
cpuinfo_transition_latency is bogus, but that shouldn't matter.

Something's odd.

Have you seen this with plain 4.13?

Have you tried 4.14?  If not, please do.
Comment 3 Rafael J. Wysocki 2017-11-21 16:42:32 UTC
Please also check what's in there in /sys/devices/system/cpu/intel_pstate/* and attach the output of dmesg (after a fresh boot) from a failing kernel.
Comment 4 Steven Haigh 2017-11-21 17:13:01 UTC
While I gather info, I have another laptop (none work issue) that does seem to work correctly:

From dmidecode:
Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: Dell Inc.
        Product Name: Inspiron 3537
        Version: A08
        Serial Number: 3ZW3SX1
        UUID: 4C4C4544-005A-5710-8033-B3C04F535831
        Wake-up Type: Power Switch
        SKU Number: Inspiron 3537
        Family: 00

model name      : Intel(R) Core(TM) i7-4500U CPU @ 1.80GHz

$ cat /proc/version
Linux version 4.13.13-300.fc27.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Wed Nov 15 15:47:50 UTC 2017

$ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
885425
982535
886804
798451

$ cd /sys/devices/system/cpu/cpufreq/policy0/
$ for i in *; do echo $i: $(cat $i); done
affected_cpus: 0
cpuinfo_max_freq: 3000000
cpuinfo_min_freq: 800000
cpuinfo_transition_latency: 4294967295
related_cpus: 0
scaling_available_governors: performance powersave
scaling_cur_freq: 2127263
scaling_driver: intel_pstate
scaling_governor: powersave
scaling_max_freq: 3000000
scaling_min_freq: 800000
scaling_setspeed: <unsupported>

$ cd /sys/devices/system/cpu/intel_pstate/
$ for i in *; do echo $i: $(cat $i); done
max_perf_pct: 100
min_perf_pct: 26
no_turbo: 0
num_pstates: 23
status: active
turbo_pct: 57

Give me a few minutes and I'll fire up the work laptop and gather exactly the same details again - plus the dmesg.
Comment 5 Steven Haigh 2017-11-21 17:29:40 UTC
From the work laptop that I originally lodged this issue about:

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: Dell Inc.
        Product Name: Latitude E5530 non-vPro
        Version: 01
        Serial Number: BNZDWW1
        UUID: 4C4C4544-004E-5A10-8044-C2C04F575731
        Wake-up Type: Power Switch
        SKU Number: Latitude E5530 non-vPro
        Family: Not Specified

$ cat /proc/version
Linux version 4.13.13-300.fc27.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #1 SMP Wed Nov 15 15:47:50 UTC 2017

$ cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
3410222
3114373
3138282
3263202

$ cd /sys/devices/system/cpu/cpufreq/policy0/
$ for i in *; do echo $i: $(cat $i); done
affected_cpus: 0
cpuinfo_max_freq: 3600000
cpuinfo_min_freq: 1200000
cpuinfo_transition_latency: 4294967295
related_cpus: 0
scaling_available_governors: performance powersave
scaling_cur_freq: 3093358
scaling_driver: intel_pstate
scaling_governor: performance
scaling_max_freq: 3600000
scaling_min_freq: 1200000
scaling_setspeed: <unsupported>

$ cd /sys/devices/system/cpu/intel_pstate/
$ for i in *; do echo $i: $(cat $i); done
max_perf_pct: 100
min_perf_pct: 33
no_turbo: 0
num_pstates: 25
status: active
turbo_pct: 29
Comment 6 Steven Haigh 2017-11-21 17:30:41 UTC
Created attachment 260761 [details]
dmesg output of Dell Lattitude E5530
Comment 7 Steven Haigh 2017-11-21 18:04:09 UTC
On the same machine, I also note this is interesting:

$ while true; do date; cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq; sleep 1; done
Wed Nov 22 05:02:21 AEDT 2017
3439173
3545192
3461566
3524594
Wed Nov 22 05:02:23 AEDT 2017
2080108
3490290
3531344
3451074
Wed Nov 22 05:02:24 AEDT 2017
1968543
2677264
1311234                                                                                                                                                      
3251583                                                                                                                                                      
Wed Nov 22 05:02:25 AEDT 2017                                                                                                                                                                                                                
1855877                                                                                                                                                                                                                                      
2264437                                                                                                                                                                                                                                      
3107480                                                                                                                                                                                                                                      
2258118                                                                                                                                                                                                                                      
Wed Nov 22 05:02:26 AEDT 2017                                                                                                                                                                                                                
1524171                                                                                                                                                                                                                                      
2365220                                                                                                                                                                                                                                      
1850871                                                                                                                                                                                                                                      
2442407                                                                                                                                                                                                                                      
Wed Nov 22 05:02:27 AEDT 2017                                                                                                                                                                                                                
2077300                                                                                                                                                                                                                                      
3090501                                                                                                                                                                                                                                      
3563860                                                                                                                                                                                                                                      
3480842                                                                                                                                                                                                                                      
Wed Nov 22 05:02:28 AEDT 2017                                                                                                                                                                                                                
3480785
3571819
3548358
3452782
Wed Nov 22 05:02:29 AEDT 2017
3403664
3126065
1941803
3404770
Wed Nov 22 05:02:30 AEDT 2017
2518917
3076549
1624617
3368908
Wed Nov 22 05:02:31 AEDT 2017
3077208
3345059
3514819
3426478
Wed Nov 22 05:02:32 AEDT 2017
1650530
3212238
3574962
3553168
Wed Nov 22 05:02:33 AEDT 2017
3507200
3540436
3441952
3564402
Wed Nov 22 05:02:35 AEDT 2017
2939500
3565573
3539760
3570695
Wed Nov 22 05:02:36 AEDT 2017
3361258
3412267
3425142
3363249
Wed Nov 22 05:02:37 AEDT 2017
1433039
3397464
3269907
3052990
Wed Nov 22 05:02:38 AEDT 2017
2866227
3021738
3380710
3566913
Wed Nov 22 05:02:39 AEDT 2017
1807272
3059071
2660091
3375225
Wed Nov 22 05:02:40 AEDT 2017
3085166
3569317
3567716
3473496
Wed Nov 22 05:02:41 AEDT 2017
1871541
3326360
1656168
3357178
Wed Nov 22 05:02:42 AEDT 2017
2424710
2968791
3399576
3092302

However, looking at the power states in 'powertop', I see the package stays mostly in C7.

          Package   |             Core    |            CPU 0       CPU 2
                    |                     | C0 active   0.7%        0.7%
                    |                     | POLL        0.0%    0.0 ms  0.0%    0.0 ms
                    |                     | C1E         0.3%    0.6 ms  0.1%    0.7 ms
C2 (pc2)    1.7%    |                     |
C3 (pc3)    0.0%    | C3 (cc3)    0.2%    | C3          0.1%    0.3 ms  0.0%    0.2 ms
C6 (pc6)    2.2%    | C6 (cc6)    0.0%    | C6          0.0%    0.7 ms  0.0%    0.0 ms
C7 (pc7)   72.7%    | C7 (cc7)   96.8%    | C7         98.4%   10.1 ms 98.8%    9.2 ms

                    |             Core    |            CPU 1       CPU 3
                    |                     | C0 active   0.8%        0.6%
                    |                     | POLL        0.0%    0.1 ms  0.0%    0.0 ms
                    |                     | C1E         0.2%    0.6 ms  0.3%    1.0 ms
                    |                     |
                    | C3 (cc3)    0.1%    | C3          0.1%    0.2 ms  0.1%    0.3 ms
                    | C6 (cc6)    0.0%    | C6          0.0%    0.0 ms  0.0%    0.0 ms
                    | C7 (cc7)   97.1%    | C7         98.7%    9.7 ms 98.7%   13.9 ms

                    |             GPU     |
                    |                     |
                    | Powered On 18.8%    |
                    | RC6        22.8%    |
                    | RC6p       58.4%    |
                    | RC6pp       0.0%    |
                    |                     |
                    |                     |
Comment 8 Rafael J. Wysocki 2017-11-21 21:54:39 UTC
On the affected machine, please do (as root)

# echo passive > /sys/devices/system/cpu/intel_pstate/status

then try to change the governor to "powersave" for all CPUs:

# for j in 0 1 2 3; do \
> echo powersave > /sys/devices/system/cpu/cpufreq/policy$j/scaling_governor
> done

and see what you get in scaling_cur_freq after that.

If that's 1200 MHz (as expected), try to change the governor to "schedutil" and see if the scaling works.
Comment 9 Steven Haigh 2017-11-22 00:37:34 UTC
Ok, as root:
# echo passive > /sys/devices/system/cpu/intel_pstate/status
# for j in 0 1 2 3; do echo powersave > /sys/devices/system/cpu/cpufreq/policy$j/scaling_governor; done
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1196490
1196197
1196154
1196139

Then:
# for j in 0 1 2 3; do echo schedutil > /sys/devices/system/cpu/cpufreq/policy$j/scaling_governor; done
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1196211
1196593
1497467
1196529
Comment 10 Rafael J. Wysocki 2017-11-22 01:19:02 UTC
OK, it looks like the problem is limited to intel_pstate in the active mode then.

Please reboot, check that /sys/devices/system/cpu/intel_pstate/status is "active" and then see if there is a file called "pstate_snb" in /sys/kernel/debug/ (as root).
Comment 11 Steven Haigh 2017-11-22 01:28:36 UTC
# cat /sys/devices/system/cpu/intel_pstate/status
active

# ls -l /sys/kernel/debug/pstate_snb
ls: cannot access '/sys/kernel/debug/pstate_snb': No such file or directory
Comment 12 Srinivas Pandruvada 2017-11-22 01:38:48 UTC
Run a tool which is present in the upstream Linux and attach output

tools/power/x86/intel_pstate_tracer/intel_pstate_tracer.py

It has some prerequisite packages which are described in the file header.

I think there is some constant IO activity.
Comment 13 Steven Haigh 2017-11-22 01:39:25 UTC
So I've just been observing the output of a 'watch cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq'.

I do notice that the frequency *does* actually drop - however it seems to raise again very quickly - even when the system is mostly idle.

Comparing to when using passive mode and a different governor - which seems to keep the lower clock frequency for longer - with the occasional increase, then back down.

Is this indicative of an issue? or is this how the pstate changing is supposed to work? Could it be that the pstate frequency changing is just a little ..... hyperactive?
Comment 14 Srinivas Pandruvada 2017-11-22 01:47:19 UTC
The output of the above tool will help us to find root cause.
Comment 15 Steven Haigh 2017-11-22 01:54:21 UTC
Trying to not turn this into a support session, but right now, I'm seeing:

# ./intel_pstate_tracer.py -i 10 -n test
python version (should be >= 2.7):
sys.version_info(major=2, minor=7, micro=14, releaselevel='final', serial=0)
Sleeping for  10 seconds
No valid data to plot

I've tried an interval of 1, 10, 60 - all with 'no valid data' errors.

What specifics are required kernel side for this?

The Fedora kernel has:
# grep -i trace /boot/config-4.13.14-300.fc27.x86_64
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_TRACEPOINTS=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
CONFIG_NETFILTER_XT_TARGET_TRACE=m
# CONFIG_DMA_FENCE_TRACE is not set
CONFIG_SCSI_IPR_TRACE=y
# CONFIG_VXGE_DEBUG_TRACE_ALL is not set
# CONFIG_ATH5K_TRACER is not set
# CONFIG_CAPI_TRACE is not set
# CONFIG_TRACE_SINK is not set
# CONFIG_F2FS_IO_TRACE is not set
# CONFIG_PSTORE_FTRACE is not set
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_STACKTRACE=y
# CONFIG_RCU_TRACE is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_TRACE_CLOCK=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_GENERIC_TRACER=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
CONFIG_SCHED_TRACER=y
CONFIG_HWLAT_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
# CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP is not set
CONFIG_STACK_TRACER=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
CONFIG_MMIOTRACE=y
# CONFIG_MMIOTRACE_TEST is not set
# CONFIG_TRACEPOINT_BENCHMARK is not set
CONFIG_TRACE_EVAL_MAP_FILE=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
Comment 16 Srinivas Pandruvada 2017-11-22 02:00:27 UTC
It is not finding trace file with contents

# echo 1 > sys/kernel/debug/tracing/events/power/pstate_sample/enable
let system be idle for 10 seconds and
then attach the file
# cat /sys/kernel/debug/tracing/trace
Comment 17 Steven Haigh 2017-11-22 02:14:52 UTC
I get the feeling I have to turn something else on:

$ echo 1 > /sys/kernel/debug/tracing/events/power/pstate_sample/enable
$ cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
Comment 18 Srinivas Pandruvada 2017-11-22 05:01:30 UTC
I guess you are using default Fedora kernel. You don't have your own kernel config. You seems to have all CONFIG for tracing and you don't see any sample. This is a problem and may be function doing scaling is not called.


Try this
# cd /sys/kernel/debug/tracing
# cat available_tracers
You should see "function" tracer

# echo function > current_tracer

# cat current_tracer

You should see "function"

# cat trace | grep -i intel_pstate

You should see something like..

 gnome-terminal--2304  [002] d...   220.566747: intel_pstate_update_util <-update_load_avg
            bash-2349  [003] d.h.   220.567031: intel_pstate_update_util <-task_tick_fair
          <idle>-0     [002] d.h.   220.567348: intel_pstate_update_util <-update_load_avg
     gnome-shell-1784  [000] d...   220.567373: intel_pstate_update_util <-update_load_avg
     gnome-shell-1784  [000] d...   220.567374: intel_pstate_adjust_pstate <-intel_pstate_update_util
Comment 19 Steven Haigh 2017-11-22 06:38:41 UTC
# cd /sys/kernel/debug/tracing
# # cat available_tracers
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
# echo function > current_tracer
# cat current_tracer
function
# cat trace | grep -i intel_pstate
# cat trace | grep -i intel_pstate
# cat trace | grep -i intel_pstate

Ah! Forgot a step:
# echo 1 > /sys/kernel/debug/tracing/events/power/pstate_sample/enable

However, still no output in the trace regarding intel_pstate
Comment 20 Steven Haigh 2017-11-22 06:39:51 UTC
Interestingly, if I back the grep down to 'intel', I see lots of:

          <idle>-0     [000] d... 16224.904736: leave_mm <-intel_idle
          <idle>-0     [001] d... 16224.904800: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.904839: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.904839: leave_mm <-intel_idle
          <idle>-0     [001] d... 16224.904864: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.905039: intel_idle <-cpuidle_enter_state
          <idle>-0     [001] d... 16224.905158: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.905193: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.905193: leave_mm <-intel_idle
          <idle>-0     [001] d... 16224.905215: intel_idle <-cpuidle_enter_state
          <idle>-0     [001] d... 16224.905253: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.905379: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.905379: leave_mm <-intel_idle
          <idle>-0     [001] d... 16224.905425: intel_idle <-cpuidle_enter_state
          <idle>-0     [000] d... 16224.905465: intel_idle <-cpuidle_enter_state
          <idle>-0     [001] d... 16224.905485: intel_idle <-cpuidle_enter_state
Comment 21 Srinivas Pandruvada 2017-11-22 07:36:39 UTC
Did you let it run for few seconds? Can you also increase the trace buffer size (You can increase buffer_size_in_kb in the same folder as trace file).

If you still don't see then we are not getting periodic callback. So the driver will not scale frequency.

We need to create a debug patch to check further.
Comment 22 Steven Haigh 2017-11-22 08:04:36 UTC
Hmmm ok.

I did:
# cat trace > ~/trace.txt
# ls -l ~/trace.txt
-rw-r--r-- 1 root root 17219952 Nov 22 19:03 /root/trace.txt

If it was all behaving, surely I'd see at least one hit in 17Mb of logs....
Comment 23 Srinivas Pandruvada 2017-11-22 09:13:42 UTC
Can you add this to kernel command line during boot and collect dmesg?
dyndbg="file intel_pstate.c +p"
Comment 24 Steven Haigh 2017-11-22 10:40:35 UTC
Created attachment 260779 [details]
dmesg with added dyndbg=file intel_pstate.c +p

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.13.14-300.fc27.x86_64 root=UUID=8b90bed8-43b9-4283-a5cf-0169acd438c7 ro rootflags=subvol=root rd.luks.uuid=luks-94704f71-460b-456d-9473-145077f9f772 rd.luks.uuid=luks-60898f8f-1e33-47fd-8a0a-b5add65027b0 rhgb quiet audit=0 LANG=en_AU.UTF-8 "dyndbg=file intel_pstate.c +p"
Comment 25 Rafael J. Wysocki 2017-11-22 13:38:38 UTC
@Srinivas: It sort of behaves as though the governor is "performance" even though it says "powersave".

@Steven: Would it be a problem to switch over to 4.14?  It may be somewhat easier to debug then.
Comment 26 Srinivas Pandruvada 2017-11-22 13:51:58 UTC
@Rafael: Correct. There is no trace so I guess cpufreq set_policy is called with PERFORMANCE.

Can you add a printk to check cpu->policy value
static int intel_pstate_set_policy(struct cpufreq_policy *policy)
{
...
...

	if (cpu->policy == CPUFREQ_POLICY_PERFORMANCE) {
            printk(KERN_ERR "intel_pstate policy is performance !!\n");
            ....
Comment 27 Rafael J. Wysocki 2017-11-22 14:11:35 UTC
Created attachment 260789 [details]
intel_pstate: A couple of debug printk()s

Please apply this debug patch to the kernel (should apply to 4.13.y too), boot with it and attach dmesg output.
Comment 28 Steven Haigh 2017-11-22 14:17:07 UTC
It seems there's an unofficial build of 4.14.0 for Fedora 28 - which should still run ok on Fedora 27...

Grabbed the packages from here:
https://koji.fedoraproject.org/koji/buildinfo?buildID=998399

# cat /proc/version
Linux version 4.14.0-1.fc28.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 7.2.1 20170829 (Red Hat 7.2.1-1) (GCC)) #1 SMP Mon Nov 13 14:09:27 UTC 2017

# cat /sys/devices/system/cpu/intel_pstate/status
active

# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1962729
3390637
1351553
3507481
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1291050
1237406
1598442
1540077
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1293710
1240327
1774801
3129251

# cd /sys/devices/system/cpu/cpufreq/policy0/
# for i in *; do echo $i: $(cat $i); done
affected_cpus: 0
cpuinfo_max_freq: 3600000
cpuinfo_min_freq: 1200000
cpuinfo_transition_latency: 0
related_cpus: 0
scaling_available_governors: performance powersave
scaling_cur_freq: 2121493
scaling_driver: intel_pstate
scaling_governor: performance
scaling_max_freq: 3600000
scaling_min_freq: 1200000
scaling_setspeed: <unsupported>

# cd /sys/devices/system/cpu/intel_pstate/
# for i in *; do echo $i: $(cat $i); done
max_perf_pct: 100
min_perf_pct: 33
no_turbo: 0
num_pstates: 25
status: active
turbo_pct: 29

Switching to powersave for testing:
# for j in 0 1 2 3; do echo powersave > /sys/devices/system/cpu/cpufreq/policy$j/scaling_governor; done;

Confirm change:
# cd /sys/devices/system/cpu/cpufreq/policy0/
# for i in *; do echo $i: $(cat $i); done
affected_cpus: 0
cpuinfo_max_freq: 3600000
cpuinfo_min_freq: 1200000
cpuinfo_transition_latency: 0
related_cpus: 0
scaling_available_governors: performance powersave
scaling_cur_freq: 1294849
scaling_driver: intel_pstate
scaling_governor: powersave
scaling_max_freq: 3600000
scaling_min_freq: 1200000
scaling_setspeed: <unsupported>

This looks like expected behaviour:
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1196386
1196213
1196168
1196264
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1196599
1196222
1196697
1196737
# cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq
1196215
1196099
1196179
1196807
Comment 29 Rafael J. Wysocki 2017-11-22 14:29:36 UTC
OK, so it looks like 4.14 behaves correctly.

If that's the case, no need to test the patch I've attached, but please verify that the problem is really not there.
Comment 30 Steven Haigh 2017-11-22 14:31:42 UTC
Yeah - its just hit 1:30am now - so I'll check it out tomorrow at the office. Usually I notice if its behaving properly as the laptop fan turns off :)

At some point, the fan just started running constantly - which is what lead me down this path.

I'll observe for a while and see how we go with 4.14.0. If that is actually resolved fully, then maybe it indicates a change responsible.... but we'll see what the results are first :)
Comment 31 Rafael J. Wysocki 2017-11-22 22:08:41 UTC
The observed symptoms can be explained by issues with the tracking of time, which also would explain why some systems are affected and the others aren't.
Comment 32 Steven Haigh 2017-11-24 01:22:55 UTC
Ok - so after a bit of tinkering and trial / error, I booted today into 4.13.15 kernel. I noticed much more CPU fan activity (the thing is loud!). This is when I noticed I booted into 4.13.15 - and not the expected 4.14.0 kernel.

After validating this and rebooting into the 4.14.0 kernel, there is certainly lower CPU fan activity - and looking at the lmsensors output, a reduction of ~6-10 degrees C on the same workload while booted into 4.14.0.

Reported temps on both cores hover around 50-56c using kernel 4.13.15 - but reduce to 36-45c on 4.14.0.

The general workloads between measurements are roughly the same - browsing, email, ssh etc.
Comment 33 Rafael J. Wysocki 2017-11-27 00:32:28 UTC
OK

Just to be sure, please try to run intel_pstate_tracer.py as per comment #12, but this time with 4.14.0(+).
Comment 34 Steven Haigh 2017-11-27 00:49:22 UTC
I still don't seem to get any output using the tracer.

# echo 1 > /sys/kernel/debug/tracing/tracing_on
# echo function > /sys/kernel/debug/tracing/current_tracer
# echo 1 > /sys/kernel/debug/tracing/events/power/pstate_sample/enable
# ./intel_pstate_tracer.py -i 10 -n test
python version (should be >= 2.7):
sys.version_info(major=2, minor=7, micro=14, releaselevel='final', serial=0)
Sleeping for  10 seconds
No valid data to plot

It seems that the script disables the pstate_sample/enable correctly after running - so I'm not quite sure what's going on.
Comment 35 Rafael J. Wysocki 2017-11-27 13:34:03 UTC
OK

Please run

# turbostat --quiet --show CPU,Avg_MHz,Busy%,Bzy_MHz \
--add msr0x199,u32,raw,PERF_CTL --interval 3

with 4.14 for 30 s or so (~ 10 iterations) and attach the output.
Comment 36 Steven Haigh 2017-11-28 01:02:43 UTC
Fedora 27 - using kernel 4.14.0 from f28):

# turbostat --quiet --show CPU,Avg_MHz,Busy%,Bzy_MHz --add msr0x199,u32,raw,PERF_CTL --interval 3
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       187     5.74    3275    0x00000000
0       170     5.12    3321    0x00002400
2       172     5.24    3297    0x00002400
1       195     6.19    3164    0x00002400
3       213     6.42    3326    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       174     5.39    3236    0x00000000
0       156     4.72    3314    0x00002400
2       161     4.99    3235    0x00002400
1       187     6.10    3082    0x00002400
3       191     5.74    3337    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       173     5.33    3257    0x00000000
0       155     4.66    3339    0x00002400
2       169     5.20    3263    0x00002400
1       196     6.25    3143    0x00002400
3       173     5.23    3314    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       188     5.73    3301    0x00000000
0       209     6.19    3390    0x00002400
2       149     4.60    3258    0x00002400
1       213     6.66    3204    0x00002400
3       182     5.46    3353    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       168     5.17    3267    0x00000000
0       203     6.05    3375    0x00002400
2       163     5.01    3267    0x00002400
1       159     5.11    3119    0x00002400
3       148     4.51    3292    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       161     4.98    3249    0x00000000
0       197     5.93    3341    0x00002400
2       151     4.59    3303    0x00002400
1       170     5.45    3124    0x00002400
3       127     3.96    3218    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       168     5.20    3237    0x00000000
0       130     3.94    3301    0x00002400
2       194     5.93    3285    0x00002400
1       196     6.32    3118    0x00002400
3       151     4.63    3282    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       166     5.12    3254    0x00000000
0       140     4.22    3326    0x00002400
2       169     5.16    3290    0x00002400
1       191     6.13    3124    0x00002400
3       164     4.96    3315    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       178     5.45    3277    0x00000000
0       157     4.69    3357    0x00002400
2       170     5.22    3277    0x00002400
1       190     6.08    3136    0x00002400
3       195     5.82    3361    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       164     5.00    3292    0x00000000
0       173     5.12    3397    0x00002400
2       159     4.83    3307    0x00002400
1       168     5.32    3168    0x00002400
3       156     4.72    3305    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       167     5.08    3288    0x00000000
0       166     4.95    3370    0x00002400
2       163     4.92    3321    0x00002400
1       191     6.03    3183    0x00002400
3       146     4.43    3304    0x00002400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       170     5.15    3314    0x00000000
0       168     4.99    3374    0x00002400
2       169     5.12    3315    0x00002400
1       179     5.58    3211    0x00002400
3       165     4.91    3370    0x00002400
Comment 37 Srinivas Pandruvada 2017-11-28 03:21:31 UTC
Since there is no callback and limits are set to max, this is performance mode.
Is it possible that you have some script which is changing in policy on a specific CPU?

Can you try
# cpupower frequency-set -g powersave

Also output with debug patch by Rafael and suggested printk will be great. This is in addition to dyndbg in kernel command line to enable debug logs for intel_pstate.
Comment 38 Steven Haigh 2017-11-28 06:14:19 UTC
# cpupower frequency-set -g powersave                                                            
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
# turbostat --quiet --show CPU,Avg_MHz,Busy%,Bzy_MHz --add msr0x199,u32,raw,PERF_CTL --interval 3
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       322     12.20   2648    0x00000000
0       242     10.03   2423    0x00001400
2       518     18.56   2799    0x00001500
1       342     12.57   2727    0x00001500
3       186     7.65    2444    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       171     8.45    2031    0x00000000
0       159     7.81    2041    0x00001500
2       196     9.51    2062    0x00001500
1       167     8.45    1984    0x00001600
3       163     8.04    2032    0x00001600
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       171     8.09    2127    0x00000000
0       184     8.44    2182    0x00001400
2       171     7.71    2226    0x00001500
1       206     9.62    2151    0x00001400
3       125     6.59    1905    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       162     8.19    1985    0x00000000
0       155     7.77    1998    0x00001500
2       183     9.49    1939    0x00001600
1       151     7.64    1985    0x00001600
3       159     7.86    2027    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       160     7.97    2013    0x00000000
0       150     7.53    2002    0x00001500
2       161     8.23    1961    0x00001500
1       165     7.90    2094    0x00001500
3       163     8.21    1997    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       169     8.45    2002    0x00000000
0       148     7.90    1880    0x00001400
2       188     9.23    2048    0x00001400
1       163     8.26    1976    0x00001400
3       176     8.43    2091    0x00001400
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       168     8.34    2024    0x00000000
0       130     6.48    2011    0x00001600
2       173     8.66    2005    0x00001400
1       166     8.20    2027    0x00001500
3       205     10.04   2047    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       175     8.49    2068    0x00000000
0       187     9.28    2019    0x00001500
2       205     9.36    2203    0x00001400
1       167     8.27    2025    0x00001500
3       141     7.07    2004    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       353     13.47   2631    0x00000000
0       237     10.17   2334    0x00001600
2       246     10.63   2321    0x00001500
1       179     7.92    2264    0x00001600
3       751     25.13   2998    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       187     8.92    2109    0x00000000
0       205     9.87    2083    0x00001500
2       183     8.68    2116    0x00001400
1       162     7.49    2168    0x00001500
3       200     9.64    2083    0x00001500
CPU     Avg_MHz Busy%   Bzy_MHz   PERF_CTL
-       171     8.31    2060    0x00000000
0       190     8.80    2161    0x00001600
2       167     8.07    2071    0x00001500
1       146     7.17    2043    0x00001500
3       181     9.21    1967    0x00001600
^C

I don't have a buildroot handy for Fedora 27 to build the kernel - so I can't quite promise anything here at the moment...
Comment 39 Rafael J. Wysocki 2017-11-28 21:13:26 UTC
Well, it appears to be working.

You still can try to run intel_pstate_tracer.py with scaling_governor set to "powersave" (like in the previous comment).
Comment 40 Srinivas Pandruvada 2017-11-30 00:08:46 UTC
As Rafael noted, the scaling seems to work. I am thinking that you have some CPU policy is performance and some powersave. cpupower will keep all same. So your work computer may have some scripts.
Comment 41 Steven Haigh 2017-11-30 06:45:53 UTC
I believe the thing is that the 4.14.0 kernel works as it should - but the 4.13.x kernel does not.

I certainly have no heat / fan issues under 4.14.0.
Comment 42 Srinivas Pandruvada 2017-11-30 10:37:13 UTC
But the log you added to comment 36 shows that 4.14 kernel is requesting performance mode always.
Your log in comment 38 shows different dump (which is correct). So it is confusing.
Comment 43 Rafael J. Wysocki 2017-11-30 20:23:21 UTC
Anyway, since the problem is not reproducible with 4.14, I'm regarding it as fixed in the mainline.

4.13.y will be discontinued at one point, so you will need to switch over to 4.14.y anyway.

Note You need to log in before you can comment on or make changes to this bug.