Bug 209069 - CPU stuck at 800 MHz at any load - Xeon E3-1271v3 HSW
Summary: CPU stuck at 800 MHz at any load - Xeon E3-1271v3 HSW
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: All Linux
Importance: P1 normal
Assignee: linux-pm@vger.kernel.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-29 14:51 UTC by Torbjörn Granlund
Modified: 2021-04-21 02:16 UTC
CC List: 5 users

See Also:
Kernel Version: 5.8.5
Subsystem:
Regression: No
Bisected commit-id:


Attachments
attachment-11362-0.txt (487 bytes, text/plain)
2020-08-29 15:19 UTC, Torbjörn Granlund
foo (6.94 KB, application/octet-stream)
2020-08-29 15:19 UTC, Torbjörn Granlund
example of needed information (5.01 KB, text/plain)
2020-09-05 16:01 UTC, Doug Smythies
test (6.33 KB, text/plain)
2020-09-06 17:14 UTC, Oleh Vinichenko

Description Torbjörn Granlund 2020-08-29 14:51:11 UTC
Environment 1:
  OS:     GNU/Linux/Xen Gentoo 17.1 (ker=5.8.5 xen=4.12.3)
  mbd:    Supermicro X10SLH-F-O S1150 µATX (BIOS 3.3 2020-06-13)
  cpu:    Intel HWL X4 3600MHz (Xeon E3-1271v3, ECC)
  memory: 8192MB SDRAM DDR3L-1600 ECC (Samsung M391B1G73QH0-YK0Q)
  memory: 8192MB SDRAM DDR3L-1600 ECC (Samsung M391B1G73QH0-YK0Q)
  memory: 8192MB SDRAM DDR3L-1600 ECC (Samsung M391B1G73QH0-YK0Q)
  memory: 8192MB SDRAM DDR3L-1600 ECC (Samsung M391B1G73QH0-YK0Q)
  disk:   SATA SSD 2.5" 120GB Samsung SM863
  case:   Supermicro CSE-510T-203B

Environment 2. Almost identical, but with this cpu:
  cpu:    Intel BWL X4 3400MHz LLC=6M+128M (Xeon E3-1285Lv4)

I upgraded from 5.4.48 to 5.8.x for various versions of x to finally reach x = 5. Now, the systems get stuck at what is displayed as 800 MHz in /proc/cpuinfo.

When booted to run Xen, the problem goes away. When booting the (Xen Dom0-capable) kernel without Xen, the clock gets stuck at 800 MHz at any load.

It's certainly not just a problem with /proc/cpuinfo's displayed frequency; the systems are really, really sluggish.

A similarly configured Skylake system does NOT exhibit the same problem. (The motherboard of that system is Supermicro X11SSM.) Similarly configured Sandy Bridge and Westmere systems also do not exhibit this problem.
Comment 1 Barnabás Pőcze 2020-08-29 15:00:07 UTC
Can you check /sys/devices/system/cpu/cpu*/cpufreq/* and /sys/devices/system/cpu/intel_pstate/* ?
Comment 2 Torbjörn Granlund 2020-08-29 15:12:56 UTC
Correction: Sandy Bridge is also affected, but not as much.
Comment 3 Torbjörn Granlund 2020-08-29 15:19:14 UTC
Created attachment 292211 [details]
attachment-11362-0.txt

  Can you check /sys/devices/system/cpu/cpu*/cpufreq/* and
  /sys/devices/system/cpu/intel_pstate/* ?

Sure.

The result of
  head /sys/devices/system/cpu/cpu*/cpufreq/*
is in the attachment.
Comment 4 Torbjörn Granlund 2020-08-29 15:19:14 UTC
Created attachment 292213 [details]
foo
Comment 5 Torbjörn Granlund 2020-08-29 15:23:32 UTC
Comment 3 was truncated by bugzilla. After the attachment, this text occurred:

hwl ~ # head /sys/devices/system/cpu/intel_pstate/*
==> /sys/devices/system/cpu/intel_pstate/max_perf_pct <==
100

==> /sys/devices/system/cpu/intel_pstate/min_perf_pct <==
20

==> /sys/devices/system/cpu/intel_pstate/no_turbo <==
0

==> /sys/devices/system/cpu/intel_pstate/num_pstates <==
33

==> /sys/devices/system/cpu/intel_pstate/status <==
passive

==> /sys/devices/system/cpu/intel_pstate/turbo_pct <==
13
Comment 6 Doug Smythies 2020-08-29 17:25:13 UTC
intel_cpufreq CPU frequency driver (intel_pstate in passive mode), userspace governor with a setspeed of 0.8GHz. So, you are getting what you asked for.
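
The diagnosis above can be reproduced by reading a handful of cpufreq attributes. The following is only a minimal sketch, assuming the usual sysfs layout under /sys/devices/system/cpu; the helper names (read_attr, summarize_policy, pinned_by_userspace) are made up for illustration and are not part of any kernel tooling.

```python
from pathlib import Path

# Assumed default location of the cpufreq / intel_pstate sysfs tree.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_attr(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def summarize_policy(cpu=0, root=SYSFS_CPU):
    """Collect the attributes that identify the driver, mode and governor."""
    root = Path(root)
    policy = root / f"cpu{cpu}" / "cpufreq"
    return {
        "driver": read_attr(policy / "scaling_driver"),
        "governor": read_attr(policy / "scaling_governor"),
        # scaling_setspeed is a frequency in kHz only under the userspace
        # governor; with other governors it is not a plain number.
        "setspeed_khz": read_attr(policy / "scaling_setspeed"),
        "cur_khz": read_attr(policy / "scaling_cur_freq"),
        "pstate_status": read_attr(root / "intel_pstate" / "status"),
    }


def pinned_by_userspace(summary):
    """True when the userspace governor holds the CPU at a fixed setspeed."""
    speed = summary["setspeed_khz"]
    return (
        summary["governor"] == "userspace"
        and speed is not None
        and speed.isdigit()
    )


if __name__ == "__main__":
    info = summarize_policy()
    for key, value in info.items():
        print(f"{key}: {value}")
    if pinned_by_userspace(info):
        print("userspace governor is holding the CPU at its setspeed")
```

On the reporter's system this combination would be driver intel_cpufreq, governor userspace and a numeric setspeed, which is the case pinned_by_userspace() flags.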
Comment 7 Torbjörn Granlund 2020-08-29 18:07:42 UTC
bugzilla-daemon@bugzilla.kernel.org writes:

  intel_cpufreq CPU frequency driver (intel_pstate in passive mode), userspace
  governor with a setspeed of 0.8GHz. So, you are getting what you asked for.

I will gladly confess my ignorance about this.

But...I've maintained these systems for a long time, and gradually upgraded
the kernels (as well as the user space).  The CPU_FREQ_GOV_USERSPACE
setting was inherited from some old default, and indeed is used by many
older kernels.

Only 5.8.x suddenly makes Haswell and Broadwell run at 800 MHz.  Skylake is
not affected by this at all.  With 5.4.x, the systems run at their nominal
frequencies.

If this is a desired change in kernel behaviour, then many people might
see their systems slow down radically.
Comment 8 Doug Smythies 2020-09-02 18:36:11 UTC
My best guess is that with kernel 5.4.48 you were running the intel_pstate CPU frequency scaling driver in active mode, and that you are now running it in passive mode. As of kernel 5.8-rc1, processors without HWP default to passive mode. If you still have kernel 5.4.48, you could check.

Your Skylake system has HWP, and wouldn't default to passive mode.

However, you also mentioned Sandy Bridge and Westmere systems being unaffected.
Sandy Bridge doesn't have HWP, and Westmere might even pre-date the pstate driver; I don't know. Anyway, this seems inconsistent with my "best guess" above. I suggest you check them, using the same instructions from comment 1.

If you think, or conclude, that you used to run in active mode, you can either force active mode or force the schedutil governor in passive mode via the grub command line to get out of your locked low-frequency state.
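
As a runtime alternative to the grub command line, the same two choices can usually be made through sysfs. This is only a sketch under assumptions: it presumes the intel_pstate status file and the per-policy scaling_governor files are writable (root is required), the function names are hypothetical, and nothing here persists across a reboot.

```python
from pathlib import Path

# Assumed default location of the cpufreq / intel_pstate sysfs tree.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_pstate_mode(mode, root=SYSFS_CPU):
    """Write 'active' or 'passive' to intel_pstate/status (needs root)."""
    if mode not in ("active", "passive"):
        raise ValueError(f"unsupported mode: {mode!r}")
    (Path(root) / "intel_pstate" / "status").write_text(mode)


def set_governor(governor, root=SYSFS_CPU):
    """Select a governor on every policy that lists it as available.

    Returns the names of the CPUs whose governor was changed.
    """
    changed = []
    for policy in sorted(Path(root).glob("cpu[0-9]*/cpufreq")):
        available = (policy / "scaling_available_governors").read_text().split()
        if governor in available:
            (policy / "scaling_governor").write_text(governor)
            changed.append(policy.parent.name)
    return changed
```

Calling set_pstate_mode("active") corresponds to the first option, and set_governor("schedutil") while the driver stays in passive mode corresponds to the second. Either should release a CPU that the userspace governor is holding at its setspeed.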
Comment 9 Oleh Vinichenko 2020-09-05 14:33:18 UTC
I hit this issue as well, on a Lenovo ThinkPad W520 with a Sandy Bridge CPU, Intel(R) Core(TM) i7-2860QM. I gradually updated through the 5.8.* versions from older kernels while keeping the kernel configuration the same. If this change is the expected new default, there must be documentation about the new behaviour, what is required to keep the active state, and what needs to change in the kernel configuration, if anything. This caused massive compilation time regressions and should not be cast on users without update steps.
Comment 10 Oleh Vinichenko 2020-09-05 14:54:26 UTC
Downgrading to 5.4.62 restores what is supposed to be "normal" behaviour.
Comment 11 Doug Smythies 2020-09-05 16:01:44 UTC
Created attachment 292363 [details]
example of needed information

Please, we need more information. Do you know for certain that you were defaulting to "active" mode before and now to "passive" mode? What is your kernel configuration in the CPU frequency scaling area? What, if anything, is specified on your grub command line? What governor do you end up with?

Attached is an example of the needed information.
Comment 12 Oleh Vinichenko 2020-09-05 16:15:36 UTC
This was judged by indirect observation, such as significantly (two or three times) longer compilation times for packages or the kernel itself. I have never changed any governors and did not bother with this until I observed the problem by looking at /proc/cpuinfo: the frequency was always stuck at 800 MHz no matter what the load was, such as a compilation of chromium. I can provide technical data as in the example with kernel 5.4.62, at idle and under heavy load, and the same with kernel 5.8.6 too.
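
The /proc/cpuinfo observation described above can be made repeatable with a small parser. This is a rough sketch only; cpu_mhz is a hypothetical helper, and it assumes the x86 /proc/cpuinfo format in which each logical CPU has a "cpu MHz" line.

```python
def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value reported for each logical CPU, in order."""
    values = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            values.append(float(value))
    return values
```

Calling cpu_mhz(open("/proc/cpuinfo").read()) once at idle and once during a heavy compile shows whether the reported frequency ever leaves the 800 MHz floor.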
Comment 13 Oleh Vinichenko 2020-09-06 17:14:36 UTC
Created attachment 292375 [details]
test
Comment 14 Doug Smythies 2020-09-06 18:37:07 UTC
Thanks. I re-created your results on my no-hwp processor, by also setting this in my kernel configuration:

< CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
< # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
---
> # CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
> CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE=y

The related commits are:

a00ec3874e7d cpufreq: intel_pstate: Select schedutil as the default governor
33aa46f252c7 cpufreq: intel_pstate: Use passive mode by default without HWP

I had thought it would default to the schedutil governor, but it didn't.
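
To see which default governor a given kernel was built with, the .config text can be scanned for the option shown in the diff above. A minimal sketch, assuming the standard CONFIG_CPU_FREQ_DEFAULT_GOV_<NAME>=y naming; the function name default_governor is made up for illustration.

```python
import re

# Matches only an enabled option; "# CONFIG_... is not set" lines are skipped.
_GOV_LINE = re.compile(r"^CONFIG_CPU_FREQ_DEFAULT_GOV_([A-Z_]+)=y$")


def default_governor(config_text):
    """Return the governor selected by CONFIG_CPU_FREQ_DEFAULT_GOV_*=y, if any."""
    for line in config_text.splitlines():
        match = _GOV_LINE.match(line.strip())
        if match:
            return match.group(1).lower()
    return None
```

The input would typically be the contents of the running kernel's configuration file, wherever the distribution keeps it. Note that this reports only the build-time default, which, as this thread shows, is not necessarily the governor that ends up in use.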
Comment 15 Zhang Rui 2021-03-21 14:58:26 UTC
Hi, Doug,
I think you started a thread about this on the mailing list, right?
Do you have the latest update on this issue?
Comment 16 Doug Smythies 2021-03-21 15:48:04 UTC
Hi Rui,

Actually, e-mails about this bug report are on list, because it hasn't been assigned.

In my opinion, this one can be closed because

CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE=y

doesn't have a consistent meaning.
(And I argued this very point with Dirk Brandewie, the original author and maintainer of the intel_pstate CPU frequency scaling driver, around 2013)
