Bug 121051

Summary: Offline CPUs stick at maximum clock frequency after resume from suspend
Product: Platform Specific/Hardware
Reporter: Doug Smythies (dsmythies)
Component: x86-64
Assignee: Chen Yu (yu.c.chen)
Status: RESOLVED DOCUMENTED
Severity: normal
CC: lenb, rjw, rui.zhang, srinivas.pandruvada, yu.c.chen
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.7-rc4
Subsystem:
Regression: No
Bisected commit-id:

Description Doug Smythies 2016-06-27 14:52:00 UTC
If, for whatever reason, some CPUs are offline during a suspend/resume cycle, they end up spinning and consuming a lot of power after the resume. This issue has existed for at least a couple of years now. It is as though they were forgotten, and nothing told them to stay offline during the resume.

This bug report replaces bug 80651, because the original poster keeps setting that one to closed, and we don't want to lose track of the issue. See that bug report for some of the history.

Example:

1.) Before (edited for readability):
$ sudo turbostat -S --debug sleep 10
 Avg_MHz   Busy% Bzy_MHz TSC_MHz CPU%c6 PkgWatt
       1    0.08    1615    3411 99.66  3.81

2.) Take some CPUs offline (this is an i7-2600K):
# echo -n 0 > /sys/devices/system/cpu/cpu1/online
# echo -n 0 > /sys/devices/system/cpu/cpu2/online
# echo -n 0 > /sys/devices/system/cpu/cpu3/online
# echo -n 0 > /sys/devices/system/cpu/cpu5/online
# echo -n 0 > /sys/devices/system/cpu/cpu6/online
# echo -n 0 > /sys/devices/system/cpu/cpu7/online
# cat /sys/devices/system/cpu/cpu*/online
0
0
0
1
0
0
0

3.) Do a suspend / resume cycle:
# echo mem > /sys/power/state

4.) Check now (edited for readability):
$ sudo turbostat -S --debug sleep 10
 Avg_MHz   Busy% Bzy_MHz TSC_MHz CPU%c6 PkgWatt
      23    0.67    3403    3411 98.61  35.97

5.) Bring the CPUs back online:
# echo mem > /sys/power/state
# echo -n 1 > /sys/devices/system/cpu/cpu1/online
# echo -n 1 > /sys/devices/system/cpu/cpu2/online
# echo -n 1 > /sys/devices/system/cpu/cpu3/online
# echo -n 1 > /sys/devices/system/cpu/cpu5/online
# echo -n 1 > /sys/devices/system/cpu/cpu6/online
# echo -n 1 > /sys/devices/system/cpu/cpu7/online
# cat /sys/devices/system/cpu/cpu*/online
1
1
1
1
1
1
1

6.) Check now (edited for readability):
 Avg_MHz   Busy% Bzy_MHz TSC_MHz CPU%c6 PkgWatt
       0    0.03    1651    3410 99.86  4.03
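Steps 2 and 5 can also be scripted. The following is a minimal Python sketch, assuming root privileges and the same `/sys/devices/system/cpu/cpuN/online` files used above; the helper names `set_online` and `online_state` are hypothetical, and the `root` parameter exists only so the functions can be tried against a scratch directory.

```python
from pathlib import Path

# Real sysfs location; pass another directory as `root` to try this safely.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def set_online(cpus, online, root=SYSFS_CPU):
    """Write 1 (online) or 0 (offline) to cpuN/online for each CPU. Needs root."""
    for cpu in cpus:
        node = root / f"cpu{cpu}" / "online"
        if not node.exists():
            # e.g. cpu0 here: no online file, so it cannot be taken offline
            raise FileNotFoundError(f"{node} does not exist")
        node.write_text("1" if online else "0")


def online_state(root=SYSFS_CPU):
    """Return {cpu_number: is_online} for CPUs that expose an online file."""
    state = {}
    for node in root.glob("cpu[0-9]*/online"):
        cpu = int(node.parent.name[3:])  # "cpu12" -> 12
        state[cpu] = node.read_text().strip() == "1"
    return dict(sorted(state.items()))
```

For step 2, `set_online([1, 2, 3, 5, 6, 7], online=False)` offlines the same CPUs as the `echo -n 0` commands; step 5 is the same call with `online=True`. Since cpu0 has no `online` file on this machine, asking for it raises an error instead of being silently skipped.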
Comment 1 Doug Smythies 2016-06-29 22:46:54 UTC
By the way, this same issue exists if I use the acpi-cpufreq CPU frequency scaling driver instead of the intel_pstate CPU frequency scaling driver. So while I copied the component settings from the other bug report, this actually doesn't appear to be intel_pstate specific.

@Chen Yu: From your comment 38 in the old bug report (https://bugzilla.kernel.org/show_bug.cgi?id=80651#c38), am I to assume the problem does not exist on your computer? i.e. is the issue perhaps hardware and/or distro specific?
Comment 2 Chen Yu 2016-07-29 03:04:58 UTC
(In reply to Doug Smythies from comment #1)
> By the way, this same issue exists if I use the acpi-cpufreq CPU frequency
> scaling driver instead of the intel_pstate CPU frequency scaling driver. So
> while I copied the component settings from the other bug report, this
> actually doesn't appear to be intel_pstate specific.
> 
> @Chen Yu: From your comment 38 in the old bug report
> (https://bugzilla.kernel.org/show_bug.cgi?id=80651#c38), am I to assume the
> problem does not exist on your computer? i.e. is the issue perhaps hardware
> and/or distro specific?

@Doug: I hadn't noticed the problem on my laptop before, so this is interesting. Is the problem still reproducible? I think we can use intel_pstate trace data from before and after suspend to figure out whether it is a hardware issue or a software bug.
Comment 3 Doug Smythies 2016-07-29 06:19:02 UTC
(In reply to Chen Yu from comment #2)
> @Doug: Previously I didn't notice the problem on my laptop, and this is
> interesting and is this problem still reproducible? I think we can leverage
> intel_pstate trace both before/after suspend to figure it out whether it is
> a hw issue or a software bug?

@Chen Yu. Yes, the problem is still reproducible with Kernel 4.7.

Personally, I don't think we can learn anything new from intel_pstate trace data. However, I did take before and after suspend traces with 6 of 8 CPUs offline (3 of 4 cores). The trace data was consistent with what we already know: after the suspend, the CPUs that are supposed to be offline are spinning away. I assume their vote into the PLL is what is holding the CPU frequency high, and the high frequency also messes up the intel_pstate driver math for the 2 CPUs that are still online.

P-state being granted (IA32_PERF_STATUS, MSR 0x198):
$ sudo rdmsr --bitfield 15:8 -d -a 0x198
35
35

P-state being requested (IA32_PERF_CTL, MSR 0x199):
$ sudo rdmsr --bitfield 15:8 -d -a 0x199
38
16

For reference (excerpt from turbostat):
cpu4: MSR_TURBO_RATIO_LIMIT: 0x23242526
35 * 100 = 3500 MHz max turbo 4 active cores
36 * 100 = 3600 MHz max turbo 3 active cores
37 * 100 = 3700 MHz max turbo 2 active cores
38 * 100 = 3800 MHz max turbo 1 active cores

In other words, the granted ratio (35) is the limit for 4 active cores even though 38 was requested: the processor thinks all 4 cores are online, but the kernel thinks only 1 core is online.
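The numbers above can be checked by hand. This is a small Python sketch (all helper names are hypothetical) that extracts bits 15:8 the way `rdmsr --bitfield 15:8` does, decodes MSR_TURBO_RATIO_LIMIT one byte per active-core count, and lists which core counts match a granted ratio. Matching a granted ratio to a turbo limit is only a heuristic: it assumes the request exceeded that limit and that no power or thermal limit was in effect. `read_msr` assumes root and a loaded `msr` module, and, like `rdmsr -a`, it only works for online CPUs.

```python
import os


def bitfield(value, hi, lo):
    """Bits hi..lo (inclusive) of value, like `rdmsr --bitfield hi:lo`."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def turbo_ratio_limits(msr_value, cores=4):
    """MSR_TURBO_RATIO_LIMIT: byte n holds the max ratio with n+1 active cores."""
    return {n + 1: bitfield(msr_value, 8 * n + 7, 8 * n) for n in range(cores)}


def implied_active_cores(granted_ratio, limits):
    """Active-core counts whose turbo limit equals the granted ratio."""
    return [n for n, ratio in limits.items() if ratio == granted_ratio]


def read_msr(cpu, address):
    """Read a 64-bit MSR via the msr driver (root, msr module, online CPUs only)."""
    with open(f"/dev/cpu/{cpu}/msr", "rb") as f:
        return int.from_bytes(os.pread(f.fileno(), 8, address), "little")
```

With the value from this comment, `turbo_ratio_limits(0x23242526)` gives 38, 37, 36 and 35 for 1 to 4 active cores, and only the 4-core limit matches the granted ratio of 35. `bitfield(read_msr(0, 0x198), 15, 8)` corresponds to the first `rdmsr` command above.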
Comment 4 Doug Smythies 2017-01-05 01:13:14 UTC
This issue is still reproducible with Kernel 4.10-rc2.
The status is "NEEDINFO", but I don't know what further information is required.
Comment 5 Doug Smythies 2017-03-13 18:48:15 UTC
This issue is still reproducible with Kernel 4.11-rc2.
Comment 6 Len Brown 2017-04-04 00:13:28 UTC
> # cat /sys/devices/system/cpu/cpu*/online
> 0
> 0
> 0
> 1
> 0
> 0
> 0

Is cpu0 offline in this experiment?
Why are there 7 rows above, instead of 8?
Comment 7 Len Brown 2017-04-04 00:26:39 UTC
(ignore comment #6, didn't realize that cpu0 had no /online file)
Comment 8 Len Brown 2017-04-04 04:56:05 UTC
I see this issue also, on my Skylake desktop, running upstream 4.11-rc5.

To reproduce:
offline some processors, suspend, resume, and with turbostat, observe that:

1. there is no longer any package C-state residency
2. PkgWatt of the active idle system is over 6W, compared to 1W before suspend
3. active idle Bzy_MHz of the available CPU(s) is over 3GHz,
   as compared to 800 MHz before the suspend.
4. as reported above, the max turbo frequency is impacted.
    e.g. on my machine the max turbo for 4,3,2,1 cores is 3600,3700,3800,3900 MHz

    so a spinloop with 1 CPU online before the suspend runs at 3900 MHz,
    but after the resume, it runs at 3600 MHz.

This looks like a BIOS bug.
Upon resume from S3, the BIOS should put offline processors in C6,
just like it (correctly) does on boot.

For Linux to work around this bug, it would have to get them
out of the BIOS by onlining them under Linux, and then offline them again
to get back to the user-requested configuration.
It is unclear whether we'd have to do that always, on all systems, or
whether there is a way to detect when such a workaround is needed.
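The workaround described above (online the CPUs under Linux, then offline them again) could be automated from userspace. Below is a minimal Python sketch, not a tested fix: it assumes root, and assumes installation as an executable hook in systemd's `system-sleep` directory (e.g. `/usr/lib/systemd/system-sleep/`), whose hooks systemd runs with `pre` or `post` as the first argument. The function names are hypothetical, and whether this restores package C-states on a given machine should be verified with turbostat.

```python
import sys
from pathlib import Path

# Real sysfs location; pass another directory as `root` to try this safely.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def offline_cpus(root=SYSFS_CPU):
    """Return the numbers of CPUs whose online file currently reads 0."""
    cpus = []
    for node in root.glob("cpu[0-9]*/online"):
        if node.read_text().strip() == "0":
            cpus.append(int(node.parent.name[3:]))  # "cpu5" -> 5
    return sorted(cpus)


def kick_offline_cpus(root=SYSFS_CPU):
    """Online every offline CPU, then offline them all again.

    Onlining hands each CPU back to Linux; the second pass restores the
    user-requested configuration, this time through the kernel's own
    offline path rather than whatever state firmware left behind.
    """
    cpus = offline_cpus(root)
    for cpu in cpus:
        (root / f"cpu{cpu}" / "online").write_text("1")
    for cpu in cpus:
        (root / f"cpu{cpu}" / "online").write_text("0")
    return cpus


def main(argv):
    # Assumed hook convention: argv[1] is "pre" before sleep, "post" after resume.
    if len(argv) > 1 and argv[1] == "post":
        kicked = kick_offline_cpus()
        print(f"re-offlined CPUs after resume: {kicked}")


if __name__ == "__main__":
    main(sys.argv)
```

Run without a `post` argument, the script does nothing, so it is harmless before suspend.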

That said...
In testing this issue I noticed that Ubuntu 15.10 ships with a systemd
configuration that uses cpusets. cpusets are broken by manual offline/online,
so if you offline/online as above, you'll find that no tasks
will run on the newly onlined processors.
Comment 9 Doug Smythies 2017-04-04 06:47:53 UTC
(In reply to Len Brown from comment #8)

> That said...
> In testing this issue I noticed that Ubuntu 15.10

Ubuntu 15.10 is past end of life.

> ships with a systemd
> configuration that uses cpusets.  cpusets are broken by manual offline/online,
> and so if you offline/online as above, you'll find that no tasks
> will run on the newly online'd processors.

That is not consistent with my findings. Once I online the CPUs, and as far as I have ever been able to determine, everything is normal, including scheduling.
(I use Ubuntu 16.04 LTS on my test server.)
Comment 10 Doug Smythies 2017-07-16 21:48:15 UTC
This issue is still reproducible with Kernel 4.13-rc1.
Comment 11 Chen Yu 2018-01-24 07:59:36 UTC
Reproduced on my HP Kaby Lake (KBL) platform using the 4.14 kernel. HWP is enabled on this platform.
For comparison, I tested with /sys/power/pm_test set to core, and the issue does not appear. That suggests the BIOS may be doing something strange across S3.
I noticed the following description of the IA32_HWP_REQUEST register in the SDM:

Maximum_Performance (bits 15:8, RW) —
Excursions above the limit requested by OS are possible due to hardware coordination between the processor cores and other components in the package.

although I have no clue what "coordination" means here.

@Doug, I think you are using non-HWP mode, right?
Comment 12 Doug Smythies 2018-01-24 14:57:57 UTC
(In reply to Chen Yu from comment #11)

> 
> @Doug, I think you are using non-HWP mode, right?

Correct. My older i7-2600K processor does not have HWP.
Comment 13 Srinivas Pandruvada 2018-01-25 00:44:08 UTC
Chen Yu:
- First, check the IA32_HWP_REQUEST MSR values on all the CPUs.
- Check the graphics frequencies:
  cat /sys/class/drm/card0/gt*

It is possible that graphics is forcing higher frequencies.
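For the first suggestion, IA32_HWP_REQUEST (MSR 0x774) can be read and split into its fields; bits 15:8 are the Maximum_Performance field quoted from the SDM in comment 11. This is a minimal Python sketch assuming root and a loaded `msr` module; the function names are hypothetical. The per-CPU `/dev/cpu/N/msr` nodes are only usable for online CPUs, so this cannot inspect the CPUs that are offline but spinning.

```python
import os

IA32_HWP_REQUEST = 0x774


def decode_hwp_request(value):
    """Split an IA32_HWP_REQUEST value into its fields (bit layout per the SDM)."""
    return {
        "min_perf": value & 0xFF,                  # bits 7:0
        "max_perf": (value >> 8) & 0xFF,           # bits 15:8
        "desired_perf": (value >> 16) & 0xFF,      # bits 23:16
        "epp": (value >> 24) & 0xFF,               # bits 31:24, energy/perf preference
        "activity_window": (value >> 32) & 0x3FF,  # bits 41:32
        "package_control": (value >> 42) & 0x1,    # bit 42
    }


def read_hwp_requests(cpus):
    """Read and decode IA32_HWP_REQUEST on each listed (online) CPU."""
    result = {}
    for cpu in cpus:
        with open(f"/dev/cpu/{cpu}/msr", "rb") as f:
            raw = os.pread(f.fileno(), 8, IA32_HWP_REQUEST)
        result[cpu] = decode_hwp_request(int.from_bytes(raw, "little"))
    return result
```

Comparing `max_perf` across CPUs before and after resume would show whether the OS request itself changed, as opposed to the hardware excursions the SDM text mentions.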
Comment 14 Doug Smythies 2018-01-25 07:21:39 UTC
(In reply to Srinivas Pandruvada from comment #13)
> Chen Yu:
> - First check HWP_REQUEST_MSR values on all the cpus.

How? The kernel thinks the CPUs are offline (but they actually are not), so we cannot query the state of any of their MSRs; at least, I have not been able to figure out how.

> -
> cat /sys/class/drm/card0/gt*
> 
> It is possible that graphics is forcing higher frequencies.

At least in my case, it isn't. I get the exact same numbers before and after creating this problem:

doug@s15:~/temp-k-git/linux$ grep . /sys/class/drm/card0/gt*
/sys/class/drm/card0/gt_act_freq_mhz:850
/sys/class/drm/card0/gt_boost_freq_mhz:1650
/sys/class/drm/card0/gt_cur_freq_mhz:850
/sys/class/drm/card0/gt_max_freq_mhz:1350
/sys/class/drm/card0/gt_min_freq_mhz:850
/sys/class/drm/card0/gt_RP0_freq_mhz:1350
/sys/class/drm/card0/gt_RP1_freq_mhz:850
/sys/class/drm/card0/gt_RPn_freq_mhz:850

Kernel = 4.15-rc9.
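The before/after comparison in this comment can be automated: snapshot every `gt*` attribute of the card, trigger the problem, snapshot again, and report only what changed. A minimal Python sketch; `snapshot` and `changed` are hypothetical helper names, and `card0` is assumed to be the Intel graphics device, as above.

```python
from pathlib import Path


def snapshot(card=Path("/sys/class/drm/card0")):
    """Read the gt* attribute files of a DRM card into {name: value}."""
    return {p.name: p.read_text().strip()
            for p in sorted(card.glob("gt*")) if p.is_file()}


def changed(before, after):
    """Attributes whose values differ between two snapshots, as (before, after)."""
    return {k: (before.get(k), after.get(k))
            for k in sorted(before.keys() | after.keys())
            if before.get(k) != after.get(k)}
```

An empty result from `changed(before, after)` corresponds to getting "the exact same numbers" before and after.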
Comment 15 Chen Yu 2018-01-25 07:25:57 UTC
Actually, there is a problem with how HWP is handled after resume. I'll test with a fix patch applied and check the DRM frequencies. @Doug: I think we have encountered different issues.
Comment 16 Chen Yu 2018-01-25 11:12:07 UTC
(In reply to Chen Yu from comment #15)
> Actually there's a problem when dealing with hwp after resumed, I'll test
> with fix patch applied and check the drm freq. @Doug I think we encountered
> different issues.

Sent to https://patchwork.kernel.org/patch/10183901/
Hmm, with this patch applied, the high-frequency issue disappeared, because HWP works correctly after resume.
Comment 17 Doug Smythies 2018-01-25 15:45:37 UTC
(In reply to Chen Yu from comment #15)
> @Doug I think we encountered different issues.

Yes, I agree. See also my comment 1 above, where the issue is also present when using the acpi-cpufreq driver.
Comment 18 Chen Yu 2018-03-26 02:20:05 UTC
(In reply to Doug Smythies from comment #17)
> (In reply to Chen Yu from comment #15)
> > @Doug I think we encountered different issues.
> 
> Yes, I agree. See also my comment 1 above, where the issue is also present
> when using the acpi-cpufreq driver.

Coming back to this now.
Per Len's information, there is no PC6 residency after resume. Since this is more likely a BIOS issue, how about the following "solution":

1. Try to detect this symptom in turbostat.
2. When all the following conditions are met, print a warning and ask the
   user to do a manual online->offline sequence for the CPUs:
   2.1 Periodically record Busy% and PC6 residency while all CPUs are online, and
   2.2 later, once some CPUs are offline and Busy% is below a threshold (say, 5%),
       compare the PC6 residency with the values recorded in step 2.1.
   2.3 If the PC6 residency with CPUs offline is much lower than with all CPUs
       online, the symptom is detected.
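A minimal Python sketch of the proposed check, not turbostat code: `Sample`, `stuck_offline_cpus_suspected`, averaging only the idle baseline samples, and treating a drop below 50% of the baseline PC6 residency as the trigger are all assumptions. It flags an idle system, with CPUs offline, that has lost most of its package C6 residency, which is the symptom Len reported in comment 8. The inputs would come from turbostat's Busy% and Pkg%pc6 columns.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    busy_pct: float   # turbostat Busy% (summary row)
    pc6_pct: float    # package C6 residency (turbostat Pkg%pc6)
    all_online: bool  # True if every CPU was online during the interval


def stuck_offline_cpus_suspected(baseline, current, busy_threshold=5.0, ratio=0.5):
    """Hypothetical check: idle, CPUs offline, and PC6 far below the baseline.

    baseline: Samples recorded while all CPUs were online (step 2.1).
    current:  a Sample taken later, possibly with some CPUs offline (step 2.2).
    """
    idle_baseline = [s.pc6_pct for s in baseline
                     if s.all_online and s.busy_pct < busy_threshold]
    if current.all_online or current.busy_pct >= busy_threshold or not idle_baseline:
        return False  # preconditions not met, nothing to compare
    expected = sum(idle_baseline) / len(idle_baseline)
    return current.pc6_pct < ratio * expected  # step 2.3: PC6 has collapsed
```

Filtering the baseline by the same Busy% threshold keeps busy intervals, which naturally have little PC6 residency, from dragging the reference value down.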
Comment 19 Chen Yu 2018-03-26 07:40:15 UTC
This issue seems to be platform specific, so I'm wondering whether it could be marked as a known issue and documented, as I could not find a proper workaround in the kernel (I'm not sure the community would accept kicking the offline CPUs after resume). Alternatively, would it make sense to add a hook in .suspend() and .resume()
that compares PC6 residency before and after resume and, if there is too much drift, prints a warning that something might be wrong with the system?
Comment 20 Doug Smythies 2018-03-26 14:23:31 UTC
(In reply to Chen Yu from comment #19)
> So this issue seems to be platform specific, I'm wondering if this could be
> marked as a known issue/document, as I could not find a proper workaround in
> the kernel(not sure if the community would accept the solution to kick the
> offline cpus after resume)

That would be O.K. with me. As mentioned in my original description, I only made this bug report
because the original one got closed, and I thought the issue shouldn't be forgotten. Personally, I would never take CPUs offline anyway.
Comment 21 Chen Yu 2018-03-27 00:34:02 UTC
Thanks.
Marking as DOCUMENTED, since we could not find a proper way to work around the BIOS issue in the kernel.