Bug 5471

Summary: offline CPU burns more power than idle CPU
Product: ACPI Reporter: Shaohua (shaohua.li)
Component: Power-ProcessorAssignee: Shaohua (shaohua.li)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: blocking CC: a.p.zijlstra, aagaande, abhirana, acpi-bugzilla, alan, holger, hpa, lenb, mozilla_bugs, rui.zhang, trenn, venki
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.32 Subsystem:
Regression: No Bisected commit-id:
Attachments: patch vs 2.6.36.y
patch vs 2.6.36.y
patch vs 2.6.35.y
patch vs 2.6.32.y

Description Shaohua 2005-10-19 22:51:59 UTC
In HT or multi-core system, suppose one sibiling is hot removed and the other 
sibiling is still running. If the two cores/sibilings share C-state/P-state 
logic (it requires specific coordination), it's possible the online 
sibiling/core can't go to deeper C-state/P-state because its sibiling is 
offline.
Comment 1 Len Brown 2005-10-27 18:22:32 UTC
For HW coordinated C-states:

The CPU that was taken logically off-line must
enter the deepest power saving C-state.  Currently
it executes the HALT instruction, which will prevent
the entire package from going deeper than C1 --
even if the awake peer in the package requests it.

Note, we don't have SW coordinated C-states on any systems today.
The P-state issues will be tracked outside this bug report.
Comment 2 Nick Craig-Wood 2007-04-10 09:19:21 UTC
This problem manifests itself on my laptop (T7200 processor, dual 2 GHz core2)
in the following way.  When a cpu is hot unplugged the power usage goes up by
200 mA (14%) rather than go down.  So paradoxically for maximum battery life it
is best to keep both CPUs in the core2 package running.
Comment 3 Alan 2009-03-24 07:56:45 UTC
Closing old stale bugs. If this is still a problem please re-open
Comment 4 Len Brown 2009-03-24 14:03:45 UTC
re-opened
Comment 5 Abhishek J 2010-01-21 03:01:37 UTC
I can attest this happening with i7 920 processor 

With the watts-up-pro wattmeter. on Evga x58 SLI-758 board with C6 state enabled and HT disabled.

Kernel tested - 2.6.30, 2.6.31, 2.6.32.3,

With all 4 cores running Wattmeter reading median 163 W
With     3 cores running Wattmeter reading median 168 W
With     2 cores running Wattmeter reading median 173 W
Comment 6 Thomas Renninger 2010-01-21 12:58:24 UTC
For reference:
There has been an unaccepted approach by Venki:
http://lkml.indiana.edu/hypermail/linux/kernel/0905.2/02692.html
Adding Venkatesh to CC list, should this one be assigned to you?
Comment 7 Abhishek J 2010-02-17 21:44:22 UTC
hello all, i finally got time around to patch and test this.

patched 2.6.32-5 with the `unaccepted patch` and it seems to work.

reduction of about 4-5 W per core shut for an i7 920. compared with C6 state enabled.

So any reason for using/not using this patch?

Thanks!
Comment 8 Zhang Rui 2010-03-12 03:35:40 UTC
re-assign to Shaohua as Venki will not look at this any more.
Comment 9 H. Peter Anvin 2010-09-20 22:52:58 UTC
Code currently in x86:tip/idle... assuming it doesn't fail in testing we'll push it to Linus in the next merge window.
Comment 10 Zhang Rui 2010-09-27 00:22:41 UTC
Mark this bug report as resolved because patch is already available.
we can close it when the patch is shipped in the upstream kernel.
Comment 11 Len Brown 2010-12-09 22:30:17 UTC
Created attachment 39662 [details]
patch vs 2.6.36.y

This patch vs 2.6.36.y consolidates 3 patches upstream in 2.6.37-rc
that address this problem.
Comment 12 Thomas Renninger 2010-12-09 23:08:03 UTC
This should not hold the patch off (took long enough until there finally showed up a solution), just two comments:

It looks like this patch re-implements deep sleep state in smpboot.c?

1) Double sleeping
Looks like it's double sleeping in case of mwait is available, shouldn't it be something like:
/* Returns error code if mwait is not available */
static inline *int* mwait_play_dead(void)
...
if (mwait_play_dead())
	hlt_play_dead();

2) This only works for mwait... 
Thinking of other possibilities for deep sleeping, namely IO triggered C-states, this would not work?
Instead of double implementing deep sleep something like this would be a cleaner way?:
if (cpuidle_driver_registered())
        cpuidle_enter_deepest_sleep();
else
	hlt_play_dead();

This would:
   - need the two functions being implemented in the cpuidle framework
   - not work as long as cpuidle drivers did not get registered..., but it
     shouldn't be a big deal, nobody offlines CPUs at boot time
Comment 13 H. Peter Anvin 2010-12-09 23:16:47 UTC
"Play dead" and "go to sleep" are not necessarily the same things -- in particular, there may be operations that are appropriate when going to sleep but not when offlining -- e.g. invoking ACPI routines -- and there may be operations which we can do when we're going to lose state anyway, i.e. entered an otherwise-disabled MWAIT state.

That doesn't mean this couldn't be added to the cpuidle framework, but using the cpuidle framework as it currently sits would be unacceptably dangerous.
Comment 14 Thomas Renninger 2010-12-09 23:30:31 UTC
> there may be operations that are appropriate when going to sleep but not when
> > offlining -- e.g. invoking ACPI routines
Going to ACPI driven sleep states from cpuidle "should" not invoke ACPI routines, just read an IO address. But I see the point (and I have to double check). Any other requirements?
cpuidle_enter_deepest_sleep() must of course not sleep/lock before going to idle.
If I have a lot of time I try to clean this further up if it makes sense.

Whatabout the double sleep?
Comment 15 H. Peter Anvin 2010-12-09 23:33:14 UTC
No, it should be cpuidle_play_dead() if anything.

There is no double sleep.   The play_dead routines never return if successful.
Comment 16 Thomas Renninger 2010-12-10 00:10:59 UTC
Thanks.
Comment 17 Len Brown 2010-12-11 04:31:00 UTC
My original plan was to add a play_dead to cpuidle drivers,
but I think that peter's solution is simpler, and solves
90% of the problem, which is to say, all the x86 hardware that
has shipped in the last few years supports native MWAIT.
Comment 18 Len Brown 2010-12-11 04:42:23 UTC
Created attachment 39772 [details]
patch vs 2.6.36.y

replace previous 2.6.36.y patch with the correct version.
(this one actually compiles... and has been verified
 with a power meter on 2.6.36.2).
Comment 19 Len Brown 2010-12-11 04:44:30 UTC
Created attachment 39782 [details]
patch vs 2.6.35.y

tested on 2.6.35.9
Comment 20 Len Brown 2010-12-11 04:50:18 UTC
Created attachment 39792 [details]
patch vs 2.6.32.y

tested on 2.6.32.27