In HT or multi-core system, suppose one sibiling is hot removed and the other sibiling is still running. If the two cores/sibilings share C-state/P-state logic (it requires specific coordination), it's possible the online sibiling/core can't go to deeper C-state/P-state because its sibiling is offline.
For HW coordinated C-states: The CPU that was taken logically off-line must enter the deepest power saving C-state. Currently it executes the HALT instruction, which will prevent the entire package from going deeper than C1 -- even if the awake peer in the package requests it. Note, we don't have SW coordinated C-states on any systems today. The P-state issues will be tracked outside this bug report.
This problem manifests itself on my laptop (T7200 processor, dual 2 GHz core2) in the following way. When a cpu is hot unplugged the power usage goes up by 200 mA (14%) rather than go down. So paradoxically for maximum battery life it is best to keep both CPUs in the core2 package running.
Closing old stale bugs. If this is still a problem please re-open
re-opened
I can attest this happening with i7 920 processor With the watts-up-pro wattmeter. on Evga x58 SLI-758 board with C6 state enabled and HT disabled. Kernel tested - 2.6.30, 2.6.31, 2.6.32.3, With all 4 cores running Wattmeter reading median 163 W With 3 cores running Wattmeter reading median 168 W With 2 cores running Wattmeter reading median 173 W
For reference: There has been an unaccepted approach by Venki: http://lkml.indiana.edu/hypermail/linux/kernel/0905.2/02692.html Adding Venkatesh to CC list, should this one be assigned to you?
hello all, i finally got time around to patch and test this. patched 2.6.32-5 with the `unaccepted patch` and it seems to work. reduction of about 4-5 W per core shut for an i7 920. compared with C6 state enabled. So any reason for using/not using this patch? Thanks!
re-assign to Shaohua as Venki will not look at this any more.
Code currently in x86:tip/idle... assuming it doesn't fail in testing we'll push it to Linus in the next merge window.
Mark this bug report as resolved because patch is already available. we can close it when the patch is shipped in the upstream kernel.
Created attachment 39662 [details] patch vs 2.6.36.y This patch vs 2.6.36.y consolidates 3 patches upstream in 2.6.37-rc that address this problem.
This should not hold the patch off (took long enough until there finally showed up a solution), just two comments: It looks like this patch re-implements deep sleep state in smpboot.c? 1) Double sleeping Looks like it's double sleeping in case of mwait is available, shouldn't it be something like: /* Returns error code if mwait is not available */ static inline *int* mwait_play_dead(void) ... if (mwait_play_dead()) hlt_play_dead(); 2) This only works for mwait... Thinking of other possibilities for deep sleeping, namely IO triggered C-states, this would not work? Instead of double implementing deep sleep something like this would be a cleaner way?: if (cpuidle_driver_registered()) cpuidle_enter_deepest_sleep(); else hlt_play_dead(); This would: - need the two functions being implemented in the cpuidle framework - not work as long as cpuidle drivers did not get registered..., but it shouldn't be a big deal, nobody offlines CPUs at boot time
"Play dead" and "go to sleep" are not necessarily the same things -- in particular, there may be operations that are appropriate when going to sleep but not when offlining -- e.g. invoking ACPI routines -- and there may be operations which we can do when we're going to lose state anyway, i.e. entered an otherwise-disabled MWAIT state. That doesn't mean this couldn't be added to the cpuidle framework, but using the cpuidle framework as it currently sits would be unacceptably dangerous.
> there may be operations that are appropriate when going to sleep but not when > > offlining -- e.g. invoking ACPI routines Going to ACPI driven sleep states from cpuidle "should" not invoke ACPI routines, just read an IO address. But I see the point (and I have to double check). Any other requirements? cpuidle_enter_deepest_sleep() must of course not sleep/lock before going to idle. If I have a lot of time I try to clean this further up if it makes sense. Whatabout the double sleep?
No, it should be cpuidle_play_dead() if anything. There is no double sleep. The play_dead routines never return if successful.
Thanks.
My original plan was to add a play_dead to cpuidle drivers, but I think that peter's solution is simpler, and solves 90% of the problem, which is to say, all the x86 hardware that has shipped in the last few years supports native MWAIT.
Created attachment 39772 [details] patch vs 2.6.36.y replace previous 2.6.36.y patch with the correct version. (this one actually compiles... and has been verified with a power meter on 2.6.36.2).
Created attachment 39782 [details] patch vs 2.6.35.y tested on 2.6.35.9
Created attachment 39792 [details] patch vs 2.6.32.y tested on 2.6.32.27