Bug 25252 - Offline cpu prevents sibling core to go to deep sleep state
Summary: Offline cpu prevents sibling core to go to deep sleep state
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All Linux
Importance: P1 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-19 22:22 UTC by Ozlem Bilgir
Modified: 2011-03-04 23:35 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.37
Subsystem:
Regression: No
Bisected commit-id:


Attachments
c-state percentages (1.50 KB, text/plain)
2010-12-19 22:24 UTC, Ozlem Bilgir
use the attached tool to report the info related with CPUID_MWAIT_LEAF (894 bytes, patch)
2010-12-27 01:46 UTC, ykzhao
acpidump output (166.61 KB, application/octet-stream)
2010-12-27 23:15 UTC, Ozlem Bilgir
acpidump output with acpidump-20101221 (236.39 KB, application/octet-stream)
2010-12-28 03:55 UTC, Ozlem Bilgir
try the debug patch that disables irq on offline CPU (800 bytes, patch)
2010-12-31 03:26 UTC, ykzhao
Cpuinfo, powertop and turbostat outputs of X5650 (14.83 KB, application/zip)
2011-01-06 21:09 UTC, Ozlem Bilgir
Cpuinfo, powertop and turbostat outputs of L5520 (15.56 KB, application/zip)
2011-01-06 21:10 UTC, Ozlem Bilgir
patch vs 2.6.37 (3.91 KB, patch)
2011-01-19 02:06 UTC, Len Brown
patch vs 2.6.37 (3.91 KB, patch)
2011-01-19 02:45 UTC, Len Brown
patch vs 2.6.38-rc4 (3.13 KB, patch)
2011-02-16 16:55 UTC, Len Brown

Description Ozlem Bilgir 2010-12-19 22:22:51 UTC
Although, with the patch from Bug 5471, an offline cpu goes to the deepest sleep state, it prevents its sibling cpu from going to the deepest sleep state. Here are the c-state residency percentages from turbostat (I also added a .csv attachment in case they don't look neat here).

On a 2-socket system with 8 physical / 16 logical cores, with all cores online:

pkg core CPU   %c0   GHz  TSC   %c1    %c3    %c6   %pc3   %pc6 
               0.09 1.91 2.27   0.15   1.72  98.04   9.93  75.00
   0   0   0   0.50 1.80 2.27   0.55   2.43  96.53   9.93  74.99
   0   0   8   0.24 1.83 2.27   0.81   2.43  96.53   9.93  74.99
   0   1   1   0.06 1.97 2.27   0.20   0.00  99.74   9.93  74.99
   0   1   9   0.18 2.12 2.27   0.08   0.00  99.74   9.93  74.99
   0   2   2   0.03 2.04 2.27   0.04   0.00  99.93   9.93  74.98
   0   2  10   0.04 1.99 2.27   0.03   0.00  99.93   9.93  74.98
   0   3   3   0.02 2.08 2.27   0.01   0.00  99.97   9.93  74.98
   0   3  11   0.01 1.96 2.27   0.02   0.00  99.97   9.93  74.98
   1   0   4   0.17 2.13 2.27   0.10  11.37  88.36   9.93  75.02
   1   0  12   0.05 2.04 2.27   0.22  11.37  88.36   9.93  75.02
   1   1   5   0.04 1.72 2.27   0.08   0.00  99.88   9.93  75.02
   1   1  13   0.03 1.62 2.27   0.09   0.00  99.88   9.93  75.02
   1   2   6   0.02 1.82 2.27   0.05   0.00  99.93   9.93  75.02
   1   2  14   0.03 1.76 2.27   0.04   0.00  99.93   9.93  75.02
   1   3   7   0.01 1.84 2.27   0.02   0.00  99.97   9.93  75.02
   1   3  15   0.00 1.74 2.27   0.02   0.00  99.98   9.93  75.02


After cpu15 is offlined, its sibling (cpu7) no longer goes to c6; it stays in c3 instead, and package c6 (%pc6) drops to 0 on both packages:

pkg core CPU   %c0   GHz  TSC   %c1    %c3    %c6   %pc3   %pc6 
               0.21 1.90 2.27   1.22   6.74  91.83  75.48   0.00
   0   0   0   0.93 1.75 2.27   2.19   0.60  96.29  75.48   0.00
   0   0   8   0.30 1.79 2.27   2.81   0.60  96.29  75.48   0.00
   0   1   1   0.16 2.05 2.27   0.53   0.00  99.31  75.48   0.00
   0   1   9   0.50 2.19 2.27   0.18   0.00  99.31  75.48   0.00
   0   2   2   0.06 2.00 2.27   1.17   0.00  98.77  75.48   0.00
   0   2  10   0.06 1.95 2.27   1.17   0.00  98.77  75.48   0.00
   0   3   3   0.03 2.08 2.27   2.07   0.00  97.90  75.48   0.00
   0   3  11   0.04 2.08 2.27   2.07   0.00  97.90  75.48   0.00
   1   0   4   0.47 1.87 2.27   2.43   0.00  97.10  75.48   0.00
   1   0  12   0.14 1.87 2.27   2.76   0.00  97.10  75.48   0.00
   1   1   5   0.17 1.97 2.27   0.36   0.00  99.47  75.48   0.00
   1   1  13   0.14 1.86 2.27   0.40   0.00  99.47  75.48   0.00
   1   2   6   0.05 1.91 2.27   0.07   0.00  99.88  75.48   0.00
   1   2  14   0.03 1.80 2.27   0.09   0.00  99.88  75.48   0.00
   1   3   7   0.01 1.93 2.27   0.01  99.98   0.00  75.48   0.00
Comment 1 Ozlem Bilgir 2010-12-19 22:24:35 UTC
Created attachment 40942
c-state percentages
Comment 2 ykzhao 2010-12-27 01:46:07 UTC
Created attachment 41652
use the attached tool to report the info related with CPUID_MWAIT_LEAF

Could you please run the attached tool and report the CPUID_MWAIT_LEAF information?

Could you also attach the acpidump output from this machine? Please use the latest acpidump tool, which can be downloaded from:
http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Thanks.
Comment 3 Ozlem Bilgir 2010-12-27 23:15:45 UTC
Created attachment 41752
acpidump output
Comment 4 Ozlem Bilgir 2010-12-27 23:18:48 UTC
I attached the acpidump output (https://bugzilla.kernel.org/attachment.cgi?id=41752).

Here is the CPUID_MWAIT_LEAF info:

edx is 1120
	CPU C1 is supported
	CPU C2 is supported
	CPU C3 is supported
Comment 5 ykzhao 2010-12-28 01:00:28 UTC
Hi, Ozlem
    Thanks for the update.
    Could you please use the latest acpidump tool (20101221) and attach its output again? Unfortunately, the file in comment #3 doesn't contain the CPU C-state table.

    From the turbostat info, it seems that the offline CPU doesn't enter the deep C-state, which prevents the sibling CPU from entering the deep C-state (it seems to enter C3 instead of C6).

Thanks.
    Yakui
Comment 6 Ozlem Bilgir 2010-12-28 03:53:34 UTC
Hi Yakui,

I attached the output of new acpidump when all cores are online.

I also got this warning when I ran acpidump: "Wrong checksum for generic table!".
I don't know whether this is relevant, but I just wanted to let you know.

Another thing I noticed in the turbostat output from my first comment is that when a cpu in one package goes offline, *the other package* stops using package c6 as well, which doesn't make sense even if the offline cpu goes to c3 instead of c6.

Thanks,
Ozlem
Comment 7 Ozlem Bilgir 2010-12-28 03:55:04 UTC
Created attachment 41772
acpidump output with acpidump-20101221
Comment 8 ykzhao 2010-12-31 03:26:13 UTC
Created attachment 41982
try the debug patch that disables irq on offline CPU

Will you please try the attached debug patch and see whether the issue can be fixed?

Thanks.
    Yakui
Comment 9 Ozlem Bilgir 2011-01-02 18:41:49 UTC
Hi Yakui,

I tried the patch, but I got similar turbostat values.

Thanks,
Ozlem
Comment 10 ykzhao 2011-01-06 08:54:51 UTC
Hi, Ozlem
    Thanks for your test. Sorry for the late response.
    I ran some tests on two other machines, but got different results from yours. One is a notebook based on Ironlake and the other is a server based on Westmere. Here are my results:
    a. The notebook based on Ironlake
       After one CPU is offlined, the sibling CPU can't enter the deep C-state. With the patch in comment #8, it works as expected.
    b. The server based on Westmere
       It works well after one CPU is offlined, and it still works when more CPUs are offlined.

    Will you please attach the output of "cat /proc/cpuinfo" on your machine?
    
    Will you please also attach the output on your machine after running the following command?
    >sleep 20; powertop -d >powertop_dump

Thanks.
    Yakui
Comment 11 Ozlem Bilgir 2011-01-06 21:08:18 UTC
Hi Yakui,

Thanks for your response. My earlier experiments were on a Xeon L5520; now I have also experimented on Westmere (X5650), both with kernel 2.6.37. I tried your patch on both machines. I attached cpuinfo, powertop and turbostat outputs of both machines, with and without your patch, for two cases: all cores online, and one core offlined.

Here is a summary of what I observed:

1) Powertop vs. turbostat: As you know, powertop reads c-state time and usage from counters that the kernel collects, while turbostat reads them from MSRs. Since turbostat gets its info from the hardware, I feel it is more accurate. (I'd appreciate your comments on this.)

2) About X5650

  -Without the patch:
    -Under very light load with all cores online, turbostat shows c6 usage of almost 100% for all cores, and package c6 usage of ~85% for both packages (see the turbostat_dump.xls attachment). Power consumption of the machine was ~55W. After core 23 was offlined, c6 usage of its sibling (core 11) dropped to 0% (*5-10 min after* I disabled core 23), and package c6 usage dropped to 0% for both packages. Power consumption increased to ~70W. Although powertop doesn't show a decrease in c6 usage (c3 in powertop notation), the increase in power consumption says otherwise. If both core 23 and core 11 were using c6 after core 23 was offlined, as expected, I would expect power consumption to be similar to having all cores online, because
all cores were in a deep sleep state most of the time and disabling one logical core shouldn't have changed anything.
This also makes me trust the turbostat output more than powertop.

  -With the patch:
   -Similar results to the case without the patch.

3) About L5520

  -Without the patch:
    -Similar to X5650: after one core is offlined, c6 usage of its sibling core and package c6 usage both drop to 0%. Power consumption increased from ~68W to ~79W.
  -With the patch:
    -Similar results to the case without the patch.

Ozlem
Comment 12 Ozlem Bilgir 2011-01-06 21:09:36 UTC
Created attachment 42612
Cpuinfo, powertop and turbostat outputs of X5650
Comment 13 Ozlem Bilgir 2011-01-06 21:10:00 UTC
Created attachment 42622
Cpuinfo, powertop and turbostat outputs of L5520
Comment 14 ykzhao 2011-01-07 05:58:06 UTC
Hi, Ozlem
    Thanks for your info.
    From the info in comments #12/#13, the number of wakeups per second on your machines is higher than in my test: about 60 in yours. The number of interrupts per second is also higher than in my test.
    
    So I changed my test method to increase the average number of wakeups per second, and now I can reproduce the issue after one CPU is offlined. Next I will try to find a proper solution.

Thanks.
Comment 15 Len Brown 2011-01-10 22:16:49 UTC
I've reproduced this issue.
The problem is that the offline CPU is spinning
on repeated failures trying to get into a deep C-state.
Comment 16 Ozlem Bilgir 2011-01-11 19:19:35 UTC
Hi Len,

Thanks for the update. So, as you said, since the disabled cpu fails to get into the deepest c-state,
its sibling doesn't go to the deepest c-state either. There is another thing I noticed and didn't understand, about package c-states. What I'd expect from package c-states is, for example, that if all cpus in one socket are in the deepest c-state, the package also goes to its deepest state (flushing the shared cache, etc.). Moreover, the state of one package (socket) shouldn't be affected by the other package. However, as you can see from the turbostat output in my first comment, when one core in one socket is disabled, the other socket's package c6 usage drops to 0 as well.


To be sure, I did another experiment without disabling any cores and got a similar result. When I set the affinity of the processes to the cores in one socket only, c6 usage of the cores in the other socket is around 99%, as expected (see below). However, package c6 usage is only about 2.4% in both packages. So
I think there is another issue there as well. What do you think? Should I report it in a separate bug report?

pkg core CPU   %c0   GHz  TSC   %c1    %c3    %c6   %pc3   %pc6 
              15.02 2.26 2.27   3.91  10.96  70.12   9.62   2.38
   0   0   0  69.96 2.26 2.27  12.18   3.02  14.83   9.61   2.37
   0   0   8  64.17 2.26 2.27  17.98   3.02  14.83   9.61   2.37
   0   1   1  56.47 2.27 2.27   8.61   2.67  32.26   9.61   2.37
   0   1   9  48.22 2.27 2.27  16.85   2.67  32.26   9.61   2.37
   0   2   2   0.01 2.12 2.27   0.05  28.15  71.80   9.61   2.37
   0   2  10   0.01 2.13 2.27   0.05  28.15  71.80   9.61   2.37
   0   3   3   0.01 2.09 2.27   0.04  53.83  46.12   9.61   2.37
   0   3  11   0.01 2.09 2.27   0.04  53.83  46.12   9.61   2.37
   1   0   4   0.01 1.62 2.27   0.04   0.00  99.95   9.64   2.38
   1   0  12   0.02 1.88 2.27   0.03   0.00  99.95   9.64   2.38
   1   1   5   0.09 2.08 2.27   0.07   0.00  99.84   9.64   2.38
   1   1  13   0.01 1.60 2.27   0.15   0.00  99.84   9.64   2.38
   1   2   6   0.00 1.59 2.27   0.01   0.00  99.99   9.64   2.38
   1   2  14   0.00 1.62 2.27   0.01   0.00  99.99   9.64   2.38
   1   3   7   0.34 1.60 2.27   3.51   0.00  96.15   9.64   2.38
   1   3  15   1.00 1.60 2.27   2.85   0.00  96.15   9.64   2.38
Comment 17 Len Brown 2011-01-18 00:57:45 UTC
re: package affinity of package c-states in comment #16

pc6 is effectively a system-wide C-state
in WSM hardware because the memory controllers
are integrated into the packages and shared by them.

i.e., what you see is simply how the hardware works.
If you disable c6 (and thus pc6), you'll see that
pc3 shows a little more package independence than
pc6. However, pc6 support is a net win.
Comment 18 Len Brown 2011-01-19 02:06:52 UTC
Created attachment 44122
patch vs 2.6.37

please try this patch vs 2.6.37
you can observe it was applied with "dmesg | grep idle"
and you can disable its effect by booting with "intel_idle.auto_demote=1"
Comment 19 Len Brown 2011-01-19 02:45:25 UTC
Created attachment 44132 [details]
patch vs 2.6.37

fix syntax error in previous patch...
Comment 20 Len Brown 2011-02-16 16:55:53 UTC
Created attachment 47962
patch vs 2.6.38-rc4

simplified patch, NHM/WSM only, independent of NO_HZ, no cmdline override.
candidate for upstream.
Comment 21 Florian Mickler 2011-03-04 23:35:04 UTC
merged for 2.6.38-rc8 (or final .38)

commit 14796fca2bd22acc73dd0887248d003b0f441d08
Author: Len Brown <len.brown@intel.com>
Date:   Tue Jan 18 20:48:27 2011 -0500

    intel_idle: disable NHM/WSM HW C-state auto-demotion
