Bug 17291 - acpi_idle sees only c1 - gigabyte GA-P55A-UD6 (Core-i7-875)
Summary: acpi_idle sees only c1 - gigabyte GA-P55A-UD6 (Core-i7-875)
Status: REJECTED DOCUMENTED
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_bios
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-08-29 11:58 UTC by James Le Cuirot
Modified: 2010-11-30 05:44 UTC
CC List: 3 users

See Also:
Kernel Version: next-20100827
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg when using acpi_idle (41.10 KB, text/plain)
2010-10-24 22:49 UTC, James Le Cuirot
dmesg when using intel_idle (40.94 KB, text/plain)
2010-10-24 22:50 UTC, James Le Cuirot
turbostat output when using acpi_idle (3.48 KB, text/plain)
2010-10-24 22:52 UTC, James Le Cuirot
turbostat output when using intel_idle (12.37 KB, text/plain)
2010-10-24 22:53 UTC, James Le Cuirot
2.6.36 kernel config (70.28 KB, text/plain)
2010-10-30 23:59 UTC, James Le Cuirot

Description James Le Cuirot 2010-08-29 11:58:48 UTC
I have an i7-875K and with intel_idle enabled, the readings from coretemp are very erratic. The readings from all four cores frequently drop very low, sometimes even as low as 21°C, before returning to what I suppose is normal at around 30-40°C. Usually they all drop to the same temperature, but not always. For example, I might get 25/25/25/26. I would expect the temperature to drop when in a high C-state, but given that 21°C is more or less room temperature and the way in which the readings suddenly rise and fall by as much as 10°C, I think that this is not working properly. My very uneducated guess would be that when a core is in C3 and/or C6, it cannot get a reading from the sensor and so it gets 0 instead.

I did a bisect and found that the problem appeared as soon as intel_idle was introduced. I can confirm that the problem is still present as of linux-next-20100827.

I'm CC'ing Len Brown as he is the author of intel_idle.
Comment 1 Jean Delvare 2010-08-29 16:39:54 UTC
My own educated guess goes as follows:

* The temperature values returned from the MSR by modern Intel CPUs are not accurate when far away from the CPU's maximum temperature. Values aren't even actual degrees Celsius; they are arbitrary counts down from the maximum temperature. I don't know the exact maximum temperature of your CPU model, but it's typically in the 85-100°C range, so the 30-40°C you get definitely qualifies as far away from the CPU's maximum temperature.
* The goal of the intel_idle driver is to lower the power consumption of the CPU. Lower power consumption means lower temperature, so it is no surprise that the coretemp driver reports lower values. The exact numbers are irrelevant, as you are outside the range in which the coretemp driver reports values that even remotely resemble degrees Celsius.

Your assumption that "it cannot get a reading from the sensor and so it gets 0 instead" doesn't hold up to technical examination. The coretemp driver reports a value computed as (Tmax - register_value), where Tmax is a constant. If the register value were 0, the driver would return Tmax (e.g. 85°C or 100°C), which isn't the case here.

So I don't see anything that needs fixing here, other than properly documenting that the coretemp driver doesn't report accurate degrees Celsius but relative, arbitrary values.
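Jean's (Tmax - register_value) formula can be made concrete with a short Python sketch. It assumes the Nehalem-era IA32_THERM_STATUS layout (a 7-bit "digital readout" in bits 22:16 giving degrees below Tjmax, and a "reading valid" flag in bit 31) and uses the 99°C maximum James reports in comment 2 as Tjmax; the function and constant names are illustrative, and reading the raw MSR from the CPU is left out.

```python
from typing import Optional

READING_VALID = 1 << 31  # assumed "reading valid" flag in IA32_THERM_STATUS


def coretemp_celsius(therm_status: int, tjmax: int) -> Optional[int]:
    """Approximate core temperature from a raw IA32_THERM_STATUS value.

    Assumes bits 22:16 hold the digital readout, i.e. how many degrees the
    core is *below* Tjmax; this mirrors the (Tmax - register_value) formula.
    """
    if not therm_status & READING_VALID:
        return None  # no valid sample: report nothing rather than a number
    readout = (therm_status >> 16) & 0x7F  # 7-bit distance below Tjmax
    return tjmax - readout


# A readout of 0 means "at Tjmax", so a zero register would not show up as 21°C.
print(coretemp_celsius(READING_VALID | (0 << 16), tjmax=99))   # 99
# A readout 78 below a 99°C Tjmax is what a 21°C reading corresponds to.
print(coretemp_celsius(READING_VALID | (78 << 16), tjmax=99))  # 21
```

The second call shows that a 21°C reading is a large readout value, not a zero one, which is the point Jean makes above.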
Comment 2 James Le Cuirot 2010-08-29 21:14:23 UTC
The maximum reported temperature for this CPU is 99°C. I didn't think it was very accurate but I guess it's still a little less accurate than I expected. I would still like to hear from Len though, as he works for Intel, and is probably in the best position to comment on whether these erratic temperature changes reflect what is actually happening when power saving occurs - or whether a problem reading the sensor is more likely.
Comment 3 Len Brown 2010-09-30 22:04:44 UTC
James,
Please do an apples/apples comparison using
CONFIG_INTEL_IDLE=y and boot with and without
"intel_idle.max_cstate=0", which will disable
intel_idle and run acpi_idle in the same kernel binary.

Preferably use the latest upstream kernel,
or at least 2.6.35.stable
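When switching between the two boots, it helps to confirm from the running system which one you are in. This Python sketch reads the value of intel_idle.max_cstate from a kernel command line string such as the contents of /proc/cmdline; the helper name is hypothetical, and keeping the last occurrence when the parameter is repeated is an assumption of the sketch.

```python
import shlex
from typing import Optional

PARAM = "intel_idle.max_cstate"


def intel_idle_max_cstate(cmdline: str) -> Optional[int]:
    """Return the intel_idle.max_cstate value on a kernel command line, or None.

    If the parameter appears more than once, this sketch keeps the last value.
    """
    value: Optional[int] = None
    for token in shlex.split(cmdline):  # handles quoted arguments too
        key, sep, raw = token.partition("=")
        if sep and key == PARAM:
            value = int(raw)
    return value
```

On each boot, pass it the contents of /proc/cmdline: 0 marks the run where intel_idle is disabled and acpi_idle should take over, while None means the parameter was not given at all.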
Comment 4 James Le Cuirot 2010-10-01 22:25:27 UTC
Hi Len. Just tried with 2.6.36-rc6 and it is definitely intel_idle.
Comment 5 Jean Delvare 2010-10-10 13:22:24 UTC
FWIW, I am not able to reproduce the problem reported here on my Intel Xeon E5520 (kernel 2.6.36-rc7). I have intel_idle enabled, and values reported by the coretemp driver are stable around 50°C, as they were before.

James, I fear that your reply in comment #4 won't make Len happy. He asked you for a specific test; you should perform that test and report the result, rather than a bold and unhelpful "it is definitely intel_idle".

When done with testing intel_idle.max_cstate=0, you could try values 1 and 2, for completeness. This will tell us whether a specific C-state is at fault, or the intel_idle driver as a whole.

I would also like to know whether your system works with the ACPI idle driver. Assuming you have CONFIG_ACPI_PROCESSOR enabled (and the "processor" module loaded if needed), the ACPI idle driver should run. At least it does on my system (Asus Z8NA-D6), and I observe an essentially similar behavior when using the ACPI idle driver or the dedicated Intel idle driver (but I guess it depends on the BIOS to some extent.)

So please check the contents of file /sys/devices/system/cpu/cpuidle/current_driver when booting with intel_idle.max_cstate=0, and also the contents of directory /sys/devices/system/cpu/cpu0/cpuidle with and without this option. If the problem only happens with the intel_idle driver and not with the ACPI idle driver, maybe a comparison of these sysfs attributes will provide a hint.
Comment 6 James Le Cuirot 2010-10-10 17:28:29 UTC
Sorry if I was brief but I did do as he asked.

I have now checked it further as you have suggested, using 2.6.36-rc7. With intel_idle.max_cstate=0, current_driver reports acpi_idle. With this option set to 1, 2 or 3 or not set at all (same as 3, I think?) then it reports intel_idle.

As for the behaviour, 1 didn't seem much different from 0. 2 had fairly low temps (29-35°C) but they were relatively stable. 3, as I said before, seems to be the same as no option at all, with the erratic behaviour.

The /sys/devices/system/cpu/cpu0/cpuidle directory doesn't exist under acpi_idle. With the option set to 1, I get state0 and state1, then state2 with 2 and state3 with 3. The names of these are...

C0
NHM-C1
NHM-C3
NHM-C6

I check the temperatures with gkrellm, but I have noticed that the "sensors" program doesn't always reflect the same temperatures at the moment you run it. gkrellm only updates once a second, but it seems to pick up the lowest temperatures much more often than sensors does. I want to stress that sensors does pick up these temperatures sometimes; gkrellm isn't making them up.

But having seen how low the temperatures go without being erratic under option 2, I am now prepared to accept that this behaviour is a reflection of the fact that C6 is measurably cooler than C3 but only occurs a fraction of the time.
Comment 7 Len Brown 2010-10-16 08:05:55 UTC
> The /sys/devices/system/cpu/cpu0/cpuidle directory
> doesn't exist under acpi_idle

That is abnormal.  Possibly the failure here
is actually that ACPI C-states are not working...

In ACPI mode, please share the output from
'cat /proc/acpi/processor/*/power'

and also the complete dmesg

I'd like to see the difference between
'grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*'
with and without "intel_idle.max_cstate=0", but if
that information is missing in the ACPI configuration,
then we'll have to get it another way.
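The comparison Len asks for can be reduced to a diff of two captures. This Python sketch assumes grep prefixes each match with its file name (it does when given several files) and that every cpuidle attribute file holds a single line; the helper names are hypothetical, and skipping the usage and time counters by default is a design choice of the sketch, since those values change between any two reads.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (cpu, state, attribute), e.g. ("cpu0", "state1", "name")


def parse_grep_dump(text: str) -> Dict[Key, str]:
    """Parse `grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*` output.

    Each useful line looks like
    /sys/devices/system/cpu/cpu0/cpuidle/state1/name:NHM-C1
    """
    result: Dict[Key, str] = {}
    for line in text.splitlines():
        path, sep, value = line.partition(":")  # these sysfs paths contain no ':'
        parts = path.split("/")
        if not sep or len(parts) < 4 or parts[-3] != "cpuidle":
            continue  # not a cpuidle "file:contents" line
        result[(parts[-4], parts[-2], parts[-1])] = value
    return result


def diff_dumps(a: Dict[Key, str], b: Dict[Key, str],
               ignore: Tuple[str, ...] = ("usage", "time"),
               ) -> Dict[Key, Tuple[Optional[str], Optional[str]]]:
    """Return attributes that differ between two dumps.

    None marks an attribute present on only one side. Counters listed in
    `ignore` change on every read, so they are skipped by default.
    """
    diffs: Dict[Key, Tuple[Optional[str], Optional[str]]] = {}
    for key in sorted(set(a) | set(b)):
        if key[2] in ignore:
            continue
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs
```

An empty capture on the acpi_idle side would show up as every intel_idle state paired with None.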

Please grab turbostat from here:
http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools-latest/turbostat/turbostat.c

and run '# turbostat -v' for a bit
and capture what is going on when you see the erratic temperature readings.
Do it again with "intel_idle.max_cstate=0" to see the same system
state in ACPI mode.
Comment 8 Len Brown 2010-10-24 03:13:08 UTC
James, can you try ACPI mode, per comment #7?
Comment 9 James Le Cuirot 2010-10-24 09:50:29 UTC
Yes, sorry, just been very busy. Will try ASAP. :)
Comment 10 James Le Cuirot 2010-10-24 22:49:02 UTC
Created attachment 34862
dmesg when using acpi_idle
Comment 11 James Le Cuirot 2010-10-24 22:50:50 UTC
Created attachment 34872
dmesg when using intel_idle
Comment 12 James Le Cuirot 2010-10-24 22:52:13 UTC
Created attachment 34882
turbostat output when using acpi_idle
Comment 13 James Le Cuirot 2010-10-24 22:53:12 UTC
Created attachment 34892
turbostat output when using intel_idle

The large variations in C6 usage are interesting. This may well be what I'm seeing.
Comment 14 Len Brown 2010-10-25 04:05:01 UTC
turbostat shows that intel_idle is succeeding in getting
the package into pc6, c0 residency is 0.10%,
and the frequency drops as low as 1.2GHz.

That all looks good.

You will also find with intel_idle that if you run
a single-threaded application, that thread may
be able to run as fast as 3.6GHz, which is also good.

turbostat shows that the acpi_idle case is not getting
any deeper than c1, c0 time is over 2.00%, and
the frequency is pegged at 3.2GHz.  That is not good.
The c1 limitation will prevent you from ever reaching
a frequency higher than 3.2GHz, and using only C1
is a waste of power.

While there may be lower temperatures seen with intel_idle,
and the reading may jump around, the larger bug here is
that when you use acpi_idle, it is not reaching the idle
power saving states that it should; and that will impact
both your energy savings and your maximum performance.
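Len's diagnosis comes down to which residency columns are non-zero. The following Python sketch encodes that reading; it assumes the column names used by the turbostat version in this report (%c1, %c3, %c6, as in the table James posts in comment 18), and the function name and the 0.5% noise threshold are illustrative only.

```python
from typing import Dict

# Core C-states reported by turbostat here, deepest first.
CORE_STATES = ("c6", "c3", "c1")


def deepest_cstate(residency: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the deepest core C-state whose residency exceeds `threshold` percent.

    `residency` maps turbostat column names ("%c1", "%c3", "%c6") to
    percentages; missing columns count as 0. The default threshold is an
    arbitrary cut-off for noise, not a value taken from turbostat.
    """
    for state in CORE_STATES:
        if residency.get("%" + state, 0.0) > threshold:
            return state
    return "c0"  # no idle state above the threshold: the core is effectively busy
```

Given the summary line of the later table (%c1 1.03, %c3 2.77, %c6 95.72) it returns "c6", while a capture in which %c3 and %c6 stay at zero, as Len describes for acpi_idle, returns "c1" provided %c1 itself is above the threshold.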

please attach the .config
please attach the output from acpidump.

please build with CONFIG_ACPI_DEBUG=y
and boot the acpi_idle kernel with acpi.debug_layer=0x20000000
and attach the dmesg

BTW, the output from powertop -d
on the acpi_idle case may also be useful
Comment 15 Len Brown 2010-10-25 04:09:35 UTC
oh, and another thing to try to get acpi_idle c-states
to appear is "processor.nocst=1"  (this uses the legacy
FADT C-states, which we shouldn't have to use, but apparently
something is wrong with getting the c-states via CST on this box)
Comment 16 Len Brown 2010-10-26 02:08:30 UTC
Oh, please also check if you are running an up-to-date BIOS
and that the SETUP defaults are in force, in particular
any settings related to power management.
Comment 17 James Le Cuirot 2010-10-30 23:59:04 UTC
Created attachment 35552
2.6.36 kernel config

Again, sorry for the delay, my house has been upside down this week. Here is the kernel config. I run Gentoo and configure my own kernel so it's a little minimalistic but I'm pretty sure everything that's needed for this is enabled. I'll try to get the rest to you tomorrow.
Comment 18 James Le Cuirot 2010-10-31 10:27:49 UTC
Mystery solved, it seems. I checked the BIOS settings and found a deeply nested page that I hadn't seen before, mostly relating to C-states. While none of the options were disabled, all of them were on auto. I enabled most of them (all but CPU EIST and Bi-Directional PROCHOT) and booted with intel_idle.max_cstate=0. It now does seem to be using the other C-states.

# cat /sys/devices/system/cpu/cpu0/cpuidle/*/name
C0
C1
C2
C3

# turbostat
core CPU   %c0   GHz  TSC   %c1    %c3    %c6   %pc3   %pc6 
           0.48 2.68 2.93   1.03   2.77  95.72   3.49  45.69
   0   0   0.06 1.24 2.93   0.16   5.64  94.14   3.49  45.69
   0   4   0.04 1.24 2.93   0.18   5.64  94.14   3.49  45.69
   1   1   0.45 1.88 2.93   3.80   0.26  95.49   3.49  45.69
   1   5   2.47 3.18 2.93   1.78   0.26  95.49   3.49  45.69
   2   2   0.07 1.23 2.93   0.22   3.53  96.18   3.49  45.69
   2   6   0.05 1.25 2.93   0.24   3.53  96.18   3.49  45.69
   3   3   0.39 1.44 2.93   0.88   1.65  97.09   3.49  45.69
   3   7   0.28 2.36 2.93   0.98   1.65  97.09   3.49  45.69
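Output like the table above is easier to compare across boots once it is parsed. This Python sketch assumes a single turbostat snapshot with exactly this layout: one header line, one system-wide summary line that omits the "core" and "CPU" fields, then one line per CPU. Other turbostat versions print different columns or repeat the header, which this sketch does not handle, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def parse_turbostat(text: str) -> Tuple[Row, List[Row]]:
    """Split one turbostat snapshot into (summary_row, per_cpu_rows).

    Assumes a header line, one summary line without the "core" and "CPU"
    fields, then one line per CPU, as in the table above.
    """
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    summary: Row = {}
    per_cpu: List[Row] = []
    for fields in lines[1:]:
        values = [float(v) for v in fields]
        if len(values) == len(header):
            per_cpu.append(dict(zip(header, values)))
        elif len(values) == len(header) - 2:
            summary = dict(zip(header[2:], values))  # no core/CPU columns
    return summary, per_cpu
```

For the table above it yields a summary %pc6 of 45.69 and a lowest per-CPU frequency of 1.23GHz (CPU 2).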

I don't know what "auto" really means here. Even the manual doesn't say under what conditions these would actually be enabled. If you're curious, check page 40 of my motherboard manual.

http://download.gigabyte.ru/manual/mb_manual_ga-p55a-ud6_e.pdf
Comment 19 James Le Cuirot 2010-10-31 10:33:11 UTC
Before we close this, may I just ask if there is actually any point enabling EIST while using turbo mode and these extra C-states? I can't seem to find a clear answer out there.
Comment 20 Jean Delvare 2010-10-31 10:37:42 UTC
James, how can you have C1-C3 available when booting with intel_idle.max_cstate=0? Makes no sense to me.
Comment 21 James Le Cuirot 2010-10-31 10:42:01 UTC
That setting effectively disables intel_idle so it was using acpi_idle instead. I don't think it affects acpi_idle?
Comment 22 Jean Delvare 2010-10-31 10:48:21 UTC
Ah, OK, I get it now, sorry for the noise.
Comment 23 Len Brown 2010-11-30 05:44:09 UTC
Thanks for verifying that acpi_idle can get into deep C-states
once they are enabled in the BIOS.

I can't explain why the BIOS for this board does not enable them
when you select the global SETUP defaults.  Perhaps a subsequent
version of the BIOS gets this right.

> ...if there is actually any point enabling
> EIST while using turbo mode and these extra C-states?

Yes.
see the output from turbostat -d -- it will tell you the LFM,
which is the low frequency mode associated with the lowest voltage
of the part.  That operating point is actually the most efficient
way to retire instructions.  The other operating points, including
all other P-states and all turbo states, are higher performance,
but also higher voltage, and thus -- by definition -- less efficient.
Without EIST, that high-efficiency operating point is not available.
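Len's "by definition" step follows from the usual first-order model of CMOS dynamic power, which is an assumption here rather than something stated in the thread: switching power scales with the square of the supply voltage and linearly with frequency, so the energy spent per clock cycle depends only on voltage.

```latex
P_{\mathrm{dyn}} \approx \alpha C V^{2} f
\qquad\Longrightarrow\qquad
E_{\mathrm{cycle}} = \frac{P_{\mathrm{dyn}}}{f} \approx \alpha C V^{2}
```

Here \alpha is the switching activity factor and C the switched capacitance. If the number of instructions retired per cycle stays roughly constant, the lowest-voltage operating point (LFM) minimizes energy per instruction. The model ignores static (leakage) power, so it is only a first-order argument.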
