I have an i7-875K and with intel_idle enabled, the readings from coretemp are very erratic. The reading from all four cores will frequently drop very low, sometimes even as low as 21°C, before returning to what I suppose is normal at around 30-40°C. Usually they will all drop to the same temperature but not always. For example, I might get 25/25/25/26. I would expect the temperature to drop when in a high C-State but given that 21°C is more or less room temperature and the way in which the readings suddenly rise and fall by as much as 10°C, I think that this is not working properly. My very uneducated guess would be that when a core is in C3 and/or C6, it cannot get a reading from the sensor and so it gets 0 instead. I did a bisect and found that the problem appeared as soon as intel_idle was introduced. I can confirm that the problem is still present as of linux-next-20100827. I'm CC'ing Len Brown as he is the author of intel_idle.
My own educated guess goes as follows:

* The temperature values returned from the MSR by modern Intel CPUs are not accurate when far away from the CPU's maximum temperature. The values aren't even actual degrees Celsius; they are arbitrary counts down from the maximum temperature. I don't know the exact maximum temperature of your CPU model, but it's typically in the 85-100°C range, so the 30-40°C you get definitely qualifies as far away from the CPU's maximum temperature.

* The goal of the intel_idle driver is to lower the power consumption of the CPU. Lower power consumption means lower temperature, so it is no surprise that the coretemp driver reports lower values. The exact numbers are irrelevant, as you are outside the range in which the coretemp driver reports temperature values remotely resembling degrees Celsius.

Your assumption that "it cannot get a reading from the sensor and so it gets 0 instead" doesn't hold up to technical examination. The coretemp driver reports a value which is computed as (Tmax - register_value), where Tmax is a constant. If the register value were 0, the driver would return Tmax (like 85°C or 100°C), which isn't the case here.

So I don't see anything that needs fixing here, other than adding a proper note that the coretemp driver doesn't report accurate degrees Celsius but relative, arbitrary values.
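To make the (Tmax - register_value) arithmetic above concrete, here is a minimal Python sketch. It assumes the digital readout sits in bits 22:16 of the raw thermal status MSR value, and it uses Tmax = 100°C purely as an example. The function name is made up for illustration and this is not the driver's actual code.

```python
# Sketch only: mirrors the (Tmax - register_value) arithmetic described above.
# Assumption: the digital readout occupies bits 22:16 of the raw MSR value.
READOUT_SHIFT = 16
READOUT_MASK = 0x7F


def coretemp_celsius(tmax_c, therm_status):
    """Return Tmax minus the digital readout extracted from the raw value."""
    readout = (therm_status >> READOUT_SHIFT) & READOUT_MASK
    return tmax_c - readout


if __name__ == "__main__":
    # A readout of 0 yields Tmax itself, not a low temperature.
    print(coretemp_celsius(100, 0))
    # With Tmax = 100, a reported 21 corresponds to a readout of 79 counts.
    print(coretemp_celsius(100, 79 << READOUT_SHIFT))
```

Under these assumptions, a sensor that returned 0 would show up as a reading at Tmax, so the low readings cannot be explained by a zero register value.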
The maximum reported temperature for this CPU is 99°C. I didn't think it was very accurate but I guess it's still a little less accurate than I expected. I would still like to hear from Len though, as he works for Intel, and is probably in the best position to comment on whether these erratic temperature changes reflect what is actually happening when power saving occurs - or whether a problem reading the sensor is more likely.
James, Please do an apples/apples comparison using CONFIG_INTEL_IDLE=y and boot with and without "intel_idle.max_cstate=0", which will disable intel_idle and run acpi_idle in the same kernel binary. Preferably use the latest upstream kernel, or at least 2.6.35.stable
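For anyone repeating this comparison, a small Python sketch like the following can confirm which value of "intel_idle.max_cstate" a given boot actually used. The helper name is hypothetical, and letting the last occurrence on the command line win is an assumption made for this sketch.

```python
# Sketch only: extracts intel_idle.max_cstate from a kernel command line string.
def intel_idle_max_cstate(cmdline):
    """Return the integer value of intel_idle.max_cstate, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "intel_idle.max_cstate" and sep:
            try:
                value = int(val)  # assumption: the last occurrence wins
            except ValueError:
                pass  # ignore a malformed value
    return value


if __name__ == "__main__":
    print(intel_idle_max_cstate("root=/dev/sda1 intel_idle.max_cstate=0 quiet"))
```

On a running system the string would typically come from /proc/cmdline.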
Hi Len. Just tried with 2.6.36-rc6 and it is definitely intel_idle.
FWIW, I am not able to reproduce the problem reported here on my Intel Xeon E5520 (kernel 2.6.36-rc7). I have intel_idle enabled, and values reported by the coretemp driver are stable around 50°C, as they were before.

James, I fear that your reply in comment #4 won't make Len happy. He asked you for a specific test; you should perform that test and report the result, rather than a bold and unhelpful "it is definitely intel_idle". When done with testing intel_idle.max_cstate=0, you could try values 1 and 2, for completeness. This will tell us whether a specific C-state is at fault, or the intel_idle driver as a whole.

I would also like to know whether your system works with the ACPI idle driver. Assuming you have CONFIG_ACPI_PROCESSOR enabled (and the "processor" module loaded if needed), the ACPI idle driver should run. At least it does on my system (Asus Z8NA-D6), and I observe essentially similar behavior when using the ACPI idle driver or the dedicated Intel idle driver (but I guess it depends on the BIOS to some extent.)

So please check the contents of file /sys/devices/system/cpu/cpuidle/current_driver when booting with intel_idle.max_cstate=0, and also the contents of directory /sys/devices/system/cpu/cpu0/cpuidle with and without this option. If the problem only happens with the intel_idle driver and not with the ACPI idle driver, maybe a comparison of these sysfs attributes will provide a hint.
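The sysfs checks requested above can be collected in one go. The following Python sketch reads the current_driver file and the per-state "name" files for cpu0. The function name and the cpu_root parameter are illustrative, and the stateN/name layout is assumed from the paths mentioned in this report.

```python
# Sketch only: gathers the idle driver name and the C-state names for one CPU.
import os


def idle_driver_info(cpu_root="/sys/devices/system/cpu", cpu="cpu0"):
    """Return (driver, [state names]); driver is None if it cannot be read."""
    driver = None
    try:
        with open(os.path.join(cpu_root, "cpuidle", "current_driver")) as f:
            driver = f.read().strip()
    except OSError:
        pass

    names = []
    state_dir = os.path.join(cpu_root, cpu, "cpuidle")
    if os.path.isdir(state_dir):
        # Sort by length first so that state10 would come after state9.
        for entry in sorted(os.listdir(state_dir), key=lambda e: (len(e), e)):
            try:
                with open(os.path.join(state_dir, entry, "name")) as f:
                    names.append(f.read().strip())
            except OSError:
                continue
    return driver, names


if __name__ == "__main__":
    print(idle_driver_info())
```

An empty list of names alongside a valid driver would point at the cpuidle directory being missing for that CPU.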
Sorry if I was brief but I did do as he asked. I have now checked it further as you have suggested, using 2.6.36-rc7.

With intel_idle.max_cstate=0, current_driver reports acpi_idle. With this option set to 1, 2 or 3 or not set at all (same as 3, I think?) then it reports intel_idle.

As for the behaviour, 1 didn't seem much different from 0. 2 had fairly low temps (29-35) but they were relatively stable. 3, as I said before, seems to be the same as no option at all, with the erratic behaviour.

The /sys/devices/system/cpu/cpu0/cpuidle directory doesn't exist under acpi_idle. With the option set to 1, I get state0 and state1, then state2 with 2 and state3 with 3. The names of these are...

C0
NHM-C1
NHM-C3
NHM-C6

I check the temperatures with gkrellm but I have noticed the "sensors" program doesn't always reflect the same temperatures at the moment you run it. gkrellm only updates once a second but it seems to pick up the lowest temperatures much more often than sensors does. I want to stress that sensors does pick up these temperatures sometimes - gkrellm isn't making them up.

But having seen how low the temperatures go without being erratic under option 2, I am now prepared to accept that this behaviour is a reflection of the fact that C6 is measurably cooler than C3 but only occurs a fraction of the time.
> The /sys/devices/system/cpu/cpu0/cpuidle directory
> doesn't exist under acpi_idle

That is abnormal. Possibly the failure here is actually that ACPI C-states are not working...

In ACPI mode, please share the output from 'cat /proc/acpi/processor/*/power' and also the complete dmesg.

I'd like to see the difference between 'grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*' with and without "intel_idle.max_cstate=0", but if that information is missing in the ACPI configuration, then we'll have to get it another way.

Please grab turbostat from here:

http://www.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/pmtools-latest/turbostat/turbostat.c

and run '# turbostat -v' for a bit and capture what is going on when you see the erratic temperature readings. Do it again with "intel_idle.max_cstate=0" to see the same system state in ACPI mode.
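The with/without comparison of the cpuidle attributes can also be scripted. Below is a rough Python sketch that snapshots every file matched by the same glob as the grep command above and lists the attributes that differ between two snapshots. Both function names are made up for this example, and saving one snapshot per boot for later comparison is left to the reader.

```python
# Sketch only: a scripted stand-in for comparing two runs of
# 'grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*'.
import glob
import os


def snapshot(pattern="/sys/devices/system/cpu/cpu*/cpuidle/*/*"):
    """Return {path: contents} for every readable file matching the glob."""
    values = {}
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                values[path] = f.read().strip()
        except OSError:
            continue  # skip unreadable attributes
    return values


def changed_keys(before, after):
    """Return the sorted paths whose value differs or exists on one side only."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))


if __name__ == "__main__":
    print(len(snapshot()), "attributes found")
```

Counters such as usage and time will naturally differ between boots, so the interesting differences are in attributes like name, latency and power, or in states that exist in one snapshot only.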
James, can you try ACPI mode, per comment #7?
Yes, sorry, just been very busy. Will try ASAP. :)
Created attachment 34862 dmesg when using acpi_idle
Created attachment 34872 dmesg when using intel_idle
Created attachment 34882 turbostat output when using acpi_idle
Created attachment 34892 turbostat output when using intel_idle

The large variations in C6 usage are interesting. This may well be what I'm seeing.
turbostat shows that intel_idle is succeeding in getting the package into pc6, c0 residency is 0.10%, and the frequency drops as low as 1.2GHz. That all looks good. You will also find with intel_idle that if you run a single-threaded application, that thread may be able to run as fast as 3.6GHz, which is also good.

turbostat shows that the acpi_idle case is not getting any deeper than c1, c0 time is over 2.00%, and the frequency is pegged at 3.2GHz. That is not good. The c1 limitation will prevent you from ever reaching a frequency higher than 3.2GHz, and using only C1 is a waste of power.

While there may be lower temperatures seen with intel_idle, and the reading may jump around, the larger bug here is that when you use acpi_idle, it is not reaching the idle power saving states that it should; and that will impact both your energy savings and your maximum performance.

Please attach the .config.
Please attach the output from acpidump.
Please build with CONFIG_ACPI_DEBUG=y, boot the acpi_idle kernel with acpi.debug_layer=0x20000000, and attach the dmesg.

BTW, the output from powertop -d in the acpi_idle case may also be useful.
oh, and another thing to try to get acpi_idle c-states to appear is "processor.nocst=1" (this uses the legacy FADT C-states, which we shouldn't have to use, but apparently something is wrong with getting the c-states via CST on this box)
Oh, please also check if you are running an up-to-date BIOS and that the SETUP defaults are in force, in particular any settings related to power management.
Created attachment 35552 2.6.36 kernel config

Again, sorry for the delay, my house has been upside down this week. Here is the kernel config. I run Gentoo and configure my own kernel so it's a little minimalistic but I'm pretty sure everything that's needed for this is enabled. I'll try to get the rest to you tomorrow.
Mystery solved, it seems. I checked the BIOS settings and found a deeply nested page that I hadn't seen before, mostly relating to C-states. While none of the options were disabled, all of them were on auto. I enabled most of them (all but CPU EIST and Bi-Directional PROCHOT) and booted with intel_idle.max_cstate=0. It now does seem to be using the other C-states.

# cat /sys/devices/system/cpu/cpu0/cpuidle/*/name
C0
C1
C2
C3

# turbostat
core CPU   %c0   GHz   TSC   %c1   %c3   %c6  %pc3  %pc6
          0.48  2.68  2.93  1.03  2.77 95.72  3.49 45.69
   0   0  0.06  1.24  2.93  0.16  5.64 94.14  3.49 45.69
   0   4  0.04  1.24  2.93  0.18  5.64 94.14  3.49 45.69
   1   1  0.45  1.88  2.93  3.80  0.26 95.49  3.49 45.69
   1   5  2.47  3.18  2.93  1.78  0.26 95.49  3.49 45.69
   2   2  0.07  1.23  2.93  0.22  3.53 96.18  3.49 45.69
   2   6  0.05  1.25  2.93  0.24  3.53 96.18  3.49 45.69
   3   3  0.39  1.44  2.93  0.88  1.65 97.09  3.49 45.69
   3   7  0.28  2.36  2.93  0.98  1.65 97.09  3.49 45.69

I don't know what "auto" really means here. Even the manual doesn't say under what conditions these would actually be enabled. If you're curious, check page 40 of my motherboard manual.

http://download.gigabyte.ru/manual/mb_manual_ga-p55a-ud6_e.pdf
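To check output like the turbostat table above for deep C-state residency without reading it by eye, a small parser is enough. The Python sketch below assumes the ten-column layout shown in this comment, with a summary row that lacks the core and CPU fields. The column list, the function names and the sample text (three rows copied from this comment) are specific to this example and not part of turbostat itself.

```python
# Sketch only: parses turbostat rows in the ten-column layout shown above.
COLUMNS = ["core", "CPU", "%c0", "GHz", "TSC", "%c1", "%c3", "%c6", "%pc3", "%pc6"]

SAMPLE = """core CPU %c0 GHz TSC %c1 %c3 %c6 %pc3 %pc6
 0.48 2.68 2.93 1.03 2.77 95.72 3.49 45.69
0 0 0.06 1.24 2.93 0.16 5.64 94.14 3.49 45.69
1 5 2.47 3.18 2.93 1.78 0.26 95.49 3.49 45.69
"""


def parse_rows(text):
    """Return one dict per data row; the summary row gets '-' for core and CPU."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "core":
            continue  # skip blank lines and the header
        if len(fields) == len(COLUMNS) - 2:
            fields = ["-", "-"] + fields  # summary row has no core/CPU columns
        if len(fields) != len(COLUMNS):
            continue  # skip anything that does not fit the assumed layout
        row = dict(zip(COLUMNS, fields))
        for name in COLUMNS[2:]:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def reaches_deep_cstates(rows):
    """True if any row shows non-zero residency in c3 or c6."""
    return any(r["%c3"] > 0 or r["%c6"] > 0 for r in rows)


if __name__ == "__main__":
    print(reaches_deep_cstates(parse_rows(SAMPLE)))
```

Run against a capture where the CPU never gets deeper than c1, the same check would return False.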
Before we close this, may I just ask if there is actually any point enabling EIST while using turbo mode and these extra C-states? I can't seem to find a clear answer out there.
James, how can you have C1-C3 available when booting with intel_idle.max_cstate=0? Makes no sense to me.
That setting effectively disables intel_idle so it was using acpi_idle instead. I don't think it affects acpi_idle?
Ah, OK, I get it now, sorry for the noise.
Thanks for verifying that acpi_idle can get into deep C-states once they are enabled in the BIOS.

I can't explain why the BIOS for this board does not enable them when you select the global SETUP defaults. Perhaps a subsequent version of the BIOS gets this right.

> ...if there is actually any point enabling
> EIST while using turbo mode and these extra C-states?

Yes. See the output from turbostat -d -- it will tell you the LFM, which is the low frequency mode associated with the lowest voltage of the part. That operating point is actually the most efficient way to retire instructions. The other operating points, including all other P-states and all turbo states, are higher performance, but also higher voltage, and thus -- by definition -- less efficient. Without EIST, that high-efficiency operating point is not available.
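As a rough illustration of why a higher-voltage operating point is less efficient, the usual first-order approximation for dynamic power is C * V^2 * f, which makes the energy per clock cycle proportional to V^2 and independent of frequency. The Python sketch below applies that approximation. It ignores leakage and other static power, and the two voltages are hypothetical values chosen for the example, not figures for this CPU.

```python
# Sketch only: first-order dynamic-power model, energy per cycle ~ C * V^2.
# Leakage and other static power are ignored in this approximation.
def relative_energy_per_cycle(voltage, reference_voltage):
    """Energy per cycle at `voltage` relative to `reference_voltage`."""
    return (voltage / reference_voltage) ** 2


if __name__ == "__main__":
    lfm_volts = 0.9    # hypothetical low frequency mode voltage
    turbo_volts = 1.2  # hypothetical turbo voltage
    ratio = relative_energy_per_cycle(turbo_volts, lfm_volts)
    print("turbo costs about %.2fx the energy per cycle of LFM" % ratio)
```

Under this simplified model, any operating point that needs more voltage than LFM spends more energy per cycle, which matches the reasoning above.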