I have a Supermicro motherboard with an Intel Atom C2758 processor where coretemp does not seem to be working correctly. The core temperature sensor appears to be detected correctly with sensors-detect and shows up as expected in the output of running 'sensors' as coretemp-isa-0000. However, the temperatures reported only update for a short period of time after the system boots. After that, the reported temperatures get 'stuck' and do not change. I can get the overall CPU temperature from the IPMI interface with ipmitool, and this value does change over time even after the values reported by coretemp stop updating. I don't see any interesting messages in dmesg. The entries in /sys/devices/platform/coretemp.0 also do not change after the output of sensors gets stuck, so the problem is definitely either a hardware issue or a driver issue, not an issue with lm-sensors. Reloading the coretemp module also does not help. The values reported by sensors and sysfs are the same before and after reloading the driver. 
cpuinfo (1 core out of 8):

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 77
model name : Intel(R) Atom(TM) CPU C2758 @ 2.40GHz
stepping : 8
microcode : 0x127
cpu MHz : 2400.000
cache size : 1024 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch epb tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm arat
bugs :
bogomips : 4802.11
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

output of sensors:

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +47.0°C (high = +98.0°C, crit = +98.0°C)
Core 1: +47.0°C (high = +98.0°C, crit = +98.0°C)
Core 2: +47.0°C (high = +98.0°C, crit = +98.0°C)
Core 3: +47.0°C (high = +98.0°C, crit = +98.0°C)
Core 4: +46.0°C (high = +98.0°C, crit = +98.0°C)
Core 5: +46.0°C (high = +98.0°C, crit = +98.0°C)
Core 6: +47.0°C (high = +98.0°C, crit = +98.0°C)
Core 7: +46.0°C (high = +98.0°C, crit = +98.0°C)

sysfs entries:

$ cat /sys/devices/platform/coretemp.0/hwmon/hwmon0/temp*_input
47000
47000
47000
47000
46000
46000
47000
46000
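For anyone trying to confirm the same symptom, here is a minimal sketch that samples the sysfs files shown above and lists the sensors whose value never changed. The function names, the sample count, and the interval are my own choices, and the glob uses `hwmon*` instead of `hwmon0` on the assumption that the hwmon index can differ between boots. A sensor on an idle machine can legitimately sit at one value, so this only means something if the CPU load changes while it runs.

```python
import glob
import time

# Assumed path pattern, generalised from the hwmon0 path shown in this report.
PATTERN = "/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp*_input"


def read_temps(pattern=PATTERN):
    """Return {path: millidegrees C} for every coretemp input file that matches."""
    temps = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            temps[path] = int(f.read().strip())
    return temps


def stuck_sensors(samples):
    """Given a list of {sensor: value} samples, return the sensors that never changed."""
    if len(samples) < 2:
        return []  # one sample cannot show whether a value is stuck
    first = samples[0]
    return sorted(
        name for name in first
        if all(s.get(name) == first[name] for s in samples[1:])
    )


def watch(count=30, interval=2.0, reader=read_temps):
    """Take `count` samples, `interval` seconds apart, and report the stuck sensors."""
    samples = []
    for _ in range(count):
        samples.append(reader())
        time.sleep(interval)
    return stuck_sensors(samples)


if __name__ == "__main__":
    for path in watch():
        print("no change:", path)
```

The `reader` parameter exists only so the comparison logic can be exercised without real hardware.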
Just updated to kernel version 4.10.1; no change.
The temperatures reported by the coretemp driver are read directly from an MSR, so this sounds like a hardware issue to me. Please also attach the turbostat output, captured while the problem is reproduced.
sensors output:

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +39.0°C (high = +98.0°C, crit = +98.0°C)
Core 1: +39.0°C (high = +98.0°C, crit = +98.0°C)
Core 2: +38.0°C (high = +98.0°C, crit = +98.0°C)
Core 3: +38.0°C (high = +98.0°C, crit = +98.0°C)
Core 4: +38.0°C (high = +98.0°C, crit = +98.0°C)
Core 5: +38.0°C (high = +98.0°C, crit = +98.0°C)
Core 6: +36.0°C (high = +98.0°C, crit = +98.0°C)
Core 7: +36.0°C (high = +98.0°C, crit = +98.0°C)

turbostat output, with turbostat.c edited to force no_MSR_MISC_PWR_MGMT to 1 to avoid an I/O error while reading msr 0x1aa:

$ sudo ./turbostat --debug
turbostat version 17.04.12 - Len Brown <lenb@kernel.org>
CPUID(0): GenuineIntel 11 CPUID levels; family:model:stepping 0x6:4d:8 (6:77:8)
CPUID(1): SSE3 MONITOR - EIST TM2 TSC MSR ACPI-TM TM
CPUID(6): APERF, No-TURBO, DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, EPB
cpu5: MSR_IA32_MISC_ENABLE: 0x00850089 (TCC EIST No-MWAIT PREFETCH TURBO)
CPUID(7): No-SGX
SLM BCLK: 100.0 Mhz
RAPL: 2185 sec. Joule Counter Range, at 30 Watts
cpu5: MSR_PLATFORM_INFO: 0xc0080001800
12 * 100.0 = 1200.0 MHz max efficiency frequency
24 * 100.0 = 2400.0 MHz base frequency
cpu5: MSR_IA32_POWER_CTL: 0x00000000 (C1E auto-promotion: DISabled)
cpu5: MSR_TURBO_RATIO_LIMIT: 0x00000000
cpu5: MSR_PKG_CST_CONFIG_CONTROL: 0x0000840e (locked: pkg-cstate-limit=14: pc6)
cpu5: POLL: CPUIDLE CORE POLL IDLE
cpu5: C1: MWAIT 0x00
cpu5: C6: MWAIT 0x51
cpu5: cpufreq driver: acpi-cpufreq
cpu5: cpufreq governor: schedutil
cpu0: MSR_IA32_ENERGY_PERF_BIAS: 0x00000004 (custom)
cpu0: MSR_RAPL_POWER_UNIT: 0x000a1003 (0.125000 Watts, 0.000015 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x468bb8005b89c4 (UNlocked)
cpu0: PKG Limit #1: ENabled (312.500000 Watts, 10.000000 sec, clamp ENabled)
cpu0: PKG Limit #2: ENabled (375.000000 Watts, 0.009766* sec, clamp DISabled)
cpu0: MSR_PP0_POWER_LIMIT: 0x00020000 (UNlocked)
cpu0: Cores Limit: DISabled (0.000000 Watts, 0.001953 sec, clamp DISabled)
cpu0: MSR_IA32_TEMPERATURE_TARGET: 0x00620000 (98 C)
cpu0: MSR_IA32_THERM_STATUS: 0x883b0000 (39 C +/- 1)
cpu0: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
cpu1: MSR_IA32_THERM_STATUS: 0x883b0000 (39 C +/- 1)
cpu1: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
cpu2: MSR_IA32_THERM_STATUS: 0x883c0000 (38 C +/- 1)
cpu2: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
cpu3: MSR_IA32_THERM_STATUS: 0x883c0000 (38 C +/- 1)
cpu3: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
cpu4: MSR_IA32_THERM_STATUS: 0x883c0000 (38 C +/- 1)
cpu4: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
cpu5: MSR_IA32_THERM_STATUS: 0x883c0000 (38 C +/- 1)
cpu5: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
cpu6: MSR_IA32_THERM_STATUS: 0x883e0000 (36 C +/- 1)
cpu6: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
cpu7: MSR_IA32_THERM_STATUS: 0x883e0000 (36 C +/- 1)
cpu7: MSR_IA32_THERM_INTERRUPT: 0x000a0507 (88 C, 93 C)
Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ  SMI C1  C6   C1%  C6%   CPU%c1 CPU%c6 CoreTmp Pkg%pc3 Pkg%pc6 PkgWatt CorWatt
-    -   33      1.74  1900    2400    7395 0   223 7435 0.10 98.20 0.38   97.88  39      0.00    0.00    0.00    0.00
0    0   24      1.25  1900    2400    719  0   23  710  0.21 98.57 0.44   98.31  39      0.00    0.00    0.00    0.00
1    1   29      1.54  1900    2400    1025 0   18  811  0.05 98.44 0.29   98.16  39
2    2   33      1.76  1900    2400    1010 0   15  1191 0.02 98.27 0.34   97.90  38
3    3   22      1.14  1900    2400    979  0   21  957  0.06 98.84 0.32   98.54  38
4    4   27      1.40  1900    2400    629  0   36  838  0.02 98.62 0.28   98.33  38
5    5   28      1.46  1900    2400    749  0   36  1010 0.02 98.57 0.32   98.22  38
6    6   72      3.78  1900    2400    1233 0   30  925  0.26 96.00 0.57   95.65  36
7    7   30      1.58  1900    2400    1051 0   44  993  0.16 98.31 0.49   97.93  36
(In reply to Alex Forencich from comment #0)
>
> cpuinfo (1 core out of 8):
>
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 77
> model name : Intel(R) Atom(TM) CPU C2758 @ 2.40GHz

#define INTEL_FAM6_ATOM_SILVERMONT2 0x4D /* Avaton/Rangely */

Model 77 is 0x4D, so this is an Avoton platform.
The turbostat output is consistent with the coretemp driver output. It seems that the real problem is that the MSR stops updating...
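To make the consistency check explicit, here is a small sketch that decodes the raw MSR values from the turbostat output above. The bit layout is the one described in the Intel SDM (digital readout in bits 22:16, resolution in bits 30:27, valid flag in bit 31, TjMax in bits 23:16 of the temperature target MSR); the function name and the returned dictionary keys are illustrative only.

```python
def decode_therm_status(therm_status, temperature_target):
    """Decode a raw IA32_THERM_STATUS value against MSR_IA32_TEMPERATURE_TARGET."""
    tjmax = (temperature_target >> 16) & 0xFF   # bits 23:16: TjMax in degrees C
    readout = (therm_status >> 16) & 0x7F       # bits 22:16: degrees below TjMax
    resolution = (therm_status >> 27) & 0xF     # bits 30:27: resolution in degrees C
    valid = bool((therm_status >> 31) & 1)      # bit 31: reading valid
    return {
        "temp_c": tjmax - readout,
        "tjmax_c": tjmax,
        "resolution_c": resolution,
        "valid": valid,
    }


if __name__ == "__main__":
    # Values for cpu0 and cpu6 taken from the turbostat output in this report.
    print(decode_therm_status(0x883B0000, 0x00620000))
    print(decode_therm_status(0x883E0000, 0x00620000))
```

For cpu0 this gives 98 - 59 = 39 C with the valid bit set, which matches both the turbostat annotation and the `sensors` reading for Core 0, so the driver is reporting what the register contains.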
Please:
1. run "turbostat --debug --out turbostat.log"
2. stress the CPU to make sure the temperature rises
3. quit turbostat and attach turbostat.log here

We can then check whether the other MSRs are updated properly.
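Any load generator will do for step 2. If none is installed, the following is a minimal sketch that keeps every CPU busy for a fixed time; the function names and the 60 second duration are arbitrary choices of mine, not something turbostat requires. Run it in a second terminal while turbostat is logging.

```python
import multiprocessing
import time


def busy_loop(seconds):
    """Spin on a counter for roughly `seconds`; return the number of iterations."""
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1
    return n


def stress_all_cores(seconds):
    """Start one busy loop per CPU and wait for all of them to finish."""
    workers = [
        multiprocessing.Process(target=busy_loop, args=(seconds,))
        for _ in range(multiprocessing.cpu_count())
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


if __name__ == "__main__":
    stress_all_cores(60)  # assumed duration; lengthen it if the temperature barely moves
```

Separate processes are used rather than threads because CPython threads would not load more than one core at a time.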
Bug closed because there has been no response from the bug reporter. Please feel free to reopen it if you can provide the information requested in comment #6.