My laptop keeps reporting "temperature above threshold" errors that always last 1ms and cannot be reported with ABRT. I'm pretty sure this is not a real temperature error condition, since it always gets "fixed" in 1ms. Things having to do with temperature always take a lot longer than that. Here's dmesg snippet: [93120.116714] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs [93300.544896] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs [93828.253704] CPU1: Core temperature above threshold, cpu clock throttled (total events = 65) [93828.253705] CPU5: Core temperature above threshold, cpu clock throttled (total events = 65) [93828.253717] CPU5: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.253720] CPU2: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.253721] CPU4: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.253722] CPU0: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.253723] CPU7: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.253724] CPU6: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.253725] CPU3: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.253728] CPU1: Package temperature above threshold, cpu clock throttled (total events = 462) [93828.254736] CPU1: Core temperature/speed normal [93828.254738] CPU5: Core temperature/speed normal [93828.254744] CPU2: Package temperature/speed normal [93828.254747] CPU6: Package temperature/speed normal [93828.254749] CPU5: Package temperature/speed normal [93828.254755] CPU7: Package temperature/speed normal [93828.254756] CPU3: Package temperature/speed normal [93828.254761] CPU0: Package temperature/speed normal [93828.254764] CPU4: Package temperature/speed normal [93828.254776] CPU1: Package temperature/speed normal [93858.439817] mce: [Hardware Error]: Machine check events logged [94151.554335] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1214) [94151.554346] CPU2: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.554348] CPU4: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.554348] CPU0: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.554349] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1214) [94151.554350] CPU6: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.554352] CPU7: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.554353] CPU3: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.554355] CPU1: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.554361] CPU5: Package temperature above threshold, cpu clock throttled (total events = 3285) [94151.555359] CPU5: Core temperature/speed normal [94151.555363] CPU2: Package temperature/speed normal [94151.555366] CPU1: Core temperature/speed normal [94151.555369] CPU0: Package temperature/speed normal [94151.555372] CPU4: Package temperature/speed normal [94151.555375] CPU6: Package temperature/speed normal [94151.555377] CPU1: Package temperature/speed normal [94151.555389] CPU7: Package temperature/speed normal [94151.555392] CPU3: Package temperature/speed normal [94151.555404] CPU5: Package temperature/speed normal [94308.961272] mce: [Hardware Error]: Machine check events logged Version-Release number of selected component (if applicable): [ 0.000000] Linux version 3.19.0-1.fc22.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 5.0.0 20150206 (Red Hat 5.0.0-0.8) (GCC) ) #1 SMP Mon Feb 9 08:42:43 UTC 2015 Running this kernel on Fedora 21 -- but even when running properly matching kernel (from fedora-updates), I saw same errors. How reproducible: Every day, ~once a day Steps to Reproduce: 1. Keep laptop on Additional info: * does not seem to coincide with heavy CPU activity * original bug reported on 3.19.0; now same behavior on 3.19.3
what is the model of your platform? please attach the output of "cat /proc/cpuinfo"
Here's the 1st of 8 entries (I believe rest are nearly identical): processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 60 model name : Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz stepping : 3 microcode : 0x1c cpu MHz : 3679.156 cache size : 6144 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt bugs : bogomips : 5587.01 clflush size : 64 cache_alignment : 64 address sizes : 39 bits physical, 48 bits virtual power management:
These messages are generated by thermal threshold interrupts generated by the BIOS. So it does think that you have reached a temperature where this needs immediate control. BIOS use aggressive method, so temperature will become below threshold immediately. So I suggest these steps to check: 1- go to cd /sys/class/hwmon/hwmonX (where X is from 0-n, where attribute name = "coretemp") You will something like this, if you do grep . * name:coretemp temp1_crit:105000 temp1_crit_alarm:0 temp1_input:41000 temp1_label:Physical id 0 temp1_max:105000 temp2_crit:105000 temp2_crit_alarm:0 temp2_input:39000 temp2_label:Core 0 temp2_max:105000 temp3_crit:105000 temp3_crit_alarm:0 temp3_input:38000 temp3_label:Core 1 temp3_max:105000 Usually the BIOS will control and send interrupt to OS, when tempX_input > tempX_max So you can do something like this in this folder while true; do cat temp1_input; sleep 1; done I think you do reach close to tempX_max temperature.
Soon (<1min) after a temperature event: # cd /sys/class/hwmon/hwmon1/ # grep . * --exclude-dir='*' name:coretemp temp1_crit:100000 temp1_crit_alarm:0 temp1_input:65000 temp1_label:Physical id 0 temp1_max:84000 temp2_crit:100000 temp2_crit_alarm:0 temp2_input:62000 temp2_label:Core 0 temp2_max:84000 temp3_crit:100000 temp3_crit_alarm:0 temp3_input:62000 temp3_label:Core 1 temp3_max:84000 temp4_crit:100000 temp4_crit_alarm:0 temp4_input:61000 temp4_label:Core 2 temp4_max:84000 temp5_crit:100000 temp5_crit_alarm:0 temp5_input:63000 temp5_label:Core 3 temp5_max:84000 # dmesg |tail -n### [37845.134740] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs [39467.310323] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs [39480.882088] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1) [39480.882100] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.882101] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1) [39480.882102] CPU4: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.882105] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.882106] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.882107] CPU7: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.882108] CPU6: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.882109] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.882113] CPU5: Package temperature above threshold, cpu clock throttled (total events = 1) [39480.883111] CPU5: Core temperature/speed normal [39480.883116] CPU3: Package temperature/speed normal [39480.883120] CPU4: Package temperature/speed normal [39480.883124] CPU0: Package temperature/speed normal [39480.883128] CPU2: Package temperature/speed normal [39480.883131] CPU7: Package temperature/speed normal [39480.883134] CPU1: Core temperature/speed normal [39480.883137] CPU6: Package temperature/speed normal [39480.883142] CPU1: Package temperature/speed normal [39480.883162] CPU5: Package temperature/speed normal [39526.852167] mce: [Hardware Error]: Machine check events logged Is _max a limit setting for the alarm or is it historical max reading? The only other neighbor symlink is hwmon0, and it has: # grep . * --exclude-dir='*' name:acpitz temp1_crit:120000 temp1_input:52000
max limit is a setting where BIOS/firmware should activate cooling. This is a predefined limit set by the manufacturer. Your temperature readings (tempX_input) is quite close the max, so it is likely that temperature reached max and came down as a result of action BIOS. So this displayed this message. If you are not running anything and your cpu temperature is 65C, then you may have some cooling issue on your platform (Is FAN running?). Better to run turbostat or system monitor and see utilization of your CPUs. You may have some daemon going crazy. For your hwmon0 question, this follows closely with coretemp. So suggest in your hwmon1 folder $ while true; do cat temp1_input; sleep 1; done You will see that your temperature sometime reaching close to max.
ping...
dmesg: [ 238.815752] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead. [ 1613.070169] CPU5: Core temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070170] CPU1: Core temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070183] CPU1: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070186] CPU5: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070188] CPU2: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070189] CPU4: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070190] CPU6: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070191] CPU0: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070192] CPU7: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.070193] CPU3: Package temperature above threshold, cpu clock throttled (total events = 120) [ 1613.071190] CPU1: Core temperature/speed normal [ 1613.071204] CPU5: Core temperature/speed normal [ 1613.071206] CPU1: Package temperature/speed normal [ 1613.071216] CPU4: Package temperature/speed normal [ 1613.071221] CPU7: Package temperature/speed normal [ 1613.071224] CPU3: Package temperature/speed normal [ 1613.071227] CPU0: Package temperature/speed normal [ 1613.071231] CPU2: Package temperature/speed normal [ 1613.071234] CPU6: Package temperature/speed normal [ 1613.071236] CPU5: Package temperature/speed normal $ while true; do echo "$(date) $(dmesg |tail -n1|cut -d] -f1)] $(cat temp1_input) $(cat temp2_input) $(cat temp3_input) $(cat temp4_input) $(cat temp5_input)"; sleep 1; done Fri Jun 26 10:37:23 PDT 2015 [ 238.815752] 78000 73000 77000 72000 72000 Fri Jun 26 10:37:24 PDT 2015 [ 238.815752] 92000 75000 92000 80000 75000 Fri Jun 26 10:37:25 PDT 2015 [ 238.815752] 77000 73000 76000 72000 72000 Fri Jun 26 10:37:26 PDT 2015 [ 238.815752] 89000 75000 89000 78000 71000 Fri Jun 26 10:37:27 PDT 2015 [ 238.815752] 90000 74000 91000 79000 73000 Fri Jun 26 10:37:28 PDT 2015 [ 238.815752] 77000 71000 76000 71000 71000 Fri Jun 26 10:37:29 PDT 2015 [ 238.815752] 74000 71000 73000 72000 71000 Fri Jun 26 10:37:30 PDT 2015 [ 238.815752] 81000 73000 80000 74000 71000 Fri Jun 26 10:37:31 PDT 2015 [ 238.815752] 81000 74000 81000 76000 71000 Fri Jun 26 10:37:32 PDT 2015 [ 238.815752] 78000 72000 77000 74000 71000 Fri Jun 26 10:37:33 PDT 2015 [ 238.815752] 79000 75000 81000 72000 72000 Fri Jun 26 10:37:34 PDT 2015 [ 238.815752] 80000 74000 79000 73000 72000 Fri Jun 26 10:37:35 PDT 2015 [ 238.815752] 83000 74000 81000 75000 72000 Fri Jun 26 10:37:36 PDT 2015 [ 238.815752] 91000 75000 92000 79000 73000 Fri Jun 26 10:37:37 PDT 2015 [ 238.815752] 94000 78000 93000 79000 72000 Fri Jun 26 10:37:38 PDT 2015 [ 238.815752] 94000 79000 94000 79000 72000 Fri Jun 26 10:37:39 PDT 2015 [ 238.815752] 83000 74000 81000 75000 71000 Fri Jun 26 10:37:40 PDT 2015 [ 238.815752] 82000 78000 82000 76000 72000 Fri Jun 26 10:37:41 PDT 2015 [ 238.815752] 76000 74000 76000 74000 71000 Fri Jun 26 10:37:42 PDT 2015 [ 238.815752] 77000 74000 75000 74000 70000 Fri Jun 26 10:37:43 PDT 2015 [ 1613.071236] 87000 77000 87000 80000 72000 Fri Jun 26 10:37:44 PDT 2015 [ 1613.071236] 77000 76000 75000 75000 70000 Fri Jun 26 10:37:45 PDT 2015 [ 1613.071236] 81000 77000 80000 77000 72000 Fri Jun 26 10:37:46 PDT 2015 [ 1613.071236] 78000 74000 79000 73000 72000 Fri Jun 26 10:37:47 PDT 2015 [ 1613.071236] 74000 73000 74000 72000 73000 Fri Jun 26 10:37:48 PDT 2015 [ 1613.071236] 91000 77000 91000 77000 74000 Fri Jun 26 10:37:49 PDT 2015 [ 1613.071236] 75000 71000 74000 72000 70000 Fri Jun 26 10:37:50 PDT 2015 [ 1613.071236] 91000 78000 92000 79000 71000 Fri Jun 26 10:37:51 PDT 2015 [ 1613.071236] 75000 72000 74000 73000 71000 Fri Jun 26 10:37:52 PDT 2015 [ 1613.071236] 73000 72000 72000 71000 72000 Fri Jun 26 10:37:53 PDT 2015 [ 1613.071236] 75000 73000 74000 73000 73000 Fri Jun 26 10:37:54 PDT 2015 [ 1613.071236] 71000 70000 70000 70000 70000 Fri Jun 26 10:37:55 PDT 2015 [ 1613.071236] 75000 73000 74000 72000 74000 Fri Jun 26 10:37:56 PDT 2015 [ 1613.071236] 90000 75000 88000 81000 73000 Fri Jun 26 10:37:57 PDT 2015 [ 1613.071236] 91000 75000 90000 79000 75000 Fri Jun 26 10:37:58 PDT 2015 [ 1613.071236] 91000 75000 91000 78000 72000 Fri Jun 26 10:37:59 PDT 2015 [ 1613.071236] 91000 75000 92000 78000 74000 Fri Jun 26 10:38:00 PDT 2015 [ 1613.071236] 91000 75000 91000 78000 71000 Fri Jun 26 10:38:01 PDT 2015 [ 1613.071236] 83000 80000 78000 82000 83000 Fri Jun 26 10:38:02 PDT 2015 [ 1613.071236] 75000 74000 73000 72000 70000 Fri Jun 26 10:38:03 PDT 2015 [ 1613.071236] 75000 75000 74000 70000 72000 Fri Jun 26 10:38:04 PDT 2015 [ 1613.071236] 79000 73000 80000 75000 70000 Fri Jun 26 10:38:05 PDT 2015 [ 1613.071236] 86000 77000 85000 74000 73000 Fri Jun 26 10:38:06 PDT 2015 [ 1613.071236] 79000 72000 79000 73000 73000 Fri Jun 26 10:38:07 PDT 2015 [ 1613.071236] 71000 70000 72000 70000 69000 Fri Jun 26 10:38:08 PDT 2015 [ 1613.071236] 69000 68000 70000 69000 68000 Fri Jun 26 10:38:09 PDT 2015 [ 1613.071236] 72000 70000 71000 70000 68000 Fri Jun 26 10:38:10 PDT 2015 [ 1613.071236] 72000 69000 70000 69000 69000 Fri Jun 26 10:38:11 PDT 2015 [ 1613.071236] 72000 70000 72000 70000 67000 Fri Jun 26 10:38:12 PDT 2015 [ 1613.071236] 76000 71000 76000 71000 69000
I started seeing the same after a kernel update in Fedora 20. It stayed even with a Fedora 21 upgrade. Went away for a few weeks but is back with the latest kernel update. Linux version 4.0.7-200.fc21.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC) ) #1 SMP Mon Jun 29 22:11:52 UTC 2015 [87365.887038] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1) [87365.887039] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1) [87365.887041] CPU6: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.887042] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.887044] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.887046] CPU4: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.887047] CPU7: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.887048] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.887049] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.887061] CPU5: Package temperature above threshold, cpu clock throttled (total events = 1) [87365.889032] CPU5: Core temperature/speed normal [87365.889033] CPU1: Core temperature/speed normal [87365.889035] CPU6: Package temperature/speed normal [87365.889036] CPU4: Package temperature/speed normal [87365.889037] CPU3: Package temperature/speed normal [87365.889038] CPU0: Package temperature/speed normal [87365.889039] CPU2: Package temperature/speed normal [87365.889039] CPU7: Package temperature/speed normal [87365.889040] CPU1: Package temperature/speed normal [87365.889048] CPU5: Package temperature/speed normal [87374.345270] mce: [Hardware Error]: Machine check events logged [87666.172496] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1356) [87666.172497] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1356) [87666.172499] CPU4: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.172500] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.172500] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.172519] CPU5: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.172529] CPU6: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.172530] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.172532] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.172533] CPU7: Package temperature above threshold, cpu clock throttled (total events = 1393) [87666.174531] CPU5: Core temperature/speed normal
Your temperature log shows temperature around 90C, which is high enough for some devices to be throttled by BIOS. This is a real temperature condition.
This is not a thermal bug as system is acting to compensate for high temperature. If kernel upgrade caused this then run turbostat and top to compare numbers and cause of the driver/process which is causing this.
I haven't seen any more events since upgrading to Linux version 4.0.7-300.fc22.x86_64. I used turbostat to monitor and I heard the fan rpm kick-up once the temp reached around 80. The fan ran high for about a minute then cut the rpms again as temp dropped. Temp seems to hover between 65 - 69. I would get the machine check events in the past out of the blue. No high rpm fan cut in, no warning, just a machine check. Will debug if it happens again.
any updates? does the machine check event happen again?
Actually I haven't seen those since almost a month ago, ~August 8th (according to btrfs backups), with kernel 4.0.8 According to yum log, kernel 4.1.3 was installed around that time (7th) -- that probably fixed the problem
Since this issue doesn't occur, Can we close this issue?
Just stumbled across this BZ in my own search. I've been seeing the same errors on two different laptops across several kernel versions, in Fedora 22-23-24-25 and just for fun - CentOS7. Please let me know what information I need to provide to help move the fix along Should I open a new BZ or can we reopen this one?