Bug 173361 - Under heavy load, the CPU speed suddenly and irreversibly drops from 3500 to 400 MHz
Status: CLOSED DOCUMENTED
Alias: None
Product: Power Management
Classification: Unclassified
Component: Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: Srinivas Pandruvada
URL: https://bugzilla.redhat.com/show_bug....
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-29 14:57 UTC by Larry Finger
Modified: 2023-03-10 16:32 UTC

See Also:
Kernel Version: 4.7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Script to continuously monitor package temperature (241 bytes, text/plain)
2016-09-29 18:28 UTC, Doug Smythies
Output from perf record -a --event=power:pstate_sample command (2.52 MB, application/gzip)
2016-09-30 17:24 UTC, Larry Finger
CPU frequencies from Larry's trace data (34.66 KB, image/png)
2016-09-30 18:16 UTC, Doug Smythies
CPU frequencies from Doug's trace data (12.84 KB, image/png)
2016-09-30 18:19 UTC, Doug Smythies
A detail example of a moment of higher CPU frequency - Larry (22.03 KB, image/png)
2016-09-30 18:29 UTC, Doug Smythies
Output of dmesg command (63.80 KB, text/plain)
2016-09-30 19:23 UTC, Larry Finger
Output from "turbostat --debug sleep 10" (3.43 KB, text/plain)
2016-09-30 19:51 UTC, Larry Finger
CPU freqs, Doug 50% modulation - no load (23.76 KB, image/png)
2016-09-30 22:16 UTC, Doug Smythies
Package Temperature for two methods of temperature limiting 1 of 2 (49.37 KB, image/png)
2016-10-02 18:37 UTC, Doug Smythies
Package Temperature for two methods of temperature limiting 2 of 2 (49.27 KB, image/png)
2016-10-02 18:41 UTC, Doug Smythies
simple script to monitor temperature and cpu0 frequency (956 bytes, text/plain)
2016-10-06 06:53 UTC, Doug Smythies

Description Larry Finger 2016-09-29 14:57:55 UTC
My Toshiba Tecra A50-A laptop has a CPU described as 'Model: 6.60.3 "Intel(R) Core(TM) i7-4600M CPU @ 2.90GHz"'. Under heavy load, this dual-core unit with hyperthreading will suddenly drop from a frequency of 3500 to 410 MHz. The only way to recover is to reboot. See http://lkml.iu.edu/hypermail/linux/kernel/1609.3/00720.html for some details and other discussion.

This bug is being filed under power management because the 'sensors' command shows the following:

finger@linux-1t8h:~> sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +88.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:         +88.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:         +77.0°C  (high = +84.0°C, crit = +100.0°C)

Routinely, at least one of the CPUs has a temperature higher than the "high" value. As Rafael Wysocki states "It looks like in 4.8-rc we made a change that caused the "high" trip point to be acted on." In that case, the bug would be that the frequency is never restored.
Comment 1 Doug Smythies 2016-09-29 16:40:29 UTC
For what it is worth, some data from my computer:

. I get the exact same steady state temperature and package power under full load with both kernel 4.7 and 4.8-rc8.

. I messed with the cooling so that it could exceed the high limit, and when it did nothing tripped (as expected):

doug@s15:~/temp2$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +81.0°C  (high = +80.0°C, crit = +98.0°C)
Core 0:         +77.0°C  (high = +80.0°C, crit = +98.0°C)
Core 1:         +81.0°C  (high = +80.0°C, crit = +98.0°C)
Core 2:         +77.0°C  (high = +80.0°C, crit = +98.0°C)
Core 3:         +78.0°C  (high = +80.0°C, crit = +98.0°C)

For this comment from the e-mail thread: "Hmm, I would not expect the CPU to drop from 80 to 40 degrees in a few seconds if the fan is not spinning.  I wouldn't even expect it if the fan was spinning.  I would think at least 30 to 60 seconds if not more."

Going from a steady-state, full-load temperature of 78 degrees C to 0 load, I see: a 15 degree drop in 1 second; an 18 degree drop in 2 seconds; a 22 degree drop in 10 seconds; a 25 degree drop in 20 seconds.

For the original post comment: "In that case, the bug would be that the frequency is never restored."
It isn't supposed to restore. I do not know why in this case it is kicking in at 50%, usually it is less. Regardless, the current control algorithm in the intel_pstate driver is fundamentally incompatible with clock modulation, and will always drive the CPU down to the minimum * the modulation %, regardless of load. Other drivers typically drive the CPU frequency to what would normally be desired * modulation % (and for the most part users don't even notice).
Comment 2 Larry Finger 2016-09-29 17:18:02 UTC
I have not measured the drop in temperature with change in load, but I know it happens very quickly, and that the fan speed is very sensitive to the load. Even when running 4 infinite loops, sending an E-mail with Thunderbird will decrease the CPU utilization enough that the fan will momentarily slow.
Comment 3 Doug Smythies 2016-09-29 18:28:04 UTC
Created attachment 240221
Script to continuously monitor package temperature

I guess the point is that one cannot check after the event has triggered and obtain temperatures anywhere close to those at the time of the event. Perhaps the attached script could be used to monitor package temperature from well before to well after the event.
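
For readers without access to the attachment, here is a minimal sketch of this kind of monitor. It is not the attached script; it assumes the package temperature is exposed as a sysfs file in millidegrees Celsius (for example a thermal_zone "temp" file, whose zone number varies between machines). The function name and arguments are made up for illustration.

```shell
#!/bin/bash
# Log a timestamped temperature reading at a fixed interval.
# $1: sysfs file reporting millidegrees Celsius (path is an assumption, see usage)
# $2: number of samples to take (0 = run until interrupted)
# $3: seconds between samples (default 1)
monitor_temp() {
    local file=$1 count=${2:-0} interval=${3:-1} n=0 milli
    while (( count == 0 || n < count )); do
        read -r milli < "$file" || return 1
        # Print wall-clock time and whole degrees Celsius.
        printf '%s %d\n' "$(date +%T)" $(( milli / 1000 ))
        n=$(( n + 1 ))
        if (( count == 0 || n < count )); then
            sleep "$interval"
        fi
    done
    return 0
}

# Usage (hypothetical zone number; check the zone's "type" file first):
#   monitor_temp /sys/class/thermal/thermal_zone1/temp 0 1 >> temps.log
```

Leaving this running in the background means the log covers the period from well before to well after the slowdown.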
Comment 4 Larry Finger 2016-09-29 19:29:11 UTC
Thanks. I'm running the script with the continuing test of 4.7, which has been running for a little over 30 hours without triggering the event.
Comment 5 Srinivas Pandruvada 2016-09-29 21:57:18 UTC
Just for keeping record: Posting thermal tables from this device:
/sys/class/thermal/thermal_zone*
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/temp:16000
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone0/trip_point_0_temp:102000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/passive:0
/sys/class/thermal/thermal_zone0/available_policies:user_space bang_bang fair_share step_wise
/sys/class/thermal/thermal_zone1/temp:43000
/sys/class/thermal/thermal_zone1/type:x86_pkg_temp
/sys/class/thermal/thermal_zone1/slope:0
/sys/class/thermal/thermal_zone1/trip_point_0_temp:0
/sys/class/thermal/thermal_zone1/trip_point_0_type:passive
/sys/class/thermal/thermal_zone1/trip_point_1_temp:0
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/offset:0
/sys/class/thermal/thermal_zone1/policy:step_wise
/sys/class/thermal/thermal_zone1/available_policies:user_space bang_bang fair_share step_wise

Also, there is no thermald running.
Comment 6 Srinivas Pandruvada 2016-09-29 22:24:53 UTC
On Intel processors, clock modulation is the last resort before HW takes over. Here we got stuck in that state, which means the firmware thinks the system is still not in the right state.

Can you copy and run this script once stuck?
Before that:
# cd /sys/class/thermal/cooling_device?
where the "type" attribute is "processor".


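#!/bin/bash
# Run as root from inside the cooling_device directory chosen above.

# Step the throttle state up from 0 to 10, one step per second,
# reading back the current state after each write.
i=0
while (( i <= 10 )); do
    echo $(( i++ )) > cur_state
    sleep 1
    cat cur_state
done

echo reset again

# Step back down from 10 to 0 to release the throttling again.
i=10
while (( i >= 0 )); do
    echo $(( i-- )) > cur_state
    sleep 1
    cat cur_state
done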
Comment 7 Srinivas Pandruvada 2016-09-29 22:31:52 UTC
Also, when we are in this state, please run a trace with some workload:

# cd /sys/kernel/debug/tracing/
# echo 1 > events/power/pstate_sample/enable
# echo 1 > events/power/cpu_frequency/enable
# cat trace
Comment 8 Rafael J. Wysocki 2016-09-29 23:12:41 UTC
(In reply to Srinivas Pandruvada from comment #6)
> In Intel processors clock modulation is last resort before HW takes over.
> Here we got stuck in that state, it means that the firmware thinks that it
> is still in not right state.

Or there is a bug in the SMM code that prevents it from turning the clock modulation off.

@Larry: I wonder if bumping up the min frequency from sysfs while in this state makes any difference.  Can you please try that too?
Comment 9 Rafael J. Wysocki 2016-09-29 23:16:38 UTC
That is, echo twice the number from

/sys/devices/system/cpu/cpufreq/policy0/cpuinfo_min_freq

into

/sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq

and see what happens.
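
A minimal sketch of that step as a shell function, assuming it is run as root and that policy0 is the policy of interest. The function name is made up for illustration; the sysfs file names are the ones given above.

```shell
#!/bin/bash
# Write twice cpuinfo_min_freq into scaling_min_freq for one cpufreq policy.
# $1: policy directory (defaults to policy0, as in the paths above)
double_min_freq() {
    local policy=${1:-/sys/devices/system/cpu/cpufreq/policy0} min
    read -r min < "$policy/cpuinfo_min_freq" || return 1
    # Both files hold frequencies in kHz.
    echo $(( min * 2 )) > "$policy/scaling_min_freq"
}

# Usage (as root):
#   double_min_freq
#   cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
```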
Comment 10 Doug Smythies 2016-09-29 23:28:44 UTC
(In reply to Srinivas Pandruvada from comment #7)
> Also when we are in this state: Please run trace with some workload
> 
> # cd /sys/kernel/debug/tracing/
> # echo 1 > events/power/pstate_sample/enable
> # echo 1 > events/power/cpu_frequency/enable
> # cat trace

Isn't the "cpu_frequency" trace redundant, because it is already included in the pstate_sample trace? The above method means we can not use the post processing tools to automatically parse, separate, and graph everything. If this is done instead:

sudo perf record -a --event=power:pstate_sample sleep 300

Then the post processing tools can do in less than a minute what has always taken (well, me at least) a few hours to do manually.

The post processing tools were started by Dirk, but I guess I am the only one that still uses (and maintains) them.

We have already analyzed and understand why clock modulation is incompatible with the current algorithm of the intel_pstate driver. I'll see if I can find some reference links.
Comment 11 Srinivas Pandruvada 2016-09-30 00:02:00 UTC
@Doug
Any method is fine. 

I would not call it incompatible. Someone needs to disable T-states. P-states shouldn't be used before T-states. Otherwise, in a genuine case where thermals are not under control, this can cause an even worse issue: a power-down by the processor.
Comment 12 Srinivas Pandruvada 2016-09-30 00:04:01 UTC
That's why we have the ACPI thermal table, which manufacturers should add to prevent entering this condition.
Comment 13 Srinivas Pandruvada 2016-09-30 00:45:38 UTC
Sorry, I need to add this: before you run the script in comment 6, and before you do anything else, run the following the moment you think you are stuck:

sudo rdmsr --all 0x19A
sudo rdmsr --all 0x19B
sudo rdmsr --all 0x19C
sudo rdmsr --all 0x19D
sudo rdmsr --all 0x1A0
sudo rdmsr --all 0x1B1
sudo rdmsr --all 0x1B2

At least you are on the other side of the problem. I am trying to help someone who gets a catastrophic shutdown within a few seconds of thermal throttling, once they start running Java NetBeans.
Comment 14 Rafael J. Wysocki 2016-09-30 00:47:15 UTC
(In reply to Srinivas Pandruvada from comment #12)
> That's why we have ACPI thermal table, which manufactures should add to
> prevent entering to this condition.

Not if SMM is supposed/expected to take care of thermal management.
Comment 15 Doug Smythies 2016-09-30 00:49:47 UTC
(In reply to Rafael J. Wysocki from comment #8)
> 
> @Larry: I wonder if bumping up the min frequency from sysfs while in this
> state makes any difference.  Can you please try that too?

That should work towards increasing the CPU frequency.
I did it on my computer: With a nominal minimum CPU frequency of 1600 MHz, I set 50% clock modulation (manually) resulting in 800 MHz, even under full load.

Then I did this (different method than was suggested):

# echo "70" > /sys/devices/system/cpu/intel_pstate/min_perf_pct

and got:

cpu MHz         : 1299.968
cpu MHz         : 1299.968
cpu MHz         : 1299.968
cpu MHz         : 1299.968
cpu MHz         : 1299.968
cpu MHz         : 1299.968
cpu MHz         : 1299.968
cpu MHz         : 1299.968

3800 * 0.7 * 0.5 ≈ 1300 MHz
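
As a worked check of that arithmetic: the plain product 3800 * 0.7 * 0.5 is 1330 MHz, while the observed value is 1299.968 MHz. The sketch below reproduces 1300 under the assumption (not confirmed anywhere in this report) that the requested P-state is truncated to a whole ratio of a 100 MHz bus clock, so 38 * 0.7 = 26.6 becomes 26. The function name is made up for illustration.

```shell
#!/bin/bash
# Effective frequency in MHz under clock modulation.
# $1: maximum P-state ratio (38 for 3800 MHz, assuming a 100 MHz bus clock)
# $2: min_perf_pct
# $3: clock-modulation duty cycle in percent
effective_mhz() {
    local max_ratio=$1 min_pct=$2 duty_pct=$3
    # Integer division truncates 26.6 to 26 (assumed rounding behaviour).
    local ratio=$(( max_ratio * min_pct / 100 ))
    echo $(( ratio * 100 * duty_pct / 100 ))
}

effective_mhz 38 70 50   # prints 1300
```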
Comment 16 Rafael J. Wysocki 2016-09-30 00:52:07 UTC
(In reply to Doug Smythies from comment #10)
> (In reply to Srinivas Pandruvada from comment #7)
> > Also when we are in this state: Please run trace with some workload
> > 
> > # cd /sys/kernel/debug/tracing/
> > # echo 1 > events/power/pstate_sample/enable
> > # echo 1 > events/power/cpu_frequency/enable
> > # cat trace
> 
> Isn't the "cpu_frequency" trace redundant, because it is already included in
> the pstate_sample trace? The above method means we can not use the post
> processing tools to automatically parse, separate, and graph everything. If
> this is done instead:
> 
> sudo perf record -a --event=power:pstate_sample sleep 300
> 
> Then the post processing tools can do in less than a minute what has always
> taken (well, me at least) a few hours to do manually.
> 
> The post processing tools were started by Dirk, but I guess I am the only
> one that still uses (and maintains) them.

Can you send those to me or Srinivas, please?

> We have already analyzed and understand why clock modulation is incompatible
> with the current algorithm of the intel_pstate driver. I'll see if I can
> find some reference links.

As Srinivas said, this doesn't explain why the processor gets stuck at 400 MHz even after the thermal stress has gone away.

The SMM code should stop applying clock modulation at that point and the measured frequency should go back to 800 MHz (and probably stay there).
Comment 17 Rafael J. Wysocki 2016-09-30 00:55:30 UTC
(In reply to Doug Smythies from comment #15)
> (In reply to Rafael J. Wysocki from comment #8)
> > 
> > @Larry: I wonder if bumping up the min frequency from sysfs while in this
> > state makes any difference.  Can you please try that too?
> 
> That should work towards increases the CPU frequency.

Yes, it should, but I wonder if it *does* on the Larry's system.
Comment 18 Srinivas Pandruvada 2016-09-30 00:58:01 UTC
No, SMM is the last resort. It is mandatory to have thermal controls implemented at this level. On many PCs you will see lots of power control messages in dmesg. They are generally quite good.

Other OSes have a built-in thermal manager that runs in user space (similar to thermald), so they probably never reach this stage, and it does not get good validation.
Comment 19 Rafael J. Wysocki 2016-09-30 01:17:36 UTC
(In reply to Srinivas Pandruvada from comment #18)
> No SMM, is last resort. This is mandatory to have thermal controls
> implemented at this level. On many several PCs you will see lots of power
> control messages in dmesg. They are generally quite good.

That depends AFAICS.  On some systems there's SMM only (I have one of these here).
Comment 20 Srinivas Pandruvada 2016-09-30 03:58:30 UTC
It looks like I was not clear; that is what I am saying. You need to have thermal controls outside the OS (e.g. in SMM or so). Usually they are quite good and sufficient for thermals.

But they have limitations, hence the OS needs to get involved in thermal management in some cases.

Linux thermal management allows writing a passive trip even if there are no ACPI thermal bindings.
This is present in /sys/class/thermal/thermal_zone0/ on this platform:

Something like this may help avoid triggering the SMM thermal algorithm in the worst case:
# echo 86000 > passive
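
A small sketch of the same write as a function, assuming it is run as root and that the zone has a "passive" attribute taking millidegrees Celsius. The function name and its arguments are made up for illustration.

```shell
#!/bin/bash
# Set a passive trip point on a thermal zone.
# $1: thermal zone directory
# $2: trip temperature in whole degrees Celsius
set_passive_trip() {
    local zone=$1 celsius=$2
    # The sysfs attribute expects millidegrees Celsius.
    echo $(( celsius * 1000 )) > "$zone/passive"
}

# Usage (as root), equivalent to the command above:
#   set_passive_trip /sys/class/thermal/thermal_zone0 86
```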
Comment 21 Doug Smythies 2016-09-30 07:10:24 UTC
(In reply to Rafael J. Wysocki from comment #16)
> 
> Can you send those to me or Srinivas, please?
> 

Sent via separate e-mail. And it includes a short 50% Clock Modulation trace from my computer.

>> We have already analyzed and understand why clock modulation is incompatible
>> with the current algorithm of the intel_pstate driver. I'll see if I can
>> find some reference links.
> 
> As Srinivas said, this doesn't explain why the processor gets stuck at 400
> MHz even after the thermal stress has gone away.
> 
> The SMM code should stop applying clock modulation at that point and the
> measured frequency should go back to 800 MHz (and probably stay there).

With all the people I have helped (forums and such) with Clock Modulation issues, I have never seen it go away by itself. (admittedly communication is often a challenge.)
Comment 22 Larry Finger 2016-09-30 16:07:24 UTC
I ran the script from comment 6 on my system when it was not failing. On this box, /sys/class/thermal/cooling_device{0-3} refer to the 4 processors. I used 0.

In the middle of the script, my CPU fan slowed, and I quickly checked the "Average clock" as reported by the System Load applet. It had dropped to the value of 410 MHz. As the script finished, the average clock returned to 3500.

While I was preparing this entry, and after roughly 50 hours of running correctly, this problem just triggered with v4.7. It is NOT a 4.8 regression, but a previously undiscovered bug. Of course, some change in 4.8 may trigger it more often.

I have prepared a script to run the various tests. The steps are listed below. If I have anything wrong, please let me know. I will leave the machine in this state for a couple of hours until I get confirmation that this sequence is the best.

1. Run the commands in comment 13. These produce no output.
2. Run the command in comment 10. (What to do with the file that is produced?) This command is in lieu of those in comment 7.
3. Do the steps from comment 9.
4. Finally run the script in comment 6.

Once the computer is again running fast, I will post the interesting portion of the output from the script from comment 3.
Comment 23 Srinivas Pandruvada 2016-09-30 16:11:56 UTC
Doug:
If there is some issue in SMM where the modulation is not reset, we have another driver, the x86 pkg temp thermal driver; that is the place to reset it, as it monitors thermals.
The MSR dump I requested should give a clue as to whether it is really stuck or not.

If this is a real thermal situation, both voltage and clock will already have been forced low before the stop-clock solution is used.
Comment 24 Doug Smythies 2016-09-30 16:29:35 UTC
(In reply to Larry Finger from comment #22)

> 
> 1. Run the commands in comment 13. These produce no output.

You need to run "sudo modprobe msr" first.

> 2. Run the command in comment 10. (What to do with the file that is
> produced?) This command is in lieu of those in comment 7.

You can post it here, or email it directly to me and I'll post process it and post back relevant findings.
Comment 25 Srinivas Pandruvada 2016-09-30 16:33:41 UTC
Larry: I think we have confused you enough :-)

Is your computer still on and stuck at 400 MHz?

Make sure that you are not running any busy load.

Please follow these in order:

1. Run this for a few seconds, until you see 2+ samples printed:

 #turbostat --debug --msr=0x199 

2.
Looks like you don't have msr-tools installed and may not have msr modules loaded. It will really help to get this.
sudo modprobe msr
sudo rdmsr --all 0x19A
sudo rdmsr --all 0x19B
sudo rdmsr --all 0x19C
sudo rdmsr --all 0x19D
sudo rdmsr --all 0x1A0
sudo rdmsr --all 0x1B1
sudo rdmsr --all 0x1B2

3. Post the output of sensors

4. Then run the suggestion from comment 10 and attach the output file.

5. echo 2000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq

Check again if you see ~2000000, when you do "cat scaling_cur_freq" with your busy load

6. If you don't see it in the above step, run the script from comment 6
Comment 26 Larry Finger 2016-09-30 17:19:47 UTC
On 09/30/2016 11:33 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=173361
>
> --- Comment #25 from Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> ---
> Larry: I think we have confused you enough :-)
>
> Is your compute still on and stuck at 400 MHz?
>
> Make sure that you are not running any busy load.

I have killed the 4 infinite loop jobs. CPU utilization is now 0.

>
> Please follow these in order:
>
> 1. For few second till you see 2+ sample printed from
>
>  #turbostat --debug --msr=0x199

My version of turbostat did not like those arguments. Running just 'turbostat' resulted in

finger@linux-1t8h:~/linux-2.6> sudo turbostat
   Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
      -       -      25    6.12     411    2894       0    7.29    0.37    0.11   86.11      36      38   78.47    0.00    0.00    0.00    2.19    0.05    0.00
      0       0      25    5.96     419    2894       9    7.80    0.48    0.16   85.61      36      38   78.45    0.00    0.00    0.00    2.19    0.05    0.00
      0       1      25    6.18     402    2893       9    7.55
      1       2      27    6.49     421    2893       9    6.60    0.25    0.06   86.60      36
      1       3      24    5.86     401    2893       9    7.23
   Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
      -       -      40    9.73     410    2893       0    8.51    0.39    0.14   81.23      36      38   68.95    0.00    0.00    0.00    2.32    0.08    0.00
      0       0      47   11.28     417    2893       9    6.19    0.54    0.23   81.76      36      38   68.95    0.00    0.00    0.00    2.32    0.08    0.00
      0       1      29    7.13     401    2893       9   10.35
      1       2      54   12.98     416    2893       9    6.03    0.23    0.06   80.70      36
      1       3      30    7.53     400    2893       9   11.48

>
> 2.
> Looks like you don't have msr-tools installed and may not have msr modules
> loaded. It will really help to get this.
> sudo modprobe msr
> sudo rdmsr --all 0x19A
> sudo rdmsr --all 0x19B
> sudo rdmsr --all 0x19C
> sudo rdmsr --all 0x19D
> sudo rdmsr --all 0x1A0
> sudo rdmsr --all 0x1B1
> sudo rdmsr --all 0x1B2

I do have the msr module, and it is now loaded.

finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19A
18
18
18
18
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19B
0
0
0
0
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19C
883e0808
883e0808
883e0808
883e0808
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19D
0
0
0
0
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x1A0
4000850089
4000850089
4000850089
4000850089
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x1B1
883d0808
883d0808
883d0808
883d0808
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x1B2
10
10
10
10

> 3. Post the output of sensors

finger@linux-1t8h:~/linux-2.6> sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +37.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:         +37.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:         +37.0°C  (high = +84.0°C, crit = +100.0°C)

> 4. Then run the suggestion from comment 10 and attach the output file.

Console output is

finger@linux-1t8h:~/linux-2.6> sudo perf record -a --event=power:pstate_sample sleep 300
root's password:
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 7.663 MB perf.data (61786 samples) ]

The perf.data file
>
> 5. echo 2000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
>
> Check again if you see ~2000000, when you do "cat scaling_cur_freq" with your
> busy load
>
> 5. If in the above step you don't see, run the script from comment 6

Comment 27 Larry Finger 2016-09-30 17:24:53 UTC
Created attachment 240431
Output from perf record -a --event=power:pstate_sample  command
Comment 28 Srinivas Pandruvada 2016-09-30 17:32:44 UTC
I have to look at the data.

Did this step help?

echo 2000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
Check again if you see ~2000000 when you do "cat scaling_cur_freq".

Run some busy load and monitor scaling_cur_freq.
Comment 29 Larry Finger 2016-09-30 17:40:12 UTC
On 09/30/2016 11:33 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=173361
>
> --- Comment #25 from Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> ---
> Larry: I think we have confused you enough :-)
>
> Is your compute still on and stuck at 400 MHz?
>
> Make sure that you are not running any busy load.

I have killed the 4 infinite loop jobs. CPU utilization is now 0.

>
> Please follow these in order:
>
> 1. For few second till you see 2+ sample printed from
>
>  #turbostat --debug --msr=0x199

My version of turbostat did not like those arguments. Running just 'turbostat' resulted in

finger@linux-1t8h:~/linux-2.6> sudo turbostat
   Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
      -       -      25    6.12     411    2894       0    7.29    0.37    0.11   86.11      36      38   78.47    0.00    0.00    0.00    2.19    0.05    0.00
      0       0      25    5.96     419    2894       9    7.80    0.48    0.16   85.61      36      38   78.45    0.00    0.00    0.00    2.19    0.05    0.00
      0       1      25    6.18     402    2893       9    7.55
      1       2      27    6.49     421    2893       9    6.60    0.25    0.06   86.60      36
      1       3      24    5.86     401    2893       9    7.23
   Core     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz     SMI  CPU%c1  CPU%c3  CPU%c6  CPU%c7 CoreTmp  PkgTmp Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 PkgWatt CorWatt GFXWatt
      -       -      40    9.73     410    2893       0    8.51    0.39    0.14   81.23      36      38   68.95    0.00    0.00    0.00    2.32    0.08    0.00
      0       0      47   11.28     417    2893       9    6.19    0.54    0.23   81.76      36      38   68.95    0.00    0.00    0.00    2.32    0.08    0.00
      0       1      29    7.13     401    2893       9   10.35
      1       2      54   12.98     416    2893       9    6.03    0.23    0.06   80.70      36
      1       3      30    7.53     400    2893       9   11.48

>
> 2.
> Looks like you don't have msr-tools installed and may not have msr modules
> loaded. It will really help to get this.
> sudo modprobe msr
> sudo rdmsr --all 0x19A
> sudo rdmsr --all 0x19B
> sudo rdmsr --all 0x19C
> sudo rdmsr --all 0x19D
> sudo rdmsr --all 0x1A0
> sudo rdmsr --all 0x1B1
> sudo rdmsr --all 0x1B2

I do have the msr module, and it is now loaded.

finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19A
18
18
18
18
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19B
0
0
0
0
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19C
883e0808
883e0808
883e0808
883e0808
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x19D
0
0
0
0
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x1A0
4000850089
4000850089
4000850089
4000850089
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x1B1
883d0808
883d0808
883d0808
883d0808
finger@linux-1t8h:~/linux-2.6> sudo rdmsr --all 0x1B2
10
10
10
10

> 3. Post the output of sensors

finger@linux-1t8h:~/linux-2.6> sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +37.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:         +37.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:         +37.0°C  (high = +84.0°C, crit = +100.0°C)

> 4. Then run the suggestion from comment 10 and attach the output file.

Console output is

finger@linux-1t8h:~/linux-2.6> sudo perf record -a --event=power:pstate_sample sleep 300
root's password:
[ perf record: Woken up 25 times to write data ]
[ perf record: Captured and wrote 7.663 MB perf.data (61786 samples) ]

The perf.data file has been gzipped and attached to 
https://bugzilla.kernel.org/show_bug.cgi?id=173361.
>
> 5. echo 2000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
>
> Check again if you see ~2000000, when you do "cat scaling_cur_freq" with your
> busy load

Busy load restarted.

linux-1t8h:~ # cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq
770135

>
> 5. If in the above step you don't see, run the script from comment 6

finger@linux-1t8h:~/wireless-drivers-next> sudo ~/stuck_cpu.sh
/sys/class/thermal/cooling_device0 /home/finger/wireless-drivers-next
0
1
2
3
4
5
6
7
8
9
10
reset again
10
9
8
7
6
5
4
3
2
1
0

CPUs still stuck at 410 MHz.

Following Rafael's postings in comment 9,

linux-1t8h:~ # cat /sys/devices/system/cpu/cpufreq/policy0/cpuinfo_min_freq
800000
linux-1t8h:~ # echo 1600000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq

The "Average clock" in the applet went from 410 to 612 MHz.

After echo 3200000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq,
the Average clock went to 1262 MHz. The machine is starting to be usable.

Sorry that an incomplete version was sent inadvertently. I will leave the 
machine in this state until I hear back.

Larry


Comment 30 Srinivas Pandruvada 2016-09-30 17:45:56 UTC
Please leave the machine on. Still analyzing the issue.
Comment 31 Larry Finger 2016-09-30 17:50:50 UTC
It will stay on.

Output from script in comment 3 from just before the slowdown happened. I think that the temperature of 91 C was the first noted.

Physical id 0:  +90.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +89.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +91.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +90.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +90.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +90.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +90.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +60.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +58.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +57.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +56.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +56.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +56.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +55.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +55.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +54.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +54.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +54.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +54.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +53.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +53.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +53.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +53.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +54.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +53.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +53.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +53.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Physical id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Comment 32 Doug Smythies 2016-09-30 18:16:40 UTC
Created attachment 240451
CPU frequencies from Larry's trace data

The CPU frequencies vary more than I expected. I'll post the similar graph from my computer (when under 50% Clock Modulation) in a moment. Larry's register 19A does show that clock modulation is enabled at 50%. However, my computer was fully loaded, and Larry's was not.
Comment 33 Doug Smythies 2016-09-30 18:19:00 UTC
Created attachment 240461
CPU frequencies from Doug's trace data

Doug's computer at 50% clock modulation, just to compare.
Comment 34 Srinivas Pandruvada 2016-09-30 18:22:59 UTC
Do you have another identical PC/laptop? I guess not, but I asked anyway.
I need to check the initial state of one MSR.

Somehow the thermal interrupts were disabled for core and package. So the SMM thermal handler never removed the modulation, as it never got an interrupt when the temperature was low. I don't know the initial state of this interrupt enable (whether the interrupts were enabled before and somehow got disabled later, or not). SMM can always resort to polling, but then it would have caught the low temperature and everything would have been fine.

0x19A (IA32_CLOCK_MODULATION):
       0x18 (50% modulation, and clock modulation enabled), that's why you see 50% reduction in frequency. It will be permanent.

0x19b (IA32_THERM_INTERRUPT):
       00 : This is a problem! Somehow thermal interrupt got killed. This is unusual (usually 0x13)

0x19c (IA32_THERM_STATUS):
       883e0808: PROCHOT (So the system had hot thermal event) for all cores
		 Power was limited on this processor

0x1A0 (IA32_MISC_ENABLE):
      4000850089 : TCC enabled, performance monitoring enabled, 

0x1B1 (IA32_PACKAGE_THERM_STATUS):
      883d0808:  PROCHOT (So the system had hot thermal event)
		 Power was limited on this processor

0x1B2 (IA32_PACKAGE_THERM_INTERRUPT): 0x10: Only Pkg Overheat Interrupt Enabled: Strange (usually at least 0x13 or 0x3)


Keep the machine on. I will see if the OS can disable this interrupt.
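
For reference, a minimal sketch of how the 0x19A value above decodes. It assumes the on-demand clock modulation layout from the Intel SDM (bit 4 = enable, bits 3:1 = duty cycle in 12.5% steps, or bits 3:0 in 6.25% steps where extended modulation is supported). The function name and the `extended` flag are only illustrative.

```python
def decode_clock_modulation(value, extended=False):
    """Decode IA32_CLOCK_MODULATION (MSR 0x19A) into (enabled, duty %).

    Assumed layout: bit 4 enables on-demand modulation; the duty cycle
    is in bits 3:1 (1/8 steps) or bits 3:0 (1/16 steps) when the CPU
    supports extended modulation.
    """
    enabled = bool(value & 0x10)
    if extended:
        level, steps = value & 0x0F, 16
    else:
        level, steps = (value >> 1) & 0x07, 8
    # With modulation disabled the clock runs at its full duty cycle.
    duty = 100.0 * level / steps if enabled else 100.0
    return enabled, duty


print(decode_clock_modulation(0x18))  # value reported on Larry's machine
print(decode_clock_modulation(0x00))  # value after clearing the register
```

With 0x18 this gives modulation enabled at a 50% duty cycle, matching the reading above; 0x00 means no modulation.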
Comment 35 Doug Smythies 2016-09-30 18:29:07 UTC
Created attachment 240471 [details]
A detail example of a moment of higher CPU frequency - Larry

I assume this has something to do with Larry's computer being mostly idle. I'll try to verify on my computer.
Comment 36 Larry Finger 2016-09-30 18:39:07 UTC
I do not have another of these machines. Computer is still on.
Comment 37 Srinivas Pandruvada 2016-09-30 18:41:53 UTC
Larry: What is the output of  
cat /proc/interrupts
Comment 38 Larry Finger 2016-09-30 18:44:14 UTC
linux-1t8h:~ # cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:         18          0          0          0  IR-IO-APIC   2-edge      timer
  1:      83853         27         22         49  IR-IO-APIC   1-edge      i8042
  8:          0          1          0          0  IR-IO-APIC   8-edge      rtc0
  9:      81163          0          0          2  IR-IO-APIC   9-fasteoi   acpi
 12:    4694000       9521       1634       1034  IR-IO-APIC  12-edge      i8042
 21:         36         32          0          1  IR-IO-APIC  21-fasteoi   ehci_hcd:usb3
 23:         42         27         12          0  IR-IO-APIC  23-fasteoi   ehci_hcd:usb4
 24:          0          0          0          0  DMAR-MSI   0-edge      dmar0
 25:          0          0          0          0  DMAR-MSI   1-edge      dmar1
 26:     443803      11538       6693     190340  IR-PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 27:      42828       2484          3          7  IR-PCI-MSI 32768-edge      i915
 28:        794        248         25         62  IR-PCI-MSI 327680-edge      xhci_hcd
 29:          7         12          0          0  IR-PCI-MSI 49152-edge      snd_hda_intel:card0
 30:         19          4          0          3  IR-PCI-MSI 360448-edge      mei_me
 31:     837877         60         28         20  IR-PCI-MSI 2097152-edge      iwlwifi
 32:        485          2          0         11  IR-PCI-MSI 442368-edge      snd_hda_intel:card1
 33:          9          0      50498          0  IR-PCI-MSI 409600-edge      eth0
NMI:       5934      22296      22302      22296   Non-maskable interrupts
LOC:   58737106   54010638   51324656   52732344   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:       5934      22296      22302      22296   Performance monitoring interrupts
IWI:         19         18         20         18   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:     884279     836394     626701     475450   Rescheduling interrupts
CAL:     226278       2175     112325       1958   Function call interrupts
TLB:      89280      97511      44072      35504   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:        651        649        649        649   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0   Posted-interrupt wakeup event

I have the perf data for a run before the bug triggered. If that has any interest, I could upload it as well.
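
The row of interest in this output is TRM (thermal event interrupts), which is zero on every CPU. A small sketch for pulling one row out of saved `/proc/interrupts` text; the helper name and the shortened sample are illustrative, not part of any tool mentioned here.

```python
def interrupt_counts(text, label):
    """Return the per-CPU counts for one row of /proc/interrupts text."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == label:
            # Rows such as ERR/MIS carry a single count, so slicing is safe.
            return [int(tok) for tok in rest.split()[:ncpus]]
    raise KeyError(label)


sample = """\
           CPU0       CPU1       CPU2       CPU3
TRM:          0          0          0          0   Thermal event interrupts
MCP:        651        649        649        649   Machine check polls
"""

print(interrupt_counts(sample, "TRM"))
print(interrupt_counts(sample, "MCP"))
```

A TRM row that stays at zero while the package has clearly been hot is consistent with the OS never receiving the thermal interrupt.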
Comment 39 Srinivas Pandruvada 2016-09-30 18:55:06 UTC
Did your dmesg overflow?

If it has boot time messages, can you check if you have the following log?


CPU0: Thermal monitoring enabled
Comment 40 Doug Smythies 2016-09-30 18:57:43 UTC
Larry: This request is just for my own interest, and will not progress the investigation. I think turbostat, in debug mode, decodes some, but not all, of the same registers Srinivas is interested in. Could you post the output for:

sudo turbostat  --debug sleep 10

Perhaps as a text attachment, so that the formatting doesn't get messed up.

Myself, I do not need your before event perf data.
Comment 41 Larry Finger 2016-09-30 19:23:11 UTC
Created attachment 240491 [details]
Output of dmesg command

The dmesg buffer did not overflow. I did not see a line that says "CPU0: Thermal monitoring enabled".
Comment 42 Larry Finger 2016-09-30 19:35:24 UTC
My copy of turbostat is different. I get

finger@linux-1t8h:~/rtlwifi_new> sudo turbostat  --debug sleep 10
root's password:
turbostat: invalid option -- '-'
turbostat: turbostat: [-v][-R][-T][-p|-P|-S][-c MSR#][-C MSR#][-m MSR#][-M MSR#][-i interval_sec | command ...]

What option should I use for this version?
Comment 43 Larry Finger 2016-09-30 19:51:46 UTC
Created attachment 240501 [details]
Output from "turbostat  --debug sleep 10"

I found and built the version of turbostat from the kernel source rather than the one installed by openSUSE Leap 42.1. The output is attached.
Comment 44 Srinivas Pandruvada 2016-09-30 20:41:28 UTC
Doug: If you don't want to collect more data, then I don't think I need any more data.

My conclusion so far:

- The thermal interrupts are handled by SMM code, so the OS doesn't get the interrupt
- SMM code got a hot temperature interrupt
- The system went into a hot state and applied a 50% clock duty cycle
- Somehow the interrupt enable register was reset
- The temperature went down, but duty cycling was not disabled
- Even if the scaling frequency is increased, the duty cycle will reduce it

This points to an issue in the SMM code. I am not an expert in this, so I need to get some help.

Meanwhile try
# sudo wrmsr --all 0x19b 0x13
See if it makes things normal.

# sudo wrmsr --all 0x19a 0x13
This will bring frequencies to normal

Then reboot and

Read

#sudo rdmsr --all  0x19B
I guess this will not be 0. If 0 then SMM is polling. 

Hacks
- Since SMM is handling the thermal interrupt, we need to poll.
Maybe just on this system, write to the passive attributes in the ACPI thermal zone, setting them to the max temperature minus some offset,
as suggested in comment #20.

- I know Doug created a change in the past to multiply the P-state request by the modulation duty cycle.
But this may unnecessarily impact genuine thermal cases. We would also need to consider extended duty cycling.
This issue also exists in acpi-cpufreq, just to a lesser degree.
Comment 45 Srinivas Pandruvada 2016-09-30 20:43:07 UTC
Sorry, there is a mistake in comment 44.

# sudo wrmsr --all 0x19a 0x13

Should change to

# sudo wrmsr --all 0x19a 0x00
Comment 46 Srinivas Pandruvada 2016-09-30 20:45:58 UTC
Rui,
Any suggestions?
Comment 47 Doug Smythies 2016-09-30 21:33:07 UTC
(In reply to Srinivas Pandruvada from comment #44)
> 
> - I know Doug created a change in the past to multiply the P-state request
> by the modulation duty cycle.

It turned out that the proposed patch, based on a suggestion from Chen Yu, didn't actually work. My initial method of testing was flawed in that I always had the CPU pinned at 100% load. When the load dropped off, the patch basically failed.

Reference:
http://marc.info/?l=linux-pm&m=144859492715482&w=2

> But this may unnecessarily impact genuine thermal cases.

Agreed.

> This issue also exists in acpi-cpufreq,
> just to a lesser degree.

Disagree. 
Without knowing the deep detail, I always thought that once this last-line-of-defense thermal trip was hit, it should stay engaged so as to prevent thermal oscillation. I have never seen 50% before, only 82.5% and 75%, which with the acpi-cpufreq driver many users don't even notice. Anyway, I defer to Srinivas on this.
Comment 48 Larry Finger 2016-09-30 21:47:02 UTC
Any objections to my running the commands from comment 44, as amended in comment 45, followed by a reboot?
Comment 49 Srinivas Pandruvada 2016-09-30 21:52:27 UTC
Yes, you should run 
# sudo wrmsr --all 0x19a 0x00
 before reboot.

Then reboot and

#sudo rdmsr --all  0x19B
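
If msr-tools is not installed, the same read can be sketched in Python against the kernel's `/dev/cpu/N/msr` device. This assumes the `msr` module is loaded and the script runs as root; `read_msr` and its `path_template` parameter are illustrative names, not an existing API.

```python
import os
import struct


def read_msr(register, cpu=0, path_template="/dev/cpu/{cpu}/msr"):
    """Read one 64-bit MSR by seeking to its number in the msr device."""
    fd = os.open(path_template.format(cpu=cpu), os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, register)  # file offset == MSR address
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from MSR {register:#x}")
    return struct.unpack("<Q", raw)[0]


if __name__ == "__main__":
    # Rough equivalent of: rdmsr --all 0x19B
    for cpu in range(os.cpu_count() or 1):
        try:
            print(f"cpu{cpu}: {read_msr(0x19B, cpu):#x}")
        except OSError as err:
            print(f"cpu{cpu}: cannot read MSR ({err})")
```

Reading every CPU matters here because IA32_THERM_INTERRUPT is a per-core register, so one core could differ from the rest.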
Comment 50 Doug Smythies 2016-09-30 22:16:00 UTC
Created attachment 240511 [details]
CPU freqs, Doug 50% modulation - no load

In the end, it probably isn't relevant to this bug report, but I do find Larry's CPU frequencies somewhat non-typical for the 50% clock modulation case. As a comparative example, this is my computer with no-load.
Comment 51 Larry Finger 2016-10-01 00:08:19 UTC
I did a 'sudo wrmsr --all 0x19b 0x13'. That had no effect on any frequencies.

I then did a 'sudo wrmsr --all 0x19a 0'. Again no effect on frequencies.

After reboot 'sudo rdmsr --all 0x19b' resulted in 0x10 for all 4 CPUs.

Thanks for all the help. I hope you learned something. At least I now know how to get some speed back after such events.
Comment 52 Srinivas Pandruvada 2016-10-01 01:44:01 UTC
Thanks for your patience. 

Very surprised that

sudo wrmsr --all 0x19a 0

didn't change any frequencies.

But at least the theory is proven that the thermal interrupt enable is getting overwritten.
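
To make the three values seen so far (0x13 expected, 0x00 in the stuck state, 0x10 after reboot) easier to compare, here is a sketch that names the low enable bits of IA32_THERM_INTERRUPT. The bit names follow my reading of the Intel SDM; the threshold and power-limit fields in the higher bits are deliberately ignored.

```python
# Low enable bits of IA32_THERM_INTERRUPT (MSR 0x19B), per the Intel SDM.
THERM_INTERRUPT_BITS = {
    0: "high-temperature interrupt",
    1: "low-temperature interrupt",
    2: "PROCHOT# interrupt",
    3: "FORCEPR# interrupt",
    4: "critical-temperature (overheat) interrupt",
}


def enabled_interrupts(value):
    """List the thermal interrupt sources enabled in a register value."""
    return [name for bit, name in THERM_INTERRUPT_BITS.items()
            if (value >> bit) & 1]


for value in (0x13, 0x10, 0x00):
    print(f"{value:#04x}: {enabled_interrupts(value) or 'nothing enabled'}")
```

Only 0x13 has the low-temperature interrupt enabled, which is the one an interrupt-driven handler would need in order to notice that the package has cooled down.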
Comment 53 Doug Smythies 2016-10-01 07:24:28 UTC
(In reply to Srinivas Pandruvada from comment #52) 
> 
> Very surprised that
> 
> sudo wrmsr --all 0x19a 0
> 
> didn't change any frequencies.

Yes, that should have cleared the clock modulation state. As far as I can tell there are only logged reasons (PROCHOT, ...) but no active reasons, and so the condition should have been clearable.
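
The "logged but not active" reading can be checked with a short sketch that splits IA32_THERM_STATUS into its status bits and the sticky log bit next to each one. The layout is taken from the Intel SDM as I understand it, and `tjmax=100` is an assumption based on the crit = +100.0°C shown by sensors.

```python
# Each event has a live status bit and a sticky log bit one position higher.
EVENTS = {
    0: "thermal status",
    2: "PROCHOT#/FORCEPR# event",
    4: "critical temperature",
    6: "thermal threshold #1",
    8: "thermal threshold #2",
    10: "power limitation",
    12: "current limit",
    14: "cross-domain limit",
}


def decode_therm_status(value, tjmax=100):
    """Split IA32_THERM_STATUS (0x19C) into active and logged events."""
    active = [n for b, n in EVENTS.items() if (value >> b) & 1]
    logged = [n for b, n in EVENTS.items() if (value >> (b + 1)) & 1]
    valid = bool((value >> 31) & 1)   # bit 31: reading valid
    below = (value >> 16) & 0x7F      # bits 22:16: degrees below TjMax
    temp_c = tjmax - below if valid else None
    return {"active": active, "logged": logged, "temp_c": temp_c}


print(decode_therm_status(0x883E0808))  # core value from comment 34
print(decode_therm_status(0x883D0808))  # package value from comment 34
```

For both values only the PROCHOT and power limitation log bits are set and nothing is currently active, with the digital readout about 60 degrees below the assumed TjMax.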
Comment 54 Henrique de Moraes Holschuh 2016-10-01 18:48:50 UTC
There exists a microcode release 0x21 for that processor, but it is not widely available (yet?).

Maybe someone @intel can check the internal databases and see if that revision is a fix for some issue related to power management?  And if it is, get that update published to the general public *please*?

I consider this has a non-zero chance of being relevant because the processor *is* behaving rather oddly, and recent microcode updates had to be issued as workarounds for a very similar errata on Broadwell-E (fixed on latest microcode for Xeon E5-v3) and still exists in an unfixed state on some other Broadwell cores, according to at least one user report.

If it is an issue similar to the one in Broadwell, you could try these workarounds:

1. Don't update the microcode at all (i.e. run with whatever microcode your BIOS/UEFI is shipping, and suffer the nasty errata that will go unfixed)

or...

2. Get UEFI to set up current limit profiles with ridiculously high limits (apparently XMP memory profiles are good for this) -- not often available.

basically, something inside the microcode update was getting a current measurement limit wrong by a factor of 10 in the fail-safe direction (i.e. triggering the limiter when it was not needed), but only when the microcode was updated later than the initial round of MSR setup by UEFI/BIOS.  And that was a regression, i.e. older microcode updates didn't cause that.

It _is_ a long shot, but still...
Comment 55 Henrique de Moraes Holschuh 2016-10-01 18:51:42 UTC
Oops, make it Haswell (Xeon E5-v3 is Haswell).  That makes the microcode angle a bit less unlikely.
Comment 56 Larry Finger 2016-10-01 19:05:18 UTC
There are three updates to the BIOS since the version I am running. The latest is only installable from running Windows, thus I cannot use it. Although I am reluctant to update any BIOS, it may be necessary in this instance.
Comment 57 Henrique de Moraes Holschuh 2016-10-01 21:21:04 UTC
You can open the executable archive they distribute with that bios update using 7z (in Linux or Windows), but the BIOS and EC firmware blobs look like they are either encrypted, compressed, or both.  You *could* try to use the older os-agnostic BIOS/EC updaters for your laptop with the up-to-date firmware data if their headers look alike... but that could easily result in a bricked system.

If I were in your situation, I'd install MS windows in a spare partition, do the firmware update, and then remove it.
 
But I fear that would at most get you microcode 0x1c from what is written on their BIOS changelogs.

Note: if your BIOS has something earlier than Haswell microcode 0x1c, then *please* don't even consider running with BIOS microcode as a valid workaround, except *maybe* for very limited testing.
Comment 58 Srinivas Pandruvada 2016-10-02 01:56:52 UTC
If there is a BIOS update, I recommend updating the BIOS. It may have fixed some issues. But you will need to boot Windows somehow.
Comment 59 Larry Finger 2016-10-02 02:54:19 UTC
I upgraded BIOS from 4.20 to 4.50 by writing the iso to a CD and booting it. Windows was not required. There is a newer version that needs Windows to install. I have a Windows PE bootable CD. Perhaps that will work.
Comment 60 Doug Smythies 2016-10-02 18:37:57 UTC
Created attachment 240621 [details]
Package Temperature for two methods of temperature limiting 1 of 2

For my own education, I monitored package temperature under 100% CPU load where clock modulation (at 75%) would become enabled if >= 71 degrees. For one test clock modulation would be disabled again if the temp went <= 69 degrees. For the other test, once enabled, clock modulation was never disabled.

This was while using the acpi-cpufreq CPU frequency scaling driver.

A relatively low threshold temperature was used because I do not like to stress my test computer and, normally, I cannot get it much higher than the low 70s anyhow.
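
The two limiting methods compared in these tests can be sketched as a tiny state machine. This is only an illustration of the logic with the thresholds quoted above (on at >= 71, off at <= 69); it is not the script that was used, and the temperature sequence is made up.

```python
def modulation_trace(temps, on_at=71, off_at=69, latch=False):
    """Return, per sample, whether clock modulation would be engaged.

    latch=False: release again once the temperature drops to off_at.
    latch=True:  once engaged, never release.
    """
    engaged = False
    trace = []
    for t in temps:
        if not engaged and t >= on_at:
            engaged = True
        elif engaged and not latch and t <= off_at:
            engaged = False
        trace.append(engaged)
    return trace


temps = [68, 70, 71, 70, 69, 70, 72]  # made-up readings in degrees C
print(modulation_trace(temps))              # release at <= 69
print(modulation_trace(temps, latch=True))  # never release
```

The latched variant matches what Larry's laptop appears to do after the first trip, whereas the hysteresis variant is what a handler that still received the low-temperature interrupt would be expected to do.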
Comment 61 Doug Smythies 2016-10-02 18:41:56 UTC
Created attachment 240631 [details]
Package Temperature for two methods of temperature limiting 2 of 2

This one was while using the intel_pstate CPU frequency scaling driver, where we know the CPU frequency will become 1.2GHz once the 75% clock modulation is enabled, and 3.5 GHz without clock modulation.
Comment 62 Doug Smythies 2016-10-02 18:49:58 UTC
Larry's inability to clear the clock modulation state via register 0x19A combined with the CPU frequencies sometimes briefly going above 400 MHz, makes me wonder if there are competing threads, one setting clock modulation and one clearing it.
Comment 63 Larry Finger 2016-10-02 23:02:09 UTC
That is certainly a possibility.

Is your test box of comment 60 a desktop? There is no way I could keep my laptop's CPU that cool. The sensors command shows it to be near 70 while idling.
Comment 64 Srinivas Pandruvada 2016-10-03 01:34:57 UTC
Can you try this with the default configuration:
http://software.opensuse.org/package/thermald?search_term=thermald
Comment 65 Larry Finger 2016-10-03 02:39:29 UTC
It is now running. I had installed it earlier, but I had not enabled the service until now.
Comment 66 Doug Smythies 2016-10-03 04:08:19 UTC
(In reply to Larry Finger from comment #63)
> That is certainly a possibility.
> 
> Is your test box of comment 60 a desktop? There is no way I could keep my
> laptop's CPU that cool. The sensors command shows it to be near 70 while
> idling.

Yes, it is a desktop, with excellent airflow and no side cover.
To do my test the other day, where I got the package temperature to 81 degrees, it took significant effort.

If your laptop idles at 70 degrees, I wonder if it needs cleaning? I have to clean my laptop about twice a year (and it resides in a relatively clean environment). I lock the fan blades and put compressed air backwards through the air cooling system, while at the same time vacuuming from the other side. That way, I don't have to take it apart, at least so far.
Comment 67 Larry Finger 2016-10-03 14:51:52 UTC
I think my laptop is clean. After we had finished all the tests, I blew it out with compressed air and saw no difference in the temperatures.
Comment 68 Srinivas Pandruvada 2016-10-03 16:09:10 UTC
Larry: Please try to repeat the tests with thermald. For the test I suggest:
# systemctl stop thermald
# thermald --no-daemon --loglevel=info
Leave this shell open for the entire duration of the test.

Alternatively, you can modify the systemd config file to add --loglevel=info, and then we can get the logs.
Comment 69 Larry Finger 2016-10-03 18:04:00 UTC
Using the 4 infinite loops, the system reached the 410 MHz stage very quickly. The thermald info is

finger@linux-1t8h:~/wireless-drivers-next> sudo thermald --no-daemon --loglevel=info
NO RAPL sysfs present 
13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
Running on a vanilla kernel
Polling mode is enabled: 4
sensor_update: type x86_pkg_temp
sensor_update: type acpitz
thd_read_default_thermal_sensors loaded 2 sensors 
dts /sys/devices/platform/coretemp.0/name doesn't exist
failed to open /dev/acpi_thermal_rel 
failed to open /dev/acpi_thermal_rel 
TRT/ART read failed
 Dumping parsed XML Data
 *** Index 0 ***
Name: GenericX86LaptopDevice
UUID: 
type: 0
        Sensor 0 
         Name: TSKN
         Path: 
         Async Capable: 1
         Virtual: 0
        Zone 0 
         Name: SKIN
                 Trip Point 0 
                  temp 55000 
                  trip type 2 
                  hyst id 0 
                  sensor type TSKN 
                  cdev index 0 
                          type rapl_controller 
                          influence 100 
                          SamplingPeriod 16 
                  cdev index 1 
                          type intel_powerclamp 
                          influence 100 
                          SamplingPeriod 12 
 *** Index 1 ***
Name: ExamplePlatformName
UUID: ExampleUUID
type: 0
        Sensor 0 
         Name: TSKN
         Path: 
         Async Capable: 1
         Virtual: 0
        Sensor 1 
         Name: example_sensor_1
         Path: /some_path
         Async Capable: 0
         Virtual: 0
        Sensor 2 
         Name: example_thermal_sysfs_sensor
         Path: 
         Async Capable: 1
         Virtual: 0
        Sensor 3 
         Name: example_virtual_sensor
         Path: 
         Async Capable: 0
         Virtual: 1
                 Link type: example_sensor_1
                 Link mult: 0.500000
                 Link offset: 10.000000
        Zone 0 
         Name: ExampleZonetype
                 Trip Point 0 
                  temp 75000 
                  trip type 1 
                  hyst id 0 
                  sensor type example_sensor_1 
                  cdev index 0 
                          type example_cooling_device 
                          influence 100 
                          SamplingPeriod 12 
        Cooling Dev 0 
                Type: example_cooling_device
                Path: 
                Min: 0
                Max: 50
                Step: 10
                AutoDownControl: 0
         PID: Kp 0.001000
         PID: Ki 0.000100
         PID: Kd 0.000100
Product Name matched [wildcard]
sensor id 5: No temp sysfs for reading raw temp
sensor index:1 x86_pkg_temp /sys/class/thermal/thermal_zone1/ Async:1 
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:1 
sensor index:2 hwmon /sys/class/hwmon/hwmon0/temp1_input Async:0 
sensor index:3 hwmon /sys/class/hwmon/hwmon0/temp2_input Async:0 
sensor index:4 hwmon /sys/class/hwmon/hwmon0/temp3_input Async:0 
thd_read_default_cooling devices loaded 5 cdevs 
powercap RAPL no long term time window
RAPL device for cpu 0
Use Default pstate drv settings
Product Name matched [wildcard]
3: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
1: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
4: intel_powerclamp, C:-1 MN: 0 MX:50 ST:5 pt:/sys/class/thermal/ rd_bk 0 
2: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
0: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
5: rapl_controller, C:0 MN: 0 MX:0 ST:1850 pt:/sys/devices/system/cpu/ rd_bk 1 
6: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1 
7: LCD, C:0 MN: 0 MX:4905 ST:490 pt:/sys/class/backlight/intel_backlight/ rd_bk 1 
Sorted trip dump :
index 0: type:critical temp:102000 hyst:1 zone id:0 sensor id:0 cdev size:0
trip type: 0 temp: 102000 
sysfs write failed trip_point_0_temp
thd_read_default_thermal_zones loaded 1 zones 
zone cpu will be created 
dts zone /sys/devices/platform/coretemp.0/name doesn't exist
/sys/class/hwmon/hwmon0/name->coretemp
Core temp DTS :critical 100000, max 84000
Read set point 0
node type: Element, name: CoolingDevice value: rapl_controller
node type: Element, name: CoolingDevice value: intel_pstate
node type: Element, name: CoolingDevice value: intel_powerclamp
node type: Element, name: CoolingDevice value: cpufreq
node type: Element, name: CoolingDevice value: Processor
CDEVS order specified in thermal-cpu-cdev-order.xml
Sorted trip dump :
index 1: type:passive temp:84000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
index 0: type:max temp:91000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
trip type: 2 temp: 84000 
trip type: 1 temp: 91000 
Read set point 0
Read set point 0
Product Name matched [wildcard]
XML zone: invalid sensor type TSKN
Zone update failed: unable to bind 
Zone 0: acpitz, Active:0 Bind:0 Sensor_cnt:1
..sensors.. 
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:1 
..trips.. 
index 0: type:critical temp:102000 hyst:1 zone id:0 sensor id:0 cdev size:0
index 1: type:polling temp:97000 hyst:0 zone id:0 sensor id:0 cdev size:0
Zone 2: cpu, Active:1 Bind:0 Sensor_cnt:1
..sensors.. 
sensor index:1 x86_pkg_temp /sys/class/thermal/thermal_zone1/ Async:1 
..trips.. 
index 1: type:passive temp:84000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
index 0: type:max temp:91000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
index 2: type:polling temp:72000 hyst:0 zone id:2 sensor id:1 cdev size:0
FD = 6
Current user preference is 0
thd_engine_thread begin
Set : 91000, 71000, 5, 0, 27750
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
Read set point 0
sysfs write failed trip_point_0_temp
Set : 84000, 90000, 5, 1850, 27750
consecutive call, increment exponentially state 5550
Set : 84000, 89000, 5, 5550, 27750
Set : 84000, 82000, 5, 3700, 27750
Set : 91000, 90000, 5, 1850, 27750
Set : 84000, 90000, 5, 3700, 27750
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
Read set point 0
Comment 70 Srinivas Pandruvada 2016-10-03 18:41:36 UTC
Did it get stuck at 400 MHz? The thermald log shows you were still over the max temperature, so this is not a bad case.

Anyway, can you try this?
When you start thermald, add the option to enable dbus, which the systemd unit file would otherwise have done.


# thermald --no-daemon --loglevel=info --dbus-enable

Change one persistent setting (make sure you are root)

# dbus-send --system --dest=org.freedesktop.thermald /org/freedesktop/thermald org.freedesktop.thermald.SetUserPassiveTemperature string:cpu uint32:80000

And repeat tests.
Comment 71 Larry Finger 2016-10-03 18:56:24 UTC
My setup must be different:

linux-1t8h:~ # dbus-send --system --dest=org.freedesktop.thermald /org/freedesktop/thermald
Usage: dbus-send [--help] [--system | --session | --bus=ADDRESS | --peer=ADDRESS] [--dest=NAME] [--type=TYPE] [--print-reply[=literal]] [--reply-timeout=MSEC] <destination object path> <message name> [contents ...]
Comment 72 Larry Finger 2016-10-03 18:58:17 UTC
I missed the line-wrapped part. It is now working.
Comment 73 Larry Finger 2016-10-03 19:03:20 UTC
This time, I can see the CPU clock cycling, and the fan runs intermittently, not all the time. The thermald log info is

finger@linux-1t8h:~> sudo thermald --no-daemon --loglevel=info --dbus-enable
NO RAPL sysfs present 
13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
Running on a vanilla kernel
Polling mode is enabled: 4
sensor_update: type x86_pkg_temp
sensor_update: type acpitz
thd_read_default_thermal_sensors loaded 2 sensors 
dts /sys/devices/platform/coretemp.0/name doesn't exist
failed to open /dev/acpi_thermal_rel 
failed to open /dev/acpi_thermal_rel 
TRT/ART read failed
 Dumping parsed XML Data
 *** Index 0 ***
Name: GenericX86LaptopDevice
UUID: 
type: 0
        Sensor 0 
         Name: TSKN
         Path: 
         Async Capable: 1
         Virtual: 0
        Zone 0 
         Name: SKIN
                 Trip Point 0 
                  temp 55000 
                  trip type 2 
                  hyst id 0 
                  sensor type TSKN 
                  cdev index 0 
                          type rapl_controller 
                          influence 100 
                          SamplingPeriod 16 
                  cdev index 1 
                          type intel_powerclamp 
                          influence 100 
                          SamplingPeriod 12 
 *** Index 1 ***
Name: ExamplePlatformName
UUID: ExampleUUID
type: 0
        Sensor 0 
         Name: TSKN
         Path: 
         Async Capable: 1
         Virtual: 0
        Sensor 1 
         Name: example_sensor_1
         Path: /some_path
         Async Capable: 0
         Virtual: 0
        Sensor 2 
         Name: example_thermal_sysfs_sensor
         Path: 
         Async Capable: 1
         Virtual: 0
        Sensor 3 
         Name: example_virtual_sensor
         Path: 
         Async Capable: 0
         Virtual: 1
                 Link type: example_sensor_1
                 Link mult: 0.500000
                 Link offset: 10.000000
        Zone 0 
         Name: ExampleZonetype
                 Trip Point 0 
                  temp 75000 
                  trip type 1 
                  hyst id 0 
                  sensor type example_sensor_1 
                  cdev index 0 
                          type example_cooling_device 
                          influence 100 
                          SamplingPeriod 12 
        Cooling Dev 0 
                Type: example_cooling_device
                Path: 
                Min: 0
                Max: 50
                Step: 10
                AutoDownControl: 0
         PID: Kp 0.001000
         PID: Ki 0.000100
         PID: Kd 0.000100
Product Name matched [wildcard]
sensor id 5: No temp sysfs for reading raw temp
sensor index:1 x86_pkg_temp /sys/class/thermal/thermal_zone1/ Async:1 
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:1 
sensor index:2 hwmon /sys/class/hwmon/hwmon0/temp1_input Async:0 
sensor index:3 hwmon /sys/class/hwmon/hwmon0/temp2_input Async:0 
sensor index:4 hwmon /sys/class/hwmon/hwmon0/temp3_input Async:0 
thd_read_default_cooling devices loaded 5 cdevs 
powercap RAPL no long term time window
RAPL device for cpu 0
Use Default pstate drv settings
Product Name matched [wildcard]
3: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
1: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
4: intel_powerclamp, C:-1 MN: 0 MX:50 ST:5 pt:/sys/class/thermal/ rd_bk 0 
2: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
0: Processor, C:0 MN: 0 MX:10 ST:1 pt:/sys/class/thermal/ rd_bk 0 
5: rapl_controller, C:0 MN: 0 MX:0 ST:1850 pt:/sys/devices/system/cpu/ rd_bk 1 
6: intel_pstate, C:0 MN: 0 MX:10 ST:1 pt:/sys/devices/system/cpu/intel_pstate/ rd_bk 1 
7: LCD, C:0 MN: 0 MX:4905 ST:490 pt:/sys/class/backlight/intel_backlight/ rd_bk 1 
Sorted trip dump :
index 0: type:critical temp:102000 hyst:1 zone id:0 sensor id:0 cdev size:0
trip type: 0 temp: 102000 
sysfs write failed trip_point_0_temp
thd_read_default_thermal_zones loaded 1 zones 
zone cpu will be created 
dts zone /sys/devices/platform/coretemp.0/name doesn't exist
/sys/class/hwmon/hwmon0/name->coretemp
Core temp DTS :critical 100000, max 84000
Read set point 0
node type: Element, name: CoolingDevice value: rapl_controller
node type: Element, name: CoolingDevice value: intel_pstate
node type: Element, name: CoolingDevice value: intel_powerclamp
node type: Element, name: CoolingDevice value: cpufreq
node type: Element, name: CoolingDevice value: Processor
CDEVS order specified in thermal-cpu-cdev-order.xml
Sorted trip dump :
index 1: type:passive temp:84000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
index 0: type:max temp:91000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
trip type: 2 temp: 84000 
trip type: 1 temp: 91000 
Read set point 0
Read set point 0
Product Name matched [wildcard]
XML zone: invalid sensor type TSKN
Zone update failed: unable to bind 
Zone 0: acpitz, Active:0 Bind:0 Sensor_cnt:1
..sensors.. 
sensor index:0 acpitz /sys/class/thermal/thermal_zone0/ Async:1 
..trips.. 
index 0: type:critical temp:102000 hyst:1 zone id:0 sensor id:0 cdev size:0
index 1: type:polling temp:97000 hyst:0 zone id:0 sensor id:0 cdev size:0
Zone 2: cpu, Active:1 Bind:0 Sensor_cnt:1
..sensors.. 
sensor index:1 x86_pkg_temp /sys/class/thermal/thermal_zone1/ Async:1 
..trips.. 
index 1: type:passive temp:84000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
index 0: type:max temp:91000 hyst:0 zone id:2 sensor id:65535 cdev size:4
cdev[0] rapl_controller
cdev[1] intel_pstate
cdev[2] intel_powerclamp
cdev[3] Processor
index 2: type:polling temp:72000 hyst:0 zone id:2 sensor id:1 cdev size:0
FD = 7
Current user preference is 0
thd_engine_thread begin
sysfs write failed trip_point_0_temp
Set : 91000, 79000, 5, 0, 27750
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
Read set point 0
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
Read set point 0
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
Read set point 0
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
Read set point 0
Setting psv 80000
sysfs write failed trip_point_0_temp
Dropped below poll threshold 
sysfs write failed trip_point_0_temp
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
thd_trip_cdev_state_reset 
thd_trip_cdev_state_reset index 3:Processor
thd_trip_cdev_state_reset index 4:intel_powerclamp
thd_trip_cdev_state_reset index 6:intel_pstate
thd_trip_cdev_state_reset index 5:rapl_controller
Read set point 0
sysfs write failed trip_point_0_temp
Set : 84000, 87000, 5, 1850, 27750
consecutive call, increment exponentially state 5550
Set : 84000, 89000, 5, 5550, 27750
consecutive call, increment exponentially state 9250
Set : 84000, 89000, 5, 9250, 27750
consecutive call, increment exponentially state 16650
Set : 84000, 87000, 5, 16650, 27750
Set : 84000, 77000, 5, 14800, 27750
Set : 91000, 78000, 5, 12950, 27750
Set : 91000, 83000, 5, 11100, 27750
Set : 91000, 88000, 5, 9250, 27750
Set : 84000, 88000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Set : 84000, 86000, 5, 14800, 27750
Set : 84000, 80000, 5, 12950, 27750
Set : 91000, 82000, 5, 11100, 27750
Set : 91000, 85000, 5, 9250, 27750
Set : 84000, 85000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Set : 84000, 86000, 5, 14800, 27750
Set : 84000, 79000, 5, 12950, 27750
Set : 91000, 82000, 5, 11100, 27750
Set : 91000, 86000, 5, 9250, 27750
Set : 84000, 86000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Set : 84000, 84000, 5, 14800, 27750
Set : 84000, 80000, 5, 12950, 27750
Set : 91000, 82000, 5, 11100, 27750
Set : 91000, 86000, 5, 9250, 27750
Set : 84000, 86000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Set : 84000, 86000, 5, 14800, 27750
Set : 84000, 80000, 5, 12950, 27750
Set : 91000, 82000, 5, 11100, 27750
Set : 91000, 86000, 5, 9250, 27750
Set : 84000, 86000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Set : 84000, 86000, 5, 14800, 27750
Set : 84000, 79000, 5, 12950, 27750
Set : 91000, 83000, 5, 11100, 27750
Set : 91000, 86000, 5, 9250, 27750
Set : 84000, 86000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Set : 84000, 86000, 5, 14800, 27750
Set : 84000, 79000, 5, 12950, 27750
Set : 91000, 82000, 5, 11100, 27750
Set : 91000, 86000, 5, 9250, 27750
Set : 84000, 86000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Set : 84000, 85000, 5, 14800, 27750
Set : 84000, 79000, 5, 12950, 27750
Set : 91000, 83000, 5, 11100, 27750
Set : 91000, 85000, 5, 9250, 27750
Set : 84000, 85000, 5, 11100, 27750
consecutive call, increment exponentially state 14800
Comment 74 Srinivas Pandruvada 2016-10-03 19:09:20 UTC
I suggest restarting the daemon. I want to see whether the setting is stored correctly. Do you still see 400 MHz?
Comment 75 Larry Finger 2016-10-03 19:19:35 UTC
Before restarting thermald, I ran the following modified version of Doug's script:

#! /bin/dash
#
# temp_mon Smythies 2016.09.29
#       Monitor Package temperatures.
#
echo ... begin package temperature monitoring ...
T=`sensors | grep Physical`
S=`hwinfo --cpu | grep MHz | uniq`
echo $T$S > temp_speed
while [ 1 ];do
  sleep 1
  T=`sensors | grep Physical`
  S=`hwinfo --cpu | grep MHz | uniq`
  echo $T$S >> temp_speed
done

The output is as follows:

Physical id 0: +85.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +87.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3333 MHz Clock: 3341 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3341 MHz
Physical id 0: +85.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3113 MHz Clock: 3116 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3111 MHz Clock: 3112 MHz Clock: 3111 MHz Clock: 3116 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3100 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3118 MHz Clock: 3125 MHz Clock: 3116 MHz Clock: 3125 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3250 MHz Clock: 3243 MHz Clock: 3250 MHz
Physical id 0: +83.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3250 MHz
Physical id 0: +84.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3347 MHz Clock: 3350 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3342 MHz Clock: 3350 MHz Clock: 3341 MHz
Physical id 0: +85.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz Clock: 3345 MHz Clock: 3343 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz Clock: 3337 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3344 MHz Clock: 3336 MHz Clock: 3344 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3115 MHz Clock: 3116 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3108 MHz Clock: 3116 MHz Clock: 3108 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3116 MHz
Physical id 0: +82.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3247 MHz Clock: 3239 MHz Clock: 3247 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3241 MHz
Physical id 0: +83.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3246 MHz Clock: 3250 MHz
Physical id 0: +84.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz Clock: 3341 MHz Clock: 3342 MHz Clock: 3350 MHz
Physical id 0: +85.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz Clock: 3342 MHz Clock: 3341 MHz Clock: 3350 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +85.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3341 MHz Clock: 3350 MHz Clock: 3341 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3341 MHz Clock: 3342 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3342 MHz Clock: 3340 MHz Clock: 3342 MHz Clock: 3350 MHz
Physical id 0: +86.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3350 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3114 MHz Clock: 3116 MHz
Physical id 0: +79.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3125 MHz Clock: 3116 MHz Clock: 3125 MHz Clock: 3116 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3116 MHz Clock: 3108 MHz Clock: 3116 MHz
Physical id 0: +82.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3253 MHz Clock: 3233 MHz Clock: 3253 MHz Clock: 3233 MHz

After restarting thermald, the fan never runs because the CPU frequency is held lower. The output of the script is:

Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3100 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3116 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3125 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3108 MHz Clock: 3112 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3118 MHz Clock: 3108 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3116 MHz Clock: 3105 MHz
Physical id 0: +78.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2853 MHz Clock: 2850 MHz Clock: 2853 MHz
Physical id 0: +77.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2859 MHz Clock: 2850 MHz Clock: 2858 MHz
Physical id 0: +76.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2850 MHz Clock: 2853 MHz
Physical id 0: +78.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3000 MHz
Physical id 0: +79.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2990 MHz Clock: 2991 MHz
Physical id 0: +78.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2980 MHz
Physical id 0: +79.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2983 MHz Clock: 2991 MHz Clock: 2983 MHz
Physical id 0: +78.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2991 MHz Clock: 2983 MHz Clock: 2991 MHz
Physical id 0: +79.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2991 MHz Clock: 2985 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2983 MHz Clock: 2991 MHz Clock: 2983 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3108 MHz Clock: 3116 MHz Clock: 3114 MHz
Physical id 0: +81.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3108 MHz
Physical id 0: +82.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3108 MHz Clock: 3125 MHz
Physical id 0: +83.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3107 MHz Clock: 3100 MHz Clock: 3108 MHz
Physical id 0: +82.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3116 MHz
Physical id 0: +82.0°C (high = +84.0°C, crit = +100.0°C) Clock: 3108 MHz Clock: 3100 MHz Clock: 3108 MHz
Physical id 0: +78.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2842 MHz Clock: 2841 MHz Clock: 2833 MHz Clock: 2841 MHz
Physical id 0: +77.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2846 MHz Clock: 2850 MHz Clock: 2841 MHz Clock: 2850 MHz
Physical id 0: +78.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2843 MHz Clock: 2842 MHz Clock: 2841 MHz
Physical id 0: +77.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2844 MHz Clock: 2833 MHz Clock: 2841 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2991 MHz Clock: 2983 MHz Clock: 2991 MHz
Physical id 0: +79.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2989 MHz Clock: 2985 MHz Clock: 2989 MHz Clock: 2991 MHz
Physical id 0: +79.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2991 MHz Clock: 2988 MHz Clock: 2991 MHz
Physical id 0: +79.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2983 MHz Clock: 2991 MHz Clock: 2983 MHz
Physical id 0: +78.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2983 MHz Clock: 3000 MHz Clock: 2983 MHz Clock: 3000 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2983 MHz
Physical id 0: +80.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2975 MHz Clock: 2988 MHz Clock: 2975 MHz
Physical id 0: +74.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2666 MHz Clock: 2671 MHz
Physical id 0: +75.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2670 MHz Clock: 2675 MHz Clock: 2670 MHz Clock: 2675 MHz
Physical id 0: +76.0°C (high = +84.0°C, crit = +100.0°C) Clock: 2670 MHz Clock: 2675 MHz
Comment 76 Srinivas Pandruvada 2016-10-03 19:28:57 UTC
So I guess you are not seeing 400 MHz. But since the SMM is very aggressive and gets stuck at max temperature, the default thermald setting is not enough.

So I guess you need to experiment with the temperature value "uint32:80000", setting it as high as possible.

I don't know if openSUSE has the ThermalMonitor tool. If it is there, you can look at the temperature graph and adjust the setting from the GUI.
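
The "uint32:80000" argument is a temperature in millidegrees Celsius; the thermald log above shows it arriving as "Setting psv 80000". A minimal sketch of a helper that builds that argument from whole degrees follows. The function name psv_arg and the 40..100 sanity window are assumptions made for illustration and are not part of thermald:

```shell
#!/bin/sh
# psv_arg DEGREES: hypothetical helper (not part of thermald) that converts
# whole degrees Celsius into the millidegree "uint32:" argument.
psv_arg() {
    deg=$1
    # Reject anything that is not a plain non-negative integer.
    case $deg in
        ''|*[!0-9]*) echo "psv_arg: not an integer: $deg" >&2; return 1 ;;
    esac
    # Assumed sanity window; adjust it for the machine at hand.
    if [ "$deg" -lt 40 ] || [ "$deg" -gt 100 ]; then
        echo "psv_arg: $deg C is outside 40..100" >&2
        return 1
    fi
    echo "uint32:$((deg * 1000))"
}

psv_arg 80    # prints uint32:80000
```

Raising the value in small steps and rerunning the monitoring script after each change gives a simple way to find the highest setting that still avoids the firmware throttling.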
Comment 77 Larry Finger 2016-10-03 19:51:16 UTC
No 400 MHz anymore, or at least only briefly.

I could not find a ThermalMonitor tool for openSUSE. Do you have a reference to the source so that I can build it myself?
Comment 79 Larry Finger 2016-10-05 16:32:02 UTC
Unfortunately, openSUSE Leap 42.1 uses Qt4 by default. The repositories have Qt5, but I got into dependency hell when I tried to install it. For the moment, I will not be able to run the ThermalMonitor tool. I will be changing to Leap 42.2 soon, which may have Qt5 as default.

In any case, using "uint32:80000" seems to be a satisfactory solution. Under heavy load, the average clock from my applet runs between 3100 and 3200 MHz. Under those conditions, the CPU fan comes on, but never runs as fast as before. The CPU temperature is 80 C, not 90.
Comment 80 Srinivas Pandruvada 2016-10-05 16:47:14 UTC
It is not unusual for laptops to run into thermal issues. Hence Ubuntu decided to start thermald by default. Maybe openSUSE should do so too.
Comment 81 Doug Smythies 2016-10-06 06:53:27 UTC
Created attachment 240941 [details]
simple script to monitor temerature and cpu0 frequency

Larry, I've been using the attached to watch temp and frequency. You will have to edit it, since your TCC is 100 degrees (and yes, I should just get the script to read TCC for itself.)

I agree that computers should never hit this level of thermal protection, but given that yours did, this bug report leaves curious questions (at least for me). Today, I tried again to get my computer to trigger a PROCHOT event, but just couldn't. I set the maximum offset of 15 degrees (98 - 15 = 83), but the event didn't trigger when I got to 84 degrees. Then I thought maybe the 4-bit offset might be signed, so I entered 7 (98 - 7 = 91), and it still didn't trigger when I got to 92. While I think all of the bits are set properly, I must have missed something. From your turbostat output, I see that your offset was 5 degrees (100 - 5 = 95), so I do not know why it seemed to trigger at 91 degrees.
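
The arithmetic above (TCC activation temperature minus offset) can be sketched as a small shell helper. The bit layout used by decode_target, with the temperature target in bits 23:16 and a 4-bit offset in bits 27:24 of the temperature-target register, is an assumption based on the "4-bit offset" mentioned above and should be checked against the documentation for the specific CPU. Both function names are hypothetical:

```shell
#!/bin/sh
# trip_temp TCC OFFSET: temperature (deg C) at which throttling is expected.
trip_temp() {
    echo $(( $1 - $2 ))
}

# decode_target VALUE: split a raw temperature-target register value into
# "TCC OFFSET TRIP". ASSUMED layout: bits 23:16 = TCC, bits 27:24 = offset.
decode_target() {
    val=$1
    tcc=$(( (val >> 16) & 255 ))
    off=$(( (val >> 24) & 15 ))
    echo "$tcc $off $(trip_temp "$tcc" "$off")"
}

trip_temp 98 15     # prints 83 (first attempt above)
trip_temp 98 7      # prints 91 (second attempt above)
trip_temp 100 5     # prints 95 (Larry's machine, per turbostat)
```

None of this explains the observed trigger at 91 degrees; it only makes the expected values easy to recompute when the offset is changed.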
Comment 82 Larry Finger 2016-10-06 19:13:24 UTC
The output of your script is as follows:

... begin package temperature monitoring ...
85   3330604
84   3334262
78   3066760
78   3099998
78   3124875
77   3199999
79   3220531
79   3233561
80   3229969
81   3341346
80   3308295
82   3343933
82   3316504
81   3199998
80   3200000
80   3199998
81   3200000
83   3334629
83   3309117
83   3369234
84   3340911
86   3430973
87   3450887
88   3441326
87   3441968
87   3409277
85   3348904
85   3315787
84   3316733
83   3300000
86   3441717
87   3416608
87   3466716
87   3399999
83   3308387
82   3362526
84   3374828
82   3345226
85   3457484
85   3449628

I did edit the script and change 98 => 100. While the script was running, I was running mksquashfs with a 150 MB file as output. Since I started running thermald, my CPU shows a lot better temperature control. It has not gotten into the "locked-at-400" state.
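
For comparing runs, the two-column output above (package temperature, then cpu0 frequency) can be reduced to a one-line summary. This is a minimal sketch using awk; the function name summarize is hypothetical, and treating the second column as kHz is an assumption inferred from the magnitude of the values:

```shell
#!/bin/sh
# summarize: read "TEMP FREQ" lines on stdin and print the sample count,
# min/max temperature and mean frequency in MHz (FREQ is ASSUMED to be kHz).
# Non-numeric lines, such as the "... begin ..." banner, are skipped.
summarize() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            t = $1 + 0
            n++
            if (n == 1 || t < tmin) tmin = t
            if (n == 1 || t > tmax) tmax = t
            fsum += $2
        }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "samples=%d tmin=%d tmax=%d mean_mhz=%d\n", n, tmin, tmax, fsum / n / 1000
        }
    '
}

# Usage (the log file name is hypothetical):
#   summarize < temp_freq.log
```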
Comment 83 Rafael J. Wysocki 2016-10-10 20:34:47 UTC
OK, it looks like running thermald allows the problem to be addressed and I don't see anything to do on the kernel side.

Closing.
Comment 84 RocketSam 2023-03-10 16:32:04 UTC
Blacklisting the intel_rapl_msr module helped me. I haven't tested it thoroughly yet, but at first sight it has fixed the issue for me.

My laptop is a Dell Latitude 7430 with an i5-1235U running Fedora 37.

For testing, you may run:
$ sudo rmmod intel_rapl_msr

To make the change permanent, create the file /etc/modprobe.d/intel_rapl_msr-blacklist.conf with these contents:
blacklist intel_rapl_msr
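
The two steps above can be sketched as a small script. The helper name write_blacklist and its optional directory argument (there so the function can be tried against a scratch directory first) are hypothetical; the module name and file path are the ones given above:

```shell
#!/bin/sh
# write_blacklist MODULE [DIR]: create DIR/MODULE-blacklist.conf containing
# a single "blacklist MODULE" line. DIR defaults to /etc/modprobe.d.
write_blacklist() {
    mod=$1
    dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$mod" > "$dir/$mod-blacklist.conf"
}

# Typical use (as root):
#   rmmod intel_rapl_msr              # unload now, for testing
#   write_blacklist intel_rapl_msr    # keep it from auto-loading later
```

A reboot afterwards confirms that the module stays unloaded; lsmod should no longer list it.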
