Bug 97261
Summary: | Intel P-State driver does not honor no_turbo with opengl workload | ||
---|---|---|---|
Product: | Power Management | Reporter: | Martin Steigerwald (Martin) |
Component: | intel_pstate | Assignee: | Chen Yu (yu.c.chen) |
Status: | RESOLVED WILL_NOT_FIX | ||
Severity: | normal | CC: | dsmythies, lenb, rui.zhang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
acpifreq + glxgear measurements with turbostat
acpifreq + glxgear measurements with turbostat
pstate + glxgear measurements with turbostat
tests with turbostat -d sleep 10 and with and without tlp |
Description
Martin Steigerwald
2015-04-25 15:56:49 UTC
Okay, I see that it is not limited to when the openmw window is open. I have now seen it at higher frequencies even without it being open. Can it be that I need to limit the GPU as well? Does the CPU raise its frequency on GPU usage?

Is there any way to limit the maximum temperature to, say, 90 or 85 degrees Celsius without throttling or powerclamp, by just not allowing it to go higher than a certain speed in the first place? Any way to tell the driver "hey, this laptop has a heating issue, be gentle to it instead of bringing it up to 98 degrees"? I'd rather have a constant lower performance for now than high performance, then throttling or idle injection, then low performance. I prefer a constant lower performance to the bumpy up and down.

You might be able to use turbostat to determine if a lot of your heat is from the graphics or not. The acpi-cpufreq driver using the ondemand governor and intel-pstate using the powersave governor do have differing load versus CPU frequency response curves.

We do not care if some game results in different temperatures using the different scaling methods. What we care about is your claim that disabling turbo doesn't work and that limits don't seem to be enforced. To investigate that further, you need to use an unmodified kernel, and I would suggest kernel 4.2RC2, so that you are testing with the most recent release candidate. Also, do not use any higher level tool, only use primitives to control the driver settings. With such a setup do the following:

Disable turbo:
    echo "1" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
Then fully load one CPU in one terminal:
    taskset -c 3 cat /dev/zero > /dev/null
Then inquire as to CPU frequencies in another terminal:
    grep MHz /proc/cpuinfo
Post the results here.

Now, similarly for a limiting test. Set an upper frequency limit:
    echo "60" | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
And repeat the above load and inquire steps. Post the results here.

I meant to write kernel 4.1RC2 above where I wrote 4.2RC2.
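The "inquire" step above can be turned into a quick pass/fail check. The following is a minimal sketch and not something posted in this thread: the function name `cores_over_limit` and its optional file argument are made up for illustration, and it only evaluates what /proc/cpuinfo reports, which is less trustworthy than turbostat.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): count the cores whose reported
# "cpu MHz" value is above a given limit.
# Usage: cores_over_limit LIMIT_MHZ [CPUINFO_FILE]
cores_over_limit() {
    limit_mhz=$1
    cpuinfo=${2:-/proc/cpuinfo}   # a saved copy can be passed for offline checks
    awk -F: -v limit="$limit_mhz" '
        /^cpu MHz/ { if ($2 + 0 > limit + 0) over++ }
        END { print over + 0 }
    ' "$cpuinfo"
}
```

On a machine whose non-turbo maximum is 2500 MHz, running `cores_over_limit 2500` after setting no_turbo would be expected to print 0 if the limit holds; any other count means at least one core was reported above the limit at that moment.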
Can you please tell me which scaling governor you are using?
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

merkaba:~> cat /proc/version
Linux version 4.0.1-tp520-btrfs-trim-norace+ (martin@merkaba) (gcc version 4.9.2 (Debian 4.9.2-15) ) #27 SMP PREEMPT Sun May 3 12:24:38 CEST 2015

(Just two little BTRFS patches. I do not want to spend the evening compiling a kernel, and rc2 would still have broken hibernation with a klibc based initramfs.)

merkaba:/sys/devices/system/cpu> grep . cpu?/cpufreq/scaling_{governor,driver}
cpu0/cpufreq/scaling_governor:powersave
cpu1/cpufreq/scaling_governor:powersave
cpu2/cpufreq/scaling_governor:powersave
cpu3/cpufreq/scaling_governor:powersave
cpu0/cpufreq/scaling_driver:intel_pstate
cpu1/cpufreq/scaling_driver:intel_pstate
cpu2/cpufreq/scaling_driver:intel_pstate
cpu3/cpufreq/scaling_driver:intel_pstate

Workload:

merkaba:~> taskset -c 3 cat /dev/zero > /dev/null

merkaba:/sys/devices/system/cpu/intel_pstate> grep . *
max_perf_pct:100
min_perf_pct:25
no_turbo:0
num_pstates:25
turbo_pct:29

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 3000.000
cpu MHz : 2884.667
cpu MHz : 3189.355
cpu MHz : 3179.101

Why all cores at those high frequencies?

merkaba:/sys/devices/system/cpu/intel_pstate> echo 1 > no_turbo
merkaba:/sys/devices/system/cpu/intel_pstate>

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 2499.804
cpu MHz : 2499.902
cpu MHz : 2499.707
cpu MHz : 2499.902

Okay, this seems to work, will test with PlaneShift later on.

But also here, why all cores at those high frequencies? According to atop only one is fully utilized, others mostly idle.

merkaba:/sys/devices/system/cpu/intel_pstate> echo 50 > max_perf_pct
merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1599.902
cpu MHz : 1599.902
cpu MHz : 1599.902
cpu MHz : 1599.902

merkaba:/sys/devices/system/cpu/intel_pstate> grep . ../cpu?/cpufreq/scaling_cur_freq
../cpu0/cpufreq/scaling_cur_freq:1599902
../cpu1/cpufreq/scaling_cur_freq:1599902
../cpu2/cpufreq/scaling_cur_freq:1600976
../cpu3/cpufreq/scaling_cur_freq:1599902

So this works, taking the 3.2 GHz maximum with turbo as 100%. But still all cores are at the maximum of the allowed percentage.

merkaba:/sys/devices/system/cpu/intel_pstate> echo 1 > no_turbo
merkaba:/sys/devices/system/cpu/intel_pstate> echo 50 > max_perf_pct
merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1199.902
cpu MHz : 1199.902
cpu MHz : 1200.000
cpu MHz : 1199.902

Should be half of the 2.5 GHz which it can do without turbo, but well, it's lower than that, so good I think.

Now I take this setting and start PlaneShift while the cat workload is still running:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1199.902
cpu MHz : 1200.000
cpu MHz : 1200.000
cpu MHz : 1199.902

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1585.253
cpu MHz : 1570.898
cpu MHz : 1229.296
cpu MHz : 1601.953

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1199.902
cpu MHz : 1199.902
cpu MHz : 1200.000
cpu MHz : 1200.000

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 2940.039
cpu MHz : 2910.742
cpu MHz : 3042.773
cpu MHz : 2899.902

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 3000.000
cpu MHz : 3000.000
cpu MHz : 3000.000
cpu MHz : 3000.000

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 3000.000
cpu MHz : 3000.000
cpu MHz : 3021.386
cpu MHz : 3054.785

What's this?
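The two 50% readings above (about 1600 MHz with turbo allowed, about 1200 MHz rather than 1250 MHz with no_turbo set) are consistent with the cap being applied in whole 100 MHz steps and rounded down. The sketch below only writes that assumption out as arithmetic. It is not taken from the driver source, and `expected_cap_mhz` is a made-up name.

```shell
#!/bin/sh
# Hypothetical arithmetic helper (an assumption, not driver code):
# scale a maximum frequency by a percentage in whole 100 MHz steps,
# rounding down to the next lower step.
# Usage: expected_cap_mhz MAX_MHZ PERCENT
expected_cap_mhz() {
    max_mhz=$1
    pct=$2
    steps=$(( max_mhz / 100 * pct / 100 ))   # integer division truncates
    echo $(( steps * 100 ))
}

expected_cap_mhz 3200 50   # prints 1600 (turbo maximum taken as 100%)
expected_cap_mhz 2500 50   # prints 1200 (12.5 steps truncated to 12)
```

If this reading is right, the 1200 MHz result with no_turbo is a rounding effect and not a sign that the limit was misapplied.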
After stopping PlaneShift:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1200.000
cpu MHz : 1199.902
cpu MHz : 1200.000
cpu MHz : 1199.902

After stopping cat workload:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1012.207
cpu MHz : 986.816
cpu MHz : 1075.488
cpu MHz : 1082.324

Now with PlaneShift again, but cat workload remains stopped.

While PlaneShift is loading:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1199.902
cpu MHz : 1200.000
cpu MHz : 1199.902
cpu MHz : 1200.000

On PlaneShift login screen:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 2209.765
cpu MHz : 2376.562
cpu MHz : 2169.335
cpu MHz : 1861.523

After selecting a char and having the char displayed in rotating view:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 3148.046
cpu MHz : 3149.804
cpu MHz : 3073.144
cpu MHz : 3037.207

So it's the OpenGL load raising the frequencies here, and neither no_turbo nor max_perf_pct is effective anymore then?

Okay, one process of glxgears:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1066.601
cpu MHz : 1019.628
cpu MHz : 1112.597
cpu MHz : 1183.105

Two:

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1066.601
cpu MHz : 1019.628
cpu MHz : 1112.597
cpu MHz : 1183.105

Ten processes:

martin@merkaba:~> for (( I=0 ; I<10 ; I++ )); do glxgears & ; done
[2] 3319
[…]

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 1199.707
cpu MHz : 1200.000
cpu MHz : 1194.238
cpu MHz : 1200.000

Okay, the framerate is capped, but still, ten should use more GPU resources?

Okay, without cap:

martin@merkaba:~#1> vblank_mode=0 glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
22453 frames in 5.0 seconds = 4490.521 FPS
23618 frames in 5.0 seconds = 4723.544 FPS
23628 frames in 5.0 seconds = 4725.493 FPS

merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
cpu MHz : 2999.902
cpu MHz : 2999.804
cpu MHz : 2999.902
cpu MHz : 2999.902

So performance capping is not effective with an OpenGL workload.

If that's expected behavior it would be nice to have this documented. I didn't find anything about this.

And: Is there a way to limit GPU performance to avoid overheating?

And 2: Still, with acpi-cpufreq instead of P-State the machine overheats way less. What is the point in having it run into forced throttling over and over again?

martin@merkaba:~> date
Do 7. Mai 20:33:16 CEST 2015
martin@merkaba:~> cat /proc/version
Linux version 4.0.1-tp520-btrfs-trim-norace+ (martin@merkaba) (gcc version 4.9.2 (Debian 4.9.2-15) ) #27 SMP PREEMPT Sun May 3 12:24:38 CEST 2015

Retesting with acpi-cpufreq: while it confirms the setting I made, the CPU is still running hot.

So is it just that P-State is more honest about the actually used CPU frequency during an OpenGL workload, and it is not possible to limit it for OpenGL workloads with either P-State or acpi-cpufreq?

merkaba:/sys/devices/system/cpu> grep . cpufreq/ondemand/*
cpufreq/ondemand/ignore_nice_load:0
cpufreq/ondemand/io_is_busy:1
cpufreq/ondemand/powersave_bias:0
cpufreq/ondemand/sampling_down_factor:1
cpufreq/ondemand/sampling_rate:10000
cpufreq/ondemand/sampling_rate_min:10000
cpufreq/ondemand/up_threshold:95

merkaba:/sys/devices/system/cpu#2> LANG=C grep . cpu0/cpufreq/*
cpu0/cpufreq/affected_cpus:0
cpu0/cpufreq/bios_limit:2501000
cpu0/cpufreq/cpuinfo_cur_freq:1200000
cpu0/cpufreq/cpuinfo_max_freq:2501000
cpu0/cpufreq/cpuinfo_min_freq:800000
cpu0/cpufreq/cpuinfo_transition_latency:10000
cpu0/cpufreq/freqdomain_cpus:0 1 2 3
cpu0/cpufreq/related_cpus:0
cpu0/cpufreq/scaling_available_frequencies:2501000 2500000 2000000 1800000 1600000 1400000 1200000 1000000 800000
cpu0/cpufreq/scaling_available_governors:userspace powersave conservative ondemand performance
cpu0/cpufreq/scaling_cur_freq:1200000
cpu0/cpufreq/scaling_driver:acpi-cpufreq
cpu0/cpufreq/scaling_governor:ondemand
cpu0/cpufreq/scaling_max_freq:2501000
cpu0/cpufreq/scaling_min_freq:800000
cpu0/cpufreq/scaling_setspeed:<unsupported>
grep: cpu0/cpufreq/stats: Is a directory

Others are like this.

So no turbo:

merkaba:/sys/devices/system/cpu> for CPU in $( ls -d cpu? ); do echo 2500000 > $CPU/cpufreq/scaling_max_freq ; done
merkaba:/sys/devices/system/cpu> grep . cpu?/cpufreq/scaling_max_freq
cpu0/cpufreq/scaling_max_freq:2500000
cpu1/cpufreq/scaling_max_freq:2500000
cpu2/cpufreq/scaling_max_freq:2500000
cpu3/cpufreq/scaling_max_freq:2500000

Appears (!) fine:

merkaba:/sys/devices/system/cpu> grep MHz /proc/cpuinfo
cpu MHz : 2500.000
cpu MHz : 800.000
cpu MHz : 1200.000
cpu MHz : 1400.000

So even lower freq:

merkaba:/sys/devices/system/cpu> for CPU in $( ls -d cpu? ); do echo 2000000 > $CPU/cpufreq/scaling_max_freq ; done

Appears (!) fine:

merkaba:/sys/devices/system/cpu> grep MHz /proc/cpuinfo
cpu MHz : 2000.000
cpu MHz : 1000.000
cpu MHz : 800.000
cpu MHz : 1400.000

So even lower freq:

merkaba:/sys/devices/system/cpu> for CPU in $( ls -d cpu? ); do echo 800000 > $CPU/cpufreq/scaling_max_freq ; done

Appears (!) fine:

merkaba:/sys/devices/system/cpu> grep MHz /proc/cpuinfo
cpu MHz : 800.000
cpu MHz : 800.000
cpu MHz : 800.000
cpu MHz : 800.000

But verify, because it only tells what is requested.

Okay, it seems it is not really running at 800 MHz:

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +92.0°C (high = +86.0°C, crit = +100.0°C)
Core 0: +90.0°C (high = +86.0°C, crit = +100.0°C)
Core 1: +92.0°C (high = +86.0°C, crit = +100.0°C)

thinkpad-isa-0000
Adapter: ISA adapter
fan1: 3600 RPM

intel_gpu_top shows:

render busy: 51%: ██████████▎ render space: 55/131072
bitstream busy: 0%: bitstream space: 0/131072
blitter busy: 47%: █████████▌ blitter space: 38/131072

task percent busy
GAM: 66%: █████████████▎ vert fetch: 2878392208 (7122278/sec)
CS: 50%: ██████████ prim fetch: 1437331936 (3557106/sec)
PSD: 42%: ████████▌ VS invocations: 2873060471 (7110179/sec)
IZ: 38%: ███████▋ GS invocations: 0 (0/sec)
DAP: 37%: ███████▌ GS prims: 0 (0/sec)
RCPFE: 36%: ███████▎ CL invocations: 1434071848 (3549040/sec)
RCC: 36%: ███████▎ CL prims: 1435679216 (3553073/sec)
RCPBE: 36%: ███████▎ PS invocations: 197540761646 (473169366/sec)
SVG: 36%: ███████▎ PS depth pass: 195051640182 (467011306/sec)
HIZ: 35%: ███████
IC 3: 35%: ███████
IC 0: 35%: ███████
IC 1: 35%: ███████
IC 2: 35%: ███████
TD: 33%: ██████▋
WMFE: 32%: ██████▌
EU 10: 31%: ██████▎
EU 30: 31%: ██████▎
EU 00: 31%: ██████▎
EU 20: 30%: ██████
WMBE: 30%: ██████
EU 21: 30%: ██████
EU 31: 29%: █████▉
EU 11: 29%: █████▉
EU 01: 29%: █████▉
Message Arbiter 3: 16%: ███▎
Message Arbiter 2: 16%: ███▎
Message Arbiter 1: 16%: ███▎
Message Arbiter 0: 16%: ███▎
EU 32: 10%: ██
EU 02: 9%: █▉
EU 22: 9%: █▉
GS: 9%: █▉
EU 12: 9%: █▉
SVSM: 8%: █▋
SF: 8%: █▋
CL: 7%: █▌
RCZ: 7%: █▌
VS0: 7%: █▌
SVRW: 6%: █▎
ISC: 5%: █

merkaba:~> /usr/bin/intel_gpu_frequency
cur: 1300 MHz
min: 650 MHz
RP1: 650 MHz
max: 1300 MHz

So the GPU has its own frequency. So why does P-State display a higher CPU frequency when the GPU runs an intense OpenGL workload?
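The per-CPU loop used in the acpi-cpufreq test above globs `cpu?`, which only matches single-digit CPU numbers. That is enough for this four-thread machine. A slightly more general variant is sketched below, with the caveats that the function name `set_scaling_max_khz` and its optional root-directory argument are invented for illustration, and that, as the temperatures above suggest, a value read back from scaling_max_freq only shows what was requested.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): write the same scaling_max_freq
# (in kHz) for every cpuN directory and print what each file reads back.
# Usage: set_scaling_max_khz KHZ [CPU_ROOT]
set_scaling_max_khz() {
    khz=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        [ -e "$f" ] || continue          # glob matched nothing
        echo "$khz" > "$f" || return 1   # needs root on the real sysfs tree
        printf '%s:%s\n' "$f" "$(cat "$f")"
    done
}
```

Run as root, `set_scaling_max_khz 2000000` would correspond to the 2 GHz test above.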
(In reply to Martin Steigerwald from comment #5)
> merkaba:/sys/devices/system/cpu/intel_pstate> grep . *
> max_perf_pct:100
> min_perf_pct:25
> no_turbo:0
> num_pstates:25
> turbo_pct:29
>
> merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
>
> cpu MHz : 3000.000
> cpu MHz : 2884.667
> cpu MHz : 3189.355
> cpu MHz : 3179.101
>
> Why all cores at those high frequencies?

There is only one PLL on the processor. All CPUs are always at the same frequency; differences in what is displayed are due to them coming and going from C0 state and the PLL changing in between. Note that the information from the acpi-cpufreq scaling driver is extremely misleading with respect to this.

>
>
> merkaba:/sys/devices/system/cpu/intel_pstate> echo 1 > no_turbo
> merkaba:/sys/devices/system/cpu/intel_pstate>
>
> merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
> cpu MHz : 2499.804
> cpu MHz : 2499.902
> cpu MHz : 2499.707
> cpu MHz : 2499.902
>
> Okay, this seems to work,

Agreed, there is no problem here.

> will test with PlaneShift later on.

I don't care what results it gives.

>
> But also here, why all cores at those high frequencies? According to atop
> only one is fully utilized, others mostly idle.

See above.

>
>
> Now I take this setting and start PlaneShift while cat workload still
> running:
> ...
> merkaba:/sys/devices/system/cpu/intel_pstate> grep MHz /proc/cpuinfo
> cpu MHz : 3000.000
> cpu MHz : 3000.000
> cpu MHz : 3021.386
> cpu MHz : 3054.785
>
> Whats this?

Don't know. Perhaps look at PlaneShift.

>
> Okay, is framerate capped, but still ten should use more GPU resources?

Don't know.

>
> And: Is there a way to limit GPU performance to avoid overheating.

Don't know.
(In reply to Martin Steigerwald from comment #6)
>
>
> But verify cause it tells what is requested:
>
> Okay, seems it is not really running at 800 MHz:
>
> coretemp-isa-0000
> Adapter: ISA adapter
> Physical id 0: +92.0°C (high = +86.0°C, crit = +100.0°C)
> Core 0: +90.0°C (high = +86.0°C, crit = +100.0°C)
> Core 1: +92.0°C (high = +86.0°C, crit = +100.0°C)
>

Use turbostat to obtain real information about CPU frequencies.

> So why does P-State display
> higher CPU frequency when GPU runs intense OpenGL workload?

I do not know, but suspect the programs you are using are messing with things in a non-standard way. We would have to add a patch to the intel_pstate driver (accepted and queued for kernel 4.2RC1) and acquire some trace data to know for sure what is going on.

Created attachment 176121 [details]
acpifreq + glxgear measurements with turbostat
Attached as text as Bugzilla would wordwrap this into a mess.
Oh, and about your suspicion, Doug: I was demonstrating with glxgears. Do you really think it uses things in a non-standard way? I tested with PlaneShift and glxgears; that is already two programs. I have seen this with openmw as well earlier, so that's a third. Maybe it's a technical limitation and frequencies can't be capped in that case.

Created attachment 176131 [details]
acpifreq + glxgear measurements with turbostat
Measured with higher maximum frequencies.
From what I can see, acpi-cpufreq enforces the maximum frequency. The CPU won't go into turbo mode at all, though, during an OpenGL workload. That makes sense, because as I understand it the CPU will only go into turbo mode as far as it can given the GPU usage, to prevent overheating.
Will repeat measurements with P-State.
Created attachment 176141 [details]
pstate + glxgear measurements with turbostat
Here are pstate measurements with turbostat.
pstate driver completely ignores no_turbo and max_perf_pct for glxgears = OpenGL workload.
Also note the considerably higher power consumption of both CPU core and GPU even when compared with acpi-cpufreq running with maximum scaling_max_freq.
P-State:
CPU core: 16-17 Watts (!!!)
GPU: 8-9 Watts
acpi-cpufreq:
CPU core: 12-13 Watts (!!!)
GPU: 7-8 Watts
So if these values are realistic I am totally not surprised that I run into a lot more overheating with P-State instead of acpi-cpufreq.
It may be that with cooling devices in top state this will give higher performance, but for this ThinkPad T520 it results in reduced performance.
And it ignores any attempt at limiting performance with OpenGL workloads, while acpi-cpufreq seems to respect a maximum frequency limit and even runs with fewer watts at its maximum setting.
From attachment:
> merkaba:/sys/devices/system/cpu/intel_pstate> echo "50" > max_perf_pct
> It ignores max_perf_pct completely for OpenGL workloads.

Are you root? How do you know it took it? I do this:

    echo "60" | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct

doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
doug@s15:~/temp$ echo "60" | sudo tee /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
doug@s15:~/temp$ cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
doug@s15:~/temp$

Please add the patch (from another bug report) attachment 169891 [details] and then run:

    sudo perf record -a --event=power:pstate_sample sleep 600

and while that is running do your thing with limited max_pct. Post the resulting perf.data here.

Yes, I am root via su, Doug (the prompt doesn't show a username in this case on Debian with bash). I checked in my first tests that it takes the setting. Also, it does respect the setting when used with your cat /dev/zero workload, or with stress -c. But it doesn't respect it with an OpenGL workload. So even without testing with cat each time that it took it, I am pretty confident that it did.

Please post the output for:

    turbostat -d sleep 10

And: Do you have tlp installed?

Doug, I am on acpi-cpufreq currently and shortly before going to sleep… so not today anymore. Yes, I have tlp installed. I used it to limit to 2 GHz with acpi-cpufreq and I also did the permanent P-State limitations with it. But it just sets them once or on battery/AC change, and I echoed things manually for the tests and set things as I documented in the test text files.

O.K. Please post the output for:

    turbostat -d sleep 10

then uninstall tlp completely, then re-boot, then post the output from:

    turbostat -d sleep 10

It doesn't matter what you are doing during the 10 seconds, it is the debug output that is desired.

Created attachment 176231 [details]
tests with turbostat -d sleep 10 and with and without tlp
As requested.
The kernel patch thing I will do on my next kernel compile. I hope 4.1-rc3 will contain a fix for the broken resume from hibernation with a klibc initramfs.
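The write-then-read-back check Doug asks about earlier (write max_perf_pct, then cat it to see that the value took) can be wrapped into one step. This is a minimal sketch with a made-up function name, `write_and_verify`. It only confirms what the file reads back, and says nothing about whether the hardware then honors the limit.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): write a value to a file and
# confirm that reading the file back returns the same value.
# Usage: write_and_verify VALUE FILE
write_and_verify() {
    value=$1
    file=$2
    echo "$value" > "$file" || return 1
    readback=$(cat "$file") || return 1
    if [ "$readback" = "$value" ]; then
        echo "ok: $file = $readback"
    else
        echo "mismatch: wrote $value, read $readback" >&2
        return 1
    fi
}
```

As root, `write_and_verify 60 /sys/devices/system/cpu/intel_pstate/max_perf_pct` would mirror the tee and cat sequence shown in Doug's transcript.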
(In reply to Martin Steigerwald from comment #18)
> Created attachment 176231 [details]
> tests with turbostat -d sleep 10 and with and without tlp
>
> As requested.

O.K. thanks very much. It is a dead end, as I am not seeing some differences that I thought might be there.

>
> The kernel-patch thing I will do on my next kernel compile, I hope 4.1-rc3
> will contain fix for broken resume from hibernation with klibc initramfs.

O.K. I will be interested in the trace data.

What's the status of this problem?

Oh, I didn't follow up on this anymore after I fixed the underlying hardware issue by cleaning the fan with compressed air. If no one fixed it, I expect the underlying issue, that it still uses turbo mode with high OpenGL usage, is still there. I don't mind though, since the cooling works like new after the fan cleaning and thus the CPU does not throttle itself anymore.