Bug 114551
Summary: | Regression: bogus passive trip point causes processors throttled unexpectedly - Lenovo G510, Y50-70 laptop | ||
---|---|---|---|
Product: | Power Management | Reporter: | greeenify |
Component: | Thermal | Assignee: | Zhang Rui (rui.zhang) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | cunio, dmatej, doaxan77, dsmythies, frolvlad, greeenify, johanneicher, lenb, marcin.j.nowak, megamak, rui.zhang, yu.c.chen |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.4.5-1 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
attachment-997-0.html
This patch causes the regression
Fix suggested by Zhang Rui |
Description
greeenify
2016-03-14 00:50:05 UTC
Hi
1. did you test with 100% cpuload on a single cpu?
2. does this problem still reproduce on latest 4.5?
3. could you plz provide:
grep . /sys/devices/system/cpu/intel_pstate/*
grep . /sys/devices/system/cpu/cpu3/cpufreq/*
(before and after suspend)
thx

Hi, Thanks for your fast reply. I have to use my laptop during the week - will report back on the weekend at the latest.

Hi, I can confirm similar behaviour on my Lenovo Y50-70 laptop. I have narrowed it down: the 4.4.3 mainline kernel works fine and 4.4.4 introduces the regression, but I don't have enough time to narrow it down further. Another Ubuntu user reported that the issue was backported to the 4.2.0-28 .. 4.2.0-30 releases (4.2.0-27 worked fine): http://askubuntu.com/questions/745087/cpu-clock-slower-after-each-resume-from-sleep

I forgot to summarize here the findings I posted in my comments to the askubuntu question. I have an up-to-date ArchLinux x86_64 and I have tested the following kernels:
* Mainline 4.4.0 - no regression
* Mainline 4.4.1 - no regression
* Mainline 4.4.2 - no regression
* Mainline 4.4.3 - no regression
* Mainline 4.4.4 - regression starts here
* Mainline 4.4.5 - regression is still here
* Mainline 4.5rc7 - regression is still here
* Mainline 4.5.0 - regression is still here
My preferred kernel includes Liquorix patches, so that is where I first encountered the regression, but I have built and checked vanilla kernels to confirm that the issue is not in the custom patches.

*** Bug 113531 has been marked as a duplicate of this bug. ***

Here is how CPU frequency changes on each suspend/resume (copied from the askubuntu question with some extra comments). Notice changes in "frequency should be within 800 MHz and XXX GHz.", "current CPU frequency is XXX MHz", and /sys/devices/system/cpu/intel_pstate/max_perf_pct.
After a fresh boot:
==========================================================================
root@alain-Y50-70:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.60 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.60 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 817 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes
root@alain-Y50-70:~# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
100
==========================================================================
After the first suspend/resume:
==========================================================================
root@alain-Y50-70:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.60 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 2.88 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 800 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes
root@alain-Y50-70:~# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
80
==========================================================================
After the second:
==========================================================================
root@alain-Y50-70:~# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
60
==========================================================================
After the third:
==========================================================================
root@alain-Y50-70:~# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
40
==========================================================================
After the fourth and on max_perf_pct stays 40, but current CPU frequency drops even further...
==========================================================================
root@alain-Y50-70:~# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
40
root@alain-Y50-70:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.60 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 1.44 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 699 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes
==========================================================================
==========================================================================
root@alain-Y50-70:~# cat /sys/devices/system/cpu/intel_pstate/max_perf_pct
40
root@alain-Y50-70:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.60 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 1.44 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 605 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes
==========================================================================
The machine becomes noticeably less responsive even after 3 suspend/resume cycles, and going beyond that makes it almost unusable. I could go down to 200 MHz... You may also notice that "hardware limits" states a minimum of 800 MHz, but "current CPU frequency" goes even lower than that limit.

I am confirming the same behaviour. I have tested kernel 4.4.5 and 4.5rc7 from Fedora on a Lenovo G550 laptop with
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Model name: Intel(R) Core(TM)2 Duo CPU T9300 @ 2.50GHz
Stepping: 6
CPU MHz: 800.000
CPU max MHz: 2501.0000
CPU min MHz: 800.0000
BogoMIPS: 4987.79
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 6144K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm ida dtherm tpr_shadow vnmi flexpriority
Please notice that my CPU does not use p-states but cpufreq
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 800 MHz - 2.50 GHz
  available frequency steps: 2.50 GHz, 2.50 GHz, 2.00 GHz, 1.60 GHz, 1.20 GHz, 800 MHz
  available cpufreq governors: conservative, userspace, powersave, ondemand, performance
  current policy: frequency should be within 800 MHz and 1.20 GHz.
                  The governor "conservative" may decide which speed to use
                  within this range.
  current CPU frequency is 1.20 GHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes
analyzing CPU 1:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 1
  CPUs which need to have their frequency coordinated by software: 1
  maximum transition latency: 10.0 us.
  hardware limits: 800 MHz - 2.50 GHz
  available frequency steps: 2.50 GHz, 2.50 GHz, 2.00 GHz, 1.60 GHz, 1.20 GHz, 800 MHz
  available cpufreq governors: conservative, userspace, powersave, ondemand, performance
  current policy: frequency should be within 800 MHz and 1.20 GHz.
                  The governor "conservative" may decide which speed to use
                  within this range.
  current CPU frequency is 1.20 GHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes

Jacek Pawlyta noted (the comment was hard to spot, actually) that he encountered the regression even with the acpi-cpufreq driver, while all previous reporters were using the intel-pstate driver and assumed that the regression is related to it. I think the issue title should be changed, since the regression seems not to be limited to Haswell and intel-pstate (Jacek Pawlyta has a Core2Duo).

@Vlad: Yes, it is my understanding that this issue is not intel_pstate specific.
@greenify: What brand and model is your computer? So far, I have only heard of this issue on Lenovo computers.
Other references, in addition to the one Vlad gave: http://ubuntuforums.org/showthread.php?t=2316101

Created attachment 209231 [details]
attachment-997-0.html
Lenovo G510

I am confirming the findings made by Vlad Frolov: Fedora's kernel-4.4.3-300.fc23.x86_64 works fine on my Lenovo G550 with C2Duo T9300 controlled by acpi-cpufreq. The first broken Fedora kernel is 4.4.4 as well.

I'd like to confirm that I have the same issue on my Lenovo y5070, with either the intel_pstate or acpi driver. Currently I am running the 4.2.0-34 kernel.

So based on the comments this issue has nothing to do with intel_pstate, as acpi_cpufreq has the same issue. Suspend/Resume can be broken by many kernel drivers. It is possible that each reporter here has a different issue. Please do the following steps:
- Upload dmesg which includes both the suspend and resume sequence
- Test suspend/resume with the kernel command line option test_suspend=mem,10 to check if the built-in drivers and core kernel are causing the issue
- Once resumed, run turbostat --debug -i 1 --msr=0x199

I'd like to confirm exactly the same issue on my Lenovo y580 laptop. I ran kernel 3.19 generic - no issues.
As soon as I started using 4.2.0-34-generic (most recent version in Ubuntu 15.10 right now) the issue popped up. Using the older Kernel is my workaround atm.

I have caught the misbehaving module! It is `thermal`! Doing `rmmod thermal && modprobe thermal` I immediately see the drop in CPU frequency! I'm going to revert the patches made to the `thermal` module in 4.4.4 patchset and recompile the kernel now.

Created attachment 209611 [details]
This patch causes the regression
Reverting these patches resolves the regression for both suspend/resume and rmmod/modprobe of the `thermal` module!
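The `max_perf_pct` readings and the policy limits reported for the Y50-70 earlier in this thread appear to be related by simple linear scaling of the hardware maximum. The following is a minimal arithmetic sketch of that relation, not a description of how intel_pstate computes its limits; the function name `policy_max_mhz` is made up for illustration, and the linear scaling is an assumption that happens to match the reported numbers.

```python
def policy_max_mhz(hw_max_mhz: float, max_perf_pct: int) -> float:
    """Upper policy limit implied by a max_perf_pct cap.

    Assumption: the cap scales the hardware maximum linearly.
    """
    return hw_max_mhz * max_perf_pct / 100.0


# Hardware maximum (3.60 GHz) and max_perf_pct readings reported for the Y50-70.
HW_MAX_MHZ = 3600
for pct in (100, 80, 60, 40):
    ghz = policy_max_mhz(HW_MAX_MHZ, pct) / 1000
    print(f"max_perf_pct={pct:3d} -> policy max {ghz:.2f} GHz")
```

The 80% and 40% rows give 2.88 GHz and 1.44 GHz, the policy limits shown in the `cpupower frequency-info` output after the first and after the third and later suspend/resume cycles. The current frequency dropping below the 800 MHz hardware minimum is not explained by this relation.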
@Vlad: Good work. Please be aware of this thread, from just a few hours ago, about the same commit. http://marc.info/?t=145816738700001&r=1&w=2

Just for reference, I have provided the following details to the kernel developers in the mailing list thread mentioned by @Doug:

> Can you send me the output of "grep . /sys/class/thermal/*/*" both w/ and w/o
> the broken patch series?

4.4.4 without thermal patches (deduplicated output):
=================================================================
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:10
/sys/class/thermal/cooling_device0/type:Processor
...
(7 more cooling_deviceN groups with the same values as above)
/sys/class/thermal/cooling_device8/cur_state:-1
/sys/class/thermal/cooling_device8/max_state:50
/sys/class/thermal/cooling_device8/type:intel_powerclamp
/sys/class/thermal/thermal_zone0/available_policies:power_allocator user_space bang_bang fair_share step_wise
/sys/class/thermal/thermal_zone0/cdev0_trip_point:2
/sys/class/thermal/thermal_zone0/cdev0_weight:0
...
(7 more cdevN_trip_point & cdevN_weight with the same values)
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:62000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:127000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:127000
/sys/class/thermal/thermal_zone0/trip_point_1_type:hot
/sys/class/thermal/thermal_zone0/trip_point_2_temp:0
/sys/class/thermal/thermal_zone0/trip_point_2_type:passive
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/available_policies:power_allocator user_space bang_bang fair_share step_wise
/sys/class/thermal/thermal_zone1/integral_cutoff:0
/sys/class/thermal/thermal_zone1/k_d:0
/sys/class/thermal/thermal_zone1/k_i:0
/sys/class/thermal/thermal_zone1/k_po:0
/sys/class/thermal/thermal_zone1/k_pu:0
/sys/class/thermal/thermal_zone1/offset:0
/sys/class/thermal/thermal_zone1/policy:user_space
/sys/class/thermal/thermal_zone1/slope:0
/sys/class/thermal/thermal_zone1/sustainable_power:0
/sys/class/thermal/thermal_zone1/temp:59000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:72000
/sys/class/thermal/thermal_zone1/trip_point_0_type:passive
/sys/class/thermal/thermal_zone1/trip_point_1_temp:0
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:x86_pkg_temp
=================================================================

original 4.4.4 (with thermal patches) (also deduplicated output):
=================================================================
/sys/class/thermal/cooling_device0/cur_state:0
/sys/class/thermal/cooling_device0/max_state:10
/sys/class/thermal/cooling_device0/type:Processor
...
(7 more cooling_deviceN groups with the same values as above)
/sys/class/thermal/cooling_device8/cur_state:-1
/sys/class/thermal/cooling_device8/max_state:50
/sys/class/thermal/cooling_device8/type:intel_powerclamp
/sys/class/thermal/thermal_zone0/available_policies:user_space bang_bang fair_share step_wise
/sys/class/thermal/thermal_zone0/cdev0_trip_point:2
/sys/class/thermal/thermal_zone0/cdev0_weight:0
...
(7 more cdevN_trip_point & cdevN_weight with the same values)
/sys/class/thermal/thermal_zone0/mode:enabled
/sys/class/thermal/thermal_zone0/policy:step_wise
/sys/class/thermal/thermal_zone0/temp:63000
/sys/class/thermal/thermal_zone0/trip_point_0_temp:127000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_1_temp:127000
/sys/class/thermal/thermal_zone0/trip_point_1_type:hot
/sys/class/thermal/thermal_zone0/trip_point_2_temp:0
/sys/class/thermal/thermal_zone0/trip_point_2_type:passive
/sys/class/thermal/thermal_zone0/type:acpitz
/sys/class/thermal/thermal_zone1/available_policies:user_space bang_bang fair_share step_wise
/sys/class/thermal/thermal_zone1/integral_cutoff:0
/sys/class/thermal/thermal_zone1/k_d:0
/sys/class/thermal/thermal_zone1/k_i:0
/sys/class/thermal/thermal_zone1/k_po:0
/sys/class/thermal/thermal_zone1/k_pu:0
/sys/class/thermal/thermal_zone1/offset:0
/sys/class/thermal/thermal_zone1/policy:user_space
/sys/class/thermal/thermal_zone1/slope:0
/sys/class/thermal/thermal_zone1/sustainable_power:0
/sys/class/thermal/thermal_zone1/temp:65000
/sys/class/thermal/thermal_zone1/trip_point_0_temp:72000
/sys/class/thermal/thermal_zone1/trip_point_0_type:passive
/sys/class/thermal/thermal_zone1/trip_point_1_temp:0
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
/sys/class/thermal/thermal_zone1/type:x86_pkg_temp
=================================================================

> What does it show here when performance drops?
> grep . /sys/devices/system/cpu/intel_pstate/*

# grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:22
/sys/devices/system/cpu/intel_pstate/no_turbo:0
/sys/devices/system/cpu/intel_pstate/num_pstates:28
/sys/devices/system/cpu/intel_pstate/turbo_pct:36
# rmmod thermal
# grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100
/sys/devices/system/cpu/intel_pstate/min_perf_pct:22
/sys/devices/system/cpu/intel_pstate/no_turbo:0
/sys/devices/system/cpu/intel_pstate/num_pstates:28
/sys/devices/system/cpu/intel_pstate/turbo_pct:36
# modprobe thermal
# grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:80
/sys/devices/system/cpu/intel_pstate/min_perf_pct:22
/sys/devices/system/cpu/intel_pstate/no_turbo:0
/sys/devices/system/cpu/intel_pstate/num_pstates:28
/sys/devices/system/cpu/intel_pstate/turbo_pct:36

> Does the problem still occur if you set
> /sys/class/thermal/thermal_zone*/mode to "disabled"

Yes, the problem still occurs. (I have tested it just like above and the outcome is the same.)

> please do the following test both w/ and w/o the patches,
> 1. # echo 'module thermal_sys +fp' > /sys/kernel/debug/dynamic_debug/control
> 2. rmmod and insmod thermal
> 3. get the dmesg output

With the thermal patches (original 4.4.4):
==========================================================
[ 8354.627365] update_temperature: thermal thermal_zone0: last_temperature N/A, current_temperature=59000
[ 8354.627375] thermal_zone_trip_update: thermal thermal_zone0: Trip2[type=1,temp=0]:trend=0,throttle=1
[ 8354.627380] get_target_state: thermal cooling_device7: cur_state=2
[ 8354.627383] thermal_zone_trip_update: thermal cooling_device7: old_target=-1, target=3
[ 8354.627386] get_target_state: thermal cooling_device6: cur_state=2
[ 8354.627389] thermal_zone_trip_update: thermal cooling_device6: old_target=-1, target=3
[ 8354.627393] get_target_state: thermal cooling_device5: cur_state=2
[ 8354.627396] thermal_zone_trip_update: thermal cooling_device5: old_target=-1, target=3
[ 8354.627399] get_target_state: thermal cooling_device4: cur_state=2
[ 8354.627402] thermal_zone_trip_update: thermal cooling_device4: old_target=-1, target=3
[ 8354.627405] get_target_state: thermal cooling_device3: cur_state=2
[ 8354.627408] thermal_zone_trip_update: thermal cooling_device3: old_target=-1, target=3
[ 8354.627412] get_target_state: thermal cooling_device2: cur_state=2
[ 8354.627415] thermal_zone_trip_update: thermal cooling_device2: old_target=-1, target=3
[ 8354.627418] get_target_state: thermal cooling_device1: cur_state=2
[ 8354.627421] thermal_zone_trip_update: thermal cooling_device1: old_target=-1, target=3
[ 8354.627425] get_target_state: thermal cooling_device0: cur_state=2
[ 8354.627428] thermal_zone_trip_update: thermal cooling_device0: old_target=-1, target=3
[ 8354.627432] thermal_cdev_update: thermal cooling_device7: zone0->target=3
[ 8354.627441] thermal_cdev_update: thermal cooling_device7: set to state 3
[ 8354.627444] thermal_cdev_update: thermal cooling_device6: zone0->target=3
[ 8354.627451] thermal_cdev_update: thermal cooling_device6: set to state 3
[ 8354.627454] thermal_cdev_update: thermal cooling_device5: zone0->target=3
[ 8354.627461] thermal_cdev_update: thermal cooling_device5: set to state 3
[ 8354.627464] thermal_cdev_update: thermal cooling_device4: zone0->target=3
[ 8354.627471] thermal_cdev_update: thermal cooling_device4: set to state 3
[ 8354.627473] thermal_cdev_update: thermal cooling_device3: zone0->target=3
[ 8354.627480] thermal_cdev_update: thermal cooling_device3: set to state 3
[ 8354.627483] thermal_cdev_update: thermal cooling_device2: zone0->target=3
[ 8354.627490] thermal_cdev_update: thermal cooling_device2: set to state 3
[ 8354.627493] thermal_cdev_update: thermal cooling_device1: zone0->target=3
[ 8354.627501] thermal_cdev_update: thermal cooling_device1: set to state 3
[ 8354.627504] thermal_cdev_update: thermal cooling_device0: zone0->target=3
[ 8354.627511] thermal_cdev_update: thermal cooling_device0: set to state 3
[ 8354.627519] thermal LNXTHERM:00: registered as thermal_zone0
[ 8354.627521] ACPI: Thermal Zone [TZ00] (59 C)
==========================================================

Without the thermal patches (4.4.4 without the patches [reverted]):
==========================================================
[ 28.144010] update_temperature: thermal thermal_zone1: last_temperature=69000, current_temperature=63000
[ 34.154054] update_temperature: thermal thermal_zone1: last_temperature=63000, current_temperature=62000
[ 37.094852] update_temperature: thermal thermal_zone0: last_temperature=0, current_temperature=65000
[ 37.094857] thermal_zone_trip_update: thermal thermal_zone0: Trip2[type=1,temp=0]:trend=0,throttle=1
[ 37.094859] get_target_state: thermal cooling_device7: cur_state=0
[ 37.094860] thermal_zone_trip_update: thermal cooling_device7: old_target=-1, target=-1
[ 37.094862] get_target_state: thermal cooling_device6: cur_state=0
[ 37.094863] thermal_zone_trip_update: thermal cooling_device6: old_target=-1, target=-1
[ 37.094864] get_target_state: thermal cooling_device5: cur_state=0
[ 37.094865] thermal_zone_trip_update: thermal cooling_device5: old_target=-1, target=-1
[ 37.094867] get_target_state: thermal cooling_device4: cur_state=0
[ 37.094868] thermal_zone_trip_update: thermal cooling_device4: old_target=-1, target=-1
[ 37.094869] get_target_state: thermal cooling_device3: cur_state=0
[ 37.094870] thermal_zone_trip_update: thermal cooling_device3: old_target=-1, target=-1
[ 37.094872] get_target_state: thermal cooling_device2: cur_state=0
[ 37.094873] thermal_zone_trip_update: thermal cooling_device2: old_target=-1, target=-1
[ 37.094874] get_target_state: thermal cooling_device1: cur_state=0
[ 37.094875] thermal_zone_trip_update: thermal cooling_device1: old_target=-1, target=-1
[ 37.094877] get_target_state: thermal cooling_device0: cur_state=0
[ 37.094878] thermal_zone_trip_update: thermal cooling_device0: old_target=-1, target=-1
[ 37.094882] thermal LNXTHERM:00: registered as thermal_zone0
[ 37.094883] ACPI: Thermal Zone [TZ00] (65 C)
==========================================================

Here is Srinivas's guess about the cause:
> I think, the problem is your device has a passive trip temp of 0
> /sys/class/thermal/thermal_zone0/trip_point_2_temp:0
> /sys/class/thermal/thermal_zone0/trip_point_2_type:passive
> Which triggers a false throttle = true. I think we should treat this trip as
> invalid in the case of
> if (tz->temperature >= trip_temp) {} check
> in thermal_zone_trip_update().

P.S. I guess the "Component" field of this issue should be changed from intel_pstate to thermal.

Same problem with Lenovo Y510P:
uname -a
Linux dmatej-lenovo 4.2.0-34-generic #39-Ubuntu SMP Thu Mar 10 22:13:01 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
cpufreq-info
cpufrequtils 008: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpufreq@vger.kernel.org, please.
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.10 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 1.86 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency is 1.80 GHz.
...
After several suspends I ended up with 580 MHz, even under the minimal limit (800-800 policy, now 800-1860). I can change the policy, but only inside this range; I suppose that it should be limited only by hardware limits ...?

I also tried the latest kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/
uname -a
Linux dmatej-lenovo 4.5.0-040500-generic #201603140130 SMP Mon Mar 14 05:32:22 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
This is after 3 suspends (after the second one the system woke up spontaneously).
...
  hardware limits: 800 MHz - 3.10 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 1.24 GHz.
...

I use the Lenovo y580 - a fallback to Kernel 4.2.0-19-generic is my current workaround for the problem...

Created attachment 210111 [details]
Fix suggested by Zhang Rui
I have tested this fix and it works fine for both `rmmod & modprobe thermal` and suspend & resume use-cases.
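To make the failure mode and the fix easier to follow, here is a toy model in Python. It is a sketch inferred from Srinivas's analysis and the debug log earlier in this thread, not the kernel's code: the names `Trip`, `is_valid`, `state_after_registration` and `simulate` are invented for illustration, and both the "temperature 0 means bogus" rule and the "first update after registration raises the cooling state by one while throttled" rule are assumptions chosen to match the logged `cur_state=2` to `target=3` transition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trip:
    kind: str      # "critical", "hot" or "passive"
    temp_mc: int   # trip temperature in millidegrees Celsius


def is_valid(trip: Trip) -> bool:
    # Assumption for this sketch: a trip temperature of 0 marks a bogus trip.
    return trip.temp_mc != 0


def state_after_registration(cur_state: int, max_state: int, zone_temp_mc: int,
                             trips: List[Trip], ignore_invalid: bool) -> int:
    """Cooling state chosen by the first update after a zone is (re)registered."""
    for trip in trips:
        if trip.kind != "passive":
            continue
        if ignore_invalid and not is_valid(trip):
            continue  # take no action for invalid trip points
        if zone_temp_mc >= trip.temp_mc:  # the check quoted by Srinivas
            return min(cur_state + 1, max_state)
    return cur_state


def simulate(reloads: int, ignore_invalid: bool) -> List[int]:
    # thermal_zone0 as reported in this thread: critical/hot at 127000, passive at 0.
    trips = [Trip("critical", 127000), Trip("hot", 127000), Trip("passive", 0)]
    state, history = 0, []
    for _ in range(reloads):
        state = state_after_registration(state, 10, 59000, trips, ignore_invalid)
        history.append(state)
    return history


print("bogus trip honoured:", simulate(3, ignore_invalid=False))
print("bogus trip ignored: ", simulate(3, ignore_invalid=True))
```

In this model every reload of `thermal` (or every resume) adds one throttling step while the bogus trip is honoured, which is the shape of the reported 100, 80, 60, 40 progression of `max_perf_pct`; with the trip ignored the state stays at 0.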
I have been running the fix on top of the 4.4.4 mainline kernel and the 4.4.6 kernel with Liquorix patches since the last comment. I have not encountered any problems with the patch at all. When will it land in mainline?

Patch has been shipped in 4.6-rc1. Bug closed.

commit 81ad4276b505e987dd8ebbdf63605f92cd172b52
Author: Zhang Rui <rui.zhang@intel.com>
Date: Fri Mar 18 10:03:24 2016 +0800

    Thermal: Ignore invalid trip points

    In some cases, platform thermal driver may report invalid trip points,
    thermal core should not take any action for these trip points.

    This fixed a regression that bogus trip point starts to screw up thermal
    control on some Lenovo laptops, after

    commit bb431ba26c5cd0a17c941ca6c3a195a3a6d5d461
    Author: Zhang Rui <rui.zhang@intel.com>
    Date: Fri Oct 30 16:31:47 2015 +0800

        Thermal: initialize thermal zone device correctly

        After thermal zone device registered, as we have not read any
        temperature before, thus tz->temperature should not be 0,
        which actually means 0C, and thermal trend is not available.
        In this case, we need specially handling for the first
        thermal_zone_device_update().

        Both thermal core framework and step_wise governor is
        enhanced to handle this. And since the step_wise governor
        is the only one that uses trends, so it's the only thermal
        governor that needs to be updated.

        Tested-by: Manuel Krause <manuelkrause@netscape.net>
        Tested-by: szegad <szegadlo@poczta.onet.pl>
        Tested-by: prash <prash.n.rao@gmail.com>
        Tested-by: amish <ammdispose-arch@yahoo.com>
        Tested-by: Matthias <morpheusxyz123@yahoo.de>
        Reviewed-by: Javi Merino <javi.merino@arm.com>
        Signed-off-by: Zhang Rui <rui.zhang@intel.com>
        Signed-off-by: Chen Yu <yu.c.chen@intel.com>

    CC: <stable@vger.kernel.org> #3.18+
    Link: https://bugzilla.redhat.com/show_bug.cgi?id=1317190
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=114551
    Signed-off-by: Zhang Rui <rui.zhang@intel.com>

I have tested and can confirm that 4.6-rc1 works fine! Thank you, Zhang!
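For readers who want to check whether their own machine reports the kind of trip point discussed in this bug, the sketch below scans the text produced by `grep . /sys/class/thermal/*/*` (the command requested earlier in the thread) for passive trip points whose temperature is 0. It only parses text; the helper name `find_zero_passive_trips` and the `SAMPLE` excerpt (copied from the dump posted above) are illustrative, and treating a 0 temperature as suspicious is an assumption based on this thread rather than a general rule.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Matches lines such as
# /sys/class/thermal/thermal_zone0/trip_point_2_temp:0
TRIP_RE = re.compile(
    r"/sys/class/thermal/(thermal_zone\d+)/trip_point_(\d+)_(temp|type):(\S+)"
)


def find_zero_passive_trips(dump: str) -> List[Tuple[str, int]]:
    """Return (zone, trip index) pairs for passive trips reporting temp 0."""
    trips: Dict[Tuple[str, int], Dict[str, str]] = defaultdict(dict)
    for match in TRIP_RE.finditer(dump):
        zone, index, field, value = match.groups()
        trips[(zone, int(index))][field] = value
    return sorted(
        key for key, info in trips.items()
        if info.get("type") == "passive" and info.get("temp") == "0"
    )


# Excerpt of the sysfs dump posted in this thread.
SAMPLE = """
/sys/class/thermal/thermal_zone0/trip_point_0_temp:127000
/sys/class/thermal/thermal_zone0/trip_point_0_type:critical
/sys/class/thermal/thermal_zone0/trip_point_2_temp:0
/sys/class/thermal/thermal_zone0/trip_point_2_type:passive
/sys/class/thermal/thermal_zone1/trip_point_0_temp:72000
/sys/class/thermal/thermal_zone1/trip_point_0_type:passive
/sys/class/thermal/thermal_zone1/trip_point_1_temp:0
/sys/class/thermal/thermal_zone1/trip_point_1_type:passive
"""

for zone, index in find_zero_passive_trips(SAMPLE):
    print(f"{zone}: trip_point_{index} is passive with temperature 0")
```

On a live system the same function can be fed the saved output of the `grep` command. In the dump from this thread it flags trip point 2 of `thermal_zone0` (the acpitz zone named in Srinivas's analysis) and trip point 1 of `thermal_zone1`.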
Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz on Linux 4.6.0-1-MANJARO finally seems to work fine. God bless you! |