My fans have been acting strangely since the 3.13 upgrade.

Behaviour on 3.12: Fans run pretty much all the time at 30%, temperatures 30-40 C.

Behaviour on 3.13: Fans are idle until temperatures rise to 84 C (this is hot!), then ramp up to 75% (high noise) for a few seconds until temperatures drop to 72 C. Then they idle again.

This seems pretty dangerous, because the threshold of 84 degrees is just too high. I'd be fine with 60.

Laptop: MacBook Air 2013
OS: Arch Linux
Seen with other systems as well. Additional information: https://bugs.archlinux.org/task/39005
e-mail exchange on the subject.

On 2014-03-08 16:59, Guenter Roeck wrote:
> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>> Hi, and thanks for the quick response!
>>>> No special fancy "fan control policy". 'fancontrol' isn't up or
>>>> running.
>>>> Vanilla kernels 3.11.* and 3.12.* had been working on here without
>>>> any extra work.
>>>> --
>>>> # sensors
>>>> acpitz-virtual-0
>>>> Adapter: Virtual device
>>>> temp1:        +71.0°C  (crit = +256.0°C)
>>>> temp2:        +69.0°C  (crit = +110.0°C)
>>>> temp3:        +52.0°C  (crit = +105.0°C)
>>>> temp4:        +25.0°C  (crit = +110.0°C)
>>>> temp5:        +58.0°C  (crit = +110.0°C)
>>>>
>>>> coretemp-isa-0000
>>>> Adapter: ISA adapter
>>>> Core 0:       +62.0°C  (high = +105.0°C, crit = +105.0°C)
>>>> Core 1:       +60.0°C  (high = +105.0°C, crit = +105.0°C)
>>>> --
>>>> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
>>>> This is with 3.12.13 with my normal workload.
>>>>
>>>> Please, trust my above mentioned values of 94 °C vs. 74°C as I
>>>> don't like to boot 3.13.6 anymore, to avoid harm to the notebook's
>>>> casing.
>>>
>>> Understood. Unfortunately, we'll need to get information
>>> from the new kernel to be able to track down the problem.
>>
>> Indeed. Not only the run-time temperatures, but also the high and crit
>> limits.
>>
>>>> But I'd do to test any improvement-patch.
>>>
>>> So far I have no idea what is going on. I don't see anything in the
>>> drivers providing above data that would explain the behavior,
>>> but I might be missing something.
>>
>> Looks like a regression in the acpi subsystem or in power management,
>> not hwmon. Hwmon is merely reporting the temperatures, it's not
>> responsible for the actual temperatures.
>>
>
> I would agree. I don't think we have enough information to be sure,
> though. There might be some unintended interaction or interference.
>
> gpu is a good hint ... for example, look at commit b9ed919f1c8
> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> to THERM). nouveau does export pwm and fan control information,
> so any change in that code may have unintended side effects.
> Similar, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> use devm_hwmon_register_with_groups) could have the observed impact,
> as it is purely passive, but I prefer to be rather safe than sorry.
>
> This problem has now been submitted into bugzilla as
> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>
> Guenter
>

Sorry for being late, I had to search for and accumulate much info for you... I hope you don't mind me putting it into one answer, CCing you all.

My GFX is a GM45 Intel (mobile), shared memory, running the opensource Mesa drivers/extensions.
kernel-module: i915

According to the output of 'cpupower' I have:
CPUidle driver: acpi_idle
CPUidle governor: menu
CPUfreq: driver: acpi-cpufreq
available cpufreq governors: ondemand, performance
- And "ondemand" is running.
--
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +41.0°C  (crit = +256.0°C)
temp2:        +92.0°C  (crit = +110.0°C)
temp3:        +71.0°C  (crit = +105.0°C)
temp4:        +26.5°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +86.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +84.0°C  (high = +105.0°C, crit = +105.0°C)

FROM a critical "smelly" situation today, kernel-compilation, fan @100%.
--
Additional findings:
Identification from bootup ACPI initialisation vs. sensors:
temp1 = DTSZ
temp2 = CPUZ --> triggering Cooling in 3.12.13 if > 74°C
temp3 = SKNZ
temp4 = BATZ "Battery Zone" always calm ~ +6°C of ambient T
temp5 = FDTZ --- in 3.12.13 a representation of the cooling-fan (25 - 45 - 58 - max?)
Core 0 & Core 1 are the internal CPU T sensors.

With the 3.13.x (.5+) kernels the first cooling settings gathered at bootup stay forever. That means rebooting a hot system will get an FDTZ @45°C+ and won't cause any problems, as it does cool enough (even for kernel compiling on here). If it gets 25°C @bootup the system goes into emergency cooling at some point. The same happens with a suspend/resume.

Kernel 3.12.13 adjusts the cooling on its own, and appropriately.

Thank you all for your engagement,
best regards, Manuel Krause.
# Based on the email shown in Comment 2, Rafael J Wysocki asked me on 2014-03-09 18:58:

> This almost certainly is an ACPI regression, but I'm not sure whether
> thermal management or CPU power management is broken on your system.
>
> Can you compare the contents of /sys/class/thermal/ from working and
> not working kernels, please?
>
> Rafael
>

# which I answered the following way (I hope it'll be complete on here):

Hi again,

unfortunately you didn't specify how deeply I should dig into /sys/class/thermal. So you get the lines from # BOF # to # EOF # below. I hope they're readable without more comments. The most remarkable changes, in my eyes, had happened within "thermal_zone1".

Best regards, Manuel Krause

# BOF #

Following ones are all from /sys/class/thermal/ which are links to -> ../../devices/virtual/thermal/

I've listed the directories in sections of cooling_devices and thermal_zones separately for each bad/good kernel. For Emailing purposes only. You can merge them into a spreadsheet for your evaluation on your own. I've left out reporting some subdirs and subdir's values that _really_ didn't seem to need attention.

Also, I've collected the #sensors output for each readout, having reproduced nearly the same workload, represented by the "Fan speed" (thermal_zone4==FDTZ). And I've done my very best to not produce typos or c&p errors.
3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir              |- /type      /cur_state  /max_state
cooling_device0     Processor  0           10
cooling_device1     Processor  0           10
cooling_device2     Fan        0           1
cooling_device3     Fan        1           1
cooling_device4     Fan        0           1
cooling_device5     Fan        0           1
cooling_device6     Fan        0           1
cooling_device7     LCD        0           24

3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir              |- /type      /cur_state  /max_state
cooling_device0     Processor  0           10
cooling_device1     Processor  0           10
cooling_device2     Fan        0           1
cooling_device3     Fan        1           1
cooling_device4     Fan        1           1
cooling_device5     Fan        1           1
cooling_device6     Fan        1           1
cooling_device7     LCD        0           24

3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir            |- /passive  /temp  |-    /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
thermal_zone0     0         68000     ?=0  n.a.             256000              critical
thermal_zone1     n.a.      70000  |-
                                      ?=0  6                110000              critical
                                      ?=1  5                107000              passive
                                      ?=2  4                90000               active
                                      ?=3  3                75000               active
                                      ?=4  2                55000               active
                                      ?=5  1                45000               active
                                      ?=6  1                30000               active
thermal_zone2     n.a.      54000  |-
                                      ?=0  1                105000              critical
                                      ?=1  1                95000               passive
thermal_zone3     n.a.      25800  |-
                                      ?=0  1                110000              critical
                                      ?=1  1                60000               passive
thermal_zone4     0         58000     ?=0  n.a.             110000              critical

3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir            |- /passive  /temp  |-    /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
thermal_zone0     0         50000     ?=0  n.a.             256000              critical
thermal_zone1     n.a.      70000  |-
                                      ?=0  1                110000              critical
                                      ?=1  1                107000              passive
                                      ?=2  2                90000               active
                                      ?=3  3                67000               active
                                      ?=4  4                55000               active
                                      ?=5  5                45000               active
                                      ?=6  6                30000               active
thermal_zone2     n.a.      53000  |-
                                      ?=0  1                105000              critical
                                      ?=1  1                95000               passive
thermal_zone3     n.a.      25600  |-
                                      ?=0  1                110000              critical
                                      ?=1  1                60000               passive
thermal_zone4     0         58000     ?=0  n.a.             110000              critical

---
Legend here:
/type is always acpitz
/mode enabled
/policy step_wise
- from kernel ACPI initialisation: thermal_zone0==DTSZ, thermal_zone1==CPUZ, thermal_zone2==SKNZ, thermal_zone3==BATZ, thermal_zone4==FDTZ
- n.a. means file or value is not available
___
Legend in general:
/power/control is always auto
/power/runtime_status unsupported
/uevent ''==empty
----------------------------------------------------------------

3.13.5 -- 20140309 -- 20:52 -- bad
=============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +68.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +54.0°C  (crit = +105.0°C)
temp4:        +25.8°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +66.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +63.0°C  (high = +105.0°C, crit = +105.0°C)

3.12.13 -- 20140310 -- 00:26 -- good
==============================
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +70.0°C  (crit = +110.0°C)
temp3:        +53.0°C  (crit = +105.0°C)
temp4:        +25.6°C  (crit = +110.0°C)
temp5:        +58.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +65.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +61.0°C  (high = +105.0°C, crit = +105.0°C)

# EOF #
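A listing like the one above can be collected with a few lines of shell instead of by hand. The following is only a sketch, not the script used for this report: it assumes the usual sysfs thermal layout (cooling_device*/type, cur_state, max_state and thermal_zone*/trip_point_N_temp, trip_point_N_type), and the function names and the optional base-directory argument are made up for illustration.

```shell
# Hypothetical helpers (names are illustrative, not from the report).
# Both take an optional base directory, so they can be pointed at a
# saved copy of /sys/class/thermal as well as at the live tree.

# Print one line per cooling device: name, type, cur_state, max_state.
dump_cooling_devices() (
    base=${1:-/sys/class/thermal}
    for dev in "$base"/cooling_device*; do
        [ -d "$dev" ] || continue
        printf '%s %s %s %s\n' "${dev##*/}" \
            "$(cat "$dev/type")" \
            "$(cat "$dev/cur_state")" \
            "$(cat "$dev/max_state")"
    done
)

# Print one line per trip point: zone, index, temperature in
# millidegrees Celsius, and trip type (critical/passive/active).
dump_trip_points() (
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -d "$zone" ] || continue
        i=0
        while [ -r "$zone/trip_point_${i}_temp" ]; do
            printf '%s %s %s %s\n' "${zone##*/}" "$i" \
                "$(cat "$zone/trip_point_${i}_temp")" \
                "$(cat "$zone/trip_point_${i}_type")"
            i=$((i + 1))
        done
    done
)
```

Running both functions under a good and a bad kernel and feeding the two outputs to diff would make changes like the 67000 vs. 75000 trip point stand out without manual copying.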
# also posted to linux-kernel && linux-pm
# my findings from tonight:

Hi, and thank you for your attention ^^

At the bottom of this email you'll find the actual values for the new 3.12.14 kernel for two different levels of usage and ambient temperature.

You'll see that in kernel 3.12.14 the /cdev?_trip_point enumeration has changed to the way of 3.13.?, and so did one /trip_point_?_temp. But 3.12.14 is working as well as 3.12.13. (So my first eyecatcher didn't lead to useful things.)

I'm not capable of finding or understanding the related code, but, please, let me present an idea of what MAY be going on:

In 3.12.13+, on my system, the effective cooling fan speed seems to be an accumulation, maybe bitwise, of cooling_device[2-6]/cur_state, each of which gets activated (=1) by a certain other temperature value or level; each of the cooling_device[2-6]/cur_state stays @1 as long as the temperature does not drop below its reference temperature. For my system this reference temperature would most likely be triggered by temp2 == thermal_zone1/temp [CPUZ].

In 3.13.? only one of cooling_device[2-6]/cur_state seems to get set to 1, the others being left at and/or rewritten with 0. And the fan speed algorithm then accumulates only one 1 without seeing the [_LEVEL_] number of cooling_device[2-6]... or re-requesting the related trigger temperature.

I hope this leads you developers nearer to a conclusion on how to fix it,

best regards, Manuel Krause

_____________________________
3.12.14 -- 20140311 -- 19:07 -- changed, not broken -- normal use
=============================
/sys/class/thermal/* which are links to -> ../../devices/virtual/thermal/*

dir              |- /type  /cur_state  /max_state  Maybe trigger /PWM
...
cooling_device2     Fan    0           1           not yet observed
cooling_device3     Fan    0           1           FDTZ==58°C
cooling_device4     Fan    1           1           FDTZ==45°C
cooling_device5     Fan    1           1           FDTZ==34°C
cooling_device6     Fan    1           1           FDTZ==25°C
...

dir            |- /passive  /temp  |-    /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
...
thermal_zone1     n.a.      73000  |-  (CPUZ)
                                      ?=0  6                110000              critical
                                      ?=1  5                107000              passive
                                      ?=2  4                90000               active
                                      ?=3  3                75000               active
                                      ?=4  2                55000               active
                                      ?=5  1                45000               active
                                      ?=6  1                30000               active
...
thermal_zone4     n.a.      45000     ?=0  n.a.             110000              critical  (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +46.0°C  (crit = +256.0°C)
temp2:        +73.0°C  (crit = +110.0°C)
temp3:        +57.0°C  (crit = +105.0°C)
temp4:        +26.3°C  (crit = +110.0°C)
temp5:        +45.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +68.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +66.0°C  (high = +105.0°C, crit = +105.0°C)

_____________________________
3.12.14 -- 20140311 -- 21:09 -- changed, not broken -- idle state
=============================
dir              |- /type  /cur_state  /max_state  Maybe trigger /PWM
...
cooling_device2     Fan    0           1           not yet observed
cooling_device3     Fan    0           1           FDTZ==58°C
cooling_device4     Fan    0           1           FDTZ==45°C
cooling_device5     Fan    0           1           FDTZ==34°C
cooling_device6     Fan    1           1           FDTZ==25°C
...

dir            |- /passive  /temp
thermal_zone1     n.a.      46000  ...  (CPUZ)
...
thermal_zone4     n.a.      25000  ...  (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +50.0°C  (crit = +256.0°C)
temp2:        +46.0°C  (crit = +110.0°C)
temp3:        +44.0°C  (crit = +105.0°C)
temp4:        +25.7°C  (crit = +110.0°C)
temp5:        +25.0°C  (crit = +110.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
Core 1:       +41.0°C  (high = +105.0°C, crit = +105.0°C)
_____________________________
[SNIP]

Long time no reply from you... Have I overlooked an unwritten convention? Or were my charts that unusable for your analysis/work?

Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem persists. "Strange / dangerous fan policy..."

Since kernel 3.13.6 I've managed to 'fix' the potential overheating problem by manually issuing a:

"echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)

_before_ obviously critical temperatures occur. Reminder: This particular setting may only work for my system! ...and it keeps working for 3.14-rc.

In the following I'd like to present you a modified output of my /sys/class/thermal, that I've written a script for (for my system), that shows the results in the way of linux/Documentation/thermal/sysfs-api.txt, point 3:

{I've uploaded the files to pastebin, to not swamp you and the lists with so many lines of logs.}

For the last good kernel -- 3.12.14 -- in-use:
http://pastebin.com/HL1PNcda

For my first bad kernel revision 3.13 -- at critical temp:
http://pastebin.com/98hgf1a9

For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
http://pastebin.com/MuTwTnjD

For the last bad kernel -- 3.14.0-rc7 -- after issuing the *) command:
http://pastebin.com/2peda54z

Please, have a look at them! And maybe, give me hints on how I can help you to further debug this issue, as my manual method works but it's annoying.

And, PLEASE CC: ME, as I'm not on the lists. Or lead this Email-thread to someone in charge.

Thank you for your work && best regards,
Manuel Krause
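The manual workaround above can be wrapped in a small helper that refuses values outside 0..max_state before writing. This is only a sketch under assumptions: the function name is invented here, it needs root when used on the live sysfs tree, and which cooling_device number gives a useful fan level differs per machine (cooling_device3 is simply what worked on the HP/Compaq 6730b in this report).

```shell
# Hypothetical helper: set cur_state of one cooling device after
# checking the requested value against max_state.
# $1: cooling device directory, e.g. /sys/class/thermal/cooling_device3
# $2: requested state, a non-negative integer
set_cooling_state() (
    dev=$1
    state=$2
    max=$(cat "$dev/max_state") || exit 1
    case $state in
        ''|*[!0-9]*)
            echo "state must be a non-negative integer" >&2
            exit 2
            ;;
    esac
    if [ "$state" -gt "$max" ]; then
        echo "state $state exceeds max_state $max" >&2
        exit 2
    fi
    # Same effect as: echo 1 > .../cur_state
    printf '%s\n' "$state" > "$dev/cur_state"
)
```

Usage would be, for example, `set_cooling_state /sys/class/thermal/cooling_device3 1` from a startup script. It does not fix the regression; it only pins one fan level the same way the echo command does.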
3.12.15 works very well
3.13.7 fails
3.14.0-rc8 fails

I've tried the tmon tool now, too. Nice eye candy, and good for monitoring!

I've tried to revert all "thermal" related patches from 3.12.14->3.13.7 from 3.13.7. But they don't seem to matter. (Even if I apply the vice-versa patch to 3.12.15.) So "thermal" is out?

For the failing kernels: Not a single reached (active) trip point triggers even ONE fan action!

Next to be investigated would be ACPI.

THX for this audience, Manuel Krause
I'm not sure if this is related to this bug, but since Kernel 3.13 my fan speed is far too high and noisy as soon as the system is booting up ... I'm using Fedora 20. With Kernel 3.12.X everything was fine and the fan speed was at an acceptable level ...

[ant@fedorant ~]$ sensors
nouveau-pci-0100
Adapter: PCI adapter
fan1:        6693 RPM
temp1:        +69.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +52.0°C  (high = +83.0°C, crit = +99.0°C)
Core 1:       +50.0°C  (high = +83.0°C, crit = +99.0°C)
Core 2:       +52.0°C  (high = +83.0°C, crit = +99.0°C)
Core 3:       +50.0°C  (high = +83.0°C, crit = +99.0°C)

it8720-isa-0a10
Adapter: ISA adapter
in0:          +0.86 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in1:          +3.04 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in2:          +3.33 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
+5V:          +3.04 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in4:          +2.94 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in5:          +2.16 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in6:          +2.16 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
5VSB:         +2.96 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
Vbat:         +2.99 V
fan1:         838 RPM  (min =    0 RPM)
fan2:         949 RPM  (min =    0 RPM)
temp1:       +127.0°C  (low  =  -1.0°C, high = +127.0°C)  ALARM  sensor = thermal diode
temp2:        +22.0°C  (low  =  -1.0°C, high = +127.0°C)  ALARM  sensor = thermistor
temp3:        -47.0°C  (low  =  -1.0°C, high = +127.0°C)  sensor = Intel PECI
cpu0_vid:    +0.000 V
intrusion0:  ALARM

Any ideas?
On 04/02/2014 01:39 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> --- Comment #7 from Roman Spirgi <the.ant@gmx.net> ---
> I'm not sure if this is related to this bug but since Kernel 3.13 my fan
> speed is far to high and noisy as soon as the system is booting up ... I'm
> using Fedora 20. With Kernel 3.12.X everything was fine instead and fan
> speed was on a acceptable level ...
>
> [ant@fedorant ~]$ sensors
> nouveau-pci-0100
> Adapter: PCI adapter
> fan1:        6693 RPM
> temp1:        +69.0°C  (high = +95.0°C, hyst =  +3.0°C)
>                        (crit = +105.0°C, hyst =  +5.0°C)
>                        (emerg = +135.0°C, hyst =  +5.0°C)
>

Looks like Nouveau fan control does not work. No idea what may be causing this ... well, possibly. There are two suspicious commits between 3.12 and 3.13. Maybe the "remove everything" commit has undesirable side effects.

eec9901 drm/nouveau/hwmon: fix compilation without CONFIG_HWMON
b9ed919 drm/nouveau/drm/pm: remove everything except the hwmon interfaces to THERM

I would suggest to open a separate bug against the Nouveau component.

[ Side note: The displayed values for hyst are wrong. Those should be absolute temperatures, not temperature differences. But that is yet another bug. ]

> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0:       +52.0°C  (high = +83.0°C, crit = +99.0°C)
> Core 1:       +50.0°C  (high = +83.0°C, crit = +99.0°C)
> Core 2:       +52.0°C  (high = +83.0°C, crit = +99.0°C)
> Core 3:       +50.0°C  (high = +83.0°C, crit = +99.0°C)
>
> it8720-isa-0a10
> Adapter: ISA adapter
> in0:          +0.86 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> in1:          +3.04 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> in2:          +3.33 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> +5V:          +3.04 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> in4:          +2.94 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> in5:          +2.16 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> in6:          +2.16 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> 5VSB:         +2.96 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
> Vbat:         +2.99 V
> fan1:         838 RPM  (min =    0 RPM)
> fan2:         949 RPM  (min =    0 RPM)
> temp1:       +127.0°C  (low  =  -1.0°C, high = +127.0°C)  ALARM  sensor = thermal diode
> temp2:        +22.0°C  (low  =  -1.0°C, high = +127.0°C)  ALARM  sensor = thermistor
> temp3:        -47.0°C  (low  =  -1.0°C, high = +127.0°C)  sensor = Intel PECI
> cpu0_vid:    +0.000 V
> intrusion0:  ALARM
>

Something in your system configuration is wrong. Usually this comes from the BIOS, so you might want to check if there is a BIOS upgrade available. It looks like the system believes that your CPU is freezing and therefore runs the CPU fan at minimum speed. That may be ok with the current load, but might be a problem if the CPUs get busy and run hot.

That is not related to the nouveau problem, though.

Guenter
I can confirm the bug as originally reported. I reproduced it with an HP 625 laptop (AMD Athlon processor with AMD HD 4200 graphics). I tested Ubuntu 3.12, 3.13 and 3.14 kernels, and the problem appeared in 3.13.

Best regards,
Daniele
(In reply to Guenter Roeck from comment #8)
> Something in your system configuration is wrong. Usually this comes from the
> BIOS, so you might want to check if there is a BIOS upgrade available. It
> looks like the system believes that your CPU is freezing and therefore runs
> the CPU fan at minimum speed.

As I recall the IT87xx chips need an offset programmed by the BIOS in order to return "sane" temperature values from PECI sources. Without the offset, the driver returns the thermal margin as a negative value (-47°C here would mean the CPU runs 47 pseudo-°C below its critical temperature.) This matches the values returned by coretemp (99 - 47 = 52). This would justify the low fan speeds.

The original poster could try setting temp3_offset to 99 (in the right chip section of sensors.conf, followed by "sensors -s" as root) and see if it makes the system behave differently.
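The arithmetic behind that suggestion can be written out as a one-line check. A minimal sketch: the function name is invented, and using the coretemp crit value (99) as the offset is the assumption made in this comment, not something read back from the chip.

```shell
# Convert a PECI thermal margin (degrees below the critical temperature,
# reported as a negative number) into an absolute temperature.
# $1: raw reading in degrees C, e.g. -47
# $2: offset in degrees C, assumed here to be the coretemp crit value, e.g. 99
peci_margin_to_abs() {
    echo $(( $1 + $2 ))
}
```

With the values from the report, `peci_margin_to_abs -47 99` gives 52, which is the Core 0 reading from coretemp taken at the same time.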
Jean, indeed:

...
temp3:        +46.0°C  (low  =  -1.0°C, high = +127.0°C)  sensor = Intel PECI
...

But it's definitely noisier now ;)

Guenter, thank you, I did open "https://bugs.freedesktop.org/show_bug.cgi?id=77003" for the NVIDIA fan speed issue.

Thank you guys,
Roman
It really all depends on what the automatic fan control setup expects. Unfortunately I don't think the it87 driver exposes its trip points to user-space so you'd have to poke at the registers directly.
Hello everyone,

I can confirm this bug as well on an HP Probook 4710s. So there are now at least 5 confirmed reports.

Please see:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1290110 (my bug report)
http://lkml.iu.edu//hypermail/linux/kernel/1404.0/02012.html (my archived post to the linux-kernel mailing list)

It would also be worth taking a look at the DSDT, as there are other minor quirks on my system that could point there... (brightness always on max after reboot/suspend, coarse brightness setting range)

I've already disassembled mine but am stumped as to what to do next (this is my first look at anything ACPI related) and how to debug... But as previous kernels worked okay with this same DSDT, maybe they didn't control the fan speed through ACPI but left it to the BIOS?

For info on disassembling the DSDT see https://wiki.archlinux.org/index.php/DSDT
I've now bisected two times, from two different kernel origins, just to be sure, as I'm new to this stupid-and-lengthy method, and to be sure I haven't given a false positive in between due to boredom.

In the end it says each time:

# git bisect bad | tee -a /var/log/bisect.log
cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
commit cc8ef52707341e67a12067d6ead991d56ea017ca
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Wed Sep 25 20:39:45 2013 +0800

    ACPI / AC: convert ACPI ac driver to platform bus

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

:040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers

Please tell me how I can help debug this further, and please also read the newest from https://bugzilla.kernel.org/show_bug.cgi?id=71711

Manuel Krause
Hi, Manuel, nice report.

(In reply to Manuel Krause from comment #3)
>
> 3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir              |- /type      /cur_state  /max_state
> cooling_device0     Processor  0           10
> cooling_device1     Processor  0           10
> cooling_device2     Fan        0           1
> cooling_device3     Fan        1           1
> cooling_device4     Fan        0           1
> cooling_device5     Fan        0           1
> cooling_device6     Fan        0           1
> cooling_device7     LCD        0           24
>
> 3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir              |- /type      /cur_state  /max_state
> cooling_device0     Processor  0           10
> cooling_device1     Processor  0           10
> cooling_device2     Fan        0           1
> cooling_device3     Fan        1           1
> cooling_device4     Fan        1           1
> cooling_device5     Fan        1           1
> cooling_device6     Fan        1           1
> cooling_device7     LCD        0           24
>
>
> 3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir            |- /passive  /temp  |-    /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
> thermal_zone0     0         68000     ?=0  n.a.             256000              critical
> thermal_zone1     n.a.      70000  |-
>                                       ?=0  6                110000              critical
>                                       ?=1  5                107000              passive
>                                       ?=2  4                90000               active
>                                       ?=3  3                75000               active
>                                       ?=4  2                55000               active
>                                       ?=5  1                45000               active
>                                       ?=6  1                30000               active
> thermal_zone2     n.a.      54000  |-
>                                       ?=0  1                105000              critical
>                                       ?=1  1                95000               passive
> thermal_zone3     n.a.      25800  |-
>                                       ?=0  1                110000              critical
>                                       ?=1  1                60000               passive
> thermal_zone4     0         58000     ?=0  n.a.             110000              critical
>
>
> 3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir            |- /passive  /temp  |-    /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
> thermal_zone0     0         50000     ?=0  n.a.             256000              critical
> thermal_zone1     n.a.      70000  |-
>                                       ?=0  1                110000              critical
>                                       ?=1  1                107000              passive
>                                       ?=2  2                90000               active
>                                       ?=3  3                67000               active
>                                       ?=4  4                55000               active
>                                       ?=5  5                45000               active
>                                       ?=6  6                30000               active
> thermal_zone2     n.a.      53000  |-
>                                       ?=0  1                105000              critical
>                                       ?=1  1                95000               passive
> thermal_zone3     n.a.      25600  |-
>                                       ?=0  1                110000              critical
>                                       ?=1  1                60000               passive
> thermal_zone4     0         58000     ?=0  n.a.             110000              critical
>

this is not enough, can you please attach the output of
" grep . /sys/class/thermal/thermal_zone*/cdev*/device/path"

I need to figure out why /sys/class/thermal/thermal_zone1/cdev0_trip_point equals 1 in 3.12, while it equals 6 in 3.13.

plus, can you please attach the output of
"grep . /sys/class/thermal/cooling_device*/device/path"
in both 3.12 and 3.13 as well.
Let's start with my actual GOOD kernel:

# uname -r
3.12.16-ck2
# grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
/sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4
/sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3
/sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2
/sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
/sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0
/sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
# grep . /sys/class/thermal/cooling_device*/device/path
/sys/class/thermal/cooling_device0/device/path:\_PR_.CPU0
/sys/class/thermal/cooling_device1/device/path:\_PR_.CPU1
/sys/class/thermal/cooling_device2/device/path:\_TZ_.FAN0
/sys/class/thermal/cooling_device3/device/path:\_TZ_.FAN1
/sys/class/thermal/cooling_device4/device/path:\_TZ_.FAN2
/sys/class/thermal/cooling_device5/device/path:\_TZ_.FAN3
/sys/class/thermal/cooling_device6/device/path:\_TZ_.FAN4
/sys/class/thermal/cooling_device7/device/path:\_SB_.PCI0.GFX0.DD02

And here is a newer BAD kernel:

# uname -r
3.13.8-ck1
# grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
/sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4
/sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3
/sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2
/sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
/sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0
/sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
# grep . /sys/class/thermal/cooling_device*/device/path
/sys/class/thermal/cooling_device0/device/path:\_PR_.CPU0
/sys/class/thermal/cooling_device1/device/path:\_PR_.CPU1
/sys/class/thermal/cooling_device2/device/path:\_TZ_.FAN0
/sys/class/thermal/cooling_device3/device/path:\_TZ_.FAN1
/sys/class/thermal/cooling_device4/device/path:\_TZ_.FAN2
/sys/class/thermal/cooling_device5/device/path:\_TZ_.FAN3
/sys/class/thermal/cooling_device6/device/path:\_TZ_.FAN4
/sys/class/thermal/cooling_device7/device/path:\_SB_.PCI0.GFX0.DD02

The "grep . /sys/class/thermal/cooling_device*/device/path" results always stay the same as above, so I omit them in the following.

There are generally only two different recurring scenarios for "grep . /sys/class/thermal/thermal_zone*/cdev*/device/path", so I abbreviate them in the following:

Scenario-1:
# grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
/sys/class/thermal/thermal_zone1/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone1/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN0
/sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
/sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN2
/sys/class/thermal/thermal_zone1/cdev5/device/path:\_TZ_.FAN3
/sys/class/thermal/thermal_zone1/cdev6/device/path:\_TZ_.FAN4
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0

Scenario-2:
# grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
/sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4
/sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3
/sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2
/sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
/sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0
/sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0

Already while bisecting this issue I found out that these scenarios have something to do with rebooting. So I rebooted each newly bisected kernel twice in the second round. But I hadn't expected the following disorder:

This is a series of results from last night, rebooting different kernels one after the other and capturing some relevant data. Each row is one boot:

# uname -r    # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
3.12.16       Scenario-2
3.13.8        Scenario-2
3.13.8        Scenario-1
3.12.13       Scenario-2
3.12.13       Scenario-1
3.12.13       Scenario-2
3.13.5        Scenario-2
3.13.5        Scenario-1
3.13.5        Scenario-1
3.13.8        Scenario-1
3.13.8        Scenario-1
3.13.8        Scenario-1
3.13.8        Scenario-1
3.12.16       Scenario-1
3.12.16       Scenario-1
3.12.16       Scenario-1
3.13.8        Scenario-2
3.13.8        Scenario-1
3.13.8        Scenario-1
3.12.16       Scenario-2
3.12.16       Scenario-1
3.12.16       Scenario-1

Please keep in mind what does not show in this data:
3.13.x _never_ triggers a new fan speed when needed (higher/lower).
3.12.x _always_ does, at least after hitting a higher active temp trigger!

Manuel Krause
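Since the two orderings differ already in the first entry of thermal_zone1, they can be told apart by reading a single file. The following is a minimal sketch of such a check, assuming the paths shown above; the function name is invented, and the mapping of prefixes to "Scenario-1" and "Scenario-2" only holds for the listings in this comment.

```shell
# Hypothetical helper: report which cdev ordering a thermal zone has,
# based on the ACPI path of its first bound cooling device.
# $1: thermal zone directory (default: the CPUZ zone from this report)
classify_cdev_order() (
    zone=${1:-/sys/class/thermal/thermal_zone1}
    first=$(cat "$zone/cdev0/device/path") || exit 1
    case $first in
        '\_PR_.'*) echo "Scenario-1" ;;   # processors listed first
        '\_TZ_.'*) echo "Scenario-2" ;;   # fans listed first
        *)         echo "unknown" ;;
    esac
)
```

Calling it once per boot and logging the result next to `uname -r` would reproduce the table above without pasting the full grep output each time.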
ping Rui ... Please have a look at this bug.
There have been additional steps in the meantime, but unfortunately no solution so far.

You can read the related postings to lkml e.g. with:
http://marc.info/?l=linux-kernel&w=2&r=1&s=dangerous+fan+policy&q=b

Best regards, Manuel Krause
Hi!

I have a similar problem on an HP ProBook 4510s (Firmware F.20, Intel T3000) running 64-bit kernel 3.13.9 (Kubuntu) or 64-bit kernel 3.13 - 3.14 on Arch. I noticed that the regulation of the fan (not necessarily the fan itself!) stops after boot. So, if the system is cold, the fan is running at 0% (= off) or at 20% (which is an unusual number as the fan speed usually rises in 15% steps on this hardware). On reboot, when the machine is warm, fan speeds of 30% or 45% are often observed depending on the CPU temperature at boot time. After booting, the fan speed does not change anymore and stays constant. So, when the machine was started cold, the fan is off until the temperature reaches critical values, then runs at 90% (= full speed) until the temperature drops. It then goes off again completely. This is not nice as the cooling might not be sufficient and my machine may shut down hard. Such behaviour is also not in line with the idea of a 'Laptop', because the machine gets so hot that I don't want to leave it on my lap to avoid burning myself :-)

According to my interpretation, the system ignores all active trip points, but reacts to the passive and critical trip points.

I also found a not so perfect workaround after some trial and error with boot parameters: passing 'thermal.tzp=1' (or any other higher number) to the kernel at boot time (unloading and reloading thermal with the tzp parameter does not help) restores the temperature-dependent fan speed regulation. Unfortunately this workaround comes with the trade-off of two or three kworker processes that consume up to the full capacity of one CPU, which makes the system sluggish and raises power consumption.

I hope that this info on the problem helps in finding a real fix, which would be appreciated.

Regards, Thomas
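The claim that active trip points are ignored can be checked from user space by listing which active trip points the current zone temperature has already reached and comparing that with the cur_state of the fan cooling devices. This is a minimal sketch assuming the standard sysfs files temp, trip_point_N_temp and trip_point_N_type; the function name and the default zone are illustrative only.

```shell
# Hypothetical helper: print "index temperature" for every active trip
# point that the zone's current temperature has reached or exceeded.
# $1: thermal zone directory (default is only an example)
crossed_active_trips() (
    zone=${1:-/sys/class/thermal/thermal_zone1}
    temp=$(cat "$zone/temp") || exit 1
    i=0
    while [ -r "$zone/trip_point_${i}_temp" ]; do
        trip=$(cat "$zone/trip_point_${i}_temp")
        kind=$(cat "$zone/trip_point_${i}_type")
        if [ "$kind" = active ] && [ "$temp" -ge "$trip" ]; then
            printf '%s %s\n' "$i" "$trip"
        fi
        i=$((i + 1))
    done
)
```

If this prints several trip points while every fan cooling device still reports cur_state 0, the zone is in the state described above. As far as the kernel parameter documentation goes, thermal.tzp is given in deci-seconds, so tzp=1 would mean polling every 0.1 s, which may be related to the kworker load; a larger value might be worth trying.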
(In reply to Pohjoistuuli from comment #19) [...] > I have similar problem on HP ProBook 4510s (Firmware F.20, Intel T3000) > running 64bit-kernel 3.13.9 (Kubuntu) or 64-bit kernel 3.13 - 3.14 on arch. [...] @Pohjoistuuli // Thomas: Your machine has the same symptoms as mine with 3.13.x. Have you tried a 3.12.y kernel of your distro (or even vanilla)? BTW, you can issue a command at runtime or via a startup script to set the fan state, e.g. "echo 1 > /sys/class/thermal/cooling_device3/cur_state" (my favourite). 6 is the lowest of the cooling_device~ entries representing fan speed knobs. Just try. @Rui Zhang: I don't want this to be handled as an HP-laptop-only problem, as 3.12.x is able to drive the fans and temperatures appropriately. Best regards, Manuel
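The manual "echo 1 > .../cur_state" workaround from the comment above can be wrapped in a small helper that first checks the requested value against max_state. This is a sketch under assumptions: it relies on the standard cur_state and max_state files of a cooling device, needs root to write to sysfs, and the helper name is hypothetical.

```python
from pathlib import Path


def set_cooling_state(device_dir, state):
    """Write state to cur_state after checking it against max_state.

    Returns the value read back from cur_state afterwards.
    """
    dev = Path(device_dir)
    max_state = int((dev / "max_state").read_text().strip())
    if not 0 <= state <= max_state:
        raise ValueError(f"state {state} outside 0..{max_state}")
    (dev / "cur_state").write_text(f"{state}\n")
    # Read back, because the driver may refuse or adjust the request.
    return int((dev / "cur_state").read_text().strip())
```

For the device named in the comment, a call would look like set_cooling_state("/sys/class/thermal/cooling_device3", 1). Which cooling_device number maps to which fan level differs between machines and, as later comments show, even between boots.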
(In reply to Pohjoistuuli from comment #19) > Hi! > > I have similar problem on HP ProBook 4510s (Firmware F.20, Intel T3000) > running 64bit-kernel 3.13.9 (Kubuntu) or 64-bit kernel 3.13 - 3.14 on arch. > I remarked that the regulation of the fan (not necessarily the fan itself!) > stops after boot. So, if the system is cold, the fan is running at 0% (= > off) or at 20% (which is an unusual number as the fan speed rises usually in > 15% steps on this hardware). > On reboot, when the machine is warm, fan speeds of 30% or 45% are often > observed depending on the CPU temperature at boot time. After booting, the > fan speed does not change anymore and keeps constant. So, when the machine > was started cold, the fan is off until the temperature reaches critical > values and runs then with 90% (= full speed) until the temperature drops. It > goes then off again completely. I've seen exactly the same behavior on one of my test laptops. The problem there is that ACPICA cannot handle some kinds of AML code well, and the fix for that problem ships in 3.13-rc1. So the symptom I've seen is not a regression and exists in all previous Linux releases. Anyway, please attach the acpidump of your machine, so that I can check whether it is the same AML problem. BTW, it would be nice if you could try a 3.12 kernel to verify whether this is a regression or not.
(In reply to Manuel Krause from comment #16) > There are generally only two different re-occurring scenarios for > "grep . /sys/class/thermal/thermal_zone*/cdev*/device/path", so that I > want to abbreviate them in the following: > > Scenario-1: > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > /sys/class/thermal/thermal_zone1/cdev0/device/path:\_PR_.CPU1 > /sys/class/thermal/thermal_zone1/cdev1/device/path:\_PR_.CPU0 > /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN0 > /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1 > /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN2 > /sys/class/thermal/thermal_zone1/cdev5/device/path:\_TZ_.FAN3 > /sys/class/thermal/thermal_zone1/cdev6/device/path:\_TZ_.FAN4 > /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1 > /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0 > /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1 > /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0 > > Scenario-2: > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > /sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4 > /sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3 > /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2 > /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1 > /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0 > /sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1 > /sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0 > /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1 > /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0 > /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1 > /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0 > > Already, during bisecting this issue, I've found out, that these scenarios > have something to do with rebooting: So, I've rebooted the new bisected > kernel > twice in the second roundup. 
> But I haven't expected the following disorder: > > This is a row of results from last night, rebooting different kernels, one > after the other, and capturing some relevant data. > > > # uname -r > 3.12.16 > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > Scenario-2 > > # uname -r > 3.13.8 > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > Scenario-2 > > # uname -r > 3.13.8 > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > Scenario-1 > > # uname -r > 3.12.13 > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > Scenario-2 > > # uname -r > 3.12.13 > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > Scenario-1 > > # uname -r > 3.12.13 > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > Scenario-2 > I suppose these 3.12.13 kernels are exactly the same kernel without any rebuilding, right? Could you please change your config file so that the ACPI thermal and fan drivers are always built in, and see if this problem still exists?
(In reply to Zhang Rui from comment #21) > (In reply to Pohjoistuuli from comment #19) > > Hi! > > > > I have similar problem on HP ProBook 4510s (Firmware F.20, Intel T3000) > > running 64bit-kernel 3.13.9 (Kubuntu) or 64-bit kernel 3.13 - 3.14 on arch. > > I remarked that the regulation of the fan (not necessarily the fan itself!) > > stops after boot. So, if the system is cold, the fan is running at 0% (= > > off) or at 20% (which is an unusual number as the fan speed rises usually in > > 15% steps on this hardware). > > On reboot, when the machine is warm, fan speeds of 30% or 45% are often > > observed depending on the CPU temperature at boot time. After booting, the > > fan speed does not change anymore and keeps constant. So, when the machine > > was started cold, the fan is off until the temperature reaches critical > > values and runs then with 90% (= full speed) until the temperature drops. It > > goes then off again completely. > > I've seen exactly the same behavior on one of my test laptop. > And the problem is that ACPICA can not handle some kind of AML code well, > PLUS, the fix for the problem ships in 3.13-rc1. > So the symptom I've seen is not a regression and exists in all Linux > previous release. > Anyway, please attach the acpidump of your machine, so that I can check if > they are the same AML problem. > > BTW, it would be nice if you can try 3.12 kernel to verify if this is a > regression or not. I can confirm having the same problem with an HP Compaq 6830s: the fan is off until the temperature reaches critical, then runs at full speed. When the temperature drops below 8x °C, the fan stops completely. This is happening on both 3.13 and 3.14; 3.12 works fine. I'll post my acpidump when I get to the machine. Are there any more listings you are interested in?
These symptoms are exactly the ones I am experiencing. Please see comment 13 and my post to the mailing list: http://lkml.iu.edu//hypermail/linux/kernel/1404.0/02012.html I have disassembled the DSDT from my machine, fixed most errors and warnings and tried booting with this one, but no change. I haven't dumped the other tables yet, but I will post them when I do. 3.12 is what was on this laptop until now (Ubuntu Saucy), then everything worked fine. No other changes, no fan control utilities, no negative temperatures (checked with lm-sensors). Just stock installs...
Got the same bug on Debian 7.4 with kernel 3.13-0, HP 4310s laptop. While 3.12 kernels worked correctly, after installing 3.13 the fan went off after boot and turned on only when the temperature reached 80 C, and then at very high speed. After cooling to ~75 C the fan went off again. The only thing I can state now is that this bug seems to be chipset-independent: it shows itself on AMD and Intel laptops and even on an old Athlon-based desktop box.
(In reply to Zhang Rui from comment #22) > (In reply to Manuel Krause from comment #16) > > There are generally only two different re-occurring scenarios for > > "grep . /sys/class/thermal/thermal_zone*/cdev*/device/path", so that I > > want to abbreviate them in the following: > > > > Scenario-1: > > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > > /sys/class/thermal/thermal_zone1/cdev0/device/path:\_PR_.CPU1 > > /sys/class/thermal/thermal_zone1/cdev1/device/path:\_PR_.CPU0 > > /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN0 > > /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1 > > /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN2 > > /sys/class/thermal/thermal_zone1/cdev5/device/path:\_TZ_.FAN3 > > /sys/class/thermal/thermal_zone1/cdev6/device/path:\_TZ_.FAN4 > > /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1 > > /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0 > > /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1 > > /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0 > > > > Scenario-2: > > # grep . 
/sys/class/thermal/thermal_zone*/cdev*/device/path > > /sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4 > > /sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3 > > /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2 > > /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1 > > /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0 > > /sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1 > > /sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0 > > /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1 > > /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0 > > /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1 > > /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0 > > > > Already, during bisecting this issue, I've found out, that these scenarios > > have something to do with rebooting: So, I've rebooted the new bisected > > kernel > > twice in the second roundup. > > But I haven't expected the following disorder: > > > > This is a row of results from last night, rebooting different kernels, one > > after the other, and capturing some relevant data. > > > > > > # uname -r > > 3.12.16 > > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > > Scenario-2 > > > > # uname -r > > 3.13.8 > > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > > Scenario-2 > > > > # uname -r > > 3.13.8 > > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > > Scenario-1 > > > > # uname -r > > 3.12.13 > > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > > Scenario-2 > > > > # uname -r > > 3.12.13 > > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > > Scenario-1 > > > > # uname -r > > 3.12.13 > > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path > > Scenario-2 > > > I suppose these 3.12.13 kernel are the exactly the same kernel without any > rebuilding, right? Yes, of course, without rebuilding. 
Only re-/booting previously built kernels, to show you the obvious differences after rebooting. > could you please change your config file and always build in the ACPI > thermal and fan driver and see if this problem still exists? I've done so for a 3.12.13 kernel and a 3.13.11. We'd get a new Scenario-3: # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path /sys/class/thermal/thermal_zone1/cdev0/device/path:\_PR_.CPU1 /sys/class/thermal/thermal_zone1/cdev1/device/path:\_PR_.CPU0 /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN4 /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN3 /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN2 /sys/class/thermal/thermal_zone1/cdev5/device/path:\_TZ_.FAN1 /sys/class/thermal/thermal_zone1/cdev6/device/path:\_TZ_.FAN0 /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1 /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0 /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1 /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0 As new comparison: fan, thermal, processor as MODULES; sequentially rebooted same kernel: 3.12.17 - 1. boot: Scenario-1 3.12.17 - 2. boot: Scenario-1 3.12.17 - 3. boot: Scenario-2 3.12.17 - 4. boot: Scenario-2 3.12.17 - 5. boot: Scenario-2 3.12.17 - 6. boot: Scenario-1 fan, thermal, processor as BUILT IN: 3.12.13 - 6 sequential reboots: all Scenario-3 fan, thermal, processor as BUILT IN: 3.13.11 - 6 sequential reboots: all Scenario-3 After that config change 3.12 still works fine / 3.13 still FAILS: In my opinion, this has nothing to do with the original fan / trip point problem. But fine, if you can fix this little bug, too, in addition. ;-) Best regards, Manuel Krause
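The three scenarios above differ only in how the same cooling devices are numbered under thermal_zone1. A rough way to check that mechanically is to parse the saved grep output and compare the bound devices per zone, once with and once without regard to the cdev index. This is a sketch only; the regular expression assumes the exact "thermal_zoneN/cdevM/device/path:<ACPI path>" form shown in the listings, and the function names are invented here.

```python
import re

# Matches e.g. ".../thermal_zone1/cdev2/device/path:\_TZ_.FAN0"
BINDING = re.compile(r"thermal_zone(\d+)/cdev(\d+)/device/path:(\S+)")


def parse_bindings(grep_output):
    """Map zone number -> list of ACPI paths ordered by cdev index."""
    zones = {}
    for zone, cdev, acpi_path in BINDING.findall(grep_output):
        zones.setdefault(int(zone), {})[int(cdev)] = acpi_path
    return {z: [devs[i] for i in sorted(devs)] for z, devs in zones.items()}


def same_device_set(a, b):
    """True if both boots bind the same devices to each zone, in any order."""
    normalise = lambda m: {z: sorted(paths) for z, paths in m.items()}
    return normalise(a) == normalise(b)
```

If same_device_set() is true for two boots while the parsed mappings themselves differ, only the enumeration order changed, which matches the observation that the ordering issue is separate from the trip point problem.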
(In reply to Zhang Rui from comment #21) > (In reply to Pohjoistuuli from comment #19) Sorry for answering quite late. I am usually busy during the week, and testing this is surprisingly time-consuming (waiting for the system to have the right start temperature and then waiting for it to rise, etc.). I now use tmon, which makes testing the thermal behaviour of laptops much easier. It is also quite a handy tool for regulating the fan speed. I usually raise the CPU temperature with 'openssl speed'. Finding this 'technique' improved testing speed quite a lot. > Anyway, please attach the acpidump of your machine, so that I can check if > they are the same AML problem. The acpidump is now on my hard drive, but I did not find a function to attach a file to this message. I also ran a check with fwts on my machine (on Ubuntu 14.04). fwts reported problems in the DSDT. I can also provide this log if needed (and when I know how ;-). > BTW, it would be nice if you can try 3.12 kernel to verify if this is a > regression or not. I have checked out ArchLinux kernels 3.12.9-2 and 3.13.1-1. 3.12.9-2 runs fine and 3.13.1-1 does not regulate the fan speed when passing an active trip point temperature. Other ArchLinux kernels that I have tested so far are 3.10.37-1 (lts), which works fine, and 3.14.1-1 (today's kernel), which does not regulate the fan speed. Some other remarks: - I can confirm Manuel's observations regarding cdev*_trip_point. I can also see all three numbering versions on my laptop (version 3 on Ubuntu 14.04, which has the ACPI routines compiled into the kernel). tmon does not have any problems with this; under kernels 3.10, 3.12, 3.13 and 3.14 it shows the same setup and works without any differences. Additionally, checking dmesg did not reveal relevant differences between 3.12 and 3.13 to me. - My machine has a thermal zone GFXZ (acpitz0), which is not connected to any hardware because my computer has only chipset graphics.
The 'temperature' is constant at 16'C. Is this perhaps a problem in this context? Is the acpi system looking only at the wrong thermal zone? - The behaviour of my machine is different when on battery and when on AC. The reason for this is a BIOS setting, which affects the lowest fan speed level. On battery, it is always 0% rpm (= completely off). When on AC, it is possible to choose in the BIOS between 0% rpm (like when on battery) or 20% rpm as minimum value (my setup). This difference between AC and battery made remarking this error in the beginning quite difficult. - For cooling my machine at normal CPU load, 20-30% rpm are often sufficient. Under full load, the CPU temperature rarely exceeds 60'C when the fans are running with 45% of max. rpm. Therefore, problems with overheating and fan regulation were first quite confusing. - tmon is really nice - including the user interface!!! Thanks for looking into this, Thomas
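For the kind of testing described in this comment (load the CPU, then watch the zones), temperatures can also be sampled directly from sysfs. The sketch below assumes each thermal_zone directory has a temp file in millidegrees Celsius; constant_zones() is a hypothetical helper that flags zones whose reading never changes across samples, such as the GFXZ zone stuck at 16 C mentioned above.

```python
import time
from pathlib import Path


def read_zone_temps(root="/sys/class/thermal"):
    """Return {zone directory name: temperature in degrees Celsius}."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            temps[zone.name] = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read
    return temps


def constant_zones(samples):
    """Names of zones that report the same value in every sample."""
    if len(samples) < 2:
        return []
    first = samples[0]
    return sorted(
        name for name in first
        if all(name in s and s[name] == first[name] for s in samples)
    )


def log_temps(count=30, interval=2.0, root="/sys/class/thermal"):
    """Collect count samples, interval seconds apart, printing each one."""
    samples = []
    for _ in range(count):
        sample = read_zone_temps(root)
        print(sample)
        samples.append(sample)
        time.sleep(interval)
    return samples
```

Running log_temps() in one terminal while 'openssl speed' runs in another, then passing the result to constant_zones(), gives a quick hint about zones that are not backed by a real sensor. Whether such a zone plays any role in this bug is not established in this thread.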
Created attachment 134061 [details] acpidump HP Compaq 6730b Maybe an acpidump from my machine can help? @Pohjoistuuli / Thomas: At the top, above the comments and below the header of this bugzilla page, there is the box "Attachment" with the function to add one. (I also needed a while to find it.) ;-) I hope there's still someone working on this bug?! Regards, Manuel
And kernel 3.15.0-rc2 also fails in (all) the same way(s). Regards, Manuel
Rui, care to prepare a revert of commit cc8ef5270734 (ACPI / AC: convert ACPI ac driver to platform bus) on top of 3.15-rc3 so that Manuel can test it?
Rui, best for me would be a patch to apply to some released kernels, as I don't want to go bisecting again for nothing. Thx!
It would be most useful to us to know if the revert on top of the current mainline (that is, 3.15-rc3) works, though. If it doesn't, we need to look somewhere else anyway.
O.K. You're right, indeed. 3.15-rc3 is here. So, please: Give me a patch!!!
Without any patch from you... :-( 3.14.3 fails and 3.15.0-rc4 fails, too.
I'll send a compile-tested-only patch in a minute. For the Brave ...
Patch to test: https://patchwork.kernel.org/patch/4124871/ Thanks Guenter!
Created attachment 135301 [details] ACPI / AC: Use proper name for netlink event generation Manuel, if the Guenter's patch from the previous comment helps, can you please check if this one helps too?
Thank you both to provide something to test finally!!! :-))) I've now tested the two variants with 3.15.0-rc4, they apply && compile fine. (For now only with the thermal, fan and processor _built into_ the kernel.) Guenters reverting patch works !!! Rafaels does not, it does not change fan speeds when passing the trip point temperatures. And now?
Well, I'll queue up the revert for 3.15 and then we'll need to figure out what was wrong with that commit. Thanks!
Oh, and in the meantime I've patched my 3.14.3 with Guenters reverting patch (with some fuzzes and offsets o.k.) -- and it also works very well! I stay tuned to this bug -- and still like to help you to figure out. Best regards to all participants, Manuel
Created attachment 136881 [details] Guenter Roecks patch adapted for a 3.14.4 vanilla kernel Unfortunately I haven't seen someone to add Guenters reverting patch to 3.14.x kernels so far. So I'd like to post you something adapted for 3.14.4. There were only cosmetical changes needed from Guenters original version for 3.15-rcX. And, yes, it works on here.
Unless I am missing something, the patch is not yet upstream, so we can not back-port it to 3.14.
Just compiled and installed kernel 3.15-rc6 on my Intel ICH9 laptop; the problem still remains and it's very dangerous. With this kernel at least the fan runs at a very low speed, but it doesn't follow temperature changes, so the temperature can easily rise to 80C. So this is not resolved for me.
Quite surprising, because 3.15-rc6 does include the fix, as tested by Manuel. Manuel, any chance you can re-test with 3.15-rc6 ?
Hi Guenter, my fault, I was running 3.15-rc5 instead of rc6! rc6 works wonderfully; the fan runs more smoothly than with any previous kernel's thermal management. There is only one hiccup: the fan never reaches 100% full speed. Even if the temperature rises above 77C, the fan runs at 70% at most. I have to manually write 1 into /sys/devices/virtual/thermal/cooling_device0/cur_state to cool the CPU down to a normal level. This is particularly annoying when I'm compiling, because I have to reissue the command occasionally. Thank you for your support!
(In reply to Guenter Roeck from comment #44) > Quite surprising, because 3.15-rc6 does include the fix, > as tested by Manuel. > > Manuel, any chance you can re-test with 3.15-rc6 ? Yes, I've just tested it -- and it works fine for me, as expected. And, I'm not concerned about the temp. <-> fan levels as Angelo mentions. IIRC, this is the normal behaviour also known from kernels before 3.13 . Thanks to you, Guenter!
3.14.5 is out now... without this fix... Can someone of you sleepy guys, please, ... begin to... at least think of... bringing Guenter's patch to the so-called "stable" kernel... finally ??! My simply converted patch for 3.14.4 is still working with 3.14.5. See Comment 41. This is a quite disappointing thread. Has someone begun to work on the original failure, why the conversion of AC to platform bus didn't work? Thanks, Manuel
On Sun, Jun 01, 2014 at 04:24:41PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=71711 > > --- Comment #47 from Manuel Krause <manuelkrause@netscape.net> --- > 3.14.5 is out now... without this fix... Can someone of you sleepy guys, > please, ... begin to... at least think of... bringing Guenters patch to the > so > called "stable" kernel... finally ??! > My simply converted patch for 3.14.4 is still working with 3.14.5. See > Comment > 41. > Please chill down. You do have a working solution, don't you ? The 3.14 maintainer mentioned a couple of days ago that he has more than 200 patches pending for 3.14, on top of 3.14.5. Greg is doing an excellent job maintaining the stable kernel releases. Calling him sleepy is, to say it very politely, not appropriate. > This is a quite disappointig thread. Has someone begun to work on the > original > failure, why the conversion of AC to platform bus didn't work? > As far as I know no one who actually helped fixing your problem is getting paid for this task, including me. Actually, I am specifically _not_ paid for anything I do in the upstream kernel. In addition to that, it occurs to me that you are most likely not paying anything to anyone for providing you support either. You might want to consider adjusting your expectations a bit, or switch to a pay-for-use operating system. Having said that, Linux being an open source operating system, I am sure the responsible maintainer would be happy to get a patch from you to fix the original failure. Thanks, Guenter
HP 2230s is also affected. A fresh kernel pulled from the Linus tree seems to work fine now.
At first I want to apologize a bit for my words in my Comment 47. I'm no native english speaker so I obviously/may have not found the *right* words to express my disappointment with the ongoing of this thread since early 2014/03. And I felt that I should not "chill down" until this is included into the actual kernel series. Of course, I did NOT want to question the work of people *working* on this bug. Neither those, helping me to help to resolve it for other people, too. Guenter is a great helper. I don't think my disappointment is worth a discussion about paid support or something related. IIRC, I have provided needed info ASAP and also invested some of my spare time for your debugging work, as well as you and others. And I'd do it in future again, too. Don't blame me for not having enough Linux programming knowledge, so far, to just provide a better "convert AC to platform bus" patch -- that's a bit inappropriate, too. --- According to a yesterdays' message from Greg and a look to the stable queue: Guenters revert patch would be included in 4.14.6. --- Cheers! And thank you for your understanding, Manuel
- revert patch would be included in 4.14.6. + revert patch would be included in 3.14.6. Sorry for the typo.
HOUSTON, WE'VE GOT A PROBLEM... I don't know why I haven't tested it thoroughly so far... Maybe due to the ambient temperatures and my usual workflow for testing this one, only aiming at high temperatures? (I used worldcommunitygrid to achieve this.) This patch's settings DO NOT survive a SUSPEND TO DISK: The settings for the actually needed trip point <-> fan speed are, unfortunately, then forgotten? For the suspend-to-disk way I've checked several kernels today: 3.15.0 pure vanilla NOGO; 3.14.5 +BFQ +CK/BFS + revert patch NOGO; 3.14.6 +BFQ +CK/BFS +TuxOnIce NOGO; 3.14.7 +BFQ +CK/BFS +TuxOnIce NOGO; 3.12.18 +BFQ +CK/BFS NOGO. It's a pity to bother you again; any ideas?! Best regards, Manuel
On Thu, Jun 12, 2014 at 05:22:29PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=71711 > > --- Comment #52 from Manuel Krause <manuelkrause@netscape.net> --- > HOUSTON, WE'VE GOT A PROBLEM... > > I don't know why I haven't tested it thoroughly so far... Maybe, due to the > ambient temperatures and my usual workflow for testing this one, only aiming > at > high temperatures? (I used worldcommunitygrid to achieve this.) > > This patches' settings DO NOT surviwe a SUSPEND TO DISK: The settings for > the > actually needed trip point <-> fan speed are, unfortunately, then forgotten? > > For the suspend-to-disk way I've checked several kernels, today, > 3.15.0 pure vanilla NOGO > 3.14.5 +BFQ +CK/BFS + revert patch NOGO > 3.14.6 +BFQ +CK/BFS +TuxOnIce NOGO > 3.14.7 +BFQ +CK/BFS +TuxOnIce NOGO > 3.12.18 +BFQ +CK/BFS NOGO > > It's a pity, to bother you again, > > any ideas?! > Unless I am missing something, looks like a separate problem. Does this work with any earlier kernels ? Guenter
To be more accurate: The last triggered trip_point before suspend seems to be taken as the one to focus as next after suspend. But there is no correlation to lower fan speeds. It's lost, then? I can pass this trip point upwardly and the fan goes to the related level. Going below, it may go to 0 fan speed. The higher fan numbers (what are the fan's speed levels on here, but in vice-versa order, 04: is 24% fan; 03: 34%; 02: 45%; 01: 58%; 00: 100%) come up as 0 then (B). Meaning with the help of the "tmon" tool: (A) At boot everything is ok (for all the mentioned kernels): ID Cooling Dev Cur Max Thermal Zone Binding │ │00 Fan 0 1 │││││││││││ ││││*││││││ │││││││││││ │││││││││││ ││││││││││││ │ │01 Fan 1 1 │││││││││││ │││*│││││││ │││││││││││ │││││││││││ ││││││││││││ │ │02 Fan 1 1 │││││││││││ ││*││││││││ │││││││││││ │││││││││││ ││││││││││││ │ │03 Fan 1 1 │││││││││││ │*│││││││││ │││││││││││ │││││││││││ ││││││││││││ │ │04 Fan 1 1 │││││││││││ *││││││││││ │││││││││││ │││││││││││ ││││││││││││ (B) At resume NOT ok: │00 Fan 0 1 │││││││││││ ││││*││││││ │││││││││││ │││││││││││ ││││││││││││ │ │01 Fan 1 1 │││││││││││ │││*│││││││ │││││││││││ │││││││││││ ││││││││││││ │ │02 Fan 0 1 │││││││││││ ││*││││││││ │││││││││││ │││││││││││ ││││││││││││ │ │03 Fan 0 1 │││││││││││ │*│││││││││ │││││││││││ │││││││││││ ││││││││││││ │ │04 Fan 0 1 │││││││││││ *││││││││││ │││││││││││ │││││││││││ ││││││││││││ This is affecting suspend-to-ram, too, on here. (I've already reported this symptom at the beginning of this thread ~ Comment 3.) @Guenter: Do I really need to dig out kernels from before 3.12? Best regards, Manuel
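The difference between tables (A) and (B) above can be expressed as a simple consistency rule on the "Cur" column. The sketch below is an interpretation, not something the thread confirms: it assumes the fan devices are stacked, so that whenever a faster level (lower fan number) is on, every slower level (higher fan number) is on as well, which is the pattern seen in the working state (A). The speed percentages are the ones listed in the comment for this HP Compaq 6730b; the function names are hypothetical.

```python
# Fan number -> speed level in percent, as listed in the comment above.
FAN_SPEED_PERCENT = {0: 100, 1: 58, 2: 45, 3: 34, 4: 24}


def is_consistent(cur_states):
    """cur_states[i] is the tmon 'Cur' value (0 or 1) of fan i.

    Under the stacking assumption the list must never go from 1 back
    to 0 as the fan number increases.
    """
    return all(a <= b for a, b in zip(cur_states, cur_states[1:]))


def nominal_speed(cur_states):
    """Speed of the fastest level that is switched on, or 0 if none is.

    This is only meaningful when is_consistent() holds.
    """
    for fan, state in enumerate(cur_states):
        if state:
            return FAN_SPEED_PERCENT[fan]
    return 0
```

With the values from the tables, state (A) is [0, 1, 1, 1, 1], which passes the check and corresponds to the 58% level, while state (B) after resume is [0, 1, 0, 0, 0], which fails it.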
First of all, this seems to be a different problem. could you please file a new bug, build the latest upstream kernel, say 3.15, boot and 1. attach the output of "grep . /sys/class/thermal/thermal_zone*/cdev*/device/path" 2. attach the output of "# grep . /sys/class/thermal/cdev*/device/path" 3. run "# echo 'module thermal_sys +fp' > /sys/kernel/debug/dynamic_debug/control" 4. reproduce the problem you showed in comment #54 5. attach the dmesg output and tmon output.
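Steps 1 and 2 above amount to running "grep ." over a set of sysfs files, which prints each non-empty line prefixed with its file name. For attaching the result to a bug report, roughly the same output can be collected with a short script. This is a sketch: grep_dot() is a made-up name, and the patterns in the usage note are copied verbatim from the comment.

```python
import glob


def grep_dot(pattern):
    """Roughly mimic `grep . <pattern>`: 'file:line' for non-empty lines."""
    lines = []
    for name in sorted(glob.glob(pattern)):
        try:
            with open(name) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if line:
                        lines.append(f"{name}:{line}")
        except OSError:
            continue  # unreadable entries are skipped
    return lines
```

A call such as grep_dot("/sys/class/thermal/thermal_zone*/cdev*/device/path") covers step 1. For step 2 the pattern from the comment is "/sys/class/thermal/cdev*/device/path"; if that matches nothing on a given system, "cooling_device*/device/path" may be what was meant, but that is an assumption.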
For my hardware both suspend and hibernate are OK.
(In reply to Zhang Rui from comment #55) > First of all, this seems to be a different problem. > could you please file a new bug, build the latest upstream kernel, say 3.15, > boot and > 1. attach the output of "grep . > /sys/class/thermal/thermal_zone*/cdev*/device/path" > 2. attach the output of "# grep . /sys/class/thermal/cdev*/device/path" > 3. run "# echo 'module thermal_sys +fp' > > /sys/kernel/debug/dynamic_debug/control" > 4. reproduce the problem you showed in comment #54 > 5. attach the dmesg output and tmon output. Thank you very much, for pointing out the details that would be helpful. Of course, I can file a new bug. But before I'd do this -- could you, please, have a look at https://bugzilla.kernel.org/show_bug.cgi?id=67101 "weird fan control with 3.12, was ok in 3.9" that I've found by coincidence. The symptoms seem to be the same (except for my system not needing to shut down, as the thermal's emergency cooling is very effective). Unfortunately the original poster didn't finish. What do you say? Please, advise me, whether it would be better to revive that bug and add my additional info or to file a new one. Thank you in advance, Manuel
(In reply to Joonas Saarinen from comment #56) > For my hardware both suspend and hibernate are OK. Can you, please, tell me which BIOS version you're running? I'm running the one before the latest as the latest is only installable via Windows with much more addon software. Mine is a: (excerpt from 'dmesg | grep BIOS') DMI: Hewlett-Packard HP Compaq 6730b (KU489ET#ABD)/30DD, BIOS 68PDD Ver. F.17 12/02/2010 Thank you in advance, Manuel
DMI: Hewlett-Packard HP 2230s /3037, BIOS 68PHU Ver. F.20 12/10/2011
Manuel, please file a new bug.
(In reply to Zhang Rui from comment #60) > Manuel, please file a new bug. A BIOS update from F.17 to F.20 did not achieve any efforts. Btw., some distro specific bug reports falsely (not from my hands) point to here. I've now filed a new bug upon my Comment 52 ++ https://bugzilla.kernel.org/show_bug.cgi?id=78201 Thank you all for your guidance, Manuel
Our 3 laptops Compaq nx8220 run Mint 17 and I just upgraded to 3.13.0-30. They are still affected. After resume they heat up to 100°C until cpu throttling occurs. A quite serious issue. Jörg-Karl Bösner did a reverse-bisect and may have found the evil commit: https://launchpad.net/bugs/1312860 Please backport the fix also to 3.13.x, since this kernel is part of many "Long Term Support" distros.
That 3.13.0-30 is an Ubuntu kernel and is always based on upstream 3.13.0 with Canonical's own selection of patches applied on top of it. From there the same kernel seems to trickle to Mint. So Ubuntu would have to apply the patch "ACPI / AC: convert ACPI ac driver to platform bus" to the 3.13.0-?? patch queue.
I don't know if it's still valid, but the patch had been picked up by Kamal Mostafa who has told to maintain 3.13.y.z. Patch: http://patchwork.ozlabs.org/patch/360895/ Maybe you'd also like to read https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable and https://lkml.org/lkml/2014/4/23/516 Best regards, Manuel Krause
(In reply to Oliver Joos from comment #62) > Our 3 laptops Compaq nx8220 run Mint 17 and I just upgraded to 3.13.0-30. > They are still affected. After resume they heat up to 100°C until cpu > throttling occurs. A quite serious issue. > > Jörg-Karl Bösner did a reverse-bisect and may have found the evil commit: > https://launchpad.net/bugs/1312860 > > Please backport the fix also to 3.13.x, since this kernel is part of many > "Long Term Support" distros. This BUG, here, only covers false fan speed after booting. For the issue of high temperatures without fan action after resume from disk/RAM, please attach to https://bugzilla.kernel.org/show_bug.cgi?id=78201. Thank you in advance, Manuel Krause
> So Ubuntu would have to apply the patch "ACPI / AC: convert ACPI ac driver to > platform bus" to the 3.13.0-?? patch queue. Just to refine my message a bit...they obviously should apply the *revert* patch. :) Here's also a direct link to the aforementioned "extended stable" Ubuntu kernel, where it already is reverted: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11.4-trusty/ But as Manuel says, in Oliver's case it might actually be a different bug if it reveals itself after suspend.
Looking at this changelog, the revert patch seems to be already part of upcoming 3.13.0-31 Ubuntu kernel. https://launchpad.net/ubuntu/trusty/+source/linux/+changelog
I'm running Archlinux (up to date) and I'm having the same issue. With the latest kernel 3.18.2 the problem is still there for me. Every time I unplug the ac power the laptop fan stops until temp reaches 84°C and then ramps down to 74°C with full speed fan. With kernel 3.11.4 I have no problem at all. Do you guys still have this issue?
There are currently many HP/Compaq notebook owners having problems with kernel 3.18.x. We are waiting for Zhang Rui to wake up from his winter sleep and catch up. See: https://bbs.archlinux.org/viewtopic.php?id=192255&p=2 (Read from the first page to get the full info; note that some people there don't handle the full fan speed value correctly.) Most probably you would need to file a new bug, but I'd attach my logs to it soon. Best regards, Manuel
You can also have a look at https://bugzilla.kernel.org/show_bug.cgi?id=78201, in case it is related to your fan problem. Best regards, Manuel
For my hardware, the problem seems to be resolved by installing the latest beta of OS X 10.10.2; it includes a firmware update that solves the issue under Linux and Windows. Hope this helps.
Nope, the problem is still there: the temperature is fine at around 35 to 40 °C, but the fan kicks from 2000 rpm up to 5900 rpm and then back to 4100 rpm, while CPU utilization is 10-13%. Kernel: 3.18.4, OS: Archlinux, Hardware: MacBook Air 2013
PLEASE RESPOND. The problem is solved by updating to the new firmware that comes with OS X 10.10.2. In Linux 3.19, the patch you have made makes the laptop very noisy, with the fans spinning at a very high rpm. In Linux 3.14.33, everything is fine (thermal, fan rpm), so can you please revert or remove the patch, as it is not necessary any more after the OS X 10.10.2 update.
(In reply to step-ali from comment #73)
> PLEASE RESPOND ,
>
> the problem is solved by updating to a new firmware with osx 10.10.2 ,
>
> in linux 3.19 , the patch you have made make the laptop very noisy & fans
> spinning at a very high rpm .
Which patch are you referring to?
> in linux 3.14.33 , everything is fine ( thermal , fan rpm ) ,
>
> so can you please kindly revert or remove the patch , as it's not necessary
> any more after osx 10.10.2 update .
(In reply to Zhang Rui from comment #74)
> which patch are you referring to?
The patch that made the fans spin harder. All I know is that on 3.19 there is no heat, but the fans spin at high rpm at 10-15% CPU utilization; on 3.14.33 there is heat up to 89 °C and the fans don't spin up at the same CPU utilization.
(In reply to step-ali from comment #75)
> the patch that made the fans spin harder ,
step-ali, actually, I don't know which patch introduces this problem. But there is indeed a bug report complaining that the fan speed never changes after boot since 3.18. So can you please refer to bug #93301 and check whether it is the same commit (6ab3430129e258ea31dd214adf1c760dfafde67a) that introduces this problem for you?
(In reply to Zhang Rui from comment #76)
> step-ali,
> actually, I don't think which patch introduces this problem.
> But there is indeed some bug report complaining that the fan speed never
> changes after boot, since 3.18.
> so can you please refer to bug #93301 and check if it is the same commit
> (6ab3430129e258ea31dd214adf1c760dfafde67a) that introduces this problem for
> you?
I don't think so. Before 3.18 we had high CPU utilization (25 to 30%), which was fixed by the recent Apple OS X 10.10.2 update (the problem was worked around temporarily by disabling some GPE), but there wasn't any fan or heat problem. After the OS X 10.10.2 update (which happened during Linux 3.18), the fan spins up to a very high rpm at very low CPU utilization (watching a video in Chrome) and then spins down when idling. On 3.14.33 it is the reverse: the fan doesn't spin up, but the temperature rises to 90 degrees Celsius (also while watching videos in Chrome), which is harmful to the laptop. The solution would be something in the middle, BUT PLEASE HURRY, MY MACHINE IS FRYING.
Please
1. rebuild your kernel with the patches at https://bugzilla.kernel.org/show_bug.cgi?id=78201#c150 applied.
2. run echo 'module thermal_sys +fp' > /sys/kernel/debug/dynamic_debug/control after boot
3. attach the dmesg output after the problem is reproduced.
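Steps 2 and 3 above can be sketched as a pair of small shell helpers. This is only a sketch: it assumes debugfs is mounted at /sys/kernel/debug and the kernel was built with CONFIG_DYNAMIC_DEBUG, and the function names and the default output file name are hypothetical. Only the echo command and the control path come from the request itself.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2 and 3; run them as root.

# Step 2: enable the debug messages of the thermal_sys module.
# In "+fp", p enables the messages and f prefixes them with the function name.
# The optional argument overrides the control file (only useful for testing).
enable_thermal_debug() {
    control="${1:-/sys/kernel/debug/dynamic_debug/control}"
    if [ ! -w "$control" ]; then
        echo "cannot write $control (not root, or debugfs not mounted?)" >&2
        return 1
    fi
    echo 'module thermal_sys +fp' > "$control"
}

# Step 3: save the kernel log once the problem has been reproduced.
# The optional argument is the output file name (the default is an assumption).
save_dmesg() {
    dmesg > "${1:-dmesg-thermal.txt}"
}
```

Typical use would be to call enable_thermal_debug right after boot, wait until the fan misbehaves, then call save_dmesg and attach the resulting file.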
(In reply to Zhang Rui from comment #78)
> Please
> 1. rebuild your kernel with the patches at
> https://bugzilla.kernel.org/show_bug.cgi?id=78201#c150 applied.
> 2. run echo 'module thermal_sys +fp' >
> /sys/kernel/debug/dynamic_debug/control after boot
> 3. attach the dmesg output after the problem is reproduced.
Sorry, I don't know how to apply a patch and compile. After weeks of testing it looks like another firmware issue that needs to be fixed by Apple, like the gpe66 issue, because the issue (high temperature) also occurs in Windows. When I first bought the laptop it ran fine with Linux; I wish I had never updated OS X, as I never use it anyway. I will submit a bug report to Apple and see what happens.
As there is a firmware update, can you please try the 3.14 kernel again with your new firmware?
(In reply to step-ali from comment #77)
> on 3.14.33 it's the reverse , the fan doesn't spin up but the temperature
> rises to 90 degree celsius ( also while watching videos on chrome ) , which is
> harmful to the laptop.
Did you get this symptom with the updated firmware?
It is after kernel 3.18 and the OS X firmware update.
> > on 3.14.33 it's the reverse , the fan doesn't spin up but the temperature
> > rises to 90 degree celsius ( also while watching videos on chrome ) , which is
> > harmful to the laptop.
> is this symptom got with updated firmware?
I mean, did you get this symptom with the 3.14 kernel after the firmware was updated?
Please do the following test on a 4.0-rc kernel:
1. apply the patches at
   https://patchwork.kernel.org/patch/6077231/
   https://patchwork.kernel.org/patch/6077241/
   https://patchwork.kernel.org/patch/6077251/
2. apply the two patches attached later
3. after the build, boot with the kernel parameters module.dyndbg="module thermal_sys +fp" dyndbg="file thermal_core.c +fp; file step_wise.c +fp"
4. attach the acpidump output of your MacBook
5. attach the output of "grep . /sys/class/thermal/*/*/path" after boot
6. attach the dmesg output after the bug is reproduced
7. attach the output of "grep . /sys/class/thermal/thermal*/*" after the bug is reproduced
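The two grep commands in steps 5 and 7 can be wrapped so that the output goes straight into files ready for attaching. The following is a rough sketch only; the function names are hypothetical and the optional root argument exists purely so the helpers can be tried against a test directory. Unlike a bare grep over the glob, the loop skips directories and unreadable attributes instead of printing errors for them.

```shell
#!/bin/sh
# Hypothetical helpers for steps 5 and 7.

# Print "file:line" for every non-empty line of each regular file given.
# Passing /dev/null as a second file makes grep always print the file name.
dump_files() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        grep . "$f" /dev/null 2>/dev/null || true
    done
}

# Step 5: same files as grep . /sys/class/thermal/*/*/path
dump_thermal_paths() {
    root="${1:-/sys/class/thermal}"
    dump_files "$root"/*/*/path
}

# Step 7: same files as grep . /sys/class/thermal/thermal*/*
dump_thermal_zones() {
    root="${1:-/sys/class/thermal}"
    dump_files "$root"/thermal*/*
}
```

For example, `dump_thermal_paths > thermal-paths.txt` after boot and `dump_thermal_zones > thermal-zones.txt` once the bug has been reproduced (both file names are placeholders).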
Created attachment 171921: patch 4
Created attachment 171931: patch-5
ping...
(In reply to Zhang Rui from comment #82)
> I mean did you get this symptom with 3.14 kernel, after firmware updated?
Yes, the symptom is there after the firmware update, on both 3.14 LTS and 3.19.
Upgraded to kernel 3.16 from the Debian Jessie repos. I performed no firmware upgrade, just upgraded the OS. Strange, but the problem is gone. Here's the uname output:
$ uname -srvom
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) x86_64 GNU/Linux
$ cat /etc/issue
Debian GNU/Linux 8
Installing 3.16 on Debian 7.x still gives the old problem.
(In reply to E.Glorg from comment #87)
> Upgraded to kernel 3.16 from Debian Jessie repos. Having performed no
> firmware upgrade, just upgraded OS. Strange, but problem has gone.
> Here's uname output:
> $ uname -srvom
> Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) x86_64
> GNU/Linux
> $ cat /etc/issue
> Debian GNU/Linux 8
> Installation of 3.16 on Debian 7.x still gives that old problem.
Yep, upgrade to any kernel above 3.17 and you will have the problem again.
Ha, I discovered something strange: under Wayland everything runs normally, at 69 degrees Celsius (where it was around 85-90 under Xorg) and with a fan rpm of 1300 (5000 under Xorg), all using the same kernel, 3.14.37 LTS, under Archlinux. Could it be an xorg-server issue? If so, how come the problem disappears with kernels below 3.17?
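To make a comparison like this one (Wayland versus Xorg on the same kernel) reproducible, the zone temperatures can be logged in the same way in both sessions. A minimal sketch, assuming the standard thermal sysfs layout in which thermal_zone*/temp holds millidegrees Celsius; the function name is hypothetical, and fan rpm is left out because its sysfs location depends on the driver.

```shell
#!/bin/sh
# Hypothetical helper: print "<zone> <temperature in whole degrees Celsius>"
# for every thermal zone below the given root (default /sys/class/thermal).
zone_temps() {
    root="${1:-/sys/class/thermal}"
    for z in "$root"/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        # Some zones fail to read; skip them instead of aborting.
        milli=$(cat "$z/temp" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        echo "$(basename "$z") $((milli / 1000))"
    done
}
```

Running `while sleep 5; do date +%T; zone_temps; done` once under each session, with the same workload, gives two logs that can be attached side by side.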
(In reply to step-ali from comment #86)
> yes , the symptom is htere after firmware update on 3.14 lts & 3.19
Please do the test and attach the debug information requested in comment #82.
Sorry, I don't know how to apply patches to the kernel, but the problem is still there with kernel 4.0.
do you know how to build a customized kernel? please download the patches and run "patch -p1 < foo.patch" to apply each of them in ascending order, and then build the kernel.
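The patch step can be sketched as a small loop. This assumes the downloaded patches were saved into one directory with names that sort in the intended order (for example 0001-..., 0002-...); those file names, the directory layout, and the function name are hypothetical. Building the kernel afterwards is not covered here.

```shell
#!/bin/sh
# Hypothetical helper: apply every *.patch file from a directory to a kernel tree.
# Usage: apply_patches <kernel-source-dir> <patch-dir>
apply_patches() {
    src="$1"
    patches="$2"
    # The shell expands the glob in sorted order, so numbered names
    # are applied in ascending order.
    for p in "$patches"/*.patch; do
        [ -f "$p" ] || continue
        echo "applying $p"
        # -d changes into the source tree first; -p1 strips the leading a/ or b/.
        # Stop at the first patch that fails to apply.
        patch -d "$src" -p1 < "$p" || return 1
    done
}
```

For example: `apply_patches ~/linux ~/thermal-patches`, where both directory names are placeholders.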
Bug closed, as we cannot make any more progress without the bug reporter's response and help. Please feel free to reopen it if you can build a customized kernel to help debug the issue.
(In reply to Zhang Rui from comment #95)
> bug closed as we can more make any progress w/o bug reporter' response and
> help.
> Please feel free to reopen it if you can build customized kernel to help
> debug the issue.
Sorry, I just don't have the time to build a customized kernel; I will test with 4.1.
Mr Zhang Rui, I notice that this bug still affects HP notebooks in all new kernels. I want to reopen this bug report. I am not an experienced user, but I have tried all the popular distributions, such as Fedora 34 with the 5.11 kernel, Ubuntu 20.04 with the 5.4 kernel, and SUSE Linux Enterprise Desktop 15 SP3 with the 5.3 kernel. My fans constantly run at idle speed. It doesn't matter whether my CPU usage is 100% or 0%; they keep the same low speed. Sometimes my notebook shuts down because it is overheating. Sometimes my fans run at 100% speed for a few seconds when my hardware is very hot and then return to idle speed. So I have some questions. Why were the patches from this bug not applied to the upstream kernel? Can I fix my issue without compiling a new kernel with modifications? I thought about the thermald daemon, but I don't know whether it is compatible with AMD processors. If yes, how can I configure it to fix the issue? I also found a program on GitHub for the HP 625, but I am not a coder (I am learning), so I don't know whether that program works or whether it is safe. I hope that you can help. Yours faithfully, PeterQ
Hello. I am still experiencing this bug on 6.1. Could you reopen this bug so that we can solve it?