Distribution: Hardware Environment: Pentium M Software Environment: Problem Description: Steps to reproduce: echo "xx:xx:low_value:xx:xx:" >/proc/acpi/thermal_zone/*/trip_points echo "xx:xx:high_value:xx:xx:" >/proc/acpi/thermal_zone/*/trip_points passive mode still active, even the temperature is far below the trip point
Created attachment 3671 [details] fixes problem, please verify and submit
patch doesn't look completely right, as _all_ results (even on real errors) will lead in a notice that passive cooling is disabled. Also, the problem seems to be that "passive" is still displayed, but may be not enabled in reality because the temperature is low enough again.
yes, but this is as it should be, if I haven't missed anything. Always leave passive mode when temperature is below limit, on errors and when cpu is fast again: if (tz->temperature >= passive->temperature) { ... else if (tz->trips.passive.flags.enabled) { /* disable passive flag on errors or if CPU is fast again */ } Some other concerns? Also, the problem seems to be that "passive" is still displayed, but may be not enabled in reality because the temperature is low enough again. -> Bad enough
I'm curious why this is high severity?
Maybe it's not, but if it goes mainline, it's no problem to add it to the SUSE 9.2 kernel which will be released soon. Hopefully I can provide a patch for remembering userspace governor changes while in passive mode and overridding the passive trip point, even if BIOS does not export a passive tp. However I'am on (real) holidays until Monday, so I can't promise anything. -> changes shouldn't be sever as passive mode seems to be buggy for a longer time and people don't really use it (as most vendors don't export the tp or set it very high). Thanks.
I've made some patches for acpi thermal. I think this issue found by Thomas might be caused by some confusing in current code. So pls help to review it.
Created attachment 4012 [details] p00-acpi_thermal_default_polling.patch this patch made by David, Shaohua adds default polling frequency if we get _TZP failed or polling is required. See acpi spec for definition.
Created attachment 4013 [details] p01-acpi_thermal_check_coolingmode.patch This patch is based on my previous patch at http://bugme.osdl.org/show_bug.cgi?id=1770 It cleans up code, creats acpi_thermal_check_cooling_mode() function because there might be other place to use it.
Created attachment 4014 [details] p01-acpi_thermal_check_coolingmode.patch This patch is based on my previous patch at http://bugme.osdl.org/show_bug.cgi?id=1770 It cleans up code, creats acpi_thermal_check_cooling_mode() function because there might be other place to use it.
Comment on attachment 4013 [details] p01-acpi_thermal_check_coolingmode.patch oops
Created attachment 4019 [details] p02-acpi_thermal_passive_leave_fix.patch oh, sorry for my mistake. p00 & p01 maybe not aim to resolve this issue. how about this one? is this enough? i've tested it on one sony, it leaves passive well and state became 'ok' again. thanks, -zhen
Sorry, give me some time over the weekend, I have not time currently, but I will test the patch soon. Thanks for looking into this.
with my processor return patch, it seems we should do like this in thermal. ... else if (tz->trips.passive.flags.enabled) { result = 1; for (i=0; i<passive->devices.count; i++) result &= acpi_processor_set_thermal_limit( passive->devices.handles[i], ACPI_PROCESSOR_LIMIT_DECREMENT); if (result == 1) { tz->trips.passive.flags.enabled = 0; ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Disabling passive cooling (zone is cool)\n")); } }
Created attachment 4170 [details] updated patch
patch needs to fix (and comment) the return value from acpi_processor_set_thermal_limit(), which currently mixes success and failure values...
Created attachment 4249 [details] update by Len's comment
Created attachment 4292 [details] update with current processor.c split patch convert patch into new split processor_thermal.c
sorry for replying that late: I tested a lot before Christmas: The patch seems to work fine. However, the CPU freq seems to be stuck at lowest freq, once you entered the passive mode, even if the system has cooled down again, but this is another problem. Possibly a speedstep-centrino bug? Will do more testing, as soon as I have some time. Maybe someone else wants to have a look at this, too. Thanks for fixing this one.
What's the kernel version you've tested? it seems like a cpufreq bug, but i'm not sure about that. Any details? Feel free to open another bug. thanks
The patch is wrong, or at least not nice: If throttling is supported I expect that cpufreq always stays at lowest freq when passive mode is left, because of this: if (pr->flags.throttling) { - if (tx == 0) + if (tx == 0) { ACPI_DEBUG_PRINT((ACPI_DB_INFO, "At minimum throttling state\n")); + return_VALUE(1); + } Also these lines: if (tz->trips.hot.flags.enabled) tz->state.hot = 1; + else + tz->state.hot = 0; should be converted to: tz->state.hot = tz->trips.hot.flags.enabled; (also for critical, passive and active). I can come up with a patch soon, still testing...
And the last issue (in processor_thermal.c): Passive mode is left, when it is tried to increase cpufreq after the highest freq is already reached. But passive mode must be left as soon as the highest freq is reached again, not one cycle after that (-ERANGE)... Will attach a modified patch, please review and consider to apply soon, if possible. Thanks.
Created attachment 5976 [details] leave passive mode as soon as passive cooling policy allows maximum cpufrequency to be set again Be careful: If testing, you need to set a value (good values are about 2-20 seconds) in /proc/acpi/thermal_zone/*/polling_frequency, or the frequency will always stay on lowest frequency after leaving passive cooling mode.
Created attachment 6280 [details] 2.6.13 patch to leave passive thermal polling Sorry, for posting that late. This one came into SL10.0 final and therefore should be considered as safe. Affected machines (critical shutdowns) are mainly new IBM thinkpads. Their temperature raises quite fast and they also have a high tsp value (600 means thermal polling should be done only every minute if passive mode is reached) defined which leads too critical shutdowns. Could this be applied to mainline, please?
setting to RESOLVED since a proposed fix is attached
Created attachment 6287 [details] patch for 2.6.14-rc2, some cleanups
Still complains about melting hardware for 2.6.14 - OpenSuse kernels are based on plain, very recent kernels now... Also see: https://bugzilla.novell.com/show_bug.cgi?id=131543 (should be publically accessable?) This fix should go mainline ASAP?
it seems this is not in any acpi patchset or in mainline, can it please get reopened?
Created attachment 6727 [details] latest patch vs 2.6.15-rc3 attached cleaned up patch from Konstantin applied to acpi-test tree
shipped in linux-2.6.15-rc5 -- closing.
There slipped in a little bug that avoids cpufreq cooling in passive mode at all. This patch should fix it.
Created attachment 6869 [details] Return value invertion prevents cpufreq cooling in passive cooling mode
patch in comment #31 checked into acpi-test tree
Shipped in 2.6.16-rc1-git6 -- closing.