Bug 3410

Summary: passive mode is not left, once entered
Product: ACPI Reporter: Thomas Renninger (trenn)
Component: Power-ThermalAssignee: Konstantin Karasyov (konstantin.karasyov)
Status: CLOSED CODE_FIX    
Severity: high CC: acpi-bugzilla, rene
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.8 Subsystem:
Regression: --- Bisected commit-id:
Attachments: fixes problem, please verify and submit
p00-acpi_thermal_default_polling.patch
p01-acpi_thermal_check_coolingmode.patch
p01-acpi_thermal_check_coolingmode.patch
p02-acpi_thermal_passive_leave_fix.patch
updated patch
update by Len's comment
update with current processor.c split patch
leave passive mode as soon as passive cooling policy allows maximum cpufrequency to be set again
2.6.13 patch to leave passive thermal polling
patch for 2.6.14-rc2, some cleanups
latest patch vs 2.6.15-rc3
Return value invertion prevents cpufreq cooling in passive cooling mode

Description Thomas Renninger 2004-09-16 11:04:55 UTC
Distribution: 
Hardware Environment: Pentium M 
Software Environment: 
Problem Description: 
 
Steps to reproduce:  
echo "xx:xx:low_value:xx:xx:" >/proc/acpi/thermal_zone/*/trip_points 
echo "xx:xx:high_value:xx:xx:" >/proc/acpi/thermal_zone/*/trip_points 
 
passive mode still active, even the temperature is far below the trip point
Comment 1 Thomas Renninger 2004-09-16 11:07:06 UTC
Created attachment 3671 [details]
fixes problem, please verify and submit
Comment 2 Dominik Brodowski 2004-09-17 12:13:33 UTC
patch doesn't look completely right, as _all_ results (even on real errors) will
lead in a notice that passive cooling is disabled. Also, the problem seems to be
that "passive" is still displayed, but may be not enabled in reality because the
temperature is low enough again.
Comment 3 Thomas Renninger 2004-09-20 03:26:40 UTC
yes, but this is as it should be, if I haven't missed anything.
Always leave passive mode when temperature is below limit, on errors and when
cpu is fast again:

	if (tz->temperature >= passive->temperature) {
                 ...
	else if (tz->trips.passive.flags.enabled) {
                 /* disable passive flag on errors or 
                    if CPU is fast again */
        }

Some other concerns?

Also, the problem seems to be
that "passive" is still displayed, but may be not enabled in reality because the
temperature is low enough again.
-> Bad enough
Comment 4 Luming Yu 2004-09-22 02:53:59 UTC
I'm curious why this is high severity?
Comment 5 Thomas Renninger 2004-09-22 04:20:04 UTC
Maybe it's not, but if it goes mainline, it's no problem to add it to the SUSE
9.2 kernel which will be released soon.
Hopefully I can provide a patch for remembering userspace governor changes while
in passive mode and overridding the passive trip point, even if BIOS does not
export a passive tp. However I'am on (real) holidays until Monday, so I can't
promise anything.
-> changes shouldn't be sever as passive mode seems to be buggy for a longer
time and people don't really use it (as most vendors don't export the tp or set
it very high).
Thanks.
Comment 6 Wang, Zhenyu Z 2004-11-11 01:38:32 UTC
I've made some patches for acpi thermal. I think this issue found by Thomas
might be caused by some confusing in current code.
So pls help to review it.
Comment 7 Wang, Zhenyu Z 2004-11-11 01:41:01 UTC
Created attachment 4012 [details]
p00-acpi_thermal_default_polling.patch

this patch made by David, Shaohua adds default polling frequency
if we get _TZP failed or polling is required. See acpi spec for definition.
Comment 8 Wang, Zhenyu Z 2004-11-11 01:48:02 UTC
Created attachment 4013 [details]
p01-acpi_thermal_check_coolingmode.patch

This patch is based on my previous patch at 
http://bugme.osdl.org/show_bug.cgi?id=1770
It cleans up code, creats acpi_thermal_check_cooling_mode() function because
there might be other place to use it.
Comment 9 Wang, Zhenyu Z 2004-11-11 01:48:28 UTC
Created attachment 4014 [details]
p01-acpi_thermal_check_coolingmode.patch

This patch is based on my previous patch at 
http://bugme.osdl.org/show_bug.cgi?id=1770
It cleans up code, creats acpi_thermal_check_cooling_mode() function because
there might be other place to use it.
Comment 10 Wang, Zhenyu Z 2004-11-11 02:41:16 UTC
Comment on attachment 4013 [details]
p01-acpi_thermal_check_coolingmode.patch

oops
Comment 11 Wang, Zhenyu Z 2004-11-12 00:52:57 UTC
Created attachment 4019 [details]
p02-acpi_thermal_passive_leave_fix.patch

oh, sorry for my mistake. p00 & p01 maybe not aim to resolve this issue.
how about this one? is this enough? 
i've tested it on one sony, it leaves passive well and state became 'ok'
again.
thanks,
-zhen
Comment 12 Thomas Renninger 2004-11-12 04:41:56 UTC
Sorry, give me some time over the weekend, I have not time currently, but I will
test the patch soon.
Thanks for looking into this.
Comment 13 Wang, Zhenyu Z 2004-11-16 18:31:56 UTC
with my processor return patch, it seems we should do like this in thermal.
...
else if (tz->trips.passive.flags.enabled) {
                result = 1;
                for (i=0; i<passive->devices.count; i++)
                        result &= acpi_processor_set_thermal_limit(
                                passive->devices.handles[i],
                                ACPI_PROCESSOR_LIMIT_DECREMENT);
                if (result == 1) {
                        tz->trips.passive.flags.enabled = 0;
                        ACPI_DEBUG_PRINT((ACPI_DB_INFO,
                                "Disabling passive cooling (zone is cool)\n"));
                }
        }
Comment 14 Wang, Zhenyu Z 2004-11-29 21:24:30 UTC
Created attachment 4170 [details]
updated patch
Comment 15 Len Brown 2004-12-01 18:45:15 UTC
patch needs to fix (and comment) the return value from 
acpi_processor_set_thermal_limit(), which currently mixes
success and failure values...
Comment 16 Wang, Zhenyu Z 2004-12-08 17:38:26 UTC
Created attachment 4249 [details]
update by Len's comment
Comment 17 Wang, Zhenyu Z 2004-12-22 23:59:32 UTC
Created attachment 4292 [details]
update with current processor.c split patch

convert patch into new split processor_thermal.c
Comment 18 Thomas Renninger 2005-01-05 08:19:41 UTC
sorry for replying that late:
I tested a lot before Christmas:
  The patch seems to work fine.

However, the CPU freq seems to be stuck at lowest freq, once you entered the
passive mode, even if the system has cooled down again, but this is another problem.
Possibly a speedstep-centrino bug?
Will do more testing, as soon as I have some time. Maybe someone else wants to
have a look at this, too.

Thanks for fixing this one.
Comment 19 Wang, Zhenyu Z 2005-01-05 17:17:05 UTC
What's the kernel version you've tested? it seems like a cpufreq bug, but
i'm not sure about that. Any details? Feel free to open another bug.

thanks
Comment 20 Thomas Renninger 2005-09-12 03:06:08 UTC
The patch is wrong, or at least not nice: 
 
If throttling is supported I expect that cpufreq always stays at lowest freq 
when passive mode is left, because of this: 
 		if (pr->flags.throttling) { 
-			if (tx == 0) 
+			if (tx == 0) { 
 				ACPI_DEBUG_PRINT((ACPI_DB_INFO, 
 					"At minimum throttling state\n")); 
+				return_VALUE(1); 
+			} 
 
 
Also these lines: 
 	if (tz->trips.hot.flags.enabled) 
 		tz->state.hot = 1; 
+	else 
+		tz->state.hot = 0; 
 
should be converted to: 
       tz->state.hot = tz->trips.hot.flags.enabled; 
(also for critical, passive and active). I can come up with a patch soon, still 
testing... 
Comment 21 Thomas Renninger 2005-09-12 03:25:21 UTC
And the last issue (in processor_thermal.c): 
Passive mode is left, when it is tried to increase cpufreq after the highest 
freq is already reached. 
But passive mode must be left as soon as the highest freq is reached again, not 
one cycle after that (-ERANGE)... Will attach a modified patch, please review 
and consider to apply soon, if possible. Thanks. 
Comment 22 Thomas Renninger 2005-09-12 03:28:01 UTC
Created attachment 5976 [details]
leave passive mode as soon as passive cooling policy allows maximum cpufrequency to be set again

Be careful: If testing, you need to set a value (good values are about 2-20
seconds) in /proc/acpi/thermal_zone/*/polling_frequency, or the frequency will
always stay on lowest frequency after leaving passive cooling mode.
Comment 23 Thomas Renninger 2005-10-11 08:36:39 UTC
Created attachment 6280 [details]
2.6.13 patch to leave passive thermal polling

Sorry, for posting that late.
This one came into SL10.0 final and therefore should be considered as safe.
Affected machines (critical shutdowns) are mainly new IBM thinkpads. Their
temperature raises quite fast and they also have a high tsp value (600 means
thermal polling should be done only every minute if passive mode is reached)
defined which leads too critical shutdowns.
Could this be applied to mainline, please?
Comment 24 Len Brown 2005-10-11 09:36:42 UTC
setting to RESOLVED since a proposed fix is attached
Comment 25 Alexey Starikovskiy 2005-10-12 10:21:29 UTC
Created attachment 6287 [details]
patch for 2.6.14-rc2, some cleanups
Comment 26 Thomas Renninger 2005-10-31 05:56:56 UTC
Still complains about melting hardware for 2.6.14 - OpenSuse kernels are based 
on plain, very recent kernels now... 
Also see: https://bugzilla.novell.com/show_bug.cgi?id=131543 (should be 
publically accessable?) 
This fix should go mainline ASAP? 
Comment 27 Dirk Mueller 2005-11-02 17:17:55 UTC
it seems this is not in any acpi patchset or in mainline, can it please get 
reopened? 
 
 
Comment 28 Len Brown 2005-11-30 19:42:43 UTC
Created attachment 6727 [details]
latest patch vs 2.6.15-rc3

attached cleaned up patch from Konstantin applied to acpi-test tree
Comment 29 Len Brown 2005-12-05 13:42:11 UTC
shipped in linux-2.6.15-rc5 -- closing.
Comment 30 Thomas Renninger 2005-12-21 01:29:21 UTC
There slipped in a little bug that avoids cpufreq cooling in passive mode at
all. This patch should fix it.
Comment 31 Thomas Renninger 2005-12-21 01:33:12 UTC
Created attachment 6869 [details]
Return value invertion prevents cpufreq cooling in passive cooling mode
Comment 32 Len Brown 2005-12-22 18:55:14 UTC
patch in comment #31 checked into acpi-test tree
Comment 33 Len Brown 2006-02-02 14:39:44 UTC
Shipped in 2.6.16-rc1-git6 -- closing.