Most recent kernel where this bug did not occur: 2.6.20
Distribution: Ubuntu 7.10 beta
Hardware Environment: Avalue ECM-945GM embedded Core Duo board
Software Environment:
Problem Description:

The BIOS does not seem to communicate the correct temperature trip points to the OS. It sets the CPU temperature trip points as: critical 60C, passive 50C, active 50C. I am using the T7200 CPU (rated up to 100C, critical is 125C). With 2.6.20 I can successfully override the critical trip point, and with two copies of CPUBurn running the system stabilizes at about 67C (at ~23C room temperature) and does not shut down. But of course that's not an option with 2.6.22, and I would much rather be able to do that from user space. The other options (modifying the ACPI tables, turning off the thermal module, preventing the power-down command from running) all seem pretty hokey. Staying with the older version fixes that problem but introduces another. There is no fan control on this embedded system; the fan is always on. And there is a requirement for high CPU performance.

Attachments to follow.
Created attachment 13058 ACPIDump file
Created attachment 13059 DSDT.dsl

DSDT.dsl file. Note that recompiling it fails with:

Intel ACPI Component Architecture
ASL Optimizing Compiler version 20061109 [May 16 2007]
Copyright (C) 2000 - 2006 Intel Corporation
Supports ACPI Specification Revision 3.0a

DSDT.dsl   373:     Method (\_WAK, 1, NotSerialized)
Warning  1079 -     ^ Reserved method must return a value (_WAK)

DSDT.dsl   405:     Store (Local0, Local0)
Error    4049 -     ^ Method local variable is not initialized (Local0)

DSDT.dsl   410:     Store (Local0, Local0)
Error    4049 -     ^ Method local variable is not initialized (Local0)

ASL Input:  DSDT.dsl - 4997 lines, 164074 bytes, 1790 keywords
Compilation complete. 2 Errors, 1 Warnings, 0 Remarks, 531 Optimizations
Created attachment 13087 patch-allow-override-critical-trip-point

Hi Jon,

Are there any options in the BIOS to set trip points? If so, you can override them there. If not, please apply the patch I attached and try the boot parameter "thermal.crt=xxx". Note that the temperature of the thermal zone is not the same as the temperature of the processor, so a high critical trip point (like 100C or 125C) may be dangerous.
Hi Jon, any updates on this? Does the patch work for you?
Hi Len,

In the current code, thermal.crt can only be used to lower the critical trip point, so it can't solve the problem shown in this bug. The patch in comment #3 also allows userspace to override the trip point with a higher temperature. Any comments on this?
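The "lower only" rule described above can be sketched in user-space C. This is a simplified model, not the kernel source: `old_apply_crt()` and `celsius_to_dk()` are hypothetical helpers, and the sketch assumes trip temperatures are kept in tenths of a kelvin (the unit ACPI's `_CRT` returns) while `thermal.crt` is given in whole degrees Celsius. A value of 0 or less is treated as "not set"; the `thermal.crt=-1` case is left out.

```c
/* Hypothetical helper: whole degrees Celsius -> tenths of a kelvin,
 * using 273 as the offset (a simplification of 273.15). */
static long celsius_to_dk(long c)
{
	return (c + 273) * 10;
}

/*
 * Model of the pre-patch behavior: a positive thermal.crt value is
 * honored only when it is LOWER than the BIOS critical trip point;
 * a higher value is silently ignored. crt_c <= 0 means "not set".
 * Returns the effective critical trip point in tenths of a kelvin.
 */
static long old_apply_crt(long bios_crt_dk, long crt_c)
{
	if (crt_c > 0) {
		long crt_dk = celsius_to_dk(crt_c);
		if (crt_dk < bios_crt_dk)
			return crt_dk;	/* lowering is allowed */
	}
	return bios_crt_dk;	/* raising (or no value) keeps the BIOS value */
}
```

With the BIOS value from this report (60C, which is 3330 in these units), `old_apply_crt(3330, 95)` returns 3330 unchanged, which is why thermal.crt alone could not help here.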
Does Windows work on this board? Rui, I guess I'm okay with allowing higher crt thresholds if we include a warning. Please rebase the patch to tip.
Created attachment 16511 patch: allow overriding critical threshold to higher value
Len, the patch in comment #7 applies on top of 2.6.26-rc6. :)
I don't see the warning Len asked for in the latest patch. Can you please add it? I'm a little uneasy with the concept in general. If it's an embedded system you control, why can't you change the ACPI tables? And is it useful on a wider range of systems? Could people cook their systems by increasing the trip point? (I think yes.)

Could someone please reopen the bug? I am not allowed to do that.
The ACPI tables will not recompile with the available tools, as shown above. The manufacturer hasn't fixed them and won't. I'm just a user who needs a fix; it shouldn't be my problem to fix the manufacturer's broken BIOS (that is beyond my ability)!
(In reply to comment #9)
> I don't see the warning Len asked for in the latest patch? Can you please add
> it?

-		if (crt_k < tz->trips.critical.temperature)
-			tz->trips.critical.temperature = crt_k;
+		if (crt_k > tz->trips.critical.temperature)
+			printk(KERN_WARNING PREFIX
+				"Critical threshold %d C\n", crt);
+		tz->trips.critical.temperature = crt_k;

This is the warning printed when the user tries to increase the critical threshold.

> I'm a little uneasy with the concept in general. If it's an embedded
> system you control why can't you change the ACPI tables? And is it useful
> on a wider range of systems? Could people cook their systems with
> increasing the trip point? (I think yes)

Yes, they could. I agree that overriding the critical threshold with a higher value is dangerous. But it could also fix a lot of other laptops, just as the old behavior did. This is a regression for users who have a bogus critical threshold on their laptops.
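The patched rule in the diff above can be modeled the same way. This is a sketch, not the kernel function: `new_apply_crt()` and `celsius_to_dk()` are hypothetical names, temperatures are assumed to be in tenths of a kelvin, and writing to stderr stands in for `printk(KERN_WARNING PREFIX ...)` (the real message also carries the ACPI prefix).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: whole degrees Celsius -> tenths of a kelvin. */
static long celsius_to_dk(long c)
{
	return (c + 273) * 10;
}

/*
 * Model of the patched rule: any positive thermal.crt value replaces
 * the BIOS critical trip point, and raising it above the BIOS value
 * emits a warning. Returns the effective critical trip point in tenths
 * of a kelvin and reports through *warned whether the warning fired.
 */
static long new_apply_crt(long bios_crt_dk, long crt_c, bool *warned)
{
	*warned = false;
	if (crt_c <= 0)
		return bios_crt_dk;	/* parameter not set: keep BIOS value */

	long crt_dk = celsius_to_dk(crt_c);
	if (crt_dk > bios_crt_dk) {
		/* stands in for printk(KERN_WARNING ...) */
		fprintf(stderr, "Critical threshold %ld C\n", crt_c);
		*warned = true;
	}
	return crt_dk;	/* lower or higher, the override is applied */
}
```

The design choice mirrors the diff: the comparison no longer decides whether the override is applied, only whether a warning is logged.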
The bogus critical trip point is a BIOS bug -- apparently one that the manufacturer is unwilling to fix. So I'm changing the category of this report to ACPI/BIOS. There is no word on whether Windows works properly on this board; the assumption is that it does not.

Jon, does thermal.nocrt make the problem go away? Note that redefining the thermal trip point does not guarantee that the EC on the board will actually trigger around the time that temperature is exceeded, so it is not necessarily a solid solution. Please verify that thermal.nocrt makes the issue go away. If it does, please attach the output from dmidecode. The practical solution may simply be to invoke "thermal.nocrt" on this board automatically.
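Len's caveat about the EC can be illustrated with a deliberately simplified model. It assumes, for illustration only, that the OS compares the temperature with the critical trip point only when the EC raises a thermal notification, and that the EC raises one only when the temperature crosses one of its own internal thresholds, which the OS cannot change. `first_critical_action()` and the threshold values are hypothetical; the model ignores any periodic polling the driver may also do.

```c
#include <stddef.h>

/*
 * Simplified model (assumption): trip points are evaluated only when the
 * EC sends a notification, and the EC sends one only when the temperature
 * crosses one of its own fixed thresholds upward.
 * Returns the index of the first sample at which the OS would take the
 * critical action, or -1 if it never would.
 */
static int first_critical_action(const int *temps_c, size_t n,
				 const int *ec_thresholds_c, size_t n_thr,
				 int os_crt_c)
{
	for (size_t i = 1; i < n; i++) {
		int notified = 0;

		/* Did this sample cross any EC threshold upward? */
		for (size_t t = 0; t < n_thr; t++) {
			if (temps_c[i - 1] < ec_thresholds_c[t] &&
			    temps_c[i] >= ec_thresholds_c[t])
				notified = 1;
		}
		/* The (possibly overridden) trip point is checked only now. */
		if (notified && temps_c[i] >= os_crt_c)
			return (int)i;
	}
	return -1;
}
```

For a temperature ramp {50, 60, 70, 80, 90, 100} with hypothetical EC thresholds {50, 60}, a 60C trip point acts at the second sample, but a raised 95C trip point is never acted on, because no notification arrives above 60C in this model.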
Len,

Let me make sure I understand the procedure for doing this (currently on Ubuntu 8.04, 2.6.24-19). Adding thermal.nocrt=1 as a kernel boot option does not work (is the feature not in this version?). After hunting around, I tried modifying /etc/modprobe.d/options by adding the line

options thermal nocrt=1

but I can't find a way to tell whether this is correct. I will also try the thermal stress test, but it would be nice to know whether this is the right way to disable the critical check.

I think disabling critical actions entirely is much more dangerous than just bumping up the critical trip temperature to a more reasonable value.
(In reply to comment #13)
> I think disabling critical actions entirely is much more dangerous than just
> bumping up the critical trip temperature to a more reasonable value.

Agreed. Considering that we used to allow users to override the critical threshold with a higher value, the patch in comment #7 is not that bad. Without the patch, users can only:
1. wait for the computer to restart once the temperature reaches 60C, or
2. use the module parameter thermal.nocrt=1, at the risk of cooking their system.
Andi, Len,

I agree that thermal.nocrt=1 is more dangerous than using a higher critical threshold here. Can you please apply the patch in comment #7?
Jon, re: comment #13:

If you succeed in overriding the trip point, your setting will be visible in /proc/acpi/thermal_zone/*/trip_points.

In the case of thermal.nocrt, the trip point is unchanged, but the trip action is ignored.
In the case of thermal.crt=-1, the trip point simply vanishes from the files above.
In the case of thermal.crt=N, it is set to N.

The change here is the ability to make N higher than the BIOS default.

The problem with the reasoning "bumping up critical is less dangerous than disabling critical" is that bumping up the critical trip point may just give you the illusion of control that you don't actually have. The EC decides when, and whether, to send a thermal event, and that event is what we use to compare the temperature to the trip points. There is absolutely no assurance that the EC will do this near the new fake trip point.

And... "The greatest obstacle to discovery is not ignorance -- it is the illusion of knowledge."

So I don't really like it, but I'll apply the patch in comment #7 to 'keep the customer satisfied' :-)
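The cases listed above can be summarized as a small decision function. This is a hypothetical model, not kernel code: `struct crt_state`, `resolve_crt()` and `celsius_to_dk()` are invented names, temperatures are assumed to be in tenths of a kelvin, and how the options combine when nocrt and crt are given together is an assumption of this sketch.

```c
#include <stdbool.h>

/* Effective critical-trip configuration (hypothetical model). */
struct crt_state {
	bool present;	/* trip point listed in trip_points */
	long temp_dk;	/* trip temperature, tenths of a kelvin */
	bool action;	/* whether crossing it triggers the critical action */
};

/* Hypothetical helper: whole degrees Celsius -> tenths of a kelvin. */
static long celsius_to_dk(long c)
{
	return (c + 273) * 10;
}

/*
 *   crt == -1  -> trip point removed
 *   crt == N>0 -> trip point set to N (after the patch, N may exceed BIOS)
 *   otherwise  -> BIOS value kept
 *   nocrt      -> trip point left as computed, but its action is ignored
 */
static struct crt_state resolve_crt(long bios_dk, int crt_c, bool nocrt)
{
	struct crt_state s = { true, bios_dk, true };

	if (crt_c == -1) {
		s.present = false;
		s.action = false;
	} else if (crt_c > 0) {
		s.temp_dk = celsius_to_dk(crt_c);
	}
	if (nocrt)
		s.action = false;
	return s;
}
```

For example, `resolve_crt(3330, 95, false)` models thermal.crt=95 on this board: the trip point moves from 60C to 95C and remains armed, which is exactly the case the patch enables.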
Shipped in linux-2.6.28-rc1. Closed.

commit 22a94d79a34bf010d11996d30eed8ee3fc1a4fbf
Author: Zhang Rui <rui.zhang@intel.com>
Date:   Fri Oct 17 02:41:20 2008 -0400

    ACPI: Allow overriding to higher critical trip point.

    http://bugzilla.kernel.org/show_bug.cgi?id=9129