Bug 9041

Summary: kernel acpi reads wrong temperature - critical shutdown
Product: ACPI
Reporter: Christoph Resch (shanti)
Component: Power-Thermal
Assignee: acpi_config-processors
Status: REJECTED WILL_NOT_FIX
Severity: high
CC: acpi-bugzilla, protasnb, yyyeer.bo
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.18.8-0.5
Subsystem:
Regression: ---
Bisected commit-id:

Description Christoph Resch 2007-09-19 15:55:54 UTC
Most recent kernel where this bug did not occur: unknown
Distribution:
Hardware Environment: https://bugzilla.novell.com/attachment.cgi?id=159317
Software Environment: opensuse 10.2
Problem Description:

My system regularly shuts down on a "wrong" temperature alert from ACPI:
the critical trip point is reached because ACPI reports a bogus value.



Sep 19 23:27:44 zion kernel: ACPI: Critical trip point
Sep 19 23:27:44 zion kernel: Critical temperature reached (91 C), shutting down.
Sep 19 23:27:44 zion kernel: ACPI: Unable to turn cooling device [dffecd88] 'on'
Sep 19 23:27:44 zion shutdown[3753]: shutting down for system halt
Sep 19 23:27:44 zion powersaved[3436]: WARNING (checkTemperatureStateChanges:218) Temperature state changed to critical.
Sep 19 23:27:46 zion init: Switching to runlevel: 0
Sep 19 23:27:50 zion kernel: Critical temperature reached (43 C), shutting down.

Thanks for your support.

Steps to reproduce: none known; just wait for it to happen.
Comment 1 Christoph Resch 2007-09-19 15:56:29 UTC
https://bugzilla.novell.com/show_bug.cgi?id=259992 has more history
Comment 2 Christoph Resch 2007-09-20 02:38:47 UTC
The trip point is set to 60°C, and I have never reached that value. Six seconds later the value reads correctly, but since powersaved (or whatever else handles the event) has no margin for retrying the ACPI read, the system is doomed to shut down.
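
For illustration only, here is a minimal sketch of the kind of retry margin described above: a critical state is declared only after several consecutive readings at or above the trip point, so a single spike such as 91 C followed by 43 C would be ignored. The class name `DebouncedTrip` and the `required` parameter are hypothetical; this is not how powersaved or the kernel thermal driver is implemented.

```python
from collections import deque


class DebouncedTrip:
    """Report a critical state only after `required` consecutive
    readings at or above the trip point (hypothetical helper)."""

    def __init__(self, trip_point_c, required=3):
        self.trip_point_c = trip_point_c
        # Keep only the most recent `required` readings.
        self.window = deque(maxlen=required)

    def update(self, temp_c):
        """Record one reading; return True if the trip is confirmed."""
        self.window.append(temp_c)
        window_full = len(self.window) == self.window.maxlen
        return window_full and all(t >= self.trip_point_c for t in self.window)


# Values taken from the log in this report: one spike, then a normal reading.
trip = DebouncedTrip(trip_point_c=60.0, required=3)
spike_results = [trip.update(t) for t in (43.0, 91.0, 43.0)]
```

With the 60°C trip point from this report, the single 91 C reading never confirms a critical state. The trade-off is that a genuine overheat is acted on a few readings later, which may be why the developers were reluctant to filter values in software.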

Developers mentioned that it is not a good idea to recode routines to handle bogus values. Is this a bug in the kernel chipset driver?
I also reported this to my BIOS supplier; they (of course) refused support, saying they have a magnificent BIOS without errors (it is only release 12).

There seem to be some areas in the driver code that are incomplete.

My system is an AMD3800X2 on an RS480 mainboard (Shuttle ST20G5). The chipset module is "it87". Disabling thermal monitoring by stopping lm_sensors kind of defeats the purpose, so I think it must be fixed in the driver.

I think there are several issues with power saving on this hardware. The WHITE SCREEN crash ( http://forums.suselinuxsupport.de/index.php?showtopic=36370 ) also happens on this hardware, but since no mortal has access to the FGLRX source code, debugging it is not going to be easy.

I want to get rid of this issue. Please comment if you need more debug information and I will post it; just tell me how I can help :-)

Thanks for the help.
Comment 3 Natalie Protasevich 2007-11-28 18:54:47 UTC
Christoph,
Have you had a chance to try a newer kernel, such as a recent 2.6.24+? This kernel version is definitely too old.
Thanks.
Comment 4 Alexey Starikovskiy 2007-12-14 03:14:49 UTC
Christoph,
lm_sensors and ACPI use the same hardware to read the temperature and cannot coexist on many systems, yours included.
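
As a rough way to check for the conflict described above, the following sketch parses text in the /proc/modules format (module name is the first field on each line) and reports whether a hardware-monitoring module and the ACPI thermal module are loaded together. The function names are hypothetical, "it87" is the module named in this report, and the assumption that the ACPI thermal driver appears as a module called "thermal" may not hold on kernels where it is built in.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def sensors_and_acpi_both_loaded(proc_modules_text,
                                 sensor_module="it87",
                                 acpi_module="thermal"):
    """True if both modules are loaded (module names are assumptions)."""
    names = loaded_modules(proc_modules_text)
    return sensor_module in names and acpi_module in names


# Made-up sample input in the /proc/modules format, not taken from the report.
sample = (
    "it87 25000 0 - Live 0xf8a00000\n"
    "thermal 18000 1 - Live 0xf8b00000\n"
    "hwmon 5000 1 it87, Live 0xf8c00000\n"
)
conflict = sensors_and_acpi_both_loaded(sample)
```

On a real system the input would come from reading /proc/modules. A positive result only shows that both drivers are present; it does not prove that they are accessing the same sensor chip.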