Bug 8893 - Critical temperature reached (5487 C) if CONFIG_HWMON=y
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 high
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-16 06:27 UTC by Encolpe Degoute
Modified: 2008-06-13 21:12 UTC

See Also:
Kernel Version: 2.6.19
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
lsmod output (4.50 KB, text/plain)
2007-08-16 06:29 UTC, Encolpe Degoute
cpuid output (2.48 KB, text/plain)
2007-08-16 06:29 UTC, Encolpe Degoute
sensors output (196 bytes, text/plain)
2007-08-16 06:32 UTC, Encolpe Degoute
sensors.conf file (83.08 KB, application/octet-stream)
2007-08-16 06:33 UTC, Encolpe Degoute

Description Encolpe Degoute 2007-08-16 06:27:45 UTC
Most recent kernel where this bug did not occur: 2.6.18
Distribution: Debian unstable
Hardware Environment: DELL Inspiron 9400
Software Environment: Debian lenny/KDE 3.5
Problem Description:

I did not see this bug before 2.6.19 on my Debian kernel, but since then the
critical trip point can trigger anywhere from three times a day to only once a week.
I don't think it is only a conversion bug (see below).
My dual-core CPU is usually reported in the logs at between 40 C and 55 C, and never reaches 60 C under load. The highest reading so far is 5745 C, reported for what should be 57 C.

# grep -A 4 "Critical trip point" syslog.?
syslog.2:Jul  5 00:33:06 gosseyn kernel: ACPI: Critical trip point
syslog.2-Jul  5 00:33:06 gosseyn kernel: Critical temperature reached (1135 C), shutting down.
syslog.2-Jul  5 00:33:06 gosseyn shutdown[13784]: shutting down for system halt
syslog.2-Jul  5 00:33:06 gosseyn init: Switching to runlevel: 0
syslog.2-Jul  5 00:33:08 gosseyn kernel: Critical temperature reached (41 C), shutting down.
--
syslog.3:Jul  4 00:49:49 gosseyn kernel: ACPI: Critical trip point
syslog.3-Jul  4 00:49:49 gosseyn kernel: Critical temperature reached (5487 C), shutting down.
syslog.3-Jul  4 00:49:49 gosseyn shutdown[10029]: shutting down for system halt
syslog.3-Jul  4 00:49:49 gosseyn init: Switching to runlevel: 0
syslog.3-Jul  4 00:49:51 gosseyn kernel: Critical temperature reached (53 C), shutting down.
--
syslog.5:Jul  1 21:49:23 gosseyn kernel: ACPI: Critical trip point
syslog.5-Jul  1 21:49:23 gosseyn kernel: Critical temperature reached (1135 C), shutting down.
syslog.5-Jul  1 21:49:23 gosseyn shutdown[8231]: shutting down for system halt
syslog.5-Jul  1 21:49:23 gosseyn init: Switching to runlevel: 0
syslog.5-Jul  1 21:50:20 gosseyn syslog-ng[3338]: syslog-ng starting up; version='2.0.0'
--
syslog.6:Jul  1 00:04:08 gosseyn kernel: ACPI: Critical trip point
syslog.6-Jul  1 00:04:08 gosseyn kernel: Critical temperature reached (1135 C), shutting down.
syslog.6-Jul  1 00:04:08 gosseyn shutdown[12701]: shutting down for system halt
syslog.6-Jul  1 00:04:08 gosseyn init: Switching to runlevel: 0
syslog.6-Jul  1 00:04:10 gosseyn kernel: Critical temperature reached (41 C), shutting down.
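
The readings in the log excerpt above can be pulled out and sanity-checked with a short script. This is only a sketch: the regular expression follows the kernel message as quoted in this report, and the 125 C bound (PLAUSIBLE_MAX_C) is an assumed cut-off for illustration, not a value from the ACPI driver.

```python
import re

# Matches the kernel message quoted in the report, e.g.
# "Critical temperature reached (5487 C), shutting down."
CRITICAL_RE = re.compile(r"Critical temperature reached \((\d+) C\)")

# Assumed sanity bound (not from the report): anything above this is
# treated as a bogus reading rather than a real CPU temperature.
PLAUSIBLE_MAX_C = 125


def critical_readings(lines):
    """Yield (temperature_c, is_plausible) for each matching log line."""
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            temp_c = int(match.group(1))
            yield temp_c, temp_c <= PLAUSIBLE_MAX_C


# Sample lines copied from the syslog.3 excerpt in this report.
sample = [
    "Jul  4 00:49:49 gosseyn kernel: ACPI: Critical trip point",
    "Jul  4 00:49:49 gosseyn kernel: Critical temperature reached (5487 C), shutting down.",
    "Jul  4 00:49:51 gosseyn kernel: Critical temperature reached (53 C), shutting down.",
]
for temp_c, plausible in critical_readings(sample):
    print(temp_c, "plausible" if plausible else "bogus")
```

Feeding it the full syslog files (for example via open("syslog.3")) would list every critical reading and make the bogus ones easy to count.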

Steps to reproduce:

Random.
The bug can occur when the machine is idle or under heavy load, whether working in console only or running a 3D application.
Comment 1 Encolpe Degoute 2007-08-16 06:29:04 UTC
Created attachment 12401
lsmod output
Comment 2 Encolpe Degoute 2007-08-16 06:29:53 UTC
Created attachment 12402
cpuid output
Comment 3 Encolpe Degoute 2007-08-16 06:32:54 UTC
Created attachment 12403
sensors output
Comment 4 Encolpe Degoute 2007-08-16 06:33:23 UTC
Created attachment 12404
sensors.conf file
Comment 5 Jean Delvare 2007-08-16 06:52:55 UTC
Is this bug still present in 2.6.22.3?

Can you reproduce this bug with CONFIG_HWMON=n?

Can you reproduce this bug with CONFIG_I2C_I801=n?
Comment 6 Encolpe Degoute 2007-08-19 09:17:47 UTC
Yes, it's still present in 2.6.22.3

I am compiling a new kernel with(out) these options. I will give you a result within a week.
Comment 7 Encolpe Degoute 2007-08-20 02:52:47 UTC
(In reply to comment #6)
> Yes, it's still present in 2.6.22.3
> 
> I am compiling a new kernel with(out) these options. I will give you a result
> within a week.

Another reboot this morning because of a critical temperature reached (879 C).
Where can I set a trace for this?
Comment 8 Jean Delvare 2007-08-20 03:10:24 UTC
So you confirm that the problem still happens with CONFIG_HWMON=n and CONFIG_I2C_I801=n?
Comment 9 Encolpe Degoute 2007-08-20 15:40:58 UTC
Yes.

Do you want me to set CONFIG_ACPI_DEBUG or something else when compiling the kernel?
Comment 10 Jean Delvare 2007-08-20 23:47:45 UTC
Then it's an ACPI bug.
Comment 11 Len Brown 2007-08-24 00:52:42 UTC
This did NOT happen in 2.6.18,
and it started happening in 2.6.19
and continues to happen in 2.6.23-rc3?

Please build 2.6.23-rc3 or later with CONFIG_HWMON=n
and set thermal.nocrt=1 to disable critical trip point actions.

Please attach the complete output from dmesg -s64000

please include the output from more /proc/acpi/thermal_zone/*/* | cat

please read /proc/acpi/thermal_zone/*/temperature continuously
and see if you can observe it jump to erroneous values.
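
A minimal sketch of the continuous read requested above. It assumes each temperature file contains a line of the form "temperature:             45 C" (the old /proc ACPI layout); the thresholds max_c and max_jump_c are arbitrary illustrative values, and the helper names are hypothetical.

```python
import glob
import re
import time

# Assumed file format of the old /proc ACPI interface:
#   "temperature:             45 C"
TEMP_RE = re.compile(r"temperature:\s*(-?\d+)\s*C")


def parse_temperature(text):
    """Return the temperature in degrees C, or None if the text does not match."""
    match = TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def is_suspicious(previous_c, current_c, max_c=125, max_jump_c=30):
    """Flag readings outside 0..max_c or that jump by more than max_jump_c."""
    if current_c > max_c or current_c < 0:
        return True
    return previous_c is not None and abs(current_c - previous_c) > max_jump_c


def watch(pattern="/proc/acpi/thermal_zone/*/temperature", interval_s=1.0):
    """Poll every matching file forever and print suspicious readings."""
    last = {}
    while True:
        for path in glob.glob(pattern):
            with open(path) as handle:
                temp_c = parse_temperature(handle.read())
            if temp_c is None:
                continue
            # The reading right after a spike is reported too, because it
            # jumps back down by more than max_jump_c.
            if is_suspicious(last.get(path), temp_c):
                print(time.strftime("%H:%M:%S"), path, temp_c, "C (suspicious)")
            last[path] = temp_c
        time.sleep(interval_s)
```

Calling watch() starts the polling loop; it runs until interrupted, so it is best left running in a spare console with thermal.nocrt=1 set as described above.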
Comment 12 Encolpe Degoute 2007-08-30 12:46:19 UTC
After recompiling again with CONFIG_HWMON=n, the bug has not occurred for 10 days.
Comment 13 Len Brown 2007-08-30 21:54:14 UTC
comment #12 seems to contradict comment #9 -- can you clarify?
Comment 14 Encolpe Degoute 2007-09-01 12:22:33 UTC
#9 is obsolete.
Comment 15 Jean Delvare 2007-09-01 12:58:45 UTC
(In reply to comment #12)
> After recompiling again with CONFIG_HWMON=n, the bug has not occurred for
> 10 days.

What was the difference between both kernels then?
Comment 16 Encolpe Degoute 2007-09-02 07:58:46 UTC
It is likely that the .config I used for the first compilation wasn't the right one.

The only other things I modified for this compilation are these:

-CONFIG_PM_LEGACY=y
+# CONFIG_PM_LEGACY is not set


-# CONFIG_ACPI_DEBUG is not set
+CONFIG_ACPI_DEBUG=y


One other clue: with the standard Debian kernel the bug doesn't occur if the i801 and hwmon modules aren't loaded via /etc/modules.
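
To answer the question of what differs between two kernel builds, the two .config files can be compared option by option. The following is a rough sketch with hypothetical helper names; the PM_LEGACY and ACPI_DEBUG lines mirror the changes listed in this comment, while the CONFIG_HWMON lines are added for illustration only.

```python
def parse_config(text):
    """Map option name -> value; '# CONFIG_X is not set' maps to None."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_diff(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that changed.

    An option missing from a file is treated like one that is not set.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


# Illustrative inputs; real use would read the two .config files from disk.
old_config = "CONFIG_PM_LEGACY=y\n# CONFIG_ACPI_DEBUG is not set\nCONFIG_HWMON=y\n"
new_config = "# CONFIG_PM_LEGACY is not set\nCONFIG_ACPI_DEBUG=y\n# CONFIG_HWMON is not set\n"
for name, (before, after) in config_diff(old_config, new_config).items():
    print(name, before, "->", after)
```

A value of None in the output means the option is not set in that build.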
Comment 17 Jean Delvare 2007-09-02 08:22:36 UTC
(In reply to comment #16)
> One other clue: with the standard Debian kernel the bug doesn't occur if the
> i801 and hwmon modules aren't loaded via /etc/modules.

Interesting. What version is this standard debian kernel?

Please try blacklisting only i2c-i801, and then only hwmon, in /etc/modules, to figure out which driver is conflicting exactly.

BTW, your lsmod output in comment #1 suggests that hwmon is built into the kernel and not as a module, so blacklisting it won't work, you'd need to blacklist the coretemp driver itself instead. Please clarify what you blacklisted exactly.
Comment 18 Encolpe Degoute 2007-09-02 15:01:21 UTC
Today I installed 2.6.22-4. hwmon is built in; hwmon-vid, coretemp and i2c-i801 are built as modules.
First, I will try to reproduce the bug with hwmon-vid only, then with coretemp + i2c-i801.
The last two were proposed by sensors-detect for this chipset:
SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
Comment 19 Jean Delvare 2007-09-03 00:41:06 UTC
hwmon-vid is a helper module, don't bother testing it, it won't change a thing. BTW, why are you loading it at all, given that none of the other modules you use need it?

Only the i2c-i801 driver is for the Intel ICH7 SMBus. The coretemp driver reports the CPU temperature directly. They do _not_ depend on each other, so please test them separately. The whole point of the test is to find out which of these two drivers is causing trouble.
Comment 20 Encolpe Degoute 2007-10-11 02:44:18 UTC
The 2.6.22-4 Debian kernel seems to be the 2.6.22-2 vanilla version.
I have not been able to reproduce the bug with this version for weeks.
You can close it.
Comment 21 Fu Michael 2007-11-12 17:32:31 UTC
mark as fixed then.
