Most recent kernel where this bug did not occur: 2.6.18
Distribution: Debian unstable
Hardware Environment: DELL Inspiron 9400
Software Environment: Debian lenny/KDE 3.5

Problem Description:
I didn't have this bug before 2.6.19 on my Debian kernel, but since then I hit this critical trip point anywhere from three times a day to only once a week. I don't think it's only a conversion bug (see below). My dual core is usually reported in the logs to be between 40 C and 55 C, and never reaches 60 C under load. The highest value reported so far is 5745 C for 57 C.

# grep -A 4 "Critical trip point" syslog.?
syslog.2:Jul 5 00:33:06 gosseyn kernel: ACPI: Critical trip point
syslog.2-Jul 5 00:33:06 gosseyn kernel: Critical temperature reached (1135 C), shutting down.
syslog.2-Jul 5 00:33:06 gosseyn shutdown[13784]: shutting down for system halt
syslog.2-Jul 5 00:33:06 gosseyn init: Switching to runlevel: 0
syslog.2-Jul 5 00:33:08 gosseyn kernel: Critical temperature reached (41 C), shutting down.
--
syslog.3:Jul 4 00:49:49 gosseyn kernel: ACPI: Critical trip point
syslog.3-Jul 4 00:49:49 gosseyn kernel: Critical temperature reached (5487 C), shutting down.
syslog.3-Jul 4 00:49:49 gosseyn shutdown[10029]: shutting down for system halt
syslog.3-Jul 4 00:49:49 gosseyn init: Switching to runlevel: 0
syslog.3-Jul 4 00:49:51 gosseyn kernel: Critical temperature reached (53 C), shutting down.
--
syslog.5:Jul 1 21:49:23 gosseyn kernel: ACPI: Critical trip point
syslog.5-Jul 1 21:49:23 gosseyn kernel: Critical temperature reached (1135 C), shutting down.
syslog.5-Jul 1 21:49:23 gosseyn shutdown[8231]: shutting down for system halt
syslog.5-Jul 1 21:49:23 gosseyn init: Switching to runlevel: 0
syslog.5-Jul 1 21:50:20 gosseyn syslog-ng[3338]: syslog-ng starting up; version='2.0.0'
--
syslog.6:Jul 1 00:04:08 gosseyn kernel: ACPI: Critical trip point
syslog.6-Jul 1 00:04:08 gosseyn kernel: Critical temperature reached (1135 C), shutting down.
syslog.6-Jul 1 00:04:08 gosseyn shutdown[12701]: shutting down for system halt
syslog.6-Jul 1 00:04:08 gosseyn init: Switching to runlevel: 0
syslog.6-Jul 1 00:04:10 gosseyn kernel: Critical temperature reached (41 C), shutting down.

Steps to reproduce:
Random. This bug can occur when the machine is idle or under heavy load, in console-only work or with a 3D application.
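For anyone triaging similar logs, the grep above can be extended into a small script that pulls out each reported temperature and marks the ones that cannot be real. This is only a rough sketch: the regular expression follows the message text quoted in this report, while `PLAUSIBLE_MAX_C` and the function name `critical_readings` are illustrative assumptions, not anything taken from the kernel.

```python
import re

# Matches the kernel message quoted in the report, e.g.
# "Critical temperature reached (1135 C), shutting down."
CRITICAL_RE = re.compile(r"Critical temperature reached \((-?\d+) C\)")

# Assumption: readings above this value cannot be a real CPU temperature.
PLAUSIBLE_MAX_C = 125


def critical_readings(lines):
    """Return (temperature, plausible) pairs for each critical-temperature line."""
    readings = []
    for line in lines:
        match = CRITICAL_RE.search(line)
        if match:
            temp = int(match.group(1))
            readings.append((temp, temp <= PLAUSIBLE_MAX_C))
    return readings


# Sample lines copied from the syslog excerpt in the report.
sample = [
    "Jul 5 00:33:06 gosseyn kernel: ACPI: Critical trip point",
    "Jul 5 00:33:06 gosseyn kernel: Critical temperature reached (1135 C), shutting down.",
    "Jul 5 00:33:08 gosseyn kernel: Critical temperature reached (41 C), shutting down.",
]
for temp, plausible in critical_readings(sample):
    print(temp, "plausible" if plausible else "implausible")
```

In the excerpt, each bogus value is followed two seconds later by a normal one, which suggests a transient bad read rather than real overheating.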
Created attachment 12401 [details] lsmod output
Created attachment 12402 [details] cpuid output
Created attachment 12403 [details] sensors output
Created attachment 12404 [details] sensors.conf file
Is this bug still present in 2.6.22.3? Can you reproduce this bug with CONFIG_HWMON=n? Can you reproduce this bug with CONFIG_I2C_I801=n?
Yes, it's still present in 2.6.22.3. I am compiling a new kernel without these options and will report the result within a week.
(In reply to comment #6)
> Yes, it's still present in 2.6.22.3
>
> I compile a new kernel with(out) these options. I would give you a result
> within a week.

Another reboot this morning because of a critical temperature reached (879 C). Where can I set a trace for this?
So you confirm that the problem still happens with CONFIG_HWMON=n and CONFIG_I2C_I801=n?
Yes. Do you want me to set CONFIG_ACPI_DEBUG or something else when compiling the kernel?
Then it's an ACPI bug.
This did NOT happen in 2.6.18, and it started happening in 2.6.19 and continues to happen in 2.6.23-rc3?

Please build 2.6.23-rc3 or later with CONFIG_HWMON=n and set thermal.nocrt=1 to disable critical trip point actions.

Please attach the complete output from
dmesg -s64000

Please include the output from
more /proc/acpi/thermal_zone/*/* | cat

Please read /proc/acpi/thermal_zone/*/temperature continuously and see if you can observe it jump to erroneous values.
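One way to do the continuous read is a small polling script. The following is a minimal sketch, not a tested tool: it assumes each temperature file contains a line of the form "temperature: 45 C", and the thresholds `max_step` and `max_temp`, as well as the function names, are arbitrary illustrative choices.

```python
import glob
import re
import time

# Assumed line format of /proc/acpi/thermal_zone/*/temperature:
# "temperature:             45 C"
TEMP_RE = re.compile(r"temperature:\s*(-?\d+)\s*C")


def parse_temperature(text):
    """Return the temperature in degrees C, or None if the text does not match."""
    match = TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def is_suspicious(previous, current, max_step=20, max_temp=110):
    """Flag a reading that is absurdly high or jumps too far from the last good one."""
    if current > max_temp:
        return True
    return previous is not None and abs(current - previous) > max_step


def watch(pattern="/proc/acpi/thermal_zone/*/temperature", interval=1.0):
    """Poll every matching file forever and print suspicious readings."""
    last_good = {}
    while True:
        for path in glob.glob(pattern):
            with open(path) as handle:
                temp = parse_temperature(handle.read())
            if temp is None:
                continue
            if is_suspicious(last_good.get(path), temp):
                print(time.strftime("%H:%M:%S"), path, "suspicious reading:", temp, "C")
            else:
                # Only remember sane values so one glitch does not flag the next read.
                last_good[path] = temp
        time.sleep(interval)
```

Calling `watch()` from a Python session (with thermal.nocrt=1 set so the machine stays up) should print a timestamped line whenever a value like 1135 C appears.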
After recompiling again with CONFIG_HWMON=n, the bug has not occurred for 10 days.
comment #12 seems to contradict comment #9 -- can you clarify?
#9 is obsolete.
(In reply to comment #12)
> After a new recompilation again with CONFIG_HWMON=n the bug doesn't occur
> anymore since 10 days.

What was the difference between both kernels then?
It looks like the .config I used for the first compilation was not the right one. The only other things I modified for this compilation are these:

-CONFIG_PM_LEGACY=y
+# CONFIG_PM_LEGACY is not set
-# CONFIG_ACPI_DEBUG is not set
+CONFIG_ACPI_DEBUG=y

One other clue: with the standard Debian kernel, the bug doesn't occur if the i801 module and the hwmon module aren't loaded in /etc/modules.
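When it is unclear which .config a kernel was built from, comparing the two files option by option avoids this kind of confusion. The sketch below is one possible way to do it, assuming the usual .config line forms ("CONFIG_X=value" and "# CONFIG_X is not set"); the function names are made up for illustration, and an option missing from a file is treated as "n".

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option name to its value; unset options map to "n"."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that changed."""
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            changes[name] = (before, after)
    return changes


# The two options from the diff quoted in the comment above.
old = "CONFIG_PM_LEGACY=y\n# CONFIG_ACPI_DEBUG is not set\n"
new = "# CONFIG_PM_LEGACY is not set\nCONFIG_ACPI_DEBUG=y\n"
for name, (before, after) in diff_configs(old, new).items():
    print(name, before, "->", after)
```

Run against the real files, the same function would also show whether CONFIG_HWMON and CONFIG_I2C_I801 actually differed between the two builds.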
(In reply to comment #16)
> One other clue: with standard debian kernel the bug don't occur if i801
> module and hwmon module aren't loaded in /etc/modules.

Interesting. What version is this standard Debian kernel?

Please try blacklisting only i2c-i801, and then only hwmon, in /etc/modules, to figure out which driver is conflicting exactly.

BTW, your lsmod output in comment #1 suggests that hwmon is built into the kernel and not as a module, so blacklisting it won't work; you'd need to blacklist the coretemp driver itself instead. Please clarify what you blacklisted exactly.
Today I installed 2.6.22-4. hwmon is built in; hwmon-vid, coretemp and i2c-i801 are built as modules. First, I will try to reproduce the bug with hwmon-vid only, then with coretemp + i2c-i801. The latter two were suggested by sensors-detect for this chipset:

SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
hwmon-vid is a helper module, don't bother testing it, it won't change a thing. BTW, why are you loading it at all, given that none of the other modules you use need it? Only the i2c-i801 driver is for the Intel ICH7 SMBus. The coretemp driver reports the CPU temperature directly. They do _not_ depend on each other, so please test them separately. The whole point of the test is to find out which of these two drivers is causing trouble.
The 2.6.22-4 Debian kernel seems to correspond to the 2.6.22-2 vanilla version. I have not been able to reproduce the bug with this version for weeks. You can close it.
Marking as fixed then.