Bug 8893
Summary: | Critical temperature reached (5487 C) if CONFIG_HWMON=y | ||
---|---|---|---|
Product: | ACPI | Reporter: | Encolpe Degoute (encolpe) |
Component: | Power-Thermal | Assignee: | Len Brown (lenb) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | acpi-bugzilla, encolpe, jdelvare |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.19 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | lsmod output, cpuid output, sensors output, sensors.conf file |
Description
Encolpe Degoute
2007-08-16 06:27:45 UTC
Created attachment 12401 [details]
lsmod output
Created attachment 12402 [details]
cpuid output
Created attachment 12403 [details]
sensors output
Created attachment 12404 [details]
sensors.conf file
Is this bug still present in 2.6.22.3?

Can you reproduce this bug with CONFIG_HWMON=n? Can you reproduce this bug with CONFIG_I2C_I801=n?

Yes, it's still present in 2.6.22.3.

I am compiling a new kernel with(out) these options. I will give you a result within a week.

(In reply to comment #6)
> Yes, it's still present in 2.6.22.3
>
> I compile a new kernel with(out) these options. I would give you a result
> within a week.

There was another reboot this morning because a critical temperature was reached (879 C). Where can I set a trace for this?

So you confirm that the problem still happens with CONFIG_HWMON=n and CONFIG_I2C_I801=n?

Yes. Do you want me to set CONFIG_ACPI_DEBUG or something else when compiling the kernel?

Then it's an ACPI bug.

This did NOT happen in 2.6.18, and it started happening in 2.6.19 and continues to happen in 2.6.23-rc3?

Please build 2.6.23-rc3 or later with CONFIG_HWMON=n and set thermal.nocrt=1 to disable critical trip point actions.

Please attach the complete output from dmesg -s64000

Please include the output from more /proc/acpi/thermal_zone/*/* | cat

Please read /proc/acpi/thermal_zone/*/temperature continuously and see if you can observe it jump to erroneous values.

After recompiling again with CONFIG_HWMON=n, the bug has not occurred for 10 days.

Comment #12 seems to contradict comment #9 -- can you clarify?

#9 is obsolete.

(In reply to comment #12)
> After a new recompilation again with CONFIG_HWMON=n the bug doesn't occur
> anymore since 10 days.

What was the difference between both kernels then?

It's likely that the .config I used for the first compilation wasn't the right one. The only other things I modified for this compilation are these:

-CONFIG_PM_LEGACY=y
+# CONFIG_PM_LEGACY is not set
-# CONFIG_ACPI_DEBUG is not set
+CONFIG_ACPI_DEBUG=y

One other clue: with the standard Debian kernel the bug doesn't occur if the i801 module and the hwmon module aren't loaded in /etc/modules.
(In reply to comment #16)
> One other clue: with standard debian kernel the bug don't occur if i801
> module
> and hwmon module aren't loaded in /etc/modules.

Interesting. What version is this standard Debian kernel?

Please try blacklisting only i2c-i801, and then only hwmon, in /etc/modules, to figure out exactly which driver is conflicting.

BTW, your lsmod output in comment #1 suggests that hwmon is built into the kernel and not as a module, so blacklisting it won't work; you'd need to blacklist the coretemp driver itself instead. Please clarify what you blacklisted exactly.

Today I installed 2.6.22-4. hwmon is built in; hwmon-vid, coretemp and i2c-i801 are built as modules.

First, I will try to reproduce the bug with hwmon-vid only, then with coretemp + i2c-i801. These last two were proposed by sensors-detect for this chipset:
SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)

hwmon-vid is a helper module, don't bother testing it, it won't change a thing. BTW, why are you loading it at all, given that none of the other modules you use need it?

Only the i2c-i801 driver is for the Intel ICH7 SMBus. The coretemp driver reports the CPU temperature directly. They do _not_ depend on each other, so please test them separately. The whole point of the test is to find out which of these two drivers is causing trouble.

The 2.6.22-4 Debian kernel seems to be the 2.6.22-2 vanilla version. I have not been able to reproduce the bug with this version for weeks. You can close it.

Mark as fixed then.
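For anyone repeating the request earlier in this thread to read /proc/acpi/thermal_zone/*/temperature continuously and watch for jumps to erroneous values, the following is a minimal Python sketch, not something that was used in this bug. It assumes the legacy procfs file contains a line such as "temperature:             45 C". The function names and the 150 C plausibility ceiling are hypothetical choices made for illustration.

```python
import glob
import re
import time

# Assumed format of the legacy procfs file, e.g. "temperature:             45 C".
TEMP_RE = re.compile(r"(-?\d+)\s*C\b")

# Hypothetical plausibility ceiling in degrees C; not taken from the thread.
MAX_PLAUSIBLE_C = 150


def parse_temperature(text):
    """Return the first integer temperature (degrees C) found in text, or None."""
    match = TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def is_implausible(temp_c, limit=MAX_PLAUSIBLE_C):
    """Flag readings below zero or above the assumed ceiling."""
    return temp_c < 0 or temp_c > limit


def poll(pattern="/proc/acpi/thermal_zone/*/temperature", interval=1.0, rounds=None):
    """Read every matching file once per interval and print suspicious readings.

    rounds=None polls until interrupted; an integer limits the number of passes.
    """
    count = 0
    while rounds is None or count < rounds:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as handle:
                    temp = parse_temperature(handle.read())
            except OSError:
                continue  # the zone may disappear or be unreadable
            if temp is not None and is_implausible(temp):
                stamp = time.strftime("%H:%M:%S")
                print(f"{stamp} {path}: suspicious reading {temp} C")
        count += 1
        time.sleep(interval)


if __name__ == "__main__":
    poll()
```

With this threshold, readings like the 5487 C and 879 C values reported in this bug would be logged with a timestamp, which can then be compared against the dmesg output.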