Bug 11309 - THRM temperature reported as 3428C
Summary: THRM temperature reported as 3428C
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC
Hardware: All Linux
Importance: P1 normal
Assignee: Alexey Starikovskiy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-12 06:24 UTC by James Ettle
Modified: 2008-11-07 12:01 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.26.2-2.fc8
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
customized DSDT (222.17 KB, application/octet-stream)
2008-08-12 18:34 UTC, Zhang Rui
try the debug patch (2.11 KB, patch)
2008-08-13 00:25 UTC, ykzhao
patch 1/4: Don't issue the burst disable command if EC exits the burst mode (1.39 KB, patch)
2008-09-16 18:12 UTC, ykzhao
Patch 2/4: Clear the query_pending bit only after processing EC notification event (2.21 KB, patch)
2008-09-16 18:13 UTC, ykzhao
Patch 3/4: Switch to polling mode when there is no EC GPE interrupt for some EC transactions (4.38 KB, patch)
2008-09-16 18:16 UTC, ykzhao
patch 4/4: Add some delay in EC GPE handler to avoid EC GPE storm (5.23 KB, patch)
2008-09-16 18:17 UTC, ykzhao
fast transaction (14.94 KB, patch)
2008-09-16 23:45 UTC, Alexey Starikovskiy
patch vs 2.6.27-rc7 (16.15 KB, application/octet-stream)
2008-09-25 12:17 UTC, Len Brown

Description James Ettle 2008-08-12 06:24:53 UTC
Latest working kernel version: 2.6.24.7-92.fc8
Earliest failing kernel version: 2.6.26.2-2.fc8
Distribution: Fedora 8
Hardware Environment: Clevo M720R notebook with Intel T8100 processor
Software Environment: i686
Problem Description:

The contents of /proc/acpi/thermal_zone/THRM/temperature reported a temperature of 3428C. This seemed to happen shortly after the hardware active cooling trip point was reached and the fans activated (around 53C in THRM). The machine continued to operate without any problems, despite the erroneous ACPI temperature report. coretemp reported:

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +42°C  (high =  +100°C)                   

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +42°C  (high =  +100°C)      

The following was in dmesg:
ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state changed
Please send acpidump to linux-acpi@vger.kernel.org
 [20080321]
ACPI: EC: acpi_ec_wait timeout, status = 0x09, event = "b0=1"
ACPI: EC: read timeout, command = 130

For this machine's acpidump, see attachment 17030 for bug 11170.
Comment 1 Zhang Rui 2008-08-12 18:31:30 UTC
> The following was in dmesg:
> ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state
> changed
> Please send acpidump to linux-acpi@vger.kernel.org
>  [20080321]

We can also see this in the 2.6.24 kernel.

> ACPI: EC: acpi_ec_wait timeout, status = 0x09, event = "b0=1"
> ACPI: EC: read timeout, command = 130

This is new in 2.6.26.
Could you please verify whether the temperature is correct up until this message appears?
Please attach the full dmesg output from 2.6.26.
Comment 2 Zhang Rui 2008-08-12 18:34:01 UTC
Created attachment 17200
customized DSDT

Please try to override the DSDT with the one I attached.
Set CONFIG_ACPI_DEBUG and recompile the kernel.
Reboot with "acpi.debug_level=0x0f" and attach the dmesg output after the temperature becomes wrong.
Comment 3 ykzhao 2008-08-13 00:25:29 UTC
Created attachment 17202
try the debug patch

Will you please try this debug patch?
After the system is booted, please cat /proc/acpi/thermal_zone/THRM/temperature and attach the output of dmesg.
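When collecting the requested output, it can help to flag a bogus reading automatically so dmesg is captured right when it happens. Below is a minimal sketch, assuming the legacy procfs line format of these 2.6.2x kernels (for example `temperature:             53 C`). The function names and the -40..150 C plausibility window are hypothetical illustrations, not part of the debug patch.

```python
import re
from pathlib import Path

# Path taken from the report; this legacy procfs interface no longer exists in later kernels.
THRM_PATH = Path("/proc/acpi/thermal_zone/THRM/temperature")

# Hypothetical plausibility window for a notebook thermal zone, in degrees C.
PLAUSIBLE_MIN_C = -40
PLAUSIBLE_MAX_C = 150


def parse_thrm(text: str) -> int:
    """Parse a line such as 'temperature:             53 C' into degrees C."""
    m = re.search(r"temperature:\s+(-?\d+)\s+C", text)
    if m is None:
        raise ValueError(f"unexpected format: {text!r}")
    return int(m.group(1))


def is_plausible(celsius: int) -> bool:
    """Return False for readings such as the bogus 3428 C seen in this bug."""
    return PLAUSIBLE_MIN_C <= celsius <= PLAUSIBLE_MAX_C


if __name__ == "__main__":
    if THRM_PATH.exists():
        temp = parse_thrm(THRM_PATH.read_text())
        status = "ok" if is_plausible(temp) else "BOGUS, capture dmesg now"
        print(f"THRM: {temp} C ({status})")
    else:
        print(f"{THRM_PATH} not present on this kernel")
```

Run periodically (for example from a shell loop), this only reports a suspicious value; it does not change any ACPI behaviour.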
Comment 4 Zhang Rui 2008-09-01 23:11:57 UTC
James, any updates?
Comment 5 James Ettle 2008-09-02 00:36:27 UTC
Sorry I've not updated you on this. I've not seen it happen on any more recent kernels. Shall I close it unless it surfaces again?
Comment 6 Zhang Rui 2008-09-02 00:43:25 UTC
Okay. It seems that the bug is already fixed in the latest kernel.
Comment 7 James Ettle 2008-09-16 09:09:03 UTC
This seems like a rare thing, and it just came back to bite me. Seen in kernel-PAE-2.6.26.5-22.fc8: a bogus reading of 3428C, this time it shut down the machine.

Sep 16 12:34:38 rhapsody kernel: ACPI: Critical trip point
Sep 16 12:34:38 rhapsody kernel: Critical temperature reached (3428 C), shutting down.

This happened not long after I resumed from suspend-to-RAM. If I find the time, I'll try the debug patch above.

I also see

   ACPI: EC: GPE storm detected, disabling EC GPE

which wasn't present in the 2.6.24 series, plus the usual trip-point message.
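The "GPE storm detected" message suggests that the EC driver counts EC GPE interrupts and disables the GPE once the count looks pathological. The toy model below illustrates that idea under stated assumptions: counting is done per EC transaction, and the threshold value and all class and method names are hypothetical, not taken from the driver source.

```python
# Hypothetical per-transaction interrupt budget before declaring a storm.
STORM_THRESHOLD = 8


class EcGpeModel:
    """Toy model of storm detection: count EC GPEs within one transaction."""

    def __init__(self) -> None:
        self.gpe_enabled = True
        self.irq_count = 0

    def start_transaction(self) -> None:
        # Each new EC transaction starts with a fresh interrupt count.
        self.irq_count = 0

    def on_gpe(self) -> None:
        if not self.gpe_enabled:
            return  # a disabled GPE no longer delivers interrupts
        self.irq_count += 1
        if self.irq_count > STORM_THRESHOLD:
            # Analogous to "ACPI: EC: GPE storm detected, disabling EC GPE".
            self.gpe_enabled = False
            print("GPE storm detected, disabling EC GPE")


ec = EcGpeModel()
ec.start_transaction()
for _ in range(STORM_THRESHOLD + 1):
    ec.on_gpe()
print(ec.gpe_enabled)  # False
```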
Comment 8 ykzhao 2008-09-16 18:11:10 UTC
Hi, James
   Thanks for your info.
   From the description in comment #7, this issue seems to be related to the EC. In the AML code the temperature is obtained by reading an EC internal register, and your laptop is experiencing an EC GPE storm.
   Will you please try the attached four patches on the latest kernel (2.6.27-rc6) and see whether the problem still exists?
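To make that data path concrete: per the ACPI specification, a thermal zone's _TMP method returns tenths of a kelvin, and the kernel converts the result to whole degrees Celsius. The sketch below assumes the common AML pattern `_TMP = EC_register * 10 + 2732`; this machine's DSDT was not checked, so the formula is an assumption, and the function names are hypothetical. The conversion is modelled on, not copied from, the kernel's rounding. It shows that whatever value comes back from the EC register would pass straight through to the reported temperature.

```python
ZERO_C_IN_DK = 2732  # 0 C expressed in tenths of a kelvin (273.2 K)


def aml_tmp_from_ec(ec_byte: int) -> int:
    """Hypothetical _TMP body: EC register (degrees C) -> tenths of a kelvin.

    Assumes the common 'multiply by 10, add 2732' AML pattern.
    """
    return ec_byte * 10 + ZERO_C_IN_DK


def deci_kelvin_to_celsius(dk: int) -> int:
    """Round tenths of a kelvin to whole degrees C, half away from zero.

    The negative branch mirrors C integer division, which truncates toward zero.
    """
    delta = dk - ZERO_C_IN_DK
    if delta >= 0:
        return (delta + 5) // 10
    return -((-delta + 5) // 10)


# A healthy EC read of 53 C round-trips unchanged.
print(deci_kelvin_to_celsius(aml_tmp_from_ec(53)))  # 53
```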
Comment 9 ykzhao 2008-09-16 18:12:29 UTC
Created attachment 17820
patch 1/4: Don't issue the burst disable command if EC exits the burst mode
Comment 10 ykzhao 2008-09-16 18:13:33 UTC
Created attachment 17821
Patch 2/4: Clear the query_pending bit only after processing EC notification event
Comment 11 ykzhao 2008-09-16 18:16:00 UTC
Created attachment 17822
Patch 3/4: Switch to polling mode when there is no EC GPE interrupt for some EC transactions

If some EC transactions complete without an EC GPE confirmation, the driver switches to polling mode, and subsequent accesses to EC internal registers are done in polling mode. The EC GPE, however, remains enabled.
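The fallback described for patch 3/4 can be pictured as a small state machine. This is a toy model under explicit assumptions: the limit of three missed GPEs, resetting the count when a GPE does arrive, and all names are hypothetical and not taken from the patch. The one property it deliberately keeps from the description is that switching to polling leaves the GPE enabled.

```python
from dataclasses import dataclass

# Hypothetical limit: transactions allowed to finish without a GPE before
# the driver stops waiting for interrupts.
MISSED_GPE_LIMIT = 3


@dataclass
class EcModeTracker:
    """Toy model of the interrupt-to-polling fallback (names hypothetical)."""

    missed: int = 0
    polling: bool = False
    gpe_enabled: bool = True  # this fallback never disables the EC GPE

    def transaction_done(self, gpe_seen: bool) -> None:
        if self.polling:
            return  # once switched, later EC register accesses keep polling
        if gpe_seen:
            self.missed = 0  # assumption: only consecutive misses count
        else:
            self.missed += 1
            if self.missed >= MISSED_GPE_LIMIT:
                self.polling = True


tracker = EcModeTracker()
for seen in (True, False, False, False):
    tracker.transaction_done(gpe_seen=seen)
print(tracker.polling, tracker.gpe_enabled)  # True True
```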
Comment 12 ykzhao 2008-09-16 18:17:09 UTC
Created attachment 17823
patch 4/4: Add some delay in EC GPE handler to avoid EC GPE storm
Comment 13 ykzhao 2008-09-16 18:23:08 UTC
Hi, James
    Will you please try the attached patch set on the latest kernel (2.6.27-rc6) and see whether the problem still exists?
    Please add the boot options "acpi.debug_layer=0x04010000 acpi.debug_level=0x17" and attach the output of dmesg after the test.
    Thanks.
Comment 14 Alexey Starikovskiy 2008-09-16 23:45:25 UTC
Created attachment 17829
fast transaction

Hi,
please check whether this patch works for you. It is intended as a better solution to the storm problem, but your case may differ.
Comment 15 ykzhao 2008-09-19 01:02:05 UTC
Hi, James
    Have you had an opportunity to run the test described in comment #13?
    Thanks.
Comment 16 James Ettle 2008-09-19 02:42:52 UTC
ykzhao, I managed to build the kernel with the patch applied, but couldn't get it to boot (it didn't find the root logical volume for some reason and stopped at switchroot with "Booting has failed"). I don't know what I've done wrong yet; I'll try the patch later on a Fedora development kernel and see if that works.
Comment 17 ykzhao 2008-09-20 07:13:07 UTC
Hi, James
    Maybe you should use the same .config file as the 2.6.26.2 Fedora kernel.
    thanks.
Comment 18 Len Brown 2008-09-25 12:17:26 UTC
Created attachment 18046
patch vs 2.6.27-rc7

This version of Alexey's fast transaction patch
has been checked into the acpi-test tree.
Please let us know if you have any troubles with it.

thanks,
-Len
Comment 19 Len Brown 2008-10-24 23:24:43 UTC
shipped in linux-2.6.28-rc1
closed

commit 7c6db4e050601f359081fde418ca6dc4fc2d0011
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Thu Sep 25 21:00:31 2008 +0400

    ACPI: EC: do transaction from interrupt context
Comment 20 James Ettle 2008-11-07 12:01:45 UTC
Sorry I've not been able to provide further info over the past few weeks --- I'll try giving a 2.6.28-series kernel a go and check that this problem has been fixed.
