Bug 11309

Summary: THRM temperature reported as 3428C
Product: ACPI Reporter: James Ettle (james)
Component: EC Assignee: Alexey Starikovskiy (astarikovskiy)
Status: CLOSED CODE_FIX    
Severity: normal CC: acpi-bugzilla, bunk, james
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26.2-2.fc8 Subsystem:
Regression: Yes Bisected commit-id:
Attachments:
customized DSDT
try the debug patch
patch 1/4: Don't issue the burst disable command if EC exits the burst mode
Patch 2/4: Clear the query_pending bit only after processing EC notification event
Patch 3/4: Switch to polling mode when there is no EC GPE interrupt for some EC transactions
patch 4/4: Add some delay in EC GPE handler to avoid EC GPE storm
fast transaction
patch vs 2.6.27-rc7

Description James Ettle 2008-08-12 06:24:53 UTC
Latest working kernel version: 2.6.24.7-92.fc8
Earliest failing kernel version: 2.6.26.2-2.fc8
Distribution: Fedora 8
Hardware Environment: Clevo M720R notebook with Intel T8100 processor
Software Environment: i686
Problem Description:

The contents of /proc/acpi/thermal_zone/THRM/temperature reported a temperature of 3428C. This seemed to happen shortly after the hardware active cooling trip-point was reached and the fans activated (around 53C in THRM). The machine continued to operate without any problems, despite the erroneous ACPI temperature report. Meanwhile, coretemp reported:

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +42°C  (high =  +100°C)                   

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +42°C  (high =  +100°C)      

The following was in dmesg:
ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state changed
Please send acpidump to linux-acpi@vger.kernel.org
 [20080321]
ACPI: EC: acpi_ec_wait timeout, status = 0x09, event = "b0=1"
ACPI: EC: read timeout, command = 130

For this machine's acpidump, see attachment 17030 for bug 11170.
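
For context, ACPI thermal zones report their temperature in tenths of a kelvin, and the kernel converts that to Celsius before printing it. The following is a minimal Python sketch, not kernel code: it assumes the /proc file contains a line of the form "temperature: NN C", and the plausibility limits are hypothetical values chosen for illustration. It flags a reading such as 3428 C and computes the approximate raw value that reading would correspond to.

```python
import re

# Assumed line format of /proc/acpi/thermal_zone/<zone>/temperature.
TEMP_LINE = re.compile(r"temperature:\s*(-?\d+)\s*C")

# Hypothetical sanity limits; these are not values taken from the kernel.
MIN_PLAUSIBLE_C = -40
MAX_PLAUSIBLE_C = 150


def parse_temperature(text):
    """Return the Celsius value from a /proc temperature line, or None."""
    match = TEMP_LINE.search(text)
    return int(match.group(1)) if match else None


def celsius_to_decikelvin(celsius):
    """Approximate raw value (tenths of a kelvin) for a Celsius reading."""
    return celsius * 10 + 2732


def is_plausible(celsius):
    """True if the reading lies within the assumed sane range."""
    return MIN_PLAUSIBLE_C <= celsius <= MAX_PLAUSIBLE_C
```

Under these assumptions, the 3428 C reading would correspond to a raw value of about 37012 tenths of a kelvin, which is far outside anything a working sensor would return.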
Comment 1 Zhang Rui 2008-08-12 18:31:30 UTC
> The following was in dmesg:
> ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state
> changed
> Please send acpidump to linux-acpi@vger.kernel.org
>  [20080321]

We can also see this in the 2.6.24 kernel.

> ACPI: EC: acpi_ec_wait timeout, status = 0x09, event = "b0=1"
> ACPI: EC: read timeout, command = 130

This is new in 2.6.26.
Could you please verify whether the temperature is correct up until this message appears?
Please attach the full dmesg output from 2.6.26.
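
The two EC messages can be read against the Embedded Controller interface described in the ACPI specification. The sketch below decodes the status byte and the command number using the bit and command names from that specification. The Python helper names are illustrative and do not come from the kernel source.

```python
# EC status register bits as named in the ACPI specification.
EC_STATUS_BITS = {
    0x01: "OBF",      # output buffer full
    0x02: "IBF",      # input buffer full
    0x08: "CMD",      # last byte written was a command
    0x10: "BURST",    # burst mode enabled
    0x20: "SCI_EVT",  # SCI event pending
    0x40: "SMI_EVT",  # SMI event pending
}

# EC command set as named in the ACPI specification.
EC_COMMANDS = {
    0x80: "RD_EC",  # read
    0x81: "WR_EC",  # write
    0x82: "BE_EC",  # burst enable
    0x83: "BD_EC",  # burst disable
    0x84: "QR_EC",  # query
}


def decode_status(status):
    """Return the names of the status bits that are set."""
    return [name for bit, name in EC_STATUS_BITS.items() if status & bit]


def decode_command(command):
    """Return the name of an EC command byte, or 'unknown'."""
    return EC_COMMANDS.get(command, "unknown")
```

With the values from the log, status 0x09 decodes to OBF and CMD, and command 130 (0x82) is the burst enable command.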
Comment 2 Zhang Rui 2008-08-12 18:34:01 UTC
Created attachment 17200 [details]
customized DSDT

Please try to override the DSDT with the one I attached.
Set CONFIG_ACPI_DEBUG and recompile the kernel.
Reboot with "acpi.debug_level=0x0f" and attach the dmesg output after the temperature becomes wrong.
Comment 3 ykzhao 2008-08-13 00:25:29 UTC
Created attachment 17202 [details]
try the debug patch

Will you please try this debug patch?
After the system is booted, please cat /proc/acpi/thermal_zone/THRM/temperature and attach the output of dmesg.
Comment 4 Zhang Rui 2008-09-01 23:11:57 UTC
James, any updates?
Comment 5 James Ettle 2008-09-02 00:36:27 UTC
Sorry I've not updated you on this. I've not seen it happen on any more recent kernels. Shall I close it unless it surfaces again?
Comment 6 Zhang Rui 2008-09-02 00:43:25 UTC
Okay. It seems that the bug is already fixed in the latest kernel.
Comment 7 James Ettle 2008-09-16 09:09:03 UTC
This seems like a rare thing, and it just came back to bite me. Seen in kernel-PAE-2.6.26.5-22.fc8: a bogus reading of 3428C, and this time it shut down the machine.

Sep 16 12:34:38 rhapsody kernel: ACPI: Critical trip point
Sep 16 12:34:38 rhapsody kernel: Critical temperature reached (3428 C), shutting down.

This happened not long after I resumed from suspend-to-RAM. If I find the time, I'll try the debug patch above.

I also see

   ACPI: EC: GPE storm detected, disabling EC GPE

which wasn't present in the 2.6.24 series, plus the usual trip-point message.
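
A small sketch for spotting these messages in a saved kernel log. It assumes the message wording quoted in this report; the function name and the layout of the returned dictionary are illustrative.

```python
import re

# Patterns taken from the messages quoted in this report.
CRITICAL = re.compile(r"Critical temperature reached \((\d+) C\)")
GPE_STORM = "ACPI: EC: GPE storm detected"
EC_TIMEOUT = "ACPI: EC: acpi_ec_wait timeout"


def scan_log(lines):
    """Collect the EC and thermal symptoms found in an iterable of log lines."""
    findings = {"critical_temps": [], "gpe_storm": False, "ec_timeouts": 0}
    for line in lines:
        match = CRITICAL.search(line)
        if match:
            findings["critical_temps"].append(int(match.group(1)))
        if GPE_STORM in line:
            findings["gpe_storm"] = True
        if EC_TIMEOUT in line:
            findings["ec_timeouts"] += 1
    return findings
```

It could be fed the lines of a saved dmesg or /var/log/messages file, for example scan_log(open(path)) with a path of your choosing.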
Comment 8 ykzhao 2008-09-16 18:11:10 UTC
Hi, James
   Thanks for your info.
   From the description in comment #7 it seems that this issue is related to the EC. In the AML code the temperature is obtained by reading an EC internal register, and there is an EC GPE storm on your laptop.
   Will you please try the attached four patches on the latest kernel (2.6.27-rc6) and see whether the problem still exists?
Comment 9 ykzhao 2008-09-16 18:12:29 UTC
Created attachment 17820 [details]
patch 1/4: Don't issue the burst disable command if EC exits the burst mode
Comment 10 ykzhao 2008-09-16 18:13:33 UTC
Created attachment 17821 [details]
Patch 2/4: Clear the query_pending bit only after processing EC notification event
Comment 11 ykzhao 2008-09-16 18:16:00 UTC
Created attachment 17822 [details]
Patch 3/4: Switch to polling mode when there is no EC GPE interrupt for some EC transactions

If there is no EC GPE confirmation for some EC transactions, the driver switches to polling mode. From then on, accesses to the EC internal registers work in polling mode, but the EC GPE remains enabled.
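
The fallback described here can be illustrated with a user-space sketch. This is not the patch itself: the FakeEC class, its methods, and the timeout values are hypothetical, and a threading.Event stands in for the EC GPE interrupt.

```python
import threading
import time


class FakeEC:
    """Hypothetical stand-in for an EC whose interrupt may never arrive."""

    def __init__(self, raises_interrupt):
        self.raises_interrupt = raises_interrupt
        self.interrupt = threading.Event()  # stands in for the EC GPE
        self.ready = False                  # stands in for the status bit
        self.polling_mode = False

    def start_transaction(self):
        # The data becomes ready either way; only the notification differs.
        self.ready = True
        if self.raises_interrupt:
            self.interrupt.set()

    def wait_ready(self, irq_timeout=0.05, poll_timeout=0.5):
        """Wait for completion, falling back to polling if no interrupt comes."""
        if not self.polling_mode:
            if self.interrupt.wait(irq_timeout):
                self.interrupt.clear()
                return True
            # No interrupt confirmation: use polling from now on.
            self.polling_mode = True
        deadline = time.monotonic() + poll_timeout
        while time.monotonic() < deadline:
            if self.ready:
                return True
            time.sleep(0.001)
        return False
```

In this sketch the interrupt source is never disabled; the object only stops relying on it once a confirmation has been missed, which mirrors the behaviour described for patch 3/4.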
Comment 12 ykzhao 2008-09-16 18:17:09 UTC
Created attachment 17823 [details]
patch 4/4: Add some delay in EC GPE handler to avoid EC GPE storm
Comment 13 ykzhao 2008-09-16 18:23:08 UTC
Hi, James
    Will you please try the attached patch set on the latest kernel (2.6.27-rc6) and see whether the problem still exists?
    Please add the boot options "acpi.debug_layer=0x04010000 acpi.debug_level=0x17" and attach the output of dmesg after the test.
    Thanks.
Comment 14 Alexey Starikovskiy 2008-09-16 23:45:25 UTC
Created attachment 17829 [details]
fast transaction

Hi,
Please check whether this patch works for you. It is supposed to be a better solution to the storm problem, but your case may differ.
Comment 15 ykzhao 2008-09-19 01:02:05 UTC
Hi, James
    Have you had an opportunity to run the test mentioned in comment #13?
    Thanks.
Comment 16 James Ettle 2008-09-19 02:42:52 UTC
ykzhao, I managed to build the kernel with the patch applied, but couldn't get it to boot (it didn't find the root logical volume for some reason and stopped at switchroot with "Booting has failed"). I don't know what I've done wrong yet; I'll try the patch later on a Fedora development kernel and see if that works.
Comment 17 ykzhao 2008-09-20 07:13:07 UTC
Hi, James
    Maybe you should use the same .config file as the 2.6.26.2 Fedora kernel.
    thanks.
Comment 18 Len Brown 2008-09-25 12:17:26 UTC
Created attachment 18046 [details]
patch vs 2.6.27-rc7

This version of Alexey's fast transaction patch
has been checked into the acpi-test tree.
Please let us know if you have any troubles with it.

thanks,
-Len
Comment 19 Len Brown 2008-10-24 23:24:43 UTC
shipped in linux-2.6.28-rc1
closed

commit 7c6db4e050601f359081fde418ca6dc4fc2d0011
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Thu Sep 25 21:00:31 2008 +0400

    ACPI: EC: do transaction from interrupt context
Comment 20 James Ettle 2008-11-07 12:01:45 UTC
Sorry I've not been able to provide further info over the past few weeks. I'll try giving a 2.6.28-series kernel a go and check that this problem has been fixed.