Bug 12203

Summary: ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state changed - Clevo M7X0SU
Product: ACPI
Reporter: Lee Dowling (kernel)
Component: Power-Thermal
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED DUPLICATE
Severity: normal
CC: acpi-bugzilla
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27.7
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: Dmesg and acpidump

Description Lee Dowling 2008-12-12 02:04:45 UTC
Just loaded up Linux 2.6.27.7 on a new laptop and got some ACPI errors
that apparently I should report.  Please find below the dmesg extracts and an acpidump.  My emails are being silently ignored by the relevant mailing list.

The laptop is a Clevo M7X0SU, currently sold in the UK by Novatech. I
have also tried 2.6.24.4, which throws spurious ACPI errors about
"Invalid passive threshold" all the time (usually when the fan is
about to kick in); those errors stop in 2.6.27.7.

If any further info is required, please shout.

ACPI: Core revision 20080609
ACPI: EC: Look up EC in DSDT
ACPI: BIOS _OSI(Linux) query ignored
ACPI: DMI System Vendor: clevo                           
ACPI: DMI Product Name: M7X0SU                          
ACPI: DMI Product Version: Rev. A1                         
ACPI: DMI Board Name: M7X0SU                          
ACPI: DMI BIOS Vendor: Phoenix Technologies LTD
ACPI: DMI BIOS Date: 10/02/0892
ACPI: Please send DMI info above to linux-acpi@vger.kernel.org
ACPI: If "acpi_osi=Linux" works better, please notify linux-acpi@vger.kernel.org
ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state changed
Please send acpidump to linux-acpi@vger.kernel.org
 [20080609]
Comment 1 Lee Dowling 2008-12-12 02:05:31 UTC
Created attachment 19261
Dmesg and acpidump
Comment 2 Len Brown 2008-12-12 12:53:49 UTC
> ACPI: Please send DMI info above to linux-acpi@vger.kernel.org
> ACPI: If "acpi_osi=Linux" works better, please notify
> linux-acpi@vger.kernel.org

This can be ignored, unless acpi_osi=Linux makes the system
behave better in some way.  Note that in future kernels
we'll not be displaying the DMI messages.

FWIW, OSI(Linux) has no effect on this machine
because of the way the AML is written:

    Name (LINX, 0x00)

                    If (_OSI ("Linux"))
                    {
                        Store (0x01, LINX)
                    }

and LINX is never referenced.
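
One rough way to check a claim like this against a decompiled DSDT is sketched below in Python. It is only a heuristic under stated assumptions: the helper name find_reads is hypothetical, the input is assumed to be disassembled table text in the legacy ASL syntax quoted above (Name (...) and Store (...)), and anything that is not the declaration or a Store into the name is treated as a read.

```python
import re


def find_reads(dsl_text, name):
    """Return lines that use `name` other than declaring or storing to it.

    Heuristic only: recognizes `Name (NAME, ...)` declarations and
    `Store (..., NAME)` writes in legacy ASL syntax.
    """
    word = re.escape(name)
    reads = []
    for line in dsl_text.splitlines():
        stripped = line.strip()
        # Skip lines that do not mention the name as a whole word.
        if not re.search(rf"\b{word}\b", stripped):
            continue
        is_decl = re.match(rf"Name\s*\(\s*{word}\b", stripped)
        is_store_target = re.match(rf"Store\s*\(.*,\s*{word}\s*\)$", stripped)
        if not (is_decl or is_store_target):
            reads.append(stripped)
    return reads


# The fragment quoted in this comment.
SAMPLE = '''
Name (LINX, 0x00)

If (_OSI ("Linux"))
{
    Store (0x01, LINX)
}
'''

if __name__ == "__main__":
    # An empty list means the name is declared and written but never read.
    print(find_reads(SAMPLE, "LINX"))
```

On the fragment above the list comes back empty, which matches the observation that the flag is set but nothing ever tests it.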

> My emails are being silently ignored by the relevant mailing list.

Note that HTML messages are not forwarded by vger.kernel.org,
and messages over 100K are ignored too.  We shouldn't be asking
people to send acpidump to the list because many of them
are over 100K (this one is, at least when uncompressed).
Attaching to a bugzilla report is better.

So what's left is this:

> ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state
> changed
> Please send acpidump to linux-acpi@vger.kernel.org

2.6.27.7: thermal.c:

        if ((flag & ACPI_TRIPS_PASSIVE) || (flag & ACPI_TRIPS_DEVICES)) {
                if (valid != tz->trips.passive.flags.valid)
 469:                           ACPI_THERMAL_TRIPS_EXCEPTION(flag, "state");
        }

Please paste the output from
grep . /proc/acpi/thermal_zone/*/*

If you can reproduce this, it would be great to get it both before and after the message.
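
If taking the two captures by hand is awkward, something like the following Python sketch could collect and compare them. It is an illustration, not an official tool: the function names snapshot and diff_snapshots are made up here, and the only path assumed is the /proc/acpi/thermal_zone one from the grep command above.

```python
import glob


def snapshot(pattern="/proc/acpi/thermal_zone/*/*"):
    """Return {path: [non-empty lines]}, roughly what `grep . pattern` shows."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as f:
                lines = [ln.rstrip("\n") for ln in f]
        except OSError:
            # Unreadable entries (or directories) are skipped.
            continue
        result[path] = [ln for ln in lines if ln != ""]
    return result


def diff_snapshots(before, after):
    """Return {path: (before_lines, after_lines)} for files that differ."""
    changed = {}
    for path in sorted(set(before) | set(after)):
        if before.get(path) != after.get(path):
            changed[path] = (before.get(path), after.get(path))
    return changed


if __name__ == "__main__":
    # Print in the same "path:line" form that grep produces.
    for path, lines in snapshot().items():
        for line in lines:
            print(f"{path}:{line}")
```

Run once as early as possible after boot and again after the exception shows up in dmesg; passing the two results to diff_snapshots lists only the files whose contents changed.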
Comment 3 Lee Dowling 2008-12-12 14:41:42 UTC
I discovered the first of your points at random by hitting a patch in the kernel git that disables such DMI messages because of their futility - unfortunately, by then I'd already submitted this.  I did try compressed and smaller versions of the mailing list posting... guess it's just me.

Anyway, I couldn't get an accurate "before" of the thermal_zone contents because it seems to trigger quite early, but I got a consistent "after":

/proc/acpi/thermal_zone/THRM/cooling_mode:0 - Active; 1 - Passive
/proc/acpi/thermal_zone/THRM/polling_frequency:<polling disabled>
/proc/acpi/thermal_zone/THRM/state:state:                   ok
/proc/acpi/thermal_zone/THRM/temperature:temperature:             51 C
/proc/acpi/thermal_zone/THRM/trip_points:critical (S5):           155 C
/proc/acpi/thermal_zone/THRM/trip_points:passive:                 82 C: tc1=0 tc2=0 tsp=0 devices=
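
For comparing readings across boots, the output above can be reduced to numbers with a small Python sketch like this one. The helper name parse_thermal_lines and the regular expression are my own assumptions about the "label: value C" layout shown here; other machines may print additional fields this does not handle.

```python
import re

# The lines pasted above, as printed by `grep .`.
OUTPUT = """\
/proc/acpi/thermal_zone/THRM/cooling_mode:0 - Active; 1 - Passive
/proc/acpi/thermal_zone/THRM/polling_frequency:<polling disabled>
/proc/acpi/thermal_zone/THRM/state:state:                   ok
/proc/acpi/thermal_zone/THRM/temperature:temperature:             51 C
/proc/acpi/thermal_zone/THRM/trip_points:critical (S5):           155 C
/proc/acpi/thermal_zone/THRM/trip_points:passive:                 82 C: tc1=0 tc2=0 tsp=0 devices=
"""


def parse_thermal_lines(lines):
    """Map each label (temperature, critical, passive) to its value in C."""
    values = {}
    for line in lines:
        # Drop the "path:" prefix that grep adds; keep "label: value".
        _, _, rest = line.partition(":")
        m = re.match(r"\s*([A-Za-z]+)(?:\s*\(\w+\))?:\s*(\d+)\s*C\b", rest)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values


if __name__ == "__main__":
    temps = parse_thermal_lines(OUTPUT.splitlines())
    print(temps)
    # Degrees of headroom between the current reading and the passive trip point.
    print(temps["passive"] - temps["temperature"])
```

This only restates what the files report; it says nothing about whether the firmware's trip point values are sensible.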

The only thing that changed between attempts at this was the actual temp but it never exceeded 60 (I'd know about it - it sits on my lap).  It does have a certain consistency in that the laptop's fans have to have kicked in before this error will appear.  I don't know what the 82 is supposed to represent but I assume that's the magic temp when things should happen - I very, very much doubt this laptop ever hits 82C at all because burn-in tests with me watching the temp don't see it go that hot even when stressing both cores and the GPU on mains power (most I got was about 72C).

To do that, the fans went to what I assume was full speed, which doesn't happen even on boot, and a clean boot from a cold start into Linux still gets me that error once the fans power up, even in their lowest setting around the 50C mark.

If I boot quickly (before the fans kick in) or keep the laptop cooler for a while longer, I can probably get a "before".  I'll post if I manage it.
Comment 4 Zhang Rui 2008-12-14 17:48:42 UTC
Please apply this patch and see if it helps.
http://marc.info/?l=linux-acpi&m=120522267418715&w=2
Comment 5 Lee Dowling 2008-12-15 01:05:58 UTC
Will give it a shot tonight hopefully.
Comment 6 Lee Dowling 2008-12-16 16:32:01 UTC
I can confirm that the linked patch appears to fix the problem without affecting ordinary fan operation (the fan kicks in at 60C and off again when it dips below that point).
Comment 7 Shaohua 2008-12-16 19:29:49 UTC
Mark this track as resolved, so Len can take the patch.
Comment 8 Len Brown 2009-01-16 11:14:49 UTC
the patch is attached to bug 8544

*** This bug has been marked as a duplicate of bug 8544 ***