Bug 10970 - thermal ACPI has no reference package element - Abit AX78 (AM2+ AMD 770 / SB600)
Status: REJECTED WILL_NOT_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Thermal
Hardware: All Linux
Importance: P1 normal
Assignee: ykzhao
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-23 12:55 UTC by Jimmy.Jazz
Modified: 2008-07-14 06:51 UTC

See Also:
Kernel Version: stable branch 2.6.26-rc6
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
acpi dump file (23.20 KB, application/x-bzip)
2008-06-23 12:57 UTC, Jimmy.Jazz
[acpidump] bios version 13 (137.70 KB, text/plain)
2008-07-07 05:29 UTC, Jimmy.Jazz
DSDT corrected (12.47 KB, text/x-patch)
2008-07-14 06:51 UTC, Jimmy.Jazz

Description Jimmy.Jazz 2008-06-23 12:55:11 UTC
Latest working kernel version: n/a
Earliest failing kernel version: n/a
Distribution: n/a
Hardware Environment: Abit AX78 (AM2+  AMD 770 / SB600) bios 12
Software Environment: n/a
Problem Description:

During the boot, when loading the thermal module, the kernel displays a warning:

thermal ACPI: Expecting a [Reference] package element, found type 0

followed by:

"ACPI Exception (thermal-0469): AE_ERROR, ACPI thermal trip point state changed"

The on-board POST code display reports nothing, and deactivating the FanEQ feature in the BIOS does not alter the messages either.

FYI, the BIOS lets me enable the XSDT table for 64-bit OSes, but this does not change anything.

Steps to reproduce:
Comment 1 Jimmy.Jazz 2008-06-23 12:57:56 UTC
Created attachment 16590 [details]
acpi dump file
Comment 2 Jimmy.Jazz 2008-06-23 13:30:32 UTC
The temperature of the CPU (AM2+ 9750) is uncontrollable most of the time. All 4 cores need to be set to ondemand mode to stay at or below 54°C when idle. Compiling with make -j8 makes the CPU overheat to more than 80°C.
Comment 3 Len Brown 2008-06-23 13:50:47 UTC
Note that cpufreq in general, and ondemand in particular,
are not responsible for thermal management. They
save power only in the partially idle case, and have
no effect on the fully utilized case.

please paste the results of
grep . /proc/acpi/thermal_zone/*/*
Comment 4 Jimmy.Jazz 2008-06-23 14:39:30 UTC
(In reply to comment #3)

> please paste the results of
> grep . /proc/acpi/thermal_zone/*/*


# grep . /proc/acpi/thermal_zone/*/*
grep: /proc/acpi/thermal_zone/*/*: No such file or directory

# modprobe thermal
# grep . /proc/acpi/thermal_zone/*/*
/proc/acpi/thermal_zone/THRM/cooling_mode:0 - Active; 1 - Passive
/proc/acpi/thermal_zone/THRM/polling_frequency:<polling disabled>
/proc/acpi/thermal_zone/THRM/state:state:                   ok
/proc/acpi/thermal_zone/THRM/temperature:temperature:             54 C
/proc/acpi/thermal_zone/THRM/trip_points:critical (S5):           90 C
/proc/acpi/thermal_zone/THRM/trip_points:active[0]:               80 C: devices= FAN
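
To make the numbers above easier to read, here is a minimal Python sketch that parses this kind of output and reports how far the current temperature is from each trip point. It assumes the legacy /proc/acpi/thermal_zone text layout exactly as pasted in this comment; the names `parse_thermal_zone` and `margins` are hypothetical helpers, not part of any kernel or ACPI tool.

```python
import re

# Lines copied from the grep output in this comment.
SAMPLE = """\
/proc/acpi/thermal_zone/THRM/state:state:                   ok
/proc/acpi/thermal_zone/THRM/temperature:temperature:             54 C
/proc/acpi/thermal_zone/THRM/trip_points:critical (S5):           90 C
/proc/acpi/thermal_zone/THRM/trip_points:active[0]:               80 C: devices= FAN
"""

# zone name / file name : key : value in degrees Celsius
LINE_RE = re.compile(
    r"^/proc/acpi/thermal_zone/(?P<zone>[^/]+)/(?P<file>[^:]+):"
    r"(?P<key>[^:]+):\s*(?P<temp>\d+) C"
)


def parse_thermal_zone(text):
    """Return {zone: {"temperature": int, "trip_points": {name: int}}}."""
    zones = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that carry no temperature (state, cooling_mode, ...)
        zone = zones.setdefault(m["zone"], {"temperature": None, "trip_points": {}})
        temp = int(m["temp"])
        if m["file"] == "temperature":
            zone["temperature"] = temp
        elif m["file"] == "trip_points":
            zone["trip_points"][m["key"].strip()] = temp
    return zones


def margins(zone):
    """Degrees Celsius left before each trip point is reached."""
    return {name: trip - zone["temperature"] for name, trip in zone["trip_points"].items()}


if __name__ == "__main__":
    thrm = parse_thermal_zone(SAMPLE)["THRM"]
    for name, margin in margins(thrm).items():
        print(f"{name}: {margin} C below trip point")
```

Note that the output lists only a critical and an active trip point; no passive trip point appears for this zone.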
Comment 5 ykzhao 2008-06-29 19:24:04 UTC
The acpidump contains the following definition:
    Name (_PSL, Package (0x01)
            {
                \_PR.CPU0
            })

    The _PSL object returns the list of passive cooling devices. Unfortunately, the CPU0 object can't be found, so the OS prints the following warning message:
   > thermal ACPI: Expecting a [Reference] package element, found type 0

   IMO it is a BIOS bug and is best fixed by a BIOS upgrade.
Comment 6 Jimmy.Jazz 2008-06-30 15:10:51 UTC
(In reply to comment #5)
> From the acpidump there exists the following definition:
>     Name (_PSL, Package (0x01)
>             {
>                 \_PR.CPU0
>             })
>
>     The _PSL object returns the passive device list. But unfortunately the
>     CPU0 object can't be found. So OS complains the following warning message:
>    > thermal ACPI: Expecting a [Reference] package element, found type 0
>
>    IMO it is a BIOS bug and had better be fixed by bios upgrading.
>
Thanks,

As soon as a new BIOS is released, I will send you a new acpidump file. For the time being it is only available as a beta release, so we just have to wait a bit.
Comment 7 ykzhao 2008-07-03 00:50:10 UTC
As it is a BIOS bug, the bug will be rejected.
Thanks.
Comment 8 Jimmy.Jazz 2008-07-07 05:26:02 UTC
(In reply to comment #7)
> As it is a BIOS bug, the bug will be rejected.
> Thanks.
> 

Is there an easy way to make it work with a quirk or something similar, as is done for some laptops and motherboards?

FYI, I upgraded the BIOS to the latest version, 13. The CPU is now better recognized and cooled (41°C when idle), but the ACPI warning persists.
I don't know if it is related, but ACPI THM changes when the processor changes its state.

If the _PSL object (CPU0) isn't found, it could surely be mapped to the right entry (the CPU), since the motherboard has only one CPU.

Perhaps a "type 0" reference is valid for Phenom processors, in which case it should be implemented in the driver as well.

Could the XSDT table for 64-bit operating systems be of any help?

Anyway, I dumped the ACPI tables again in case you want to look at them.

I could try to contact Abit about the issue, but I am convinced they won't act on it because I am unable to explain it accurately. I lack knowledge of how ACPI works.

Could you please be more precise and explain how it should work and how to correct it?
Comment 9 Jimmy.Jazz 2008-07-07 05:29:20 UTC
Created attachment 16760 [details]
[acpidump] bios version 13
Comment 10 Zhang Rui 2008-07-07 19:30:47 UTC
    Scope (\_PR)
    {
        Processor (\_PR.C000, 0x00, 0x00004010, 0x06) {}
        Processor (\_PR.C001, 0x01, 0x00004010, 0x06) {}
        Processor (\_PR.C002, 0x02, 0x00004010, 0x06) {}
        Processor (\_PR.C003, 0x03, 0x00004010, 0x06) {}
    }
From this piece of AML code, we can see that four processors are declared:
C000, C001, C002 and C003.

Name (_PSL, Package (0x01)
{
   \_PR.CPU0
})
The _PSL object returns the devices that can be used for passive cooling in this thermal zone.
Unfortunately, the bogus AML code returns a nonexistent device, \_PR.CPU0.
A one-line change in the AML code fixes this problem.

Name (_PSL, Package (0x01)
{
-   \_PR.CPU0
+   \_PR.C000
})
You can fix it by overriding the DSDT.
For instructions, please have a look at
http://www.lesswatts.org/projects/acpi/overridingDSDT.php
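
The mismatch described above can also be spotted mechanically. The following is a minimal Python sketch, assuming the disassembled DSDT is available as text and that the declarations look like the excerpts quoted in this comment. The function name `dangling_psl_refs` and the regular expressions are illustrative assumptions; this is not a real ASL parser and is no substitute for iasl.

```python
import re

# Excerpts quoted in this comment, combined into one snippet.
DSDT_SNIPPET = r"""
Scope (\_PR)
{
    Processor (\_PR.C000, 0x00, 0x00004010, 0x06) {}
    Processor (\_PR.C001, 0x01, 0x00004010, 0x06) {}
    Processor (\_PR.C002, 0x02, 0x00004010, 0x06) {}
    Processor (\_PR.C003, 0x03, 0x00004010, 0x06) {}
}
Name (_PSL, Package (0x01)
{
    \_PR.CPU0
})
"""

# Fully qualified processor names declared with Processor (...).
PROCESSOR_RE = re.compile(r"Processor\s*\(\s*(\\_PR\.\w+)")
# Body of a Name (_PSL, Package (...) { ... }) definition.
PSL_RE = re.compile(r"Name\s*\(\s*_PSL\s*,\s*Package\s*\([^)]*\)\s*\{([^}]*)\}")
# References into the \_PR scope inside a package body.
REF_RE = re.compile(r"\\_PR\.\w+")


def dangling_psl_refs(dsl_text):
    """Return _PSL entries that name no declared Processor object."""
    declared = set(PROCESSOR_RE.findall(dsl_text))
    dangling = []
    for body in PSL_RE.findall(dsl_text):
        for ref in REF_RE.findall(body):
            if ref not in declared:
                dangling.append(ref)
    return dangling


if __name__ == "__main__":
    print(dangling_psl_refs(DSDT_SNIPPET))                           # original table
    print(dangling_psl_refs(DSDT_SNIPPET.replace("CPU0", "C000")))   # after the one-line fix
```

With the original table the check reports \_PR.CPU0 as dangling; after the one-line change to \_PR.C000 it reports nothing.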
Comment 11 ykzhao 2008-07-07 23:27:10 UTC
   What Rui said in comment #10 is right.
   The problem is caused by the broken BIOS. It is best fixed by a BIOS upgrade, but it can also be fixed with a custom DSDT.
   Thanks.
Comment 12 Jimmy.Jazz 2008-07-14 06:47:16 UTC
Hello,

Following your recommendation, I modified DSDT.lst to get rid of the thermal warning, get the processor better cooled, and correct some other warnings. I can now run "make -j8" without fearing CPU overheating.

I also read the ACPI Thermal Zone spec. The draft was quite old (2005) but helpful, though still Greek to me. Sorry if what follows is totally wrong.

The Abit FanEQ feature also lets you select the temperature source used to regulate the SYS fan. You can select SYSTEM or CPU as the source.
I guess that is the clue to the Linux thermal warning. Linux ACPI assumes that _PSV has a fixed value.
That is not the case with the Abit AX78 motherboard. _PSV is shared and returns one of two temperature values, TP1H or TP2H, depending on the PLCY value: Or(PLCY, PLCY, Local7).

I also added _ACx and _ALx. I could have defined a better value for _AC0 by using a multiplier like the "FanEQ Temp Tolerance" Abit BIOS option, for instance _AC0 = _PSV * 1.05, or by using the "Temp Tolerance" value directly. But it is not defined as an ACPI variable, and I am not used to the ASL syntax :(
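
As a worked example of the _AC0 = _PSV * 1.05 idea, here is a small Python sketch. ACPI trip point objects such as _PSV and _ACx return temperatures in tenths of a kelvin, so the multiplier has to be applied with some care. The sketch assumes the 5% tolerance is meant on the Celsius value and rounds 0 °C to 2732 tenths of a kelvin; the 60 °C passive trip point and the function names are hypothetical, since the real TP1H and TP2H values are not given in this report.

```python
KELVIN_OFFSET_DECI = 2732  # 0 degrees C in tenths of a kelvin (rounded; assumption of this sketch)


def celsius_to_acpi(celsius):
    """Convert degrees Celsius to ACPI units (tenths of a kelvin)."""
    return round(celsius * 10) + KELVIN_OFFSET_DECI


def acpi_to_celsius(deci_kelvin):
    """Convert ACPI units (tenths of a kelvin) to degrees Celsius."""
    return (deci_kelvin - KELVIN_OFFSET_DECI) / 10


def active_trip_from_passive(psv_deci_kelvin, tolerance=1.05):
    """Derive an _AC0 value from _PSV, applying the tolerance on the Celsius scale."""
    psv_celsius = acpi_to_celsius(psv_deci_kelvin)
    return celsius_to_acpi(psv_celsius * tolerance)


if __name__ == "__main__":
    psv = celsius_to_acpi(60)            # hypothetical passive trip point of 60 C
    ac0 = active_trip_from_passive(psv)  # 5% above it, on the Celsius scale
    print(psv, ac0, acpi_to_celsius(ac0))
```

Multiplying the raw ACPI value instead (3332 * 1.05, about 3499) would correspond to roughly 76.7 °C rather than 63 °C, so the scale on which the tolerance is applied matters.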

I still believe Linux thermal should be more flexible: it should accept that the _PSV value can vary, and treat a change of the ACPI thermal trip point state not as an error but as an ordinary warning, re-evaluating the trip points when the AML code issues Notify(thermal_zone, 0x81).
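
To illustrate the behaviour I have in mind, here is a toy Python model. It is not the Linux thermal driver; the class, the `firmware` dictionary, the mapping of PLCY to TP1H/TP2H, and the temperature values are all assumptions made for the sketch. Only the notification codes follow the ACPI spec (0x80 temperature changed, 0x81 trip points changed, 0x82 device lists changed).

```python
NOTIFY_TEMPERATURE_CHANGED = 0x80
NOTIFY_TRIP_POINTS_CHANGED = 0x81
NOTIFY_DEVICE_LISTS_CHANGED = 0x82

# Hypothetical firmware state: PLCY selects which value _PSV returns.
firmware = {"PLCY": 0, "TP1H": 3332, "TP2H": 3432}  # tenths of a kelvin, made-up values


def evaluate_psv():
    """Stand-in for evaluating _PSV in the firmware (the PLCY mapping is assumed)."""
    return firmware["TP2H"] if firmware["PLCY"] else firmware["TP1H"]


class ThermalZone:
    """Toy OS-side view of a thermal zone that caches its passive trip point."""

    def __init__(self, evaluate_passive):
        self._evaluate_passive = evaluate_passive
        self.passive_trip = evaluate_passive()  # cached at load time
        self.reevaluations = 0

    def notify(self, event):
        # On "trip points changed", re-read _PSV instead of reporting an error.
        if event == NOTIFY_TRIP_POINTS_CHANGED:
            self.passive_trip = self._evaluate_passive()
            self.reevaluations += 1


if __name__ == "__main__":
    zone = ThermalZone(evaluate_psv)
    firmware["PLCY"] = 1                       # firmware switches its temperature source
    print(zone.passive_trip)                   # still the cached value
    zone.notify(NOTIFY_TRIP_POINTS_CHANGED)
    print(zone.passive_trip)                   # refreshed after the notification
```

The point is only that the cached value goes stale when the firmware changes its policy, and that the 0x81 notification is the hook for refreshing it.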

Besides, I read (probably in an lm_sensors forum archive) that reading _RTMP is not always accurate and leads to unexpected behaviour if the value exceeds the _CRT one. It is preferable to disable _CRT in the thermal kernel module with the nocrt option. Otherwise the computer will shut down for no real reason. It happened to me.

In case it helps someone, here is the dsdl.dsl.diff file I'm using now.

PS: I abandoned the idea of creating a new "group" of components to be managed along with the other available fans. I was unable to create a working fan device for it :(.
Comment 13 Jimmy.Jazz 2008-07-14 06:51:12 UTC
Created attachment 16811 [details]
DSDT corrected

It could definitely be improved ;)
