Bug 19262 - CPU0 locked at slower speed, regardless of governor on IP35 Pro(Intel P35-ICH9R)
Status: CLOSED DOCUMENTED
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_bios
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-28 19:08 UTC by Jason Lynch
Modified: 2010-12-09 21:02 UTC

See Also:
Kernel Version: 2.6.36-rc5-00226-g050026f
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
output of `cat /proc/cpuinfo` (2.94 KB, text/plain)
2010-09-28 19:08 UTC, Jason Lynch
sysfs information (1018 bytes, text/plain)
2010-09-28 19:10 UTC, Jason Lynch
dmesg with cpufreq.debug=7 (71.92 KB, text/plain)
2010-09-28 19:11 UTC, Jason Lynch
acpidump (95.48 KB, text/plain)
2010-09-30 21:23 UTC, Jason Lynch
dmidecode (10.57 KB, text/plain)
2010-09-30 21:24 UTC, Jason Lynch

Description Jason Lynch 2010-09-28 19:08:46 UTC
Created attachment 31732
output of `cat /proc/cpuinfo`

Hardware: Intel Q6600 2.4 GHz (overclocked to 2.8 GHz)
Software: Gentoo Linux ~amd64, running Linus's HEAD.

Problem description:
The first core appears to be locked at a slower speed. The CPU has two potential speeds: the maximum speed and the maximum speed divided by 1.5. CPU1, CPU2, and CPU3 all operate correctly at 2.8 GHz, while CPU0 remains stuck at 1.87 GHz (2.8 GHz / 1.5). The problem is also observed at stock speeds. I don't remember when it first appeared, but I believe it was 2.6.34.
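
A quick way to compare the cores is to read each CPU's cpufreq limits from sysfs. The following is only a minimal sketch, assuming the standard `cpuinfo_max_freq` and `scaling_max_freq` files are present; the function name `list_cpu_limits` is made up for illustration. A policy limit below the hardware maximum would be consistent with something (for example, passive cooling) capping that CPU.

```shell
#!/bin/sh
# Print the cpufreq policy limit and hardware maximum (in MHz) for each CPU.
# The base directory is a parameter so the function can be pointed at a copy
# of the sysfs tree; it defaults to the live one.
list_cpu_limits() {
    base=${1:-/sys/devices/system/cpu}
    for dir in "$base"/cpu[0-9]*; do
        hw_file=$dir/cpufreq/cpuinfo_max_freq
        limit_file=$dir/cpufreq/scaling_max_freq
        # Skip CPUs without cpufreq files (this also covers an unmatched glob).
        [ -r "$hw_file" ] && [ -r "$limit_file" ] || continue
        hw_khz=$(cat "$hw_file")
        limit_khz=$(cat "$limit_file")
        note=""
        if [ "$limit_khz" -lt "$hw_khz" ]; then
            note=" (capped)"
        fi
        # sysfs reports kHz; integer division is enough for a quick comparison.
        echo "${dir##*/}: limit=$((limit_khz / 1000)) MHz hw_max=$((hw_khz / 1000)) MHz$note"
    done
}

list_cpu_limits "$@"
```

Run without arguments, it inspects the live system; CPUs that do not expose cpufreq are silently skipped.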
Comment 1 Jason Lynch 2010-09-28 19:10:22 UTC
Created attachment 31742
sysfs information
Comment 2 Jason Lynch 2010-09-28 19:11:26 UTC
Created attachment 31752
dmesg with cpufreq.debug=7
Comment 3 Thomas Renninger 2010-09-30 08:51:40 UTC
Looks like the passive temperature threshold is exceeded for CPU0.
Hm, typically there is one thermal sensor/device connected to all CPUs, but if this is related to the problem, CPU0 would have its own thermal device.
Can you provide:
for x in /proc/acpi/thermal_zone/*/*; do
       echo "$x"
       cat "$x"
       echo
done
Comment 4 Jason Lynch 2010-09-30 09:46:37 UTC
Here is the requested information:

/proc/acpi/thermal_zone/THRM/cooling_mode
0 - Active; 1 - Passive

/proc/acpi/thermal_zone/THRM/polling_frequency
<polling disabled>

/proc/acpi/thermal_zone/THRM/state
state:                   ok

/proc/acpi/thermal_zone/THRM/temperature
temperature:             20 C

/proc/acpi/thermal_zone/THRM/trip_points
critical (S5):           255 C
passive:                 0 C: tc1=4 tc2=3 tsp=60 devices=CPU0
Comment 5 Thomas Renninger 2010-09-30 12:35:45 UTC
... and this is the bug, probably BIOS related:

/proc/acpi/thermal_zone/THRM/trip_points
passive:                 0 C: tc1=4 tc2=3 tsp=60 devices=CPU0

Please also attach dmesg and acpidump output.

The boot parameter to work around this issue is:
thermal.psv=50
which, for example, lets the CPU get throttled at 50 C.
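
The check described above can be scripted. This is a rough sketch, not a definitive tool: it assumes the procfs thermal interface and the THRM zone name shown in comment 4, and the helper names `passive_trip_c` and `has_psv_override` are invented for illustration.

```shell
#!/bin/sh
# Read trip_points text on stdin and print the passive trip temperature in C.
# Prints nothing if there is no "passive:" line.
passive_trip_c() {
    awk '$1 == "passive:" { print $2; exit }'
}

# Succeed if the given kernel command line already sets thermal.psv.
has_psv_override() {
    case " $1 " in
        *" thermal.psv="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Zone name THRM is taken from this report; other systems may differ.
trip_file=/proc/acpi/thermal_zone/THRM/trip_points
if [ -r "$trip_file" ]; then
    psv=$(passive_trip_c < "$trip_file")
    if [ "$psv" = 0 ] && ! has_psv_override "$(cat /proc/cmdline)"; then
        echo "passive trip point is 0 C; consider booting with thermal.psv=50"
    fi
fi
```

On systems where the trip_points file does not exist, the script prints nothing.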

There already is a blacklist in drivers/acpi/thermal.c:
static struct dmi_system_id thermal_dmi_table[] __initdata = {
...
        {
         .callback = thermal_psv,
         .ident = "AOpen i915GMm-HFS",
         .matches = {
                DMI_MATCH(DMI_BOARD_VENDOR, "AOpen"),
                DMI_MATCH(DMI_BOARD_NAME, "i915GMm-HFS"),
                },
        },

Also attach dmidecode output. If we do not find a generic kernel workaround or fix, the BIOS can be blacklisted so that the passive trip point is not used by default.

Reassigning to ACPI component -> this is an acpi problem.
Comment 6 Jason Lynch 2010-09-30 21:23:06 UTC
Created attachment 32122
acpidump

I had previously attached dmesg. If there's something I missed, or if I didn't do the acpidump correctly (just ran acpidump with no parameters), let me know.
Comment 7 Jason Lynch 2010-09-30 21:24:29 UTC
Created attachment 32132
dmidecode
Comment 8 Thomas Renninger 2010-10-01 15:12:07 UTC
It is best to go with the boot parameter workaround mentioned above.
There is something rather fishy with the implementation of the thermal trip points. There are some iasl errors/warnings which point to thermal/temp:
DSDT.dsl  5685:         Store (GAHC (Arg0, Arg1), Local4)
Warning  1093 -                   ^ Called method may not always return a value

DSDT.dsl  5723:     Method (GAHC, 2, NotSerialized)
Warning  1088 -                ^ Not all control paths return a value (GAHC)

DSDT.dsl  5798:                         Store (GAHS (0x00), Local6)
Error    4061 -    Called method returns no value ^

The code there does not make much sense to me:
  - The only passive cooling device is CPU0
  - The same values are stored repeatedly:
GAHC():
        Store (0x01, DTAP)
        Stall (0x7F)
        Store (0x6C, DTAP)
        Stall (0x7F)
        Store (DTAP, Local5)
        Stall (0x7F)
        Store (DTAP, Local6)
        Stall (0x7F)
        Store (DTAP, Local7)
-> OK, there is I/O behind this, but it still does not make much sense...

The temperature is also most likely fixed at 20 C, because of the empty method
    Method (GAHS, 1, NotSerialized)
    {
    }

_TMP():
    Store (GAHS (0x00), Local6)
    And (Local6, 0x01, Local6)
    If (LEqual (Local6, 0x01)) {..}
etc., etc....

The thermal ACPI stuff is really totally messed up on this system; the best option is to use thermal.psv=-1 (afaik) to disable passive cooling.

-> Closing resolved documented, not much that can be done from OS side.
Comment 9 Thomas Renninger 2010-10-01 15:15:52 UTC
Eh, you might want to write a tiny patch that blacklists this machine so that passive cooling is not used automatically, as described in comment #5, and send it to the acpi list. You can read the needed info from dmidecode.
If you do this, please let me know and the bug can be set fixed.
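
As a possible shortcut for collecting the board strings, the same DMI fields are usually also exposed under /sys/class/dmi/id. The sketch below prints them in the DMI_MATCH form used by the table quoted in comment 5. It is only an illustration: the function name `dmi_match_lines` is made up, and the output still has to be checked against dmidecode and wrapped in a proper table entry by hand.

```shell
#!/bin/sh
# Print DMI_MATCH lines for the board vendor and board name.
# The directory is a parameter so the function can be pointed at a copy of
# the DMI id files; it defaults to the live sysfs location.
dmi_match_lines() {
    dir=${1:-/sys/class/dmi/id}
    # Fail if either field is missing or unreadable.
    [ -r "$dir/board_vendor" ] && [ -r "$dir/board_name" ] || return 1
    printf '\t\tDMI_MATCH(DMI_BOARD_VENDOR, "%s"),\n' "$(cat "$dir/board_vendor")"
    printf '\t\tDMI_MATCH(DMI_BOARD_NAME, "%s"),\n' "$(cat "$dir/board_name")"
}

dmi_match_lines "$@" || echo "DMI board fields not readable" >&2
```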
Comment 10 Zhang Rui 2010-10-19 00:54:57 UTC
Since _TMP returns a fixed value, I suggest that Jason either update the BIOS to see whether the situation improves, or disable the ACPI thermal driver, because it will never work with this BIOS.
