Bug 13259 - BIOS _TPC always forces CPU to T3 - Clevo M720R w/ Intel T8100 CPU
Status: CLOSED CODE_FIX
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: All Linux
Importance: P1 normal
Assigned To: acpi_bios
Reported: 2009-05-06 19:58 UTC by James
Modified: 2009-05-31 20:37 UTC (History)
Tree: Fedora
Regression: No


Attachments
Clevo M720R acpidump (139.30 KB, text/plain), 2009-05-06 19:58 UTC, James
/proc/acpi info from cold (4.04 KB, text/plain), 2009-05-06 19:58 UTC, James
/proc/acpi info following suspend/resume (4.04 KB, text/plain), 2009-05-06 19:59 UTC, James
A dmesg log (56.71 KB, text/plain), 2009-05-06 19:59 UTC, James
patch: ignore _TPC (1.54 KB, patch), 2009-05-07 06:39 UTC, Zhang Rui
patch: reset t-state once it's invalid (886 bytes, patch), 2009-05-08 02:17 UTC, Zhang Rui
remove the superfluous warning messages (958 bytes, patch), 2009-05-26 02:03 UTC, Zhang Rui

Description James 2009-05-06 19:58:15 UTC
Created attachment 21249 [details]
Clevo M720R acpidump

My Clevo M720R notebook has an Intel Core 2 Duo T8100 processor. Upon boot, Core 1 shows throttling states T0--T7 available and operates in T0, with full performance. Core 0, on the other hand, has only T3--T7 available and is stuck in T3, and its performance suffers noticeably. There is no thermal reason for this; the CPU is well within its safe temperature zone. If I boot with acpi=off, this problem does not manifest.

After a suspend/resume cycle, both cores end up in T8 (outside the reported available range), but both have full performance. Another difference after suspend/resume is that CPU0 goes from SMI thermal monitoring to TM2 (Core 1 is TM2 throughout).

This is with kernel-2.6.29.2-129.fc11.x86_64, but I believe I have seen this as far back as roughly 2.6.24, IIRC (the first kernel I used with this notebook).

Attached are the acpidump output, /proc/acpi info before and after suspend/resume, and a dmesg log for a session with a suspend/resume cycle.
Comment 1 James 2009-05-06 19:58:42 UTC
Created attachment 21250 [details]
/proc/acpi info from cold
Comment 2 James 2009-05-06 19:59:01 UTC
Created attachment 21251 [details]
/proc/acpi info following suspend/resume
Comment 3 James 2009-05-06 19:59:18 UTC
Created attachment 21252 [details]
A dmesg log
Comment 4 ykzhao 2009-05-07 01:51:43 UTC
will you please attach the output of acpidump?
    thanks.
Comment 5 ykzhao 2009-05-07 01:52:32 UTC
Sorry that the acpidump is already attached.
Please ignore the comment #4.
thanks.
Comment 6 ykzhao 2009-05-07 01:58:07 UTC
Will you please attach the output of "cat /proc/acpi/thermal_zone/*/*"?
    Thanks.
Comment 7 ykzhao 2009-05-07 03:13:31 UTC
Hi, James
    From the acpidump it seems that this issue is related to the BIOS. There is a _TPC object for the CPU, and the _TPC object defines the processor throttling limit. The throttling limit is applied whenever the T-state is updated. (In the boot phase the T-state is changed according to the current T-state.)
   > CPU0. Name (_TPC, 0x03). This means that the active T-state must be equal to or greater than T3.
   
   So the info in comment #1 reflects the correct T-state.

   After the suspend/resume cycle, the CPU T-state may have been configured by the BIOS. Reading /proc/acpi/processor/*/throttling only obtains the current throttling state; it does not update it, so the throttling limit is not applied in that case.
   From the info in comment #2 we can see that T8 is an incorrect T-state: it is already beyond the T-state limit.
   
   The CPU will still use T3 when we try to change the T-state for CPU0 (echo T0 > /proc/acpi/processor/CPU0/throttling). When the T-state is changed, the T-state limit is taken into account, so the active T-state will be equal to or greater than T3.
   
    So IMO this is a BIOS bug, and it is best fixed by upgrading the BIOS.
    Thanks.
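
As a rough illustration of the limit described above, here is a minimal sketch in C. It is a simplified model, not the driver code: the function name and parameters are made up for illustration, and it assumes that a request below the _TPC value simply ends up at the limit, as observed on this box.

```c
/* Hypothetical model of the _TPC limit.
 * tpc_limit:   value returned by _TPC (0x03 for CPU0 in this acpidump)
 * state_count: number of T-states (8 on this box)
 * None of these names are taken from the kernel source. */
static int clamp_tstate(int requested, int tpc_limit, int state_count)
{
    if (requested < tpc_limit)
        requested = tpc_limit;          /* states below the limit are not allowed */
    if (requested > state_count - 1)
        requested = state_count - 1;    /* deepest state is T(state_count - 1) */
    return requested;
}
```

With the values from this report (_TPC = 3, eight T-states), a request for T0 ends up at T3, while a request for T5 is left alone.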
Comment 8 Zhang Rui 2009-05-07 05:54:38 UTC
I don't understand why _TPC is implemented for CPU0 only,
and it has a hard-coded value 0x03.

All these suggest that you are using a BIOS that is not well implemented.
And I agree with Yakui that this is a BIOS bug.

But anyway, I think it is a good idea to allow enabling/disabling throttling control in the processor driver, either by kernel config options or by module parameters.

I'll attach a patch later.
Comment 9 Zhang Rui 2009-05-07 06:39:49 UTC
Created attachment 21253 [details]
patch: ignore _TPC

please apply this patch and
1. build in ACPI processor driver and boot with processor.ignore_tpc=1
or
2. build ACPI processor driver as a module and load it with parameter ignore_tpc=1.

please verify if this patch helps.
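
The attached patch is not reproduced here. Conceptually, the parameter could act as in the following sketch, where a set flag makes the driver behave as if the BIOS had reported no limit. All names except ignore_tpc are hypothetical, and the real patch may be structured differently.

```c
#include <stdbool.h>

/* Stand-in for the ignore_tpc parameter named above (assumed to default to off). */
static bool ignore_tpc = false;

/* Returns the lowest T-state index the driver would allow,
 * given the value the BIOS reports through _TPC. */
static int effective_tpc_limit(int bios_tpc)
{
    return ignore_tpc ? 0 : bios_tpc;
}
```

With ignore_tpc unset and _TPC = 3, the lowest allowed state is T3; with ignore_tpc set, it is T0.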
Comment 10 James 2009-05-07 09:11:09 UTC
(In reply to comment #6)
> Will you please attach the output of "cat /proc/acpi/thermal_zone/*/*"?
>     Thanks.

$ cat /proc/acpi/thermal_zone/*/*
0 - Active; 1 - Passive
<polling disabled>
state:                   ok
temperature:             28 C
critical (S5):           155 C

(This is about 2 minutes after switching on, having been left off overnight...)

I'll try the patch as soon as possible and report back. My notebook's original supplier has not mentioned any BIOS updates, although there are some around for the same chassis; none of the changelogs mention anything about throttling bugfixes. I'll send some e-mails and see what I can get.
Comment 11 James 2009-05-07 22:45:41 UTC
The patch worked as intended. Straight after booting:

[james@rhapsody ~]$ cat /proc/acpi/processor/CPU0/throttling 
state count:             8
active state:            T0
state available: T0 to T7
states:
   *T0:                  100%
    T1:                  88%
    T2:                  75%
    T3:                  63%
    T4:                  50%
    T5:                  38%
    T6:                  25%
    T7:                  13%

I'm not sure I understand what was meant in Comment #7 about the T8 state after suspend/resume, though. It still ends up in T8 after an S3 cycle:

[james@rhapsody ~]$ cat /proc/acpi/processor/CPU0/throttling 
state count:             8
active state:            T8
state available: T0 to T7
states:
    T0:                  100%
    T1:                  88%
    T2:                  75%
    T3:                  63%
    T4:                  50%
    T5:                  38%
    T6:                  25%
    T7:                  13%
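
The percentages in the listings above are consistent with eight equally spaced duty-cycle steps of 12.5% each, rounded to the nearest integer. The sketch below reproduces the numbers under that assumption; it is an observation about the output, not a statement of how the driver derives the values. The function name is made up.

```c
/* Duty cycle in percent for T-state `index` out of `count` equally
 * spaced states, rounded to the nearest integer (halves round up). */
static int tstate_percent(int index, int count)
{
    return (100 * (count - index) + count / 2) / count;
}
```

For count = 8 this yields 100, 88, 75, 63, 50, 38, 25 and 13 for T0 through T7.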
Comment 12 ykzhao 2009-05-08 01:44:44 UTC
Hi, James
    Thanks for the test.
    With the boot option "processor.ignore_tpc" added, the T-state limit is ignored. In that case CPU0 is not set to the T3 state in the boot phase.

    For the suspend/resume issue: 
    Maybe I did not explain it very clearly in comment #7. The command "cat /proc/acpi/processor/CPU0/throttling" re-obtains the current T-state. On this box that is done with the following steps:
    a. Read the T-state MSR status register.
    b. If the status value can be found in the _TSS package, the correct T-state is obtained. Otherwise an invalid T-state is returned.
    
    Unfortunately, the T-state status value is not found in the _TSS package after suspend/resume, so the reported state is beyond the available T-state range.

    thanks.
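
Steps a and b can be sketched roughly as follows. The struct is a simplified stand-in for a _TSS entry, the register read is replaced by a plain argument, and all names are illustrative rather than taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one _TSS entry; only the control value matters here. */
struct tss_entry {
    uint64_t control;
};

/* Returns the index of the entry whose control value matches the value
 * read from the status register, or -1 if no entry matches. */
static int find_tstate(const struct tss_entry *tss, size_t count, uint64_t value)
{
    for (size_t i = 0; i < count; i++) {
        if (tss[i].control == value)
            return (int)i;
    }
    return -1;
}
```

If the register holds a value that the BIOS never listed in _TSS, as seems to happen here after resume, the lookup has nothing valid to return.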
Comment 13 Zhang Rui 2009-05-08 02:17:45 UTC
Created attachment 21270 [details]
patch: reset t-state once it's invalid

please apply this patch on top of the previous one and see if it helps.
Comment 14 James 2009-05-08 13:14:40 UTC
(In reply to comment #13)
> Created an attachment (id=21270) [details]
> patch: reset t-state once it's invalid
> 
> please apply this patch on top of the previous one and see if it helps.

It doesn't appear to have made any difference; the processor is still in T8 after resume. I'm sure it applied correctly... I see no "Invalid throttling state, reset" messages.
Comment 15 James 2009-05-08 13:34:54 UTC
What if instead of "if (state == -1)", the test were

    if ((state < pr->throttling_platform_limit) ||
        (state >= pr->throttling.state_count))

?
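
Written as a standalone predicate, the proposed test looks like the sketch below. The function and parameter names are illustrative; platform_limit and state_count stand in for pr->throttling_platform_limit and pr->throttling.state_count.

```c
#include <stdbool.h>

/* True when `state` is not a usable T-state index: either below the
 * platform limit or at/after the end of the state table. */
static bool tstate_out_of_range(int state, int platform_limit, int state_count)
{
    return state < platform_limit || state >= state_count;
}
```

This form rejects both -1 and an out-of-range value such as 8 when there are eight states.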
Comment 16 ykzhao 2009-05-08 14:06:57 UTC
Hi, James
    The patch in comment #13 is based on the latest upstream kernel.
    Will you please apply the patch on the latest upstream kernel(2.6.30-rc4) and see whether the issue still exists?
    If you use the 2.6.29.xx kernel, what you have done in comment #15 is also OK.
    Thanks.
Comment 17 James 2009-05-08 20:39:29 UTC
(In reply to comment #16)
> Hi, James
>     The patch in comment #13 is based on the latest upstream kernel.
>     Will you please apply the patch on the latest upstream kernel(2.6.30-rc4)
> and see whether the issue still exists?

It worked on that kernel, T0 after resume.

>     If you use the 2.6.29.xx kernel, what you have done in comment #15 is also
> OK.
>     Thanks.

I'll give this a go since 2.6.30 is unstable in other places for me at the moment.
Comment 18 James 2009-05-09 08:58:02 UTC
Also, the modification from Comment #15 works OK on 2.6.29.
Comment 19 Zhang Rui 2009-05-11 01:31:31 UTC
Hah, I see.
That's because I made the patch based on 2.6.30-rc,
i.e. after the following patch was merged.

commit 53af9cfb37af5e03ee2b24c5d5c4963c34e5b765
Author: Len Brown <lenb@kernel.org>

diff --git a/drivers/acpi/processor_throttling.c b/drivers/acpi/processor_throttling.c
index d278381..5f09901 100644
--- a/drivers/acpi/processor_throttling.c
+++ b/drivers/acpi/processor_throttling.c
@@ -783,11 +783,9 @@ static int acpi_get_throttling_state(struct acpi_processor *pr,
                    (struct acpi_processor_tx_tss *)&(pr->throttling.
                                                      states_tss[i]);
                if (tx->control == value)
-                       break;
+                       return i;
        }
-       if (i > pr->throttling.state_count)
-               i = -1;
-       return i;
+       return -1;
 }

so my patch also works if you apply it on top of the latest git kernel, right?

patches are available.
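
To see why the two kernels behave differently, the lookup before and after the commit above can be modelled as below. This is a sketch, not the driver code: it assumes the loop runs i from 0 to state_count - 1 (the diff context does not show the loop header), and the array of control values is a simplification.

```c
#include <stdint.h>

/* Model of the lookup before the commit: on a miss the loop ends with
 * i == count, and the "i > count" test can never be true, so count
 * itself is returned (8 with eight states, which would show up as T8). */
static int lookup_old(const uint64_t *control, int count, uint64_t value)
{
    int i;

    for (i = 0; i < count; i++) {
        if (control[i] == value)
            break;
    }
    if (i > count)
        i = -1;
    return i;
}

/* Model of the lookup after the commit: a miss returns -1. */
static int lookup_new(const uint64_t *control, int count, uint64_t value)
{
    for (int i = 0; i < count; i++) {
        if (control[i] == value)
            return i;
    }
    return -1;
}
```

Under that assumption, a test for "state == -1" only fires with the new lookup, which would match the results reported in comments #14 and #17.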
Comment 20 Zhang Rui 2009-05-11 03:21:31 UTC
(In reply to comment #19)
> 
> patches are available.

which is also known as
http://patchwork.kernel.org/patch/22833/
http://patchwork.kernel.org/patch/22834/
Comment 21 Len Brown 2009-05-16 03:07:22 UTC
patches in comment #20 applied to acpi tree
Comment 22 Len Brown 2009-05-18 16:23:45 UTC
workaround for this _TPC BIOS bug shipped in Linux-2.6.30-rc6-git1
Comment 23 Frans Pop 2009-05-21 20:39:24 UTC
Mostly just for your information.

I have a HP 2510p notebook and am now getting this new warning after upgrading to 2.6.30-rc6 (x86_64). From dmesg:
<snip>
ACPI: AC Adapter [C23B] (on-line)
input: Power Button as /class/input/input4
ACPI: Power Button [PWRF]
input: Sleep Button as /class/input/input5
ACPI: Sleep Button [C2BF]
input: Lid Switch as /class/input/input6
ACPI: Lid Switch [C155]
ACPI: SSDT 000000007e7dbd42 0027F (v01 HP      Cpu0Ist 00003000 INTL 20060317)
ACPI: SSDT 000000007e7dc046 005FA (v01 HP      Cpu0Cst 00003001 INTL 20060317)
ACPI Warning (processor_throttling-0843): Invalid throttling state, reset
 [20090320]
Monitor-Mwait will be used to enter C-1 state
Monitor-Mwait will be used to enter C-2 state
Marking TSC unstable due to TSC halts in idle
ACPI: CPU0 (power states: C1[C1] C2[C2])
processor ACPI_CPU:00: registered as cooling_device7
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: SSDT 000000007e7dbc7a 000C8 (v01 HP      Cpu1Ist 00003000 INTL 20060317)
ACPI: SSDT 000000007e7dbfc1 00085 (v01 HP      Cpu1Cst 00003000 INTL 20060317)
ACPI Warning (processor_throttling-0843): Invalid throttling state, reset
 [20090320]
ACPI: CPU1 (power states: C1[C1] C2[C2])
processor ACPI_CPU:01: registered as cooling_device8
ACPI: Processor [CPU1] (supports 8 throttling states)
</snip>

Is this expected or cause for concern?
Is there any debugging I can do to check whether the warning is really valid or not?

Cheers,
FJP
Comment 24 Zhang Rui 2009-05-22 01:24:25 UTC
This warning message is printed because the t-state we get from the hardware is not one of the known ones. In this case, we reset the t-state to t0.
So, this is a bug we should be aware of, but at the same time, we can handle it well in the processor driver.

> Is there any debugging I can do to check whether the warning is really valid or not?
I don't think it's worth doing. :)
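
The handling described here can be sketched as follows. This is a simplified model with a made-up function name; it assumes -1 marks an unrecognised state and leaves out the write back to the hardware.

```c
#include <stdio.h>

/* Takes the looked-up state (-1 meaning "not one of the known ones")
 * and returns the state the driver would continue with. */
static int sanitize_tstate(int looked_up_state)
{
    if (looked_up_state == -1) {
        fprintf(stderr, "Invalid throttling state, reset\n");
        return 0;   /* fall back to T0 */
    }
    return looked_up_state;
}
```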
Comment 25 Frans Pop 2009-05-22 08:35:40 UTC
The analysis in the bug report was that the cause is a buggy BIOS. As long as we're talking about one single (relatively obscure?) system that seems fine.

However, the problem now also shows up for a completely different system from a different manufacturer with probably a completely different BIOS. A system which has never shown any problems in this area, at least not that I have noticed.

My question is: are you still 100% sure that this *is* a BIOS or hardware problem, or is that new info reason to reconsider and examine if maybe after all there is an error somewhere in the kernel itself that results in these invalid readings?

The Clevo had a Phoenix BIOS. From dmidecode for my system:
BIOS Information
        Vendor: Hewlett-Packard
        Version: 68MSP Ver. F.0C
        Release Date: 06/18/2008

P.S. I've just sent a patch to remove the spurious newline from the warning message: the ACPI_CA_VERSION should not be printed on a separate line.
Comment 26 Maciej Rutecki 2009-05-25 17:48:33 UTC
(In reply to comment #25)
> The analysis in the bug report was that the cause is a buggy BIOS. As long as
> we're talking about one single (relatively obscure?) system that seems fine.
> 
> However, the problem now also shows up for a completely different system from a
> different manufacturer with probably a completely different BIOS. A system
> which has never shown any problems in this area, at least not that I have
> noticed.
> 
> My question is: are you still 100% sure that this *is* a BIOS or hardware
> problem, or is that new info reason to reconsider and examine if maybe after
> all there is an error somewhere in the kernel itself that results in these
> invalid readings?

I also have doubts. I see the message "ACPI Warning (processor_throttling-0843): Invalid throttling state, reset" in 2.6.30-rc7, and a Google search shows that many people have the same. In 2.6.30-rc6, after s2ram, I had:

maciek@gumis:~$ cat /proc/acpi/processor/*/throttling
state count:             8
active state:            T0
state available: T0 to T7
states:
   *T0:                  100%
    T1:                  88%
    T2:                  75%
    T3:                  63%
    T4:                  50%
    T5:                  38%
    T6:                  25%
    T7:                  13%
state count:             8
active state:            T-1
state available: T0 to T7
states:
    T0:                  100%
    T1:                  88%
    T2:                  75%
    T3:                  63%
    T4:                  50%
    T5:                  38%
    T6:                  25%
    T7:                  13%

I never had any problems before.

Something interesting: in 2.6.30-rc7, whenever I run cat /proc/acpi/processor/*/throttling, I see in dmesg:

[ 1467.264416] ACPI Warning (processor_throttling-0843): Invalid throttling state, reset
[ 1467.264429]  [20090320]
[ 1467.264816] ACPI Warning (processor_throttling-0843): Invalid throttling state, reset
[ 1467.264827]  [20090320]
[ 1531.644406] ACPI Warning (processor_throttling-0843): Invalid throttling state, reset
[ 1531.644419]  [20090320]
[ 1531.646978] ACPI Warning (processor_throttling-0843): Invalid throttling state, reset
[ 1531.646988]  [20090320]

One warning for each processor, for each "cat...". It never happened before.
Comment 27 Frans Pop 2009-05-25 20:17:32 UTC
Confirmed. 'cat /proc/acpi/processor/*/throttling' triggers the warning for me too (twice: once for each core of my Intel Core Duo).
Comment 28 ykzhao 2009-05-26 00:28:43 UTC
Hi, Frans
    Thanks for the confirmation and the additional info.
    What Rui said in comment #24 is right. The message is printed because the obtained T-state is outside the range of available T-states. 
    It is harmless, although it is confusing.
    Maybe this should be handled directly by the kernel without printing such a message.
    thanks.
Comment 29 Zhang Rui 2009-05-26 02:03:06 UTC
Created attachment 21553 [details]
remove the superfluous warning messages

hmm, what about this incremental patch?
Comment 30 Frans Pop 2009-05-26 10:22:14 UTC
I'm still not happy with this. Here's some additional info.

If I boot the system with -rc5, /proc/acpi/processor/CPU*/throttling correctly shows the active state as T0 for both processors.
This means that my system does NOT have the same problem as originally reported in this BR, as that system really did show an invalid state (T8 with only T0-T7 supported).

If I boot with -rc6, I get the new warning *every time* I do
   cat /proc/acpi/processor/CPU*/throttling
despite the fact that you supposedly reset the value to a valid throttling state.

I then tried a manual change of the throttling state as follows:
   echo -n T4 >/proc/acpi/processor/CPU0/throttling
   echo -n T4 >/proc/acpi/processor/CPU1/throttling
   cat /proc/acpi/processor/CPU*/throttling
   echo -n T0 >/proc/acpi/processor/CPU0/throttling
   echo -n T0 >/proc/acpi/processor/CPU1/throttling
   cat /proc/acpi/processor/CPU*/throttling
Both 'cat' statements show the correct active state (T4 resp. T0) and after the changes to T4 the warning no longer triggers.

If I just 'echo T0' for both processors without first changing to T4, the warning will still be displayed. Apparently a "real" state change is needed to avoid the warning.
That also means that your "reset to T0" is not seen as a real state change (otherwise it should also prevent further warnings) which again seems to confirm that the initial state after boot is not incorrect.

I'm still convinced that there are two *different* issues here: the original bug where a machine actually reported an invalid state, and a separate issue that causes the warning to trigger even though the initial state after boot is correctly at T0.

I will see if I can add some debug printks to find out exactly what is going on here.
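
One hypothesis that would be consistent with these observations, though it is not confirmed anywhere in this report, is that the set path returns early when the requested state equals the state the driver has cached. The toy model below shows the effect; the struct, its fields and the function are all hypothetical.

```c
/* Toy model of one CPU's throttling bookkeeping. */
struct cpu_model {
    int hw_state;       /* what a read from the hardware yields; -1 = unrecognised */
    int cached_state;   /* what the driver believes the state to be */
};

/* Returns 1 if the hardware was written, 0 if the request was skipped. */
static int set_tstate(struct cpu_model *cpu, int target)
{
    if (target == cpu->cached_state)
        return 0;               /* assumed shortcut: "already in that state" */
    cpu->hw_state = target;     /* a real state change reaches the hardware */
    cpu->cached_state = target;
    return 1;
}
```

In this model, with the cached state at T0 and an unrecognised value in hardware, a request for T0 changes nothing, while going to T4 and back to T0 leaves the hardware in a recognised state.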
Comment 31 Frans Pop 2009-05-31 20:37:26 UTC
A few days ago I opened a new BR to track the possible regression discussed here: http://bugzilla.kernel.org/show_bug.cgi?id=13389.

There is at least a problem with the reset as no actual reset takes place.
Follow up to the new BR please.
