Bug 7934 - Dual-core CPU: Different info for its cores, missing C-states (including C1)
Status: REJECTED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Processors
Hardware: i386 Linux
Importance: P2 normal
Assignee: acpi_acpica-core@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-05 00:38 UTC by Pavel Troller
Modified: 2008-01-06 21:13 UTC

See Also:
Kernel Version: 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg and acpidump output (26.91 KB, application/x-tbz)
2007-02-05 00:46 UTC, Pavel Troller

Description Pavel Troller 2007-02-05 00:38:42 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.18
Distribution: Sinux 8.0 (private distro with vanilla kernel)
Hardware Environment: MSI MS-7210 motherboard with Pentium-D CPU
Software Environment: BIOS Information (from dmidecode)
                Vendor: American Megatrends Inc.
                Version: 080012
                Release Date: 11/24/2005

Problem Description: 
1) Different info shown for CPU cores:
patrol@arcus:~$ cat /proc/acpi/processor/CPU1/info
processor id:            0
acpi id:                 1
bus mastering control:   no
power management:        no
throttling control:      yes
limit interface:         yes
patrol@arcus:~$ cat /proc/acpi/processor/CPU2/info
processor id:            1
acpi id:                 2
bus mastering control:   no
power management:        no
throttling control:      no
limit interface:         no
2) Totally missing C-states, CPUs reported as being permanently in C0 
(running):
patrol@arcus:~$ cat /proc/acpi/processor/CPU*/power
active state:            C0
max_cstate:              C8
bus master activity:     00000000
maximum allowed latency: 2000 usec
states:
active state:            C0
max_cstate:              C8
bus master activity:     00000000
maximum allowed latency: 2000 usec
states:
3) (maybe not ACPI-related)
Second core displays much higher performance than the first
patrol@arcus:~$ cat /proc/cpuinfo
...
bogomips        : 6403.55
...
bogomips        : 8110.87
(clock is 3.2G, so the first value seems OK).
All these problems occurred in 2.6.19; everything was OK in 2.6.18 and prior.
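
For reference, the per-core difference in item 1 can be made explicit with a small script. This is only a sketch: the helper names `parse_info` and `diff_info` are made up for illustration, and the sample text is the `/proc/acpi/processor/CPU*/info` output quoted above rather than a live read of the files.

```python
def parse_info(text):
    """Parse 'key: value' lines from a /proc/acpi/processor/*/info dump."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def diff_info(a, b, ignore=("processor id", "acpi id")):
    """Return {key: (a_value, b_value)} for fields that differ.

    The id fields are ignored because they are expected to differ per core.
    """
    keys = sorted(set(a) | set(b))
    return {
        k: (a.get(k), b.get(k))
        for k in keys
        if k not in ignore and a.get(k) != b.get(k)
    }


# Sample data copied from the report above.
CPU1_INFO = """\
processor id:            0
acpi id:                 1
bus mastering control:   no
power management:        no
throttling control:      yes
limit interface:         yes
"""

CPU2_INFO = """\
processor id:            1
acpi id:                 2
bus mastering control:   no
power management:        no
throttling control:      no
limit interface:         no
"""

differences = diff_info(parse_info(CPU1_INFO), parse_info(CPU2_INFO))
for key, (first, second) in differences.items():
    print(f"{key}: CPU1={first} CPU2={second}")
```

On a live system the two strings would instead be read from the corresponding files under `/proc/acpi/processor/`.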

Steps to reproduce:
Comment 1 Pavel Troller 2007-02-05 00:46:40 UTC
Created attachment 10283
dmesg and acpidump output
Comment 2 Pavel Troller 2007-02-05 09:13:10 UTC
Just verified that not only is different info given for CPU1 and CPU2 but, 
most importantly, while the limit and throttling files of CPU1 can be read 
and set, both files read as <not supported> for CPU2. I also grepped my old 
logs and found that, while C1 disappeared in 2.6.19, the throttling/limit 
interface for the second core may have been unavailable in earlier kernels 
too.
Comment 3 Pavel Troller 2007-02-05 22:04:31 UTC
I've found that when the CPU gets hot due to high load, the kernel logs 
this:
Feb  6 06:04:16 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 1)
Feb  6 06:04:45 arcus kernel: Machine check events logged
Feb  6 06:07:51 arcus kernel: CPU0: Temperature above threshold, cpu clock 
throttled (total events = 1)
Feb  6 06:09:16 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 33925)
Feb  6 06:09:45 arcus kernel: Machine check events logged
Feb  6 06:14:16 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 44822)
Feb  6 06:14:20 arcus kernel: CPU0: Temperature above threshold, cpu clock 
throttled (total events = 17)
Feb  6 06:14:45 arcus kernel: Machine check events logged
Feb  6 06:19:54 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 61607)
Feb  6 06:24:45 arcus kernel: Machine check events logged
Feb  6 06:28:14 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 67911)
Feb  6 06:29:45 arcus kernel: Machine check events logged
Feb  6 06:31:34 arcus kernel: CPU0: Temperature above threshold, cpu clock 
throttled (total events = 19)
Feb  6 06:33:14 arcus kernel: CPU1: Temperature/speed normal
Feb  6 06:34:45 arcus kernel: Machine check events logged
Feb  6 06:39:41 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 151883)
Feb  6 06:39:45 arcus kernel: Machine check events logged
Feb  6 06:46:47 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 154679)
Feb  6 06:49:45 arcus kernel: Machine check events logged
Feb  6 06:51:47 arcus kernel: CPU1: Temperature above threshold, cpu clock 
throttled (total events = 179484)
Feb  6 06:52:28 arcus kernel: CPU0: Temperature above threshold, cpu clock 
throttled (total events = 31)
Feb  6 06:54:45 arcus kernel: Machine check events logged

I see the following important things:
- Although the CPU load has been over for a while (even the fan runs idle 
now), there was just ONE message (for one CPU) saying that the CPU is 
returning to normal, somewhere in the middle of the throttling messages. Does 
it mean that the CPUs now remain throttled (or at least that there is an 
attempt to throttle them)?
- They are named CPU0 and CPU1, while ACPI knows them as CPU1 and CPU2! 
Maybe the confusion comes from there?
- Even during the computation and immediately after the "Throttled" message 
was printed, reading /proc/acpi/processor/CPU1/throttling showed zero 
throttling, and no visible performance drop was observed. 
Reading /proc/cpuinfo also showed full clock speed.
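
To make the first observation easier to check, here is a rough sketch that counts the throttle and "normal" messages per CPU in a syslog excerpt. The function name `summarize` and the dictionary keys are assumptions chosen for this example; the sample lines are copied from the log above, including the line wrapping.

```python
import re
from collections import defaultdict

# The wrapped log lines break between "cpu clock" and "throttled",
# so \s+ is used to match across the line break as well.
THROTTLE_RE = re.compile(
    r"kernel: (CPU\d+): Temperature above threshold, cpu clock\s+"
    r"throttled \(total events = (\d+)\)"
)
NORMAL_RE = re.compile(r"kernel: (CPU\d+): Temperature/speed normal")


def summarize(log_text):
    """Count throttle and 'normal' messages per CPU name."""
    summary = defaultdict(
        lambda: {"throttle_msgs": 0, "last_total": 0, "normal_msgs": 0}
    )
    for match in THROTTLE_RE.finditer(log_text):
        entry = summary[match.group(1)]
        entry["throttle_msgs"] += 1
        entry["last_total"] = int(match.group(2))
    for match in NORMAL_RE.finditer(log_text):
        summary[match.group(1)]["normal_msgs"] += 1
    return dict(summary)


# A few lines copied from the log in this comment.
SAMPLE_LOG = """\
Feb  6 06:04:16 arcus kernel: CPU1: Temperature above threshold, cpu clock
throttled (total events = 1)
Feb  6 06:07:51 arcus kernel: CPU0: Temperature above threshold, cpu clock
throttled (total events = 1)
Feb  6 06:09:16 arcus kernel: CPU1: Temperature above threshold, cpu clock
throttled (total events = 33925)
Feb  6 06:33:14 arcus kernel: CPU1: Temperature/speed normal
"""

result = summarize(SAMPLE_LOG)
for cpu in sorted(result):
    print(cpu, result[cpu])
```

A CPU whose `throttle_msgs` count is nonzero while `normal_msgs` stays at zero never logged a return to normal in the excerpt, which is the case for CPU0 here.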
Comment 4 Len Brown 2007-02-07 18:43:21 UTC
Re: comment #3

> CPU1: Temperature above threshold, cpu clock throttled (total events = 1)
> Machine check events logged

This is due to TM1 (or TM2), which is a mechanism in the processor
hardware used to control temperature when ACPI, fan, and everything
else have failed.

ACPI doesn't actually know anything about TM1/TM2 -- they are supposed
to be extremely infrequent and very short in duration.
Is the processor cooling device attached properly?
Comment 5 Len Brown 2007-02-07 19:01:21 UTC
        Processor (CPU1, 0x01, 0x00000810, 0x06)
        Processor (CPU2, 0x02, 0x00000000, 0x00)

The DSDT shows the 2nd processor is declared w/o a PBLK (address, length).

That explains the 2nd processor with:
throttling control:      no
limit interface:         no
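
As a rough illustration of how to read those two declarations: the third and fourth arguments of an ASL Processor() statement are the PBLK address and PBLK length. The sketch below pulls them out of disassembled DSDT text and flags processors without a PBLK. The names `ProcessorDecl`, `has_pblk` and `parse_processors` are invented for this example, and treating "address 0 or length 0" as "no PBLK" is an assumption that matches the declarations above.

```python
import re
from typing import NamedTuple


class ProcessorDecl(NamedTuple):
    name: str
    acpi_id: int
    pblk_address: int
    pblk_length: int

    @property
    def has_pblk(self):
        # Assumption: a zero address or zero length means no PBLK is declared.
        return self.pblk_address != 0 and self.pblk_length != 0


DECL_RE = re.compile(
    r"Processor\s*\(\s*(\w+)\s*,\s*(0x[0-9A-Fa-f]+)\s*,"
    r"\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*\)"
)


def parse_processors(dsdt_text):
    """Extract Processor() declarations from disassembled DSDT text."""
    return [
        ProcessorDecl(
            m.group(1),
            int(m.group(2), 16),
            int(m.group(3), 16),
            int(m.group(4), 16),
        )
        for m in DECL_RE.finditer(dsdt_text)
    ]


# The two declarations quoted from this system's DSDT.
DSDT_SNIPPET = """\
        Processor (CPU1, 0x01, 0x00000810, 0x06)
        Processor (CPU2, 0x02, 0x00000000, 0x00)
"""

procs = parse_processors(DSDT_SNIPPET)
for proc in procs:
    status = "PBLK present" if proc.has_pblk else "no PBLK"
    print(f"{proc.name}: acpi id {proc.acpi_id}, {status}")
```

Here CPU1 is reported with a PBLK at 0x810 and CPU2 without one, which lines up with the throttling and limit fields shown for each core.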

Also, it seems that the idle code doesn't bother putting any entries
in /proc/acpi/processor/CPU?/power when a processor has no _CST
and no PBLK because it is always using just C1 anyway.
I see this on one of my boxes too.

Indeed, the question is why it bothers to put a C1 entry in there
even for systems with a PBLK, because the entry isn't actually
used by the C1 idle code.  So this is just cosmetic.

> bogomips        : 6403.55
> bogomips        : 8110.87

This is the only mystery on this system -- though maybe it will
be explained when you figure out why the hardware throttling
is kicking in...
Comment 6 Pavel Troller 2007-02-08 07:19:38 UTC
Regarding the missing PBLK: Is it OK, or does the DSDT need to be fixed?

Regarding TM1/TM2: ACPI doesn't provide any thermal zones, but there is a 
hardware mechanism (which can be set up in the BIOS as "target temperature") 
that increases the CPU fan speed substantially when the CPU is hot. It is 
working perfectly. I've inspected the fans - even the PSU fan is working and 
sucking the hot air out of the case. The cooler is well seated, but maybe a 
bit dusty. I'll shut the system down and clean it soon.
Comment 7 Fu Michael 2007-11-12 17:58:38 UTC
Pavel, do you still have this issue on the latest kernel? 

I searched and found that there is a new BIOS upgrade from MSI for this mobo. Maybe you want to try it first.
Comment 8 ykzhao 2007-12-19 22:00:11 UTC
Hi, Pavel
Will you please try the latest kernel and check whether the problem still exists after the BIOS is updated?
Thanks.
Comment 9 Len Brown 2008-01-06 21:13:44 UTC
The only ACPI problem I see here is the cosmetic part
about C1 not being displayed.

The real problem seems to be that the hardware
thermal throttling is kicking in, and that has
nothing to do with ACPI.

Please re-open if there is still a problem seen
using software from this year.
