Bug 7934
Summary: | Dual-core CPU: Different info for its cores, missing C-states (including C1) | ||
---|---|---|---|
Product: | ACPI | Reporter: | Pavel Troller (patrol) |
Component: | Config-Processors | Assignee: | acpi_acpica-core (acpi_acpica-core) |
Status: | REJECTED UNREPRODUCIBLE | ||
Severity: | normal | CC: | acpi-bugzilla |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.20 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | dmesg and acpidump output |
Description
Pavel Troller
2007-02-05 00:38:42 UTC
Created attachment 10283
dmesg and acpidump output
I just verified that not only is different info given for CPU1 and CPU2, but, most importantly, while the limit and throttling files of CPU1 can read/set its settings, both files read as <not supported> for CPU2. I also grepped my old logs and found that while C1 disappeared in 2.6.19, the throttling/limit interface for the second core apparently didn't work in earlier kernels either.

I've found that when the CPU gets hot due to high load, the kernel logs this:

    Feb 6 06:04:16 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 1)
    Feb 6 06:04:45 arcus kernel: Machine check events logged
    Feb 6 06:07:51 arcus kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 1)
    Feb 6 06:09:16 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 33925)
    Feb 6 06:09:45 arcus kernel: Machine check events logged
    Feb 6 06:14:16 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 44822)
    Feb 6 06:14:20 arcus kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 17)
    Feb 6 06:14:45 arcus kernel: Machine check events logged
    Feb 6 06:19:54 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 61607)
    Feb 6 06:24:45 arcus kernel: Machine check events logged
    Feb 6 06:28:14 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 67911)
    Feb 6 06:29:45 arcus kernel: Machine check events logged
    Feb 6 06:31:34 arcus kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 19)
    Feb 6 06:33:14 arcus kernel: CPU1: Temperature/speed normal
    Feb 6 06:34:45 arcus kernel: Machine check events logged
    Feb 6 06:39:41 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 151883)
    Feb 6 06:39:45 arcus kernel: Machine check events logged
    Feb 6 06:46:47 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 154679)
    Feb 6 06:49:45 arcus kernel: Machine check events logged
    Feb 6 06:51:47 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 179484)
    Feb 6 06:52:28 arcus kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 31)
    Feb 6 06:54:45 arcus kernel: Machine check events logged

I see the following important things:
- Although the CPU load ended a while ago (even the fan runs idle now), there was just ONE message (for one CPU) saying that the CPU is back to normal, somewhere in the middle of the throttling messages. Does it mean that the CPUs now remain throttled (or at least that there is an attempt to throttle)?
- They are named CPU0 and CPU1, while ACPI knows them as CPU1 and CPU2! Maybe the confusion comes from there?
- Even during the computation and immediately after the "throttled" message was printed, reading /proc/acpi/processor/CPU1/throttling showed zero throttling, and no visible performance drop was observed. Reading /proc/cpuinfo also showed full clock speed.

Re: comment #3
> CPU1: Temperature above threshold, cpu clock throttled (total events = 1)
> Machine check events logged

This is due to TM1 (or TM2), a mechanism in the processor hardware used to control temperature when ACPI, the fan, and everything else have failed. ACPI doesn't actually know anything about TM1/TM2; these events are supposed to be extremely infrequent and very short in duration. Is the processor cooling device attached properly?

Processor (CPU1, 0x01, 0x00000810, 0x06)
Processor (CPU2, 0x02, 0x00000000, 0x00)
The DSDT shows that the 2nd processor is declared without a PBLK (address, length).
That explains the 2nd processor with:
throttling control: no
limit interface: no
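The reasoning above (no PBLK on CPU2, hence no throttling or limit interface) can be checked mechanically against the two Processor() declarations quoted from the DSDT. This is a hypothetical helper, not kernel code; it assumes, following the ACPI specification, that a usable processor register block needs a non-zero address and a length of 6 bytes, and it deliberately ignores _PTC and the FADT duty-cycle fields that the real driver also consults.

```python
import re
from dataclasses import dataclass

# ASL form quoted in the DSDT excerpt:
#   Processor (Name, ProcessorID, PblkAddress, PblkLength)
_NUM = r"(0x[0-9A-Fa-f]+|\d+)"
_PROC_RE = re.compile(
    r"Processor\s*\(\s*(\w+)\s*,\s*" + _NUM + r"\s*,\s*" + _NUM
    + r"\s*,\s*" + _NUM + r"\s*\)"
)


def _num(text: str) -> int:
    """Parse an ASL integer literal (hex with 0x prefix, else decimal)."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


@dataclass
class ProcessorDecl:
    name: str
    proc_id: int
    pblk_address: int
    pblk_length: int

    @property
    def has_usable_pblk(self) -> bool:
        # Assumption: a PBLK is usable only if its address is non-zero and
        # its length is the 6 bytes the ACPI spec defines (P_CNT, P_LVL2, P_LVL3).
        return self.pblk_address != 0 and self.pblk_length == 6


def parse_processor(decl: str) -> ProcessorDecl:
    """Extract the four Processor() arguments from one line of ASL."""
    m = _PROC_RE.search(decl)
    if m is None:
        raise ValueError(f"not a Processor() declaration: {decl!r}")
    name, pid, addr, length = m.groups()
    return ProcessorDecl(name, _num(pid), _num(addr), _num(length))


for line in ("Processor (CPU1, 0x01, 0x00000810, 0x06)",
             "Processor (CPU2, 0x02, 0x00000000, 0x00)"):
    p = parse_processor(line)
    print(f"{p.name}: PBLK at {p.pblk_address:#x}, usable = {p.has_usable_pblk}")
```

Run against the two declarations, only CPU1 has a usable PBLK, which matches "throttling control: no" and "limit interface: no" being reported for CPU2.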
Also, it seems that the idle code doesn't bother putting any entries
in /proc/acpi/processor/CPU?/power when a processor has no _CST
and no PBLK because it is always using just C1 anyway.
I see this on one of my boxes too.
Indeed, the question is why it bothers to put a C1 entry in there
even for systems with a PBLK, because the entry isn't actually
used by the C1 idle code. So this is just cosmetic.
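The C-state point can be illustrated the same way. This is a simplified, hypothetical sketch of the ACPI legacy fallback for a processor without _CST: C1 (HLT) is always available, while C2 and C3 are entered by reading P_LVL2 at PBLK+4 and P_LVL3 at PBLK+5, and only if the FADT worst-case latencies are within the spec limits (100 µs for C2, 1000 µs for C3). The function name and the FADT latency values in the usage lines are assumptions; the board's actual FADT values are not given in this report.

```python
from typing import Dict, Optional

# ACPI legacy (FADT) limits: a worst-case latency above these values means
# the corresponding C-state must not be used.
MAX_C2_LATENCY_US = 100
MAX_C3_LATENCY_US = 1000


def legacy_cstates(pblk_address: int, pblk_length: int,
                   plvl2_lat_us: int, plvl3_lat_us: int) -> Dict[str, Optional[int]]:
    """Map C-state name to the I/O port read that enters it (None = HLT).

    Hypothetical sketch of the no-_CST fallback; the real idle driver
    applies further checks that are not modelled here.
    """
    states: Dict[str, Optional[int]] = {"C1": None}  # C1 via HLT, always
    if pblk_address == 0 or pblk_length != 6:
        return states  # no usable PBLK (like CPU2 here): C1 only
    if plvl2_lat_us <= MAX_C2_LATENCY_US:
        states["C2"] = pblk_address + 4  # P_LVL2 register
    if plvl3_lat_us <= MAX_C3_LATENCY_US:
        states["C3"] = pblk_address + 5  # P_LVL3 register
    return states


# Hypothetical FADT latencies; the real values for this board are unknown.
print(legacy_cstates(0x810, 6, plvl2_lat_us=1, plvl3_lat_us=1001))
print(legacy_cstates(0x0, 0, plvl2_lat_us=1, plvl3_lat_us=1001))
```

In this model a processor without PBLK or _CST ends up with C1 only, which is why omitting its /proc entry is a display question rather than a functional one.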
> bogomips : 6403.55
> bogomips : 8110.87
This is the only mystery on this system -- though maybe it will
be explained when you figure out why the hardware throttling
is kicking in...
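To see which state the hardware throttling messages last reported for each core, the syslog excerpt from comment #3 can be summarised per CPU. This is a minimal, hypothetical sketch (the function name and regular expressions are illustrative, matched to the message text quoted above). It only reflects what was logged; it cannot tell whether a core is still throttled, which is exactly the ambiguity raised about the single "Temperature/speed normal" message.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

THROTTLED = re.compile(
    r"CPU(\d+): Temperature above threshold, cpu clock throttled "
    r"\(total events = (\d+)\)")
NORMAL = re.compile(r"CPU(\d+): Temperature/speed normal")


def last_state(lines: Iterable[str]) -> Dict[int, Tuple[str, Optional[int]]]:
    """Map CPU number -> (last logged state, last logged total-event count)."""
    state: Dict[int, Tuple[str, Optional[int]]] = {}
    for line in lines:
        if m := THROTTLED.search(line):
            state[int(m.group(1))] = ("throttled", int(m.group(2)))
        elif m := NORMAL.search(line):
            cpu = int(m.group(1))
            # Keep the previous event count; the "normal" message has none.
            prev = state.get(cpu, ("normal", None))[1]
            state[cpu] = ("normal", prev)
    return state


log = [
    "Feb 6 06:31:34 arcus kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 19)",
    "Feb 6 06:33:14 arcus kernel: CPU1: Temperature/speed normal",
    "Feb 6 06:51:47 arcus kernel: CPU1: Temperature above threshold, cpu clock throttled (total events = 179484)",
    "Feb 6 06:52:28 arcus kernel: CPU0: Temperature above threshold, cpu clock throttled (total events = 31)",
]
print(last_state(log))
```

Fed the full excerpt, both cores' last logged message is a throttling one, so the log alone suggests that they were left throttled even though /proc/acpi and /proc/cpuinfo showed no slowdown.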
Regarding the missing PBLK: is it OK, or does the DSDT need to be fixed? Regarding TM1/TM2: ACPI doesn't provide any thermal zones, but there is a hardware mechanism (which can be set up in the BIOS as "target temperature") that substantially increases the CPU fan speed when the CPU is hot. It is working perfectly. I've inspected the fans: even the PSU fan is working and sucking the hot air out of the case. The cooler is well seated, but maybe a bit dusty. I'll shut the system down and clean it soon.

Pavel, do you still have this issue on the latest kernel? I searched and found that there is a new BIOS upgrade from MSI for this mobo. Maybe you want to try it first.

Hi Pavel, will you please try the latest kernel and check whether the problem still exists after the BIOS is updated? Thanks.

The only ACPI problem I see here is the cosmetic part about C1 not being displayed. The real problem seems to be that the hardware thermal throttling is kicking in, and that has nothing to do with ACPI. Please re-open if there is still a problem seen using software from this year.