Bug 6519
Summary: | Athlon XP runs hot: s2k disconnect power saving inoperative | ||
---|---|---|---|
Product: | ACPI | Reporter: | Dave Jenkins (iamaluddite) |
Component: | Config-Processors | Assignee: | Jean Delvare (jdelvare) |
Status: | REJECTED INVALID | ||
Severity: | normal | CC: | acpi-bugzilla, akpm, jdelvare, maneesh |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.16.14 (kernel.org) and all 2.6.15, 2.6.16 Fedora kernels | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | dmesg with 2.6.16.14 |
Description
Dave Jenkins
2006-05-08 13:27:26 UTC
hi could you attach dmesg and /proc/interrupts please ? is a cpufreq driver and governor loaded on these two configurations? If so, please modprobe cpufreq_stats and look in /sys/devices/system/cpu/cpu*/cpufreq/ and stats/ and report what frequency the processor is running at in the working and failure configurations. Is ACPI enabled in the working and failure configurations? If yes, any difference if you boot with "acpi=off"? Also, in ACPI mode, does the temperature reported under /proc/acpi/ match what is reported by lm_sensors? In general, running i2c based lm_sensors at the same time as an ACPI enabled kernel is a bad idea. If ACPI reports the same temperature as i2c, is there any effect on the failing system if the i2c drivers are not loaded? Created attachment 8062 [details]
dmesg with 2.6.16.14
Re: comment #1 dmesg attached. /proc/interrupts (after removing extraneous modules): CPU0 0: 781292 IO-APIC-edge timer 1: 2064 IO-APIC-edge i8042 8: 1 IO-APIC-edge rtc 9: 0 IO-APIC-level acpi 12: 714 IO-APIC-edge i8042 14: 3467 IO-APIC-edge ide0 15: 94 IO-APIC-edge ide1 NMI: 0 LOC: 781247 ERR: 0 MIS: 0 Re: comment #2 I don't believe my hardware supports CPU frequency scaling. The Gnome CPU Frequency Scaling Monitor 2.10.1 reports "CPU frequency scaling unsupported" and states the frequency as 1.92GHz in all configurations and circumstances. I tried modprobe cpufreq_stats but /sys/devices/system/cpu/cpu0/ is an empty directory. ACPI is enabled in both the working and failure configurations. Booting with "acpi=off" did not affect the temperature. Here's the output of /proc/interrupts from 2.6.16.14 with acpi=off: CPU0 0: 480379 IO-APIC-edge timer 1: 1538 IO-APIC-edge i8042 2: 0 XT-PIC cascade 8: 1 IO-APIC-edge rtc 12: 114 IO-APIC-edge i8042 14: 3275 IO-APIC-edge ide0 15: 94 IO-APIC-edge ide1 NMI: 0 LOC: 480321 ERR: 0 MIS: 0 - The 'XT-PIC cascade' line has appeared where previously was 'IO-APIC-level acpi'. The temperature reported under /proc/acpi/ does indeed match what is reported by lm_sensors, plus or minus a degree or so. Booting into 2.6.16.14 without loading the lm_sensors modules did not affect the temperature (as reported by /proc/acpi/thermal_zone/THRM/temperature). Thanks for your comments, sorry to present an entirely negative set of results but I hope these provide some useful info! > Booting with "acpi=off" did not affect the temperature.
Does "acpi=off" also have no effect on the 2.6.14 success case?
If so, you've proven that this issue is independent of ACPI.
> Does "acpi=off" also have no effect on the 2.6.14 success case?
> If so, you've proven that this issue is independent of ACPI.
Aha, I didn't try that but I have now. (I think I was seeing this as a sin of
commission rather than omission!) Results:
Kernel ACPI CPU temp (C)
2.6.14-1.1656_FC4 on 38
2.6.14-1.1656_FC4 off 49
2.6.16.14 on 51
2.6.16.14 off 51
s2k disconnect was enabled in BIOS throughout. I allowed the CPU temperature to
stabilise for several minutes in each case and the machine had been on long
enough for the case to reach its usual temp. The order of testing was different
from the order listed, in case anyone's thinking the results are due to the
machine warming up over the course of the testing.
The difference between 2.6.14 without ACPI (49) and 2.6.16 (51), though small,
is real and reproducible. But obviously the main news here is that 2.6.14 is no
longer cool when acpi=off.
Len, I take it that this means "it isn't ACPI". If so, which subsystem do you think regressed? Thanks. > I take it that this means "it isn't ACPI"
I would have thought the reverse. As Len said, if "acpi=off" had no effect on
the 2.6.14 success case - just as it has no effect on the 2.6.16 failure case -
then we would have established that the issue is independent of ACPI. But it
turns out that "acpi=off" does make a difference with 2.6.14 . Suggesting that
ACPI may not be doing its job properly with 2.6.16 (or some more complex
interaction).
I agree with Dave Jenkins in comment #9, results in comment #7 _do_ indeed suggest that ACPI has its part of responsability in the problem. For completeness about lm_sensors modules, you don't need all what you loaded. w83627hf, i2c-isa, hwmon and hwmon-vid are enough, i2c-viapro, eeprom and i2c-dev are not needed for a minimal configuration. Not that it matters much, as comment #5 seem to exclude a regression on the hwmon side. I also second Len Brown on its lm_sensors vs. ACPI comment #2 (except that the problem isn't limited to i2c). Where ACPI and lm_sensors both report results from the same chip, a race condition exists and problems can happen, although they were only reported on a few specific systems. Dave, just in case, please check for a BIOS update, and give it a try if available. Unless the ACPI folks have an idea of what to try next, and given that the problem can be reproduced at will, I'd suggest a git bisect between 2.6.14 and 2.6.15. Thanks for your comments, Jean. I haven't run into conflicts between ACPI and i2c so far; I use lm_sensors only to feed Gnome Sensors Applet so I'll look out for an ACPI-aware equivalent. I have looked for BIOS updates, there's nothing that seems directly relevant. The latest update is described as "System will auto power on when plug power code" so I suppose that could be ACPI-related, but doesn't appear to be anything to do with s2k bus disconnect. My general attitude to BIOS updates is "if it ain't broke, don't flash it", especially since a friend turned a working motherboard into a knobbly tea tray recently. Given that everything works fine with 2.6.14, I would need some persuading. I don't want to be uncooperative but to be honest I would rather stick with 2.6.14, which is what I'm running currently without problems, than flash the BIOS. Yes, the problem is 100% reproducible and I'm more than happy to do further testing to help track down the cause. The Athlon XP without bus disconnect is energy-inefficient, using almost as much power at idle as at full load: see e.g. http://www.silentpcreview.com/article265-page2.html - 2.6A idle, 3.2A CPUBurn on 12V2 line for Athlon 2500+ Barton. So resolving this problem would help reduce global warming! ;-) The ACPI vs. lm_sensors conflicts are not something you will easily notice. We know that they can happen in theory, but when the race condition triggers, all you'll get in most cases is simply a register read returning the wrong value (or a register write writing the value to a different register.) And that's all so unlikely to happen and so unreproducible by nature that nobody will probably ever report it. Still, we know the problem is there. Dave, I respect your fear of upgrading the BIOS. Let's try the git bisect approach then, and when you have found the patch responsible for the problem, report it here. I'll let it to the ACPI people to then suggest how the problem can be (hopefully) fixed. There are git bisect guides available: http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/ Right, I'm new to git but I'm giving it a try. It doesn't seem to understand sub-versions, e.g. "git bisect bad v2.6.15.2" gave the error "fatal: Needed a single revision". (The first FC4 kernel to show the problem was 2.6.15-1.1830_FC4, based on 2.6.15.2). Do I have to work at the granularity of 2.6.14, 2.6.15 or is there another way to specify intermediate versions? I suppose in any case the binary search will quickly start narrowing it down. The 2.6.15.x series is a branch off 2.6.15: 2.6.15 -> 2.6.16-rc1 -> 2.6.16-rc2 ... -> 2.6.17 \ -> 2.6.15.1 -> 2.6.15.2 so 2.6.15.x just isn't in Linus's tree at all. You should use v2.6.15 instead. Progress: git bisect identified the following revision: --------8<-------- 2203d6ed448ff3b777ee6bb614a53e686b483e5b is first bad commit diff-tree 2203d6ed448ff3b777ee6bb614a53e686b483e5b (from 2656c076e31a3ce3ab2a987a578e7122dc2af51d) Author: Linus Torvalds <torvalds@g5.osdl.org> Date: Fri Nov 18 07:29:51 2005 -0800 Fix ACPI processor power block initialization Properly clear the memory, and set "pr->flags.power" only if a C2 or deeper state is valid (to make the code match both the comment and previous behaviour). This fixes a boot-time lockup reported by Maneesh Soni when using "maxcpus=1". Acked-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org> :040000 040000 52be621b960ae192b36acf778c966d78ff5edbe2 04c183ce141dab8cdff049c1dae379104b637ed4 M drivers ------------------------ drivers/acpi/processor_idle.c ------------------------ index 573b6a9..70d8a6e 100644 @@ -514,8 +514,6 @@ static int acpi_processor_set_power_poli static int acpi_processor_get_power_info_fadt(struct acpi_processor *pr) { - int i; - ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_fadt"); if (!pr) @@ -524,8 +522,7 @@ static int acpi_processor_get_power_info if (!pr->pblk) return_VALUE(-ENODEV); - for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++) - memset(pr->power.states, 0, sizeof(struct acpi_processor_cx)); + memset(pr->power.states, 0, sizeof(pr->power.states)); /* if info is obtained from pblk/fadt, type equals state */ pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1; @@ -555,13 +552,9 @@ static int acpi_processor_get_power_info static int acpi_processor_get_power_info_default_c1(struct acpi_processor *pr) { - int i; - ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_default_c1"); - for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++) - memset(&(pr->power.states[i]), 0, - sizeof(struct acpi_processor_cx)); + memset(pr->power.states, 0, sizeof(pr->power.states)); /* if info is obtained from pblk/fadt, type equals state */ pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1; @@ -873,7 +866,8 @@ static int acpi_processor_get_power_info for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) { if (pr->power.states[i].valid) { pr->power.count = i; - pr->flags.power = 1; + if (pr->power.states[i].type >= ACPI_STATE_C2) + pr->flags.power = 1; } } --------8<-------- The last change in the diff, in acpi_processor_get_power_info, is what makes the difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)" restores the power saving in 2.6.16.14 . I know nothing about kernel internals so the following may not be well informed. I'm inclined to think that the revision is not blameworthy in itself, but rather exposes an underlying problem. The comment above this bit of code says, 'if one state of type C2 or C3 is available, mark this CPU as being "idle manageable"'. As the commit log says, the revision makes the code match the comment. What happens in my case is revealed after enabling some debug statements: --------8<-------- acpi_processor-0926 [07] processor_get_power_in: ----Entry acpi_processor-0621 [08] processor_get_power_in: ----Entry acpi_processor-0634 [08] processor_get_power_in: ----Exit- 0000000000000000 acpi_processor-0646 [08] processor_get_power_in: ----Entry acpi_processor-0660 [08] processor_get_power_in: No _CST, giving up acpi_processor-0661 [08] processor_get_power_in: ----Exit- FFFFFFFFFFFFFFED acpi_processor-0582 [08] processor_get_power_in: ----Entry acpi_processor-0614 [08] processor_get_power_in: lvl2[0x00004014] lvl3[0x00004015] acpi_processor-0616 [08] processor_get_power_in: ----Exit- 0000000000000000 acpi_processor-0774 [08] processor_power_verify: ----Entry acpi_processor-0785 [08] processor_power_verify: latency too large [101] acpi_processor-0786 [08] processor_power_verify: ----Exit- acpi_processor-0804 [08] processor_power_verify: ----Entry acpi_processor-0815 [08] processor_power_verify: latency too large [1001] acpi_processor-0816 [08] processor_power_verify: ----Exit- acpi_processor-0511 [08] processor_set_power_po: ----Entry acpi_processor-0577 [08] processor_set_power_po: ----Exit- 0000000000000000 acpi_processor-0963 [07] processor_get_power_in: ----Exit- 0000000000000000 ACPI: CPU0 (power states: C1[C1]) --------8<-------- So in acpi_processor_get_power_info, acpi_processor_get_power_info_cst fails and acpi_processor_get_power_info_fadt is called. The latter finds states C2 and C3 but assigns latencies which, suspiciously, are each 1 greater than the permissible maximum. Would these be dummy values that get inserted when genuine values could not be read? Pretty good for a Luddite - thanks. I've added Linus to cc, which means that this discussion should proceed via email, please. Just do a reply-to-all, make sure that bugme-daemon@bugzilla.kernel.org remains on cc. bugme-daemon@bugzilla.kernel.org wrote: > > http://bugzilla.kernel.org/show_bug.cgi?id=6519 > > > > > > ------- Additional Comments From iamaluddite@yahoo.com 2006-05-15 11:01 ------- > Progress: git bisect identified the following revision: > > --------8<-------- > 2203d6ed448ff3b777ee6bb614a53e686b483e5b is first bad commit > diff-tree 2203d6ed448ff3b777ee6bb614a53e686b483e5b (from > 2656c076e31a3ce3ab2a987a578e7122dc2af51d) > Author: Linus Torvalds <torvalds@g5.osdl.org> > Date: Fri Nov 18 07:29:51 2005 -0800 > > Fix ACPI processor power block initialization > > Properly clear the memory, and set "pr->flags.power" only if a C2 or > deeper state is valid (to make the code match both the comment and > previous behaviour). > > This fixes a boot-time lockup reported by Maneesh Soni when using > "maxcpus=1". > > Acked-by: Maneesh Soni <maneesh@in.ibm.com> > Signed-off-by: Linus Torvalds <torvalds@osdl.org> > > :040000 040000 52be621b960ae192b36acf778c966d78ff5edbe2 > 04c183ce141dab8cdff049c1dae379104b637ed4 M drivers > > ------------------------ drivers/acpi/processor_idle.c ------------------------ > index 573b6a9..70d8a6e 100644 > @@ -514,8 +514,6 @@ static int acpi_processor_set_power_poli > > static int acpi_processor_get_power_info_fadt(struct acpi_processor *pr) > { > - int i; > - > ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_fadt"); > > if (!pr) > @@ -524,8 +522,7 @@ static int acpi_processor_get_power_info > if (!pr->pblk) > return_VALUE(-ENODEV); > > - for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++) > - memset(pr->power.states, 0, sizeof(struct acpi_processor_cx)); > + memset(pr->power.states, 0, sizeof(pr->power.states)); > > /* if info is obtained from pblk/fadt, type equals state */ > pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1; > @@ -555,13 +552,9 @@ static int acpi_processor_get_power_info > > static int acpi_processor_get_power_info_default_c1(struct acpi_processor *pr) > { > - int i; > - > ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_default_c1"); > > - for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++) > - memset(&(pr->power.states[i]), 0, > - sizeof(struct acpi_processor_cx)); > + memset(pr->power.states, 0, sizeof(pr->power.states)); > > /* if info is obtained from pblk/fadt, type equals state */ > pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1; > @@ -873,7 +866,8 @@ static int acpi_processor_get_power_info > for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) { > if (pr->power.states[i].valid) { > pr->power.count = i; > - pr->flags.power = 1; > + if (pr->power.states[i].type >= ACPI_STATE_C2) > + pr->flags.power = 1; > } > } > > --------8<-------- > > The last change in the diff, in acpi_processor_get_power_info, is what makes the > difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)" > restores the power saving in 2.6.16.14 . > > I know nothing about kernel internals so the following may not be well informed. > I'm inclined to think that the revision is not blameworthy in itself, but rather > exposes an underlying problem. The comment above this bit of code says, 'if one > state of type C2 or C3 is available, mark this CPU as being "idle manageable"'. > As the commit log says, the revision makes the code match the comment. > > What happens in my case is revealed after enabling some debug statements: > > --------8<-------- > acpi_processor-0926 [07] processor_get_power_in: ----Entry > acpi_processor-0621 [08] processor_get_power_in: ----Entry > acpi_processor-0634 [08] processor_get_power_in: ----Exit- 0000000000000000 > acpi_processor-0646 [08] processor_get_power_in: ----Entry > acpi_processor-0660 [08] processor_get_power_in: No _CST, giving up > acpi_processor-0661 [08] processor_get_power_in: ----Exit- FFFFFFFFFFFFFFED > acpi_processor-0582 [08] processor_get_power_in: ----Entry > acpi_processor-0614 [08] processor_get_power_in: lvl2[0x00004014] lvl3[0x00004015] > acpi_processor-0616 [08] processor_get_power_in: ----Exit- 0000000000000000 > acpi_processor-0774 [08] processor_power_verify: ----Entry > acpi_processor-0785 [08] processor_power_verify: latency too large [101] > acpi_processor-0786 [08] processor_power_verify: ----Exit- > acpi_processor-0804 [08] processor_power_verify: ----Entry > acpi_processor-0815 [08] processor_power_verify: latency too large [1001] > acpi_processor-0816 [08] processor_power_verify: ----Exit- > acpi_processor-0511 [08] processor_set_power_po: ----Entry > acpi_processor-0577 [08] processor_set_power_po: ----Exit- 0000000000000000 > acpi_processor-0963 [07] processor_get_power_in: ----Exit- 0000000000000000 > ACPI: CPU0 (power states: C1[C1]) > --------8<-------- > > So in acpi_processor_get_power_info, acpi_processor_get_power_info_cst fails and > acpi_processor_get_power_info_fadt is called. The latter finds states C2 and C3 > but assigns latencies which, suspiciously, are each 1 greater than the > permissible maximum. Would these be dummy values that get inserted when genuine > values could not be read? > > > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. Excellent work, Dave! You finally got it. Now let's see what the ACPI folks think about it.
On Mon, 15 May 2006, Andrew Morton wrote:
> > @@ -873,7 +866,8 @@ static int acpi_processor_get_power_info
> > for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
> > if (pr->power.states[i].valid) {
> > pr->power.count = i;
> > - pr->flags.power = 1;
> > + if (pr->power.states[i].type >= ACPI_STATE_C2)
> > + pr->flags.power = 1;
> > }
> > }
> >
> > --------8<--------
> >
> > The last change in the diff, in acpi_processor_get_power_info, is what makes the
> > difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)"
> > restores the power saving in 2.6.16.14 .
That last change is also the thing that fixed the lock-up for Maneesh.
We had this particular discussion at some point earlier, and didn't get to
any resolution. The ACPI people wanted to undo the thing, which didn't
make a lot of sense because (a) the comment says otherwise and (b) C1 is
available even without ACPI and (c) nobody ever explained why it locked up
without the "type >= ACPI_STATE_C2" test.
Now, what happens is that your ACPI tables show that C2+ is unusable,
which leaves only C1 usable (that, btw, you might be able to fix with
different tables - possibly through a BIOS update).
However, at that point it's totally pointless to even _try_ to use ACPI
CPU power management for this case, since ACPI can't do any better than
the normal C1 stuff in the bog-standard non-ACPI x86 idle routine.
Do you actally see anything running hotter?
Or is it just that the "CPU%d (power states:..)" message disappeared?
Please realize that we will _always_ use C1 (aka "halt") in the idle state
quite regardless of ACPI - unless you've done "idle=poll" on the kernel
command line. So the fact that we don't use ACPI for it shouldn't make us
actually run any hotter (quite the reverse - we'll go into C1 state with
_less_ work).
So I don't see the downside. Am I missing something?
Linus
>> So in acpi_processor_get_power_info,
>> acpi_processor_get_power_info_cst fails and
>> acpi_processor_get_power_info_fadt is called. The latter
>> finds states C2 and C3
>> but assigns latencies which, suspiciously, are each 1
>> greater than the
>> permissible maximum. Would these be dummy values that get
>> inserted when genuine values could not be read?
These values are inserted by the BIOS writer to indicate that
the the platform does not support these C-states.
Generally, we've found that vendors assume the OS will use _CST
and will ignore the FADT, and thus they don't bother putting
anything interesting in the FADT -- as that path is never tested.
--- Linus Torvalds <torvalds@osdl.org> wrote: > Do you actally see anything running hotter? Yes, that's how this problem manifests itself and the only means by which I became aware of it. In the working case, without "if (pr->power.states[i].type >= ACPI_STATE_C2)", my CPU idle temperature is around 36 degrees C. In the non-working case, with that "if" statement, the idle temp is around 51 degrees C. The CPU temp was my sole guide during the git bisect. > Or is it just that the "CPU%d (power states:..)" message disappeared? This message does disappear in the non-working case, co-inciding with the raised CPU temp. > Please realize that we will _always_ use C1 (aka "halt") in the idle > state > quite regardless of ACPI - unless you've done "idle=poll" on the > kernel > command line. So the fact that we don't use ACPI for it shouldn't > make us > actually run any hotter (quite the reverse - we'll go into C1 state > with > _less_ work). Interesting. I'm not using "idle=poll". I should re-iterate that I know nothing about kernel internals, but I'm wondering, given what you've said, if its possible that there's another problem here, causing non-ACPI C1/halt not to work. Thus ACPI is the only mechanism providing power-saving; and was only doing so, prior to the "if (pr->power.states[i].type >= ACPI_STATE_C2)" revision, due to a happy accident whereby a valid C1 state in pr->power.states caused pr->flags.power to be set. Does that make any sense? This theory seems consistent with the fact that the working (i.e. cool) Fedora kernel 2.6.14-1.1656_FC4, becomes non-working (i.e. hot) when booted with "acpi=off" ( see Comment #7 http://bugzilla.kernel.org/show_bug.cgi?id=6519#c7 ) Thanks for your comments and insight. Dave Send instant messages to your online friends http://uk.messenger.yahoo.com On Mon, 15 May 2006, Dave Jenkins wrote: > > --- Linus Torvalds <torvalds@osdl.org> wrote: > > Do you actally see anything running hotter? > > Yes, that's how this problem manifests itself and the only means by > which I became aware of it. In the working case, without "if > (pr->power.states[i].type >= ACPI_STATE_C2)", my CPU idle temperature > is around 36 degrees C. In the non-working case, with that "if" > statement, the idle temp is around 51 degrees C. The CPU temp was my > sole guide during the git bisect. Good job. That's certainly conclusive. > > Please realize that we will _always_ use C1 (aka "halt") in the idle state > > quite regardless of ACPI - unless you've done "idle=poll" on the kernel > > command line. So the fact that we don't use ACPI for it shouldn't make us > > actually run any hotter (quite the reverse - we'll go into C1 state with > > _less_ work). > > Interesting. I'm not using "idle=poll". I should re-iterate that I know > nothing about kernel internals, but I'm wondering, given what you've > said, if its possible that there's another problem here, causing > non-ACPI C1/halt not to work. I'd also wonder if maybe there is something that causes ACPI to go into C2 even though the BIOS latency tables have apparently said that we shouldn't. Ie it may be that we have a totally unrelated bug that made ACPI actually go into a deeper powersaving mode than it should. That could have explained the lockups that Maneesh saw with "maxcpus=1" (because C2 and above are disabled for SMP _anyway_, since they aren't valid there due to touching the _common_ northbridge rather than being per-core). > Thus ACPI is the only mechanism providing > power-saving; and was only doing so, prior to the "if > (pr->power.states[i].type >= ACPI_STATE_C2)" revision, due to a happy > accident whereby a valid C1 state in pr->power.states caused > pr->flags.power to be set. Does that make any sense? Actually, suddenly it does. When I look more closely at your dmesg report, I find this: Checking 'hlt' instruction... disabled ie your normal idle loop has literally _disabled_ the use of hlt. Do you have "no-hlt" on the kernel command line? That _should_ be the only thing that disables hlt (and thus power-savings in idle). Yup, you do (from that same dmesg): Kernel command line: ro root=/dev/VolGroup00/LogVol00 no-hlt 1 And yes, ACPI will ignore that "no-hlt" flag. Now, the question is, why do you have no-hlt there? Was it some strange distro that set it for you? And why? Linus
On Mon, 15 May 2006, Linus Torvalds wrote:
>
> Yup, you do (from that same dmesg):
>
> Kernel command line: ro root=/dev/VolGroup00/LogVol00 no-hlt 1
>
> Now, the question is, why do you have no-hlt there? Was it some strange
> distro that set it for you? And why?
Btw, I think we can close this report, aside from the question about why
that "no-hlt" was there in the first place. I bet the power usage will go
back to where it was with ACPI without that thing.
The whole "no-hlt" thing exists purely for some _really_ old i486 class
machines that would lock up with hlt for some unexplained reason.
Linus
Linus Torvalds <torvalds@osdl.org> wrote: > > Btw, I think we can close this report, aside from the question about why > that "no-hlt" was there in the first place. I bet the power usage will go > back to where it was with ACPI without that thing. Thanks, Linus. But.. Was Dave using no-hlt on earlier kernels? If so, why didn't they get hot as well?
On Mon, 15 May 2006, Andrew Morton wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > Btw, I think we can close this report, aside from the question about why
> > that "no-hlt" was there in the first place. I bet the power usage will go
> > back to where it was with ACPI without that thing.
>
> Thanks, Linus.
>
> But.. Was Dave using no-hlt on earlier kernels? If so, why didn't they
> get hot as well?
Exactly because ACPI _ignored_ that option, so it would use the broken
ACPI C1 sleepstate.
So ACPI just does:
/*
* Invoke C1.
* Use the appropriate idle routine, the one that would
* be used without acpi C-states.
*/
if (pm_idle_save)
pm_idle_save();
else
acpi_safe_halt();
where acpi_safe_halt() just does
static void acpi_safe_halt(void)
{
clear_thread_flag(TIF_POLLING_NRFLAG);
smp_mb__after_clear_bit();
if (!need_resched())
safe_halt();
set_thread_flag(TIF_POLLING_NRFLAG);
}
and here probably pm_idle_save was NULL.
In contrast, the main CPU idle wil do
...
idle = pm_idle;
if (!idle)
idle = default_idle;
if (cpu_is_offline(cpu))
play_dead();
__get_cpu_var(irq_stat).idle_timestamp = jiffies;
idle();
...
ie if pm_idle is NULL (which is tha ACPI "saved_pm_idle") it will use
default_idle, which in turn does
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
clear_thread_flag(TIF_POLLING_NRFLAG);
smp_mb__after_clear_bit();
while (!need_resched()) {
local_irq_disable();
if (!need_resched())
safe_halt();
else
local_irq_enable();
}
set_thread_flag(TIF_POLLING_NRFLAG);
} else {
while (!need_resched())
cpu_relax();
}
ie it _honors_ that hlt_works_ok flag, unlike the ACPI one.
Now, I'm not saying that ACPI should honor the hlt_works_ok flag, because
ACPI wouldn't _exist_ on the kind of old machines hat needed it, but I
think it explains why ACPI ended up running in a cooler C1 than the normal
idle routine, which would just end up doing that endless loop of
"cpu_relax()" (which is not a halt, but a special no-op).
Linus
--- Linus Torvalds <torvalds@osdl.org> wrote: > Now, the question is, why do you have no-hlt there? Was it some > strange > distro that set it for you? And why? I must admit I'd forgotten about the no-hlt, sorry. I added it to my grub.conf because I was getting occasional hangs on boot at "Checking 'hlt' instruction", perhaps 5-10% of the time. I first added it when running Fedora kernel 2.6.14-1.1656_FC4 . I added the no-hlt not knowing whether it would affect power saving or simply disable a potentially problematic test. With that kernel, adding no-hlt did not noticeably affect CPU temperature. We now know, or suspect, this is because that kernel had a bug/feature whereby ACPI power saving is controversially enabled for hardware that reports only C1 as a valid power state. At the time, I interpreted the fact that the temperature did not increase as showing that the "no-hlt" had simply disabled the test, not the power-saving. I hope this is a forgiveable mistake. When I upgraded to subsequent Fedora kernels, the existing kernel parameters, including no-hlt, were evidently automatically copied for the new entries in grub.conf . And these new kernels (2.6.15-1.1830_FC4 onward) did not have the same ACPI bug/feature, so suddenly my CPU temperature shot up because I now did not have power saving from hlt or ACPI. I have now confirmed that 2.6.16-1.2108_FC4 without no-hlt runs cool. Yes, I should have spotted and tested this before submitting the bug and I apologise for that. This leaves, apart from some embarrassment on my part, the question why I was getting hangs at "Checking 'hlt' instruction" with 2.6.14-1.1656_FC4 . Wouldn't it be neat if this turned out to have the same cause as the lock-up reported by Maneesh that motivated the "if (pr->power.states[i].type >= ACPI_STATE_C2)" revision! Dave Send instant messages to your online friends http://uk.messenger.yahoo.com Closing bug as INVALID, removing no-hlt from the boot command line restored the power saving when idle. |