Bug 6519 - Athlon XP runs hot: s2k disconnect power saving inoperative
Summary: Athlon XP runs hot: s2k disconnect power saving inoperative
Status: REJECTED INVALID
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Processors
Hardware: i386 Linux
Importance: P2 normal
Assignee: Jean Delvare
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-08 13:27 UTC by Dave Jenkins
Modified: 2006-05-15 22:54 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.16.14 (kernel.org) and all 2.6.15, 2.6.16 Fedora kernels
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg with 2.6.16.14 (17.22 KB, text/plain)
2006-05-08 15:43 UTC, Dave Jenkins

Description Dave Jenkins 2006-05-08 13:27:26 UTC
Most recent kernel where this bug did not occur: 2.6.14-1.1656_FC4 (Fedora)
 - This was the last 2.6.14 kernel I used. The problem appeared with the first
2.6.15 Fedora kernel and persists with all subsequent Fedora kernels, as well as
with kernel.org 2.6.16.14.

Distribution: Fedora

Hardware Environment: Athlon XP 2600+ Barton CPU, MSI KM4M-L mobo

Software Environment:
ver_linux:
Linux bagpuss.localdomain 2.6.16.14_DJ_minimal #1 Mon May 8 15:38:54 BST 2006 i686 athlon i386 GNU/Linux

Gnu C                  4.0.2
Gnu make               3.80
binutils               2.15.94.0.2.2
util-linux             2.12p
mount                  2.12p
module-init-tools      3.2-pre9
e2fsprogs              1.38
reiserfsprogs          line
reiser4progs           line
pcmcia-cs              3.2.8
quota-tools            3.12.
PPP                    2.4.2
isdn4k-utils           3.7
nfs-utils              1.0.7
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   071
Modules Loaded         w83627hf hwmon_vid hwmon eeprom i2c_isa i2c_viapro i2c_dev i2c_core dm_mod


Problem Description:
My Athlon XP runs approx 15 degrees C hotter at idle with 2.6.15 and 2.6.16
kernels than with 2.6.14 and earlier. It seems that 2.6.15+ does not allow the
CPU to enter s2k disconnect power saving mode.

MSI KM4M-L motherboard BIOS provides CPU Disconnect Control:
"The item is to reduce the power consumption of the AMD K7 system. When set to
Enabled, the processor is allowed to disconnect the s2k interface when the AMD
K7 system is in some power saving states."

CPU idle temperatures with BIOS CPU Disconnect enabled versus disabled, for
2.6.14 versus 2.6.15 (I'm using a low-speed CPU fan, so the temperature
differences are larger than they would otherwise be):

Kernel      CPU Disconnect      CPU idle temp, degrees C
2.6.14      enabled             35.5
2.6.14      disabled            46
2.6.15      enabled             49.5
2.6.15      disabled            49.5
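As a quick check on the table, simple arithmetic on the readings above: enabling CPU Disconnect saves about 10.5 degrees C under 2.6.14, it makes no difference under 2.6.15, and 2.6.15 with Disconnect enabled idles 14 degrees C warmer than 2.6.14 does (the "approx 15 degrees" quoted above).

```latex
\begin{align*}
\Delta T_{\text{disconnect, 2.6.14}} &= 46 - 35.5 = 10.5\ ^\circ\mathrm{C} \\
\Delta T_{\text{disconnect, 2.6.15}} &= 49.5 - 49.5 = 0\ ^\circ\mathrm{C} \\
\Delta T_{\text{2.6.15 vs 2.6.14, enabled}} &= 49.5 - 35.5 = 14\ ^\circ\mathrm{C}
\end{align*}
```

Note that 2.6.15 with Disconnect enabled (49.5) is also 3.5 degrees C warmer than 2.6.14 with it disabled (46).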

I have tried to track down the culprit by booting into runlevel 1, then removing
all unused modules, just leaving those required for lm_sensors so that I can
check the CPU temp. This is the smallest set of modules I got down to:
w83627hf hwmon_vid hwmon eeprom i2c_isa i2c_viapro i2c_dev i2c_core dm_mod

(w83627hf is the module for the Winbond chip queried by lm_sensors).
With just these modules loaded, in runlevel 1 with the system idling, the s2k
disconnect still does not occur with 2.6.15, whereas with 2.6.14 it occurs quite
happily in every runlevel I've tried, even with 50+ kernel modules loaded. The
only exception is after attaching a USB storage device, as described in
RedHat Bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172592

How reproducible:
Always

Steps to reproduce:
1. Enable CPU Disconnect in BIOS.
2. Note CPU idle temperature.

Actual Results:  With 2.6.15, idle temperature is the same whether CPU
Disconnect is enabled or disabled in BIOS.

Expected Results:  Idle temperature should be significantly lower with CPU
Disconnect enabled in BIOS, i.e. behaviour should be as with 2.6.14 .


Detailed system information
---------------------------
/proc/version:
Linux version 2.6.16.14_DJ_minimal (davej@bagpuss.localdomain) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 Mon May 8 15:38:54 BST 2006

/proc/cpuinfo:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2600+
stepping        : 0
cpu MHz         : 1915.477
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : yes
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts
bogomips        : 3833.88

/proc/modules:
w83627hf 26384 0 - Live 0xe09e1000
hwmon_vid 2944 1 w83627hf, Live 0xe092f000
hwmon 3396 1 w83627hf, Live 0xe0823000
eeprom 7824 0 - Live 0xe09de000
i2c_isa 5248 1 w83627hf, Live 0xe09f2000
i2c_viapro 9236 0 - Live 0xe09da000
i2c_dev 9824 0 - Live 0xe097c000
i2c_core 22912 5 w83627hf,eeprom,i2c_isa,i2c_viapro,i2c_dev, Live 0xe0975000
dm_mod 56916 9 - Live 0xe0861000

/proc/ioports:
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0295-0296 : w83627hf
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0cf8-0cff : PCI conf1
4000-407f : 0000:00:11.0
  4000-407f : motherboard
    4000-4003 : PM1a_EVT_BLK
    4004-4005 : PM1a_CNT_BLK
    4008-400b : PM_TMR
    4010-4015 : ACPI CPU throttle
    4020-4023 : GPE0_BLK
5000-500f : 0000:00:11.0
  5000-500f : motherboard
    5000-500f : pnp 00:02
      5000-5007 : vt596_smbus
d000-d01f : 0000:00:10.0
d400-d41f : 0000:00:10.1
d800-d81f : 0000:00:10.2
dc00-dc0f : 0000:00:11.1
  dc00-dc07 : ide0
  dc08-dc0f : ide1
e000-e0ff : 0000:00:11.5
e400-e4ff : 0000:00:12.0

/proc/iomem:
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cf3ff : Video ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
  00100000-00345f26 : Kernel code
  00345f27-003f1b83 : Kernel data
1fff0000-1fff2fff : ACPI Non-volatile Storage
1fff3000-1fffffff : ACPI Tables
d0000000-d7ffffff : 0000:00:00.0
d8000000-dfffffff : PCI Bus #01
  d8000000-dfffffff : 0000:01:00.0
e0000000-e1ffffff : PCI Bus #01
  e0000000-e0ffffff : 0000:01:00.0
  e1000000-e101ffff : 0000:01:00.0
e2000000-e20000ff : 0000:00:10.3
e2001000-e20010ff : 0000:00:12.0
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffff0000-ffffffff : reserved

lspci -vvv:
00:00.0 Host bridge: VIA Technologies, Inc. VT8378 [KM400/A] Chipset Host Bridge
        Subsystem: VIA Technologies, Inc. VT8378 [KM400/A] Chipset Host Bridge
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 8
        Region 0: Memory at d0000000 (32-bit, prefetchable) [size=128M]
        Capabilities: [80] AGP version 3.5
                Status: RQ=32 Iso- ArqSz=0 Cal=2 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3+ Rate=x4,x8
                Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: e0000000-e1ffffff
        Prefetchable memory behind bridge: d8000000-dfffffff
        Secondary status: 66Mhz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI])
        Subsystem: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 19
        Region 4: I/O ports at d000 [size=32]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI])
        Subsystem: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin B routed to IRQ 19
        Region 4: I/O ports at d400 [size=32]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI])
        Subsystem: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin C routed to IRQ 19
        Region 4: I/O ports at d800 [size=32]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) (prog-if 20 [EHCI])
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 7340
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin D routed to IRQ 19
        Region 0: Memory at e2000000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
        Subsystem: VIA Technologies, Inc. VT8235 ISA Bridge
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 7340
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32
        Interrupt: pin A routed to IRQ 16
        Region 4: I/O ports at dc00 [size=16]
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 50)
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 7340
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin C routed to IRQ 18
        Region 0: I/O ports at e000 [size=256]
        Capabilities: [c0] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74)
        Subsystem: Micro-Star International Co., Ltd.: Unknown device 734c
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 17
        Region 0: I/O ports at e400 [size=256]
        Region 1: Memory at e2001000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1) (prog-if 00 [VGA])
        Subsystem: XFX Pine Group Inc.: Unknown device 1280
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (1250ns min, 250ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at e0000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at d8000000 (32-bit, prefetchable) [size=128M]
        [virtual] Expansion ROM at e1000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [44] AGP version 3.0
                Status: RQ=32 Iso- ArqSz=0 Cal=3 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8
                Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none>

SCSI: none.
Comment 1 Sérgio M Basto 2006-05-08 14:53:02 UTC
Hi, could you attach dmesg and /proc/interrupts, please?
Comment 2 Len Brown 2006-05-08 15:17:05 UTC
Is a cpufreq driver and governor loaded in these two configurations?
If so, please modprobe cpufreq_stats and look in
/sys/devices/system/cpu/cpu*/cpufreq/ and stats/
and report what frequency the processor is running at
in the working and failure configurations.

Is ACPI enabled in the working and failure configurations?
If yes, any difference if you boot with "acpi=off"?
Also, in ACPI mode, does the temperature reported
under /proc/acpi/ match what is reported by lm_sensors?

In general, running i2c based lm_sensors at the same time
as an ACPI enabled kernel is a bad idea.  If ACPI reports
the same temperature as i2c, is there any effect on the
failing system if the i2c drivers are not loaded?
Comment 3 Dave Jenkins 2006-05-08 15:43:49 UTC
Created attachment 8062
dmesg with 2.6.16.14
Comment 4 Dave Jenkins 2006-05-08 15:45:23 UTC
Re: comment #1
dmesg attached.

/proc/interrupts (after removing extraneous modules):
           CPU0
  0:     781292    IO-APIC-edge  timer
  1:       2064    IO-APIC-edge  i8042
  8:          1    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
 12:        714    IO-APIC-edge  i8042
 14:       3467    IO-APIC-edge  ide0
 15:         94    IO-APIC-edge  ide1
NMI:          0
LOC:     781247
ERR:          0
MIS:          0
Comment 5 Dave Jenkins 2006-05-08 16:35:31 UTC
Re: comment #2

I don't believe my hardware supports CPU frequency scaling. The Gnome CPU
Frequency Scaling Monitor 2.10.1 reports "CPU frequency scaling unsupported" and
states the frequency as 1.92GHz in all configurations and circumstances. I tried
modprobe cpufreq_stats but /sys/devices/system/cpu/cpu0/ is an empty directory.

ACPI is enabled in both the working and failure configurations. Booting with
"acpi=off" did not affect the temperature. Here's the output of /proc/interrupts
from 2.6.16.14 with acpi=off:
           CPU0
  0:     480379    IO-APIC-edge  timer
  1:       1538    IO-APIC-edge  i8042
  2:          0          XT-PIC  cascade
  8:          1    IO-APIC-edge  rtc
 12:        114    IO-APIC-edge  i8042
 14:       3275    IO-APIC-edge  ide0
 15:         94    IO-APIC-edge  ide1
NMI:          0
LOC:     480321
ERR:          0
MIS:          0
 - The 'XT-PIC cascade' line has appeared where the 'IO-APIC-level acpi' line
used to be.

The temperature reported under /proc/acpi/ does indeed match what is reported by
lm_sensors, plus or minus a degree or so. Booting into 2.6.16.14 without loading
 the lm_sensors modules did not affect the temperature (as reported by
/proc/acpi/thermal_zone/THRM/temperature).
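For logging readings consistently while the temperature stabilises, a small C helper could read the ACPI thermal zone file mentioned above. This is a sketch under stated assumptions: the line format "temperature:             38 C" is assumed for the 2.6-era /proc/acpi interface, the THRM zone name is taken from this report, and parse_acpi_temperature/read_acpi_temperature are hypothetical helpers, not an existing API.

```c
#include <stdio.h>

/* Parse one line in the (assumed) /proc/acpi/thermal_zone/<zone>/temperature
 * format, e.g. "temperature:             38 C".
 * Returns 0 and stores degrees Celsius in *out, or -1 if the line does not match. */
static int parse_acpi_temperature(const char *line, int *out)
{
    int deg;
    char unit;
    /* The spaces in the format skip any run of whitespace before the number and unit. */
    if (sscanf(line, "temperature: %d %c", &deg, &unit) != 2 || unit != 'C')
        return -1;
    *out = deg;
    return 0;
}

/* Read and parse the first line of a thermal-zone file.
 * Returns -1 if the file cannot be opened, read, or parsed. */
static int read_acpi_temperature(const char *path, int *out)
{
    char buf[128];
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int rc = fgets(buf, sizeof buf, f) ? parse_acpi_temperature(buf, out) : -1;
    fclose(f);
    return rc;
}
```

Usage would be something like read_acpi_temperature("/proc/acpi/thermal_zone/THRM/temperature", &t), called periodically until successive readings stop changing.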

Thanks for your comments, sorry to present an entirely negative set of results
but I hope these provide some useful info!
Comment 6 Len Brown 2006-05-08 19:44:22 UTC
> Booting with "acpi=off" did not affect the temperature.

Does "acpi=off" also have no effect on the 2.6.14 success case?
If so, you've proven that this issue is independent of ACPI.
Comment 7 Dave Jenkins 2006-05-09 13:39:41 UTC
> Does "acpi=off" also have no effect on the 2.6.14 success case?
> If so, you've proven that this issue is independent of ACPI.

Aha, I didn't try that but I have now. (I think I was seeing this as a sin of
commission rather than omission!) Results:

Kernel               ACPI  CPU temp (C)
2.6.14-1.1656_FC4    on    38
2.6.14-1.1656_FC4    off   49
2.6.16.14            on    51
2.6.16.14            off   51

s2k disconnect was enabled in BIOS throughout. I allowed the CPU temperature to
stabilise for several minutes in each case and the machine had been on long
enough for the case to reach its usual temp. The order of testing was different
from the order listed, in case anyone's thinking the results are due to the
machine warming up over the course of the testing.

The difference between 2.6.14 without ACPI (49) and 2.6.16 (51), though small,
is real and reproducible. But obviously the main news here is that 2.6.14 is no
longer cool when acpi=off.
Comment 8 Andrew Morton 2006-05-09 13:43:12 UTC
Len, I take it that this means "it isn't ACPI".

If so, which subsystem do you think regressed?

Thanks.

Comment 9 Dave Jenkins 2006-05-09 16:13:27 UTC
> I take it that this means "it isn't ACPI"

I would have thought the reverse. As Len said, if "acpi=off" had no effect on
the 2.6.14 success case - just as it has no effect on the 2.6.16 failure case -
then we would have established that the issue is independent of ACPI. But it
turns out that "acpi=off" does make a difference with 2.6.14, which suggests that
ACPI may not be doing its job properly with 2.6.16 (or that some more complex
interaction is involved).
Comment 10 Jean Delvare 2006-05-11 02:34:35 UTC
I agree with Dave Jenkins in comment #9: the results in comment #7 _do_ indeed
suggest that ACPI bears part of the responsibility for the problem.

For completeness regarding the lm_sensors modules: you don't need everything you
loaded. w83627hf, i2c-isa, hwmon and hwmon-vid are enough; i2c-viapro, eeprom and
i2c-dev are not needed for a minimal configuration. Not that it matters much, as
comment #5 seems to rule out a regression on the hwmon side.

I also second Len Brown's lm_sensors vs. ACPI remark in comment #2 (except that
the problem isn't limited to i2c). Where ACPI and lm_sensors both access the same
chip, a race condition exists and problems can happen, although they have only
been reported on a few specific systems.

Dave, just in case, please check for a BIOS update, and give it a try if available.

Unless the ACPI folks have an idea of what to try next, and given that the
problem can be reproduced at will, I'd suggest a git bisect between 2.6.14 and
2.6.15.
Comment 11 Dave Jenkins 2006-05-11 13:18:29 UTC
Thanks for your comments, Jean. I haven't run into conflicts between ACPI and
i2c so far; I use lm_sensors only to feed Gnome Sensors Applet so I'll look out
for an ACPI-aware equivalent.

I have looked for BIOS updates; there's nothing that seems directly relevant.
The latest update is described as "System will auto power on when plug power
code", so I suppose that could be ACPI-related, but it doesn't appear to have
anything to do with s2k bus disconnect. My general attitude to BIOS updates is "if it
ain't broke, don't flash it", especially since a friend turned a working
motherboard into a knobbly tea tray recently. Given that everything works fine
with 2.6.14, I would need some persuading. I don't want to be uncooperative but
to be honest I would rather stick with 2.6.14, which is what I'm running
currently without problems, than flash the BIOS.

Yes, the problem is 100% reproducible and I'm more than happy to do further
testing to help track down the cause. The Athlon XP without bus disconnect is
energy-inefficient, using almost as much power at idle as at full load: see e.g. 
http://www.silentpcreview.com/article265-page2.html
- 2.6A idle, 3.2A CPUBurn on 12V2 line for Athlon 2500+ Barton.
So resolving this problem would help reduce global warming! ;-)
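Taking the SilentPCReview figures at face value, and assuming a nominal 12 V on the 12V2 rail with that current feeding the CPU (before voltage-regulator losses), the idle draw works out to roughly four-fifths of the full-load draw:

```latex
\begin{align*}
P_{\text{idle}} &= 12\,\mathrm{V} \times 2.6\,\mathrm{A} = 31.2\,\mathrm{W} \\
P_{\text{load}} &= 12\,\mathrm{V} \times 3.2\,\mathrm{A} = 38.4\,\mathrm{W} \\
P_{\text{idle}} / P_{\text{load}} &= 2.6 / 3.2 \approx 0.81
\end{align*}
```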
Comment 12 Jean Delvare 2006-05-13 03:27:55 UTC
The ACPI vs. lm_sensors conflicts are not something you will easily notice. We
know that they can happen in theory, but when the race condition triggers, all
you'll get in most cases is a register read returning the wrong value (or a
register write writing the value to a different register). It is all so
unlikely to happen, and so unreproducible by nature, that probably nobody will
ever report it. Still, we know the problem is there.
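The race described here can arise because chips like the Winbond W83627HF are accessed through an address (index) port and a data port (0295-0296 in /proc/ioports above), so a register read is two separate I/O operations and a second, uncoordinated accessor can move the index in between. The single-threaded C sketch below replays one such interleaving deterministically against a simulated register file; the struct, helper names and register numbers are hypothetical, and no real ports are touched.

```c
#include <stdint.h>

/* Hypothetical model of a chip accessed through an index/data port pair. */
struct sio_chip {
    uint8_t index;      /* last value written to the address (index) port */
    uint8_t regs[256];  /* simulated register file */
};

static void write_index(struct sio_chip *c, uint8_t reg) { c->index = reg; }
static uint8_t read_data(const struct sio_chip *c) { return c->regs[c->index]; }

/* Replay one bad interleaving of two accessors that do not share a lock:
 *   A: write_index(reg_a)
 *   B: write_index(reg_b)   <- B runs between A's two steps
 *   A: read_data()          <- A now reads reg_b instead of reg_a
 *   B: read_data()          <- B gets what it asked for
 * Returns the value A observed. */
static uint8_t racy_read_for_a(struct sio_chip *c, uint8_t reg_a, uint8_t reg_b)
{
    write_index(c, reg_a);              /* accessor A, e.g. the hwmon driver */
    write_index(c, reg_b);              /* accessor B, e.g. ACPI firmware code */
    uint8_t seen_by_a = read_data(c);   /* wrong register for A */
    (void)read_data(c);                 /* B's read is correct */
    return seen_by_a;
}
```

Nothing fails loudly here: A simply receives a plausible-looking byte from the wrong register, which is why such conflicts are so hard to notice or reproduce.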

Dave, I respect your fear of upgrading the BIOS. Let's try the git bisect
approach then, and when you have found the patch responsible for the problem,
report it here. I'll leave it to the ACPI people to suggest how the problem
can (hopefully) be fixed.

There are git bisect guides available:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/
Comment 13 Dave Jenkins 2006-05-14 04:06:38 UTC
Right, I'm new to git but I'm giving it a try. It doesn't seem to understand
sub-versions, e.g. "git bisect bad v2.6.15.2" gave the error "fatal: Needed a
single revision". (The first FC4 kernel to show the problem was
2.6.15-1.1830_FC4, based on 2.6.15.2). Do I have to work at the granularity of
2.6.14, 2.6.15 or is there another way to specify intermediate versions? I
suppose in any case the binary search will quickly start narrowing it down.
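git bisect does perform a binary search over the commits between a known-good and a known-bad revision, which is why it narrows things down quickly: roughly log2(N) test builds for N commits. The C sketch below shows the idea on a simplified, assumed *linear* history (the real kernel history is a graph with merges, which git bisect handles itself); first_bad and bad_from are hypothetical names.

```c
#include <stddef.h>

/* Predicate: nonzero if the build at position i of the history is "bad". */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Find the first bad commit in a linear history of n >= 2 commits, where
 * commit 0 is known good and commit n-1 is known bad.
 * *steps receives the number of test builds performed. */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx, unsigned *steps)
{
    size_t good = 0, bad = n - 1;   /* invariant: good tests good, bad tests bad */
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        ++*steps;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;                     /* adjacent to a good commit, so it is the first bad one */
}

/* Example predicate for testing: every commit at or after *(size_t *)ctx is bad. */
static int bad_from(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

With 1000 commits and the regression introduced at commit 700, first_bad(1000, bad_from, &culprit, &steps) returns 700 after at most 10 test builds.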
Comment 14 Andrew Morton 2006-05-14 04:17:00 UTC
The 2.6.15.x series is a branch off 2.6.15:


	2.6.15 -> 2.6.16-rc1 -> 2.6.16-rc2 ... -> 2.6.17
	  \
	   -> 2.6.15.1 -> 2.6.15.2

so 2.6.15.x just isn't in Linus's tree at all.

You should use v2.6.15 instead.

Comment 15 Dave Jenkins 2006-05-15 11:01:00 UTC
Progress: git bisect identified the following revision:

--------8<--------
2203d6ed448ff3b777ee6bb614a53e686b483e5b is first bad commit
diff-tree 2203d6ed448ff3b777ee6bb614a53e686b483e5b (from 2656c076e31a3ce3ab2a987a578e7122dc2af51d)
Author: Linus Torvalds <torvalds@g5.osdl.org>
Date:   Fri Nov 18 07:29:51 2005 -0800

    Fix ACPI processor power block initialization

    Properly clear the memory, and set "pr->flags.power" only if a C2 or
    deeper state is valid (to make the code match both the comment and
    previous behaviour).

    This fixes a boot-time lockup reported by Maneesh Soni when using
    "maxcpus=1".

    Acked-by: Maneesh Soni <maneesh@in.ibm.com>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

:040000 040000 52be621b960ae192b36acf778c966d78ff5edbe2 04c183ce141dab8cdff049c1dae379104b637ed4 M      drivers

------------------------ drivers/acpi/processor_idle.c ------------------------
index 573b6a9..70d8a6e 100644
@@ -514,8 +514,6 @@ static int acpi_processor_set_power_poli
 
 static int acpi_processor_get_power_info_fadt(struct acpi_processor *pr)
 {
-	int i;
-
 	ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_fadt");
 
 	if (!pr)
@@ -524,8 +522,7 @@ static int acpi_processor_get_power_info
 	if (!pr->pblk)
 		return_VALUE(-ENODEV);
 
-	for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
-		memset(pr->power.states, 0, sizeof(struct acpi_processor_cx));
+	memset(pr->power.states, 0, sizeof(pr->power.states));
 
 	/* if info is obtained from pblk/fadt, type equals state */
 	pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
@@ -555,13 +552,9 @@ static int acpi_processor_get_power_info
 
 static int acpi_processor_get_power_info_default_c1(struct acpi_processor *pr)
 {
-	int i;
-
 	ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_default_c1");
 
-	for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
-		memset(&(pr->power.states[i]), 0,
-		       sizeof(struct acpi_processor_cx));
+	memset(pr->power.states, 0, sizeof(pr->power.states));
 
 	/* if info is obtained from pblk/fadt, type equals state */
 	pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
@@ -873,7 +866,8 @@ static int acpi_processor_get_power_info
 	for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
 		if (pr->power.states[i].valid) {
 			pr->power.count = i;
-			pr->flags.power = 1;
+			if (pr->power.states[i].type >= ACPI_STATE_C2)
+				pr->flags.power = 1;
 		}
 	}
 
--------8<--------
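The first two hunks are the "properly clear the memory" part of the commit message. In the old FADT path, every loop iteration passed pr->power.states (the start of the array) together with the size of a single element, so only states[0] was ever cleared; sizeof on the array member covers the whole array. The stand-alone sketch below illustrates the difference, using a cut-down hypothetical struct in place of struct acpi_processor_cx and an assumed length of 8 for ACPI_PROCESSOR_MAX_POWER. (Dave's testing points at the last hunk, not these, as the cause of the heat.)

```c
#include <string.h>

#define MAX_POWER 8     /* stands in for ACPI_PROCESSOR_MAX_POWER; value assumed */

struct cx {             /* hypothetical cut-down acpi_processor_cx */
    int valid;
    int type;
    unsigned latency;
};

struct power {
    struct cx states[MAX_POWER];
};

/* Old FADT-path pattern: the same element-sized memset at the start of the
 * array, repeated MAX_POWER times, so only states[0] is ever cleared. */
static void clear_old(struct power *p)
{
    for (int i = 0; i < MAX_POWER; i++)
        memset(p->states, 0, sizeof(struct cx));   /* i is never used: the bug */
}

/* New pattern: sizeof on the array member covers every element. */
static void clear_new(struct power *p)
{
    memset(p->states, 0, sizeof(p->states));
}

/* Returns 1 if every state is all-zero. */
static int all_clear(const struct power *p)
{
    for (int i = 0; i < MAX_POWER; i++)
        if (p->states[i].valid || p->states[i].type || p->states[i].latency)
            return 0;
    return 1;
}
```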

The last change in the diff, in acpi_processor_get_power_info, is what makes the
difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)"
restores the power saving in 2.6.16.14 .
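The behavioural difference in that last hunk can be modelled in isolation. If only C1 survives verification (as the debug trace in this comment shows happening on this board), the old loop still sets the "idle manageable" flag while the new loop does not. This is a simplified sketch: power_flag and the cut-down struct are hypothetical, the constants stand in for ACPI_STATE_C1..C3 and an assumed ACPI_PROCESSOR_MAX_POWER, and the real code sets pr->flags.power on struct acpi_processor rather than returning a value.

```c
#define STATE_C1 1      /* stand-ins for ACPI_STATE_C1..ACPI_STATE_C3 */
#define STATE_C2 2
#define STATE_C3 3
#define MAX_POWER 8     /* assumed value of ACPI_PROCESSOR_MAX_POWER */

struct cx {             /* hypothetical cut-down acpi_processor_cx */
    int valid;
    int type;
};

/* Mirror of the loop in acpi_processor_get_power_info.
 * new_rule == 0: behaviour before commit 2203d6ed (any valid state sets the flag).
 * new_rule != 0: behaviour after it (only a valid C2-or-deeper state sets the flag).
 * *count mirrors pr->power.count; the return value mirrors pr->flags.power. */
static int power_flag(const struct cx states[MAX_POWER], int new_rule, int *count)
{
    int flag = 0;
    *count = 0;
    for (int i = 1; i < MAX_POWER; i++) {
        if (states[i].valid) {
            *count = i;
            if (!new_rule || states[i].type >= STATE_C2)
                flag = 1;
        }
    }
    return flag;
}
```

With only C1 valid, power_flag returns 1 under the old rule and 0 under the new one, matching the observation that commenting out the type check restores the 2.6.14 behaviour.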

I know nothing about kernel internals so the following may not be well informed.
I'm inclined to think that the revision is not blameworthy in itself, but rather
exposes an underlying problem. The comment above this bit of code says, 'if one
state of type C2 or C3 is available, mark this CPU as being "idle manageable"'.
As the commit log says, the revision makes the code match the comment.

What happens in my case is revealed after enabling some debug statements:

--------8<--------
acpi_processor-0926 [07] processor_get_power_in: ----Entry
acpi_processor-0621 [08] processor_get_power_in: ----Entry
acpi_processor-0634 [08] processor_get_power_in: ----Exit- 0000000000000000
acpi_processor-0646 [08] processor_get_power_in: ----Entry
acpi_processor-0660 [08] processor_get_power_in: No _CST, giving up
acpi_processor-0661 [08] processor_get_power_in: ----Exit- FFFFFFFFFFFFFFED
acpi_processor-0582 [08] processor_get_power_in: ----Entry
acpi_processor-0614 [08] processor_get_power_in: lvl2[0x00004014] lvl3[0x00004015]
acpi_processor-0616 [08] processor_get_power_in: ----Exit- 0000000000000000
acpi_processor-0774 [08] processor_power_verify: ----Entry
acpi_processor-0785 [08] processor_power_verify: latency too large [101]
acpi_processor-0786 [08] processor_power_verify: ----Exit-
acpi_processor-0804 [08] processor_power_verify: ----Entry
acpi_processor-0815 [08] processor_power_verify: latency too large [1001]
acpi_processor-0816 [08] processor_power_verify: ----Exit-
acpi_processor-0511 [08] processor_set_power_po: ----Entry
acpi_processor-0577 [08] processor_set_power_po: ----Exit- 0000000000000000
acpi_processor-0963 [07] processor_get_power_in: ----Exit- 0000000000000000
ACPI: CPU0 (power states: C1[C1])
--------8<--------
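In the trace above, the call that prints "No _CST, giving up" exits with FFFFFFFFFFFFFFED. Read as a 64-bit two's-complement value, that is -19, which is -ENODEV on Linux, consistent with the _CST lookup failing and the FADT path being tried next. A small C sketch of the decoding; negated_magnitude and is_enodev are hypothetical helper names.

```c
#include <errno.h>
#include <stdint.h>

/* The ACPI debug trace prints return values as 64-bit hex. For a negative
 * errno, two's-complement negation recovers the positive error code.
 * Unsigned arithmetic keeps this well defined in C. */
static uint64_t negated_magnitude(uint64_t printed)
{
    return ~printed + 1;
}

/* Returns 1 if the printed value is -ENODEV ("no such device"). */
static int is_enodev(uint64_t printed)
{
    return negated_magnitude(printed) == (uint64_t)ENODEV;
}
```

For example, ~0xFFFFFFFFFFFFFFED is 0x12, and adding 1 gives 0x13, i.e. 19.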

So in acpi_processor_get_power_info, acpi_processor_get_power_info_cst fails and
acpi_processor_get_power_info_fadt is called. The latter finds states C2 and C3
but assigns latencies which, suspiciously, are each 1 greater than the
permissible maximum. Would these be dummy values that get inserted when genuine
values could not be read?
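On the 101 and 1001 values: the ACPI specification defines a FADT P_LVL2_LAT greater than 100 microseconds as meaning the system does not support C2, and a P_LVL3_LAT greater than 1000 as meaning it does not support C3. So these look like the firmware's way of declaring C2 and C3 unsupported rather than values that failed to be read. The check behind the "latency too large" lines can be sketched as below; the constant and function names are hypothetical, not the kernel's, and each state is judged independently.

```c
#define MAX_C2_LATENCY_US 100    /* ACPI: P_LVL2_LAT > 100 means C2 unsupported */
#define MAX_C3_LATENCY_US 1000   /* ACPI: P_LVL3_LAT > 1000 means C3 unsupported */

/* Returns a bitmask of usable C-states from FADT-reported worst-case latencies:
 * bit n set means Cn is usable. C1 (HLT) is always considered usable. */
static unsigned usable_cstates(unsigned c2_latency_us, unsigned c3_latency_us)
{
    unsigned mask = 1u << 1;                    /* C1 */
    if (c2_latency_us <= MAX_C2_LATENCY_US)
        mask |= 1u << 2;                        /* C2 */
    if (c3_latency_us <= MAX_C3_LATENCY_US)
        mask |= 1u << 3;                        /* C3 */
    return mask;
}
```

With the values from this trace, usable_cstates(101, 1001) leaves only the C1 bit set, matching the final "power states: C1[C1]" line.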
Comment 16 Andrew Morton 2006-05-15 11:18:13 UTC
Pretty good for a Luddite - thanks.

I've added Linus to cc, which means that this discussion should proceed via
email, please.  Just do a reply-to-all, make sure that
bugme-daemon@bugzilla.kernel.org remains on cc.


bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6519
> 
> 
> 
> 
> 
> ------- Additional Comments From iamaluddite@yahoo.com  2006-05-15 11:01 -------
> Progress: git bisect identified the following revision:
> 
> --------8<--------
> 2203d6ed448ff3b777ee6bb614a53e686b483e5b is first bad commit
> diff-tree 2203d6ed448ff3b777ee6bb614a53e686b483e5b (from
> 2656c076e31a3ce3ab2a987a578e7122dc2af51d)
> Author: Linus Torvalds <torvalds@g5.osdl.org>
> Date:   Fri Nov 18 07:29:51 2005 -0800
> 
>     Fix ACPI processor power block initialization
> 
>     Properly clear the memory, and set "pr->flags.power" only if a C2 or
>     deeper state is valid (to make the code match both the comment and
>     previous behaviour).
> 
>     This fixes a boot-time lockup reported by Maneesh Soni when using
>     "maxcpus=1".
> 
>     Acked-by: Maneesh Soni <maneesh@in.ibm.com>
>     Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> 
> :040000 040000 52be621b960ae192b36acf778c966d78ff5edbe2
> 04c183ce141dab8cdff049c1dae379104b637ed4 M      drivers
> 
> ------------------------ drivers/acpi/processor_idle.c ------------------------
> index 573b6a9..70d8a6e 100644
> @@ -514,8 +514,6 @@ static int acpi_processor_set_power_poli
>  
>  static int acpi_processor_get_power_info_fadt(struct acpi_processor *pr)
>  {
> -	int i;
> -
>  	ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_fadt");
>  
>  	if (!pr)
> @@ -524,8 +522,7 @@ static int acpi_processor_get_power_info
>  	if (!pr->pblk)
>  		return_VALUE(-ENODEV);
>  
> -	for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
> -		memset(pr->power.states, 0, sizeof(struct acpi_processor_cx));
> +	memset(pr->power.states, 0, sizeof(pr->power.states));
>  
>  	/* if info is obtained from pblk/fadt, type equals state */
>  	pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
> @@ -555,13 +552,9 @@ static int acpi_processor_get_power_info
>  
>  static int acpi_processor_get_power_info_default_c1(struct acpi_processor *pr)
>  {
> -	int i;
> -
>  	ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_default_c1");
>  
> -	for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
> -		memset(&(pr->power.states[i]), 0,
> -		       sizeof(struct acpi_processor_cx));
> +	memset(pr->power.states, 0, sizeof(pr->power.states));
>  
>  	/* if info is obtained from pblk/fadt, type equals state */
>  	pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
> @@ -873,7 +866,8 @@ static int acpi_processor_get_power_info
>  	for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
>  		if (pr->power.states[i].valid) {
>  			pr->power.count = i;
> -			pr->flags.power = 1;
> +			if (pr->power.states[i].type >= ACPI_STATE_C2)
> +				pr->flags.power = 1;
>  		}
>  	}
>  
> --------8<--------
> 
> The last change in the diff, in acpi_processor_get_power_info, is what makes the
> difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)"
> restores the power saving in 2.6.16.14 .
> 
> I know nothing about kernel internals so the following may not be well informed.
> I'm inclined to think that the revision is not blameworthy in itself, but rather
> exposes an underlying problem. The comment above this bit of code says, 'if one
> state of type C2 or C3 is available, mark this CPU as being "idle manageable"'.
> As the commit log says, the revision makes the code match the comment.
> 
> What happens in my case is revealed after enabling some debug statements:
> 
> --------8<--------
> acpi_processor-0926 [07] processor_get_power_in: ----Entry
> acpi_processor-0621 [08] processor_get_power_in: ----Entry
> acpi_processor-0634 [08] processor_get_power_in: ----Exit- 0000000000000000
> acpi_processor-0646 [08] processor_get_power_in: ----Entry
> acpi_processor-0660 [08] processor_get_power_in: No _CST, giving up
> acpi_processor-0661 [08] processor_get_power_in: ----Exit- FFFFFFFFFFFFFFED
> acpi_processor-0582 [08] processor_get_power_in: ----Entry
> acpi_processor-0614 [08] processor_get_power_in: lvl2[0x00004014] lvl3[0x00004015]
> acpi_processor-0616 [08] processor_get_power_in: ----Exit- 0000000000000000
> acpi_processor-0774 [08] processor_power_verify: ----Entry
> acpi_processor-0785 [08] processor_power_verify: latency too large [101]
> acpi_processor-0786 [08] processor_power_verify: ----Exit-
> acpi_processor-0804 [08] processor_power_verify: ----Entry
> acpi_processor-0815 [08] processor_power_verify: latency too large [1001]
> acpi_processor-0816 [08] processor_power_verify: ----Exit-
> acpi_processor-0511 [08] processor_set_power_po: ----Entry
> acpi_processor-0577 [08] processor_set_power_po: ----Exit- 0000000000000000
> acpi_processor-0963 [07] processor_get_power_in: ----Exit- 0000000000000000
> ACPI: CPU0 (power states: C1[C1])
> --------8<--------
> 
> So in acpi_processor_get_power_info, acpi_processor_get_power_info_cst fails and
> acpi_processor_get_power_info_fadt is called. The latter finds states C2 and C3
> but assigns latencies which, suspiciously, are each 1 greater than the
> permissible maximum. Would these be dummy values that get inserted when genuine
> values could not be read?

Comment 17 Jean Delvare 2006-05-15 11:21:19 UTC
Excellent work, Dave! You finally got it. Now let's see what the ACPI folks
think about it.
Comment 18 Linus Torvalds 2006-05-15 11:48:08 UTC

On Mon, 15 May 2006, Andrew Morton wrote:
> > @@ -873,7 +866,8 @@ static int acpi_processor_get_power_info
> >  	for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
> >  		if (pr->power.states[i].valid) {
> >  			pr->power.count = i;
> > -			pr->flags.power = 1;
> > +			if (pr->power.states[i].type >= ACPI_STATE_C2)
> > +				pr->flags.power = 1;
> >  		}
> >  	}
> >  
> > --------8<--------
> > 
> > The last change in the diff, in acpi_processor_get_power_info, is what makes the
> > difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)"
> > restores the power saving in 2.6.16.14 .

That last change is also the thing that fixed the lock-up for Maneesh.

We had this particular discussion at some point earlier, and didn't get to 
any resolution. The ACPI people wanted to undo the thing, which didn't 
make a lot of sense because (a) the comment says otherwise and (b) C1 is 
available even without ACPI and (c) nobody ever explained why it locked up 
without the "type >= ACPI_STATE_C2" test.

Now, what happens is that your ACPI tables show that C2+ is unusable, 
which leaves only C1 usable (that, btw, you might be able to fix with 
different tables - possibly through a BIOS update).

However, at that point it's totally pointless to even _try_ to use ACPI 
CPU power management for this case, since ACPI can't do any better than 
the normal C1 stuff in the bog-standard non-ACPI x86 idle routine.

Do you actually see anything running hotter?

Or is it just that the "CPU%d (power states:..)" message disappeared?

Please realize that we will _always_ use C1 (aka "halt") in the idle state 
quite regardless of ACPI - unless you've done "idle=poll" on the kernel 
command line. So the fact that we don't use ACPI for it shouldn't make us 
actually run any hotter (quite the reverse - we'll go into C1 state with 
_less_ work).

So I don't see the downside. Am I missing something?

                 Linus
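
The effect of the last hunk can be modelled in isolation. The following is a minimal user-space sketch, not kernel code: struct cx_state, STATE_C1..STATE_C3, MAX_POWER and both power_flag_* functions are hypothetical simplifications of struct acpi_processor_cx, the ACPI_STATE_Cx constants, ACPI_PROCESSOR_MAX_POWER and the loop in acpi_processor_get_power_info (constant values assumed). It assumes the situation in the debug log above: C1 valid, C2 and C3 rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ACPI_STATE_C1..C3 and
 * ACPI_PROCESSOR_MAX_POWER; values assumed for illustration. */
enum { STATE_C1 = 1, STATE_C2 = 2, STATE_C3 = 3 };
#define MAX_POWER 8

/* Only the two fields of struct acpi_processor_cx that matter here. */
struct cx_state {
	bool valid;
	int type;
};

/* Old rule: any valid state at index >= 1 sets the "power" flag,
 * so a table with only C1 valid is enough. */
static int power_flag_old(const struct cx_state *states)
{
	int flag = 0;

	for (int i = 1; i < MAX_POWER; i++)
		if (states[i].valid)
			flag = 1;
	return flag;
}

/* New rule: only a valid C2-or-deeper state sets the flag, matching
 * the "if one state of type C2 or C3 is available" comment. */
static int power_flag_new(const struct cx_state *states)
{
	int flag = 0;

	for (int i = 1; i < MAX_POWER; i++)
		if (states[i].valid && states[i].type >= STATE_C2)
			flag = 1;
	return flag;
}
```

For the table in the debug log, `struct cx_state s[MAX_POWER] = { [1] = { true, STATE_C1 }, [2] = { false, STATE_C2 }, [3] = { false, STATE_C3 } };` gives power_flag_old(s) == 1 but power_flag_new(s) == 0, so with the new rule ACPI no longer takes over the idle loop for a C1-only table.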

Comment 19 Len Brown 2006-05-15 12:26:35 UTC
 
>> So in acpi_processor_get_power_info, acpi_processor_get_power_info_cst
>> fails and acpi_processor_get_power_info_fadt is called. The latter finds
>> states C2 and C3 but assigns latencies which, suspiciously, are each 1
>> greater than the permissible maximum. Would these be dummy values that
>> get inserted when genuine values could not be read?

These values are inserted by the BIOS writer to indicate that
the platform does not support these C-states.

Generally, we've found that vendors assume the OS will use _CST
and will ignore the FADT, and thus they don't bother putting
anything interesting in the FADT -- as that path is never tested.
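
Assuming the ACPI specification's FADT semantics (a P_LVL2_LAT above 100 microseconds means C2 is not supported, a P_LVL3_LAT above 1000 microseconds means C3 is not supported), the "latency too large [101]" and "latency too large [1001]" lines in the debug output are the BIOS's way of saying "no C2, no C3". A minimal sketch of just that latency test; the names are hypothetical, and the kernel's real verification code also checks conditions not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* FADT limits (microseconds): anything larger means "state not supported". */
#define MAX_C2_LATENCY_US 100u
#define MAX_C3_LATENCY_US 1000u

/* Hypothetical helper: is C2 usable according to FADT P_LVL2_LAT? */
static bool fadt_c2_usable(unsigned int p_lvl2_lat)
{
	return p_lvl2_lat <= MAX_C2_LATENCY_US;
}

/* Hypothetical helper: is C3 usable according to FADT P_LVL3_LAT? */
static bool fadt_c3_usable(unsigned int p_lvl3_lat)
{
	return p_lvl3_lat <= MAX_C3_LATENCY_US;
}
```

With the values from this machine's FADT, fadt_c2_usable(101) and fadt_c3_usable(1001) are both false, leaving only C1.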

Comment 20 Dave Jenkins 2006-05-15 12:35:08 UTC
--- Linus Torvalds <torvalds@osdl.org> wrote:
> Do you actually see anything running hotter?

Yes, that's how this problem manifests itself and the only means by
which I became aware of it. In the working case, without "if
(pr->power.states[i].type >= ACPI_STATE_C2)", my CPU idle temperature
is around 36 degrees C. In the non-working case, with that "if"
statement, the idle temp is around 51 degrees C. The CPU temp was my
sole guide during the git bisect.

> Or is it just that the "CPU%d (power states:..)" message disappeared?

This message does disappear in the non-working case, coinciding with
the raised CPU temp.

> Please realize that we will _always_ use C1 (aka "halt") in the idle state
> quite regardless of ACPI - unless you've done "idle=poll" on the kernel
> command line. So the fact that we don't use ACPI for it shouldn't make us
> actually run any hotter (quite the reverse - we'll go into C1 state with
> _less_ work).

Interesting. I'm not using "idle=poll". I should re-iterate that I know
nothing about kernel internals, but I'm wondering, given what you've
said, if it's possible that there's another problem here, causing
non-ACPI C1/halt not to work. Thus ACPI is the only mechanism providing
power-saving; and was only doing so, prior to the "if
(pr->power.states[i].type >= ACPI_STATE_C2)" revision, due to a happy
accident whereby a valid C1 state in pr->power.states caused
pr->flags.power to be set. Does that make any sense?

This theory seems consistent with the fact that the working (i.e. cool)
Fedora kernel 2.6.14-1.1656_FC4, becomes non-working (i.e. hot) when
booted with "acpi=off" ( see Comment #7
http://bugzilla.kernel.org/show_bug.cgi?id=6519#c7 )

Thanks for your comments and insight. 

Dave

Comment 21 Linus Torvalds 2006-05-15 13:08:38 UTC

On Mon, 15 May 2006, Dave Jenkins wrote:
>
> --- Linus Torvalds <torvalds@osdl.org> wrote:
> > Do you actually see anything running hotter?
> 
> Yes, that's how this problem manifests itself and the only means by
> which I became aware of it. In the working case, without "if
> (pr->power.states[i].type >= ACPI_STATE_C2)", my CPU idle temperature
> is around 36 degrees C. In the non-working case, with that "if"
> statement, the idle temp is around 51 degrees C. The CPU temp was my
> sole guide during the git bisect.

Good job. That's certainly conclusive.

> > Please realize that we will _always_ use C1 (aka "halt") in the idle state 
> > quite regardless of ACPI - unless you've done "idle=poll" on the kernel 
> > command line. So the fact that we don't use ACPI for it shouldn't make us 
> > actually run any hotter (quite the reverse - we'll go into C1 state with 
> > _less_ work).
> 
> Interesting. I'm not using "idle=poll". I should re-iterate that I know
> nothing about kernel internals, but I'm wondering, given what you've
> said, if it's possible that there's another problem here, causing
> non-ACPI C1/halt not to work.

I'd also wonder if maybe there is something that causes ACPI to go into C2 
even though the BIOS latency tables have apparently said that we 
shouldn't.

Ie it may be that we have a totally unrelated bug that made ACPI actually 
go into a deeper powersaving mode than it should. That could have 
explained the lockups that Maneesh saw with "maxcpus=1" (because C2 and 
above are disabled for SMP _anyway_, since they aren't valid there due to 
touching the _common_ northbridge rather than being per-core).

> Thus ACPI is the only mechanism providing
> power-saving; and was only doing so, prior to the "if
> (pr->power.states[i].type >= ACPI_STATE_C2)" revision, due to a happy
> accident whereby a valid C1 state in pr->power.states caused
> pr->flags.power to be set. Does that make any sense?

Actually, suddenly it does. When I look more closely at your dmesg report, 
I find this:

	Checking 'hlt' instruction... disabled

ie your normal idle loop has literally _disabled_ the use of hlt.

Do you have "no-hlt" on the kernel command line? That _should_ be the only 
thing that disables hlt (and thus power-savings in idle).

Yup, you do (from that same dmesg):

	Kernel command line: ro root=/dev/VolGroup00/LogVol00 no-hlt 1

And yes, ACPI will ignore that "no-hlt" flag.

Now, the question is, why do you have no-hlt there? Was it some strange 
distro that set it for you? And why?

		Linus
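
To check whether a running system was booted with no-hlt, one can look for it as a whole word in the kernel command line, for example as read from /proc/cmdline. A minimal sketch, assuming plain whitespace-separated parameters (quoted values are not handled); cmdline_has_flag is a hypothetical helper, not a kernel or libc function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `param` appears as a whole whitespace-separated token
 * in `cmdline`. A token that merely starts with `param` does not count. */
static bool cmdline_has_flag(const char *cmdline, const char *param)
{
	size_t len = strlen(param);
	const char *p = cmdline;

	while (*p) {
		/* Skip separators between tokens. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		if (!*p)
			break;
		/* Length of the current token. */
		size_t tok = strcspn(p, " \t\n");
		if (tok == len && strncmp(p, param, len) == 0)
			return true;
		p += tok;
	}
	return false;
}
```

Applied to the command line quoted above, cmdline_has_flag("ro root=/dev/VolGroup00/LogVol00 no-hlt 1", "no-hlt") returns true; the "Checking 'hlt' instruction... disabled" line in dmesg is the matching symptom.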

Comment 22 Linus Torvalds 2006-05-15 13:11:08 UTC

On Mon, 15 May 2006, Linus Torvalds wrote:
> 
> Yup, you do (from that same dmesg):
> 
> 	Kernel command line: ro root=/dev/VolGroup00/LogVol00 no-hlt 1
> 
> Now, the question is, why do you have no-hlt there? Was it some strange 
> distro that set it for you? And why?

Btw, I think we can close this report, aside from the question about why 
that "no-hlt" was there in the first place. I bet the power usage will go 
back to where it was with ACPI without that thing.

The whole "no-hlt" thing exists purely for some _really_ old i486 class 
machines that would lock up with hlt for some unexplained reason.

		Linus

Comment 23 Andrew Morton 2006-05-15 13:22:42 UTC
Linus Torvalds <torvalds@osdl.org> wrote:
>
> Btw, I think we can close this report, aside from the question about why 
> that "no-hlt" was there in the first place. I bet the power usage will go 
> back to where it was with ACPI without that thing.

Thanks, Linus.

But..  Was Dave using no-hlt on earlier kernels?  If so, why didn't they
get hot as well?

Comment 24 Linus Torvalds 2006-05-15 13:39:25 UTC

On Mon, 15 May 2006, Andrew Morton wrote:

> Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > Btw, I think we can close this report, aside from the question about why 
> > that "no-hlt" was there in the first place. I bet the power usage will go 
> > back to where it was with ACPI without that thing.
> 
> Thanks, Linus.
> 
> But..  Was Dave using no-hlt on earlier kernels?  If so, why didn't they
> get hot as well?

Exactly because ACPI _ignored_ that option, so it would use the broken 
ACPI C1 sleepstate.

So ACPI just does:

                /*
                 * Invoke C1.
                 * Use the appropriate idle routine, the one that would
                 * be used without acpi C-states.
                 */ 
                if (pm_idle_save)
                        pm_idle_save();
                else
                        acpi_safe_halt();

where acpi_safe_halt() just does

	static void acpi_safe_halt(void)
	{
	        clear_thread_flag(TIF_POLLING_NRFLAG);
	        smp_mb__after_clear_bit();
	        if (!need_resched())
	                safe_halt();
	        set_thread_flag(TIF_POLLING_NRFLAG);
	}        

and here probably pm_idle_save was NULL.

In contrast, the main CPU idle loop will do

			...
                        idle = pm_idle;

                        if (!idle)
                                idle = default_idle;

                        if (cpu_is_offline(cpu))
                                play_dead();

                        __get_cpu_var(irq_stat).idle_timestamp = jiffies;
                        idle();
			...

ie if pm_idle is NULL (which is what ACPI saves as "pm_idle_save") it will use 
default_idle, which in turn does

        if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
                clear_thread_flag(TIF_POLLING_NRFLAG);
                smp_mb__after_clear_bit();
                while (!need_resched()) {
                        local_irq_disable();
                        if (!need_resched())
                                safe_halt();
                        else
                                local_irq_enable();
                }
                set_thread_flag(TIF_POLLING_NRFLAG);
        } else {
                while (!need_resched())
                        cpu_relax();
        }

ie it _honors_ that hlt_works_ok flag, unlike the ACPI one.

Now, I'm not saying that ACPI should honor the hlt_works_ok flag, because 
ACPI wouldn't _exist_ on the kind of old machines that needed it, but I 
think it explains why ACPI ended up running in a cooler C1 than the normal 
idle routine, which would just end up doing that endless loop of 
"cpu_relax()" (which is not a halt, but a special no-op).

		Linus
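
The two idle paths described above can be condensed into a small decision model, which also answers Andrew's question. This is a hypothetical sketch, not kernel code: it assumes pm_idle_save is NULL and hlt_counter is 0, and it reduces each path to "halts" or "spins"; none of the names below are kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* What one idle iteration ends up doing. */
enum idle_action { IDLE_HALT, IDLE_BUSY_LOOP };

/* ACPI C1 path (pm_idle_save assumed NULL, so acpi_safe_halt()):
 * hlt_works_ok is never consulted. */
static enum idle_action acpi_c1_idle(bool hlt_works_ok)
{
	(void)hlt_works_ok;	/* ignored, like the ACPI path */
	return IDLE_HALT;
}

/* default_idle(): halt only if hlt is allowed, else spin on cpu_relax(). */
static enum idle_action default_idle_action(bool hlt_works_ok)
{
	return hlt_works_ok ? IDLE_HALT : IDLE_BUSY_LOOP;
}

/* acpi_idle_enabled models pr->flags.power being set for this CPU. */
static enum idle_action idle_action(bool acpi_idle_enabled, bool hlt_works_ok)
{
	return acpi_idle_enabled ? acpi_c1_idle(hlt_works_ok)
				 : default_idle_action(hlt_works_ok);
}
```

With no-hlt (hlt_works_ok false), idle_action(true, false) is IDLE_HALT (the cool 2.6.14 case, where a C1-only table still enabled ACPI idle) while idle_action(false, false) is IDLE_BUSY_LOOP (the hot 2.6.15+ case). Without no-hlt both paths halt.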

Comment 25 Dave Jenkins 2006-05-15 13:58:36 UTC
--- Linus Torvalds <torvalds@osdl.org> wrote:
> Now, the question is, why do you have no-hlt there? Was it some
> strange 
> distro that set it for you? And why?

I must admit I'd forgotten about the no-hlt, sorry. I added it to my
grub.conf because I was getting occasional hangs on boot at "Checking
'hlt' instruction", perhaps 5-10% of the time.

I first added it when running Fedora kernel 2.6.14-1.1656_FC4 . I added
the no-hlt not knowing whether it would affect power saving or simply
disable a potentially problematic test. With that kernel, adding no-hlt
did not noticeably affect CPU temperature. We now know, or suspect,
this is because that kernel had a bug/feature whereby ACPI power saving
is controversially enabled for hardware that reports only C1 as a valid
power state. At the time, I interpreted the fact that the temperature
did not increase as showing that the "no-hlt" had simply disabled the
test, not the power-saving. I hope this is a forgivable mistake.

When I upgraded to subsequent Fedora kernels, the existing kernel
parameters, including no-hlt, were evidently automatically copied for
the new entries in grub.conf . And these new kernels (2.6.15-1.1830_FC4
onward) did not have the same ACPI bug/feature, so suddenly my CPU
temperature shot up because I now did not have power saving from hlt or
ACPI.

I have now confirmed that 2.6.16-1.2108_FC4 without no-hlt runs cool.
Yes, I should have spotted and tested this before submitting the bug
and I apologise for that.

This leaves, apart from some embarrassment on my part, the question why
I was getting hangs at "Checking 'hlt' instruction" with
2.6.14-1.1656_FC4 . Wouldn't it be neat if this turned out to have the
same cause as the lock-up reported by Maneesh that motivated the "if
(pr->power.states[i].type >= ACPI_STATE_C2)" revision!

Dave

Comment 26 Jean Delvare 2006-05-15 22:54:24 UTC
Closing bug as INVALID: removing no-hlt from the boot command line restored the
power saving when idle.
