Most recent kernel where this bug did not occur: 2.6.14-1.1656_FC4 (Fedora) - This was the last 2.6.14 kernel I used. Problem appeared with first 2.6.15 Fedora kernel and persists with all subsequent Fedora kernels as well as kernel.org 2.6.16.14.
Distribution: Fedora
Hardware Environment: Athlon XP 2600+ Barton CPU, MSI KM4M-L mobo
Software Environment: ver_linux:

Linux bagpuss.localdomain 2.6.16.14_DJ_minimal #1 Mon May 8 15:38:54 BST 2006 i686 athlon i386 GNU/Linux

Gnu C                  4.0.2
Gnu make               3.80
binutils               2.15.94.0.2.2
util-linux             2.12p
mount                  2.12p
module-init-tools      3.2-pre9
e2fsprogs              1.38
reiserfsprogs          line
reiser4progs           line
pcmcia-cs              3.2.8
quota-tools            3.12.
PPP                    2.4.2
isdn4k-utils           3.7
nfs-utils              1.0.7
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   071
Modules Loaded         w83627hf hwmon_vid hwmon eeprom i2c_isa i2c_viapro i2c_dev i2c_core dm_mod

Problem Description:

My Athlon XP runs approx 15 degrees C hotter at idle with 2.6.15 and 2.6.16 kernels than with 2.6.14 and earlier. It seems that 2.6.15+ does not allow the CPU to enter s2k disconnect power saving mode.

MSI KM4M-L motherboard BIOS provides CPU Disconnect Control: "The item is to reduce the power consumption of the AMD K7 system. When set to Enabled, the processor is allowed to disconnect the s2k interface when the AMD K7 system is in some power saving states."

CPU idle temperatures with BIOS CPU Disconnect enabled versus disabled, for 2.6.14 versus 2.6.15 (I'm using a low-speed CPU fan, so the temp differences are larger than they might be):

Kernel    CPU Disconnect    CPU idle temp, degrees C
2.6.14    enabled           35.5
2.6.14    disabled          46
2.6.15    enabled           49.5
2.6.15    disabled          49.5

I have tried to track down the culprit by booting into runlevel 1, then removing all unused modules, just leaving those required for lm_sensors so that I can check the CPU temp.
This is the smallest set of modules I got down to:

w83627hf hwmon_vid hwmon eeprom i2c_isa i2c_viapro i2c_dev i2c_core dm_mod

(w83627hf is the module for the Winbond chip queried by lm_sensors). With just these modules loaded, in runlevel 1 with the system idling, the s2k disconnect still does not occur with 2.6.15. In 2.6.14, by contrast, it occurs quite happily in every runlevel I've tried, even with 50+ kernel modules loaded - the only exception being after attaching a USB storage device, as described in RedHat Bugzilla: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172592

How reproducible: Always

Steps to reproduce:
1. Enable CPU Disconnect in BIOS.
2. Note CPU idle temperature.

Actual Results: With 2.6.15, idle temperature is the same whether CPU Disconnect is enabled or disabled in BIOS.

Expected Results: Idle temperature should be significantly lower with CPU Disconnect enabled in BIOS, i.e. behaviour should be as with 2.6.14.

Detailed system information
---------------------------

/proc/version:
Linux version 2.6.16.14_DJ_minimal (davej@bagpuss.localdomain) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 Mon May 8 15:38:54 BST 2006

/proc/cpuinfo:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2600+
stepping        : 0
cpu MHz         : 1915.477
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : yes
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts
bogomips        : 3833.88

/proc/modules:
w83627hf 26384 0 - Live 0xe09e1000
hwmon_vid 2944 1 w83627hf, Live 0xe092f000
hwmon 3396 1 w83627hf, Live 0xe0823000
eeprom 7824 0 - Live 0xe09de000
i2c_isa 5248 1 w83627hf, Live 0xe09f2000
i2c_viapro 9236 0 - Live 0xe09da000
i2c_dev 9824 0 - Live 0xe097c000
i2c_core 22912 5 w83627hf,eeprom,i2c_isa,i2c_viapro,i2c_dev, Live 0xe0975000
dm_mod 56916 9 - Live 0xe0861000

/proc/ioports:
0000-001f : dma1
0020-0021 : pic1 0040-0043 : timer0 0050-0053 : timer1 0060-006f : keyboard 0070-0077 : rtc 0080-008f : dma page reg 00a0-00a1 : pic2 00c0-00df : dma2 00f0-00ff : fpu 0170-0177 : ide1 01f0-01f7 : ide0 0295-0296 : w83627hf 0376-0376 : ide1 03c0-03df : vga+ 03f6-03f6 : ide0 03f8-03ff : serial 0cf8-0cff : PCI conf1 4000-407f : 0000:00:11.0 4000-407f : motherboard 4000-4003 : PM1a_EVT_BLK 4004-4005 : PM1a_CNT_BLK 4008-400b : PM_TMR 4010-4015 : ACPI CPU throttle 4020-4023 : GPE0_BLK 5000-500f : 0000:00:11.0 5000-500f : motherboard 5000-500f : pnp 00:02 5000-5007 : vt596_smbus d000-d01f : 0000:00:10.0 d400-d41f : 0000:00:10.1 d800-d81f : 0000:00:10.2 dc00-dc0f : 0000:00:11.1 dc00-dc07 : ide0 dc08-dc0f : ide1 e000-e0ff : 0000:00:11.5 e400-e4ff : 0000:00:12.0 /proc/iomem: 00000000-0009fbff : System RAM 0009fc00-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000cf3ff : Video ROM 000f0000-000fffff : System ROM 00100000-1ffeffff : System RAM 00100000-00345f26 : Kernel code 00345f27-003f1b83 : Kernel data 1fff0000-1fff2fff : ACPI Non-volatile Storage 1fff3000-1fffffff : ACPI Tables d0000000-d7ffffff : 0000:00:00.0 d8000000-dfffffff : PCI Bus #01 d8000000-dfffffff : 0000:01:00.0 e0000000-e1ffffff : PCI Bus #01 e0000000-e0ffffff : 0000:01:00.0 e1000000-e101ffff : 0000:01:00.0 e2000000-e20000ff : 0000:00:10.3 e2001000-e20010ff : 0000:00:12.0 fec00000-fec00fff : reserved fee00000-fee00fff : reserved ffff0000-ffffffff : reserved lspci -vvv: 00:00.0 Host bridge: VIA Technologies, Inc. VT8378 [KM400/A] Chipset Host Bridge Subsystem: VIA Technologies, Inc. 
VT8378 [KM400/A] Chipset Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 8 Region 0: Memory at d0000000 (32-bit, prefetchable) [size=128M] Capabilities: [80] AGP version 3.5 Status: RQ=32 Iso- ArqSz=0 Cal=2 SBA+ ITACoh- GART64- HTrans- 64bit- FW- AGP3+ Rate=x4,x8 Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none> Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR- Latency: 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000f000-00000fff Memory behind bridge: e0000000-e1ffffff Prefetchable memory behind bridge: d8000000-dfffffff Secondary status: 66Mhz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B- Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI]) Subsystem: VIA Technologies, Inc. 
VT82xxxxx UHCI USB 1.1 Controller Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin A routed to IRQ 19 Region 4: I/O ports at d000 [size=32] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI]) Subsystem: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin B routed to IRQ 19 Region 4: I/O ports at d400 [size=32] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 80) (prog-if 00 [UHCI]) Subsystem: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin C routed to IRQ 19 Region 4: I/O ports at d800 [size=32] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:10.3 USB Controller: VIA Technologies, Inc. 
USB 2.0 (rev 82) (prog-if 20 [EHCI]) Subsystem: Micro-Star International Co., Ltd.: Unknown device 7340 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin D routed to IRQ 19 Region 0: Memory at e2000000 (32-bit, non-prefetchable) [size=256] Capabilities: [80] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge Subsystem: VIA Technologies, Inc. VT8235 ISA Bridge Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 0 Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP]) Subsystem: Micro-Star International Co., Ltd.: Unknown device 7340 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 Interrupt: pin A routed to IRQ 16 Region 4: I/O ports at dc00 [size=16] Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:11.5 Multimedia audio controller: VIA Technologies, Inc. 
VT8233/A/8235/8237 AC97 Audio Controller (rev 50) Subsystem: Micro-Star International Co., Ltd.: Unknown device 7340 Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin C routed to IRQ 18 Region 0: I/O ports at e000 [size=256] Capabilities: [c0] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 74) Subsystem: Micro-Star International Co., Ltd.: Unknown device 734c Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Interrupt: pin A routed to IRQ 17 Region 0: I/O ports at e400 [size=256] Region 1: Memory at e2001000 (32-bit, non-prefetchable) [size=256] Capabilities: [40] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1) (prog-if 00 [VGA]) Subsystem: XFX Pine Group Inc.: Unknown device 1280 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 32 (1250ns min, 250ns max) Interrupt: pin A routed to IRQ 11 Region 0: Memory at e0000000 (32-bit, non-prefetchable) [size=16M] Region 1: Memory at d8000000 (32-bit, prefetchable) [size=128M] [virtual] Expansion ROM at e1000000 [disabled] [size=128K] Capabilities: [60] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [44] AGP version 3.0 
Status: RQ=32 Iso- ArqSz=0 Cal=3 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8 Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=<none> SCSI: none.
Hi, could you attach dmesg and /proc/interrupts, please?
Is a cpufreq driver and governor loaded in these two configurations? If so, please modprobe cpufreq_stats and look in /sys/devices/system/cpu/cpu*/cpufreq/ and stats/ and report what frequency the processor is running at in the working and failure configurations.

Is ACPI enabled in the working and failure configurations? If yes, is there any difference if you boot with "acpi=off"?

Also, in ACPI mode, does the temperature reported under /proc/acpi/ match what is reported by lm_sensors? In general, running i2c-based lm_sensors at the same time as an ACPI-enabled kernel is a bad idea. If ACPI reports the same temperature as i2c, is there any effect on the failing system if the i2c drivers are not loaded?
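For anyone who wants to script the frequency check rather than cat the files by hand, here is a minimal sketch. The helper name read_long is made up for illustration, and it assumes the usual cpufreq sysfs file scaling_cur_freq (value in kHz) is present, which is only the case when a cpufreq driver is actually loaded.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a text file such as
 * /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq (reported in kHz).
 * Returns 0 on success, -1 if the file is missing or holds no number.
 */
static int read_long(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* e.g. no cpufreq driver loaded */

    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A return value of -1 for the scaling_cur_freq path would itself be informative here: it suggests that no cpufreq driver is bound to the CPU in that configuration.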
Created attachment 8062: dmesg with 2.6.16.14
Re: comment #1

dmesg attached. /proc/interrupts (after removing extraneous modules):

           CPU0
  0:     781292    IO-APIC-edge   timer
  1:       2064    IO-APIC-edge   i8042
  8:          1    IO-APIC-edge   rtc
  9:          0    IO-APIC-level  acpi
 12:        714    IO-APIC-edge   i8042
 14:       3467    IO-APIC-edge   ide0
 15:         94    IO-APIC-edge   ide1
NMI:          0
LOC:     781247
ERR:          0
MIS:          0
Re: comment #2

I don't believe my hardware supports CPU frequency scaling. The Gnome CPU Frequency Scaling Monitor 2.10.1 reports "CPU frequency scaling unsupported" and states the frequency as 1.92GHz in all configurations and circumstances. I tried modprobe cpufreq_stats but /sys/devices/system/cpu/cpu0/ is an empty directory.

ACPI is enabled in both the working and failure configurations. Booting with "acpi=off" did not affect the temperature. Here's the output of /proc/interrupts from 2.6.16.14 with acpi=off:

           CPU0
  0:     480379    IO-APIC-edge   timer
  1:       1538    IO-APIC-edge   i8042
  2:          0          XT-PIC   cascade
  8:          1    IO-APIC-edge   rtc
 12:        114    IO-APIC-edge   i8042
 14:       3275    IO-APIC-edge   ide0
 15:         94    IO-APIC-edge   ide1
NMI:          0
LOC:     480321
ERR:          0
MIS:          0

The 'XT-PIC cascade' line has appeared where 'IO-APIC-level acpi' was previously.

The temperature reported under /proc/acpi/ does indeed match what is reported by lm_sensors, plus or minus a degree or so. Booting into 2.6.16.14 without loading the lm_sensors modules did not affect the temperature (as reported by /proc/acpi/thermal_zone/THRM/temperature).

Thanks for your comments. Sorry to present an entirely negative set of results, but I hope these provide some useful info!
> Booting with "acpi=off" did not affect the temperature.

Does "acpi=off" also have no effect on the 2.6.14 success case? If so, you've proven that this issue is independent of ACPI.
> Does "acpi=off" also have no effect on the 2.6.14 success case?
> If so, you've proven that this issue is independent of ACPI.

Aha, I didn't try that but I have now. (I think I was seeing this as a sin of commission rather than omission!) Results:

Kernel               ACPI    CPU temp (C)
2.6.14-1.1656_FC4    on      38
2.6.14-1.1656_FC4    off     49
2.6.16.14            on      51
2.6.16.14            off     51

s2k disconnect was enabled in BIOS throughout. I allowed the CPU temperature to stabilise for several minutes in each case and the machine had been on long enough for the case to reach its usual temp. The order of testing was different from the order listed, in case anyone's thinking the results are due to the machine warming up over the course of the testing.

The difference between 2.6.14 without ACPI (49) and 2.6.16 (51), though small, is real and reproducible. But obviously the main news here is that 2.6.14 is no longer cool when acpi=off.
Len, I take it that this means "it isn't ACPI". If so, which subsystem do you think regressed? Thanks.
> I take it that this means "it isn't ACPI"

I would have thought the reverse. As Len said, if "acpi=off" had no effect on the 2.6.14 success case - just as it has no effect on the 2.6.16 failure case - then we would have established that the issue is independent of ACPI. But it turns out that "acpi=off" does make a difference with 2.6.14, which suggests that ACPI may not be doing its job properly with 2.6.16 (or that there is some more complex interaction).
I agree with Dave Jenkins in comment #9: the results in comment #7 _do_ indeed suggest that ACPI bears part of the responsibility for the problem.

For completeness about the lm_sensors modules, you don't need everything you loaded. w83627hf, i2c-isa, hwmon and hwmon-vid are enough; i2c-viapro, eeprom and i2c-dev are not needed for a minimal configuration. Not that it matters much, as comment #5 seems to exclude a regression on the hwmon side.

I also second Len Brown's lm_sensors vs. ACPI remark in comment #2 (except that the problem isn't limited to i2c). Where ACPI and lm_sensors both report results from the same chip, a race condition exists and problems can happen, although they have only been reported on a few specific systems.

Dave, just in case, please check for a BIOS update, and give it a try if available.

Unless the ACPI folks have an idea of what to try next, and given that the problem can be reproduced at will, I'd suggest a git bisect between 2.6.14 and 2.6.15.
Thanks for your comments, Jean. I haven't run into conflicts between ACPI and i2c so far; I use lm_sensors only to feed Gnome Sensors Applet so I'll look out for an ACPI-aware equivalent. I have looked for BIOS updates, there's nothing that seems directly relevant. The latest update is described as "System will auto power on when plug power code" so I suppose that could be ACPI-related, but doesn't appear to be anything to do with s2k bus disconnect. My general attitude to BIOS updates is "if it ain't broke, don't flash it", especially since a friend turned a working motherboard into a knobbly tea tray recently. Given that everything works fine with 2.6.14, I would need some persuading. I don't want to be uncooperative but to be honest I would rather stick with 2.6.14, which is what I'm running currently without problems, than flash the BIOS. Yes, the problem is 100% reproducible and I'm more than happy to do further testing to help track down the cause. The Athlon XP without bus disconnect is energy-inefficient, using almost as much power at idle as at full load: see e.g. http://www.silentpcreview.com/article265-page2.html - 2.6A idle, 3.2A CPUBurn on 12V2 line for Athlon 2500+ Barton. So resolving this problem would help reduce global warming! ;-)
The ACPI vs. lm_sensors conflicts are not something you will easily notice. We know that they can happen in theory, but when the race condition triggers, all you'll get in most cases is simply a register read returning the wrong value (or a register write writing the value to a different register). And that's all so unlikely to happen and so unreproducible by nature that probably nobody will ever report it. Still, we know the problem is there.

Dave, I respect your fear of upgrading the BIOS. Let's try the git bisect approach then, and when you have found the patch responsible for the problem, report it here. I'll leave it to the ACPI people to then suggest how the problem can be (hopefully) fixed.

There are git bisect guides available:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/
Right, I'm new to git but I'm giving it a try. It doesn't seem to understand sub-versions, e.g. "git bisect bad v2.6.15.2" gave the error "fatal: Needed a single revision". (The first FC4 kernel to show the problem was 2.6.15-1.1830_FC4, based on 2.6.15.2). Do I have to work at the granularity of 2.6.14, 2.6.15 or is there another way to specify intermediate versions? I suppose in any case the binary search will quickly start narrowing it down.
The 2.6.15.x series is a branch off 2.6.15:

2.6.15 -> 2.6.16-rc1 -> 2.6.16-rc2 ... -> 2.6.17
      \
       -> 2.6.15.1 -> 2.6.15.2

so 2.6.15.x just isn't in Linus's tree at all. You should use v2.6.15 instead.
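On the point that the binary search narrows things down quickly, a rough sketch of the idea follows. It assumes a strictly linear history numbered 0..N, which is a simplification (git bisect works on the real commit graph), and the names bisect_first_bad and bad_from are invented for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Callback deciding whether a given commit index shows the bug. */
typedef int (*is_bad_fn)(int commit, void *ctx);

/*
 * Simplified bisection over a linear history: `good` is a known-good
 * index, `bad` a known-bad one, good < bad. Returns the index of the
 * first bad commit and stores the number of kernels tested in *steps.
 */
static int bisect_first_bad(int good, int bad, is_bad_fn is_bad,
                            void *ctx, int *steps)
{
    assert(good < bad && is_bad != NULL);

    if (steps != NULL)
        *steps = 0;

    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (steps != NULL)
            (*steps)++;

        if (is_bad(mid, ctx))
            bad = mid;          /* bug already present: look earlier */
        else
            good = mid;         /* still fine: look later */
    }
    return bad;
}

/* Demo predicate (hypothetical): every commit from *ctx onwards is bad. */
static int bad_from(int commit, void *ctx)
{
    return commit >= *(const int *)ctx;
}
```

Under these assumptions a range of about a thousand commits needs roughly ten build-and-boot cycles, so working at the granularity of v2.6.14 and v2.6.15 costs little.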
Progress: git bisect identified the following revision:

--------8<--------
2203d6ed448ff3b777ee6bb614a53e686b483e5b is first bad commit
diff-tree 2203d6ed448ff3b777ee6bb614a53e686b483e5b (from 2656c076e31a3ce3ab2a987a578e7122dc2af51d)
Author: Linus Torvalds <torvalds@g5.osdl.org>
Date:   Fri Nov 18 07:29:51 2005 -0800

    Fix ACPI processor power block initialization

    Properly clear the memory, and set "pr->flags.power" only if a C2 or
    deeper state is valid (to make the code match both the comment and
    previous behaviour).

    This fixes a boot-time lockup reported by Maneesh Soni when using
    "maxcpus=1".

    Acked-by: Maneesh Soni <maneesh@in.ibm.com>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

:040000 040000 52be621b960ae192b36acf778c966d78ff5edbe2 04c183ce141dab8cdff049c1dae379104b637ed4 M drivers

------------------------ drivers/acpi/processor_idle.c ------------------------
index 573b6a9..70d8a6e 100644
@@ -514,8 +514,6 @@ static int acpi_processor_set_power_poli

 static int acpi_processor_get_power_info_fadt(struct acpi_processor *pr)
 {
-        int i;
-
         ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_fadt");

         if (!pr)
@@ -524,8 +522,7 @@ static int acpi_processor_get_power_info
         if (!pr->pblk)
                 return_VALUE(-ENODEV);

-        for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
-                memset(pr->power.states, 0, sizeof(struct acpi_processor_cx));
+        memset(pr->power.states, 0, sizeof(pr->power.states));

         /* if info is obtained from pblk/fadt, type equals state */
         pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
@@ -555,13 +552,9 @@ static int acpi_processor_get_power_info

 static int acpi_processor_get_power_info_default_c1(struct acpi_processor *pr)
 {
-        int i;
-
         ACPI_FUNCTION_TRACE("acpi_processor_get_power_info_default_c1");

-        for (i = 0; i < ACPI_PROCESSOR_MAX_POWER; i++)
-                memset(&(pr->power.states[i]), 0,
-                       sizeof(struct acpi_processor_cx));
+        memset(pr->power.states, 0, sizeof(pr->power.states));

         /* if info is obtained from pblk/fadt, type equals state */
         pr->power.states[ACPI_STATE_C1].type = ACPI_STATE_C1;
@@ -873,7 +866,8 @@ static int acpi_processor_get_power_info
         for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
                 if (pr->power.states[i].valid) {
                         pr->power.count = i;
-                        pr->flags.power = 1;
+                        if (pr->power.states[i].type >= ACPI_STATE_C2)
+                                pr->flags.power = 1;
                 }
         }
--------8<--------

The last change in the diff, in acpi_processor_get_power_info, is what makes the difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)" restores the power saving in 2.6.16.14.

I know nothing about kernel internals so the following may not be well informed. I'm inclined to think that the revision is not blameworthy in itself, but rather exposes an underlying problem. The comment above this bit of code says, 'if one state of type C2 or C3 is available, mark this CPU as being "idle manageable"'. As the commit log says, the revision makes the code match the comment.

What happens in my case is revealed after enabling some debug statements:

--------8<--------
acpi_processor-0926 [07] processor_get_power_in: ----Entry
acpi_processor-0621 [08] processor_get_power_in: ----Entry
acpi_processor-0634 [08] processor_get_power_in: ----Exit- 0000000000000000
acpi_processor-0646 [08] processor_get_power_in: ----Entry
acpi_processor-0660 [08] processor_get_power_in: No _CST, giving up
acpi_processor-0661 [08] processor_get_power_in: ----Exit- FFFFFFFFFFFFFFED
acpi_processor-0582 [08] processor_get_power_in: ----Entry
acpi_processor-0614 [08] processor_get_power_in: lvl2[0x00004014] lvl3[0x00004015]
acpi_processor-0616 [08] processor_get_power_in: ----Exit- 0000000000000000
acpi_processor-0774 [08] processor_power_verify: ----Entry
acpi_processor-0785 [08] processor_power_verify: latency too large [101]
acpi_processor-0786 [08] processor_power_verify: ----Exit-
acpi_processor-0804 [08] processor_power_verify: ----Entry
acpi_processor-0815 [08] processor_power_verify: latency too large [1001]
acpi_processor-0816 [08] processor_power_verify: ----Exit-
acpi_processor-0511 [08] processor_set_power_po: ----Entry
acpi_processor-0577 [08] processor_set_power_po: ----Exit- 0000000000000000
acpi_processor-0963 [07] processor_get_power_in: ----Exit- 0000000000000000
ACPI: CPU0 (power states: C1[C1])
--------8<--------

So in acpi_processor_get_power_info, acpi_processor_get_power_info_cst fails and acpi_processor_get_power_info_fadt is called. The latter finds states C2 and C3 but assigns latencies which, suspiciously, are each 1 greater than the permissible maximum. Would these be dummy values that get inserted when genuine values could not be read?
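As a side note on the "Properly clear the memory" part of the commit: in the old acpi_processor_get_power_info_fadt() loop the index i is never used, so each pass appears to clear only the first array element. The userspace sketch below illustrates that reading; struct cx_state and MAX_POWER are cut-down stand-ins invented here, not the kernel's definitions.

```c
#include <assert.h>
#include <string.h>

#define MAX_POWER 8   /* stand-in for ACPI_PROCESSOR_MAX_POWER */

/* Cut-down stand-in for struct acpi_processor_cx. */
struct cx_state {
    int valid;
    int type;
    unsigned int latency;
};

/* Shape of the old loop: `i` never indexes the array, so states[0]
 * is cleared MAX_POWER times and the other entries are left alone. */
static void clear_states_old(struct cx_state *states)
{
    int i;

    for (i = 0; i < MAX_POWER; i++)
        memset(states, 0, sizeof(struct cx_state));
}

/* Shape of the replacement: a single memset over the whole array.
 * The byte count is passed in because the array has decayed to a
 * pointer by the time it reaches this function. */
static void clear_states_new(struct cx_state *states, size_t bytes)
{
    memset(states, 0, bytes);
}

/* Count the entries that are still marked valid. */
static int count_valid(const struct cx_state *states)
{
    int i, n = 0;

    assert(states != NULL);
    for (i = 0; i < MAX_POWER; i++)
        if (states[i].valid)
            n++;
    return n;
}
```

Starting from an array in which every entry is marked valid, the old form leaves MAX_POWER - 1 stale entries behind while the new form leaves none. The old loop in acpi_processor_get_power_info_default_c1() did index with i, so that function was only being tidied.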
Pretty good for a Luddite - thanks.

I've added Linus to cc, which means that this discussion should proceed via email, please. Just do a reply-to-all, make sure that bugme-daemon@bugzilla.kernel.org remains on cc.

bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=6519
>
> ------- Additional Comments From iamaluddite@yahoo.com 2006-05-15 11:01 -------
> Progress: git bisect identified the following revision:
> [...]
Excellent work, Dave! You finally got it. Now let's see what the ACPI folks think about it.
On Mon, 15 May 2006, Andrew Morton wrote:
> > @@ -873,7 +866,8 @@ static int acpi_processor_get_power_info
> >         for (i = 1; i < ACPI_PROCESSOR_MAX_POWER; i++) {
> >                 if (pr->power.states[i].valid) {
> >                         pr->power.count = i;
> > -                        pr->flags.power = 1;
> > +                        if (pr->power.states[i].type >= ACPI_STATE_C2)
> > +                                pr->flags.power = 1;
> >                 }
> >         }
> >
> > --------8<--------
> >
> > The last change in the diff, in acpi_processor_get_power_info, is what makes the
> > difference: commenting out the "if (pr->power.states[i].type >= ACPI_STATE_C2)"
> > restores the power saving in 2.6.16.14 .

That last change is also the thing that fixed the lock-up for Maneesh.

We had this particular discussion at some point earlier, and didn't get to any resolution. The ACPI people wanted to undo the thing, which didn't make a lot of sense because (a) the comment says otherwise and (b) C1 is available even without ACPI and (c) nobody ever explained why it locked up without the "type >= ACPI_STATE_C2" test.

Now, what happens is that your ACPI tables show that C2+ is unusable, which leaves only C1 usable (that, btw, you might be able to fix with different tables - possibly through a BIOS update).

However, at that point it's totally pointless to even _try_ to use ACPI CPU power management for this case, since ACPI can't do any better than the normal C1 stuff in the bog-standard non-ACPI x86 idle routine.

Do you actually see anything running hotter? Or is it just that the "CPU%d (power states:..)" message disappeared?

Please realize that we will _always_ use C1 (aka "halt") in the idle state quite regardless of ACPI - unless you've done "idle=poll" on the kernel command line. So the fact that we don't use ACPI for it shouldn't make us actually run any hotter (quite the reverse - we'll go into C1 state with _less_ work).

So I don't see the downside. Am I missing something?

Linus
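To make the effect of that last hunk concrete for the C1-only case, here is a minimal userspace sketch of the loop before and after the change. The struct layout, the STATE_* constants and the require_c2 switch are stand-ins invented for this illustration; only the loop body mirrors the quoted diff.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_POWER 8   /* stand-in for ACPI_PROCESSOR_MAX_POWER */

enum { STATE_C0, STATE_C1, STATE_C2, STATE_C3 };

struct cx_state {
    int valid;
    int type;
};

struct power_info {
    struct cx_state states[MAX_POWER];
    int count;        /* highest valid state index */
    int flag_power;   /* stand-in for pr->flags.power */
};

/*
 * require_c2 == 0 models the loop before the commit (any valid state
 * sets the flag); require_c2 != 0 models it after the commit (only a
 * valid state of type C2 or deeper sets the flag).
 */
static void mark_idle_manageable(struct power_info *p, int require_c2)
{
    int i;

    assert(p != NULL);
    p->count = 0;
    p->flag_power = 0;

    for (i = 1; i < MAX_POWER; i++) {
        if (p->states[i].valid) {
            p->count = i;
            if (!require_c2 || p->states[i].type >= STATE_C2)
                p->flag_power = 1;
        }
    }
}
```

With only C1 marked valid, which is what the debug trace in this report shows, the pre-commit form sets the flag and the post-commit form does not. That matches the observation that removing the type test brings the ACPI idle path back on this board.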
>> So in acpi_processor_get_power_info,
>> acpi_processor_get_power_info_cst fails and
>> acpi_processor_get_power_info_fadt is called. The latter
>> finds states C2 and C3 but assigns latencies which,
>> suspiciously, are each 1 greater than the permissible
>> maximum. Would these be dummy values that get inserted
>> when genuine values could not be read?

These values are inserted by the BIOS writer to indicate that the platform does not support these C-states. Generally, we've found that vendors assume the OS will use _CST and will ignore the FADT, and thus they don't bother putting anything interesting in the FADT -- as that path is never tested.
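As a sketch of that convention: the ACPI FADT definition treats a P_LVL2_LAT above 100 microseconds, or a P_LVL3_LAT above 1000 microseconds, as "state not supported", so a value 1 greater than each limit is the smallest way to say so. The macro and function names below are hypothetical, and the actual values in this board's FADT are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Worst-case latency limits from the FADT definition, in microseconds. */
#define MODEL_MAX_C2_LATENCY_US 100u
#define MODEL_MAX_C3_LATENCY_US 1000u

/* A C2 latency above the limit means "C2 not supported". */
bool fadt_c2_usable(uint16_t p_lvl2_lat)
{
    return p_lvl2_lat <= MODEL_MAX_C2_LATENCY_US;
}

/* A C3 latency above the limit means "C3 not supported". */
bool fadt_c3_usable(uint16_t p_lvl3_lat)
{
    return p_lvl3_lat <= MODEL_MAX_C3_LATENCY_US;
}
```

Under this reading, latencies of 101 and 1001 mark both states as unusable, which leaves C1 as the only valid state.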
--- Linus Torvalds <torvalds@osdl.org> wrote:
> Do you actually see anything running hotter?

Yes, that's how this problem manifests itself and the only means by which I became aware of it. In the working case, without "if (pr->power.states[i].type >= ACPI_STATE_C2)", my CPU idle temperature is around 36 degrees C. In the non-working case, with that "if" statement, the idle temp is around 51 degrees C. The CPU temp was my sole guide during the git bisect.

> Or is it just that the "CPU%d (power states:..)" message disappeared?

This message does disappear in the non-working case, coinciding with the raised CPU temp.

> Please realize that we will _always_ use C1 (aka "halt") in the idle state
> quite regardless of ACPI - unless you've done "idle=poll" on the kernel
> command line. So the fact that we don't use ACPI for it shouldn't make us
> actually run any hotter (quite the reverse - we'll go into C1 state with
> _less_ work).

Interesting. I'm not using "idle=poll". I should reiterate that I know nothing about kernel internals, but I'm wondering, given what you've said, if it's possible that there's another problem here, causing non-ACPI C1/halt not to work. Thus ACPI is the only mechanism providing power-saving; and was only doing so, prior to the "if (pr->power.states[i].type >= ACPI_STATE_C2)" revision, due to a happy accident whereby a valid C1 state in pr->power.states caused pr->flags.power to be set. Does that make any sense?

This theory seems consistent with the fact that the working (i.e. cool) Fedora kernel 2.6.14-1.1656_FC4 becomes non-working (i.e. hot) when booted with "acpi=off" (see Comment #7, http://bugzilla.kernel.org/show_bug.cgi?id=6519#c7 ).

Thanks for your comments and insight.

Dave
On Mon, 15 May 2006, Dave Jenkins wrote:
>
> --- Linus Torvalds <torvalds@osdl.org> wrote:
> > Do you actually see anything running hotter?
>
> Yes, that's how this problem manifests itself and the only means by
> which I became aware of it. In the working case, without "if
> (pr->power.states[i].type >= ACPI_STATE_C2)", my CPU idle temperature
> is around 36 degrees C. In the non-working case, with that "if"
> statement, the idle temp is around 51 degrees C. The CPU temp was my
> sole guide during the git bisect.

Good job. That's certainly conclusive.

> > Please realize that we will _always_ use C1 (aka "halt") in the idle state
> > quite regardless of ACPI - unless you've done "idle=poll" on the kernel
> > command line. So the fact that we don't use ACPI for it shouldn't make us
> > actually run any hotter (quite the reverse - we'll go into C1 state with
> > _less_ work).
>
> Interesting. I'm not using "idle=poll". I should reiterate that I know
> nothing about kernel internals, but I'm wondering, given what you've
> said, if it's possible that there's another problem here, causing
> non-ACPI C1/halt not to work.

I'd also wonder if maybe there is something that causes ACPI to go into C2 even though the BIOS latency tables have apparently said that we shouldn't. Ie it may be that we have a totally unrelated bug that made ACPI actually go into a deeper powersaving mode than it should.

That could have explained the lockups that Maneesh saw with "maxcpus=1" (because C2 and above are disabled for SMP _anyway_, since they aren't valid there due to touching the _common_ northbridge rather than being per-core).

> Thus ACPI is the only mechanism providing
> power-saving; and was only doing so, prior to the "if
> (pr->power.states[i].type >= ACPI_STATE_C2)" revision, due to a happy
> accident whereby a valid C1 state in pr->power.states caused
> pr->flags.power to be set. Does that make any sense?

Actually, suddenly it does.
When I look more closely at your dmesg report, I find this:

    Checking 'hlt' instruction... disabled

ie your normal idle loop has literally _disabled_ the use of hlt.

Do you have "no-hlt" on the kernel command line? That _should_ be the only thing that disables hlt (and thus power-savings in idle).

Yup, you do (from that same dmesg):

    Kernel command line: ro root=/dev/VolGroup00/LogVol00 no-hlt 1

And yes, ACPI will ignore that "no-hlt" flag.

Now, the question is, why do you have no-hlt there? Was it some strange distro that set it for you? And why?

Linus
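A quick way to check a command line for such a leftover option is to look for it as a whole whitespace-separated word. The helper below is a minimal sketch; the function name is hypothetical, and it only inspects a string that the caller supplies (for example the contents of /proc/cmdline, or the "Kernel command line:" text from dmesg).

```c
#include <stdbool.h>
#include <string.h>

/* Return true if opt appears as a whole whitespace-separated word in cmdline. */
bool cmdline_has_option(const char *cmdline, const char *opt)
{
    size_t optlen = strlen(opt);
    const char *p = cmdline;

    while (*p) {
        /* Skip separators, then measure the next word. */
        p += strspn(p, " \t\n");
        size_t len = strcspn(p, " \t\n");

        /* Match only on a full word, so "no-hlt-foo" does not count. */
        if (len > 0 && len == optlen && strncmp(p, opt, len) == 0)
            return true;
        p += len;
    }
    return false;
}
```

Called with the command line quoted above and "no-hlt", it reports a match; with the same line minus that word, it does not.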
On Mon, 15 May 2006, Linus Torvalds wrote:
>
> Yup, you do (from that same dmesg):
>
>     Kernel command line: ro root=/dev/VolGroup00/LogVol00 no-hlt 1
>
> Now, the question is, why do you have no-hlt there? Was it some strange
> distro that set it for you? And why?

Btw, I think we can close this report, aside from the question about why that "no-hlt" was there in the first place. I bet the power usage will go back to where it was with ACPI without that thing.

The whole "no-hlt" thing exists purely for some _really_ old i486 class machines that would lock up with hlt for some unexplained reason.

Linus
Linus Torvalds <torvalds@osdl.org> wrote:
>
> Btw, I think we can close this report, aside from the question about why
> that "no-hlt" was there in the first place. I bet the power usage will go
> back to where it was with ACPI without that thing.

Thanks, Linus.

But.. Was Dave using no-hlt on earlier kernels? If so, why didn't they get hot as well?
On Mon, 15 May 2006, Andrew Morton wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > Btw, I think we can close this report, aside from the question about why
> > that "no-hlt" was there in the first place. I bet the power usage will go
> > back to where it was with ACPI without that thing.
>
> Thanks, Linus.
>
> But.. Was Dave using no-hlt on earlier kernels? If so, why didn't they
> get hot as well?

Exactly because ACPI _ignored_ that option, so it would use the broken ACPI C1 sleepstate. So ACPI just does:

    /*
     * Invoke C1.
     * Use the appropriate idle routine, the one that would
     * be used without acpi C-states.
     */
    if (pm_idle_save)
        pm_idle_save();
    else
        acpi_safe_halt();

where acpi_safe_halt() just does

    static void acpi_safe_halt(void)
    {
        clear_thread_flag(TIF_POLLING_NRFLAG);
        smp_mb__after_clear_bit();
        if (!need_resched())
            safe_halt();
        set_thread_flag(TIF_POLLING_NRFLAG);
    }

and here probably pm_idle_save was NULL. In contrast, the main CPU idle will do

    ...
    idle = pm_idle;
    if (!idle)
        idle = default_idle;
    if (cpu_is_offline(cpu))
        play_dead();
    __get_cpu_var(irq_stat).idle_timestamp = jiffies;
    idle();
    ...

ie if pm_idle is NULL (which is the ACPI "saved_pm_idle") it will use default_idle, which in turn does

    if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
        clear_thread_flag(TIF_POLLING_NRFLAG);
        smp_mb__after_clear_bit();
        while (!need_resched()) {
            local_irq_disable();
            if (!need_resched())
                safe_halt();
            else
                local_irq_enable();
        }
        set_thread_flag(TIF_POLLING_NRFLAG);
    } else {
        while (!need_resched())
            cpu_relax();
    }

ie it _honors_ that hlt_works_ok flag, unlike the ACPI one.

Now, I'm not saying that ACPI should honor the hlt_works_ok flag, because ACPI wouldn't _exist_ on the kind of old machines that needed it, but I think it explains why ACPI ended up running in a cooler C1 than the normal idle routine, which would just end up doing that endless loop of "cpu_relax()" (which is not a halt, but a special no-op).

Linus
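The two idle paths compared above can be condensed into a small decision model. This is a userspace sketch under stated assumptions, not kernel code: every name is hypothetical, the ACPI path is modelled for the case where pm_idle_save is NULL, and "no-hlt" is assumed to be what clears hlt_works_ok, as the dmesg line "Checking 'hlt' instruction... disabled" suggests.

```c
#include <stdbool.h>

enum idle_action { IDLE_HALT, IDLE_SPIN };

/* Models default_idle(): halt only when hlt is considered usable. */
enum idle_action model_default_idle(int hlt_counter, bool hlt_works_ok)
{
    return (!hlt_counter && hlt_works_ok) ? IDLE_HALT : IDLE_SPIN;
}

/* Models the ACPI C1 path with no saved idle routine: it halts
 * without consulting hlt_works_ok at all. */
enum idle_action model_acpi_c1_idle(void)
{
    return IDLE_HALT;
}

/* Picks the path the way the idle loop does: ACPI's handler if it
 * installed one, otherwise the default routine. */
enum idle_action model_idle(bool acpi_handles_idle, int hlt_counter,
                            bool hlt_works_ok)
{
    if (acpi_handles_idle)
        return model_acpi_c1_idle();
    return model_default_idle(hlt_counter, hlt_works_ok);
}
```

With hlt_works_ok false, the model halts only when ACPI handles idle, and spins otherwise. That matches the cool 2.6.14 behaviour and the hot 2.6.15+ behaviour reported in this bug.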
--- Linus Torvalds <torvalds@osdl.org> wrote:
> Now, the question is, why do you have no-hlt there? Was it some strange
> distro that set it for you? And why?

I must admit I'd forgotten about the no-hlt, sorry.

I added it to my grub.conf because I was getting occasional hangs on boot at "Checking 'hlt' instruction", perhaps 5-10% of the time. I first added it when running Fedora kernel 2.6.14-1.1656_FC4 . I added the no-hlt not knowing whether it would affect power saving or simply disable a potentially problematic test.

With that kernel, adding no-hlt did not noticeably affect CPU temperature. We now know, or suspect, this is because that kernel had a bug/feature whereby ACPI power saving is controversially enabled for hardware that reports only C1 as a valid power state. At the time, I interpreted the fact that the temperature did not increase as showing that the "no-hlt" had simply disabled the test, not the power-saving. I hope this is a forgivable mistake.

When I upgraded to subsequent Fedora kernels, the existing kernel parameters, including no-hlt, were evidently automatically copied for the new entries in grub.conf . And these new kernels (2.6.15-1.1830_FC4 onward) did not have the same ACPI bug/feature, so suddenly my CPU temperature shot up because I now did not have power saving from hlt or ACPI.

I have now confirmed that 2.6.16-1.2108_FC4 without no-hlt runs cool. Yes, I should have spotted and tested this before submitting the bug and I apologise for that.

This leaves, apart from some embarrassment on my part, the question why I was getting hangs at "Checking 'hlt' instruction" with 2.6.14-1.1656_FC4 . Wouldn't it be neat if this turned out to have the same cause as the lock-up reported by Maneesh that motivated the "if (pr->power.states[i].type >= ACPI_STATE_C2)" revision!

Dave
Closing bug as INVALID: removing no-hlt from the boot command line restored the power saving when idle.