On an HP zv5000 (P4/2.66GHz) laptop running 2.6.32, the CPU temperature creeps up continuously until it levels off at 55-57 C (the second fan keeps it from going higher). Booting with the "nolapic" parameter or reverting the commit below resolves the issue, and the CPU idle temperature returns to 34-35 C. (Note: the temperature increase is confirmed at the fan exhausts.)

Problem commit (determined through git bisection):
----------
commit 69d25870f20c4b2563304f2b79c5300dd60a067e
Author: Arjan van de Ven <arjan@infradead.org>
Date:   Mon Sep 21 17:04:08 2009 -0700

    cpuidle: fix the menu governor to boost IO performance
----------

The kernel is compiled with Local APIC and IO-APIC support for uniprocessors. The .config is based on one used for the 2.6.31.x branch without issues.

For now, I run 2.6.32 with 69d25870f20c4b2563304f2b79c5300dd60a067e reverted ("nolapic" creates IRQ problems on this hardware) and have noticed no problems resulting from the reversion.

Please let me know if I can provide any additional information that would be of assistance.

~Andy
On Sat, 5 Dec 2009 17:24:17 GMT bugzilla-daemon@bugzilla.kernel.org wrote:
> For now, I run 2.6.32 with 69d25870f20c4b2563304f2b79c5300dd60a067e
> reverted ("nolapic" creates IRQ problems on this HW) and have noticed
> no problems resulting from this reversion.
>
> Please let me know if I can provide any additional information that
> would be of assistance.

Can you run powertop -d and give us the result? It's a good first diagnostic...
Created attachment 24056: Powertop -d output on pristine 2.6.32 kernel (aka hot kernel)
Created attachment 24057: Powertop -d output on patched 2.6.32 kernel (aka cool kernel)
OK, so there is something very interesting here: for some reason, your system seems to exit the C2 state immediately after entering it. (C2 is only available because the BIOS announces its presence.) This is a hardware/BIOS bug, and a bad one at that. In the old code, for some reason, C2 is not used in practice. With the new code, the governor tries to use C2 repeatedly. I think the real solution is not to change the governor, but to add a quirk for your system so that Linux simply does not use C2 on it... Can you attach the output of "dmidecode"? We'll need it to make the quirk.
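To illustrate how a BIOS-announced but non-working C2 can end up being chosen over and over, here is a deliberately simplified sketch of how an idle governor might pick a state: the deepest state whose target residency fits the predicted idle time and whose exit latency is within a limit. The struct, its fields, and pick_state() are hypothetical; this is not the kernel's menu governor code, whose prediction logic is considerably more involved.

```c
#include <stddef.h>

/* Hypothetical, simplified description of one idle state. */
struct cstate {
    const char *name;
    unsigned int exit_latency_us;     /* time needed to wake up from the state */
    unsigned int target_residency_us; /* minimum idle time for the state to pay off */
};

/*
 * Return the index of the deepest state whose target residency fits the
 * predicted idle time and whose exit latency is within the limit.
 * States are assumed to be ordered shallow to deep, with n >= 1;
 * state 0 is always allowed.
 */
size_t pick_state(const struct cstate *states, size_t n,
                  unsigned int predicted_idle_us, unsigned int latency_limit_us)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++) {
        if (states[i].target_residency_us > predicted_idle_us)
            break; /* not expected to stay idle long enough */
        if (states[i].exit_latency_us > latency_limit_us)
            break; /* waking up would take too long */
        best = i;
    }
    return best;
}
```

In a model like this, nothing about the choice changes when C2 returns immediately: as long as the predicted idle period is long (for example, the next timer is far away), C2 is selected again on the very next idle entry, so the CPU cycles in and out of "idle" at a very high rate instead of sleeping.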
Created attachment 24075: dmidecode output

Many thanks for so quickly pinpointing a likely BIOS/HW problem and for your suggested quirk solution. I tried to view similar Cn residency stats using Intel's PowerInformer 1.2 on Windows but was met with "Add Pdh counter...failed...". Needless to say, I'm not too happy that I might have either broken hardware or a buggy BIOS. Can you think of any other parts of the kernel that could be negatively affected by a broken, yet announced, C2?

Attached is my dmidecode output for quirk creation.

~Andy
On Friday 08 January 2010, Andrew Watts wrote:
> All:
>
> Though there has been no activity on this bug since my last comment on
> 12/7/09, the issue remains unresolved. I have confirmed that the problem
> occurs only on Linux using the new menu idle governor code
> (69d25870f20c4b2563304f2b79c5300dd60a067e). The HW performs perfectly well
> either (a) on Linux with the old idle governor code or (b) under Windows XP.
>
> ~Andy
On Fri, 8 Jan 2010 23:29:55 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

The following patch should make this work:

acpi: Add the HP Pavilion zv5000 to the power DMI table

The HP Pavilion zv5000 is reported (see bug 14742) to not work well
in C2; in fact the system exits C2 immediately. This patch adds a DMI
entry for this system so that C2 is not used on this machine.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index d1676b1..97d2ee6 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -110,6 +110,10 @@ static struct dmi_system_id __cpuinitdata processor_power_dmi_table[] = {
 	  DMI_MATCH(DMI_BIOS_VENDOR,"Phoenix Technologies LTD"),
 	  DMI_MATCH(DMI_BIOS_VERSION,"SHE845M0.86C.0013.D.0302131307")},
 	 (void *)2},
+	{ set_max_cstate, "Pavilion zv5000", {
+	  DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"),
+	  DMI_MATCH(DMI_PRODUCT_NAME,"Pavilion zv5000 (DS502A#ABA)")},
+	 (void *)1},
 	{},
 };
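The quirk works by matching the machine's DMI strings against a table and, on a match, calling set_max_cstate with the limit stored in the entry ((void *)1 here, which limits the laptop to C1). The user-space sketch below mimics that idea; the types, the quirks[] table, and quirk_max_cstate() are hypothetical stand-ins, and matching by substring is an assumption made for the sketch rather than a description of the kernel's DMI code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the DMI strings the firmware reports. */
struct dmi_ident {
    const char *sys_vendor;
    const char *product_name;
};

/* One quirk: both strings must match before the C-state limit applies. */
struct cstate_quirk {
    const char *ident;        /* human-readable label only */
    const char *sys_vendor;
    const char *product_name;
    unsigned int max_cstate;  /* deepest C-state allowed on a match */
};

static const struct cstate_quirk quirks[] = {
    { "Pavilion zv5000", "Hewlett-Packard", "Pavilion zv5000 (DS502A#ABA)", 1 },
};

/* Substring match; a missing DMI string never matches. */
static int field_matches(const char *value, const char *want)
{
    return value != NULL && strstr(value, want) != NULL;
}

/* Return the C-state limit for this machine, or default_max if no quirk applies. */
unsigned int quirk_max_cstate(const struct dmi_ident *id, unsigned int default_max)
{
    for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
        if (field_matches(id->sys_vendor, quirks[i].sys_vendor) &&
            field_matches(id->product_name, quirks[i].product_name))
            return quirks[i].max_cstate;
    }
    return default_max;
}
```

Keying the entry on the full product string (including DS502A#ABA) keeps the limit confined to the one reported model instead of every machine from the vendor.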
[ A request for people who see a similar issue: Please file a separate bug for each different machine. So unless you have an HP Pavilion zv5000, you need a different bug.]
(In reply to comment #8)
> [ A request for people who see a similar issue: Please file a separate bug
> for each different machine. So unless you have an HP Pavilion zv5000, you
> need a different bug]

If this bug is for only one machine, shouldn't that be specified in the summary?
Confirmed: Arjan's patch applied to 2.6.32.3 limits the laptop to C1 and prevents the overheating seen with the new menu idle governor code (a similar result can be achieved using processor.max_cstate=1 or idle=halt). With any of these restrictions, powertop doesn't report detailed C-state statistics because they're no longer available. Why does the kernel no longer provide information on C0, polling, and C1?

Also, zv5000 represents the laptop family; the exact model is zv5030us (with the unique identifier DS502A#ABA used by Arjan). Arjan, should the first zv5000 in your DMI entry patch be zv5030us instead to reflect this? Sorry for not pointing this out from the start.

Finally, isn't it a little early to conclude this is a single-machine-type bug?

~Andy
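One way to see which idle states the kernel actually exposes, independent of powertop, is to read the cpuidle sysfs directory directly. The sketch below assumes the per-state layout /sys/devices/system/cpu/cpuN/cpuidle/stateM/ with name, usage and time files (time in microseconds); read_first_line() and dump_cpuidle_states() are hypothetical helpers. If no state directories are found, no cpuidle states are registered for that CPU, which is worth checking before blaming the reporting tool.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a small sysfs-style file into buf, without the newline.
   Returns 0 on success, -1 if the file is missing or empty. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print name, usage count and cumulative time for each idle state under
   cpu_dir/cpuidle. Returns the number of states found (0 means none exposed). */
int dump_cpuidle_states(const char *cpu_dir)
{
    char path[512], name[64], usage[32], time_us[32];
    int n;

    for (n = 0; ; n++) {
        snprintf(path, sizeof(path), "%s/cpuidle/state%d/name", cpu_dir, n);
        if (read_first_line(path, name, sizeof(name)) != 0)
            break; /* no more states */
        snprintf(path, sizeof(path), "%s/cpuidle/state%d/usage", cpu_dir, n);
        if (read_first_line(path, usage, sizeof(usage)) != 0)
            strcpy(usage, "?");
        snprintf(path, sizeof(path), "%s/cpuidle/state%d/time", cpu_dir, n);
        if (read_first_line(path, time_us, sizeof(time_us)) != 0)
            strcpy(time_us, "?");
        printf("state%d %-8s usage=%s time_us=%s\n", n, name, usage, time_us);
    }
    return n;
}
```

Usage: call dump_cpuidle_states("/sys/devices/system/cpu/cpu0") once on the unpatched kernel and once with the quirk (or processor.max_cstate=1) to compare what is registered.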
On 1/11/2010 14:48, bugzilla-daemon@bugzilla.kernel.org wrote:
> Also, zv5000 represents the laptop family and the exact model is zv5030us
> (with unique identifier DS502A#ABA as used by Arjan). Arjan, should the first
> zv5000 in your DMI entry patch be a zv5030us instead to reflect this? Sorry for
> not pointing this out from the start.

It's only a cosmetic thing...

>
> Finally, isn't it a little early to conclude this is a single-machine-type
> bug?

So far, with Fedora 12 already shipping this patch, there are 2 machines total. Having non-working C-states is actually rather rare in general... I don't expect many more machines. (And even then, this kind of table is the right thing to do: you really do not have C2, and you don't even want to pretend you have it, because there's a cost associated with that which normally gets offset by power savings, but not in your case.)
Ah, I see there was a report from an ASUSTek user this month (http://patchwork.kernel.org/patch/71962/). Is there a way for the kernel to still provide detailed C-state information after applying your patch, Arjan? Right now that's not the case.

Thanks,
~Andy
Handled-By: Arjan van de Ven <arjan@linux.intel.com>
Patch: http://bugzilla.kernel.org/show_bug.cgi?id=14742#c7
Fixed by commit 370d5cd88509b93b76eb2f5f97efbd71c25061cb.
Andy,

On Windows, can you run perfmon and see whether the CPU is able to get into C2 at all? (You have to add (+) the appropriate CPU counters.)

Also, with the DMI entry backed out so that the bug shows up again, can you bring up the system in single-user mode to see whether the problem appears when the network interfaces are not up? Bug 15377 sees a similar failure, but only after the network is probed.
Also, please paste the output of "cat /proc/cpuinfo" here.
Len, the perfmon information you requested is quite interesting, especially when compared to my powertop output. Hopefully you can shed some light here...

Regarding bug #15377, I compiled 2.6.33.2 with 370d5cd88509b93b76eb2f5f97efbd71c25061cb reverted, and the CPU temperature shoots up in single-user mode (powertop -d shows 180530.5 wakeups-from-idle per second). When I boot with init=/bin/bash, the CPU stays cool until I load processor.ko.

========================
Windows Perfmon

Counter                  A. Idling   B. Scanning for viruses
% C1 Time                    0.000    18.244
% C2 Time                   96.683    11.424
% C3 Time                    0.000     0.000
% Idle Time                 96.875    31.250
% Processor Time             3.125    68.750
C1 Transitions/sec           0.000   160.001
C2 Transitions/sec         138.003   160.001
C3 Transitions/sec           0.000     0.000
Interrupts/sec             176.003   478.004

==========================
/proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.66GHz
stepping        : 9
cpu MHz         : 2666.970
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips        : 5333.94
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 32 bits virtual
power management:
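To put these numbers side by side, a small helper can convert perfmon's "% C2 Time" and "C2 Transitions/sec" into an average time per C2 entry, and powertop's wakeup rate into a mean gap between wakeups. This is back-of-the-envelope arithmetic, not something perfmon or powertop reports directly, and the function names are hypothetical. Note that powertop's wakeups-from-idle counts every idle exit, not only exits from C2.

```c
/* Average time spent per entry into a state, in microseconds, given the
   percentage of wall time spent in the state and the entries per second. */
double avg_residency_us(double pct_time, double entries_per_sec)
{
    if (entries_per_sec <= 0.0)
        return 0.0; /* state never entered: no meaningful average */
    return (pct_time / 100.0) * 1e6 / entries_per_sec;
}

/* Mean gap between wakeups, in microseconds, for a given wakeup rate. */
double mean_wakeup_interval_us(double wakeups_per_sec)
{
    return wakeups_per_sec > 0.0 ? 1e6 / wakeups_per_sec : 0.0;
}
```

With the figures above, avg_residency_us(96.683, 138.003) is roughly 7006 us (about 7 ms per C2 entry while idling under Windows), avg_residency_us(11.424, 160.001) is roughly 714 us while scanning for viruses, and mean_wakeup_interval_us(180530.5) is roughly 5.5 us for the unpatched 2.6.33.2 kernel. So under Windows this CPU appears to stay in C2 for milliseconds at a time, while Linux leaves idle every few microseconds on average.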
*BUMP*