Bug 7376
| Summary: | cpufreq misbehaves if C3 and LAPIC and SMP | | |
|---|---|---|---|
| Product: | Power Management | Reporter: | S.Caglar Onur (caglar) |
| Component: | cpufreq | Assignee: | Venkatesh Pallipadi (venki) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | normal | CC: | acpi-bugzilla, ismail, lenb, trenn |
| Priority: | P2 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.18, 2.6.18.1, 2.6.19, 2.6.20 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | acpidump, config, /proc/cpuinfo, dmesg, dmidecode, lsmod, ps, top, /proc/interrupts, /proc/interrupts, dmesg_nolapic, /proc/interrupts_nolapic, /proc/interrupts_2.6.16, /proc/interrupts_2.6.18_UP, dmesg.2.6.21_rc5, /proc/interrups_2.6.21_rc5, config.2.6.21_rc5 | | |
Description
S.Caglar Onur
2006-10-16 13:08:28 UTC
Created attachment 9269 [details]
acpidump
Created attachment 9270 [details]
config
Created attachment 9271 [details]
/proc/cpuinfo
Created attachment 9272 [details]
dmesg
Created attachment 9273 [details]
dmidecode
Created attachment 9274 [details]
lsmod
Created attachment 9275 [details]
ps
Created attachment 9276 [details]
top
Created attachment 9277 [details]
/proc/interrupts
cat /proc/interrupts; sleep 10; cat /proc/interrupts, taken while the governor is working
(with the unplug/plug AC adapter workaround)
Created attachment 9278 [details]
/proc/interrupts
cat /proc/interrupts; sleep 10; cat /proc/interrupts, taken while the governor is not working
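The two-snapshot comparison used for these attachments can be approximated with a small script. The following is a minimal Python sketch, not part of the original report. The function names (read_interrupts, parse_interrupts, interrupt_deltas) are hypothetical, and it assumes the usual /proc/interrupts layout: a header row of CPU names, then one row per interrupt source with a label, a colon, and one count per CPU.

```python
def read_interrupts(path="/proc/interrupts"):
    """Return the current contents of /proc/interrupts as text."""
    with open(path) as f:
        return f.read()


def parse_interrupts(text):
    """Return {label: [count_cpu0, count_cpu1, ...]} from /proc/interrupts text.

    Assumes the first line is the CPU header and every other line looks like
    "<label>: <count per CPU> <optional description>".
    """
    lines = text.strip().splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        per_cpu = []
        for token in rest.split():
            # Stop at the first non-numeric token (controller or device name)
            # or once one count per CPU has been collected.
            if len(per_cpu) == ncpu or not token.isdigit():
                break
            per_cpu.append(int(token))
        counts[label.strip()] = per_cpu
    return counts


def interrupt_deltas(before, after):
    """Per-CPU increase of each interrupt source between two snapshots."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return {
        label: [n - o for o, n in zip(old[label], new[label])]
        for label in new
        if label in old
    }
```

Taking one snapshot with read_interrupts(), waiting 10 seconds, taking a second one, and passing both to interrupt_deltas() gives the per-CPU counts for that interval. Comparing the "LOC" row against the "0" (timer) row is the check discussed later in this thread.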
I just found one more workaround: suspending to RAM and resuming solves the problem while the ac_adapter is plugged in :)

Can you try another workaround and check whether that helps.
#echo 1 > /sys/module/processor/parameters/max_cstate
And also, just to be doubly sure, can you look at the 'top' output when the system is totally idle in the not-working case. Do you see CPU idle time as 100%? We had some issue earlier when this idle statistics was going bad. I just want to rule out that happening here.
Thanks for all the detailed information by the way.. :-)

Re: /proc/interrupts
LOC interrupts are falling way behind IRQ0 timer.
How do things look if you boot with "nolapic"?
What do the /proc/interrupts look like on the working 2.6.16 configuration?

> Can you try another workaround and check whether that helps.
>
> #echo 1 > /sys/module/processor/parameters/max_cstate
As soon as I entered this, the ondemand governor started to work :)
> LOC interrupts are falling way behind IRQ0 timer.
> How do things look if you boot with "nolapic"?
I'll try
> What do the /proc/interrupts look like on the working 2.6.16 configuration?
Right now I don't have a vanilla 2.6.16, but I have the one Pardus officially uses, with some patches (none of them related to power management). Is that acceptable? If not, please say so and I will compile one and report back :)

Created attachment 9288 [details]
dmesg_nolapic
Created attachment 9289 [details]
/proc/interrupts_nolapic
> LOC interrupts are falling way behind IRQ0 timer.
> How do things look if you boot with "nolapic"?
Both governors are working with nolapic also
It is clear that the LAPIC timer is stopping in C3 and that is screwing up stats, which screws up cpufreq.

The mystery is why the working 2.6.16 vintage kernel didn't run into this.
Is it possible to boot that and see what /proc/interrupts says?

Created attachment 9294 [details]
/proc/interrupts_2.6.16
> It is clear that the LAPIC timer is stopping in C3
> and that is screwing up stats, which screws up cpufreq.
>
> The mystery is why the working 2.6.16 vintage kernel didn't run into this.
> Is it possible to boot that and see what /proc/interrupts says?
As a note, I found that although the UP-compiled 2.6.16 works fine, the SMP-compiled one does not.

Does UP compiled 2.6.18 work as well?

> Does UP compiled 2.6.18 work as well?
Will try
> Does UP compiled 2.6.18 work as well?
Sorry for long delay, UP compiled 2.6.18 works...
Created attachment 9321 [details]
/proc/interrupts_2.6.18_UP
Any progress on this? If there is anything to test/try etc., just ask please :)

Seems like SUSE also suffers from the same problem:
https://bugzilla.novell.com/show_bug.cgi?id=216205

If the UP kernel is working properly, it is luck, because LOC is ticking much slower here than the timer:
37665-36819 = 846 LOC interrupts
163487-153480 = 10007 timer interrupts

No. The reason it works in UP is because update_process_times() is called on an external timer interrupt on a UP kernel. And that external timer works without issues. But on an SMP kernel, update_process_times() is called in the local APIC timer. So, the short-term resolution is to use a UP kernel. For an SMP kernel, we cannot switch to updating process times in the external timer, due to the possibility of hotplug.
My feeling is, the cleanest solution to this is to have idle time micro-accounting and use that instead of these timers. But that is a bigger change to do right now.

But how does the timer behave correctly after unplugging/plugging the AC adapter?

That is because you don't have deep C-states once you do that, and the timer works correctly. The timer only stops working in a deep C-state. You can double check this by looking at the output of /proc/acpi/processor/CPU*/power

> My feeling is, cleanest solution to this is to have idle time
> micro-accounting
> and use that instead of these timers. But that is bigger change to do right
> now.
Has the right time for that come? :)
2.6.20 suffers from the same symptoms.

With 2.6.21_rc5 things start to work; not sure what fixed this or hid the real problem (NO_HZ?)

zangetsu ~ # cat /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table
   From  :    To
         :   1733000   1333000   1067000    800000
  1733000:         0        43        32       265
  1333000:         8         0         4        31
  1067000:        12         0         0        24
   800000:       319         0         0         0
zangetsu ~ # cat /sys/module/processor/parameters/max_cstate
8

will attach dmesg, /proc/interrupts and config

Created attachment 10948 [details]
dmesg.2.6.21_rc5
Created attachment 10949 [details]
/proc/interrups_2.6.21_rc5
Created attachment 10950 [details]
config.2.6.21_rc5
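The trans_table output quoted above is the evidence that frequency transitions are happening again. As a rough aid for reading such output, here is a minimal Python sketch, not part of the original report. The function names (parse_trans_table, total_transitions) are hypothetical. It assumes the multi-line layout the file normally has on disk: a "From : To" header line, a line of target frequencies after a colon, then one row per source frequency.

```python
def parse_trans_table(text):
    """Return {(from_khz, to_khz): count} from cpufreq stats/trans_table text.

    Assumed layout (an assumption, not taken from kernel sources here):
        From  :    To
              :   <to_1> <to_2> ...
      <from_1>:   <n_11> <n_12> ...
    """
    to_freqs = None
    table = {}
    for line in text.splitlines():
        left, sep, right = line.partition(":")
        if not sep:
            continue
        left = left.strip()
        cols = right.split()
        if to_freqs is None:
            # Skip "From : To"; the column header has an empty left side
            # and only numeric entries on the right.
            if not left and cols and all(c.isdigit() for c in cols):
                to_freqs = [int(c) for c in cols]
            continue
        if left.isdigit():
            for to_khz, count in zip(to_freqs, cols):
                table[(int(left), to_khz)] = int(count)
    return table


def total_transitions(table):
    """Total number of frequency changes recorded in the table."""
    return sum(table.values())
```

A total of zero over a period of varying load would suggest the governor is not switching frequencies, while non-zero counts spread over several rows, as in the output above, suggest it is.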
Yes. The problem is fixed along with the timer rework and NO_HZ patches from Thomas/Ingo. That was the reason I was being lazy about providing any band-aid patches before 2.6.20. This is now fixed in a generic way.

Can you please test this with and without NO_HZ configured, just to be sure that the problem is fixed in all cases.

Thanks

It also works without a problem with the nohz=off kernel parameter :), thanks! So I'm closing this as FIXED...