Most recent kernel where this bug did not occur: 2.6.16
Distribution: Pardus Linux, Alpha 3
Hardware Environment: Sony Vaio VGN-FS215B Notebook
Software Environment:
Problem Description: The ondemand and conservative governors cannot change the current CPU frequency while the system is idle with the 2.6.18, 2.6.18.1 and 2.6.19-r1 kernels. 2.6.16 works fine without any problem. As a workaround, unplugging and replugging the AC adapter magically solves the problem, and these governors start to run as expected. To be sure this problem is kernel related, all userspace software capable of power management (hal, powersave, etc.) was stopped.
acpidump, /proc/cpuinfo, dmesg output, dmidecode output, lsmod output, top, ps output and CONFIG will follow this report as attachments.
The whole LKML thread can be found at http://www.gossamer-threads.com/lists/linux/kernel/692970
Created attachment 9269 [details] acpidump
Created attachment 9270 [details] config
Created attachment 9271 [details] /proc/cpuinfo
Created attachment 9272 [details] dmesg
Created attachment 9273 [details] dmidecode
Created attachment 9274 [details] lsmod
Created attachment 9275 [details] ps
Created attachment 9276 [details] top
Created attachment 9277 [details] /proc/interrupts cat /proc/interrupts; sleep 10; cat /proc/interrupts while governor working (with unplug/plug ac adapter workaround)
Created attachment 9278 [details] /proc/interrupts cat /proc/interrupts; sleep 10; cat /proc/interrupts while nothing working
I just found one more workaround: suspending to RAM and resuming solves the problem while the AC adapter is plugged in :)
Can you try another workaround and check whether that helps. #echo 1 > /sys/module/processor/parameters/max_cstate
And also, just to be doubly sure, can you look at the 'top' output when the system is totally idle in the non-working case? Do you see CPU idle time as 100%? We had an issue earlier where these idle statistics were going bad. I just want to rule out that happening here. Thanks for all the detailed information by the way.. :-)
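As a cross-check on what 'top' shows, the idle share can also be computed from two snapshots of /proc/stat. This is only a minimal sketch: the function name `idle_pct` and the snapshot file names are made up for illustration, and it assumes the usual aggregate `cpu` line layout (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# idle_pct BEFORE AFTER
# Prints the percentage of CPU time spent idle between two saved copies
# of /proc/stat. Uses only the aggregate "cpu" line and the first eight
# counters (user .. steal); the idle counter is the 4th of them ($5).
idle_pct() {
    awk '$1 == "cpu" {
             total = 0
             for (i = 2; i <= NF && i <= 9; i++) total += $i
             if (NR == FNR) { t0 = total; i0 = $5 }   # first file
             else           { t1 = total; i1 = $5 }   # second file
         }
         END { if (t1 > t0) printf "%.1f\n", 100 * (i1 - i0) / (t1 - t0) }' "$1" "$2"
}
```

To use it, one could save `cat /proc/stat` to a file, wait ten seconds or so on the idle machine, save it again to a second file, and pass both files to `idle_pct`. If the tick accounting is healthy, a truly idle system should come out close to 100.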
Re: /proc/interrupts LOC interrupts are falling way behind IRQ0 timer. How do things look if you boot with "nolapic"? What do the /proc/interrupts look like on the working 2.6.16 configuration?
> Can you try another workaround and check whether that helps.
>
> #echo 1 > /sys/module/processor/parameters/max_cstate
As soon as I entered this, the ondemand governor started to work :)
> LOC interrupts are falling way behind IRQ0 timer.
> How do things look if you boot with "nolapic"?
I'll try.
> What do the /proc/interrupts look like on the working 2.6.16 configuration?
Right now I don't have a vanilla 2.6.16, but I have the one Pardus officially uses, with some patches (none of them related to power management). Is that acceptable? If not, please say so and I will compile one and report back :)
Created attachment 9288 [details] dmesg_nolapic
Created attachment 9289 [details] /proc/interrupts_nolapic
> LOC interrupts are falling way behind IRQ0 timer.
> How do things look if you boot with "nolapic"?
Both governors also work with nolapic.
It is clear that the LAPIC timer is stopping in C3 and that is screwing up stats, which screws up cpufreq. The mystery is why the working 2.6.16 vintage kernel didn't run into this. Is it possible to boot that and see what /proc/interrupts says?
Created attachment 9294 [details] /proc/interrupts_2.6.16
> It is clear that the LAPIC timer is stopping in C3
> and that is screwing up stats, which screws up cpufreq.
>
> The mystery is why the working 2.6.16 vintage kernel didn't run into this.
> Is it possible to boot that and see what /proc/interrupts says?
As a note, I found that although the UP-compiled 2.6.16 works fine, the SMP-compiled one does not.
Does UP compiled 2.6.18 work as well?
> Does UP compiled 2.6.18 work as well?
Will try
> Does UP compiled 2.6.18 work as well?
Sorry for the long delay, UP compiled 2.6.18 works...
Created attachment 9321 [details] /proc/interrupts_2.6.18_UP
Any progress on this? If there is anything to test/try etc., please just ask :)
SUSE also seems to suffer from the same problem: https://bugzilla.novell.com/show_bug.cgi?id=216205
If the UP kernel is working properly, it is luck, because LOC is ticking much slower here than the timer:
37665-36819 = 846 LOC interrupts
163487-153480 = 10007 timer interrupts
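The same subtraction can be scripted against the two /proc/interrupts snapshots from the attachments. This is a rough sketch only: `irq_count` and `irq_delta` are hypothetical helper names, and it assumes each line starts with a label such as `0:` or `LOC:` followed by one count column per CPU.

```shell
# irq_count LABEL FILE
# Prints the total count for one interrupt line (e.g. "0" or "LOC") in a
# saved copy of /proc/interrupts, summing the per-CPU count columns.
irq_count() {
    awk -v label="$1:" '$1 == label {
            found = 1
            sum = 0
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
            print sum
        }
        END { if (!found) print 0 }' "$2"
}

# irq_delta LABEL BEFORE AFTER
# Prints how many interrupts of that kind arrived between the snapshots.
irq_delta() {
    echo $(( $(irq_count "$1" "$3") - $(irq_count "$1" "$2") ))
}
```

With the snapshots taken ten seconds apart, `irq_delta LOC before after` and `irq_delta 0 before after` should come out roughly equal on a healthy system. A LOC delta far below the IRQ0 delta is the symptom described above.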
No. The reason it works on UP is that update_process_times() is called from the external timer interrupt on a UP kernel, and that external timer works without issues. On an SMP kernel, update_process_times() is called from the local APIC timer. So the short-term resolution is to use a UP kernel. For an SMP kernel, we cannot switch to updating process times from the external timer, because of the possibility of hotplug. My feeling is that the cleanest solution is to have idle time micro-accounting and use that instead of these timers. But that is too big a change to do right now.
But how does the timer behave correctly after unplugging/plugging the AC adapter?
That is because you don't have deep C-states once you do that, so the timer works correctly. The timer only stops working in a deep C-state. You can double check this by looking at the output of /proc/acpi/processor/CPU*/power
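To make that check a little easier to read, the per-state usage counters can be pulled out of the power file. This is a sketch under an assumption: it expects state lines of the form `C3: type[C3] ... usage[00000000] ...`, which is how I remember the file looking on kernels of this vintage, so adjust the pattern if your output differs. `cstate_usage` is a made-up helper name.

```shell
# cstate_usage FILE
# Prints "Cn usage" pairs from a saved copy of
# /proc/acpi/processor/CPU*/power. The line format is an assumption:
# optional leading spaces or '*', then "Cn:", then a usage[...] field.
cstate_usage() {
    sed -n 's/^[ *]*\(C[0-9][0-9]*\):.*usage\[\([0-9]*\)].*/\1 \2/p' "$1"
}
```

Running it on copies saved before and after the unplug/plug workaround should show whether the C3 usage counter stops growing once the workaround is applied.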
> My feeling is, cleanest solution to this is to have idle time
> micro-accounting
> and use that instead of these timers. But that is bigger change to do right
> now.
Has the right time come? :)
2.6.20 shows the same symptoms.
With 2.6.21_rc5 things start to work. I'm not sure what fixed this, or what is hiding the real problem (NO_HZ?).
zangetsu ~ # cat /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table
   From  :    To
         :   1733000   1333000   1067000    800000
  1733000:         0        43        32       265
  1333000:         8         0         4        31
  1067000:        12         0         0        24
   800000:       319         0         0         0
zangetsu ~ # cat /sys/module/processor/parameters/max_cstate
8
I will attach dmesg, /proc/interrupts and config.
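For a quick yes/no on whether the governor is switching frequencies at all, the entries of trans_table can be added up. This is a small sketch: `count_transitions` is a hypothetical helper, and it assumes the table layout shown above, where each data row is a source frequency followed by a colon and one count per target frequency.

```shell
# count_transitions FILE
# Sums every entry of a saved cpufreq stats/trans_table. Only rows whose
# first field (before the colon) is a frequency are counted, so the two
# header lines are skipped.
count_transitions() {
    awk -F: '$1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
                 n = split($2, cols, " ")
                 for (i = 1; i <= n; i++) sum += cols[i]
             }
             END { print sum + 0 }' "$1"
}
```

A total that stays at 0 while load comes and goes would match the broken behaviour reported earlier in this thread. The result should also agree with the total_trans file in the same stats directory, where that file is present.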
Created attachment 10948 [details] dmesg.2.6.21_rc5
Created attachment 10949 [details] /proc/interrups_2.6.21_rc5
Created attachment 10950 [details] config.2.6.21_rc5
Yes. The problem is fixed along with the timer rework and NO_HZ patches from Thomas/Ingo. That was the reason I was being lazy about providing any band-aid patches before 2.6.20. This is now fixed in a generic way. Can you please test this with and without NO_HZ configured, just to be sure that the problem is fixed in all cases? Thanks
It also works without a problem with the nohz=off kernel parameter :), thanks!
So I'm closing this as FIXED...