Bug 7376 - cpufreq misbehaves if C3 and LAPIC and SMP
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: i386 Linux
Importance: P2 normal
Assignee: Venkatesh Pallipadi
Reported: 2006-10-16 13:08 UTC by S.Caglar Onur
Modified: 2011-07-30 05:22 UTC
Kernel Version: 2.6.18, 2.6.18.1, 2.6.19, 2.6.20
Regression: No


Attachments
acpidump (80.91 KB, application/octet-stream)
2006-10-16 13:09 UTC, S.Caglar Onur
config (58.54 KB, text/plain)
2006-10-16 13:09 UTC, S.Caglar Onur
/proc/cpuinfo (440 bytes, text/plain)
2006-10-16 13:10 UTC, S.Caglar Onur
dmesg (15.33 KB, text/plain)
2006-10-16 13:10 UTC, S.Caglar Onur
dmidecode (5.50 KB, text/plain)
2006-10-16 13:11 UTC, S.Caglar Onur
lsmod (1.41 KB, text/plain)
2006-10-16 13:11 UTC, S.Caglar Onur
ps (3.05 KB, text/plain)
2006-10-16 13:12 UTC, S.Caglar Onur
top (3.52 KB, text/plain)
2006-10-16 13:13 UTC, S.Caglar Onur
/proc/interrupts (1.13 KB, text/plain)
2006-10-16 13:14 UTC, S.Caglar Onur
/proc/interrupts (1.13 KB, text/plain)
2006-10-16 13:15 UTC, S.Caglar Onur
dmesg_nolapic (15.23 KB, text/plain)
2006-10-17 12:51 UTC, S.Caglar Onur
/proc/interrupts_nolapic (1.15 KB, text/plain)
2006-10-17 12:51 UTC, S.Caglar Onur
/proc/interrupts_2.6.16 (1.18 KB, text/plain)
2006-10-18 03:44 UTC, S.Caglar Onur
/proc/interrupts_2.6.18_UP (1.16 KB, text/plain)
2006-10-21 06:57 UTC, S.Caglar Onur
dmesg.2.6.21_rc5 (19.41 KB, text/plain)
2007-03-26 10:06 UTC, S.Caglar Onur
/proc/interrups_2.6.21_rc5 (1.44 KB, text/plain)
2007-03-26 10:06 UTC, S.Caglar Onur
config.2.6.21_rc5 (63.40 KB, text/plain)
2007-03-26 10:07 UTC, S.Caglar Onur

Description S.Caglar Onur 2006-10-16 13:08:28 UTC
Most recent kernel where this bug did not occur: 2.6.16
Distribution: Pardus Linux, Alpha 3
Hardware Environment: Sony Vaio VGN-FS215B Notebook
Software Environment:
Problem Description:

The ondemand and conservative governors cannot change the current CPU frequency
while the system is idle with the 2.6.18, 2.6.18.1, and 2.6.19-r1 kernels. 2.6.16
works fine without any problem.

As a workaround, unplugging and replugging the AC adapter magically solves the
problem, and the governors start to run as expected.

In order to be sure this problem is kernel related, all userspace software
capable of power management was stopped (hal, powersave, etc.).

acpidump, /proc/cpuinfo, dmesg output, dmidecode output, lsmod output, top, ps
output and CONFIG will follow this report as attachments.

The whole LKML thread can be found at
http://www.gossamer-threads.com/lists/linux/kernel/692970
Comment 1 S.Caglar Onur 2006-10-16 13:09:25 UTC
Created attachment 9269
acpidump
Comment 2 S.Caglar Onur 2006-10-16 13:09:59 UTC
Created attachment 9270
config
Comment 3 S.Caglar Onur 2006-10-16 13:10:25 UTC
Created attachment 9271
/proc/cpuinfo
Comment 4 S.Caglar Onur 2006-10-16 13:10:53 UTC
Created attachment 9272
dmesg
Comment 5 S.Caglar Onur 2006-10-16 13:11:23 UTC
Created attachment 9273
dmidecode
Comment 6 S.Caglar Onur 2006-10-16 13:11:56 UTC
Created attachment 9274
lsmod
Comment 7 S.Caglar Onur 2006-10-16 13:12:40 UTC
Created attachment 9275
ps
Comment 8 S.Caglar Onur 2006-10-16 13:13:26 UTC
Created attachment 9276
top
Comment 9 S.Caglar Onur 2006-10-16 13:14:42 UTC
Created attachment 9277
/proc/interrupts

Output of "cat /proc/interrupts; sleep 10; cat /proc/interrupts" while the governor is working
(with the unplug/plug AC adapter workaround)
Comment 10 S.Caglar Onur 2006-10-16 13:15:43 UTC
Created attachment 9278
/proc/interrupts

Output of "cat /proc/interrupts; sleep 10; cat /proc/interrupts" while the governors are not working
Comment 11 S.Caglar Onur 2006-10-16 15:31:02 UTC
I just found one more workaround: suspending to RAM and resuming solves the
problem while the AC adapter is plugged in :)
Comment 12 Venkatesh Pallipadi 2006-10-16 19:58:32 UTC
Can you try another workaround and check whether it helps?

# echo 1 > /sys/module/processor/parameters/max_cstate
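
For anyone scripting this workaround, a minimal Python sketch of the same write follows. The sysfs path is the one from the command above; the function name `set_max_cstate` and its `path` parameter are hypothetical helpers introduced here for illustration, and writing the real file needs root.

```python
from pathlib import Path

# Path taken from the command above; it exists only when the ACPI
# processor driver exposes this module parameter.
MAX_CSTATE = "/sys/module/processor/parameters/max_cstate"


def set_max_cstate(value, path=MAX_CSTATE):
    """Write a new max_cstate limit and return the value read back.

    set_max_cstate and path are illustrative names, not a kernel API.
    """
    if value < 0:
        raise ValueError("max_cstate cannot be negative")
    param = Path(path)
    param.write_text(f"{value}\n")
    return int(param.read_text().strip())
```

Calling `set_max_cstate(1)` as root is meant to be equivalent to the echo command; the `path` argument only exists so the helper can be tried against an ordinary file first.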
Comment 13 Venkatesh Pallipadi 2006-10-16 20:08:31 UTC
Also, just to be doubly sure, can you look at the 'top' output when the system
is totally idle in the non-working case? Do you see CPU idle time as 100%? We
had an issue earlier where these idle statistics were going bad. I just want to
rule out that happening here.

Thanks for all the detailed information, by the way :-)
Comment 14 Len Brown 2006-10-16 21:54:46 UTC
Re: /proc/interrupts 
 
LOC interrupts are falling way behind IRQ0 timer. 
How do things look if you boot with "nolapic"? 
 
What do the /proc/interrupts look like on the working 2.6.16 configuration? 
 
Comment 15 S.Caglar Onur 2006-10-17 03:11:03 UTC
> Can you try another workaround and check whether that helps. 
>
> #echo 1 > /sys/module/processor/parameters/max_cstate

As soon as I entered this, the ondemand governor started to work :)
Comment 16 S.Caglar Onur 2006-10-17 03:14:01 UTC
> LOC interrupts are falling way behind IRQ0 timer. 
> How do things look if you boot with "nolapic"? 

I'll try

> What do the /proc/interrupts look like on the working 2.6.16 configuration? 

Right now I don't have a vanilla 2.6.16, but I have the one Pardus officially
uses, with some patches (none of them related to power management). Is that
acceptable? If not, please say so and I will compile one and report back :)
Comment 17 S.Caglar Onur 2006-10-17 12:51:03 UTC
Created attachment 9288
dmesg_nolapic
Comment 18 S.Caglar Onur 2006-10-17 12:51:43 UTC
Created attachment 9289
/proc/interrupts_nolapic
Comment 19 S.Caglar Onur 2006-10-17 12:53:22 UTC
> LOC interrupts are falling way behind IRQ0 timer. 
> How do things look if you boot with "nolapic"? 

Both governors also work with nolapic
Comment 20 Len Brown 2006-10-18 00:28:29 UTC
It is clear that the LAPIC timer is stopping in C3 
and that is screwing up stats, which screws up cpufreq. 
 
The mystery is why the working 2.6.16 vintage kernel didn't run into this. 
Is it possible to boot that and see what /proc/interrupts says? 
 
Comment 21 S.Caglar Onur 2006-10-18 03:44:46 UTC
Created attachment 9294
/proc/interrupts_2.6.16
Comment 22 S.Caglar Onur 2006-10-18 03:46:07 UTC
> It is clear that the LAPIC timer is stopping in C3 
> and that is screwing up stats, which screws up cpufreq. 
> 
> The mystery is why the working 2.6.16 vintage kernel didn't run into this. 
> Is it possible to boot that and see what /proc/interrupts says? 

As a note, I found that although a UP-compiled 2.6.16 works fine, an SMP-compiled one does not
Comment 23 Venkatesh Pallipadi 2006-10-18 09:16:51 UTC
Does UP compiled 2.6.18 work as well?

Comment 24 S.Caglar Onur 2006-10-18 09:33:45 UTC
> Does UP compiled 2.6.18 work as well?

Will try
Comment 25 S.Caglar Onur 2006-10-21 06:57:16 UTC
> Does UP compiled 2.6.18 work as well?

Sorry for the long delay. UP-compiled 2.6.18 works...
Comment 26 S.Caglar Onur 2006-10-21 06:57:49 UTC
Created attachment 9321
/proc/interrupts_2.6.18_UP
Comment 27 S.Caglar Onur 2006-10-26 09:57:36 UTC
Any progress on this? If there is anything to test or try, please just ask :)
Comment 28 S.Caglar Onur 2006-11-03 16:27:59 UTC
It seems SUSE also suffers from the same problem:
https://bugzilla.novell.com/show_bug.cgi?id=216205
Comment 29 Len Brown 2006-11-06 23:35:33 UTC
If the UP kernel is working properly, it is luck, 
because LOC is ticking much slower here than the timer: 
 
37665-36819 = 846 LOC interrupts 
163487-153480 = 10007 timer interrupts 
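
The same subtraction can be done mechanically on any pair of /proc/interrupts snapshots. The sketch below is only an illustration: the four counts are the ones quoted above, but the surrounding snapshot layout (the `XT-PIC` column and the single `CPU0` header) and the helper names `irq_counts` and `irq_deltas` are assumptions made for the example.

```python
def irq_counts(snapshot):
    """Map each /proc/interrupts row label to its count summed over CPUs."""
    counts = {}
    for line in snapshot.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the header row ("CPU0 ...") has no colon
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break  # counts end where controller/device names begin
            total += int(field)
        counts[label.strip()] = total
    return counts


def irq_deltas(before, after):
    """Return per-row count differences between two snapshots."""
    first, second = irq_counts(before), irq_counts(after)
    return {k: second[k] - first[k] for k in first if k in second}


# Counts from the comment above; the row layout is illustrative.
BEFORE = """\
           CPU0
  0:     153480    XT-PIC  timer
LOC:      36819
"""
AFTER = """\
           CPU0
  0:     163487    XT-PIC  timer
LOC:      37665
"""

deltas = irq_deltas(BEFORE, AFTER)
ratio = deltas["LOC"] / deltas["0"]  # local timer ticks per external timer tick
print(deltas, round(ratio, 3))
```

A ratio far below 1 over the sampling interval indicates that the local APIC timer was not ticking for most of it.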
 
Comment 30 Venkatesh Pallipadi 2006-11-07 11:41:32 UTC
No. The reason it works on UP is that update_process_times() is called from the 
external timer interrupt on a UP kernel, and that external timer works without 
issues.

On an SMP kernel, however, update_process_times() is called from the local APIC 
timer interrupt.

So the short-term resolution is to use a UP kernel.
For the SMP kernel, we cannot switch to updating process times from the external 
timer, due to the possibility of hotplug.

My feeling is that the cleanest solution is to have idle-time micro-accounting 
and use that instead of these timers. But that is too big a change to make right 
now.
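
To make the UP/SMP difference concrete, here is a toy Python model of tick-based idle accounting. It is not kernel code. It assumes that wall-clock ticks always advance, that an idle tick is only counted when the accounting timer actually fires, and that the governor derives load as one minus idle ticks over elapsed ticks; the function name `observed_load` and the state labels are hypothetical.

```python
def observed_load(timeline, accounting_timer_stops_in_c3):
    """Return the load a tick-sampling governor would see in this toy model.

    timeline: one entry per wall-clock tick, each "busy", "idle" (shallow
    idle) or "c3" (deep idle).
    """
    elapsed = len(timeline)  # wall-clock ticks always advance
    idle_ticks = 0
    for state in timeline:
        if state == "busy":
            continue
        if state == "c3" and accounting_timer_stops_in_c3:
            continue  # the tick never fires, so this idle time goes uncounted
        idle_ticks += 1
    return 1.0 - idle_ticks / elapsed


mostly_idle = ["busy"] * 5 + ["c3"] * 95

up_load = observed_load(mostly_idle, accounting_timer_stops_in_c3=False)
smp_load = observed_load(mostly_idle, accounting_timer_stops_in_c3=True)
print(up_load, smp_load)
```

In this model the same 95%-idle timeline looks about 5% loaded when accounting runs from a timer that keeps ticking, and fully loaded when the accounting timer stops in C3. Replacing "c3" with "idle" in the timeline, as limiting max_cstate would, gives the correct figure in both cases.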
Comment 31 S.Caglar Onur 2006-11-11 06:04:07 UTC
But how does the timer behave correctly after unplugging and replugging the AC adapter?
Comment 32 Venkatesh Pallipadi 2006-11-11 06:45:55 UTC
That is because you don't have deep C-states once you do that, so the timer 
works correctly. The timer only stops working in a deep C-state.

You can double check this by looking at the output 
of /proc/acpi/processor/CPU*/power
Comment 33 S.Caglar Onur 2007-01-11 17:40:57 UTC
> My feeling is, cleanest solution to this is to have idle time
> micro-accounting 
> and use that instead of these timers. But that is bigger change to do right 
> now.

Has the right time come yet? :)
Comment 34 S.Caglar Onur 2007-02-12 19:26:27 UTC
2.6.20 shows the same symptoms
Comment 35 S.Caglar Onur 2007-03-26 10:05:20 UTC
With 2.6.21_rc5 things start to work. I am not sure what fixed this, or whether
something is hiding the real problem (NO_HZ?)

zangetsu ~ # cat /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table
   From  :    To
         :   1733000   1333000   1067000    800000
  1733000:         0        43        32       265
  1333000:         8         0         4        31
  1067000:        12         0         0        24
   800000:       319         0         0         0
zangetsu ~ # cat /sys/module/processor/parameters/max_cstate
8

will attach dmesg, /proc/interrupts and config
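
For comparing runs, the trans_table output above can be turned into numbers with a short Python sketch. The sample text is the table exactly as pasted in this comment; the function name `parse_trans_table` is a hypothetical helper, and the parser assumes the two header rows shown here.

```python
def parse_trans_table(text):
    """Parse cpufreq trans_table text into {from_khz: {to_khz: count}}."""
    rows = [line for line in text.splitlines() if ":" in line]
    # rows[0] is the "From : To" caption, rows[1] lists the target frequencies.
    targets = [int(f) for f in rows[1].split(":", 1)[1].split()]
    table = {}
    for line in rows[2:]:
        source, counts = line.split(":", 1)
        table[int(source)] = dict(zip(targets, map(int, counts.split())))
    return table


SAMPLE = """\
   From  :    To
         :   1733000   1333000   1067000    800000
  1733000:         0        43        32       265
  1333000:         8         0         4        31
  1067000:        12         0         0        24
   800000:       319         0         0         0
"""

table = parse_trans_table(SAMPLE)
total = sum(sum(row.values()) for row in table.values())
print(table[1733000][800000], total)
```

A non-zero and growing total is a quick sign that the governor is switching frequencies, which is what was missing on the affected kernels.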
Comment 36 S.Caglar Onur 2007-03-26 10:06:02 UTC
Created attachment 10948
dmesg.2.6.21_rc5
Comment 37 S.Caglar Onur 2007-03-26 10:06:39 UTC
Created attachment 10949
/proc/interrups_2.6.21_rc5
Comment 38 S.Caglar Onur 2007-03-26 10:07:03 UTC
Created attachment 10950
config.2.6.21_rc5
Comment 39 Venkatesh Pallipadi 2007-03-26 10:38:58 UTC
Yes. The problem is fixed by the timer rework and NO_HZ patches from 
Thomas/Ingo. That was the reason I was lazy about providing any band-aid 
patches before 2.6.20. This is now fixed in a generic way. Can you please test 
this with and without NO_HZ configured, just to be sure that the problem is 
fixed in all cases?

Thanks
Comment 40 S.Caglar Onur 2007-03-26 11:21:13 UTC
It also works without a problem with the nohz=off kernel parameter :) Thanks!
Comment 41 S.Caglar Onur 2007-03-26 23:18:12 UTC
So I'm closing this as FIXED...