Bug 56691
Summary: | Using Intel Pstates driver causes panics during high loads | ||
---|---|---|---|
Product: | Power Management | Reporter: | Mike Lothian (mike) |
Component: | cpufreq | Assignee: | Dirk Brandewie (dirk.brandewie) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | alan, dirk.brandewie, mahatma, rui.zhang, tianyu.lan, wmark |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.9-rc | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Panic Screenshots |
Description
Mike Lothian
2013-04-16 17:31:06 UTC
Created attachment 98881 [details]
Panic Screenshots
This is the same bug as reported by Redhat https://bugzilla.redhat.com/show_bug.cgi?id=952244 I took a long time to reproduce and catch it in the act and get debug info :-( The fix missed rc6 by a few hours but is in rc7. --Dirk Author: Dirk Brandewie <dirk.brandewie@gmail.com> 2013-04-04 10:35:35 Committer: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 2013-04-08 13:09:23 Parent: 05e99c8cf9d4e53ef6e016815db40a89a6156529 (intel-pstate: Use #defines instead of hard-coded values.) Branches: master, pstate, pstate.stgit, remotes/linux-pm/bleeding-edge, remotes/linux-pm/fixes, remotes/linux-pm/linux-next, remotes/next/akpm, remotes/next/akpm-base, remotes/next/master, remotes/next/stable, remotes/origin/master, test_next Follows: v3.9-rc6 Precedes: pm-3.9-rc7 cpufreq / intel_pstate: Set timer timeout correctly The current calculation of the delay time is wrong and a cut and paste error from a previous experimental driver. This can result in the timeout being set to jiffies + 1 which setup the driver to race with itself if the APIC timer interrupt happens at just the right time. References: https://bugzilla.redhat.com/show_bug.cgi?id=920289 Reported-by: Adam Williamson <awilliam@redhat.com> Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> ------------------------ drivers/cpufreq/intel_pstate.c ------------------------ index ad72922..6133ef5 100644 @@ -502,7 +502,6 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu) sample_time = cpu->pstate_policy->sample_rate_ms; delay = msecs_to_jiffies(sample_time); - delay -= jiffies % delay; mod_timer_pinned(&cpu->timer, jiffies + delay); } Sorry I didn't notice that commit I'll turn pstates back on and see if I have any issues Hi, Mike, is there any update? The patch from #2 as well as the one from RedHat (sorry, I don't have a link at hand) didn't work in my case. I get, using the latest version from GIT, panics. Xeon 1230, ProLiant ML110 G7. Confirm: still in 3.9.0 release. Getting fast and sure (seconds-minute): on PREEMPT, *PREEMPT_RCU (and some related), and starting ceph & 2 virtual machines with nice=-20. Once happened even on initrd. This is being caused by the driver timer function being called back to back (sub 1us) which makes the sample duration 0. I am still trying to figure out how this is happening. The following patch removes the idle time and duration tracking. commit 37c6fc9eea3bdf77a7e76f75696a29ca86037274 Author: Dirk Brandewie <dirk.j.brandewie@intel.com> Date: Tue Apr 30 09:12:20 2013 -0700 cpufreq/intel_pstate: remove idle time and duration from sample and calculations Idle time is taken into account in the APERF/MPERF ratio calculation there is no reason for the driver to track it seperately. This reduces the work in the driver and makes the code more readable. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 45 +++++++-------------------------------- 1 files changed, 8 insertions(+), 37 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 6b995dc..778dd72 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -48,12 +48,7 @@ static inline int32_t div_fp(int32_t x, int32_t y) } struct sample { - ktime_t start_time; - ktime_t end_time; int core_pct_busy; - int pstate_pct_busy; - u64 duration_us; - u64 idletime_us; u64 aperf; u64 mperf; int freq; @@ -91,8 +86,6 @@ struct cpudata { int min_pstate_count; int idle_mode; - ktime_t prev_sample; - u64 prev_idle_time_us; u64 prev_aperf; u64 prev_mperf; int sample_ptr; @@ -449,48 +442,26 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu, struct sample *sample) { u64 core_pct; - sample->pstate_pct_busy = 100 - div64_u64( - sample->idletime_us * 100, - sample->duration_us); core_pct = div64_u64(sample->aperf * 100, sample->mperf); sample->freq = cpu->pstate.max_pstate * core_pct * 1000; - sample->core_pct_busy = div_s64((sample->pstate_pct_busy * core_pct), - 100); + sample->core_pct_busy = core_pct; } static inline void intel_pstate_sample(struct cpudata *cpu) { - ktime_t now; - u64 idle_time_us; u64 aperf, mperf; - now = ktime_get(); - idle_time_us = get_cpu_idle_time_us(cpu->cpu, NULL); - rdmsrl(MSR_IA32_APERF, aperf); rdmsrl(MSR_IA32_MPERF, mperf); - /* for the first sample, don't actually record a sample, just - * set the baseline */ - if (cpu->prev_idle_time_us > 0) { - cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT; - cpu->samples[cpu->sample_ptr].start_time = cpu->prev_sample; - cpu->samples[cpu->sample_ptr].end_time = now; - cpu->samples[cpu->sample_ptr].duration_us = - ktime_us_delta(now, cpu->prev_sample); - cpu->samples[cpu->sample_ptr].idletime_us = - idle_time_us - cpu->prev_idle_time_us; - - cpu->samples[cpu->sample_ptr].aperf = aperf; - cpu->samples[cpu->sample_ptr].mperf = mperf; - cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf; - cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf; - - intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]); - } + cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT; + cpu->samples[cpu->sample_ptr].aperf = aperf; + cpu->samples[cpu->sample_ptr].mperf = mperf; + cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf; + cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf; + + intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]); - cpu->prev_sample = now; - cpu->prev_idle_time_us = idle_time_us; cpu->prev_aperf = aperf; cpu->prev_mperf = mperf; } OK, now 4.5 hours uptime with this patch (by hands) over intel_pstate.c from -next... Looks fixed. No, still unstable, but stuck silent and sometimes... PS But looking to roasted router PC and others - it can be bad star position. Before: • Intel Xeon E3-1230, microcode 0x28 • Linux 3.9.0-rc8 (with patches which are part of current Linux 3.11.6) Changes: • Intel has released new microcode for some CPUs: Intel Xeon {3400, 5500, 5600, 7500, E3-{1200, 1200V2, 1200V3}, E5-{2400,2600,4600}, E7} Series → see the »specification updates« After/Then: • microcode 0x29 • Linux 3.11.6 (no differences to intel_pstate.c from ›before‹) I have not been able to reproduce the panics since ›Then‹. I suspect this is caused by changes to APIC page access-related issues in the new microcode. |