I've already had a conversation via email with Zhang about this The driver has been broken throughout the 3.9 rc series As the issue only manifests during high loads after long periods of time I've been able to bisect the issue I've taken screenshots of the panics Disabling the pstates driver stops the panics The first snapshots date back to the 7th March but the issue was noticed before this I update from Linus' tree nightly
Created attachment 98881 [details] Panic Screenshots
This is the same bug as reported by Redhat https://bugzilla.redhat.com/show_bug.cgi?id=952244 I took a long time to reproduce and catch it in the act and get debug info :-( The fix missed rc6 by a few hours but is in rc7. --Dirk Author: Dirk Brandewie <dirk.brandewie@gmail.com> 2013-04-04 10:35:35 Committer: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 2013-04-08 13:09:23 Parent: 05e99c8cf9d4e53ef6e016815db40a89a6156529 (intel-pstate: Use #defines instead of hard-coded values.) Branches: master, pstate, pstate.stgit, remotes/linux-pm/bleeding-edge, remotes/linux-pm/fixes, remotes/linux-pm/linux-next, remotes/next/akpm, remotes/next/akpm-base, remotes/next/master, remotes/next/stable, remotes/origin/master, test_next Follows: v3.9-rc6 Precedes: pm-3.9-rc7 cpufreq / intel_pstate: Set timer timeout correctly The current calculation of the delay time is wrong and a cut and paste error from a previous experimental driver. This can result in the timeout being set to jiffies + 1 which setup the driver to race with itself if the APIC timer interrupt happens at just the right time. References: https://bugzilla.redhat.com/show_bug.cgi?id=920289 Reported-by: Adam Williamson <awilliam@redhat.com> Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> ------------------------ drivers/cpufreq/intel_pstate.c ------------------------ index ad72922..6133ef5 100644 @@ -502,7 +502,6 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu) sample_time = cpu->pstate_policy->sample_rate_ms; delay = msecs_to_jiffies(sample_time); - delay -= jiffies % delay; mod_timer_pinned(&cpu->timer, jiffies + delay); }
Sorry I didn't notice that commit I'll turn pstates back on and see if I have any issues
Hi, Mike, is there any update?
The patch from #2 as well as the one from RedHat (sorry, I don't have a link at hand) didn't work in my case. I get, using the latest version from GIT, panics. Xeon 1230, ProLiant ML110 G7.
Confirm: still in 3.9.0 release. Getting fast and sure (seconds-minute): on PREEMPT, *PREEMPT_RCU (and some related), and starting ceph & 2 virtual machines with nice=-20. Once happened even on initrd.
This is being caused by the driver timer function being called back to back (sub 1us) which makes the sample duration 0. I am still trying to figure out how this is happening. The following patch removes the idle time and duration tracking. commit 37c6fc9eea3bdf77a7e76f75696a29ca86037274 Author: Dirk Brandewie <dirk.j.brandewie@intel.com> Date: Tue Apr 30 09:12:20 2013 -0700 cpufreq/intel_pstate: remove idle time and duration from sample and calculations Idle time is taken into account in the APERF/MPERF ratio calculation there is no reason for the driver to track it seperately. This reduces the work in the driver and makes the code more readable. Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> --- drivers/cpufreq/intel_pstate.c | 45 +++++++-------------------------------- 1 files changed, 8 insertions(+), 37 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 6b995dc..778dd72 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -48,12 +48,7 @@ static inline int32_t div_fp(int32_t x, int32_t y) } struct sample { - ktime_t start_time; - ktime_t end_time; int core_pct_busy; - int pstate_pct_busy; - u64 duration_us; - u64 idletime_us; u64 aperf; u64 mperf; int freq; @@ -91,8 +86,6 @@ struct cpudata { int min_pstate_count; int idle_mode; - ktime_t prev_sample; - u64 prev_idle_time_us; u64 prev_aperf; u64 prev_mperf; int sample_ptr; @@ -449,48 +442,26 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu, struct sample *sample) { u64 core_pct; - sample->pstate_pct_busy = 100 - div64_u64( - sample->idletime_us * 100, - sample->duration_us); core_pct = div64_u64(sample->aperf * 100, sample->mperf); sample->freq = cpu->pstate.max_pstate * core_pct * 1000; - sample->core_pct_busy = div_s64((sample->pstate_pct_busy * core_pct), - 100); + sample->core_pct_busy = core_pct; } static inline void intel_pstate_sample(struct cpudata *cpu) { - ktime_t now; - u64 idle_time_us; u64 aperf, mperf; - now = ktime_get(); - idle_time_us = get_cpu_idle_time_us(cpu->cpu, NULL); - rdmsrl(MSR_IA32_APERF, aperf); rdmsrl(MSR_IA32_MPERF, mperf); - /* for the first sample, don't actually record a sample, just - * set the baseline */ - if (cpu->prev_idle_time_us > 0) { - cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT; - cpu->samples[cpu->sample_ptr].start_time = cpu->prev_sample; - cpu->samples[cpu->sample_ptr].end_time = now; - cpu->samples[cpu->sample_ptr].duration_us = - ktime_us_delta(now, cpu->prev_sample); - cpu->samples[cpu->sample_ptr].idletime_us = - idle_time_us - cpu->prev_idle_time_us; - - cpu->samples[cpu->sample_ptr].aperf = aperf; - cpu->samples[cpu->sample_ptr].mperf = mperf; - cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf; - cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf; - - intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]); - } + cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT; + cpu->samples[cpu->sample_ptr].aperf = aperf; + cpu->samples[cpu->sample_ptr].mperf = mperf; + cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf; + cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf; + + intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]); - cpu->prev_sample = now; - cpu->prev_idle_time_us = idle_time_us; cpu->prev_aperf = aperf; cpu->prev_mperf = mperf; }
OK, now 4.5 hours uptime with this patch (by hands) over intel_pstate.c from -next... Looks fixed.
No, still unstable, but stuck silent and sometimes... PS But looking to roasted router PC and others - it can be bad star position.
Before: • Intel Xeon E3-1230, microcode 0x28 • Linux 3.9.0-rc8 (with patches which are part of current Linux 3.11.6) Changes: • Intel has released new microcode for some CPUs: Intel Xeon {3400, 5500, 5600, 7500, E3-{1200, 1200V2, 1200V3}, E5-{2400,2600,4600}, E7} Series → see the »specification updates« After/Then: • microcode 0x29 • Linux 3.11.6 (no differences to intel_pstate.c from ›before‹) I have not been able to reproduce the panics since ›Then‹. I suspect this is caused by changes to APIC page access-related issues in the new microcode.