Bug 56691 - Using Intel Pstates driver causes panics during high loads
Summary: Using Intel Pstates driver causes panics during high loads
Status: CLOSED CODE_FIX
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Dirk Brandewie
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-16 17:31 UTC by Mike Lothian
Modified: 2013-11-13 21:07 UTC (History)
6 users (show)

See Also:
Kernel Version: 3.9-rc
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Panic Screenshots (79 bytes, text/plain)
2013-04-16 17:34 UTC, Mike Lothian
Details

Description Mike Lothian 2013-04-16 17:31:06 UTC
I've already had a conversation via email with Zhang about this

The driver has been broken throughout the 3.9 rc series

As the issue only manifests during high loads after long periods of time I've been able to bisect the issue

I've taken screenshots of the panics

Disabling the pstates driver stops the panics

The first snapshots date back to the 7th March but the issue was noticed before this

I update from Linus' tree nightly
Comment 1 Mike Lothian 2013-04-16 17:34:59 UTC
Created attachment 98881 [details]
Panic Screenshots
Comment 2 Dirk Brandewie 2013-04-16 17:45:39 UTC
This is the same bug as reported by Redhat 
https://bugzilla.redhat.com/show_bug.cgi?id=952244

I took a long time to reproduce and catch it in the act and get debug info :-(

The fix missed rc6 by a few hours but is in rc7. 

--Dirk
Author: Dirk Brandewie <dirk.brandewie@gmail.com>  2013-04-04 10:35:35
Committer: Rafael J. Wysocki <rafael.j.wysocki@intel.com>  2013-04-08 13:09:23
Parent: 05e99c8cf9d4e53ef6e016815db40a89a6156529 (intel-pstate: Use #defines instead of hard-coded values.)
Branches: master, pstate, pstate.stgit, remotes/linux-pm/bleeding-edge, remotes/linux-pm/fixes, remotes/linux-pm/linux-next, remotes/next/akpm, remotes/next/akpm-base, remotes/next/master, remotes/next/stable, remotes/origin/master, test_next
Follows: v3.9-rc6
Precedes: pm-3.9-rc7

    cpufreq / intel_pstate: Set timer timeout correctly
    
    The current calculation of the delay time is wrong and a cut and
    paste error from a previous experimental driver.  This can result in
    the timeout being set to jiffies + 1 which setup the driver to race
    with itself if the APIC timer interrupt happens at just the right
    time.
    
    References: https://bugzilla.redhat.com/show_bug.cgi?id=920289
    Reported-by: Adam Williamson <awilliam@redhat.com>
    Reported-and-tested-by: Parag Warudkar <parag.lkml@gmail.com>
    Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

------------------------ drivers/cpufreq/intel_pstate.c ------------------------
index ad72922..6133ef5 100644
@@ -502,7 +502,6 @@ static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 
 	sample_time = cpu->pstate_policy->sample_rate_ms;
 	delay = msecs_to_jiffies(sample_time);
-	delay -= jiffies % delay;
 	mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
Comment 3 Mike Lothian 2013-04-16 18:03:20 UTC
Sorry I didn't notice that commit 

I'll turn pstates back on and see if I have any issues
Comment 4 Zhang Rui 2013-04-25 02:28:00 UTC
Hi, Mike,
is there any update?
Comment 5 Mark Kubacki 2013-05-04 22:11:47 UTC
The patch from #2 as well as the one from RedHat (sorry, I don't have a link at hand) didn't work in my case. I get, using the latest version from GIT, panics.

Xeon 1230, ProLiant ML110 G7.
Comment 6 Dzianis Kahanovich 2013-05-06 10:55:18 UTC
Confirm: still in 3.9.0 release. Getting fast and sure (seconds-minute): on PREEMPT, *PREEMPT_RCU (and some related), and starting ceph & 2 virtual machines with nice=-20. Once happened even on initrd.
Comment 7 Dirk Brandewie 2013-05-06 19:26:11 UTC
This is being caused by the driver timer function being called back to back (sub 1us) which makes the sample duration 0.  I am still trying to figure out how this is happening.  The following patch removes the idle time and duration tracking. 

commit 37c6fc9eea3bdf77a7e76f75696a29ca86037274
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date:   Tue Apr 30 09:12:20 2013 -0700

    cpufreq/intel_pstate: remove idle time and duration from sample and calculations
    
    Idle time is taken into account in the APERF/MPERF ratio calculation
    there is no reason for the driver to track it seperately.  This
    reduces the work in the driver and makes the code more readable.
    
    Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   45 +++++++--------------------------------
 1 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6b995dc..778dd72 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -48,12 +48,7 @@ static inline int32_t div_fp(int32_t x, int32_t y)
 }
 
 struct sample {
-	ktime_t start_time;
-	ktime_t end_time;
 	int core_pct_busy;
-	int pstate_pct_busy;
-	u64 duration_us;
-	u64 idletime_us;
 	u64 aperf;
 	u64 mperf;
 	int freq;
@@ -91,8 +86,6 @@ struct cpudata {
 	int min_pstate_count;
 	int idle_mode;
 
-	ktime_t prev_sample;
-	u64	prev_idle_time_us;
 	u64	prev_aperf;
 	u64	prev_mperf;
 	int	sample_ptr;
@@ -449,48 +442,26 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu,
 					struct sample *sample)
 {
 	u64 core_pct;
-	sample->pstate_pct_busy = 100 - div64_u64(
-					sample->idletime_us * 100,
-					sample->duration_us);
 	core_pct = div64_u64(sample->aperf * 100, sample->mperf);
 	sample->freq = cpu->pstate.max_pstate * core_pct * 1000;
 
-	sample->core_pct_busy = div_s64((sample->pstate_pct_busy * core_pct),
-					100);
+	sample->core_pct_busy = core_pct;
 }
 
 static inline void intel_pstate_sample(struct cpudata *cpu)
 {
-	ktime_t now;
-	u64 idle_time_us;
 	u64 aperf, mperf;
 
-	now = ktime_get();
-	idle_time_us = get_cpu_idle_time_us(cpu->cpu, NULL);
-
 	rdmsrl(MSR_IA32_APERF, aperf);
 	rdmsrl(MSR_IA32_MPERF, mperf);
-	/* for the first sample, don't actually record a sample, just
-	 * set the baseline */
-	if (cpu->prev_idle_time_us > 0) {
-		cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT;
-		cpu->samples[cpu->sample_ptr].start_time = cpu->prev_sample;
-		cpu->samples[cpu->sample_ptr].end_time = now;
-		cpu->samples[cpu->sample_ptr].duration_us =
-			ktime_us_delta(now, cpu->prev_sample);
-		cpu->samples[cpu->sample_ptr].idletime_us =
-			idle_time_us - cpu->prev_idle_time_us;
-
-		cpu->samples[cpu->sample_ptr].aperf = aperf;
-		cpu->samples[cpu->sample_ptr].mperf = mperf;
-		cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf;
-		cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf;
-
-		intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]);
-	}
+	cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT;
+	cpu->samples[cpu->sample_ptr].aperf = aperf;
+	cpu->samples[cpu->sample_ptr].mperf = mperf;
+	cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf;
+	cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf;
+
+	intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]);
 
-	cpu->prev_sample = now;
-	cpu->prev_idle_time_us = idle_time_us;
 	cpu->prev_aperf = aperf;
 	cpu->prev_mperf = mperf;
 }
Comment 8 Dzianis Kahanovich 2013-05-07 16:30:07 UTC
OK, now 4.5 hours uptime with this patch (by hands) over intel_pstate.c from -next... Looks fixed.
Comment 9 Dzianis Kahanovich 2013-05-08 11:12:47 UTC
No, still unstable, but stuck silent and sometimes...

PS But looking to roasted router PC and others - it can be bad star position.
Comment 10 Mark Kubacki 2013-10-27 18:37:52 UTC
Before: 
• Intel Xeon E3-1230, microcode 0x28
• Linux 3.9.0-rc8 (with patches which are part of current Linux 3.11.6)

Changes:
• Intel has released new microcode for some CPUs: Intel Xeon {3400, 5500, 5600, 7500, E3-{1200, 1200V2, 1200V3}, E5-{2400,2600,4600}, E7} Series
→ see the »specification updates«

After/Then: 
• microcode 0x29
• Linux 3.11.6 (no differences to intel_pstate.c from ›before‹)

I have not been able to reproduce the panics since ›Then‹. I suspect this is caused by changes to APIC page access-related issues in the new microcode.

Note You need to log in before you can comment on or make changes to this bug.