Bug 11905 - lots of extra timer interrupts costing 2W
Summary: lots of extra timer interrupts costing 2W
Status: CLOSED CODE_FIX
Alias: None
Product: Timers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Thomas Gleixner
URL:
Keywords:
Depends on:
Blocks: 11808
  Show dependency tree
 
Reported: 2008-10-30 14:24 UTC by Rafael J. Wysocki
Modified: 2008-11-17 11:54 UTC (History)
3 users (show)

See Also:
Kernel Version: 2.6.28-rc2-git3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Rafael J. Wysocki 2008-10-30 14:24:50 UTC
Subject    : [REGRESSION 2.6.28-rc2-git3] lots of extra timer interrupts costing 2W
Submitter  : "Theodore Ts'o" <tytso@mit.edu>
Date       : 2008-10-30 2:18
References : http://marc.info/?l=linux-kernel&m=122533314305315&w=4

This entry is being used for tracking a regression from 2.6.27.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2008-10-31 15:11:51 UTC
Caused by:

commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Fri Oct 17 10:01:23 2008 +0200

    NOHZ: restart tick device from irq_enter()

    Reported-by: Elias Oltmanns <eo@nebensachen.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Elias Oltmanns <eo@nebensachen.de>

References: http://marc.info/?l=linux-kernel&m=122541849114444&w=4
Notify-Also : Venki Pallipadi <venkatesh.pallipadi@intel.com>
Comment 2 Rafael J. Wysocki 2008-11-02 13:37:11 UTC
On Sunday, 2 of November 2008, Theodore Tso wrote:
> On Sun, Nov 02, 2008 at 05:07:06PM +0100, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.27.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=11905
> > Subject             : lots of extra timer interrupts costing 2W
> > Submitter   : Theodore Ts'o <tytso@mit.edu>
> > Date                : 2008-10-30 2:18 (4 days old)
> > References  : http://marc.info/?l=linux-kernel&m=122533314305315&w=4
> >               http://marc.info/?l=linux-kernel&m=122541849114444&w=4
> 
> Yup, should still be listed.  
> 
> Is someone looking what the proper fix should be (other than reverting
> the commit in question)?  I've been running with the commit reverted
> on my laptop kernel for a couple of days now, with no problems.
Comment 3 Lukas Hejtmanek 2008-11-04 09:18:07 UTC
What is expected number of extra timer interrupts in idle desktop runnig firefox and gnome? I got about 180-250 interrupts per second which seems to be fairly high.
Comment 4 Linux ext4 mailing list 2008-11-04 12:05:28 UTC
With firefox running (and with a lot of tabs open), I currently have around 210-220 interrupts directly attributable to firefox, plus around 120-130 interrupts labelled as "extra timer interrupt", according to powertop.  If I kill firefox (and assuming that Lotus Notes, another major offender, as well gaim) isn't running, the number of "extra timer interrupts" numbers goes to zero and disappears, and the number of interrupts can be as low as 20 interrupts per second, total.
Comment 5 Rafael J. Wysocki 2008-11-09 09:59:10 UTC
Notify-Also : Lukas Hejtmanek <xhejtman@fi.muni.cz>
Comment 6 Rafael J. Wysocki 2008-11-09 11:16:56 UTC
First-Bad-Commit : fb02fbc14d17837b4b7b02dbb36142c16a7bf208
Comment 7 Łukasz Siudut 2008-11-12 00:50:57 UTC
I have the same problem. It's even worst: almost 1200 interrupts on idle with 2.6.28 (all rc's, compiled with SMP).
Comment 8 Andreas Mohr 2008-11-12 12:49:45 UTC
Same problem on -rc4 on AspireOne A110L Atom w/ HZ=250.

CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y

I would get exactly 250 "extra timer interrupt"s in powertop in case of an
entirely idle system, I guess (since it's currently ~ 230 plus 20 app wakeups).

Might possibly test it by reverting soon (I can already confirm that
u8.10 2.6.27-7 does NOT exhibit any significant number "extra timer
interrupts", though).
Comment 9 Rafael J. Wysocki 2008-11-16 09:24:12 UTC
Notify-Also : "Łukasz Siudut" <lsiudut@gmail.com>
Notify-Also : Andreas Mohr <andi@lisas.de>
Comment 10 Andreas Mohr 2008-11-16 14:22:25 UTC
I can confirm that with-rc5 as opposed to -rc4, extra timer interrupt flood issues on my machine are gone, thus if I was indeed experiencing the same bug, then it's fixed. Thanks!
Comment 11 Theodore Tso 2008-11-17 00:56:54 UTC
Hi, what's going on with this regression?  To refresh people's memory,
I tracked down the excess interrupts to a specific commit, fb02fbc1,
and I've been running quite happily with a patch to revert this
commit.

I'm not sure it's the right thing, but if no one has had any time to
figure out what's wrong with the original commit, can we please revert
it for 2.6.28?  I reported it over two weeks ago, and I've not heard
anything from about alternative ways of fixing this regression.

Thanks!!

						- Ted


commit c38a3882eeeb6dcb45c99f29c7b95595606eaf4d
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri Oct 31 12:37:31 2008 -0400

    Revert "NOHZ: restart tick device from irq_enter()"
    
    This reverts commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 to avoid
    a large number of excess interrupts which burns a significant amount 
    on a X61s laptop (and probably many more); others have reported this
    problem as well on the bugzilla entry:

    http://bugzilla.kernel.org/show_bug.cgi?id=11905
    
    Conflicts:
    
    	kernel/time/tick-sched.c

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f98a1b7..cb01cd8 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -384,19 +384,6 @@ int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
 }
 
 /*
- * Called from irq_enter() when idle was interrupted to reenable the
- * per cpu device.
- */
-void tick_check_oneshot_broadcast(int cpu)
-{
-	if (cpu_isset(cpu, tick_broadcast_oneshot_mask)) {
-		struct tick_device *td = &per_cpu(tick_cpu_device, cpu);
-
-		clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_ONESHOT);
-	}
-}
-
-/*
  * Handle oneshot mode broadcasting
  */
 static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index b1c05bf..4692487 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -36,7 +36,6 @@ extern void tick_broadcast_switch_to_oneshot(void);
 extern void tick_shutdown_broadcast_oneshot(unsigned int *cpup);
 extern int tick_resume_broadcast_oneshot(struct clock_event_device *bc);
 extern int tick_broadcast_oneshot_active(void);
-extern void tick_check_oneshot_broadcast(int cpu);
 # else /* BROADCAST */
 static inline void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 {
@@ -46,7 +45,6 @@ static inline void tick_broadcast_oneshot_control(unsigned long reason) { }
 static inline void tick_broadcast_switch_to_oneshot(void) { }
 static inline void tick_shutdown_broadcast_oneshot(unsigned int *cpup) { }
 static inline int tick_broadcast_oneshot_active(void) { return 0; }
-static inline void tick_check_oneshot_broadcast(int cpu) { }
 # endif /* !BROADCAST */
 
 #else /* !ONESHOT */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5bbb104..c8b3d8d 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -510,6 +510,10 @@ static void tick_nohz_handler(struct clock_event_device *dev)
 	update_process_times(user_mode(regs));
 	profile_tick(CPU_PROFILING);
 
+	/* Do not restart, when we are in the idle loop */
+	if (ts->tick_stopped)
+		return;
+
 	while (tick_nohz_reprogram(ts, now)) {
 		now = ktime_get();
 		tick_do_update_jiffies64(now);
@@ -555,37 +559,6 @@ static void tick_nohz_switch_to_nohz(void)
 	       smp_processor_id());
 }
 
-/*
- * When NOHZ is enabled and the tick is stopped, we need to kick the
- * tick timer from irq_enter() so that the jiffies update is kept
- * alive during long running softirqs. That's ugly as hell, but
- * correctness is key even if we need to fix the offending softirq in
- * the first place.
- *
- * Note, this is different to tick_nohz_restart. We just kick the
- * timer and do not touch the other magic bits which need to be done
- * when idle is left.
- */
-static void tick_nohz_kick_tick(int cpu)
-{
-	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t delta, now;
-
-	if (!ts->tick_stopped)
-		return;
-
-	/*
-	 * Do not touch the tick device, when the next expiry is either
-	 * already reached or less/equal than the tick period.
-	 */
-	now = ktime_get();
-	delta =	ktime_sub(hrtimer_get_expires(&ts->sched_timer), now);
-	if (delta.tv64 <= tick_period.tv64)
-		return;
-
-	tick_nohz_restart(ts, now);
-}
-
 #else
 
 static inline void tick_nohz_switch_to_nohz(void) { }
@@ -597,11 +570,9 @@ static inline void tick_nohz_switch_to_nohz(void) { }
  */
 void tick_check_idle(int cpu)
 {
-	tick_check_oneshot_broadcast(cpu);
 #ifdef CONFIG_NO_HZ
 	tick_nohz_stop_idle(cpu);
 	tick_nohz_update_jiffies();
-	tick_nohz_kick_tick(cpu);
 #endif
 }
 
@@ -658,6 +629,10 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
 		profile_tick(CPU_PROFILING);
 	}
 
+	/* Do not restart, when we are in the idle loop */
+	if (ts->tick_stopped)
+		return HRTIMER_NORESTART;
+
 	hrtimer_forward(timer, now, tick_period);
 
 	return HRTIMER_RESTART;
Comment 12 Frans Pop 2008-11-17 01:36:59 UTC
Theodore Tso wrote:
> Hi, what's going on with this regression?  To refresh people's memory,
> I tracked down the excess interrupts to a specific commit, fb02fbc1,
> and I've been running quite happily with a patch to revert this
> commit.

Isn't that solved by this commit (included in -rc5)?

commit ae99286b4f1be7788f2d6947c66a91dbd6351eec
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Nov 10 13:20:23 2008 +0100

    nohz: disable tick_nohz_kick_tick() for now
Comment 13 Theodore Tso 2008-11-17 11:54:53 UTC
Yep, so it is.  Closing the bugzilla entry.
Comment 14 Theodore Tso 2008-11-17 11:56:16 UTC
On Mon, Nov 17, 2008 at 10:36:48AM +0100, Frans Pop wrote:
> Theodore Tso wrote:
> > Hi, what's going on with this regression?  To refresh people's memory,
> > I tracked down the excess interrupts to a specific commit, fb02fbc1,
> > and I've been running quite happily with a patch to revert this
> > commit.
> 
> Isn't that solved by this commit (included in -rc5)?
> 
> commit ae99286b4f1be7788f2d6947c66a91dbd6351eec
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Mon Nov 10 13:20:23 2008 +0100
> 
>     nohz: disable tick_nohz_kick_tick() for now

Yep, so it is.  Sorry, I missed that; I haven't migrated to -rc5 yet.

I'll close the Bugzilla entry.

							- Ted

Note You need to log in before you can comment on or make changes to this bug.