Bug 22272

Summary: High [extra timer interrupt] count in powertop since 2.6.36
Product: Other Reporter: Maciej Rutecki (maciej.rutecki)
Component: Other Assignee: other_other
Status: CLOSED INVALID    
Severity: normal CC: arjan, bgamari, florian, lotekware, maciej.rutecki, pomac, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.36 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16444    

Description Maciej Rutecki 2010-11-07 07:43:33 UTC
Subject    : High [extra timer interrupt] count in powertop since 2.6.36
Submitter  : Ian Kumlien <pomac@demius.net>
Date       : 2010-10-30 23:52
Message-ID : alpine.LNX.2.00.1010310148450.24290@twilight.pomac.com
References : http://marc.info/?l=linux-kernel&m=128848330304431&w=2

This entry is being used for tracking a regression from 2.6.35. Please don't
close it until the problem is fixed in the mainline.
Comment 1 Jordan Roi 2010-11-08 14:15:08 UTC
Looks like there is a patch here for high "extra timer interrupt" values in 2.6.36: http://lkml.org/lkml/2010/9/28/115
Comment 2 Ian Kumlien 2010-11-12 00:14:19 UTC
I missed the patch and will try that next. I just wanted to add that the timer count seems to increase with uptime - I upgraded to 2.6.37-rc1-git8 to test and it was fine at first, but right now it peaks at 300 extra timer interrupts, and that's after ~25 hours of uptime.
Comment 3 Ian Kumlien 2010-11-13 23:02:48 UTC
2.6.37-rc1-git8 + the patch - no success
Comment 4 Ian Kumlien 2010-11-13 23:32:34 UTC
I think this is related to:
Clocksource tsc unstable (delta = -25767719801 ns)
Switching to clocksource hpet

This is on an AMD CPU that has the constant_tsc flag. It seems to happen after 24 hours or so, and then the interrupts keep increasing.

So it looks like it's related to this:
http://www.gossamer-threads.com/lists/linux/kernel/1294035

(I don't know if there is a bug entry for it)
Comment 5 Ian Kumlien 2010-11-16 00:20:38 UTC
Without the patch, there is no loss of tsc clock (at least not yet) but the extra timer interrupt wakeups keep increasing...
Comment 6 Ian Kumlien 2010-11-20 15:33:21 UTC
It turns out that you shouldn't trust powertop.

I'm gonna clean up and submit this patch:

diff --git a/powertop.c b/powertop.c
index 74eb328..9b2ada7 100644
--- a/powertop.c
+++ b/powertop.c
@@ -241,6 +241,7 @@ static void do_proc_irq(void)
                return;
        while (!feof(file)) {
                char *c;
+               char *start;
                int nr = -1;
                uint64_t count = 0;
                int special = 0;
@@ -252,23 +253,17 @@ static void do_proc_irq(void)
                if (!c)
                        continue;
                /* deal with NMI and the like.. make up fake nrs */
-               if (line[0] != ' ' && (line[0] < '0' || line[0] > '9')) {
-                       if (strncmp(line,"NMI:", 4)==0)
-                               nr=20000;
-                       if (strncmp(line,"RES:", 4)==0)
-                               nr=20001;
-                       if (strncmp(line,"CAL:", 4)==0)
-                               nr=20002;
-                       if (strncmp(line,"TLB:", 4)==0)
-                               nr=20003;
-                       if (strncmp(line,"TRM:", 4)==0)
-                               nr=20004;
-                       if (strncmp(line,"THR:", 4)==0)
-                               nr=20005;
-                       if (strncmp(line,"SPU:", 4)==0)
-                               nr=20006;
+               start = line;
+               while (*start == ' ')
+                       start++;
+               if (isalpha(*start))
+               {
+#define MAKE4(ch0, ch1, ch2, ch3) (int)(ch0 | (ch1 << 8) | (ch2 << 16) | (ch3 << 24))
+                       nr = MAKE4(start[0],start[1],start[2],start[3]);
                        special = 1;
-               } else
+#undef MAKE4
+               }
+               else
                        nr = strtoull(line, NULL, 10);
 
                if (nr==-1)
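
The idea in the patch above is to stop matching a fixed list of labels and instead derive a fake IRQ number from the first four characters of any alphabetic label, after skipping leading spaces. A minimal standalone sketch of that idea follows. The helper names `label_key` and `special_irq_key` are hypothetical and are not part of powertop, and the use of an unsigned 32-bit key instead of the patch's `int nr` is an assumption of this sketch.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper (not powertop code): pack up to four label bytes,
 * e.g. "NMI:", into one 32-bit key, first byte in the low bits. */
static uint32_t label_key(const char *label)
{
    uint32_t key = 0;

    for (int i = 0; i < 4 && label[i] != '\0'; i++)
        key |= (uint32_t)(unsigned char)label[i] << (8 * i);
    return key;
}

/* Hypothetical helper: if the row starts with a letter once leading
 * spaces are skipped, store its key and return 1. Return 0 for rows
 * that do not start with a letter (numeric IRQs, blank lines). */
static int special_irq_key(const char *line, uint32_t *key)
{
    while (*line == ' ')
        line++;
    if (!isalpha((unsigned char)*line))
        return 0;
    *key = label_key(line);
    return 1;
}
```

With this approach "NMI:" and " NMI:" produce the same key, and a label the tool has never seen (such as "IWI:") still gets its own distinct key without a code change.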
Comment 7 Ian Kumlien 2010-11-20 23:49:20 UTC
If you want to track it:
http://www.bughost.org/pipermail/power/2010-November/002029.html

I think that this could be set to closed - I doubt it's an actual kernel issue beyond changing the format of /proc/interrupts.
Comment 8 Florian Mickler 2010-12-03 21:20:52 UTC
Thx for following up on this.
Comment 9 Florian Mickler 2010-12-04 10:00:01 UTC
Uhm.. btw, did you find when this change to /proc/interrupts was introduced? 
Chances are that this was an accidental change in the kernel ABI which should be reversed.
Comment 10 Ian Kumlien 2010-12-04 17:05:34 UTC
one machine:

cat /proc/interrupts 
           CPU0       CPU1       
...
NMI:          0          0   Non-maskable interrupts
LOC:   23508960   26173864   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:   39035305   37826125   Rescheduling interrupts
CAL:      76547      75363   Function call interrupts
TLB:     105737     107007   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        384        384   Machine check polls
ERR:          1
MIS:          0

Other machine:
cat /proc/interrupts 
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
....
 NMI:          0          0          0          0          0          0   Non-maskable interrupts
 LOC:   17167860   16612451    7879817    6802772    5827157    2049716   Local timer interrupts
 SPU:          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0   Performance monitoring interrupts
 IWI:          0          0          0          0          0          0   IRQ work interrupts
 RES:    7663728    5396717    3521158    2472727    1903596    3524534   Rescheduling interrupts
 CAL:      80233      71037      88573      95769     109727      71864   Function call interrupts
 TLB:     144910     149537     103547     102747     103824     105271   TLB shootdowns
 TRM:          0          0          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0          0          0   Threshold APIC interrupts
 MCE:          0          0          0          0          0          0   Machine check exceptions
 MCP:       1068       1068       1068       1068       1068       1068   Machine check polls
 ERR:          0
 MIS:          0

Same kernel.

But it's not only that - it's the check for the "labels" that is outdated.
Comment 11 Florian Mickler 2010-12-06 16:38:33 UTC
Can you also provide /proc/interrupts from a good kernel?
Comment 12 Ian Kumlien 2010-12-09 00:44:40 UTC
           CPU0       CPU1       
NMI:          0          0   Non-maskable interrupts
LOC:   83064988   44758271   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:     534211     599941   Rescheduling interrupts
CAL:     201314    1250582   Function call interrupts
TLB:    1519795    1547577   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:       4654       4654   Machine check polls
ERR:          1
MIS:          0

Would be better. MCP and ERR might be counted as something unexpected, but since that's over 16 days it won't affect powertop in the same way.

The space in the beginning of my six core machine is odd but beyond that it's a matter of adding support for the new labels in powertop.
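
A parser that finds the label by its colon does not depend on whether the row starts with a space. A minimal sketch, assuming the row layout shown in the listings above (label, colon, one counter per CPU, then an optional description); `sum_irq_row` is a hypothetical helper, not powertop's actual code:

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum the per-CPU counters on one /proc/interrupts
 * row. Returns 1 and stores the sum in *total if the row has a "label:"
 * prefix, or 0 for rows without a colon (such as the CPU header row).
 * Assumption: the description after the counters does not start with a
 * digit; such a token would be added to the sum by this sketch. */
static int sum_irq_row(const char *line, uint64_t *total)
{
    const char *p = strchr(line, ':');

    if (p == NULL)
        return 0;
    p++;                                /* step past the colon */
    *total = 0;
    for (;;) {
        char *end;

        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;                      /* description or end of row */
        *total += strtoull(p, &end, 10);
        p = end;
    }
    return 1;
}
```

Rows with a single counter, such as ERR and MIS, go through the same loop and simply yield that one value.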
Comment 13 Florian Mickler 2010-12-13 13:20:00 UTC
Btw, the commit that changed "PND" to "IWI" is 

commit e360adbe29241a0194e10e20595360dd7b98a2b3
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Thu Oct 14 14:01:34 2010 +0800

    irq_work: Add generic hardirq context callbacks
Comment 14 Florian Mickler 2010-12-13 13:30:33 UTC
Also, after having familiarized myself with this bug, I agree with your conclusion that this is just a case of buggy userspace.

I'm closing this as invalid.