Bug 16147 - ksoftirq hogs the CPU
Summary: ksoftirq hogs the CPU
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Corey Minyard
URL:
Keywords:
Depends on:
Blocks: 15310
  Show dependency tree
 
Reported: 2010-06-07 15:17 UTC by Thomas Jarosch
Modified: 2010-07-09 22:28 UTC (History)
6 users (show)

See Also:
Kernel Version: 2.6.34
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
revert default behavior back to schedule_timeout_interruptible(1) (430 bytes, patch)
2010-06-11 17:14 UTC, Martin Wilck
Details | Diff

Description Thomas Jarosch 2010-06-07 15:17:20 UTC
Hello,

after upgrading from kernel 2.6.32 to kernel 2.6.34, I can see two ksoftirqd
tasks constantly eating around ~20% CPU on a HP Proliant DL320G6.

Some more information:
- Downgrading to 2.6.32 fixes it
- HP Proliant DL320G4 and HP Proliant DL320G5 are not affected,
  same is true for around eight other systems using different mainboards/CPUs.
- We've switched to CONFIG_NO_HZ (tickless) in the 2.6.32 -> 2.6.34 upgrade
- Ubuntu report of the same issue with 2.6.34:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575392

Here's the output from /proc/interrupts:
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         44          0          0     396123          0          0          0          0   IO-APIC-edge      timer
  1:          0          0          0          2          0          0          0          0   IO-APIC-edge      i8042
  8:          0          0          0        596          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          0          3          0          0          0          0   IO-APIC-edge      i8042
 20:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb1, ehci_hcd:usb4, uhci_hcd:usb6
 21:          0          0       7998          0          0          0          0          0   IO-APIC-fasteoi   ipmi_si
 22:          0         38          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb5, uhci_hcd:usb8, uhci_hcd:usb9
 23:          0          0         48          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb3, uhci_hcd:usb7
 55:      18743          0          0          0          0          0          0          0   PCI-MSI-edge      cciss0
 57:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-rx-0
 58:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1-tx-0
 59:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1
 60:          0          0          0          0          0       3627          0          0   PCI-MSI-edge      eth0
 61:          0          0          0          0          0          0         51          0   PCI-MSI-edge      eth2
NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
LOC:     143033     245518     231358      18221      69809      33569      48544      17431   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
PND:          0          0          0          0          0          0          0          0   Performance pending work
RES:        258        165         73         63        114        246        150         75   Rescheduling interrupts
CAL:         86        150        149        104        138        133        132        136   Function call interrupts
TLB:        113         62         76         62         82        124        127        130   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          5          5          5          5          5          5          5          5   Machine check polls
ERR:          7
MIS:          0

The counter for "IO-APIC-edge      timer" on CPU3 is constantly increasing
by around ~500-1000 per second. Is there any issue like this known?

Output from powertop:
Top causes for wakeups:
  37.0% (607.0)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 
  19.0% (312.3)            kipmi0 : schedule_timeout_interruptible (process_timeout) 
  15.3% (251.7)       <interrupt> : extra timer interrupt 
  15.2% (250.3)     <kernel core> : add_timer (smi_timeout) 
   5.4% ( 88.7)     <kernel core> : hrtimer_start (tick_sched_timer) 
   5.1% ( 83.7)             squid : sys_epoll_wait (process_timeout) 
   0.6% ( 10.0)     <kernel core> : ipmi_timeout (ipmi_timeout) 

I'll try 2.6.33.5 next as I have binary packages for it on file.

Thanks in advance,
Thomas
Comment 1 Thomas Jarosch 2010-06-07 15:51:44 UTC
Kernel 2.6.33.5 is also unaffected. The 2.6.33.5 kernel is also tickless, I'll try to bisect tomorrow.
Comment 2 Thomas Jarosch 2010-06-08 13:38:14 UTC
Here's the bisect result. Looks like it's not related to the timers at all:

[root@intranator linux-2.6]# git bisect good
ae74e823cb7d4cd476f623fce9a38f625f6c09a8 is the first bad commit
commit ae74e823cb7d4cd476f623fce9a38f625f6c09a8
Author: Martin Wilck <martin.wilck@ts.fujitsu.com>
Date:   Wed Mar 10 15:23:06 2010 -0800

    ipmi: add parameter to limit CPU usage in kipmid

    In some cases kipmid can use a lot of CPU.  This adds a way to tune the
    CPU used by kipmid to help in those cases.  By setting kipmid_max_busy_us
    to a value between 100 and 500, it is possible to bring down kipmid CPU
    load to practically 0 without loosing too much ipmi throughput
    performance.  Not setting the value, or setting the value to zero,
    operation is unaffected.

    Signed-off-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
    Signed-off-by: Corey Minyard <cminyard@mvista.com>
    Cc: Jean Delvare <jdelvare@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 dde007c89f875dd7553ea13b644da5148d2b9299 b036bc77d136e9cbb702b38dd9ab6cd2af361ea5 M      Documentation
:040000 040000 1ef18e58aca5e025d0ea0691b1823ff9efd06c54 d3025083ab5f79e10ba4651815dda8d86e021766 M      drivers


I've manually checked out this commit and the commit before it
just to verify I didn't mess up the bisect process.

powertop output without the patch:
Top causes for wakeups:
  39.3% (661.0)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 
  14.7% (246.7)     <kernel core> : add_timer (smi_timeout) 
  14.4% (241.7)            kipmi0 : schedule_timeout_interruptible (process_timeout) 
  11.9% (200.3)     <kernel core> : hrtimer_start (tick_sched_timer) 
   9.4% (157.7)       <interrupt> : extra timer interrupt 
   5.0% ( 83.7)             squid : sys_epoll_wait (process_timeout) 
   2.6% ( 44.0)       <interrupt> : cciss0 
   0.6% ( 10.0)     <kernel core> : ipmi_timeout (ipmi_timeout)

powertop with the ipmi patch:
Top causes for wakeups:
  43.7% (858.7)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 
  25.3% (498.0)            kipmi0 : schedule_timeout_interruptible (process_timeout) 
  12.7% (250.0)     <kernel core> : add_timer (smi_timeout) 
   6.7% (132.0)     <kernel core> : hrtimer_start (tick_sched_timer) 
   4.2% ( 83.0)             squid : sys_epoll_wait (process_timeout) 
   2.9% ( 56.7)       <interrupt> : extra timer interrupt 
   2.1% ( 41.7)       <interrupt> : cciss0 
   0.5% ( 10.0)     <kernel core> : ipmi_timeout (ipmi_timeout)

I can sometimes even see 800 wake ups from "kipmi0" with the patch.

The sysfs text mentions it hogs the CPU by default. Shouldn't this 
be the other way around? I don't use any impitool userspace program at all.

According to the ubuntu bug report
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575392),
this also happens on a MacBook box, I guess mobile users are
not pleased by the shortened battery life ;)

Cheers,
Thomas
Comment 3 Andrew Morton 2010-06-10 22:29:38 UTC
On Mon, 7 Jun 2010 15:17:23 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16147

(ae74e823cb7d4cd476f623fce9a38f625f6c09a8 "ipmi: add parameter to limit
CPU usage in kipmid" causes ksoftirq meltdown).

OK, this is tricky.  It's a 2.6.33 -> 2.6.34 regression caused by an
IPMI change.  The patch doesn't revert easily from mainline because
it's somewhat tangled up with Matthew's
3326f4f2276791561af1fd5f2020be0186459813 ("ipmi: reduce polling").

For -stable we could just revert
ae74e823cb7d4cd476f623fce9a38f625f6c09a8 as Thomas has done.  For
mainline we'd need to disentangle the revert from Matthew's change, or
revert both.  Or, of course, just fix the bug.

Corey, do you have the time to nail this issue for us?

Thanks.
Comment 4 Corey Minyard 2010-06-11 02:14:37 UTC
I'll work on this tomorrow morning.
Comment 5 Martin Wilck 2010-06-11 08:30:09 UTC
It must be this part of the changeset.

                        schedule();
                else
-                       schedule_timeout_interruptible(1);
+                       schedule_timeout_interruptible(0);
        }
        return 0;

Looks as if we finally found out why the old code had schedule_timeout_interruptible(1)...
Comment 6 Rafael J. Wysocki 2010-06-11 15:34:27 UTC
First-Bad-Commit : ae74e823cb7d4cd476f623fce9a38f625f6c09a8
Comment 7 Corey Minyard 2010-06-11 16:42:20 UTC
This is working fine on my system, with or without the change Martin suggests.

Thomas, you can try Martin's suggestion?  Looking at the patch, that's the only thing that should really change any behavior by default.  Due to some other changes, it will look like:

Index: linux-2.6/drivers/char/ipmi/ipmi_si_intf.c
===================================================================
--- linux-2.6.orig/drivers/char/ipmi/ipmi_si_intf.c
+++ linux-2.6/drivers/char/ipmi/ipmi_si_intf.c
@@ -1018,7 +1018,7 @@ static int ipmi_thread(void *data)
 		else if (smi_result == SI_SM_IDLE)
 			schedule_timeout_interruptible(100);
 		else
-			schedule_timeout_interruptible(0);
+			schedule_timeout_interruptible(1);
 	}
 	return 0;
 }
Comment 8 Martin Wilck 2010-06-11 17:14:48 UTC
Created attachment 26732 [details]
revert default behavior back to schedule_timeout_interruptible(1)
Comment 9 Thomas Jarosch 2010-06-12 09:11:34 UTC
Patch works like a charm!

Powertop output with the patch applied:
Top causes for wakeups:
  57.1% (783.0)     <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 
  18.2% (250.0)            kipmi0 : schedule_timeout_interruptible (process_timeout) 
   9.1% (125.0)     <kernel core> : add_timer (smi_timeout) 
   6.1% ( 83.3)             squid : sys_epoll_wait (process_timeout) 
   5.7% ( 78.3)       <interrupt> : extra timer interrupt 
   0.7% ( 10.0)     <kernel core> : ipmi_timeout (ipmi_timeout) 

The box is idle again. Thanks.
Comment 10 Rafael J. Wysocki 2010-06-13 14:09:09 UTC
Handled-By : Martin Wilck <martin.wilck@ts.fujitsu.com>
Patch : https://bugzilla.kernel.org/attachment.cgi?id=26732
Comment 11 Rafael J. Wysocki 2010-07-09 22:28:57 UTC
Fixed by commit 8d1f66dc9b4f80a1441bc1c33efa98aca99e8813 .

Note You need to log in before you can comment on or make changes to this bug.