Hello, after upgrading from kernel 2.6.32 to kernel 2.6.34, I can see two ksoftirqd tasks constantly eating around ~20% CPU on a HP Proliant DL320G6. Some more information: - Downgrading to 2.6.32 fixes it - HP Proliant DL320G4 and HP Proliant DL320G5 are not affected, same is true for around eight other systems using different mainboards/CPUs. - We've switched to CONFIG_NO_HZ (tickless) in the 2.6.32 -> 2.6.34 upgrade - Ubuntu report of the same issue with 2.6.34: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575392 Here's the output from /proc/interrupts: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 44 0 0 396123 0 0 0 0 IO-APIC-edge timer 1: 0 0 0 2 0 0 0 0 IO-APIC-edge i8042 8: 0 0 0 596 0 0 0 0 IO-APIC-edge rtc0 9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi 12: 0 0 0 3 0 0 0 0 IO-APIC-edge i8042 20: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb4, uhci_hcd:usb6 21: 0 0 7998 0 0 0 0 0 IO-APIC-fasteoi ipmi_si 22: 0 38 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb5, uhci_hcd:usb8, uhci_hcd:usb9 23: 0 0 48 0 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3, uhci_hcd:usb7 55: 18743 0 0 0 0 0 0 0 PCI-MSI-edge cciss0 57: 0 0 0 0 0 0 0 0 PCI-MSI-edge eth1-rx-0 58: 0 0 0 0 0 0 0 0 PCI-MSI-edge eth1-tx-0 59: 0 0 0 0 0 0 0 0 PCI-MSI-edge eth1 60: 0 0 0 0 0 3627 0 0 PCI-MSI-edge eth0 61: 0 0 0 0 0 0 51 0 PCI-MSI-edge eth2 NMI: 0 0 0 0 0 0 0 0 Non-maskable interrupts LOC: 143033 245518 231358 18221 69809 33569 48544 17431 Local timer interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts PMI: 0 0 0 0 0 0 0 0 Performance monitoring interrupts PND: 0 0 0 0 0 0 0 0 Performance pending work RES: 258 165 73 63 114 246 150 75 Rescheduling interrupts CAL: 86 150 149 104 138 133 132 136 Function call interrupts TLB: 113 62 76 62 82 124 127 130 TLB shootdowns TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 0 0 0 0 Machine check exceptions MCP: 5 5 5 5 5 5 5 5 Machine check polls ERR: 7 MIS: 0 The counter for "IO-APIC-edge timer" on CPU3 is constantly increasing by around ~500-1000 per second. Is there any issue like this known? Output from powertop: Top causes for wakeups: 37.0% (607.0) <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 19.0% (312.3) kipmi0 : schedule_timeout_interruptible (process_timeout) 15.3% (251.7) <interrupt> : extra timer interrupt 15.2% (250.3) <kernel core> : add_timer (smi_timeout) 5.4% ( 88.7) <kernel core> : hrtimer_start (tick_sched_timer) 5.1% ( 83.7) squid : sys_epoll_wait (process_timeout) 0.6% ( 10.0) <kernel core> : ipmi_timeout (ipmi_timeout) I'll try 2.6.33.5 next as I have binary packages for it on file. Thanks in advance, Thomas
Kernel 2.6.33.5 is also unaffected. The 2.6.33.5 kernel is also tickless, I'll try to bisect tomorrow.
Here's the bisect result. Looks like it's not related to the timers at all: [root@intranator linux-2.6]# git bisect good ae74e823cb7d4cd476f623fce9a38f625f6c09a8 is the first bad commit commit ae74e823cb7d4cd476f623fce9a38f625f6c09a8 Author: Martin Wilck <martin.wilck@ts.fujitsu.com> Date: Wed Mar 10 15:23:06 2010 -0800 ipmi: add parameter to limit CPU usage in kipmid In some cases kipmid can use a lot of CPU. This adds a way to tune the CPU used by kipmid to help in those cases. By setting kipmid_max_busy_us to a value between 100 and 500, it is possible to bring down kipmid CPU load to practically 0 without loosing too much ipmi throughput performance. Not setting the value, or setting the value to zero, operation is unaffected. Signed-off-by: Martin Wilck <martin.wilck@ts.fujitsu.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> :040000 040000 dde007c89f875dd7553ea13b644da5148d2b9299 b036bc77d136e9cbb702b38dd9ab6cd2af361ea5 M Documentation :040000 040000 1ef18e58aca5e025d0ea0691b1823ff9efd06c54 d3025083ab5f79e10ba4651815dda8d86e021766 M drivers I've manually checked out this commit and the commit before it just to verify I didn't mess up the bisect process. powertop output without the patch: Top causes for wakeups: 39.3% (661.0) <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 14.7% (246.7) <kernel core> : add_timer (smi_timeout) 14.4% (241.7) kipmi0 : schedule_timeout_interruptible (process_timeout) 11.9% (200.3) <kernel core> : hrtimer_start (tick_sched_timer) 9.4% (157.7) <interrupt> : extra timer interrupt 5.0% ( 83.7) squid : sys_epoll_wait (process_timeout) 2.6% ( 44.0) <interrupt> : cciss0 0.6% ( 10.0) <kernel core> : ipmi_timeout (ipmi_timeout) powertop with the ipmi patch: Top causes for wakeups: 43.7% (858.7) <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 25.3% (498.0) kipmi0 : schedule_timeout_interruptible (process_timeout) 12.7% (250.0) <kernel core> : add_timer (smi_timeout) 6.7% (132.0) <kernel core> : hrtimer_start (tick_sched_timer) 4.2% ( 83.0) squid : sys_epoll_wait (process_timeout) 2.9% ( 56.7) <interrupt> : extra timer interrupt 2.1% ( 41.7) <interrupt> : cciss0 0.5% ( 10.0) <kernel core> : ipmi_timeout (ipmi_timeout) I can sometimes even see 800 wake ups from "kipmi0" with the patch. The sysfs text mentions it hogs the CPU by default. Shouldn't this be the other way around? I don't use any impitool userspace program at all. According to the ubuntu bug report (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575392), this also happens on a MacBook box, I guess mobile users are not pleased by the shortened battery life ;) Cheers, Thomas
On Mon, 7 Jun 2010 15:17:23 GMT bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=16147 (ae74e823cb7d4cd476f623fce9a38f625f6c09a8 "ipmi: add parameter to limit CPU usage in kipmid" causes ksoftirq meltdown). OK, this is tricky. It's a 2.6.33 -> 2.6.34 regression caused by an IPMI change. The patch doesn't revert easily from mainline because it's somewhat tangled up with Matthew's 3326f4f2276791561af1fd5f2020be0186459813 ("ipmi: reduce polling"). For -stable we could just revert ae74e823cb7d4cd476f623fce9a38f625f6c09a8 as Thomas has done. For mainline we'd need to disentangle the revert from Matthew's change, or revert both. Or, of course, just fix the bug. Corey, do you have the time to nail this issue for us? Thanks.
I'll work on this tomorrow morning.
It must be this part of the changeset. schedule(); else - schedule_timeout_interruptible(1); + schedule_timeout_interruptible(0); } return 0; Looks as if we finally found out why the old code had schedule_timeout_interruptible(1)...
First-Bad-Commit : ae74e823cb7d4cd476f623fce9a38f625f6c09a8
This is working fine on my system, with or without the change Martin suggests. Thomas, you can try Martin's suggestion? Looking at the patch, that's the only thing that should really change any behavior by default. Due to some other changes, it will look like: Index: linux-2.6/drivers/char/ipmi/ipmi_si_intf.c =================================================================== --- linux-2.6.orig/drivers/char/ipmi/ipmi_si_intf.c +++ linux-2.6/drivers/char/ipmi/ipmi_si_intf.c @@ -1018,7 +1018,7 @@ static int ipmi_thread(void *data) else if (smi_result == SI_SM_IDLE) schedule_timeout_interruptible(100); else - schedule_timeout_interruptible(0); + schedule_timeout_interruptible(1); } return 0; }
Created attachment 26732 [details] revert default behavior back to schedule_timeout_interruptible(1)
Patch works like a charm! Powertop output with the patch applied: Top causes for wakeups: 57.1% (783.0) <kernel core> : hrtimer_start_range_ns (tick_sched_timer) 18.2% (250.0) kipmi0 : schedule_timeout_interruptible (process_timeout) 9.1% (125.0) <kernel core> : add_timer (smi_timeout) 6.1% ( 83.3) squid : sys_epoll_wait (process_timeout) 5.7% ( 78.3) <interrupt> : extra timer interrupt 0.7% ( 10.0) <kernel core> : ipmi_timeout (ipmi_timeout) The box is idle again. Thanks.
Handled-By : Martin Wilck <martin.wilck@ts.fujitsu.com> Patch : https://bugzilla.kernel.org/attachment.cgi?id=26732
Fixed by commit 8d1f66dc9b4f80a1441bc1c33efa98aca99e8813 .