kondemand/1 consume 100% CPU usage and all processes on that CPU hangs. Below is a snapshot from /var/log/messages ---cut here--- Sep 16 10:04:15 robucws091x last message repeated 4 times Sep 16 10:48:08 robucws091x kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kondemand/1:1962] Sep 16 10:48:08 robucws091x kernel: Sep 16 10:48:08 robucws091x kernel: Pid: 1962, comm: kondemand/1 Sep 16 10:48:08 robucws091x kernel: EIP: 0060:[<f89522f9>] CPU: 1 Sep 16 10:48:08 robucws091x kernel: EIP is at rtl8169_interrupt+0x216/0x236 [r8169] Sep 16 10:48:08 robucws091x kernel: EFLAGS: 00000206 Tainted: P (2.6.18-194.11.1.el5PAE #1) Sep 16 10:48:08 robucws091x kernel: EAX: 00000040 EBX: 00000040 ECX: c381ab58 EDX: f890803e Sep 16 10:48:08 robucws091x kernel: ESI: f74ed400 EDI: f74ed000 EBP: f8908000 DS: 007b ES: 007b Sep 16 10:48:08 robucws091x kernel: CR0: 8005003b CR2: 29aba114 CR3: 37970340 CR4: 000006f0 Sep 16 10:48:08 robucws091x kernel: [<c044cff5>] handle_IRQ_event+0x45/0x8c Sep 16 10:48:08 robucws091x kernel: [<c044d0c0>] __do_IRQ+0x84/0xd6 Sep 16 10:48:08 robucws091x kernel: [<c044d03c>] __do_IRQ+0x0/0xd6 Sep 16 10:48:08 robucws091x kernel: [<c040749e>] do_IRQ+0x99/0xc3 Sep 16 10:48:08 robucws091x kernel: [<c0405946>] common_interrupt+0x1a/0x20 Sep 16 10:48:08 robucws091x kernel: [<c042a7e7>] __do_softirq+0x57/0x114 Sep 16 10:48:08 robucws091x kernel: [<c04073bb>] do_softirq+0x52/0x9c Sep 16 10:48:08 robucws091x kernel: [<c04059d7>] apic_timer_interrupt+0x1f/0x24 Sep 16 10:48:08 robucws091x kernel: [<c05127f4>] acpi_os_write_port+0x1e/0x36 Sep 16 10:48:08 robucws091x kernel: [<f8bc4439>] acpi_cpufreq_target+0x202/0x2d2 [acpi_cpufreq] Sep 16 10:48:08 robucws091x kernel: [<c05b0002>] store_scaling_max_freq+0x3c/0x5a Sep 16 10:48:08 robucws091x kernel: [<f8bc4237>] acpi_cpufreq_target+0x0/0x2d2 [acpi_cpufreq] Sep 16 10:48:08 robucws091x kernel: [<c05b00c1>] __cpufreq_driver_target+0x47/0x53 Sep 16 10:48:08 robucws091x kernel: [<f8bc9546>] do_dbs_timer+0x244/0x2be [cpufreq_ondemand] Sep 16 10:48:08 robucws091x kernel: [<c04336e2>] run_workqueue+0x78/0xb5 Sep 16 10:48:08 robucws091x kernel: [<f8bc9302>] do_dbs_timer+0x0/0x2be [cpufreq_ondemand] Sep 16 10:48:08 robucws091x kernel: [<c0433f96>] worker_thread+0xd9/0x10b Sep 16 10:48:08 robucws091x kernel: [<c041fc47>] default_wake_function+0x0/0xc Sep 16 10:48:08 robucws091x kernel: [<c0433ebd>] worker_thread+0x0/0x10b Sep 16 10:48:08 robucws091x kernel: [<c04363af>] kthread+0xc0/0xed Sep 16 10:48:08 robucws091x kernel: [<c04362ef>] kthread+0x0/0xed Sep 16 10:48:08 robucws091x kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10 Sep 16 10:48:08 robucws091x kernel: ======================= Sep 16 10:48:28 robucws091x kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kondemand/1:1962] Sep 16 10:48:28 robucws091x kernel: Sep 16 10:48:28 robucws091x kernel: Pid: 1962, comm: kondemand/1 Sep 16 10:48:28 robucws091x kernel: EIP: 0060:[<f89522f9>] CPU: 1 Sep 16 10:48:28 robucws091x kernel: EIP is at rtl8169_interrupt+0x216/0x236 [r8169] Sep 16 10:48:28 robucws091x kernel: EFLAGS: 00000206 Tainted: P (2.6.18-194.11.1.el5PAE #1) Sep 16 10:48:28 robucws091x kernel: EAX: 00000040 EBX: 00000040 ECX: c381ab58 EDX: f890803e Sep 16 10:48:28 robucws091x kernel: ESI: f74ed400 EDI: f74ed000 EBP: f8908000 DS: 007b ES: 007b Sep 16 10:48:28 robucws091x kernel: CR0: 8005003b CR2: 29aba114 CR3: 37970340 CR4: 000006f0 Sep 16 10:48:28 robucws091x kernel: [<c044cff5>] handle_IRQ_event+0x45/0x8c Sep 16 10:48:28 robucws091x kernel: [<c044d0c0>] __do_IRQ+0x84/0xd6 Sep 16 10:48:28 robucws091x kernel: [<c044d03c>] __do_IRQ+0x0/0xd6 Sep 16 10:48:28 robucws091x kernel: [<c040749e>] do_IRQ+0x99/0xc3 Sep 16 10:48:28 robucws091x kernel: [<c0405946>] common_interrupt+0x1a/0x20 Sep 16 10:48:28 robucws091x kernel: [<c042a7e7>] __do_softirq+0x57/0x114 Sep 16 10:48:28 robucws091x kernel: [<c04073bb>] do_softirq+0x52/0x9c Sep 16 10:48:28 robucws091x kernel: [<c04059d7>] apic_timer_interrupt+0x1f/0x24 Sep 16 10:48:28 robucws091x kernel: [<c05127f4>] acpi_os_write_port+0x1e/0x36 Sep 16 10:48:28 robucws091x kernel: [<f8bc4439>] acpi_cpufreq_target+0x202/0x2d2 [acpi_cpufreq] Sep 16 10:48:28 robucws091x kernel: [<c05b0002>] store_scaling_max_freq+0x3c/0x5a Sep 16 10:48:28 robucws091x kernel: [<f8bc4237>] acpi_cpufreq_target+0x0/0x2d2 [acpi_cpufreq] Sep 16 10:48:28 robucws091x kernel: [<c05b00c1>] __cpufreq_driver_target+0x47/0x53 Sep 16 10:48:28 robucws091x kernel: [<f8bc9546>] do_dbs_timer+0x244/0x2be [cpufreq_ondemand] Sep 16 10:48:28 robucws091x kernel: [<c04336e2>] run_workqueue+0x78/0xb5 Sep 16 10:48:28 robucws091x kernel: [<f8bc9302>] do_dbs_timer+0x0/0x2be [cpufreq_ondemand] Sep 16 10:48:28 robucws091x kernel: [<c0433f96>] worker_thread+0xd9/0x10b Sep 16 10:48:28 robucws091x kernel: [<c041fc47>] default_wake_function+0x0/0xc Sep 16 10:48:28 robucws091x kernel: [<c0433ebd>] worker_thread+0x0/0x10b Sep 16 10:48:28 robucws091x kernel: [<c04363af>] kthread+0xc0/0xed Sep 16 10:48:28 robucws091x kernel: [<c04362ef>] kthread+0x0/0xed Sep 16 10:48:28 robucws091x kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10 Sep 16 10:48:28 robucws091x kernel: ======================= Sep 16 10:48:48 robucws091x kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kondemand/1:1962] Sep 16 10:48:48 robucws091x kernel: Sep 16 10:48:48 robucws091x kernel: Pid: 1962, comm: kondemand/1 Sep 16 10:48:48 robucws091x kernel: EIP: 0060:[<f89522f9>] CPU: 1 Sep 16 10:48:48 robucws091x kernel: EIP is at rtl8169_interrupt+0x216/0x236 [r8169] Sep 16 10:48:48 robucws091x kernel: EFLAGS: 00000206 Tainted: P (2.6.18-194.11.1.el5PAE #1) Sep 16 10:48:48 robucws091x kernel: EAX: 00000040 EBX: 00000040 ECX: c381ab58 EDX: f890803e Sep 16 10:48:48 robucws091x kernel: ESI: f74ed400 EDI: f74ed000 EBP: f8908000 DS: 007b ES: 007b Sep 16 10:48:48 robucws091x kernel: CR0: 8005003b CR2: 29aba114 CR3: 37970340 CR4: 000006f0 Sep 16 10:48:48 robucws091x kernel: [<c044cff5>] handle_IRQ_event+0x45/0x8c Sep 16 10:48:48 robucws091x kernel: [<c044d0c0>] __do_IRQ+0x84/0xd6 Sep 16 10:48:48 robucws091x kernel: [<c044d03c>] __do_IRQ+0x0/0xd6 Sep 16 10:48:48 robucws091x kernel: [<c040749e>] do_IRQ+0x99/0xc3 Sep 16 10:48:48 robucws091x kernel: [<c0405946>] common_interrupt+0x1a/0x20 Sep 16 10:48:48 robucws091x kernel: [<c042a7e7>] __do_softirq+0x57/0x114 Sep 16 10:48:48 robucws091x kernel: [<c04073bb>] do_softirq+0x52/0x9c Sep 16 10:48:48 robucws091x kernel: [<c04059d7>] apic_timer_interrupt+0x1f/0x24 Sep 16 10:48:48 robucws091x kernel: [<c05127f4>] acpi_os_write_port+0x1e/0x36 Sep 16 10:48:48 robucws091x kernel: [<f8bc4439>] acpi_cpufreq_target+0x202/0x2d2 [acpi_cpufreq] Sep 16 10:48:48 robucws091x kernel: [<c05b0002>] store_scaling_max_freq+0x3c/0x5a Sep 16 10:48:48 robucws091x kernel: [<f8bc4237>] acpi_cpufreq_target+0x0/0x2d2 [acpi_cpufreq] Sep 16 10:48:48 robucws091x kernel: [<c05b00c1>] __cpufreq_driver_target+0x47/0x53 Sep 16 10:48:48 robucws091x kernel: [<f8bc9546>] do_dbs_timer+0x244/0x2be [cpufreq_ondemand] Sep 16 10:48:48 robucws091x kernel: [<c04336e2>] run_workqueue+0x78/0xb5 Sep 16 10:48:48 robucws091x kernel: [<f8bc9302>] do_dbs_timer+0x0/0x2be [cpufreq_ondemand] Sep 16 10:48:48 robucws091x kernel: [<c0433f96>] worker_thread+0xd9/0x10b Sep 16 10:48:48 robucws091x kernel: [<c041fc47>] default_wake_function+0x0/0xc Sep 16 10:48:48 robucws091x kernel: [<c0433ebd>] worker_thread+0x0/0x10b Sep 16 10:48:48 robucws091x kernel: [<c04363af>] kthread+0xc0/0xed Sep 16 10:48:48 robucws091x kernel: [<c04362ef>] kthread+0x0/0xed Sep 16 10:48:48 robucws091x kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10 Sep 16 10:48:48 robucws091x kernel: ======================= ---cut here---
Seeing a similar issue on 2.6.18-194.32.1.el5. Appeared immediately after upgrading kernel. Here is an example: Feb 1 01:24:51 alpha kernel: BUG: soft lockup - CPU#1 stuck for 25s! [swapper:0] Feb 1 01:24:51 alpha kernel: CPU 1: Feb 1 01:24:51 alpha kernel: Modules linked in: xt_conntrack iptable_mangle ip_conntrack_ftp ipt_LOG xt_limit xt_multiport $ Feb 1 01:24:51 alpha kernel: Pid: 0, comm: swapper Not tainted 2.6.18-194.32.1.el5 #1 Feb 7 01:24:51 alpha kernel: RIP: 0010:[<ffffffff8000ca87>] [<ffffffff8000ca87>] __delay+0xa/0x10 Feb 1 01:24:51 alpha kernel: RSP: 0018:ffff8101043d7e20 EFLAGS: 00000212 Feb 1 01:24:51 alpha kernel: RAX: 0000000000002eba RBX: ffffc2000006c000 RCX: 00000000460a8b27 Feb 1 01:24:51 alpha kernel: RDX: 0000000000062065 RSI: 0000000000000190 RDI: 00000000000061d7 Feb 1 01:24:51 alpha kernel: RBP: ffff8101043d7da0 R08: 000000000000000a R09: 0000000000002710 Feb 1 01:24:51 alpha kernel: R10: ffff8101043fc008 R11: 0000000000000008 R12: ffffffff8005dc8e Feb 1 01:24:51 alpha kernel: R13: 0000000000002706 R14: ffffffff80078225 R15: ffff8101043d7da0 Feb 1 01:24:51 alpha kernel: FS: 0000000041582940(0000) GS:ffff8101043937c0(0000) knlGS:00000000f7f01af0 Feb 1 01:24:51 alpha kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Feb 1 01:24:51 alpha kernel: CR2: 00002b2ff03f6010 CR3: 000000010d9ab000 CR4: 00000000000006e0 Feb 1 01:24:51 alpha kernel: Feb 1 01:24:51 alpha kernel: Call Trace: Feb 1 01:24:51 alpha kernel: <IRQ> [<ffffffff881ac979>] :forcedeth:reg_delay+0x39/0x82 Feb 1 01:24:51 alpha kernel: [<ffffffff881acb5a>] :forcedeth:mii_rw+0x83/0xab Feb 1 01:24:51 alpha kernel: [<ffffffff881acc8b>] :forcedeth:nv_update_linkspeed+0x109/0x41b Feb 1 01:24:51 alpha kernel: [<ffffffff881ae05f>] :forcedeth:nv_linkchange+0x9/0x61 Feb 1 01:24:51 alpha kernel: [<ffffffff881afaa3>] :forcedeth:nv_nic_irq_optimized+0x112/0x232 Feb 1 01:24:51 alpha kernel: [<ffffffff80010c3a>] handle_IRQ_event+0x51/0xa6 Feb 1 01:24:51 alpha kernel: [<ffffffff800bb018>] __do_IRQ+0xa4/0x103 Feb 1 01:24:51 alpha kernel: [<ffffffff8006ca0d>] do_IRQ+0xe7/0xf5 Feb 1 01:24:51 alpha kernel: [<ffffffff8006b342>] default_idle+0x0/0x50 Feb 1 01:24:51 alpha kernel: [<ffffffff8005d615>] ret_from_intr+0x0/0xa Feb 1 01:24:51 alpha kernel: <EOI> [<ffffffff8006b36b>] default_idle+0x29/0x50 Feb 1 01:24:51 alpha kernel: [<ffffffff800492c4>] cpu_idle+0x95/0xb8 Feb 1 01:24:51 alpha kernel: [<ffffffff80077991>] start_secondary+0x498/0x4a7
It's great that kernel bugzilla is back. can you please verify if the problem still exists in the latest upstream kernel?
bug closed as there is no response from the bug reporter. please feel free to reopen it if the problem still exists in the latest upstream kernel.