Bug 18662

Summary: kondemand/1 stuck
Product: Power Management Reporter: Tudor Florea (tcflorea)
Component: cpufreqAssignee: cpufreq
Status: CLOSED INSUFFICIENT_DATA    
Severity: normal CC: justincase, rui.zhang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.18-194.11.1.el5PAE Subsystem:
Regression: No Bisected commit-id:

Description Tudor Florea 2010-09-16 09:21:40 UTC
kondemand/1 consume 100% CPU usage and all processes on that CPU hangs.
Below is a snapshot from /var/log/messages
---cut here---
Sep 16 10:04:15 robucws091x last message repeated 4 times
Sep 16 10:48:08 robucws091x kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kondemand/1:1962]
Sep 16 10:48:08 robucws091x kernel: 
Sep 16 10:48:08 robucws091x kernel: Pid: 1962, comm:          kondemand/1
Sep 16 10:48:08 robucws091x kernel: EIP: 0060:[<f89522f9>] CPU: 1
Sep 16 10:48:08 robucws091x kernel: EIP is at rtl8169_interrupt+0x216/0x236 [r8169]
Sep 16 10:48:08 robucws091x kernel:  EFLAGS: 00000206    Tainted: P       (2.6.18-194.11.1.el5PAE #1)
Sep 16 10:48:08 robucws091x kernel: EAX: 00000040 EBX: 00000040 ECX: c381ab58 EDX: f890803e
Sep 16 10:48:08 robucws091x kernel: ESI: f74ed400 EDI: f74ed000 EBP: f8908000 DS: 007b ES: 007b
Sep 16 10:48:08 robucws091x kernel: CR0: 8005003b CR2: 29aba114 CR3: 37970340 CR4: 000006f0
Sep 16 10:48:08 robucws091x kernel:  [<c044cff5>] handle_IRQ_event+0x45/0x8c
Sep 16 10:48:08 robucws091x kernel:  [<c044d0c0>] __do_IRQ+0x84/0xd6
Sep 16 10:48:08 robucws091x kernel:  [<c044d03c>] __do_IRQ+0x0/0xd6
Sep 16 10:48:08 robucws091x kernel:  [<c040749e>] do_IRQ+0x99/0xc3
Sep 16 10:48:08 robucws091x kernel:  [<c0405946>] common_interrupt+0x1a/0x20
Sep 16 10:48:08 robucws091x kernel:  [<c042a7e7>] __do_softirq+0x57/0x114
Sep 16 10:48:08 robucws091x kernel:  [<c04073bb>] do_softirq+0x52/0x9c
Sep 16 10:48:08 robucws091x kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Sep 16 10:48:08 robucws091x kernel:  [<c05127f4>] acpi_os_write_port+0x1e/0x36
Sep 16 10:48:08 robucws091x kernel:  [<f8bc4439>] acpi_cpufreq_target+0x202/0x2d2 [acpi_cpufreq]
Sep 16 10:48:08 robucws091x kernel:  [<c05b0002>] store_scaling_max_freq+0x3c/0x5a
Sep 16 10:48:08 robucws091x kernel:  [<f8bc4237>] acpi_cpufreq_target+0x0/0x2d2 [acpi_cpufreq]
Sep 16 10:48:08 robucws091x kernel:  [<c05b00c1>] __cpufreq_driver_target+0x47/0x53
Sep 16 10:48:08 robucws091x kernel:  [<f8bc9546>] do_dbs_timer+0x244/0x2be [cpufreq_ondemand]
Sep 16 10:48:08 robucws091x kernel:  [<c04336e2>] run_workqueue+0x78/0xb5
Sep 16 10:48:08 robucws091x kernel:  [<f8bc9302>] do_dbs_timer+0x0/0x2be [cpufreq_ondemand]
Sep 16 10:48:08 robucws091x kernel:  [<c0433f96>] worker_thread+0xd9/0x10b
Sep 16 10:48:08 robucws091x kernel:  [<c041fc47>] default_wake_function+0x0/0xc
Sep 16 10:48:08 robucws091x kernel:  [<c0433ebd>] worker_thread+0x0/0x10b
Sep 16 10:48:08 robucws091x kernel:  [<c04363af>] kthread+0xc0/0xed
Sep 16 10:48:08 robucws091x kernel:  [<c04362ef>] kthread+0x0/0xed
Sep 16 10:48:08 robucws091x kernel:  [<c0405c53>] kernel_thread_helper+0x7/0x10
Sep 16 10:48:08 robucws091x kernel:  =======================
Sep 16 10:48:28 robucws091x kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kondemand/1:1962]
Sep 16 10:48:28 robucws091x kernel: 
Sep 16 10:48:28 robucws091x kernel: Pid: 1962, comm:          kondemand/1
Sep 16 10:48:28 robucws091x kernel: EIP: 0060:[<f89522f9>] CPU: 1
Sep 16 10:48:28 robucws091x kernel: EIP is at rtl8169_interrupt+0x216/0x236 [r8169]
Sep 16 10:48:28 robucws091x kernel:  EFLAGS: 00000206    Tainted: P       (2.6.18-194.11.1.el5PAE #1)
Sep 16 10:48:28 robucws091x kernel: EAX: 00000040 EBX: 00000040 ECX: c381ab58 EDX: f890803e
Sep 16 10:48:28 robucws091x kernel: ESI: f74ed400 EDI: f74ed000 EBP: f8908000 DS: 007b ES: 007b
Sep 16 10:48:28 robucws091x kernel: CR0: 8005003b CR2: 29aba114 CR3: 37970340 CR4: 000006f0
Sep 16 10:48:28 robucws091x kernel:  [<c044cff5>] handle_IRQ_event+0x45/0x8c
Sep 16 10:48:28 robucws091x kernel:  [<c044d0c0>] __do_IRQ+0x84/0xd6
Sep 16 10:48:28 robucws091x kernel:  [<c044d03c>] __do_IRQ+0x0/0xd6
Sep 16 10:48:28 robucws091x kernel:  [<c040749e>] do_IRQ+0x99/0xc3
Sep 16 10:48:28 robucws091x kernel:  [<c0405946>] common_interrupt+0x1a/0x20
Sep 16 10:48:28 robucws091x kernel:  [<c042a7e7>] __do_softirq+0x57/0x114
Sep 16 10:48:28 robucws091x kernel:  [<c04073bb>] do_softirq+0x52/0x9c
Sep 16 10:48:28 robucws091x kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Sep 16 10:48:28 robucws091x kernel:  [<c05127f4>] acpi_os_write_port+0x1e/0x36
Sep 16 10:48:28 robucws091x kernel:  [<f8bc4439>] acpi_cpufreq_target+0x202/0x2d2 [acpi_cpufreq]
Sep 16 10:48:28 robucws091x kernel:  [<c05b0002>] store_scaling_max_freq+0x3c/0x5a
Sep 16 10:48:28 robucws091x kernel:  [<f8bc4237>] acpi_cpufreq_target+0x0/0x2d2 [acpi_cpufreq]
Sep 16 10:48:28 robucws091x kernel:  [<c05b00c1>] __cpufreq_driver_target+0x47/0x53
Sep 16 10:48:28 robucws091x kernel:  [<f8bc9546>] do_dbs_timer+0x244/0x2be [cpufreq_ondemand]
Sep 16 10:48:28 robucws091x kernel:  [<c04336e2>] run_workqueue+0x78/0xb5
Sep 16 10:48:28 robucws091x kernel:  [<f8bc9302>] do_dbs_timer+0x0/0x2be [cpufreq_ondemand]
Sep 16 10:48:28 robucws091x kernel:  [<c0433f96>] worker_thread+0xd9/0x10b
Sep 16 10:48:28 robucws091x kernel:  [<c041fc47>] default_wake_function+0x0/0xc
Sep 16 10:48:28 robucws091x kernel:  [<c0433ebd>] worker_thread+0x0/0x10b
Sep 16 10:48:28 robucws091x kernel:  [<c04363af>] kthread+0xc0/0xed
Sep 16 10:48:28 robucws091x kernel:  [<c04362ef>] kthread+0x0/0xed
Sep 16 10:48:28 robucws091x kernel:  [<c0405c53>] kernel_thread_helper+0x7/0x10
Sep 16 10:48:28 robucws091x kernel:  =======================
Sep 16 10:48:48 robucws091x kernel: BUG: soft lockup - CPU#1 stuck for 10s! [kondemand/1:1962]
Sep 16 10:48:48 robucws091x kernel: 
Sep 16 10:48:48 robucws091x kernel: Pid: 1962, comm:          kondemand/1
Sep 16 10:48:48 robucws091x kernel: EIP: 0060:[<f89522f9>] CPU: 1
Sep 16 10:48:48 robucws091x kernel: EIP is at rtl8169_interrupt+0x216/0x236 [r8169]
Sep 16 10:48:48 robucws091x kernel:  EFLAGS: 00000206    Tainted: P       (2.6.18-194.11.1.el5PAE #1)
Sep 16 10:48:48 robucws091x kernel: EAX: 00000040 EBX: 00000040 ECX: c381ab58 EDX: f890803e
Sep 16 10:48:48 robucws091x kernel: ESI: f74ed400 EDI: f74ed000 EBP: f8908000 DS: 007b ES: 007b
Sep 16 10:48:48 robucws091x kernel: CR0: 8005003b CR2: 29aba114 CR3: 37970340 CR4: 000006f0
Sep 16 10:48:48 robucws091x kernel:  [<c044cff5>] handle_IRQ_event+0x45/0x8c
Sep 16 10:48:48 robucws091x kernel:  [<c044d0c0>] __do_IRQ+0x84/0xd6
Sep 16 10:48:48 robucws091x kernel:  [<c044d03c>] __do_IRQ+0x0/0xd6
Sep 16 10:48:48 robucws091x kernel:  [<c040749e>] do_IRQ+0x99/0xc3
Sep 16 10:48:48 robucws091x kernel:  [<c0405946>] common_interrupt+0x1a/0x20
Sep 16 10:48:48 robucws091x kernel:  [<c042a7e7>] __do_softirq+0x57/0x114
Sep 16 10:48:48 robucws091x kernel:  [<c04073bb>] do_softirq+0x52/0x9c
Sep 16 10:48:48 robucws091x kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Sep 16 10:48:48 robucws091x kernel:  [<c05127f4>] acpi_os_write_port+0x1e/0x36
Sep 16 10:48:48 robucws091x kernel:  [<f8bc4439>] acpi_cpufreq_target+0x202/0x2d2 [acpi_cpufreq]
Sep 16 10:48:48 robucws091x kernel:  [<c05b0002>] store_scaling_max_freq+0x3c/0x5a
Sep 16 10:48:48 robucws091x kernel:  [<f8bc4237>] acpi_cpufreq_target+0x0/0x2d2 [acpi_cpufreq]
Sep 16 10:48:48 robucws091x kernel:  [<c05b00c1>] __cpufreq_driver_target+0x47/0x53
Sep 16 10:48:48 robucws091x kernel:  [<f8bc9546>] do_dbs_timer+0x244/0x2be [cpufreq_ondemand]
Sep 16 10:48:48 robucws091x kernel:  [<c04336e2>] run_workqueue+0x78/0xb5
Sep 16 10:48:48 robucws091x kernel:  [<f8bc9302>] do_dbs_timer+0x0/0x2be [cpufreq_ondemand]
Sep 16 10:48:48 robucws091x kernel:  [<c0433f96>] worker_thread+0xd9/0x10b
Sep 16 10:48:48 robucws091x kernel:  [<c041fc47>] default_wake_function+0x0/0xc
Sep 16 10:48:48 robucws091x kernel:  [<c0433ebd>] worker_thread+0x0/0x10b
Sep 16 10:48:48 robucws091x kernel:  [<c04363af>] kthread+0xc0/0xed
Sep 16 10:48:48 robucws091x kernel:  [<c04362ef>] kthread+0x0/0xed
Sep 16 10:48:48 robucws091x kernel:  [<c0405c53>] kernel_thread_helper+0x7/0x10
Sep 16 10:48:48 robucws091x kernel:  =======================
---cut here---
Comment 1 Yill Din 2011-02-14 16:41:10 UTC
Seeing a similar issue on 2.6.18-194.32.1.el5. Appeared immediately after upgrading kernel. Here is an example:

Feb  1 01:24:51 alpha kernel: BUG: soft lockup - CPU#1 stuck for 25s! [swapper:0]
Feb  1 01:24:51 alpha kernel: CPU 1:
Feb  1 01:24:51 alpha kernel: Modules linked in: xt_conntrack iptable_mangle ip_conntrack_ftp ipt_LOG xt_limit xt_multiport $
Feb  1 01:24:51 alpha kernel: Pid: 0, comm: swapper Not tainted 2.6.18-194.32.1.el5 #1
Feb  7 01:24:51 alpha kernel: RIP: 0010:[<ffffffff8000ca87>]  [<ffffffff8000ca87>] __delay+0xa/0x10
Feb  1 01:24:51 alpha kernel: RSP: 0018:ffff8101043d7e20  EFLAGS: 00000212         
Feb  1 01:24:51 alpha kernel: RAX: 0000000000002eba RBX: ffffc2000006c000 RCX: 00000000460a8b27
Feb  1 01:24:51 alpha kernel: RDX: 0000000000062065 RSI: 0000000000000190 RDI: 00000000000061d7
Feb  1 01:24:51 alpha kernel: RBP: ffff8101043d7da0 R08: 000000000000000a R09: 0000000000002710
Feb  1 01:24:51 alpha kernel: R10: ffff8101043fc008 R11: 0000000000000008 R12: ffffffff8005dc8e
Feb  1 01:24:51 alpha kernel: R13: 0000000000002706 R14: ffffffff80078225 R15: ffff8101043d7da0
Feb  1 01:24:51 alpha kernel: FS:  0000000041582940(0000) GS:ffff8101043937c0(0000) knlGS:00000000f7f01af0
Feb  1 01:24:51 alpha kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b            
Feb  1 01:24:51 alpha kernel: CR2: 00002b2ff03f6010 CR3: 000000010d9ab000 CR4: 00000000000006e0
Feb  1 01:24:51 alpha kernel:
Feb  1 01:24:51 alpha kernel: Call Trace:
Feb  1 01:24:51 alpha kernel:  <IRQ>  [<ffffffff881ac979>] :forcedeth:reg_delay+0x39/0x82      
Feb  1 01:24:51 alpha kernel:  [<ffffffff881acb5a>] :forcedeth:mii_rw+0x83/0xab                
Feb  1 01:24:51 alpha kernel:  [<ffffffff881acc8b>] :forcedeth:nv_update_linkspeed+0x109/0x41b          
Feb  1 01:24:51 alpha kernel:  [<ffffffff881ae05f>] :forcedeth:nv_linkchange+0x9/0x61          
Feb  1 01:24:51 alpha kernel:  [<ffffffff881afaa3>] :forcedeth:nv_nic_irq_optimized+0x112/0x232           
Feb  1 01:24:51 alpha kernel:  [<ffffffff80010c3a>] handle_IRQ_event+0x51/0xa6
Feb  1 01:24:51 alpha kernel:  [<ffffffff800bb018>] __do_IRQ+0xa4/0x103
Feb  1 01:24:51 alpha kernel:  [<ffffffff8006ca0d>] do_IRQ+0xe7/0xf5
Feb  1 01:24:51 alpha kernel:  [<ffffffff8006b342>] default_idle+0x0/0x50
Feb  1 01:24:51 alpha kernel:  [<ffffffff8005d615>] ret_from_intr+0x0/0xa
Feb  1 01:24:51 alpha kernel:  <EOI>  [<ffffffff8006b36b>] default_idle+0x29/0x50
Feb  1 01:24:51 alpha kernel:  [<ffffffff800492c4>] cpu_idle+0x95/0xb8
Feb  1 01:24:51 alpha kernel:  [<ffffffff80077991>] start_secondary+0x498/0x4a7
Comment 2 Zhang Rui 2012-01-18 02:17:44 UTC
It's great that kernel bugzilla is back.

can you please verify if the problem still exists in the latest upstream
kernel?
Comment 3 Zhang Rui 2012-05-24 07:38:23 UTC
bug closed as there is no response from the bug reporter.
please feel free to reopen it if the problem still exists in the latest upstream kernel.