Bug 43028 - INFO: rcu_bh detected stall on CPU 3 (t=0 jiffies)
Summary: INFO: rcu_bh detected stall on CPU 3 (t=0 jiffies)
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 low
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-02 11:52 UTC by Ralf Hildebrandt
Modified: 2014-08-27 18:49 UTC (History)
3 users (show)

See Also:
Kernel Version: 3.3.0
Tree: Mainline
Regression: No


Attachments

Description Ralf Hildebrandt 2012-04-02 11:52:22 UTC
Mar 23 11:54:00 mail kernel:
Mar 23 11:54:00 mail kernel: [347213.025005] INFO: rcu_bh detected stall on CPU 3 (t=0 jiffies)
Mar 23 11:54:00 mail kernel: [347213.025005] Pid: 12654, comm: cleanup Not tainted 3.3.0 #1
Mar 23 11:54:00 mail kernel: [347213.025005] Call Trace:
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c1062ccb>] ? __rcu_pending+0x10a/0x311
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c106388d>] ? rcu_check_callbacks+0x9b/0xa1
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c102c434>] ? update_process_times+0x2a/0x53
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c104f206>] ? tick_sched_timer+0x4f/0x90
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c103a101>] ? __remove_hrtimer+0x25/0x79
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c103a2ff>] ? __run_hrtimer.isra.32+0x38/0xbe
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c103ad95>] ? hrtimer_interrupt+0xda/0x23d
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c1016493>] ? smp_apic_timer_interrupt+0x4c/0x81
Mar 23 11:54:00 mail kernel: [347213.025005]  [<c127b5d9>] ? apic_timer_interrupt+0x31/0x38



I read http://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
but couldn't find "rcu_bh detected stall" being mentioned there.

I reported the same issue on 3.2.9 (same machine!) on the 13th of march:

Mar 12 12:58:38 mail kernel: [440746.244002] INFO: rcu_bh detected stall on CPU 2 (t=0 jiffies)
Mar 12 12:58:38 mail kernel: [440746.244002] Pid: 27980, comm: /usr/sbin/amavi Tainted: G        W    3.2.9 #1
Mar 12 12:58:38 mail kernel: [440746.244002] Call Trace:
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1061d26>] ? __rcu_pending+0x10a/0x30e
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c10626ca>] ? rcu_check_callbacks+0xd4/0xde
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1035cf2>] ? update_process_times+0x2a/0x53
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c104e1d2>] ? tick_sched_timer+0x4d/0x8e
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1043b53>] ? __remove_hrtimer+0x25/0x79
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1043d4f>] ? __run_hrtimer.isra.32+0x37/0xbd
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c10445d5>] ? hrtimer_interrupt+0xdb/0x23d
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c101b781>] ? vmalloc_sync_all+0x1/0x1
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1015e4c>] ? smp_apic_timer_interrupt+0x4c/0x81
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1275491>] ? apic_timer_interrupt+0x31/0x38
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c101b781>] ? vmalloc_sync_all+0x1/0x1
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c101b8c2>] ? do_page_fault+0x141/0x38f
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1030d8d>] ? irq_exit+0x34/0x87
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c1015e51>] ? smp_apic_timer_interrupt+0x51/0x81
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c101b781>] ? vmalloc_sync_all+0x1/0x1
Mar 12 12:58:38 mail kernel: [440746.244002]  [<c12756b7>] ? error_code+0x67/0x6c

and so did Tilman Schmidt in February:
https://lkml.org/lkml/2012/2/18/34
Comment 1 Kevin Lyles 2012-04-22 22:25:05 UTC
I saw this or something very close to it today on kernel 3.2.12.  I was using the system at the time, but not heavily.

I originally reported the issue on Gentoo's bug tracker here: https://bugs.gentoo.org/show_bug.cgi?id=402625

This is the first time I've seen it on a vanilla kernel.  I'm happy to help track this down, but I'm not a kernel expert so I'd need some handholding.  It's not easy to reproduce -- basically just wait long enough and it will eventually happen, but long enough can be a couple of weeks.

The log follows (each line started with "Apr 22 16:40:13 localhost kernel:", removed for easier reading).  The taint is ati-drivers and the virtualbox modules.

INFO: rcu_bh detected stall on CPU 2 (t=0 jiffies)
Pid: 2327, comm: X Tainted: P           O 3.2.12-desktop-vanilla #1
Call Trace:
 <IRQ>  [<ffffffff810d54a2>] check_cpu_stall.clone.41+0x92/0xf0
 [<ffffffff810a45e0>] ? tick_nohz_handler+0xe0/0xe0
 [<ffffffff810d5923>] __rcu_pending+0x33/0x1c0
 [<ffffffff810d5e33>] rcu_check_callbacks+0x103/0x1a0
 [<ffffffff81085053>] update_process_times+0x43/0x80
 [<ffffffff810a463f>] tick_sched_timer+0x5f/0xb0
 [<ffffffff810985f5>] __run_hrtimer.clone.33+0x55/0x110
 [<ffffffff81098e3f>] hrtimer_interrupt+0xdf/0x210
 [<ffffffff8104f474>] smp_apic_timer_interrupt+0x64/0xa0
 [<ffffffff815d500b>] apic_timer_interrupt+0x6b/0x70
 <EOI>  [<ffffffff814d7878>] ? skb_release_data+0x48/0x100
 [<ffffffff814d6fbf>] ? skb_release_head_state+0x5f/0xb0
 [<ffffffff814d7949>] __kfree_skb+0x19/0xa0
 [<ffffffff814d79e9>] consume_skb+0x19/0x40
 [<ffffffff81558171>] unix_stream_recvmsg+0x291/0x5e0
 [<ffffffff814ce800>] sock_aio_read+0x140/0x150
 [<ffffffff813088a0>] ? rb_insert_color+0x110/0x150
 [<ffffffff81129cda>] do_sync_read+0xda/0x120
 [<ffffffff81098c63>] ? hrtimer_start+0x13/0x20
 [<ffffffff8112a687>] vfs_read+0x167/0x180
 [<ffffffff8112a6ec>] sys_read+0x4c/0x90
 [<ffffffff815d457b>] system_call_fastpath+0x16/0x1b
Comment 2 udo 2012-05-12 13:31:34 UTC
I found a few of similar stalls in my messages:

May 12 11:45:12 bb3 kernel: INFO: rcu_bh detected stall on CPU 3 (t=0 jiffies)
May 12 11:45:12 bb3 kernel: Pid: 3574, comm: plugin-containe Not tainted 3.3.4 #5
May 12 11:45:12 bb3 kernel: Call Trace:
May 12 11:45:12 bb3 kernel: <IRQ>  [<ffffffff8109887b>] ? __rcu_pending+0x16b/0x3f0
May 12 11:45:12 bb3 kernel: [<ffffffff810713c0>] ? tick_nohz_handler+0xe0/0xe0
May 12 11:45:12 bb3 kernel: [<ffffffff8109901b>] ? rcu_check_callbacks+0xfb/0x170
May 12 11:45:12 bb3 kernel: [<ffffffff81042fce>] ? update_process_times+0x3e/0x80
May 12 11:45:12 bb3 kernel: [<ffffffff8107141a>] ? tick_sched_timer+0x5a/0xb0
May 12 11:45:12 bb3 kernel: [<ffffffff81055720>] ? __run_hrtimer+0x60/0x140
May 12 11:45:12 bb3 kernel: [<ffffffff8105602d>] ? hrtimer_interrupt+0xed/0x200
May 12 11:45:12 bb3 kernel: [<ffffffff8101b063>] ? smp_apic_timer_interrupt+0x63/0xa0
May 12 11:45:12 bb3 kernel: [<ffffffff8140068b>] ? apic_timer_interrupt+0x6b/0x70
May 12 11:45:12 bb3 kernel: <EOI>  [<ffffffff813ffc22>] ? system_call_fastpath+0x16/0x1b

May 10 16:40:18 bb3 kernel: INFO: rcu_bh detected stall on CPU 3 (t=0 jiffies)
May 10 16:40:18 bb3 kernel: Pid: 12236, comm: plugin-containe Not tainted 3.3.4 #5
May 10 16:40:18 bb3 kernel: Call Trace:
May 10 16:40:18 bb3 kernel: <IRQ>  [<ffffffff8109887b>] ? __rcu_pending+0x16b/0x3f0
May 10 16:40:18 bb3 kernel: [<ffffffff810713c0>] ? tick_nohz_handler+0xe0/0xe0
May 10 16:40:18 bb3 kernel: [<ffffffff8109901b>] ? rcu_check_callbacks+0xfb/0x170
May 10 16:40:18 bb3 kernel: [<ffffffff81042fce>] ? update_process_times+0x3e/0x80
May 10 16:40:18 bb3 kernel: [<ffffffff8107141a>] ? tick_sched_timer+0x5a/0xb0
May 10 16:40:18 bb3 kernel: [<ffffffff81055720>] ? __run_hrtimer+0x60/0x140
May 10 16:40:18 bb3 kernel: [<ffffffff8105602d>] ? hrtimer_interrupt+0xed/0x200
May 10 16:40:18 bb3 kernel: [<ffffffff8101b063>] ? smp_apic_timer_interrupt+0x63/0xa0
May 10 16:40:18 bb3 kernel: [<ffffffff8140068b>] ? apic_timer_interrupt+0x6b/0x70
May 10 16:40:18 bb3 kernel: <EOI>  [<ffffffff813ffc22>] ? system_call_fastpath+0x16/0x1b

Apr 28 16:54:31 bb3 kernel: INFO: rcu_bh detected stall on CPU 2 (t=0 jiffies)
Apr 28 16:54:31 bb3 kernel: Pid: 4939, comm: gnome-shell Not tainted 3.3.4 #4
Apr 28 16:54:31 bb3 kernel: Call Trace:
Apr 28 16:54:31 bb3 kernel: <IRQ>  [<ffffffff8109887b>] ? __rcu_pending+0x16b/0x3f0
Apr 28 16:54:31 bb3 kernel: [<ffffffff810713c0>] ? tick_nohz_handler+0xe0/0xe0
Apr 28 16:54:31 bb3 kernel: [<ffffffff8109901b>] ? rcu_check_callbacks+0xfb/0x170
Apr 28 16:54:31 bb3 kernel: [<ffffffff81042fce>] ? update_process_times+0x3e/0x80
Apr 28 16:54:31 bb3 kernel: [<ffffffff8107141a>] ? tick_sched_timer+0x5a/0xb0
Apr 28 16:54:31 bb3 kernel: [<ffffffff81055720>] ? __run_hrtimer+0x60/0x140
Apr 28 16:54:31 bb3 kernel: [<ffffffff8105602d>] ? hrtimer_interrupt+0xed/0x200
Apr 28 16:54:31 bb3 kernel: [<ffffffff8101b063>] ? smp_apic_timer_interrupt+0x63/0xa0
Apr 28 16:54:31 bb3 kernel: [<ffffffff81400a0b>] ? apic_timer_interrupt+0x6b/0x70
Apr 28 16:54:31 bb3 kernel: <EOI>  [<ffffffff813fffa2>] ? system_call_fastpath+0x16/0x1b

Apr 21 08:15:13 bb3 kernel: INFO: rcu_bh detected stall on CPU 3 (t=0 jiffies)
Apr 21 08:15:13 bb3 kernel: Pid: 21862, comm: plugin-containe Tainted: G        W    3.3.1 #1
Apr 21 08:15:13 bb3 kernel: Call Trace:
Apr 21 08:15:13 bb3 kernel: <IRQ>  [<ffffffff8109877b>] ? __rcu_pending+0x16b/0x3f0
Apr 21 08:15:13 bb3 kernel: [<ffffffff810711f0>] ? tick_nohz_handler+0xe0/0xe0
Apr 21 08:15:13 bb3 kernel: [<ffffffff81098f1b>] ? rcu_check_callbacks+0xfb/0x170
Apr 21 08:15:13 bb3 kernel: [<ffffffff81042fae>] ? update_process_times+0x3e/0x80
Apr 21 08:15:13 bb3 kernel: [<ffffffff8107124a>] ? tick_sched_timer+0x5a/0xb0
Apr 21 08:15:13 bb3 kernel: [<ffffffff81055590>] ? __run_hrtimer+0x60/0x140
Apr 21 08:15:13 bb3 kernel: [<ffffffff81055e9d>] ? hrtimer_interrupt+0xed/0x200
Apr 21 08:15:13 bb3 kernel: [<ffffffff8101b053>] ? smp_apic_timer_interrupt+0x63/0xa0
Apr 21 08:15:13 bb3 kernel: [<ffffffff8140084b>] ? apic_timer_interrupt+0x6b/0x70
Apr 21 08:15:13 bb3 kernel: <EOI>  [<ffffffff813ffde2>] ? system_call_fastpath+0x16/0x1b

Apr 15 11:20:02 bb3 kernel: INFO: rcu_bh detected stall on CPU 0 (t=0 jiffies)
Apr 15 11:20:02 bb3 kernel: Pid: 4691, comm: plugin-containe Not tainted 3.3.1 #1
Apr 15 11:20:02 bb3 kernel: Call Trace:
Apr 15 11:20:02 bb3 kernel: <IRQ>  [<ffffffff8109877b>] ? __rcu_pending+0x16b/0x3f0
Apr 15 11:20:02 bb3 kernel: [<ffffffff810711f0>] ? tick_nohz_handler+0xe0/0xe0
Apr 15 11:20:02 bb3 kernel: [<ffffffff81098f1b>] ? rcu_check_callbacks+0xfb/0x170
Apr 15 11:20:02 bb3 kernel: [<ffffffff81042fae>] ? update_process_times+0x3e/0x80
Apr 15 11:20:02 bb3 kernel: [<ffffffff8107124a>] ? tick_sched_timer+0x5a/0xb0
Apr 15 11:20:02 bb3 kernel: [<ffffffff81055590>] ? __run_hrtimer+0x60/0x140
Apr 15 11:20:02 bb3 kernel: [<ffffffff81055e9d>] ? hrtimer_interrupt+0xed/0x200
Apr 15 11:20:02 bb3 kernel: [<ffffffff8101b053>] ? smp_apic_timer_interrupt+0x63/0xa0
Apr 15 11:20:02 bb3 kernel: [<ffffffff8140084b>] ? apic_timer_interrupt+0x6b/0x70
Apr 15 11:20:02 bb3 kernel: <EOI>  [<ffffffff813ffde2>] ? system_call_fastpath+0x16/0x1b

Apr 12 17:59:35 bb3 kernel: INFO: rcu_bh detected stall on CPU 3 (t=0 jiffies)
Apr 12 17:59:35 bb3 kernel: Pid: 23600, comm: plugin-containe Not tainted 3.3.1 #1
Apr 12 17:59:35 bb3 kernel: Call Trace:
Apr 12 17:59:35 bb3 kernel: <IRQ>  [<ffffffff8109877b>] ? __rcu_pending+0x16b/0x3f0
Apr 12 17:59:35 bb3 kernel: [<ffffffff813fef71>] ? _raw_spin_unlock+0x11/0x30
Apr 12 17:59:35 bb3 kernel: [<ffffffff810711f0>] ? tick_nohz_handler+0xe0/0xe0
Apr 12 17:59:35 bb3 kernel: [<ffffffff81098f1b>] ? rcu_check_callbacks+0xfb/0x170
Apr 12 17:59:35 bb3 kernel: [<ffffffff81042fae>] ? update_process_times+0x3e/0x80
Apr 12 17:59:35 bb3 kernel: [<ffffffff8107124a>] ? tick_sched_timer+0x5a/0xb0
Apr 12 17:59:35 bb3 kernel: [<ffffffff81055590>] ? __run_hrtimer+0x60/0x140
Apr 12 17:59:35 bb3 kernel: [<ffffffff81055e9d>] ? hrtimer_interrupt+0xed/0x200
Apr 12 17:59:35 bb3 kernel: [<ffffffff8101b053>] ? smp_apic_timer_interrupt+0x63/0xa0
Apr 12 17:59:35 bb3 kernel: [<ffffffff8140084b>] ? apic_timer_interrupt+0x6b/0x70
Apr 12 17:59:35 bb3 kernel: <EOI>  [<ffffffff813ffde2>] ? system_call_fastpath+0x16/0x1b
Comment 3 udo 2012-05-14 17:48:12 UTC
One more:

May 13 17:17:55 bb3 kernel: INFO: rcu_bh detected stall on CPU 0 (t=0 jiffies)
May 13 17:17:55 bb3 kernel: Pid: 26695, comm: plugin-containe Not tainted 3.3.4 #5
May 13 17:17:55 bb3 kernel: Call Trace:
May 13 17:17:55 bb3 kernel: <IRQ>  [<ffffffff8109887b>] ? __rcu_pending+0x16b/0x3f0
May 13 17:17:55 bb3 kernel: [<ffffffff810713c0>] ? tick_nohz_handler+0xe0/0xe0
May 13 17:17:55 bb3 kernel: [<ffffffff8109901b>] ? rcu_check_callbacks+0xfb/0x170
May 13 17:17:55 bb3 kernel: [<ffffffff81042fce>] ? update_process_times+0x3e/0x80
May 13 17:17:55 bb3 kernel: [<ffffffff8107141a>] ? tick_sched_timer+0x5a/0xb0
May 13 17:17:55 bb3 kernel: [<ffffffff81055720>] ? __run_hrtimer+0x60/0x140
May 13 17:17:55 bb3 kernel: [<ffffffff8105602d>] ? hrtimer_interrupt+0xed/0x200
May 13 17:17:55 bb3 kernel: [<ffffffff8101b063>] ? smp_apic_timer_interrupt+0x63/0xa0
May 13 17:17:55 bb3 kernel: [<ffffffff8140068b>] ? apic_timer_interrupt+0x6b/0x70
May 13 17:17:55 bb3 kernel: <EOI>  [<ffffffff813ff00a>] ? _raw_spin_lock+0x2a/0x30
May 13 17:17:55 bb3 kernel: [<ffffffff810497b1>] ? do_prlimit+0x91/0x200
May 13 17:17:55 bb3 kernel: [<ffffffff8104993f>] ? sys_getrlimit+0x1f/0x60
May 13 17:17:55 bb3 kernel: [<ffffffff813ffc22>] ? system_call_fastpath+0x16/0x1b

Note You need to log in before you can comment on or make changes to this bug.