Bug 67751 - Stack trace with suspicious RCU usage, when starting ovs-switchd
Summary: Stack trace with suspicious RCU usage, when starting ovs-switchd
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-26 09:49 UTC by Kashyap Chamarthy
Modified: 2014-06-24 09:53 UTC (History)
1 user (show)

See Also:
Kernel Version: 3.13.0-0.rc4.git5.1.fc21.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Full dmesg output captured from the VM (52.24 KB, text/plain)
2013-12-27 14:32 UTC, Kashyap Chamarthy
Details

Description Kashyap Chamarthy 2013-12-26 09:49:31 UTC
I see the below on serial console of a Fedora-20 virtual machine when I boot into Kernel 3.13.0-0.rc4.git5.1.fc21.x86_64:


---------------------
.
.
.
[   26.327387] 
[   26.328384] ===============================
[   26.330206] [ INFO: suspicious RCU usage. ]
[   26.332044] 3.13.0-0.rc4.git5.1.fc21.x86_64 #1 Not tainted
[   26.334460] -------------------------------
[   26.336286] net/openvswitch/flow_table.c:436 suspicious rcu_dereference_check() usage!
[   26.339609] 
[   26.339609] other info that might help us debug this:
[   26.339609] 
[   26.342996] 
[   26.342996] rcu_scheduler_active = 1, debug_locks = 0
[   26.345794] 2 locks held by ovs-vswitchd/720:
[   26.347643]  #0:  (cb_lock){++++++}, at: [<ffffffff816535c9>] genl_rcv+0x19/0x40
[   26.351695]  #1:  (ovs_mutex){+.+.+.}, at: [<ffffffffa0187dc9>] ovs_flow_cmd_new_or_set+0xf9/0x600 [openvswitch]
[   26.356970] 
[   26.356970] stack backtrace:
[   26.358825] CPU: 1 PID: 720 Comm: ovs-vswitchd Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
[   26.362207] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[   26.364499]  0000000000000001 ffff880499af37e0 ffffffff8175b6d2 ffff8800da5ab3a0
[   26.368323]  ffff880499af3810 ffffffff810cd2f7 ffff8804a29c9ea0 ffff880499af387c
[   26.372140]  ffff8804986f8a10 ffff8804986f89f0 ffff880499af3860 ffffffffa018f608
[   26.375929] Call Trace:
[   26.377070]  [<ffffffff8175b6d2>] dump_stack+0x4d/0x66
[   26.379340]  [<ffffffff810cd2f7>] lockdep_rcu_suspicious+0xe7/0x120
[   26.381818]  [<ffffffffa018f608>] ovs_flow_tbl_lookup+0x188/0x1b0 [openvswitch]
[   26.384835]  [<ffffffffa0187e06>] ovs_flow_cmd_new_or_set+0x136/0x600 [openvswitch]
[   26.387961]  [<ffffffff816552c1>] ? genl_family_rcv_msg+0x2c1/0x370
[   26.390506]  [<ffffffff8165518d>] genl_family_rcv_msg+0x18d/0x370
[   26.393025]  [<ffffffff81655370>] ? genl_family_rcv_msg+0x370/0x370
[   26.395811]  [<ffffffff816553fe>] genl_rcv_msg+0x8e/0xd0
[   26.398045]  [<ffffffff81653079>] netlink_rcv_skb+0xa9/0xc0
[   26.400416]  [<ffffffff816535d8>] genl_rcv+0x28/0x40
[   26.402561]  [<ffffffff816526a5>] netlink_unicast+0xd5/0x1a0
[   26.404971]  [<ffffffff81652aa7>] netlink_sendmsg+0x337/0x750
[   26.407294]  [<ffffffff81600cab>] sock_sendmsg+0x8b/0xc0
[   26.409646]  [<ffffffff811a7dbf>] ? might_fault+0x5f/0xb0
[   26.411851]  [<ffffffff811a7e08>] ? might_fault+0xa8/0xb0
[   26.414103]  [<ffffffff811a7dbf>] ? might_fault+0x5f/0xb0
[   26.416280]  [<ffffffff81610a5e>] ? verify_iovec+0x5e/0xe0
[   26.418645]  [<ffffffff81601109>] ___sys_sendmsg+0x3d9/0x3f0
[   26.421138]  [<ffffffff810575e7>] ? kvm_clock_read+0x27/0x40
[   26.423483]  [<ffffffff81021a59>] ? sched_clock+0x9/0x10
[   26.425726]  [<ffffffff810b64cd>] ? sched_clock_local+0x1d/0x80
[   26.428104]  [<ffffffff810b6658>] ? sched_clock_cpu+0xa8/0x100
[   26.430515]  [<ffffffff810cb4cd>] ? trace_hardirqs_off+0xd/0x10
[   26.432914]  [<ffffffff810b679f>] ? local_clock+0x5f/0x70
[   26.435118]  [<ffffffff810cc07f>] ? lock_release_holdtime.part.29+0xf/0x190
[   26.439405]  [<ffffffff8121c85d>] ? fget_light+0xed/0x4f0
[   26.441705]  [<ffffffff8121c7ac>] ? fget_light+0x3c/0x4f0
[   26.444102]  [<ffffffff81602662>] __sys_sendmsg+0x42/0x80
[   26.446293]  [<ffffffff816026b2>] SyS_sendmsg+0x12/0x20
[   26.448435]  [<ffffffff8176e329>] system_call_fastpath+0x16/0x1b
.
.
.
---------------------

[This virtual machine has Open vSwitch daemon and a bunch of other OpenStack networking components running.]


I filed this here first -- https://bugzilla.redhat.com/show_bug.cgi?id=1046286
Comment 1 Kashyap Chamarthy 2013-12-27 13:56:35 UTC
Just to note this is reproducible, I ran into it again when I booted the VM through its serial console:

---------------------
.
.
.
[  OK  ] Started RPC bind service.
[   23.707895] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   24.460881] gre: GRE over IPv4 demultiplexor driver
[   24.534446] openvswitch: Open vSwitch switching datapath
[   24.764568] device ovs-system entered promiscuous mode
[   25.109235] device br-int entered promiscuous mode
[   25.121450] 
[   25.122468] ===============================
[   25.124704] [ INFO: suspicious RCU usage. ]
[   25.126644] 3.13.0-0.rc4.git5.1.fc21.x86_64 #1 Not tainted
[   25.129110] -------------------------------
[   25.132497] net/openvswitch/flow_table.c:436 suspicious rcu_dereference_check() usage!
[   25.135792] 
[   25.135792] other info that might help us debug this:
[   25.135792] 
[   25.139221] 
[   25.139221] rcu_scheduler_active = 1, debug_locks = 0
[   25.142044] 2 locks held by ovs-vswitchd/719:
[   25.143934]  #0:  (cb_lock){++++++}, at: [<ffffffff816535c9>] genl_rcv+0x19/0x40
[   25.148294]  #1:  (ovs_mutex){+.+.+.}, at: [<ffffffffa01aedc9>] ovs_flow_cmd_new_or_set+0xf9/0x600 [openvswitch]
[   25.153885] 
[   25.153885] stack backtrace:
[   25.156553] CPU: 10 PID: 719 Comm: ovs-vswitchd Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
[   25.162587] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[   25.165237]  0000000000000001 ffff8800370877e0 ffffffff8175b6d2 ffff880499f60000
[   25.169376]  ffff880037087810 ffffffff810cd2f7 ffff88049bfb1260 ffff88003708787c
[   25.175682]  ffff8800daaa7d68 ffff8800daaa7d48 ffff880037087860 ffffffffa01b6608
[   25.180373] Call Trace:
[   25.181540]  [<ffffffff8175b6d2>] dump_stack+0x4d/0x66
[   25.183970]  [<ffffffff810cd2f7>] lockdep_rcu_suspicious+0xe7/0x120
[   25.186854]  [<ffffffffa01b6608>] ovs_flow_tbl_lookup+0x188/0x1b0 [openvswitch]
[   25.189780]  [<ffffffffa01aee06>] ovs_flow_cmd_new_or_set+0x136/0x600 [openvswitch]
[   25.193182]  [<ffffffff816552c1>] ? genl_family_rcv_msg+0x2c1/0x370
[   25.195832]  [<ffffffff8165518d>] genl_family_rcv_msg+0x18d/0x370
[   25.198584]  [<ffffffff81655370>] ? genl_family_rcv_msg+0x370/0x370
[   25.201229]  [<ffffffff816553fe>] genl_rcv_msg+0x8e/0xd0
[   25.203439]  [<ffffffff81653079>] netlink_rcv_skb+0xa9/0xc0
[   25.205736]  [<ffffffff816535d8>] genl_rcv+0x28/0x40
[   25.207802]  [<ffffffff816526a5>] netlink_unicast+0xd5/0x1a0
[   25.210136]  [<ffffffff81652aa7>] netlink_sendmsg+0x337/0x750
[   25.212643]  [<ffffffff81600cab>] sock_sendmsg+0x8b/0xc0
[   25.214964]  [<ffffffff811a7dbf>] ? might_fault+0x5f/0xb0
[   25.217209]  [<ffffffff811a7e08>] ? might_fault+0xa8/0xb0
[   25.219443]  [<ffffffff811a7dbf>] ? might_fault+0x5f/0xb0
[   25.221785]  [<ffffffff81610a5e>] ? verify_iovec+0x5e/0xe0
[   25.224309]  [<ffffffff81601109>] ___sys_sendmsg+0x3d9/0x3f0
[   25.227001]  [<ffffffff810575e7>] ? kvm_clock_read+0x27/0x40
[   25.229696]  [<ffffffff81021a59>] ? sched_clock+0x9/0x10
[   25.232047]  [<ffffffff810b64cd>] ? sched_clock_local+0x1d/0x80
[   25.235062]  [<ffffffff810b6658>] ? sched_clock_cpu+0xa8/0x100
[   25.238061]  [<ffffffff810cb4cd>] ? trace_hardirqs_off+0xd/0x10
[   25.240594]  [<ffffffff810b679f>] ? local_clock+0x5f/0x70
[   25.243667]  [<ffffffff810cc07f>] ? lock_release_holdtime.part.29+0xf/0x190
[   25.246875]  [<ffffffff8121c85d>] ? fget_light+0xed/0x4f0
[   25.249346]  [<ffffffff8121c7ac>] ? fget_light+0x3c/0x4f0
[   25.252287]  [<ffffffff81602662>] __sys_sendmsg+0x42/0x80
[   25.254691]  [<ffffffff816026b2>] SyS_sendmsg+0x12/0x20
[   25.257184]  [<ffffffff8176e329>] system_call_fastpath+0x16/0x1b
[   25.359076] device br-tun entered promiscuous mode

Fedora release 20 (Heisenbug)
Kernel 3.13.0-0.rc4.git5.1.fc21.x86_64 on an x86_64 (ttyS0)
---------------------
Comment 2 Kashyap Chamarthy 2013-12-27 14:32:55 UTC
Created attachment 119731 [details]
Full dmesg output captured from the VM
Comment 3 Kashyap Chamarthy 2014-01-28 08:28:09 UTC
Still able to reproduce with 3.14.0-0.rc0.git9.1.fc21.x86_64:

---------------------
[  933.596722] ===============================
[  933.599241] [ INFO: suspicious RCU usage. ]
[  933.601825] 3.14.0-0.rc0.git9.1.fc21.x86_64 #1 Not tainted
[  933.605019] -------------------------------
[  933.607412] net/openvswitch/flow_table.c:436 suspicious rcu_dereference_check() usage!
[  933.611737] 
[  933.611737] other info that might help us debug this:
[  933.611737] 
[  933.616302] 
[  933.616302] rcu_scheduler_active = 1, debug_locks = 0
[  933.619503] 2 locks held by ovs-vswitchd/729:
[  933.621820]  #0:  (cb_lock){++++++}, at: [<ffffffff8168c549>] genl_rcv+0x19/0x40
[  933.626678]  #1:  (ovs_mutex){+.+.+.}, at: [<ffffffffa020fd8a>] ovs_flow_cmd_new_or_set+0xea/0x5e0 [openvswitch]
[  933.632616] 
[  933.632616] stack backtrace:
[  933.634813] CPU: 18 PID: 729 Comm: ovs-vswitchd Not tainted 3.14.0-0.rc0.git9.1.fc21.x86_64 #1
[  933.638912] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[  933.641710]  0000000000000001 ffff8804aef0f7e8 ffffffff81793c95 ffff8804aed68000
[  933.646034]  ffff8804aef0f818 ffffffff810f0227 ffff8800c4286338 ffff8804aef0f884
[  933.650781]  ffff8804ae77ba18 ffff8804ae77b9f8 ffff8804aef0f868 ffffffffa0217548
[  933.655559] Call Trace:
[  933.657021]  [<ffffffff81793c95>] dump_stack+0x4d/0x66
[  933.659752]  [<ffffffff810f0227>] lockdep_rcu_suspicious+0xe7/0x120
[  933.662843]  [<ffffffffa0217548>] ovs_flow_tbl_lookup+0x188/0x1b0 [openvswitch]
[  933.666377]  [<ffffffffa020fdc7>] ovs_flow_cmd_new_or_set+0x127/0x5e0 [openvswitch]
[  933.669879]  [<ffffffff810f1b75>] ? trace_hardirqs_on_caller+0x105/0x1d0
[  933.673063]  [<ffffffff8168e241>] ? genl_family_rcv_msg+0x2c1/0x370
[  933.676001]  [<ffffffff8168e10d>] genl_family_rcv_msg+0x18d/0x370
[  933.678936]  [<ffffffff8168e2f0>] ? genl_family_rcv_msg+0x370/0x370
[  933.681919]  [<ffffffff8168e37e>] genl_rcv_msg+0x8e/0xd0
[  933.684516]  [<ffffffff8168bff9>] netlink_rcv_skb+0xa9/0xc0
[  933.687138]  [<ffffffff8168c558>] genl_rcv+0x28/0x40
[  933.689583]  [<ffffffff8168b625>] netlink_unicast+0xd5/0x1a0
[  933.692306]  [<ffffffff8168ba27>] netlink_sendmsg+0x337/0x750
[  933.694977]  [<ffffffff81639b5b>] sock_sendmsg+0x8b/0xc0
[  933.697504]  [<ffffffff811cf1cf>] ? might_fault+0x5f/0xb0
[  933.700255]  [<ffffffff811cf218>] ? might_fault+0xa8/0xb0
[  933.702754]  [<ffffffff811cf1cf>] ? might_fault+0x5f/0xb0
[  933.705292]  [<ffffffff8164974e>] ? verify_iovec+0x5e/0xe0
[  933.707888]  [<ffffffff81639f69>] ___sys_sendmsg+0x389/0x3a
[  933.710357]  [<ffffffff8105a597>] ? kvm_clock_read+0x27/0x40
[  933.712658]  [<ffffffff81023119>] ? sched_clock+0x9/0x10
[  933.714032]  [<ffffffff810d6cde>] ? local_clock+0xe/0x30
[  933.715441]  [<ffffffff810ef01f>] ? lock_release_holdtime.part.29+0xf/0x190
[  933.717454]  [<ffffffff8124411d>] ? fget_light+0xed/0x4f0
[  933.719799]  [<ffffffff8124406c>] ? fget_light+0x3c/0x4f0
[  933.721273]  [<ffffffff8163b422>] __sys_sendmsg+0x42/0x80
[  933.722749]  [<ffffffff8163b472>] SyS_sendmsg+0x12/0x20
[  933.724359]  [<ffffffff817a6f29>] system_call_fastpath+0x16/0x1b
[  933.774858] device br-tun entered promiscuous mode
[  936.966368] systemd-udevd (530) used greatest stack depth: 3272 bytes left

Fedora release 20 (Heisenbug)
Kernel 3.14.0-0.rc0.git9.1.fc21.x86_64 on an x86_64 (ttyS0)
---------------------
Comment 4 Kashyap Chamarthy 2014-01-31 16:25:26 UTC
For reference, I briefly discussed this with Thomas Graf. "It's nothing serious, but the warnings should be silenced", he said.
Comment 5 Kashyap Chamarthy 2014-06-24 09:53:55 UTC
I just saw this again when I attempt to restart OpenvSwitch with Kernel -- 3.16.0-0.rc1.git1.1.fc21.x86_64

---------------------
[ 9695.079715] 
[ 9695.080492] ===============================
[ 9695.082559] [ INFO: suspicious RCU usage. ]
[ 9695.085105] 3.16.0-0.rc1.git1.1.fc21.x86_64 #1 Not tainted
[ 9695.088505] -------------------------------
[ 9695.090413] net/openvswitch/flow_table.c:255 suspicious rcu_dereference_protected() usage!
[ 9695.094887] 
[ 9695.094887] other info that might help us debug this:
[ 9695.094887] 
[ 9695.100722] 
[ 9695.100722] rcu_scheduler_active = 1, debug_locks = 0
[ 9695.105699] 1 lock held by ovs-vswitchd/1910:
[ 9695.108199]  #0:  (cb_lock){++++++}, at: [<ffffffff816f5159>] genl_rcv+0x19/0x40
[ 9695.113083] 
[ 9695.113083] stack backtrace:
[ 9695.115395] CPU: 0 PID: 1910 Comm: ovs-vswitchd Not tainted 3.16.0-0.rc1.git1.1.fc21.x86_64 #1
[ 9695.121638] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 9695.124076]  0000000000000000 000000002c60696c ffff8800b64ff968 ffffffff81807520
[ 9695.127732]  ffff88029ce24d70 ffff8800b64ff998 ffffffff810fd7f7 0000000000000000
[ 9695.131321]  ffff88029ccc6d20 ffff8800b64ffa90 ffff88029ccc6d00 ffff8800b64ff9b8
[ 9695.134931] Call Trace:
[ 9695.136158]  [<ffffffff81807520>] dump_stack+0x4d/0x66
[ 9695.139031]  [<ffffffff810fd7f7>] lockdep_rcu_suspicious+0xe7/0x120
[ 9695.141850]  [<ffffffffa022abb0>] ovs_flow_tbl_destroy+0x60/0x70 [openvswitch]
[ 9695.145034]  [<ffffffffa02247b1>] ovs_dp_cmd_new+0x311/0x4b0 [openvswitch]
[ 9695.148665]  [<ffffffff816f621c>] genl_family_rcv_msg+0x1bc/0x3e0
[ 9695.152987]  [<ffffffff81024369>] ? sched_clock+0x9/0x10
[ 9695.157925]  [<ffffffff810e1afd>] ? sched_clock_local+0x1d/0x90
[ 9695.162486]  [<ffffffff816f64c4>] genl_rcv_msg+0x84/0xc0
[ 9695.166797]  [<ffffffff816f6440>] ? genl_family_rcv_msg+0x3e0/0x3e0
[ 9695.171411]  [<ffffffff816f4b89>] netlink_rcv_skb+0xa9/0xd0
[ 9695.175809]  [<ffffffff816f5168>] genl_rcv+0x28/0x40
[ 9695.179489]  [<ffffffff816f4185>] netlink_unicast+0x125/0x1a0
[ 9695.184300]  [<ffffffff816f455f>] netlink_sendmsg+0x35f/0x7d0
[ 9695.188510]  [<ffffffff8169b57e>] sock_sendmsg+0x9e/0xe0
[ 9695.192580]  [<ffffffff811f189e>] ? might_fault+0x5e/0xc0
[ 9695.196850]  [<ffffffff811f18f9>] ? might_fault+0xb9/0xc0
[ 9695.200907]  [<ffffffff811f189e>] ? might_fault+0x5e/0xc0
[ 9695.204862]  [<ffffffff8169c5b8>] ___sys_sendmsg+0x408/0x420
[ 9695.208962]  [<ffffffff8126ed5d>] ? __fget_light+0x13d/0x160
[ 9695.213088]  [<ffffffff8126ed5d>] ? __fget_light+0x13d/0x160
[ 9695.217652]  [<ffffffff8169cda1>] __sys_sendmsg+0x51/0x90
[ 9695.220513]  [<ffffffff8169cdf2>] SyS_sendmsg+0x12/0x20
[ 9695.223422]  [<ffffffff818110e9>] system_call_fastpath+0x16/0x1b
---------------------

Note You need to log in before you can comment on or make changes to this bug.