Network namespace cleanup sometimes crashes while cleaning up NAT conntrack entries. In my case, this has been happening intermittently on an Amazon EC2 instance running several LXC containers, each with its own network namespace. Traffic is redirected to/from the containers using iptables NAT port mapping. When a container is being shut down, I simultaneously issue iptables commands to remove the NAT port mappings for that container. I have seen this crash a few times now, every time with the same call trace.

[9661983.179708] BUG: unable to handle kernel paging request at ffffc90001821718
[9661983.179725] IP: [<ffffffffa02cf36d>] nf_nat_cleanup_conntrack+0x3d/0x80 [nf_nat]
[9661983.179740] PGD ea41e067 PUD ea41f067 PMD 69279067 PTE 0
[9661983.179750] Oops: 0002 [#1] SMP
[9661983.179757] Modules linked in: fuse veth xt_nat xt_comment overlayfs sit tunnel4 ip_tunnel ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack coretemp nf_conntrack btrfs crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel xor raid6_pq bridge aes_x86_64 lrw crc32c gf128mul libcrc32c stp glue_helper zlib_deflate llc ablk_helper cryptd microcode xen_netfront iptable_filter ip_tables x_tables ext4 crc16 mbcache jbd2 xen_blkfront
[9661983.179834] CPU: 0 PID: 19984 Comm: kworker/u2:2 Not tainted 3.10.17-1-ec2 #1
[9661983.179848] Workqueue: netns cleanup_net
[9661983.179854] task: ffff88003eee5d00 ti: ffff8800b1a08000 task.ti: ffff8800b1a08000
[9661983.179861] RIP: e030:[<ffffffffa02cf36d>]  [<ffffffffa02cf36d>] nf_nat_cleanup_conntrack+0x3d/0x80 [nf_nat]
[9661983.179872] RSP: e02b:ffff8800b1a09cc8  EFLAGS: 00010246
[9661983.179877] RAX: 0000000000000000 RBX: ffff880047cd9f88 RCX: 0000000000000000
[9661983.179883] RDX: ffffc90001821718 RSI: 0000000000000006 RDI: ffffffffa02d1d38
[9661983.179889] RBP: ffff8800b1a09cd0 R08: 0000000000000000 R09: ffff8800ef616e60
[9661983.179896] R10: ffffea000150b9c0 R11: ffffffff8115db46 R12: ffff880047cd9f00
[9661983.179902] R13: ffff88004d1af000 R14: ffff88004d1af008 R15: ffff88001dd01000
[9661983.179914] FS:  00007f3435e80700(0000) GS:ffff8800ef600000(0000) knlGS:0000000000000000
[9661983.179921] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[9661983.179927] CR2: ffffc90001821718 CR3: 00000000c11aa000 CR4: 0000000000002660
[9661983.179933] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[9661983.179940] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[9661983.179945] Stack:
[9661983.179949]  0000000000000001 ffff8800b1a09cf8 ffffffffa02970f4 ffff88004d1af000
[9661983.179960]  ffff88001dd01000 ffffffffa02c5460 ffff8800b1a09d18 ffffffffa028f625
[9661983.179970]  ffff88004d1af000 ffff88001dd01000 ffff8800b1a09d38 ffffffffa028f711
[9661983.179980] Call Trace:
[9661983.179993]  [<ffffffffa02970f4>] __nf_ct_ext_destroy+0x44/0x60 [nf_conntrack]
[9661983.180005]  [<ffffffffa028f625>] nf_conntrack_free+0x25/0x60 [nf_conntrack]
[9661983.180015]  [<ffffffffa028f711>] destroy_conntrack+0xb1/0xc0 [nf_conntrack]
[9661983.180027]  [<ffffffffa02941a0>] ? nf_conntrack_helper_fini+0x30/0x30 [nf_conntrack]
[9661983.180037]  [<ffffffff813d0cae>] nf_conntrack_destroy+0x1e/0x20
[9661983.180047]  [<ffffffffa028f4ca>] nf_ct_iterate_cleanup+0x5a/0x160 [nf_conntrack]
[9661983.180060]  [<ffffffffa0294568>] nf_ct_l3proto_pernet_unregister+0x18/0x20 [nf_conntrack]
[9661983.180069]  [<ffffffffa02c4499>] ipv4_net_exit+0x19/0x50 [nf_conntrack_ipv4]
[9661983.180079]  [<ffffffff8139be6d>] ops_exit_list.isra.4+0x4d/0x70
[9661983.180086]  [<ffffffff8139c698>] cleanup_net+0x148/0x1e0
[9661983.180098]  [<ffffffff8107178d>] process_one_work+0x22d/0x3e0
[9661983.180106]  [<ffffffff81072786>] worker_thread+0x226/0x3b0
[9661983.180114]  [<ffffffff81072560>] ? manage_workers.isra.18+0x350/0x350
[9661983.180125]  [<ffffffff810783bb>] kthread+0xbb/0xd0
[9661983.180133]  [<ffffffff81078300>] ? kthread_stop+0x100/0x100
[9661983.180142]  [<ffffffff814a1c2c>] ret_from_fork+0x7c/0xb0
[9661983.180149]  [<ffffffff81078300>] ? kthread_stop+0x100/0x100
[9661983.180154] Code: 48 89 e5 53 0f b6 58 11 84 db 75 4a eb 50 48 83 7b 20 00 74 49 48 c7 c7 38 1d 2d a0 e8 ad a0 1c e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 02 20 00 00 00 ad de 48 c7
[9661983.180231] RIP  [<ffffffffa02cf36d>] nf_nat_cleanup_conntrack+0x3d/0x80 [nf_nat]
[9661983.180240]  RSP <ffff8800b1a09cc8>
[9661983.180244] CR2: ffffc90001821718
[9661983.180253] ---[ end trace d8d85302e25b48d2 ]---
[9661983.180259] Kernel panic - not syncing: Fatal exception in interrupt
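If I'm reading the Code: bytes right, the faulting instruction is a write through RDX (48 89 02 = mov %rax,(%rdx)), and RDX matches the CR2 address. In 3.10/3.13-era kernels nf_nat_cleanup_conntrack() looks roughly like the sketch below (paraphrased from net/netfilter/nf_nat_core.c; exact code differs between versions), so the faulting write would be the one hlist_del_rcu() performs through the node's pprev pointer:

/* Paraphrased from net/netfilter/nf_nat_core.c (3.10-3.13 era), not an
 * exact copy. Runs when a conntrack entry carrying NAT state is
 * destroyed, and unlinks it from the per-netns nat_bysource hash. */
static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
	struct nf_conn_nat *nat = nf_ct_ext_find(ct, NF_CT_EXT_NAT);

	if (nat == NULL || nat->ct == NULL)
		return;

	NF_CT_ASSERT(nat->ct->status & IPS_SRC_NAT_DONE);

	spin_lock_bh(&nf_nat_lock);
	hlist_del_rcu(&nat->bysource);	/* writes *pprev: oopses if pprev
					 * points into freed memory */
	spin_unlock_bh(&nf_nat_lock);
}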
Can you test 3.12 on that setup and see if it is stable then?
I'll try to get a reliable test case going in the next few days. This issue is occurring on production servers, so I'm reluctant to upgrade away from long-term stable there.
I think I'm getting the same problem (this is in an LXC environment) on a development server, so feel free to ask for more info. This is on a 3.13.1 kernel and triggers on UP as well as SMP fairly consistently.

      KERNEL: /home/wwadge/linux-3.13.1/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 1
        DATE: Sat Feb 1 14:34:00 2014
      UPTIME: 21:15:10
LOAD AVERAGE: 0.10, 0.09, 0.06
       TASKS: 498
    NODENAME: cds-dws-centos
     RELEASE: 3.13.1
     VERSION: #3 SMP Fri Jan 31 16:51:22 GMT 2014
     MACHINE: x86_64  (2660 Mhz)
      MEMORY: 2 GB
             WARNING: log buf data structure(s) have changed
       PANIC: ""
         PID: 11536
     COMMAND: "kworker/u2:1"
        TASK: ffff880037de12d0  [THREAD_INFO: ffff880000048000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 11536  TASK: ffff880037de12d0  CPU: 0   COMMAND: "kworker/u2:1"
 #0 [ffff880000049850] machine_kexec at ffffffff8104ca40
 #1 [ffff8800000498c0] crash_kexec at ffffffff810e26e8
 #2 [ffff880000049990] oops_end at ffffffff81618258
 #3 [ffff8800000499c0] no_context at ffffffff81056a1e
 #4 [ffff880000049a10] __bad_area_nosemaphore at ffffffff81056c1d
 #5 [ffff880000049a60] bad_area_nosemaphore at ffffffff81056d33
 #6 [ffff880000049a70] __do_page_fault at ffffffff8161b046
 #7 [ffff880000049b90] do_page_fault at ffffffff8161b26e
 #8 [ffff880000049ba0] page_fault at ffffffff81617688
    [exception RIP: nf_nat_cleanup_conntrack+70]
    RIP: ffffffffa0331236  RSP: ffff880000049c58  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8800632ebe08  RCX: ffff8800566e2c18
    RDX: ffffc90011b9e288  RSI: 0000000000000006  RDI: ffffffffa03343e8
    RBP: ffff880000049c68   R8: dead000000200200   R9: dead000000200200
    R10: dead000000200200  R11: ffffffc000000030  R12: ffff8800632ebd81
    R13: ffff8800566e2c18  R14: ffff880000049d14  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff880000049c70] __nf_ct_ext_destroy at ffffffffa023a2b5 [nf_conntrack]
#10 [ffff880000049ca0] nf_conntrack_free at ffffffffa0231fdf [nf_conntrack]
#11 [ffff880000049cc0] destroy_conntrack at ffffffffa0232ee2 [nf_conntrack]
#12 [ffff880000049ce0] nf_conntrack_destroy at ffffffff8157d787
#13 [ffff880000049cf0] nf_ct_iterate_cleanup at ffffffffa02323f8 [nf_conntrack]
#14 [ffff880000049d50] nf_ct_l3proto_pernet_unregister at ffffffffa02374c9 [nf_conntrack]
#15 [ffff880000049d70] ipv4_net_exit at ffffffffa0250e2d [nf_conntrack_ipv4]
#16 [ffff880000049d90] ops_exit_list at ffffffff81546269
#17 [ffff880000049dc0] cleanup_net at ffffffff81546843
#18 [ffff880000049e00] process_one_work at ffffffff8108468c
#19 [ffff880000049e50] worker_thread at ffffffff81085ae3
#20 [ffff880000049ec0] kthread at ffffffff8108b73e
#21 [ffff880000049f50] ret_from_fork at ffffffff8161f9fc

crash> net
   NET_DEVICE       NAME        IP ADDRESS(ES)
ffff88007ca9a000    lo          127.0.0.1
ffff88007b746000    eth0        10.20.70.9
ffff880075730000    virbr0      192.168.122.1
ffff88007c0dc000    virbr0-nic
ffff880075760000    docker0     172.17.42.1
ffff88006e5ea000    vethQBjNiA
ffff88006e42b000    vethKx1UUw
ffff88003ae94000    vethPPPQ5A
ffff88006334e000    veth9Lc57z
ffff8800566c0000    vethdlEGPB
ffff88003e8d2000    vethoU41sj

I can provide crashlog + .config too if need be.
On Sat, Feb 01, 2014 at 04:54:21PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=65191
>
> Wallace Wadge <wwadge@gmail.com> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |wwadge@gmail.com
>
> --- Comment #3 from Wallace Wadge <wwadge@gmail.com> ---
> I think I'm getting the same problem (this in an LXC environment) on a
> development server so feel free to ask for more info.
>
> This is on a 3.13.1 kernel and triggers on UP as well as SMP fairly
> consistently.

Could you turn on CONFIG_NETFILTER_DEBUG and see if the assertion in nf_nat_cleanup_conntrack() hits, please?
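(For reference: NF_CT_ASSERT() only does anything when CONFIG_NETFILTER_DEBUG is set, in which case it is a WARN_ON; roughly, from the 3.13-era include/net/netfilter/nf_conntrack.h:

/* NF_CT_ASSERT compiles to nothing without CONFIG_NETFILTER_DEBUG */
#ifdef CONFIG_NETFILTER_DEBUG
#define NF_CT_ASSERT(x)		WARN_ON(!(x))
#else
#define NF_CT_ASSERT(x)
#endif

So if the assertion fires, it shows up as a WARN_ON stack trace in dmesg before the oops; it does not stop the crash by itself.)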
Enabled it, but I don't see any asserts. I upgraded my crash utility, so I got a little more info:

---
[ 1515.714065] BUG: unable to handle kernel paging request at ffffc90011b95fc8
[ 1515.714118] IP: [<ffffffffa033623e>] nf_nat_cleanup_conntrack+0x4e/0x90 [nf_nat]
[ 1515.714167] PGD 7f851067 PUD 7f852067 PMD 74426067 PTE 0
[ 1515.714208] Oops: 0002 [#1] SMP
[ 1515.714240] Modules linked in: veth xt_nat xt_addrtype xt_conntrack ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT xt_CHECKSUM iptable_mangle tun bridge stp llc dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6_tables ppdev vmw_balloon pcspkr parport_pc parport e1000 sg vmw_vmci i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif crct10dif_common sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix floppy vmwgfx ttm dm_mirror dm_region_hash dm_log dm_mod
[ 1515.714656] CPU: 0 PID: 10628 Comm: kworker/u2:4 Not tainted 3.13.1 #4
[ 1515.714698] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
[ 1515.714774] Workqueue: netns cleanup_net
[ 1515.714806] task: ffff88001c4ea610 ti: ffff88001c4f0000 task.ti: ffff88001c4f0000
[ 1515.714855] RIP: 0010:[<ffffffffa033623e>]  [<ffffffffa033623e>] nf_nat_cleanup_conntrack+0x4e/0x90 [nf_nat]
[ 1515.714897] RSP: 0000:ffff88001c4f1c58  EFLAGS: 00010246
[ 1515.714917] RAX: 0000000000000000 RBX: ffff88003af1eb88 RCX: ffff880062792c18
[ 1515.714940] RDX: ffffc90011b95fc8 RSI: dead000000200200 RDI: ffffffffa03393e8
[ 1515.714962] RBP: ffff88001c4f1c68 R08: 0000000000000000 R09: dead000000200200
[ 1515.714986] R10: dead000000200200 R11: ffffffc000000030 R12: ffff88003af1eb01
[ 1515.715009] R13: ffff880062792c18 R14: ffff88001c4f1d14 R15: 0000000000000000
[ 1515.715033] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1515.715060] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1515.715078] CR2: ffffc90011b95fc8 CR3: 000000001df89000 CR4: 00000000000007f0
[ 1515.715173] Stack:
[ 1515.715178]  ffff88001c4f1c78 ffffffffa0246c28 ffff88001c4f1c98 ffffffffa023f545
[ 1515.715208]  ffff88001c4f1ce8 ffff880062792c18 ffff88000065e400 ffffffffa02583a0
[ 1515.715238]  ffff88001c4f1cb8 ffffffffa023709f ffff880062792c18 ffff88000065e400
[ 1515.715266] Call Trace:
[ 1515.715983]  [<ffffffffa023f545>] __nf_ct_ext_destroy+0x45/0x70 [nf_conntrack]
[ 1515.716664]  [<ffffffffa023709f>] nf_conntrack_free+0x2f/0x70 [nf_conntrack]
[ 1515.717345]  [<ffffffffa02384ea>] destroy_conntrack+0xba/0x110 [nf_conntrack]
[ 1515.718041]  [<ffffffffa023c1e0>] ? nf_conntrack_helper_unregister+0xc0/0xc0 [nf_conntrack]
[ 1515.718726]  [<ffffffff8157d6c7>] nf_conntrack_destroy+0x17/0x30
[ 1515.719405]  [<ffffffffa0238208>] nf_ct_iterate_cleanup+0x78/0xb0 [nf_conntrack]
[ 1515.720087]  [<ffffffffa023c759>] nf_ct_l3proto_pernet_unregister+0x39/0x80 [nf_conntrack]
[ 1515.720765]  [<ffffffffa0255ecd>] ipv4_net_exit+0x1d/0x60 [nf_conntrack_ipv4]
[ 1515.721437]  [<ffffffff81546269>] ops_exit_list+0x39/0x60
[ 1515.722104]  [<ffffffff81546843>] cleanup_net+0x103/0x1b0
[ 1515.722752]  [<ffffffff8108468c>] process_one_work+0x17c/0x420
[ 1515.723385]  [<ffffffff81085ae3>] worker_thread+0x123/0x400
[ 1515.724000]  [<ffffffff810859c0>] ? manage_workers+0x170/0x170
[ 1515.724600]  [<ffffffff8108b73e>] kthread+0xce/0xf0
[ 1515.725180]  [<ffffffff8108b670>] ? kthread_freezable_should_stop+0x70/0x70
[ 1515.725752]  [<ffffffff8161fa7c>] ret_from_fork+0x7c/0xb0
[ 1515.726306]  [<ffffffff8108b670>] ? kthread_freezable_should_stop+0x70/0x70
[ 1515.726860] Code: b6 db 48 8d 1c 18 48 8b 43 10 48 85 c0 74 3f 80 78 78 00 79 40 48 c7 c7 e8 93 33 a0 e8 1c 0d 2e e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b9 00 02 20 00 00 00 ad de 48 c7
[ 1515.728048] RIP  [<ffffffffa033623e>] nf_nat_cleanup_conntrack+0x4e/0x90 [nf_nat]
[ 1515.728622]  RSP <ffff88001c4f1c58>
[ 1515.729181] CR2: ffffc90011b95fc8
crash>
---
And I'm not sure if I got the address right in this debug attempt, but here's some more info:

---
crash> bt
PID: 10628  TASK: ffff88001c4ea610  CPU: 0   COMMAND: "kworker/u2:4"
 #0 [ffff88001c4f1850] machine_kexec at ffffffff8104ca40
 #1 [ffff88001c4f18c0] crash_kexec at ffffffff810e26e8
 #2 [ffff88001c4f1990] oops_end at ffffffff816182d8
 #3 [ffff88001c4f19c0] no_context at ffffffff81056a1e
 #4 [ffff88001c4f1a10] __bad_area_nosemaphore at ffffffff81056c1d
 #5 [ffff88001c4f1a60] bad_area_nosemaphore at ffffffff81056d33
 #6 [ffff88001c4f1a70] __do_page_fault at ffffffff8161b0c6
 #7 [ffff88001c4f1b90] do_page_fault at ffffffff8161b2ee
 #8 [ffff88001c4f1ba0] page_fault at ffffffff81617708
    [exception RIP: nf_nat_cleanup_conntrack+78]
    RIP: ffffffffa033623e  RSP: ffff88001c4f1c58  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff88003af1eb88  RCX: ffff880062792c18
    RDX: ffffc90011b95fc8  RSI: dead000000200200  RDI: ffffffffa03393e8
    RBP: ffff88001c4f1c68   R8: 0000000000000000   R9: dead000000200200
    R10: dead000000200200  R11: ffffffc000000030  R12: ffff88003af1eb01
    R13: ffff880062792c18  R14: ffff88001c4f1d14  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff88001c4f1c70] __nf_ct_ext_destroy at ffffffffa023f545 [nf_conntrack]
#10 [ffff88001c4f1ca0] nf_conntrack_free at ffffffffa023709f [nf_conntrack]
#11 [ffff88001c4f1cc0] destroy_conntrack at ffffffffa02384ea [nf_conntrack]
#12 [ffff88001c4f1ce0] nf_conntrack_destroy at ffffffff8157d6c7
#13 [ffff88001c4f1cf0] nf_ct_iterate_cleanup at ffffffffa0238208 [nf_conntrack]
#14 [ffff88001c4f1d50] nf_ct_l3proto_pernet_unregister at ffffffffa023c759 [nf_conntrack]
#15 [ffff88001c4f1d70] ipv4_net_exit at ffffffffa0255ecd [nf_conntrack_ipv4]
#16 [ffff88001c4f1d90] ops_exit_list at ffffffff81546269
#17 [ffff88001c4f1dc0] cleanup_net at ffffffff81546843
#18 [ffff88001c4f1e00] process_one_work at ffffffff8108468c
#19 [ffff88001c4f1e50] worker_thread at ffffffff81085ae3
#20 [ffff88001c4f1ec0] kthread at ffffffff8108b73e
#21 [ffff88001c4f1f50] ret_from_fork at ffffffff8161fa7c

crash> struct nf_conn ffff880062792c18
struct nf_conn {
  ct_general = { use = { counter = 0 } },
  lock = { { rlock = { raw_lock = { { head_tail = 445651600, tickets = { head = 6800, tail = 6800 } } } } } },
  tuplehash = {{
      hnnode = { next = 0x80000003, pprev = 0xdead000000200200 },
      tuple = {
        src = {
          u3 = { all = {335548844, 0, 0, 0}, ip = 335548844, ip6 = {335548844, 0, 0, 0}, in = { s_addr = 335548844 }, in6 = { in6_u = { u6_addr8 = "\254\021\000\024\000\000\000\000\000\000\000\000\000\000\000", u6_addr16 = {4524, 5120, 0, 0, 0, 0, 0, 0}, u6_addr32 = {335548844, 0, 0, 0} } } },
          u = { all = 15256, tcp = { port = 15256 }, udp = { port = 15256 }, icmp = { id = 15256 }, dccp = { port = 15256 }, sctp = { port = 15256 }, gre = { key = 15256 } },
          l3num = 2
        },
        dst = {
          u3 = { all = {3054244874, 0, 0, 0}, ip = 3054244874, ip6 = {3054244874, 0, 0, 0}, in = { s_addr = 3054244874 }, in6 = { in6_u = { u6_addr8 = "\n\024\f\266\000\000\000\000\000\000\000\000\000\000\000", u6_addr16 = {5130, 46604, 0, 0, 0, 0, 0, 0}, u6_addr32 = {3054244874, 0, 0, 0} } } },
          u = { all = 37151, tcp = { port = 37151 }, udp = { port = 37151 }, icmp = { type = 31 '\037', code = 145 '\221' }, dccp = { port = 37151 }, sctp = { port = 37151 }, gre = { key = 37151 } },
          protonum = 6 '\006',
          dir = 0 '\000'
        }
      }
    }, {
      hnnode = { next = 0x4727, pprev = 0xdead000000200200 },
      tuple = {
        src = {
          u3 = { all = {3054244874, 0, 0, 0}, ip = 3054244874, ip6 = {3054244874, 0, 0, 0}, in = { s_addr = 3054244874 }, in6 = { in6_u = { u6_addr8 = "\n\024\f\266\000\000\000\000\000\000\000\000\000\000\000", u6_addr16 = {5130, 46604, 0, 0, 0, 0, 0, 0}, u6_addr32 = {3054244874, 0, 0, 0} } } },
          u = { all = 37151, tcp = { port = 37151 }, udp = { port = 37151 }, icmp = { id = 37151 }, dccp = { port = 37151 }, sctp = { port = 37151 }, gre = { key = 37151 } },
          l3num = 2
        },
        dst = {
          u3 = { all = {335548844, 0, 0, 0}, ip = 335548844, ip6 = {335548844, 0, 0, 0}, in = { s_addr = 335548844 }, in6 = { in6_u = { u6_addr8 = "\254\021\000\024\000\000\000\000\000\000\000\000\000\000\000", u6_addr16 = {4524, 5120, 0, 0, 0, 0, 0, 0}, u6_addr32 = {335548844, 0, 0, 0} } } },
          u = { all = 15256, tcp = { port = 15256 }, udp = { port = 15256 }, icmp = { type = 152 '\230', code = 59 ';' }, dccp = { port = 15256 }, sctp = { port = 15256 }, gre = { key = 15256 } },
          protonum = 6 '\006',
          dir = 1 '\001'
        }
      }
    }},
  status = 910,
  master = 0x0,
  timeout = { entry = { next = 0x0, prev = 0xdead000000200200 }, expires = 4296302640, base = 0xffffffff81fb4f80 <boot_tvec_bases>, function = 0xffffffffa0238410 <death_by_timeout>, data = 18446612133966326808, slack = -1, start_pid = -1, start_site = 0x0, start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" },
  mark = 0,
  secmark = 0,
  ext = 0xffff88003af1eb00,
  ct_net = 0xffff88000065e400,
  proto = {
    dccp = { role = "\377\327", state = 116 't', last_pkt = 113 'q', last_dir = 255 '\377', handshake_seq = 15620773601466063744 },
    sctp = { state = 1903482879, vtag = {1903499519, 3087232} },
    tcp = {
      seen = {{ td_end = 1903482879, td_maxend = 1903499519, td_maxwin = 3087232, td_maxack = 3636994772, td_scale = 7 '\a', flags = 39 '\'' }, { td_end = 3636994772, td_maxend = 3640082004, td_maxwin = 16640, td_maxack = 1903482879, td_scale = 7 '\a', flags = 35 '#' }},
      state = 7 '\a',
      last_dir = 0 '\000',
      retrans = 0 '\000',
      last_index = 3 '\003',
      last_seq = 1903482879,
      last_ack = 3636994772,
      last_end = 1903482879,
      last_win = 7040,
      last_wscale = 0 '\000',
      last_flags = 0 '\000'
    },
    gre = { stream_timeout = 1903482879, timeout = 1903499519 }
  }
}
crash>
We've also been seeing this panic on several of our bare-metal machines. Unfortunately I'm unable to gather crashdumps on these systems due to an unrelated hardware issue.

Kernel version: 3.13.0-24-generic SMP i686
Distro: Ubuntu trusty

We're seeing this when using a bridge to NAT LXC containers with libvirt. The following sysctl tunings seem to have alleviated the symptoms for us, though we're still testing:

sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.nf_conntrack_max=1048576
This has also been reproduced at Heroku with the v3.15-rc2 kernel:

[17345307.967478] BUG: unable to handle kernel paging request at ffffc90003777a70
[17345307.967497] IP: [<ffffffffa013f0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[17345307.967510] PGD 1b6425067 PUD 1b6426067 PMD 1b0aed067 PTE 0
[17345307.967519] Oops: 0002 [#1] SMP
[17345307.967525] Modules linked in: xt_nat veth tcp_diag inet_diag xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat bridge stp llc xt_owner isofs ipt_REJECT xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables dm_crypt microcode raid10 raid456 async_memcpy async_raid6_recov async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear
[17345307.967578] CPU: 0 PID: 6 Comm: kworker/u16:0 Not tainted 3.15.0-031500rc2-generic #201404201435
[17345307.967591] Workqueue: netns cleanup_net
[17345307.967596] task: ffff8801b39d0000 ti: ffff8801b39cc000 task.ti: ffff8801b39cc000
[17345307.967601] RIP: e030:[<ffffffffa013f0b6>]  [<ffffffffa013f0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[17345307.967611] RSP: e02b:ffff8801b39cdc48  EFLAGS: 00010246
[17345307.967617] RAX: 0000000000000000 RBX: ffff8801b1cf7b10 RCX: ffff880003110000
[17345307.967624] RDX: ffffc90003777a70 RSI: 0000000000000200 RDI: ffffffffa01434e0
[17345307.967630] RBP: ffff8801b39cdc58 R08: 0000000058690aeb R09: 00000000e834b0f3
[17345307.967636] R10: ffff880003110070 R11: 0000000000000002 R12: ffff8801b1cf7a80
[17345307.967643] R13: ffff880003110000 R14: 0000000000000000 R15: 0000000000000000
[17345307.967655] FS:  00007fd6682cc740(0000) GS:ffff8801bec00000(0000) knlGS:0000000000000000
[17345307.967662] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17345307.967667] CR2: ffffc90003777a70 CR3: 00000000058f9000 CR4: 0000000000002660
[17345307.967673] Stack:
[17345307.967677]  ffffffff8176ce52 0000000000000001 ffff8801b39cdc88 ffffffffa00d1013
[17345307.967686]  0000000000000000 ffff880003110000 ffff8800ca7b8000 ffffffffa00e5140
[17345307.967694]  ffff8801b39cdca8 ffffffffa00c867e ffff880003110000 ffff8800ca7b8000
[17345307.967703] Call Trace:
[17345307.967712]  [<ffffffff8176ce52>] ? _raw_spin_lock+0x12/0x50
[17345307.967726]  [<ffffffffa00d1013>] __nf_ct_ext_destroy+0x43/0x60 [nf_conntrack]
[17345307.967736]  [<ffffffffa00c867e>] nf_conntrack_free+0x2e/0x70 [nf_conntrack]
[17345307.967746]  [<ffffffffa00c946e>] destroy_conntrack+0x9e/0xf0 [nf_conntrack]
[17345307.967756]  [<ffffffffa00cdc40>] ? nf_conntrack_helper_fini+0x30/0x30 [nf_conntrack]
[17345307.967766]  [<ffffffff81686617>] nf_conntrack_destroy+0x17/0x20
[17345307.967775]  [<ffffffffa00c9358>] nf_ct_iterate_cleanup+0x78/0xb0 [nf_conntrack]
[17345307.967786]  [<ffffffffa00cdd1d>] nf_ct_l3proto_pernet_unregister+0x1d/0x20 [nf_conntrack]
[17345307.967796]  [<ffffffffa00e355d>] ipv4_net_exit+0x1d/0x60 [nf_conntrack_ipv4]
[17345307.967804]  [<ffffffff8164b918>] ops_exit_list.isra.1+0x38/0x60
[17345307.967811]  [<ffffffff8164c222>] cleanup_net+0x112/0x230
[17345307.967820]  [<ffffffff81085e2f>] process_one_work+0x17f/0x4c0
[17345307.967827]  [<ffffffff81086d7b>] worker_thread+0x11b/0x3d0
[17345307.967833]  [<ffffffff81086c60>] ? manage_workers.isra.21+0x190/0x190
[17345307.967841]  [<ffffffff8108de69>] kthread+0xc9/0xe0
[17345307.967846]  [<ffffffff8108dda0>] ? flush_kthread_worker+0xb0/0xb0
[17345307.967854]  [<ffffffff8177647c>] ret_from_fork+0x7c/0xb0
[17345307.967860]  [<ffffffff8108dda0>] ? flush_kthread_worker+0xb0/0xb0
[17345307.967865] Code: b7 58 12 66 85 db 74 46 0f b7 db 48 01 c3 48 83 7b 10 00 74 39 48 c7 c7 e0 34 14 a0 e8 14 db 62 e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 ba 00 02 20 00 00 00 ad de 48 c7
[17345307.967917] RIP  [<ffffffffa013f0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[17345307.967925]  RSP <ffff8801b39cdc48>
[17345307.967928] CR2: ffffc90003777a70
[17345307.967933] ---[ end trace 84d4f3185a40459f ]---
[17345307.967938] Kernel panic - not syncing: Fatal exception in interrupt
This also happens with the (almost) current mainline kernel, from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

Commit for this build was: ed8c37e158cb697df905d6b4933bc107c69e8936

Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 618, in <module>
    sys.exit(main())
  File "/usr/bin/cloud-init", line 614, in main
    get_uptime=True, func=functor, args=(name, args))
  File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1875, in log_time
    ret = func(*args, **kwargs)
  File "/usr/bin/cloud-init", line 510, in status_wrapper
    atomic_write_json(status_path, status)
  File "/usr/bin/cloud-init", line 434, in atomic_write_json
    raise e
OSError: [Errno 2] No such file or directory: '/var/lib/cloud/data/tmpMBxCza'

[474208.150506] BUG: unable to handle kernel paging request at ffffc90003661288
[474208.150524] IP: [<ffffffffa013a0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[474208.150536] PGD 1b6423067 PUD 1b6424067 PMD 1b255b067 PTE 0
[474208.150544] Oops: 0002 [#1] SMP
[474208.150549] Modules linked in: xt_nat veth tcp_diag inet_diag xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat bridge stp llc xt_owner ipt_REJECT xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables isofs dm_crypt raid10 raid456 async_memcpy async_raid6_recov async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear
[474208.150602] CPU: 3 PID: 6 Comm: kworker/u16:0 Not tainted 3.15.0-999-generic #201404300254
[474208.150614] Workqueue: netns cleanup_net
[474208.150619] task: ffff8801b39d0000 ti: ffff8801b39c6000 task.ti: ffff8801b39c6000
[474208.150625] RIP: e030:[<ffffffffa013a0b6>]  [<ffffffffa013a0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[474208.150634] RSP: e02b:ffff8801b39c7c48  EFLAGS: 00010246
[474208.150639] RAX: 0000000000000000 RBX: ffff8801af4e5510 RCX: ffff8801b2040000
[474208.150645] RDX: ffffc90003661288 RSI: 0000000000000200 RDI: ffffffffa013d4e0
[474208.150651] RBP: ffff8801b39c7c58 R08: 00000000f72af2f7 R09: 0000000002eb94ae
[474208.150657] R10: ffff8801b2040070 R11: 0000000000000002 R12: ffff8801af4e5480
[474208.150664] R13: ffff8801b2040000 R14: 0000000000000000 R15: 0000000000000000
[474208.150674] FS:  00007fb3072be700(0000) GS:ffff8801becc0000(0000) knlGS:0000000000000000
[474208.150681] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[474208.150686] CR2: ffffc90003661288 CR3: 00000000f54bb000 CR4: 0000000000002660
[474208.150692] Stack:
[474208.150695]  ffffffff81763762 0000000000000001 ffff8801b39c7c88 ffffffffa00f2f53
[474208.150704]  0000000000000000 ffff8801b2040000 ffff88008c4e0000 ffffffffa00d1140
[474208.150712]  ffff8801b39c7ca8 ffffffffa00ea67e ffff8801b2040000 ffff88008c4e0000
[474208.150721] Call Trace:
[474208.150729]  [<ffffffff81763762>] ? _raw_spin_lock+0x12/0x50
[474208.150744]  [<ffffffffa00f2f53>] __nf_ct_ext_destroy+0x43/0x60 [nf_conntrack]
[474208.150755]  [<ffffffffa00ea67e>] nf_conntrack_free+0x2e/0x70 [nf_conntrack]
[474208.150765]  [<ffffffffa00eb46e>] destroy_conntrack+0x9e/0xf0 [nf_conntrack]
[474208.150775]  [<ffffffffa00efbd0>] ? nf_conntrack_helper_fini+0x30/0x30 [nf_conntrack]
[474208.150785]  [<ffffffff8167f367>] nf_conntrack_destroy+0x17/0x20
[474208.150794]  [<ffffffffa00eb358>] nf_ct_iterate_cleanup+0x78/0xb0 [nf_conntrack]
[474208.150804]  [<ffffffffa00efcad>] nf_ct_l3proto_pernet_unregister+0x1d/0x20 [nf_conntrack]
[474208.150814]  [<ffffffffa00cf53d>] ipv4_net_exit+0x1d/0x60 [nf_conntrack_ipv4]
[474208.150822]  [<ffffffff816450b8>] ops_exit_list.isra.1+0x38/0x60
[474208.150828]  [<ffffffff816459c2>] cleanup_net+0x112/0x230
[474208.150837]  [<ffffffff81087b4f>] process_one_work+0x17f/0x4c0
[474208.150843]  [<ffffffff81088a7b>] worker_thread+0x11b/0x3d0
[474208.150850]  [<ffffffff81088960>] ? manage_workers.isra.21+0x190/0x190
[474208.150857]  [<ffffffff8108fb49>] kthread+0xc9/0xe0
[474208.150863]  [<ffffffff8108fa80>] ? flush_kthread_worker+0xb0/0xb0
[474208.150871]  [<ffffffff8176cc7c>] ret_from_fork+0x7c/0xb0
[474208.150877]  [<ffffffff8108fa80>] ? flush_kthread_worker+0xb0/0xb0
[474208.150882] Code: b7 58 12 66 85 db 74 46 0f b7 db 48 01 c3 48 83 7b 10 00 74 39 48 c7 c7 e0 d4 13 a0 e8 24 94 62 e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 ba 00 02 20 00 00 00 ad de 48 c7
[474208.150936] RIP  [<ffffffffa013a0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[474208.150945]  RSP <ffff8801b39c7c48>
[474208.150949] CR2: ffffc90003661288
[474208.150957] ---[ end trace 7ab56606cd4d25a0 ]---
[474208.150961] Kernel panic - not syncing: Fatal exception in interrupt
I came here from a downstream bug report (https://github.com/dotcloud/docker/issues/2960). I would like to help find a solution for this, since it's biting us hard here. I will try the mitigation proposed by scottm for the time being and subscribe here to test any upcoming patches.
Doesn't happen on 3.10.34.
Created attachment 138271 [details]
Don't touch nat_bysource hash chain when netns is going away, preventing crash.
I finally had some time to dig into this, and I believe I've pinpointed the problem.

Basically what's happening is that during netns cleanup, nf_nat_net_exit gets called before ipv4_net_exit. As I understand it, nf_nat_net_exit is supposed to kill any conntrack entries which have NAT context (through nf_ct_iterate_cleanup), but for some reason this doesn't happen (perhaps something else is still holding refs to those entries?).

When ipv4_net_exit is called, conntrack entries (including those with NAT context) are cleaned up, but the nat_bysource hashtable is long gone - it was freed in nf_nat_net_exit. The crash happens when freeing a conntrack entry whose NAT hash node's pprev field still points at a bucket head in the freed hash table.

I'm not familiar enough with the netfilter internals to propose "the right fix", but I've attached a patch which basically skips the hlist_del_rcu for the nat_bysource hash links when the whole netns is going away anyway. After applying this patch we haven't had any more crashes.
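Roughly, the idea is the following sketch (an illustration only, not the literal diff from attachment 138271; nat_bysource_valid here is just a stand-in name for a per-netns "hash still alive" flag that nf_nat_net_exit would clear before freeing the table):

/* Illustrative sketch of the workaround, NOT the literal attachment. */
static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
	struct nf_conn_nat *nat = nf_ct_ext_find(ct, NF_CT_EXT_NAT);

	if (nat == NULL || nat->ct == NULL)
		return;

	spin_lock_bh(&nf_nat_lock);
	/* Skip the unlink if the per-netns hash has already been freed;
	 * hlist_del_rcu() would otherwise write through nat->bysource.pprev
	 * into freed memory. */
	if (nf_ct_net(nat->ct)->ct.nat_bysource_valid)	/* hypothetical flag */
		hlist_del_rcu(&nat->bysource);
	spin_unlock_bh(&nf_nat_lock);
}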
I can confirm the patch prevents the crash with a reliable reproducer on the Ubuntu trusty kernel 3.13.0-24-generic.
(In reply to Rodrigo Sampaio Vaz from comment #14)
> I can confirm the patch prevents the crash with a reliable reproducer on
> the Ubuntu trusty kernel 3.13.0-24-generic.

Would you mind testing

http://patchwork.ozlabs.org/patch/357147/raw/

as well? This should avoid the panic too, without altering the nfct destroy callback.
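Schematically, the approach there is to unhash the entry during nf_nat's own pernet exit, while nat_bysource still exists, so that the destroy callback later finds nothing to unlink. A simplified sketch of that idea (not the exact diff; see the URL above for the real thing):

/* Simplified sketch: drop the bysource link and the NAT state from the
 * nf_ct_iterate_cleanup() pass that nf_nat_net_exit() already runs, so
 * nf_nat_cleanup_conntrack() bails out early when the conntrack is
 * eventually destroyed after the hash has been freed. */
static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
{
	struct nf_conn_nat *nat = nfct_nat(ct);

	if (!nat || !(ct->status & IPS_SRC_NAT_DONE))
		return 0;

	spin_lock_bh(&nf_nat_lock);
	hlist_del_rcu(&nat->bysource);
	ct->status &= ~IPS_NAT_DONE_MASK;
	nat->ct = NULL;
	spin_unlock_bh(&nf_nat_lock);

	return 0;
}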
@Florian I have built .deb packages for both patches (yours and the previous one: https://github.com/gdm85/tenku/releases) and I am now testing yours; I do not have a clear-cut test case, so I will just intensify the bridge activity that has triggered this behaviour on the development server (creating bridged virtual Ethernet devices and then quickly destroying them).
(In reply to Florian Westphal from comment #15)
> (In reply to Rodrigo Sampaio Vaz from comment #14)
> > I can confirm the patch prevents the crash with a reliable reproducer on
> > the Ubuntu trusty kernel 3.13.0-24-generic.
>
> Would you mind testing
>
> http://patchwork.ozlabs.org/patch/357147/raw/
>
> as well? This should avoid the panic too, without altering the nfct
> destroy callback.

Yes, confirmed: this patch also prevents crashes with the same test case.
(In reply to Florian Westphal from comment #15)
> (In reply to Rodrigo Sampaio Vaz from comment #14)
> > I can confirm the patch prevents the crash with a reliable reproducer on
> > the Ubuntu trusty kernel 3.13.0-24-generic.
>
> Would you mind testing
>
> http://patchwork.ozlabs.org/patch/357147/raw/
>
> as well? This should avoid the panic too, without altering the nfct
> destroy callback.

Good for me too.