Bug 65191 - BUG in nf_nat_cleanup_conntrack
Summary: BUG in nf_nat_cleanup_conntrack
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Netfilter/Iptables
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: networking_netfilter-iptables@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-19 14:59 UTC by Samu Kallio
Modified: 2016-02-15 20:34 UTC
CC List: 9 users

See Also:
Kernel Version: 3.10.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Don't touch nat_bysource hash chain when netns is going away, preventing crash. (1.25 KB, patch)
2014-06-05 17:54 UTC, Samu Kallio

Description Samu Kallio 2013-11-19 14:59:40 UTC
Network namespace cleanup sometimes crashes while cleaning up NAT conntrack entries. In my case, this has been happening intermittently on an Amazon EC2 instance running several LXC containers each with their own network namespace. Traffic is redirected to/from the containers using iptables NAT port mapping. When a container is being shut down, I simultaneously issue iptables commands to remove the NAT port mappings for that container.

I have seen this crash happen a few times now, every time with the same call trace. 

[9661983.179708] BUG: unable to handle kernel paging request at ffffc90001821718
[9661983.179725] IP: [<ffffffffa02cf36d>] nf_nat_cleanup_conntrack+0x3d/0x80 [nf_nat]
[9661983.179740] PGD ea41e067 PUD ea41f067 PMD 69279067 PTE 0
[9661983.179750] Oops: 0002 [#1] SMP 
[9661983.179757] Modules linked in: fuse veth xt_nat xt_comment overlayfs sit tunnel4 ip_tunnel ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack coretemp nf_conntrack btrfs crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel xor raid6_pq bridge aes_x86_64 lrw crc32c gf128mul libcrc32c stp glue_helper zlib_deflate llc ablk_helper cryptd microcode xen_netfront iptable_filter ip_tables x_tables ext4 crc16 mbcache jbd2 xen_blkfront
[9661983.179834] CPU: 0 PID: 19984 Comm: kworker/u2:2 Not tainted 3.10.17-1-ec2 #1
[9661983.179848] Workqueue: netns cleanup_net
[9661983.179854] task: ffff88003eee5d00 ti: ffff8800b1a08000 task.ti: ffff8800b1a08000
[9661983.179861] RIP: e030:[<ffffffffa02cf36d>]  [<ffffffffa02cf36d>] nf_nat_cleanup_conntrack+0x3d/0x80 [nf_nat]
[9661983.179872] RSP: e02b:ffff8800b1a09cc8  EFLAGS: 00010246
[9661983.179877] RAX: 0000000000000000 RBX: ffff880047cd9f88 RCX: 0000000000000000
[9661983.179883] RDX: ffffc90001821718 RSI: 0000000000000006 RDI: ffffffffa02d1d38
[9661983.179889] RBP: ffff8800b1a09cd0 R08: 0000000000000000 R09: ffff8800ef616e60
[9661983.179896] R10: ffffea000150b9c0 R11: ffffffff8115db46 R12: ffff880047cd9f00
[9661983.179902] R13: ffff88004d1af000 R14: ffff88004d1af008 R15: ffff88001dd01000
[9661983.179914] FS:  00007f3435e80700(0000) GS:ffff8800ef600000(0000) knlGS:0000000000000000
[9661983.179921] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[9661983.179927] CR2: ffffc90001821718 CR3: 00000000c11aa000 CR4: 0000000000002660
[9661983.179933] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[9661983.179940] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[9661983.179945] Stack:
[9661983.179949]  0000000000000001 ffff8800b1a09cf8 ffffffffa02970f4 ffff88004d1af000
[9661983.179960]  ffff88001dd01000 ffffffffa02c5460 ffff8800b1a09d18 ffffffffa028f625
[9661983.179970]  ffff88004d1af000 ffff88001dd01000 ffff8800b1a09d38 ffffffffa028f711
[9661983.179980] Call Trace:
[9661983.179993]  [<ffffffffa02970f4>] __nf_ct_ext_destroy+0x44/0x60 [nf_conntrack]
[9661983.180005]  [<ffffffffa028f625>] nf_conntrack_free+0x25/0x60 [nf_conntrack]
[9661983.180015]  [<ffffffffa028f711>] destroy_conntrack+0xb1/0xc0 [nf_conntrack]
[9661983.180027]  [<ffffffffa02941a0>] ? nf_conntrack_helper_fini+0x30/0x30 [nf_conntrack]
[9661983.180037]  [<ffffffff813d0cae>] nf_conntrack_destroy+0x1e/0x20
[9661983.180047]  [<ffffffffa028f4ca>] nf_ct_iterate_cleanup+0x5a/0x160 [nf_conntrack]
[9661983.180060]  [<ffffffffa0294568>] nf_ct_l3proto_pernet_unregister+0x18/0x20 [nf_conntrack]
[9661983.180069]  [<ffffffffa02c4499>] ipv4_net_exit+0x19/0x50 [nf_conntrack_ipv4]
[9661983.180079]  [<ffffffff8139be6d>] ops_exit_list.isra.4+0x4d/0x70
[9661983.180086]  [<ffffffff8139c698>] cleanup_net+0x148/0x1e0
[9661983.180098]  [<ffffffff8107178d>] process_one_work+0x22d/0x3e0
[9661983.180106]  [<ffffffff81072786>] worker_thread+0x226/0x3b0
[9661983.180114]  [<ffffffff81072560>] ? manage_workers.isra.18+0x350/0x350
[9661983.180125]  [<ffffffff810783bb>] kthread+0xbb/0xd0
[9661983.180133]  [<ffffffff81078300>] ? kthread_stop+0x100/0x100
[9661983.180142]  [<ffffffff814a1c2c>] ret_from_fork+0x7c/0xb0
[9661983.180149]  [<ffffffff81078300>] ? kthread_stop+0x100/0x100
[9661983.180154] Code: 48 89 e5 53 0f b6 58 11 84 db 75 4a eb 50 48 83 7b 20 00 74 49 48 c7 c7 38 1d 2d a0 e8 ad a0 1c e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 02 20 00 00 00 ad de 48 c7 
[9661983.180231] RIP  [<ffffffffa02cf36d>] nf_nat_cleanup_conntrack+0x3d/0x80 [nf_nat]
[9661983.180240]  RSP <ffff8800b1a09cc8>
[9661983.180244] CR2: ffffc90001821718
[9661983.180253] ---[ end trace d8d85302e25b48d2 ]---
[9661983.180259] Kernel panic - not syncing: Fatal exception in interrupt
Comment 1 Alan 2013-11-26 22:01:49 UTC
Can you test 3.12 on that setup and see if it is then stable?
Comment 2 Samu Kallio 2013-11-27 06:06:22 UTC
I'll try to get a reliable test case going in the next few days. This issue is occurring on production servers, so I'm reluctant to upgrade away from long term stable there.
Comment 3 Wallace Wadge 2014-02-01 16:54:21 UTC
I think I'm getting the same problem (this is in an LXC environment) on a development server, so feel free to ask for more info. 

This is on a 3.13.1 kernel and triggers on UP as well as SMP fairly consistently.

      KERNEL: /home/wwadge/linux-3.13.1/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 1
        DATE: Sat Feb  1 14:34:00 2014
      UPTIME: 21:15:10
LOAD AVERAGE: 0.10, 0.09, 0.06
       TASKS: 498
    NODENAME: cds-dws-centos
     RELEASE: 3.13.1
     VERSION: #3 SMP Fri Jan 31 16:51:22 GMT 2014
     MACHINE: x86_64  (2660 Mhz)
      MEMORY: 2 GB
       PANIC: 
WARNING: log buf data structure(s) have changed
""
         PID: 11536
     COMMAND: "kworker/u2:1"
        TASK: ffff880037de12d0  [THREAD_INFO: ffff880000048000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 11536  TASK: ffff880037de12d0  CPU: 0   COMMAND: "kworker/u2:1"
 #0 [ffff880000049850] machine_kexec at ffffffff8104ca40
 #1 [ffff8800000498c0] crash_kexec at ffffffff810e26e8
 #2 [ffff880000049990] oops_end at ffffffff81618258
 #3 [ffff8800000499c0] no_context at ffffffff81056a1e
 #4 [ffff880000049a10] __bad_area_nosemaphore at ffffffff81056c1d
 #5 [ffff880000049a60] bad_area_nosemaphore at ffffffff81056d33
 #6 [ffff880000049a70] __do_page_fault at ffffffff8161b046
 #7 [ffff880000049b90] do_page_fault at ffffffff8161b26e
 #8 [ffff880000049ba0] page_fault at ffffffff81617688
    [exception RIP: nf_nat_cleanup_conntrack+70]
    RIP: ffffffffa0331236  RSP: ffff880000049c58  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8800632ebe08  RCX: ffff8800566e2c18
    RDX: ffffc90011b9e288  RSI: 0000000000000006  RDI: ffffffffa03343e8
    RBP: ffff880000049c68   R8: dead000000200200   R9: dead000000200200
    R10: dead000000200200  R11: ffffffc000000030  R12: ffff8800632ebd81
    R13: ffff8800566e2c18  R14: ffff880000049d14  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff880000049c70] __nf_ct_ext_destroy at ffffffffa023a2b5 [nf_conntrack]
#10 [ffff880000049ca0] nf_conntrack_free at ffffffffa0231fdf [nf_conntrack]
#11 [ffff880000049cc0] destroy_conntrack at ffffffffa0232ee2 [nf_conntrack]
#12 [ffff880000049ce0] nf_conntrack_destroy at ffffffff8157d787
#13 [ffff880000049cf0] nf_ct_iterate_cleanup at ffffffffa02323f8 [nf_conntrack]
#14 [ffff880000049d50] nf_ct_l3proto_pernet_unregister at ffffffffa02374c9 [nf_conntrack]
#15 [ffff880000049d70] ipv4_net_exit at ffffffffa0250e2d [nf_conntrack_ipv4]
#16 [ffff880000049d90] ops_exit_list at ffffffff81546269
#17 [ffff880000049dc0] cleanup_net at ffffffff81546843
#18 [ffff880000049e00] process_one_work at ffffffff8108468c
#19 [ffff880000049e50] worker_thread at ffffffff81085ae3
#20 [ffff880000049ec0] kthread at ffffffff8108b73e
#21 [ffff880000049f50] ret_from_fork at ffffffff8161f9fc

----
crash> net
   NET_DEVICE     NAME   IP ADDRESS(ES)
ffff88007ca9a000  lo     127.0.0.1
ffff88007b746000  eth0   10.20.70.9
ffff880075730000  virbr0 192.168.122.1
ffff88007c0dc000  virbr0-nic 
ffff880075760000  docker0 172.17.42.1
ffff88006e5ea000  vethQBjNiA 
ffff88006e42b000  vethKx1UUw 
ffff88003ae94000  vethPPPQ5A 
ffff88006334e000  veth9Lc57z 
ffff8800566c0000  vethdlEGPB 
ffff88003e8d2000  vethoU41sj 
---

I can provide crashlog + .config too if need be.
Comment 4 Patrick McHardy 2014-02-01 18:57:45 UTC
On Sat, Feb 01, 2014 at 04:54:21PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #3 from Wallace Wadge <wwadge@gmail.com> ---
> I think I'm getting the same problem (this is in an LXC environment) on a
> development server so feel free to ask for more info. 
> 
> This is on a 3.13.1 kernel and triggers on UP as well as SMP fairly
> consistently.

Could you turn on CONFIG_NETFILTER_DEBUG and see if the assertion in
nf_nat_cleanup_conntrack() hits please?
Comment 5 Wallace Wadge 2014-02-01 21:22:57 UTC
Enabled, but I don't see any assertions firing. I upgraded my crash utility, so I got a little more info:


---
[ 1515.714065] BUG: unable to handle kernel paging request at ffffc90011b95fc8
[ 1515.714118] IP: [<ffffffffa033623e>] nf_nat_cleanup_conntrack+0x4e/0x90 [nf_nat]
[ 1515.714167] PGD 7f851067 PUD 7f852067 PMD 74426067 PTE 0
[ 1515.714208] Oops: 0002 [#1] SMP 
[ 1515.714240] Modules linked in: veth xt_nat xt_addrtype xt_conntrack ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat ipt_REJECT xt_CHECKSUM iptable_mangle tun bridge stp llc dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio libcrc32c ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables ip6_tables ppdev vmw_balloon pcspkr parport_pc parport e1000 sg vmw_vmci i2c_piix4 shpchp ext4 jbd2 mbcache sd_mod crc_t10dif crct10dif_common sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix floppy vmwgfx ttm dm_mirror dm_region_hash dm_log dm_mod
[ 1515.714656] CPU: 0 PID: 10628 Comm: kworker/u2:4 Not tainted 3.13.1 #4
[ 1515.714698] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
[ 1515.714774] Workqueue: netns cleanup_net
[ 1515.714806] task: ffff88001c4ea610 ti: ffff88001c4f0000 task.ti: ffff88001c4f0000
[ 1515.714855] RIP: 0010:[<ffffffffa033623e>]  [<ffffffffa033623e>] nf_nat_cleanup_conntrack+0x4e/0x90 [nf_nat]
[ 1515.714897] RSP: 0000:ffff88001c4f1c58  EFLAGS: 00010246
[ 1515.714917] RAX: 0000000000000000 RBX: ffff88003af1eb88 RCX: ffff880062792c18
[ 1515.714940] RDX: ffffc90011b95fc8 RSI: dead000000200200 RDI: ffffffffa03393e8
[ 1515.714962] RBP: ffff88001c4f1c68 R08: 0000000000000000 R09: dead000000200200
[ 1515.714986] R10: dead000000200200 R11: ffffffc000000030 R12: ffff88003af1eb01
[ 1515.715009] R13: ffff880062792c18 R14: ffff88001c4f1d14 R15: 0000000000000000
[ 1515.715033] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 1515.715060] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1515.715078] CR2: ffffc90011b95fc8 CR3: 000000001df89000 CR4: 00000000000007f0
[ 1515.715173] Stack:
[ 1515.715178]  ffff88001c4f1c78 ffffffffa0246c28 ffff88001c4f1c98 ffffffffa023f545
[ 1515.715208]  ffff88001c4f1ce8 ffff880062792c18 ffff88000065e400 ffffffffa02583a0
[ 1515.715238]  ffff88001c4f1cb8 ffffffffa023709f ffff880062792c18 ffff88000065e400
[ 1515.715266] Call Trace:
[ 1515.715983]  [<ffffffffa023f545>] __nf_ct_ext_destroy+0x45/0x70 [nf_conntrack]
[ 1515.716664]  [<ffffffffa023709f>] nf_conntrack_free+0x2f/0x70 [nf_conntrack]
[ 1515.717345]  [<ffffffffa02384ea>] destroy_conntrack+0xba/0x110 [nf_conntrack]
[ 1515.718041]  [<ffffffffa023c1e0>] ? nf_conntrack_helper_unregister+0xc0/0xc0 [nf_conntrack]
[ 1515.718726]  [<ffffffff8157d6c7>] nf_conntrack_destroy+0x17/0x30
[ 1515.719405]  [<ffffffffa0238208>] nf_ct_iterate_cleanup+0x78/0xb0 [nf_conntrack]
[ 1515.720087]  [<ffffffffa023c759>] nf_ct_l3proto_pernet_unregister+0x39/0x80 [nf_conntrack]
[ 1515.720765]  [<ffffffffa0255ecd>] ipv4_net_exit+0x1d/0x60 [nf_conntrack_ipv4]
[ 1515.721437]  [<ffffffff81546269>] ops_exit_list+0x39/0x60
[ 1515.722104]  [<ffffffff81546843>] cleanup_net+0x103/0x1b0
[ 1515.722752]  [<ffffffff8108468c>] process_one_work+0x17c/0x420
[ 1515.723385]  [<ffffffff81085ae3>] worker_thread+0x123/0x400
[ 1515.724000]  [<ffffffff810859c0>] ? manage_workers+0x170/0x170
[ 1515.724600]  [<ffffffff8108b73e>] kthread+0xce/0xf0
[ 1515.725180]  [<ffffffff8108b670>] ? kthread_freezable_should_stop+0x70/0x70
[ 1515.725752]  [<ffffffff8161fa7c>] ret_from_fork+0x7c/0xb0
[ 1515.726306]  [<ffffffff8108b670>] ? kthread_freezable_should_stop+0x70/0x70
[ 1515.726860] Code: b6 db 48 8d 1c 18 48 8b 43 10 48 85 c0 74 3f 80 78 78 00 79 40 48 c7 c7 e8 93 33 a0 e8 1c 0d 2e e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b9 00 02 20 00 00 00 ad de 48 c7 
[ 1515.728048] RIP  [<ffffffffa033623e>] nf_nat_cleanup_conntrack+0x4e/0x90 [nf_nat]
[ 1515.728622]  RSP <ffff88001c4f1c58>
[ 1515.729181] CR2: ffffc90011b95fc8
crash>    
---
Comment 6 Wallace Wadge 2014-02-01 22:03:20 UTC
I'm not sure if I got the address right in this debug attempt, but here's some more info:
---
crash> bt
PID: 10628  TASK: ffff88001c4ea610  CPU: 0   COMMAND: "kworker/u2:4"
 #0 [ffff88001c4f1850] machine_kexec at ffffffff8104ca40
 #1 [ffff88001c4f18c0] crash_kexec at ffffffff810e26e8
 #2 [ffff88001c4f1990] oops_end at ffffffff816182d8
 #3 [ffff88001c4f19c0] no_context at ffffffff81056a1e
 #4 [ffff88001c4f1a10] __bad_area_nosemaphore at ffffffff81056c1d
 #5 [ffff88001c4f1a60] bad_area_nosemaphore at ffffffff81056d33
 #6 [ffff88001c4f1a70] __do_page_fault at ffffffff8161b0c6
 #7 [ffff88001c4f1b90] do_page_fault at ffffffff8161b2ee
 #8 [ffff88001c4f1ba0] page_fault at ffffffff81617708
    [exception RIP: nf_nat_cleanup_conntrack+78]
    RIP: ffffffffa033623e  RSP: ffff88001c4f1c58  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff88003af1eb88  RCX: ffff880062792c18
    RDX: ffffc90011b95fc8  RSI: dead000000200200  RDI: ffffffffa03393e8
    RBP: ffff88001c4f1c68   R8: 0000000000000000   R9: dead000000200200
    R10: dead000000200200  R11: ffffffc000000030  R12: ffff88003af1eb01
    R13: ffff880062792c18  R14: ffff88001c4f1d14  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff88001c4f1c70] __nf_ct_ext_destroy at ffffffffa023f545 [nf_conntrack]
#10 [ffff88001c4f1ca0] nf_conntrack_free at ffffffffa023709f [nf_conntrack]
#11 [ffff88001c4f1cc0] destroy_conntrack at ffffffffa02384ea [nf_conntrack]
#12 [ffff88001c4f1ce0] nf_conntrack_destroy at ffffffff8157d6c7
#13 [ffff88001c4f1cf0] nf_ct_iterate_cleanup at ffffffffa0238208 [nf_conntrack]
#14 [ffff88001c4f1d50] nf_ct_l3proto_pernet_unregister at ffffffffa023c759 [nf_conntrack]
#15 [ffff88001c4f1d70] ipv4_net_exit at ffffffffa0255ecd [nf_conntrack_ipv4]
#16 [ffff88001c4f1d90] ops_exit_list at ffffffff81546269
#17 [ffff88001c4f1dc0] cleanup_net at ffffffff81546843
#18 [ffff88001c4f1e00] process_one_work at ffffffff8108468c
#19 [ffff88001c4f1e50] worker_thread at ffffffff81085ae3
#20 [ffff88001c4f1ec0] kthread at ffffffff8108b73e
#21 [ffff88001c4f1f50] ret_from_fork at ffffffff8161fa7c
crash> struct nf_conn ffff880062792c18
struct nf_conn {
  ct_general = {
    use = {
      counter = 0
    }
  }, 
  lock = {
    {
      rlock = {
        raw_lock = {
          {
            head_tail = 445651600, 
            tickets = {
              head = 6800, 
              tail = 6800
            }
          }
        }
      }
    }
  }, 
  tuplehash = {{
      hnnode = {
        next = 0x80000003, 
        pprev = 0xdead000000200200
      }, 
      tuple = {
        src = {
          u3 = {
            all = {335548844, 0, 0, 0}, 
            ip = 335548844, 
            ip6 = {335548844, 0, 0, 0}, 
            in = {
              s_addr = 335548844
            }, 
            in6 = {
              in6_u = {
                u6_addr8 = "\254\021\000\024\000\000\000\000\000\000\000\000\000\000\000", 
                u6_addr16 = {4524, 5120, 0, 0, 0, 0, 0, 0}, 
                u6_addr32 = {335548844, 0, 0, 0}
              }
            }
          }, 
          u = {
            all = 15256, 
            tcp = {
              port = 15256
            }, 
            udp = {
              port = 15256
            }, 
            icmp = {
              id = 15256
            }, 
            dccp = {
              port = 15256
            }, 
            sctp = {
              port = 15256
            }, 
            gre = {
              key = 15256
            }
          }, 
          l3num = 2
        }, 
        dst = {
          u3 = {
            all = {3054244874, 0, 0, 0}, 
            ip = 3054244874, 
            ip6 = {3054244874, 0, 0, 0}, 
            in = {
              s_addr = 3054244874
            }, 
            in6 = {
              in6_u = {
                u6_addr8 = "\n\024\f\266\000\000\000\000\000\000\000\000\000\000\000", 
                u6_addr16 = {5130, 46604, 0, 0, 0, 0, 0, 0}, 
                u6_addr32 = {3054244874, 0, 0, 0}
              }
            }
          }, 
          u = {
            all = 37151, 
            tcp = {
              port = 37151
            }, 
            udp = {
              port = 37151
            }, 
            icmp = {
              type = 31 '\037', 
              code = 145 '\221'
            }, 
            dccp = {
              port = 37151
            }, 
            sctp = {
              port = 37151
            }, 
            gre = {
              key = 37151
            }
          }, 
          protonum = 6 '\006', 
          dir = 0 '\000'
        }
      }
    }, {
      hnnode = {
        next = 0x4727, 
        pprev = 0xdead000000200200
      }, 
      tuple = {
        src = {
          u3 = {
            all = {3054244874, 0, 0, 0}, 
            ip = 3054244874, 
            ip6 = {3054244874, 0, 0, 0}, 
            in = {
              s_addr = 3054244874
            }, 
            in6 = {
              in6_u = {
                u6_addr8 = "\n\024\f\266\000\000\000\000\000\000\000\000\000\000\000", 
                u6_addr16 = {5130, 46604, 0, 0, 0, 0, 0, 0}, 
                u6_addr32 = {3054244874, 0, 0, 0}
              }
            }
          }, 
          u = {
            all = 37151, 
            tcp = {
              port = 37151
            }, 
            udp = {
              port = 37151
            }, 
            icmp = {
              id = 37151
            }, 
            dccp = {
              port = 37151
            }, 
            sctp = {
              port = 37151
            }, 
            gre = {
              key = 37151
            }
          }, 
          l3num = 2
        }, 
        dst = {
          u3 = {
            all = {335548844, 0, 0, 0}, 
            ip = 335548844, 
            ip6 = {335548844, 0, 0, 0}, 
            in = {
              s_addr = 335548844
            }, 
            in6 = {
              in6_u = {
                u6_addr8 = "\254\021\000\024\000\000\000\000\000\000\000\000\000\000\000", 
                u6_addr16 = {4524, 5120, 0, 0, 0, 0, 0, 0}, 
                u6_addr32 = {335548844, 0, 0, 0}
              }
            }
          }, 
          u = {
            all = 15256, 
            tcp = {
              port = 15256
            }, 
            udp = {
              port = 15256
            }, 
            icmp = {
              type = 152 '\230', 
              code = 59 ';'
            }, 
            dccp = {
              port = 15256
            }, 
            sctp = {
              port = 15256
            }, 
            gre = {
              key = 15256
            }
          }, 
          protonum = 6 '\006', 
          dir = 1 '\001'
        }
      }
    }}, 
  status = 910, 
  master = 0x0, 
  timeout = {
    entry = {
      next = 0x0, 
      prev = 0xdead000000200200
    }, 
    expires = 4296302640, 
    base = 0xffffffff81fb4f80 <boot_tvec_bases>, 
    function = 0xffffffffa0238410 <death_by_timeout>, 
    data = 18446612133966326808, 
    slack = -1, 
    start_pid = -1, 
    start_site = 0x0, 
    start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  }, 
  mark = 0, 
  secmark = 0, 
  ext = 0xffff88003af1eb00, 
  ct_net = 0xffff88000065e400, 
  proto = {
    dccp = {
      role = "\377\327", 
      state = 116 't', 
      last_pkt = 113 'q', 
      last_dir = 255 '\377', 
      handshake_seq = 15620773601466063744
    }, 
    sctp = {
      state = 1903482879, 
      vtag = {1903499519, 3087232}
    }, 
    tcp = {
      seen = {{
          td_end = 1903482879, 
          td_maxend = 1903499519, 
          td_maxwin = 3087232, 
          td_maxack = 3636994772, 
          td_scale = 7 '\a', 
          flags = 39 '\''
        }, {
          td_end = 3636994772, 
          td_maxend = 3640082004, 
          td_maxwin = 16640, 
          td_maxack = 1903482879, 
          td_scale = 7 '\a', 
          flags = 35 '#'
        }}, 
      state = 7 '\a', 
      last_dir = 0 '\000', 
      retrans = 0 '\000', 
      last_index = 3 '\003', 
      last_seq = 1903482879, 
      last_ack = 3636994772, 
      last_end = 1903482879, 
      last_win = 7040, 
      last_wscale = 0 '\000', 
      last_flags = 0 '\000'
    }, 
    gre = {
      stream_timeout = 1903482879, 
      timeout = 1903499519
    }
  }
}
crash>
Comment 7 scottm 2014-04-22 21:49:30 UTC
We've also been seeing this panic on several of our bare metal machines. Unfortunately I'm unable to gather crash dumps on these systems due to an unrelated hardware issue.

Kernel version: 3.13.0-24-generic SMP i686
Distro: Ubuntu trusty

We're seeing this when using a bridge to NAT LXC containers with libvirt. The following sysctl tunings seem to have alleviated the symptoms for us, though we're still testing:

sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.nf_conntrack_max=1048576
Comment 8 Steve Conklin 2014-04-29 21:03:30 UTC
This has also been reproduced at Heroku with the v3.15-rc2 kernel:

[17345307.967478] BUG: unable to handle kernel paging request at ffffc90003777a70
[17345307.967497] IP: [<ffffffffa013f0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[17345307.967510] PGD 1b6425067 PUD 1b6426067 PMD 1b0aed067 PTE 0
[17345307.967519] Oops: 0002 [#1] SMP 
[17345307.967525] Modules linked in: xt_nat veth tcp_diag inet_diag xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat bridge stp llc xt_owner isofs ipt_REJECT xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables dm_crypt microcode raid10 raid456 async_memcpy async_raid6_recov async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear
[17345307.967578] CPU: 0 PID: 6 Comm: kworker/u16:0 Not tainted 3.15.0-031500rc2-generic #201404201435
[17345307.967591] Workqueue: netns cleanup_net
[17345307.967596] task: ffff8801b39d0000 ti: ffff8801b39cc000 task.ti: ffff8801b39cc000
[17345307.967601] RIP: e030:[<ffffffffa013f0b6>]  [<ffffffffa013f0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[17345307.967611] RSP: e02b:ffff8801b39cdc48  EFLAGS: 00010246
[17345307.967617] RAX: 0000000000000000 RBX: ffff8801b1cf7b10 RCX: ffff880003110000
[17345307.967624] RDX: ffffc90003777a70 RSI: 0000000000000200 RDI: ffffffffa01434e0
[17345307.967630] RBP: ffff8801b39cdc58 R08: 0000000058690aeb R09: 00000000e834b0f3
[17345307.967636] R10: ffff880003110070 R11: 0000000000000002 R12: ffff8801b1cf7a80
[17345307.967643] R13: ffff880003110000 R14: 0000000000000000 R15: 0000000000000000
[17345307.967655] FS:  00007fd6682cc740(0000) GS:ffff8801bec00000(0000) knlGS:0000000000000000
[17345307.967662] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17345307.967667] CR2: ffffc90003777a70 CR3: 00000000058f9000 CR4: 0000000000002660
[17345307.967673] Stack:
[17345307.967677]  ffffffff8176ce52 0000000000000001 ffff8801b39cdc88 ffffffffa00d1013
[17345307.967686]  0000000000000000 ffff880003110000 ffff8800ca7b8000 ffffffffa00e5140
[17345307.967694]  ffff8801b39cdca8 ffffffffa00c867e ffff880003110000 ffff8800ca7b8000
[17345307.967703] Call Trace:
[17345307.967712]  [<ffffffff8176ce52>] ? _raw_spin_lock+0x12/0x50
[17345307.967726]  [<ffffffffa00d1013>] __nf_ct_ext_destroy+0x43/0x60 [nf_conntrack]
[17345307.967736]  [<ffffffffa00c867e>] nf_conntrack_free+0x2e/0x70 [nf_conntrack]
[17345307.967746]  [<ffffffffa00c946e>] destroy_conntrack+0x9e/0xf0 [nf_conntrack]
[17345307.967756]  [<ffffffffa00cdc40>] ? nf_conntrack_helper_fini+0x30/0x30 [nf_conntrack]
[17345307.967766]  [<ffffffff81686617>] nf_conntrack_destroy+0x17/0x20
[17345307.967775]  [<ffffffffa00c9358>] nf_ct_iterate_cleanup+0x78/0xb0 [nf_conntrack]
[17345307.967786]  [<ffffffffa00cdd1d>] nf_ct_l3proto_pernet_unregister+0x1d/0x20 [nf_conntrack]
[17345307.967796]  [<ffffffffa00e355d>] ipv4_net_exit+0x1d/0x60 [nf_conntrack_ipv4]
[17345307.967804]  [<ffffffff8164b918>] ops_exit_list.isra.1+0x38/0x60
[17345307.967811]  [<ffffffff8164c222>] cleanup_net+0x112/0x230
[17345307.967820]  [<ffffffff81085e2f>] process_one_work+0x17f/0x4c0
[17345307.967827]  [<ffffffff81086d7b>] worker_thread+0x11b/0x3d0
[17345307.967833]  [<ffffffff81086c60>] ? manage_workers.isra.21+0x190/0x190
[17345307.967841]  [<ffffffff8108de69>] kthread+0xc9/0xe0
[17345307.967846]  [<ffffffff8108dda0>] ? flush_kthread_worker+0xb0/0xb0
[17345307.967854]  [<ffffffff8177647c>] ret_from_fork+0x7c/0xb0
[17345307.967860]  [<ffffffff8108dda0>] ? flush_kthread_worker+0xb0/0xb0
[17345307.967865] Code: b7 58 12 66 85 db 74 46 0f b7 db 48 01 c3 48 83 7b 10 00 74 39 48 c7 c7 e0 34 14 a0 e8 14 db 62 e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 ba 00 02 20 00 00 00 ad de 48 c7 
[17345307.967917] RIP  [<ffffffffa013f0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[17345307.967925]  RSP <ffff8801b39cdc48>
[17345307.967928] CR2: ffffc90003777a70
[17345307.967933] ---[ end trace 84d4f3185a40459f ]---
[17345307.967938] Kernel panic - not syncing: Fatal exception in interrupt
Comment 9 Steve Conklin 2014-04-30 18:38:53 UTC
This also happens with the (almost) current mainline kernel,
from here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

Commit for this build was:

ed8c37e158cb697df905d6b4933bc107c69e8936

Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 618, in <module>
    sys.exit(main())
  File "/usr/bin/cloud-init", line 614, in main
    get_uptime=True, func=functor, args=(name, args))
  File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1875, in log_time
    ret = func(*args, **kwargs)
  File "/usr/bin/cloud-init", line 510, in status_wrapper
    atomic_write_json(status_path, status)
  File "/usr/bin/cloud-init", line 434, in atomic_write_json
    raise e
OSError: [Errno 2] No such file or directory: '/var/lib/cloud/data/tmpMBxCza'
[474208.150506] BUG: unable to handle kernel paging request at ffffc90003661288
[474208.150524] IP: [<ffffffffa013a0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[474208.150536] PGD 1b6423067 PUD 1b6424067 PMD 1b255b067 PTE 0
[474208.150544] Oops: 0002 [#1] SMP 
[474208.150549] Modules linked in: xt_nat veth tcp_diag inet_diag xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat bridge stp llc xt_owner ipt_REJECT xt_LOG xt_limit nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_conntrack nf_conntrack iptable_filter ip_tables x_tables isofs dm_crypt raid10 raid456 async_memcpy async_raid6_recov async_pq async_xor async_tx xor raid6_pq raid1 raid0 multipath linear
[474208.150602] CPU: 3 PID: 6 Comm: kworker/u16:0 Not tainted 3.15.0-999-generic #201404300254
[474208.150614] Workqueue: netns cleanup_net
[474208.150619] task: ffff8801b39d0000 ti: ffff8801b39c6000 task.ti: ffff8801b39c6000
[474208.150625] RIP: e030:[<ffffffffa013a0b6>]  [<ffffffffa013a0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[474208.150634] RSP: e02b:ffff8801b39c7c48  EFLAGS: 00010246
[474208.150639] RAX: 0000000000000000 RBX: ffff8801af4e5510 RCX: ffff8801b2040000
[474208.150645] RDX: ffffc90003661288 RSI: 0000000000000200 RDI: ffffffffa013d4e0
[474208.150651] RBP: ffff8801b39c7c58 R08: 00000000f72af2f7 R09: 0000000002eb94ae
[474208.150657] R10: ffff8801b2040070 R11: 0000000000000002 R12: ffff8801af4e5480
[474208.150664] R13: ffff8801b2040000 R14: 0000000000000000 R15: 0000000000000000
[474208.150674] FS:  00007fb3072be700(0000) GS:ffff8801becc0000(0000) knlGS:0000000000000000
[474208.150681] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[474208.150686] CR2: ffffc90003661288 CR3: 00000000f54bb000 CR4: 0000000000002660
[474208.150692] Stack:
[474208.150695]  ffffffff81763762 0000000000000001 ffff8801b39c7c88 ffffffffa00f2f53
[474208.150704]  0000000000000000 ffff8801b2040000 ffff88008c4e0000 ffffffffa00d1140
[474208.150712]  ffff8801b39c7ca8 ffffffffa00ea67e ffff8801b2040000 ffff88008c4e0000
[474208.150721] Call Trace:
[474208.150729]  [<ffffffff81763762>] ? _raw_spin_lock+0x12/0x50
[474208.150744]  [<ffffffffa00f2f53>] __nf_ct_ext_destroy+0x43/0x60 [nf_conntrack]
[474208.150755]  [<ffffffffa00ea67e>] nf_conntrack_free+0x2e/0x70 [nf_conntrack]
[474208.150765]  [<ffffffffa00eb46e>] destroy_conntrack+0x9e/0xf0 [nf_conntrack]
[474208.150775]  [<ffffffffa00efbd0>] ? nf_conntrack_helper_fini+0x30/0x30 [nf_conntrack]
[474208.150785]  [<ffffffff8167f367>] nf_conntrack_destroy+0x17/0x20
[474208.150794]  [<ffffffffa00eb358>] nf_ct_iterate_cleanup+0x78/0xb0 [nf_conntrack]
[474208.150804]  [<ffffffffa00efcad>] nf_ct_l3proto_pernet_unregister+0x1d/0x20 [nf_conntrack]
[474208.150814]  [<ffffffffa00cf53d>] ipv4_net_exit+0x1d/0x60 [nf_conntrack_ipv4]
[474208.150822]  [<ffffffff816450b8>] ops_exit_list.isra.1+0x38/0x60
[474208.150828]  [<ffffffff816459c2>] cleanup_net+0x112/0x230
[474208.150837]  [<ffffffff81087b4f>] process_one_work+0x17f/0x4c0
[474208.150843]  [<ffffffff81088a7b>] worker_thread+0x11b/0x3d0
[474208.150850]  [<ffffffff81088960>] ? manage_workers.isra.21+0x190/0x190
[474208.150857]  [<ffffffff8108fb49>] kthread+0xc9/0xe0
[474208.150863]  [<ffffffff8108fa80>] ? flush_kthread_worker+0xb0/0xb0
[474208.150871]  [<ffffffff8176cc7c>] ret_from_fork+0x7c/0xb0
[474208.150877]  [<ffffffff8108fa80>] ? flush_kthread_worker+0xb0/0xb0
[474208.150882] Code: b7 58 12 66 85 db 74 46 0f b7 db 48 01 c3 48 83 7b 10 00 74 39 48 c7 c7 e0 d4 13 a0 e8 24 94 62 e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 ba 00 02 20 00 00 00 ad de 48 c7 
[474208.150936] RIP  [<ffffffffa013a0b6>] nf_nat_cleanup_conntrack+0x46/0x70 [nf_nat]
[474208.150945]  RSP <ffff8801b39c7c48>
[474208.150949] CR2: ffffc90003661288
[474208.150957] ---[ end trace 7ab56606cd4d25a0 ]---
[474208.150961] Kernel panic - not syncing: Fatal exception in interrupt
Comment 10 Daniel 2014-06-03 14:06:50 UTC
I came here from a downstream bug report (https://github.com/dotcloud/docker/issues/2960). I would like to help find a solution for this, since it's biting us hard here.

I will try the mitigation proposed by scottm for the time being and subscribe here so I can test any upcoming patches.
Comment 11 Wallace Wadge 2014-06-03 14:18:03 UTC
Doesn't happen on 3.10.34.
Comment 12 Samu Kallio 2014-06-05 17:54:21 UTC
Created attachment 138271 [details]
Don't touch nat_bysource hash chain when netns is going away, preventing crash.
Comment 13 Samu Kallio 2014-06-05 17:54:33 UTC
I finally had some time to dig into this and I believe I've pinpointed the problem. Basically what's happening is, during netns cleanup, nf_nat_net_exit gets called before ipv4_net_exit. As I understand it, nf_nat_net_exit is supposed to kill any conntrack entries which have NAT context (through nf_ct_iterate_cleanup), but for some reason this doesn't happen (perhaps something else is still holding refs to those entries?).

When ipv4_net_exit is called, conntrack entries (including those with NAT context) are cleaned up, but the nat_bysource hashtable is long gone - it was already freed in nf_nat_net_exit. The crash happens when freeing a conntrack entry whose NAT hash 'pprev' pointer points into the freed hash table (the head of its bin).

I'm not familiar enough with the netfilter internals to propose "the right fix", but I've attached a patch which basically skips hlist_del_rcu for the nat_bysource hash links when the whole netns is going away anyway. After applying this patch we haven't had any more crashes.
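
(For reference, the sketch below is not the attached patch, only an illustration of the failure site and the workaround idea, loosely based on the 3.10-era net/netfilter/nf_nat_core.c. The netns_exiting() helper is a made-up placeholder for whatever check the real patch uses to detect that the namespace is being torn down.)

/* Illustrative sketch only -- not the attached patch. */
static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
{
	struct nf_conn_nat *nat = nf_ct_ext_find(ct, NF_CT_EXT_NAT);

	if (nat == NULL || nat->ct == NULL)
		return;

	NF_CT_ASSERT(nat->ct->status & IPS_SRC_NAT_DONE);

	/* If the netns is being dismantled, nf_nat_net_exit() has already
	 * freed net->ct.nat_bysource, so nat->bysource.pprev points into
	 * freed memory and hlist_del_rcu() writes through it -- matching
	 * the "Oops: 0002" (write fault) seen in the traces above.
	 */
	if (netns_exiting(nf_ct_net(nat->ct)))	/* hypothetical helper */
		return;

	spin_lock_bh(&nf_nat_lock);
	hlist_del_rcu(&nat->bysource);
	spin_unlock_bh(&nf_nat_lock);
}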
Comment 14 Rodrigo Sampaio Vaz 2014-06-10 04:14:18 UTC
I can confirm the patch prevents the crash on a reliable reproducer on the Ubuntu Trusty kernel 3.13.0-24-generic.
Comment 15 Florian Westphal 2014-06-10 09:19:23 UTC
(In reply to Rodrigo Sampaio Vaz from comment #14)
> I can confirm the patch prevents the crash on a reliable reproducer on the
> Ubuntu Trusty kernel 3.13.0-24-generic.

Would you mind testing

http://patchwork.ozlabs.org/patch/357147/raw/

as well?  This should avoid panic as well without altering
the nfct destroy callback.
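
(For reference, a rough sketch of that direction: unhash the NAT binding during nf_nat_net_exit()'s per-conntrack walk, while net->ct.nat_bysource is still allocated, so the destroy callback later finds nothing to unlink. This is not a copy of the patchwork submission; names such as nf_nat_proto_clean, nfct_nat and nf_nat_lock are assumed to match the 3.13-era nf_nat_core.c, and the usual l3/l4 protocol matching is omitted for brevity.)

static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
{
	struct nf_conn_nat *nat = nfct_nat(ct);

	if (!nat)
		return 0;
	if (!(ct->status & IPS_SRC_NAT_DONE))
		return 0;

	/* Drop the bysource link now, while the per-netns hash table still
	 * exists, and clear the NAT-done bits so nf_nat_cleanup_conntrack()
	 * has nothing left to do when the entry is finally destroyed.
	 */
	spin_lock_bh(&nf_nat_lock);
	hlist_del_rcu(&nat->bysource);
	ct->status &= ~IPS_NAT_DONE_MASK;
	nat->ct = NULL;
	spin_unlock_bh(&nf_nat_lock);

	/* The conntrack entry itself is kept; only the NAT binding goes. */
	return 0;
}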
Comment 16 Daniel 2014-06-10 16:20:08 UTC
@Florian I have built .deb packages for both patches (yours and the previous one: https://github.com/gdm85/tenku/releases) and I am now testing yours; I do not have a clear-cut test case, so I will just intensify the bridge activity that has triggered this behaviour on the development server (creating bridged virtual ethernet devices and then quickly destroying them).
Comment 17 Rodrigo Sampaio Vaz 2014-06-10 23:45:12 UTC
(In reply to Florian Westphal from comment #15)
> (In reply to Rodrigo Sampaio Vaz from comment #14)
> > I can confirm the patch prevents the crash on a reliable reproducer on the
> > Ubuntu Trusty kernel 3.13.0-24-generic.
> 
> Would you mind testing
> 
> http://patchwork.ozlabs.org/patch/357147/raw/
> 
> as well?  This should avoid panic as well without altering
> the nfct destroy callback.

Yes, confirmed: this patch also prevents crashes with the same test case.
Comment 18 Daniel 2014-06-12 14:12:14 UTC
(In reply to Florian Westphal from comment #15)
> (In reply to Rodrigo Sampaio Vaz from comment #14)
> > I can confirm the patch prevents the crash on a reliable reproducer on the
> > Ubuntu Trusty kernel 3.13.0-24-generic.
> 
> Would you mind testing
> 
> http://patchwork.ozlabs.org/patch/357147/raw/
> 
> as well?  This should avoid panic as well without altering
> the nfct destroy callback.

Good for me too.
