oom-killer gets kicked despite vm.overcommit_memory=1 and plenty of swap space (16GB + 64GB swap LVs).

I've switched from the 4.4 tree to the current 4.7 tree (currently kernel-vanilla-4.7.3-1.1 on openSUSE/Tumbleweed), which triggered lots of oom-killer calls (every 2-3 minutes). Adding 64GB of swap and setting vm.overcommit_memory=1 somewhat mitigated the problem, but did not resolve it. Please note that the machine's workload did not change, and I cannot recall a single oom-killer invocation while running 4.4 or earlier. From the description, this may be related to or even be a duplicate of bug #121181.

Any insights from the dmesg output below as to why the oom-killer was triggered? If you need additional information, please let me know!

Walter

dmesg output of the last oom-killer run:

Linux version 4.7.3-1-vanilla (geeko@buildhost) (gcc version 6.2.1 20160830 [gcc-6-branch revision 239856] (SUSE Linux) ) #1 SMP PREEMPT Thu Sep 8 13:08:03 UTC 2016 (7ad9c1d)
--cut--
kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
kthreadd cpuset=/ mems_allowed=0
CPU: 1 PID: 2 Comm: kthreadd Not tainted 4.7.3-1-vanilla #1
Hardware name: System manufacturer System Product Name/M4N78 PRO, BIOS 1303 04/13/2011
 0000000000000000 ffff88017ee23ad0 ffffffff813af542 ffff88017ee23c98
 ffff8801b40a8240 ffff88017ee23b40 ffffffff8121bd1f ffff88017ee23b38
 ffffffff8104f7bb 000000020000003d ffff88022f8606f8 0000000200000001
Call Trace:
 [<ffffffff813af542>] dump_stack+0x63/0x81
 [<ffffffff8121bd1f>] dump_header+0x5f/0x1f6
 [<ffffffff8104f7bb>] ? native_send_call_func_ipi+0xbb/0xd0
 [<ffffffff8119d1f1>] oom_kill_process+0x221/0x420
 [<ffffffff8119d83b>] out_of_memory+0x3fb/0x4c0
 [<ffffffff811a2b53>] __alloc_pages_slowpath+0xb83/0xbe0
 [<ffffffff811a2dc8>] __alloc_pages_nodemask+0x218/0x260
 [<ffffffff811a31ab>] alloc_kmem_pages_node+0x4b/0xc0
 [<ffffffff8107f76c>] copy_process.part.32+0x10c/0x1920
 [<ffffffff81059b71>] ? flat_send_IPI_mask+0x11/0x20
 [<ffffffff81053d79>] ? default_send_IPI_single+0x39/0x40
 [<ffffffff8104f7fb>] ? native_smp_send_reschedule+0x2b/0x40
 [<ffffffff810aa49a>] ? resched_curr+0x6a/0xe0
 [<ffffffff810a1590>] ? kthread_worker_fn+0x170/0x170
 [<ffffffff810acf3b>] ? wake_up_new_task+0xfb/0x1a0
 [<ffffffff8108114a>] _do_fork+0xca/0x3b0
 [<ffffffff81081459>] kernel_thread+0x29/0x30
 [<ffffffff810a1e0e>] kthreadd+0x1ce/0x210
 [<ffffffff816f2f1f>] ret_from_fork+0x1f/0x40
 [<ffffffff810a1c40>] ? kthread_create_on_cpu+0x60/0x60
Mem-Info:
active_anon:274288 inactive_anon:290893 isolated_anon:0
 active_file:372440 inactive_file:435157 isolated_file:0
 unevictable:1939 dirty:1661 writeback:208 unstable:0
 slab_reclaimable:364287 slab_unreclaimable:40791
 mapped:21908 shmem:3404 pagetables:10738 bounce:0
 free:96882 free_pcp:28 free_cma:0
Node 0 DMA free:15888kB min:512kB low:640kB high:768kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15972kB managed:15888kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3170 7901 7901 7901
Node 0 DMA32 free:166640kB min:104988kB low:131232kB high:157476kB active_anon:446888kB inactive_anon:475248kB active_file:567988kB inactive_file:675340kB unevictable:5380kB isolated(anon):0kB isolated(file):0kB present:3325504kB managed:3246848kB mlocked:5380kB dirty:1436kB writeback:360kB mapped:29876kB shmem:4724kB slab_reclaimable:589176kB slab_unreclaimable:65532kB kernel_stack:8176kB pagetables:14260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:140 all_unreclaimable? no
lowmem_reserve[]: 0 0 4730 4730 4730
Node 0 Normal free:205000kB min:156640kB low:195800kB high:234960kB active_anon:650264kB inactive_anon:688324kB active_file:921772kB inactive_file:1065288kB unevictable:2376kB isolated(anon):0kB isolated(file):0kB present:4980736kB managed:4844276kB mlocked:2376kB dirty:5208kB writeback:472kB mapped:57756kB shmem:8892kB slab_reclaimable:867972kB slab_unreclaimable:97632kB kernel_stack:23968kB pagetables:28692kB unstable:0kB bounce:0kB free_pcp:108kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
Node 0 DMA32: 39789*4kB (UME) 948*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 166740kB
Node 0 Normal: 48824*4kB (UE) 1232*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 205152kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
820486 total pagecache pages
7924 pages in swap cache
Swap cache stats: add 347959, delete 340035, find 247106/274364
Free swap  = 93378124kB
Total swap = 94371828kB
2080553 pages RAM
0 pages HighMem/MovableOnly
53800 pages reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
--cut--
Out of memory: Kill process 21809 (rdiff-backup) score 15 or sacrifice child
Killed process 21809 (rdiff-backup) total-vm:1733100kB, anon-rss:1560032kB, file-rss:6696kB, shmem-rss:0kB
oom_reaper: reaped process 21809 (rdiff-backup), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
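For what it's worth, the failing allocation above is order=2 (a contiguous 16kB block requested via alloc_kmem_pages_node during copy_process, i.e. while kthreadd was forking a new kernel thread), and the per-order free lists in the dump show zero free blocks of 16kB or larger in both the DMA32 and Normal zones, so fragmentation rather than a plain shortage of memory may be at play. Below is a minimal sketch for keeping an eye on that; it is not from the original report and assumes Python 3, a 4kB page size, and the usual /proc/buddyinfo layout:

#!/usr/bin/env python3
# Sketch only (not from the original report): summarize /proc/buddyinfo.
# Column N after the zone name is the number of free blocks of order N,
# i.e. blocks of 2^N contiguous pages. Assumes 4kB pages (x86-64).

PAGE_KB = 4

with open("/proc/buddyinfo") as f:
    for line in f:
        parts = line.split()
        # Line layout: "Node <n>, zone <name> <order0> <order1> ... <order10>"
        node = parts[1].rstrip(",")
        zone = parts[3]
        counts = [int(c) for c in parts[4:]]
        high_order = sum(counts[2:])  # free blocks of order >= 2 (>= 16kB)
        free_kb = sum(n * PAGE_KB * (1 << order) for order, n in enumerate(counts))
        print(f"node {node} zone {zone:<8} order>=2 blocks: {high_order:6d}  free: {free_kb} kB")

If those order>=2 counts in DMA32/Normal are already near zero right before the oom-killer fires, that would fit the picture above; as far as I understand, vm.overcommit_memory only affects userspace commit accounting and cannot help with a failed kernel-internal order-2 allocation.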
Created attachment 239431 [details]
monitoring graph: memory usage over the last few days

Additional note: with 4.4 and before, the system happily ran with 8GB of RAM and 10GB of swap space (usually <10% in use).
FYI: I've switched from the 4.4 to the 4.8 tree now (running 4.8.10 for a couple of days) and there haven't been any oom-killer invocations yet. The machine workload hasn't changed, so it seems the bug has been fixed by 4.8 and/or the kernel now honours the vm.overcommit_memory=1 setting. I'll fall back to vm.overcommit_memory=0 and see whether that triggers the oom-killer again.
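For reference, switching between the modes can be done with sysctl or by writing /proc/sys/vm/overcommit_memory directly; here is a small sketch of a helper for that (hypothetical, not part of my actual setup; needs root and Python 3):

#!/usr/bin/env python3
# Sketch only: read and (as root) change vm.overcommit_memory,
# equivalent to `sysctl vm.overcommit_memory=<mode>`.
# Modes: 0 = heuristic (default), 1 = always overcommit, 2 = strict accounting.
import sys

SYSCTL_PATH = "/proc/sys/vm/overcommit_memory"

def get_mode() -> int:
    with open(SYSCTL_PATH) as f:
        return int(f.read().strip())

def set_mode(mode: int) -> None:
    if mode not in (0, 1, 2):
        raise ValueError("vm.overcommit_memory must be 0, 1 or 2")
    with open(SYSCTL_PATH, "w") as f:
        f.write(str(mode))

if __name__ == "__main__":
    print(f"current vm.overcommit_memory: {get_mode()}")
    if len(sys.argv) > 1:
        set_mode(int(sys.argv[1]))
        print(f"new vm.overcommit_memory: {get_mode()}")

Running it with "0" as the argument switches back to the default heuristic mode; to keep the change across reboots it also needs to go into /etc/sysctl.conf (or a file under /etc/sysctl.d/).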