Bug 172721

Summary: oom-killer invoked despite lots of swap and vm.overcommit_memory=1
Product: Memory Management
Reporter: Walter Haidinger (walter.haidinger)
Component: Other
Assignee: Andrew Morton (akpm)
Status: NEW
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.7.3
Regression: No
Attachments: monitoring graph: memory usage of the last days

Description Walter Haidinger 2016-09-22 16:10:19 UTC
The oom-killer is invoked despite vm.overcommit_memory=1 and plenty of swap space (16GB+64GB swap LVs).

I've switched from the 4.4 tree to the current 4.7 tree (currently kernel-vanilla-4.7.3-1.1 on openSUSE Tumbleweed), which triggered frequent oom-killer invocations (every 2-3 minutes). Adding 64GB of swap and setting vm.overcommit_memory=1 somewhat mitigated the problem, but did not resolve it.
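For reference, a minimal sketch (my own illustration, not taken from the kernel sources) of what vm.overcommit_memory=1 means in practice: the kernel grants large anonymous allocations regardless of available RAM+swap, and physical pages are only committed when the memory is actually touched.

```c
/* Illustrative sketch only: under vm.overcommit_memory=1 a huge malloc()
 * (backed by an anonymous mmap) succeeds even if it exceeds RAM+swap;
 * pages are committed only when written to. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t len = 64UL << 30;      /* 64 GiB, far beyond physical RAM here */
    char *p = malloc(len);        /* succeeds under overcommit mode 1 */

    if (!p) {
        perror("malloc");
        return 1;
    }
    memset(p, 0, 1UL << 20);      /* touching pages is what commits them */
    printf("reserved %zu bytes, touched 1 MiB\n", len);
    free(p);
    return 0;
}
```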

Please note that the machine workload did not change, and I cannot recall any oom-killer invocation while running 4.4 or earlier.

From the description, this may be related to, or even a duplicate of, bug #121181.

Any insights from the dmesg output below as to why the oom-killer was triggered?

If you need additional information, please let me know!

Walter

dmesg output of the last oom-killer run:

Linux version 4.7.3-1-vanilla (geeko@buildhost) (gcc version 6.2.1 20160830 [gcc-6-branch revision 239856] (SUSE Linux) ) #1 SMP PREEMPT Thu Sep 8 13:08:03 UTC 2016 (7ad9c1d)
--cut--
kthreadd invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0
kthreadd cpuset=/ mems_allowed=0
CPU: 1 PID: 2 Comm: kthreadd Not tainted 4.7.3-1-vanilla #1
Hardware name: System manufacturer System Product Name/M4N78 PRO, BIOS 1303    04/13/2011
 0000000000000000 ffff88017ee23ad0 ffffffff813af542 ffff88017ee23c98
 ffff8801b40a8240 ffff88017ee23b40 ffffffff8121bd1f ffff88017ee23b38
 ffffffff8104f7bb 000000020000003d ffff88022f8606f8 0000000200000001
Call Trace:
 [<ffffffff813af542>] dump_stack+0x63/0x81
 [<ffffffff8121bd1f>] dump_header+0x5f/0x1f6
 [<ffffffff8104f7bb>] ? native_send_call_func_ipi+0xbb/0xd0
 [<ffffffff8119d1f1>] oom_kill_process+0x221/0x420
 [<ffffffff8119d83b>] out_of_memory+0x3fb/0x4c0
 [<ffffffff811a2b53>] __alloc_pages_slowpath+0xb83/0xbe0
 [<ffffffff811a2dc8>] __alloc_pages_nodemask+0x218/0x260
 [<ffffffff811a31ab>] alloc_kmem_pages_node+0x4b/0xc0
 [<ffffffff8107f76c>] copy_process.part.32+0x10c/0x1920
 [<ffffffff81059b71>] ? flat_send_IPI_mask+0x11/0x20
 [<ffffffff81053d79>] ? default_send_IPI_single+0x39/0x40
 [<ffffffff8104f7fb>] ? native_smp_send_reschedule+0x2b/0x40
 [<ffffffff810aa49a>] ? resched_curr+0x6a/0xe0
 [<ffffffff810a1590>] ? kthread_worker_fn+0x170/0x170
 [<ffffffff810acf3b>] ? wake_up_new_task+0xfb/0x1a0
 [<ffffffff8108114a>] _do_fork+0xca/0x3b0
 [<ffffffff81081459>] kernel_thread+0x29/0x30
 [<ffffffff810a1e0e>] kthreadd+0x1ce/0x210
 [<ffffffff816f2f1f>] ret_from_fork+0x1f/0x40
 [<ffffffff810a1c40>] ? kthread_create_on_cpu+0x60/0x60
Mem-Info:
active_anon:274288 inactive_anon:290893 isolated_anon:0
 active_file:372440 inactive_file:435157 isolated_file:0
 unevictable:1939 dirty:1661 writeback:208 unstable:0
 slab_reclaimable:364287 slab_unreclaimable:40791
 mapped:21908 shmem:3404 pagetables:10738 bounce:0
 free:96882 free_pcp:28 free_cma:0
Node 0 DMA free:15888kB min:512kB low:640kB high:768kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15972kB managed:15888kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3170 7901 7901 7901
Node 0 DMA32 free:166640kB min:104988kB low:131232kB high:157476kB active_anon:446888kB inactive_anon:475248kB active_file:567988kB inactive_file:675340kB unevictable:5380kB isolated(anon):0kB isolated(file):0kB present:3325504kB managed:3246848kB mlocked:5380kB dirty:1436kB writeback:360kB mapped:29876kB shmem:4724kB slab_reclaimable:589176kB slab_unreclaimable:65532kB kernel_stack:8176kB pagetables:14260kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:140 all_unreclaimable? no
lowmem_reserve[]: 0 0 4730 4730 4730
Node 0 Normal free:205000kB min:156640kB low:195800kB high:234960kB active_anon:650264kB inactive_anon:688324kB active_file:921772kB inactive_file:1065288kB unevictable:2376kB isolated(anon):0kB isolated(file):0kB present:4980736kB managed:4844276kB mlocked:2376kB dirty:5208kB writeback:472kB mapped:57756kB shmem:8892kB slab_reclaimable:867972kB slab_unreclaimable:97632kB kernel_stack:23968kB pagetables:28692kB unstable:0kB bounce:0kB free_pcp:108kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
Node 0 DMA32: 39789*4kB (UME) 948*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 166740kB
Node 0 Normal: 48824*4kB (UE) 1232*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 205152kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
820486 total pagecache pages
7924 pages in swap cache
Swap cache stats: add 347959, delete 340035, find 247106/274364
Free swap  = 93378124kB
Total swap = 94371828kB
2080553 pages RAM
0 pages HighMem/MovableOnly
53800 pages reserved
0 pages hwpoisoned
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
--cut--
Out of memory: Kill process 21809 (rdiff-backup) score 15 or sacrifice child
Killed process 21809 (rdiff-backup) total-vm:1733100kB, anon-rss:1560032kB, file-rss:6696kB, shmem-rss:0kB
oom_reaper: reaped process 21809 (rdiff-backup), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Comment 1 Walter Haidinger 2016-09-22 16:20:55 UTC
Created attachment 239431
monitoring graph: memory usage of the last days

Additional note: with 4.4 and earlier, the system happily ran with 8GB RAM and 10GB of swap space (usually <10% in use).
Comment 2 Walter Haidinger 2016-11-30 12:09:20 UTC
FYI: I've switched from 4.4 to the 4.8 tree now (running 4.8.10 for a couple of days) and there have not been any oom-killer invocations yet.

The machine workload hasn't changed, so it seems the bug has been fixed in 4.8 and/or the kernel now adheres to the vm.overcommit_memory=1 setting.

I'll fall back to vm.overcommit_memory=0 and see whether this triggers the oom-killer again.
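For completeness, a minimal sketch (my own, assuming the standard procfs path and root privileges) of how I check and switch the setting, equivalent to `sysctl vm.overcommit_memory`:

```c
/* Sketch: read and switch vm.overcommit_memory via procfs.
 * Writing requires root; path assumed to be /proc/sys/vm/overcommit_memory. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r+");
    int mode;

    if (!f) {
        perror("open overcommit_memory");
        return 1;
    }
    if (fscanf(f, "%d", &mode) == 1)
        printf("current vm.overcommit_memory = %d\n", mode);

    rewind(f);
    fprintf(f, "0\n");   /* fall back to heuristic overcommit (mode 0) */
    fclose(f);
    return 0;
}
```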