Bug 197127

Summary: Unable to find which specific Python process was killed by OOM killer
Product: Memory Management
Reporter: jacohn1
Component: Other
Assignee: Andrew Morton (akpm)
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.10.0-327
Subsystem:
Regression: No
Bisected commit-id:

Description jacohn1 2017-10-04 17:36:39 UTC
If you have multiple Python processes (daemons) running and one of them exceeds 
its memory constraints and is killed by the OOM killer, it's difficult to figure out 
which Python process was the culprit.

In /var/log/messages you would see this:

Oct  2 15:04:18 169 kernel: python invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Oct  2 15:04:18 169 kernel: python cpuset=/ mems_allowed=0
Oct  2 15:04:18 169 kernel: CPU: 3 PID: 9631 Comm: python Tainted: G        W  OE  ------------   3.10.0-327.el7.x86_64 #1
Oct  2 15:04:18 169 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
...
Oct  2 15:04:18 169 kernel: ffff8804b886b980
Oct  2 15:04:18 169 kernel: 00000000cfdfb78a ffff88049eb439f8 ffffffff816351f1
Oct  2 15:04:18 169 kernel: ffff88049eb43a88 ffffffff81630191 ffff8804b8ab4d00 ffff8804b8ab4d18
Oct  2 15:04:18 169 kernel: 0000000000000206 ffff8804b886b980 ffff88049eb43a70 ffffffff8112882f
Oct  2 15:04:18 169 kernel: Call Trace:
Oct  2 15:04:18 169 kernel: [<ffffffff816351f1>] dump_stack+0x19/0x1b
Oct  2 15:04:18 169 kernel: [<ffffffff81630191>] dump_header+0x8e/0x214
Oct  2 15:04:18 169 kernel: [<ffffffff8112882f>] ? delayacct_end+0x8f/0xb0
Oct  2 15:04:18 169 kernel: [<ffffffff8116cdee>] oom_kill_process+0x24e/0x3b0
Oct  2 15:04:18 169 kernel: [<ffffffff8116c956>] ? find_lock_task_mm+0x56/0xc0
Oct  2 15:04:18 169 kernel: [<ffffffff81088dae>] ? has_capability_noaudit+0x1e/0x30
Oct  2 15:04:18 169 kernel: [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
Oct  2 15:04:18 169 kernel: [<ffffffff811737f5>] __alloc_pages_nodemask+0xa95/0xb90
Oct  2 15:04:18 169 kernel: [<ffffffff811b43f9>] alloc_pages_current+0xa9/0x170
Oct  2 15:04:18 169 kernel: [<ffffffff81168fd7>] __page_cache_alloc+0x97/0xc0
Oct  2 15:04:18 169 kernel: [<ffffffff8116b858>] filemap_fault+0x188/0x430
Oct  2 15:04:18 169 kernel: [<ffffffff81192b2e>] __do_fault+0x7e/0x510
Oct  2 15:04:18 169 kernel: [<ffffffff81197088>] handle_mm_fault+0x5b8/0xf50
Oct  2 15:04:18 169 kernel: [<ffffffff8101c829>] ? read_tsc+0x9/0x10
Oct  2 15:04:18 169 kernel: [<ffffffff810d814c>] ? ktime_get_ts64+0x4c/0xf0
Oct  2 15:04:18 169 kernel: [<ffffffff81640e22>] __do_page_fault+0x152/0x420
Oct  2 15:04:18 169 kernel: [<ffffffff81641113>] do_page_fault+0x23/0x80
Oct  2 15:04:18 169 kernel: [<ffffffff8163d408>] page_fault+0x28
...
Oct  2 15:04:18 169 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
...
Oct  2 15:04:18 169 kernel: [ 7747] 0 7748 3226957 2875051 5692 0 0 python
...
Oct  2 15:04:18 169 kernel: Killed process 7747 (python) total-vm:12907828kB, anon-rss:11500204kB, file-rss:0kB

Notice it just says the process being killed is "python".  
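
To illustrate how little there is to work with, here is a small sketch that pulls the PID and name out of the "Killed process" line. The regular expression is only fitted to the line format shown in the excerpt above (other kernel versions may format it differently), and the name parse_killed_line is made up for this example:

```python
import re

# Fitted to the "Killed process" line in the log excerpt above;
# other kernel versions may use a different format.
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)")

def parse_killed_line(line):
    """Return (pid, comm) from an OOM-killer 'Killed process' line, or None."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    return int(match.group("pid")), match.group("comm")

sample = ("Oct  2 15:04:18 169 kernel: Killed process 7747 (python) "
          "total-vm:12907828kB, anon-rss:11500204kB, file-rss:0kB")
print(parse_killed_line(sample))  # (7747, 'python')
```

All you get back is the PID and the short name "python", which is the same for every daemon.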

Ideally, a log file for the process would have the PID somewhere.  But suppose the logs have wrapped, or 
the process doesn't log its PID anywhere.  It would be nice if the process table could print
the full command line instead of just the process name.

For example:

Oct  2 15:04:18 169 kernel: [ 7747] 0 7748 3226957 2875051 5692 0 0 /usr/bin/python /usr/lib/python2.7/site-packages/foo.pyc
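
Until the kernel prints something like this, one possible userspace workaround is to record the PID-to-command-line mapping while the processes are still alive, so the PID from the "Killed process" line can be looked up afterwards. A rough sketch, assuming a Linux /proc filesystem; the helper names read_cmdline and snapshot_cmdlines are made up for illustration, and the snapshot has to be taken before the kill because /proc/<pid>/cmdline disappears with the process:

```python
import os

def read_cmdline(pid):
    """Return the full command line of a live process, or None if unavailable."""
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as f:
            raw = f.read()
    except OSError:
        return None
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    args = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
    return " ".join(args) if args else None

def snapshot_cmdlines(needle="python"):
    """Map PID -> full command line for processes whose executable name contains needle."""
    result = {}
    try:
        entries = os.listdir("/proc")
    except OSError:
        return result  # no /proc available (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        cmdline = read_cmdline(int(entry))
        if cmdline and needle in os.path.basename(cmdline.split(" ", 1)[0]):
            result[int(entry)] = cmdline
    return result

if __name__ == "__main__":
    # Run periodically (e.g. from cron) and append the output to a file.
    for pid, cmdline in sorted(snapshot_cmdlines().items()):
        print(pid, cmdline)
```

This only helps if the snapshot is recent enough, and PIDs can be reused, so it is a stopgap and not a substitute for the kernel logging the full command.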