Bug 20402

Summary: [Feature request] OOM killer should count in-kernel memory counters
Product: Memory Management Reporter: Коренберг Марк (socketpair)
Component: Page Allocator Assignee: Andrew Morton (akpm)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, rientjes
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: - Subsystem:
Regression: No Bisected commit-id:

Description Коренберг Марк 2010-10-15 06:24:50 UTC
[Feature request] OOM killer should count in-kernel memory counters, such as memory in pipes, sockets, IPC and so on. 

I can easily write a single-process application that opens 1023 pipes and fills each of them with 65K bytes, which adds up to 64MB of kernel memory.
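
A minimal sketch of the per-pipe effect, assuming a Unix-like system and Python 3.5 or later. It fills a single pipe (not 1023) and reports how many bytes the kernel buffered; the helper name fill_pipe is hypothetical and the capacity is measured, not assumed.

```python
import os

def fill_pipe(chunk=4096):
    """Write into one pipe until the kernel refuses more; return bytes buffered."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # a full pipe raises BlockingIOError instead of hanging
    buffered = 0
    try:
        while True:
            buffered += os.write(w, b"\0" * chunk)
    except BlockingIOError:
        pass  # pipe buffer is full
    finally:
        os.close(r)
        os.close(w)
    return buffered

if __name__ == "__main__":
    print("bytes held in kernel pipe buffer:", fill_pipe())
```

The buffered bytes live in kernel memory and do not show up in the RSS of the writing process.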

Next, using unix sockets, I can eat more memory by sending these pipe FDs into a socketpair() unix socket and closing my own copies of the pipe descriptors.
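
The descriptor-passing step can be sketched as follows, assuming a Unix-like system with SCM_RIGHTS support. The helper name park_fd_in_socket is hypothetical; it passes one pipe descriptor through a socketpair, closes the local copy, and then receives it back only to show that the pipe and its data stayed alive while the descriptor was in flight.

```python
import array
import os
import socket

def park_fd_in_socket(payload):
    """Send a pipe's read end through a unix socketpair and read it back."""
    r, w = os.pipe()
    os.write(w, payload)  # keep payload small so this write cannot block
    os.close(w)
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    fds = array.array("i", [r])
    a.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])
    os.close(r)  # our copy is gone; the in-flight reference keeps the pipe alive
    _msg, ancdata, _flags, _addr = b.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    _level, _type, data = ancdata[0]
    received = array.array("i")
    received.frombytes(data[:len(data) - (len(data) % received.itemsize)])
    new_r = received[0]
    try:
        return os.read(new_r, len(payload))
    finally:
        os.close(new_r)
        a.close()
        b.close()

if __name__ == "__main__":
    print(park_fd_in_socket(b"still here"))
```

Between sendmsg() and recvmsg() the process holds no descriptor for the pipe, yet the kernel still keeps its buffer.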

So I can eat a lot of memory under a "guest" session, and the OOM killer will kill something important.

Another (more realistic) situation arises when a server application stops reading incoming data from its sockets because something hangs.
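
A small sketch of the unread-socket case, assuming a Unix-like system. The helper name fill_unread_socket is hypothetical; it uses a local socketpair in place of a real network connection and counts how many bytes the kernel queues when the peer never reads.

```python
import socket

def fill_unread_socket(chunk=4096):
    """Send to a peer that never reads; return the bytes the kernel queued."""
    a, b = socket.socketpair()
    a.setblocking(False)  # a full socket buffer raises BlockingIOError
    queued = 0
    try:
        while True:
            queued += a.send(b"\0" * chunk)
    except BlockingIOError:
        pass  # socket buffer is full; b never called recv()
    finally:
        a.close()
        b.close()
    return queued

if __name__ == "__main__":
    print("bytes queued for a peer that never reads:", fill_unread_socket())
```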
Comment 1 David Rientjes 2010-10-15 08:35:26 UTC
Hi!

We don't currently have a way to map slab usage back to an individual task, so we can't target the offending task easily without trying to hack counters into all the possible places where you can allocate a substantial amount of slab.

There is work currently on-going specifically for the memory controller that will account for slab and charge it to a hard limit.  That doesn't quite solve the problem, though, unless you've been constrained to your own memory controller if the motivation happens to be malicious.

We've recently seen similar things with excessively long arguments that are passed when executing a program: the memory allocated by the kernel doesn't actually get charged to the task being exec'd, so the oom killer ends up killing another task since the memory usage of that new task is always very low at this point.

So there are several examples where this can happen, and yours would be another to add to the list.  I'm interested in seeing your particular example, though, so if you could attach it to this bugzilla entry, I'd appreciate it.

Thanks!
Comment 2 David Rientjes 2011-01-26 10:06:15 UTC
I've been dying over the past three months to see your usecase that allows an innocent task to be killed by the oom killer :)  This bug hasn't been active, so if you have an example I'd be happy to look at it and see if there's any way we can attribute the memory usage to the correct task.  (Slab, as mentioned previously, is going to be difficult outside of a memory controller environment, however.)
Comment 3 Коренберг Марк 2011-02-05 11:39:40 UTC
Well, I do not fully understand which memory counts as slab.

Data in unix sockets? In pipes? In network sockets? In POSIX IPC messages?


One interesting example: suppose I mmap() a big area (the maximum vmsize
allowed by ulimit) and try to write it into a file as one big block. My
process then becomes unkillable (it goes into uninterruptible sleep) until
the kernel has written the whole block to the file. During that time the
memory-mapped area still belongs to the malicious process, and its RSS keeps
growing, but the process cannot be killed until the write completes. I can
also mlock() those pages to make the situation worse.

(http://linux-mm.org/OOM_Killer)

The current OOM killer does not detect processes with memory leaks. For
example, if a simple database server has a memory leak, it will have a big
chunk of anonymous memory in swap but a relatively small number of pages in
RSS. Suppose another process then eats some amount of memory and the OOM
killer has to act. It will find that the process with the largest RSS is the
second one and kill it, although killing the database server would be the
more appropriate decision. This is logically valid: OOM occurs when both RSS
and swap space for anonymous memory are exhausted.
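
For reference, the per-process counters involved here (resident, swapped and locked memory) can be read from /proc/<pid>/status on kernels that expose the VmRSS, VmSwap and VmLck fields. The parser below is a minimal sketch; the helper name parse_vm_counters is hypothetical and the sample text is made up for illustration.

```python
def parse_vm_counters(status_text):
    """Extract VmRSS, VmSwap and VmLck (in kB) from /proc/<pid>/status text."""
    wanted = {"VmRSS", "VmSwap", "VmLck"}
    counters = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            counters[key] = int(rest.split()[0])  # first token is the kB value
    return counters

if __name__ == "__main__":
    # Made-up sample resembling a leaky server: small RSS, large swap.
    sample = "Name:\tdbserver\nVmLck:\t0 kB\nVmRSS:\t51200 kB\nVmSwap:\t1024000 kB\n"
    print(parse_vm_counters(sample))
```

On a live system the same function could be fed the contents of /proc/self/status; fields missing on older kernels are simply absent from the result.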

So the OOM killer should count not only RSS memory, but something like:

BADNESS = LOCKED PAGES * 10 + UNLOCKED ANONYMOUS RSS * 5 + ANONYMOUS SWAP
(the coefficients 10 and 5 are arbitrary, but they show what I want to implement)

For example:
* NTP server, which locks itself with mlockall(): ~5MB * 10 = 50 points
* leaky database server: 50MB*5 + 1000MB = 1250 points
* innocent application: 2MB*10 + 50MB*5 + 10MB = 280 points
* firefox:
* JVM:
* Virtualbox:
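
The arithmetic for the three filled-in examples can be checked with a short sketch. This only restates the proposed formula; it is not how the kernel's OOM killer computes its score, and the function name proposed_badness and its parameters are hypothetical. Sizes are in MB, and the NTP server is counted by its locked memory only, as in the example.

```python
def proposed_badness(locked_mb, anon_rss_mb, anon_swap_mb,
                     locked_weight=10, rss_weight=5):
    """BADNESS = LOCKED * 10 + UNLOCKED ANONYMOUS RSS * 5 + ANONYMOUS SWAP."""
    return locked_mb * locked_weight + anon_rss_mb * rss_weight + anon_swap_mb

if __name__ == "__main__":
    candidates = {
        "ntp server": proposed_badness(5, 0, 0),
        "leaky database server": proposed_badness(0, 50, 1000),
        "innocent application": proposed_badness(2, 50, 10),
    }
    for name, score in candidates.items():
        print(name, score)
    # The highest score is the process this proposal would kill first.
    print("victim:", max(candidates, key=candidates.get))
```

With these inputs the leaky database server ranks highest, even though its RSS equals that of the innocent application.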

Also, why not kill processes sorted by SUM (LOCKED NON-ANONYMOUS +
ANONYMOUS)?

Also, instead of killing processes, why not activate KSM (kernel samepage
merging) on memory that is not marked as available for it? KSM, especially
for zero-filled pages, may recover some memory at small cost. In our
situation (Linux with a large number of chroots), KSM for identical pages in
"unshared" shared libraries would help a lot.
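
For context, the "marking" mentioned above is the opt-in a process performs with madvise(MADV_MERGEABLE); KSM scans only regions advised this way, and only while it is enabled via /sys/kernel/mm/ksm/run. The sketch below shows that opt-in from user space, assuming Linux and Python 3.8 or later; the helper name mark_mergeable is hypothetical.

```python
import mmap

def mark_mergeable(length):
    """Advise a private anonymous mapping as mergeable; return True on success."""
    if not hasattr(mmap, "MADV_MERGEABLE"):
        return False  # platform or Python build without KSM advice support
    region = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        region.madvise(mmap.MADV_MERGEABLE)  # fails if the kernel lacks KSM
        return True
    except OSError:
        return False
    finally:
        region.close()  # demo only: a real user would keep the region mapped

if __name__ == "__main__":
    print("region marked mergeable:", mark_mergeable(16 * mmap.PAGESIZE))
```

Memory that was never advised this way, such as the private copies of shared libraries in separate chroots, is not scanned, which is the limitation the suggestion is about.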