Bug 57951

Summary: race: mmap removes previous mapping when failing to map with MAP_HUGETLB|MAP_FIXED
Product: Memory Management Reporter: Stefan Karlsson (stefan.karlsson)
Component: Page Allocator Assignee: Andrew Morton (akpm)
Status: NEW ---    
Severity: normal CC: alan, kwapulinski.piotr
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.9.1 Subsystem:
Regression: No Bisected commit-id:
Attachments: Reproducer

Description Stefan Karlsson 2013-05-10 13:57:07 UTC
Created attachment 101071
Reproducer

If too few huge pages are available when calling mmap with both MAP_HUGETLB and MAP_FIXED, the old mapping at the specified address is removed.

It's not clear to me if this is a bug or just unspecified behavior, but it's a behavior that makes it hard for the HotSpot JVM to use MAP_HUGETLB when dynamically sizing the Java heap.

A short background to how HotSpot sets up the Java heap:

When the JVM starts it reserves a memory area for the entire Java heap. We use mmap(...MAP_NORESERVE...) to reserve a contiguous chunk of memory that no other subsystem of the JVM or Java program will be allowed to mmap into.
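The reservation step could look roughly like the sketch below. This is not HotSpot's actual code: the helper name reserve_heap is hypothetical, and PROT_NONE is an assumption (it matches the ---p permissions in the sample output further down).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `len` bytes of address space without
 * committing memory. PROT_NONE (an assumption) makes the range
 * inaccessible; MAP_NORESERVE asks the kernel not to reserve swap. */
void *reserve_heap(size_t len)
{
    assert(len > 0);

    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    /* Map MAP_FAILED to NULL so callers can use a simple NULL check. */
    return p == MAP_FAILED ? NULL : p;
}
```

As long as this mapping exists, an mmap call without MAP_FIXED will not be placed inside the range.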

The reservation of memory only reflects the maximum possible heap size, but often a smaller heap size is used if the memory pressure is low. The part of the heap that is actually used is committed with mmap(...MAP_FIXED...). When the heap is growing we commit a consecutive chunk of memory after the previously committed memory. We rely on the fact that no other thread will mmap into the reserved memory area for the Java heap.

The actual committing of the memory is done by first trying to use mmap(...MAP_FIXED|MAP_HUGETLB...), and if that fails mmap is called without MAP_HUGETLB.
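A minimal sketch of this commit-with-fallback sequence, under stated assumptions: commit_heap is a hypothetical name, the protection flags are guesses, and addr and len are assumed to be aligned to the huge page size (typically 2 MiB on x86-64), since MAP_HUGETLB rejects unaligned requests.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: commit [addr, addr + len) inside an existing
 * reservation. Tries huge pages first, then falls back to normal pages.
 * Returns the mapped address, or NULL if both attempts fail. */
void *commit_heap(void *addr, size_t len)
{
    const int prot  = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED;

    assert(addr != NULL && len > 0);

    void *p = mmap(addr, len, prot, flags | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED)
        return p;

    /* If the call above failed after the old mapping was already removed,
     * the range is unmapped until the call below succeeds. */
    p = mmap(addr, len, prot, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The two mmap calls are not atomic with respect to other threads, which is the problem this report is about.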

Because a failed MAP_FIXED|MAP_HUGETLB call removes the mapping inside the old reservation, there is a window, before the fallback mmap, in which other threads can mmap into the Java heap.

I've attached a test that shows this behavior. A sample output from the test:
mmap MAP_NORESERVE: 0x7f848a1dc000-0x7f848a7dc000
7f848a1dc000-7f848a7dc000 ---p 00000000 00:00 0 
7f848a7dc000-7f848a991000 r-xp 00000000 08:06 3180723                    /lib/x86_64-linux-gnu/libc-2.15.so

mmap MAP_HUGETLB at: 0x7f848a200000-0x7f848a600000
7f848a1dc000-7f848a200000 ---p 00000000 00:00 0 
7f848a600000-7f848a7dc000 ---p 00000000 00:00 0 
7f848a7dc000-7f848a991000 r-xp 00000000 08:06 3180723                    /lib/x86_64-linux-gnu/libc-2.15.so

Comment 1 Piotr Kwapuliński 2015-03-20 17:01:15 UTC
If the new memory region overlaps existing one(s), Linux first destroys the overlapped part of the existing region(s) and then tries to allocate the new region. When there is not enough memory to satisfy the request, mmap fails with -ENOMEM; in this specific case, no huge page frames are available. When mmap fails, the kernel does not try to restore the destroyed region(s). The same behaviour can be observed in later kernels, up to the most recent 4.0-rc4.

The old mapping is removed in mmap_region -> do_munmap, while the failing condition for allocating the new mapping can be found in mmap_region -> ... -> gather_surplus_pages.
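One way to observe this from user space is sketched below. It is not the attached reproducer: both helper names are hypothetical, and the check relies on msync() failing with ENOMEM when any part of the range is unmapped. Whether a hole actually appears depends on the kernel version and on how many huge pages are free.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: 1 if every page in [addr, addr + len) is mapped,
 * 0 if at least one page is not, -1 on any other msync() error. */
int range_is_mapped(void *addr, size_t len)
{
    assert(len > 0);

    if (msync(addr, len, MS_ASYNC) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}

/* Hypothetical helper: try MAP_FIXED|MAP_HUGETLB over an existing mapping.
 * Returns 1 if the call failed and left the range unmapped, 0 otherwise.
 * addr and len are assumed to be huge-page aligned. */
int failed_hugetlb_left_hole(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_HUGETLB,
                   -1, 0);
    if (p != MAP_FAILED)
        return 0; /* huge pages were available; nothing to observe */

    return range_is_mapped(addr, len) == 0;
}
```

A return value of 1 from failed_hugetlb_left_hole corresponds to the gap between 7f848a200000 and 7f848a600000 in the sample output of the description.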

All this is documented in both the POSIX and the Linux Programmer's Manuals.

Linux Programmer's Manual states:
"If the memory region specified by addr and len overlaps pages of any existing mapping(s), then the overlapped part of the existing mapping(s) will be discarded. If the specified address cannot be used, mmap() will fail."

POSIX Programmer's Manual states:
"If mmap() fails for reasons other than [EBADF], [EINVAL], or [ENOTSUP], some of the mappings in the address range starting at addr and continuing for len bytes may have been unmapped."

I think this is not a bug and the report should be closed.