Bug 57951 - race: mmap removes previous mapping when failing to map with MAP_HUGETLB|MAP_FIXED
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: All Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-10 13:57 UTC by Stefan Karlsson
Modified: 2015-03-20 17:01 UTC

See Also:
Kernel Version: 3.9.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Reproducer (3.71 KB, text/x-csrc)
2013-05-10 13:57 UTC, Stefan Karlsson

Description Stefan Karlsson 2013-05-10 13:57:07 UTC
Created attachment 101071
Reproducer

If too few huge pages are available when calling mmap with both MAP_HUGETLB and MAP_FIXED, the old mapping at the specified address is removed.

It's not clear to me if this is a bug or just unspecified behavior, but it's a behavior that makes it hard for the HotSpot JVM to use MAP_HUGETLB when dynamically sizing the Java heap.

A short background to how HotSpot sets up the Java heap:

When the JVM starts, it reserves a memory area for the entire Java heap. We use mmap(...MAP_NORESERVE...) to reserve a contiguous chunk of memory that no other subsystem of the JVM or Java program will be allowed to mmap into.

The reservation of memory only reflects the maximum possible heap size, but often a smaller heap size is used if the memory pressure is low. The part of the heap that is actually used is committed with mmap(...MAP_FIXED...). When the heap is growing we commit a consecutive chunk of memory after the previously committed memory. We rely on the fact that no other thread will mmap into the reserved memory area for the Java heap.

The memory is committed by first trying mmap(...MAP_FIXED|MAP_HUGETLB...); if that fails, mmap is called again without MAP_HUGETLB.

The fact that a failed MAP_FIXED|MAP_HUGETLB call removes the mapping inside the old reservation opens a window for other threads to mmap into the Java heap.
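
The reserve/commit scheme described above can be sketched roughly as follows. This is a minimal illustration, not HotSpot's actual code: the function names reserve_heap and commit_range, the 2 MiB huge page size, and the return-value convention are all assumptions made for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS, MAP_NORESERVE, MAP_HUGETLB */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages (the common x86_64 default). */
#define HUGE_PAGE_SIZE ((size_t)2 << 20)

/* Hypothetical helper: reserve address space only (no access, no swap). */
static void *reserve_heap(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: commit [addr, addr + len) inside the reservation.
 * Returns 1 if backed by huge pages, 0 if backed by small pages,
 * -1 if both attempts failed.
 */
static int commit_range(void *addr, size_t len)
{
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED;

    /* MAP_HUGETLB needs a huge-page-aligned address and length. */
    assert((uintptr_t)addr % HUGE_PAGE_SIZE == 0);
    assert(len % HUGE_PAGE_SIZE == 0);

    if (mmap(addr, len, prot, flags | MAP_HUGETLB, -1, 0) != MAP_FAILED)
        return 1;

    /*
     * If the call above failed for lack of huge pages, the old PROT_NONE
     * mapping may already be gone. Until the next call completes, another
     * thread's mmap can land inside the range.
     */
    if (mmap(addr, len, prot, flags, -1, 0) != MAP_FAILED)
        return 0;
    return -1;
}
```

A caller would reserve the maximum heap size once, round the start up to a huge page boundary, and call commit_range on consecutive chunks as the heap grows.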

I've attached a test that shows this behavior. A sample output from the test:
mmap MAP_NORESERVE: 0x7f848a1dc000-0x7f848a7dc000
7f848a1dc000-7f848a7dc000 ---p 00000000 00:00 0 
7f848a7dc000-7f848a991000 r-xp 00000000 08:06 3180723                    /lib/x86_64-linux-gnu/libc-2.15.so

mmap MAP_HUGETLB at: 0x7f848a200000-0x7f848a600000
7f848a1dc000-7f848a200000 ---p 00000000 00:00 0 
7f848a600000-7f848a7dc000 ---p 00000000 00:00 0 
7f848a7dc000-7f848a991000 r-xp 00000000 08:06 3180723                    /lib/x86_64-linux-gnu/libc-2.15.so
Comment 1 Piotr Kwapuliński 2015-03-20 17:01:15 UTC
If the new memory region overlaps existing one(s), Linux first destroys the overlapped part of the existing memory region(s) and then tries to allocate the new memory region. When there is not enough memory to satisfy the request, mmap fails and returns -ENOMEM; in this specific case, no huge page frames are available. When mmap fails, the kernel does not try to restore the destroyed memory region(s). The same behaviour can be observed in later kernels, up to the most recent 4.0-rc4. The old mapping is deallocated in the mmap_region -> do_munmap function, while the failing condition for allocating the new mapping may be found in mapping and mmap_region -> ... -> gather_surplus_pages function.

All this is documented in both the POSIX and Linux Programmer's Manuals.

Linux Programmer's Manual states:
"If the memory region specified by addr and len overlaps pages of any existing mapping(s), then the overlapped part of the existing mapping(s) will be discarded. If the specified address cannot be used, mmap() will fail."

POSIX Programmer's Manual states:
"If mmap() fails for reasons other than [EBADF], [EINVAL], or [ENOTSUP], some of the mappings in the address range starting at addr and continuing for len bytes may have been unmapped."

I think it is not a bug and should be closed.
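
For anyone who wants to observe this from user space without reading /proc/self/maps, here is a rough sketch. The helper names range_fully_mapped and try_fixed_hugetlb are made up for this example; the check relies on msync() failing with ENOMEM when part of the range is not mapped, as documented in msync(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS, MAP_HUGETLB */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: returns 1 if every page in [addr, addr + len) is
 * mapped, 0 if the range contains a hole, -1 on any other error.
 * addr must be page-aligned.
 */
static int range_fully_mapped(void *addr, size_t len)
{
    assert(len > 0);   /* an empty range would trivially report success */
    if (msync(addr, len, MS_ASYNC) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}

/*
 * Hypothetical helper: try MAP_FIXED|MAP_HUGETLB over an existing mapping.
 * Returns 1 if the huge page mapping was created, 0 if the call failed but
 * the range is still fully mapped, -1 if the call failed and the range is
 * no longer fully mapped (the case reported in this bug).
 */
static int try_fixed_hugetlb(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_HUGETLB,
                   -1, 0);
    if (p != MAP_FAILED)
        return 1;
    return range_fully_mapped(addr, len) == 1 ? 0 : -1;
}
```

Which value try_fixed_hugetlb returns depends on the state of the huge page pool and on why mmap failed, so the sketch only reports what it sees rather than asserting a particular outcome.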
