Bug 202089
| Summary: | transparent hugepage not compatible with madvise(MADV_DONTNEED) | | |
|---|---|---|---|
| Product: | Memory Management | Reporter: | jianpanlanyue (jianpanlanyue) |
| Component: | Other | Assignee: | Andrew Morton (akpm) |
| Status: | NEW | | |
| Severity: | high | CC: | jianpanlanyue |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 4.4.0-117 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
jianpanlanyue 2018-12-29 09:00:22 UTC

environment:
1. kernel 4.4.0 on x86_64
2. echo always > /sys/kernel/mm/transparent_hugepage/enable
   echo always > /sys/kernel/mm/transparent_hugepage/defrag
   echo 2000000 > /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
   (makes khugepaged collapse pages faster, to reproduce the problem sooner)

problem:
1. Use mmap() to allocate 4096 bytes 1024*512 times (4096*1024*512 = 2G).
2. Use madvise(MADV_DONTNEED) to free most of the above pages, but keep a few (via `if (i%33==0) continue;`). The process's physical memory comes down at first, but after a few seconds it rises back to 2G and never comes down again.
3. If I delete this condition (`if (i%33==0) continue;`), or disable transparent_hugepage by setting 'enable' and 'defrag' to never, everything works and the physical memory comes down as expected.

It seems transparent_hugepage has problems with non-contiguous madvise(MADV_DONTNEED).

I find that kernel versions prior to 4.4.0 also have this problem.

Andrew Morton:
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface.)

On Sat, 29 Dec 2018 09:00:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=202089
> [...]
Below is the test code (from the original report):

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <errno.h>
#include <assert.h>

#define PAGE_SIZE 4096
#define PAGE_COUNT (1024*512)

int main(void)
{
    void **table = malloc(sizeof(void *) * PAGE_COUNT);
    assert(table != NULL);
    printf("begin mmap...\n");

    for (int i = 0; i < PAGE_COUNT; i++) {
        table[i] = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        assert(table[i] != MAP_FAILED);
        memset(table[i], 1, PAGE_SIZE);
    }

    printf("mmap ok, press enter to free most of them\n");
    getchar();

    /* unexpected behaviour: after most pages are freed, THP makes
       the RSS rise back to 2G */
    for (int i = 0; i < PAGE_COUNT; i++) {
        if (i % 33 == 0)
            continue;
        if (madvise(table[i], PAGE_SIZE, MADV_DONTNEED) != 0)
            printf("madvise error, errno: %d\n", errno);
    }

    printf("madvise finished\n");
    free(table);
    getchar();
    getchar();
}
```

Kirill A. Shutemov:
On Sat, Dec 29, 2018 at 12:53:16PM -0800, Andrew Morton wrote:
> [...]
> > It seems like transparent_hugepage has problems with non-contiguous
> > madvise(MADV_DONTNEED).
It's expected behaviour.
MADV_DONTNEED doesn't guarantee that the range will not be repopulated
(with or without direct action on the application's behalf). It's just
a hint for the kernel.
For sparse mappings, consider using MADV_NOHUGEPAGE.
jianpanlanyue:
(In reply to Kirill A. Shutemov from comment #3)
> It's expected behaviour.
>
> MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> (with or without direct action on the application's behalf). It's just
> a hint for the kernel.
>
> For sparse mappings, consider using MADV_NOHUGEPAGE.

Thanks for your suggestion (MADV_NOHUGEPAGE), but I find this problem never appears after kernel 4.15.0, so it seems it has already been fixed (or optimized). I looked through the git log; there are some commits matching "thp.*MADV_DONTNEED", but I'm not sure which commit resolves this. I just want to know what was changed to fix this problem. Thanks.

Michal Hocko:
On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> On Sat, Dec 29, 2018 at 12:53:16PM -0800, Andrew Morton wrote:
> [...]
>
> It's expected behaviour.
>
> MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> (with or without direct action on the application's behalf). It's just
> a hint for the kernel.

I agree with Kirill here, but I would be interested in the underlying
usecase that triggered this. The test case is clearly artificial, but is
any userspace actually relying on MADV_DONTNEED reducing the RSS
long-term?

> For sparse mappings, consider using MADV_NOHUGEPAGE.

Yes, or use a high threshold for khugepaged for collapsing.

Matthew Wilcox:
On Thu, Jan 03, 2019 at 10:44:22AM +0100, Michal Hocko wrote:
> On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> > On Sat, Dec 29, 2018 at 12:53:16PM -0800, Andrew Morton wrote:
> > > [...]
> > It's expected behaviour.
> >
> > MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> > (with or without direct action on application behalf). It's just a hint
> > for the kernel.
>
> I agree with Kirill here but I would be interested in the underlying
> usecase that triggered this. The test case is clearly artificial but is
> any userspace actually relying on MADV_DONTNEED reducing the rss
> longterm?
>
> > For sparse mappings, consider using MADV_NOHUGEPAGE.
Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE? It'd prevent
coalescing elsewhere in the VMA, so that might negatively affect other
programs.
Michal Hocko:
On Thu 03-01-19 06:35:02, Matthew Wilcox wrote:
> On Thu, Jan 03, 2019 at 10:44:22AM +0100, Michal Hocko wrote:
> > [...]
>
> Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE? It'd prevent
> coalescing elsewhere in the VMA, so that might negatively affect other
> programs.
I really do not think this is a good idea. MADV_DONTNEED doesn't really
imply anything about the future RSS. It only wipes out the current content.
In other words, do we want to stop fault-around/readahead or any other
optimistic faulting on MADV_DONTNEED?
Kirill A. Shutemov:
On Thu, Jan 03, 2019 at 06:35:02AM -0800, Matthew Wilcox wrote:
> [...]
>
> Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE? It'd prevent
> coalescing elsewhere in the VMA, so that might negatively affect other
> programs.
MADV_NOHUGEPAGE often creates a new VMA (or two) and that has performance
implications. And creating a new VMA would require down_write(mmap_sem),
which is a no-go for MADV_DONTNEED.
jianpanlanyue:
> I agree with Kirill here but I would be interested in the underlying
> usecase that triggered this. The test case is clearly artificial but is
> any userspace actually relying on MADV_DONTNEED reducing the rss
> longterm?

Yes, userspace memory pools and some languages' GC (garbage collection) modules often use MADV_DONTNEED instead of free() (or munmap()) to improve performance, e.g. tcmalloc, jemalloc and Go. Below are the problems they encountered, the same as mine:

jemalloc: https://github.com/jemalloc/jemalloc/issues/1127
tcmalloc: https://github.com/gperftools/gperftools/issues/990
golang: https://bugzilla.kernel.org/show_bug.cgi?id=93111 (https://github.com/golang/go/issues/8832)

Strangely, this problem doesn't exist after kernel 4.15.0. Has it already been fixed?