environment:
1. kernel 4.4.0 on x86_64
2. echo always > /sys/kernel/mm/transparent_hugepage/enabled
   echo always > /sys/kernel/mm/transparent_hugepage/defrag
   echo 2000000 > /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
   (makes khugepaged scan more pages per pass, so the problem reproduces faster)

problem:
1. Use mmap() to allocate 4096 bytes 1024*512 times (4096*1024*512 = 2G).
2. Use madvise(MADV_DONTNEED) to free most of the above pages, but keep a few of them (via if (i%33==0) continue;). The process's physical memory first comes down, but after a few seconds it rises back to 2G again and never comes down.
3. If I delete this condition (if (i%33==0) continue;) or disable transparent_hugepage by setting 'enabled' and 'defrag' to never, everything works and the physical memory comes down as expected.

It seems like transparent_hugepage has problems with non-contiguous madvise(MADV_DONTNEED).

Below is the test code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <errno.h>
#include <assert.h>

#define PAGE_SIZE 4096
#define PAGE_COUNT (1024*512)   /* 512K pages * 4 KiB = 2 GiB */

int main(void)
{
    void **table = malloc(sizeof(void *) * PAGE_COUNT);
    assert(table != NULL);
    printf("begin mmap...\n");

    /* map and touch PAGE_COUNT separate one-page anonymous mappings */
    for (int i = 0; i < PAGE_COUNT; i++) {
        table[i] = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
                        MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        assert(table[i] != MAP_FAILED);
        memset(table[i], 1, PAGE_SIZE);
    }

    printf("mmap ok, press enter to free most of them\n");
    getchar();

    /* unexpected behaviour: after most pages are freed, THP makes the
       RSS rise back to 2G */
    for (int i = 0; i < PAGE_COUNT; i++) {
        if (i % 33 == 0)
            continue;   /* keep every 33rd page */
        if (madvise(table[i], PAGE_SIZE, MADV_DONTNEED) != 0)
            printf("madvise error, errno:%d\n", errno);
    }

    printf("madvise finish\n");
    free(table);
    getchar();
    getchar();
    return 0;
}
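Why the RSS climbs back to almost exactly 2G can be checked with a little arithmetic. Below is a minimal C sketch under stated assumptions: 4 KiB base pages, 512 PTEs per 2 MiB PMD range, the one-page mappings laid out contiguously and 2 MiB-aligned, and a simplified version of khugepaged's rule that a range is collapsed only when at most max_ptes_none of its PTEs are empty (511 by default with 4 KiB pages). kept_in_range() and collapsible_ranges() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, so one 2 MiB PMD range spans 512 PTEs. */
#define PTES_PER_PMD 512

/* Pages the test program keeps in [first, first + n): the indices with
 * i % keep_every == 0. keep_every == 0 models deleting the
 * "if (i%33==0) continue;" line, i.e. nothing is kept. */
static size_t kept_in_range(size_t first, size_t n, size_t keep_every)
{
    size_t kept = 0;

    if (keep_every == 0)
        return 0;
    for (size_t i = first; i < first + n; i++)
        if (i % keep_every == 0)
            kept++;
    return kept;
}

/* Simplified model of khugepaged's check: a 2 MiB range is collapsible
 * only if its number of empty PTEs does not exceed max_ptes_none. */
static size_t collapsible_ranges(size_t total_pages, size_t keep_every,
                                 size_t max_ptes_none)
{
    size_t ranges = 0;

    for (size_t first = 0; first + PTES_PER_PMD <= total_pages;
         first += PTES_PER_PMD) {
        size_t none = PTES_PER_PMD -
                      kept_in_range(first, PTES_PER_PMD, keep_every);
        if (none <= max_ptes_none)
            ranges++;
    }
    return ranges;
}
```

With keep_every = 33, the program keeps 15888 of 524288 pages (about 62 MiB), but every 2 MiB range still holds 15 or 16 of them, so all 1024 ranges pass the default check and collapsing them brings the RSS back to about 1024 × 2 MiB = 2 GiB. With nothing kept, every range has 512 empty PTEs and none qualifies, which matches item 3 of the report.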
I find that kernel versions prior to 4.4.0 also have this problem.
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Sat, 29 Dec 2018 09:00:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=202089
>
>             Bug ID: 202089
>            Summary: transparent hugepage not compatable with
>                     madvise(MADV_DONTNEED)
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.4.0-117
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: jianpanlanyue@163.com
>         Regression: No
>
> [...]
On Sat, Dec 29, 2018 at 12:53:16PM -0800, Andrew Morton wrote:
> On Sat, 29 Dec 2018 09:00:22 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> > It seems like transparent_hugepage has problems with non-contiguous
> > madvise(MADV_DONTNEED).

It's expected behaviour.

MADV_DONTNEED doesn't guarantee that the range will not be repopulated (with or without direct action on the application's behalf). It's just a hint for the kernel.

For sparse mappings, consider using MADV_NOHUGEPAGE.
(In reply to Kirill A. Shutemov from comment #3)
> It's expected behaviour.
>
> MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> (with or without direct action on application behalf). It's just a hint
> for the kernel.
>
> For sparse mappings, consider using MADV_NOHUGEPAGE.

Thanks for your suggestion (MADV_NOHUGEPAGE), but I find that this problem no longer appears after kernel 4.15.0, so it seems to have been fixed (or optimized) already. I looked through the git log, and although there are some commits matching "thp.*MADV_DONTNEED", I'm not sure which one does this. I just want to know what was changed to resolve this problem. Thanks.
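Kirill's MADV_NOHUGEPAGE suggestion, applied to the reporter's access pattern, might look roughly like the minimal sketch below. It assumes one large mapping instead of 512K one-page mmap() calls (adjacent anonymous mappings with identical flags are usually merged into one VMA anyway), marks the range MADV_NOHUGEPAGE before first touch, and uses mincore() only as a probe to count resident pages. The helper name sparse_free_nohuge() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map npages in one mmap(), opt the range out of
 * THP before touching it, free every page except each keep_every-th one
 * (keep_every must be > 0), and return how many pages mincore() still
 * reports as resident, or -1 on error. */
static long sparse_free_nohuge(size_t npages, size_t keep_every)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * pagesz;
    long resident = 0;

    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    /* EINVAL most likely means THP is not built into this kernel, in
     * which case there is nothing to opt out of. */
    if (madvise(base, len, MADV_NOHUGEPAGE) != 0 && errno != EINVAL) {
        munmap(base, len);
        return -1;
    }

    memset(base, 1, len);                /* populate every page */

    for (size_t i = 0; i < npages; i++) {
        if (i % keep_every == 0)
            continue;                    /* keep a sparse subset */
        if (madvise(base + i * pagesz, pagesz, MADV_DONTNEED) != 0) {
            munmap(base, len);
            return -1;
        }
    }

    unsigned char *vec = malloc(npages);
    if (vec == NULL || mincore(base, len, vec) != 0) {
        free(vec);
        munmap(base, len);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;          /* low bit = page is resident */

    free(vec);
    munmap(base, len);
    return resident;
}
```

With npages = 1024 and keep_every = 33, only the 32 kept pages should remain resident, and because the range is marked MADV_NOHUGEPAGE, khugepaged should not collapse the holes back in. The trade-off is that this range never gets huge pages, even where it is densely used.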
On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> It's expected behaviour.
>
> MADV_DONTNEED doesn't guarantee that the range will not be repopulated
> (with or without direct action on application behalf). It's just a hint
> for the kernel.

I agree with Kirill here, but I would be interested in the underlying use case that triggered this. The test case is clearly artificial, but is any userspace actually relying on MADV_DONTNEED reducing the RSS long-term?

> For sparse mappings, consider using MADV_NOHUGEPAGE.

Yes, or use a high threshold for khugepaged for collapsing.
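Michal's second option refers to khugepaged's tunables. A minimal sketch, assuming the threshold he means is khugepaged/max_ptes_none (the maximum number of empty PTEs a 2 MiB range may contain and still be collapsed; 511 by default with 4 KiB pages) and assuming 512 PTEs per range. read_sysfs_long() and min_populated_for_collapse() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, so one 2 MiB range spans 512 PTEs. */
#define PTES_PER_PMD 512L
#define MAX_PTES_NONE_PATH \
    "/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none"

/* Hypothetical helper: read one integer from a sysfs file.
 * Returns -1 if the file is missing or does not parse. */
static long read_sysfs_long(const char *path)
{
    long value;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Populated pages a range needs before khugepaged will collapse it:
 * a smaller max_ptes_none means a higher threshold. */
static long min_populated_for_collapse(long max_ptes_none)
{
    return PTES_PER_PMD - max_ptes_none;
}
```

Lowering the value (as root, with an echo into the same file, like the pages_to_scan setting in the report) raises the bar: with max_ptes_none = 64, a range needs at least 448 populated pages, which the reporter's 15 or 16 per range never reach.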
On Thu, Jan 03, 2019 at 10:44:22AM +0100, Michal Hocko wrote:
> On Sun 30-12-18 01:48:43, Kirill A. Shutemov wrote:
> > For sparse mappings, consider using MADV_NOHUGEPAGE.

Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE? It'd prevent coalescing elsewhere in the VMA, so that might negatively affect other programs.
On Thu 03-01-19 06:35:02, Matthew Wilcox wrote:
> Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE? It'd prevent
> coalescing elsewhere in the VMA, so that might negatively affect other
> programs.

I really do not think this is a good idea. MADV_DONTNEED doesn't really imply anything about future RSS; it only wipes out the current content. In other words, do we want to stop fault-around, readahead or any other optimistic faulting on MADV_DONTNEED?
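Michal's point, that MADV_DONTNEED only drops the current content and says nothing about future residency, can be observed on a single private anonymous page. The sketch below (hypothetical helper names; Linux-specific, with mincore() as the residency probe) checks that the page disappears after MADV_DONTNEED, reads back as zero, and is resident again as soon as it is written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Whether the single page at addr is resident, according to mincore(). */
static bool page_resident(void *addr)
{
    unsigned char vec = 0;

    if (mincore(addr, (size_t)sysconf(_SC_PAGESIZE), &vec) != 0)
        return false;
    return vec & 1;
}

/* Hypothetical demo. Returns 0 if every step behaves as described,
 * otherwise the number of the first step that did not. */
static int dontneed_then_touch(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int failed_step = 0;

    if (base == MAP_FAILED)
        return 1;

    base[0] = 0x5a;                              /* fault the page in */
    if (!page_resident(base))
        failed_step = 2;
    else if (madvise(base, pagesz, MADV_DONTNEED) != 0)
        failed_step = 3;
    else if (page_resident(base))                /* page has been dropped */
        failed_step = 4;
    else if (((volatile char *)base)[0] != 0)    /* zero-fill on demand */
        failed_step = 5;
    else {
        base[0] = 0x5a;                          /* touching it again... */
        if (!page_resident(base))                /* ...repopulates it */
            failed_step = 6;
    }

    munmap(base, pagesz);
    return failed_step;
}
```

The residency check after MADV_DONTNEED comes before the read on purpose: a read fault on an empty anonymous page may map the shared zero page, which mincore() would also report as resident.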
On Thu, Jan 03, 2019 at 06:35:02AM -0800, Matthew Wilcox wrote:
> Should the MADV_DONTNEED hint imply MADV_NOHUGEPAGE? It'd prevent
> coalescing elsewhere in the VMA, so that might negatively affect other
> programs.

MADV_NOHUGEPAGE often creates a new VMA (or two), and that has performance implications. Creating a new VMA would also require down_write(mmap_sem), which is a no-go for MADV_DONTNEED.
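The VMA split Kirill mentions can be seen from user space. A minimal sketch, assuming Linux's /proc/self/maps format and a kernel built with THP: it applies MADV_NOHUGEPAGE to the middle page of a three-page mapping and counts the VMAs covering the mapping before and after. vmas_overlapping() and nohugepage_split_demo() are hypothetical helpers; the mmap_sem cost itself is not visible this way.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the VMAs listed in /proc/self/maps that overlap [lo, hi). */
static int vmas_overlapping(unsigned long lo, unsigned long hi)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char *line = NULL;
    size_t cap = 0;
    int count = 0;

    if (f == NULL)
        return -1;
    while (getline(&line, &cap, f) != -1) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            start < hi && end > lo)
            count++;
    }
    free(line);
    fclose(f);
    return count;
}

/* Hypothetical demo: the THP opt-out is a per-VMA flag, so marking only
 * the middle page should leave three VMAs. Returns the VMA count after
 * the madvise(), 0 if THP is not built in (EINVAL), -1 on other errors. */
static int nohugepage_split_demo(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 3 * pagesz;
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int result;

    if (base == MAP_FAILED)
        return -1;

    unsigned long lo = (unsigned long)base;
    unsigned long hi = lo + len;

    if (vmas_overlapping(lo, hi) != 1)            /* one VMA to start */
        result = -1;
    else if (madvise(base + pagesz, pagesz, MADV_NOHUGEPAGE) != 0)
        result = (errno == EINVAL) ? 0 : -1;
    else
        result = vmas_overlapping(lo, hi);        /* expect 3 */

    munmap(base, len);
    return result;
}
```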
> I agree with Kirill here, but I would be interested in the underlying
> use case that triggered this. The test case is clearly artificial, but is
> any userspace actually relying on MADV_DONTNEED reducing the RSS
> long-term?

Yes. User-space memory pools and some languages' garbage collectors often use MADV_DONTNEED instead of free() (or munmap()) to improve performance, e.g. tcmalloc, jemalloc and Go. Below are the problems they encountered, which are the same as mine:

jemalloc: https://github.com/jemalloc/jemalloc/issues/1127
tcmalloc: https://github.com/gperftools/gperftools/issues/990
golang: https://bugzilla.kernel.org/show_bug.cgi?id=93111 (https://github.com/golang/go/issues/8832)

Strangely, this problem doesn't exist after kernel 4.15.0. Has it already been fixed?
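The allocator pattern described above, returning free pages to the kernel with MADV_DONTNEED while keeping the address range mapped for reuse, can be sketched roughly as follows. This is a hypothetical helper for illustration, not code from tcmalloc, jemalloc or the Go runtime; it coalesces each run of adjacent free pages into one madvise() call. Whenever a 2 MiB range still holds live pages, the purged holes remain candidates for khugepaged collapse, just as in the report.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical allocator-style purge: given a per-page "in use" map,
 * release every maximal run of free pages with a single MADV_DONTNEED
 * call; the mapping itself stays in place. Returns the number of
 * madvise() calls made, or -1 on error. */
static long purge_free_runs(char *base, const unsigned char *used,
                            size_t npages, size_t pagesz)
{
    long calls = 0;
    size_t i = 0;

    while (i < npages) {
        if (used[i]) {
            i++;
            continue;
        }
        size_t run_start = i;            /* first free page of the run */
        while (i < npages && !used[i])
            i++;
        if (madvise(base + run_start * pagesz, (i - run_start) * pagesz,
                    MADV_DONTNEED) != 0)
            return -1;
        calls++;
    }
    return calls;
}
```

With 1024 pages and every 33rd page in use, the 32 live pages leave 31 free runs, so the purge needs 31 madvise() calls instead of the 992 per-page calls in the test program; the purged pages read back as zero, while the live ones keep their contents.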
?