Bug 33682 - mprotect got stuck when THP is "always" enabled
Summary: mprotect got stuck when THP is "always" enabled
Status: CLOSED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-19 05:25 UTC by Caspar Zhang
Modified: 2012-06-13 15:05 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.39-rc3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
mprotect test program (889 bytes, text/plain)
2011-04-19 05:25 UTC, Caspar Zhang
Details

Description Caspar Zhang 2011-04-19 05:25:38 UTC
Created attachment 54662 [details]
mprotect test program 

Description of problem:

see attached test program. This program can be run like this:

./mprotect <times> <length> <flag>

times: how many times the mprotect() function execute;
length: same as the "length" option in mprotect() function;
flag: when flag is set to 1, the program would touch every page within the
range [addr, addr+length-1] before it calls mprotect().

to reproduce the stuck, execute: ./mprotect 50 128 1

Note that the stuck only happens when the following conditions are all satisfied:

flag == 1, i.e. touch page before mprotect()
proto = PROT_NONE in mprotect()
THP is enabled with "always" option

Version-Release number of selected component (if applicable):

Linux version 2.6.39-rc3 (caspar@caspar-gentoo) (gcc version 4.5.2 (Gentoo 4.5.2 p1.0, pie-0.4.5) ) #1 SMP Tue Apr 19 12:32:20 CST 2011

How reproducible:
very often

Actual results:
test program got stuck when touching pages + THP always enabled:

caspar-gentoo tmp # echo always > /sys/kernel/mm/transparent_hugepage/enabled 
caspar-gentoo tmp # ./mprotect 50 128 1
^C <- stuck
caspar-gentoo tmp # ./mprotect 50 128 1
^C
caspar-gentoo tmp # ./mprotect 50 128 1
^C
caspar-gentoo tmp # ./mprotect 50 128 1
^C
caspar-gentoo tmp # echo madvise > /sys/kernel/mm/transparent_hugepage/enabled 
caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # echo never > /sys/kernel/mm/transparent_hugepage/enabled 
caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1
done caspar-gentoo tmp # ./mprotect 50 128 1

Expected results:
test program exit normally

Additional info:

This reproducer was similar to a test program in upstream test suite: libMicro
(http://hub.opensolaris.org/bin/view/Project+libmicro/)

strace ouput: 

# strace ./mprotect 50 128 1
execve("./mprotect", ["./mprotect", "50", "128", "1"], [/* 35 vars */]) = 0
brk(0)                                  = 0x16ad000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffa0c266000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=163815, ...}) = 0
mmap(NULL, 163815, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ffa0c23e000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\357\1\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1608912, ...}) = 0
mmap(NULL, 3718152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffa0bcbc000
mprotect(0x7ffa0be3f000, 2093056, PROT_NONE) = 0
mmap(0x7ffa0c03e000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x182000) = 0x7ffa0c03e000
mmap(0x7ffa0c043000, 19464, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffa0c043000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffa0c23d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffa0c23c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffa0c23b000
arch_prctl(ARCH_SET_FS, 0x7ffa0c23c700) = 0
mprotect(0x7ffa0c03e000, 16384, PROT_READ) = 0
mprotect(0x600000, 4096, PROT_READ)     = 0
mprotect(0x7ffa0c267000, 4096, PROT_READ) = 0
munmap(0x7ffa0c23e000, 163815)          = 0
open("/dev/zero", O_RDWR)               = 3
mmap(NULL, 6553600, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7ffa0b67c000
mprotect(0x7ffa0b67c000, 131072, PROT_NONE) = 0
mprotect(0x7ffa0b69c000, 131072, PROT_NONE) = 0
mprotect(0x7ffa0b6bc000, 131072, PROT_NONE) = 0
mprotect(0x7ffa0b6dc000, 131072, PROT_NONE) = 0
mprotect(0x7ffa0b6fc000, 131072, PROT_NONE) = 0
mprotect(0x7ffa0b71c000, 131072, PROT_NONE) = 0
mprotect(0x7ffa0b73c000, 131072, PROT_NONE) = 0
mprotect(0x7ffa0b75c000, 131072, PROT_NONE) = 0
<repeated random times, snip>
Comment 1 Andrew Morton 2011-04-19 06:03:22 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 19 Apr 2011 05:25:41 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=33682
> 
>            Summary: mprotect got stuck when THP is "always" enabled
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 2.6.38-r1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: bugs@casparzhang.com
>         Regression: No
> 
> 
> Created an attachment (id=54662)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=54662)
> mprotect test program 
> 
> Description of problem:
> 
> see attached test program. This program can be run like this:
> 
> ./mprotect <times> <length> <flag>
> 
> times: how many times the mprotect() function execute;
> length: same as the "length" option in mprotect() function;
> flag: when flag is set to 1, the program would touch every page within the
> range [addr, addr+length-1] before it calls mprotect().
> 
> to reproduce the stuck, execute: ./mprotect 50 128 1
> 
> Note that the stuck only happens when the following conditions are all
> satisfied:
> 
> flag == 1, i.e. touch page before mprotect()
> proto = PROT_NONE in mprotect()
> THP is enabled with "always" option
> 
> Version-Release number of selected component (if applicable):
> 
> Linux version 2.6.39-rc3 (caspar@caspar-gentoo) (gcc version 4.5.2 (Gentoo
> 4.5.2 p1.0, pie-0.4.5) ) #1 SMP Tue Apr 19 12:32:20 CST 2011
> 
> How reproducible:
> very often
> 
> Actual results:
> test program got stuck when touching pages + THP always enabled:
> 
> caspar-gentoo tmp # echo always > /sys/kernel/mm/transparent_hugepage/enabled 
> caspar-gentoo tmp # ./mprotect 50 128 1
> ^C <- stuck
> caspar-gentoo tmp # ./mprotect 50 128 1
> ^C
> caspar-gentoo tmp # ./mprotect 50 128 1
> ^C
> caspar-gentoo tmp # ./mprotect 50 128 1
> ^C
> caspar-gentoo tmp # echo madvise >
> /sys/kernel/mm/transparent_hugepage/enabled 
> caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # echo never >
> /sys/kernel/mm/transparent_hugepage/enabled 
> caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> done caspar-gentoo tmp # ./mprotect 50 128 1
> 
> Expected results:
> test program exit normally
> 
> Additional info:
> 
> This reproducer was similar to a test program in upstream test suite:
> libMicro
> (http://hub.opensolaris.org/bin/view/Project+libmicro/)
> 
> strace ouput: 
> 
> # strace ./mprotect 50 128 1
> execve("./mprotect", ["./mprotect", "50", "128", "1"], [/* 35 vars */]) = 0
> brk(0)                                  = 0x16ad000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7ffa0c266000
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or
> directory)
> open("/etc/ld.so.cache", O_RDONLY)      = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=163815, ...}) = 0
> mmap(NULL, 163815, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ffa0c23e000
> close(3)                                = 0
> open("/lib64/libc.so.6", O_RDONLY)      = 3
> read(3,
> "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\357\1\0\0\0\0\0"..., 832)
> =
> 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=1608912, ...}) = 0
> mmap(NULL, 3718152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) =
> 0x7ffa0bcbc000
> mprotect(0x7ffa0be3f000, 2093056, PROT_NONE) = 0
> mmap(0x7ffa0c03e000, 20480, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x182000) = 0x7ffa0c03e000
> mmap(0x7ffa0c043000, 19464, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffa0c043000
> close(3)                                = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7ffa0c23d000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7ffa0c23c000
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
> 0x7ffa0c23b000
> arch_prctl(ARCH_SET_FS, 0x7ffa0c23c700) = 0
> mprotect(0x7ffa0c03e000, 16384, PROT_READ) = 0
> mprotect(0x600000, 4096, PROT_READ)     = 0
> mprotect(0x7ffa0c267000, 4096, PROT_READ) = 0
> munmap(0x7ffa0c23e000, 163815)          = 0
> open("/dev/zero", O_RDWR)               = 3
> mmap(NULL, 6553600, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7ffa0b67c000
> mprotect(0x7ffa0b67c000, 131072, PROT_NONE) = 0
> mprotect(0x7ffa0b69c000, 131072, PROT_NONE) = 0
> mprotect(0x7ffa0b6bc000, 131072, PROT_NONE) = 0
> mprotect(0x7ffa0b6dc000, 131072, PROT_NONE) = 0
> mprotect(0x7ffa0b6fc000, 131072, PROT_NONE) = 0
> mprotect(0x7ffa0b71c000, 131072, PROT_NONE) = 0
> mprotect(0x7ffa0b73c000, 131072, PROT_NONE) = 0
> mprotect(0x7ffa0b75c000, 131072, PROT_NONE) = 0
> <repeated random times, snip>
>
Comment 2 Anonymous Emailer 2011-04-19 11:26:32 UTC
Reply-To: aarcange@redhat.com

On Mon, Apr 18, 2011 at 11:06:51PM -0700, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).

Ok.

/dev/zero mapped with MAP_PRIVATE is special as it doesn't set vm_ops
but is vm_file backed. The page fault code only checks the vm_ops
being NULL to decide if to use THP or not (it doesn't check
vm_file). But then the huge_memory.c code also checks vm_file (like in
vma_adjust_trans_huge). So I think what's going on in the special case
of /dev/zero MAP_PRIVATE is that we could have THP enabled on a
filebacked vma (but no vm_ops backed) that doesn't fully enable all
operations necessary to be safe like vma_adjust_trans_huge because
vm_file being set. To let /dev/zero to use THP safely I guess we can
remove the vm_file checks and only use the vm_ops checks across
huge_memory.c/huge_mm.h or to add add "!vma->vm_ops && !vma->vm_file"
to the page fault so THP isn't used. The former is more risky than
adding the vm_file check in the page fault path but it'd match more
closely what do_anonymous_page does, so I'm thinking what are the
implications of removing vm_file across the huge_memory.c/huge_mm.h. I
didn't manage to reboot into a fixed kernel to test yet... I'll let
you know the results after some testing.
Comment 3 Anonymous Emailer 2011-04-19 13:51:49 UTC
Reply-To: aarcange@redhat.com

Hi,

this should fix bug
https://bugzilla.kernel.org/show_bug.cgi?id=33682 .

====
Subject: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups

From: Andrea Arcangeli <aarcange@redhat.com>

The huge_memory.c THP page fault was allowed to run if vm_ops was null (which
would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap wouldn't setup a
special vma->vm_ops and it would fallback to regular anonymous memory) but
other THP logics weren't fully activated for vmas with vm_file not NULL
(/dev/zero has a not NULL vma->vm_file).

So this removes the vm_file checks so that /dev/zero also can safely
use THP (the other albeit safer approach to fix this bug would have
been to prevent the THP initial page fault to run if vm_file was set).

After removing the vm_file checks, this also makes huge_memory.c
stricter in khugepaged for the DEBUG_VM=y case. It doesn't replace the
vm_file check with a is_pfn_mapping check (but it keeps checking for
VM_PFNMAP under VM_BUG_ON) because for a is_cow_mapping() mapping
VM_PFNMAP should only be allowed to exist before the first page fault,
and in turn when vma->anon_vma is null (so preventing khugepaged
registration). So I tend to think the previous comment saying if
vm_file was set, VM_PFNMAP might have been set and we could still be
registered in khugepaged (despite anon_vma was not NULL to be
registered in khugepaged) was too paranoid. The is_linear_pfn_mapping
check is also I think superfluous (as described by comment) but under
DEBUG_VM it is safe to stay.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index df29c8f..8847c8c 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -117,7 +117,7 @@ static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
 					 unsigned long end,
 					 long adjust_next)
 {
-	if (!vma->anon_vma || vma->vm_ops || vma->vm_file)
+	if (!vma->anon_vma || vma->vm_ops)
 		return;
 	__vma_adjust_trans_huge(vma, start, end, adjust_next);
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 692dbae..2348db2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -137,7 +137,8 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_RandomReadHint(v)		((v)->vm_flags & VM_RAND_READ)
 
 /*
- * special vmas that are non-mergable, non-mlock()able
+ * Special vmas that are non-mergable, non-mlock()able.
+ * Note: mm/huge_memory.c VM_NO_THP depends on this definition.
  */
 #define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 470dcda..83326ad 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1408,6 +1408,9 @@ out:
 	return ret;
 }
 
+#define VM_NO_THP (VM_SPECIAL|VM_INSERTPAGE|VM_MIXEDMAP|VM_SAO| \
+		   VM_HUGETLB|VM_SHARED|VM_MAYSHARE)
+
 int hugepage_madvise(struct vm_area_struct *vma,
 		     unsigned long *vm_flags, int advice)
 {
@@ -1416,11 +1419,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		/*
 		 * Be somewhat over-protective like KSM for now!
 		 */
-		if (*vm_flags & (VM_HUGEPAGE |
-				 VM_SHARED   | VM_MAYSHARE   |
-				 VM_PFNMAP   | VM_IO      | VM_DONTEXPAND |
-				 VM_RESERVED | VM_HUGETLB | VM_INSERTPAGE |
-				 VM_MIXEDMAP | VM_SAO))
+		if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
 			return -EINVAL;
 		*vm_flags &= ~VM_NOHUGEPAGE;
 		*vm_flags |= VM_HUGEPAGE;
@@ -1436,11 +1435,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
 		/*
 		 * Be somewhat over-protective like KSM for now!
 		 */
-		if (*vm_flags & (VM_NOHUGEPAGE |
-				 VM_SHARED   | VM_MAYSHARE   |
-				 VM_PFNMAP   | VM_IO      | VM_DONTEXPAND |
-				 VM_RESERVED | VM_HUGETLB | VM_INSERTPAGE |
-				 VM_MIXEDMAP | VM_SAO))
+		if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
 			return -EINVAL;
 		*vm_flags &= ~VM_HUGEPAGE;
 		*vm_flags |= VM_NOHUGEPAGE;
@@ -1574,10 +1569,14 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
 		 * page fault if needed.
 		 */
 		return 0;
-	if (vma->vm_file || vma->vm_ops)
+	if (vma->vm_ops)
 		/* khugepaged not yet working on file or special mappings */
 		return 0;
-	VM_BUG_ON(is_linear_pfn_mapping(vma) || is_pfn_mapping(vma));
+	/*
+	 * If is_pfn_mapping() is true is_learn_pfn_mapping() must be
+	 * true too, verify it here.
+	 */
+	VM_BUG_ON(is_linear_pfn_mapping(vma) || vma->vm_flags & VM_NO_THP);
 	hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
 	hend = vma->vm_end & HPAGE_PMD_MASK;
 	if (hstart < hend)
@@ -1828,12 +1827,15 @@ static void collapse_huge_page(struct mm_struct *mm,
 	    (vma->vm_flags & VM_NOHUGEPAGE))
 		goto out;
 
-	/* VM_PFNMAP vmas may have vm_ops null but vm_file set */
-	if (!vma->anon_vma || vma->vm_ops || vma->vm_file)
+	if (!vma->anon_vma || vma->vm_ops)
 		goto out;
 	if (is_vma_temporary_stack(vma))
 		goto out;
-	VM_BUG_ON(is_linear_pfn_mapping(vma) || is_pfn_mapping(vma));
+	/*
+	 * If is_pfn_mapping() is true is_learn_pfn_mapping() must be
+	 * true too, verify it here.
+	 */
+	VM_BUG_ON(is_linear_pfn_mapping(vma) || vma->vm_flags & VM_NO_THP);
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -2066,13 +2068,16 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 			progress++;
 			continue;
 		}
-		/* VM_PFNMAP vmas may have vm_ops null but vm_file set */
-		if (!vma->anon_vma || vma->vm_ops || vma->vm_file)
+		if (!vma->anon_vma || vma->vm_ops)
 			goto skip;
 		if (is_vma_temporary_stack(vma))
 			goto skip;
-
-		VM_BUG_ON(is_linear_pfn_mapping(vma) || is_pfn_mapping(vma));
+		/*
+		 * If is_pfn_mapping() is true is_learn_pfn_mapping()
+		 * must be true too, verify it here.
+		 */
+		VM_BUG_ON(is_linear_pfn_mapping(vma) ||
+			  vma->vm_flags & VM_NO_THP);
 
 		hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
 		hend = vma->vm_end & HPAGE_PMD_MASK;
Comment 4 Anonymous Emailer 2011-04-19 20:20:09 UTC
Reply-To: riel@redhat.com

On 04/19/2011 09:51 AM, Andrea Arcangeli wrote:
> Hi,
>
> this should fix bug
> https://bugzilla.kernel.org/show_bug.cgi?id=33682 .
>
> ====
> Subject: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups
>
> From: Andrea Arcangeli<aarcange@redhat.com>
>
> The huge_memory.c THP page fault was allowed to run if vm_ops was null (which
> would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap wouldn't setup a
> special vma->vm_ops and it would fallback to regular anonymous memory) but
> other THP logics weren't fully activated for vmas with vm_file not NULL
> (/dev/zero has a not NULL vma->vm_file).

> Signed-off-by: Andrea Arcangeli<aarcange@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>
Comment 5 Anonymous Emailer 2011-04-20 10:02:12 UTC
Reply-To: mel@csn.ul.ie

On Tue, Apr 19, 2011 at 03:51:19PM +0200, Andrea Arcangeli wrote:
> Hi,
> 
> this should fix bug
> https://bugzilla.kernel.org/show_bug.cgi?id=33682 .
> 
> ====
> Subject: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups
> 
> From: Andrea Arcangeli <aarcange@redhat.com>
> 
> The huge_memory.c THP page fault was allowed to run if vm_ops was null (which
> would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap wouldn't setup a
> special vma->vm_ops and it would fallback to regular anonymous memory) but
> other THP logics weren't fully activated for vmas with vm_file not NULL
> (/dev/zero has a not NULL vma->vm_file).
> 
> So this removes the vm_file checks so that /dev/zero also can safely
> use THP (the other albeit safer approach to fix this bug would have
> been to prevent the THP initial page fault to run if vm_file was set).
> 
> After removing the vm_file checks, this also makes huge_memory.c
> stricter in khugepaged for the DEBUG_VM=y case. It doesn't replace the
> vm_file check with a is_pfn_mapping check (but it keeps checking for
> VM_PFNMAP under VM_BUG_ON) because for a is_cow_mapping() mapping
> VM_PFNMAP should only be allowed to exist before the first page fault,
> and in turn when vma->anon_vma is null (so preventing khugepaged
> registration). So I tend to think the previous comment saying if
> vm_file was set, VM_PFNMAP might have been set and we could still be
> registered in khugepaged (despite anon_vma was not NULL to be
> registered in khugepaged) was too paranoid. The is_linear_pfn_mapping
> check is also I think superfluous (as described by comment) but under
> DEBUG_VM it is safe to stay.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Mel Gorman <mel@csn.ul.ie>
Comment 6 Florian Mickler 2011-05-04 06:06:15 UTC
A patch referencing this bug report has been merged in v2.6.39-rc5-274-g609cfda:

commit 78f11a255749d09025f54d4e2df4fbcb031530e2
Author: Andrea Arcangeli <aarcange@redhat.com>
Date:   Wed Apr 27 15:26:45 2011 -0700

    mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups

Note You need to log in before you can comment on or make changes to this bug.