Bug 67651

Summary: [REGRESSION] Lots of fragmented mmaps cause gimp to fail in 3.12 after exceeding vm_max_map_count
Product: Memory Management Reporter: Bastian Hougaard (gnome)
Component: Other Assignee: Andrew Morton (akpm)
Status: REOPENED ---    
Severity: normal CC: alan, dimhen, drawoc, duncan-kernel-org, gnome, konstantin.s.serebryany, linux, nils, pluto, rossetyler
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.12.5 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Output of dmesg after reproducing this bug.
.config file for Fedora 19 3.12 kernel.

Description Bastian Hougaard 2013-12-24 13:17:47 UTC
Opening or creating large images (e.g. 15427 x 12549 pixels) in GIMP under Linux 3.12 causes GIMP to crash. The following messages are typically found in the terminal:

"(gimp:17857): GLib-ERROR **: gmem.c:110: failed to allocate 16384 bytes

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff3e5a289 in g_logv () from /usr/lib/libglib-2.0.so.0
(script-fu:17871): LibGimpBase-WARNING **: script-fu: gimp_wire_read(): error"

I am unsure whether this problem lies within gimp, glib or Linux 3.12. This issue has also been filed in the GNOME bug tracker; see it for further information on reproduction:
https://bugzilla.gnome.org/show_bug.cgi?id=719619
Comment 1 Alan 2013-12-27 12:14:34 UTC
Thanks for the report - that is a gimp crash not a kernel one.

Actually it looks like you simply ran out of memory and gimp died in protest
Comment 2 Bastian Hougaard 2013-12-27 13:39:29 UTC
GIMP developer Mike Henning says that it looks like a Linux kernel bug to him, as Gimp 2.8.8 crashes under Linux 3.12, but not under Linux 3.11 and below. See also comment 10:
https://bugzilla.gnome.org/show_bug.cgi?id=719619#c10

I am sorry if this wasn't clear from my initial bug description.
Comment 3 Alan 2013-12-27 14:13:51 UTC
The trace was absolutely clear - gimp ran out of memory and died (which is a bit rude of it)

(gimp:17857): GLib-ERROR **: gmem.c:110: failed to allocate 16384 bytes

So there are lots of things this could be but they depend upon knowing a fair bit more about your setup

Can you provide the output of "dmesg" after this occurs, and the output of "free" from both boots?
Comment 4 Michael Henning 2013-12-27 14:52:29 UTC
The reason I said this was likely a kernel bug is because it disappears when I set the environment variable MALLOC_MMAP_MAX_ to 0 before running gimp (running under otherwise identical conditions).

I was under the impression that a normal out of memory condition would result in gimp being killed by the kernel, not malloc returning null. (That message only means that malloc returned null.)

This is the output of free before running gimp on my machine:
mike@mike-newbox-arch:~$  free -h
             total       used       free     shared    buffers     cached
Mem:          7.7G       2.5G       5.3G       2.8M       2.8M       1.3G
-/+ buffers/cache:       1.1G       6.6G
Swap:           0B         0B         0B

As you can see, I have over 6 GB of memory free. Running gimp and creating a 16000x16000 image (about 2.5 GB) then crashes. Only this is appended to dmesg:
[  754.663771] traps: gimp[844] trap int3 ip:7fdb67e07289 sp:7fffcd30c7d0 error:0

(I will attach my full kernel log. Other than this line, you're only missing the boot process.)

As I stated earlier, I can then successfully create an image with the same dimensions by running gimp like this: MALLOC_MMAP_MAX_=0 gimp

Free reports this with that image open:
mike@mike-newbox-arch:~$  free -h
             total       used       free     shared    buffers     cached
Mem:          7.7G       4.9G       2.8G        10M       2.8M       1.4G
-/+ buffers/cache:       3.5G       4.2G
Swap:           0B         0B         0B

Because of this, I believe it is caused by a change in the kernel mmap implementation. (But then, you can probably diagnose it better than I can.)

I have kernel 3.12.5, Arch Linux x86_64, glibc 2.18
Comment 5 Michael Henning 2013-12-27 14:54:31 UTC
Created attachment 119741 [details]
Output of dmesg after reproducing this bug.
Comment 6 Alan 2013-12-27 15:19:07 UTC
The kernel (and userspace set policy) try and avoid getting into a situation where there is too much overcommitting of resources. It's quite likely you will see a malloc NULL return from the kernel refusing an mmap especially if fedora uses memcfg and cgroups to stop one set of tasks (eg the desktop) from taking down the entire box.

The kernel kill case is an extreme 'worst case scenario' recovery which the rest of the policy tries to avoid allowing to happen in the first place.

The fact that MALLOC_MMAP_MAX_=0 makes it go away does indeed seem suspicious.

One thing related to similar failures has been fixed:  2afc745f3e3079ab16c826be4860da2529054dd2


Hopefully Andrew has some idea
Comment 7 Duncan Grisby 2013-12-30 14:28:26 UTC
I also see this change between 3.11 and 3.12. I have seen that it is due to a change in the way maps are handled. Trying a particular sequence of gimp operations with 3.11, I see 1177 entries in /proc/<pid>/maps. Doing the same sequence with the 3.12 kernel, I see 70054 maps entries. The default vm.max_map_count (on my Fedora 19 system) is 65530, so gimp's allocations fail before reaching that 70054 mark.

Increasing vm.max_map_count allows gimp to continue working, but something has definitely changed in the way the maps are managed by the kernel.
Comment 8 Andrew Morton 2013-12-30 19:42:53 UTC
I haven't used gimp in years.  Please precisely describe the sequence of steps you're using to create this image.  Also, the kernel .config would be useful, thanks.
Comment 9 Duncan Grisby 2013-12-30 22:01:24 UTC
Exactly what I was doing would be somewhat hard for you to reproduce, but here is a simple set of similar steps that demonstrates it. Firstly, I am using 64 bit Fedora 19, updated to the latest package updates. I have the following versions of pertinent packages:

kernel-3.12.5-200.fc19.x86_64
gimp-2.8.10-4.fc19.x86_64
glib2-2.36.3-4.fc19.x86_64

My machine has 16 GB of RAM and 16 GB of swap; it has an Intel Core i7 CPU with 4 cores, 2 threads per core.

Some steps. Numbers in parentheses are the result of "wc -l maps" in /proc/<pid> for the gimp process.

1. run gimp. (1074 maps)
2. File -> New. Create an image of 9000x4000 pixels. (29122)
3. Scribble a bit in the window with the paintbrush tool. (31123)
4. In the layers dialog, duplicate the layer. (29766 -- lower)
5. In the layers dialog, right click on the top layer and select "add layer mask"; choose white (full opacity). (37060)
6. Scribble a bit in the layer mask. (37551)
7. Select Image -> Flatten Image. (46543)

More of the same sorts of actions quickly drive the maps count over the maximum and gimp dies an ungraceful death.

Doing the same sequence with Kernel 3.11, the maps count never goes above about 1200.

I am using the stock kernel from Fedora 19 updates. I didn't build it myself. I'll attach its .config shortly.
Comment 10 Duncan Grisby 2013-12-30 22:02:17 UTC
Created attachment 120301 [details]
.config file for Fedora 19 3.12 kernel.
Comment 11 Michael Henning 2013-12-31 00:08:37 UTC
If you just want the crash, an easier way to reproduce it is:
1. Open GIMP
2. Go to File->New
3. Type 16000 in for both the width and height.
4. Click OK. If another dialog appears, click OK on that too.
5. Gimp should now crash on recent kernels.

My kernel is the Arch Linux stock kernel. The config can be found here:
https://projects.archlinux.org/svntogit/packages.git/tree/linux/trunk/config.x86_64?id=17d596f51b07023cbbae5dac4c7aeb166913131a
Comment 12 Mel Gorman 2014-01-22 19:08:27 UTC
Cyrill,

Gimp is broken due to a kernel bug included in 3.12. It cannot open
large files without failing memory allocations due to exceeding
vm.max_map_count. The relevant bugzilla entries are

https://bugzilla.kernel.org/show_bug.cgi?id=67651
https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0

They include details on how to reproduce the issue. In my case, a
failure shows messages like this

	(gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes

	(file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load: gimp_wire_read(): error
	xinit: connection to X server lost

	waiting for X server to shut down
	/usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup
	/usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
	/usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup

The X-related junk is there because I was using a headless server and
xinit directly to launch gimp to reproduce the bug.

Automated bisection using mmtests (https://github.com/gormanm/mmtests)
and the configuration file configs/config-global-dhp__gimp-simple (needs
local web server with a copy of the image file) identified the following
commit. Test case was simple -- try and open the large file described in
the bug. I did not investigate the patch itself as I'm just reporting
the results of the bisection. If I had to guess, I'd say that VMA
merging has been affected.

d9104d1ca9662498339c0de975b4666c30485f4e is the first bad commit
commit d9104d1ca9662498339c0de975b4666c30485f4e
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date:   Wed Sep 11 14:22:24 2013 -0700

    mm: track vma changes with VM_SOFTDIRTY bit
    
    Pavel reported that in case if vma area get unmapped and then mapped (or
    expanded) in-place, the soft dirty tracker won't be able to recognize this
    situation since it works on pte level and ptes are get zapped on unmap,
    loosing soft dirty bit of course.
    
    So to resolve this situation we need to track actions on vma level, there
    VM_SOFTDIRTY flag comes in.  When new vma area created (or old expanded)
    we set this bit, and keep it here until application calls for clearing
    soft dirty bit.
    
    Thus when user space application track memory changes now it can detect if
    vma area is renewed.
    
    Reported-by: Pavel Emelyanov <xemul@parallels.com>
    Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Matt Mackall <mpm@selenic.com>
    Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: Rob Landley <rob@landley.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Comment 13 Cyrill Gorcunov 2014-01-22 19:19:33 UTC
On Wed, Jan 22, 2014 at 07:08:16PM +0000, Mel Gorman wrote:
> Cyrill,
> 
> Gimp is broken due to a kernel bug included in 3.12. It cannot open
> large files without failing memory allocations due to exceeding
> vm.max_map_count. The relevant bugzilla entries are
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=67651
> https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0
> 
> They include details on how to reproduce the issue. In my case, a
> failure shows messages like this
> 
>       (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes
> 
>       (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load:
> gimp_wire_read(): error
>       xinit: connection to X server lost
> 
>       waiting for X server to shut down
>       /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup
>       /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
>       /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
> 
> X-related junk is there was because I was using a headless server and
> xinit directly to launch gimp to reproduce the bug.
> 
> Automated bisection using mmtests (https://github.com/gormanm/mmtests)
> and the configuration file configs/config-global-dhp__gimp-simple (needs
> local web server with a copy of the image file) identified the following
> commit. Test case was simple -- try and open the large file described in
> the bug. I did not investigate the patch itself as I'm just reporting
> the results of the bisection. If I had to guess, I'd say that VMA
> merging has been affected.

Thanks a lot for report, Mel! I'm investigating...
Comment 14 Andrew Morton 2014-01-22 19:52:18 UTC
On Wed, 22 Jan 2014 19:08:16 +0000 Mel Gorman <mgorman@suse.de> wrote:

> X-related junk is there was because I was using a headless server and
> xinit directly to launch gimp to reproduce the bug.

I've never done this.  Can you share the magic recipe for running an X
app in this way?

Thanks.
Comment 15 Cyrill Gorcunov 2014-01-22 22:33:32 UTC
On Wed, Jan 22, 2014 at 11:19:28PM +0400, Cyrill Gorcunov wrote:
> > commit. Test case was simple -- try and open the large file described in
> > the bug. I did not investigate the patch itself as I'm just reporting
> > the results of the bisection. If I had to guess, I'd say that VMA
> > merging has been affected.
> 
> Thanks a lot for report, Mel! I'm investigating...

Mel, here is a quick fix to bring merging back (in case you have a
minute to test it and confirm that merging was affected). It seems
I've lost setting up the vma-softdirty bit somewhere, so the procedure
which tests vma flags matching fails. I will continue investigating/testing
tomorrow.
---
 mm/mmap.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Index: linux-2.6.git/mm/mmap.c
===================================================================
--- linux-2.6.git.orig/mm/mmap.c
+++ linux-2.6.git/mm/mmap.c
@@ -893,8 +893,18 @@ again:			remove_next = 1 + (end > next->
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 			struct file *file, unsigned long vm_flags)
 {
+	/*
+	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
+	 * match the flags but dirty bit -- just mark merged one as
+	 * a dirty then.
+	 */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+	if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
+		return 0;
+#else
 	if (vma->vm_flags ^ vm_flags)
 		return 0;
+#endif
 	if (vma->vm_file != file)
 		return 0;
 	if (vma->vm_ops && vma->vm_ops->close)
@@ -1082,7 +1092,11 @@ static int anon_vma_compatible(struct vm
 	return a->vm_end == b->vm_start &&
 		mpol_equal(vma_policy(a), vma_policy(b)) &&
 		a->vm_file == b->vm_file &&
+#ifdef CONFIG_MEM_SOFT_DIRTY
+		!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC|VM_SOFTDIRTY)) &&
+#else
 		!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC)) &&
+#endif
 		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
 }
Comment 16 luto 2014-01-22 22:45:58 UTC
On 01/22/2014 11:08 AM, Mel Gorman wrote:
> Cyrill,
> 
> Gimp is broken due to a kernel bug included in 3.12. It cannot open
> large files without failing memory allocations due to exceeding
> vm.max_map_count. The relevant bugzilla entries are
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=67651
> https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0
> 
> They include details on how to reproduce the issue. In my case, a
> failure shows messages like this
> 
>       (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes
> 
>       (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load:
> gimp_wire_read(): error
>       xinit: connection to X server lost
> 
>       waiting for X server to shut down
>       /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup
>       /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
>       /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
> 
> X-related junk is there was because I was using a headless server and
> xinit directly to launch gimp to reproduce the bug.
> 
> Automated bisection using mmtests (https://github.com/gormanm/mmtests)
> and the configuration file configs/config-global-dhp__gimp-simple (needs
> local web server with a copy of the image file) identified the following
> commit. Test case was simple -- try and open the large file described in
> the bug. I did not investigate the patch itself as I'm just reporting
> the results of the bisection. If I had to guess, I'd say that VMA
> merging has been affected.
> 
> d9104d1ca9662498339c0de975b4666c30485f4e is the first bad commit
> commit d9104d1ca9662498339c0de975b4666c30485f4e
> Author: Cyrill Gorcunov <gorcunov@gmail.com>
> Date:   Wed Sep 11 14:22:24 2013 -0700
> 
>     mm: track vma changes with VM_SOFTDIRTY bit
>     
>     Pavel reported that in case if vma area get unmapped and then mapped (or
>     expanded) in-place, the soft dirty tracker won't be able to recognize
>     this
>     situation since it works on pte level and ptes are get zapped on unmap,
>     loosing soft dirty bit of course.
>     
>     So to resolve this situation we need to track actions on vma level, there
>     VM_SOFTDIRTY flag comes in.  When new vma area created (or old expanded)
>     we set this bit, and keep it here until application calls for clearing
>     soft dirty bit.
>     
>     Thus when user space application track memory changes now it can detect
>     if
>     vma area is renewed.

Presumably some path is failing to set VM_SOFTDIRTY, thus preventing mms
from being merged.

That being said, this could cause vma blowups for programs that are
actually using this thing.

--Andy
Comment 17 Cyrill Gorcunov 2014-01-23 05:59:14 UTC
On Wed, Jan 22, 2014 at 02:45:53PM -0800, Andy Lutomirski wrote:
> >     
> >     Thus when user space application track memory changes now it can detect
> if
> >     vma area is renewed.
> 
> Presumably some path is failing to set VM_SOFTDIRTY, thus preventing mms
> from being merged.
> 
> That being said, this could cause vma blowups for programs that are
> actually using this thing.

Hi Andy, indeed, this could happen. The easiest way is to ignore the softdirty
bit when we're trying to merge vmas and set it on the newly merged one. I think
this should be correct. Once I finish I'll send the patch.

	Cyrill
Comment 18 Andrew Morton 2014-01-23 06:06:12 UTC
On Thu, 23 Jan 2014 09:59:06 +0400 Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> On Wed, Jan 22, 2014 at 02:45:53PM -0800, Andy Lutomirski wrote:
> > >     
> > >     Thus when user space application track memory changes now it can
> detect if
> > >     vma area is renewed.
> > 
> > Presumably some path is failing to set VM_SOFTDIRTY, thus preventing mms
> > from being merged.
> > 
> > That being said, this could cause vma blowups for programs that are
> > actually using this thing.
> 
> Hi Andy, indeed, this could happen. The easiest way is to ignore softdirty
> bit
> when we're trying to merge vmas and set it one new merged. I think this
> should
> be correct. Once I finish I'll send the patch.

Hang on.  We think the problem is that gimp is generating vmas which
*should* be merged, but for unknown reasons they differ in
VM_SOFTDIRTY, yes?

Shouldn't we work out where we're forgetting to set VM_SOFTDIRTY? 
Putting bandaids over this error when we come to trying to merge the
vmas sounds very wrong?
Comment 19 Cyrill Gorcunov 2014-01-23 06:27:55 UTC
On Wed, Jan 22, 2014 at 10:09:10PM -0800, Andrew Morton wrote:
> > > 
> > > That being said, this could cause vma blowups for programs that are
> > > actually using this thing.
> > 
> > Hi Andy, indeed, this could happen. The easiest way is to ignore softdirty
> bit
> > when we're trying to merge vmas and set it one new merged. I think this
> should
> > be correct. Once I finish I'll send the patch.
> 
> Hang on.  We think the problem is that gimp is generating vmas which
> *should* be merged, but for unknown reasons they differ in
> VM_SOFTDIRTY, yes?

Yes. One place where I forgot to set the softdirty bit is setup_arg_pages. But
it is called once on ELF load, so it can't cause such an effect (but should be
fixed too). Also there is do_brk, where the vma softdirty bit is missed too :/

Another problem is the potential scenario when we have a bunch of vmas
and clear the vma-softdirty bit on them, then try to map a new one: the flags
won't match, and instead of extending the old vma a new one will be created.
I think (unless I'm missing something) that vma-softdirty should
be ignored in such a case (i.e. inside is_mergeable_vma), and once the vma is
extended it should be marked as dirty. Again, I need to think and test more.

> Shouldn't we work out where we're forgetting to set VM_SOFTDIRTY? 
> Putting bandaids over this error when we come to trying to merge the
> vmas sounds very wrong?

I'm looking into this as well.

	Cyrill
Comment 20 Mel Gorman 2014-01-23 07:28:41 UTC
On Wed, Jan 22, 2014 at 11:52:15AM -0800, Andrew Morton wrote:
> On Wed, 22 Jan 2014 19:08:16 +0000 Mel Gorman <mgorman@suse.de> wrote:
> 
> > X-related junk is there was because I was using a headless server and
> > xinit directly to launch gimp to reproduce the bug.
> 
> I've never done this.  Can you share the magic recipe for running an X
> app in this way?
> 

The relevant part of the test script is

# Build a wrapper script to launch gimp
cat > gimp-launch.sh << EOF
/usr/bin/gimp -i -b "(mmtests-open-image \"$FILENAME\")" -b "(gimp-quit 0)" > $LOGDIR_RESULTS/gimp-out.1 2>&1
echo \$? > gimp-exit-code
EOF
chmod u+x gimp-launch.sh

$TIME_CMD xinit ./gimp-launch.sh 2> $LOGDIR_RESULTS/time.1
RETVAL=`cat gimp-exit-code`

It's clumsy because the application would start with no window manager
and looking at it again, it probably was not even necessary because of
the -i switch in gimp.

Previously when I needed to automate an X app I configured the machine to
login automatically, exported the DISPLAY variable in the test script and
used wmctrl to detect if an application had a window displayed yet.
Comment 21 Mel Gorman 2014-01-23 09:55:49 UTC
On Thu, Jan 23, 2014 at 02:33:25AM +0400, Cyrill Gorcunov wrote:
> On Wed, Jan 22, 2014 at 11:19:28PM +0400, Cyrill Gorcunov wrote:
> > > commit. Test case was simple -- try and open the large file described in
> > > the bug. I did not investigate the patch itself as I'm just reporting
> > > the results of the bisection. If I had to guess, I'd say that VMA
> > > merging has been affected.
> > 
> > Thanks a lot for report, Mel! I'm investigating...
> 
> Mel, here is a quick fix for bring merging back (just in case if you
> have a minute to test it and confirm the merging were affected). It
> seems I've lost setting up vma-softdirty bit somewhere and procedure
> which tests vma flags mathcing fails, will continue investigating/testing
> tomorrow.

The test case passes with this patch applied to 3.13 so that appears
to confirm that this is related to VM_SOFTDIRTY preventing merges.
Unfortunately I did not have slabinfo tracking enabled to double check
the number of vm_area_structs in the system.
Comment 23 Cyrill Gorcunov 2014-01-23 10:36:14 UTC
On Thu, Jan 23, 2014 at 09:55:41AM +0000, Mel Gorman wrote:
> On Thu, Jan 23, 2014 at 02:33:25AM +0400, Cyrill Gorcunov wrote:
> > On Wed, Jan 22, 2014 at 11:19:28PM +0400, Cyrill Gorcunov wrote:
> > > > commit. Test case was simple -- try and open the large file described
> in
> > > > the bug. I did not investigate the patch itself as I'm just reporting
> > > > the results of the bisection. If I had to guess, I'd say that VMA
> > > > merging has been affected.
> > > 
> > > Thanks a lot for report, Mel! I'm investigating...
> > 
> > Mel, here is a quick fix for bring merging back (just in case if you
> > have a minute to test it and confirm the merging were affected). It
> > seems I've lost setting up vma-softdirty bit somewhere and procedure
> > which tests vma flags mathcing fails, will continue investigating/testing
> > tomorrow.
> 
> The test case passes with this patch applied to 3.13 so that appears
> to confirm that this is related to VM_SOFTDIRTY preventing merges.
> Unfortunately I did not have slabinfo tracking enabled to double check
> the number of vm_area_structs in teh system.

Thanks a lot, Mel! I'm testing the patch as well (manually though :).
I'll send the final fix today.
Comment 24 Cyrill Gorcunov 2014-01-23 12:16:04 UTC
On Thu, Jan 23, 2014 at 02:36:06PM +0400, Cyrill Gorcunov wrote:
> > The test case passes with this patch applied to 3.13 so that appears
> > to confirm that this is related to VM_SOFTDIRTY preventing merges.
> > Unfortunately I did not have slabinfo tracking enabled to double check
> > the number of vm_area_structs in teh system.
> 
> Thanks a lot, Mel! I'm testing the patch as well (manually though :).
> I'll send the final fix today.

The patch below should fix the problem. I would really appreciate
some additional testing.
---
From: Cyrill Gorcunov <gorcunov@gmail.com>
Subject: [PATCH] mm: Ignore VM_SOFTDIRTY on VMA merging

VM_SOFTDIRTY bit affects vma merge routine: if two VMAs has all
bits in vm_flags matched except dirty bit the kernel can't longer
merge them and this forces the kernel to generate new VMAs instead.

It finally may lead to the situation when userspace application
reaches vm.max_map_count limit and get crashed in worse case

 | (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes
 |
 | (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load: gimp_wire_read(): error
 | xinit: connection to X server lost
 |
 | waiting for X server to shut down
 | /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup
 | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
 | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup

https://bugzilla.kernel.org/show_bug.cgi?id=67651
https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0

Initial problem came from missed VM_SOFTDIRTY in do_brk() routine
but even if we would set up VM_SOFTDIRTY here, there is still a way to
prevent VMAs from merging: one can call

 | echo 4 > /proc/$PID/clear_refs

and clear all VM_SOFTDIRTY over all VMAs presented in memory map,
then new do_brk() will try to extend old VMA and finds that dirty
bit doesn't match thus new VMA will be generated.

As discussed to Pavel, the right approach should be to ignore
VM_SOFTDIRTY bit when we're trying to merge VMAs and if merged
successed we mark extended VMA with dirty bit.

Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 mm/mmap.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Index: linux-2.6.git/mm/mmap.c
===================================================================
--- linux-2.6.git.orig/mm/mmap.c
+++ linux-2.6.git/mm/mmap.c
@@ -893,7 +893,15 @@ again:			remove_next = 1 + (end > next->
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 			struct file *file, unsigned long vm_flags)
 {
-	if (vma->vm_flags ^ vm_flags)
+	/*
+	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
+	 * match the flags but dirty bit -- the caller should mark
+	 * merged VMA as dirty. If dirty bit won't be excluded from
+	 * comparison, we increase pressue on the memory system forcing
+	 * the kernel to generate new VMAs when old one could be extended
+	 * instead.
+	 */
+	if ((vma->vm_flags ^ vm_flags) & VM_SOFTDIRTY)
 		return 0;
 	if (vma->vm_file != file)
 		return 0;
@@ -1038,6 +1046,8 @@ struct vm_area_struct *vma_merge(struct
 				end, prev->vm_pgoff, NULL);
 		if (err)
 			return NULL;
+		else
+			prev->vm_flags |= VM_SOFTDIRTY;
 		khugepaged_enter_vma_merge(prev);
 		return prev;
 	}
@@ -1057,6 +1067,8 @@ struct vm_area_struct *vma_merge(struct
 				next->vm_pgoff - pglen, NULL);
 		if (err)
 			return NULL;
+		else
+			area->vm_flags |= VM_SOFTDIRTY;
 		khugepaged_enter_vma_merge(area);
 		return area;
 	}
@@ -1082,7 +1094,7 @@ static int anon_vma_compatible(struct vm
 	return a->vm_end == b->vm_start &&
 		mpol_equal(vma_policy(a), vma_policy(b)) &&
 		a->vm_file == b->vm_file &&
-		!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC)) &&
+		!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC|VM_SOFTDIRTY)) &&
 		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
 }
Comment 25 Cyrill Gorcunov 2014-01-23 12:55:50 UTC
On Thu, Jan 23, 2014 at 04:15:55PM +0400, Cyrill Gorcunov wrote:
> On Thu, Jan 23, 2014 at 02:36:06PM +0400, Cyrill Gorcunov wrote:
> > > The test case passes with this patch applied to 3.13 so that appears
> > > to confirm that this is related to VM_SOFTDIRTY preventing merges.
> > > Unfortunately I did not have slabinfo tracking enabled to double check
> > > the number of vm_area_structs in teh system.
> > 
> > Thanks a lot, Mel! I'm testing the patch as well (manually though :).
> > I'll send the final fix today.
> 
> The patch below should fix the problem. I would really appreaciate
> some additional testing.

Forgot to refresh the patch, sorry.
---
From: Cyrill Gorcunov <gorcunov@gmail.com>
Subject: [PATCH] mm: Ignore VM_SOFTDIRTY on VMA merging

VM_SOFTDIRTY bit affects vma merge routine: if two VMAs has all
bits in vm_flags matched except dirty bit the kernel can't longer
merge them and this forces the kernel to generate new VMAs instead.

It finally may lead to the situation when userspace application
reaches vm.max_map_count limit and get crashed in worse case

 | (gimp:11768): GLib-ERROR **: gmem.c:110: failed to allocate 4096 bytes
 |
 | (file-tiff-load:12038): LibGimpBase-WARNING **: file-tiff-load: gimp_wire_read(): error
 | xinit: connection to X server lost
 |
 | waiting for X server to shut down
 | /usr/lib64/gimp/2.0/plug-ins/file-tiff-load terminated: Hangup
 | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup
 | /usr/lib64/gimp/2.0/plug-ins/script-fu terminated: Hangup

https://bugzilla.kernel.org/show_bug.cgi?id=67651
https://bugzilla.gnome.org/show_bug.cgi?id=719619#c0

Initial problem came from missed VM_SOFTDIRTY in do_brk() routine
but even if we would set up VM_SOFTDIRTY here, there is still a way to
prevent VMAs from merging: one can call

 | echo 4 > /proc/$PID/clear_refs

and clear all VM_SOFTDIRTY over all VMAs presented in memory map,
then new do_brk() will try to extend old VMA and finds that dirty
bit doesn't match thus new VMA will be generated.

As discussed to Pavel, the right approach should be to ignore
VM_SOFTDIRTY bit when we're trying to merge VMAs and if merged
successed we mark extended VMA with dirty bit.

Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
CC: Pavel Emelyanov <xemul@parallels.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 mm/mmap.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Index: linux-2.6.git/mm/mmap.c
===================================================================
--- linux-2.6.git.orig/mm/mmap.c
+++ linux-2.6.git/mm/mmap.c
@@ -893,7 +893,15 @@ again:			remove_next = 1 + (end > next->
 static inline int is_mergeable_vma(struct vm_area_struct *vma,
 			struct file *file, unsigned long vm_flags)
 {
-	if (vma->vm_flags ^ vm_flags)
+	/*
+	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
+	 * match the flags but dirty bit -- the caller should mark
+	 * merged VMA as dirty. If dirty bit won't be excluded from
+	 * comparison, we increase pressue on the memory system forcing
+	 * the kernel to generate new VMAs when old one could be extended
+	 * instead.
+	 */
+	if ((vma->vm_flags ^ vm_flags) & ~VM_SOFTDIRTY)
 		return 0;
 	if (vma->vm_file != file)
 		return 0;
@@ -1038,6 +1046,8 @@ struct vm_area_struct *vma_merge(struct
 				end, prev->vm_pgoff, NULL);
 		if (err)
 			return NULL;
+		else
+			prev->vm_flags |= VM_SOFTDIRTY;
 		khugepaged_enter_vma_merge(prev);
 		return prev;
 	}
@@ -1057,6 +1067,8 @@ struct vm_area_struct *vma_merge(struct
 				next->vm_pgoff - pglen, NULL);
 		if (err)
 			return NULL;
+		else
+			area->vm_flags |= VM_SOFTDIRTY;
 		khugepaged_enter_vma_merge(area);
 		return area;
 	}
@@ -1082,7 +1094,7 @@ static int anon_vma_compatible(struct vm
 	return a->vm_end == b->vm_start &&
 		mpol_equal(vma_policy(a), vma_policy(b)) &&
 		a->vm_file == b->vm_file &&
-		!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC)) &&
+		!((a->vm_flags ^ b->vm_flags) & ~(VM_READ|VM_WRITE|VM_EXEC|VM_SOFTDIRTY)) &&
 		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
 }
Comment 27 Kostya Serebryany 2014-01-27 07:39:17 UTC
FYI: This bug hits us in AddressSanitizer, where the allocator creates lots
of adjacent mappings and expects the kernel to merge them.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59733
Comment 28 Kostya Serebryany 2014-01-29 12:46:16 UTC
FTR, here is a small test case that would fail if the bug is present: 

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <assert.h>
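int main() {
  /* Hint address for the first page; later pages are requested back to back. */
  char *p = (char*)0x600000000000;
  size_t i;
  for (i = 0; i < 100000; i++) {
    void *addr = p + i * 4096;
    /* Anonymous mappings are not backed by a file, so pass fd -1. */
    void *ret = mmap(addr, 4096, PROT_WRITE | PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr != ret) {
      /* mmap failed or ignored the hint: show the start of the memory map. */
      fprintf(stderr, "failed at iteration %zu\n", i);
      char command[100];
      snprintf(command, sizeof(command), "cat /proc/%d/maps | head -30", getpid());
      system(command);
      return 1;
    }
  }
  return 0;
}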


On a broken kernel it will print something like this:

failed at iteration 65514
00400000-00401000 r-xp 00000000 fc:00 20991316  /tmp/a.out
00600000-00601000 r--p 00000000 fc:00 20991316  /tmp/a.out
00601000-00602000 rw-p 00001000 fc:00 20991316  /tmp/a.out
600000000000-600000001000 rw-p 00000000 00:00 0 
600000001000-600000002000 rw-p 00000000 00:00 0 
600000002000-600000003000 rw-p 00000000 00:00 0 
600000003000-600000004000 rw-p 00000000 00:00 0 
600000004000-600000005000 rw-p 00000000 00:00 0 
...