Bug 87891

Summary:	kernel BUG at mm/slab.c:2625!
Product:	Memory Management	Reporter:	Luke-Jr (luke-jr+linuxbugs)
Component:	Slab Allocator	Assignee:	Andrew Morton (akpm)
Status:	NEW ---
Severity:	blocking	CC:	knight4553kai, naeilzoueidi, szg00000
Priority:	P1
Hardware:	i386
OS:	Linux
Kernel Version:	3.17.2, 3.18.1	Subsystem:
Regression:	No	Bisected commit-id:
Attachments:	Linux .config

Description Luke-Jr 2014-11-06 17:28:41 UTC

No idea what caused this, but one instance of non-nested KVM is running, and Chromium refuses to start.

[359782.842082] ------------[ cut here ]------------
[359782.842112] kernel BUG at mm/slab.c:2625!
[359782.842127] invalid opcode: 0000 [#1] PREEMPT SMP 
[359782.842157] Modules linked in: ftdi_sio nfnetlink_queue nfnetlink_log nfnetlink netconsole configfs cfq_iosched bridge stp llc ipv6 pl2303 cp210x rndis_host cdc_ether usbserial usbnet cdc_acm usb_storage fuse hid_generic usbhid hid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic evdev coretemp radeon x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm hwmon crc32_pclmul snd_hda_intel drm_kms_helper snd_hda_controller ttm snd_hda_codec ehci_pci xhci_hcd microcode ehci_hcd psmouse 8139too drm snd_hwdep i2c_i801 snd_pcm mii e1000e usbcore snd_timer i2c_algo_bit ptp snd usb_common pps_core battery 8250 rtc_cmos serial_core video tpm_tis tpm shpchp intel_smartconnect button
[359782.842498] CPU: 4 PID: 27625 Comm: quake4smp.x86 Not tainted 3.17.2-gentoo #1
[359782.842532] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z87 Extreme4, BIOS P2.30 07/03/2013
[359782.842567] task: ffff88028973a1d0 ti: ffff880005340000 task.ti: ffff880005340000
[359782.842593] RIP: 0010:[<ffffffff8115134f>]  [<ffffffff8115134f>] cache_alloc_refill+0x5cf/0x5f0
[359782.842632] RSP: 0000:ffff880005343828  EFLAGS: 00010002
[359782.842650] RAX: 0000000000000000 RBX: ffff88044d002468 RCX: 0000000000349c1c
[359782.842683] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
[359782.842716] RBP: ffff880005343888 R08: 28f5c28f5c28f5c3 R09: ffff88044d002458
[359782.842741] R10: ffff88044d002468 R11: ffff88044d000740 R12: ffff88044d000740
[359782.842765] R13: ffff88044d002440 R14: 000000000000001b R15: ffff88043a4a0800
[359782.842792] FS:  0000000000000000(0000) GS:ffff88044f300000(0063) knlGS:00000000f7694700
[359782.842821] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[359782.842841] CR2: 00000000ec9ff013 CR3: 00000002e88fb000 CR4: 00000000001427e0
[359782.842866] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[359782.842892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[359782.842916] Stack:
[359782.842920]  ffff8800053438a0 ffff880300000000 ffff880005343868 0000000000000046
[359782.842949]  ffff88044f3128c0 004352da00000000 000000174f3128c0 0000000000000001
[359782.842979]  00000000004352da 0000000000000400 ffff88044d000740 00000000004352da
[359782.843008] Call Trace:
[359782.843017]  [<ffffffff8115181f>] __kmalloc+0xdf/0x200
[359782.843037]  [<ffffffffa0466285>] ? ttm_page_pool_free+0x35/0x180 [ttm]
[359782.843060]  [<ffffffffa0466285>] ttm_page_pool_free+0x35/0x180 [ttm]
[359782.843084]  [<ffffffffa046674e>] ttm_pool_shrink_scan+0xae/0xd0 [ttm]
[359782.843108]  [<ffffffff8111c2fb>] shrink_slab_node+0x12b/0x2e0
[359782.843129]  [<ffffffff81127ed4>] ? fragmentation_index+0x14/0x70
[359782.843150]  [<ffffffff8110fc3a>] ? zone_watermark_ok+0x1a/0x20
[359782.843171]  [<ffffffff8111ceb8>] shrink_slab+0xc8/0x110
[359782.843189]  [<ffffffff81120480>] do_try_to_free_pages+0x300/0x410
[359782.843210]  [<ffffffff8112084b>] try_to_free_pages+0xbb/0x190
[359782.843230]  [<ffffffff81113136>] __alloc_pages_nodemask+0x696/0xa90
[359782.843253]  [<ffffffff8115810a>] do_huge_pmd_anonymous_page+0xfa/0x3f0
[359782.843278]  [<ffffffff812dffe7>] ? debug_smp_processor_id+0x17/0x20
[359782.843300]  [<ffffffff81118dc7>] ? __lru_cache_add+0x57/0xa0
[359782.843321]  [<ffffffff811385ce>] handle_mm_fault+0x37e/0xdd0
[359782.843341]  [<ffffffff8113ddf3>] ? mmap_region+0x143/0x600
[359782.843360]  [<ffffffff8103ca16>] __do_page_fault+0x156/0x4c0
[359782.843381]  [<ffffffff8106ed91>] ? get_parent_ip+0x11/0x50
[359782.843400]  [<ffffffff81126338>] ? vm_mmap_pgoff+0x78/0xb0
[359782.843419]  [<ffffffff8113d61f>] ? do_munmap+0x30f/0x3b0
[359782.843438]  [<ffffffff8103cdac>] do_page_fault+0xc/0x10
[359782.843458]  [<ffffffff814f5ee2>] page_fault+0x22/0x30
[359782.843474] Code: f2 4c 89 e7 44 89 fe e8 60 f1 00 00 85 c0 0f 85 14 ff ff ff 41 8b 74 24 20 e9 61 fd ff ff 0f 0b f6 c1 01 0f 84 2a fd ff ff 0f 0b <0f> 0b 0f b6 4d cb 4c 89 fe 4c 89 e7 8b 55 cc e8 38 c8 39 00 e9 
[359782.843642] RIP  [<ffffffff8115134f>] cache_alloc_refill+0x5cf/0x5f0
[359782.843665]  RSP <ffff880005343828>
[359782.853918] ---[ end trace fd66575365c8eed5 ]---

Comment 1 Andrew Morton 2014-11-11 23:31:25 UTC

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 06 Nov 2014 17:28:41 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=87891
> 
>             Bug ID: 87891
>            Summary: kernel BUG at mm/slab.c:2625!
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 3.17.2
>           Hardware: i386
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: blocking
>           Priority: P1
>          Component: Slab Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: luke-jr+linuxbugs@utopios.org
>         Regression: No

Well this is interesting.


> [359782.842112] kernel BUG at mm/slab.c:2625!
> ...
> [359782.843008] Call Trace:
> [359782.843017]  [<ffffffff8115181f>] __kmalloc+0xdf/0x200
> [359782.843037]  [<ffffffffa0466285>] ? ttm_page_pool_free+0x35/0x180 [ttm]
> [359782.843060]  [<ffffffffa0466285>] ttm_page_pool_free+0x35/0x180 [ttm]
> [359782.843084]  [<ffffffffa046674e>] ttm_pool_shrink_scan+0xae/0xd0 [ttm]
> [359782.843108]  [<ffffffff8111c2fb>] shrink_slab_node+0x12b/0x2e0
> [359782.843129]  [<ffffffff81127ed4>] ? fragmentation_index+0x14/0x70
> [359782.843150]  [<ffffffff8110fc3a>] ? zone_watermark_ok+0x1a/0x20
> [359782.843171]  [<ffffffff8111ceb8>] shrink_slab+0xc8/0x110
> [359782.843189]  [<ffffffff81120480>] do_try_to_free_pages+0x300/0x410
> [359782.843210]  [<ffffffff8112084b>] try_to_free_pages+0xbb/0x190
> [359782.843230]  [<ffffffff81113136>] __alloc_pages_nodemask+0x696/0xa90
> [359782.843253]  [<ffffffff8115810a>] do_huge_pmd_anonymous_page+0xfa/0x3f0
> [359782.843278]  [<ffffffff812dffe7>] ? debug_smp_processor_id+0x17/0x20
> [359782.843300]  [<ffffffff81118dc7>] ? __lru_cache_add+0x57/0xa0
> [359782.843321]  [<ffffffff811385ce>] handle_mm_fault+0x37e/0xdd0

It went pagefault
        ->__alloc_pages_nodemask
          ->shrink_slab
            ->ttm_pool_shrink_scan
              ->ttm_page_pool_free
                ->kmalloc
                  ->cache_grow
                    ->BUG_ON(flags & GFP_SLAB_BUG_MASK);

And I don't really know why - I'm not seeing anything in there which
can set a GFP flag which is outside GFP_SLAB_BUG_MASK.  However I see
lots of nits.

Core MM:

__alloc_pages_nodemask() does

	if (unlikely(!page)) {
		/*
		 * Runtime PM, block IO and its error handling path
		 * can deadlock because I/O on the device might not
		 * complete.
		 */
		gfp_mask = memalloc_noio_flags(gfp_mask);
		page = __alloc_pages_slowpath(gfp_mask, order,
				zonelist, high_zoneidx, nodemask,
				preferred_zone, classzone_idx, migratetype);
	}

so it permanently alters the value of incoming arg gfp_mask.  This
means that the following trace_mm_page_alloc() will print the wrong
value of gfp_mask, and if we later do the `goto retry_cpuset', we retry
with a possibly different gfp_mask.  Isn't this a bug?


Also, why are we even passing a gfp_t down to the shrinkers?  So they
can work out the allocation context - things like __GFP_IO, __GFP_FS,
etc?  Is it even appropriate to use that mask for a new allocation
attempt within a particular shrinker?


ttm:

I think it's a bad idea to be calling kmalloc() in the slab shrinker
function.  We *know* that the system is low on memory and is trying to
free things up.  Trying to allocate *more* memory at this time is
asking for trouble.  ttm_page_pool_free() could easily be tweaked to
use a fixed-size local array of page*'s t avoid that allocation.  Could
someone implement this please?


slab:

There's no point in doing

	#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)

because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK.
What's it trying to do here?

And it's quite infuriating to go BUG when the code could easily warn
and fix it up.

And it's quite infuriating to go BUG because one of the bits was set,
but not tell us which bit it was!


Could the slab guys please review this?

From: Andrew Morton <akpm@linux-foundation.org>
Subject: slab: improve checking for invalid gfp_flags

- The code goes BUG, but doesn't tell us which bits were unexpectedly
  set.  Print that out.

- The code goes BUG when it could jsut fix things up and proceed.  Do that.

- ~__GFP_BITS_MASK already includes __GFP_DMA32 and __GFP_HIGHMEM, so
  remove those from the GFP_SLAB_BUG_MASK definition.

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/gfp.h |    2 +-
 mm/slab.c           |    5 ++++-
 mm/slub.c           |    5 ++++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff -puN include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags include/linux/gfp.h
--- a/include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags
+++ a/include/linux/gfp.h
@@ -145,7 +145,7 @@ struct vm_area_struct;
 #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE)
 
 /* Do not use these with a slab allocator */
-#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
+#define GFP_SLAB_BUG_MASK (~__GFP_BITS_MASK)
 
 /* Flag - indicates that the buffer will be suitable for DMA.  Ignored on some
    platforms, used as appropriate on others */
diff -puN mm/slab.c~slab-improve-checking-for-invalid-gfp_flags mm/slab.c
--- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slab.c
@@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache
 	 * Be lazy and only check for valid flags here,  keeping it out of the
 	 * critical path in kmem_cache_alloc().
 	 */
-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (WARN_ON(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		flags &= ~GFP_SLAB_BUG_MASK;
+	}
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
 	/* Take the node list lock to change the colour_next on this node */
diff -puN mm/slub.c~slab-improve-checking-for-invalid-gfp_flags mm/slub.c
--- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slub.c
@@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem
 	int order;
 	int idx;
 
-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (WARN_ON(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		flags &= ~GFP_SLAB_BUG_MASK;
+	}
 
 	page = allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
_

Comment 2 Andrew Morton 2014-11-12 00:49:16 UTC

On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote:

> On Tue, 11 Nov 2014, Andrew Morton wrote:
> 
> > There's no point in doing
> >
> >     #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> >
> > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK.
> 
> ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> 
> __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.

Ah, yes, OK.

I suppose it's possible that __GFP_HIGHMEM was set.

do_huge_pmd_anonymous_page
->pte_alloc_one
  ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)

but I haven't traced that through and that's 32-bit.

But anyway - Luke, please attach your .config to
https://bugzilla.kernel.org/show_bug.cgi?id=87891?

Comment 3 Luke-Jr 2014-11-12 00:53:06 UTC

Created attachment 157381 [details]
Linux .config

Comment 4 Andrew Morton 2014-11-12 00:53:34 UTC

On Wed, 12 Nov 2014 09:44:19 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> > 
> > And it's quite infuriating to go BUG when the code could easily warn
> > and fix it up.
> 
> If user wants memory on HIGHMEM, it can be easily fixed by following
> change because all memory is compatible for HIGHMEM. But, if user wants
> memory on DMA32, it's not easy to fix because memory on NORMAL isn't
> compatible with DMA32. slab could return object from another slab page
> even if cache_grow() is successfully called. So BUG_ON() here
> looks right thing to me. We cannot know in advance whether ignoring this
> flag cause more serious result or not.

Well, attempting to fix it up and continue is nice, but we can live
with the BUG.

Not knowing which bit was set is bad.

diff -puN mm/slab.c~slab-improve-checking-for-invalid-gfp_flags mm/slab.c
--- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slab.c
@@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache
 	 * Be lazy and only check for valid flags here,  keeping it out of the
 	 * critical path in kmem_cache_alloc().
 	 */
-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		BUG();
+	}
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
 	/* Take the node list lock to change the colour_next on this node */
--- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slub.c
@@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem
 	int order;
 	int idx;
 
-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		BUG();
+	}
 
 	page = allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
_

Comment 5 Andrew Morton 2014-11-12 01:02:47 UTC

On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:

> On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > But anyway - Luke, please attach your .config to
> > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> 
> Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> 

OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.

It might just have been a random memory bitflip or other corruption of
course.  Is it repeatable at all?

If it is, please add the below and retest?

--- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slab.c
@@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache
 	 * Be lazy and only check for valid flags here,  keeping it out of the
 	 * critical path in kmem_cache_alloc().
 	 */
-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		BUG();
+	}
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
 	/* Take the node list lock to change the colour_next on this node */
diff -puN mm/slub.c~slab-improve-checking-for-invalid-gfp_flags mm/slub.c
--- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slub.c
@@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem
 	int order;
 	int idx;
 
-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		BUG();
+	}
 
 	page = allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
_

Comment 6 Joonsoo Kim 2014-11-12 01:20:36 UTC

On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote:
> On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:
> 
> > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > > But anyway - Luke, please attach your .config to
> > > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> > 
> > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> > 
> 
> OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.

Hello, Andrew.

I think that the cause is GFP_HIGHMEM.
GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM.
Please look at the do_huge_pmd_anonymous_page().
It calls alloc_hugepage_vma() and then alloc_pages_vma() is called
with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE
and then GFP_HIGHUSER_MOVABLE.

Thanks.

Comment 7 Kirill A. Shutemov 2014-11-12 01:29:02 UTC

On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com>
> wrote:
> 
> > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > 
> > > There's no point in doing
> > >
> > >   #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > >
> > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK.
> > 
> > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > 
> > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> 
> Ah, yes, OK.
> 
> I suppose it's possible that __GFP_HIGHMEM was set.
> 
> do_huge_pmd_anonymous_page
> ->pte_alloc_one
>   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)

do_huge_pmd_anonymous_page
 alloc_hugepage_vma
  alloc_pages_vma(GFP_TRANSHUGE)

GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.

Comment 8 Andrew Morton 2014-11-12 01:43:43 UTC

On Wed, 12 Nov 2014 10:22:45 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote:
> > On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:
> > 
> > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > > > But anyway - Luke, please attach your .config to
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> > > 
> > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> > > 
> > 
> > OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.
> 
> Hello, Andrew.
> 
> I think that the cause is GFP_HIGHMEM.
> GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM.
> Please look at the do_huge_pmd_anonymous_page().
> It calls alloc_hugepage_vma() and then alloc_pages_vma() is called
> with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE
> and then GFP_HIGHUSER_MOVABLE.

OK.

So where's the bug?  I'm inclined to say that it's in ttm.  It's taking
a gfp_mask which means "this is the allocation attempt which we are
attempting to satisfy" and uses that for its own allocation.

But ttm has no business using that gfp_mask for its own allocation
attempt.  If anything it should use something like, err,

	GFP_KERNEL & ~__GFP_IO & ~__GFP_FS | __GFP_HIGH

although as I mentioned earlier, it would be better to avoid allocation
altogether.

Poor ttm guys - this is a bit of a trap we set for them.

Comment 9 Kirill A. Shutemov 2014-11-12 01:47:12 UTC

On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com>
> wrote:
> > 
> > > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > > 
> > > > There's no point in doing
> > > >
> > > >         #define GFP_SLAB_BUG_MASK
> (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > > >
> > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK.
> > > 
> > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > > 
> > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> > 
> > Ah, yes, OK.
> > 
> > I suppose it's possible that __GFP_HIGHMEM was set.
> > 
> > do_huge_pmd_anonymous_page
> > ->pte_alloc_one
> >   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)
> 
> do_huge_pmd_anonymous_page
>  alloc_hugepage_vma
>   alloc_pages_vma(GFP_TRANSHUGE)
> 
> GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.

Looks like it's reasonable to sanitize flags in shrink_slab() by dropping
flags incompatible with slab expectation. Like this:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcb47074ae03..eb165d29c5e5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -369,6 +369,8 @@ unsigned long shrink_slab(struct shrink_control *shrinkctl,
        if (nr_pages_scanned == 0)
                nr_pages_scanned = SWAP_CLUSTER_MAX;
 
+       shrinkctl->gfp_mask &= ~(__GFP_DMA32 | __GFP_HIGHMEM);
+
        if (!down_read_trylock(&shrinker_rwsem)) {
                /*
                 * If we would return 0, our callers would understand that we

Comment 10 Andrew Morton 2014-11-12 01:55:36 UTC

On Wed, 12 Nov 2014 03:47:03 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote:

> On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com>
> wrote:
> > > 
> > > > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > > > 
> > > > > There's no point in doing
> > > > >
> > > > >       #define GFP_SLAB_BUG_MASK
> (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > > > >
> > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of
> ~__GFP_BITS_MASK.
> > > > 
> > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > > > 
> > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> > > 
> > > Ah, yes, OK.
> > > 
> > > I suppose it's possible that __GFP_HIGHMEM was set.
> > > 
> > > do_huge_pmd_anonymous_page
> > > ->pte_alloc_one
> > >   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)
> > 
> > do_huge_pmd_anonymous_page
> >  alloc_hugepage_vma
> >   alloc_pages_vma(GFP_TRANSHUGE)
> > 
> > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.
> 
> Looks like it's reasonable to sanitize flags in shrink_slab() by dropping
> flags incompatible with slab expectation. Like this:
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index dcb47074ae03..eb165d29c5e5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -369,6 +369,8 @@ unsigned long shrink_slab(struct shrink_control
> *shrinkctl,
>         if (nr_pages_scanned == 0)
>                 nr_pages_scanned = SWAP_CLUSTER_MAX;
>  
> +       shrinkctl->gfp_mask &= ~(__GFP_DMA32 | __GFP_HIGHMEM);
> +
>         if (!down_read_trylock(&shrinker_rwsem)) {
>                 /*
>                  * If we would return 0, our callers would understand that we

Well no, because nobody is supposed to be passing this gfp_mask back
into a new allocation attempt anyway.  It would be better to do

	shrinkctl->gfp_mask |= __GFP_IMMEDIATELY_GO_BUG;

?

Comment 11 Kirill A. Shutemov 2014-11-12 02:07:10 UTC

On Tue, Nov 11, 2014 at 05:56:03PM -0800, Andrew Morton wrote:
> On Wed, 12 Nov 2014 03:47:03 +0200 "Kirill A. Shutemov"
> <kirill@shutemov.name> wrote:
> 
> > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter
> <cl@linux.com> wrote:
> > > > 
> > > > > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > > > > 
> > > > > > There's no point in doing
> > > > > >
> > > > > >     #define GFP_SLAB_BUG_MASK
> (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > > > > >
> > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of
> ~__GFP_BITS_MASK.
> > > > > 
> > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > > > > 
> > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> > > > 
> > > > Ah, yes, OK.
> > > > 
> > > > I suppose it's possible that __GFP_HIGHMEM was set.
> > > > 
> > > > do_huge_pmd_anonymous_page
> > > > ->pte_alloc_one
> > > >   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)
> > > 
> > > do_huge_pmd_anonymous_page
> > >  alloc_hugepage_vma
> > >   alloc_pages_vma(GFP_TRANSHUGE)
> > > 
> > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.
> > 
> > Looks like it's reasonable to sanitize flags in shrink_slab() by dropping
> > flags incompatible with slab expectation. Like this:
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index dcb47074ae03..eb165d29c5e5 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -369,6 +369,8 @@ unsigned long shrink_slab(struct shrink_control
> *shrinkctl,
> >         if (nr_pages_scanned == 0)
> >                 nr_pages_scanned = SWAP_CLUSTER_MAX;
> >  
> > +       shrinkctl->gfp_mask &= ~(__GFP_DMA32 | __GFP_HIGHMEM);
> > +
> >         if (!down_read_trylock(&shrinker_rwsem)) {
> >                 /*
> >                  * If we would return 0, our callers would understand that
> we
> 
> Well no, because nobody is supposed to be passing this gfp_mask back
> into a new allocation attempt anyway.  It would be better to do
> 
>       shrinkctl->gfp_mask |= __GFP_IMMEDIATELY_GO_BUG;
> 
> ?

From my POV, the problem is that we combine what-need-to-be-freed gfp_mask
with if-have-to-allocate gfp_mask: we want to respect __GFP_IO/FS on
alloc, but not nessesary both if there's no restriction from the context.

For shrink_slab(), __GFP_DMA32 and __GFP_HIGHMEM don't make sense in both
cases.

__GFP_IMMEDIATELY_GO_BUG would work too, but we also need to provide
macros to construct alloc-suitable mask from the given one for
yes-i-really-have-to-allocate case.

Comment 12 Joonsoo Kim 2014-11-12 02:11:26 UTC

On Tue, Nov 11, 2014 at 05:44:12PM -0800, Andrew Morton wrote:
> On Wed, 12 Nov 2014 10:22:45 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com>
> wrote:
> 
> > On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote:
> > > On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote:
> > > 
> > > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote:
> > > > > But anyway - Luke, please attach your .config to
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891?
> > > > 
> > > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381
> > > > 
> > > 
> > > OK, thanks.  No CONFIG_HIGHMEM of course.  I'm stumped.
> > 
> > Hello, Andrew.
> > 
> > I think that the cause is GFP_HIGHMEM.
> > GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM.
> > Please look at the do_huge_pmd_anonymous_page().
> > It calls alloc_hugepage_vma() and then alloc_pages_vma() is called
> > with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE
> > and then GFP_HIGHUSER_MOVABLE.
> 
> OK.
> 
> So where's the bug?  I'm inclined to say that it's in ttm.  It's taking

I agree that.

> a gfp_mask which means "this is the allocation attempt which we are
> attempting to satisfy" and uses that for its own allocation.
> 
> But ttm has no business using that gfp_mask for its own allocation
> attempt.  If anything it should use something like, err,
> 
>       GFP_KERNEL & ~__GFP_IO & ~__GFP_FS | __GFP_HIGH
> 
> although as I mentioned earlier, it would be better to avoid allocation
> altogether.

Yes, avoiding would be the best.

If not possible, introducing new common helper for changing shrinker
control's gfp to valid allocation gfp is better than just open code.

Thanks.

> 
> Poor ttm guys - this is a bit of a trap we set for them.
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

Comment 13 Joonsoo Kim 2014-11-12 02:15:07 UTC

On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote:
> On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com>
> wrote:
> > 
> > > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > > 
> > > > There's no point in doing
> > > >
> > > >         #define GFP_SLAB_BUG_MASK
> (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > > >
> > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK.
> > > 
> > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > > 
> > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> > 
> > Ah, yes, OK.
> > 
> > I suppose it's possible that __GFP_HIGHMEM was set.
> > 
> > do_huge_pmd_anonymous_page
> > ->pte_alloc_one
> >   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)
> 
> do_huge_pmd_anonymous_page
>  alloc_hugepage_vma
>   alloc_pages_vma(GFP_TRANSHUGE)
> 
> GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.

Hello, Kirill.

BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't
movable? After breaking hugepage, it could be movable, but, it may
prevent CMA from working correctly until break.

Thanks.

Comment 14 Kirill A. Shutemov 2014-11-12 02:37:56 UTC

On Wed, Nov 12, 2014 at 11:17:16AM +0900, Joonsoo Kim wrote:
> On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote:
> > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com>
> wrote:
> > > 
> > > > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > > > 
> > > > > There's no point in doing
> > > > >
> > > > >       #define GFP_SLAB_BUG_MASK
> (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > > > >
> > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of
> ~__GFP_BITS_MASK.
> > > > 
> > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > > > 
> > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> > > 
> > > Ah, yes, OK.
> > > 
> > > I suppose it's possible that __GFP_HIGHMEM was set.
> > > 
> > > do_huge_pmd_anonymous_page
> > > ->pte_alloc_one
> > >   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)
> > 
> > do_huge_pmd_anonymous_page
> >  alloc_hugepage_vma
> >   alloc_pages_vma(GFP_TRANSHUGE)
> > 
> > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.
> 
> Hello, Kirill.
> 
> BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't
> movable? After breaking hugepage, it could be movable, but, it may
> prevent CMA from working correctly until break.

Again, the same alloc vs. free gfp_mask: we want page allocator to move
pages around to find space from THP, but resulting page is no really
movable.

I've tried to look into making THP movable: it requires quite a bit of
infrastructure changes around rmap: try_to_unmap*(), remove_migration_pmd(),
migration entries for PMDs, etc. I gets ugly pretty fast :-/
I probably need to give it second try. No promises.

Comment 15 Andrew Morton 2014-11-12 04:38:30 UTC

On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:

> Andrew Morton wrote:
> > Poor ttm guys - this is a bit of a trap we set for them.
> 
> Commit a91576d7916f6cce (\"drm/ttm: Pass GFP flags in order to avoid
> deadlock.\")
> changed to use sc->gfp_mask rather than GFP_KERNEL.
> 
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
> -                       GFP_KERNEL);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> 
> But this bug is caused by sc->gfp_mask containing some flags which are not
> in GFP_KERNEL, right? Then, I think
> 
> -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp &
> GFP_KERNEL);
> 
> would hide this bug.
> 
> But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag)

Well no - ttm_page_pool_free() should stop calling kmalloc altogether. 
Just do

	struct page *pages_to_free[16];

and rework the code to free 16 pages at a time.  Easy.

Apart from all the other things we're discussing here, it should do
this because kmalloc() isn't very reliable within a shrinker.


> for
> two reasons when __alloc_pages_nodemask() is called from shrinker functions.
> 
> (1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow
>     recursive __alloc_pages_nodemask() calls, kernel stack could overflow
>     under extreme memory pressure.
> 
> (2) Some shrinker functions are using sleepable locks which could make kswapd
>     sleep for unpredictable duration. If kswapd is unexpectedly blocked
>     inside
>     shrinker functions and somebody is expecting that kswapd is running for
>     reclaiming memory, it is a memory allocation deadlock.
> 
> Speak of ttm module, commit 22e71691fd54c637 (\"drm/ttm: Use mutex_trylock()
> to
> avoid deadlock inside shrinker functions.\") prevents unlimited recursive
> __alloc_pages_nodemask() calls.

Yes, there are such problems.

Shrinkers do all sorts of surprising things - some of the filesystem
ones do disk writes!  And these involve all sorts of locking and memory
allocations.  But they won't be directly using scan_control.gfp_mask. 
They may be using open-coded __GFP_NOFS for the allocations.  The
complicated ones pass the IO over to kernel threads and wait for them
to complete, which addresses the stack consumption concerns (at least).

Comment 16 Tetsuo Handa 2014-11-12 05:01:25 UTC

Andrew Morton wrote:
> Poor ttm guys - this is a bit of a trap we set for them.

Commit a91576d7916f6cce (\"drm/ttm: Pass GFP flags in order to avoid deadlock.\")
changed to use sc->gfp_mask rather than GFP_KERNEL.

-       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
-                       GFP_KERNEL);
+       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);

But this bug is caused by sc->gfp_mask containing some flags which are not
in GFP_KERNEL, right? Then, I think

-       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
+       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);

would hide this bug.

But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag) for
two reasons when __alloc_pages_nodemask() is called from shrinker functions.

(1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow
    recursive __alloc_pages_nodemask() calls, kernel stack could overflow
    under extreme memory pressure.

(2) Some shrinker functions are using sleepable locks which could make kswapd
    sleep for unpredictable duration. If kswapd is unexpectedly blocked inside
    shrinker functions and somebody is expecting that kswapd is running for
    reclaiming memory, it is a memory allocation deadlock.

Speak of ttm module, commit 22e71691fd54c637 (\"drm/ttm: Use mutex_trylock() to
avoid deadlock inside shrinker functions.\") prevents unlimited recursive
__alloc_pages_nodemask() calls.

Comment 17 Joonsoo Kim 2014-11-12 08:19:33 UTC

On Wed, Nov 12, 2014 at 04:37:46AM +0200, Kirill A. Shutemov wrote:
> On Wed, Nov 12, 2014 at 11:17:16AM +0900, Joonsoo Kim wrote:
> > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter
> <cl@linux.com> wrote:
> > > > 
> > > > > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > > > > 
> > > > > > There's no point in doing
> > > > > >
> > > > > >     #define GFP_SLAB_BUG_MASK
> (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > > > > >
> > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of
> ~__GFP_BITS_MASK.
> > > > > 
> > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > > > > 
> > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> > > > 
> > > > Ah, yes, OK.
> > > > 
> > > > I suppose it's possible that __GFP_HIGHMEM was set.
> > > > 
> > > > do_huge_pmd_anonymous_page
> > > > ->pte_alloc_one
> > > >   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)
> > > 
> > > do_huge_pmd_anonymous_page
> > >  alloc_hugepage_vma
> > >   alloc_pages_vma(GFP_TRANSHUGE)
> > > 
> > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.
> > 
> > Hello, Kirill.
> > 
> > BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't
> > movable? After breaking hugepage, it could be movable, but, it may
> > prevent CMA from working correctly until break.
> 
> Again, the same alloc vs. free gfp_mask: we want page allocator to move
> pages around to find space from THP, but resulting page is no really
> movable.

Hmm... AFAIK, without MOVABLE flag page allocator will try to move
pages to find space for THP page. Am I missing something?
        
> 
> I've tried to look into making THP movable: it requires quite a bit of
> infrastructure changes around rmap: try_to_unmap*(), remove_migration_pmd(),
> migration entries for PMDs, etc. I gets ugly pretty fast :-/
> I probably need to give it second try. No promises.

Good to hear. :)

I think that we can go another way that breaks the hugepage. This
operation makes it movable and CMA would be succeed.

Thanks.

Comment 18 Joonsoo Kim 2014-11-13 06:34:58 UTC

On Wed, Nov 12, 2014 at 10:39:24AM +0000, Mel Gorman wrote:
> On Wed, Nov 12, 2014 at 11:17:16AM +0900, Joonsoo Kim wrote:
> > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote:
> > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote:
> > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter
> <cl@linux.com> wrote:
> > > > 
> > > > > On Tue, 11 Nov 2014, Andrew Morton wrote:
> > > > > 
> > > > > > There's no point in doing
> > > > > >
> > > > > >     #define GFP_SLAB_BUG_MASK
> (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
> > > > > >
> > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of
> ~__GFP_BITS_MASK.
> > > > > 
> > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set.
> > > > > 
> > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
> > > > 
> > > > Ah, yes, OK.
> > > > 
> > > > I suppose it's possible that __GFP_HIGHMEM was set.
> > > > 
> > > > do_huge_pmd_anonymous_page
> > > > ->pte_alloc_one
> > > >   ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM)
> > > 
> > > do_huge_pmd_anonymous_page
> > >  alloc_hugepage_vma
> > >   alloc_pages_vma(GFP_TRANSHUGE)
> > > 
> > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM.
> > 
> > Hello, Kirill.
> > 
> > BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't
> > movable? After breaking hugepage, it could be movable, but, it may
> > prevent CMA from working correctly until break.
> > 
> 
> Because THP can use the Movable zone if it's allocated. When movable was
> introduced it did not just mean migratable. It meant it could also be
> moved to swap. THP can be broken up and swapped so it tagged as movable.

Great explanation!

Thanks Mel.

Comment 19 Luke-Jr 2014-12-23 05:29:10 UTC

Anything I can help test that might fix this yet?

Comment 20 Naeil ZOUEIDI 2017-09-29 10:29:07 UTC

Any updates about this bug? 

Thanks,
Naeîl