Bug 30702

Summary: vmalloc(GFP_NOFS) can callback file system evict_inode, inducing deadlock.
Product: Memory Management
Reporter: Prasad Gajanan Joshi (prasadjoshi124)
Component: Page Allocator
Assignee: Andrew Morton (akpm)
Status: RESOLVED OBSOLETE
Severity: normal
CC: akpm, alan, kernel, prasadjoshi124
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.38-rc7
Subsystem:
Regression: No
Bisected commit-id:
Attachments: The patch fixes the problem by propagating the allocation flag down the call hierarchy

Description Prasad Gajanan Joshi. 2011-03-07 19:12:22 UTC
I am working on a proprietary file system. The problem I am facing is with calling __vmalloc while holding a lock. Although I am changing my own code to avoid this, I thought it would be good to at least report the vmalloc problem.

The code looks something like this:

const struct file_operations lzfs_file_operations = {
    .write              = lzfs_vnop_write,
};

ssize_t
lzfs_vnop_write()
{
      mutex_lock(some global mutex);
      ptr = __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
      mutex_unlock(some global mutex);
}

static const struct super_operations lzfs_super_ops = {
    .evict_inode    = lzfs_evict_vnode,
};

static void
lzfs_evict_vnode(struct inode *inode)
{
      mutex_lock(some global mutex);

      some code for eviction;

      mutex_unlock(some global mutex);
}

Because __vmalloc is called with GFP_NOFS, I expected that evict_inode (or clear_inode) would not be called when the page cache is pruned. However, I saw the following oops message during testing.

[ 5058.193312]  [<ffffffffa092a534>] lzfs_clear_vnode+0x104/0x160 [lzfs]
[ 5058.193318]  [<ffffffff8116abc5>] clear_inode+0x75/0xf0
[ 5058.193323]  [<ffffffff8116ac80>] dispose_list+0x40/0x150
[ 5058.193328]  [<ffffffff8116af23>] prune_icache+0x193/0x2a0
[ 5058.193332]  [<ffffffff811665e3>] ? prune_dcache+0x183/0x1d0
[ 5058.193338]  [<ffffffff8116b081>] shrink_icache_memory+0x51/0x60
[ 5058.193345]  [<ffffffff8110e6d4>] shrink_slab+0x124/0x180
[ 5058.193349]  [<ffffffff8110ff0f>] do_try_to_free_pages+0x1cf/0x360
[ 5058.193354]  [<ffffffff8111024b>] try_to_free_pages+0x6b/0x70
[ 5058.193359]  [<ffffffff8110740a>] __alloc_pages_slowpath+0x27a/0x590
[ 5058.193365]  [<ffffffff81107884>] __alloc_pages_nodemask+0x164/0x1d0
[ 5058.193370]  [<ffffffff811397ba>] alloc_pages_current+0x9a/0x100
[ 5058.193375]  [<ffffffff811066ce>] __get_free_pages+0xe/0x50
[ 5058.193380]  [<ffffffff81042435>] pte_alloc_one_kernel+0x15/0x20
[ 5058.193385]  [<ffffffff8111c86b>] __pte_alloc_kernel+0x1b/0xc0
[ 5058.193391]  [<ffffffff8112ad63>] vmap_pte_range+0x183/0x1a0
[ 5058.193395]  [<ffffffff8112aec6>] vmap_pud_range+0x146/0x1c0
[ 5058.193400]  [<ffffffff8112afda>] vmap_page_range_noflush+0x9a/0xc0
[ 5058.193405]  [<ffffffff8112b032>] map_vm_area+0x32/0x50
[ 5058.193410]  [<ffffffff8112c4a8>] __vmalloc_area_node+0x108/0x190
[ 5058.193426]  [<ffffffffa06591a0>] ? kv_alloc+0x90/0x130 [spl]
[ 5058.193431]  [<ffffffff8112c392>] __vmalloc_node+0xa2/0xb0
[ 5058.193443]  [<ffffffffa06591a0>] ? kv_alloc+0x90/0x130 [spl]
[ 5058.193453]  [<ffffffff8112c712>] __vmalloc+0x22/0x30
[ 5058.193464]  [<ffffffffa06591a0>] kv_alloc+0x90/0x130 [spl]
[ 5058.194007]  [<ffffffffa0858136>] zfs_grow_blocksize+0x46/0xe0 [zfs]
[ 5058.194063]  [<ffffffffa08547e8>] zfs_write+0xbb8/0x1100 [zfs]
[ 5058.194075]  [<ffffffff8114e740>] ? mem_cgroup_charge_common+0x70/0x90
[ 5058.194082]  [<ffffffffa092ced7>] lzfs_vnop_write+0xc7/0x3b0 [lzfs]
[ 5058.194087]  [<ffffffff8111bacc>] ? do_anonymous_page+0x11c/0x350
[ 5058.194096]  [<ffffffff81152ec8>] vfs_write+0xb8/0x1a0
[ 5058.194100]  [<ffffffff81153711>] sys_write+0x51/0x80
[ 5058.194105]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b

The problem is with __vmalloc (map_vm_area), which discards the allocation flags while mapping the scattered physical pages contiguously into the virtual vmalloc area.

static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
                 pgprot_t prot, int node, void *caller)
{
    ...
    if (map_vm_area(area, prot, &pages, gfp_mask))
        goto fail;
    return area->addr;
    ...
}

The function map_vm_area() can result in calls to
pud_alloc
pmd_alloc
pte_alloc_kernel

which allocate memory using the GFP_KERNEL flag. For example:
pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
    pte_t *pte;

    pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
    return pte;
}

If the system is running short of memory, this page allocation can trigger clear_inode (or evict_inode), which then tries to take the lock the writer already holds. That is what causes the oops.
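To make the failure mode concrete, here is a minimal userspace sketch of the lock recursion. It is only a model under stated assumptions: model_mutex, model_lock, model_alloc, model_evict, model_write and MODEL_GFP_FS are hypothetical names invented for illustration, not kernel APIs, and the lock reports a would-be deadlock instead of blocking.

```c
/* Userspace model only; none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_GFP_FS 0x1u  /* hypothetical flag: reclaim may call into the FS */

struct model_mutex { bool held; };

static struct model_mutex fs_lock;  /* stands in for "some global mutex" */
static bool self_deadlock;          /* set when evict finds the lock already held */

/* Non-blocking stand-in for mutex_lock(): returns false where a real
 * non-recursive mutex would block forever on its own holder. */
static bool model_lock(struct model_mutex *m)
{
    if (m->held)
        return false;
    m->held = true;
    return true;
}

static void model_unlock(struct model_mutex *m)
{
    assert(m->held);
    m->held = false;
}

/* Stands in for lzfs_evict_vnode(): needs the global lock. */
static void model_evict(void)
{
    if (!model_lock(&fs_lock)) {
        self_deadlock = true;
        return;
    }
    model_unlock(&fs_lock);
}

/* Stands in for an allocation under memory pressure: reclaim enters
 * the file system only when MODEL_GFP_FS is set. */
static void model_alloc(unsigned int flags)
{
    if (flags & MODEL_GFP_FS)
        model_evict();
}

/* Stands in for the write path: allocates while holding the lock.
 * Returns true if the nested eviction would have deadlocked. */
static bool model_write(unsigned int inner_flags)
{
    self_deadlock = false;
    if (!model_lock(&fs_lock))
        return true;
    model_alloc(inner_flags);
    model_unlock(&fs_lock);
    return self_deadlock;
}
```

In this model, model_write(MODEL_GFP_FS) corresponds to the page-table allocation that silently uses GFP_KERNEL, and model_write(0) corresponds to an allocation that really honours GFP_NOFS.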

Though none of the file systems in the Linux kernel seem to call vmalloc while holding a lock, it would be good to fix the problem anyway.

As far as I understand, the solution is to pass the gfp_mask down the call hierarchy. I wanted to send a patch with these changes, but I soon realized that changes are needed in many places and the patch would be large, so I decided to report the problem first.
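The idea of passing the mask down the call hierarchy can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: every chain_* name and CHAIN_GFP_* value is hypothetical, and the "allocator" only records which flags reached the lowest level.

```c
/* Userspace model only; names and flag values are hypothetical. */
#include <assert.h>

typedef unsigned int chain_gfp_t;
#define CHAIN_GFP_IO     0x1u
#define CHAIN_GFP_FS     0x2u
#define CHAIN_GFP_KERNEL (CHAIN_GFP_IO | CHAIN_GFP_FS)
#define CHAIN_GFP_NOFS   (CHAIN_GFP_IO)

static chain_gfp_t last_seen;  /* flags of the most recent low-level allocation */

static void chain_page_alloc(chain_gfp_t gfp)
{
    assert(gfp != 0);
    last_seen = gfp;
}

/* Reported behaviour: the page-table level hardcodes its flags. */
static void chain_pte_alloc_fixed(void) { chain_page_alloc(CHAIN_GFP_KERNEL); }
static void chain_map_area_fixed(void)  { chain_pte_alloc_fixed(); }

static chain_gfp_t chain_vmalloc_fixed(chain_gfp_t gfp)
{
    chain_page_alloc(gfp);     /* data pages honour the caller's mask */
    chain_map_area_fixed();    /* page tables do not */
    return last_seen;          /* flags used for the page-table allocation */
}

/* Proposed behaviour: every level takes the mask and hands it on. */
static void chain_pte_alloc_gfp(chain_gfp_t gfp) { chain_page_alloc(gfp); }
static void chain_map_area_gfp(chain_gfp_t gfp)  { chain_pte_alloc_gfp(gfp); }

static chain_gfp_t chain_vmalloc_gfp(chain_gfp_t gfp)
{
    chain_page_alloc(gfp);
    chain_map_area_gfp(gfp);
    return last_seen;
}
```

Calling chain_vmalloc_fixed(CHAIN_GFP_NOFS) returns a mask that still contains CHAIN_GFP_FS, while chain_vmalloc_gfp(CHAIN_GFP_NOFS) does not, which is the difference the report is asking for.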

Thanks and Regards.
Comment 1 Andrew Morton 2011-03-09 22:24:03 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 7 Mar 2011 19:12:23 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=30702
> 
>            Summary: vmalloc(GFP_NOFS) can callback file system
>                     evict_inode, inducing deadlock.

Yeah.

Ricardo has been working on this.  See the thread at
http://marc.info/?l=linux-mm&m=128942194520631&w=4

It's tough, and we've been bad, and progress is slow :(

>            Product: Memory Management
>            Version: 2.5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: prasadjoshi124@gmail.com
>         Regression: No
> 
Comment 2 Prasad Gajanan Joshi. 2011-03-10 12:12:21 UTC
>
> Ricardo has been working on this.  See the thread at
> http://marc.info/?l=linux-mm&m=128942194520631&w=4
>
> It's tough, and we've been bad, and progress is slow :(
>
>>            Product: Memory Management

Thanks Andrew,

Hi Ricardo,

I also worked on the problem yesterday. Here is a patch that adds a new function:
__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)

The function __pte_alloc_kernel() can then call __pte_alloc_one_kernel()
with the correct GFP flag.

int __pte_alloc_kernel(pmd_t *pmd, unsigned long address)
{
    pte_t *new = __pte_alloc_one_kernel(&init_mm, address, GFP_KERNEL);
}

My plan is to work from the bottom up, passing the GFP_KERNEL flag at
first for testing. If everything works, the GFP flag can then be changed.
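The bottom-up approach relies on keeping the old entry point as a thin wrapper around a new variant that takes the mask. A rough userspace sketch of that pattern, assuming hypothetical sketch_* names and flag values (calloc stands in for the zeroed page allocation, and the double-underscore prefix is avoided because it is reserved in userspace C):

```c
/* Userspace sketch of the wrapper pattern; all names are hypothetical. */
#include <assert.h>
#include <stdlib.h>

typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_NOFS   0x1u
#define SKETCH_GFP_KERNEL 0x3u
#define SKETCH_GFP_ZERO   0x100u
#define SKETCH_TABLE_SIZE 4096u

static sketch_gfp_t sketch_last_flags;  /* flags used by the latest allocation */

/* New variant: the caller chooses the base mask, and the modifier the
 * old function always applied is OR-ed in here. */
static void *sketch_pte_alloc_one_gfp(sketch_gfp_t gfp_mask)
{
    assert(gfp_mask != 0);
    sketch_last_flags = gfp_mask | SKETCH_GFP_ZERO;
    return calloc(1, SKETCH_TABLE_SIZE);
}

/* Existing entry point: same signature and same behaviour as before,
 * so current callers do not need to change. */
static void *sketch_pte_alloc_one(void)
{
    return sketch_pte_alloc_one_gfp(SKETCH_GFP_KERNEL);
}
```

Existing callers keep using sketch_pte_alloc_one(), while a caller that must avoid file system re-entry can later switch to sketch_pte_alloc_one_gfp(SKETCH_GFP_NOFS) once the rest of the chain carries the mask.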

I am planning to run a few tests on an x86 machine to make sure it works.
BTW, if you are following some other approach, I can test your patch on
my machine.
I hope the browser will not trim the lines at the bottom. This is the patch:

---
diff --git a/arch/alpha/include/asm/pgalloc.h b/arch/alpha/include/asm/pgalloc.h
index bc2a0da..a5685aa 100644
--- a/arch/alpha/include/asm/pgalloc.h
+++ b/arch/alpha/include/asm/pgalloc.h
@@ -51,10 +51,15 @@ pmd_free(struct mm_struct *mm, pmd_t *pmd)
 }

 static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}
+
+static inline pte_t *
 pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline void
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index 9763be0..a4161c8 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -57,17 +57,24 @@ static inline void clean_pte_table(pte_t *pte)
  *  +------------+
  */
 static inline pte_t *
-pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr, gfp_t gfp_mask)
 {
 	pte_t *pte;

-	pte = (pte_t *)__get_free_page(PGALLOC_GFP);
+	pte = (pte_t *)__get_free_page(gfp_mask | __GFP_NOTRACK |
+		__GFP_REPEAT | __GFP_ZERO);
 	if (pte)
 		clean_pte_table(pte);

 	return pte;
 }

+static inline pte_t *
+pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
+{
+	return __pte_alloc_one_kernel(mm, addr, GFP_KERNEL);
+}
+
 static inline pgtable_t
 pte_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
diff --git a/arch/avr32/include/asm/pgalloc.h b/arch/avr32/include/asm/pgalloc.h
index bc7e8ae..2eb4824 100644
--- a/arch/avr32/include/asm/pgalloc.h
+++ b/arch/avr32/include/asm/pgalloc.h
@@ -51,10 +51,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	quicklist_free(QUICK_PGD, NULL, pgd);
 }

+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long address, gfp_t gfp_mask)
+{
+	return quicklist_alloc(QUICK_PT, gfp_mask | __GFP_REPEAT, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/cris/include/asm/pgalloc.h b/arch/cris/include/asm/pgalloc.h
index 6da975d..453a388 100644
--- a/arch/cris/include/asm/pgalloc.h
+++ b/arch/cris/include/asm/pgalloc.h
@@ -22,10 +22,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_page((unsigned long)pgd);
 }

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+	gfp_t gfp_mask)
+{
+  	return (pte_t *) __get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-  	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
- 	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
diff --git a/arch/frv/mm/pgalloc.c b/arch/frv/mm/pgalloc.c
index c42c83d..c74ace1 100644
--- a/arch/frv/mm/pgalloc.c
+++ b/arch/frv/mm/pgalloc.c
@@ -20,14 +20,19 @@

 pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__((aligned(PAGE_SIZE)));

-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t *__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	pte_t *pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT);
 	if (pte)
 		clear_page(pte);
 	return pte;
 }

+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	struct page *page;
diff --git a/arch/ia64/include/asm/pgalloc.h b/arch/ia64/include/asm/pgalloc.h
index 96a8d92..be59452 100644
--- a/arch/ia64/include/asm/pgalloc.h
+++ b/arch/ia64/include/asm/pgalloc.h
@@ -95,10 +95,16 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr)
 	return page;
 }

+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long addr, gfp_t gfp_mask)
+{
+	return quicklist_alloc(0, gfp_mask, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long addr)
 {
-	return quicklist_alloc(0, GFP_KERNEL, NULL);
+	return __pte_alloc_one_kernel(mm, addr, GFP_KERNEL);
 }

 static inline void pte_free(struct mm_struct *mm, pgtable_t pte)
diff --git a/arch/m32r/include/asm/pgalloc.h b/arch/m32r/include/asm/pgalloc.h
index 0fc7361..fd6650f 100644
--- a/arch/m32r/include/asm/pgalloc.h
+++ b/arch/m32r/include/asm/pgalloc.h
@@ -30,12 +30,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_page((unsigned long)pgd);
 }

+static __inline__ pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+	unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_ZERO);
+}
+
 static __inline__ pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	unsigned long address)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
-
-	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static __inline__ pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/m68k/include/asm/motorola_pgalloc.h b/arch/m68k/include/asm/motorola_pgalloc.h
index 2f02f26..c5190f8 100644
--- a/arch/m68k/include/asm/motorola_pgalloc.h
+++ b/arch/m68k/include/asm/motorola_pgalloc.h
@@ -7,11 +7,13 @@
 extern pmd_t *get_pointer_table(void);
 extern int free_pointer_table(pmd_t *);

-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+	gfp_t gfp_mask)
 {
 	pte_t *pte;

-	pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+	pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
 	if (pte) {
 		__flush_page_to_ram(pte);
 		flush_tlb_kernel_page(pte);
@@ -21,6 +23,12 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long ad
 	return pte;
 }

+static inline pte_t *
+pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
 	cache_page(pte);
diff --git a/arch/m68k/include/asm/sun3_pgalloc.h b/arch/m68k/include/asm/sun3_pgalloc.h
index 48d80d5..383a8bf 100644
--- a/arch/m68k/include/asm/sun3_pgalloc.h
+++ b/arch/m68k/include/asm/sun3_pgalloc.h
@@ -38,10 +38,11 @@ do {							\
 	tlb_remove_page((tlb), pte);			\
 } while (0)

-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-					  unsigned long address)
+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
 {
-	unsigned long page = __get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	unsigned long page = __get_free_page(gfp_mask|__GFP_REPEAT);

 	if (!page)
 		return NULL;
@@ -50,6 +51,12 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	return (pte_t *) (page);
 }

+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
 					unsigned long address)
 {
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 59bf233..7d89c4b 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -240,12 +240,12 @@ unsigned long iopa(unsigned long addr)
 	return pa;
 }

-__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-		unsigned long address)
+__init_refok pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+		unsigned long address, gfp_t gfp_mask)
 {
 	pte_t *pte;
 	if (mem_init_done) {
-		pte = (pte_t *)__get_free_page(GFP_KERNEL |
+		pte = (pte_t *)__get_free_page(gfp_mask |
 					__GFP_REPEAT | __GFP_ZERO);
 	} else {
 		pte = (pte_t *)early_get_page();
@@ -254,3 +254,9 @@ __init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	}
 	return pte;
 }
+
+__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+		unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index 881d18b..3521903 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -64,14 +64,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_pages((unsigned long)pgd, PGD_ORDER);
 }

+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+	unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *) __get_free_pages(gfp_mask|__GFP_REPEAT|__GFP_ZERO, PTE_ORDER);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	unsigned long address)
 {
-	pte_t *pte;
-
-	pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, PTE_ORDER);
-
-	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline struct page *pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/mn10300/mm/pgtable.c b/arch/mn10300/mm/pgtable.c
index 450f7ba..59fd04d 100644
--- a/arch/mn10300/mm/pgtable.c
+++ b/arch/mn10300/mm/pgtable.c
@@ -62,14 +62,20 @@ void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)
 	local_flush_tlb_one(vaddr);
 }

-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t *__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	pte_t *pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT);
 	if (pte)
 		clear_page(pte);
 	return pte;
 }

+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	struct page *pte;
diff --git a/arch/parisc/include/asm/pgalloc.h b/arch/parisc/include/asm/pgalloc.h
index fc987a1..e3fbd89 100644
--- a/arch/parisc/include/asm/pgalloc.h
+++ b/arch/parisc/include/asm/pgalloc.h
@@ -127,10 +127,15 @@ pte_alloc_one(struct mm_struct *mm, unsigned long address)
 }

 static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}
+
+static inline pte_t *
 pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-	return pte;
+	return __pte_alloc_one_kernel(mm, addr, GFP_KERNEL);
 }

 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 292725c..ce2ae2f 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -100,10 +100,17 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 	kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
 }

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+        return (pte_t *)__get_free_page(gfp_mask | __GFP_REPEAT | __GFP_ZERO);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-        return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 8dc41c0..8e3c0b4 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -95,14 +95,15 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 #endif
 }

-__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+__init_refok pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
 	pte_t *pte;
 	extern int mem_init_done;
 	extern void *early_get_page(void);

 	if (mem_init_done) {
-		pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+		pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
 	} else {
 		pte = (pte_t *)early_get_page();
 		if (pte)
@@ -111,6 +112,11 @@ __init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long add
 	return pte;
 }

+__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	struct page *ptepage;
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 082eb4e..7c6fd31 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -172,7 +172,11 @@ static inline void pmd_populate(struct mm_struct *mm,
 /*
  * page table entry allocation/free routines.
  */
-#define pte_alloc_one_kernel(mm, vmaddr) ((pte_t *) page_table_alloc(mm))
+#define __pte_alloc_one_kernel(mm, vmaddr, mask) \
+	((pte_t *) __page_table_alloc((mm), (mask)))
+#define pte_alloc_one_kernel(mm, vmaddr) \
+	((pte_t *) __pte_alloc_one_kernel((mm), (vmaddr), GFP_KERNEL))
+
 #define pte_alloc_one(mm, vmaddr) ((pte_t *) page_table_alloc(mm))

 #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index e1850c2..44cf377 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -267,7 +267,7 @@ void crst_table_downgrade(struct mm_struct *mm, unsigned long limit)
 /*
  * page table entry allocation/free routines.
  */
-unsigned long *page_table_alloc(struct mm_struct *mm)
+unsigned long *__page_table_alloc(struct mm_struct *mm, gfp_t gfp_mask)
 {
 	struct page *page;
 	unsigned long *table;
@@ -284,7 +284,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	}
 	if (!page) {
 		spin_unlock_bh(&mm->context.list_lock);
-		page = alloc_page(GFP_KERNEL|__GFP_REPEAT);
+		page = alloc_page(gfp_mask|__GFP_REPEAT);
 		if (!page)
 			return NULL;
 		pgtable_page_ctor(page);
@@ -309,6 +309,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	return table;
 }

+
+unsigned long *page_table_alloc(struct mm_struct *mm)
+{
+	return __page_table_alloc(mm, GFP_KERNEL);
+}
+
 static void __page_table_free(struct mm_struct *mm, unsigned long *table)
 {
 	struct page *page;
diff --git a/arch/score/include/asm/pgalloc.h b/arch/score/include/asm/pgalloc.h
index 059a61b..5c2a47b 100644
--- a/arch/score/include/asm/pgalloc.h
+++ b/arch/score/include/asm/pgalloc.h
@@ -37,15 +37,17 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_pages((unsigned long)pgd, PGD_ORDER);
 }

-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-	unsigned long address)
+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	pte_t *pte;
-
-	pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO,
+	return (pte_t *) __get_free_pages(gfp_mask|__GFP_REPEAT|__GFP_ZERO,
 					PTE_ORDER);
+}

-	return pte;
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+	unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline struct page *pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
index 8c00785..1214abd 100644
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -31,10 +31,16 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 /*
  * Allocate and free page tables.
  */
+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long address, gfp_t gfp_mask)
+{
+	return quicklist_alloc(QUICK_PT, gfp_mask | __GFP_REPEAT, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index 5bdfa2c..a238412 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -36,10 +36,17 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 	quicklist_free(0, NULL, pmd);
 }

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+	return quicklist_alloc(0, gfp_mask, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return quicklist_alloc(0, GFP_KERNEL, NULL);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/tile/include/asm/pgalloc.h b/arch/tile/include/asm/pgalloc.h
index cf52791..a457042 100644
--- a/arch/tile/include/asm/pgalloc.h
+++ b/arch/tile/include/asm/pgalloc.h
@@ -74,9 +74,16 @@ extern void pte_free(struct mm_struct *mm, struct page *pte);
 #define pmd_pgtable(pmd) pmd_page(pmd)

 static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+	return pfn_to_kaddr(page_to_pfn(__pte_alloc_one(mm, address, gfp_mask)));
+}
+
+static inline pte_t *
 pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return pfn_to_kaddr(page_to_pfn(pte_alloc_one(mm, address)));
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 1f5430c..7875a32 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -218,9 +218,10 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)

 #define L2_USER_PGTABLE_PAGES (1 << L2_USER_PGTABLE_ORDER)

-struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
+struct page *
+__pte_alloc_one(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	gfp_t flags = GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO|__GFP_COMP;
+	gfp_t flags = gfp_mask|__GFP_REPEAT|__GFP_ZERO|__GFP_COMP;
 	struct page *p;

 #ifdef CONFIG_HIGHPTE
@@ -235,6 +236,11 @@ struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 	return p;
 }

+struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one(mm, address, GFP_KERNEL);
+}
+
 /*
  * Free page immediately (used in __pte_alloc if we raced with another
  * process).  We have to correct whatever pte_alloc_one() did before
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 8137ccc..d7969b3 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -284,12 +284,15 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_page((unsigned long) pgd);
 }

-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	pte_t *pte;
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}

-	pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-	return pte;
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 500242d..6b61bbd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -15,9 +15,16 @@

 gfp_t __userpte_alloc_gfp = PGALLOC_GFP | PGALLOC_USER_GFP;

+pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask | __GFP_NOTRACK |
+				__GFP_REPEAT | __GFP_ZERO);
+}
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)__get_free_page(PGALLOC_GFP);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
diff --git a/arch/xtensa/include/asm/pgalloc.h b/arch/xtensa/include/asm/pgalloc.h
index 40cf9bc..e24c720 100644
--- a/arch/xtensa/include/asm/pgalloc.h
+++ b/arch/xtensa/include/asm/pgalloc.h
@@ -42,10 +42,17 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)

 extern struct kmem_cache *pgtable_cache;

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+	return kmem_cache_alloc(pgtable_cache, gfp_mask|__GFP_REPEAT);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					 unsigned long address)
 {
-	return kmem_cache_alloc(pgtable_cache, GFP_KERNEL|__GFP_REPEAT);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/xtensa/mm/pgtable.c b/arch/xtensa/mm/pgtable.c
index 6979927..1c53abc 100644
--- a/arch/xtensa/mm/pgtable.c
+++ b/arch/xtensa/mm/pgtable.c
@@ -12,13 +12,14 @@

 #if (DCACHE_SIZE > PAGE_SIZE)

-pte_t* pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t*
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
 	pte_t *pte = NULL, *p;
 	int color = ADDR_COLOR(address);
 	int i;

-	p = (pte_t*) __get_free_pages(GFP_KERNEL|__GFP_REPEAT, COLOR_ORDER);
+	p = (pte_t*) __get_free_pages(gfp_mask|__GFP_REPEAT, COLOR_ORDER);

 	if (likely(p)) {
 		split_page(virt_to_page(p), COLOR_ORDER);
@@ -35,6 +36,11 @@ pte_t* pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 	return pte;
 }

+pte_t* pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 #ifdef PROFILING

 int mask;
Comment 3 Anonymous Emailer 2011-03-10 12:23:33 UTC
Reply-To: ricardo.correia@oracle.com

Hi Andrew, Prasad,

On Wed, 2011-03-09 at 14:23 -0800, Andrew Morton wrote:
> Ricardo has been working on this.  See the thread at
> http://marc.info/?l=linux-mm&m=128942194520631&w=4

Sorry, but I am no longer working on this and unfortunately it's
unlikely that I will continue working on it in the future... :-(

Best regards,
Ricardo
Comment 4 Prasad Gajanan Joshi. 2011-03-14 19:11:27 UTC
Created attachment 50802 [details]
The patch fixes the problem by propagating the allocation flag down the call hierarchy
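The idea behind the attachment can be sketched in a small user-space model. This is only an illustration, not kernel code: the MODEL_* flags and every model_* function are hypothetical stand-ins invented here, and the model assumes that reclaim may call back into the file system only when the FS bit is set in the mask that reaches the page allocator.

```c
/*
 * User-space model only; this is not kernel code. Every MODEL_* flag and
 * model_* function is a hypothetical stand-in invented for illustration.
 */
#include <assert.h>

typedef unsigned int gfp_t;

#define MODEL_GFP_IO     0x1u                          /* may start block I/O */
#define MODEL_GFP_FS     0x2u                          /* may call into the FS */
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)                /* FS re-entry forbidden */

/* Mask seen by the lowest-level page allocation in the model. */
gfp_t model_last_mask;

/* Stand-in for the page allocator: it only records the mask it was given. */
void model_get_free_page(gfp_t mask)
{
	model_last_mask = mask;
}

/* New helper shape: takes the caller's mask and forwards it unchanged. */
void model__pte_alloc_one_kernel(gfp_t gfp_mask)
{
	model_get_free_page(gfp_mask);
}

/* Old entry point kept as a wrapper that always passes the KERNEL mask. */
void model_pte_alloc_one_kernel(void)
{
	model__pte_alloc_one_kernel(MODEL_GFP_KERNEL);
}

/* Before: the vmalloc-like caller cannot hand its mask to the helper. */
void model_vmalloc_before(gfp_t gfp_mask)
{
	(void)gfp_mask;               /* the caller's mask is dropped here */
	model_pte_alloc_one_kernel();
}

/* After: the caller's mask travels down to the page-table allocation. */
void model_vmalloc_after(gfp_t gfp_mask)
{
	model__pte_alloc_one_kernel(gfp_mask);
}

/* Model assumption: FS callbacks are possible only if the FS bit is set. */
int model_may_enter_fs(gfp_t mask)
{
	return (mask & MODEL_GFP_FS) != 0;
}

/* Self-check contrasting both shapes for a NOFS caller. */
void model_self_check(void)
{
	model_vmalloc_before(MODEL_GFP_NOFS);
	assert(model_may_enter_fs(model_last_mask));   /* NOFS was lost */

	model_vmalloc_after(MODEL_GFP_NOFS);
	assert(!model_may_enter_fs(model_last_mask));  /* NOFS is preserved */
}
```

Calling model_self_check() from a small main() exercises both paths. Keeping the old entry point as a thin wrapper mirrors the attached diff, where pte_alloc_one_kernel() simply calls __pte_alloc_one_kernel() with GFP_KERNEL so that existing callers do not change.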
Comment 5 Prasad Gajanan Joshi. 2012-08-21 06:16:26 UTC
Not sure why this is marked as 'RESOLVED OBSOLETE'. Is it fixed in the mainline kernel?
Comment 6 Alan 2012-08-21 09:05:53 UTC
Because it's well over a year old.
Comment 7 Prasad Gajanan Joshi. 2012-08-21 09:16:01 UTC
However, this bug is neither fixed nor resolved, so it is not a good idea to mark it OBSOLETE. This is a critical bug and should be fixed rather than ignored.
Comment 8 Alan 2012-08-21 11:45:50 UTC
It's not a 'critical bug' - it's unsupported functionality that nobody in tree currently seems to need.
Comment 9 Prasad Gajanan Joshi. 2012-08-22 05:39:18 UTC
I don't understand why you are saying the functionality is not being used in the Linux kernel.

Here is the list of the __vmalloc(GFP_NOFS) calls in the 3.5.0 Linux kernel. (I have not looked at the latest Linux kernel, though.)

prasad@pjoshi:~/Linux/linux-2.6$ grep GFP_NOFS ~/vmalloc_list 
drivers/mtd/ubi/io.c:	buf1 = __vmalloc(len, GFP_NOFS, PAGE_KERNEL);
drivers/mtd/ubi/io.c:	buf = __vmalloc(len, GFP_NOFS, PAGE_KERNEL);
fs/gfs2/dir.c:		ptr = __vmalloc(size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/lprops.c:	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/orphan.c:	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/debug.c:	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/lpt_commit.c:	buf = p = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/lpt_commit.c:	buf = p = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);

I still think it is a 'critical bug' and should not be marked OBSOLETE.
Comment 10 Prasad Gajanan Joshi. 2012-08-27 05:55:04 UTC
Hello Alan,

Do you still think this should be marked 'RESOLVED OBSOLETE'?

Thanks and Warm Regards,
Prasad