Bug 30702 - vmalloc(GFP_NOFS) can callback file system evict_inode, inducing deadlock.
Status: RESOLVED OBSOLETE
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: All Linux
Importance: P1 normal
Assigned To: Andrew Morton
Reported: 2011-03-07 19:12 UTC by Prasad Gajanan Joshi.
Modified: 2012-08-27 05:55 UTC (History)

See Also:
Kernel Version: 2.6.38-rc7
Tree: Mainline
Regression: No


Attachments
The patch fixes the problem by propagating the allocation flag down the call hierarchy (54.16 KB, patch)
2011-03-14 19:11 UTC, Prasad Gajanan Joshi.

Description Prasad Gajanan Joshi. 2011-03-07 19:12:22 UTC
I am working on a proprietary file system. The problem I am facing comes from calling __vmalloc while holding a lock. Although I am changing my own code to avoid this, I thought it would be good to at least report the vmalloc problem.

The code looks something like this:

const struct file_operations lzfs_file_operations = {
    .write              = lzfs_vnop_write,
};

ssize_t
lzfs_vnop_write()
{
      mutex_lock(some global mutex);
      ptr = __vmalloc(size, GFP_NOFS | __GFP_HIGHMEM, PAGE_KERNEL);
      mutex_unlock(some global mutex);
}

static const struct super_operations lzfs_super_ops = {
    .evict_inode    = lzfs_evict_vnode,
};

static void
lzfs_evict_vnode(struct inode *inode)
{
      mutex_lock(some global mutex);

      some code for eviction;

      mutex_unlock(some global mutex);
}

Since __vmalloc is called with GFP_NOFS, I expected that evict_inode (or clear_inode) would not be called when the inode cache is pruned. However, I noticed the following oops message during testing:

[ 5058.193312]  [<ffffffffa092a534>] lzfs_clear_vnode+0x104/0x160 [lzfs]
[ 5058.193318]  [<ffffffff8116abc5>] clear_inode+0x75/0xf0
[ 5058.193323]  [<ffffffff8116ac80>] dispose_list+0x40/0x150
[ 5058.193328]  [<ffffffff8116af23>] prune_icache+0x193/0x2a0
[ 5058.193332]  [<ffffffff811665e3>] ? prune_dcache+0x183/0x1d0
[ 5058.193338]  [<ffffffff8116b081>] shrink_icache_memory+0x51/0x60
[ 5058.193345]  [<ffffffff8110e6d4>] shrink_slab+0x124/0x180
[ 5058.193349]  [<ffffffff8110ff0f>] do_try_to_free_pages+0x1cf/0x360
[ 5058.193354]  [<ffffffff8111024b>] try_to_free_pages+0x6b/0x70
[ 5058.193359]  [<ffffffff8110740a>] __alloc_pages_slowpath+0x27a/0x590
[ 5058.193365]  [<ffffffff81107884>] __alloc_pages_nodemask+0x164/0x1d0
[ 5058.193370]  [<ffffffff811397ba>] alloc_pages_current+0x9a/0x100
[ 5058.193375]  [<ffffffff811066ce>] __get_free_pages+0xe/0x50
[ 5058.193380]  [<ffffffff81042435>] pte_alloc_one_kernel+0x15/0x20
[ 5058.193385]  [<ffffffff8111c86b>] __pte_alloc_kernel+0x1b/0xc0
[ 5058.193391]  [<ffffffff8112ad63>] vmap_pte_range+0x183/0x1a0
[ 5058.193395]  [<ffffffff8112aec6>] vmap_pud_range+0x146/0x1c0
[ 5058.193400]  [<ffffffff8112afda>] vmap_page_range_noflush+0x9a/0xc0
[ 5058.193405]  [<ffffffff8112b032>] map_vm_area+0x32/0x50
[ 5058.193410]  [<ffffffff8112c4a8>] __vmalloc_area_node+0x108/0x190
[ 5058.193426]  [<ffffffffa06591a0>] ? kv_alloc+0x90/0x130 [spl]
[ 5058.193431]  [<ffffffff8112c392>] __vmalloc_node+0xa2/0xb0
[ 5058.193443]  [<ffffffffa06591a0>] ? kv_alloc+0x90/0x130 [spl]
[ 5058.193453]  [<ffffffff8112c712>] __vmalloc+0x22/0x30
[ 5058.193464]  [<ffffffffa06591a0>] kv_alloc+0x90/0x130 [spl]
[ 5058.194007]  [<ffffffffa0858136>] zfs_grow_blocksize+0x46/0xe0 [zfs]
[ 5058.194063]  [<ffffffffa08547e8>] zfs_write+0xbb8/0x1100 [zfs]
[ 5058.194075]  [<ffffffff8114e740>] ? mem_cgroup_charge_common+0x70/0x90
[ 5058.194082]  [<ffffffffa092ced7>] lzfs_vnop_write+0xc7/0x3b0 [lzfs]
[ 5058.194087]  [<ffffffff8111bacc>] ? do_anonymous_page+0x11c/0x350
[ 5058.194096]  [<ffffffff81152ec8>] vfs_write+0xb8/0x1a0
[ 5058.194100]  [<ffffffff81153711>] sys_write+0x51/0x80
[ 5058.194105]  [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b

The problem is with __vmalloc (map_vm_area) which discards the allocation flag while mapping the scattered physical pages contiguously into the virtual vmalloc area. 

static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
                 pgprot_t prot, int node, void *caller)
{
    /* ... */
    if (map_vm_area(area, prot, &pages, gfp_mask))
        goto fail;
    return area->addr;
    /* ... */
}

The function map_vm_area() can result in calls to
pud_alloc
pmd_alloc
pte_alloc_kernel

which allocate memory using the GFP_KERNEL flag. For example:
pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
{
    pte_t *pte;

    pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
    return pte;
}

If the system is running short of memory, this page allocation can trigger clear_inode (or evict_inode), which causes the oops.
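The chain described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `MODEL_GFP_*` values, the `model_*` functions, and the boolean "lock" are all hypothetical stand-ins, and the model assumes that every allocation happens under memory pressure.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; these are NOT the kernel's real GFP values. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)

static bool model_lock_held;   /* stands in for "some global mutex" */
static bool model_deadlocked;  /* set when the lock is requested while held */

/* Stands in for lzfs_evict_vnode(): it needs the global lock. */
static void model_evict_inode(void)
{
	if (model_lock_held) {
		/* A real non-recursive mutex would block forever here. */
		model_deadlocked = true;
		return;
	}
	model_lock_held = true;
	/* ... eviction work ... */
	model_lock_held = false;
}

/* Page allocation under pressure: reclaim enters the fs only if the FS bit is set. */
static void model_alloc_page(unsigned int gfp)
{
	if (gfp & MODEL_GFP_FS)
		model_evict_inode();
}

/* Like pte_alloc_one_kernel(): no gfp parameter, always GFP_KERNEL. */
static void model_pte_alloc_one_kernel(void)
{
	model_alloc_page(MODEL_GFP_KERNEL);
}

/* Models the write path; returns true if the eviction path hit the held lock. */
static bool model_write(bool needs_page_table)
{
	model_deadlocked = false;
	model_lock_held = true;
	model_alloc_page(MODEL_GFP_NOFS);     /* data pages honour the mask */
	if (needs_page_table)
		model_pte_alloc_one_kernel(); /* page-table page ignores it */
	model_lock_held = false;
	return model_deadlocked;
}
```

In this model `model_write(false)` returns false and `model_write(true)` returns true, which matches the observation that the problem only appears when mapping the area needs a new page-table page.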

Though none of the file systems in the Linux kernel seem to call vmalloc while holding a lock, it would be good to fix the problem anyway.

As far as I can understand, the solution is to pass the gfp_mask down the call hierarchy. I wanted to send a patch with these changes, but I soon realized that changes are needed in many places and the patch would be large, so I thought I would report the problem first.
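Passing the mask down the call hierarchy could look roughly like the following user-space sketch. All names (`sketch_*`, `SKETCH_GFP_*`) are hypothetical, and the three levels only loosely mirror pud_alloc, pmd_alloc and pte_alloc_kernel; each level is assumed to allocate exactly one table page.

```c
#include <assert.h>

/* Illustrative flag bits; these are NOT the kernel's real GFP values. */
#define SKETCH_GFP_IO     0x1u
#define SKETCH_GFP_FS     0x2u
#define SKETCH_GFP_NOFS   (SKETCH_GFP_IO)
#define SKETCH_GFP_KERNEL (SKETCH_GFP_IO | SKETCH_GFP_FS)

/* Counts how often an allocation was allowed to recurse into the fs. */
static int sketch_fs_reclaim_entries;

static void sketch_page_alloc(unsigned int gfp)
{
	if (gfp & SKETCH_GFP_FS)
		sketch_fs_reclaim_entries++;
}

/* Every level takes the caller's mask and hands it on unchanged. */
static void sketch_pte_alloc(unsigned int gfp)
{
	sketch_page_alloc(gfp);
}

static void sketch_pmd_alloc(unsigned int gfp)
{
	sketch_page_alloc(gfp);
	sketch_pte_alloc(gfp);
}

static void sketch_pud_alloc(unsigned int gfp)
{
	sketch_page_alloc(gfp);
	sketch_pmd_alloc(gfp);
}

/* Top of the hierarchy; returns the number of fs re-entries observed. */
static int sketch_map_area(unsigned int gfp)
{
	sketch_fs_reclaim_entries = 0;
	sketch_pud_alloc(gfp);
	return sketch_fs_reclaim_entries;
}
```

With the mask threaded through, a GFP_NOFS-style caller sees no fs re-entry at any level, while a GFP_KERNEL-style caller still gets the old behaviour.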

Thanks and Regards.
Comment 1 Andrew Morton 2011-03-09 22:24:03 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Mon, 7 Mar 2011 19:12:23 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=30702
> 
>            Summary: vmalloc(GFP_NOFS) can callback file system
>                     evict_inode, inducing deadlock.

Yeah.

Ricardo has been working on this.  See the thread at
http://marc.info/?l=linux-mm&m=128942194520631&w=4

It's tough, and we've been bad, and progress is slow :(

>            Product: Memory Management
>            Version: 2.5
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: prasadjoshi124@gmail.com
>         Regression: No
Comment 2 Prasad Gajanan Joshi. 2011-03-10 12:12:21 UTC
>
> Ricardo has been working on this.  See the thread at
> http://marc.info/?l=linux-mm&m=128942194520631&w=4
>
> It's tough, and we've been bad, and progress is slow :(

Thanks Andrew,

Hi Ricardo,

I also worked on this problem yesterday. Here is a patch that adds a new function:
__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)

The function __pte_alloc_kernel() can then call __pte_alloc_one_kernel()
with the correct GFP flag.

int __pte_alloc_kernel(pmd_t *pmd, unsigned long address)
{
    pte_t *new = __pte_alloc_one_kernel(&init_mm, address, GFP_KERNEL);
    /* ... */
}

My plan is to work bottom up, passing the GFP_KERNEL flag at first for
testing. If everything works fine, the GFP flag can then be changed.

I am planning to run a few tests on an x86 machine to make sure it works.
BTW, if you are following some other approach, I can test your patch on
my machine.
I hope the browser will not trim the lines at the bottom. This is the patch:

---
diff --git a/arch/alpha/include/asm/pgalloc.h b/arch/alpha/include/asm/pgalloc.h
index bc2a0da..a5685aa 100644
--- a/arch/alpha/include/asm/pgalloc.h
+++ b/arch/alpha/include/asm/pgalloc.h
@@ -51,10 +51,15 @@ pmd_free(struct mm_struct *mm, pmd_t *pmd)
 }

 static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}
+
+static inline pte_t *
 pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline void
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index 9763be0..a4161c8 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -57,17 +57,24 @@ static inline void clean_pte_table(pte_t *pte)
  *  +------------+
  */
 static inline pte_t *
-pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr, gfp_t gfp_mask)
 {
 	pte_t *pte;

-	pte = (pte_t *)__get_free_page(PGALLOC_GFP);
+	pte = (pte_t *)__get_free_page(gfp_mask | __GFP_NOTRACK |
+		__GFP_REPEAT | __GFP_ZERO);
 	if (pte)
 		clean_pte_table(pte);

 	return pte;
 }

+static inline pte_t *
+pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
+{
+	return __pte_alloc_one_kernel(mm, addr, GFP_KERNEL);
+}
+
 static inline pgtable_t
 pte_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
diff --git a/arch/avr32/include/asm/pgalloc.h b/arch/avr32/include/asm/pgalloc.h
index bc7e8ae..2eb4824 100644
--- a/arch/avr32/include/asm/pgalloc.h
+++ b/arch/avr32/include/asm/pgalloc.h
@@ -51,10 +51,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	quicklist_free(QUICK_PGD, NULL, pgd);
 }

+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long address, gfp_t gfp_mask)
+{
+	return quicklist_alloc(QUICK_PT, gfp_mask | __GFP_REPEAT, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/cris/include/asm/pgalloc.h b/arch/cris/include/asm/pgalloc.h
index 6da975d..453a388 100644
--- a/arch/cris/include/asm/pgalloc.h
+++ b/arch/cris/include/asm/pgalloc.h
@@ -22,10 +22,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_page((unsigned long)pgd);
 }

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+	gfp_t gfp_mask)
+{
+  	return (pte_t *) __get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-  	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
- 	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
diff --git a/arch/frv/mm/pgalloc.c b/arch/frv/mm/pgalloc.c
index c42c83d..c74ace1 100644
--- a/arch/frv/mm/pgalloc.c
+++ b/arch/frv/mm/pgalloc.c
@@ -20,14 +20,19 @@

 pgd_t swapper_pg_dir[PTRS_PER_PGD] __attribute__((aligned(PAGE_SIZE)));

-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t *__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	pte_t *pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT);
 	if (pte)
 		clear_page(pte);
 	return pte;
 }

+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	struct page *page;
diff --git a/arch/ia64/include/asm/pgalloc.h b/arch/ia64/include/asm/pgalloc.h
index 96a8d92..be59452 100644
--- a/arch/ia64/include/asm/pgalloc.h
+++ b/arch/ia64/include/asm/pgalloc.h
@@ -95,10 +95,16 @@ static inline pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr)
 	return page;
 }

+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long addr, gfp_t gfp_mask)
+{
+	return quicklist_alloc(0, gfp_mask, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long addr)
 {
-	return quicklist_alloc(0, GFP_KERNEL, NULL);
+	return __pte_alloc_one_kernel(mm, addr, GFP_KERNEL);
 }

 static inline void pte_free(struct mm_struct *mm, pgtable_t pte)
diff --git a/arch/m32r/include/asm/pgalloc.h b/arch/m32r/include/asm/pgalloc.h
index 0fc7361..fd6650f 100644
--- a/arch/m32r/include/asm/pgalloc.h
+++ b/arch/m32r/include/asm/pgalloc.h
@@ -30,12 +30,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_page((unsigned long)pgd);
 }

+static __inline__ pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+	unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_ZERO);
+}
+
 static __inline__ pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	unsigned long address)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_ZERO);
-
-	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static __inline__ pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/m68k/include/asm/motorola_pgalloc.h b/arch/m68k/include/asm/motorola_pgalloc.h
index 2f02f26..c5190f8 100644
--- a/arch/m68k/include/asm/motorola_pgalloc.h
+++ b/arch/m68k/include/asm/motorola_pgalloc.h
@@ -7,11 +7,13 @@
 extern pmd_t *get_pointer_table(void);
 extern int free_pointer_table(pmd_t *);

-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+	gfp_t gfp_mask)
 {
 	pte_t *pte;

-	pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+	pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
 	if (pte) {
 		__flush_page_to_ram(pte);
 		flush_tlb_kernel_page(pte);
@@ -21,6 +23,12 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long ad
 	return pte;
 }

+static inline pte_t *
+pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
 {
 	cache_page(pte);
diff --git a/arch/m68k/include/asm/sun3_pgalloc.h b/arch/m68k/include/asm/sun3_pgalloc.h
index 48d80d5..383a8bf 100644
--- a/arch/m68k/include/asm/sun3_pgalloc.h
+++ b/arch/m68k/include/asm/sun3_pgalloc.h
@@ -38,10 +38,11 @@ do {							\
 	tlb_remove_page((tlb), pte);			\
 } while (0)

-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-					  unsigned long address)
+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
 {
-	unsigned long page = __get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	unsigned long page = __get_free_page(gfp_mask|__GFP_REPEAT);

 	if (!page)
 		return NULL;
@@ -50,6 +51,12 @@ static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	return (pte_t *) (page);
 }

+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
 					unsigned long address)
 {
diff --git a/arch/microblaze/mm/pgtable.c b/arch/microblaze/mm/pgtable.c
index 59bf233..7d89c4b 100644
--- a/arch/microblaze/mm/pgtable.c
+++ b/arch/microblaze/mm/pgtable.c
@@ -240,12 +240,12 @@ unsigned long iopa(unsigned long addr)
 	return pa;
 }

-__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-		unsigned long address)
+__init_refok pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+		unsigned long address, gfp_t gfp_mask)
 {
 	pte_t *pte;
 	if (mem_init_done) {
-		pte = (pte_t *)__get_free_page(GFP_KERNEL |
+		pte = (pte_t *)__get_free_page(gfp_mask |
 					__GFP_REPEAT | __GFP_ZERO);
 	} else {
 		pte = (pte_t *)early_get_page();
@@ -254,3 +254,9 @@ __init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	}
 	return pte;
 }
+
+__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+		unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
diff --git a/arch/mips/include/asm/pgalloc.h b/arch/mips/include/asm/pgalloc.h
index 881d18b..3521903 100644
--- a/arch/mips/include/asm/pgalloc.h
+++ b/arch/mips/include/asm/pgalloc.h
@@ -64,14 +64,16 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_pages((unsigned long)pgd, PGD_ORDER);
 }

+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+	unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *) __get_free_pages(gfp_mask|__GFP_REPEAT|__GFP_ZERO, PTE_ORDER);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 	unsigned long address)
 {
-	pte_t *pte;
-
-	pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, PTE_ORDER);
-
-	return pte;
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline struct page *pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/mn10300/mm/pgtable.c b/arch/mn10300/mm/pgtable.c
index 450f7ba..59fd04d 100644
--- a/arch/mn10300/mm/pgtable.c
+++ b/arch/mn10300/mm/pgtable.c
@@ -62,14 +62,20 @@ void set_pmd_pfn(unsigned long vaddr, unsigned long pfn, pgprot_t flags)
 	local_flush_tlb_one(vaddr);
 }

-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t *__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT);
+	pte_t *pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT);
 	if (pte)
 		clear_page(pte);
 	return pte;
 }

+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	struct page *pte;
diff --git a/arch/parisc/include/asm/pgalloc.h b/arch/parisc/include/asm/pgalloc.h
index fc987a1..e3fbd89 100644
--- a/arch/parisc/include/asm/pgalloc.h
+++ b/arch/parisc/include/asm/pgalloc.h
@@ -127,10 +127,15 @@ pte_alloc_one(struct mm_struct *mm, unsigned long address)
 }

 static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}
+
+static inline pte_t *
 pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
 {
-	pte_t *pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-	return pte;
+	return __pte_alloc_one_kernel(mm, addr, GFP_KERNEL);
 }

 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
diff --git a/arch/powerpc/include/asm/pgalloc-64.h b/arch/powerpc/include/asm/pgalloc-64.h
index 292725c..ce2ae2f 100644
--- a/arch/powerpc/include/asm/pgalloc-64.h
+++ b/arch/powerpc/include/asm/pgalloc-64.h
@@ -100,10 +100,17 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 	kmem_cache_free(PGT_CACHE(PMD_INDEX_SIZE), pmd);
 }

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+        return (pte_t *)__get_free_page(gfp_mask | __GFP_REPEAT | __GFP_ZERO);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-        return (pte_t *)__get_free_page(GFP_KERNEL | __GFP_REPEAT | __GFP_ZERO);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 8dc41c0..8e3c0b4 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -95,14 +95,15 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 #endif
 }

-__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+__init_refok pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
 	pte_t *pte;
 	extern int mem_init_done;
 	extern void *early_get_page(void);

 	if (mem_init_done) {
-		pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
+		pte = (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
 	} else {
 		pte = (pte_t *)early_get_page();
 		if (pte)
@@ -111,6 +112,11 @@ __init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long add
 	return pte;
 }

+__init_refok pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
 {
 	struct page *ptepage;
diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 082eb4e..7c6fd31 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -172,7 +172,11 @@ static inline void pmd_populate(struct mm_struct *mm,
 /*
  * page table entry allocation/free routines.
  */
-#define pte_alloc_one_kernel(mm, vmaddr) ((pte_t *) page_table_alloc(mm))
+#define __pte_alloc_one_kernel(mm, vmaddr, mask) \
+	((pte_t *) __page_table_alloc((mm), (mask)))
+#define pte_alloc_one_kernel(mm, vmaddr) \
+	((pte_t *) __pte_alloc_one_kernel((mm), (vmaddr), GFP_KERNEL))
+
 #define pte_alloc_one(mm, vmaddr) ((pte_t *) page_table_alloc(mm))

 #define pte_free_kernel(mm, pte) page_table_free(mm, (unsigned long *) pte)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index e1850c2..44cf377 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -267,7 +267,7 @@ void crst_table_downgrade(struct mm_struct *mm, unsigned long limit)
 /*
  * page table entry allocation/free routines.
  */
-unsigned long *page_table_alloc(struct mm_struct *mm)
+unsigned long *__page_table_alloc(struct mm_struct *mm, gfp_t gfp_mask)
 {
 	struct page *page;
 	unsigned long *table;
@@ -284,7 +284,7 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	}
 	if (!page) {
 		spin_unlock_bh(&mm->context.list_lock);
-		page = alloc_page(GFP_KERNEL|__GFP_REPEAT);
+		page = alloc_page(gfp_mask|__GFP_REPEAT);
 		if (!page)
 			return NULL;
 		pgtable_page_ctor(page);
@@ -309,6 +309,12 @@ unsigned long *page_table_alloc(struct mm_struct *mm)
 	return table;
 }

+
+unsigned long *page_table_alloc(struct mm_struct *mm)
+{
+	return __page_table_alloc(mm, GFP_KERNEL);
+}
+
 static void __page_table_free(struct mm_struct *mm, unsigned long *table)
 {
 	struct page *page;
diff --git a/arch/score/include/asm/pgalloc.h b/arch/score/include/asm/pgalloc.h
index 059a61b..5c2a47b 100644
--- a/arch/score/include/asm/pgalloc.h
+++ b/arch/score/include/asm/pgalloc.h
@@ -37,15 +37,17 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_pages((unsigned long)pgd, PGD_ORDER);
 }

-static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
-	unsigned long address)
+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	pte_t *pte;
-
-	pte = (pte_t *) __get_free_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO,
+	return (pte_t *) __get_free_pages(gfp_mask|__GFP_REPEAT|__GFP_ZERO,
 					PTE_ORDER);
+}

-	return pte;
+static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
+	unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline struct page *pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/sh/include/asm/pgalloc.h b/arch/sh/include/asm/pgalloc.h
index 8c00785..1214abd 100644
--- a/arch/sh/include/asm/pgalloc.h
+++ b/arch/sh/include/asm/pgalloc.h
@@ -31,10 +31,16 @@ static inline void pmd_populate(struct mm_struct *mm, pmd_t *pmd,
 /*
  * Allocate and free page tables.
  */
+static inline pte_t *__pte_alloc_one_kernel(struct mm_struct *mm,
+					  unsigned long address, gfp_t gfp_mask)
+{
+	return quicklist_alloc(QUICK_PT, gfp_mask | __GFP_REPEAT, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return quicklist_alloc(QUICK_PT, GFP_KERNEL | __GFP_REPEAT, NULL);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/sparc/include/asm/pgalloc_64.h b/arch/sparc/include/asm/pgalloc_64.h
index 5bdfa2c..a238412 100644
--- a/arch/sparc/include/asm/pgalloc_64.h
+++ b/arch/sparc/include/asm/pgalloc_64.h
@@ -36,10 +36,17 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 	quicklist_free(0, NULL, pmd);
 }

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+	return quicklist_alloc(0, gfp_mask, NULL);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					  unsigned long address)
 {
-	return quicklist_alloc(0, GFP_KERNEL, NULL);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/tile/include/asm/pgalloc.h b/arch/tile/include/asm/pgalloc.h
index cf52791..a457042 100644
--- a/arch/tile/include/asm/pgalloc.h
+++ b/arch/tile/include/asm/pgalloc.h
@@ -74,9 +74,16 @@ extern void pte_free(struct mm_struct *mm, struct page *pte);
 #define pmd_pgtable(pmd) pmd_page(pmd)

 static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+	return pfn_to_kaddr(page_to_pfn(__pte_alloc_one(mm, address, gfp_mask)));
+}
+
+static inline pte_t *
 pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return pfn_to_kaddr(page_to_pfn(pte_alloc_one(mm, address)));
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index 1f5430c..7875a32 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -218,9 +218,10 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)

 #define L2_USER_PGTABLE_PAGES (1 << L2_USER_PGTABLE_ORDER)

-struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
+struct page *
+__pte_alloc_one(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	gfp_t flags = GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO|__GFP_COMP;
+	gfp_t flags = gfp_mask|__GFP_REPEAT|__GFP_ZERO|__GFP_COMP;
 	struct page *p;

 #ifdef CONFIG_HIGHPTE
@@ -235,6 +236,11 @@ struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
 	return p;
 }

+struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one(mm, address, GFP_KERNEL);
+}
+
 /*
  * Free page immediately (used in __pte_alloc if we raced with another
  * process).  We have to correct whatever pte_alloc_one() did before
diff --git a/arch/um/kernel/mem.c b/arch/um/kernel/mem.c
index 8137ccc..d7969b3 100644
--- a/arch/um/kernel/mem.c
+++ b/arch/um/kernel/mem.c
@@ -284,12 +284,15 @@ void pgd_free(struct mm_struct *mm, pgd_t *pgd)
 	free_page((unsigned long) pgd);
 }

-pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
-	pte_t *pte;
+	return (pte_t *)__get_free_page(gfp_mask|__GFP_REPEAT|__GFP_ZERO);
+}

-	pte = (pte_t *)__get_free_page(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO);
-	return pte;
+pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 500242d..6b61bbd 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -15,9 +15,16 @@

 gfp_t __userpte_alloc_gfp = PGALLOC_GFP | PGALLOC_USER_GFP;

+pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
+{
+	return (pte_t *)__get_free_page(gfp_mask | __GFP_NOTRACK |
+				__GFP_REPEAT | __GFP_ZERO);
+}
+
 pte_t *pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 {
-	return (pte_t *)__get_free_page(PGALLOC_GFP);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
diff --git a/arch/xtensa/include/asm/pgalloc.h b/arch/xtensa/include/asm/pgalloc.h
index 40cf9bc..e24c720 100644
--- a/arch/xtensa/include/asm/pgalloc.h
+++ b/arch/xtensa/include/asm/pgalloc.h
@@ -42,10 +42,17 @@ static inline void pgd_free(struct mm_struct *mm, pgd_t *pgd)

 extern struct kmem_cache *pgtable_cache;

+static inline pte_t *
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address,
+		gfp_t gfp_mask)
+{
+	return kmem_cache_alloc(pgtable_cache, gfp_mask|__GFP_REPEAT);
+}
+
 static inline pte_t *pte_alloc_one_kernel(struct mm_struct *mm,
 					 unsigned long address)
 {
-	return kmem_cache_alloc(pgtable_cache, GFP_KERNEL|__GFP_REPEAT);
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
 }

 static inline pgtable_t pte_alloc_one(struct mm_struct *mm,
diff --git a/arch/xtensa/mm/pgtable.c b/arch/xtensa/mm/pgtable.c
index 6979927..1c53abc 100644
--- a/arch/xtensa/mm/pgtable.c
+++ b/arch/xtensa/mm/pgtable.c
@@ -12,13 +12,14 @@

 #if (DCACHE_SIZE > PAGE_SIZE)

-pte_t* pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+pte_t*
+__pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address, gfp_t gfp_mask)
 {
 	pte_t *pte = NULL, *p;
 	int color = ADDR_COLOR(address);
 	int i;

-	p = (pte_t*) __get_free_pages(GFP_KERNEL|__GFP_REPEAT, COLOR_ORDER);
+	p = (pte_t*) __get_free_pages(gfp_mask|__GFP_REPEAT, COLOR_ORDER);

 	if (likely(p)) {
 		split_page(virt_to_page(p), COLOR_ORDER);
@@ -35,6 +36,11 @@ pte_t* pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
 	return pte;
 }

+pte_t* pte_alloc_one_kernel(struct mm_struct *mm, unsigned long address)
+{
+	return __pte_alloc_one_kernel(mm, address, GFP_KERNEL);
+}
+
 #ifdef PROFILING

 int mask;
Comment 3 Anonymous Emailer 2011-03-10 12:23:33 UTC
Reply-To: ricardo.correia@oracle.com

Hi Andrew, Prasad,

On Wed, 2011-03-09 at 14:23 -0800, Andrew Morton wrote:
> Ricardo has been working on this.  See the thread at
> http://marc.info/?l=linux-mm&m=128942194520631&w=4

Sorry, but I am no longer working on this and unfortunately it's
unlikely that I will continue working on it in the future... :-(

Best regards,
Ricardo
Comment 4 Prasad Gajanan Joshi. 2011-03-14 19:11:27 UTC
Created attachment 50802 [details]
The patch fixes the problem by propagating the allocation flag down the call hierarchy
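For readers without the attachment, the pattern the patch applies per architecture can be sketched in userspace roughly as follows. This is only a mock under stated assumptions: every mock_* name and flag value below is hypothetical and exists for illustration, and none of it is kernel code. It shows an inner function that takes the caller's mask and passes it down unchanged, while the old entry point keeps its signature and defaults to the GFP_KERNEL-like mask.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace mock, NOT kernel code. All mock_* names and flag values are
 * hypothetical stand-ins chosen for illustration only.
 */
typedef unsigned int mock_gfp_t;

#define MOCK_GFP_IO     0x1u
#define MOCK_GFP_FS     0x2u	/* allocator may call back into the filesystem */
#define MOCK_GFP_KERNEL (MOCK_GFP_IO | MOCK_GFP_FS)
#define MOCK_GFP_NOFS   (MOCK_GFP_IO)

/* Records the mask that actually reached the lowest-level allocator. */
static mock_gfp_t mock_last_mask;

static void *mock_get_free_page(mock_gfp_t mask)
{
	mock_last_mask = mask;
	return calloc(1, 4096);
}

/* New variant: the caller's mask is passed down instead of being hardcoded. */
static void *mock_pte_alloc_one_kernel_gfp(mock_gfp_t gfp_mask)
{
	return mock_get_free_page(gfp_mask);
}

/* Old entry point: same signature as before, delegates with the default mask. */
static void *mock_pte_alloc_one_kernel(void)
{
	return mock_pte_alloc_one_kernel_gfp(MOCK_GFP_KERNEL);
}

/* Returns the mask the allocator saw for a NOFS-style request. */
static mock_gfp_t mock_mask_seen_for_nofs(void)
{
	void *page = mock_pte_alloc_one_kernel_gfp(MOCK_GFP_NOFS);

	free(page);
	return mock_last_mask;
}

/* Returns the mask the allocator saw for a request via the old entry point. */
static mock_gfp_t mock_mask_seen_for_default(void)
{
	void *page = mock_pte_alloc_one_kernel();

	free(page);
	return mock_last_mask;
}

/* The FS bit must be absent for NOFS callers and present for default callers. */
static void mock_check_propagation(void)
{
	assert((mock_mask_seen_for_nofs() & MOCK_GFP_FS) == 0);
	assert((mock_mask_seen_for_default() & MOCK_GFP_FS) != 0);
}
```

Calling mock_check_propagation() from a test program exercises both paths. Keeping the old entry point as a thin wrapper means existing callers need no changes, which appears to be the approach taken in the diff above.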
Comment 5 Prasad Gajanan Joshi. 2012-08-21 06:16:26 UTC
Not sure why this is marked as 'RESOLVED OBSOLETE'. Is it fixed in the mainline kernel?
Comment 6 Alan 2012-08-21 09:05:53 UTC
Because it's well over a year old.
Comment 7 Prasad Gajanan Joshi. 2012-08-21 09:16:01 UTC
However, this BUG is still neither fixed nor resolved. It would not be a good idea to mark it OBSOLETE. This is a critical BUG and should be fixed rather than ignored.
Comment 8 Alan 2012-08-21 11:45:50 UTC
It's not a 'critical bug' - it's unsupported functionality that nobody in tree currently seems to need.
Comment 9 Prasad Gajanan Joshi. 2012-08-22 05:39:18 UTC
I don't understand why you are saying the functionality is not being used in the Linux kernel.

Here is the list of the __vmalloc(GFP_NOFS) calls in the 3.5.0 Linux kernel. (I have not looked into the latest Linux kernel, though.)

prasad@pjoshi:~/Linux/linux-2.6$ grep GFP_NOFS ~/vmalloc_list 
drivers/mtd/ubi/io.c:	buf1 = __vmalloc(len, GFP_NOFS, PAGE_KERNEL);
drivers/mtd/ubi/io.c:	buf = __vmalloc(len, GFP_NOFS, PAGE_KERNEL);
fs/gfs2/dir.c:		ptr = __vmalloc(size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/lprops.c:	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/orphan.c:	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/debug.c:	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/lpt_commit.c:	buf = p = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
fs/ubifs/lpt_commit.c:	buf = p = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);

I still think it is a 'critical BUG' and should not be marked OBSOLETE.
Comment 10 Prasad Gajanan Joshi. 2012-08-27 05:55:04 UTC
Hello Alan,

Do you still think this should be marked 'RESOLVED OBSOLETE'?

Thanks and Warm Regards,
Prasad
