With a machine with 3 TB (more than 2 TB memory). If you use vmalloc to allocate > 2 TB memory, the array_size below will be overflowed. The array_size is an unsigned int and can only be used to allocate less than 2 TB memory. If you pass 2*1028*1028*1024*1024 = 2 * 2^40 in the argument of vmalloc. The array_size will become 2*2^31 = 2^32. The 2^32 cannot be store with a 32 bit integer. The fix is to change the type of array_size to unsigned long. vmalloc.c 1762 void *vmalloc(unsigned long size) 1763 { 1764 return __vmalloc_node_flags(size, NUMA_NO_NODE, 1765 GFP_KERNEL | __GFP_HIGHMEM); 1766 } # diff -Nurp mm/vmalloc.c mm/vmalloc.c.new --- mm/vmalloc.c 2020-08-11 18:40:55.000000000 -0400 +++ mm/vmalloc.c.new 2020-10-20 10:02:22.071509308 -0400 @@ -1603,11 +1603,12 @@ static void *__vmalloc_area_node(struct { const int order = 0; struct page **pages; - unsigned int nr_pages, array_size, i; + unsigned int nr_pages, i; + unsigned long array_size; gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; - array_size = (nr_pages * sizeof(struct page *)); + array_size = ((unsigned long)(nr_pages) * sizeof(struct page *)); area->nr_pages = nr_pages; /* Please note that the recursion is strictly bounded. */
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Tue, 03 Nov 2020 18:50:07 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=210023 > > Bug ID: 210023 > Summary: Crash when allocating > 2 TB memory > Product: Memory Management > Version: 2.5 > Kernel Version: 3.10.0-957.27.2.el7.x86_64 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: blocking > Priority: P1 > Component: Slab Allocator > Assignee: akpm@linux-foundation.org > Reporter: hsinhuiwu@gmail.com > Regression: No > > With a machine with 3 TB (more than 2 TB memory). If you use vmalloc to > allocate > 2 TB memory, the array_size below will be overflowed. How was this observed? Is there any know userspace operation which causes the kernel to try to vmalloc such a large hunk of memory? > The array_size is an unsigned int and can only be used to allocate less than > 2 > TB memory. If you pass 2*1028*1028*1024*1024 = 2 * 2^40 in the argument of > vmalloc. The array_size will become 2*2^31 = 2^32. The 2^32 cannot be store > with a 32 bit integer. > > The fix is to change the type of array_size to unsigned long. > > vmalloc.c > > 1762 void *vmalloc(unsigned long size) > 1763 { > 1764 return __vmalloc_node_flags(size, NUMA_NO_NODE, > 1765 GFP_KERNEL | __GFP_HIGHMEM); > 1766 } OK, thanks. Against current mainline your proposed change would look like this, yes? --- a/mm/vmalloc.c~a +++ a/mm/vmalloc.c @@ -2461,9 +2461,11 @@ static void *__vmalloc_area_node(struct { const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT; - unsigned int array_size = nr_pages * sizeof(struct page *), i; + unsigned long array_size + unsigned int i; struct page **pages; + array_size = (unsigned long)nr_pages * sizeof(struct page *); gfp_mask |= __GFP_NOWARN; if (!(gfp_mask & (GFP_DMA | GFP_DMA32))) gfp_mask |= __GFP_HIGHMEM; _
On Tue, Nov 03, 2020 at 04:27:40PM -0800, Andrew Morton wrote: > const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT; > - unsigned int array_size = nr_pages * sizeof(struct page *), i; > + unsigned long array_size > + unsigned int i; > struct page **pages; > > + array_size = (unsigned long)nr_pages * sizeof(struct page *); This is only pushing the problem out ~5 years. If somebody tries to allocate a 16TB area, 16TB / 4kB is 4GB and nr_pages overflows. That's only 3 doublings of RAM away. I think we need to change vm_struct's nr_pages from an unsigned int to an unsigned long. It won't take up any more room because it's sandwiched between a pointer and a phys_addr_t.
Created attachment 293461 [details] attachment-10548-0.html > With a machine with 3 TB (more than 2 TB memory). If you use vmalloc to > allocate > 2 TB memory, the array_size below will be overflowed. How was this observed? Is there any know userspace operation which causes the kernel to try to vmalloc such a large hunk of memory? [Frank] The Dell PowerEdge R740/R940 can have up to 3TB/6TB memory. installed. Our application requires reserve consecutive memory in the kernel space and protected from userspace programs. ---------------------------------------------------------------------------------------------------- OK, thanks. Against current mainline your proposed change would look like this, yes? [Frank] Yes. This will support up to less than 16 TB. If you want to support more than 16 TB, we need to expand nr_pages to unsigned long as Matthew pointed out. Will it be possible to add this to kernel 3.10.0-957.27.2.el7.x86_64? Thanks, Frank On Tue, Nov 3, 2020 at 7:27 PM Andrew Morton <akpm@linux-foundation.org> wrote: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Tue, 03 Nov 2020 18:50:07 +0000 bugzilla-daemon@bugzilla.kernel.org > wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=210023 > > > > Bug ID: 210023 > > Summary: Crash when allocating > 2 TB memory > > Product: Memory Management > > Version: 2.5 > > Kernel Version: 3.10.0-957.27.2.el7.x86_64 > > Hardware: All > > OS: Linux > > Tree: Mainline > > Status: NEW > > Severity: blocking > > Priority: P1 > > Component: Slab Allocator > > Assignee: akpm@linux-foundation.org > > Reporter: hsinhuiwu@gmail.com > > Regression: No > > > > With a machine with 3 TB (more than 2 TB memory). If you use vmalloc to > > allocate > 2 TB memory, the array_size below will be overflowed. > > How was this observed? > > Is there any know userspace operation which causes the kernel to try to > vmalloc such a large hunk of memory? > > > The array_size is an unsigned int and can only be used to allocate less > than 2 > > TB memory. If you pass 2*1028*1028*1024*1024 = 2 * 2^40 in the argument > of > > vmalloc. The array_size will become 2*2^31 = 2^32. The 2^32 cannot be > store > > with a 32 bit integer. > > > > The fix is to change the type of array_size to unsigned long. > > > > vmalloc.c > > > > 1762 void *vmalloc(unsigned long size) > > 1763 { > > 1764 return __vmalloc_node_flags(size, NUMA_NO_NODE, > > 1765 GFP_KERNEL | __GFP_HIGHMEM); > > 1766 } > > OK, thanks. Against current mainline your proposed change would look > like this, yes? > > --- a/mm/vmalloc.c~a > +++ a/mm/vmalloc.c > @@ -2461,9 +2461,11 @@ static void *__vmalloc_area_node(struct > { > const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | > __GFP_ZERO; > unsigned int nr_pages = get_vm_area_size(area) >> PAGE_SHIFT; > - unsigned int array_size = nr_pages * sizeof(struct page *), i; > + unsigned long array_size > + unsigned int i; > struct page **pages; > > + array_size = (unsigned long)nr_pages * sizeof(struct page *); > gfp_mask |= __GFP_NOWARN; > if (!(gfp_mask & (GFP_DMA | GFP_DMA32))) > gfp_mask |= __GFP_HIGHMEM; > _ > >
On Wed, 4 Nov 2020 08:55:11 -0500 Hsin-Hui Wu <hsinhuiwu@gmail.com> wrote: > > With a machine with 3 TB (more than 2 TB memory). If you use vmalloc to > > allocate > 2 TB memory, the array_size below will be overflowed. > > How was this observed? > > Is there any know userspace operation which causes the kernel to try to > vmalloc such a large hunk of memory? > > [Frank] The Dell PowerEdge R740/R940 can have up to 3TB/6TB memory. > installed. Our application requires reserve consecutive memory in the kernel > space and protected from userspace programs. Did this require custom kernel changes? If not, precisely which system calls were used to cause this allocation attempt? > > ---------------------------------------------------------------------------------------------------- > OK, thanks. Against current mainline your proposed change would look > like this, yes? > > [Frank] Yes. This will support up to less than 16 TB. If you want to support > more than 16 TB, we need to expand nr_pages to unsigned long as > Matthew pointed out. > > Will it be possible to add this to kernel 3.10.0-957.27.2.el7.x86_64? That is up to Red Hat to decide.
Created attachment 293489 [details] attachment-10105-0.html Did this require custom kernel changes? If not, precisely which system calls were used to cause this allocation attempt? [Frank] No. vmalloc(). That is up to Red Hat to decide. {Frank] Understood. On Wed, Nov 4, 2020 at 1:19 PM Andrew Morton <akpm@linux-foundation.org> wrote: > On Wed, 4 Nov 2020 08:55:11 -0500 Hsin-Hui Wu <hsinhuiwu@gmail.com> wrote: > > > > With a machine with 3 TB (more than 2 TB memory). If you use vmalloc to > > > allocate > 2 TB memory, the array_size below will be overflowed. > > > > How was this observed? > > > > Is there any know userspace operation which causes the kernel to try to > > vmalloc such a large hunk of memory? > > > > [Frank] The Dell PowerEdge R740/R940 can have up to 3TB/6TB memory. > > installed. Our application requires reserve consecutive memory in the > kernel > > space and protected from userspace programs. > > Did this require custom kernel changes? If not, precisely which system > calls were used to cause this allocation attempt? > > > > > ---------------------------------------------------------------------------------------------------- > > OK, thanks. Against current mainline your proposed change would look > > like this, yes? > > > > [Frank] Yes. This will support up to less than 16 TB. If you want to > support > > more than 16 TB, we need to expand nr_pages to unsigned long as > > Matthew pointed out. > > > > Will it be possible to add this to kernel 3.10.0-957.27.2.el7.x86_64? > > That is up to Red Hat to decide. > >
On Wed, 4 Nov 2020 15:53:36 -0500 Hsin-Hui Wu <hsinhuiwu@gmail.com> wrote: > Did this require custom kernel changes? If not, precisely which system > calls were used to cause this allocation attempt? > > [Frank] No. vmalloc(). > vmalloc() is not a system call. Please fully describe how vmalloc() came to be called with such a large request size. Fully. Are you able to provide example userspace code?
Created attachment 293503 [details] attachment-3513-0.html We don't use userspace program to allocate memory. We use kernel module to do so. Here is the sample kernel program you can use to test on a machine that has more than 2 TB memory. Use "insmod memtest.ko mb=2100000" to allocate more than 2 TB memory. Use "rmmod memtest" to free the memory. memtest.c ============================================================= #include <linux/version.h> #include <linux/init.h> #include <linux/module.h> #include <linux/fs.h> #include <linux/cdev.h> #include <linux/mm.h> #include <linux/vmalloc.h> #ifdef MODVERSIONS #include <linux/modversions.h> #endif #include <asm/io.h> static char *mem_ptr; MODULE_LICENSE("GPL"); int mb = 0; MODULE_PARM_DESC(mb, "alloc MB memory"); module_param(mb, int, S_IWUSR | S_IRUGO); static int __init memtest_init(void) { mem_ptr = NULL; if (mb > 0) { unsigned long alloc_len; alloc_len = mb * 1024 * 1024; mem_ptr = vmalloc(alloc_len); if (mem_ptr) { printk("memtest: vmalloc for size %u MB.\n", mb); } else { printk("memtest: failed to vmalloc for bytes %u MB.\n", mb); } } else { printk("memtest len_MB\n"); } return(0); } static void __exit memtest_exit(void) { if (mem_ptr) { vfree(mem_ptr); printk("memtest: vfree for size %u MB.\n", mb); mem_ptr = NULL; } } module_init(memtest_init); module_exit(memtest_exit); On Wed, Nov 4, 2020 at 7:38 PM Andrew Morton <akpm@linux-foundation.org> wrote: > On Wed, 4 Nov 2020 15:53:36 -0500 Hsin-Hui Wu <hsinhuiwu@gmail.com> wrote: > > > Did this require custom kernel changes? If not, precisely which system > > calls were used to cause this allocation attempt? > > > > [Frank] No. vmalloc(). > > > > vmalloc() is not a system call. > > Please fully describe how vmalloc() came to be called with such a large > request size. Fully. Are you able to provide example userspace code? > >
To correct the overflow problem in my test program. Here is the update. I should use unsigned long long for alloc_len to avoid overflow. l static int __init memtest_init(void) { mem_ptr = NULL; // init memory pointer if (mb > 0) { unsigned long long alloc_len; alloc_len = (unsigned long long)mb * 1024 * 1024; mem_ptr = vmalloc(alloc_len); if (mem_ptr) { printk("memtest: vmalloc for size %u MB.\n", mb); } else { printk("memtest: failed to vmalloc for bytes %u MB.\n", mb); } } else { printk("memtest len_MB\n"); } return(0); }
Andrew, Just a quick note: I think we can close this one now, as it is addressed by: commit 34fe653716b0d340bc26dd4823d2dbe00c57f849 Author: Andrew Morton <akpm@linux-foundation.org> Date: Mon Dec 14 19:08:43 2020 -0800 mm/vmalloc.c:__vmalloc_area_node(): avoid 32-bit overflow