Bug 3
Summary: | Enabling shared pagetables causes KDE to wierd out | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Martin J. Bligh (mbligh) |
Component: | Other | Assignee: | Dave McCracken (dmccr) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | ||
Priority: | P2 | ||
Hardware: | IA-32 | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | --- | Bisected commit-id: |
Description
Martin J. Bligh
2002-11-13 18:55:38 UTC
This should be fixed in 2.5.49-mm1 A vma with vm_pgoff large enough to overflow a loff_t type when converted to a byte offset can be passed via the remap_file_pages system call. The hugetlbfs mmap routine uses the byte offset to calculate reservations and file size. A sequence such as: mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0); remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0); will result in the following when task exits/file closed, kernel BUG at mm/hugetlb.c:749! Call Trace: hugetlbfs_evict_inode+0x2f/0x40 evict+0xcb/0x190 __dentry_kill+0xcb/0x150 __fput+0x164/0x1e0 task_work_run+0x84/0xa0 exit_to_usermode_loop+0x7d/0x80 do_syscall_64+0x18b/0x190 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 The overflowed pgoff value causes hugetlbfs to try to set up a mapping with a negative range (end < start) that leaves invalid state which causes the BUG. The previous overflow fix to this code was incomplete and did not take the remap_file_pages system call into account. Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap") Cc: <stable@vger.kernel.org> Reported-by: Nic Losby <blurbdust@gmail.com> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- Changes in v3 * Use a simpler mask computation as suggested by Andrew Morton Changes in v2 * Use bitmask for overflow check as suggested by Yisheng Xie * Add explicit (from > to) check when setting up reservations * Cc stable fs/hugetlbfs/inode.c | 16 +++++++++++++--- mm/hugetlb.c | 6 ++++++ 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 8fe1b0aa2896..e46117dc006a 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec) pagevec_reinit(pvec); } +/* + * Mask used when checking the page offset value passed in via system + * calls. This value will be converted to a loff_t which is signed. + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the + * value. The extra bit (- 1 in the shift value) is to take the sign + * bit into account. + */ +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1)) + static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) { struct inode *inode = file_inode(file); @@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma) vma->vm_ops = &hugetlb_vm_ops; /* - * Offset passed to mmap (before page shift) could have been - * negative when represented as a (l)off_t. + * page based offset in vm_pgoff could be sufficiently large to + * overflow a (l)off_t when converted to byte offset. */ - if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0) + if (vma->vm_pgoff & PGOFF_LOFFT_MAX) return -EINVAL; + /* must be huge page aligned */ if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT)) return -EINVAL; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 7c204e3d132b..8eeade0a0b7a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode, struct resv_map *resv_map; long gbl_reserve; + /* This should never happen */ + if (from > to) { + VM_WARN(1, "%s called with a negative range\n", __func__); + return -EINVAL; + } + /* * Only apply hugepage reservation if asked. At fault time, an * attempt will be made for VM_NORESERVE to allocate a page On Thu, 8 Mar 2018 16:27:26 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote: > A vma with vm_pgoff large enough to overflow a loff_t type when > converted to a byte offset can be passed via the remap_file_pages > system call. The hugetlbfs mmap routine uses the byte offset to > calculate reservations and file size. > > A sequence such as: > mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0); > remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0); > will result in the following when task exits/file closed, > kernel BUG at mm/hugetlb.c:749! > Call Trace: > hugetlbfs_evict_inode+0x2f/0x40 > evict+0xcb/0x190 > __dentry_kill+0xcb/0x150 > __fput+0x164/0x1e0 > task_work_run+0x84/0xa0 > exit_to_usermode_loop+0x7d/0x80 > do_syscall_64+0x18b/0x190 > entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > The overflowed pgoff value causes hugetlbfs to try to set up a > mapping with a negative range (end < start) that leaves invalid > state which causes the BUG. > > The previous overflow fix to this code was incomplete and did not > take the remap_file_pages system call into account. > > ... > > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode, > struct resv_map *resv_map; > long gbl_reserve; > > + /* This should never happen */ > + if (from > to) { > + VM_WARN(1, "%s called with a negative range\n", __func__); > + return -EINVAL; > + } This is tidier, and that comment is a bit too obvious to live ;) --- a/mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow-v3-fix +++ a/mm/hugetlb.c @@ -18,6 +18,7 @@ #include <linux/bootmem.h> #include <linux/sysfs.h> #include <linux/slab.h> +#include <linux/mmdebug.h> #include <linux/sched/signal.h> #include <linux/rmap.h> #include <linux/string_helpers.h> @@ -4374,11 +4375,8 @@ int hugetlb_reserve_pages(struct inode * struct resv_map *resv_map; long gbl_reserve; - /* This should never happen */ - if (from > to) { - VM_WARN(1, "%s called with a negative range\n", __func__); + if (VM_WARN(from > to, "%s called with a negative range\n", __func__)) return -EINVAL; - } /* * Only apply hugepage reservation if asked. At fault time, an On Thu 08-03-18 16:27:26, Mike Kravetz wrote: > A vma with vm_pgoff large enough to overflow a loff_t type when > converted to a byte offset can be passed via the remap_file_pages > system call. The hugetlbfs mmap routine uses the byte offset to > calculate reservations and file size. > > A sequence such as: > mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0); > remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0); > will result in the following when task exits/file closed, > kernel BUG at mm/hugetlb.c:749! > Call Trace: > hugetlbfs_evict_inode+0x2f/0x40 > evict+0xcb/0x190 > __dentry_kill+0xcb/0x150 > __fput+0x164/0x1e0 > task_work_run+0x84/0xa0 > exit_to_usermode_loop+0x7d/0x80 > do_syscall_64+0x18b/0x190 > entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > The overflowed pgoff value causes hugetlbfs to try to set up a > mapping with a negative range (end < start) that leaves invalid > state which causes the BUG. > > The previous overflow fix to this code was incomplete and did not > take the remap_file_pages system call into account. > > Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap") > Cc: <stable@vger.kernel.org> > Reported-by: Nic Losby <blurbdust@gmail.com> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> OK, looks good to me. Hairy but seems to be the easiest way around this. Acked-by: Michal Hocko <mhocko@suse.com> > --- > Changes in v3 > * Use a simpler mask computation as suggested by Andrew Morton > Changes in v2 > * Use bitmask for overflow check as suggested by Yisheng Xie > * Add explicit (from > to) check when setting up reservations > * Cc stable > > fs/hugetlbfs/inode.c | 16 +++++++++++++--- > mm/hugetlb.c | 6 ++++++ > 2 files changed, 19 insertions(+), 3 deletions(-) > > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c > index 8fe1b0aa2896..e46117dc006a 100644 > --- a/fs/hugetlbfs/inode.c > +++ b/fs/hugetlbfs/inode.c > @@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec) > pagevec_reinit(pvec); > } > > +/* > + * Mask used when checking the page offset value passed in via system > + * calls. This value will be converted to a loff_t which is signed. > + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the > + * value. The extra bit (- 1 in the shift value) is to take the sign > + * bit into account. > + */ > +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - > 1)) > + > static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct > *vma) > { > struct inode *inode = file_inode(file); > @@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file, > struct vm_area_struct *vma) > vma->vm_ops = &hugetlb_vm_ops; > > /* > - * Offset passed to mmap (before page shift) could have been > - * negative when represented as a (l)off_t. > + * page based offset in vm_pgoff could be sufficiently large to > + * overflow a (l)off_t when converted to byte offset. > */ > - if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0) > + if (vma->vm_pgoff & PGOFF_LOFFT_MAX) > return -EINVAL; > > + /* must be huge page aligned */ > if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT)) > return -EINVAL; > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 7c204e3d132b..8eeade0a0b7a 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode, > struct resv_map *resv_map; > long gbl_reserve; > > + /* This should never happen */ > + if (from > to) { > + VM_WARN(1, "%s called with a negative range\n", __func__); > + return -EINVAL; > + } > + > /* > * Only apply hugepage reservation if asked. At fault time, an > * attempt will be made for VM_NORESERVE to allocate a page > -- > 2.13.6 On 03/16/2018 03:17 AM, Michal Hocko wrote: > On Thu 08-03-18 16:27:26, Mike Kravetz wrote: > > OK, looks good to me. Hairy but seems to be the easiest way around this. > Acked-by: Michal Hocko <mhocko@suse.com> > <snip> >> +/* >> + * Mask used when checking the page offset value passed in via system >> + * calls. This value will be converted to a loff_t which is signed. >> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the >> + * value. The extra bit (- 1 in the shift value) is to take the sign >> + * bit into account. >> + */ >> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - >> 1)) Thanks Michal, However, kbuild found a problem with this definition on certain configs. Consider a config where, BITS_PER_LONG = 32 (32bit config) PAGE_SHIFT = 16 (64K pages) This results in the negative shift value. Not something I would not immediately think of, but a valid config. The definition has been changed to, #define PGOFF_LOFFT_MAX \ (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1))) as discussed here, http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com On Fri 16-03-18 09:19:07, Mike Kravetz wrote: > On 03/16/2018 03:17 AM, Michal Hocko wrote: > > On Thu 08-03-18 16:27:26, Mike Kravetz wrote: > > > > OK, looks good to me. Hairy but seems to be the easiest way around this. > > Acked-by: Michal Hocko <mhocko@suse.com> > > > <snip> > >> +/* > >> + * Mask used when checking the page offset value passed in via system > >> + * calls. This value will be converted to a loff_t which is signed. > >> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the > >> + * value. The extra bit (- 1 in the shift value) is to take the sign > >> + * bit into account. > >> + */ > >> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - > 1)) > > Thanks Michal, > > However, kbuild found a problem with this definition on certain configs. > Consider a config where, > BITS_PER_LONG = 32 (32bit config) > PAGE_SHIFT = 16 (64K pages) > This results in the negative shift value. > Not something I would not immediately think of, but a valid config. Well, 64K pages on 32b doesn't sound even remotely sane to me but what ever. > The definition has been changed to, > #define PGOFF_LOFFT_MAX \ > (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + > 1))) > > as discussed here, > http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com This looks more wild but seems correct as well. You can keep my acked-by Thanks! This is an automated email and please do not reply to this email. Dear Submitter, Thank you for submitting the patches to the linux bluetooth mailing list. While preparing the CI tests, the patches you submitted couldn't be applied to the current HEAD of the repository. ----- Output ----- error: patch failed: drivers/bluetooth/btintel.c:3249 error: drivers/bluetooth/btintel.c: patch does not apply hint: Use 'git am --show-current-patch' to see the failed patch Please resolve the issue and submit the patches again. --- Regards, Linux Bluetooth |