Bug 3

Summary: Enabling shared pagetables causes KDE to wierd out
Product: Memory Management Reporter: Martin J. Bligh (mbligh)
Component: OtherAssignee: Dave McCracken (dmccr)
Status: CLOSED CODE_FIX    
Severity: normal    
Priority: P2    
Hardware: IA-32   
OS: Linux   
Kernel Version: Subsystem:
Regression: --- Bisected commit-id:

Description Martin J. Bligh 2002-11-13 18:55:38 UTC
Exact Kernel version: 2.5.46-mm1
Distribution: Redhat 7.2 / 7.3
Hardware Environment: P4 PC
Software Environment: KDE
Problem Description: Enabling shared pagetables causes KDE to wierd out

Steps to reproduce: Start KDE
Comment 1 Martin J. Bligh 2002-11-29 09:35:48 UTC
This should be fixed in 2.5.49-mm1
Comment 2 mike.kravetz 2018-03-09 00:27:40 UTC
A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages
system call.  The hugetlbfs mmap routine uses the byte offset to
calculate reservations and file size.

A sequence such as:
  mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
  remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
  kernel BUG at mm/hugetlb.c:749!
Call Trace:
  hugetlbfs_evict_inode+0x2f/0x40
  evict+0xcb/0x190
  __dentry_kill+0xcb/0x150
  __fput+0x164/0x1e0
  task_work_run+0x84/0xa0
  exit_to_usermode_loop+0x7d/0x80
  do_syscall_64+0x18b/0x190
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The overflowed pgoff value causes hugetlbfs to try to set up a
mapping with a negative range (end < start) that leaves invalid
state which causes the BUG.

The previous overflow fix to this code was incomplete and did not
take the remap_file_pages system call into account.

Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Cc: <stable@vger.kernel.org>
Reported-by: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
Changes in v3
  * Use a simpler mask computation as suggested by Andrew Morton
Changes in v2
  * Use bitmask for overflow check as suggested by Yisheng Xie
  * Add explicit (from > to) check when setting up reservations
  * Cc stable

 fs/hugetlbfs/inode.c | 16 +++++++++++++---
 mm/hugetlb.c         |  6 ++++++
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 8fe1b0aa2896..e46117dc006a 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
 	pagevec_reinit(pvec);
 }
 
+/*
+ * Mask used when checking the page offset value passed in via system
+ * calls.  This value will be converted to a loff_t which is signed.
+ * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
+ * value.  The extra bit (- 1 in the shift value) is to take the sign
+ * bit into account.
+ */
+#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) - 1))
+
 static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file_inode(file);
@@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	vma->vm_ops = &hugetlb_vm_ops;
 
 	/*
-	 * Offset passed to mmap (before page shift) could have been
-	 * negative when represented as a (l)off_t.
+	 * page based offset in vm_pgoff could be sufficiently large to
+	 * overflow a (l)off_t when converted to byte offset.
 	 */
-	if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+	if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
 		return -EINVAL;
 
+	/* must be huge page aligned */
 	if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
 		return -EINVAL;
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7c204e3d132b..8eeade0a0b7a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
+	/* This should never happen */
+	if (from > to) {
+		VM_WARN(1, "%s called with a negative range\n", __func__);
+		return -EINVAL;
+	}
+
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
 	 * attempt will be made for VM_NORESERVE to allocate a page
Comment 3 Andrew Morton 2018-03-09 01:05:15 UTC
On Thu,  8 Mar 2018 16:27:26 -0800 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> ...
>
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
>       struct resv_map *resv_map;
>       long gbl_reserve;
>  
> +     /* This should never happen */
> +     if (from > to) {
> +             VM_WARN(1, "%s called with a negative range\n", __func__);
> +             return -EINVAL;
> +     }

This is tidier, and that comment is a bit too obvious to live ;)

--- a/mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow-v3-fix
+++ a/mm/hugetlb.c
@@ -18,6 +18,7 @@
 #include <linux/bootmem.h>
 #include <linux/sysfs.h>
 #include <linux/slab.h>
+#include <linux/mmdebug.h>
 #include <linux/sched/signal.h>
 #include <linux/rmap.h>
 #include <linux/string_helpers.h>
@@ -4374,11 +4375,8 @@ int hugetlb_reserve_pages(struct inode *
 	struct resv_map *resv_map;
 	long gbl_reserve;
 
-	/* This should never happen */
-	if (from > to) {
-		VM_WARN(1, "%s called with a negative range\n", __func__);
+	if (VM_WARN(from > to, "%s called with a negative range\n", __func__))
 		return -EINVAL;
-	}
 
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
Comment 4 Michal Hocko 2018-03-16 10:33:47 UTC
On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> A vma with vm_pgoff large enough to overflow a loff_t type when
> converted to a byte offset can be passed via the remap_file_pages
> system call.  The hugetlbfs mmap routine uses the byte offset to
> calculate reservations and file size.
> 
> A sequence such as:
>   mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
>   remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
> will result in the following when task exits/file closed,
>   kernel BUG at mm/hugetlb.c:749!
> Call Trace:
>   hugetlbfs_evict_inode+0x2f/0x40
>   evict+0xcb/0x190
>   __dentry_kill+0xcb/0x150
>   __fput+0x164/0x1e0
>   task_work_run+0x84/0xa0
>   exit_to_usermode_loop+0x7d/0x80
>   do_syscall_64+0x18b/0x190
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> The overflowed pgoff value causes hugetlbfs to try to set up a
> mapping with a negative range (end < start) that leaves invalid
> state which causes the BUG.
> 
> The previous overflow fix to this code was incomplete and did not
> take the remap_file_pages system call into account.
> 
> Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
> Cc: <stable@vger.kernel.org>
> Reported-by: Nic Losby <blurbdust@gmail.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

OK, looks good to me. Hairy but seems to be the easiest way around this.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> Changes in v3
>   * Use a simpler mask computation as suggested by Andrew Morton
> Changes in v2
>   * Use bitmask for overflow check as suggested by Yisheng Xie
>   * Add explicit (from > to) check when setting up reservations
>   * Cc stable
> 
>  fs/hugetlbfs/inode.c | 16 +++++++++++++---
>  mm/hugetlb.c         |  6 ++++++
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 8fe1b0aa2896..e46117dc006a 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -108,6 +108,15 @@ static void huge_pagevec_release(struct pagevec *pvec)
>       pagevec_reinit(pvec);
>  }
>  
> +/*
> + * Mask used when checking the page offset value passed in via system
> + * calls.  This value will be converted to a loff_t which is signed.
> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
> + * value.  The extra bit (- 1 in the shift value) is to take the sign
> + * bit into account.
> + */
> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) -
> 1))
> +
>  static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct
>  *vma)
>  {
>       struct inode *inode = file_inode(file);
> @@ -127,12 +136,13 @@ static int hugetlbfs_file_mmap(struct file *file,
> struct vm_area_struct *vma)
>       vma->vm_ops = &hugetlb_vm_ops;
>  
>       /*
> -      * Offset passed to mmap (before page shift) could have been
> -      * negative when represented as a (l)off_t.
> +      * page based offset in vm_pgoff could be sufficiently large to
> +      * overflow a (l)off_t when converted to byte offset.
>        */
> -     if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
> +     if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
>               return -EINVAL;
>  
> +     /* must be huge page aligned */
>       if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
>               return -EINVAL;
>  
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 7c204e3d132b..8eeade0a0b7a 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4374,6 +4374,12 @@ int hugetlb_reserve_pages(struct inode *inode,
>       struct resv_map *resv_map;
>       long gbl_reserve;
>  
> +     /* This should never happen */
> +     if (from > to) {
> +             VM_WARN(1, "%s called with a negative range\n", __func__);
> +             return -EINVAL;
> +     }
> +
>       /*
>        * Only apply hugepage reservation if asked. At fault time, an
>        * attempt will be made for VM_NORESERVE to allocate a page
> -- 
> 2.13.6
Comment 5 mike.kravetz 2018-03-16 16:19:13 UTC
On 03/16/2018 03:17 AM, Michal Hocko wrote:
> On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> 
> OK, looks good to me. Hairy but seems to be the easiest way around this.
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
<snip>
>> +/*
>> + * Mask used when checking the page offset value passed in via system
>> + * calls.  This value will be converted to a loff_t which is signed.
>> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
>> + * value.  The extra bit (- 1 in the shift value) is to take the sign
>> + * bit into account.
>> + */
>> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) -
>> 1))

Thanks Michal,

However, kbuild found a problem with this definition on certain configs.
Consider a config where,
BITS_PER_LONG = 32 (32bit config)
PAGE_SHIFT = 16 (64K pages)
This results in the negative shift value.
Not something I would not immediately think of, but a valid config.

The definition has been changed to,
#define PGOFF_LOFFT_MAX \
	(((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT + 1)))

as discussed here,
http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com
Comment 6 Michal Hocko 2018-03-16 16:53:10 UTC
On Fri 16-03-18 09:19:07, Mike Kravetz wrote:
> On 03/16/2018 03:17 AM, Michal Hocko wrote:
> > On Thu 08-03-18 16:27:26, Mike Kravetz wrote:
> > 
> > OK, looks good to me. Hairy but seems to be the easiest way around this.
> > Acked-by: Michal Hocko <mhocko@suse.com>
> > 
> <snip>
> >> +/*
> >> + * Mask used when checking the page offset value passed in via system
> >> + * calls.  This value will be converted to a loff_t which is signed.
> >> + * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
> >> + * value.  The extra bit (- 1 in the shift value) is to take the sign
> >> + * bit into account.
> >> + */
> >> +#define PGOFF_LOFFT_MAX (PAGE_MASK << (BITS_PER_LONG - (2 * PAGE_SHIFT) -
> 1))
> 
> Thanks Michal,
> 
> However, kbuild found a problem with this definition on certain configs.
> Consider a config where,
> BITS_PER_LONG = 32 (32bit config)
> PAGE_SHIFT = 16 (64K pages)
> This results in the negative shift value.
> Not something I would not immediately think of, but a valid config.

Well, 64K pages on 32b doesn't sound even remotely sane to me but what
ever.

> The definition has been changed to,
> #define PGOFF_LOFFT_MAX \
>       (((1UL << (PAGE_SHIFT + 1)) - 1) <<  (BITS_PER_LONG - (PAGE_SHIFT +
> 1)))
> 
> as discussed here,
> http://lkml.kernel.org/r/432fb2a3-b729-9c3a-7d60-890b8f9b10dd@oracle.com

This looks more wild but seems correct as well. You can keep my acked-by

Thanks!
Comment 7 bluez.test.bot 2025-03-24 13:46:00 UTC
This is an automated email and please do not reply to this email.

Dear Submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
While preparing the CI tests, the patches you submitted couldn't be applied to the current HEAD of the repository.

----- Output -----

error: patch failed: drivers/bluetooth/btintel.c:3249
error: drivers/bluetooth/btintel.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch

Please resolve the issue and submit the patches again.


---
Regards,
Linux Bluetooth