Bug 199037 - Kernel bug at mm/hugetlb.c:741
Summary: Kernel bug at mm/hugetlb.c:741
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: All Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-06 21:11 UTC by Nic Losby
Modified: 2019-07-22 05:01 UTC
CC List: 2 users

See Also:
Kernel Version: 4.16.0-rc3
Tree: Mainline
Regression: No


Attachments
crash.c (435 bytes, text/plain)
2018-03-06 21:11 UTC, Nic Losby
attachment-31844-0.html (1.29 KB, text/html)
2018-03-06 23:19 UTC, Nic Losby
attachment-31931-0.html (2.49 KB, text/html)
2018-03-07 02:51 UTC, Nic Losby

Description Nic Losby 2018-03-06 21:11:50 UTC
Created attachment 274595
crash.c

Hello,
I apologize, as this is my first time reporting a bug. When I compile and run the attached file, it crashes the latest kernel running in QEMU. Call trace here: https://pastebin.com/1mMQvH0E

Let me know if you have any questions.
Comment 1 Nic Losby 2018-03-06 21:19:24 UTC
Compiled with `gcc crash.c -o crash`
Comment 2 Andrew Morton 2018-03-06 21:31:39 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 06 Mar 2018 21:11:50 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=199037
> 
>             Bug ID: 199037
>            Summary: Kernel bug at mm/hugetlb.c:741
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.16.0-rc3
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: blurbdust@gmail.com
>         Regression: No
> 
> Created attachment 274595 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=274595&action=edit
> crash.c

Thanks for the report.

That's VM_BUG_ON(resv_map->adds_in_progress) in resv_map_release().

Do you know if earlier kernel versions are affected?

It looks quite bisectable.  Does the crash happen every time the test
program is run?
Comment 3 Nic Losby 2018-03-06 22:16:27 UTC
Yes, it happens every time I've run it so far.

I will get back to you on the earlier versions; I'll have to compile them. Is there a specific version you want me to target, or should I work backwards one release at a time, starting with 4.16.0-rc2?
Comment 4 Andrew Morton 2018-03-06 22:36:40 UTC
Please let's discuss this via emailed reply-to-all, not via the bugzilla interface.  So others get to see the discussion.

I'm mainly interested in knowing if 4.15 is affected.  If so, we have a denial-of-service attack in released kernels, and that's fairly serious.
Comment 5 Nic Losby 2018-03-06 23:19:05 UTC
Created attachment 274597
attachment-31844-0.html

Yes, it does crash on 4.15-rc9.

Call trace: https://pastebin.com/kMJFBcKK
Comment 6 mike.kravetz 2018-03-06 23:45:51 UTC
I'll take a look.  There was a previous bug in this area:
ff8c0c53: mm/hugetlb.c: don't call region_abort if region_chg fails
Comment 7 mike.kravetz 2018-03-07 02:41:58 UTC
This is similar to the issue addressed in 045c7a3f ("fix offset overflow
in hugetlbfs mmap").  The problem here is that the pgoff argument passed
to remap_file_pages() is 0x20000000000000.  In the process of converting
this to a page offset and putting it in vm_pgoff, and then converting back
to bytes to compute the mapping length, we end up with 0.  We ultimately
end up passing (from, to) page offsets into hugetlbfs where from is greater
than to. :( This confuses the heck out of the huge page reservation code,
as the 'negative' range looks like an error, so we never complete the
reservation process and leave 'adds_in_progress' elevated.

This issue has existed for a long time.  The VM_BUG_ON just happens to
catch the situation which was previously not reported or had some other
side effect.  Commit 045c7a3f tried to catch these overflow issues when
converting types, but obviously missed this one.  I can easily add a test
for this specific value/condition, but want to think about it a little
more and see if there is a better way to catch all of these.
Comment 8 Nic Losby 2018-03-07 02:51:01 UTC
Created attachment 274601
attachment-31931-0.html

Awesome. Let me know if you need anything else from me. I can keep testing
kernel versions if requested.

Getting a CVE is something that is high on my bucket list. Even though this
is only a denial of service at best, what are the chances this would be
assigned a CVE?

Comment 9 mike.kravetz 2018-03-07 04:24:52 UTC
Well, I instrumented hugetlbfs_file_mmap when called via the remap_file_pages
system call path.  Upon entry, vma->vm_pgoff is 0x20000000000000 which is
the same as the value of the argument pgoff passed to the system call.
vm_pgoff really should be a page offset (i.e. 0x20000000000000 >> PAGE_SHIFT).
So, there is also an issue earlier in the remap_file_pages system call
sequence.

For mmap(), there are architecture-specific system call entry points that
do the 'offset >> PAGE_SHIFT' before passing the value on to arch-independent
routines.  For remap_file_pages, it looks like sparc is the only arch which
has such a routine.  I know remap_file_pages is deprecated, but could it
really be broken that badly on all architectures but sparc?  Perhaps nobody
really uses it?

To fix, we could add arch-specific entry points for all architectures.  But
that seems like a bunch of effort for a system call that perhaps nobody is
using.  The other option is to remove the sparc entry point and do the
'pgoff >> PAGE_SHIFT' in the arch-independent code.

Thoughts?
Comment 10 mike.kravetz 2018-03-07 16:39:52 UTC
My mistake.  The pgoff argument to remap_file_pages is already a page offset
(in units of pages), so there should be no '>> PAGE_SHIFT' of the argument.

The hugetlbfs code wants to convert vm_pgoff to a byte offset by
'<< PAGE_SHIFT'.  This is what overflows and gets us into trouble.

My first thought is to simply check for this overflow in remap_file_pages.
Other code within the kernel converts vm_pgoff to a byte offset and I am
not sure they could handle/expect an overflow.
