Bug 217238

Summary: Creating shared read-only map is denied after add write seal to a memfd
Product: Memory Management Reporter: yshuiv7
Component: OtherAssignee: Andrew Morton (akpm)
Status: NEW ---    
Severity: normal CC: hi
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 6.2.8 Subsystem:
Regression: No Bisected commit-id:

Description yshuiv7 2023-03-24 03:34:23 UTC
Test case:

    int main() {
      int fd = memfd_create("test", MFD_ALLOW_SEALING);
      write(fd, "test", 4);
      fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
    
      void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
    }

This fails with EPERM. This is in contradiction with what's described in the documentation of F_SEAL_WRITE.
Comment 1 Andrew Morton 2023-03-24 20:36:48 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=217238
> 
>             Bug ID: 217238
>            Summary: Creating shared read-only map is denied after add
>                     write seal to a memfd
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 6.2.8
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: yshuiv7@gmail.com
>         Regression: No
> 
> Test case:
> 
>     int main() {
>       int fd = memfd_create("test", MFD_ALLOW_SEALING);
>       write(fd, "test", 4);
>       fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
> 
>       void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
>     }
> 
> This fails with EPERM. This is in contradiction with what's described in the
> documentation of F_SEAL_WRITE.
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are the assignee for the bug.
Comment 2 Lorenzo Stoakes 2023-03-25 14:51:11 UTC
On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=217238
> >
> >             Bug ID: 217238
> >            Summary: Creating shared read-only map is denied after add
> >                     write seal to a memfd
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 6.2.8
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: yshuiv7@gmail.com
> >         Regression: No
> >
> > Test case:
> >
> >     int main() {
> >       int fd = memfd_create("test", MFD_ALLOW_SEALING);
> >       write(fd, "test", 4);
> >       fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
> >
> >       void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
> >     }
> >
> > This fails with EPERM. This is in contradiction with what's described in
> the
> > documentation of F_SEAL_WRITE.
> >
> > --
> > You may reply to this email to add a comment.
> >
> > You are receiving this mail because:
> > You are the assignee for the bug.
>

This issue seems to be the result of the use of the memfd's shmem region's
page cache object (struct address_space)'s i_mmap_writable field to denote
whether it is write-sealed.

The kernel assumes that a VM_SHARED mapping might become writable at any
time via mprotect(), therefore treats VM_SHARED mappings as if they were
writable as far as i_mmap_writable is concerned (this field's primary use
is to determine whether, for architectures that require it, flushing must
occur if this is set to avoid aliasing, see filemap_read() for example).

In theory we could convert all such checks to VM_SHARED | VM_WRITE
(importantly including on fork) and then update mprotect() to check
mapping_map_writable() if a user tries to make unwritable memory
writable.

I suspect however there are reasons relating to locking that make it
unreasonable to try to do this, but I may be mistaken (others might have
some insight on this). I also see some complexity around this in the
security checks on marking shared writable mappings executable (e.g. in
mmap_violation_check()).

In any case, it doesn't really make much sense to have a write-sealed
shared mapping, since you're essentially saying 'nothing _at all_ can write
to this' so it may as well be private. The semantics are unfortunate here,
the memory will still be shared read-only by MAP_PRIVATE mappings.

A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel
>=5.1) which does permit shared read-only mappings as this is explicitly
checked for in seal_check_future_write() invoked from shmem_mmap().

Regardless, I think the conclusion is that this is not a bug, but rather
that the documentation needs to be updated.
Comment 3 Lorenzo Stoakes 2023-03-30 19:24:43 UTC
On Sat, Mar 25, 2023 at 02:51:05PM +0000, Lorenzo Stoakes wrote:
> On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote:
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=217238
> > >
> > >             Bug ID: 217238
> > >            Summary: Creating shared read-only map is denied after add
> > >                     write seal to a memfd
> > >            Product: Memory Management
> > >            Version: 2.5
> > >     Kernel Version: 6.2.8
> > >           Hardware: All
> > >                 OS: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: normal
> > >           Priority: P1
> > >          Component: Other
> > >           Assignee: akpm@linux-foundation.org
> > >           Reporter: yshuiv7@gmail.com
> > >         Regression: No
> > >
> > > Test case:
> > >
> > >     int main() {
> > >       int fd = memfd_create("test", MFD_ALLOW_SEALING);
> > >       write(fd, "test", 4);
> > >       fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
> > >
> > >       void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
> > >     }
> > >
> > > This fails with EPERM. This is in contradiction with what's described in
> the
> > > documentation of F_SEAL_WRITE.
> > >
> > > --
> > > You may reply to this email to add a comment.
> > >
> > > You are receiving this mail because:
> > > You are the assignee for the bug.
> >
>
> This issue seems to be the result of the use of the memfd's shmem region's
> page cache object (struct address_space)'s i_mmap_writable field to denote
> whether it is write-sealed.
>
> The kernel assumes that a VM_SHARED mapping might become writable at any
> time via mprotect(), therefore treats VM_SHARED mappings as if they were
> writable as far as i_mmap_writable is concerned (this field's primary use
> is to determine whether, for architectures that require it, flushing must
> occur if this is set to avoid aliasing, see filemap_read() for example).
>
> In theory we could convert all such checks to VM_SHARED | VM_WRITE
> (importantly including on fork) and then update mprotect() to check
> mapping_map_writable() if a user tries to make unwritable memory
> writable.
>
> I suspect however there are reasons relating to locking that make it
> unreasonable to try to do this, but I may be mistaken (others might have
> some insight on this). I also see some complexity around this in the
> security checks on marking shared writable mappings executable (e.g. in
> mmap_violation_check()).
>
> In any case, it doesn't really make much sense to have a write-sealed
> shared mapping, since you're essentially saying 'nothing _at all_ can write
> to this' so it may as well be private. The semantics are unfortunate here,
> the memory will still be shared read-only by MAP_PRIVATE mappings.
>
> A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel
> >=5.1) which does permit shared read-only mappings as this is explicitly
> checked for in seal_check_future_write() invoked from shmem_mmap().
>
> Regardless, I think the conclusion is that this is not a bug, but rather
> that the documentation needs to be updated.
>

Adding docs people to cc list (sorry didn't think to do this in first
reply).
Comment 4 luto 2023-03-30 20:48:07 UTC
> On Mar 30, 2023, at 12:25 PM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> 
> On Sat, Mar 25, 2023 at 02:51:05PM +0000, Lorenzo Stoakes wrote:
>>> On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote:
>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>> bugzilla web interface).
>>> 
>>>> On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote:
>>> 
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217238
>>>> 
>>>>            Bug ID: 217238
>>>>           Summary: Creating shared read-only map is denied after add
>>>>                    write seal to a memfd
>>>>           Product: Memory Management
>>>>           Version: 2.5
>>>>    Kernel Version: 6.2.8
>>>>          Hardware: All
>>>>                OS: Linux
>>>>              Tree: Mainline
>>>>            Status: NEW
>>>>          Severity: normal
>>>>          Priority: P1
>>>>         Component: Other
>>>>          Assignee: akpm@linux-foundation.org
>>>>          Reporter: yshuiv7@gmail.com
>>>>        Regression: No
>>>> 
>>>> Test case:
>>>> 
>>>>    int main() {
>>>>      int fd = memfd_create("test", MFD_ALLOW_SEALING);
>>>>      write(fd, "test", 4);
>>>>      fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
>>>> 
>>>>      void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
>>>>    }
>>>> 
>>>> This fails with EPERM. This is in contradiction with what's described in
>>>> the
>>>> documentation of F_SEAL_WRITE.
>>>> 
>>>> --
>>>> You may reply to this email to add a comment.
>>>> 
>>>> You are receiving this mail because:
>>>> You are the assignee for the bug.
>>> 
>> 
>> This issue seems to be the result of the use of the memfd's shmem region's
>> page cache object (struct address_space)'s i_mmap_writable field to denote
>> whether it is write-sealed.
>> 
>> The kernel assumes that a VM_SHARED mapping might become writable at any
>> time via mprotect(), therefore treats VM_SHARED mappings as if they were
>> writable as far as i_mmap_writable is concerned (this field's primary use
>> is to determine whether, for architectures that require it, flushing must
>> occur if this is set to avoid aliasing, see filemap_read() for example).
>> 
>> In theory we could convert all such checks to VM_SHARED | VM_WRITE
>> (importantly including on fork) and then update mprotect() to check
>> mapping_map_writable() if a user tries to make unwritable memory
>> writable.
>> 

Unless I’m missing something, we have VM_MAYWRITE for almost exactly this purpose.  Can’t we just make a shared mapping with both of these bits clear?

>> I suspect however there are reasons relating to locking that make it
>> unreasonable to try to do this, but I may be mistaken (others might have
>> some insight on this). I also see some complexity around this in the
>> security checks on marking shared writable mappings executable (e.g. in
>> mmap_violation_check()).
>> 
>> In any case, it doesn't really make much sense to have a write-sealed
>> shared mapping, since you're essentially saying 'nothing _at all_ can write
>> to this' so it may as well be private. The semantics are unfortunate here,
>> the memory will still be shared read-only by MAP_PRIVATE mappings.
>> 
>> A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel
>>> =5.1) which does permit shared read-only mappings as this is explicitly
>> checked for in seal_check_future_write() invoked from shmem_mmap().
>> 
>> Regardless, I think the conclusion is that this is not a bug, but rather
>> that the documentation needs to be updated.
>> 
> 
> Adding docs people to cc list (sorry didn't think to do this in first
> reply).
Comment 5 Lorenzo Stoakes 2023-03-30 21:46:27 UTC
On Thu, Mar 30, 2023 at 01:47:48PM -0700, Andy Lutomirski wrote:
>
>
> > On Mar 30, 2023, at 12:25 PM, Lorenzo Stoakes <lstoakes@gmail.com> wrote:
> >
> > On Sat, Mar 25, 2023 at 02:51:05PM +0000, Lorenzo Stoakes wrote:
> >>> On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote:
> >>> (switched to email.  Please respond via emailed reply-to-all, not via the
> >>> bugzilla web interface).
> >>>
> >>>> On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote:
> >>>
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=217238
> >>>>
> >>>>            Bug ID: 217238
> >>>>           Summary: Creating shared read-only map is denied after add
> >>>>                    write seal to a memfd
> >>>>           Product: Memory Management
> >>>>           Version: 2.5
> >>>>    Kernel Version: 6.2.8
> >>>>          Hardware: All
> >>>>                OS: Linux
> >>>>              Tree: Mainline
> >>>>            Status: NEW
> >>>>          Severity: normal
> >>>>          Priority: P1
> >>>>         Component: Other
> >>>>          Assignee: akpm@linux-foundation.org
> >>>>          Reporter: yshuiv7@gmail.com
> >>>>        Regression: No
> >>>>
> >>>> Test case:
> >>>>
> >>>>    int main() {
> >>>>      int fd = memfd_create("test", MFD_ALLOW_SEALING);
> >>>>      write(fd, "test", 4);
> >>>>      fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE);
> >>>>
> >>>>      void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
> >>>>    }
> >>>>
> >>>> This fails with EPERM. This is in contradiction with what's described in
> the
> >>>> documentation of F_SEAL_WRITE.
> >>>>
> >>>> --
> >>>> You may reply to this email to add a comment.
> >>>>
> >>>> You are receiving this mail because:
> >>>> You are the assignee for the bug.
> >>>
> >>
> >> This issue seems to be the result of the use of the memfd's shmem region's
> >> page cache object (struct address_space)'s i_mmap_writable field to denote
> >> whether it is write-sealed.
> >>
> >> The kernel assumes that a VM_SHARED mapping might become writable at any
> >> time via mprotect(), therefore treats VM_SHARED mappings as if they were
> >> writable as far as i_mmap_writable is concerned (this field's primary use
> >> is to determine whether, for architectures that require it, flushing must
> >> occur if this is set to avoid aliasing, see filemap_read() for example).
> >>
> >> In theory we could convert all such checks to VM_SHARED | VM_WRITE
> >> (importantly including on fork) and then update mprotect() to check
> >> mapping_map_writable() if a user tries to make unwritable memory
> >> writable.
> >>
>
> Unless I’m missing something, we have VM_MAYWRITE for almost exactly this
> purpose.  Can’t we just make a shared mapping with both of these bits clear?
>

That's a good point, and there's definitely quite a few places where
VM_SHARED is simply taken to imply writable which is a little irksome,
however sprinkling some VM_MAYWRITE checks in these places would resolve
that.

Let me take a look into this and perhaps spin up a RFC to iron out the
 details if this is indeed viable.

> >> I suspect however there are reasons relating to locking that make it
> >> unreasonable to try to do this, but I may be mistaken (others might have
> >> some insight on this). I also see some complexity around this in the
> >> security checks on marking shared writable mappings executable (e.g. in
> >> mmap_violation_check()).
> >>
> >> In any case, it doesn't really make much sense to have a write-sealed
> >> shared mapping, since you're essentially saying 'nothing _at all_ can
> write
> >> to this' so it may as well be private. The semantics are unfortunate here,
> >> the memory will still be shared read-only by MAP_PRIVATE mappings.
> >>
> >> A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel
> >>> =5.1) which does permit shared read-only mappings as this is explicitly
> >> checked for in seal_check_future_write() invoked from shmem_mmap().
> >>
> >> Regardless, I think the conclusion is that this is not a bug, but rather
> >> that the documentation needs to be updated.
> >>
> >
> > Adding docs people to cc list (sorry didn't think to do this in first
> > reply).