Bug 217238
Summary: | Creating shared read-only map is denied after add write seal to a memfd | ||
---|---|---|---|
Product: | Memory Management | Reporter: | yshuiv7 |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | NEW --- | ||
Severity: | normal | CC: | hi |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 6.2.8 | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
yshuiv7
2023-03-24 03:34:23 UTC
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=217238 > > Bug ID: 217238 > Summary: Creating shared read-only map is denied after add > write seal to a memfd > Product: Memory Management > Version: 2.5 > Kernel Version: 6.2.8 > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > Assignee: akpm@linux-foundation.org > Reporter: yshuiv7@gmail.com > Regression: No > > Test case: > > int main() { > int fd = memfd_create("test", MFD_ALLOW_SEALING); > write(fd, "test", 4); > fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); > > void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0); > } > > This fails with EPERM. This is in contradiction with what's described in the > documentation of F_SEAL_WRITE. > > -- > You may reply to this email to add a comment. > > You are receiving this mail because: > You are the assignee for the bug. On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=217238 > > > > Bug ID: 217238 > > Summary: Creating shared read-only map is denied after add > > write seal to a memfd > > Product: Memory Management > > Version: 2.5 > > Kernel Version: 6.2.8 > > Hardware: All > > OS: Linux > > Tree: Mainline > > Status: NEW > > Severity: normal > > Priority: P1 > > Component: Other > > Assignee: akpm@linux-foundation.org > > Reporter: yshuiv7@gmail.com > > Regression: No > > > > Test case: > > > > int main() { > > int fd = memfd_create("test", MFD_ALLOW_SEALING); > > write(fd, "test", 4); > > fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); > > > > void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0); > > } > > > > This fails with EPERM. This is in contradiction with what's described in > the > > documentation of F_SEAL_WRITE. > > > > -- > > You may reply to this email to add a comment. > > > > You are receiving this mail because: > > You are the assignee for the bug. > This issue seems to be the result of the use of the memfd's shmem region's page cache object (struct address_space)'s i_mmap_writable field to denote whether it is write-sealed. The kernel assumes that a VM_SHARED mapping might become writable at any time via mprotect(), therefore treats VM_SHARED mappings as if they were writable as far as i_mmap_writable is concerned (this field's primary use is to determine whether, for architectures that require it, flushing must occur if this is set to avoid aliasing, see filemap_read() for example). In theory we could convert all such checks to VM_SHARED | VM_WRITE (importantly including on fork) and then update mprotect() to check mapping_map_writable() if a user tries to make unwritable memory writable. I suspect however there are reasons relating to locking that make it unreasonable to try to do this, but I may be mistaken (others might have some insight on this). I also see some complexity around this in the security checks on marking shared writable mappings executable (e.g. in mmap_violation_check()). In any case, it doesn't really make much sense to have a write-sealed shared mapping, since you're essentially saying 'nothing _at all_ can write to this' so it may as well be private. The semantics are unfortunate here, the memory will still be shared read-only by MAP_PRIVATE mappings. A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel >=5.1) which does permit shared read-only mappings as this is explicitly checked for in seal_check_future_write() invoked from shmem_mmap(). Regardless, I think the conclusion is that this is not a bug, but rather that the documentation needs to be updated. On Sat, Mar 25, 2023 at 02:51:05PM +0000, Lorenzo Stoakes wrote: > On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > > bugzilla web interface). > > > > On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote: > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=217238 > > > > > > Bug ID: 217238 > > > Summary: Creating shared read-only map is denied after add > > > write seal to a memfd > > > Product: Memory Management > > > Version: 2.5 > > > Kernel Version: 6.2.8 > > > Hardware: All > > > OS: Linux > > > Tree: Mainline > > > Status: NEW > > > Severity: normal > > > Priority: P1 > > > Component: Other > > > Assignee: akpm@linux-foundation.org > > > Reporter: yshuiv7@gmail.com > > > Regression: No > > > > > > Test case: > > > > > > int main() { > > > int fd = memfd_create("test", MFD_ALLOW_SEALING); > > > write(fd, "test", 4); > > > fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); > > > > > > void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0); > > > } > > > > > > This fails with EPERM. This is in contradiction with what's described in > the > > > documentation of F_SEAL_WRITE. > > > > > > -- > > > You may reply to this email to add a comment. > > > > > > You are receiving this mail because: > > > You are the assignee for the bug. > > > > This issue seems to be the result of the use of the memfd's shmem region's > page cache object (struct address_space)'s i_mmap_writable field to denote > whether it is write-sealed. > > The kernel assumes that a VM_SHARED mapping might become writable at any > time via mprotect(), therefore treats VM_SHARED mappings as if they were > writable as far as i_mmap_writable is concerned (this field's primary use > is to determine whether, for architectures that require it, flushing must > occur if this is set to avoid aliasing, see filemap_read() for example). > > In theory we could convert all such checks to VM_SHARED | VM_WRITE > (importantly including on fork) and then update mprotect() to check > mapping_map_writable() if a user tries to make unwritable memory > writable. > > I suspect however there are reasons relating to locking that make it > unreasonable to try to do this, but I may be mistaken (others might have > some insight on this). I also see some complexity around this in the > security checks on marking shared writable mappings executable (e.g. in > mmap_violation_check()). > > In any case, it doesn't really make much sense to have a write-sealed > shared mapping, since you're essentially saying 'nothing _at all_ can write > to this' so it may as well be private. The semantics are unfortunate here, > the memory will still be shared read-only by MAP_PRIVATE mappings. > > A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel > >=5.1) which does permit shared read-only mappings as this is explicitly > checked for in seal_check_future_write() invoked from shmem_mmap(). > > Regardless, I think the conclusion is that this is not a bug, but rather > that the documentation needs to be updated. > Adding docs people to cc list (sorry didn't think to do this in first reply). > On Mar 30, 2023, at 12:25 PM, Lorenzo Stoakes <lstoakes@gmail.com> wrote: > > On Sat, Mar 25, 2023 at 02:51:05PM +0000, Lorenzo Stoakes wrote: >>> On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote: >>> (switched to email. Please respond via emailed reply-to-all, not via the >>> bugzilla web interface). >>> >>>> On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote: >>> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=217238 >>>> >>>> Bug ID: 217238 >>>> Summary: Creating shared read-only map is denied after add >>>> write seal to a memfd >>>> Product: Memory Management >>>> Version: 2.5 >>>> Kernel Version: 6.2.8 >>>> Hardware: All >>>> OS: Linux >>>> Tree: Mainline >>>> Status: NEW >>>> Severity: normal >>>> Priority: P1 >>>> Component: Other >>>> Assignee: akpm@linux-foundation.org >>>> Reporter: yshuiv7@gmail.com >>>> Regression: No >>>> >>>> Test case: >>>> >>>> int main() { >>>> int fd = memfd_create("test", MFD_ALLOW_SEALING); >>>> write(fd, "test", 4); >>>> fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); >>>> >>>> void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0); >>>> } >>>> >>>> This fails with EPERM. This is in contradiction with what's described in >>>> the >>>> documentation of F_SEAL_WRITE. >>>> >>>> -- >>>> You may reply to this email to add a comment. >>>> >>>> You are receiving this mail because: >>>> You are the assignee for the bug. >>> >> >> This issue seems to be the result of the use of the memfd's shmem region's >> page cache object (struct address_space)'s i_mmap_writable field to denote >> whether it is write-sealed. >> >> The kernel assumes that a VM_SHARED mapping might become writable at any >> time via mprotect(), therefore treats VM_SHARED mappings as if they were >> writable as far as i_mmap_writable is concerned (this field's primary use >> is to determine whether, for architectures that require it, flushing must >> occur if this is set to avoid aliasing, see filemap_read() for example). >> >> In theory we could convert all such checks to VM_SHARED | VM_WRITE >> (importantly including on fork) and then update mprotect() to check >> mapping_map_writable() if a user tries to make unwritable memory >> writable. >> Unless I’m missing something, we have VM_MAYWRITE for almost exactly this purpose. Can’t we just make a shared mapping with both of these bits clear? >> I suspect however there are reasons relating to locking that make it >> unreasonable to try to do this, but I may be mistaken (others might have >> some insight on this). I also see some complexity around this in the >> security checks on marking shared writable mappings executable (e.g. in >> mmap_violation_check()). >> >> In any case, it doesn't really make much sense to have a write-sealed >> shared mapping, since you're essentially saying 'nothing _at all_ can write >> to this' so it may as well be private. The semantics are unfortunate here, >> the memory will still be shared read-only by MAP_PRIVATE mappings. >> >> A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel >>> =5.1) which does permit shared read-only mappings as this is explicitly >> checked for in seal_check_future_write() invoked from shmem_mmap(). >> >> Regardless, I think the conclusion is that this is not a bug, but rather >> that the documentation needs to be updated. >> > > Adding docs people to cc list (sorry didn't think to do this in first > reply). On Thu, Mar 30, 2023 at 01:47:48PM -0700, Andy Lutomirski wrote: > > > > On Mar 30, 2023, at 12:25 PM, Lorenzo Stoakes <lstoakes@gmail.com> wrote: > > > > On Sat, Mar 25, 2023 at 02:51:05PM +0000, Lorenzo Stoakes wrote: > >>> On Fri, Mar 24, 2023 at 01:36:46PM -0700, Andrew Morton wrote: > >>> (switched to email. Please respond via emailed reply-to-all, not via the > >>> bugzilla web interface). > >>> > >>>> On Fri, 24 Mar 2023 03:34:23 +0000 bugzilla-daemon@kernel.org wrote: > >>> > >>>> https://bugzilla.kernel.org/show_bug.cgi?id=217238 > >>>> > >>>> Bug ID: 217238 > >>>> Summary: Creating shared read-only map is denied after add > >>>> write seal to a memfd > >>>> Product: Memory Management > >>>> Version: 2.5 > >>>> Kernel Version: 6.2.8 > >>>> Hardware: All > >>>> OS: Linux > >>>> Tree: Mainline > >>>> Status: NEW > >>>> Severity: normal > >>>> Priority: P1 > >>>> Component: Other > >>>> Assignee: akpm@linux-foundation.org > >>>> Reporter: yshuiv7@gmail.com > >>>> Regression: No > >>>> > >>>> Test case: > >>>> > >>>> int main() { > >>>> int fd = memfd_create("test", MFD_ALLOW_SEALING); > >>>> write(fd, "test", 4); > >>>> fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE); > >>>> > >>>> void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0); > >>>> } > >>>> > >>>> This fails with EPERM. This is in contradiction with what's described in > the > >>>> documentation of F_SEAL_WRITE. > >>>> > >>>> -- > >>>> You may reply to this email to add a comment. > >>>> > >>>> You are receiving this mail because: > >>>> You are the assignee for the bug. > >>> > >> > >> This issue seems to be the result of the use of the memfd's shmem region's > >> page cache object (struct address_space)'s i_mmap_writable field to denote > >> whether it is write-sealed. > >> > >> The kernel assumes that a VM_SHARED mapping might become writable at any > >> time via mprotect(), therefore treats VM_SHARED mappings as if they were > >> writable as far as i_mmap_writable is concerned (this field's primary use > >> is to determine whether, for architectures that require it, flushing must > >> occur if this is set to avoid aliasing, see filemap_read() for example). > >> > >> In theory we could convert all such checks to VM_SHARED | VM_WRITE > >> (importantly including on fork) and then update mprotect() to check > >> mapping_map_writable() if a user tries to make unwritable memory > >> writable. > >> > > Unless I’m missing something, we have VM_MAYWRITE for almost exactly this > purpose. Can’t we just make a shared mapping with both of these bits clear? > That's a good point, and there's definitely quite a few places where VM_SHARED is simply taken to imply writable which is a little irksome, however sprinkling some VM_MAYWRITE checks in these places would resolve that. Let me take a look into this and perhaps spin up a RFC to iron out the details if this is indeed viable. > >> I suspect however there are reasons relating to locking that make it > >> unreasonable to try to do this, but I may be mistaken (others might have > >> some insight on this). I also see some complexity around this in the > >> security checks on marking shared writable mappings executable (e.g. in > >> mmap_violation_check()). > >> > >> In any case, it doesn't really make much sense to have a write-sealed > >> shared mapping, since you're essentially saying 'nothing _at all_ can > write > >> to this' so it may as well be private. The semantics are unfortunate here, > >> the memory will still be shared read-only by MAP_PRIVATE mappings. > >> > >> A better choice here might be F_SEAL_FUTURE_WRITE (available from kernel > >>> =5.1) which does permit shared read-only mappings as this is explicitly > >> checked for in seal_check_future_write() invoked from shmem_mmap(). > >> > >> Regardless, I think the conclusion is that this is not a bug, but rather > >> that the documentation needs to be updated. > >> > > > > Adding docs people to cc list (sorry didn't think to do this in first > > reply). |