I noticed the following strange behaviour with hugetlbfs mounts for kernel versions something north of 4.19, but certainly since 5.4: when the mode= option of the mount has the sticky bit set, the mode of the mount point gets scrambled. To test the following command line can be used: mkdir -p /tmp/tlbtest && mount -t hugetlbfs -o mode=.... none /tmp/tlbtest && stat -c 'mode: %04a' /tmp/tlbtest For kernel versions <= 4.19 or if the first byte of mode is 0, stat outputs what was set using "-o mode". But on newer kernel versions if the first byte of mode is 1 the output of stat is as follows (input -> output): 1700 -> 1244 1750 -> 1326 1770 -> 1352 1775 -> 1357 1777 -> 1361 The behaviour is reproducible across different kernel versions and architectures (5.4.47-amd64, 5.9.8-amd64, 5.10.40-ppc64el, 5.12.15-amd64).
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Mon, 19 Jul 2021 15:34:00 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=213785 > > Bug ID: 213785 > Summary: hugetlbfs scrambles mode when sticky bit is set > Product: Memory Management > Version: 2.5 > Kernel Version: >4.19 (certainly >= 5.4) > Hardware: All > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > Assignee: akpm@linux-foundation.org > Reporter: bugs+kernel.org@dtnr.ch > Regression: Yes > > I noticed the following strange behaviour with hugetlbfs mounts for kernel > versions something north of 4.19, but certainly since 5.4: > > when the mode= option of the mount has the sticky bit set, the mode of the > mount point gets scrambled. > > To test the following command line can be used: > mkdir -p /tmp/tlbtest && mount -t hugetlbfs -o mode=.... none /tmp/tlbtest && > stat -c 'mode: %04a' /tmp/tlbtest > > For kernel versions <= 4.19 or if the first byte of mode is 0, stat outputs > what was set using "-o mode". > > But on newer kernel versions if the first byte of mode is 1 the output of > stat > is as follows (input -> output): > 1700 -> 1244 > 1750 -> 1326 > 1770 -> 1352 > 1775 -> 1357 > 1777 -> 1361 > > The behaviour is reproducible across different kernel versions and > architectures (5.4.47-amd64, 5.9.8-amd64, 5.10.40-ppc64el, 5.12.15-amd64). > > -- > You may reply to this email to add a comment. > > You are receiving this mail because: > You are the assignee for the bug.
Add David, Al On 7/20/21 3:07 PM, Andrew Morton wrote: >> https://bugzilla.kernel.org/show_bug.cgi?id=213785 >> >> Summary: hugetlbfs scrambles mode when sticky bit is set >> Product: Memory Management >> Version: 2.5 >> Kernel Version: >4.19 (certainly >= 5.4) >> Regression: Yes >> >> I noticed the following strange behaviour with hugetlbfs mounts for kernel >> versions something north of 4.19, but certainly since 5.4: >> >> when the mode= option of the mount has the sticky bit set, the mode of the >> mount point gets scrambled. >> >> To test the following command line can be used: >> mkdir -p /tmp/tlbtest && mount -t hugetlbfs -o mode=.... none /tmp/tlbtest >> && >> stat -c 'mode: %04a' /tmp/tlbtest >> >> For kernel versions <= 4.19 or if the first byte of mode is 0, stat outputs >> what was set using "-o mode". >> >> But on newer kernel versions if the first byte of mode is 1 the output of >> stat >> is as follows (input -> output): >> 1700 -> 1244 >> 1750 -> 1326 >> 1770 -> 1352 >> 1775 -> 1357 >> 1777 -> 1361 >> >> The behaviour is reproducible across different kernel versions and >> architectures (5.4.47-amd64, 5.9.8-amd64, 5.10.40-ppc64el, 5.12.15-amd64). I took a quick look and believe a change in behavior was caused by commit 32021982a324 "Convert the hugetlbfs to use the fs_context during mount". Prior to the commit, code processing the mode option used match_octal() to convert the command line string to a numeric value. Since match_octal expects a string representing an octal value, it does not require a leading '0'. As a result, prior to this commit the argument 'mode=1700' would result in a mode value of 01700. After the commit one must precede octal values with 0. So, mode=1700 would result in a mode value of 03244 (& 01777U) = 1244. If my analysis is correct, I am not sure how to proceed. IMO, the current behavior is 'more correct'. However, until v5.1 a preceeding 0 was not required when specifying mode for hugetlbfs. So, this was certainly a change in behavior. Suggestions?
Dear Mike, On Tue, 20 Jul 2021 21:38 -0700 Mike Kravetz wrote: > I took a quick look and believe a change in behavior was caused by > commit 32021982a324 "Convert the hugetlbfs to use the fs_context > during mount". Thanks for taking a look at this issue and pointing to the commit that introduced this change. It looks credible. > Prior to the commit, code processing the mode option used > match_octal() to convert the command line string to a numeric value. > Since match_octal expects a string representing an octal value, it > does > not require a leading '0'. As a result, prior to this commit the > argument 'mode=1700' would result in a mode value of 01700. After > the > commit one must precede octal values with 0. So, mode=1700 would > result > in a mode value of 03244 (& 01777U) = 1244. I've been racking my brain trying to figure out what was going on, but the solution is so simple… > If my analysis is correct, I am not sure how to proceed. IMO, the > current behavior is 'more correct'. However, until v5.1 a preceeding > 0 > was not required when specifying mode for hugetlbfs. So, this was > certainly a change in behavior. Suggestions? Generally, I would agree that interpreting integers as decimals is the correct way, but in the case of file modes I cannot remember having used or seen modes in anything but octal over all the years I've been working with Linux. As I understand Documentation/admin-guide/mm/hugetlbpage.rst the mode option should be interpreted as octal: > The mode option sets [the mode] to value & 01777. This value is given in octal. By default the value 0755 is picked. Furthermore, I had a look at the POSIX documentation for chmod(1) (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/chmod.html) which also interprets all integers as octals. So my suggestion would be to convert the mode option to a fsparam_u32oct and if possible backport this change to the supported >=5.1 kernel versions. Regards, Dennis
On Tue, Jul 20, 2021 at 09:38:48PM -0700, Mike Kravetz wrote: > Add David, Al > > On 7/20/21 3:07 PM, Andrew Morton wrote: > >> https://bugzilla.kernel.org/show_bug.cgi?id=213785 > >> > >> Summary: hugetlbfs scrambles mode when sticky bit is set > >> Product: Memory Management > >> Version: 2.5 > >> Kernel Version: >4.19 (certainly >= 5.4) > >> Regression: Yes > >> > >> I noticed the following strange behaviour with hugetlbfs mounts for kernel > >> versions something north of 4.19, but certainly since 5.4: > >> > >> when the mode= option of the mount has the sticky bit set, the mode of the > >> mount point gets scrambled. > >> > >> To test the following command line can be used: > >> mkdir -p /tmp/tlbtest && mount -t hugetlbfs -o mode=.... none /tmp/tlbtest > && > >> stat -c 'mode: %04a' /tmp/tlbtest > >> > >> For kernel versions <= 4.19 or if the first byte of mode is 0, stat > outputs > >> what was set using "-o mode". > >> > >> But on newer kernel versions if the first byte of mode is 1 the output of > stat > >> is as follows (input -> output): > >> 1700 -> 1244 > >> 1750 -> 1326 > >> 1770 -> 1352 > >> 1775 -> 1357 > >> 1777 -> 1361 > >> > >> The behaviour is reproducible across different kernel versions and > >> architectures (5.4.47-amd64, 5.9.8-amd64, 5.10.40-ppc64el, 5.12.15-amd64). > > I took a quick look and believe a change in behavior was caused by > commit 32021982a324 "Convert the hugetlbfs to use the fs_context > during mount". > > Prior to the commit, code processing the mode option used > match_octal() to convert the command line string to a numeric value. > Since match_octal expects a string representing an octal value, it does > not require a leading '0'. As a result, prior to this commit the > argument 'mode=1700' would result in a mode value of 01700. After the > commit one must precede octal values with 0. So, mode=1700 would result > in a mode value of 03244 (& 01777U) = 1244. > > If my analysis is correct, I am not sure how to proceed. IMO, the > current behavior is 'more correct'. However, until v5.1 a preceeding 0 > was not required when specifying mode for hugetlbfs. So, this was > certainly a change in behavior. Suggestions? Probably should follow what shmem does: mm/shmem.c: fsparam_u32oct("mode", Opt_mode), (also several other filesystems)
On 7/21/21 4:57 AM, Matthew Wilcox wrote: > On Tue, Jul 20, 2021 at 09:38:48PM -0700, Mike Kravetz wrote: >> Add David, Al >> >> On 7/20/21 3:07 PM, Andrew Morton wrote: >>>> https://bugzilla.kernel.org/show_bug.cgi?id=213785 >> >> I took a quick look and believe a change in behavior was caused by >> commit 32021982a324 "Convert the hugetlbfs to use the fs_context >> during mount". >> >> Prior to the commit, code processing the mode option used >> match_octal() to convert the command line string to a numeric value. >> Since match_octal expects a string representing an octal value, it does >> not require a leading '0'. As a result, prior to this commit the >> argument 'mode=1700' would result in a mode value of 01700. After the >> commit one must precede octal values with 0. So, mode=1700 would result >> in a mode value of 03244 (& 01777U) = 1244. >> >> If my analysis is correct, I am not sure how to proceed. IMO, the >> current behavior is 'more correct'. However, until v5.1 a preceeding 0 >> was not required when specifying mode for hugetlbfs. So, this was >> certainly a change in behavior. Suggestions? > > Probably should follow what shmem does: > > mm/shmem.c: fsparam_u32oct("mode", Opt_mode), > > (also several other filesystems) > Thanks Matthew! That looks to be exactly what is needed.