Bug 217253

Summary: mbind, set_mempolicy, migrate_pages: maxnode description is off-by-one
Product: Documentation
Component: man-pages
Status: NEEDINFO
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Anthony J. Battersby (tonyb)
Assignee: documentation_man-pages (documentation_man-pages)
CC: akpm

Description Anthony J. Battersby 2023-03-27 14:05:42 UTC
linux/mm/mempolicy.c::get_nodes() does "--maxnode" at the beginning, so:
maxnode == 0 is invalid (-EINVAL).
maxnode == 1 specifies the empty set of nodes (the man pages currently say to use maxnode == 0).
maxnode == 2 indicates one valid bit in nodemask.
maxnode == 3 indicates two valid bits in nodemask.
etc.
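
To make the mapping above concrete, here is a minimal sketch of it in C. It is a model of the behaviour described in this report, not the kernel's get_nodes() code. The function name model_valid_nodemask_bits is hypothetical, and the model assumes a non-NULL nodemask and ignores the kernel's upper bound on maxnode.

```c
#include <errno.h>

/*
 * Hypothetical model (not kernel code): how many nodemask bits are treated
 * as valid for a given maxnode, assuming maxnode is decremented before use
 * as described in this report.
 *
 * Returns -EINVAL for maxnode == 0, otherwise the number of valid bits.
 * Assumes a non-NULL nodemask and ignores any upper limit on maxnode.
 */
static long model_valid_nodemask_bits(unsigned long maxnode)
{
    if (maxnode == 0)
        return -EINVAL;          /* reported as invalid */
    return (long)(maxnode - 1);  /* the "--maxnode" step */
}
```

Under this model, maxnode == 1 yields zero valid bits (the empty set), which is the case the man pages currently attribute to maxnode == 0.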

Incorrect section from mbind manpage:

"nodemask points to a bit mask of nodes containing up to maxnode bits.  The bit mask size is rounded to the next multiple of sizeof(unsigned long), but the kernel will use bits only up to maxnode.  A NULL value of nodemask or a maxnode value of zero specifies the empty set of nodes.  If the value of maxnode is zero, the nodemask argument is ignored."

I am not sure if this was an intentional design choice or a bug that got enshrined in the userspace API, but userspace programs "in the know" seem to rely on this now:

https://gitlab.com/qemu-project/qemu/-/blob/60ca584b8af0de525656f959991a440f8c191f12/backends/hostmem.c#L369

Also, the commit message for linux commit c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall") shows using "new_nodes->size + 1", so this API bug/choice seems to be known within the kernel community.
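
For illustration, the caller-side workaround implied by that "+ 1" might look like the sketch below. It is an assumption-laden example, not code taken from QEMU or the kernel: maxnode_arg_for_mask is a hypothetical helper, and it assumes the decrement behaviour described above holds.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in one word of the node mask. */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper (not a libc or libnuma function): compute the maxnode
 * argument to pass for a mask of nwords words, assuming the kernel
 * decrements maxnode before using it.
 */
static unsigned long maxnode_arg_for_mask(const unsigned long *mask,
                                          size_t nwords)
{
    /* Scan from the highest word down to find the highest set bit. */
    for (size_t w = nwords; w-- > 0; ) {
        if (mask[w] == 0)
            continue;
        unsigned long bit = BITS_PER_ULONG - 1;
        while (!(mask[w] & (1UL << bit)))
            bit--;
        unsigned long highest_node = w * BITS_PER_ULONG + bit;
        /* highest_node + 1 valid bits, plus one for the decrement. */
        return highest_node + 2;
    }
    /* Empty mask: per this report, maxnode == 1 denotes the empty set. */
    return 1;
}
```

A caller would then pass the returned value as the maxnode argument to mbind() or set_mempolicy() instead of the plain bit count.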

Here is a related bugzilla entry that treats the problem as a kernel bug rather than a documentation issue:
https://bugzilla.kernel.org/show_bug.cgi?id=201433

Because "fixing" the bug (assuming that it was unintentional) might break existing userspace programs that work around it, I suggest fixing the documentation instead.  That is only my opinion as a user who ran into the bug and did some investigating; it would be best to check with the kernel maintainers for their opinion.

Related:
The commit message of linux commit 050c17f239fd ("numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES") discusses how to calculate maxnode for get_mempolicy().

Comment 1 Alejandro Colomar 2023-05-19 11:46:24 UTC
Thanks for the investigation.  I CCed the maintainer.  If you have any
specific suggestions for fixing the documentation, would you mind
preparing a patch according to the ./CONTRIBUTING file in the
man-pages repository?