Bug 201433 - mbind, set_mempolicy: Most significant bit of node mask is ignored
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: NUMA/discontigmem
Hardware: All Linux
Importance: P1 low
Assignee: mm_numa-discontigmem
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-15 18:58 UTC by João Neto
Modified: 2018-10-15 22:36 UTC

See Also:
Kernel Version: 4.18.10
Subsystem:
Regression: No
Bisected commit-id:



Description João Neto 2018-10-15 18:58:33 UTC
When calling `mbind` or `set_mempolicy`, the most significant bit of the node mask is ignored.

Calling `mbind` or `set_mempolicy` with `maxnode = N` and `*nodemask = (1 << maxnode) - 1` results in allocation on nodes `0` to `N-2` only. No memory is allocated on node `N-1`.

The cause may be that the function `get_nodes` (mm/mempolicy.c:1238) decrements `maxnode` as its first step, so only `maxnode - 1` bits of the mask are honoured.

The documentation clearly states (e.g. for `set_mempolicy()`): "nodemask points to a bit mask of node IDs that contains up to maxnode bits".


Steps to Reproduce:

(1) Attempt to set a memory policy that interleaves memory on N nodes
(2) Check the per-node memory usage
(3) Allocate and initialize a large array
(4) Check the per-node memory usage and compare with the previous one

The code below sets an interleave-on-all-nodes memory policy and displays the per-node memory usage before and after a large (1GB) allocation.

Compiled with `gcc-8 file.c -lnuma`.

-------------------
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <numa.h>
#include <numaif.h>
#include <unistd.h>
#include <sys/mman.h>

#define NUM_ELEMS ((1 << 30) / sizeof(int)) // 1GB array

void print_node_memusage() {
  for (size_t i=0; i < numa_num_configured_nodes(); i++) {
    FILE *fp;
    char buf[1024];

    snprintf(buf, sizeof(buf),
             "cat /sys/devices/system/node/node%zu/meminfo | grep MemUsed", i);

    if ((fp = popen(buf, "r")) == NULL) {
      perror("popen");
      exit(-1);
    }

    while(fgets(buf, sizeof(buf), fp) != NULL) {
      printf("%s", buf);
    }

    if(pclose(fp))  {
      perror("pclose");
      exit(-1);
    }
  }
}

int main() {
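  // one mask bit per configured node; maxnode is the number of nodes
  unsigned long num_nodes = numa_num_configured_nodes();
  unsigned long all_nodes_mask = (1UL << num_nodes) - 1;
  if (set_mempolicy(MPOL_INTERLEAVE, &all_nodes_mask, num_nodes) != 0) {
    perror("set_mempolicy");
    exit(-1);
  }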

  // print per-node memory usage before
  print_node_memusage();

  // allocate large array and write to it
  int *a = malloc(NUM_ELEMS * sizeof(int));
  if (a == NULL) {
    perror("malloc");
    exit(-1);
  }
  a[0] = 123;
  for (size_t i=1; i < NUM_ELEMS; i++) {
    a[i] = (a[i-1] * a[i-1]) % 1000000;
  }

  // print per-node memory usage after
  print_node_memusage();

  free(a);
  return 0;
}
-------------------


Expected Results:
Memory should be allocated in similar amounts on all N nodes.


Actual Results:
Memory is allocated only on the first N-1 nodes. No memory is allocated on the last node.

Example run on my machine:

Before:
Node 0 MemUsed:         3669964 kB
Node 1 MemUsed:          935864 kB
Node 2 MemUsed:         2921224 kB
Node 3 MemUsed:         2439580 kB

After:
Node 0 MemUsed:         4020876 kB (+343MB)
Node 1 MemUsed:         1287212 kB (+343MB)
Node 2 MemUsed:         3271468 kB (+342MB)
Node 3 MemUsed:         2439264 kB (no diff)


Build Date & Hardware:
Kernel: 4.18.10
OS:     Ubuntu 16.04.3 LTS
CPU:    2 x Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
RAM:    4 x 8GB DIMMs (8GB per node)
