Bug 203537 - Why can I access memory(and has non-zero contents) beyond the end of mmap()-ed file without a segfault or SIGBUS ?
Status: RESOLVED INVALID
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-07 03:21 UTC by GYt2bW
Modified: 2019-05-28 14:55 UTC
CC List: 3 users

See Also:
Kernel Version: 5.1.1
Tree: Mainline
Regression: No


Attachments
run ./go to reproduce the SIGBUS error (2.09 KB, application/x-xz)
2019-05-07 03:21 UTC, GYt2bW
my /proc/config.gz (28.85 KB, application/gzip)
2019-05-13 11:15 UTC, GYt2bW
mmap_access_beyond.c (2.71 KB, text/x-csrc)
2019-05-13 14:29 UTC, GYt2bW
mmap_access_beyond.c (3.56 KB, text/x-csrc)
2019-05-14 00:05 UTC, GYt2bW

Description GYt2bW 2019-05-07 03:21:12 UTC
Created attachment 282661 [details]
run ./go to reproduce the SIGBUS error

When mmap-ing a symlink (that points to a 0-byte file) and accessing the first byte of the resulting mmap address, SIGBUS is triggered.
But when mmap-ing a symlink (that points to a 1-byte file), it works even when accessing byte 0x2DFFF (it only segfaults at byte 0x2E000).

Tested on:
`Linux Z575 5.1.0-ge93c9c99a629 #78 SMP PREEMPT Mon May 6 22:38:25 CEST 2019 x86_64 GNU/Linux`

Code that reproduces it(it's also inside the attachment!):
```
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>

#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>

#include <unistd.h> // close
#include <stdio.h> //printf

#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap

#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 1

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#ifdef CRASH
  file = open("./3/symlink_to_emptyfile", O_RDONLY // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else //don't crash:
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#endif
  if (file >= 0) {
    printf("!! open success\n");
    char *addr;
    addr = mmap (0, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      printf("!! mmap ok %p\n",addr);
      printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      printf("!! 2nd byte of mmap: %c\n", addr[1]);
      printf("!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      //printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
      munmap(addr, size);
    }
  } else {
    printf("!! open failed\n");
  }
  close(file);
  return 0;
}
```

```
$ ./go
!! open success
!! mmap ok 0x7fd18dc32000
./go: line 1:  3674 Bus error               (core dumped) ./a.out
```

```
Core was generated by `./a.out'.
Program terminated with signal SIGBUS, Bus error.
#0  0x00005595bf25222a in main () at b.c:35
35	      printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
(gdb) bt full
#0  0x00005595bf25222a in main () at b.c:35
        addr = 0x7fd18dc32000 <error: Cannot access memory at address 0x7fd18dc32000>
        file = 3
        size = 1
```
Comment 1 Bharath 2019-05-12 07:45:08 UTC
Able to reproduce this.

This is not a bug within the kernel. Also use NULL instead of 0 in the addr field of mmap, it is more portable. 

Case of one byte file:

When you call mmap, the mapping always covers a whole number of pages. So even though you have specified size=1, the mapping will still be PAGE_SIZE long, which is 4 KiB in my case; run 'getconf PAGE_SIZE' in your shell to see your PAGE_SIZE. According to the man page:
A file is mapped in multiples of the page size. For a file that  is not a multiple of the page size, the remaining memory is zeroed when mapped. 

This is why you are able to access addr[3] as well (even though the file size should be 2 for a single-byte file; use stat on the file). 0xFFFF, 0x2CFFF and 0x2DFFF give me a segfault, as my PAGE_SIZE is 4 KiB and an access to a memory address beyond the page will segfault. You can access up to addr[PAGE_SIZE - 1].
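
The page rounding described above can be sketched as a small helper. This is only an illustration: `mapped_length` and `last_mapped_offset` are hypothetical names, and the code assumes a non-zero length and a power-of-two page size such as the 4096 bytes reported in this thread.

```c
#include <stddef.h>

/* Round a requested mapping length up to a whole number of pages.
 * Assumes page_size is a power of two (4096 on the systems in this report). */
size_t mapped_length(size_t len, size_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Highest offset that still lies inside the rounded-up mapping.
 * Assumes len > 0, since mmap rejects a zero length anyway. */
size_t last_mapped_offset(size_t len, size_t page_size)
{
    return mapped_length(len, page_size) - 1;
}
```

With size=1 and 4 KiB pages, `last_mapped_offset(1, 4096)` is 0xFFF, so the mapping itself ends at addr[0xFFF]. Whether an access at 0x1000 or above actually faults depends on what else happens to be mapped at that address.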

Case of zero byte file:

According to the man page, if a file of 0 byte is mapped, mmap will not fail. It will return to you an address but you cannot access it. 
Try using gdb to check it out: put a breakpoint on main and keep stepping through the program until you pass mmap. Try accessing addr and you'll get:
(gdb) print(addr)
$1 = 0x7ffff7ff6000 <error: Cannot access memory at address 0x7ffff7ff6000>
This means that the memory is not allocated to the file.
According to the man page, SIGBUS is raised when you try to access a part of the buffer that does not correspond to the file.
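
One way to avoid the zero-byte case in a program is to check the file size with fstat() before mapping. The sketch below is an assumption about how a caller might guard against it, not something taken from the repro; `map_whole_file` is a hypothetical helper name.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Map an entire file read-only, starting at offset 0.
 * Returns NULL (with *len_out set to 0) when the file is empty or on error,
 * so the caller never dereferences a mapping with no file data behind it. */
const char *map_whole_file(int fd, size_t *len_out)
{
    struct stat st;

    *len_out = 0;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
        return NULL;                     /* empty file: nothing to map */

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;       /* only offsets below this hold file data */
    return addr;
}
```

The returned length is the file size rather than the page-rounded size, which keeps the caller's loop bounds inside the part of the mapping that corresponds to the file.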

Hope this helps.
Comment 2 GYt2bW 2019-05-12 12:48:47 UTC
Thank you for your input.

Kernel page size is 4096 bytes for me too.
Also: `printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096`

> According to the man page, if a file of 0 byte is mapped, mmap will not fail. It will return to you an address but you cannot access it. 

Could you paste the relevant section from man page please? The only* reason I created this issue was because I thought this wasn't documented, and admittedly I only skimmed the man page...
(* ok, the other reason was: no segfault/SIGBUS when accessing over the expected size)

I guess it must be this part from `man 2 mmap`:
```
SIGBUS Attempted  access to a portion of the buffer that does not correspond to the file (for example, beyond the end
              of the file, including the case where another process has truncated the file).
```
more specifically this "`including the case where another process has truncated the file`" which is another way of implying the truncation can happen at offset 0 too, thus yielding a 0 byte now truncated file, this being the equivalent of mmap-ing a 0 byte file!

So this is indeed documented behaviour. My bad.
And (re)thinking about it, it makes sense to me now how it works.
Thanks Bharath !

As for the other part of the issue, accessing the mapping of a non-zero-byte file (e.g. a symlink pointing to a 1-byte file): the OP test was done on an AMD CPU/laptop, and I was amazed that it didn't SIGBUS (or even segfault)!
The current test I'm doing on an Intel i7-8700k CPU and it still acts the same: only segfaults at:

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005984278502e8 in main () at b.c:41
41	      printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
```

so not at any of 0x2DFFF or prior, but according to what you said it should:
 1. segfault at any offset at or above PAGE_SIZE (4096 bytes), i.e. anything from addr[0x1000] onwards

 2. And according to the above SIGBUS-quote from the `man 2 mmap` it should SIGBUS instead of segfault, because I am attempting to access memory beyond the end of the file.

So, what could be the reasons it doesn't do 1&2 ? Any ideas?
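
One way to reason about points 1 and 2 is to separate three things: the file size, the page-rounded mapping, and everything outside the mapping. The sketch below encodes that distinction as I understand the mmap(2) rules; `classify_offset` and the enum names are hypothetical, and it assumes the mapping starts at file offset 0.

```c
#include <stddef.h>

enum access_kind {
    ACCESS_FILE_DATA,  /* offset < file size: real file contents */
    ACCESS_ZERO_FILL,  /* past EOF, but on the page holding the last file byte */
    ACCESS_SIGBUS,     /* inside the mapping, on a page wholly past EOF */
    ACCESS_OUTSIDE     /* not part of this mapping at all */
};

/* Classify a read at addr[off] for a mapping of map_len bytes of a file that
 * is file_size bytes long. Assumes the mapping starts at file offset 0. */
enum access_kind classify_offset(size_t off, size_t file_size,
                                 size_t map_len, size_t page_size)
{
    /* Both ends rounded up to whole pages. */
    size_t map_end  = (map_len   + page_size - 1) / page_size * page_size;
    size_t file_end = (file_size + page_size - 1) / page_size * page_size;

    if (off >= map_end)
        return ACCESS_OUTSIDE;
    if (off < file_size)
        return ACCESS_FILE_DATA;
    if (off < file_end)
        return ACCESS_ZERO_FILL;
    return ACCESS_SIGBUS;
}
```

For the 1-byte file with size=1, every offset from 0x1000 up classifies as `ACCESS_OUTSIDE`. The SIGBUS rule from the man page only covers the mapping itself; outside it, a read gets SIGSEGV only if nothing else is mapped at that address, and otherwise it simply reads whatever mapping lives there.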

Kernel I'm using now:
Linux i87k 5.1.1-gb724e9356404 #3 SMP Sun May 12 12:12:10 CEST 2019 x86_64 GNU/Linux
from the stable repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.1.y
Comment 3 GYt2bW 2019-05-12 13:05:45 UTC
given the above, I've now changed bug title from:
`mmap() on a symlink to 0 byte file returns a 0 byte mmap which crashes with SIGBUS error when accessed`
to:
`mmap() with len=1 on a symlink to 1 byte file returns a mmap which doesn't crash with SIGBUS(or segfault) when accessed from offset 0 to 0x2DFFF`

Here's my current repro code (ie. modified file `b.c` from attachment):

```
// https://bugzilla.kernel.org/show_bug.cgi?id=203537
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>

#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>

#include <unistd.h> // close
#include <stdio.h> //printf

#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap

#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 0
//^ set to 0 or undefine it to not crash!

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#if CRASH==1
  file = open("./3/symlink_to_emptyfile", O_RDONLY // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else //don't crash:
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#endif
  if (file >= 0) {
    printf("!! open success\n");
    char *addr;
    addr = mmap (NULL, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      printf("!! mmap ok %p\n",addr);
      printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      printf("!! 2nd byte of mmap: %c\n", addr[1]);
      printf("!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      long int page_size=sysconf(_SC_PAGE_SIZE);//4096
      printf("!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1 
      printf("!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1 
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
      printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
      munmap(addr, size);
    }
  } else {
    printf("!! open failed\n");
  }
  close(file);
  return 0;
}
```

output:
```
$ ./go
!! open success
!! mmap ok 0x7cb61dee7000
!! 1st byte of mmap: 1
!! 2nd byte of mmap: 
!! 3nd byte of mmap: 
!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: 
!! 0x1000-th(PAGE_SIZE) byte of mmap: 
!! 0xFFFF-th byte of mmap: �
!! 0xFFFF-th byte of mmap: �
!! 0X2CFFF-th byte of mmap: 
!! 0x2DFFF-th byte of mmap: 
!! PAGE_SIZE:4096
./go: line 1:  2803 Segmentation fault      (core dumped) ./a.out
```

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000057a00ef72393 in main () at b.c:48
48	      printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault

```

Tested on Intel i7-8700k CPU now. (OP is on an AMD CPU, I don't remember exact details right now)
Comment 4 GYt2bW 2019-05-12 13:12:59 UTC
In other words, I can access memory beyond the end of the mmaped-file:

`!! number of non-zero bytes beyond the end of mmaped-file: 138902`

```
--- orig/b.c	2019-05-12 15:11:45.945937911 +0200
+++ mod/b.c	2019-05-12 15:10:32.005938616 +0200
@@ -45,6 +45,14 @@ int main() {
       printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
       printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
       printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
+      printf("!! number of non-zero bytes beyond the end of mmaped-file: ");
+      unsigned int count=0;
+      for (unsigned int i=1; i < 0x2E000; i++) {
+        if (addr[i] != 0) {
+          count++;
+        }
+      }
+      printf("%u\n", count);
       printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
       munmap(addr, size);
     }
```
Comment 5 GYt2bW 2019-05-12 13:38:33 UTC
Dumping those memory contents to the terminal, they look like the contents of `/usr/lib64/ld-linux-x86-64.so.2`, which is in the `ldd` list for `a.out`:
$ ldd ./a.out 
	linux-vdso.so.1 (0x00007ffd6bf1d000)
	libc.so.6 => /usr/lib/libc.so.6 (0x0000797e613a6000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x0000797e61595000)

So I'm accessing memory from my own program? Is that why it didn't crash?
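
A way to check this guess from inside the program is to look the address up in /proc/self/maps, which lists every mapping of the current process together with its backing file. This is a minimal sketch: `find_mapping` is a hypothetical helper, and it assumes a Linux system with /proc mounted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Look up addr in /proc/self/maps. On a hit, copy the matching line (without
 * its trailing newline) into line_out and return 1. Return 0 if no mapping
 * contains addr, or -1 if the maps file cannot be opened. */
int find_mapping(const void *addr, char *line_out, size_t cap)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    uintptr_t target = (uintptr_t)addr;
    int found = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;

        /* Each line begins with "start-end" in hex; end is exclusive. */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if (target >= start && target < end) {
            line[strcspn(line, "\n")] = '\0';
            snprintf(line_out, cap, "%s", line);
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Called in the repro as `find_mapping(addr + 0x2DFFF, buf, sizeof buf)`, the last column of the returned line would name whichever file or region backs that address, which would confirm or refute the guess that the bytes belong to a neighbouring mapping.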
Comment 6 Bharath 2019-05-13 09:25:24 UTC
Here is something interesting I found. 

I did an strace of the executable of your code on a 0 byte file. I got the usual 
SIGBUS, but here is something cool:

This is the SIGBUS signal's attrs.
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7f6ea5011000} ---
+++ killed by SIGBUS (core dumped) +++
Look at the si_code 'BUS_ADRERR'. BUS_ADRERR is reported for non-existent physical memory. Since the file is 0 bytes and has 0 data blocks, I think it has no physical address associated with its data, so accessing it gives a SIGBUS error. There is no physical address on disk which stores this file! I think when we access the file, a page fault is triggered (as the kernel does lazy page allocation), and the kernel tries to make a page table entry for the virtual address (the one mmap returned) and a pfn to hold the page. When it needs to get the contents of the file, it sees that there is no physical address for the file and terminates the process. I am not entirely sure, but it seems to make sense.

But SIGSEGV will occur when you access an address which you don't have any right to touch. So when you access an address beyond the file in the case of non-zero byte file, you do not have any permissions to access the virtual address and you seg fault.

This is the SIGSEGV signal's attrs.
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fd5ef09afff} ---
+++ killed by SIGSEGV (core dumped) +++
SEGV_MAPERR occurs when the virtual address is not mapped to the file.

In your context, I think SIGBUS occurs because the file is 0 bytes and has no physical address associated with it. (WTH are we reading when we do addr[0] if there is no physical address associated with it?!)

Now why does it give a segfault when you access an address beyond the PAGE_SIZE? My interpretation is as follows: 
Essentially you are just accessing an address in the process address space. So when you go beyond the memory mapped for you by mmap, you are touching addresses which were not mapped for you, and the access fails with si_code SEGV_MAPERR. It does not access a physical address here.

I am no memory management expert but these are my interpretations. I think you should put this on the memory management mailing lists so that the pros can give their opinions.  

PS: It segfaults for me after 0x1000 (PAGE_SIZE) and does not get as far as 0x2E000 :( I am not sure why you can access up to 0x2E000 :(

What do you think?
Comment 7 GYt2bW 2019-05-13 11:15:10 UTC
Created attachment 282735 [details]
my /proc/config.gz

It makes sense. I just wish it would be consistent with the manual, or have the manual changed to reflect that. But it's possible I misread it. Oh well :) I don't care about it anymore.

What I do care about, right now, is that I'm not getting a segfault like you do.

Bharath, may I please have your /proc/config.gz (aka .config) for the linux kernel which you used when you got segfault after 0x1000 ?
I'm thinking that something in the kernel config must be the only thing the two systems (laptop and desktop) have in common, since neither of them segfaults until 0x2E000.

Thank you for your insights, Bharath. Having never used mmap before, myself, I find I've learned quite a bit from what you wrote!

Any idea where's the code for mmap() in kernel? I tried to find it but it seems obscured or quite possibly under a different function name. I don't even know.
./arch/x86/mm/mmap.c
./mm/mmap.c
./include/linux/mm.h

> I think you should put this on the memory management mailing lists so that the pros can give their opinions.  

PS: I'm avoiding the mailing lists - next thing you know I'm going to be asked to follow some obscure rules or sign off on patches using my real name or something, both of which are whatever-s-the-antonym-for-incentive =)


Oh look, I mmap-ed 200MiB file (209715200 bytes of dd-ed /dev/zero into a file) and I can access data until at least 1806834 bytes after it ends. That's 1.7MiB more, of which 1470000 bytes are non-zero.
And I see text from another program like:
%s%s[%d](full:'%s') for user %s(%d(eff:%s(%d))) 2of2 FAILed to resolve requested hostname '%s'(raw), which %s transformed(eg. 𝙵𝟢𝟢->f00) into:

Now I wonder what allows this! What kernel .config option that I have is really allowing this... since I can't think of anything else (can two systems, one AMD and one Intel, be so alike to both allow this, or is it likely some kernel .config option that I have in common for both the culprit... maybe I should try with ArchLinux's provided kernels... )

```
// https://bugzilla.kernel.org/show_bug.cgi?id=203537
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>

#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>

#include <unistd.h> // close
#include <stdio.h> //printf

#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap

#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 0
//^ set to 0 or undefine it to not crash!
#define USE_SYMLINK 0
#define USE_BIGFILE 1 //set to 0 to use 1 byte file when USE_SYMLINK is 0 (and CRASH is 0)

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#if CRASH==1
  file = open("./3/symlink_to_emptyfile", O_RDONLY // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else //don't crash:
#if USE_SYMLINK==1
  printf("!! symlink\n");
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#elif USE_BIGFILE==0
  printf("!! normal file\n");
  file = open("./1bytefile", O_RDONLY); //works!
#else
  printf("!! big file\n");
  file = open("./bigfile", O_RDONLY); // created via: dd if=/dev/zero of=bigfile bs=1M count=200
  size=200*1024*1024;
#endif
#endif
  if (file >= 0) {
    printf("!! open success\n");
    char *addr;
    addr = mmap (NULL, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      printf("!! mmap ok %p\n",addr);
      printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      printf("!! 2nd byte of mmap: %c\n", addr[1]);
      printf("!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      long int page_size=sysconf(_SC_PAGE_SIZE);//4096
      printf("!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1 
      printf("!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1 
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
      printf("!! number of non-zero bytes beyond the end of mmaped-file: ");
      unsigned int count=0;
      for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) {
        if (addr[i] != 0) {
          printf("%c", addr[i]);
          count++;
        }
        if ((count > 0) && (count % 10000 == 0)) {
          printf("!! i=%u count='%u'\n", i, count);
        }
      }
      printf("%u\n", count);
      printf("!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault
      munmap(addr, size);
    }
  } else {
    printf("!! open failed\n");
  }
  close(file);
  return 0;
}
```
Comment 8 GYt2bW 2019-05-13 11:49:26 UTC
I'm able to access memory beyond mmap-ed end of file with all kernels(listed below), so then it's likely not a .config thing(because only one is compiled by me), maybe a /proc/cmdline thing?

```
local/linux 5.0.13.arch1-1 (base)
    The Linux kernel and modules
local/linux-stable 5.1.1.r0.gb724e9356404-1 (builtbydaddy)
    The Linux kernel and modules (stable version)
local/linux-lts 4.19.42-1
    The Linux-lts kernel and modules

Linux i87k 4.19.42-1-lts #1 SMP Sat May 11 06:58:51 CEST 2019 x86_64 GNU/Linux
Linux i87k 5.0.13-arch1-1-ARCH #1 SMP PREEMPT Sun May 5 18:05:41 UTC 2019 x86_64 GNU/Linux
Linux i87k 5.1.1-gb724e9356404 #9 SMP Sun May 12 22:02:58 CEST 2019 x86_64 GNU/Linux

```

This is my /proc/cmdline currently:
```
BOOT_IMAGE=/boot/vmlinuz-linux-stable root=UUID=2b8b9ab8-7ac5-4586-aa42-d7ffb12de92a rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 pax_sanitize_slab=full nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 noefi cpuidle.governor=teo zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold i915.alpha_support=1 i915.fastboot=1
```

I don't currently have access to the AMD system but its /proc/cmdline was:
```
BOOT_IMAGE=/boot/vmlinuz-linux-git root=UUID=a499587c-99ef-4c93-9ea0-b61cb5f13193 rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 pax_sanitize_slab=full nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 acpi_backlight=vendor CPUunderclocking noefi tsc=unstable radeon.audio=0 radeon.lockup_timeout=999000 radeon.test=0 radeon.agpmode=-1 radeon.benchmark=0 radeon.tv=0 radeon.hard_reset=1 radeon.msi=1 radeon.pcie_gen2=-1 radeon.no_wb=1 radeon.dynclks=0 radeon.r4xx_atom=0 radeonfb radeon.fastfb=1 radeon.dpm=1 radeon.runpm=1 radeon.modeset=1 radeon.aspm=0 pcie_aspm=off rcu_nocbs=1-3 cpuidle.governor=teo zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold
```

wdiff shows the following in common:
  rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 pax_sanitize_slab=full nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120
noefi
  cpuidle.governor=teo zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold 
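
The same comparison can be done without wdiff by intersecting the two command lines token by token. This is a rough sketch under the assumption that parameters are separated by spaces and contain no quoted spaces; `has_token` and `common_params` are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if token (n bytes long) appears as a whole word in list. */
static int has_token(const char *list, const char *token, size_t n)
{
    const char *p = list;

    while (*p != '\0') {
        p += strspn(p, " \n");               /* skip separators */
        size_t len = strcspn(p, " \n");      /* length of this word */
        if (len == n && strncmp(p, token, n) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Write every parameter of a that also appears in b into out, separated by
 * single spaces, in the order they occur in a. Returns how many were written;
 * stops early if out is too small. */
size_t common_params(const char *a, const char *b, char *out, size_t cap)
{
    size_t count = 0, used = 0;
    const char *p = a;

    if (cap > 0)
        out[0] = '\0';
    while (*p != '\0') {
        p += strspn(p, " \n");
        size_t len = strcspn(p, " \n");
        if (len > 0 && has_token(b, p, len)) {
            int w = snprintf(out + used, cap - used, "%s%.*s",
                             count ? " " : "", (int)len, p);
            if (w < 0 || (size_t)w >= cap - used) {
                if (cap > used)
                    out[used] = '\0';        /* drop the partial token */
                break;
            }
            used += (size_t)w;
            count++;
        }
        p += len;
    }
    return count;
}
```

Feeding it the contents of the two /proc/cmdline dumps above would give the list of shared parameters; parameters that differ only in value, such as root=UUID=..., are treated as different tokens.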

Removing some obvious ones which couldn't be causing this, we're left with:
memory_corruption_check=1
log_buf_len=16M
fbcon=scrollback:4096k
pax_sanitize_slab=full
enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off
lpj=0 mce=bootlog
noexec=on 
cpuidle.governor=teo zram.num_devices=3

then there's
```
/etc/modprobe.d 
$ cat *|grep -v ^#| sed -e '/^\s*$/d'
alias char-major-10-175	agpgart
alias char-major-10-200	tun
alias char-major-81	bttv
alias char-major-108	ppp_generic
alias /dev/ppp		ppp_generic
alias tty-ldisc-3	ppp_async
alias tty-ldisc-14	ppp_synctty
alias ppp-compress-21	bsd_comp
alias ppp-compress-24	ppp_deflate
alias ppp-compress-26	ppp_deflate
alias loop-xfer-gen-0	loop_gen
alias loop-xfer-3	loop_fish2
alias loop-xfer-gen-10	loop_gen
alias cipher-2		des
alias cipher-3		fish2
alias cipher-4		blowfish
alias cipher-6		idea
alias cipher-7		serp6f
alias cipher-8		mars6
alias cipher-11		rc62
alias cipher-15		dfc2
alias cipher-16		rijndael
alias cipher-17		rc5
alias char-major-89    i2c-dev
blacklist ideapad_laptop
blacklist thinkpad_acpi
blacklist nvram
blacklist rfkill
blacklist led_class
blacklist fglrx
blacklist ipv6
install ipv6 /bin/true
blacklist nfs
blacklist pcspkr
blacklist uvcvideo
blacklist bluetooth
alias parport_lowlevel parport_pc
alias char-major-10-144 nvram
alias binfmt-0064 binfmt_aout
alias char-major-10-135 rtc
softdep uhci_hcd pre: ehci_hcd
softdep ohci_hcd pre: ehci_hcd
options zram num_devices=3
```

/etc/modules-load.d only has zram

They seem harmless.

Back to /proc/cmdline, further reducing:

```
  lpj=n   [KNL]                                                                                                                 
      Sets loops_per_jiffy to given constant, thus avoiding
      time-consuming boot-time autodetection (up to 250 ms per
      CPU). 0 enables autodetection (default). To determine
      the correct value for your kernel, boot with normal
      autodetection and see what value is printed. Note that
      on SMP systems the preset will be applied to all CPUs,
      which is likely to cause problems if your CPUs need
      significantly divergent settings. An incorrect value
      will cause delays in the kernel to be wrong, leading to
      unpredictable I/O errors and other breakage. Although
      unlikely, in the extreme case this might damage your
      hardware. 

   mce=bootlog
    Enable logging of machine checks left over from booting.
    Disabled by default on AMD Fam10h and older because some BIOS
    leave bogus ones.
    If your BIOS doesn't do that it's a good idea to enable though
    to make sure you log even machine check events that result
    in a reboot. On Intel systems it is enabled by default.

2. fbcon=scrollback:<value>[k]                                                                                                  

        The scrollback buffer is memory that is used to preserve display
        contents that has already scrolled past your view.  This is accessed
        by using the Shift-PageUp key combination.  The value 'value' is any
        integer. It defaults to 32KB.  The 'k' suffix is optional, and will
        multiply the 'value' by 1024.

  log_buf_len=n[KMG]  Sets the size of the printk ring buffer,                                                                  
      in bytes.  n must be a power of two and greater
      than the minimal size. The minimal size is defined
      by LOG_BUF_SHIFT kernel config parameter. There is
      also CONFIG_LOG_CPU_MAX_BUF_SHIFT config parameter
      that allows to increase the default size depending on
      the number of CPUs. See init/Kconfig for more details.

  memory_corruption_check=0/1 [X86]                                                                                             
      Some BIOSes seem to corrupt the first 64k of
      memory when doing things like suspend/resume.
      Setting this option will scan the memory
      looking for corruption.  Enabling this will
      both detect corruption and prevent the kernel
      from using the memory being corrupted.
      However, its intended as a diagnostic tool; if
      repeatable BIOS-originated corruption always
      affects the same memory, you can use memmap=
      to prevent the kernel from using that memory.

  enforcing [SELINUX] Set initial enforcing status.                                                                             
      Format: {"0" | "1"}
      See security/selinux/Kconfig help text.
      0 -- permissive (log only, no denials).
      1 -- enforcing (deny and log).
      Default value is 0.
      Value can be changed at runtime via /selinux/enforce.
  nohz=   [KNL] Boottime enable/disable dynamic ticks                                                                           
      Valid arguments: on, off
      Default: on

  nohz_full=  [KNL,BOOT,SMP,ISOL]
      The argument is a cpu list, as described above.
      In kernels built with CONFIG_NO_HZ_FULL=y, set 
      the specified list of CPUs whose tick will be stopped
      whenever possible. The boot CPU will be forced outside
      the range to maintain the timekeeping.  Any CPUs
      in this list will have their RCU callbacks offloaded,
      just as if they had also been called out in the
      rcu_nocbs= boot parameter.


  oops=panic  Always panic on oopses. Default is to just kill the                                                               
      process, but there is a small probability of
      deadlocking the machine.
      This will also cause panics on machine check exceptions.
      Useful together with panic=30 to trigger a reboot.

  panic=    [KNL] Kernel behaviour on panic: delay <timeout>
      timeout > 0: seconds before rebooting
      timeout = 0: wait forever
      timeout < 0: reboot immediately
      Format: <timeout>

  psi=    [KNL] Enable or disable pressure stall information                                                                    
      tracking.
      Format: <bool>

  sysrq_always_enabled                                                                                                          
      [KNL]
      Ignore sysrq setting - this boot parameter will
      neutralize any effect of /proc/sys/kernel/sysrq.
      Useful for debugging.

  random.trust_cpu={on,off}                                                                                                     
      [KNL] Enable or disable trusting the use of the
      CPU's random number generator (if available) to
      fully seed the kernel's CRNG. Default is controlled
      by CONFIG_RANDOM_TRUST_CPU.

  noexec    [IA-64]

  noexec    [X86]
      On X86-32 available only on PAE configured kernels.
      noexec=on: enable non-executable mappings (default)
      noexec=off: disable non-executable mappings


  cpuidle.governor=                                                                                                             
      [CPU_IDLE] Name of the cpuidle governor to use.

  crashkernel=size[KMG][@offset[KMG]]
      [KNL] Using kexec, Linux can switch to a 'crash kernel'
      upon panic. This parameter reserves the physical
      memory region [offset, offset + size] for that kernel
      image. If '@offset' is omitted, then a suitable offset
      is selected automatically. Check
      Documentation/kdump/kdump.txt for further details.

```

Can't find any info on 'pax_sanitize_slab'(tried `git log -S'pax_sanitize_slab'` and `git log -G"pax_sanitize_slab"` and `grep -nrIF pax_sanitize_slab` in kernel source tree), guessing it must be a remnant of using gentoo hardened kernels.

So what's left:
enforcing=0
cpuidle.governor=teo
zram.num_devices=3

Will reboot each of the 3 kernels with those options removed and only report in a new comment if there's a change. (I'll also temp replace /tmp with tmpfs - currently zram)
Comment 9 GYt2bW 2019-05-13 13:37:39 UTC
re prev. comment: no change!

Now using the following program, I'm getting a segfault at byte 1814528 after the end of the file. Dumping the contents starting from the first non-zero byte shows some kind of ELF binary. (Not attaching it since it might include some sensitive info, who knows, but it's not /lib64/ld-2.29.so.)

```
(gdb) bt full
#0  0x000063307f21f468 in main () at b.c:63
        i = 211529728
        page_size = 4096
        count = 1474334
        addr = 0x7f893f33d000 <error: Cannot access memory at address 0x7f893f33d000>
        file = 3
        size = 209715200
```


```
// https://bugzilla.kernel.org/show_bug.cgi?id=203537
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>

#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>

#include <unistd.h> // close
#include <stdio.h> //printf

#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap

#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 0
//^ set to 0 or undefine it to not crash!
#define USE_SYMLINK 0
#define USE_BIGFILE 1 //set to 0 to use 1 byte file when USE_SYMLINK is 0 (and CRASH is 0)

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#if CRASH==1
  file = open("./3/symlink_to_emptyfile", O_RDONLY // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else //don't crash:
#if USE_SYMLINK==1
  fprintf(stderr,"!! symlink\n");
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#elif USE_BIGFILE==0
  fprintf(stderr, "!! normal file\n");
  file = open("./1bytefile", O_RDONLY); //works!
#else
  fprintf(stderr,"!! big file\n");
  file = open("./bigfile", O_RDONLY); // created via: dd if=/dev/zero of=bigfile bs=1M count=200
  size=200*1024*1024;
#endif
#endif
  if (file >= 0) {
    fprintf(stderr,"!! open success\n");
    char *addr;
    addr = mmap (NULL, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      fprintf(stderr,"!! mmap ok %p\n",addr);
      fprintf(stderr,"!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      fprintf(stderr,"!! 2nd byte of mmap: %c\n", addr[1]);
      fprintf(stderr,"!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      long int page_size=sysconf(_SC_PAGE_SIZE);//4096
      fprintf(stderr,"!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1 
      fprintf(stderr,"!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1 
      fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      fprintf(stderr,"!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      fprintf(stderr,"!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      fprintf(stderr,"!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
      fprintf(stderr,"!! number of non-zero bytes beyond the end of mmaped-file: ");
      unsigned int count=0;
      for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) {
        if (addr[i] != 0) {
          count++;
        }
        if (count>0) { //print all after first non-zero which would be 'ELF'
          printf("%c", addr[i]); //XXX crashes at i=211529728, that's on accessing 443rd kernel page after the end of file
        }
/*        if ((count > 0) && (count % 10000 == 0)) {
          printf("!! i=%u count='%u'\n", i, count);
        }*/
      }
      fprintf(stderr,"%u\n", count);
      fprintf(stderr,"!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault
      munmap(addr, size);
    }
  } else {
    fprintf(stderr,"!! open failed\n");
  }
  close(file);
  return 0;
}
```

I run it with `./go`
```
$ cat go
#!/bin/bash

rm ./a.out ; gcc -ggdb3 -O0 b.c && ./a.out >screen.out
```

Note: using `-O2` has exactly the same effect (only `addr` is different in `gdb`), and using `-g0` still yields the same 1,810,432-byte `screen.out` file.
That's exactly 442 pages of PAGE_SIZE (4096 bytes), and the segfault happens on accessing the page at offset 443 after the end of the file (443 = (i-size)/PAGE_SIZE = (211529728-209715200)/4096, both values seen in the gdb output; the file size is 209715200, aka 200MiB). So this would also mean that there was one page of zeroes right after the file: 443 pages were readable but only 442 were saved to `screen.out`, thus 1 was skipped.


Oh crap, I've just realized something: I was looking for mmap in kernel tree, but:
$ pacman -Qo /usr/include/sys/mman.h
/usr/include/sys/mman.h is owned by glibc 2.29-1
So MAYBE it's part of glibc? Or just the 'extern' for it is there. Either way it seems to point to an `mmap64` which I definitely can't find in kernel code (not for x86).

yep, I found something in glibc:
misc/mmap64.c
then I got lost into macros 
```
/* An architecture may override this.  */
  #ifndef MMAP_CALL
  # define MMAP_CALL(__nr, __addr, __len, __prot, __flags, __fd, __offset) \
    INLINE_SYSCALL_CALL (__nr, __addr, __len, __prot, __flags, __fd, __offset)
  #endif
```
I guess it still calls into something from the kernel, I just don't know the name of it.
oh wait, it's just `mmap`:
```
return (void *) MMAP_CALL (mmap, addr, len, prot, flags, fd,
             MMAP_ADJUST_OFFSET (offset));
```
or rather: __INLINE_SYSCALL_mmap ?
ahaaa, I have it:
`arch/x86/kernel/sys_x86_64.c:91:SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,`
oh yeah, this has gotta be it:
```
  SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,                                                                
      unsigned long, prot, unsigned long, flags,
      unsigned long, fd, unsigned long, off)
  {
    long error;
    error = -EINVAL;
    if (off & ~PAGE_MASK)
      goto out;
  
    error = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
  out:
    return error;
  }
```

So `mmap` is basically `ksys_mmap_pgoff`, that's what I wanted to know!
which is defined in mm/mmap.c:
```
  unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
              unsigned long prot, unsigned long flags,
              unsigned long fd, unsigned long pgoff)
  {
    struct file *file = NULL;
    unsigned long retval;
  
    if (!(flags & MAP_ANONYMOUS)) {
      audit_mmap_fd(fd, flags);
      file = fget(fd);
      if (!file)
        return -EBADF;
      if (is_file_hugepages(file))
        len = ALIGN(len, huge_page_size(hstate_file(file)));
      retval = -EINVAL;
      if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
        goto out_fput;
    } else if (flags & MAP_HUGETLB) {
      struct user_struct *user = NULL;
      struct hstate *hs;
  
      hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
      if (!hs)
        return -EINVAL;
  
      len = ALIGN(len, huge_page_size(hs));
      /*
       * VM_NORESERVE is used because the reservations will be
       * taken when vm_ops->mmap() is called
       * A dummy user value is used because we are not locking
       * memory so no accounting is necessary
       */
      file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
          VM_NORESERVE,
          &user, HUGETLB_ANONHUGE_INODE,
          (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
      if (IS_ERR(file))
        return PTR_ERR(file);
    }
  
    flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
  
    retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
  out_fput:
    if (file)
      fput(file);
    return retval;
  }
```

ok, I'll try to understand something from that mess! ("mess" to the untrained eye, that is)
Comment 10 GYt2bW 2019-05-13 14:22:18 UTC
I've modified the program to not mmap a file, because it behaves the same with MAP_ANONYMOUS:

```
$ cat go
#!/bin/bash
rm ./a.out ; gcc -ggdb3 -O0 mmap_access_beyond.c && ./a.out >screen.out ; ls -la ./screen.out                                                                                                                    


$ cat ./mmap_access_beyond.c 
// https://bugzilla.kernel.org/show_bug.cgi?id=203537

#include <unistd.h> // for close() or sysconf()/_SC_PAGE_SIZE
#include <stdio.h> //printf

#define __USE_MISC 1 //to get MAP_FILE or MAP_ANONYMOUS
#include <sys/mman.h> //mmap

#define SMALL_MMAP 1 //set to 0 to use a 200MiB mmap or set to 1 to use a 1 byte mmap!

int main() {
  off_t size=
#if SMALL_MMAP==1
    1
    // a 1 byte mmap
#else
    200*1024*1024
    // a 200MiB mmap
#endif
  ;
  char *addr;
  addr = mmap (NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); //same behaviour even without a file!
  if (addr != MAP_FAILED){
    fprintf(stderr,"!! mmap ok %p\n",addr);
    fprintf(stderr,"!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
    fprintf(stderr,"!! 2nd byte of mmap: %c\n", addr[1]);
    fprintf(stderr,"!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
    const unsigned int page_size=sysconf(_SC_PAGE_SIZE);//4096
    fprintf(stderr,"!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1 
    fprintf(stderr,"!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1 
    fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
    fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
    fprintf(stderr,"!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
    fprintf(stderr,"!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
    fprintf(stderr,"!! PAGE_SIZE:%u\n", page_size); // 4096
    //fprintf(stderr,"!! number of non-zero bytes beyond the end of mmaped-file: ");

    unsigned int nonzerochars_seen=0;
    for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) {
      if ( (i >= 211529728) || ((size == 1) && (i >= 188416)) ) {
        fprintf(stderr,"!! about to access addr at offset i=%u nonzerochars_seen='%u'\n", i, nonzerochars_seen);
      }
      if (addr[i] != 0) {
        nonzerochars_seen++;
      }
      if (nonzerochars_seen>0) { //print all after first non-zero which would be 'ELF'
        printf("%c", addr[i]);
        //XXX ^ crashes at i=211529728 when size == 200MiB, that's on accessing 443rd kernel page after the end of mmap-ed memory region
        //XXX ^ crashes at i = 188416 if size == 1 and screen.out has first 172,016 bytes identical with /lib64/ld-2.29.so which is indirectly listed in a.out's ldd(differently named symlinks to it)
      }
    }
    //fprintf(stderr,"%u\n", nonzerochars_seen);
    fprintf(stderr,"!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault
    munmap(addr, size);
  }
  return 0;
}

```

Run as `./go`

outputs:

for `SMALL_MMAP 1`

```
$ ./go
!! mmap ok 0x7faf8b2b6000
!! 1st byte of mmap: 
!! 2nd byte of mmap: 
!! 3nd byte of mmap: 
!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: 
!! 0x1000-th(PAGE_SIZE) byte of mmap: 
!! 0xFFFF-th byte of mmap: �
!! 0xFFFF-th byte of mmap: �
!! 0X2CFFF-th byte of mmap: 
!! 0x2DFFF-th byte of mmap: 
!! PAGE_SIZE:4096
!! about to access addr at offset i=188416 nonzerochars_seen='138902'
./go: line 7: 18175 Segmentation fault      (core dumped) ./a.out > screen.out
-rw-r--r-- 1 user user 180224 May 13 16:19 ./screen.out
```

for `SMALL_MMAP 0`
```
$ ./go
!! mmap ok 0x71804d3e3000
!! 1st byte of mmap: 
!! 2nd byte of mmap: 
!! 3nd byte of mmap: 
!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: 
!! 0x1000-th(PAGE_SIZE) byte of mmap: 
!! 0xFFFF-th byte of mmap: 
!! 0xFFFF-th byte of mmap: 
!! 0X2CFFF-th byte of mmap: 
!! 0x2DFFF-th byte of mmap: 
!! PAGE_SIZE:4096
!! about to access addr at offset i=211529728 nonzerochars_seen='1474334'
./go: line 7: 18323 Segmentation fault      (core dumped) ./a.out > screen.out
-rw-r--r-- 1 user user 1810432 May 13 16:20 ./screen.out
```
Comment 11 GYt2bW 2019-05-13 14:29:30 UTC
Created attachment 282739 [details]
mmap_access_beyond.c
Comment 12 GYt2bW 2019-05-13 15:23:21 UTC
valdis on ##kernel (freenode irc) found the reason:

> <valdis> howaboutsynergy: SO... I check your program... gdb it.  Single step, and catch /proc/PID/maps just before the mmap, and just after, and diff the two.  And the segment actually allocated is:
<valdis>  7ffff7fd3000-7ffff7ff3000 r-xp 00001000 fd:02 397289                     /usr/lib64/ld-2.29.9000.so
<valdis>  7ffff7ff3000-7ffff7ffb000 r--p 00021000 fd:02 397289                     /usr/lib64/ld-2.29.9000.so
<valdis> +7ffff7ffb000-7ffff7ffc000 r--p 00000000 00:00 0
<valdis>  7ffff7ffc000-7ffff7ffd000 r--p 00029000 fd:02 397289                     /usr/lib64/ld-2.29.9000.so
<valdis>  7ffff7ffd000-7ffff7ffe000 rw-p 0002a000 fd:02 397289                     /usr/lib64/ld-2.29.9000.so
<valdis> You asked for 1 page, and the kernel found 1 page, right between two mmaps already existing.  So you walk off the end of your 1 page mmap, but there's another page in the *next* mmap.
<valdis> Which is why your "off the end" looks suspiciously like ld.so :)

And he seems to be running the same glibc 2.29 just like me.

Someone else (<ayecee> on ##linux freenode irc) found that on Ubuntu 16.04.6 with glibc 2.23 and on Ubuntu 18.04 with glibc 2.27 it segfaults as expected at addr[PAGE_SIZE] when size==1.

Meanwhile I'm attempting to reproduce valdis' results so I'm in the process of finding out how =) (ie. searching gdb help)
Comment 13 GYt2bW 2019-05-13 15:53:31 UTC
Alrightie then :D
Using valdis' ideas (also `'gdb ./a.out', then 'break main', *then* 'run'. Stops at beginning of main(), and then you use step from there. :` )

I found that with `SMALL_MMAP 1` I get:

```
 7ffff7fa0000-7ffff7fa3000 r--p 001bb000 00:14 365426                     /usr/lib/libc-2.29.so
 7ffff7fa3000-7ffff7fa6000 rw-p 001be000 00:14 365426                     /usr/lib/libc-2.29.so
 7ffff7fa6000-7ffff7fac000 rw-p 00000000 00:00 0 
+7ffff7fcd000-7ffff7fce000 r--p 00000000 00:00 0 
 7ffff7fce000-7ffff7fd1000 r--p 00000000 00:00 0                          [vvar]
 7ffff7fd1000-7ffff7fd2000 r-xp 00000000 00:00 0                          [vdso]
 7ffff7fd2000-7ffff7fd4000 r--p 00000000 00:14 365415                     /usr/lib/ld-2.29.so
```

and with `SMALL_MMAP 0` (aka 200MiB) mmap, I get:

```
$ colordiff -up /tmp/e_before  /tmp/e_after 
--- /tmp/e_before	2019-05-13 17:34:19.059892668 +0200
+++ /tmp/e_after	2019-05-13 17:35:13.831892145 +0200
@@ -3,6 +3,7 @@
 555555556000-555555557000 r--p 00002000 00:14 3107935                    /home/user/sandbox/c/mmap_symlink/a.out
 555555557000-555555558000 r--p 00002000 00:14 3107935                    /home/user/sandbox/c/mmap_symlink/a.out
 555555558000-555555559000 rw-p 00003000 00:14 3107935                    /home/user/sandbox/c/mmap_symlink/a.out
+7fffeb5e4000-7ffff7de4000 r--p 00000000 00:00 0 
 7ffff7de4000-7ffff7e09000 r--p 00000000 00:14 365426                     /usr/lib/libc-2.29.so
 7ffff7e09000-7ffff7f5c000 r-xp 00025000 00:14 365426                     /usr/lib/libc-2.29.so
 7ffff7f5c000-7ffff7f9f000 r--p 00178000 00:14 365426                     /usr/lib/libc-2.29.so
```



> <howaboutsynergy> oh yeah it workz! so cool! So somehow even after mmap-ing 200MiB, something put right after it 3 /usr/lib/libc-2.29.so
> <valdis> You have taht backwards.   libc-2.29.so got mapped there, and then the kernel mapped your mmap() call right up against it.
* valdis wonders how hard it would be to add an "allocate non-accessible guard page at both ends" flag to mmap().




`man 2 mmap`
>        SIGBUS Attempted  access to a portion of the buffer that does not correspond to the file (for example, beyond the end
              of the file, including the case where another process has truncated the file).

`man mmap` (i.e. `man 3p mmap`)
>        The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out
       any modified portions of the last page of an object which are beyond its end.  References within  the  address  range
       starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of
       a SIGBUS signal.

>       An implementation may generate SIGBUS signals when a reference would cause an error in the  mapped  object,  such  as
       out-of-space condition.
Comment 14 GYt2bW 2019-05-13 23:56:49 UTC
Closing as not-a-kernel-bug,
because glibc 2.28 seems to behave differently (better) here:
it segfaults sooner, at page offset 4 from the start of the mmap (i=16384):

```
$ ./go
!! mmap ok 0x7556715a2000
./go: line 7:  7991 Segmentation fault      (core dumped) ./a.out > screen.out
-rw-r--r-- 1 user user 8192 May 14 01:51 ./screen.out
--- /tmp/_before_mmap	2019-05-14 01:51:14.724989202 +0200
+++ /tmp/_after_mmap	2019-05-14 01:51:14.727989202 +0200
@@ -14,6 +14,7 @@
 755671579000-75567157b000 r--p 00000000 00:14 3607274                    /usr/lib/ld-2.28.so
 75567157b000-75567159b000 r-xp 00002000 00:14 3607274                    /usr/lib/ld-2.28.so
 75567159b000-7556715a2000 r--p 00022000 00:14 3607274                    /usr/lib/ld-2.28.so
+7556715a2000-7556715a3000 r--p 00000000 00:00 0 
 7556715a3000-7556715a4000 r--p 00029000 00:14 3607274                    /usr/lib/ld-2.28.so
 7556715a4000-7556715a5000 rw-p 0002a000 00:14 3607274                    /usr/lib/ld-2.28.so
 7556715a5000-7556715a6000 rw-p 00000000 00:00 0 

(gdb) bt full
#0  0x0000597354649357 in main () at mmap_access_beyond.c:67
        i = 16384
        rv2 = 0
        rv3 = 256
        nonzerochars_seen = 1976
        size = 1
        addr = 0x7556715a2000 <error: Cannot access memory at address 0x7556715a2000>
        selfpid = 7991
        wstatus = 1901412200
        cmd = 0x597355c14260 "cat /proc/7991/maps >/tmp/_after_mmap"
        cmd_size = 100
        rv = 0
```

instead of at page offset 46 (i=188416) with glibc 2.29:

```
$ ./go
!! mmap ok 0x7c30c96d1000
!! about to access addr at offset i=188416 nonzerochars_seen='139253'
./go: line 7:  8367 Segmentation fault      (core dumped) ./a.out > screen.out
-rw-r--r-- 1 user user 180224 May 14 01:54 ./screen.out
--- /tmp/_before_mmap	2019-05-14 01:54:19.728987437 +0200
+++ /tmp/_after_mmap	2019-05-14 01:54:19.730987437 +0200
@@ -11,6 +11,7 @@
 7c30c96a4000-7c30c96a7000 r--p 001bb000 00:14 3609092                    /usr/lib/libc-2.29.9000.so
 7c30c96a7000-7c30c96aa000 rw-p 001be000 00:14 3609092                    /usr/lib/libc-2.29.9000.so
 7c30c96aa000-7c30c96b0000 rw-p 00000000 00:00 0 
+7c30c96d1000-7c30c96d2000 r--p 00000000 00:00 0 
 7c30c96d2000-7c30c96d4000 r--p 00000000 00:14 3609081                    /usr/lib/ld-2.29.9000.so
 7c30c96d4000-7c30c96f4000 r-xp 00002000 00:14 3609081                    /usr/lib/ld-2.29.9000.so
 7c30c96f4000-7c30c96fc000 r--p 00022000 00:14 3609081                    /usr/lib/ld-2.29.9000.so

(gdb) bt full
#0  0x0000594711951357 in main () at mmap_access_beyond.c:67
        i = 188416
        rv2 = 0
        rv3 = 256
        nonzerochars_seen = 139253
        size = 1
        addr = 0x7c30c96d1000 <error: Cannot access memory at address 0x7c30c96d1000>
        selfpid = 8367
        wstatus = -915747000
        cmd = 0x594711ef6260 "cat /proc/8367/maps >/tmp/_after_mmap"
        cmd_size = 100
        rv = 0
```
Comment 15 GYt2bW 2019-05-14 00:05:31 UTC
Created attachment 282747 [details]
mmap_access_beyond.c

For completeness,
here's the info for the 200MiB mmap:

with glibc 2.28.r0.g3c03baca37-1:
```
$ ./go
!! mmap ok 0x70a351e64000
./go: line 7:  8968 Segmentation fault      (core dumped) ./a.out > screen.out
-rw-r--r-- 1 user user 1806336 May 14 01:59 ./screen.out
--- /tmp/_before_mmap	2019-05-14 01:59:28.396984494 +0200
+++ /tmp/_after_mmap	2019-05-14 01:59:28.398984494 +0200
@@ -4,6 +4,7 @@
 5d7bceb0f000-5d7bceb10000 r--p 00002000 00:14 3611931                    /home/user/sandbox/c/mmap_symlink/a.out
 5d7bceb10000-5d7bceb11000 rw-p 00003000 00:14 3611931                    /home/user/sandbox/c/mmap_symlink/a.out
 5d7bcfad1000-5d7bcfaf2000 rw-p 00000000 00:00 0                          [heap]
+70a351e64000-70a35e664000 r--p 00000000 00:00 0 
 70a35e664000-70a35e689000 r--p 00000000 00:14 3610876                    /usr/lib/libc-2.28.so
 70a35e689000-70a35e7db000 r-xp 00025000 00:14 3610876                    /usr/lib/libc-2.28.so
 70a35e7db000-70a35e81e000 r--p 00177000 00:14 3610876                    /usr/lib/libc-2.28.so
-----------
user@i87k 2019/05/14 01:59:29 -bash5.0.7 t:7 j:0 d:3 pp:1054 p:7017 ut1628
!7963 15 0  5.1.1-gb724e9356404 #9 SMP Sun May 12 22:02:58 CEST 2019
/home/user/sandbox/c/mmap_symlink 
$ coredumpctl gdb
           PID: 8968 (a.out)
           UID: 1000 (user)
           GID: 1000 (user)
        Signal: 11 (SEGV)
     Timestamp: Tue 2019-05-14 01:59:28 CEST (3s ago)
  Command Line: ./a.out
    Executable: /home/user/sandbox/c/mmap_symlink/a.out
 Control Group: /user.slice/user-1000.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-1000.slice
       Session: 1
     Owner UID: 1000 (user)
       Boot ID: 97842479cc604964a7a58a10e00f9e15
    Machine ID: 5767ef25f523419aaa049f3d74481940
      Hostname: i87k
       Storage: /var/lib/systemd/coredump/core.a\x2eout.1000.97842479cc604964a7a58a10e00f9e15.8968.1557791968000000
       Message: Process 8968 (a.out) of user 1000 dumped core.
                
                Stack trace of thread 8968:
                #0  0x00005d7bceb0d357 n/a (/home/user/sandbox/c/mmap_symlink/a.out)
                #1  0x000070a35e68b43b __libc_start_main (libc.so.6)
                #2  0x00005d7bceb0d0fe n/a (/home/user/sandbox/c/mmap_symlink/a.out)

GNU gdb (GDB) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/user/sandbox/c/mmap_symlink/a.out...done.
[New LWP 8968]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005d7bceb0d357 in main () at mmap_access_beyond.c:67
67	      if (addr[i] != 0) {
(gdb) bt full
#0  0x00005d7bceb0d357 in main () at mmap_access_beyond.c:67
        i = 211525632
        rv2 = 0
        rv3 = 256
        nonzerochars_seen = 1466787
        size = 209715200
        addr = 0x70a351e64000 <error: Cannot access memory at address 0x70a351e64000>
        selfpid = 8968
        wstatus = 1585610600
        cmd = 0x5d7bcfad1260 "cat /proc/8968/maps >/tmp/_after_mmap"
        cmd_size = 100
        rv = 0
(gdb) quit
```

and with glibc 2.29.9000.r248.gf6efec90c8-1:

```
$ ./go
!! mmap ok 0x74139d730000
!! about to access addr at offset i=211529728 nonzerochars_seen='1473581'
./go: line 7:  8815 Segmentation fault      (core dumped) ./a.out > screen.out
-rw-r--r-- 1 user user 1810432 May 14 01:59 ./screen.out
--- /tmp/_before_mmap	2019-05-14 01:59:09.824984671 +0200
+++ /tmp/_after_mmap	2019-05-14 01:59:09.826984671 +0200
@@ -4,6 +4,7 @@
 600793f76000-600793f77000 r--p 00002000 00:14 3610153                    /home/user/sandbox/c/mmap_symlink/a.out
 600793f77000-600793f78000 rw-p 00003000 00:14 3610153                    /home/user/sandbox/c/mmap_symlink/a.out
 600794594000-6007945b5000 rw-p 00000000 00:00 0                          [heap]
+74139d730000-7413a9f30000 r--p 00000000 00:00 0 
 7413a9f30000-7413a9f55000 r--p 00000000 00:14 3609092                    /usr/lib/libc-2.29.9000.so
 7413a9f55000-7413aa0a8000 r-xp 00025000 00:14 3609092                    /usr/lib/libc-2.29.9000.so
 7413aa0a8000-7413aa0eb000 r--p 00178000 00:14 3609092                    /usr/lib/libc-2.29.9000.so
-----------
user@i87k 2019/05/14 01:59:10 -bash5.0.7 t:7 j:0 d:3 pp:1054 p:7017 ut1609
!7961 13 0  5.1.1-gb724e9356404 #9 SMP Sun May 12 22:02:58 CEST 2019
/home/user/sandbox/c/mmap_symlink 
$ coredumpctl gdb
           PID: 8815 (a.out)
           UID: 1000 (user)
           GID: 1000 (user)
        Signal: 11 (SEGV)
     Timestamp: Tue 2019-05-14 01:59:10 CEST (5s ago)
  Command Line: ./a.out
    Executable: /home/user/sandbox/c/mmap_symlink/a.out
 Control Group: /user.slice/user-1000.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-1000.slice
       Session: 1
     Owner UID: 1000 (user)
       Boot ID: 97842479cc604964a7a58a10e00f9e15
    Machine ID: 5767ef25f523419aaa049f3d74481940
      Hostname: i87k
       Storage: /var/lib/systemd/coredump/core.a\x2eout.1000.97842479cc604964a7a58a10e00f9e15.8815.1557791950000000
       Message: Process 8815 (a.out) of user 1000 dumped core.
                
                Stack trace of thread 8815:
                #0  0x0000600793f74357 n/a (/home/user/sandbox/c/mmap_symlink/a.out)
                #1  0x00007413a9f56feb __libc_start_main (libc.so.6)
                #2  0x0000600793f740fe n/a (/home/user/sandbox/c/mmap_symlink/a.out)

Reading symbols from /home/user/sandbox/c/mmap_symlink/a.out...done.
[New LWP 8815]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000600793f74357 in main () at mmap_access_beyond.c:67
67	      if (addr[i] != 0) {
(gdb) bt full
#0  0x0000600793f74357 in main () at mmap_access_beyond.c:67
        i = 211529728
        rv2 = 0
        rv3 = 256
        nonzerochars_seen = 1473581
        size = 209715200
        addr = 0x74139d730000 <error: Cannot access memory at address 0x74139d730000>
        selfpid = 8815
        wstatus = -1441837240
        cmd = 0x600794594260 "cat /proc/8815/maps >/tmp/_after_mmap"
        cmd_size = 100
        rv = 0
(gdb) quit

```

There's basically no difference (just one page) between the two:
the difference in `i=` is 211529728-211525632=4096, aka one page.

Here's the program I used in this and the previous comment:

```
// run like this: --rm ./a.out ; gcc -ggdb3 -O0 mmap_access_beyond.c && ./a.out >screen.out ; ls -la ./screen.out ; cat /tmp/_diff_mmap | colordiff


// https://bugzilla.kernel.org/show_bug.cgi?id=203537

#include <unistd.h> // for close() or sysconf()/_SC_PAGE_SIZE
#include <stdio.h> // for printf()

#define __USE_MISC 1 //to get MAP_FILE or MAP_ANONYMOUS
#include <sys/mman.h> // for mmap()

#include <stdlib.h> //for system()

#include <sys/wait.h> //for wait()

#define SMALL_MMAP 0 //set to 0 to use a 200MiB mmap or set to 1 to use a 1 byte mmap!

int main() {
  off_t size=
#if SMALL_MMAP==1
    1
    // a 1 byte mmap
#else
    200*1024*1024
    // a 200MiB mmap
#endif
  ;
  char *addr;
  int selfpid=getpid();

  int wstatus;
  char *cmd=NULL;
  const unsigned int cmd_size=100;
  cmd=malloc(cmd_size+1);
  snprintf(cmd, 1+cmd_size, "cat /proc/%d/maps >/tmp/_before_mmap", selfpid);
  int rv=system(cmd);
  wait(&wstatus);

  addr = mmap (NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); //same behaviour even without a file!
  if (addr != MAP_FAILED){
    snprintf(cmd, 1+cmd_size, "cat /proc/%d/maps >/tmp/_after_mmap", selfpid);
    int rv2=system(cmd);
    wait(&wstatus);
    int rv3=system("diff -up /tmp/_before_mmap /tmp/_after_mmap >/tmp/_diff_mmap 2>&1");
    wait(&wstatus);
    // /proc/self/maps idea from `valdis` on ##kernel freenode irc
    // on glibc 2.29 libc-2.29.so follows right after the above mmap, but it's read only! Still, it should SIGBUS as per `man 2/3p mmap`
    //fprintf(stderr,"!! colordiff rv= %d\n",rv3);
    fprintf(stderr,"!! mmap ok %p\n",addr);
//    fprintf(stderr,"!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
//    fprintf(stderr,"!! 2nd byte of mmap: %c\n", addr[1]);
//    fprintf(stderr,"!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
//    const unsigned int page_size=sysconf(_SC_PAGE_SIZE);//4096
//    fprintf(stderr,"!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1 
//    fprintf(stderr,"!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1 
//    fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
//    fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
//    fprintf(stderr,"!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
//    fprintf(stderr,"!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
//    fprintf(stderr,"!! PAGE_SIZE:%u\n", page_size); // 4096
//    //fprintf(stderr,"!! number of non-zero bytes beyond the end of mmaped-file: ");

    unsigned int nonzerochars_seen=0;
    for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) {
      if ( (i >= 211529728) || ((size == 1) && (i >= 188416)) ) {
        fprintf(stderr,"!! about to access addr at offset i=%u nonzerochars_seen='%u'\n", i, nonzerochars_seen);
      }
      if (addr[i] != 0) {
        nonzerochars_seen++;
      }
      if (nonzerochars_seen>0) { //print all after first non-zero which would be 'ELF'
        printf("%c", addr[i]);
        //XXX ^ crashes at i=211529728 when size == 200MiB, that's on accessing 443rd kernel page after the end of mmap-ed memory region
        //XXX ^ crashes at i = 188416 if size == 1 and screen.out has first 172,016 bytes identical with /lib64/ld-2.29.so which is indirectly listed in a.out's ldd(differently named symlinks to it)
      }
    }
    //fprintf(stderr,"%u\n", nonzerochars_seen);
    fprintf(stderr,"!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault
    munmap(addr, size);
  }
  return 0;
}
```

(also attached)
