Created attachment 282661: run ./go to reproduce the SIGBUS error

When mmap-ing a symlink (that points to a 0-byte file) and accessing the first byte of the resulting mmap address, SIGBUS is triggered.

But when mmap-ing a symlink (that points to a 1-byte file), it works even when accessing byte 0x2DFFF (it only segfaults at byte 0x2E000).

Tested on: `Linux Z575 5.1.0-ge93c9c99a629 #78 SMP PREEMPT Mon May 6 22:38:25 CEST 2019 x86_64 GNU/Linux`

Code that reproduces it (it's also inside the attachment!):
```
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>
#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>
#include <unistd.h> // close
#include <stdio.h> //printf
#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap
#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 1

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#ifdef CRASH
  file = open("./3/symlink_to_emptyfile", O_RDONLY
      // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else
  //don't crash:
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#endif
  if (file >= 0) {
    printf("!! open success\n");
    char *addr;
    addr = mmap (0, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      printf("!! mmap ok %p\n",addr);
      printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      printf("!! 2nd byte of mmap: %c\n", addr[1]);
      printf("!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      //printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
      munmap(addr, size);
    }
  } else {
    printf("!! open failed\n");
  }
  close(file);
  return 0;
}
```

```
$ ./go
!! open success
!! mmap ok 0x7fd18dc32000
./go: line 1: 3674 Bus error (core dumped) ./a.out
```

```
Core was generated by `./a.out'.
Program terminated with signal SIGBUS, Bus error.
#0 0x00005595bf25222a in main () at b.c:35
35 printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
(gdb) bt full
#0 0x00005595bf25222a in main () at b.c:35
        addr = 0x7fd18dc32000 <error: Cannot access memory at address 0x7fd18dc32000>
        file = 3
        size = 1
```
Able to reproduce this. This is not a bug within the kernel. Also, use NULL instead of 0 in the addr argument of mmap; it is more portable.

Case of one byte file:
When you create a mapping with mmap, the memory it covers is always a multiple of PAGE_SIZE. So even though you have specified size=1, the mapping will still be PAGE_SIZE long, which is 4 KiB in my case; run 'getconf PAGE_SIZE' in your shell to see your PAGE_SIZE. According to the man page: A file is mapped in multiples of the page size. For a file that is not a multiple of the page size, the remaining memory is zeroed when mapped. This is why you are also able to access addr[3] (even though the file size should be 2 for a single byte file; use stat on the file). 0xFFFF, 0x2CFFF and 0x2DFFF give me a seg fault, as my PAGE_SIZE is 4 KiB and it will seg fault if you access a memory address beyond the page. You can access up to addr[PAGE_SIZE - 1].

Case of zero byte file:
According to the man page, if a file of 0 bytes is mapped, mmap will not fail. It will return an address to you, but you cannot access it. Try using gdb to check it out: put a breakpoint on main and keep stepping through the program until you are past mmap. Try accessing addr and you'll get:
(gdb) print(addr)
$1 = 0x7ffff7ff6000 <error: Cannot access memory at address 0x7ffff7ff6000>
This means that no memory is backing the mapping of the file. According to the man page, SIGBUS is raised when you try to access a portion of the buffer that does not correspond to the file.

Hope this helps.
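The page-rounding rule quoted from the man page above can be sketched in a few lines of C. This is only a minimal sketch, not part of the attachment: the helper names `round_up_to_page` and `zero_filled_tail` are made up for illustration, and the page size is passed in explicitly (a real program would take it from `sysconf(_SC_PAGE_SIZE)`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the length a mapping of `len` bytes really covers,
 * i.e. `len` rounded up to a whole number of pages. */
static size_t round_up_to_page(size_t len, size_t page_size)
{
    /* Page sizes are powers of two, so the mask trick below is valid. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Hypothetical helper: number of zero-filled bytes between the end of the
 * file and the end of the file's last page. */
static size_t zero_filled_tail(size_t file_size, size_t page_size)
{
    return round_up_to_page(file_size, page_size) - file_size;
}
```

With a 4096-byte page, a 1-byte file gives a 4096-byte mapping with a 4095-byte zero tail, while a 0-byte file gives 0 for both, meaning no page of the mapping is backed by the file.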
Thank you for your input.

Kernel page size is 4096 bytes for me too. Also: `printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096`

> According to the man page, if a file of 0 byte is mapped, mmap will not fail. It will return to you an address but you cannot access it.

Could you paste the relevant section from the man page, please? The only* reason I created this issue was that I thought this wasn't documented, and admittedly I only skimmed the man page... (* OK, the other reason was: no segfault/SIGBUS when accessing beyond the expected size)

I guess it must be this part from `man 2 mmap`:
```
SIGBUS Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).
```
More specifically, this: "`including the case where another process has truncated the file`", which implies the truncation can happen at offset 0 too, yielding a 0-byte truncated file, and that is equivalent to mmap-ing a 0-byte file!

So this is indeed documented behaviour. My bad. And (re)thinking about it, it makes sense to me now how it works. Thanks, Bharath!

The other part of the issue is accessing memory beyond a non-zero-byte file (e.g. a symlink pointing to a 1-byte file). The OP test was done on an AMD CPU/laptop and I was amazed that it didn't SIGBUS (or even segfault)! The current test I'm doing is on an Intel i7-8700k CPU and it still acts the same; it only segfaults at:
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00005984278502e8 in main () at b.c:41
41 printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
```
so not at 0x2DFFF or any earlier offset, but according to what you said:
1. It should segfault at any offset equal to or above PAGE_SIZE (aka 4096 bytes), which would be anything at or above 0x1000, i.e. addr[0x1000].
2. And according to the above SIGBUS quote from `man 2 mmap`, it should SIGBUS instead of segfault, because I am attempting to access memory beyond the end of the file.

So, what could be the reasons it doesn't do 1&2? Any ideas?

Kernel I'm using now: Linux i87k 5.1.1-gb724e9356404 #3 SMP Sun May 12 12:12:10 CEST 2019 x86_64 GNU/Linux
from the stable repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.1.y
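As a way to reason about points 1 and 2, here is a small sketch of what an access at a given offset from the returned address should do, based on my reading of `man 2 mmap` rather than on the kernel source. The enum and the function `classify_access` are hypothetical names introduced only for this illustration, and the page size is assumed to be a power of two. The key assumption is that an offset past the page-rounded mapping length is simply not governed by this mapping: it faults only if nothing else happens to be mapped at that address.

```c
#include <assert.h>
#include <stddef.h>

enum access_kind {
    ACCESS_FILE_DATA,  /* the byte comes from the file */
    ACCESS_ZERO_FILL,  /* past EOF but inside the file's last page: reads 0 */
    ACCESS_SIGBUS,     /* inside the mapping, on a page wholly past EOF */
    ACCESS_OUTSIDE     /* past the mapping: depends on what else is mapped */
};

/* Hypothetical helper: classify a read at `offset` from the address that
 * mmap() returned for a file of `file_size` bytes mapped with `map_len`. */
static enum access_kind classify_access(size_t file_size, size_t map_len,
                                        size_t offset, size_t page_size)
{
    size_t mask, mapped_end, file_pages_end;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    mask = page_size - 1;
    mapped_end = (map_len + mask) & ~mask;       /* mapping, page-rounded */
    file_pages_end = (file_size + mask) & ~mask; /* file, page-rounded */

    if (offset >= mapped_end)
        return ACCESS_OUTSIDE;
    if (offset < file_size)
        return ACCESS_FILE_DATA;
    if (offset < file_pages_end)
        return ACCESS_ZERO_FILL;
    return ACCESS_SIGBUS;
}
```

For the 1-byte file mapped with len=1 and 4096-byte pages, offsets 1 to 0xFFF classify as zero fill and everything from 0x1000 upward as outside the mapping, so under these assumptions neither a SIGBUS nor a guaranteed segfault is expected there.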
Given the above, I've now changed the bug title from:
`mmap() on a symlink to 0 byte file returns a 0 byte mmap which crashes with SIGBUS error when accessed`
to:
`mmap() with len=1 on a symlink to 1 byte file returns a mmap which doesn't crash with SIGBUS(or segfault) when accessed from offset 0 to 0x2DFFF`

Here's my current repro code (i.e. modified file `b.c` from the attachment):
```
// https://bugzilla.kernel.org/show_bug.cgi?id=203537
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>
#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>
#include <unistd.h> // close
#include <stdio.h> //printf
#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap
#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 0
//^ set to 0 or undefine it to not crash!

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#if CRASH==1
  file = open("./3/symlink_to_emptyfile", O_RDONLY
      // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else
  //don't crash:
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#endif
  if (file >= 0) {
    printf("!! open success\n");
    char *addr;
    addr = mmap (NULL, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      printf("!! mmap ok %p\n",addr);
      printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      printf("!! 2nd byte of mmap: %c\n", addr[1]);
      printf("!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      long int page_size=sysconf(_SC_PAGE_SIZE);//4096
      printf("!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1
      printf("!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
      printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
      munmap(addr, size);
    }
  } else {
    printf("!! open failed\n");
  }
  close(file);
  return 0;
}
```

output:
```
$ ./go
!! open success
!! mmap ok 0x7cb61dee7000
!! 1st byte of mmap: 1
!! 2nd byte of mmap: 
!! 3nd byte of mmap: 
!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: 
!! 0x1000-th(PAGE_SIZE) byte of mmap: 
!! 0xFFFF-th byte of mmap: �
!! 0xFFFF-th byte of mmap: �
!! 0X2CFFF-th byte of mmap: 
!! 0x2DFFF-th byte of mmap: 
!! PAGE_SIZE:4096
./go: line 1: 2803 Segmentation fault (core dumped) ./a.out
```

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000057a00ef72393 in main () at b.c:48
48 printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
```

Tested on an Intel i7-8700k CPU now. (The OP test was on an AMD CPU; I don't remember the exact details right now.)
In other words, I can access memory beyond the end of the mmaped-file:
`!! number of non-zero bytes beyond the end of mmaped-file: 138902`

```
--- orig/b.c 2019-05-12 15:11:45.945937911 +0200
+++ mod/b.c 2019-05-12 15:10:32.005938616 +0200
@@ -45,6 +45,14 @@ int main() {
       printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
       printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
       printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
+      printf("!! number of non-zero bytes beyond the end of mmaped-file: ");
+      unsigned int count=0;
+      for (unsigned int i=1; i < 0x2E000; i++) {
+        if (addr[i] != 0) {
+          count++;
+        }
+      }
+      printf("%u\n", count);
       printf("!! 0x2E000-th byte of mmap: %c\n", addr[0x2E000]); // segfault
       munmap(addr, size);
     }
```
Dumping those memory contents to the terminal, it looks like the contents of `/usr/lib64/ld-linux-x86-64.so.2`, which is in the `ldd` list for `a.out`:

$ ldd ./a.out
linux-vdso.so.1 (0x00007ffd6bf1d000)
libc.so.6 => /usr/lib/libc.so.6 (0x0000797e613a6000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x0000797e61595000)

So I'm accessing memory from my own program? Is that why it didn't crash?
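One way to check that guess is to ask the kernel which mapping, if any, contains a given address, by scanning `/proc/self/maps`. The following is a rough sketch assuming Linux with procfs mounted; `find_mapping` is a hypothetical helper written for this illustration, not something from the attachment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: find the /proc/self/maps line whose address range
 * contains `p` and leave it in `line`.
 * Returns 1 if found, 0 if nothing is mapped there, -1 on error.
 * When 0 is returned, the contents of `line` are not meaningful. */
static int find_mapping(const void *p, char *line, size_t line_len)
{
    uintptr_t addr = (uintptr_t)p;
    int found = 0;
    FILE *maps;

    assert(line != NULL && line_len > 1);
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;
    while (fgets(line, (int)line_len, maps) != NULL) {
        uintmax_t start, end;

        /* Each entry starts with "start-end" in hexadecimal. */
        if (sscanf(line, "%jx-%jx", &start, &end) == 2 &&
            addr >= start && addr < end) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Calling it with `addr + page_size` right after the `mmap` call in `b.c` (with a buffer of, say, 512 bytes) should print the neighbouring mapping's line, for example the dynamic loader, if the guess above is right.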
Here is something interesting I found. I did an strace of the executable of your code on a 0-byte file. I got the usual SIGBUS, but here is something cool. These are the SIGBUS signal's attributes:

--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7f6ea5011000} ---
+++ killed by SIGBUS (core dumped) +++

Look at the si_code 'BUS_ADRERR'. BUS_ADRERR occurs when there is non-existent physical memory. Since the file is 0 bytes and has 0 data blocks, I think it has no physical address associated with its data. So accessing a file with no physical address will give a SIGBUS error. There is no physical address on disk which stores this file! I think when we access the file, a page fault will be triggered (as the kernel does lazy page allocation) and the kernel will try to make a page table entry for the virtual address (given by mmap) and a pfn to hold the page. When it needs to get the contents of the file, it will see that there is no physical address for the file and will terminate the process. I am not entirely sure, but it seems to make sense.

SIGSEGV, on the other hand, occurs when you access an address which you don't have any right to touch. So when you access an address beyond the file in the case of a non-zero-byte file, you do not have any permission to access the virtual address and you seg fault. These are the SIGSEGV signal's attributes:

--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fd5ef09afff} ---
+++ killed by SIGSEGV (core dumped) +++

SEGV_MAPERR occurs when the virtual address is not mapped to anything.

In your context, I think SIGBUS occurs because the file is 0 bytes and has no physical address associated with it. (WTH are we reading when we do addr[0] if there is no physical address associated with it?!)

Now why does it give a segfault when you access an address beyond PAGE_SIZE? My interpretation is as follows: essentially you are just accessing an address in the process address space. So when you go beyond the memory mapped for you by mmap, you are touching addresses which were not mapped for you, and it fails with si_code SEGV_MAPERR. It does not access a physical address here.

I am no memory management expert, but these are my interpretations. I think you should put this on the memory management mailing lists so that the pros can give their opinions.

PS: It segfaults for me after 0x1000 (PAGE_SIZE) and does not go until 0x2E000 :( I am not sure why you can access up to 0x2E000 :( What do you think?
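The signal and si_code can also be observed from inside the program instead of through strace. Below is a rough sketch, assuming Linux/POSIX and a writable /tmp; `probe_read` and `probe_temp_file` are hypothetical helpers invented for this illustration. It installs a temporary `SA_SIGINFO` handler, reads one byte, and uses `sigsetjmp`/`siglongjmp` to recover if the read faults. It is not thread-safe and is only meant for experiments like the one in this report.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_sig, probe_si_code;

static void probe_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    probe_sig = sig;
    probe_si_code = info->si_code;
    siglongjmp(probe_env, 1);          /* jump back into probe_read() */
}

/* Hypothetical helper: read one byte at `p`. Returns 0 if the read worked,
 * otherwise SIGBUS or SIGSEGV, with the si_code stored in *si_code_out. */
static int probe_read(const volatile char *p, int *si_code_out)
{
    struct sigaction sa, old_bus, old_segv;

    assert(si_code_out != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    probe_sig = 0;
    probe_si_code = 0;
    if (sigsetjmp(probe_env, 1) == 0)  /* 1: restore signal mask on jump */
        (void)*p;                      /* may fault */

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    *si_code_out = probe_si_code;
    return probe_sig;
}

/* Hypothetical demo: create an unlinked temp file of `file_size` bytes,
 * map `map_len` bytes of it and probe `offset`. The caller keeps `offset`
 * inside the page-rounded mapping. Returns -1 if the setup fails. */
static int probe_temp_file(size_t file_size, size_t map_len, size_t offset,
                           int *si_code_out)
{
    char path[] = "/tmp/mmap-probe-XXXXXX";
    int fd = mkstemp(path);
    int result = -1;

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)file_size) == 0) {
        char *addr = mmap(NULL, map_len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (addr != MAP_FAILED) {
            result = probe_read(addr + offset, si_code_out);
            munmap(addr, map_len);
        }
    }
    close(fd);
    return result;
}
```

On the systems discussed here I would expect `probe_temp_file(0, 1, 0, &code)` to report SIGBUS (the strace above suggests si_code BUS_ADRERR), and `probe_temp_file(1, 1, 1, &code)` to report 0 because of the zero fill.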
Created attachment 282735: my /proc/config.gz

It makes sense. I just wish it would be consistent with the manual, or have the manual changed to reflect that. But it's possible I misread it. Oh well :) I don't care about it anymore.

What I do care about, right now, is me not getting a segfault like you do. Bharath, may I please have your /proc/config.gz (aka .config) for the Linux kernel which you used when you got the segfault after 0x1000? I'm thinking that (something in the kernel config) must be the only thing I have in common on the two systems (laptop and desktop), since they both don't segfault until 0x2E000.

Thank you for your insights, Bharath. Having never used mmap before myself, I find I've learned quite a bit from what you wrote!

Any idea where the code for mmap() is in the kernel? I tried to find it, but it seems obscured or quite possibly under a different function name. I don't even know.
./arch/x86/mm/mmap.c
./mm/mmap.c
./include/linux/mm.h

> I think you should put this on the memory management mailing lists so that the pros can give their opinions.

PS: I'm avoiding the mailing lists: next thing you know I'm going to be asked to follow some obscure rules or sign off on patches using my real name or something, both of which are whatever-s-the-antonym-for-incentive =)

Oh look, I mmap-ed a 200MiB file (209715200 bytes of dd-ed /dev/zero into a file) and I can access data until at least 1806834 bytes after it ends. That's 1.7MiB more, of which 1470000 bytes are non-zero. And I see text from another program like:
%s%s[%d](full:'%s') for user %s(%d(eff:%s(%d))) 2of2 FAILed to resolve requested hostname '%s'(raw), which %s transformed(eg. 𝙵𝟢𝟢->f00) into:

Now I wonder what allows this! What kernel .config option that I have is really allowing this... since I can't think of anything else (can two systems, one AMD and one Intel, be so alike to both allow this, or is it likely some kernel .config option that I have in common for both the culprit... maybe I should try with ArchLinux's provided kernels... )

```
// https://bugzilla.kernel.org/show_bug.cgi?id=203537
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>
#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>
#include <unistd.h> // close
#include <stdio.h> //printf
#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap
#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 0
//^ set to 0 or undefine it to not crash!
#define USE_SYMLINK 0
#define USE_BIGFILE 1 //set to 0 to use 1 byte file when USE_SYMLINK is 0 (and CRASH is 0)

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#if CRASH==1
  file = open("./3/symlink_to_emptyfile", O_RDONLY
      // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else
  //don't crash:
#if USE_SYMLINK==1
  printf("!! symlink\n");
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#elif USE_BIGFILE==0
  printf("!! normal file\n");
  file = open("./1bytefile", O_RDONLY); //works!
#else
  printf("!! big file\n");
  file = open("./bigfile", O_RDONLY); // created via: dd if=/dev/zero of=bigfile bs=1M count=200
  size=200*1024*1024;
#endif
#endif
  if (file >= 0) {
    printf("!! open success\n");
    char *addr;
    addr = mmap (NULL, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      printf("!! mmap ok %p\n",addr);
      printf("!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      printf("!! 2nd byte of mmap: %c\n", addr[1]);
      printf("!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      long int page_size=sysconf(_SC_PAGE_SIZE);//4096
      printf("!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1
      printf("!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      printf("!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      printf("!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      printf("!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
      printf("!! number of non-zero bytes beyond the end of mmaped-file: ");
      unsigned int count=0;
      for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) {
        if (addr[i] != 0) {
          printf("%c", addr[i]);
          count++;
        }
        if ((count > 0) && (count % 10000 == 0)) {
          printf("!! i=%u count='%u'\n", i, count);
        }
      }
      printf("%u\n", count);
      printf("!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault
      munmap(addr, size);
    }
  } else {
    printf("!! open failed\n");
  }
  close(file);
  return 0;
}
```
I'm able to access memory beyond mmap-ed end of file with all kernels(listed below), so then it's likely not a .config thing(because only one is compiled by me), maybe a /proc/cmdline thing? ```local/linux 5.0.13.arch1-1 (base) The Linux kernel and modules local/linux-stable 5.1.1.r0.gb724e9356404-1 (builtbydaddy) The Linux kernel and modules (stable version) local/linux-lts 4.19.42-1 The Linux-lts kernel and modules Linux i87k 4.19.42-1-lts #1 SMP Sat May 11 06:58:51 CEST 2019 x86_64 GNU/Linux Linux i87k 5.0.13-arch1-1-ARCH #1 SMP PREEMPT Sun May 5 18:05:41 UTC 2019 x86_64 GNU/Linux Linux i87k 5.1.1-gb724e9356404 #9 SMP Sun May 12 22:02:58 CEST 2019 x86_64 GNU/Linux ``` This is my /proc/cmdline currently: ``` BOOT_IMAGE=/boot/vmlinuz-linux-stable root=UUID=2b8b9ab8-7ac5-4586-aa42-d7ffb12de92a rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 pax_sanitize_slab=full nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 noefi cpuidle.governor=teo zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold i915.alpha_support=1 i915.fastboot=1 ``` I don't currently have access to the AMD system but its /proc/cmdline was: ``` BOOT_IMAGE=/boot/vmlinuz-linux-git root=UUID=a499587c-99ef-4c93-9ea0-b61cb5f13193 rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 
ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 pax_sanitize_slab=full nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 acpi_backlight=vendor CPUunderclocking noefi tsc=unstable radeon.audio=0 radeon.lockup_timeout=999000 radeon.test=0 radeon.agpmode=-1 radeon.benchmark=0 radeon.tv=0 radeon.hard_reset=1 radeon.msi=1 radeon.pcie_gen2=-1 radeon.no_wb=1 radeon.dynclks=0 radeon.r4xx_atom=0 radeonfb radeon.fastfb=1 radeon.dpm=1 radeon.runpm=1 radeon.modeset=1 radeon.aspm=0 pcie_aspm=off rcu_nocbs=1-3 cpuidle.governor=teo zram.num_devices=3 zswap.enabled=0 zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold ``` wdiff shows the following in common: rw root_trim=yes rd.luks.allow-discards rd.luks.options=discard ipv6.disable=1 ipv6.disable_ipv6=1 ipv6.autoconf=0 loglevel=15 log_buf_len=16M ignore_loglevel printk.always_kmsg_dump=y printk.time=y printk.devkmsg=on mminit_loglevel=4 memory_corruption_check=1 fbcon=scrollback:4096k fbcon=font:ProFont6x11 net.ifnames=0 pax_sanitize_slab=full nolvm dobtrfs console=tty1 earlyprintk=vga audit=0 systemd.log_target=kmsg systemd.journald.forward_to_console=1 enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off logo.nologo lpj=0 mce=bootlog reboot=force,cold noexec=on nohibernate scsi_mod.use_blk_mq=1 consoleblank=120 noefi cpuidle.governor=teo zram.num_devices=3 zswap.enabled=0 
zswap.same_filled_pages_enabled=1 zswap.compressor=zstd zswap.max_pool_percent=40 zswap.zpool=z3fold

Removing some obvious ones which couldn't be causing this, we're left with:
memory_corruption_check=1 log_buf_len=16M fbcon=scrollback:4096k pax_sanitize_slab=full enforcing=0 udev.children-max=1256 rd.udev.children-max=1256 nohz=on oops=panic crashkernel=128M panic=0 psi=1 sysrq_always_enabled random.trust_cpu=off lpj=0 mce=bootlog noexec=on cpuidle.governor=teo zram.num_devices=3

Then there's:
```
/etc/modprobe.d $ cat *|grep -v ^#| sed -e '/^\s*$/d'
alias char-major-10-175 agpgart
alias char-major-10-200 tun
alias char-major-81 bttv
alias char-major-108 ppp_generic
alias /dev/ppp ppp_generic
alias tty-ldisc-3 ppp_async
alias tty-ldisc-14 ppp_synctty
alias ppp-compress-21 bsd_comp
alias ppp-compress-24 ppp_deflate
alias ppp-compress-26 ppp_deflate
alias loop-xfer-gen-0 loop_gen
alias loop-xfer-3 loop_fish2
alias loop-xfer-gen-10 loop_gen
alias cipher-2 des
alias cipher-3 fish2
alias cipher-4 blowfish
alias cipher-6 idea
alias cipher-7 serp6f
alias cipher-8 mars6
alias cipher-11 rc62
alias cipher-15 dfc2
alias cipher-16 rijndael
alias cipher-17 rc5
alias char-major-89 i2c-dev
blacklist ideapad_laptop
blacklist thinkpad_acpi
blacklist nvram
blacklist rfkill
blacklist led_class
blacklist fglrx
blacklist ipv6
install ipv6 /bin/true
blacklist nfs
blacklist pcspkr
blacklist uvcvideo
blacklist bluetooth
alias parport_lowlevel parport_pc
alias char-major-10-144 nvram
alias binfmt-0064 binfmt_aout
alias char-major-10-135 rtc
softdep uhci_hcd pre: ehci_hcd
softdep ohci_hcd pre: ehci_hcd
options zram num_devices=3
```
/etc/modules-load.d only has zram.

They seem harmless.

Back to /proc/cmdline, further reducing:
```
lpj=n [KNL]
    Sets loops_per_jiffy to given constant, thus avoiding time-consuming boot-time autodetection (up to 250 ms per CPU). 0 enables autodetection (default). To determine the correct value for your kernel, boot with normal autodetection and see what value is printed. Note that on SMP systems the preset will be applied to all CPUs, which is likely to cause problems if your CPUs need significantly divergent settings. An incorrect value will cause delays in the kernel to be wrong, leading to unpredictable I/O errors and other breakage. Although unlikely, in the extreme case this might damage your hardware.

mce=bootlog
    Enable logging of machine checks left over from booting. Disabled by default on AMD Fam10h and older because some BIOS leave bogus ones. If your BIOS doesn't do that it's a good idea to enable though to make sure you log even machine check events that result in a reboot. On Intel systems it is enabled by default.

2. fbcon=scrollback:<value>[k]
    The scrollback buffer is memory that is used to preserve display contents that has already scrolled past your view. This is accessed by using the Shift-PageUp key combination. The value 'value' is any integer. It defaults to 32KB. The 'k' suffix is optional, and will multiply the 'value' by 1024.

log_buf_len=n[KMG]
    Sets the size of the printk ring buffer, in bytes. n must be a power of two and greater than the minimal size. The minimal size is defined by LOG_BUF_SHIFT kernel config parameter. There is also CONFIG_LOG_CPU_MAX_BUF_SHIFT config parameter that allows to increase the default size depending on the number of CPUs. See init/Kconfig for more details.

memory_corruption_check=0/1 [X86]
    Some BIOSes seem to corrupt the first 64k of memory when doing things like suspend/resume. Setting this option will scan the memory looking for corruption. Enabling this will both detect corruption and prevent the kernel from using the memory being corrupted. However, its intended as a diagnostic tool; if repeatable BIOS-originated corruption always affects the same memory, you can use memmap= to prevent the kernel from using that memory.

enforcing [SELINUX]
    Set initial enforcing status.
    Format: {"0" | "1"}
    See security/selinux/Kconfig help text.
    0 -- permissive (log only, no denials).
    1 -- enforcing (deny and log).
    Default value is 0.
    Value can be changed at runtime via /selinux/enforce.

nohz= [KNL]
    Boottime enable/disable dynamic ticks
    Valid arguments: on, off
    Default: on

nohz_full= [KNL,BOOT,SMP,ISOL]
    The argument is a cpu list, as described above. In kernels built with CONFIG_NO_HZ_FULL=y, set the specified list of CPUs whose tick will be stopped whenever possible. The boot CPU will be forced outside the range to maintain the timekeeping. Any CPUs in this list will have their RCU callbacks offloaded, just as if they had also been called out in the rcu_nocbs= boot parameter.

oops=panic
    Always panic on oopses. Default is to just kill the process, but there is a small probability of deadlocking the machine. This will also cause panics on machine check exceptions. Useful together with panic=30 to trigger a reboot.

panic= [KNL]
    Kernel behaviour on panic: delay <timeout>
    timeout > 0: seconds before rebooting
    timeout = 0: wait forever
    timeout < 0: reboot immediately
    Format: <timeout>

psi= [KNL]
    Enable or disable pressure stall information tracking.
    Format: <bool>

sysrq_always_enabled [KNL]
    Ignore sysrq setting - this boot parameter will neutralize any effect of /proc/sys/kernel/sysrq. Useful for debugging.

random.trust_cpu={on,off} [KNL]
    Enable or disable trusting the use of the CPU's random number generator (if available) to fully seed the kernel's CRNG. Default is controlled by CONFIG_RANDOM_TRUST_CPU.

noexec [IA-64]

noexec [X86]
    On X86-32 available only on PAE configured kernels.
    noexec=on: enable non-executable mappings (default)
    noexec=off: disable non-executable mappings

cpuidle.governor= [CPU_IDLE]
    Name of the cpuidle governor to use.

crashkernel=size[KMG][@offset[KMG]] [KNL]
    Using kexec, Linux can switch to a 'crash kernel' upon panic. This parameter reserves the physical memory region [offset, offset + size] for that kernel image. If '@offset' is omitted, then a suitable offset is selected automatically. Check Documentation/kdump/kdump.txt for further details.
```

Can't find any info on 'pax_sanitize_slab' (tried `git log -S'pax_sanitize_slab'` and `git log -G"pax_sanitize_slab"` and `grep -nrIF pax_sanitize_slab` in the kernel source tree); guessing it must be a remnant of using Gentoo hardened kernels.

So what's left:
enforcing=0 cpuidle.governor=teo zram.num_devices=3

I will reboot each of the 3 kernels with those options removed and only report in a new comment if there's a change. (I'll also temporarily replace /tmp with tmpfs; it's currently zram.)
Re prev. comment: no change!

Now using the following program, I'm getting a segfault at byte 1814528 after the end of the file. Dumping the contents since the first non-zero byte shows some kind of ELF program. (Not attaching it since it might include some sensitive info, who knows, but it's not /lib64/ld-2.29.so.)

```
(gdb) bt full
#0 0x000063307f21f468 in main () at b.c:63
        i = 211529728
        page_size = 4096
        count = 1474334
        addr = 0x7f893f33d000 <error: Cannot access memory at address 0x7f893f33d000>
        file = 3
        size = 209715200
```

```
// https://bugzilla.kernel.org/show_bug.cgi?id=203537
// https://midnight-commander.org/ticket/3983#comment:7
#include <sys/types.h>
#include <sys/stat.h>
#define __USE_XOPEN2K8 1 //to get O_NOFOLLOW
#include <fcntl.h>
#include <unistd.h> // close
#include <stdio.h> //printf
#define __USE_MISC 1 //to get MAP_FILE
#include <sys/mman.h> //mmap
#include <stdlib.h>
#include <bits/mman-linux.h> // MAP_FILE
#include <string.h> // memcmp

#define CRASH 0
//^ set to 0 or undefine it to not crash!
#define USE_SYMLINK 0
#define USE_BIGFILE 1 //set to 0 to use 1 byte file when USE_SYMLINK is 0 (and CRASH is 0)

int main() {
  int file;
  off_t size=1; //should be size of the contents of the file that the symlink points to!
#if CRASH==1
  file = open("./3/symlink_to_emptyfile", O_RDONLY
      // | O_NOFOLLOW //open will fail since the file is a symlink! 12 bytes symlink (ie. "../emptyfile")
      );
#else
  //don't crash:
#if USE_SYMLINK==1
  fprintf(stderr,"!! symlink\n");
  file = open("./3/symlink_to_1bytefile", O_RDONLY); //works!
#elif USE_BIGFILE==0
  fprintf(stderr, "!! normal file\n");
  file = open("./1bytefile", O_RDONLY); //works!
#else
  fprintf(stderr,"!! big file\n");
  file = open("./bigfile", O_RDONLY); // created via: dd if=/dev/zero of=bigfile bs=1M count=200
  size=200*1024*1024;
#endif
#endif
  if (file >= 0) {
    fprintf(stderr,"!! open success\n");
    char *addr;
    addr = mmap (NULL, size, PROT_READ, MAP_FILE | MAP_PRIVATE, file, 0);
    if (addr != MAP_FAILED){
      fprintf(stderr,"!! mmap ok %p\n",addr);
      fprintf(stderr,"!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here!
      fprintf(stderr,"!! 2nd byte of mmap: %c\n", addr[1]);
      fprintf(stderr,"!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1
      long int page_size=sysconf(_SC_PAGE_SIZE);//4096
      fprintf(stderr,"!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1
      fprintf(stderr,"!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1
      fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1
      fprintf(stderr,"!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1
      fprintf(stderr,"!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1
      fprintf(stderr,"!! PAGE_SIZE:%ld\n", sysconf(_SC_PAGE_SIZE)); // 4096
      fprintf(stderr,"!! number of non-zero bytes beyond the end of mmaped-file: ");
      unsigned int count=0;
      for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) {
        if (addr[i] != 0) {
          count++;
        }
        if (count>0) { //print all after first non-zero which would be 'ELF'
          printf("%c", addr[i]); //XXX crashes at i=211529728, that's on accessing 443rd kernel page after the end of file
        }
        /* if ((count > 0) && (count % 10000 == 0)) {
          printf("!! i=%u count='%u'\n", i, count);
        }*/
      }
      fprintf(stderr,"%u\n", count);
      fprintf(stderr,"!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault
      munmap(addr, size);
    }
  } else {
    fprintf(stderr,"!! open failed\n");
  }
  close(file);
  return 0;
}
```

I run it with `./go`:
```
$ cat go
#!/bin/bash

rm ./a.out ; gcc -ggdb3 -O0 b.c && ./a.out >screen.out
```

Note: using `-O2` has exactly the same effect.
(only `addr` is different in `gdb`), and using `-g0` still yields the same 1,810,432-byte `screen.out` file. That's exactly 442 pages of PAGE_SIZE (4096 bytes), which means I get the segfault when trying to access the 443rd kernel page after the end of the file (443 = (i - size) / 4096 = (211529728 - 209715200) / 4096, with `i` seen in the gdb output and the file size also being 209715200 bytes, aka 200MiB). So this also means that there was one kernel page of zeroes after the file: 442 pages were saved to `screen.out`, thus 1 was skipped.

Oh crap, I've just realized something: I was looking for mmap in the kernel tree, but:

$ pacman -Qo /usr/include/sys/mman.h
/usr/include/sys/mman.h is owned by glibc 2.29-1

So MAYBE it's part of glibc? Or just the 'extern' declaration for it is there. Either way, it seems to point to an `mmap64`, which I definitely can't find in the kernel code (not for x86).

Yep, I found something in glibc: misc/mmap64.c, then I got lost in macros:
```
/* An architecture may override this. */
#ifndef MMAP_CALL
# define MMAP_CALL(__nr, __addr, __len, __prot, __flags, __fd, __offset) \
  INLINE_SYSCALL_CALL (__nr, __addr, __len, __prot, __flags, __fd, __offset)
#endif
```
I guess it still calls into something from the kernel, I just don't know the name of it. Oh wait, it's just `mmap`:
```
return (void *) MMAP_CALL (mmap, addr, len, prot, flags, fd,
                           MMAP_ADJUST_OFFSET (offset));
```
or rather: __INLINE_SYSCALL_mmap ?

Ahaaa, I have it: `arch/x86/kernel/sys_x86_64.c:91:SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,`

Oh yeah, this has gotta be it:
```
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
                unsigned long, prot, unsigned long, flags,
                unsigned long, fd, unsigned long, off)
{
        long error;
        error = -EINVAL;
        if (off & ~PAGE_MASK)
                goto out;

        error = ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
out:
        return error;
}
```
So `mmap` is basically `ksys_mmap_pgoff`, that's what I wanted to know!
which is defined in mm/mmap.c:
```
unsigned long ksys_mmap_pgoff(unsigned long addr, unsigned long len,
                              unsigned long prot, unsigned long flags,
                              unsigned long fd, unsigned long pgoff)
{
        struct file *file = NULL;
        unsigned long retval;

        if (!(flags & MAP_ANONYMOUS)) {
                audit_mmap_fd(fd, flags);
                file = fget(fd);
                if (!file)
                        return -EBADF;
                if (is_file_hugepages(file))
                        len = ALIGN(len, huge_page_size(hstate_file(file)));
                retval = -EINVAL;
                if (unlikely(flags & MAP_HUGETLB && !is_file_hugepages(file)))
                        goto out_fput;
        } else if (flags & MAP_HUGETLB) {
                struct user_struct *user = NULL;
                struct hstate *hs;

                hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
                if (!hs)
                        return -EINVAL;

                len = ALIGN(len, huge_page_size(hs));
                /*
                 * VM_NORESERVE is used because the reservations will be
                 * taken when vm_ops->mmap() is called
                 * A dummy user value is used because we are not locking
                 * memory so no accounting is necessary
                 */
                file = hugetlb_file_setup(HUGETLB_ANON_FILE, len,
                                VM_NORESERVE,
                                &user, HUGETLB_ANONHUGE_INODE,
                                (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
                if (IS_ERR(file))
                        return PTR_ERR(file);
        }

        flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);

        retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
out_fput:
        if (file)
                fput(file);
        return retval;
}
```
ok, I'll try to understand something from that mess! ("mess" to the untrained eye, that is)
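The `off & ~PAGE_MASK` check in the syscall entry quoted above should be visible from user space: an offset that is not a multiple of the page size is expected to fail with `EINVAL` before any mapping is attempted. A minimal sketch, assuming Linux with glibc; `mmap_errno_for_offset` is a hypothetical helper name, and glibc may reject the unaligned offset itself before the kernel ever sees it:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try a one-page anonymous mapping with the given
 * file offset and return 0 on success, or the errno that mmap() set. */
int mmap_errno_for_offset(off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, offset);
    if (p == MAP_FAILED)
        return errno;

    munmap(p, (size_t)page);
    return 0;
}
```

Calling `mmap_errno_for_offset(1)` should give `EINVAL`, while `mmap_errno_for_offset(0)` should succeed.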
I've modified the script to not mmap file because it works the same for MAP_ANONYMOUS: ``` $ cat go #!/bin/bash rm ./a.out ; gcc -ggdb3 -O0 mmap_access_beyond.c && ./a.out >screen.out ; ls -la ./screen.out $ cat ./mmap_access_beyond.c // https://bugzilla.kernel.org/show_bug.cgi?id=203537 #include <unistd.h> // for close() or sysconf()/_SC_PAGE_SIZE #include <stdio.h> //printf #define __USE_MISC 1 //to get MAP_FILE or MAP_ANONYMOUS #include <sys/mman.h> //mmap #define SMALL_MMAP 1 //set to 0 to use a 200MiB mmap or set to 1 to use a 1 byte mmap! int main() { off_t size= #if SMALL_MMAP==1 1 // a 1 byte mmap #else 200*1024*1024 // a 200MiB mmap #endif ; char *addr; addr = mmap (NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); //same behaviour even without a file! if (addr != MAP_FAILED){ fprintf(stderr,"!! mmap ok %p\n",addr); fprintf(stderr,"!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here! fprintf(stderr,"!! 2nd byte of mmap: %c\n", addr[1]); fprintf(stderr,"!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1 const unsigned int page_size=sysconf(_SC_PAGE_SIZE);//4096 fprintf(stderr,"!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1 fprintf(stderr,"!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1 fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1 fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1 fprintf(stderr,"!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1 fprintf(stderr,"!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1 fprintf(stderr,"!! PAGE_SIZE:%u\n", page_size); // 4096 //fprintf(stderr,"!! number of non-zero bytes beyond the end of mmaped-file: "); unsigned int nonzerochars_seen=0; for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) { if ( (i >= 211529728) || ((size == 1) && (i >= 188416)) ) { fprintf(stderr,"!! 
about to access addr at offset i=%u nonzerochars_seen='%u'\n", i, nonzerochars_seen); } if (addr[i] != 0) { nonzerochars_seen++; } if (nonzerochars_seen>0) { //print all after first non-zero which would be 'ELF' printf("%c", addr[i]); //XXX ^ crashes at i=211529728 when size == 200MiB, that's on accessing 443rd kernel page after the end of mmap-ed memory region //XXX ^ crases at i = 188416 if size == 1 and screen.out has first 172,016 bytes identical with /lib64/ld-2.29.so which is indirectly listed in a.out's ldd(differently named symlinks to it) } } //fprintf(stderr,"%u\n", nonzerochars_seen); fprintf(stderr,"!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault munmap(addr, size); } return 0; } ``` Run as `./go` outputs: for `SMALL_MMAP 1` ``` $ ./go !! mmap ok 0x7faf8b2b6000 !! 1st byte of mmap: !! 2nd byte of mmap: !! 3nd byte of mmap: !! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: !! 0x1000-th(PAGE_SIZE) byte of mmap: !! 0xFFFF-th byte of mmap: � !! 0xFFFF-th byte of mmap: � !! 0X2CFFF-th byte of mmap: !! 0x2DFFF-th byte of mmap: !! PAGE_SIZE:4096 !! about to access addr at offset i=188416 nonzerochars_seen='138902' ./go: line 7: 18175 Segmentation fault (core dumped) ./a.out > screen.out -rw-r--r-- 1 user user 180224 May 13 16:19 ./screen.out ``` for `SMALL_MMAP 0` ``` $ ./go !! mmap ok 0x71804d3e3000 !! 1st byte of mmap: !! 2nd byte of mmap: !! 3nd byte of mmap: !! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: !! 0x1000-th(PAGE_SIZE) byte of mmap: !! 0xFFFF-th byte of mmap: !! 0xFFFF-th byte of mmap: !! 0X2CFFF-th byte of mmap: !! 0x2DFFF-th byte of mmap: !! PAGE_SIZE:4096 !! about to access addr at offset i=211529728 nonzerochars_seen='1474334' ./go: line 7: 18323 Segmentation fault (core dumped) ./a.out > screen.out -rw-r--r-- 1 user user 1810432 May 13 16:20 ./screen.out ```
Created attachment 282739 [details] mmap_access_beyond.c
valdis on ##kernel (freenode irc) found the reason:

> <valdis> howaboutsynergy: SO... I check your program... gdb it. Single step, and catch /proc/PID/maps just before the mmap, and just after, and diff the two. And the segment actually allocated is:
> <valdis> 7ffff7fd3000-7ffff7ff3000 r-xp 00001000 fd:02 397289 /usr/lib64/ld-2.29.9000.so
> <valdis> 7ffff7ff3000-7ffff7ffb000 r--p 00021000 fd:02 397289 /usr/lib64/ld-2.29.9000.so
> <valdis> +7ffff7ffb000-7ffff7ffc000 r--p 00000000 00:00 0
> <valdis> 7ffff7ffc000-7ffff7ffd000 r--p 00029000 fd:02 397289 /usr/lib64/ld-2.29.9000.so
> <valdis> 7ffff7ffd000-7ffff7ffe000 rw-p 0002a000 fd:02 397289 /usr/lib64/ld-2.29.9000.so
> <valdis> You asked for 1 page, and the kernel found 1 page, right between two mmaps already existing. So you walk off the end of your 1 page mmap, but there's another page in the *next* mmap.
> <valdis> Which is why your "off the end" looks suspiciously like ld.so :)

And he seems to be running the same glibc 2.29 as I am.

Someone else (<ayecee> on ##linux freenode irc) found that on Ubuntu 16.04.6 with glibc 2.23, and on Ubuntu 18.04 with glibc 2.27, it segfaults as expected at addr[PAGE_SIZE] when size==1.

Meanwhile I'm attempting to reproduce valdis' results, so I'm in the process of finding out how =) (ie. searching gdb help)
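valdis' check (is there another mapping right after mine?) can also be done from inside the program, without gdb. A minimal sketch, assuming Linux with `/proc` mounted; `is_mapped` is a hypothetical helper name, and it only looks at the `start-end` field of each `/proc/self/maps` line:

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if addr lies inside any mapping listed in
 * /proc/self/maps, 0 if it does not, -1 if the file cannot be opened. */
int is_mapped(const void *addr)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    unsigned long a = (unsigned long)(uintptr_t)addr;
    unsigned long start, end;
    int found = 0;

    /* Each line begins with "start-end" in hex; the rest is skipped. */
    while (fscanf(f, "%lx-%lx", &start, &end) == 2) {
        if (a >= start && a < end) {
            found = 1;
            break;
        }
        int c;
        while ((c = fgetc(f)) != EOF && c != '\n')
            ;
    }

    fclose(f);
    return found;
}
```

With `addr` returned by a 1-byte `mmap`, `is_mapped(addr + page_size)` returning 1 would mean that reading past the end lands in a neighbouring mapping instead of faulting.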
Alrightie then :D

Using valdis' ideas (also `'gdb ./a.out', then 'break main', *then* 'run'. Stops at beginning of main(), and then you use step from there. :` ) I found that with `SMALL_MMAP 1` I get:
```
 7ffff7fa0000-7ffff7fa3000 r--p 001bb000 00:14 365426 /usr/lib/libc-2.29.so
 7ffff7fa3000-7ffff7fa6000 rw-p 001be000 00:14 365426 /usr/lib/libc-2.29.so
 7ffff7fa6000-7ffff7fac000 rw-p 00000000 00:00 0
+7ffff7fcd000-7ffff7fce000 r--p 00000000 00:00 0
 7ffff7fce000-7ffff7fd1000 r--p 00000000 00:00 0 [vvar]
 7ffff7fd1000-7ffff7fd2000 r-xp 00000000 00:00 0 [vdso]
 7ffff7fd2000-7ffff7fd4000 r--p 00000000 00:14 365415 /usr/lib/ld-2.29.so
```
and with `SMALL_MMAP 0` (aka a 200MiB mmap), I get:
```
$ colordiff -up /tmp/e_before /tmp/e_after
--- /tmp/e_before 2019-05-13 17:34:19.059892668 +0200
+++ /tmp/e_after 2019-05-13 17:35:13.831892145 +0200
@@ -3,6 +3,7 @@
 555555556000-555555557000 r--p 00002000 00:14 3107935 /home/user/sandbox/c/mmap_symlink/a.out
 555555557000-555555558000 r--p 00002000 00:14 3107935 /home/user/sandbox/c/mmap_symlink/a.out
 555555558000-555555559000 rw-p 00003000 00:14 3107935 /home/user/sandbox/c/mmap_symlink/a.out
+7fffeb5e4000-7ffff7de4000 r--p 00000000 00:00 0
 7ffff7de4000-7ffff7e09000 r--p 00000000 00:14 365426 /usr/lib/libc-2.29.so
 7ffff7e09000-7ffff7f5c000 r-xp 00025000 00:14 365426 /usr/lib/libc-2.29.so
 7ffff7f5c000-7ffff7f9f000 r--p 00178000 00:14 365426 /usr/lib/libc-2.29.so
```

> <howaboutsynergy> oh yeah it workz! so cool! So somehow even after mmap-ing 200MiB, something put 3 /usr/lib/libc-2.29.so mappings right after it
> <valdis> You have that backwards. libc-2.29.so got mapped there, and then the kernel mapped your mmap() call right up against it.
> * valdis wonders how hard it would be to add an "allocate non-accessible guard page at both ends" flag to mmap().
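There is no such flag as far as this thread knows, but the guard-page idea can be approximated in user space: reserve the region plus one extra page on each side as `PROT_NONE`, then make only the middle readable. A minimal sketch, assuming Linux; `map_with_guards` and `unmap_with_guards` are hypothetical names, not an existing API:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round len up to a whole number of pages. */
static size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}

/* Hypothetical helper: anonymous read-only mapping of len bytes with one
 * inaccessible (PROT_NONE) page directly before and after it.
 * Returns NULL on failure. */
char *map_with_guards(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0 && len > 0);
    size_t page = (size_t)ps;
    size_t body = round_up_to_page(len, page);

    /* Reserve guard + body + guard with no access rights at all. */
    char *base = mmap(NULL, body + 2 * page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Open up only the middle part; the guards stay PROT_NONE. */
    if (mprotect(base + page, body, PROT_READ) != 0) {
        munmap(base, body + 2 * page);
        return NULL;
    }
    return base + page;
}

/* Release a mapping obtained from map_with_guards(len). */
int unmap_with_guards(char *p, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0 && len > 0);
    size_t page = (size_t)ps;
    return munmap(p - page, round_up_to_page(len, page) + 2 * page);
}
```

Reads inside the rounded-up last page still succeed, as with plain `mmap`; only the page right before and the page right after the region should fault, no matter which library happens to be mapped next to it.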
`man 2 mmap`
> SIGBUS Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).

`man mmap` (ie. `man 3p mmap`)
> The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.

> An implementation may generate SIGBUS signals when a reference would cause an error in the mapped object, such as out-of-space condition.
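The SIGBUS case from the man pages is the one the original report hit with the empty file. A minimal sketch of a probe for it, assuming Linux and a writable `/tmp`; `signal_reading_file_of_size` is a hypothetical helper that maps a temporary file of the given size, reads byte 0, and reports which signal (if any) the read raised:

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf probe_env;

/* Jump back to the probe, passing the signal number along. */
static void on_fault(int sig)
{
    siglongjmp(probe_env, sig);
}

/* Hypothetical helper: returns 0 if reading byte 0 of the mapping works,
 * the signal number (SIGBUS or SIGSEGV) if the read faults,
 * or -1 if the setup itself fails. */
int signal_reading_file_of_size(off_t size)
{
    char path[] = "/tmp/mmap_probe_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }

    volatile char *addr = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return -1;

    struct sigaction sa, old_bus, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    /* sigsetjmp returns 0 first, then the signal number after a fault. */
    int sig = sigsetjmp(probe_env, 1);
    if (sig == 0)
        (void)addr[0];

    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap((void *)addr, 1);
    return sig;
}
```

On Linux I would expect `SIGBUS` for a size of 0 (the whole page lies beyond the end of the file) and 0 for a size of 1 (the partial page is zero-filled), matching the man page text above.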
closing as not-a-kernel-bug, because glibc 2.28 seems to act differently/better with this: it segfaults sooner, at page 4 beyond mmap, ``` $ ./go !! mmap ok 0x7556715a2000 ./go: line 7: 7991 Segmentation fault (core dumped) ./a.out > screen.out -rw-r--r-- 1 user user 8192 May 14 01:51 ./screen.out --- /tmp/_before_mmap 2019-05-14 01:51:14.724989202 +0200 +++ /tmp/_after_mmap 2019-05-14 01:51:14.727989202 +0200 @@ -14,6 +14,7 @@ 755671579000-75567157b000 r--p 00000000 00:14 3607274 /usr/lib/ld-2.28.so 75567157b000-75567159b000 r-xp 00002000 00:14 3607274 /usr/lib/ld-2.28.so 75567159b000-7556715a2000 r--p 00022000 00:14 3607274 /usr/lib/ld-2.28.so +7556715a2000-7556715a3000 r--p 00000000 00:00 0 7556715a3000-7556715a4000 r--p 00029000 00:14 3607274 /usr/lib/ld-2.28.so 7556715a4000-7556715a5000 rw-p 0002a000 00:14 3607274 /usr/lib/ld-2.28.so 7556715a5000-7556715a6000 rw-p 00000000 00:00 0 (gdb) bt full #0 0x0000597354649357 in main () at mmap_access_beyond.c:67 i = 16384 rv2 = 0 rv3 = 256 nonzerochars_seen = 1976 size = 1 addr = 0x7556715a2000 <error: Cannot access memory at address 0x7556715a2000> selfpid = 7991 wstatus = 1901412200 cmd = 0x597355c14260 "cat /proc/7991/maps >/tmp/_after_mmap" cmd_size = 100 rv = 0 ``` instead of at page 46 with glibc 2.29: ``` $ ./go !! mmap ok 0x7c30c96d1000 !! 
about to access addr at offset i=188416 nonzerochars_seen='139253' ./go: line 7: 8367 Segmentation fault (core dumped) ./a.out > screen.out -rw-r--r-- 1 user user 180224 May 14 01:54 ./screen.out --- /tmp/_before_mmap 2019-05-14 01:54:19.728987437 +0200 +++ /tmp/_after_mmap 2019-05-14 01:54:19.730987437 +0200 @@ -11,6 +11,7 @@ 7c30c96a4000-7c30c96a7000 r--p 001bb000 00:14 3609092 /usr/lib/libc-2.29.9000.so 7c30c96a7000-7c30c96aa000 rw-p 001be000 00:14 3609092 /usr/lib/libc-2.29.9000.so 7c30c96aa000-7c30c96b0000 rw-p 00000000 00:00 0 +7c30c96d1000-7c30c96d2000 r--p 00000000 00:00 0 7c30c96d2000-7c30c96d4000 r--p 00000000 00:14 3609081 /usr/lib/ld-2.29.9000.so 7c30c96d4000-7c30c96f4000 r-xp 00002000 00:14 3609081 /usr/lib/ld-2.29.9000.so 7c30c96f4000-7c30c96fc000 r--p 00022000 00:14 3609081 /usr/lib/ld-2.29.9000.so (gdb) bt full #0 0x0000594711951357 in main () at mmap_access_beyond.c:67 i = 188416 rv2 = 0 rv3 = 256 nonzerochars_seen = 139253 size = 1 addr = 0x7c30c96d1000 <error: Cannot access memory at address 0x7c30c96d1000> selfpid = 8367 wstatus = -915747000 cmd = 0x594711ef6260 "cat /proc/8367/maps >/tmp/_after_mmap" cmd_size = 100 rv = 0 ```
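The page numbers quoted in this comment come from dividing the faulting offset `i` (from the gdb output) by the page size. A small sketch of that arithmetic, assuming 4096-byte pages as reported by `sysconf(_SC_PAGE_SIZE)` earlier; `page_index` is a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical helper: index of the page (counted from the start of the
 * mapping, first page = 0) that contains the given byte offset. */
unsigned long page_index(unsigned long offset, unsigned long page_size)
{
    assert(page_size > 0);
    return offset / page_size;
}
```

`page_index(16384, 4096)` is 4 (glibc 2.28) and `page_index(188416, 4096)` is 46 (glibc 2.29). For the 200MiB mapping, `page_index(211529728, 4096) - page_index(209715200, 4096)` is 443.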
Created attachment 282747 [details] mmap_access_beyond.c for completion, Here's info for the 200MiB mmap: with glibc 2.28.r0.g3c03baca37-1: ``` $ ./go !! mmap ok 0x70a351e64000 ./go: line 7: 8968 Segmentation fault (core dumped) ./a.out > screen.out -rw-r--r-- 1 user user 1806336 May 14 01:59 ./screen.out --- /tmp/_before_mmap 2019-05-14 01:59:28.396984494 +0200 +++ /tmp/_after_mmap 2019-05-14 01:59:28.398984494 +0200 @@ -4,6 +4,7 @@ 5d7bceb0f000-5d7bceb10000 r--p 00002000 00:14 3611931 /home/user/sandbox/c/mmap_symlink/a.out 5d7bceb10000-5d7bceb11000 rw-p 00003000 00:14 3611931 /home/user/sandbox/c/mmap_symlink/a.out 5d7bcfad1000-5d7bcfaf2000 rw-p 00000000 00:00 0 [heap] +70a351e64000-70a35e664000 r--p 00000000 00:00 0 70a35e664000-70a35e689000 r--p 00000000 00:14 3610876 /usr/lib/libc-2.28.so 70a35e689000-70a35e7db000 r-xp 00025000 00:14 3610876 /usr/lib/libc-2.28.so 70a35e7db000-70a35e81e000 r--p 00177000 00:14 3610876 /usr/lib/libc-2.28.so ----------- user@i87k 2019/05/14 01:59:29 -bash5.0.7 t:7 j:0 d:3 pp:1054 p:7017 ut1628 !7963 15 0 5.1.1-gb724e9356404 #9 SMP Sun May 12 22:02:58 CEST 2019 /home/user/sandbox/c/mmap_symlink $ coredumpctl gdb PID: 8968 (a.out) UID: 1000 (user) GID: 1000 (user) Signal: 11 (SEGV) Timestamp: Tue 2019-05-14 01:59:28 CEST (3s ago) Command Line: ./a.out Executable: /home/user/sandbox/c/mmap_symlink/a.out Control Group: /user.slice/user-1000.slice/session-1.scope Unit: session-1.scope Slice: user-1000.slice Session: 1 Owner UID: 1000 (user) Boot ID: 97842479cc604964a7a58a10e00f9e15 Machine ID: 5767ef25f523419aaa049f3d74481940 Hostname: i87k Storage: /var/lib/systemd/coredump/core.a\x2eout.1000.97842479cc604964a7a58a10e00f9e15.8968.1557791968000000 Message: Process 8968 (a.out) of user 1000 dumped core. 
Stack trace of thread 8968: #0 0x00005d7bceb0d357 n/a (/home/user/sandbox/c/mmap_symlink/a.out) #1 0x000070a35e68b43b __libc_start_main (libc.so.6) #2 0x00005d7bceb0d0fe n/a (/home/user/sandbox/c/mmap_symlink/a.out) GNU gdb (GDB) 8.2.1 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /home/user/sandbox/c/mmap_symlink/a.out...done. [New LWP 8968] Core was generated by `./a.out'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00005d7bceb0d357 in main () at mmap_access_beyond.c:67 67 if (addr[i] != 0) { (gdb) bt full #0 0x00005d7bceb0d357 in main () at mmap_access_beyond.c:67 i = 211525632 rv2 = 0 rv3 = 256 nonzerochars_seen = 1466787 size = 209715200 addr = 0x70a351e64000 <error: Cannot access memory at address 0x70a351e64000> selfpid = 8968 wstatus = 1585610600 cmd = 0x5d7bcfad1260 "cat /proc/8968/maps >/tmp/_after_mmap" cmd_size = 100 rv = 0 (gdb) quit ``` and with glibc 2.29.9000.r248.gf6efec90c8-1: ``` $ ./go !! mmap ok 0x74139d730000 !! 
about to access addr at offset i=211529728 nonzerochars_seen='1473581' ./go: line 7: 8815 Segmentation fault (core dumped) ./a.out > screen.out -rw-r--r-- 1 user user 1810432 May 14 01:59 ./screen.out --- /tmp/_before_mmap 2019-05-14 01:59:09.824984671 +0200 +++ /tmp/_after_mmap 2019-05-14 01:59:09.826984671 +0200 @@ -4,6 +4,7 @@ 600793f76000-600793f77000 r--p 00002000 00:14 3610153 /home/user/sandbox/c/mmap_symlink/a.out 600793f77000-600793f78000 rw-p 00003000 00:14 3610153 /home/user/sandbox/c/mmap_symlink/a.out 600794594000-6007945b5000 rw-p 00000000 00:00 0 [heap] +74139d730000-7413a9f30000 r--p 00000000 00:00 0 7413a9f30000-7413a9f55000 r--p 00000000 00:14 3609092 /usr/lib/libc-2.29.9000.so 7413a9f55000-7413aa0a8000 r-xp 00025000 00:14 3609092 /usr/lib/libc-2.29.9000.so 7413aa0a8000-7413aa0eb000 r--p 00178000 00:14 3609092 /usr/lib/libc-2.29.9000.so ----------- user@i87k 2019/05/14 01:59:10 -bash5.0.7 t:7 j:0 d:3 pp:1054 p:7017 ut1609 !7961 13 0 5.1.1-gb724e9356404 #9 SMP Sun May 12 22:02:58 CEST 2019 /home/user/sandbox/c/mmap_symlink $ coredumpctl gdb PID: 8815 (a.out) UID: 1000 (user) GID: 1000 (user) Signal: 11 (SEGV) Timestamp: Tue 2019-05-14 01:59:10 CEST (5s ago) Command Line: ./a.out Executable: /home/user/sandbox/c/mmap_symlink/a.out Control Group: /user.slice/user-1000.slice/session-1.scope Unit: session-1.scope Slice: user-1000.slice Session: 1 Owner UID: 1000 (user) Boot ID: 97842479cc604964a7a58a10e00f9e15 Machine ID: 5767ef25f523419aaa049f3d74481940 Hostname: i87k Storage: /var/lib/systemd/coredump/core.a\x2eout.1000.97842479cc604964a7a58a10e00f9e15.8815.1557791950000000 Message: Process 8815 (a.out) of user 1000 dumped core. Stack trace of thread 8815: #0 0x0000600793f74357 n/a (/home/user/sandbox/c/mmap_symlink/a.out) #1 0x00007413a9f56feb __libc_start_main (libc.so.6) #2 0x0000600793f740fe n/a (/home/user/sandbox/c/mmap_symlink/a.out) GNU gdb (GDB) 8.2.1 Copyright (C) 2018 Free Software Foundation, Inc. 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /home/user/sandbox/c/mmap_symlink/a.out...done. [New LWP 8815] Core was generated by `./a.out'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000600793f74357 in main () at mmap_access_beyond.c:67 67 if (addr[i] != 0) { (gdb) bt full #0 0x0000600793f74357 in main () at mmap_access_beyond.c:67 i = 211529728 rv2 = 0 rv3 = 256 nonzerochars_seen = 1473581 size = 209715200 addr = 0x74139d730000 <error: Cannot access memory at address 0x74139d730000> selfpid = 8815 wstatus = -1441837240 cmd = 0x600794594260 "cat /proc/8815/maps >/tmp/_after_mmap" cmd_size = 100 rv = 0 (gdb) quit ``` there's basically no difference (just one page) between the two: diff for `i=` is 211529728-211525632=4096 aka one page. Here's the script I used in this and prev. comment: ``` // run like this: --rm ./a.out ; gcc -ggdb3 -O0 mmap_access_beyond.c && ./a.out >screen.out ; ls -la ./screen.out ; cat /tmp/_diff_mmap | colordiff // https://bugzilla.kernel.org/show_bug.cgi?id=203537 #include <unistd.h> // for close() or sysconf()/_SC_PAGE_SIZE #include <stdio.h> // for printf() #define __USE_MISC 1 //to get MAP_FILE or MAP_ANONYMOUS #include <sys/mman.h> // for mmap() #include <stdlib.h> //for system() #include <sys/wait.h> //for wait() #define SMALL_MMAP 0 //set to 0 to use a 200MiB mmap or set to 1 to use a 1 byte mmap! 
int main() { off_t size= #if SMALL_MMAP==1 1 // a 1 byte mmap #else 200*1024*1024 // a 200MiB mmap #endif ; char *addr; int selfpid=getpid(); int wstatus; char *cmd=NULL; const unsigned int cmd_size=100; cmd=malloc(cmd_size+1); snprintf(cmd, 1+cmd_size, "cat /proc/%d/maps >/tmp/_before_mmap", selfpid); int rv=system(cmd); wait(&wstatus); addr = mmap (NULL, size, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); //same behaviour even without a file! if (addr != MAP_FAILED){ snprintf(cmd, 1+cmd_size, "cat /proc/%d/maps >/tmp/_after_mmap", selfpid); int rv2=system(cmd); wait(&wstatus); int rv3=system("diff -up /tmp/_before_mmap /tmp/_after_mmap >/tmp/_diff_mmap 2>&1"); wait(&wstatus); // /proc/self/maps idea from `valdis` on ##kernel freenode irc // on glibc 2.29 libc-2.29.so follows right after the above mmap, but it's read only! Still, it should SIGBUS as per `man 2/3p mmap` //fprintf(stderr,"!! colordiff rv= %d\n",rv3); fprintf(stderr,"!! mmap ok %p\n",addr); // fprintf(stderr,"!! 1st byte of mmap: %c\n", addr[0]);// SIGBUS error here! // fprintf(stderr,"!! 2nd byte of mmap: %c\n", addr[1]); // fprintf(stderr,"!! 3nd byte of mmap: %c\n", addr[2]); //works even if size=1 // const unsigned int page_size=sysconf(_SC_PAGE_SIZE);//4096 // fprintf(stderr,"!! 0x0FFF-th(PAGE_SIZE-1) byte of mmap: %c\n", addr[page_size-1]); //works even if size=1 // fprintf(stderr,"!! 0x1000-th(PAGE_SIZE) byte of mmap: %c\n", addr[page_size]); //works even if size=1 // fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1 // fprintf(stderr,"!! 0xFFFF-th byte of mmap: %c\n", addr[0xFFFF]); //works even if size=1 // fprintf(stderr,"!! 0X2CFFF-th byte of mmap: %c\n", addr[0x2CFFF]); //works even if size=1 // fprintf(stderr,"!! 0x2DFFF-th byte of mmap: %c\n", addr[0x2DFFF]); // works even if size=1 // fprintf(stderr,"!! PAGE_SIZE:%u\n", page_size); // 4096 // //fprintf(stderr,"!! 
number of non-zero bytes beyond the end of mmaped-file: "); unsigned int nonzerochars_seen=0; for (unsigned int i=1; i < size+1551160+0x2E000*2; i++) { if ( (i >= 211529728) || ((size == 1) && (i >= 188416)) ) { fprintf(stderr,"!! about to access addr at offset i=%u nonzerochars_seen='%u'\n", i, nonzerochars_seen); } if (addr[i] != 0) { nonzerochars_seen++; } if (nonzerochars_seen>0) { //print all after first non-zero which would be 'ELF' printf("%c", addr[i]); //XXX ^ crashes at i=211529728 when size == 200MiB, that's on accessing 443rd kernel page after the end of mmap-ed memory region //XXX ^ crashes at i = 188416 if size == 1 and screen.out has first 172,016 bytes identical with /lib64/ld-2.29.so which is indirectly listed in a.out's ldd(differently named symlinks to it) } } //fprintf(stderr,"%u\n", nonzerochars_seen); fprintf(stderr,"!! 0x2E000-th byte of mmap: %c\n", addr[size+0x2E000-1]); // segfault munmap(addr, size); } return 0; } ``` (also attached)
https://sourceware.org/bugzilla/show_bug.cgi?id=24551