Bug 215720

Summary: brk() regression on AArch64 on static-pie binary -- issue with ASLR and a guard page?
Product: Memory Management
Component: Other
Reporter: Victor Stinner (vstinner)
Assignee: Andrew Morton (akpm)
Status: NEW
Severity: normal
Priority: P1
CC: dominik, fweimer
Hardware: All
OS: Linux
Kernel Version: 5.17.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments: empty.c reproducer

Description Victor Stinner 2022-03-22 02:24:57 UTC
Created attachment 300597 [details]
empty.c reproducer

I found a brk() syscall regression in Linux kernel 5.17 on AArch64.

A git bisect identified the change "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" (commit 9630f0d60fec5fbcaa4435a66f75df1dc9704b66), a change related to bz#215275.

Program to reproduce the bug, empty.c (attached to the issue):
---
_Thread_local int var1 = 0;
int main() {
    volatile int x = 1;
    var1 = x;
    return 0;
}
---

Build the program as a static PIE program:

    gcc -std=c11 -static-pie -g empty.c -o empty -O2

The program fails randomly; it takes 100 to 6000 runs to reproduce the crash.

Short shell loop to reproduce the crash:
---
$ i=0; while true; do ./empty; rc=$?; i=$(($i + 1)); echo "$i: $(date): $rc"; if [ $rc -ne 0 ]; then break; fi; done
(...)
159: Tue Mar 22 01:54:22 CET 2022: 0
160: Tue Mar 22 01:54:22 CET 2022: 0
Segmentation fault (core dumped)
161: Tue Mar 22 01:54:22 CET 2022: 139
---

Disabling ASLR (write 0 to /proc/sys/kernel/randomize_va_space) works
around the bug.

Instead of the "empty.c" program, the "ldconfig -V > /dev/null" command can be used as a reproducer: ldconfig is a standard static-pie program.

strace when the program works:
---
brk(NULL)                               = 0xaaaac3961000
brk(0xaaaac3961b78)                     = 0xaaaac3961b78
---

strace when the bug occurs:
---
brk(NULL)                               = 0xaaaabf3c3000
brk(0xaaaabf3c3b78)                     = 0xaaaabf3c3000
---
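
To make the two traces easier to compare, here is a minimal sketch of how the raw brk return value can be interpreted. The helper names brk_request_honored() and brk_growth() are hypothetical and are not part of the kernel or glibc. The sketch assumes the usual Linux convention that the raw syscall returns the break actually in effect rather than -1.

```c
#include <stdbool.h>
#include <stdint.h>

/* The raw Linux brk syscall does not return -1 on failure: it returns the
 * break that is actually in effect. The request was honored only if that
 * value is at least the requested address. */
bool brk_request_honored(uint64_t requested, uint64_t returned)
{
    return returned >= requested;
}

/* Size of the attempted heap growth, in bytes. */
uint64_t brk_growth(uint64_t initial_break, uint64_t requested)
{
    return requested - initial_break;
}
```

With the addresses from the failing trace, the request is not honored, and the attempted growth is 0xb78 = 2936 bytes.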

The following test of the brk() syscall fails when the bug occurs:
---
	/* Check against existing mmap mappings. */
	next = find_vma(mm, oldbrk);
	if (next && newbrk + PAGE_SIZE > vm_start_gap(next))
		goto out;
---
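
The arithmetic of this check can be sketched in userspace as follows. This is a simplified model, not the kernel code: page_align_up() and brk_rejected() are hypothetical names, next_start_gap stands in for vm_start_gap(next), and the page size is a parameter because the report does not state it. It assumes that newbrk is the requested break rounded up to a page boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of page_size (a power of two). */
uint64_t page_align_up(uint64_t addr, uint64_t page_size)
{
    return (addr + page_size - 1) & ~(page_size - 1);
}

/* Model of the quoted test. next_start_gap stands in for vm_start_gap(next);
 * pass 0 when there is no mapping above the heap (next == NULL). */
bool brk_rejected(uint64_t requested_brk, uint64_t page_size,
                  uint64_t next_start_gap)
{
    uint64_t newbrk = page_align_up(requested_brk, page_size);
    return next_start_gap != 0 && newbrk + page_size > next_start_gap;
}
```

For example, with 4 KiB pages (an assumption), the failing request 0xaaaabf3c3b78 rounds up to 0xaaaabf3c4000, so the request is rejected if another mapping starts anywhere below 0xaaaabf3c5000. In this model, the extra PAGE_SIZE term means the heap must stay at least one page away from the next mapping.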

Note: When the bug occurs, the program crashes with SIGSEGV: the glibc __libc_setup_tls() function calls sbrk(2936) to allocate TLS variables, but it doesn't handle the memory allocation failure.
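
For illustration only, a caller that wants to survive this failure has to test the sbrk() return value against (void *)-1. The wrapper below is a minimal sketch with a hypothetical name, checked_sbrk(); it is not the glibc code and not a proposed patch.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Grow the heap by incr bytes; return the start of the new region, or NULL
 * when the kernel refuses. sbrk() reports failure as (void *)-1. */
void *checked_sbrk(intptr_t incr)
{
    void *p = sbrk(incr);
    return p == (void *)-1 ? NULL : p;
}
```

A caller could then report an error when checked_sbrk(2936) returns NULL instead of writing through an invalid pointer.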

Note: I originally discovered this kernel regression while investigating Python buildbot failures on our Fedora Rawhide AArch64 machine.

* Fedora downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=2066147
* Python issue: https://bugs.python.org/issue47078
Comment 1 Victor Stinner 2022-03-22 02:41:00 UTC
See also the binutils issue: "p_align in ELF program headers should not exceed section alignment"
https://sourceware.org/bugzilla/show_bug.cgi?id=28689

See also this old (kernel 4.18) fixed x86-64 kernel bug: "kernel: brk can grow the heap into the area reserved for the stack"
https://bugzilla.redhat.com/show_bug.cgi?id=1749633
Comment 2 Florian Weimer 2022-04-27 15:13:54 UTC
Apparently the revert made it into v5.18-rc3:

commit 354e923df042a11d1ab8ca06b3ebfab3a018a4ec
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Thu Apr 14 19:13:55 2022 -0700

    revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
    
    Commit 925346c129da11 ("fs/binfmt_elf: fix PT_LOAD p_align values for
    loaders") was an attempt to fix regressions due to 9630f0d60fec5f
    ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE").

commit aeb7923733d100b86c6bc68e7ae32913b0cec9d8
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Thu Apr 14 19:13:58 2022 -0700

    revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"

It was Cc:ed to <stable@vger.kernel.org>, so hopefully it will make it into a 5.17.z kernel, too.