I'm seeing sporadic occurrences of the following crash:

trap:157, a123=[28,5,0]
results: got {r1=18446744073709551615,r2=0,err=1}, want {r1=0,r2=0,r3=0}
panic: AllThreadsSyscall results differ between threads; runtime corrupted
fatal error: panic on system stack

runtime stack:
syscall.(*allThreadsCaller).doSyscall(0xc000072050, 0x0)
	/usr/local/go/src/syscall/syscall_linux.go:1016 +0x28e

goroutine 1 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 5 [runnable, locked to thread]:
kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc0001cc000, 0xc0001ca000, {0x0, 0x0}, 0x0)
	/home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:343 +0x858
created by kernel.org/pub/linux/libs/security/libcap/cap.(*Launcher).Launch
	/home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:393 +0x205

If I read this report correctly, this is a call to SYS_PRCTL with PR_SET_SECUREBITS that fails with EPERM. The crash always happens in the test suite of some utility functions, but only when the tests are run as part of a larger suite. Trying to isolate it to a specific test or invocation of cap.* has been unsuccessful. The tests are being run in a qemu VM, which might increase the race window.
This is a self-check feature of the Go runtime to ensure that the kernel state of all threads is kept synchronized. This failure can only occur in code that is compiled without CGO enabled, i.e. pure Go. The Go semantics this functionality relies on are that all OS threads that can run any goroutine must have the same (security) state. On the face of it, if this code does any runtime.LockOSThread() sorts of things, it could violate this condition.
On the face of it, the first time the call is made it yields 0 (success), and the second time it yields -1 with errno=1 (EPERM). I'll look and see if I can figure out whether this can be reproduced just using the "cap" package and a test case. Anything more you can offer in terms of setup info would be appreciated.
What architecture is this qemu hosting? (On x86_64, __NR_prctl=167.) This is complaining about a different system call (aka trap#):

trap:157, a123=[28,5,0]
results: got {r1=18446744073709551615,r2=0,err=1}, want {r1=0,r2=0,r3=0}
panic: AllThreadsSyscall results differ between threads; runtime corrupted
fatal error: panic on system stack

If this is x86_64, that would be (grep 157 /usr/include/asm-generic/unistd.h):

#define __NR_setsid 157

As "man 2 setsid" says, the arguments (a123) are ignored by this system call, and the syscall appears to be defined in terms of the process abstraction and not a thread abstraction. Calling it twice from the same process is expected to fail with errno=EPERM. What is less clear to me is whether calling it from a thread makes sense. It might work better if this was not performed as an allthreads call, but as a regular syscall.[Raw]Syscall(). Does this help with debugging?
Also, if the intention really is to set the securebits = 5, the way to do it with the cap package is:

sb := cap.SecbitNoRoot | cap.SecbitNoSetUIDFixup
err := sb.Set()

When this is invoked inside a cap.Launch callback function, it avoids the allthreads call and instead makes a call that works only on the locked thread. You might also want to try:

err := cap.ModePure1EInit.Set()

(But this will lock the secure bits and disable Ambient inheritance, which may not be what you want to do.)
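As a cross-check on the a123=[28,5,0] arguments, the following sketch decodes a securebits value into flag names. The bit values are the ones defined in the kernel's linux/securebits.h, and 28 is PR_SET_SECUREBITS from linux/prctl.h; the table and function names are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// secbitNames lists the securebits flags with the values used by
// the kernel's <linux/securebits.h>.
var secbitNames = []struct {
	bit  uint
	name string
}{
	{1 << 0, "SECBIT_NOROOT"},
	{1 << 1, "SECBIT_NOROOT_LOCKED"},
	{1 << 2, "SECBIT_NO_SETUID_FIXUP"},
	{1 << 3, "SECBIT_NO_SETUID_FIXUP_LOCKED"},
	{1 << 4, "SECBIT_KEEP_CAPS"},
	{1 << 5, "SECBIT_KEEP_CAPS_LOCKED"},
	{1 << 6, "SECBIT_NO_CAP_AMBIENT_RAISE"},
	{1 << 7, "SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED"},
}

// decodeSecurebits renders a securebits value the way strace prints it.
func decodeSecurebits(v uint) string {
	var names []string
	for _, f := range secbitNames {
		if v&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	if len(names) == 0 {
		return "0"
	}
	return strings.Join(names, "|")
}

func main() {
	const prSetSecurebits = 28 // PR_SET_SECUREBITS
	fmt.Printf("prctl(%d, %s)\n", prSetSecurebits, decodeSecurebits(5))
}
```

The value 5 therefore corresponds to the same pair of flags as cap.SecbitNoRoot | cap.SecbitNoSetUIDFixup.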
This is qemu running on x86_64. I think unistd.h is misleading:

$ grep 157 arch/x86/entry/syscalls/syscall_64.tbl
157 common prctl sys_prctl

So this really is a PRCTL SET_SECUREBITS call that fails. This is roughly what causes the panic:

func init() {
	const secbits = cap.SecbitNoRoot | cap.SecbitNoSetUIDFixup

	empty := cap.NewSet()
	if c, err := cap.GetProc().Compare(empty); err != nil {
		panic(err)
	} else if c == 0 {
		fmt.Println("Already unprivileged, not setting securebits")
		return
	}

	if err := secbits.Set(); err != nil { // panic happens here, I believe
		panic(fmt.Errorf("set securebits: %s", err.Error()))
	}
	...
}

func TestFoo(...) {
	l := cap.FuncLauncher(func(interface{}) error {
		set := cap.GetProc()
		if err := set.ClearFlag(cap.Effective); err != nil {
			return fmt.Errorf("clear effective: %s", err)
		}
		err := set.SetFlag(cap.Effective, true, cap.SYS_RESOURCE)
		if err != nil {
			return fmt.Errorf("set effective: %s", err)
		}
		if err := set.SetProc(); err != nil {
			return fmt.Errorf("set caps: %s", err)
		}
		return nil
	})
	_, err := l.Launch(nil)
}

I've captured the interaction via strace. Choice excerpts (I'll attach the uncommented log at the end):

[pid 175] prctl(PR_SET_NAME, "cap-launcher") = 0

This tells us that tid 175 has acquired launchIdle -> launchActive.

[pid 175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
[pid 175] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
[pid 175] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 175] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
[pid 175] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 175] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 175] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)

This is multisc.cInit. I'm not entirely sure how this is invoked.

[pid 175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {...}) = 0

This is the first line of the FuncLauncher closure executing.
[pid 175] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_SYS_RESOURCE, ...}) = 0

This is SetProc in the FuncLauncher closure executing.

[pid 172] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {...}) = 0

I think this is init() executing.

[pid 172] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 180] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 176] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 175] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted)
[pid 174] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 177] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 179] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 173] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP <unfinished ...>
[pid 178] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 173] <... prctl resumed>) = 0

I think this is secbits.Set() executing in init(). There are two interesting facts here:

* tid 175 has an effective set without CAP_SETPCAP.
* runtime.AllThreadsSyscall(PRCTL, SET_SECUREBITS) executes on tid 175 and gets EPERM.

capset on tid 175 happens before prctl on tid 175. prctl on tid 175 is triggered by tid 172. This means that there has to be an interleaving of the execution of tid 172 and 175. secbits.Set() acquires launchBlocked, and blocks until AllThreadsSyscall has returned. launch() acquires launchActive. I can only come up with one way such an interleaving could happen:

* tid 175: launch() acquires launchActive, does capset, releases launchActive. The thread does not exit!
* tid 172: secbits.Set() acquires launchBlocked, does AllThreadsSyscall. Since tid 175 is still kicking around, prctl fails.

I can't find any guarantee that the Go runtime will terminate tid 175 straight away, so I'm fairly convinced that this race is real.
I don't have definitive proof. However! The race shouldn't be possible since secbits.Set() is run from init() and the Go language specification says the following: Package initialization—variable initialization and the invocation of init functions—happens in a single goroutine, sequentially, one package at a time. An init function may launch other goroutines, which can run concurrently with the initialization code. However, initialization always sequences the init functions: it will not invoke the next one until the previous one has returned. Since secbits.Set() blocks until all threads have executed the syscall, init() should only return once SET_SECUREBITS has succeeded. This means that capset shouldn't be able to execute before SET_SECUREBITS. From https://go.dev/ref/spec#Package_initialization FULL LOG: strace: Process 173 attached strace: Process 174 attached strace: Process 175 attached strace: Process 176 attached strace: Process 177 attached [pid 175] prctl(PR_SET_NAME, "cap-launcher") = 0 [pid 175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0 [pid 175] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1 [pid 175] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument) [pid 175] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1 [pid 175] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument) [pid 175] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument) [pid 175] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? 
*/) = -1 EINVAL (Invalid argument) [pid 175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, inheritable=0}) = 0 strace: Process 178 attached strace: Process 179 attached strace: Process 180 attached [pid 175] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_SYS_RESOURCE, 
permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, inheritable=0}) = 0 [pid 172] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, 
permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, inheritable=0}) = 0 [pid 172] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 180] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 176] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 175] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted) [pid 174] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUPtrap:) = 0 [pid 177] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 157, a123=[28,5,0] results: got {r1=[pid 179] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP18446744073709551615) = 0 ,r2=0,err=1}, want {r1=0,r2=0,r3=0} panic: AllThreadsSyscall results differ between threads; runtime corrupted fatal error: panic on system stack [pid 173] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP <unfinished ...> [pid 178] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 runtime stack: [pid 173] <... 
prctl resumed>) = 0 syscall.(*allThreadsCaller).doSyscall(0xc00006c0f0, 0x0) /usr/local/go/src/syscall/syscall_linux.go:1016 +0x28e goroutine 1 [running, locked to thread]: goroutine running on other thread; stack unavailable goroutine 6 [runnable, locked to thread]: kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc0001d4000, 0xc0001d2000, {0x0, 0x0}, 0x0) /home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:343 +0x858 created by kernel.org/pub/linux/libs/security/libcap/cap.(*Launcher).Launch /home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:393 +0x205
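The interleaving proposed in the comment above can be illustrated with a small pure-Go model. This is only a sketch under stated assumptions: it does not call the kernel or the cap package, the thread type and allThreads helper are invented for the illustration, and the permission rule is reduced to "setting securebits needs CAP_SETPCAP in the effective set".

```go
package main

import (
	"errors"
	"fmt"
)

var errEPERM = errors.New("EPERM")

// thread models only the per-thread state that matters here: whether
// the effective capability set still contains CAP_SETPCAP.
type thread struct {
	tid        int
	hasSetpcap bool
}

// setSecurebits mimics the permission check on PR_SET_SECUREBITS.
func (t *thread) setSecurebits() error {
	if !t.hasSetpcap {
		return errEPERM
	}
	return nil
}

// allThreads runs op on every live thread and returns the tids whose
// result differs from the first thread's, which is the condition the
// runtime's self-check panics on.
func allThreads(threads []*thread, op func(*thread) error) []int {
	var mismatched []int
	want := op(threads[0])
	for _, t := range threads[1:] {
		if got := op(t); got != want {
			mismatched = append(mismatched, t.tid)
		}
	}
	return mismatched
}

func main() {
	threads := []*thread{{172, true}, {173, true}, {175, true}}

	// The launcher goroutine, locked to tid 175, drops its effective
	// capabilities and reports its result, but the thread lingers.
	threads[2].hasSetpcap = false

	fmt.Println("mismatched tids:", allThreads(threads, (*thread).setSecurebits))
}
```

If tid 175 had already been removed from the slice (that is, the thread had exited) before the all-threads call, the model reports no mismatch, which matches the observation that the crash is sporadic.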
From the Go sources, we have this, so yes, this explains 157:

src/syscall/zsysnum_linux_386.go: SYS_PRCTL = 172
src/syscall/zsysnum_linux_amd64.go: SYS_PRCTL = 157
src/syscall/zsysnum_linux_arm.go: SYS_PRCTL = 172
src/syscall/zsysnum_linux_arm64.go: SYS_PRCTL = 167
src/syscall/zsysnum_linux_mips.go: SYS_PRCTL = 4192
src/syscall/zsysnum_linux_mips64.go: SYS_PRCTL = 5153
src/syscall/zsysnum_linux_mips64le.go: SYS_PRCTL = 5153
src/syscall/zsysnum_linux_mipsle.go: SYS_PRCTL = 4192
src/syscall/zsysnum_linux_ppc64.go: SYS_PRCTL = 171
src/syscall/zsysnum_linux_ppc64le.go: SYS_PRCTL = 171
src/syscall/zsysnum_linux_riscv64.go: SYS_PRCTL = 167
src/syscall/zsysnum_linux_s390x.go: SYS_PRCTL = 172

I think the idea that init() functions are run serially on a single thread is only saying that a single thread is used for these functions. The Go runtime sets up at least two other service threads (which never run program Go code), and these may or may not actually be clone()'d during the init phase. Quick question: if you run with CGO_ENABLED=1, do you see the same failure?
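To avoid the same mix-up when reading a trap number from another architecture, the table above can be put into a small lookup. This is a sketch: the numbers are copied from the list in this comment, and the map and function names are illustrative.

```go
package main

import (
	"fmt"
	"runtime"
)

// sysPrctl maps GOARCH to the SYS_PRCTL number listed above.
var sysPrctl = map[string]int{
	"386": 172, "amd64": 157, "arm": 172, "arm64": 167,
	"mips": 4192, "mips64": 5153, "mips64le": 5153, "mipsle": 4192,
	"ppc64": 171, "ppc64le": 171, "riscv64": 167, "s390x": 172,
}

// isPrctlTrap reports whether trap is the prctl system call on goarch.
func isPrctlTrap(goarch string, trap int) bool {
	nr, ok := sysPrctl[goarch]
	return ok && nr == trap
}

func main() {
	fmt.Println("amd64 trap 157 is prctl:", isPrctlTrap("amd64", 157))
	fmt.Println("arm64 trap 157 is prctl:", isPrctlTrap("arm64", 157))
	fmt.Println("this build's GOARCH:", runtime.GOARCH)
}
```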
As a different workaround, if you rename your init() function to my_init() and call it directly from the start of your TestFoo(), do you experience the same failure? Also, which version of Go are you running?
> I think the idea that init() functions are run serially on a single thread is
> only saying that a single thread is used for these functions.

True. What I'm getting at is that the language specification guarantees that all init() functions have returned before main() is invoked.

> Also what version of the Go are you running?

$ go version
go version go1.17.5 linux/amd64

After some more noodling around I've been able to create a reproducer. It turns out that the problem wasn't between init() and my test, but between two init() functions: one of them doing secbits.Set(), the other one using a launcher to change the cap set. Here you go:

package main

import (
	"fmt"

	"kernel.org/pub/linux/libs/security/libcap/cap"
)

func main() {
	const secbits = cap.SecbitNoRoot | cap.SecbitNoSetUIDFixup

	l := cap.FuncLauncher(func(interface{}) error {
		return cap.NewSet().SetProc()
	})

	_, err := l.Launch(nil)
	if err != nil {
		panic(err)
	}

	if err := secbits.Set(); err != nil {
		panic(fmt.Errorf("set securebits: %s", err.Error()))
	}
}

Running this with CGo enabled gives:

$ CGO_ENABLED=1 go build . && sudo strace -f -e trace=capset,prctl ./capissue
strace: Process 2470311 attached
strace: Process 2470312 attached
strace: Process 2470313 attached
strace: Process 2470314 attached
strace: Process 2470315 attached
strace: Process 2470316 attached
[pid 2470314] prctl(PR_SET_NAME, "cap-launcher") = 0
[pid 2470314] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
[pid 2470314] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2470314] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
[pid 2470314] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2470314] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2470314] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? 
*/) = -1 EINVAL (Invalid argument) [pid 2470314] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0 [pid 2470312] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2470316] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} --- [pid 2470316] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2470315] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} --- [pid 2470315] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2470314] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} --- [pid 2470314] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted) [pid 2470313] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} --- [pid 2470313] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2470311] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} --- [pid 2470311] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2470310] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} --- [pid 2470310] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2470314] +++ exited with 0 +++ [pid 2470315] +++ exited with 0 +++ [pid 2470313] +++ exited with 0 +++ [pid 2470311] +++ exited with 0 +++ [pid 2470316] +++ exited with 0 +++ [pid 2470312] +++ exited with 0 +++ +++ exited with 0 +++ You can see that prctl here also returns EPERM, but the error is swallowed by libpsx (?) With CGo disabled the race is much harder to observe. 
The following patch will open the race window: diff --git a/cap/launch.go b/cap/launch.go index 63959b4..987d9c1 100644 --- a/cap/launch.go +++ b/cap/launch.go @@ -6,6 +6,7 @@ import ( "runtime" "sync" "syscall" + "time" "unsafe" ) @@ -345,6 +346,7 @@ abort: pid: pid, err: err, } + time.Sleep(time.Second) } // Launch performs a callback function and/or new program launch with Now executing the reproducer crashes: $ CGO_ENABLED=0 go build . && sudo strace -f -e trace=capset,prctl ./capissue strace: Process 2474778 attached strace: Process 2474779 attached strace: Process 2474780 attached strace: Process 2474781 attached strace: Process 2474782 attached strace: Process 2474783 attached [pid 2474781] prctl(PR_SET_NAME, "cap-launcher") = 0 [pid 2474781] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1 [pid 2474781] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument) [pid 2474781] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1 [pid 2474781] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument) [pid 2474781] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument) [pid 2474781] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument) [pid 2474781] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0 [pid 2474780] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2474782] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2474781] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted) trap:157[pid 2474779] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP <unfinished ...> [pid 2474777] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP, a123=[ <unfinished ...> [pid 2474779] <... prctl resumed>) = 0 [pid 2474777] <... 
prctl resumed>) = 0 28[pid 2474778] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0 [pid 2474783] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP,) = 0 5,0] results: got {r1=18446744073709551615,r2=0,err=1}, want {r1=0,r2=0,r3=0} panic: AllThreadsSyscall results differ between threads; runtime corrupted fatal error: panic on system stack runtime stack: syscall.(*allThreadsCaller).doSyscall(0xc0000ca000, 0x0) /usr/local/go/src/syscall/syscall_linux.go:1016 +0x28e goroutine 1 [running, locked to thread]: goroutine running on other thread; stack unavailable goroutine 18 [chan receive, locked to thread]: kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc000094060, 0xc0000bc000, {0x0, 0x0}, 0x0) /home/lorenz/dev/libcap/cap/launch.go:256 +0x1fb created by kernel.org/pub/linux/libs/security/libcap/cap.(*Launcher).Launch /home/lorenz/dev/libcap/cap/launch.go:395 +0x205 goroutine 19 [sleep, locked to thread]: time.Sleep(0x3b9aca00) /usr/local/go/src/runtime/time.go:193 +0x12e kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc000094060, 0xc0000bc000, {0x0, 0x0}, 0xc0000940c0) /home/lorenz/dev/libcap/cap/launch.go:349 +0x865 created by kernel.org/pub/linux/libs/security/libcap/cap.launch /home/lorenz/dev/libcap/cap/launch.go:253 +0x1ef [pid 2474783] +++ exited with 2 +++ [pid 2474782] +++ exited with 2 +++ [pid 2474781] +++ exited with 2 +++ [pid 2474780] +++ exited with 2 +++ [pid 2474779] +++ exited with 2 +++ [pid 2474778] +++ exited with 2 +++ +++ exited with 2 +++
The logic for goroutine teardown is here: https://github.com/golang/go/blob/9b0de0854d5a5655890ef0b2b9052da2541182a3/src/runtime/proc.go#L3631-L3703 (can be invoked via runtime.Goexit).
Thanks very much for this bug report and the reproducer. With this code, I can reproduce this. I'll figure out the fix. [I'll assume it is a "cap" package problem, but FWIW I find it fails using go1.16 and go1.17.]
I think this commit: https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=806b53d13a792d834622b2e546cfdceecc5af699 is a significant step in the right direction. Please give it a try.
I think HEAD now contains a reliable fix for this. It will be included in the libcap-2.62 version of the cap package. https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=e458889fbda4052919b61fd9f727bb1ac906d436 Please reopen if you find it isn't working reliably now.
FYI I chose not to include this next change in the 2.62 release: https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=af2bf057ed03a0c8b29898b721aef66b021124d0 It will be included in 2.63 when that release is cut. This was a change to address the observation that the cgo build did not crash the same way as the CGO_ENABLED=0 build: https://bugzilla.kernel.org/show_bug.cgi?id=215283#c8
Thanks Andrew, I'll give the fix a spin. I've been thinking about how to fix this, and one thing occurred to me about the wait using tgkill: doesn't that introduce another race? Linux is free to reuse tids, so if a new thread is spawned during Gosched we might end up in an infinite loop. I've been toying with the idea of using set_tid_address to get a futex wakeup in user space. This is race-free, but could really screw up the Go runtime if it decides to use CLONE_CHILD_CLEARTID or similar. However, it may be possible to combine this approach with a sentinel package like https://pkg.go.dev/go4.org/unsafe/assume-no-moving-gc and a unit test invoking PR_GET_TID_ADDRESS to make this somewhat safe?
For now, I can't think of a better mechanism. Also, a minor fix for the PSX crash dumper (dump the right argument set based on whether or not the six-arg call is invoked): https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=cbdd2b14e0b6e48ac5139c9d4327020cd6996d40

./mismatch-cgo || exit 0 ; exit 1
psx_syscall result differs.
trap:186 a123=[0,0,0]
results: 10431={10431} 10430={10430} 10429={10429} 10428={10428} wanted={10427}
SIGSYS: bad system call
PC=0x7b0ba0cad5a1 m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x481db7, 0xc000042e80)
	/home/andrew/sdk/go1.17.2/src/runtime/cgocall.go:156 +0x5c fp=0xc000042e58 sp=0xc000042e20 pc=0x404b7c
kernel.org/pub/linux/libs/security/libcap/psx._Cfunc_psx_syscall3(0xba, 0x0, 0x0, 0x0)
	_cgo_gotypes.go:72 +0x4d fp=0xc000042e80 sp=0xc000042e58 pc=0x4818cd
kernel.org/pub/linux/libs/security/libcap/psx.Syscall3(0xc000034738, 0x458edb, 0xc0000001a0, 0x200000003)
	/home/andrew/gits/libcap/go/vendor/kernel.org/pub/linux/libs/security/libcap/psx/psx_cgo.go:64 +0xa5 fp=0xc000042f08 sp=0xc000042e80 pc=0x481ac5
main.main()
	/home/andrew/gits/libcap/go/mismatch.go:13 +0x2a fp=0xc000042f80 sp=0xc000042f08 pc=0x481c2a
runtime.main()
	/home/andrew/sdk/go1.17.2/src/runtime/proc.go:255 +0x227 fp=0xc000042fe0 sp=0xc000042f80 pc=0x433ce7
runtime.goexit()
	/home/andrew/sdk/go1.17.2/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc000042fe8 sp=0xc000042fe0 pc=0x45ca41

rax 0x0
rbx 0x28bb
rcx 0x7b0ba0cad5a1
[...]