Bug 215283 - go: AllThreadsSyscall panics in call to prctl
Status: RESOLVED CODE_FIX
Alias: None
Product: Tools
Classification: Unclassified
Component: libcap
Hardware: All Linux
Importance: P1 normal
Assignee: Andrew G. Morgan
Reported: 2021-12-09 17:09 UTC by lmb
Modified: 2021-12-13 15:50 UTC

Kernel Version: 5.15
Regression: No

Description lmb 2021-12-09 17:09:27 UTC
I'm seeing sporadic occurrences of the following crash:

trap:157, a123=[28,5,0]
results: got {r1=18446744073709551615,r2=0,err=1}, want {r1=0,r2=0,r3=0}
panic: AllThreadsSyscall results differ between threads; runtime corrupted
fatal error: panic on system stack

runtime stack:
syscall.(*allThreadsCaller).doSyscall(0xc000072050, 0x0)
	/usr/local/go/src/syscall/syscall_linux.go:1016 +0x28e

goroutine 1 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 5 [runnable, locked to thread]:
kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc0001cc000, 0xc0001ca000, {0x0, 0x0}, 0x0)
	/home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:343 +0x858
created by kernel.org/pub/linux/libs/security/libcap/cap.(*Launcher).Launch
	/home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:393 +0x205

If I read this report correctly, this is a call to SYS_PRCTL with PR_SET_SECUREBITS that fails with EPERM. The crash always happens in the test suite of some utility functions, but only when the tests are run as part of a larger suite. Trying to isolate it to a specific test or invocation of cap.* has been unsuccessful.
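
As a reading aid, the fields of the panic message can be decoded with a few lines of Go. This is only a sketch: the constant values are copied from the Linux headers (<linux/prctl.h>, <linux/securebits.h>), and the helper names signedResult and describeSecbits are made up for illustration.

```go
package main

import "fmt"

// Values taken from <linux/prctl.h>, <linux/securebits.h> and <errno.h>.
const (
	prSetSecurebits     = 28     // PR_SET_SECUREBITS
	secbitNoRoot        = 1 << 0 // SECBIT_NOROOT
	secbitNoSetuidFixup = 1 << 2 // SECBIT_NO_SETUID_FIXUP
	ePerm               = 1      // EPERM
)

// signedResult (hypothetical helper) reinterprets the raw r1 value
// printed by the runtime as a signed syscall return value.
func signedResult(r1 uint64) int64 { return int64(r1) }

// describeSecbits (hypothetical helper) names the two securebits
// flags that this sketch knows about.
func describeSecbits(bits uint64) string {
	s := ""
	if bits&secbitNoRoot != 0 {
		s += "SECBIT_NOROOT"
	}
	if bits&secbitNoSetuidFixup != 0 {
		if s != "" {
			s += "|"
		}
		s += "SECBIT_NO_SETUID_FIXUP"
	}
	return s
}

func main() {
	a1, a2 := uint64(28), uint64(5) // a123=[28,5,0]
	r1, errno := uint64(18446744073709551615), 1

	fmt.Println("a1 is PR_SET_SECUREBITS:", a1 == prSetSecurebits)
	fmt.Println("a2 decodes to:", describeSecbits(a2))
	fmt.Println("r1 as signed:", signedResult(r1), "errno is EPERM:", errno == ePerm)
}
```

So r1=18446744073709551615 is simply -1 viewed as an unsigned 64-bit value, and a2=5 is the two securebits flags OR'd together.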

The tests are being run in a qemu VM, which might increase the race window.
Comment 1 Andrew G. Morgan 2021-12-09 20:36:37 UTC
This is a self-check feature of the Go runtime to ensure that the kernel state of all threads is being kept synchronized. This failure can only occur in code that is compiled without CGO enabled, so pure Go.

Go's semantics expected by this functionality are that all OS threads that can run any goroutine have to have the same (security) state. On the face of it, if this code does any runtime.LockOSThread() sorts of things it could violate this condition.
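
The kind of check described here can be sketched roughly as follows. This is a simplified model and not the runtime's actual code: the result type and checkAllThreads are hypothetical names, and the per-thread results are supplied by hand rather than produced by real system calls.

```go
package main

import "fmt"

// result mirrors the fields printed in the panic message.
type result struct {
	r1, r2 uint64
	errno  int
}

// checkAllThreads (hypothetical) treats the first thread's result as
// the expected one and reports the first thread that disagrees.
func checkAllThreads(results []result) error {
	if len(results) == 0 {
		return nil
	}
	want := results[0]
	for i, got := range results[1:] {
		if got != want {
			return fmt.Errorf("thread %d: got %+v, want %+v", i+1, got, want)
		}
	}
	return nil
}

func main() {
	ok := result{r1: 0, r2: 0, errno: 0}
	eperm := result{r1: 18446744073709551615, r2: 0, errno: 1}

	fmt.Println(checkAllThreads([]result{ok, ok, ok}))
	fmt.Println(checkAllThreads([]result{ok, ok, eperm, ok}))
}
```

In this model a single thread whose kernel state has diverged is enough to fail the whole call, which matches the "results differ between threads" wording of the panic.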
Comment 2 Andrew G. Morgan 2021-12-09 20:54:24 UTC
On the face of it, the first time the call is made it yields 0 (success), and the second time it yields -1 with errno=1 (EPERM).

I'll look and see if I can figure out whether this can be reproduced just using the "cap" package and a test case.

Anything more you can offer in terms of setup info would be appreciated.
Comment 3 Andrew G. Morgan 2021-12-10 02:49:29 UTC
What architecture is this qemu hosting? (On x86_64, __NR_prctl=167.) This is complaining about a different system call (aka trap#):

trap:157, a123=[28,5,0]
results: got {r1=18446744073709551615,r2=0,err=1}, want {r1=0,r2=0,r3=0}
panic: AllThreadsSyscall results differ between threads; runtime corrupted
fatal error: panic on system stack

If this is x86_64, that would be (grep 157 /usr/include/asm-generic/unistd.h):

#define __NR_setsid 157

As "man 2 setsid" says, the arguments (a123) are ignored by this system call, and the syscall appears to be defined in terms of the process abstraction and not a thread abstraction. Calling it twice from the same process is expected to fail with errno=EPERM.

What is less clear to me is whether calling it from a thread makes sense. It might work better if this was not performed as an allthreads call, but as a regular syscall.[Raw]Syscall().

Does this help with debugging?
Comment 4 Andrew G. Morgan 2021-12-10 03:15:20 UTC
Also, if the intention really is to set the securebits = 5, the way to do it with the cap package is:

   sb := cap.SecbitNoRoot | cap.SecbitNoSetUIDFixup
   err := sb.Set()

When this is invoked inside a cap.Launch callback function, it will not do an allthreads call, but rather one that works only on the locked thread.

You might also want to try:

   err := cap.ModePure1EInit.Set()

(But this will lock the secure bits and disable Ambient inheritance which may not be what you want to do.)
Comment 5 lmb 2021-12-10 14:31:33 UTC
This is qemu running on x86_64; I think unistd.h is misleading:

$ grep 157 arch/x86/entry/syscalls/syscall_64.tbl
157	common	prctl			sys_prctl

So this really is a PRCTL SET_SECURE_BITS call that fails.

This is roughly what causes the panic:

func init() {
	const secbits = cap.SecbitNoRoot | cap.SecbitNoSetUIDFixup

	empty := cap.NewSet()
	if c, err := cap.GetProc().Compare(empty); err != nil {
		panic(err)
	} else if c == 0 {
		fmt.Println("Already unprivileged, not setting securebits")
		return
	}

	if err := secbits.Set(); err != nil { // panic happens here i believe
		panic(fmt.Errorf("set securebits: %s", err.Error()))
	}

	...
}

func TestFoo(...) {
	l := cap.FuncLauncher(func(interface{}) error {
		set := cap.GetProc()
		if err := set.ClearFlag(cap.Effective); err != nil {
			return fmt.Errorf("clear effective: %s", err)
		}

		err := set.SetFlag(cap.Effective, true, cap.SYS_RESOURCE)
		if err != nil {
			return fmt.Errorf("set effective: %s", err)
		}

		if err := set.SetProc(); err != nil {
			return fmt.Errorf("set caps: %s", err)
		}
		return nil // the callback must return an error value
	})

	_, err := l.Launch(nil)
}

I've captured the interaction via strace. Choice excerpts (I'll attach the uncommented log at the end):

[pid   175] prctl(PR_SET_NAME, "cap-launcher") = 0
This tells us that tid 175 has acquired launchIdle -> launchActive.

[pid   175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
[pid   175] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
[pid   175] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid   175] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
[pid   175] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid   175] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid   175] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
This is multisc.cInit. I'm not entirely sure how this is invoked.

[pid   175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {...}) = 0
This is the first line of the FuncLauncher closure executing.

[pid   175] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_SYS_RESOURCE, ...}) = 0
This is SetProc in FuncLauncher closure executing.

[pid   172] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {...}) = 0
I think this is init() executing.

[pid   172] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   180] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   176] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   175] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted)
[pid   174] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   177] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   179] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   173] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP <unfinished ...>
[pid   178] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   173] <... prctl resumed>)        = 0
I think this is secbits.Set() executing in init().

There are two interesting facts here:

* tid 175 has an effective set without CAP_SETPCAP.
* runtime.AllThreadsSyscall(PRCTL, SET_SECUREBITS) executes on tid 175 and gets EPERM
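
These two facts are related: the kernel only allows a thread to change its securebits if CAP_SETPCAP is in that thread's effective set. The following is a toy model of that one permission check, not kernel code; the thread type and setSecurebits are hypothetical names, and the capability numbers come from <linux/capability.h>.

```go
package main

import "fmt"

// Capability numbers from <linux/capability.h>, errno from <errno.h>.
const (
	capSetpcap     = 8  // CAP_SETPCAP
	capSysResource = 24 // CAP_SYS_RESOURCE
	ePerm          = 1  // EPERM
)

// thread is a toy stand-in for per-thread kernel credentials.
type thread struct {
	tid       int
	effective uint64 // bitmask of effective capabilities
}

// setSecurebits (hypothetical) models only the permission check for
// prctl(PR_SET_SECUREBITS): it returns EPERM unless CAP_SETPCAP is
// in the thread's effective set, and 0 otherwise.
func setSecurebits(t thread) int {
	if t.effective&(1<<capSetpcap) == 0 {
		return ePerm
	}
	return 0
}

func main() {
	full := ^uint64(0) // every capability raised, like a root thread
	threads := []thread{
		{tid: 172, effective: full},
		{tid: 175, effective: 1 << capSysResource}, // after the capset above
		{tid: 176, effective: full},
	}
	for _, t := range threads {
		fmt.Printf("tid %d: errno=%d\n", t.tid, setSecurebits(t))
	}
}
```

In the model, only the thread that reduced its effective set to CAP_SYS_RESOURCE gets EPERM, which is the pattern visible in the strace excerpt.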

capset on tid 175 happens before prctl on tid 175. prctl on tid 175 is triggered by tid 172.
This means that there has to be an interleaving of the execution of tid 172 and 175.

secbits.Set() acquires launchBlocked, and blocks until AllThreadsSyscall has
returned. launch() acquires launchActive. I can only come up with one way such
an interleaving could happen:

* tid 175: launch() acquires launchActive, does capset, releases launchActive.
  thread does not exit!
* tid 172: secbits.Set() acquires launchBlocked, does AllThreadsSyscall. Since
  tid 175 is still kicking around, prctl fails.

I can't find any guarantee that the go runtime will terminate tid 175 straight
away, so I'm fairly convinced that this race is real. I don't have definitive proof.
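
To illustrate why the thread can outlive the launch, here is a minimal sketch of the general pattern (not the cap package's code; runOnLockedThread is a hypothetical name). A goroutine locks itself to an OS thread, runs a callback, and reports the result over a channel. Because it never calls runtime.UnlockOSThread, the runtime terminates the thread once the goroutine exits, but nothing ties that termination to the moment the caller receives the result.

```go
package main

import (
	"fmt"
	"runtime"
)

// runOnLockedThread (hypothetical) runs fn on a goroutine that is
// locked to its own OS thread and returns fn's result.
func runOnLockedThread(fn func() error) error {
	done := make(chan error, 1)
	go func() {
		// No matching UnlockOSThread: per the runtime documentation the
		// thread is terminated when this goroutine exits.
		runtime.LockOSThread()
		done <- fn()
		// Everything after the send, including goroutine teardown and
		// the thread's exit, races with whatever the receiver does next.
	}()
	return <-done
}

func main() {
	err := runOnLockedThread(func() error { return nil })
	fmt.Println("callback returned:", err)
	// The locked thread may or may not still exist at this point, so an
	// all-threads syscall issued here could still run on it.
}
```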

However! The race shouldn't be possible since secbits.Set() is run from init()
and the Go language specification says the following:

    Package initialization—variable initialization and the invocation of init
    functions—happens in a single goroutine, sequentially, one package at a time.
    An init function may launch other goroutines, which can run concurrently with
    the initialization code. However, initialization always sequences the init
    functions: it will not invoke the next one until the previous one has returned.

Since secbits.Set() blocks until all threads have executed the syscall, init()
should only return once SET_SECUREBITS has succeeded. This means that capset
shouldn't be able to execute before SET_SECUREBITS.

From https://go.dev/ref/spec#Package_initialization
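
A small program showing what the quoted paragraph does and does not guarantee (a sketch with made-up variable names): init functions run one after another, but a goroutine started by the first one is free to keep running while the second one executes.

```go
package main

import "fmt"

// order records the sequence in which the init functions ran.
var order []string

// background receives a message from a goroutine started during init.
var background = make(chan string, 1)

func init() {
	order = append(order, "first init")
	// This goroutine may run concurrently with the second init and main.
	go func() { background <- "goroutine from first init" }()
}

func init() {
	// Guaranteed to start only after the first init has returned.
	order = append(order, "second init")
}

func main() {
	fmt.Println(order)
	fmt.Println(<-background)
}
```

The ordering guarantee covers the init functions themselves, not any goroutine or OS thread they leave behind.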

FULL LOG:

strace: Process 173 attached
strace: Process 174 attached
strace: Process 175 attached
strace: Process 176 attached
strace: Process 177 attached
[pid   175] prctl(PR_SET_NAME, "cap-launcher") = 0
[pid   175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, NULL) = 0
[pid   175] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
[pid   175] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid   175] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
[pid   175] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid   175] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid   175] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid   175] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, inheritable=0}) = 0
strace: Process 178 attached
strace: Process 179 attached
strace: Process 180 attached
[pid   175] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_SYS_RESOURCE, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, inheritable=0}) = 0
[pid   172] capget({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, permitted=1<<CAP_CHOWN|1<<CAP_DAC_OVERRIDE|1<<CAP_DAC_READ_SEARCH|1<<CAP_FOWNER|1<<CAP_FSETID|1<<CAP_KILL|1<<CAP_SETGID|1<<CAP_SETUID|1<<CAP_SETPCAP|1<<CAP_LINUX_IMMUTABLE|1<<CAP_NET_BIND_SERVICE|1<<CAP_NET_BROADCAST|1<<CAP_NET_ADMIN|1<<CAP_NET_RAW|1<<CAP_IPC_LOCK|1<<CAP_IPC_OWNER|1<<CAP_SYS_MODULE|1<<CAP_SYS_RAWIO|1<<CAP_SYS_CHROOT|1<<CAP_SYS_PTRACE|1<<CAP_SYS_PACCT|1<<CAP_SYS_ADMIN|1<<CAP_SYS_BOOT|1<<CAP_SYS_NICE|1<<CAP_SYS_RESOURCE|1<<CAP_SYS_TIME|1<<CAP_SYS_TTY_CONFIG|1<<CAP_MKNOD|1<<CAP_LEASE|1<<CAP_AUDIT_WRITE|1<<CAP_AUDIT_CONTROL|1<<CAP_SETFCAP|1<<CAP_MAC_OVERRIDE|1<<CAP_MAC_ADMIN|1<<CAP_SYSLOG|1<<CAP_WAKE_ALARM|1<<CAP_BLOCK_SUSPEND|1<<CAP_AUDIT_READ|1<<CAP_PERFMON|1<<CAP_BPF|1<<CAP_CHECKPOINT_RESTORE, inheritable=0}) = 0
[pid   172] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   180] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   176] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid   175] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted)
[pid   174] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUPtrap:) = 0
[pid   177] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
157, a123=[28,5,0]
results: got {r1=[pid   179] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP18446744073709551615) = 0
,r2=0,err=1}, want {r1=0,r2=0,r3=0}
panic: AllThreadsSyscall results differ between threads; runtime corrupted
fatal error: panic on system stack
[pid   173] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP <unfinished ...>
[pid   178] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0

runtime stack:
[pid   173] <... prctl resumed>)        = 0
syscall.(*allThreadsCaller).doSyscall(0xc00006c0f0, 0x0)
	/usr/local/go/src/syscall/syscall_linux.go:1016 +0x28e

goroutine 1 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 6 [runnable, locked to thread]:
kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc0001d4000, 0xc0001d2000, {0x0, 0x0}, 0x0)
	/home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:343 +0x858
created by kernel.org/pub/linux/libs/security/libcap/cap.(*Launcher).Launch
	/home/lorenz/go/pkg/mod/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.61/launch.go:393 +0x205
Comment 6 Andrew G. Morgan 2021-12-10 15:24:06 UTC
From the Go sources, we have this, so yes this explains 157:

src/syscall/zsysnum_linux_386.go:       SYS_PRCTL                  = 172
src/syscall/zsysnum_linux_amd64.go:     SYS_PRCTL                  = 157
src/syscall/zsysnum_linux_arm.go:       SYS_PRCTL                  = 172
src/syscall/zsysnum_linux_arm64.go:     SYS_PRCTL                  = 167
src/syscall/zsysnum_linux_mips.go:      SYS_PRCTL                  = 4192
src/syscall/zsysnum_linux_mips64.go:    SYS_PRCTL                  = 5153
src/syscall/zsysnum_linux_mips64le.go:  SYS_PRCTL                  = 5153
src/syscall/zsysnum_linux_mipsle.go:    SYS_PRCTL                  = 4192
src/syscall/zsysnum_linux_ppc64.go:     SYS_PRCTL                  = 171
src/syscall/zsysnum_linux_ppc64le.go:   SYS_PRCTL                  = 171
src/syscall/zsysnum_linux_riscv64.go:   SYS_PRCTL                  = 167
src/syscall/zsysnum_linux_s390x.go:     SYS_PRCTL                  = 172
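
For reference, the same table as a lookup keyed by GOARCH. This is just a sketch: the numbers are copied from the listing above, and sysPrctl and prctlNumber are made-up names.

```go
package main

import (
	"fmt"
	"runtime"
)

// sysPrctl maps GOARCH to the SYS_PRCTL number listed above.
var sysPrctl = map[string]int{
	"386":      172,
	"amd64":    157,
	"arm":      172,
	"arm64":    167,
	"mips":     4192,
	"mips64":   5153,
	"mips64le": 5153,
	"mipsle":   4192,
	"ppc64":    171,
	"ppc64le":  171,
	"riscv64":  167,
	"s390x":    172,
}

// prctlNumber (hypothetical) returns the prctl trap number for goarch.
func prctlNumber(goarch string) (int, bool) {
	n, ok := sysPrctl[goarch]
	return n, ok
}

func main() {
	if n, ok := prctlNumber(runtime.GOARCH); ok {
		fmt.Printf("SYS_PRCTL on %s is %d\n", runtime.GOARCH, n)
	} else {
		fmt.Printf("no entry for %s in this table\n", runtime.GOARCH)
	}
}
```

Note that 167, the value found in asm-generic/unistd.h, is the number used on arm64 and riscv64, not on amd64.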

I think the idea that init() functions are run serially on a single thread is only saying that a single thread is used for these functions. The Go runtime sets up at least two other service threads (which never run program Go code), and these may or may not actually be clone()'d during the init phase.

Quick question: if you run with CGO_ENABLED=1, do you see the same failure?
Comment 7 Andrew G. Morgan 2021-12-10 15:43:07 UTC
As a different workaround, if you rename your init() function my_init() and call it directly from the start of your TestFoo() do you experience the same failure?

Also, what version of Go are you running?
Comment 8 lmb 2021-12-10 17:52:52 UTC
> I think the idea that init() functions are run serially on a single thread is
> only saying that a single thread is used for these functions.

True, what I'm getting at is that all init() functions are guaranteed to have returned before main() is invoked by the language specification.

> Also, what version of Go are you running?

$ go version
go version go1.17.5 linux/amd64

After some more noodling around I've been able to create a reproducer. Turns out that the problem wasn't between init() and my test, but between two init() functions... One of them doing secbits.Set(), the other one using a launcher to change the cap set.

Here you go:

package main

import (
	"fmt"

	"kernel.org/pub/linux/libs/security/libcap/cap"
)

func main() {
	const secbits = cap.SecbitNoRoot | cap.SecbitNoSetUIDFixup

	l := cap.FuncLauncher(func(interface{}) error {
		return cap.NewSet().SetProc()
	})

	_, err := l.Launch(nil)
	if err != nil {
		panic(err)
	}

	if err := secbits.Set(); err != nil {
		panic(fmt.Errorf("set securebits: %s", err.Error()))
	}
}


Running this with CGo enabled gives:

$ CGO_ENABLED=1 go build . && sudo strace -f -e trace=capset,prctl ./capissue
strace: Process 2470311 attached
strace: Process 2470312 attached
strace: Process 2470313 attached
strace: Process 2470314 attached
strace: Process 2470315 attached
strace: Process 2470316 attached
[pid 2470314] prctl(PR_SET_NAME, "cap-launcher") = 0
[pid 2470314] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
[pid 2470314] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2470314] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
[pid 2470314] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2470314] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2470314] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2470314] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
[pid 2470312] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2470316] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} ---
[pid 2470316] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2470315] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} ---
[pid 2470315] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2470314] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} ---
[pid 2470314] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted)
[pid 2470313] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} ---
[pid 2470313] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2470311] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} ---
[pid 2470311] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2470310] --- SIGSYS {si_signo=SIGSYS, si_code=SI_TKILL, si_pid=2470310, si_uid=0} ---
[pid 2470310] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2470314] +++ exited with 0 +++
[pid 2470315] +++ exited with 0 +++
[pid 2470313] +++ exited with 0 +++
[pid 2470311] +++ exited with 0 +++
[pid 2470316] +++ exited with 0 +++
[pid 2470312] +++ exited with 0 +++
+++ exited with 0 +++

You can see that prctl here also returns EPERM, but the error is swallowed by libpsx (?)

With CGo disabled the race is much harder to observe. The following patch will
open the race window:

diff --git a/cap/launch.go b/cap/launch.go
index 63959b4..987d9c1 100644
--- a/cap/launch.go
+++ b/cap/launch.go
@@ -6,6 +6,7 @@ import (
 	"runtime"
 	"sync"
 	"syscall"
+	"time"
 	"unsafe"
 )
 
@@ -345,6 +346,7 @@ abort:
 		pid: pid,
 		err: err,
 	}
+	time.Sleep(time.Second)
 }
 
 // Launch performs a callback function and/or new program launch with

Now executing the reproducer crashes:

$ CGO_ENABLED=0 go build . && sudo strace -f -e trace=capset,prctl ./capissue
strace: Process 2474778 attached
strace: Process 2474779 attached
strace: Process 2474780 attached
strace: Process 2474781 attached
strace: Process 2474782 attached
strace: Process 2474783 attached
[pid 2474781] prctl(PR_SET_NAME, "cap-launcher") = 0
[pid 2474781] prctl(PR_CAPBSET_READ, CAP_MAC_OVERRIDE) = 1
[pid 2474781] prctl(PR_CAPBSET_READ, 0x30 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2474781] prctl(PR_CAPBSET_READ, CAP_CHECKPOINT_RESTORE) = 1
[pid 2474781] prctl(PR_CAPBSET_READ, 0x2c /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2474781] prctl(PR_CAPBSET_READ, 0x2a /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2474781] prctl(PR_CAPBSET_READ, 0x29 /* CAP_??? */) = -1 EINVAL (Invalid argument)
[pid 2474781] capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=0, permitted=0, inheritable=0}) = 0
[pid 2474780] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2474782] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2474781] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = -1 EPERM (Operation not permitted)
trap:157[pid 2474779] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP <unfinished ...>
[pid 2474777] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP, a123=[ <unfinished ...>
[pid 2474779] <... prctl resumed>)      = 0
[pid 2474777] <... prctl resumed>)      = 0
28[pid 2474778] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP) = 0
[pid 2474783] prctl(PR_SET_SECUREBITS, SECBIT_NOROOT|SECBIT_NO_SETUID_FIXUP,) = 0
5,0]
results: got {r1=18446744073709551615,r2=0,err=1}, want {r1=0,r2=0,r3=0}
panic: AllThreadsSyscall results differ between threads; runtime corrupted
fatal error: panic on system stack

runtime stack:
syscall.(*allThreadsCaller).doSyscall(0xc0000ca000, 0x0)
	/usr/local/go/src/syscall/syscall_linux.go:1016 +0x28e

goroutine 1 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 18 [chan receive, locked to thread]:
kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc000094060, 0xc0000bc000, {0x0, 0x0}, 0x0)
	/home/lorenz/dev/libcap/cap/launch.go:256 +0x1fb
created by kernel.org/pub/linux/libs/security/libcap/cap.(*Launcher).Launch
	/home/lorenz/dev/libcap/cap/launch.go:395 +0x205

goroutine 19 [sleep, locked to thread]:
time.Sleep(0x3b9aca00)
	/usr/local/go/src/runtime/time.go:193 +0x12e
kernel.org/pub/linux/libs/security/libcap/cap.launch(0xc000094060, 0xc0000bc000, {0x0, 0x0}, 0xc0000940c0)
	/home/lorenz/dev/libcap/cap/launch.go:349 +0x865
created by kernel.org/pub/linux/libs/security/libcap/cap.launch
	/home/lorenz/dev/libcap/cap/launch.go:253 +0x1ef
[pid 2474783] +++ exited with 2 +++
[pid 2474782] +++ exited with 2 +++
[pid 2474781] +++ exited with 2 +++
[pid 2474780] +++ exited with 2 +++
[pid 2474779] +++ exited with 2 +++
[pid 2474778] +++ exited with 2 +++
+++ exited with 2 +++
Comment 9 lmb 2021-12-10 18:06:12 UTC
The logic for goroutine teardown is here: https://github.com/golang/go/blob/9b0de0854d5a5655890ef0b2b9052da2541182a3/src/runtime/proc.go#L3631-L3703 (can be invoked via runtime.Goexit).
Comment 10 Andrew G. Morgan 2021-12-11 03:32:27 UTC
Thanks very much for this bug report and the reproducer. With this code, I can reproduce this. I'll figure out the fix.

[I'll assume it is a "cap" package problem, but FWIW I find it fails using go1.16 and go1.17.]
Comment 11 Andrew G. Morgan 2021-12-11 05:05:35 UTC
I think this commit:

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=806b53d13a792d834622b2e546cfdceecc5af699

is a significant step in the right direction. Please give it a try.
Comment 12 Andrew G. Morgan 2021-12-11 22:59:24 UTC
I think HEAD now contains a reliable fix for this; it will be included in the libcap-2.62 version of the cap package.

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=e458889fbda4052919b61fd9f727bb1ac906d436

Please reopen if you find it isn't working reliably now.
Comment 13 Andrew G. Morgan 2021-12-12 20:23:38 UTC
FYI I chose not to include this next change in the 2.62 release:

  https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=af2bf057ed03a0c8b29898b721aef66b021124d0

It will be included in 2.63 when that release is cut.

This was a change to address the observation that the cgo build did not crash the same way as the CGO_ENABLED=0 build:

  https://bugzilla.kernel.org/show_bug.cgi?id=215283#c8
Comment 14 lmb 2021-12-13 09:54:31 UTC
Thanks Andrew, I'll give the fix a spin. I've been thinking about how to fix this, and one thing occurred to me about the wait using tgkill: doesn't that introduce another race? Linux is free to reuse tids, so if a new thread is spawned during Gosched we might end up in an infinite loop.
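
To make the concern concrete, here is a sketch of a polling wait with an upper bound on the number of probes. It is not the code from the fix: waitUntilGone is a hypothetical helper, and the alive callback stands in for whatever probe is used (on Linux that could be tgkill with signal 0).

```go
package main

import (
	"fmt"
	"runtime"
)

// waitUntilGone (hypothetical) polls alive until it reports false,
// yielding to the scheduler between probes. It gives up after
// maxProbes attempts, so a reused tid cannot make it spin forever.
func waitUntilGone(alive func() bool, maxProbes int) bool {
	for i := 0; i < maxProbes; i++ {
		if !alive() {
			return true
		}
		runtime.Gosched()
	}
	return false
}

func main() {
	// A thread that disappears after a couple of probes.
	remaining := 3
	exitsSoon := func() bool { remaining--; return remaining > 0 }
	fmt.Println("gone:", waitUntilGone(exitsSoon, 100))

	// A tid that was reused by a new thread always looks alive.
	reused := func() bool { return true }
	fmt.Println("gone:", waitUntilGone(reused, 100))
}
```

A bound turns the potential infinite loop into a detectable failure, but it does not remove the underlying ambiguity about which thread the tid refers to.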

I've been toying with the idea of using set_tid_address to get a futex wakeup in user space. This is race free, but could really screw up the Go runtime if it decides to use CLONE_CLEAR_TID or similar. However, it may be possible to combine this approach with a sentinel package like https://pkg.go.dev/go4.org/unsafe/assume-no-moving-gc and a unit test invoking PR_GET_TID_ADDRESS to make this somewhat safe?
Comment 15 Andrew G. Morgan 2021-12-13 15:50:16 UTC
For now, I can't think of a better mechanism. 

Also, a minor fix for the PSX crash dumper (dump the right argument set based on whether or not the six-arg call is invoked):

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=cbdd2b14e0b6e48ac5139c9d4327020cd6996d40

./mismatch-cgo || exit 0 ; exit 1
psx_syscall result differs.
trap:186 a123=[0,0,0]
results: 10431={10431} 10430={10430} 10429={10429} 10428={10428} wanted={10427}
SIGSYS: bad system call
PC=0x7b0ba0cad5a1 m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x481db7, 0xc000042e80)
        /home/andrew/sdk/go1.17.2/src/runtime/cgocall.go:156 +0x5c fp=0xc000042e58 sp=0xc000042e20 pc=0x404b7c
kernel.org/pub/linux/libs/security/libcap/psx._Cfunc_psx_syscall3(0xba, 0x0, 0x0, 0x0)
        _cgo_gotypes.go:72 +0x4d fp=0xc000042e80 sp=0xc000042e58 pc=0x4818cd
kernel.org/pub/linux/libs/security/libcap/psx.Syscall3(0xc000034738, 0x458edb, 0xc0000001a0, 0x200000003)
        /home/andrew/gits/libcap/go/vendor/kernel.org/pub/linux/libs/security/libcap/psx/psx_cgo.go:64 +0xa5 fp=0xc000042f08 sp=0xc000042e80 pc=0x481ac5
main.main()
        /home/andrew/gits/libcap/go/mismatch.go:13 +0x2a fp=0xc000042f80 sp=0xc000042f08 pc=0x481c2a
runtime.main()
        /home/andrew/sdk/go1.17.2/src/runtime/proc.go:255 +0x227 fp=0xc000042fe0 sp=0xc000042f80 pc=0x433ce7
runtime.goexit()
        /home/andrew/sdk/go1.17.2/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc000042fe8 sp=0xc000042fe0 pc=0x45ca41

rax    0x0
rbx    0x28bb
rcx    0x7b0ba0cad5a1
[...]
