211919 – Feature request: expose go libcap api modifying a single thread.

Bug 211919 - Feature request: expose go libcap api modifying a single thread.

Summary: Feature request: expose go libcap api modifying a single thread.

Status:	RESOLVED CODE_FIX

Alias:	None

Product:	Tools
Classification:	Unclassified
Component:	libcap (show other bugs)
Hardware:	All Linux

Importance:	P1 low
Assignee:	Andrew G. Morgan

URL:
Keywords:

Depends on:
Blocks:

Reported:	2021-02-24 22:50 UTC by Gregory Fuchedzhy
Modified:	2021-03-11 06:25 UTC (History)
CC List:	1 user (show)

See Also:
Kernel Version:	all
Subsystem:
Regression:	No
Bisected commit-id:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Description Gregory Fuchedzhy 2021-02-24 22:50:31 UTC

Hey, I'm using capset/capget syscalls in a go program to execute other programs in network namespaces, the approach is:

> go func() { 
>   runtime.LockOSThread() // exclusively bind the goroutine to a single os
>   thread.
>   unix.Capget/Capset
>   unix.Setns // change netns of the os thread
>   exec.Start/Wait/Run/etc
>   // exit goroutine without unlocking to kill the os thread
> } ()

I would really like to use libcap instead of naked syscalls. But it is not possible right now for my use case for several reasons:

> 

 - I need to raise capabilities into the effective set to call setns. However I don't want to do that for the whole process.

> 

- Launch() API also wouldn't work for me since it modifies capabilities right before fork/exec, the callback is called with current capabilities, so I can't do setns there. This was clearly designed for the use case of dropping capabilities. Also I would like to use standard go exec package to do io with child processes the standard way instead of relying on limited forkexec.

> 

So I was wondering if it would be a good idea to either expose API with cap.singlesc syscaller underneath. Or have an API similar to Launch() but without fork/exec. Instead it would accept a callback only(or two? pre/post cap change). This would give the user a clean API to run whatever they need in a modified thread hiding runtime.LockOSThread ugliness from the user.

> 

The start of the discussion: https://github.com/golang/go/issues/44312

Comment 1 Andrew G. Morgan 2021-02-25 01:22:37 UTC

I'm not sure I follow the desire to not do this for the whole process. The "cap" package will ensure you avoid subtle runtime races when you do this:

  orig := cap.GetProc()

  raised, _ := orig.Dup()
  raised.SetFlag(cap.Effective, true, cap.FOO, cap.BAR, ...)
  raised.SetProc()
  launcher.Launch()
  orig.SetProc()


This is a serious question for me.

The background for why I am curious is as follows:

If you try to do things inside the launch callback, you will run into a race condition with two separate launchers running simultaneously and, potentially, the rest of the process trying to do something else entirely. As it is, the cap package is quietly working around simultaneous capability manipulation going on in the Go runtime code that would otherwise interact badly with the cap package's uses.

Generally, Go's runtime model doesn't want you to think about the OS threads at all. Indeed, Go has its own distinct notion of procs and goroutines neither of which are the same as OSThreads.

The runtime.LockOSThread() thing is a hack for cgo library call compatibility and pure Go isn't supposed to need it. The longstanding lack of POSIX semantics support for security relevant things is where the workaround of locking threads and hacking with [Raw]Syscall()s has become common.

Comment 2 Andrew G. Morgan 2021-02-25 15:42:39 UTC

Can you sketch the full syscall sequence you are trying to get to work? I feel like I'm missing some subtlety that would help me understand. Thanks!

Comment 3 Gregory Fuchedzhy 2021-02-25 20:08:18 UTC

My program executes other unprivileged programs in parallel in a set of network namespaces. It may control the programs by writing to their stdin or to named pipes in the fs.

Nothing requires elevated privileges in this setup except starting a program in a network namespace. My program executable has cap_sys_admin+p file bit. I only raise cap_sys_admin to the effective set briefly to call setns to start a program in a net namespace.

I'm locking and then destroying the thread, because I don't want other goroutines(that may be doing other things, like opening files/named pipes) to have cap_sys_admin effective even briefly, as that could be potentially exploited.

I understand your argument about a race condition for the use case of dropping capabilities. But for the use case of raising capabilities it goes the other way. Raising capabilities for the whole process introduces a race condition. Because suddenly between raised.SetProc() and orig.SetProc() syscalls from other threads may have cap_sys_admin effective, which they are not supposed to.

I also understand the desire to not use runtime.LockOSThread() and never think about os threads. But goroutines (as any other non-trivial abstraction) are leaky abstractions :(

Launch like callback API could help hiding the leak, although still only to some extent. Because if callback starts a new goroutine it would have an original set of capabilities, which is bad for the use case of dropping capabilities, but fine for raising capabilities.

Should the API only allow raising per-thread capabilities and return an error if you try to drop them?

Comment 4 Andrew G. Morgan 2021-02-26 02:06:44 UTC

If the launcher had a method that you could use to list the effective capabilities you wanted raised for the duration of executing the callback function, would that address your need?

Comment 5 Gregory Fuchedzhy 2021-02-26 22:21:24 UTC

I think so, although I also don't want Launch to actually fork/exec at the end, since I'm going to do that myself from the callback.

Not sure what would be a good way to do that from the API point of view.

Comment 6 Andrew G. Morgan 2021-02-27 00:50:06 UTC

Is that just because you want to redirect stdin/out/err or something else?

There is no way to control which capabilities a go routine started from within the callback will have. Such a goroutine is randomly assigned an M from those already running, or one already in use.

If you set the capabilities you want for the callback before you call Launch() then all goroutines will inherit that state. Trying to micromanage them from within the callback is going to get tricky fast. It is for this reason we don't currently support special capabilities for the callback. I'm trying to make sure we think through the use case fully before committing to some API change/extension.

Comment 7 Andrew G. Morgan 2021-02-27 00:51:54 UTC

s/or one/or created from one/

Comment 8 Gregory Fuchedzhy 2021-03-01 19:41:44 UTC

> Is that just because you want to redirect stdin/out/err or something else?
I want to redirect stdin/out/err.

> Trying to micromanage them from within the callback is going to get tricky
> fast
I agree. This only works for simple callbacks, like in my use case.

Comment 9 Andrew G. Morgan 2021-03-01 20:32:20 UTC

The callback mechanism should currently support redirecting these file descriptors. See 

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/tree/cap/launch.go#n166

in the callback, you should be able to modify this slice to meet your needs.

Comment 10 Andrew G. Morgan 2021-03-04 04:49:09 UTC

Gregory,

Where are we this? I've been exploring allowing the callback to perform cap.Fn()s that write privilege/capability state. Specifically, in a way that will magically only apply to the callback goroutine itself. I think I have a path to get there.

Notably, with what I'm thinking, goroutines started by the callback function itself will share capability state with the rest of the process and not the callback function. They will also block the moment they try to change capability state until after the launch has happened.

I'm unclear, however, if you still see this general ability as necessary? If so, can you confirm that you think an implementation of the above type will work for your use case?

Thanks

Andrew

Comment 11 Gregory Fuchedzhy 2021-03-04 05:36:21 UTC

Hey Andrew,

Your proposal of managing capabilities in the callback sounds great! However, considering I'd have to move from exec package to custom process launch API, I might be better off locking the thread and calling unix.Capget myself instead.

I'd really love to see a more generic version of such an API (callback only, no fork/exec). Launching processes doesn't seem like the primary purpose of the cap package. Some users might need capabilities for something else entirely, not launching processes.

What do you think?

Comment 12 Andrew G. Morgan 2021-03-04 06:23:53 UTC

On other OSes, setuid and privileged operations in general execute on process-shared state: POSIX semantics. libcap/cap's threading model is trying to honor POSIX semantics for security state and hide the complexity of OS Threads, as Go tries to do itself for everything else (that isn't cgo).

The problem is that ForkExec functionality and Go are very tricky because fork() plays really poorly with all of the Go runtime's assumptions that threads are interchangeable. At some point during such a launch, the child diverges from the parent's security state. The tricky parts are all around how you propagate launch status (PID etc) back to the security context of the main parent process.

Recognizing that the launching process is special, libcap/cap provides this Launch mechanism.

I guess I am still open to changes that make it more useful, but I'm not really open to requiring users of libcap/cap to manually do runtime.LockOSThread() to use its API. That would effectively break the POSIX semantics core rationale for creating the package.

Comment 13 Gregory Fuchedzhy 2021-03-04 17:30:49 UTC

> requiring users of libcap/cap to manually do runtime.LockOSThread() to use
> its API

Sorry, I wasn't clear, that's not what I'm proposing.
I don't want users to lock threads themselves, I want cap to do that.
Basically I want what you proposed in comment10, but without mandatory fork/exec at the end. Maybe something like:

  cap.RunWithSeparateSecurityState(func() {
    // cap does all the magic, it launches this callback
    // in a separate go routine exclusively bound to an OS thread
    // that would be destroyed after the callback.
    cap.Fn() // Affects this callback only.

    ...exec.Start, open files, etc, do whatever is required.
  })

The difference with what you proposed in comment10 is that generic API doesn't do fork/exec at the end, because the user did everything they need in the callback.

Do you think that would work?

Comment 14 Andrew G. Morgan 2021-03-05 02:13:16 UTC

How would the process management handle get communicated to the parent in this case?

Comment 15 Andrew G. Morgan 2021-03-05 05:37:04 UTC

I guess you can do that via the data interface{} argument.

How about this for an API?:

  cap.FuncLauncher(fn func(interface{}) error) *Launcher

You can use it to prepare a *Launcher with no program to ForkExec(), and all it does is run the callback fn in a disposable security context. You can inline it as follows:

  if _, err := cap.FuncLauncher(func (data interface{}) error {

     // whatever you want.
     return anyError

  }).Launch(yourData); err != nil {
     log.Fatalf("that didn't work: %v", err)
  }

Comment 16 Gregory Fuchedzhy 2021-03-05 20:24:18 UTC

> the process management handle
I'm not sure what that is, could you explain? I didn't understand why `data interface{}` parameter is necessary. Callback can capture any required data, right?

I'm also not sure about callback return value. Would Launch() return both internal cap errors(if any) and the error comming from the callback itself? Do we want to separate those?
Should returning errors(and any other result values) from the callback be out of cap's scope? E.g. callback returns nothing, and API lets the author of the callback to decide if they want to write results to a channel or populate captured variables or smth else.
Or do you think it would be good to propagate an error as probably any use case would require handling errors?

Comment 17 Andrew G. Morgan 2021-03-05 20:43:58 UTC

From libcap/cap's perspective the .Launch() either succeeds or fails. It can fail for cap.* reasons or callbackFn() reasons, but it communicates whatever the reason via a single err value.

The data interface{} opaque object is a communication mechanism between your code and the callbackFn invocation. You can cast it to whatever type pointer you want to use inside the callbackFn, which could include a pre-opened channel, or even a channel opened in the callback and written into that casted data value.

Ex. 

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/tree/goapps/gowns/gowns.go#n92

Comment 18 Gregory Fuchedzhy 2021-03-05 21:23:56 UTC

Sounds great!

> communication mechanism
yeah I get that, I was just wondering if there was a reason to use the argument instead of letting the callback capture values.
But I then realized that using the argument you could run multiple Launch() calls in parallel on a single Launcher object. That is possible, right?

Comment 19 Andrew G. Morgan 2021-03-06 14:59:46 UTC

Yes, that is the idea. Such a callback would block whole process cap activity, but launches look like they can work in parallel.

Comment 20 Andrew G. Morgan 2021-03-07 05:25:10 UTC

I've pushed this commit which implements cap.FuncLauncher():

  https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=cb1f3fd143b95f9be6605d69256ae0438c83b84d

Please see if it works for you. I've tagged the very next commit, which adds a couple of convenience funtions cap.Prctl{,w}(...), 

I've added a tag for this pre-release module:

  https://pkg.go.dev/kernel.org/pub/linux/libs/security/libcap/cap@v1.2.49-rc3#FuncLauncher

If you want to try it and confirm it works for your use case you should refer to the cap module in your go.mod file as:

require kernel.org/pub/linux/libs/security/libcap/cap v1.2.49-rc3

Comment 21 Andrew G. Morgan 2021-03-08 14:48:44 UTC

All that being said, I suspect exec.Cmd ... Start etc is not going to actually work out for what you want to do.

I recall the reason I went with syscall.ForkExec for backing the launcher was it runs in the context of the goroutine that invokes it. I've not looked into exec.Cmd and friends much but, because the goroutine that requests one of these objects is not terminated as part of the API, I have to assume the Start is implemented with a goroutine or two of its own. One that won't get the security context you want.

Comment 22 Gregory Fuchedzhy 2021-03-09 23:38:52 UTC

Hey Andrew, I tried your changes and they work perfectly for my use case! Thanks a lot! I'm very happy with the current state and don't have any major comments, just a few nits/questions/suggestions below:

1. It feels a bit weird calling SetProc to change callback capabilities, as the name suggests whole process change. I understand renaming a function to a more generic Set would break the API. But I wonder if it would make sense to have a different function(SetLaunch?) and to require the user of the API to be explicit: calling SetProc in the callback fails and calling SetLaunch outside of the callback fails. Or add a generic Set function that works in both contexts, but let SetProc fail if called from the callback.

If you think it is better to keep a single function for both use cases, I think SetProc documentation should be updated to mention launcher use case.

2.
> // goroutines will block if they attempt to change privilege state.
I think it would be good to clarify here that only global process state change would block, while it is fine to have multiple parallel Launch()es each changing its own local capabilities.

3.
> return details from the callback functions exection.
typo execUtion


4. I wonder if the code could be simplified a bit by replacing 3 states(idle,active,blocked) with a boolean isBlocked. len(scwTIDs) could be used to indicate active/non-active state. In fact, you could go one step further and use the mutex itself instead of isBlocked.
E.g., something like this:

  // this one is called in launch and Launcher.Launch instead of scwSetState
  func scwSetActive(tid int, active bool) {
  	scwMu.Lock()
  	if active {
  		scwTIDs[tid] = true
  	} else {
  		delete(scwTIDs, tid)
  	}
  	scwCond.Broadcast()
  	scwMu.Unlock()
  }

  // don't need to return old state, the state is determined
  // by the state of the mutex and len(scwTIDs)
  func scwLockAndGetSC() *syscaller {
  	sc := multisc
  	scwMu.Lock()
  	for {
  		if len(scwTIDs) == 0 {
  			break
  		}
  		runtime.LockOSThread()
  		if scwTIDs[syscall.Gettid()] {
  			sc = singlesc
  			// note, we don't runtime.UnlockOSThread()
  			// here because we have no reason to ever
  			// allow this thread to return to normal use -
  			// we need it dead before we can return to the
  			// launchIdle state.
  			break
  		}
  		runtime.UnlockOSThread()
  		scwCond.Wait()
  	}
  	// do not unlock the mutex.
  	return sc
  }

Then in SetProc and friends:
  func (c *Set) SetProc() error {
  	defer scwMu.Unlock()
  	return scwLockAndGetSC().setProc(c)
  }

Comment 23 Andrew G. Morgan 2021-03-11 04:08:56 UTC

1. I've updated the documentation for cap.SetProc.

2. Done.

3. thanks!

4. I think this would mean we can't allow launchers to do syscalls in
   parallel, which will ultimately cause things to unnecessarily serialize
   on kernel calls. As it is the code allows two launchers to launch mostly in
   parallel.

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=d1445dd1e18bc3e662090d71bd08eb8d1db4c534

Comment 24 Gregory Fuchedzhy 2021-03-11 06:02:36 UTC

> unnecessarily serialize on kernel calls
I believe current implementation does that as well. That's why I assumed it is not a big deal. Is it?
Currently scwStateSC() changes state to launchBlocked which makes all other scwStateSC() block until the syscall is done. So calls from other launchers can't get syscaller even if currently active syscaller is singlesc.

Comment 25 Andrew G. Morgan 2021-03-11 06:25:30 UTC

True for the callback launched functions, but not for the internal "cap" functions - they can ignore the lock because they know they are never competing with external calls into the package once the state is active.

Note You need to log in before you can comment on or make changes to this bug.