Bug 216610 - libpsx does not work when building with cgo but without libc
Status: ASSIGNED
Alias: None
Product: Tools
Classification: Unclassified
Component: libcap
Hardware: All Linux
Importance: P3 low
Assignee: Andrew G. Morgan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-19 19:24 UTC by Günther Noack
Modified: 2023-02-12 05:10 UTC

See Also:
Kernel Version: -
Subsystem:
Regression: No
Bisected commit-id:


Description Günther Noack 2022-10-19 19:24:03 UTC
This is an obscure use case, but one user of go-landlock (which depends on the Go psx package) had to revert their adoption of it because their program is built *with* cgo, but *without* libc.

Quoting from https://github.com/gokrazy/gokrazy/commit/d5cbca1b9083f8067be3c432b2c9faaf78956c35

> landlock depends on psx. When cgo is enabled, psx uses a cgo
> implementation, which is undesired in setups which use cgo
> (e.g. scan2drive with turbojpeg), but don’t come with a libc:
> the dhcp client doesn’t start, which means the device is no
> longer reachable over the network.

Do you think it would be feasible to make Go libpsx usable in such a configuration? It seems that this would only require a change to the build constraints, so that the choice of implementation can be overridden for this case.

(Brief derail about build constraints:

For reference, the build constraint syntax for Go 1.16 (and earlier):
https://pkg.go.dev/cmd/go@go1.16#hdr-Build_constraints

Since Go 1.17 there is a new "//go:build" syntax for build constraints which is a bit more boolean-expression-like, but you'll still need to keep the old "// +build" syntax if you want to stay compatible with Go 1.11. The command "gofmt -w *.go" generates the equivalent new-style build constraint next to the old one.

)
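
To illustrate the two constraint syntaxes, here is a minimal sketch using the standard go/build/constraint package (available since Go 1.16). It is not libpsx code. The guard expression is an assumption, "psxgo" is only the tag name suggested later in this thread, and the helper name `selected` is hypothetical. It prints the old-style form of a new-style line and shows how an extra tag would switch a cgo-only file off.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// selected reports whether a file guarded by the given build-constraint line
// would be compiled when exactly the listed tags are satisfied.
func selected(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	// Hypothetical guard for a cgo-only file. "psxgo" is the tag proposed
	// in this bug, not one that psx is known to define.
	const guard = "//go:build linux && cgo && !psxgo"

	expr, err := constraint.Parse(guard)
	if err != nil {
		panic(err)
	}
	// Convert to the old "// +build" form needed by older Go releases.
	oldStyle, err := constraint.PlusBuildLines(expr)
	if err != nil {
		panic(err)
	}
	for _, l := range oldStyle {
		fmt.Println(l)
	}

	// Evaluate the guard with and without the override tag.
	for _, tags := range [][]string{
		{"linux", "cgo"},
		{"linux", "cgo", "psxgo"},
	} {
		ok, err := selected(guard, tags...)
		if err != nil {
			panic(err)
		}
		fmt.Printf("tags=%v cgo variant selected: %v\n", tags, ok)
	}
}
```

With a guard of this shape, building with "-tags psxgo" would exclude the cgo file even when cgo is enabled. Whether the remaining pure Go variant can actually work in a cgo-linked binary is a separate question.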
Comment 1 Andrew G. Morgan 2022-10-20 03:03:28 UTC
I think I need a bit more info here.

- Prior to go1.16 there was no way to build the "psx" Go package without linking it against the [lib]psx C library because the Go runtime itself couldn't support what "psx" does. I fixed that upstream in go1.16 to permit a "nocgo" build variant.

- So far as I can tell, "cgo" compilation forces the Go runtime to replace the raw "sys_clone" thread implementation, used by "nocgo" code, with a "C.pthread_create" version.

I'm not clear how it is possible to compile "cgo" without linking with the pthread library (which is effectively an alias for libc these days). psx works in cgo mode by wrapping the pthreads functions.

Perhaps what I'm looking for is a simple example of cgo linking that does not use libc. I can dig into what magic is at work in terms of linking and assess whether or not I can understand this request.

All that being said, can you also confirm that:

  CGO_ENABLED=0 go build ...

works or fails for this case?
Comment 2 Michael Stapelberg 2022-10-21 09:06:54 UTC
Hello!

I am the person who ran into the issue :)

(In reply to Andrew G. Morgan from comment #1)
> I'm not clear how it is possible to compile "cgo" without linking with the
> pthread library (which is effectively an alias for libc these days). psx
> works in cgo mode by wrapping the pthreads functions.
> 
> Perhaps what I'm looking for is a simple example of cgo linking that does
> not use libc. I can dig into what magic is at work in terms of linking and
> assess whether or not I can understand this request.

I am using cgo to call into a .syso object file of libturbojpeg: https://github.com/stapelberg/turbojpeg/commit/108fc19f61baf993240d0cbe502476201a59feff

The .syso file is entirely self-contained and contains no references to libc symbols.

I am also building my Go code with the build tags netgo and osusergo, which make the net and os/user packages not use cgo (which otherwise brings in a libc dependency).

The compiled programs are then run on a Raspberry Pi running Linux, where no C userland is present (no libc): https://gokrazy.org/

In the case of psx, Go links against libc (available on the host), but the program cannot start (not available on the target).

I realize this is an unusual setup, but I think being able to select the implementation is good practice for cases like these and potentially others. Take a look at https://github.com/golang/go/commit/62f0127d8104d8266d9a3fb5a87e2f09ec8b6f5b for an example where a similar build tag was introduced in Go itself. My suggestion would be to introduce a psxgo build tag.

> 
> All that being said, can you also confirm that:
> 
>   CGO_ENABLED=0 go build ...
> 
> works or fails for this case?

Building with CGO_ENABLED=0 does work correctly. gokrazy builds with CGO_ENABLED=0 by default, which is why I did not run into this issue with psx on the first few devices I updated. It is only more advanced setups that use compute kernels like turbojpeg with cgo (for my https://github.com/stapelberg/scan2drive appliance) that run into this problem.

Thanks for considering
Comment 3 Andrew G. Morgan 2022-10-21 14:51:13 UTC
Are you referring to the "C code without cgo" trick explained here?

   https://zchee.github.io/golang-wiki/GcToolchainTricks/

Or something more sophisticated?

The "compute kernel" comment concerns me. Do these kernels generate/manage their own threads in some way?
Comment 4 Michael Stapelberg 2022-10-21 20:25:33 UTC
> Are you referring to the "C code without cgo" trick explained here?

Yes.

> The "compute kernel" comment concerns me. Do these kernels generate/manage
> their own threads in some way?

I’m just using the term to refer to a library that does computation without side effects and library dependencies. Examples of such libraries are encoders like turbojpeg or compressors like zlib.
Comment 5 Andrew G. Morgan 2022-10-22 00:23:17 UTC
OK. So adding a psxgo tag would be a small change for me to do.

Can you share a build log failure from the CGO_ENABLED=0 build though? I want to understand how and what makes this "more advanced" build setup fail when used that way.
Comment 6 Michael Stapelberg 2022-10-22 11:50:46 UTC
The build actually succeeds (libc is available on the host), but the binary cannot be started on the target. The error message is "no such file or directory", because Linux cannot find libc on the target. 

This is particularly noticeable because the program in which I wanted to introduce psx is the DHCP client, so the failure mode is that the Raspberry Pi won’t bring up the network :)
Comment 7 Andrew G. Morgan 2022-10-22 15:04:54 UTC
That is what is confusing to me. If CGO_ENABLED=0, I don't see how the existing build tags in the psx package are not sufficient to prevent the cgo version of the code from being part of the build.

The piece that worries me is this code in Go's runtime package (which is the backend for the syscall.AllThreadsSyscall() that "psx" uses):

https://cs.opensource.google/go/go/+/master:src/runtime/os_linux.go;l=716-720?q=allthreadssyscall&ss=go%2Fgo

...
func syscall_runtime_doAllThreadsSyscall(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
	if iscgo {
		// In cgo, we are not aware of threads created in C, so this approach will not work.
		panic("doAllThreadsSyscall not supported with cgo enabled")
	}
...

I don't think there is anything like this in the "net" or "os/user" packages.
Comment 8 Andrew G. Morgan 2022-10-23 03:47:17 UTC
I've been exploring things, trying to generate an example. I can't get a standalone C function, called via cgo, to work without linking against libc...

I note in your 

https://github.com/stapelberg/turbojpeg/commit/108fc19f61baf993240d0cbe502476201a59feff

that you are explicitly using cgo. Given that you have removed psx, can you tell me what the output of

 $ ldd your-program-binary

is on the "target"? When you try it with psx, what does that command print?

Thanks
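
On a target with no C userland, ldd itself may not be available. As an alternative, the same information can be read on the build host with Go's debug/elf package. This is a minimal sketch and the helper name `dynamicDeps` is hypothetical. It reports the ELF interpreter (the dynamic loader the kernel must find before the program can start) and the shared libraries the binary asks for.

```go
package main

import (
	"debug/elf"
	"fmt"
	"io"
	"os"
	"strings"
)

// dynamicDeps returns the ELF interpreter path (empty when there is no
// PT_INTERP segment) and the libraries listed as DT_NEEDED for the file.
func dynamicDeps(path string) (interp string, libs []string, err error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", nil, err
	}
	defer f.Close()

	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		data, err := io.ReadAll(p.Open())
		if err != nil {
			return "", nil, err
		}
		// The interpreter path is stored NUL-terminated.
		interp = strings.TrimRight(string(data), "\x00")
	}

	libs, err = f.ImportedLibraries()
	if err != nil {
		return "", nil, err
	}
	return interp, libs, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: elfdeps BINARY")
		os.Exit(2)
	}
	interp, libs, err := dynamicDeps(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if interp == "" && len(libs) == 0 {
		fmt.Println("no interpreter and no needed libraries: statically linked")
		return
	}
	fmt.Println("interpreter:", interp)
	for _, l := range libs {
		fmt.Println("needed:", l)
	}
}
```

A binary that lists an interpreter or a libc entry here needs those files on the target. If they are missing, that would be consistent with the "no such file or directory" error reported when starting the program.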
Comment 9 Andrew G. Morgan 2022-10-23 22:43:38 UTC
FWIW I've not been able to provide a path to supporting "psxgo" (because of the panic() noted above), but I did create a self-contained example of calling some C code from a CGO_ENABLED=0 build, which invokes the Go native "psx" build options:

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=08d48b659aa59d2a5acd9cd13f640f6497718796

This is lacking arm target support, but patches welcome. :^)
Comment 10 Andrew G. Morgan 2022-10-30 03:46:54 UTC
I've spent a bit of time trying to understand why the above support didn't seem to work when I compiled the `.syso` file with `gcc -O3 ...`.

I found a relative addressing subtlety on x86_64 (aka amd64): `gcc` adds some sort of relative addressing cross linkage between the `.text` and `.rodata.c*` sections. I suspect the Go (internal) linker does not support this, so the relative address offset is silently dropped.

I've developed a workaround for this problem involving running `sed` over `gcc` generated assembly files. This enhancement is included in:

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=0d528688fe40e9703463b27f27c4dbe485e229a0

which now builds the Fibonacci Number compute kernel with `gcc -O3`, and importantly generates the correct result! The un-optimized compilation does not seem to need this relative addressing because it uses immediate data and not relatively addressed data.

Given this workaround and knowing to search for `R_X86_64_PC32` in Go's bug tracker, I found a pre-existing bug that mentioned what appears to be this exact linkage problem. So I briefly explained this workaround there:

https://github.com/golang/go/issues/24321#issuecomment-1296084103

I'm probably doing something verboten, but perhaps that bug discussion will help show the path to the correct fix?

I plan to enhance this contrib/bug216610 example to be able to build something for a Raspberry Pi. If I can get that to work, I'll declare this bug resolved. (As I noted above, if someone wants to contribute that code first, that will hurry this along.)
Comment 11 Andrew G. Morgan 2023-02-03 03:16:04 UTC
Adding a note. It has been a while since the last libcap release. I've not resolved this present bug to my satisfaction, but the contrib code is at least stable. I'll pick this up again sometime after releasing 2.67.
Comment 12 Andrew G. Morgan 2023-02-12 05:10:07 UTC
This commit adds cross compilation support for arm (32 bit):

https://git.kernel.org/pub/scm/libs/libcap/libcap.git/commit/?id=7e41da10505189b8dbee93b25dea1dfb07a89d9b

I don't have a handy arm64 system yet, but I'll keep this bug open until I've been able to verify this support works there too.
