Bug 217768 - Build error
Summary: Build error
Status: NEW
Alias: None
Product: Tools
Classification: Unclassified
Component: Trace-cmd/Kernelshark (show other bugs)
Hardware: All Linux
: P3 normal
Assignee: Default virtual assignee for Trace-cmd and kernelshark
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-07 10:53 UTC by Douglas RAILLARD
Modified: 2024-10-28 12:26 UTC (History)
2 users (show)

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Douglas RAILLARD 2023-08-07 10:53:45 UTC
Building trace-cmd with this recipe on Alpine 3.18 (musl libc) results in a build error:

#! /bin/bash

ALPINE_VERSION=v3.18
ALPINE_BUILD_DEPENDENCIES=(bash gcc git make linux-headers musl-dev pkgconfig bison flex zstd-dev zstd-static)
BROKEN_CROSS_COMPILATION=1

download() {
    git clone git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git
    git -C trace-cmd checkout trace-cmd-v3.2

    git clone https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git
    git -C libtraceevent checkout libtraceevent-1.7.3

    git clone https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git
    git -C libtracefs checkout libtracefs-1.7.0
}

build_libtraceevent() {
    cd libtraceevent
    ../trace-cmd/make-trace-cmd.sh install
}

build_libtracefs() {
    cd libtracefs
    ../trace-cmd/make-trace-cmd.sh install
}

build_tracecmd() {
    # Disable libaudit and python support
    cd trace-cmd
    # All variables need to be exported, NOT SET ON THE CLI of make-trace-cmd.sh
    # itself. Otherwise they will internally conflict with existing ones.
    export LDFLAGS="-static" NO_AUDIT=yes NO_PYTHON=yes CFLAGS="-O3"
    ./make-trace-cmd.sh install install_libs
    strip "$TRACE_CMD_BIN"
}

TRACE_CMD_BIN=tracecmd/trace-cmd

build() {
    export PYTHON_VERS=python3
    export INSTALL_PATH="$(pwd)/installed_lib_dir"
    (build_libtraceevent) && (build_libtracefs) && (build_tracecmd)
}



Build warnings and error:

Making x86_64 trace-cmd build in: /tmp/tmphy9chhbx/x86_64/source

  UPDATE                 /tmp/tmphy9chhbx/x86_64/source/libtraceevent/ep_version.h
tracefs-perf.c: In function 'perf_event_open':
tracefs-perf.c:34:16: warning: implicit declaration of function 'syscall' [-Wimplicit-function-declaration]
   34 |         return syscall(__NR_perf_event_open, event, pid, cpu, group_fd, flags);
      |                ^~~~~~~
tracefs-perf.c: In function 'perf_mmap':
tracefs-perf.c:37:23: warning: implicit declaration of function 'getpagesize'; did you mean 'tep_get_page_size'? [-Wimplicit-function-declaration]
   37 | #define MAP_SIZE (9 * getpagesize())
      |                       ^~~~~~~~~~~
tracefs-perf.c:44:32: note: in expansion of macro 'MAP_SIZE'
   44 |         perf_mmap = mmap(NULL, MAP_SIZE,
      |                                ^~~~~~~~
tracefs-perf.c: In function 'perf_read_maps':
tracefs-perf.c:63:42: warning: implicit declaration of function 'getpid' [-Wimplicit-function-declaration]
   63 |         fd = perf_event_open(&perf_attr, getpid(), cpu, -1, 0);
      |                                          ^~~~~~
tracefs-perf.c:69:17: warning: implicit declaration of function 'close'; did you mean 'pclose'? [-Wimplicit-function-declaration]
   69 |                 close(fd);
      |                 ^~~~~
      |                 pclose
flex -o sqlhist-lex.c sqlhist.l
  UPDATE             tfs_version.h
Have ZSTD compression support
/usr/lib/gcc/x86_64-alpine-linux-musl/12.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: /tmp/tmphy9chhbx/x86_64/source/installed_lib_dir/usr/lib64/libtracefs.a(tracefs-utils.o): in function `strstrip':
/tmp/tmphy9chhbx/x86_64/source/libtracefs/src/tracefs-utils.c:324: multiple definition of `strstrip'; /tmp/tmphy9chhbx/x86_64/source/trace-cmd/tracecmd/trace-stat.o:trace-stat.c:(.text+0x200): first defined here
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:75: /tmp/tmphy9chhbx/x86_64/source/trace-cmd/tracecmd/trace-cmd] Error 1
make: *** [Makefile:408: trace-cmd] Error 2
Comment 1 Douglas RAILLARD 2023-08-09 11:54:15 UTC
After some digging strstrip is defined both in trace-cmd:

  ./tracecmd/trace-stat.c:40:char *strstrip(char *str)

and in libtracefs:
  
  ./src/tracefs-utils.c:323:__hidden char *strstrip(char *str)

In libtracefs.a the symbol is defined as such by readelf:

      44: 0000000000000990   159 FUNC    GLOBAL HIDDEN     1 strstrip

"__hidden" in libtracefs would make the symbol only available to the linked binary including the tracefs-utils.o object file. This probably works fine for dynamic linking, since that binary is libtracefs.so and nothing else in it defines strstrip().

However during static linking, tracefs-utils.o ends up more or less being dumped wholesale in the binary linking to it, at which point it conflicts with the definition in trace-stat.c. The fact that one definition is STV_HIDDEN does not change the fact that both are STB_GLOBAL and as such conflict.


There are two paths to fix the issue:

  * The use of __hidden to control scoping is broken and a static function should be used instead (or rename it). This will ensure an STB_LOCAL symbol and the behavior will be identical between static and dynamic linking.

  * The objects file ending up in the static archive need a post-processing pass to remove any STV_HIDDEN symbol (or turn them from STB_GLOBAL into STB_LOCAL). This can be achieved with "objcopy --localize-hidden libtracefs.a". The symbol is turned to STB_LOCAL and as such will not conflict with anything when linking. It will also not be available to other compilation units in libtracefs.a itself which makes that equivalent to using "static" in C:

       21: 0000000000000990   159 FUNC    LOCAL  HIDDEN     1 strstrip


And indeed, compiling with such option leads to:

  /tmp/tmpfek7roba/x86_64/source/libtracefs/src/tracefs-dynevents.c:778: undefined reference to `get_tep_event'

With:

  ./src/tracefs-events.c:971:__hidden struct tep_event *get_tep_event(struct tep_handle *tep,
  ./src/tracefs-dynevents.c:778:  return get_tep_event(tep, dynevent->system, dynevent->event);


This is because __hidden get_tep_event() (STB_GLOBAL + STV_HIDDEN) got converted to (STB_LOCAL + STV_HIDDEN), which essentially turns it to "static get_tep_event()" and as such is not available to any compilation unit other than tracefs-events.o


So since there is no way to get a visibility-based namespace in C that works for libs with more than one ELF file in it (i.e. static lib with more than one .o), the options are:

  1. Change the name of one of the definitions.
  2. Move that function to a 3rd library and make the other 2 depend on it.
  3. pre-link all the object files in libtracefs.a into a single .o


This patch implements 3. for libtracefs and fixes static build of trace-cmd. A similar patch would be needed for libtraceevent and libtracecmd even though there is currently no symbol conflict (by chance):

-- >8 --

From a0deaedd52dc49548d6e0ccce262fbc3a29fa267 Mon Sep 17 00:00:00 2001
From: Douglas Raillard <douglas.raillard@arm.com>
Date: Wed, 9 Aug 2023 11:44:47 +0100
Subject: [PATCH] libtracefs: Fix ELF symbol binding in static lib

libtracefs currently uses __attribute__((visibility ("hidden"))) in
order to make some functions globally available inside libtracefs, but
hidden to the outside world.

This gives such function the ELF binding STB_GLOBAL and visibility
STV_HIDDEN.

This works for shared objects, as all the object files are linked into a
single .so and therefore the only consumers of symbols will be other
components linking to libtracefs.so (e.g an executable or another .so).

However, this is problematic when libtracefs is used as a static
archive: the archive is made of a collection of .o, and the consumers of
such __hidden function are other object files. There is unfortunately no
distinction made by the linkers between other object files in the same
archive and object files from another component. This leads to
STV_HIDDEN being essentially useless and it results in linking errors if
the symbol happens to be redefined elsewhere with STB_GLOBAL (which does
happen with libtracecmd).

In order to fix that, we can pre-link the object files in the static
archive to resolve all references to such __hidden function, and then
turn STB_GLOBAL + STV_HIDDEN into STB_LOCAL + STV_HIDDEN. This way, the
symbol becomes local and will be ignored when linking to another
component, restoring the same behavior as when a dynamic lib is used.
---
 Makefile         | 3 +++
 scripts/utils.mk | 4 +++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 9f377e9..74898bf 100644
--- a/Makefile
+++ b/Makefile
@@ -29,10 +29,13 @@ endef
 # Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
 $(call allow-override,CC,$(CROSS_COMPILE)gcc)
 $(call allow-override,AR,$(CROSS_COMPILE)ar)
+$(call allow-override,OBJCOPY,$(CROSS_COMPILE)objcopy)
 $(call allow-override,PKG_CONFIG,pkg-config)
 $(call allow-override,LD_SO_CONF_PATH,/etc/ld.so.conf.d/)
 $(call allow-override,LDCONFIG,ldconfig)
 
+export AR CC OBJCOPY PKG_CONFIG LD_SO_CONF_PATH LDCONFIG
+
 EXT = -std=gnu99
 INSTALL = install
 
diff --git a/scripts/utils.mk b/scripts/utils.mk
index 4d0f8bc..c725339 100644
--- a/scripts/utils.mk
+++ b/scripts/utils.mk
@@ -69,7 +69,9 @@ do_build_static_lib =                         \
        if [ -f $@ ]; then                      \
            mv $@ $@.rm; $(RM) $@.rm;           \
        fi;                                     \
-       $(AR) rcs $@ $^)
+       $(LD) -r $^ -o prelink.o;               \
+       $(AR) rcs $@ prelink.o;                 \
+       $(OBJCOPY) --localize-hidden $@)
 
 do_compile_shared_library =                    \
        ($(print_shared_lib_compile)            \
-- 
2.25.1
Comment 2 Steven Rostedt 2023-08-09 23:46:07 UTC
On Wed, 09 Aug 2023 11:54:15 +0000
bugzilla-daemon@kernel.org wrote:

https://bugzilla.kernel.org/show_bug.cgi?id=217768

--- Comment #1 from Douglas RAILLARD (douglas.raillard@arm.com) ---

> So since there is no way to get a visibility-based namespace in C that works
> for libs with more than one ELF file in it (i.e. static lib with more than
> one
> .o), the options are:

Hi Douglas,

Thanks for looking into this and debugging it. Yeah, I figured that
__hidden was going to bite me sooner or later. :-/


> 
>   1. Change the name of one of the definitions.
>   2. Move that function to a 3rd library and make the other 2 depend on it.
>   3. pre-link all the object files in libtracefs.a into a single .o
> 
> 
> This patch implements 3. for libtracefs and fixes static build of trace-cmd.
> A
> similar patch would be needed for libtraceevent and libtracecmd even though
> there is currently no symbol conflict (by chance):

Actually, I prefer #1. #2 is out of the question. But trace-cmd actually
depends on hidden functions being accessible during linking. That is, the
trace-cmd binary uses __hidden functions in libtracecmd. The __hidden()
just means that they should not be relied on.

Perhaps we need to make all __hidden functions have a prefix. Hmm, perhaps just add:

 "_tracecmd_" or "_tracefs_" to the front of it?

That is, for strstrip() in trace-cmd, rename it to: _tracecmd_strstrip()

and for libtracefs: _libtracefs_strstrip()

by convention, anything that starts with "tracecmd_" or "tracefs_" should
be exposed and considered API. We can have that anything with "_tracecmd_"
or "_tracefs_" is to be hidden and not part of the API.

-- Steve
Comment 3 Steven Rostedt 2023-08-09 23:47:48 UTC
On Wed, 9 Aug 2023 19:46:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Perhaps we need to make all __hidden functions have a prefix. Hmm, perhaps
> just add:
> 
>  "_tracecmd_" or "_tracefs_" to the front of it?
> 
> That is, for strstrip() in trace-cmd, rename it to: _tracecmd_strstrip()
> 
> and for libtracefs: _libtracefs_strstrip()

That should have been: _tracefs_strstrip()
Comment 4 Douglas RAILLARD 2023-08-10 10:19:04 UTC
Hi Steven,

> That is, the trace-cmd binary uses __hidden functions in libtracecmd. The
> __hidden() just means that they should not be relied on.

How does it currently work ? If the symbol is STV_HIDDEN in a DSO no linker or loader will use the definition in libtracecmd to provide a value for an undefined symbol in e.g. an executable.

Similarly, anything hidden in e.g. libtracefs will never be accessible from the outside world (unless linked statically as I experienced, but that breaks the build)

Other than that it would definitely simplify building, as the visibility shenanigans can be made to work with a Makefile as everything is "manual", but might be more tricky to apply in other build systems.


There is another hybrid option that might be overkill:

  * Define functions using a macro "void HIDDEN(myfunc)(char *param)"

  * For static builds, the macro expands to:
    #define HIDDEN(func) _thehiddenprefix_##func

  * For DSO builds, it expands to:
    #define HIDDEN(func) __attribute__((visbility("hidden"))) func

So that ensures that no-one can depend on anything private in the case where it's possible to swap a DSO for another one for a patch/minor upgrade, and for the static link updating would require a rebuild anyway so it's less important to have an airtight ABI. From a build system point of view nothing needs to be changed.



Douglas
Comment 5 Steven Rostedt 2023-08-11 18:13:48 UTC
Note, trace-cmd only links to the static libtracecmd.a.

As for libtracefs, nothing (but the testing and sample code) links to it statically and needs access to these functions.
Comment 6 Steven Rostedt 2023-08-11 18:16:36 UTC
How do other C libraries use functions between files within the library without exposing them to the rest of the world?
Comment 7 Douglas RAILLARD 2023-08-14 09:41:52 UTC
> How do other C libraries use functions between files within the library
> without exposing them to the rest of the world?

I guess they are just publicly exposed in the ABI for basic projects (and just not documented). Otherwise they are probably playing tricks like I've shown, or they use static inline functions for small utilities. AFAIR you can also filter the public ABI with symbol versioning system, but that only applies to DSO.

Having a look at e.g. zlib:

* The zmemcpy function has both ZLIB_INTERNAL marker and a lib prefix ("z"):
  void ZLIB_INTERNAL zmemcpy(dest, source, len)
  https://github.com/madler/zlib/blob/04f42ceca40f73e2978b50e93806c2a18c1281fc/zutil.c#L151

* But this function has no prefix:
  void ZLIB_INTERNAL inflate_fast(strm, start)
  https://github.com/madler/zlib/blob/04f42ceca40f73e2978b50e93806c2a18c1281fc/inffast.c#L50

* Then they also use the visibility trick when the platform supports it:
  #ifdef HAVE_HIDDEN
  #  define ZLIB_INTERNAL __attribute__((visibility ("hidden")))
  #else
  #  define ZLIB_INTERNAL
  #endif


On Ubuntu 20.04, I have:

    $ readelf -a /usr/lib/x86_64-linux-gnu/libz.a
    30: 0000000000000000     0 NOTYPE  GLOBAL HIDDEN   UND inflate_fast

And the symbol does not exist in the DSO at all, so it basically works like the current state of trace-cmd, which is, it doesn't and will blow up if someone tries to inflate_fast().

Otherwise in other languages the problem is solved for good with a module/namespace system.
Comment 8 Douglas RAILLARD 2024-07-29 10:10:42 UTC
Hi Steven,

Is there any update on this one ?

I checked the git history and there seems to be support for meson, but that probably does not fix the issue. So far we haven't needed to update our binary in the last year but if we do we will be in a pickle. Do we have a path out of the current situation ?

Cheers,
Douglas
Comment 9 Steven Rostedt 2024-07-29 19:16:00 UTC
> Is there any update on this one ?
> 
> I checked the git history and there seems to be support for meson, but that
> probably does not fix the issue. So far we haven't needed to update our
> binary
> in the last year but if we do we will be in a pickle. Do we have a path out
> of
> the current situation ?

Would the renaming of the problem functions work for you?

I believe that was the only acceptable solution you proposed.

-- Steve
Comment 10 Douglas RAILLARD 2024-07-31 11:35:12 UTC
After re-reading the discussion, it looks like it would work indeed. I also noticed this:

> But trace-cmd actually
> depends on hidden functions being accessible during linking. That is, the
> trace-cmd binary uses __hidden functions in libtracecmd. The __hidden()
> just means that they should not be relied on.

and 

> Note, trace-cmd only links to the static libtracecmd.a.
> As for libtracefs, nothing (but the testing and sample code) links to it
> statically and needs access to these functions.

So if I understand correctly, only libtracecmd.a is using __hidden in a way
that allows another component (trace-cmd binary) to use its hidden symbols.

Other uses such as in libtracefs should use a local symbol, either by using a
static function in header or by using __hidden and then used
"objcopy -localize-hidden libtracefs.a" in static builds (this will happen by
default for a DSO build according to [1]).  

So that would mean that:

  * libtracecmd.a needs to get its symbols renamed, since they are part of the
	(private) API of the component and used by other components. Only a naming
	convention can allow that at this level. 

  * libtracefs and others could retain the use of __hidden if they use objcopy
	for static builds. However, static functions or a naming convention is the
	more typical way to do that.
	

[1] https://unix.stackexchange.com/questions/701556/elf-symbol-globalhidden


Douglas

Note You need to log in before you can comment on or make changes to this bug.