Bug 204125

Summary: FTBFS on ppc64 big endian and gcc9 because of -mcall-aixdesc and missing __linux__
Product: Platform Specific/Hardware
Reporter: Daniel Kolesa (linux)
Component: PPC-64
Assignee: platform_ppc-64
Status: RESOLVED INVALID
Severity: high
CC: segher
Priority: P1
Hardware: PPC-64
OS: Linux
Kernel Version: any
Subsystem:
Regression: No
Bisected commit-id:

Description Daniel Kolesa 2019-07-10 13:24:01 UTC
On ppc64 big endian, the kernel builds with `-mcall-aixdesc` which since gcc 9.x removes `__linux__` from the list of macros being defined. This behavior is supposed to be more correct (as it's in this case nothing but a hack, the flag should apparently only be used when building for AIX) but sadly it breaks build since several things within the tree rely on `__linux__` being defined and `#ifdef` some of their code based on said macro.

Just removing `-mcall-aixdesc` (and using just `-mabi=elfv1`) is however not enough, as that instead causes countless undefined references to just about every symbol when linking `vmlinux`. It would seem that `-mcall-aixdesc` changes the way symbols are declared in a way that is not expected.

Little endian is not affected because that one uses `-mabi=elfv2` exclusively.

For now I worked around it in my distro by explicitly adding `-D__linux__` in the kbuild where `-mcall-aixdesc` is inserted into flags, and it works, but that's obviously just a workaround.
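
One way to check the macro difference described above is to dump the compiler's predefined macros for each set of flags and look for `__linux__`. The following is a minimal sketch, not part of the original report: the helper names are made up for illustration, and the compiler name in the usage comment is an assumed cross-toolchain name that will differ between systems. It relies only on the `-dM -E` preprocessor options that GCC provides.

```python
import re
import subprocess

# Matches the macro name at the start of a "#define NAME ..." line.
DEFINE_RE = re.compile(r"^#define\s+(\w+)")


def parse_defines(dump):
    """Return the set of macro names found in `-dM -E` style output."""
    names = set()
    for line in dump.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def predefined_macros(cc, flags):
    """Run the compiler on an empty input and collect its predefined macros.

    `cc` must be a compiler that accepts the given flags; `-mcall-aixdesc`
    and `-mabi=elfv1` are only accepted by PowerPC targets.
    """
    result = subprocess.run(
        [cc, *flags, "-dM", "-E", "-x", "c", "/dev/null"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_defines(result.stdout)


# Usage (assumed compiler name, adjust for the local toolchain):
#   cc = "powerpc64-linux-gnu-gcc"
#   "__linux__" in predefined_macros(cc, ["-mabi=elfv1"])
#   "__linux__" in predefined_macros(cc, ["-mabi=elfv1", "-mcall-aixdesc"])
#   "__linux__" in predefined_macros(
#       cc, ["-mabi=elfv1", "-mcall-aixdesc", "-D__linux__"])
```

Comparing the three results shows whether a given compiler version drops the macro under `-mcall-aixdesc`, and whether the `-D__linux__` workaround restores it.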

I'm not sure what the proper fix would be.

Also, is there any reason not to allow an ELFv2 kernel to be built on big endian? There are some supposed performance benefits, and ELFv2 itself supports either endianness. The current kbuild logic pretty much forces ELFv1 for big endian and ELFv2 for little endian.
Comment 1 Segher Boessenkool 2019-07-10 15:37:36 UTC
(In reply to Daniel Kolesa from comment #0)
> On ppc64 big endian, the kernel builds with `-mcall-aixdesc` which since gcc
> 9.x removes `__linux__` from the list of macros being defined.

This is a bug.  Please report at https://gcc.gnu.org/bugzilla .

> This behavior
> is supposed to be more correct (as it's in this case nothing but a hack, the
> flag should apparently only be used when building for AIX)

What makes you think that?

OTOH, why does the kernel use that option?

> but sadly it
> breaks build since several things within the tree rely on `__linux__` being
> defined and `#ifdef` some of their code based on said macro.

Those are bugs as well, then.

> Just removing `-mcall-aixdesc` (and using just `-mabi=elfv1`) is however not
> enough, as that instead causes countless undefined references to just about
> every symbol when linking `vmlinux`. It would seem that `-mcall-aixdesc`
> changes the way symbols are declared in a way that is not expected.

> Little endian is not affected because that one uses `-mabi=elfv2`
> exclusively.

Of course, that is the only defined ABI for powerpc64le after all.
 
> Also, is there any reason not to allow an ELFv2 kernel to be built on big
> endian?

Building it _on_ BE works just fine, of course.  But you mean building a BE
kernel using the ELFv2 ABI.  This is not supported; it would require writing
other versions for various low-level things.

ELFv2 is not supported in BE userland, either, btw.

> There are some supposed performance benefits, and ELFv2 itself
> supports either endianness. The current kbuild logic pretty much forces
> ELFv1 for big endian and ELFv2 for little endian.

ELFv2 has a few little benefits; it is newer, there were lessons learnt.  It
would be surprising if it has better than trivial advantages for the BE kernel
though.  But feel free to try, of course :-)
Comment 2 Daniel Kolesa 2019-07-10 15:41:06 UTC
ELFv2 works perfectly fine in BE userland: musl libc *requires* ELFv2 on both endiannesses, and glibc works okay using either. ELFv2 was defined for both endiannesses, and there are distros that make use of it on BE (Adélie Linux supports only BE with musl libc and ELFv2; Void Linux has both BE and LE on musl and glibc, all using ELFv2).
Comment 3 Daniel Kolesa 2019-07-10 15:57:53 UTC
Also, reported in gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91135

Let's see what the compiler people have to say...
Comment 4 Segher Boessenkool 2019-07-10 16:01:17 UTC
I meant GNU userland.  I don't know any project that officially supports
BE ELFv2.  No BE ELFv2 Linux ABI is defined, either, as far as I know.

It's great to hear that a lot of it works fine though :-)
Comment 5 Daniel Kolesa 2019-07-10 16:04:49 UTC
I have an entire distro built with it. A small number of things require minor patches. Some of these have been upstreamed, and some are pending (for example, making the OpenSSL assembly work on BE/ELFv2 requires about 5 lines of changes to pass the whole testsuite, and a PR for that is up). Glibc did not work about a year ago, I think; these days it works perfectly fine, and we generally have no major issues with any software that already worked on BE in the first place.
Comment 6 Daniel Kolesa 2019-07-12 01:41:58 UTC
This appears to be the actual reason why the kernel fails to link without `-mcall-aixdesc`:

<smaeul> specifically it's the -mcall-aixdesc that's problematic. but removing it breaks recordmcount.pl, because nm is looking in the wrong section for symbols
<smaeul> so it fails to recognize static symbols and the kernel fails to link
<smaeul> (quick hack is to update the local-symbol regex from "t" to "d" in recordmcount.pl)
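
The effect of that one-letter change can be illustrated with a small sketch. It assumes, as the chat log suggests, that the static function symbols in question show up in `nm` output with type `d` rather than `t`. The pattern below only has the same general shape as a local-symbol regex and is not copied from recordmcount.pl; the symbol names and addresses are invented sample data.

```python
import re


def local_symbol_regex(type_letter):
    """Build a pattern for `nm` lines of the form '<hex addr> <type> <name>'.

    The type letter is matched case-sensitively, so 't' selects local text
    symbols and 'd' selects local data symbols.
    """
    return re.compile(
        r"^[0-9a-fA-F]+\s+" + re.escape(type_letter) + r"\s+(\.?\S+)"
    )


def local_symbols(nm_output, type_letter):
    """Return the names of symbols whose nm type letter matches."""
    pattern = local_symbol_regex(type_letter)
    names = []
    for line in nm_output.splitlines():
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical nm output, for illustration only.
SAMPLE_NM = """\
0000000000000000 d static_helper
0000000000000018 D exported_entry
0000000000000000 t other_local
"""

print(local_symbols(SAMPLE_NM, "t"))  # ['other_local']
print(local_symbols(SAMPLE_NM, "d"))  # ['static_helper']
```

With the `t` pattern, `static_helper` is never seen as a local symbol, which matches the failure mode described in the log; switching the letter to `d` picks it up.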
Comment 7 Daniel Kolesa 2019-07-12 02:11:54 UTC
Btw, it turns out an ELFv2 BE kernel requires little to no changes; these two commits produce a working kernel:

https://github.com/smaeul/linux/commit/7a9d26b7be68c21fd1be524ee4bf797d7b8c3c37
https://github.com/smaeul/linux/commit/c972894da682eff8905d9dbc7efd1bd0f1051bbf
Comment 8 Daniel Kolesa 2019-07-12 15:36:17 UTC
I am using this patch on my machines now: https://gist.github.com/q66/625cbec5d7317829a302773f89533b51 . It seems to work well.
Comment 9 Daniel Kolesa 2019-09-08 14:57:35 UTC
Well, gcc 9.2 doesn't have this problem anymore, so I guess this can be closed here...