Let’s talk about something that has caused more arguments
than tabs vs spaces, more confusion than
./configure && make && make install, and
more gray hair than undefined behavior:
native vs cross compilation.
This is not just a technical topic. This is culture. This is
history. This is… mild trauma for distro maintainers.
In the beginning, there was only native
To understand the present landscape, we have to look back to the 90s,
when the foundations were laid. So…
A long time ago in a galaxy far, far away…
(read: before everything was ARM), life was simple.
You had:
- one machine
- one architecture
- one compiler
You wrote code, you compiled it, you ran it.
All on the same machine.
echo -e '#include <stdio.h>\nint main(){printf("hello\\n");}' | cc -x c - -o hello && ./hello
No "build vs host vs target."
No sysroots.
No cross toolchains.
Just vibes… no CI, no containers… Just you, your compiler, and a
questionable amount of confidence.
Then came the rise of the Linux distributions
Early Linux was not a “distribution” in the modern sense.
It was:
- a kernel
- plus a pile of userland tools, mostly from GNU. (Ever wondered
why some people say GNU/Linux? That opens another war story: is
Linux the kernel or the complete system? But we need to focus; that
story is for another time.)
- manually assembled by whoever was brave enough
Installing Linux in the early 90s was less “setup” and more
“ritual”.
But Linux grew, and things changed:
- more users
- more software
- more versions
- more dependencies
Suddenly, maintainers were not compiling one program.
They were compiling:
- hundreds → then thousands → then tens of thousands of packages
And not just once, but:
- for every release
- for every update
- for every security fix
At this point, distributions like Slackware,
Debian, and later Red Hat appeared
with a mission:
Take all this chaos and make it installable, updatable, and
consistent.
Then the landscape shifted quickly, and new requirements emerged.
Binary distribution
Distributions don’t ship only source.
They ship binaries.
So they must ensure:
- the binary you download is:
  - correct
  - consistent
  - built in a known environment
Later a few more guarantees join this list, but we were not at that point
yet.
Reproducibility becomes a necessity, not a luxury
Here is the key shift: distributions needed to be able to
rebuild the same software reliably. “It works on my
machine” was no longer enough.
It seems trivial:
Given this source, we can rebuild the exact package again.
If you’re wondering why this requirement matters, here are a few
reasons:
- Debugging - A user reports “Program crashes” and the
maintainer needs to:
  - rebuild the exact binary
  - reproduce the issue
  - inspect symbols, offsets, behavior
- Security - Security fixes require:
  - rebuilding packages
  - ensuring nothing unexpected changed
- Updates and upgrades - When updating:
  - package A depends on B
  - B changes ABI subtly
  - A must be rebuilt consistently
Reproducibility is not the default state… you have to actively remove
sources of entropy.
For example, even changing only the source filename can be enough to
break bit-for-bit identity:
$ echo -e '#include <stdio.h>\nint main(){printf("hello\\n");}' >1.c
$ echo -e '#include <stdio.h>\nint main(){printf("hello\\n");}' >2.c
$ gcc 1.c -o 1
$ gcc 2.c -o 2
$ md5sum 1 2
6aa0460c6e9d029b5c40f27c86fadfa3 1
ada5719e11d8b6a17f020353f23f12d6 2
$ hexdump -C 1 >1.hex
$ hexdump -C 2 >2.hex
$ diff 1.hex 2.hex
258c258
< 000026d0 65 6e 74 72 79 00 31 2e 63 00 63 72 74 65 6e 64 |entry.1.c.crtend|
---
> 000026d0 65 6e 74 72 79 00 32 2e 63 00 63 72 74 65 6e 64 |entry.2.c.crtend|
Same code, same compiler, different hashes. In this instance, the
mere difference in filenames caused the compiler to embed unique strings
into the binary, breaking bit-for-bit identity.
Once reproducibility became a requirement, the question was no longer
how to build, but where and under what conditions to build.
The solution: controlled build environments
There was a time when builds were done on the “build computer”: a
single machine developers used to build new software. It was untouchable,
because updating or changing anything could break the reproducibility of the
software. It was evidently not a scalable solution, if only because that
machine eventually broke. To solve all of this, distributions invented:
- build roots (clean environments)
- packaging systems (RPM, DEB)
- build farms (many machines building packages)
- strict dependency tracking
The idea was:
Every package is built in a clean, controlled environment.
This is where the build tools covered later in this article, such as
Mock, Koji, and the Debian buildd network, come into play.
Then Linux escaped x86
Linux did not stay on x86.
It spread like a highly portable virus (in a good way):
| Year | Architecture |
|------|--------------|
| 1993 | DEC Alpha |
| 1995 | SPARC |
| 1995 | MIPS |
| 1996 | M68k |
| 1996 | PowerPC |
| 1998 | ARM |
…
Now we had a problem:
What if your build machine is x86, but your target is ARM?
Buying one machine per architecture works… until it doesn’t. And by
the way, a brand-new architecture likely has no native
environment at all… the system does not exist yet.
Enter cross-compilation
Before even moving forward, have you ever stopped to consider what
cross compilation actually is?
If you compile a package for your Fedora system inside a container
running Alpine Linux, are you cross compiling?
In common usage, people often reserve “cross-compilation” for
different architectures. In toolchain terms, however, changing the ABI
(e.g. glibc vs musl) is already enough.
Real-world example: I once hit an underlinking issue
with sem_init() on MIPSEL caused by symbol versioning
differences in glibc. Everything compiled fine, nothing obvious was
missing, but the binary failed at runtime in ways that made no sense
until you looked at the symbol versions.
That’s the kind of problem you only get when ABI assumptions don’t
actually match reality.
In toolchain development, a “Platform” is defined by a
triplet that includes more than just the CPU:
- Architecture (ISA): e.g., x86_64, arm64,
riscv64.
- Operating System: e.g., linux, windows, darwin,
none (bare metal).
- C Library / Environment (ABI): e.g., gnu (glibc),
musl, uclibc, msvc.
If any of these three differ between the machine doing the work and
the machine running the code, you are technically in the
“cross-compilation landscape.”
Back to the original question: even on the same
physical CPU and the same OS (containers
share the host kernel), a compiler running on an Alpine Linux system (which
uses musl) and targeting a Fedora system (which uses
glibc) is technically cross compiling.
If not convinced yet, here are a few points to consider:
- The linker needs to search different library paths.
- The headers define structures (like struct stat) differently.
- The startup code (crt0.o) that initializes the program
before main() is unique to each C library.
Now it is safe to name cross compilation
In a nutshell, cross-compilation means:
I build here, but I target somewhere else.
You now have:
- a compiler that produces aarch64 binaries on x86_64
- a sysroot with aarch64 libraries
- a growing sense of complexity
Which means you can:
aarch64-linux-gnu-gcc hello.c -o hello
Congratulations, you’ve successfully built a binary you can’t run,
can’t test, and aren’t entirely sure is correct.
Oops.
The religion war begins
Two camps emerge:
Native camp
- Reality is the best test.
- If it runs, it works.
- Just build on the target.
Cross camp
- Define your environment properly.
- Stop relying on luck.
- Your build system is broken, not cross.
Both are right.
Both are wrong.
Both have scars.
At some point, every engineer becomes opinionated about this. Not
because they studied it, but because they suffered through it.
Linux distributions prefer native
Most major distributions prefer native builds:
- Fedora / RHEL → Koji + Mock
- Debian → buildd network
- Ubuntu → Launchpad
Why?
Not because cross is evil, but because:
Native builds tolerate messy software better.
And let’s be honest:
A lot of upstream software is messy.
Is there any basis for claiming native is better than cross?
Let’s look at what native supporters claim:
- Cross compilation produces lower-quality executables.
- Cross compilation can fail to configure.
- Cross compilation can pick the wrong configuration, and you only notice
when the binary doesn’t work or crashes.
- Cross compilation implies library pollution.
- Native compilation allows in-place testing.
- Some situations require distributing intermediate products of the
build, and cross compilation produces no usable ones.
And here are the cross supporters’ claims:
- It is not feasible to have a native builder for every architecture
you want to support.
- In the embedded world, targets are typically very small machines, and
building on them takes forever.
- Cross builds can give exactly the same result as native
compilation.
- No bonds or ties: you can build anywhere, for anything.
Let’s address these claims and see how things really are.
Do cross builds produce the same binaries as native builds?
Short answer:
They can, but it won’t typically happen.
It can actually be demonstrated that if cross and native builds use:
- the same toolchain
- a controlled environment
- and the secret sauce (compiler flags)
the result can be bitwise identical binaries.
Here comes the fun part: the same toolchain and a controlled environment
are not enough on their own to get a bitwise identical executable. You need to
add the ‘secret sauce’ and control the sources of non-determinism to
approach reproducible builds.
But the important point is that for “identical”
.text and .data, the same toolchain and a controlled
environment can be enough.
Common reasons why binaries can differ despite same toolchain and
controlled environment are:
- Timestamp differences: the build time can be
embedded in ELF comments, and
__DATE__ and
__TIME__ can be used by both the compiler and user code.
- Hostnames / usernames: embedded in debug info and
sometimes in build notes.
- Compiler fingerprints: compilers are like dogs,
they mark every place they pass through, which means: linker version, build IDs,
etc.
- File ordering: this one is subtle and evil.
make collects object files, the filesystem returns them in some
order, and the linker consumes them in that order. Changing the file creation
order can change the binary layout.
- Parallel build non-determinism: modern builds are
heavily parallelized (make -j, LTO, linker jobs). Jobs
don’t always finish in the same order, and neither does the linker.
Sometimes, your binary depends on which core was faster that day.
If you want identical binaries, you need to remove as many sources of
entropy as possible.
Part of the entropy can be handled with compiler and linker flags,
using what I called the secret sauce.
| Source of entropy | Flag or variable | Effect |
|---|---|---|
| Timestamps | SOURCE_DATE_EPOCH | Instead of using “now”, compilers will use the timestamp you provide. |
| Timestamps | -Wno-builtin-macro-redefined | Allows you to manually redefine __DATE__ and __TIME__ via -D flags if you can’t use SOURCE_DATE_EPOCH. |
| Polluting sections | -fno-ident | Tells the compiler not to emit the .ident directive that records the compiler version in the .comment section. |
| Build paths | -fdebug-prefix-map=/old/=/new/ | Replaces absolute paths in debug info with a generic string. __FILE__ is handled by -fmacro-prefix-map, and -ffile-prefix-map covers both. |
But some sources of non-determinism live in the build system itself,
and require discipline, not just flags.
Does this cross-compiler configuration problem really exist?
Before we can understand why people distrust cross-compilation, we
need to talk about a legendary creature: Autotools.
Specifically:
- autoconf
- automake
- ./configure
If you’ve ever typed ./configure && make, you’ve already met it.
So, what problem was Autotools solving?
To understand that, we need to go back to the 90s when there was
no:
- standardized Linux
- consistent libc behavior
- reliable compiler features
- unified system APIs
Instead, you had a nice zoo:
- different UNIX systems
- different compilers
- different headers
- different kernel quirks
Back then it was common to have to answer questions like:
- Does this system have strlcpy?
- Is long 32 or 64 bits?
- Does malloc(0) behave sanely?
- Is fork() broken here?
And the answer was often: maybe.
Autotools solved the problem by saying:
Don’t guess. Ask the system.
But instead of asking politely, they do this:
- Generate a tiny C program
- Compile it
- Run it
- Observe the result
- Decide how to build the real software
Example:
int main(void) {
    return sizeof(long) == 8 ? 0 : 1;
}
If it runs and returns 0 then we know long is 64-bit.
It was clever, for its time.
Because documentation was unreliable, headers lied and systems were
inconsistent.
But as you might guess, this model breaks down under cross compilation:
you build on x86, but your target is ARM, to name one.
Autotools would compile a test program… and then try to run
it on a machine that cannot execute it.
This is the origin of some of the native supporters’ claims about
configuration.
Question is: are we still at that point?
The answer? Spin the wheel, because Autotools has
evolved, but it hasn’t entirely escaped its roots.
Current Autotools can:
- Use cached answers: instead of running the
code above, it can use
ac_cv_sizeof_long=8
- Be cross-aware: telling it
--host=aarch64-linux-gnu disables some runtime checks and
switches to safer assumptions
- Replace execution with knowledge: many checks were
rewritten to inspect headers and use known architecture traits
But sometimes, when execution is not possible, it still just… guesses…
and this is where subtle bugs, wrong assumptions, and
architecture-specific breakage come from.
When the system can’t answer at build time, we can either guess… or
we can simulate the answer. So sometimes, if you want to cross
build, you also need to emulate with QEMU.
With QEMU in place:
- Probe executables can run successfully under emulation
- Configuration becomes more accurate and less dependent on
assumptions
- Complex or runtime-dependent checks become feasible again
This effectively restores the original Autotools probing model, even
in cross environments.
Library pollution: what is this thing?
Native supporters claim that cross builds suffer from library
pollution.
During a cross build, the build system accidentally links against
host libraries instead of target libraries.
So you think you built for ARM (or another target), but headers came
from one place, libraries came from another and the result is
inconsistent or outright broken.
Making it less “pollution” and more “you accidentally linked against
the wrong universe”.
Library pollution is usually discussed in the context of
cross-compilation, but the underlying issue is broader:
it’s really about insufficient isolation between dependency
domains.
At its core, library pollution happens whenever: the compiler and
linker pick up inconsistent headers and libraries from different
environments.
That can occur in:
- cross builds (host vs target)
- native builds (system vs custom prefixes, multiple versions,
etc.)
The difference is not whether it happens, but how it fails.
The real root cause: isolation failure
The key variable is:
how well your build environment is isolated
This can involve:
- sysroots
- prefix separation (/usr vs /usr/local vs /opt/…)
- environment variables
- build system discipline
When isolation breaks, pollution becomes possible…
regardless of cross/native.
But then, why does only the cross build get the spotlight?
Cross builds make the problem obvious because architectures differ
and binaries are often not runnable on the host. So when pollution
happens, the linker may fail immediately, or the binary won’t even start or
crashes instantly.
In practice the failure is loud and early… and you can’t deploy your
package.
But beware: when it happens in native builds, things can
be even worse, because headers and libraries are often compatible enough
to compile, ABI differences may be subtle, and symbol versions may
partially match.
The result is silent inconsistency: the executable can work, and perhaps
only break in a few corner cases.
Cross vs Native pollution chart:
| Aspect | Cross | Native |
|---|---|---|
| Architecture mismatch | Yes | No |
| Build failure likelihood | High | Low |
| Runtime failure | Immediate | Delayed / conditional |
| Debug difficulty | Moderate | Very high |
| Typical symptom | Won’t start / crashes instantly | Sporadic crashes |
In a native build, all generated tools, scripts, and metadata are
produced for the same environment in which they will later be used. This
makes it possible to package and reuse the entire build context, such as
kernel headers, symbol information, and helper binaries… for downstream
tasks like out-of-tree module compilation.
In contrast, cross builds split the world into host and target
components.
While the final binaries target the desired architecture, many
intermediate tools are built for the host and are not portable. This
makes the resulting build artifacts harder to redistribute or reuse in a
different environment, especially without reconstructing the original
build setup.
A concrete example is the kernel (a few others may exist). When you
build something like the Linux kernel, you’re not just producing
the kernel image.
You’re also generating a whole ecosystem of intermediate tools and
metadata, such as:
- scripts/ utilities (e.g. modpost,
genksyms)
- generated headers (include/generated/*)
- configuration state (.config,
Module.symvers)
- build glue for modules (kbuild infrastructure)
These are not optional; they are required for
building out-of-tree modules later.
Some of these artifacts are binaries targeting the build
architecture. Because in the users’ world out-of-tree (OOT) modules are built on the
machine that is going to use them, cross-build artifacts are not
usable there.
Testing
Testing is one of the weak points, if not the weakest, of the cross
build. In a native build, it’s easy: just run the tests and you’re done.
In cross builds, on the other hand, testing is possible, but you
need to work a little.
The use of QEMU user-mode emulation, in particular qemu-user-static,
provides a practical way to execute binaries built for a foreign
architecture directly on a host system. This capability is especially
valuable in cross-compilation workflows, where executing target binaries
is otherwise not possible.
In a traditional cross-compilation setup, test binaries produced
during the build process cannot be executed because they target a
different architecture than the host. This limitation affects both:
- Test suites (unit and integration tests)
- Build-time probes, such as those performed by GNU Autoconf
By integrating QEMU user-mode with the Linux binfmt_misc mechanism,
foreign binaries can be executed transparently. When combined with a
target root filesystem (sysroot), this allows:
- Running test suites for the target architecture without dedicated
hardware
- Executing Autotools-generated probe binaries, enabling runtime
checks (e.g. via AC_RUN_IFELSE) that would otherwise fail
during cross-compilation
As a result, builds can behave similarly to native builds, reducing
the need for manual overrides such as cached configuration
variables.
Execution under QEMU user-mode is generally slower than native
execution due to instruction translation. However:
- The overhead is often acceptable for build-time testing.
- In some cases, the difference is negligible compared to overall
build time.
- When targeting inherently slower architectures, the perceived
slowdown may be less significant, and in some constrained comparisons
the host may still outperform real hardware.
Despite this, performance-sensitive or timing-dependent tests may
exhibit different behavior under emulation.
While QEMU simplifies execution, it introduces additional setup
requirements:
- Configuration of binfmt_misc to associate foreign
binaries with the appropriate QEMU interpreter
- Provision of a complete target root filesystem, including:
  - Dynamic linker
  - Shared libraries
  - Correct filesystem layout
This effectively shifts complexity from build configuration
(e.g. Autotools cross-compilation issues) to environment
preparation.
Summary of Cross vs Native build
This is a side-by-side summary of the two approaches. No
judgement is passed on either choice: native vs cross is a religion, and I don’t
want to be hostile to either.
| Aspect | Native | Advantage | Cross | Notes |
|---|---|---|---|---|
| Artifact reproducibility | Reproducible | = | Reproducible | Both build methods can lead to reproducible artifacts. |
| Configuration problems | Mature | <== | Minor problems exist | Historically, Autotools used host execution to guess the configuration. In a cross environment this is not possible, but newer versions of Autotools support cross-build environments through caching. Other approaches use qemu-user-static to make some complicated Autotools setups work in cross environments. |
| Binary artifact quality | Same | = | Same | If set up correctly, artifacts can be bitwise identical. |
| Build environment pollution | Bad setup | = | Bad setup | Not a cross-build-specific issue; it is more a sysroot isolation problem. Failing at it, though, can have worse consequences in native than in cross. |
| Build product testing | Easy | <== | Complex | Testing a build product means executing it under some conditions. In a native environment executing the artifact is trivial; in a cross environment it is not, unless you cheat with qemu-user-static. |
| Build performance (speed) | Variable | ==> | Optimal | It is a fact: some minor architectures are slow. IoT devices can target slow architectures where building can be challenging. |
| Per-architecture builder | Issues | ==> | None | A Linux distro targeting many architectures with native compilation needs at least one worker per architecture, and this scales badly. |
Linux build systems overview
As a last step, let’s look at some of the more common build systems that
distros use to build Linux.
| Build system | Used by | Build model | Key technologies | Notes |
|---|---|---|---|---|
| Koji | Fedora, RHEL, CentOS | Native (per-arch workers) | RPM, Mock, chroot, distributed scheduler | Centralized build farm; builds SRPMs on target-arch machines |
| Mock | Fedora/RHEL (via Koji) | Native (isolated) | chroot, RPM | Creates clean build roots; prevents host contamination |
| Open Build Service | openSUSE, SLE | Mostly native (multi-arch workers), supports cross | RPM/DEB, chroot, distributed builds | Public multi-distro build platform |
| Debian buildd network | Debian | Native | dpkg, debhelper, buildd daemons | Distributed autobuilders per architecture |
| Launchpad | Ubuntu | Native | dpkg, chroot/containers | CI + packaging + build farm |
| Arch Build System | Arch Linux | Native (mostly) | PKGBUILD, makepkg | Simple recipe-based builds; less centralized |
| Portage | Gentoo | Native (user builds), supports cross | ebuild, bash | Source-based; builds happen on user machine |
| Buildroot | Embedded (various) | Cross (primary) | Make, Kconfig | Builds full rootfs + toolchain; minimal, deterministic |
| Yocto Project / BitBake | Embedded, industrial distros | Cross (primary) | BitBake, metadata layers | Highly configurable, large-scale embedded builds |
| OpenWrt build system | OpenWrt | Cross | Make, custom infra | Router-focused; cross toolchains + package feeds |
| Nix | NixOS, others | Hybrid (native + cross) | functional DSL, isolated store | Strong reproducibility model |
| GNU Guix | Guix System | Hybrid (native + cross) | Scheme, functional builds | Similar to Nix, emphasizes purity |
How’s the future?
Newer ecosystems like Rust or Go make cross-compilation feel almost…
reasonable.
They remove a lot of the historical friction, but they don’t change
the fundamentals. You still need to control your environment, your
dependencies, and your assumptions. The difference is that now the
toolchain fails less creatively.
Native vs cross is still a debate. The difference is that now we have
better tools… and the same old problems.