Let’s talk about something that has caused more arguments
than tabs vs spaces, more confusion than
./configure && make && make install, and
more gray hair than undefined behavior:
native vs cross compilation
This is not just a technical topic. This is culture. This is history. This is… mild trauma for distro maintainers.
In the beginning, there was only native
To understand the present landscape, we have to look back to the 90s, when the foundations were laid. So…
A long time ago in a galaxy far, far away…
(read: before everything was ARM), life was simple.
You had:
- one machine
- one architecture
- one compiler
You wrote code, you compiled it, you ran it.
All on the same machine.
```shell
echo -e '#include <stdio.h>\nint main(){printf("hello\\n");}' | cc -x c - -o hello && ./hello
```
No "build vs host vs target." No sysroots. No cross toolchains.
Just vibes… No CI, No containers… Just you, your compiler, and a questionable amount of confidence.
Then came the rise of the Linux distributions
Early Linux was not a “distribution” in the modern sense.
It was:
- a kernel
- plus a pile of userland tools (mostly from GNU). (Ever wondered why “GNU/Linux”? That opens the door to another war story: what is Linux, the kernel or the complete system? But we need to focus; that story is for another time.)
- manually assembled by whoever was brave enough
Installing Linux in the early 90s was less “setup” and more “ritual”.
But Linux grew, and things changed:
- more users
- more software
- more versions
- more dependencies
Suddenly, maintainers were not compiling one program.
They were compiling:
- hundreds → then thousands → then tens of thousands of packages
And not just once, but:
- for every release
- for every update
- for every security fix
At this point, distributions like Slackware, Debian, and later Red Hat appeared with a mission:
Take all this chaos and make it installable, updatable, and consistent.
Then things started to evolve: the landscape quickly changed, and new requirements emerged.
Binary distribution
Distributions don’t (only) ship source.
They ship precompiled binaries.
So they must ensure that the binary you download is:
- correct
- consistent
- built in a known environment
Later a few more requirements will be added to these, but we’re not at that point yet.
Reproducibility becomes a necessity, not a luxury
Here is the key shift: distributions needed to be able to rebuild the same software reliably. “It works on my machine” was no longer enough.
It seems trivial:
Given this source, we can rebuild the exact package again.
If you’re wondering why this requirement matters, here are a few reasons:
- Debugging: a user reports “Program crashes”, and the maintainer needs to:
  - rebuild the exact binary
  - reproduce the issue
  - inspect symbols, offsets, behavior
- Security: security fixes require:
  - rebuilding packages
  - ensuring nothing unexpected changed
- Updates and upgrades: when updating:
  - package A depends on B
  - B changes ABI subtly
  - A must be rebuilt consistently
Reproducibility is not the default state… you have to actively remove sources of entropy.
For example, even changing only the source filename can be enough to break bit-for-bit identity:
```shell
$ echo -e '#include <stdio.h>\nint main(){printf("hello\\n");}' > 1.c
$ echo -e '#include <stdio.h>\nint main(){printf("hello\\n");}' > 2.c
$ gcc 1.c -o 1
$ gcc 2.c -o 2
$ md5sum 1 2
6aa0460c6e9d029b5c40f27c86fadfa3  1
ada5719e11d8b6a17f020353f23f12d6  2
$ hexdump -C 1 > 1.hex
$ hexdump -C 2 > 2.hex
$ diff 1.hex 2.hex
258c258
< 000026d0  65 6e 74 72 79 00 31 2e  63 00 63 72 74 65 6e 64  |entry.1.c.crtend|
---
> 000026d0  65 6e 74 72 79 00 32 2e  63 00 63 72 74 65 6e 64  |entry.2.c.crtend|
```
Same code, same compiler, different hashes. In this instance, the mere difference in filenames caused the compiler to embed unique strings into the binary, breaking bit-for-bit identity.
Once reproducibility became a requirement, the question was no longer how to build, but where and under what conditions to build.
The solution: controlled build environments
There were times when builds were done on the “build computer”: a machine developers used to build new software. It was untouchable; updating or changing anything could break the reproducibility of the software. It was evidently not a scalable solution, if only because that machine eventually broke. To solve all of this, distributions invented:
- build roots (clean environments)
- packaging systems (RPM, DEB)
- build farms (many machines building packages)
- strict dependency tracking
The idea was:
Every package is built in a clean, controlled environment.
This is where tools like Mock and the Debian buildd infrastructure come into play.
Then Linux escaped x86
Linux did not stay on x86.
It spread like a highly portable virus (in a good way):
| Year | Arch |
|---|---|
| 1993 | DEC Alpha |
| 1995 | SPARC |
| 1995 | MIPS |
| 1996 | M68k |
| 1996 | PowerPC |
| 1998 | ARM |
…
Now we had a problem:
What if your build machine is x86, but your target is ARM?
Buying one machine per architecture works… until it doesn’t. And by the way, a brand-new architecture likely has no native environment at all: the system does not exist yet.
Enter cross-compilation
Before even moving forward, have you ever stopped to consider what cross compilation actually is?
If you compile a package using a container running Alpine Linux on your Fedora machine, are you cross compiling?
In common usage, people often reserve “cross-compilation” for different architectures. In toolchain terms, however, changing the ABI (e.g. glibc vs musl) is already enough.
Real-world example: I once hit an underlinking issue with `sem_init()` on MIPSEL caused by symbol versioning differences in glibc. Everything compiled fine, nothing obvious was missing, but the binary failed at runtime in ways that made no sense until you looked at the symbol versions.
That’s the kind of problem you only get when ABI assumptions don’t actually match reality.
The Three Pillars of a Platform
In toolchain development, a “Platform” is defined by a triplet that includes more than just the CPU:
- Architecture (ISA): e.g., x86_64, arm64, riscv64.
- Operating System: e.g., linux, windows, darwin, none (bare metal).
- C Library / Environment (ABI): e.g., gnu (glibc), musl, uclibc, msvc.
If any of these three differ between the machine doing the work and the machine running the code, you are technically in the “cross-compilation landscape.”
Back to the original question: even if you are on the same physical CPU, and even the same OS (containers share the kernel), a compiler running on an Alpine Linux system (which uses musl) targeting a Fedora system (which uses glibc) is technically cross compiling.
If not convinced yet, here are a few points to consider:
- The linker needs to search different library paths.
- The headers define structures (like `struct stat`) differently.
- The startup code (`crt0.o`) that initializes the program before `main()` is unique to each C library.
Now it is safe to define cross compilation
In a nutshell, cross-compilation means:
I build here, but I target somewhere else.
You now have:
- a compiler that produces aarch64 binaries on x86_64
- a sysroot with aarch64 libraries
- a growing sense of complexity
Which means you can:
```shell
aarch64-linux-gnu-gcc hello.c -o hello
```
Congratulations, you’ve successfully built a binary you can’t run, can’t test, and aren’t entirely sure is correct.
Oops.
The religion war begins
Two camps emerge:
Native camp
- Reality is the best test.
- If it runs, it works.
- Just build on the target.
Cross camp
- Define your environment properly.
- Stop relying on luck.
- Your build system is broken, not cross.
Both are right.
Both are wrong.
Both have scars.
At some point, every engineer becomes opinionated about this. Not because they studied it, but because they suffered through it.
Linux distributions prefer native
Most major distributions prefer native builds:
- Fedora / RHEL → Koji + Mock
- Debian → buildd network
- Ubuntu → Launchpad
Why?
Not because cross is evil, but because:
Native builds tolerate messy software better.
And let’s be honest:
A lot of upstream software is messy.
Is there any foundation to the claim that native is better than cross?
Let’s see the Native supporters claim:
- Cross compilation produces lower-quality executables.
- Cross compilation can fail to configure.
- Cross compilation can use the wrong configuration, and you only realize it when the binary won’t work or crashes.
- Cross compilation implies library pollution.
- Native compilation allows in-place testing.
- There can be situations where you need to distribute intermediate products of the build, and cross compilation produces no usable intermediate artifacts.
And also Cross supporters’ claims:
- It is not feasible to have a native builder for every architecture you want to support.
- In the embedded world, targets are typically very small machines; building on them would take forever.
- Cross builds can give the exact same result as native compilation.
- No bonds or ties: you can build anywhere, for anything.
Let’s address these claims and let’s see how things really are.
Do cross builds produce same binaries as native builds?
Short answer:
They can, but it typically won’t happen by default.
It can actually be demonstrated that if cross and native builds use:
- the same toolchain
- a controlled environment
- the secret sauce (compiler flags)
the result can be bitwise-identical binaries.
Here comes the fun part: the same toolchain and a controlled environment are not enough to get a bitwise-identical executable. You might need to add the “secret sauce” and control sources of non-determinism to approach reproducible builds.
But the important point is that for identical `.text` and `.data`, the same toolchain and a controlled environment can be enough.
Common reasons why binaries can differ despite same toolchain and controlled environment are:
- Timestamp differences: the build time is embedded in ELF comments, and `__DATE__`/`__TIME__` can be used by both the compiler and user code.
- Hostnames / usernames: embedded in debug info and sometimes in build notes.
- Compiler fingerprints: compilers are like dogs, they pee where they step, which means: linker version, build IDs, etc.
- File ordering: this one is subtle and evil. `make` collects object files, the filesystem returns them in some order, and the linker consumes them in that order. Changing the file creation order leads to changes in binary layout.
- Parallel build non-determinism: modern builds are heavily parallelized (`make -j`, LTO, linker jobs). Jobs don’t always finish in the same order, and neither does the linker. Sometimes, your binary depends on which core was faster that day.
If you want identical binaries, you need to remove as many sources of entropy as possible.
Part of the entropy can be handled with compiler and linker flags, using what I called the secret sauce.
| Syndrome | Cure | Notes |
|---|---|---|
| Timestamps | `SOURCE_DATE_EPOCH` | Instead of using “now”, compilers will use the timestamp you provide. |
| Timestamps | `-Wno-builtin-macro-redefined` | Allows you to manually redefine `__DATE__` and `__TIME__` via `-D` flags if you can’t use `SOURCE_DATE_EPOCH`. |
| Polluting sections | `-fno-ident` | Tells the compiler not to generate the `.ident`/`.comment` section containing the compiler version. |
| Build paths | `-fdebug-prefix-map=/old/=/new/` | Replaces absolute paths in debug info with a generic string (use `-ffile-prefix-map` to also cover `__FILE__`). |
But some sources of non-determinism live in the build system itself, and require discipline, not just flags.
Does the cross-compiler configuration problem really exist?
Before we can understand why people distrust cross-compilation, we need to talk about a legendary creature: Autotools.
Specifically:
`autoconf`, `automake`, `./configure`.
If you’ve ever typed `./configure && make`… you’ve already met it.
So, what problem was Autotools solving?
To understand that, we need to go back to the 90s when there was no:
- standardized Linux
- consistent libc behavior
- reliable compiler features
- unified system APIs
Instead, you had a nice zoo:
- different UNIX systems
- different compilers
- different headers
- different kernel quirks
Back then it was common to answer questions like:
- Does this system have `strlcpy`?
- Is `long` 32 or 64 bits?
- Does `malloc(0)` behave sanely?
- Is `fork()` broken here?
And the answer was often: Maybe
Autotools solved the problem by saying:
Don’t guess. Ask the system.
But instead of asking politely, they do this:
- Generate a tiny C program
- Compile it
- Run it
- Observe the result
- Decide how to build the real software
Example:
```c
int main() { return sizeof(long) == 8 ? 0 : 1; }
```
If it runs and returns 0, we know `long` is 64-bit.
It was clever, for its time.
Because documentation was unreliable, headers lied and systems were inconsistent.
But as you might guess, this model does not survive cross compilation, because you build on x86 but your target is ARM, to name one combination.
Autotools would compile a test program… and then try to run it.
This is the origin of some of the native supporters’ claims about configuration.
Question is: are we still at that point?
The answer? Spin the wheel, because Autotools has evolved, but it hasn’t entirely escaped its roots.
Current Autotools can:
- Use cached answers: instead of running the code above, it can use `ac_cv_sizeof_long=8`.
- Do cross-aware configuration: telling it `--host=aarch64-linux-gnu` disables some runtime checks and switches to safer assumptions.
- Replace execution with knowledge: many checks were rewritten to inspect headers and use known architecture traits.
But sometimes, when execution fails, it still just… guesses…
and that’s where subtle bugs, wrong assumptions, and architecture-specific breakage come from.
When the system can’t answer at build time, we can either guess… or we can simulate the answer. That’s why, if you want to cross build, you sometimes also need to emulate using QEMU.
QEMU enables Autotools to:
- run probe executables successfully under emulation
- make configuration more accurate and less dependent on assumptions
- make complex or runtime-dependent checks feasible again
This effectively restores the original Autotools probing model, even in cross environments.
Library pollution: what is this thing?
Native supporters claim cross builds suffer from library pollution.
During a cross build, the build system accidentally links against host libraries instead of target libraries.
So you think you built for ARM (or another target), but headers came from one place, libraries came from another and the result is inconsistent or outright broken.
Making it less “pollution” and more “you accidentally linked against the wrong universe”.
Library pollution is usually discussed in the context of cross-compilation, but the underlying issue is broader:
it’s really about insufficient isolation between dependency domains.
At its core, library pollution happens whenever: the compiler and linker pick up inconsistent headers and libraries from different environments.
That can occur in:
- cross builds (host vs target)
- native builds (system vs custom prefixes, multiple versions, etc.)
The difference is not whether it happens, but how it fails.
The real root cause: isolation failure
The key variable is:
> how well your build environment is isolated
This can involve:
- sysroots
- prefix separation (/usr vs /usr/local vs /opt/…)
- environment variables
- build system discipline
When isolation breaks, pollution becomes possible…
regardless of cross/native.
But then, why only the cross build gets the spotlight?
Cross builds make the problem obvious because architectures differ and binaries are often not runnable on the host. So when pollution happens, the linker may fail immediately, or the binary won’t even start, or it crashes instantly.
In practice the failure is loud and early… and you can’t deploy your package.
But beware: if it happens on native builds, things can be even worse, because headers and libraries are often compatible enough to compile, ABI differences may be subtle, and symbol versions may partially match.
The result is silent inconsistency: the executable can work, and perhaps only break in a few corner cases.
Cross vs Native pollution chart:
| Aspect | Cross build | Native build |
|---|---|---|
| Architecture mismatch | Yes | No |
| Build failure likelihood | High | Low |
| Runtime failure | Immediate | Delayed / conditional |
| Debug difficulty | Moderate | Very high |
| Typical symptom | Won’t start / crashes instantly | Sporadic crashes |
Intermediate products of the build
In a native build, all generated tools, scripts, and metadata are produced for the same environment in which they will later be used. This makes it possible to package and reuse the entire build context, such as kernel headers, symbol information, and helper binaries… for downstream tasks like out-of-tree module compilation.
In contrast, cross builds split the world into host and target components.
While the final binaries target the desired architecture, many intermediate tools are built for the host and are not portable. This makes the resulting build artifacts harder to redistribute or reuse in a different environment, especially without reconstructing the original build setup.
A concrete example: the Linux kernel (a few others exist). When you build something like the Linux kernel, you’re not just producing:
`vmlinuz` / `*Image`
You’re also generating a whole ecosystem of intermediate tools and metadata, such as:
- scripts/utilities (e.g. `modpost`, `genksyms`)
- generated headers (`include/generated/*`)
- configuration state (`.config`, `Module.symvers`)
- build glue for modules (the kbuild infrastructure)
These are not optional; they are required for building out-of-tree modules later.
Some of these artifacts are binaries targeting the build architecture. Because in the users’ world out-of-tree modules are built on the machine that will use them, cross-build artifacts are not usable there.
Testing
Testing is one of the weak points of cross builds, if not the weakest. In a native build it’s easy: just run the test and you’re done.
In cross builds, on the other hand, testing is possible, but you need to work a little.
The use of QEMU user-mode emulation, in particular qemu-user-static, provides a practical way to execute binaries built for a foreign architecture directly on a host system. This capability is especially valuable in cross-compilation workflows, where executing target binaries is otherwise not possible.
In a traditional cross-compilation setup, test binaries produced during the build process cannot be executed because they target a different architecture than the host. This limitation affects both:
- Test suites (unit and integration tests)
- Build-time probing, such as those performed by GNU Autoconf
By integrating QEMU user-mode with the Linux binfmt_misc mechanism, foreign binaries can be executed transparently. When combined with a target root filesystem (sysroot), this allows:
- Running test suites for the target architecture without dedicated hardware
- Executing Autotools-generated probe binaries, enabling runtime
checks (e.g. via
AC_RUN_IFELSE) that would otherwise fail during cross-compilation
As a result, builds can behave similarly to native builds, reducing the need for manual overrides such as cached configuration variables.
Emulation Performance and setup Considerations
Execution under QEMU user-mode is generally slower than native execution due to instruction translation. However:
- The overhead is often acceptable for build-time testing.
- In some cases, the difference is negligible compared to overall build time.
- When targeting inherently slower architectures, the perceived slowdown may be less significant, and in some constrained comparisons the host may still outperform real hardware.
Despite this, performance-sensitive or timing-dependent tests may exhibit different behavior under emulation.
While QEMU simplifies execution, it introduces additional setup requirements:
- Configuration of binfmt_misc to associate foreign binaries with the appropriate QEMU interpreter
- Provision of a complete target root filesystem (sysroot), including:
  - the dynamic linker
  - shared libraries
  - the correct filesystem layout
This effectively shifts complexity from build configuration (e.g. Autotools cross-compilation issues) to environment preparation.
Summary of Cross vs Native build
This is a comparison synthesis between the two approaches. No judgement is given by choice: native vs cross is a religion, and I don’t want to be hostile to either side.
| Issue | Native | Synthetic | Cross | Notes |
|---|---|---|---|---|
| Artifact reproducibility | Reproducible | = | Reproducible | Both build methods can lead to reproducible artifacts. |
| Configuration problems | Mature | <== | Minor problems exist | Historically Autotools uses host execution to guess the configuration. In a cross environment this is not possible, but newer versions of Autotools support cross-build environments through caching. Other approaches use qemu-user-static to make some complicated Autotools setups work in cross environments. |
| Binary artifact quality | Same | = | Same | If set up correctly, artifacts can be bitwise identical. |
| Build environment pollution | Bad setup | = | Bad setup | Not a cross-build-specific issue; it is more a sysroot isolation problem. Failing this, though, can have worse consequences in native than in cross. |
| Build product testing | Easy | <== | Complex | Testing a build product means executing it under some conditions. In a native environment executing the artifact is trivial; in a cross environment it is not, unless you cheat with qemu-user-static. |
| Build performance (speed) | Variable | ==> | Optimal | It is a fact: some minor architectures are slow. IoT devices can target slow architectures where building can be challenging. |
| Per-architecture builders | Issues | ==> | None | A Linux distro targeting many architectures using native compilation needs at least one worker per architecture, and this scales badly. |
Linux build systems overview
As a last step, let’s look at some of the more common build systems that distros use to build Linux.
| System / Tool | Used by | Native / Cross | Core tech | Notes |
|---|---|---|---|---|
| Koji | Fedora, RHEL, CentOS | Native (per-arch workers) | RPM, Mock, chroot, distributed scheduler | Centralized build farm; builds SRPMs on target-arch machines |
| Mock | Fedora/RHEL (via Koji) | Native (isolated) | chroot, RPM | Creates clean build roots; prevents host contamination |
| Open Build Service | openSUSE, SLE | Mostly native (multi-arch workers), supports cross | RPM/DEB, chroot, distributed builds | Public multi-distro build platform |
| Debian buildd network | Debian | Native | dpkg, debhelper, buildd daemons | Distributed autobuilders per architecture |
| Launchpad | Ubuntu | Native | dpkg, chroot/containers | CI + packaging + build farm |
| Arch Build System | Arch Linux | Native (mostly) | PKGBUILD, makepkg | Simple recipe-based builds; less centralized |
| Portage | Gentoo | Native (user builds), supports cross | ebuild, bash | Source-based; builds happen on user machine |
| Buildroot | Embedded (various) | Cross (primary) | Make, Kconfig | Builds full rootfs + toolchain; minimal, deterministic |
| Yocto Project / BitBake | Embedded, industrial distros | Cross (primary) | BitBake, metadata layers | Highly configurable, large-scale embedded builds |
| OpenWrt build system | OpenWrt | Cross | Make, custom infra | Router-focused; cross toolchains + package feeds |
| Nix | NixOS, others | Hybrid (native + cross) | functional DSL, isolated store | Strong reproducibility model |
| GNU Guix | Guix System | Hybrid (native + cross) | Scheme, functional builds | Similar to Nix, emphasizes purity |
How’s the future?
Newer ecosystems like Rust or Go make cross-compilation feel almost… reasonable.
They remove a lot of the historical friction, but they don’t change the fundamentals. You still need to control your environment, your dependencies, and your assumptions. The difference is that now the toolchain fails less creatively.
Native vs cross is still a debate. The difference is that now we have better tools… and the same old problems.
