If you’ve been following my eternal battle with compilers, especially in the Linux kernel, you already know my hobbies: debugging things that work, fighting compiler optimizations, and getting emotional about function addresses. This post is no different… except this time, the compiler and I are arguing about inlining.
Now, inlining itself isn’t evil. It’s actually a performance gift. A function call costs a few CPU cycles, so compilers try to be helpful and say, “Hey, what if I just paste the code right here instead of calling it?” And for most of the world, that’s great. Less overhead, faster code, happy life.
But when you’re working on a giant codebase like the Linux kernel, and you want to know where exactly something happened at runtime, inlining quickly turns from optimization to obfuscation.
The Problem: Who Called Me?
Imagine you’re adding a feature to the kernel to suppress certain error messages… selectively. Maybe a BUG() or WARN() is firing, but you don’t want to see the report if it comes from a specific function. You want something like:
Clean and elegant, right?
The kernel offers kallsyms, which can map an instruction pointer back to a function name. Perfect! So all I need is to grab the address of the code where WARN() is called, pass it to kallsyms, and boom… I know the function name.
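To make the mechanism concrete, here’s a toy user-space model of what kallsyms does for this use case: given an address, find the last symbol that starts at or below it, then check the resulting name against a suppression list. Everything in it is hypothetical (the table, the addresses, toy_lookup(), toy_warn_suppressed()); the real kallsyms is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy symbol table: start address -> function name, sorted ascending.
 * Addresses and names are made up for illustration. */
struct toy_sym {
	uintptr_t start;
	const char *name;
};

static const struct toy_sym toy_symtab[] = {
	{ 0x1000, "do_work" },
	{ 0x1080, "helper" },
	{ 0x1100, "noisy_driver_probe" },
	{ 0x1200, "sched_tick" },
};

#define TOY_NSYMS (sizeof(toy_symtab) / sizeof(toy_symtab[0]))

/* Return the name of the last symbol starting at or below ip,
 * or NULL if ip lies below the first symbol. */
static const char *toy_lookup(uintptr_t ip)
{
	const char *found = NULL;

	for (size_t i = 0; i < TOY_NSYMS && toy_symtab[i].start <= ip; i++)
		found = toy_symtab[i].name;
	return found;
}

/* Hypothetical suppression list: functions whose warnings we want to hide. */
static const char *const suppressed[] = { "noisy_driver_probe" };

static bool toy_warn_suppressed(uintptr_t ip)
{
	const char *name = toy_lookup(ip);

	if (!name)
		return false;
	for (size_t i = 0; i < sizeof(suppressed) / sizeof(suppressed[0]); i++)
		if (strcmp(name, suppressed[i]) == 0)
			return true;
	return false;
}
```

A linear scan keeps the sketch short; a real table would be searched with something smarter, such as a binary search. Note that the last entry has no end address, so anything at or above 0x1200 resolves to sched_tick.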
But here’s the catch: the function where BUG() or WARN() was called might have been inlined into another one, and now kallsyms returns the parent’s name for that address. The obvious solution would be to mark every function that contains a BUG() or WARN() as noinline.
Now, BUG() and WARN() are macros. They can be used anywhere: from deep in scheduler code to a random USB driver. While you technically could go and modify every function that ever calls them just to slap an __attribute__ ((noinline)) on top, this is anything but practical to maintain.
In a constantly changing codebase like the Linux kernel, requiring everyone who adds a WARN() to ensure the function is not inlined is like asking C to be friendly with undefined behavior… you're just asking for trouble.
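For reference, this is what the “just mark it” approach looks like with the plain GCC/Clang attribute (the kernel has a noinline shorthand for the same thing). parse_len() is a hypothetical stand-in for any function that happens to contain a WARN().

```c
#include <stdio.h>

/* Hypothetical stand-in for a function containing a WARN(). The noinline
 * attribute tells GCC/Clang not to inline it, so an address inside it
 * still resolves to its own symbol name. */
__attribute__((noinline)) static int parse_len(int len)
{
	if (len < 0) {
		fprintf(stderr, "WARNING: negative length %d\n", len); /* WARN() stand-in */
		return -1;
	}
	return len;
}
```

Now multiply that by every function in the kernel that contains a WARN() or BUG(), and the maintenance cost becomes obvious.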
The Compiler is Too Smart
Compilers like GCC and Clang use heuristics to decide whether to inline a function.
Some of those include:
- How big is the function?
- Is it called only once?
- Is it marked inline? (Spoiler: that’s just a suggestion.)
They’re allowed to ignore your hints. Even if you write inline, the compiler might say, “Nah, this function is better off standalone.” The flip side holds too: nothing prevents the compiler from inlining a function where you expressed no preference at all.
Which, in my case, breaks everything, because now the call to WARN() has been absorbed into another function, and its address points to that function (the former parent), not the one I was trying to filter.
And since it isn’t reasonable to mark every caller as noinline, a cleverer approach is needed.
My First Attempt: Macros and Label Addresses
The idea was simple: create a local label, take its address with &&label (a GNU extension that gives the address of a label), store it in a static pointer so it sticks, and pass that pointer to a dummy function. Here’s what it looked like:
I thought: this will surely make the compiler back off from inlining. And indeed, GCC seemed to take the hint.
But I noticed that just placing the label after use_pointer(p) could change the results. In GCC, it still sort of worked (because GCC loves a good shrug, I guess), but Clang had different plans. Things broke subtly… and one of the two functions in my test bed got inlined.
By placing the label before the pointer usage and wrapping everything in a macro with a static volatile pointer, as shown in the code snippet, I got consistent results across both compilers… But…
But even then, I felt uneasy. This solution felt fragile. I was relying on undocumented quirks and hoping compilers didn’t change their minds in a slightly different scenario.
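For the curious, here’s a minimal sketch of that first attempt, reconstructed from the description above rather than copied from the original: the label comes before the pointer is used, and its address lands in a static volatile pointer handed to a dummy function. MARK_LABEL(), use_pointer(), label_sink, and add_checked() are hypothetical names, it needs GCC or Clang (labels as values), and, as said, nothing here guarantees the compiler won’t inline add_checked().

```c
/* Hypothetical dummy consumer: storing the pointer into a volatile global
 * gives the label address an observable use. */
static void *volatile label_sink;

__attribute__((noinline)) static void use_pointer(void *ptr)
{
	label_sink = ptr;
}

/* First-attempt sketch: label first, then its address goes into a static
 * volatile pointer that is passed to the dummy function. GNU C only. */
#define MARK_LABEL()                     \
	do {                                 \
		static void *volatile p;         \
	here:                                \
		p = &&here;                      \
		use_pointer(p);                  \
	} while (0)

static int add_checked(int a, int b)
{
	MARK_LABEL(); /* at most one expansion per function: "here" is function-scoped */
	return a + b;
}
```

Because here is an ordinary label with function scope, this macro can only be expanded once per function; a block-scoped __label__ declaration, as used in the construct further down, avoids that.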
Sneaky Anti-Inlining Techniques
I did some digging (read: asking smart people), and found a list of things that scare GCC and Clang enough to not inline a function.
Here’s a list, probably not exhaustive:
- Use alloca() (dynamic stack allocation)
- Call setjmp() or similar
- Use va_arg/va_end (variadic function tricks)
- Take the address of a label (&&label), a GNU extension
- Use computed gotos (goto *ptr)
- Declare a variable whose value the compiler can’t predict
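As an illustration of the first item, here’s a function that grabs scratch space with alloca(). It’s a hypothetical example (sum_reversed() and the deliberately trivial work it does are made up), and whether a given compiler version actually refuses to inline it is an implementation detail, not a promise. Also note that alloca() isn’t standard C; on glibc it is declared in <alloca.h>.

```c
#include <alloca.h>
#include <stddef.h>

/* Hypothetical example of the first technique: scratch space comes from
 * alloca(), i.e. from the current stack frame. Only sensible for small n. */
static int sum_reversed(const int *v, size_t n)
{
	int *scratch = alloca(n * sizeof(*scratch)); /* dynamic stack allocation */
	int sum = 0;

	for (size_t i = 0; i < n; i++)
		scratch[i] = v[n - 1 - i]; /* copy in reverse order */
	for (size_t i = 0; i < n; i++)
		sum += scratch[i];
	return sum;
}
```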
Any of these can make the compiler decide, “This is too weird. I’m not touching it.”
Enter: The Function-Preserving Incantation
The same smart person suggested a compact GNU statement expression that uses computed gotos, labels, and inline assembly to form what I now think of as an anti-inlining trick:
Let’s unpack this masterpiece:
- __label__ lab; lab: declares a local label inside a block.
- &&lab gets the address of that label, a GNU extension that Clang also supports.
- The address is stored in a static pointer p, marked volatile, so the compiler can’t just delete it.
- __asm__ volatile ("" : "+m" (p)); is an empty inline assembly trick: the "+m" constraint tells the compiler that p is read and may be modified, so it can no longer assume it knows p’s value. In other words, “This memory might be touched, don’t optimize too much.”
- Finally, the conditional goto *p is wrapped in a fake if (never_true): enough to keep things technically live, without actually running anything.
This construct scared both GCC and Clang enough to not inline the function it appeared in, without needing any attributes, annotations, or source rewrites.
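Putting the pieces from that list together, a sketch of such a construct might look like the following. It’s reconstructed from the description above, not copied from the original suggestion, so details may differ; ANTI_INLINE(), never_true, and scale() are hypothetical names. It relies on three GNU extensions (statement expressions, __label__, and labels as values), so it is GCC or Clang only.

```c
/* Hypothetical always-zero flag; volatile, so the compiler cannot prove
 * the branch below is dead. */
static volatile int never_true;

/* Sketch of the anti-inlining statement expression described above. */
#define ANTI_INLINE()                                    \
	({                                                   \
		__label__ lab;            /* block-local label */ \
		static void *volatile p = &&lab;                 \
	lab:                                                 \
		__asm__ volatile("" : "+m"(p)); /* hide p's value */ \
		if (never_true)           /* never taken */      \
			goto *p;                                     \
	})

static int scale(int x)
{
	ANTI_INLINE();
	return x * 2;
}
```

never_true stays zero, so the computed goto never runs, but the compiler can’t prove that, and the asm hides what p holds; it therefore has to keep the label and the jump around. Whether that keeps scale() out of line is still the compiler’s call.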
({ ... }) – GNU Statement Expressions: Something I Didn’t Know About
- GNU extension, not standard C/C++
- Treats a block of code as an expression, and it returns a value
- Only works with GCC and Clang (not MSVC… I think I can live with this)
Example:
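A minimal sketch; sum_via_statement_expr() and MAX_ONCE() are hypothetical names, and the second one shows the classic reason macros use this construct: each argument is evaluated exactly once.

```c
/* The ({ ... }) block is an expression: its value is the value of the
 * last expression statement inside it. */
static int sum_via_statement_expr(void)
{
	int x = ({
		int a = 1;
		int b = 2;
		a + b; /* becomes the value of the whole block */
	});
	return x;
}

/* Evaluates each argument once, unlike a naive ((a) > (b) ? (a) : (b)). */
#define MAX_ONCE(a, b)               \
	({                               \
		__typeof__(a) _a = (a);      \
		__typeof__(b) _b = (b);      \
		_a > _b ? _a : _b;           \
	})
```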
- The block runs, and the value of the last expression (a + b) is returned.
- This allows you to write macros or inlined logic that act like expressions, not just code blocks.
And if you are wondering, as I did, whether it is equivalent to do { ... } while (0): it is not. Here’s a quick comparison cheat sheet:
| Feature | ({ ... }) | do { ... } while (0) |
|---|---|---|
| Standard C | No (GNU extension) | Yes |
| Returns a value | Yes (last expression) | No |
| Used in expressions | Yes | No (statement-only) |
| Scope control | Local block | Local block |
| Portability | GCC/Clang only | Universal |
| Common in macros | Yes (GNU/Linux kernel) | Yes (portable projects) |
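To make the “Returns a value” and “Used in expressions” rows concrete, here’s a hypothetical pair of macros computing the same square both ways, plus a demo() function using them.

```c
/* Statement expression: usable wherever a value is expected. */
#define SQUARE_SE(x) ({ int _v = (x); _v * _v; })

/* do { } while (0): behaves as one statement, safe in an unbraced if/else,
 * but yields no value, so the result goes through an out-parameter. */
#define SQUARE_DW(x, out)       \
	do {                        \
		int _v = (x);           \
		*(out) = _v * _v;       \
	} while (0)

static int demo(int n)
{
	int r;

	if (n > 0)
		SQUARE_DW(n, &r); /* the ';' completes the do/while statement */
	else
		r = -1;

	return r + SQUARE_SE(2); /* SQUARE_SE used inside an expression */
}
```

With a bare { ... } instead of do { ... } while (0), the semicolon after SQUARE_DW would terminate the if statement, and the else would no longer compile.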
({ ... }) is neat… and something I wasn’t aware of before. The original suggestion includes it, but since I don’t need the return value in my case, I think I’ll replace it with the more portable do { ... } while (0) (the label tricks inside still need GCC or Clang, of course).
Conclusion: Please Don’t Try This at Home (Unless You Must)
Inlining is great… until it isn’t. If you simply need a function not to be inlined, the right thing to do is usually to use the tools your compiler provides: attributes. That’s clean, documented, and supported.
However, there are situations, like mine, where touching the function definition isn’t practical or even possible. In the Linux kernel, where WARN() or BUG() can appear virtually anywhere and you need to reason about the call site without rewriting the world, you’re left with fewer options.
In those edge cases, using obscure tricks like computed gotos or label addresses might just be the only way to preserve the structure you need… even if it means making the compiler a little nervous.
Just be aware: this kind of code is fragile, deeply non-portable, and should come with a warning label. It’s not a pattern… it’s a workaround.
Because let’s face it: compilers are smart, but weird code is forever.