alessandro.carminati: 2025

Thursday, July 24, 2025

I Just Wanted the File List... Now I’m Neck-Deep in Makefiles

So You Want to Know All the Files Used in a Linux Kernel Build?

At some point, we all ask ourselves this question; it is kind of a rite of passage:

“Hey, I just want to list all the files that go into building this particular Linux kernel configuration. How hard can it be?”

Ha. Hahaha.

You sweet summer child.

(Credit: Old Nan to Bran Stark, Game of Thrones S01E03)

Let me take you through my journey of trying to do exactly that: not just for a single object, but for the entire kernel build, based on a specific configuration. Spoiler: it sounds simple, but the rabbit hole goes deep.

My Original Genius Plan (That Didn’t Survive Reality)

I thought: “It’s just a matter of combining a few logical steps.”

Use .config to know which options are enabled.
Parse the Makefiles (the Kconfig files are not needed, since the configuration is supposed to be static) to find out which .c files will actually be compiled.
Recursively parse #include directives to find all needed headers.
Done!

Except not.

That third point hides dragons! That’s where dreams go to die. Because parsing includes in the kernel is a nightmare wrapped in a macro inside a conditional.

Just look at this abomination:


#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)

What is TRACE_INCLUDE_FILE?

It’s a macro.

Where is it defined?

Somewhere else... Not in this file... Possibly not even in this compilation unit... Possibly on the moon.

Static parsing? Good luck.

You Need a Preprocessor, Not Just a Text Scanner

If you really want to know what files a .c file pulls in, you need to preprocess it with the compiler. Like, for real. Not pretend. You need the full gcc -E or -MMD treatment. And actually, here’s the twist, the Linux kernel already does that! The kernel’s build system uses gcc -MMD by default, which generates a .d file alongside each .o or sometimes .s file. This .d file lists all the headers used for that .c file during preprocessing.

Awesome, right?

Except…

The Case of the Disappearing `.d` Files

I excitedly looked for these .d files, expecting a gold mine, and found… nothing. They’re gone. Vanished. Turns out, the kernel build system generates them, uses them, and then deletes them. Like a ninja with commitment issues.

Why?

Because they’re only meant to feed into a utility called…

Enter `fixdep` and the `.cmd` Files

Once upon a time, the kernel used .d files directly. But those were replaced by .cmd files, which are more versatile and kernel-specific. Here’s how it works now:

The compiler creates a .d file (thanks to -MMD).
The kernel build system runs fixdep on that .d file.
fixdep reads the dependencies, filters out junk (like system headers), adds references to CONFIG_* flags, and writes everything into a .cmd file.
Then it deletes the .d file like it never existed.

The resulting .cmd file (e.g., net/core/.sock.o.cmd) contains:

The full compile command
The source file path
A list of directly included headers
Wildcards for include/config/CONFIG_... dependencies

And this is what make (via Kbuild) uses to decide when to rebuild something.

So Are `.cmd` Better Than `.d` Files?

Sort of. .cmd files are more than .d files, but also less.

Yes:

They’re smarter.
They track CONFIG_ conditionals.
They integrate cleanly with the kernel’s Makefile logic.

But:

They’re not recursive.
They don’t include deeply nested headers.
They only list direct includes.

So what happens when you change a deeply nested header?

Nothing. Absolutely nothing.

Unless the header was directly listed in the .cmd file, your object won’t rebuild.

Ask Me How I Know

I can’t count how many times I’ve changed a header, rebuilt, and then… nothing changed. No errors. No recompiled object. Just my confused face staring into the void.

Eventually I learned: just delete the .o file manually and try again.

And now, finally, I understand why: the change wasn’t tracked because the .cmd file didn’t know about the header I changed. Because fixdep didn’t know. Because it’s not recursive. Because… reasons.

What Are Your Options Then?

If your goal is to get a full list of files used in a Linux kernel build, here are your tools:

Method	Pros	Cons
`.d` files	Accurate, compiler-generated	Deleted after use
`.cmd` files	Tracked by Kbuild, includes `CONFIG_`	Incomplete, no nested headers
`strace` the build	Very complete	Noisy, includes false positives
Static parsing	Fun in theory	Hell in practice (`TRACE_INCLUDE` etc.)

So, Can I Use Dependency Files to List All Files Used in the Build?

Yes… But you have to be a bit clever about it.

You might think: “Hey, the kernel already uses -MMD, so .d files are created… I’ll just grab those!” Well… yes, they are created. But they’re also brutally deleted right after they’re used.

That rm command? It’s not a side effect or compiler flag; it’s explicitly baked into the kernel’s build logic. It lives in scripts/Kbuild.include, the logic shared by the kernel’s makefiles, as part of this pattern:


cmd_and_fixdep =                                                             \
        $(cmd);                                                              \
        scripts/basic/fixdep $(depfile) $@ '$(make-cmd)' > $(dot-target).cmd;\
        rm -f $(depfile)

So it’s not that you forgot to save the .d files, it’s that the kernel build system deliberately throws them away like yesterday’s logs. Why? Because it only needs them briefly, to feed into the fixdep tool, which extracts top-level config header dependencies (like include/config/FOO) and embeds them into the .cmd files.

What Can Be Actually Done

So what can be done instead of rerunning the entire kernel build under strace, or trying to reverse-engineer header includes statically (good luck parsing around TRACE_INCLUDE)?

“Wait… The .cmd files have the full compiler commands!”

Exactly.

Here’s the plan:

Search for all *.cmd files in the build output directory.
Extract each compile command (it’s right there in the file).
Manually rerun that compile, with -MMD still in place…
…but skip running fixdep and don’t delete the .d file.

This gives you full .d files, unmolested by cleanup. Now you have the real, compiler-generated dependency lists, with every nested header, accurate and complete, not the trimmed-down config-focused ones baked into .cmd.
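The "extract each compile command" step could look something like this in C. It is only a minimal sketch, assuming the first line of a .cmd file has the shape `savedcmd_<target> := <command>` (recent kernels) or `cmd_<target> := <command>` (older ones). The helper name `saved_command` is made up for illustration and is not part of Kbuild.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given one line of a .cmd file, return a pointer to
 * the compile command that follows " := ", or NULL if the line is not a
 * saved-command assignment. No copy is made; the result points into `line`.
 */
const char *saved_command(const char *line)
{
    const char *sep;

    /* Assumption: the variable is named cmd_<target> or savedcmd_<target>. */
    if (strncmp(line, "savedcmd_", 9) != 0 && strncmp(line, "cmd_", 4) != 0)
        return NULL;

    sep = strstr(line, " := ");
    if (sep == NULL)
        return NULL;

    return sep + 4; /* skip the " := " separator */
}
```

The returned string can then be handed to a shell from the top of the build tree, leaving the dependency-file option in the command untouched so the compiler writes the .d file again.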

Yes, it takes a bit of scripting and time. But it’s deterministic, reproducible, and lets you trace exactly what went into a particular kernel build, without hacking the build system or playing syscall whack-a-mole with strace.
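Once the .d files survive, reading them is the easy part. Here is a rough sketch of a reader for the make-style text the compiler emits, under two assumptions: each file holds a single `target: prerequisites` rule, and no path contains escaped spaces. The names `depfile_foreach`, `dep_visit_fn` and `print_dep` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Callback invoked once per prerequisite; `path` is NOT NUL-terminated. */
typedef void (*dep_visit_fn)(const char *path, size_t len, void *ctx);

/*
 * Hypothetical helper: walk the text of a make-style .d file and call
 * `visit` for every prerequisite after the first "target:".
 * Assumptions: a single rule, no escaped spaces inside paths.
 * Returns the number of prerequisites seen.
 */
size_t depfile_foreach(const char *text, dep_visit_fn visit, void *ctx)
{
    const char *p = strchr(text, ':');
    size_t count = 0;

    if (p == NULL)
        return 0;
    p++; /* step past the colon that ends the target name */

    while (*p != '\0') {
        const char *start;

        /* Skip blanks, newlines and backslash line continuations. */
        while (isspace((unsigned char)*p) || (p[0] == '\\' && p[1] == '\n'))
            p++;
        if (*p == '\0')
            break;

        /* A path runs until the next blank or line continuation. */
        start = p;
        while (*p != '\0' && !isspace((unsigned char)*p) &&
               !(p[0] == '\\' && p[1] == '\n'))
            p++;

        if (visit != NULL)
            visit(start, (size_t)(p - start), ctx);
        count++;
    }
    return count;
}

/* Example visitor: print one path per line. */
void print_dep(const char *path, size_t len, void *ctx)
{
    (void)ctx;
    printf("%.*s\n", (int)len, path);
}
```

Calling `depfile_foreach(text, print_dep, NULL)` on every regenerated .d file and piping the result through `sort -u` would give the deduplicated file list.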

And this is what I'm going to implement after an unfair fight with Makefiles and friends!

Friday, June 27, 2025

Logging Shell Commands in BusyBox? Yes, You Can Now

In an earlier post, I showed how to log every shell command to a remote server using PROMPT_COMMAND in bash. It’s a neat trick, especially if you’re interested in session tracking, auditing, or just being a responsible sysadmin. But if you’re using BusyBox as your shell, and a lot of embedded systems and network devices do, you’ll quickly find out that this trick doesn’t work at all. So, what gives?

Bash Has `PROMPT_COMMAND`, BusyBox Doesn’t

If you’re used to bash, you know PROMPT_COMMAND is a handy little feature: just before printing each prompt, the shell executes whatever you put in that variable. Perfect spot for a logging hook.

But BusyBox’s shell applet, ash, doesn’t have PROMPT_COMMAND. Not even a secret version of it hiding somewhere. So if you try to replicate this kind of behavior in BusyBox, the shell just stares blankly at you and does… nothing.

This isn’t a bug or a missing package. BusyBox ash is designed to be small, fast, and simple, which means it skips the bells and whistles you’d find in a full shell like bash.

But Wait... Is This Really Useful?

Yes, actually. This isn’t just a case of “because I can.”

Think about it: a lot of network gear these days runs Linux under the hood: firewalls, routers, switches, access points, you name it. Many of them use BusyBox because it’s lightweight and gets the job done.

But in those environments, auditing is not optional. People managing these devices often come from networking backgrounds where things like TACACS+ command logging have been standard for decades. They expect that every command typed into a shell is logged somewhere, preferably off the device.

So this isn’t just a fun weekend hack. It’s a small but important feature that brings BusyBox a little closer to what these environments need.

A Peek Under the Hood: Why `getenv()` Doesn’t Work

Let’s say you want to control this feature using environment variables, like this:


export LOG_RHOST=10.0.0.1
export LOG_RPORT=5555
export SESSIONID_=abc123

Then, in your shell code, you try to do something like:


const char *host = getenv("LOG_RHOST");

And you get… NULL.

Why? Well, here’s the fun part: BusyBox ash keeps its own private stash of environment variables, separate from the process environment you get with getenv(). It only syncs them up when launching child processes... not inside the shell itself. At least, this is what I read somewhere in the code, or maybe in a commit message. To be honest, I didn’t verify this myself.

So the variable is set, exported, and even shows up when you run set, but your getenv() call is talking to the wrong guy.

To fix this, you need to use BusyBox’s internal API:


const char *host = lookupvar("LOG_RHOST");

This asks the shell directly, and it gives you the right answer.

Lesson learned: if you’re inside BusyBox ash, use the tools it gives you, not the standard C library ones.

The Trick: Dependency Injection (Just Like the Cool Parts of BusyBox)

Here’s what I did:
Define a function pointer type in libbb:


typedef const char* (*injected_var_lookup_t)(const char *name);

Declare a static variable to hold the injected function:


static injected_var_lookup_t injected_lookup_var;

Add a public setter function to libbb:


void injected_set_var_lookup(injected_var_lookup_t func) {
    injected_lookup_var = func;
}

In ash.c, during initialization, pass in the actual lookupvar() function (the patch below does this in setinteractive()).

Now, anywhere in libbb, I can safely call:


const char *val = injected_lookup_var("LOG_RHOST");

And it’ll work because, at runtime, it’s calling the real lookupvar() function from ash. This pattern keeps the code clean, avoids nasty cross-module hacks, and is totally in line with how BusyBox handles similar internal wiring elsewhere. I thought I was a genius for inventing this, until I realized BusyBox already did the same thing in math.c. So… maybe I’m just a very good copy-paste engineer.
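To see the whole pattern in one place, here is a self-contained sketch that compiles outside BusyBox. Everything in it is a stand-in: `fake_lookupvar` plays the role of ash's lookupvar(), and `set_var_lookup` / `lookup_or` are hypothetical names. Unlike the setter in the patch, which takes a `void *`, this version keeps the function pointer typed so the compiler can check the signature.

```c
#include <stddef.h>
#include <string.h>

/* Signature of the lookup function the shell is expected to provide. */
typedef const char *(*var_lookup_fn)(const char *name);

/* Library side: stays NULL until the shell injects its lookup. */
static var_lookup_fn injected_lookup;

void set_var_lookup(var_lookup_fn fn)
{
    injected_lookup = fn;
}

/*
 * Returns the variable's value, or `fallback` when nothing was injected
 * or the variable is unset.
 */
const char *lookup_or(const char *name, const char *fallback)
{
    const char *val = injected_lookup ? injected_lookup(name) : NULL;
    return val ? val : fallback;
}

/* Shell side: stand-in for ash's lookupvar(), with hypothetical data. */
const char *fake_lookupvar(const char *name)
{
    if (strcmp(name, "LOG_RHOST") == 0)
        return "10.0.0.1";
    return NULL;
}
```

Before `set_var_lookup(fake_lookupvar)` runs, `lookup_or("LOG_RHOST", "none")` falls back to "none"; afterwards it yields "10.0.0.1". The NULL check matters because library code may run before the shell has had a chance to inject anything.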

What This Patch Adds

So I wrote a patch for BusyBox that brings this remote command logging feature to life. Here’s what it does:

Watches each command entered in the shell.
If the logging environment variables are set (LOG_RHOST, LOG_RPORT, SESSIONID_), it sends the command over TCP to the remote server.
It includes the session ID in the log line, so you can trace which session ran what.

That’s it. Lightweight, optional, and it doesn’t get in the way if you’re not using it.

The Patch

Here’s the code that makes it happen. It applies cleanly to busybox-1.37.0:


From 5b663cd894f6418673686290248fe8776af2434d Mon Sep 17 00:00:00 2001
From: Alessandro Carminati <alessandro.carminati@gmail.com>
Date: Fri, 27 Jun 2025 14:05:21 +0200
Subject: [PATCH] ash: add support for logging executed commands to a remote
 server
Content-type: text/plain

This commit adds functionality to the ash shell that sends each executed command
to a remote logging server over TCP, enabling remote auditing and session tracking.

The design is inspired by the tacacs2 approach used in network devices. This is
particularly useful in embedded Linux environments replacing traditional routers,
where audit trails are essential.

Unlike bash, ash does not support PROMPT_COMMAND. This implementation fills that
gap using internal hooks in the shell.

The feature is controlled via three environment variables:

  - SESSIONID_ : unique identifier for the shell session
  - LOG_RHOST  : remote log server hostname or IP address
  - LOG_RPORT  : remote log server TCP port

When these variables are set, each command entered is sent to the specified
logging server, prepended with the session ID.

This enhancement is lightweight and optional, and does not impact users who
do not configure the environment variables

Signed-off-by: Alessandro Carminati <acarmina@redhat.com>
---
 include/libbb.h       |   7 +++
 libbb/Config.src      |  10 ++++
 libbb/Kbuild.src      |   1 +
 libbb/lineedit.c      |   3 ++
 libbb/loggers_utils.c | 114 ++++++++++++++++++++++++++++++++++++++++++
 shell/ash.c           |   3 ++
 6 files changed, 138 insertions(+)
 create mode 100644 libbb/loggers_utils.c

diff --git a/include/libbb.h b/include/libbb.h
index 01cdb1b..870b9f5 100644
--- a/include/libbb.h
+++ b/include/libbb.h
@@ -2003,6 +2003,9 @@ void free_line_input_t(line_input_t *n) FAST_FUNC;
 #else
 # define free_line_input_t(n) free(n)
 #endif
+# if ENABLE_FEATURE_SEND_COMMAND_REMOTE
+void loggers_utils_set_var_lookup(void *func);
+# endif
 /*
  * maxsize must be >= 2.
  * Returns:
@@ -2133,6 +2136,10 @@ enum {
    PSSCAN_RUIDGID  = (1 << 21) * ENABLE_FEATURE_PS_ADDITIONAL_COLUMNS,
    PSSCAN_TASKS    = (1 << 22) * ENABLE_FEATURE_SHOW_THREADS,
 };
+# if ENABLE_FEATURE_SEND_COMMAND_REMOTE
+int rlog_this(const char *history_itm);
+# endif
+
 //procps_status_t* alloc_procps_scan(void) FAST_FUNC;
 void free_procps_scan(procps_status_t* sp) FAST_FUNC;
 procps_status_t* procps_scan(procps_status_t* sp, int flags) FAST_FUNC;
diff --git a/libbb/Config.src b/libbb/Config.src
index b980f19..a6f5882 100644
--- a/libbb/Config.src
+++ b/libbb/Config.src
@@ -202,6 +202,16 @@ config FEATURE_EDITING_SAVE_ON_EXIT
    help
    Save history on shell exit, not after every command.
 
+config  FEATURE_SEND_COMMAND_REMOTE
+   bool "Send last command to remote logger for audit"
+   default n
+   depends on FEATURE_EDITING_SAVEHISTORY
+   help
+   Send last command to remote logger for audit.
+   It is mandatory that LOG_RHOST and LOG_RPORT environment variables
+   are defined to specify the remote ip and port where send logs.
+   It alse needs the environment SESSIONID_ to be defined as sessionid.
+
 config FEATURE_REVERSE_SEARCH
    bool "Reverse history search"
    default y
diff --git a/libbb/Kbuild.src b/libbb/Kbuild.src
index cb8d2c2..096a9f3 100644
--- a/libbb/Kbuild.src
+++ b/libbb/Kbuild.src
@@ -208,3 +208,4 @@ lib-$(CONFIG_FEATURE_CUT_REGEX) += xregcomp.o
 
 # Add the experimental logging functionality, only used by zcip
 lib-$(CONFIG_ZCIP) += logenv.o
+lib-$(CONFIG_FEATURE_SEND_COMMAND_REMOTE) += loggers_utils.o
diff --git a/libbb/lineedit.c b/libbb/lineedit.c
index 543a3f1..8140f00 100644
--- a/libbb/lineedit.c
+++ b/libbb/lineedit.c
@@ -1685,6 +1685,9 @@ static void remember_in_history(char *str)
    /* i <= state->max_history-1 */
    state->history[i++] = xstrdup(str);
    /* i <= state->max_history */
+# if ENABLE_FEATURE_SEND_COMMAND_REMOTE
+   rlog_this(state->history[i-1]);
+# endif
    state->cur_history = i;
    state->cnt_history = i;
 # if ENABLE_FEATURE_EDITING_SAVEHISTORY && !ENABLE_FEATURE_EDITING_SAVE_ON_EXIT
diff --git a/libbb/loggers_utils.c b/libbb/loggers_utils.c
new file mode 100644
index 0000000..d1266e8
--- /dev/null
+++ b/libbb/loggers_utils.c
@@ -0,0 +1,114 @@
+/*
+ * This code allows remote logging of the commands.
+ *
+ * Copyright (c) 2025 Alessandro Carminati <acarmina@redhat.com>
+ *
+ * Licensed under GPLv2 or later, see file LICENSE in this source tree.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <unistd.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <netdb.h>
+#include <sys/socket.h>
+
+#define SESSION_ID_ENV "SESSIONID_"
+#define SESSION_LEN 9
+#define SESSION_RHOST "LOG_RHOST"
+#define SESSION_RPORT "LOG_RPORT"
+
+typedef const char* (*loggers_utils_var_lookup_t)(const char *name);
+
+void get_timestamp(char *, size_t);
+int send_log(const char *, const char *, const char *);
+int rlog_this(const char *);
+void loggers_utils_set_var_lookup(void *func);
+
+static loggers_utils_var_lookup_t loggers_utils_lookup_var;
+
+void loggers_utils_set_var_lookup(void *func) {
+   loggers_utils_lookup_var = (loggers_utils_var_lookup_t) func;
+}
+
+void get_timestamp(char *buf, size_t len) {
+   time_t now = time(NULL);
+   struct tm *tm_info = localtime(&now);
+   strftime(buf, len, "%Y%m%d.%H%M%S", tm_info);
+}
+
+int send_log(const char *line, const char *host, const char *port_str) {
+   int sockfd;
+   struct addrinfo hints, *res, *p;
+
+   memset(&hints, 0, sizeof(hints));
+   hints.ai_family = AF_UNSPEC;
+   hints.ai_socktype = SOCK_STREAM;
+
+   if (getaddrinfo(host, port_str, &hints, &res) != 0) {
+       fprintf(stderr, "send_log: cant' resolve host in %s\n",
+           SESSION_RHOST);
+       return -1;
+   }
+
+   for (p = res; p != NULL; p = p->ai_next) {
+       sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
+       if (sockfd < 0) continue;
+       if (connect(sockfd, p->ai_addr, p->ai_addrlen) == 0) break;
+       close(sockfd);
+   }
+
+   if (p == NULL) {
+       fprintf(stderr, "send_log: Unable to connect to %s:%s\n",
+           host, port_str);
+       freeaddrinfo(res);
+       return -1;
+   }
+
+   ssize_t len = strlen(line);
+   if (send(sockfd, line, len, 0) != len) {
+       fprintf(stderr, "send_log: Unable to send data\n");
+       close(sockfd);
+       freeaddrinfo(res);
+       return -1;
+   }
+
+   close(sockfd);
+   freeaddrinfo(res);
+   return 0;
+}
+
+int rlog_this(const char *history_itm) {
+   char timestamp[32], hostname[64];
+   char *sess_id, *r_ip, *r_port;
+   char logline[1500];
+
+   if (!loggers_utils_lookup_var) return -1;
+
+   sess_id = loggers_utils_lookup_var(SESSION_ID_ENV);
+   if (!sess_id || (strlen(sess_id) > 9)) return 1;
+
+   r_ip = loggers_utils_lookup_var(SESSION_RHOST);
+   if (!r_ip) return -1;
+
+   r_port = loggers_utils_lookup_var(SESSION_RPORT);
+   if (!r_port) return -1;
+
+   if (!atoi(r_port)) return -1;
+
+   get_timestamp(timestamp, sizeof(timestamp));
+   gethostname(hostname, sizeof(hostname));
+
+   snprintf(logline, sizeof(logline), "%s - %s - %s > %s\n",
+            timestamp, sess_id, hostname, history_itm);
+
+   if (send_log(logline, r_ip, r_port)!=0){
+       fprintf(stderr, "rlog_this: can't send log to remote.\n");
+       return -2;
+   };
+
+   return 0;
+}
diff --git a/shell/ash.c b/shell/ash.c
index bbd7307..e021def 100644
--- a/shell/ash.c
+++ b/shell/ash.c
@@ -9780,6 +9780,9 @@ setinteractive(int on)
            did_banner = 1;
        }
 #endif
+# if ENABLE_FEATURE_SEND_COMMAND_REMOTE
+       loggers_utils_set_var_lookup(&lookupvar);
+# endif
 #if ENABLE_FEATURE_EDITING
        if (!line_input_state) {
            line_input_state = new_line_input_t(FOR_SHELL | WITH_PATH_LOOKUP);
-- 
2.25.1

BusyBox contribution

Once you’ve developed a new feature for BusyBox, contributing your patch is a straightforward process, but there are a few nuances compared to projects like the Linux kernel. To submit your code, you must first subscribe to the BusyBox mailing list, as only subscribers can post. The list is publicly archived, so you can review past discussions for context, but posting requires a subscription. Unlike the Linux kernel, BusyBox doesn’t maintain a comprehensive MAINTAINERS file; instead, Denys Vlasenko is the main project maintainer, and individual applet maintainers (if any) are usually listed in the relevant source files. When sending your patch, address it to the mailing list and, if your change affects a specific applet, CC the maintainer named in the file’s header.

For new features, it’s best practice to first float your idea on the mailing list for feedback before investing significant development effort, unless you really want the feature. This informal discussion helps gauge interest and avoid duplicating work. Once your patch is ready, ensure it’s well-tested, follows the project’s coding style, and focuses on a single logical change. If your submission doesn’t get a response, a polite follow-up or using the BusyBox Bug and Patch Tracking System is recommended.

Sunday, May 18, 2025

Inline Trouble: Convincing the Compiler to Leave My Functions Alone

If you’ve been following my eternal battle with compilers, especially in the Linux kernel, you already know my hobbies: debugging things that work, fighting compiler optimizations, and getting emotional about function addresses. This post is no different… Except this time, the compiler and I are arguing about inlining.

Now, inlining itself isn’t evil. It’s actually a performance gift. A function call costs a few CPU cycles, so compilers try to be helpful and say, “Hey, what if I just paste the code right here instead of calling it?” And for most of the world, that’s great. Less overhead, faster code, happy life.

But when you’re working on a giant codebase like the Linux kernel, and you want to know where exactly something happened at runtime, inlining quickly turns from optimization to obfuscation.

The Problem: Who Called Me?

Imagine you’re adding a feature to the kernel to suppress certain error messages… Selectively. Maybe a BUG() or WARN() is firing, but you don’t want to see the report if it comes from that function. You want something like:

dont_complain_if_warn_is_called_from("driver_init_fn");

Clean and elegant, right?

The kernel offers kallsyms, which can map an instruction pointer back to a function name. Perfect! So, all I need is to grab the address of the code where WARN() is called, pass it to kallsyms, and boom … I know the function name.

But here’s the catch: the original function where BUG() or WARN() was called might be inlined into another, and now kallsyms returns the parent’s name for that address. The obvious solution would be to mark as noinline every function where BUG() and WARN() appear.

Now, BUG() and WARN() are macros. They can be used anywhere: from deep in scheduler code to a random USB driver. While you technically could go and modify every function that ever calls them, just to slap an __attribute__((noinline)) on top, this is anything but practical to maintain. In a constantly changing codebase like the Linux kernel, requiring everyone who adds a WARN() to ensure the function is not inlined is like asking C to be friendly with undefined behavior... you're just asking for trouble.

The Compiler is Too Smart

Compilers like GCC and Clang use heuristics to decide whether to inline a function.

Some of those include:

How big is the function?
Is it called only once?
Is it marked inline? (Spoiler: that’s just a suggestion.)

They’re allowed to ignore your hints. Even if you write inline, the compiler might say, “Nah, I see this function better as standalone”. In a certain sense, the reverse also holds: nothing prevents it from inlining a function for which you expressed no preference at all.

Which, in my case, breaks everything… Because now the call to WARN() has been absorbed into another function, and its address points to that function (former parent), not the one I was trying to filter.

Because of that, it is not reasonable to expect every caller to be marked noinline. A more clever way is needed.

My First Attempt: Macros and Label Addresses

The idea was simple: create a local label using &&label (a GNU extension that gives the address of a label), make it static so it sticks, and pass that pointer to a dummy function. Here’s what it looked like:

#define THISLABEL(i)     THISLABEL1(i, __LINE__)
#define THISLABEL1(i, l) THISLABEL2(i, l)
#define THISLABEL2(i, l) THISLABEL_##i##_##l
#define BLOCK_INLINE()                              \
    do {                                            \
        static void *volatile p = &&THISLABEL(0);   \
        THISLABEL(0):                               \
        use_pointer(p);                             \
    } while (0)
#define use_pointer(x) ((void)(x))

I thought: surely this will make the compiler back off from inlining. And indeed, GCC seemed to take the hint.

But I noticed that just placing the label after use_pointer(p) could change the results. In GCC, it still sort of worked (because GCC loves a good shrug, I guess), but Clang had different plans. Things broke subtly… and one of the two functions in my test bed got inlined.

With the label placed before the pointer usage and everything wrapped in a macro with a static volatile pointer, as shown in the code snippet, I got consistent results across both compilers… But…

But even then, I felt uneasy. This solution felt fragile. I was relying on undefined quirks and hoping compilers didn’t change their mind in a slightly different scenario.

Sneaky Anti-Inlining Techniques

I did some digging (read: asking smart people), and found a list of things that scare GCC and Clang enough to not inline a function.

Here’s a list, probably not exhaustive:

Use alloca() (dynamic stack allocation)
Call setjmp() or similar
Use va_arg/va_end (variadic function tricks)
Take the address of a label (&&label) — a GNU extension
Use computed gotos (goto *ptr)
Declare a variable whose value the compiler can’t predict

Any of these can make the compiler decide, “This is too weird. I’m not touching it.”

Enter: The Function-Preserving Incantation

The same smart person suggested a compact GNU statement expression that uses computed gotos, labels, and inline assembly to form what I now think of as an anti-inlining trick:

#define BLOCK_INLINE() ({                   \
    __label__ lab;                          \
    volatile int never_true = 0;            \
lab:;                                       \
    static void *p = &&lab;                 \
    __asm volatile ("" : "+m" (p));         \
    if (never_true) goto *p;                \
})

Let’s unpack this masterpiece:

__label__ lab; lab: declares a local label inside a block.
&&lab gets the address of that label, a GCC extension.
The address is stored in a static pointer p, so it lives in memory rather than in a register the compiler can simply discard.
__asm volatile ("" : "+m" (p)); is an empty inline assembly trick that tells the compiler, “This memory might be touched, don’t optimize too much.”
Finally, the conditional goto *p is wrapped in a fake if (never_true); because never_true is volatile, the compiler cannot prove the branch is dead, which keeps things technically live without actually running anything.

This construct scared both GCC and Clang enough to not inline the function it appeared in, without needing any attributes, annotations, or source rewrites.

`({ ... })` – GNU Statement Expression - Something I didn’t know about

GNU extension, not standard C/C++
Treats a block of code as an expression, and it returns a value
Only works with GCC and Clang (not MSVC… I think I can live with this)

Example:

int x = ({ int a = 3; int b = 4; a + b; }); // x = 7

The block runs, and the value of the last expression (a + b) is returned.
This allows you to write macros or inlined logic that act like expressions, not just code blocks.
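A classic place where this pays off is a macro that must evaluate its arguments only once. The sketch below is my own illustration, not code from any project (the names `MAX_ONCE` and `max_with_side_effect` are made up), and it assumes GCC or Clang since it also leans on `__typeof__`.

```c
/*
 * Evaluates each argument exactly once, unlike the classic
 * ((a) > (b) ? (a) : (b)) macro. Relies on two GNU extensions:
 * statement expressions and __typeof__.
 */
#define MAX_ONCE(a, b) ({          \
    __typeof__(a) _a = (a);        \
    __typeof__(b) _b = (b);        \
    _a > _b ? _a : _b;             \
})

/* Small demo: returns the larger value and bumps *counter exactly once. */
int max_with_side_effect(int *counter, int other)
{
    return MAX_ONCE((*counter)++, other);
}
```

With the naive ternary macro, `(*counter)++` would be expanded twice and the counter could be incremented twice. Here the temporaries `_a` and `_b` capture each argument once, and the last expression in the block becomes the value of the whole macro.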

And if you are wondering, as I did, if it is equivalent to do { ... } while(0), it is not. Here’s a quick comparison cheat sheet:

Feature	`({ ... })`	`do { ... } while (0)`
Standard C	No (GNU extension)	Yes
Returns a value	Yes (last expression)	No
Used in expressions	Yes	No (statement-only)
Scope control	Local block	Local block
Portability	GCC/Clang only	Universal
Common in macros	Yes (GNU/Linux kernel)	Yes (portable projects)

({ ... }) is neat... and something I wasn’t aware of before. The original suggestion includes it, but since I don’t need the return value functionality in my case, I think I’ll replace it with the more portable do { ... } while (0). (The body still depends on GNU extensions such as __label__ and &&lab, so this drops only one of them.)

#define BLOCK_INLINE() do {                 \
    __label__ lab;                          \
    volatile int never_true = 0;            \
    lab:;                                   \
    static void *p = &&lab;                 \
    __asm volatile ("" : "+m" (p));         \
    if (never_true) goto *p;                \
} while(0)
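For completeness, this is roughly how the macro would be dropped into a function. It is a toy sketch that assumes GCC or Clang: `checked_div` and `caller` are hypothetical names, and the macro is repeated only so the snippet compiles on its own.

```c
/* Same macro as above, reproduced so the snippet is self-contained. */
#define BLOCK_INLINE() do {                 \
    __label__ lab;                          \
    volatile int never_true = 0;            \
    lab:;                                   \
    static void *p = &&lab;                 \
    __asm volatile ("" : "+m" (p));         \
    if (never_true) goto *p;                \
} while (0)

/* Hypothetical small helper that an optimizer would normally inline. */
static int checked_div(int a, int b)
{
    BLOCK_INLINE();
    if (b == 0)
        return 0;
    return a / b;
}

/* The only caller: a prime candidate for absorbing checked_div(). */
int caller(int a, int b)
{
    return checked_div(a, b) + 1;
}
```

The macro does not change what the function computes. To check whether it had the intended effect, one could build with optimizations enabled and look for `checked_div` in the symbol table (for example with `nm`); whether it survives depends on the compiler and version, so it is worth re-checking after toolchain upgrades.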

Conclusion: Please Don’t Try This at Home (Unless You Must)

Inlining is great… until it is not. If you simply need a function not to be inlined, the right thing to do is usually to use the tools your compiler provides: attributes. That’s clean, documented, and supported.

However, there are situations, like mine, where touching the function definition isn’t practical or even possible. In the Linux kernel, when WARN() or BUG() can appear virtually anywhere, and you need to reason about the call site without rewriting the world, you’re left with fewer options.

In those edge cases, using obscure tricks like computed gotos or label addresses might just be the only way to preserve the structure you need… Even if it means making the compiler a little nervous.

Just be aware: this kind of code is fragile, deeply non-portable, and should come with a warning label. It’s not a pattern… it’s a workaround.

Because let’s face it: compilers are smart, but weird code is forever.

Monday, April 28, 2025

When Wi-Fi Says "No" and Your Serial Says "Maybe"

Ever been stuck with an embedded board that looks like a spaceship but behaves like a potato?
No drivers, no network, no SSH, no hope. Just you and a lonely serial cable whispering bits of despair.
That’s the moment you wish you could throw a file at it... maybe a fresh build of busybox that finally has that utility you desperately need but isn’t available on the current system.

Enter: the send_console.
A gloriously simple hack to push a file over the same serial line you’re using to yell at your misbehaving device... because sometimes, network connectivity simply isn’t an option.

Let’s dive into what it is, why it matters, and why you’ll love it (or at least curse it slightly less than the alternative).

The Art of Smuggling Files Over Serial (Or: “How to Befriend Your Terminal”)

Let’s start with some very real pain points:

You’re enabling a brand-new platform. It just finished booting for the first time, half the drivers are still missing, and network connectivity is but a distant dream.
You’re working with a device controlled by a sidekick board: you get in through the back door, and the device’s network is not supposed to be reachable from outside.
You’re hacking an embedded device. You finally found the serial pins hidden on the board and connected to them, but you have no tool to upload a file to it.

In all these cases, you really need to get a file onto that machine to move forward, but there’s no standard way to do it.

In theory, serial lines were made for this.... In practice? Not anymore.

In the good old days, we had XMODEM and ZMODEM.
Cute protocols that handled file transfers over serial lines... Today?
You can’t count on them being installed. Often, you’re lucky if you even have a working cat.

This brings us to the real star of the show: how terminals actually work.

Terminal Magic: Why You Can’t Just “Send the Bytes, Bro”

Imagine your terminal is a grumpy customs officer. You hand it a packet (a byte), but before it lets it through, it:

Checks for forbidden characters,
Buffers a bunch of data,
Sometimes “fixes” what it thinks you meant to send.

Terminals are not dumb pipes: they enforce line discipline, apply buffering, and often modify the data you send.

Enter stty, our magic wand.
It tells the terminal to behave properly: disable echo, set raw mode, turn off processing.

Why stty?

It’s incredibly portable.
It’s available almost everywhere: full Linux systems, minimal BusyBox environments, you name it. Hint: it is one of the utilities required by POSIX.

Another thing to watch is buffering:
Serial interfaces typically buffer input and output for efficiency, but this can ruin a clean file transfer if not managed.
Here, stdbuf comes to the rescue, forcing applications like cat to flush data properly as it’s processed.
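For the curious, here is roughly what asking for a raw, echo-free line means at the termios level. This is a sketch for illustration only: the tool itself relies on stty, and `make_byte_pipe` is a hypothetical helper, not something taken from send_console. The flags are the ones usually cleared for raw mode.

```c
#include <string.h>
#include <termios.h>

/*
 * Hypothetical helper: edit a termios structure so the line behaves like a
 * plain byte pipe. It only changes the structure; applying it to a real
 * device is a separate tcsetattr() call.
 */
void make_byte_pipe(struct termios *t)
{
    /* No echo, no line editing, no signal keys, no extended processing. */
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    /* Do not translate CR/NL, strip bits, or honor XON/XOFF on input. */
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON);
    /* Do not post-process output (for example NL to CR NL). */
    t->c_oflag &= ~(tcflag_t)OPOST;
    /* 8 data bits, no parity. */
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB);
    t->c_cflag |= CS8;
    /* read() returns as soon as one byte is available. */
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
}
```

On a real port the sequence would be tcgetattr(fd, &t), make_byte_pipe(&t), then tcsetattr(fd, TCSANOW, &t). Every cleared flag corresponds to one way the "customs officer" could otherwise alter or hold back the bytes of a file in transit.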

send_console-ng: “Now With 50% More Reliability!”

Before send_console-ng, there was… well, just send_console.
A brave little utility with a lot of heart... and a lot of problems.

The first version tried to be smart:
It sent commands and checked their echoes to validate that everything arrived correctly.
Sounds reasonable, right?

Except…
In a real embedded system, kernel messages can appear at any moment.
Random printk noise: “driver XYZ initialized”, “thermal event detected”, “welcome to dmesg hell”, bursting onto your serial line like popcorn on a campfire.

This meant that even if your command was properly echoed, the echo might get interrupted mid-flight by a random kernel log.

In a heroic attempt to fight this chaos, I even crafted a function designed to validate echoes, despite random insertions:

func findFragmented(input, target string) [][2]int {
	inputRunes := []rune(input)
	targetRunes := []rune(target)
	if len(target) == 0 {
		return [][2]int{}
	}

	// Every occurrence of the first target rune is a candidate start.
	startPositions := []int{}
	for i, r := range inputRunes {
		if r == targetRunes[0] {
			startPositions = append(startPositions, i)
		}
	}

	// Newline positions: the echo may resume right after any of them.
	newlineMap := map[int]struct{}{}
	for i, r := range inputRunes {
		if r == '\n' {
			newlineMap[i] = struct{}{}
		}
	}

	// Index of every rune in the input, by position.
	targetMap := make(map[rune][]int)
	for i, r := range inputRunes {
		targetMap[r] = append(targetMap[r], i)
	}

	var results [][2]int
	for _, start := range startPositions {
		if ok, path := matchFragmentedRecursive(
			inputRunes, targetRunes, targetMap, newlineMap,
			1, start, []int{start}); ok {
			end := path[len(path)-1] + 1
			results = append(results, [2]int{start, end})
		}
	}
	return results
}

func matchFragmentedRecursive(inputRunes, targetRunes []rune,
	targetMap map[rune][]int,
	newlineMap map[int]struct{},
	tIdx, currPos int,
	path []int,
) (bool, []int) {
	if tIdx == len(targetRunes) {
		return true, path
	}
	need := targetRunes[tIdx]

	// The next rune must either follow immediately or start a later line.
	validNext := map[int]struct{}{
		currPos + 1: {},
	}
	for nl := range newlineMap {
		if nl > currPos && nl+1 < len(inputRunes) {
			validNext[nl+1] = struct{}{}
		}
	}

	for _, pos := range targetMap[need] {
		if pos <= currPos {
			continue
		}
		if _, ok := validNext[pos]; !ok {
			continue
		}
		if ok, matched := matchFragmentedRecursive(
			inputRunes, targetRunes, targetMap, newlineMap,
			tIdx+1, pos, append(path, pos),
		); ok {
			return true, matched
		}
	}
	return false, nil
}

The function was smart. It could piece together broken echoes.
It battled bravely.
But in the end, I realized I was a modern Don Quixote, fighting not windmills, but the terminal's text-wrapping feature, dmesg notifications, ANSI escape codes, and all their friends.

send_console-ng takes a much saner approach:
Instead of dancing with echoes, it takes control of the terminal settings on the remote side.
It flips the terminal into raw mode with stty, disables buffering shenanigans, and then simply blasts the file through cleanly.

On the host side, the command is beautifully simple:

send_console-ng -b 115200 -f busybox -d /dev/ttyUSB1

-b sets the baudrate,
-f specifies the file to send,
-d picks the serial device.

Behind the scenes, send_console-ng:

Compresses the file with gzip,
Encodes it safely with base64,
Manages terminal settings with stty,
Forces unbuffered transmission with stdbuf,
Handles the decoding on the remote side.
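The compress-then-encode part of that list can be sketched in a few lines. This is a minimal Python illustration of the idea, not the actual send_console-ng code (which is not shown here); the function names and the 76-character line width are assumptions made for the example.

```python
import base64
import gzip


def encode_payload(data, width=76):
    """gzip the file, base64 it, and split it into terminal-friendly lines."""
    text = base64.b64encode(gzip.compress(data)).decode("ascii")
    return [text[i:i + width] for i in range(0, len(text), width)]


def decode_payload(lines):
    """What the remote side has to undo: join, base64-decode, gunzip."""
    return gzip.decompress(base64.b64decode("".join(lines)))
```

Every line produced by encode_payload contains only letters, digits, "+", "/" and "=", so nothing in it can be mistaken for a control character by the terminal on the other end.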

Disclaimer

send_console-ng is still immature. I'm not sure if it will ever mature, but it's certainly not at this point. For instance, when it reaches a terminal, it starts by testing the commands... However, it's currently incapable of determining if the terminal it's running on is the correct one: for instance, if it reaches a login prompt, or worse, a gdb terminal, it will most likely result in an error... The worst?!? I don't even want to think about it.

So, if I've written my regexes correctly, it might just throw an error, or maybe not...

Some Realistic Limits (And Why I’m Not Crying About It Yet)

Reality check!
This tool depends on the presence of:

stty
gzip
base64
cat
stdbuf

If any of these are missing on the host or target, the tool simply won’t work.

Can it be made to work without them? Perhaps. Here are some ideas on how it could work in more hostile environments.

For stty: if it’s missing, the functionality it provides boils down to just two syscalls.
A small, architecture-dependent assembly utility could easily fill that gap.

Here's an example that could work on x86_64 machines:

    .section .bss
    .lcomm termios_buf, 64

    .section .text
    .global _start
_start:
    mov $16, %rax                   # SYS_IOCTL
    mov $0, %rdi                    # stdin (fd=0)
    mov $0x5401, %rsi               # TCGETS
    lea termios_buf(%rip), %rdx
    syscall

    lea termios_buf(%rip), %rbx
    add $12, %rbx                   # offset of c_lflag in struct termios
    mov (%rbx), %eax
    andl $~(0x0002 | 0x0008), %eax  # clear ICANON and ECHO
    mov %eax, (%rbx)

    mov $16, %rax                   # SYS_IOCTL
    mov $0, %rdi
    mov $0x5402, %rsi               # TCSETS
    lea termios_buf(%rip), %rdx
    syscall

    mov $60, %rax                   # SYS_exit
    xor %rdi, %rdi
    syscall

For base64:
If base64 is missing, an alternative is possible:
Send the file as a sequence of:

printf "\xXX\xYY\xZZ..."

However, this will significantly increase transfer time.
While base64 encoding costs about 10.7 bits on the wire per byte of payload (4 characters for every 3 bytes), the \xXX escapes cost 32 bits per byte (4 characters each)... tripling the amount of data.
The upside?
It’s completely dependency-free.
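Here is a rough sketch of that fallback and of the arithmetic behind it. The helper names and the destination path are invented for the example, and it assumes the remote shell's printf understands \xHH escapes (BusyBox and bash do; strict POSIX printf only guarantees octal escapes).

```python
def printf_commands(data, chunk=16, dest="/tmp/out.bin"):
    """Yield shell commands that rebuild `data` on the target, chunk by chunk."""
    for i in range(0, len(data), chunk):
        escaped = "".join("\\x%02x" % b for b in data[i:i + chunk])
        yield "printf '%s' >> %s" % (escaped, dest)


def payload_chars(n_bytes):
    """Characters sent for n_bytes of payload: (base64, \\xHH escapes)."""
    base64_chars = 4 * ((n_bytes + 2) // 3)  # 4 characters per 3 bytes, padded
    escape_chars = 4 * n_bytes               # '\', 'x' and two hex digits per byte
    return base64_chars, escape_chars
```

For a 3000-byte file, payload_chars gives 4000 characters for base64 against 12000 for the escapes, before counting the printf and redirection overhead on each line.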

For gzip:
Compression is a huge time saver but not strictly necessary.
Skipping gzip would work... At the cost of slower and bigger transfers.

Conclusion

Next time you find yourself staring at a device that’s technically alive but practically unreachable, remember:
You don’t need miracles. You just need a bit of clever system abuse and send_console-ng.

Because in embedded work, persistence always beats perfection.

Wednesday, March 26, 2025

The Case of the Disappearing PID: A Debugging Mystery

Every developer, at some point, encounters a situation so baffling it makes them question their own sanity. This is the story of one such weirdness: a heavily multithreaded Golang application, a kernel module, and a PID that vanished into the abyss without a trace.

Spoiler alert: It wasn’t aliens.

The Setup: A Debugging Nightmare

The problem was simple: I was tracking the lifecycle of processes spawned by a third-party binary. To do this, I wrote a Linux Kernel Module that hooked into _do_fork() and do_exit(), logging every process birth and death. And yes, you read that right... _do_fork(). You know, that function that, for over 20 years, had ‘_do_fork’ as a name, even though ‘fork’ was actually just a special case of ‘clone’. Then, suddenly, in 5.10, someone in kernel land had a 'Wait a second!' moment and decided the name was too misleading. So, they renamed it to ‘kernel_clone()’, like, surprise! No more confusion, just 20 years of tradition down the drain. But hey, at least we now know what’s really going on... I think.

Back to the story, at first, everything seemed fine. Threads were born, threads died, logs were generated, and the universe remained in harmony. But then, something unholy happened: some PIDs vanished without ever triggering do_exit().

I know what you're thinking... But NO, this was not a case of printk() lag, nor was it tracing inaccuracies. I double-checked using ftrace, netconsole, and even sacrificed a few coffee mugs to the pagan god of debugging... The logs were clear: the PID appeared, then POOF! Gone. No exit call, no final goodbye, no proper burial.

Step One: Denial (And the Stack Overflow Void)

Could a Linux process terminate without passing through do_exit()?

My first instinct was: Absolutely not.

If that were true, the very fabric of Linux process management would collapse. Chaos would reign. Cats and dogs would live together. And yet, my logs insisted otherwise.

So, like any good developer, I turned to Stack Overflow. Surely, someone must have encountered this before. I searched. No ready-made answer. Fine.

I did what any desperate soul would do: I asked the question myself.

Days passed. The responses trickled in, but nothing convinced me. The usual suspects, race conditions, tracing inaccuracies, were suggested, but I had already ruled them out. Stack Overflow had failed me.

I realized I wasn’t going to find the answer just by asking. I had to go hunting.

Step Two: Anger (aka Kernel Grep Hell)

I dug deep. Real deep. Into the Linux kernel source, into mailing lists from 2005, into the depths of Stack Overflow where unsolved mysteries go to die.

And then, I found it. The smoking gun.

Deep in fs/exec.c, hiding like a bug under the rug, was this delightful nugget (from the 4.19 kernel):

/* Become a process group leader with the old leader's pid.
 * The old leader becomes a thread of this thread group.
 * Note: The old leader also uses this pid until release_task
 * is called. Odd but simple and correct.
 */
tsk->pid = leader->pid;

I read it. I read it again. I re-read it while crying. And then it hit me.

Step Three: Bargaining (Can Two Processes Have the Same PID?)

If you had asked me before this, I’d have said no, absolutely not: two processes cannot share the same PID. That’s like realizing your passport was cloned, and now there's another ‘you’ vacationing in the Bahamas while you’re stuck debugging kernel code. That’s not how things work!

Except, sometimes, it is.

Here’s what happens (in 4.19):

A multithreaded process decides it wants a fresh start and calls execve().
The kernel, being the neat freak it is, has to clean up the old thread group.
But, in doing so, it needs to shuffle some PIDs around.
The newly exec’d thread gets the old leader’s PID, while the old leader, now a zombie, keeps using the same PID until it’s fully reaped.
If you were monitoring the old leader, you’d see its PID go through do_exit() twice. First, when the actual old leader dies. Then, when its "impostor", the thread that inherited its PID, finally meets its own end. So, from an external observer’s perspective, it looks like one process vanished without a trace, while another somehow managed to die twice. Linux: where even PIDs get second lives.

Now, fast-forward to kernel 6.14, and the behavior has been slightly refined:

/* Become a process group leader with the old leader's pid.
 * The old leader becomes a thread of this thread group.
 */
exchange_tids(tsk, leader);

The mechanism has changed, but it still involves shuffling PIDs in a similar way. With exchange_tids(), the process restructuring appears to follow the same logic, likely leading to the same observable effect: one PID seeming to vanish without an obvious do_exit(), while another might appear to exit twice. However, a deeper investigation would be needed to confirm the exact behavior in modern kernels.

This, ladies and gentlemen, was my bug. My missing do_exit() wasn’t missing. It was just… misdirected.

Step Four: Acceptance (And Trolling Future Debuggers)

Armed with this knowledge, I could now definitively answer some existential Linux questions:

Can a Linux process/thread terminate without passing through do_exit()?
No. Every process must pass through do_exit(), even if it’s via a sneaky backdoor.
Can two processes share the same PID?
Normally, no. The rule of unique PIDs is sacred... or so we’d like to believe. But every now and then, the kernel bends the rules in the name of sneaky process management. And while modern kernels seem to have repented on this particular trick, well... Where there’s one skeleton in the closet, there’s bound to be more.
Can a Linux process change its PID?
Yes, in at least one rare case: when de_thread() decides to reassign it.
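The PID hand-over is visible from userspace too. The sketch below is my own illustration, assuming Linux and Python 3.8 or newer (for threading.get_native_id); it is not the kernel module from the story. A non-leader thread prints its PID and TID, calls exec, and the freshly exec'd program prints the same pair again.

```python
import subprocess
import sys

# Program run in a child interpreter: a worker thread (not the group leader)
# reports its identity and then replaces the whole process via exec.
CHILD = r'''
import os, sys, threading

def worker():
    # Before exec: PID is the thread group ID, TID is this thread's own ID.
    print(os.getpid(), threading.get_native_id(), flush=True)
    os.execv(sys.executable, [sys.executable, "-c",
        "import os, threading; print(os.getpid(), threading.get_native_id())"])

t = threading.Thread(target=worker)
t.start()
t.join()
'''


def run_demo():
    """Return ((pid, tid) before exec, (pid, tid) after exec)."""
    out = subprocess.run([sys.executable, "-c", CHILD],
                         capture_output=True, text=True, check=True).stdout
    before, after = [tuple(int(x) for x in line.split())
                     for line in out.splitlines()]
    return before, after
```

Before the exec the worker's TID differs from the PID; afterwards the surviving task reports a TID equal to the old leader's PID, which is exactly the identity swap that confused my tracing.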

Final Thoughts (or, How to Break a Debugger’s Mind)

If you ever find yourself debugging a disappearing PID, remember:

The kernel is a twisted, brilliant piece of engineering.
Process lifecycle tracking is a house of mirrors.
Never trust a PID: it might not be who you think it is.
Stack Overflow won’t always save you. Sometimes, you have to dig into the source code yourself.
And, most importantly: always suspect execve().

In the end, Linux remains a beautifully chaotic system. But at least now, when PIDs disappear into the void, I know exactly which corner of the kernel is laughing at me.

Happy debugging!

Monday, March 10, 2025

Kernel Testing for Not-So-Common Architectures

When developing kernel patches, thorough testing is crucial to ensure stability and correctness. While testing is straightforward for common architectures like x86 or ARM, thanks to abundant tools, binary distributions, and community support, the landscape changes drastically when dealing with less common or emerging architectures.

The Challenge of Less Common Architectures

Emerging architectures, such as RISC-V, are gaining momentum but still face limitations in tooling and ecosystem maturity. Even more challenging are esoteric architectures like loongarch64, which may have minimal community support, scarce documentation, or lack readily available toolchains. Testing kernel patches for these platforms introduces unique hurdles:

Toolchain Availability: Compilers and essential tools might be missing or outdated.
Userspace Construction: Creating a minimal userspace to boot and test the kernel can be complex, especially if standard frameworks don’t support the target architecture.

The Role of Buildroot

In many scenarios, buildroot has been an invaluable resource. It simplifies the process of building both the toolchain and a minimal userspace. Its automated and modular approach makes setting up environments for a wide range of architectures relatively straightforward. However, buildroot has its limitations and doesn’t support every architecture recognized by the Linux kernel (apparently, old architectures like parisc32 are still supported by the kernel).

Understanding Userspace Construction

Userspace setup is a critical part of kernel testing. Traditionally, userspace resides on block devices, which introduces a series of complications:

Block Device Requirements: The device itself must be available and correctly configured.
Kernel Driver Support: The kernel must include the necessary drivers for the block device. If these are modules and not built-in, early boot stages can fail.

An effective alternative is using initramfs. This is a root filesystem packaged in a cpio archive and loaded directly into memory at boot. It simplifies boot processes by eliminating dependencies on block devices.

Building an Initramfs

Building an initramfs introduces its own challenges. Tools like Dracut can automate this process and work well in native build environments. However, in cross-build scenarios, Dracut’s complexity increases. It may struggle with cross-compilation environments, environment configurations, and dependency resolution.

Alternatively, frameworks like Buildroot and Yocto offer comprehensive solutions to build both toolchains and userspaces, including initramfs. These tools can handle cross-compilation but have their drawbacks:

Performance: Both tools can be slow.
Architecture Support: Not all architectures supported by the Linux kernel are covered.

When a Buildroot-like Approach Falls Short

Encountering an unsupported architecture can be a major roadblock. Without Buildroot, developers need to find alternative strategies to build the necessary toolchain and create a functional userspace for kernel testing.

An Alternative Approach: Crosstool-NG and BusyBox

One effective solution is leveraging Crosstool-NG to build the cross-compilation toolchain and using BusyBox to create a minimal userspace. This approach offers flexibility and control, ensuring that even esoteric architectures can be targeted. Here’s a detailed overview of this method:

Build the Toolchain with Crosstool-NG:
- Build and Install Crosstool-NG
- Initialize the wanted toolchain with ct-ng menuconfig.
- Select the target architecture and customize the build parameters.
- For esoteric architectures, enable the EXPERIMENTAL flag in the configuration menu. Some architectures are still considered experimental, and this flag is required to unlock their toolchain support.
- Proceed with building the toolchain using ct-ng build.
- Address any architecture-specific quirks or requirements during configuration and compilation.
Create a Minimal Userspace with BusyBox:
- Export the cross-compiler by setting the environment variable: export CROSS_COMPILE=<path-to-toolchain>/bin/<arch>-linux-.
- Configure and build BusyBox for a static build to avoid library dependencies: make CONFIG_STATIC=y.
- A static BusyBox build simplifies root filesystem creation, as it removes the need for organizing the /lib directory for shared libraries.
- Design the init system using BusyBox’s init with a simple SystemV style inittab:
- The rest of the filesystem can be minimal, with the /bin directory containing BusyBox and symlinks for the core tools.
- Make sure to have a /dev directory populated with at least console and tty0 devices; otherwise you won't see any messages and your init may crash.
A sample implementation of this concept is here.

Pack Userspace into an Initramfs:
- Assemble the userspace into a cpio archive with: find . -print0 | cpio --null -o --format=newc > ../initramfs.cpio.
- Ensure the kernel configuration is set to load the initramfs at boot.
Build and Test the Kernel:
- Compile the kernel using the cross-compiled toolchain:
- Be aware that excessively long CROSS_COMPILE strings can cause issues, leading the build system to fall back to the native toolchain.
- Use the kernel configuration symbol CONFIG_INITRAMFS_SOURCE to specify the initramfs for embedding directly into the kernel image, enabling quick validation with QEMU or similar tools.
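The userspace layout described in the steps above can be sketched as a small script. Everything specific in it is an assumption made for illustration: the applet list, the inittab contents (written in BusyBox's inittab syntax), and the choice of making /init a symlink to BusyBox. It is a starting point, not the sample implementation mentioned above.

```python
import os
import shutil
import stat

# Hypothetical minimal inittab for BusyBox init.
INITTAB = """\
::sysinit:/bin/mount -t proc proc /proc
::sysinit:/bin/mount -t sysfs sysfs /sys
::askfirst:-/bin/sh
::shutdown:/bin/umount -a -r
"""

APPLETS = ["sh", "mount", "umount", "ls", "cat", "dmesg"]
# (name, major, minor) of the character devices needed for a visible console.
DEV_NODES = [("console", 5, 1), ("tty0", 4, 0)]


def make_rootfs(root, busybox, applets=APPLETS):
    """Lay out a minimal root filesystem around a static BusyBox binary."""
    for d in ("bin", "dev", "etc", "proc", "sys"):
        os.makedirs(os.path.join(root, d), exist_ok=True)
    target = os.path.join(root, "bin", "busybox")
    shutil.copy2(busybox, target)
    os.chmod(target, 0o755)
    for name in applets:
        link = os.path.join(root, "bin", name)
        if not os.path.lexists(link):
            os.symlink("busybox", link)
    # The kernel executes /init from an initramfs; BusyBox picks the applet
    # from the name it was invoked with.
    init = os.path.join(root, "init")
    if not os.path.lexists(init):
        os.symlink("bin/busybox", init)
    with open(os.path.join(root, "etc", "inittab"), "w") as f:
        f.write(INITTAB)


def make_dev_nodes(root):
    """Create the device nodes; this needs root privileges."""
    for name, major, minor in DEV_NODES:
        os.mknod(os.path.join(root, "dev", name),
                 0o600 | stat.S_IFCHR, os.makedev(major, minor))
```

After running make_rootfs (and make_dev_nodes as root), the directory can be packed with the find and cpio pipeline from the list.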

This method demands more manual configuration than Buildroot but offers a path forward when conventional tools fall short.

Conclusion

Kernel development for less common architectures is a complex but rewarding challenge. When standard tools like Buildroot can’t cover the gap, combining Crosstool-NG and BusyBox provides a reliable and adaptable solution.

Saturday, March 1, 2025

The Tale of the Stubborn Cipher: A Debugging Saga

I’ve been a Red Hat guy for a while now, but lurking in my lab was a traitor: an old Ubuntu 20.04 machine still doing all my heavy lifting, mostly building kernels. Why? Because back in the day, before I saw the light of Red Hat, I was under the false impression that Fedora wasn’t great for cross-compilation. Turns out, I was dead wrong.

Fast-forward to today, and I need to work on LoongArch. Guess what? Ubuntu didn’t have the cross-compilation bits I needed… but Fedora did. Finally, I had a valid excuse to invest time, nuke that relic and bring it into the Fedora family.

But of course, nothing is ever that simple. See, I had an old encrypted disk on that setup, storing past projects, ancient treasures, and probably a few embarrassing bash scripts. No worries! I’ll just re-run the cryptsetup command on Fedora, and boom, I’m in…

Right?

Oh, how naive I was.

The Beginning: A Mysterious Failure

It all started with a simple goal: to mount an encrypted partition using AES-ESSIV:SHA256 on Fedora. The same setup had worked perfectly on an old setup for years, but now, Fedora refused to cooperate.

The process seemed straightforward: cryptsetup creates the mapping, but...

$ sudo cryptsetup create --cipher aes-cbc-essiv --key-size 256 --hash sha256 pippo /dev/sdb1
Enter passphrase for /dev/sdb1:
device-mapper: reload ioctl on pippo (253:2) failed: Invalid argument
$

A classic error message, vague and infuriating. This called for serious debugging.

The First Hypothesis: A Kernel Mishap?

I’m a kernel developer, and you know what they say: when all you have is a hammer, everything looks like a kernel bug. So, naturally, my first suspicion landed straight on the Linux kernel. Because, let’s be honest, if something’s broken, it’s probably the kernel’s fault… right? Since ESSIV wasn’t appearing in /proc/crypto, the first suspicion was a kernel issue. Fedora is known for enforcing modern cryptographic policies, and legacy algorithms are often disabled by default.

To investigate, the essiv.ko module source was examined. It turns out that a template registered with crypto_register_template(&essiv_tmpl) does not immediately appear in /proc/crypto. Instead, /proc/crypto reflects the state of crypto_alg_list, which only updates after the first use of an algorithm.

So while I was staring at /proc/crypto, expecting to see ESSIV magically appear, I was actually just looking at a list of algorithms that had already been used, not the registered templates. The kernel wasn’t necessarily broken: just playing hard to get.

I needed to be sure the upstream kernel code I was looking at was exactly the same as the code running on my machine. Fedora typically does not modify the upstream code, but I needed confirmation. Rather than hunting down Fedora’s kernel source repository, the decision was made to compare binary modules from Fedora and upstream Linux, but…

Ah, binary reproducibility… A dream everyone chases but few actually catch. The idea of building a kernel module and getting the exact same binary sounds simple, but in reality, it’s like trying to bake the same cake twice without measuring anything.

What can make binaries different? The obvious culprit is the code itself, but that’s just the start. Data embedded in the binary can also change things. Compiler versions and plugins play a role… If the same source code gets translated differently, you’ll end up with different binaries, no matter how pure your intentions. Then come the non-code factors. A kernel module is an ELF container, and ELF files carry metadata: timestamps, cryptographic signatures, and other bits that make your module unique (and sometimes annoying to compare). Even the flags that mark a module as Out-Of-Tree can introduce differences.

So, when doing a binary comparison, it’s not just a matter of checking if the bytes match... you have to strip out the noise and focus on the meaningful differences. Here’s what I did:

$ objcopy -O binary --only-section=.text essiv.us.ko essiv.us.bin
$ objcopy -O binary --only-section=.text essiv.fedora.ko essiv.fedora.bin
$ cmp essiv.us.bin essiv.fedora.bin

And I was lucky: the two .text sections were bitwise identical. No funny business in the kernel. Time to look elsewhere.
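If binutils is not at hand, the same "compare only the code" idea can be approximated with a tiny ELF reader. This is a hedged stand-in for the objcopy step, not what I actually ran: it only understands 64-bit little-endian ELF files, ignores extended section numbering, and the function names are made up for the example.

```python
import struct


def read_section(path, name=".text"):
    """Return the raw bytes of one named section from an ELF64 LE file."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # ELF header fields: section header table offset, entry size, count,
    # and the index of the section-name string table.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)

    def section_header(index):
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size, ...
        return struct.unpack_from("<IIQQQQIIQQ", data,
                                  e_shoff + index * e_shentsize)

    strtab_offset = section_header(e_shstrndx)[4]
    wanted = name.encode()
    for i in range(e_shnum):
        header = section_header(i)
        start = strtab_offset + header[0]
        end = data.index(b"\0", start)
        if data[start:end] == wanted:
            return data[header[4]:header[4] + header[5]]
    raise KeyError(name)


def same_section(path_a, path_b, name=".text"):
    """True when both files carry byte-identical copies of the section."""
    return read_section(path_a, name) == read_section(path_b, name)
```

Signatures, timestamps and other metadata live in other sections, so they do not affect the result, which is the point of the exercise.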

The Second Hypothesis: A Cryptsetup Mismatch?

To check if Fedora’s cryptsetup was behaving differently, the same encryption command was run on both the old machine and Fedora:

sudo cryptsetup -v create pippo --cipher aes-cbc-essiv:sha256 --key-size 256 /dev/sdb1

On the old machine, this worked fine, and the partition mounted successfully.
On Fedora, it created the mapping but refused to mount.

The Real Culprit: The Command Line Argument Order

At this point, every possible difference between the Ubuntu and Fedora commands was scrutinized.

And then, the discovery:

cryptsetup create --cipher aes-cbc-essiv:sha256 --key-size 256 --hash sha256 pippo /dev/sdb1

vs.

cryptsetup create pippo --cipher aes-cbc-essiv:sha256 --key-size 256 --hash sha256 /dev/sdb1

The first one fails mysteriously (without any syntax error). The second one works without throwing an error.

$ sudo cryptsetup create --cipher aes-cbc-essiv --key-size 256 --hash sha256 pippo /dev/sdb1
Enter passphrase for /dev/sdb1:
device-mapper: reload ioctl on pippo (253:2) failed: Invalid argument

Why? Because cryptsetup’s argument parser behaves differently depending on argument order. The correct order is:

cryptsetup create <name> <options> <device>

When the name (pippo) is placed before the options, everything just works. But if options come first, something breaks silently.

The Final Barrier: Key Derivation Algorithm Mismatch

With the argument order fixed, one final verification was done: the command no longer failed, but the filesystem still would not mount. The visible crypto parameters all looked fine, but something was not.

$ sudo dmsetup table pippo

On Fedora, it returned:

0 3907026944 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 8:17 0

On the old machine, it returned:

0 3907026944 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 8:97 0

Identical! This meant that the same crypto algorithm was being used, and I was providing the same passphrase. So, in theory, everything should have been correct.

And yet… mounting still failed.

The dmsetup output only confirmed that the same encryption algorithm was in play; it didn’t prove that the same key was actually being used (dmsetup table masks the key with zeros unless --showkeys is given). And the key is derived from the passphrase, the hashing algorithm, and other parameters.

A final comparison of the cryptsetup debug logs revealed the culprit:

Even though both systems used the same cipher specification (aes-cbc-essiv:sha256), they used different passphrase-to-key derivation methods internally. Fedora’s version of cryptsetup was not deriving the same encryption key.

The Fix: Explicitly Specifying the Hash Algorithm (`RIPEMD-160`) and Mode

The final working command had to ensure that Fedora derived the key exactly like the old machine:

$ sudo cryptsetup create pippo --cipher aes-cbc-essiv:sha256 --key-size 256 --hash ripemd160 --type plain /dev/sda1

And finally:

$ sudo mount /dev/mapper/pippo /mnt/0
$ ls /mnt/0

Success! The partition mounted perfectly.
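To see why the hash choice matters so much, here is a sketch of how, as far as I understand it, cryptsetup's plain mode turns a passphrase into a key: it hashes the passphrase, and when the digest is shorter than the key it hashes again with a growing prefix of "A" characters. Treat the details as an assumption rather than a specification; the function name is made up, and the usage example uses sha1 as a stand-in for a 160-bit hash because ripemd160 is not available in every Python build.

```python
import hashlib


def plain_key(passphrase, hash_name, key_bytes):
    """Derive key_bytes of key material from a passphrase, plain-mode style."""
    key = b""
    round_no = 0
    while len(key) < key_bytes:
        # Round N prefixes the passphrase with N 'A' characters.
        digest = hashlib.new(hash_name, b"A" * round_no + passphrase).digest()
        key += digest
        round_no += 1
    return key[:key_bytes]
```

Calling plain_key(b"secret", "sha256", 32) and plain_key(b"secret", "sha1", 32) yields two unrelated 32-byte keys from the same passphrase. The cipher line reported by dmsetup is identical in both cases, yet the decrypted data is garbage in one of them.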

The Conclusion: Lessons Learned

Look at things carefully before blaming the kernel.
Cryptographic defaults change across cryptsetup versions: be explicit!
The order of command-line arguments in cryptsetup matters.
Comparing dmsetup table outputs is not enough.
Key derivation methods can differ, and it is not evident!

After all the deep dives into kernel modules, crypto policies, and hashing algorithms, the entire issue boiled down to two things:

Wrong argument order in cryptsetup.
Key derivation differences between cryptsetup versions.

A truly fitting end to a classic Linux troubleshooting adventure.

Thursday, January 9, 2025

Security implications with printk

Introduction

Kernel debugging is inherently a complex task due to the intricate and low-level nature of kernel operations. Surprisingly, one of the most proficient and useful tools for tackling this challenge is the printk function. While it may seem like a simple utility for printing messages, printk is a cornerstone of kernel debugging, offering critical insights into kernel behavior.

The printk function in the Linux kernel might appear trivial at first glance, simply serving to print messages for debugging and logging purposes. However, it is one of the most intricate and critical components of the kernel. Its complexity arises from the requirement to function reliably in all possible kernel contexts, including interrupt handlers, non-preemptive sections, and even in cases of kernel panics. This complexity has made printk a major obstacle to the integration of the preempt_rt (real-time preemption) patch into the mainline kernel, as achieving deterministic behavior and low-latency logging in real-time systems poses significant challenges.

So, kernel debugging often involves analyzing log messages to diagnose issues or understand system behavior. Among the formats used for printing data in the kernel, %pK and %pS serve specific purposes when dealing with pointers. However, their combined usage in the same message can introduce unintended information leaks, potentially undermining Kernel Address Space Layout Randomization (KASLR) security measures.

This blog post explores the problem of combining %pK and %pS in a single message. We’ll start with an introduction to the problem, delve into how these formats work, and discuss specific scenarios, such as those involving kmemleak and module loading, where these issues can arise.

Potential Information Leak from Combining `%pK` and `%pS`

The kernel uses %pK to mask sensitive pointer addresses in logs based on the privilege level of the user reading the logs. This is particularly critical for preserving KASLR offsets, which are integral to modern system security. On the other hand, %pS resolves pointers to symbols, printing the function name and offset, or falling back to the raw address if the symbol cannot be resolved. When %pK and %pS are used together, the masking provided by %pK can be voided if %pS prints the same address as a raw pointer. This creates a potential vector for leaking sensitive information, especially when kallsyms fails to resolve the symbol and %pS defaults to showing the raw address.

Kernel Print Formats

To better understand this issue, it’s essential to look at the various print formats available in the kernel. The Documentation/core-api/printk-formats.rst provides an in-depth guide to these formats.

Pointer Type Formats

The printk function offers a variety of powerful format specifiers for handling pointers, enabling developers to extract and display detailed information about kernel symbols, memory addresses, and resource ranges. Depending on the specifier, pointers passed to printk can be printed as raw addresses (%px), symbolic names with or without offsets (%pS, %ps), kernel or user memory strings (%pks, %pus), physical or DMA addresses (%pa[p], %pad), or even complex structures like resources (%pr) or ranges (%pra). Each of these formats is designed to provide flexibility and precision in debugging and introspection, often requiring integration with kernel features such as kallsyms or security mechanisms.

`%pS`: Symbolic Representation of Function Pointers

The %pS specifier is used to print the symbolic name of a function pointer, including the offset. For example, it outputs something like versatile_init+0x0/0x110 (symbol name, offset, and function size) for a given pointer. This feature relies on kallsyms, a kernel mechanism for resolving symbols, which must be enabled at build time. If kallsyms is disabled, %pS falls back to printing the raw address, as symbolic resolution is unavailable. This makes %pS an invaluable tool for debugging, providing human-readable insights into function pointers, especially in backtraces or dynamic kernel environments.

`%pK`: Security-Conscious Printing of Kernel Pointers

The %pK specifier addresses the security implications of exposing kernel pointers. By default, it prints masked or hashed values (e.g., 00000000) unless the kptr_restrict sysctl parameter allows unrestricted access. This behavior is essential for protecting kernel memory layout information, particularly against exploits like kernel address space layout randomization (KASLR) bypasses. The interaction with the Linux Security Module (LSM) subsystem, such as SELinux, adds another dimension of control. When SELinux is active, additional access checks might apply, ensuring that %pK outputs are aligned with the system's security policy. For instance, even privileged users may encounter restricted pointer output if SELinux policies enforce strict controls.

Complexity Behind a Simple `printk`

While printk appears to be a simple logging tool, passing a pointer to it can invoke deeply integrated kernel features. Printing with %pS may involve symbol resolution and handling optional features like kallsyms, while %pK necessitates checks against security configurations and LSM policies. This intricate interplay between debugging utility and security subsystem demonstrates how printk transcends its apparent simplicity to become a critical component of kernel functionality and protection.

Real-World Scenarios: kmemleak and Module Loading

There are practical cases where the combined usage of %pK and %pS manifests. One such example is kmemleak, the kernel memory leak detector, which maintains a log of unreferenced memory allocations. When kptr_restrict is set to 1, %pK effectively masks the kernel addresses in its debugging messages to prevent leaking sensitive information. However, if %pK and %pS are used together, the masking becomes ineffective. For instance:

unreferenced object 0xffff465a8eb90000 (size 2048):
  comm "insmod", pid 129, jiffies 4294953078
  hex dump (first 32 bytes):
    80 c0 5e 8e 5a 46 ff ff 01 00 00 00 62 00 3c 04  ..^.ZF......b.<.
    00 00 00 00 00 00 00 00 1c 02 b9 8e 5a 46 ff ff  ............ZF..
  backtrace (crc 2f5e480d):
    [<0000000000000000>] kmemleak_alloc+0xb4/0xc4
    [<0000000000000000>] __kmem_cache_alloc_node+0x23c/0x270
    [<0000000000000000>] kmalloc_trace+0x3c/0x90
    [<0000000000000000>] 0xffffac0d743b204c
    [<0000000000000000>] do_one_initcall+0x178/0xc90
    [<0000000000000000>] do_init_module+0x1d8/0x63c
    [<0000000000000000>] load_module+0x10a0/0x1670
    [<0000000000000000>] init_module_from_file+0xdc/0x130
    [<0000000000000000>] idempotent_init_module+0x2d8/0x534
    [<0000000000000000>] __arm64_sys_finit_module+0xb4/0x130
    [<0000000000000000>] invoke_syscall.constprop.0+0xd8/0x1d4
    [<0000000000000000>] do_el0_svc+0x158/0x1dc
    [<0000000000000000>] el0_svc+0x54/0x130
    [<0000000000000000>] el0t_64_sync_handler+0x134/0x150
    [<0000000000000000>] el0t_64_sync+0x17c/0x180

In this example, even setting aside the first line, where the pointer is printed using %08lx, %pK masks the address on every backtrace line, but %pS exposes it on the fourth backtrace line, where the symbol cannot be resolved. Using %pK and %pS for the same address on the same line can therefore undermine the protection that %pK is meant to provide.

Is this case rare or what?

The line [<0000000000000000>] 0xffffac0d743b204c appears in the log when %pS is unable to resolve an address into a symbol and falls back to printing the raw address instead. This situation is not uncommon. In this case it occurs because the address belongs to the module's initialization function, which allocated the memory and is marked with the __init attribute. Functions marked as __init are automatically discarded once they have finished executing, which frees up memory. As a result, kallsyms can no longer resolve the symbol, since it no longer exists in the kernel's symbol table, and the output falls back to the raw address.
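The fallback described above can be imitated in user space. The following is only a toy sketch, not the kernel's vsprintf or kallsyms code: the symbol table, its addresses, and the names fmt_restricted, fmt_symbol and fmt_backtrace_line are all hypothetical and exist purely for illustration. It shows how a masked, %pK-like field combined with a %pS-like field still leaks the address as soon as the symbol lookup fails.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified symbol entry (NOT the kernel's kallsyms layout). */
struct toy_sym {
    uint64_t start;   /* first address covered by the symbol */
    uint64_t size;    /* size of the symbol in bytes */
    const char *name;
};

/* Made-up table: the addresses are invented for this sketch. */
static const struct toy_sym toy_symtab[] = {
    { 0xffff800080010000ULL, 0xc4,  "kmemleak_alloc" },
    { 0xffff800080020000ULL, 0xc90, "do_one_initcall" },
};

/* %pK-like field: when restricted, print zeros instead of the address. */
int fmt_restricted(char *buf, size_t len, uint64_t addr, int restricted)
{
    return snprintf(buf, len, "%016" PRIx64, restricted ? (uint64_t)0 : addr);
}

/* %pS-like field: name+offset/size if the lookup succeeds,
 * otherwise fall back to the raw address (this is the leak). */
int fmt_symbol(char *buf, size_t len, uint64_t addr)
{
    size_t n = sizeof(toy_symtab) / sizeof(toy_symtab[0]);

    for (size_t i = 0; i < n; i++) {
        const struct toy_sym *s = &toy_symtab[i];

        if (addr >= s->start && addr - s->start < s->size)
            return snprintf(buf, len, "%s+0x%" PRIx64 "/0x%" PRIx64,
                            s->name, addr - s->start, s->size);
    }
    return snprintf(buf, len, "0x%" PRIx64, addr);
}

/* One backtrace-style line: masked field followed by the symbolic field. */
int fmt_backtrace_line(char *buf, size_t len, uint64_t addr)
{
    char masked[32];
    char sym[128];

    fmt_restricted(masked, sizeof(masked), addr, 1);
    fmt_symbol(sym, sizeof(sym), addr);
    return snprintf(buf, len, "[<%s>] %s", masked, sym);
}
```

Feeding fmt_backtrace_line an address inside one of the toy symbols produces a line with zeros in the first field and name+offset/size in the second. Feeding it an address that no entry covers, such as 0xffffac0d743b204c from the log above, still zeroes the first field but prints the raw address in the second, which is the pattern seen in the fourth backtrace line.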

Conclusions

Combining %pK and %pS in kernel messages might seem like a harmless redundancy at first glance. However, this practice can introduce vulnerabilities by inadvertently exposing sensitive kernel information. Understanding the nuances of kernel print formats and their appropriate usage is essential for developers to maintain both system security and effective debugging capabilities.
