Friday, February 6, 2026

CHERI & Fil-C: are they the C memory-safety solution?

Looking at the past

Pointers Were Never Meant to Be Just Numbers

(And C Didn’t Kill That Idea… We Did)

Ask a C programmer what a pointer is, and you’ll almost certainly hear:

“It’s just a memory address.”

That statement is true in practice, but historically wrong.

The idea that a pointer is “just a number” is not a law of computing. It is the result of a long chain of economic and engineering decisions that happened to align in the 1970s and 1980s.

Before that, and increasingly again today, a pointer was understood as something richer: a reference with rules, a capability, a guarded object.

And crucially: C did not invent flat memory. It merely adapted itself to it extremely well.

Before C: When Pointers Carried Meaning

Early computer systems had no illusion that memory access was harmless.

Machines such as the Burroughs B5000, the Unisys 1100/2200 series, and later the Lisp Machines all treated pointers as structured entities:

  • bounds-checked
  • tagged
  • validated by hardware
  • often unforgeable

A pointer was not an integer you could increment freely. It was closer to a capability, a permission to access a specific object.

This wasn’t academic purity. It was a necessity:

  • multi-user systems
  • shared memory
  • batch scheduling
  • safety over speed

These machines enforced correctness by design.

C Did Not “Flatten” Memory… It Adapted to It

It’s tempting to say:

“C introduced flat memory and unsafe pointers.”

That’s not quite true.

C was designed on the PDP-11, a machine that already had:

  • a flat address space
  • no hardware memory tagging
  • no segmentation protection at the language level

C didn’t invent this model… It embraced it.

But here’s the key point that often gets missed:

C was explicitly designed to be portable across architectures with very different memory models.

And that includes machines that did not have flat memory.

C on Non-Flat Architectures: The Forgotten Chapter

C was successfully implemented on:

  • segmented machines
  • descriptor-based systems
  • capability-like architectures

Including Unisys systems, where pointers were not simple integers.

As documented in historical work (and summarized well in begriffs.com – “C Portability”), early C compilers adapted to the host architecture rather than forcing a universal memory model.

On Unisys systems:

  • pointers were implemented using descriptors
  • arithmetic was constrained
  • bounds and access rules were enforced by hardware
  • the compiler handled translation

This worked because the C standard never required pointers to be raw addresses.

It required:

  • comparability
  • dereferenceability
  • consistent behavior

Not bit-level identity.

Even Henry Rabinowitz warned, in Portable C, that assuming pointer arithmetic behaved like integer arithmetic was already non-portable, even in the late 1980s.
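To see the kind of assumption Rabinowitz warned about, here is a tiny illustration (mine, not from the book): ordinary-looking arithmetic that ISO C never actually guarantees, which a flat machine happens to tolerate and a descriptor-based machine is free to reject.

#include <stdio.h>

int main(void)
{
    char a[8], b[8];

    char *p = a + 3;    /* fine: arithmetic inside one object */

    /* Undefined behavior in ISO C: subtracting pointers that
     * belong to different objects. On a flat machine this
     * "works"; on a descriptor machine the two pointers may
     * not even be comparable. */
    long delta = (long)(b - a);

    printf("p=%p delta=%ld\n", (void *)p, delta);
    return 0;
}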

So what changed?

The Real Shift: Economics, Not Language Design

The shift didn’t come from C.

It came from:

  • cheap RAM
  • fast CPUs
  • simple pipelines
  • RISC philosophy
  • UNIX portability

Flat memory was faster to implement and easier to optimize.

Once x86 and similar architectures dominated, the hardware stopped enforcing:

  • bounds
  • provenance
  • validity

And since C mapped perfectly onto that model, it became the dominant systems language.

From that point on:

  • pointers became integers
  • safety became a software problem
  • memory bugs became a security industry

Not because C demanded it, but because the hardware no longer helped.

The Long Detour We Are Now Undoing

For decades, the industry tried to patch this with:

  • ASLR
  • stack canaries
  • DEP
  • sanitizers
  • fuzzers

All useful. None fundamental.

They treat symptoms, not causes.

Which brings us back, full circle, to the idea that started this story:

A pointer should carry metadata.

A Short Detour: Even x86 Wasn’t Always Flat

Before moving forward, it’s worth correcting one more common simplification.

Even the architecture most associated with “flat pointers”, x86, did not start that way.

In real mode, x86 used segmented addressing:

physical_address = segment × 16 + offset

This meant:

  • pointers were effectively split into two components
  • address calculation wasn’t trivial
  • different segments could overlap
  • the same physical memory could be referenced in multiple ways
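A quick worked example of that formula (a hypothetical snippet, not production code) makes the aliasing concrete: two different segment:offset pairs land on the same physical byte.

#include <stdio.h>
#include <stdint.h>

/* Real-mode 8086 translation: segment * 16 + offset,
 * wrapped to 20 bits on the original 8086. */
static uint32_t phys(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;
}

int main(void)
{
    /* Two different segment:offset pairs, one physical address. */
    printf("%05X\n", phys(0x1234, 0x0010));  /* prints 12350 */
    printf("%05X\n", phys(0x1235, 0x0000));  /* prints 12350 */
    return 0;
}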

It wasn’t a capability system, there were no bounds or permissions, but it was a reminder that pointer arithmetic was never universally “just an integer add.”

What changed wasn’t the hardware’s ability to support structure.

What changed was that:

  • segmentation was seen as inconvenient
  • flat addressing was faster
  • compilers and operating systems optimized for simplicity

By the time protected mode and later 64-bit mode arrived, segmentation had been mostly sidelined. The industry standardized on:

Flat memory + software discipline

That decision stuck.

And that’s the world CHERI and FIL-C are now challenging.

Are CHERI and FIL-C fixing the problem?

The Common Idea Behind CHERI and FIL-C

At first glance, CHERI and FIL-C look very different.

One changes the CPU. The other changes the compiler.

But conceptually, they start from exactly the same premise:

A pointer is not an address. A pointer is authority.

Everything else follows from that.

The Shared Heritage: Capability Thinking

Both CHERI and FIL-C descend from the same historical lineage:

  • Burroughs descriptors
  • Lisp machine object references
  • Capability-based operating systems
  • Hardware-enforced segmentation

The core idea is simple:

A program should only be able to access memory it was explicitly given access to.

That means a pointer must carry:

  • where it points
  • how far it can go
  • what it is allowed to do
  • whether it is still valid

In other words: metadata.

The only real disagreement between CHERI and FIL-C is where that metadata lives.

CHERI: Making Capabilities a Hardware Primitive

CHERI takes the most direct route possible.

It says:

“If pointers are capabilities, the CPU should understand them.”

So CHERI extends the architecture itself.

A CHERI pointer (capability) contains:

  • an address (cursor)
  • bounds
  • permissions
  • a validity tag
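Conceptually, you can picture a capability like the sketch below. This is only an illustration: real CHERI hardware compresses bounds into a 128-bit encoding and keeps the tag out of band, so the field layout here is deliberately naive.

#include <stdint.h>
#include <stdbool.h>

/* Conceptual view of a CHERI capability, not the real encoding. */
struct capability {
    uint64_t cursor;      /* the address being pointed at          */
    uint64_t base;        /* lowest address the capability grants  */
    uint64_t length;      /* size of the granted region            */
    uint32_t permissions; /* load/store/execute/... bits           */
    bool     tag;         /* validity: cleared on any corruption   */
};

/* What the hardware checks, conceptually, on every dereference. */
static bool may_load(const struct capability *c, uint64_t nbytes)
{
    return c->tag
        && c->cursor >= c->base
        && c->cursor + nbytes <= c->base + c->length;
    /* ...plus a permission check on the load bit. */
}

int main(void)
{
    struct capability c = { 0x1000, 0x1000, 16, 0, true };
    return may_load(&c, 32) ? 0 : 1;   /* 32 > 16: out of bounds */
}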

The tag is critical:

  • it is stored out-of-band
  • it cannot be forged
  • it is cleared automatically if memory is corrupted
  • the CPU refuses to dereference invalid capabilities

This means:

  • no buffer overflows
  • no out-of-bounds accesses
  • no forged pointers
  • no accidental privilege escalation

And all of this happens without software checks.

The hardware enforces it.

This is not “fat pointers” in the C++ sense. This is architectural memory safety.

Importantly, CHERI preserves C semantics:

  • pointers still look like pointers
  • code still compiles
  • performance is predictable

But the machine simply refuses to execute illegal memory operations.

It’s the return of the capability machine, this time built with modern CPUs, caches, and toolchains.

By design, CHERI enforces only what can reasonably belong to the instruction set, leaving higher-level memory semantics to software.

FIL-C: Capability Semantics Through Compiler and Runtime

FIL-C starts from the same premise:

“C pointers need metadata.”

But instead of changing the hardware, it changes the compiler and runtime.

This choice allows FIL-C to enforce stronger guarantees than CHERI, but at the cost of changing not only pointer representation, but also object lifetime and allocation semantics.

In FIL-C:

  • pointers become InvisiCaps
  • bounds are tracked invisibly
  • provenance is preserved
  • invalid accesses trap

From the programmer’s point of view:

  • it’s still C
  • code still compiles
  • the ABI mostly stays intact

From the runtime’s point of view:

  • every pointer has hidden structure
  • every access is validated
  • dangling pointers are detected
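To make that concrete, here is a hedged example: perfectly ordinary C whose invalid accesses a capability-enforcing implementation is expected to trap deterministically, instead of leaving them as exploitable undefined behavior.

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *p = malloc(16);
    if (!p)
        return 1;

    strcpy(p, "hello");   /* fine: within bounds                    */
    p[20] = 'x';          /* out of bounds: plain C calls this
                             undefined behavior; a capability
                             system traps it deterministically      */
    free(p);
    p[0] = 'y';           /* use after free: same story             */
    return 0;
}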

FIL-C and CHERI start from the same idea (pointers as capabilities) but deliberately apply it at very different semantic depths.

At this point, the similarity ends.

While CHERI limits itself to enforcing spatial safety and capability integrity, FIL-C necessarily goes further. In order to provide temporal safety, FIL-C must change the object lifetime model itself.

In FIL-C, deallocation is decoupled from object lifetime: freed objects are often quarantined, delayed, or kept alive by a garbage collector. Memory reuse is intentionally conservative, because temporal safety cannot coexist with eager reuse.

This is not an implementation choice but a semantic requirement, and it has consequences that go well beyond pointer representation.

The Reality Check: It’s a Runtime, Not Just a Compiler

It is important to distinguish FIL-C from a simple “safe compiler”. Because it enforces temporal safety via an object lifetime model, it must bypass standard allocators like glibc’s malloc. This means the high-performance concurrency optimizations (arenas, caches) that developers expect are gone, replaced by a managed runtime.

Furthermore, because FIL-C requires this complex runtime support, it is currently a user-space tool. Using it to build, for example, a kernel like Linux is architecturally infeasible.

For standard userspace applications, the trade-offs can be summarized as follows:

Aspect             CHERI                   FIL-C
Enforcement        Hardware                Software
Pointer metadata   In registers & memory   In runtime structures
Performance        2%–5% overhead          50%–200% overhead
Deployment         Requires new hardware   Works today

The higher cost of FIL-C is not primarily due to pointer checks, but to the changes required in allocation, lifetime management, and runtime semantics to enforce temporal safety.

CHERI makes the CPU safe. FIL-C makes the language safe.

Same Idea, Two Execution Models

CHERI and FIL-C are not competitors. They are two implementations of the same philosophy.

They both assert that:

  • pointer provenance matters
  • bounds must be enforced
  • safety must be deterministic
  • memory errors are architectural, not stylistic

They differ only in where that logic lives.

You can think of it this way:

  • CHERI -> capabilities in silicon
  • FIL-C -> capabilities in software
  • MTE (Arm’s Memory Tagging Extension) -> capabilities with probabilities

Different tradeoffs. Same destination.

Why This Matters Now: The ELISA Perspective

At first glance, CHERI and FIL-C may look like academic exercises or long-term research projects. But their relevance becomes much clearer when viewed through the lens of ELISA.

ELISA exists for a very specific reason: Linux is increasingly used in safety-critical systems.

That includes:

  • automotive controllers
  • industrial automation
  • medical devices
  • aerospace and avionics
  • robotics and energy infrastructure

And Linux, for all its strengths, is still fundamentally:

A large C codebase running on hardware that does not enforce memory safety.

The Core Tension ELISA Faces

ELISA’s mission is not to redesign Linux.

It is to:

  • make Linux usable in safety-critical contexts
  • support certification efforts (ISO 26262, IEC 61508, etc.)
  • improve predictability, traceability, and robustness
  • do this without rewriting the kernel

That creates a fundamental tension:

  • Linux is written in C
  • C assumes unsafe pointers
  • Safety standards assume bounded, analyzable behavior

Most current ELISA work focuses on:

  • process isolation
  • static analysis
  • restricted subsets of C
  • coding guidelines
  • runtime monitoring
  • testing and verification

All valuable. All necessary.

But none of them change the underlying truth:

The C memory model is still unsafe by construction.

Why CHERI and FIL-C Enter the Conversation

CHERI and FIL-C do not propose rewriting Linux.

They propose something more subtle:

Making the existing C code mean something safer.

This matters because they address a layer below where most safety work happens today.

Instead of asking:

  • “Did the developer write correct code?”

They ask:

  • “Can the machine even express an invalid access?”

That’s a fundamentally different approach.

CHERI in the ELISA Context

CHERI is interesting to safety engineers because:

  • It enforces memory safety in hardware
  • Violations become deterministic faults, not undefined behavior
  • It supports fine-grained compartmentalization
  • It aligns well with safety certification principles

But CHERI is also realistic about its scope:

  • It requires new hardware
  • It requires a new ABI
  • It is not something you “turn on” in existing systems

Which means:

CHERI is not a short-term solution for ELISA, but it is a reference model for what “correct” looks like.

It provides a concrete answer to the question: “What would a memory-safe Linux look like if we could redesign the hardware?”

FIL-C in the ELISA Context

FIL-C sits at the opposite end of the spectrum.

It:

  • runs on existing hardware
  • keeps the C language
  • enforces safety at runtime
  • integrates with current toolchains

This makes it immediately relevant as:

  • a verification tool
  • a debugging platform
  • a reference implementation of memory safety
  • a way to experiment with safety properties on real code

But it also comes with trade-offs:

  • performance overhead
  • increased memory usage
  • reliance on runtime checks

So again, not a drop-in replacement, but a valuable experimental lens.

The Direction Is Clear (Even If the Path Is Long)

The real contribution of CHERI and FIL-C is not that they make C safer by rewriting it.

It is that they show memory safety can be improved by changing the semantics of pointers, while leaving existing code largely untouched.

This distinction matters.

Large systems like Linux cannot realistically be rewritten. Their value lies in the fact that they already exist, have been validated in the field, and continue to evolve. Any approach that requires wholesale code changes, new languages, or a redesigned programming model is unlikely to be adopted.

CHERI and FIL-C take a different approach. They act below the source level:

  • redefining what a pointer is allowed to represent
  • enforcing additional semantics outside application logic
  • turning undefined behavior into deterministic failure

In doing so, they demonstrate that memory safety can be introduced beneath existing software, rather than imposed on top of it.

That insight is more important than either implementation.

It shows that the path forward for Linux safety does not necessarily run through rewriting code, but through reintroducing explicit authority, bounds, and permissions into the way memory is accessed, even if this is done incrementally and imperfectly.

Looking Forward

Neither CHERI nor FIL-C is something Linux will adopt tomorrow.

CHERI depends on hardware that is not yet widely available and will inevitably influence ABIs, compilers, and toolchains. FIL-C works on current hardware, but with overheads that limit its use to specific contexts.

What they offer is not an immediate solution, but a reference direction.

They suggest that meaningful improvements to Linux safety are possible if we focus on:

  • enforcing memory permissions more precisely
  • narrowing the authority granted by pointers
  • moving checks closer to the hardware or runtime
  • minimizing required changes to existing code

This leaves room for intermediate approaches: solutions that do not redefine the language, but instead use existing mechanisms, such as the MMU, permission models, and controlled changes to pointer usage, to incrementally reduce the scope of memory errors.

In that sense, CHERI and FIL-C are less about what to deploy next and more about what properties future solutions must have.

They help clarify the goal: make memory access explicit, bounded, and enforceable… without rewriting Linux to get there.

Sunday, January 25, 2026

When Clever Hardware Hacks Bite Back: A Password Keeper Device Autopsy

Or: how I built a USB password keeper that mostly worked, sometimes lied, and taught me more than any success ever did.

I recently found these project files buried in a folder titled “Never Again.” At first, I thought they didn’t deserve a blog post, mostly because the device has a mind of its own: it works perfectly when I’m just showing it off, but reliably develops stage fright the moment I actually need to log in. This little monster made it all the way to revision 7 of the PCB. I finally decided to archive the project after adding a Schmitt trigger: the component that was mathematically, logically, and spiritually supposed to solve the debouncing issues and save the day.

Spoiler: it didn’t.

Instead of a revolutionary security device, I ended up with a zero-cost, high-frustration random number generator built from whatever was lying in my junk drawer. It occasionally types my password correctly, provided the moon is in the right phase and I don’t look at it too directly. And yet… here we are.

The Idea That Seemed Reasonable at the Time

A long time ago, when “password manager” still meant a text file named passwords.txt, I had what felt like a good idea:

Build a tiny device that types passwords for me.

No drivers. No software installation. Just plug it in, press a button, and it types the password like a keyboard. From a security point of view, it sounded brilliant:

  • The OS already trusts keyboards
  • No clipboard
  • No background process
  • No software attack surface

If it only types, it can’t be hacked… right?

(Yes. That sentence aged badly.)

Constraints That Created the Monster

This was not a commercial project. This was a “use what’s on the desk” project.

So the constraints were self-inflicted:

  • MCU: ATtiny85 (cheap, tiny, limited)
  • Display: HD44780 (old, everywhere, slow)
  • USB: bitbanged (no hardware USB)
  • GPIOs: basically none
  • PCB: single-sided, etched at home
  • Budget: close to zero

The only thing I had plenty of was optimism.

Driving an LCD With One Pin (Yes, Really)

The first problem: The ATtiny85 simply does not have enough pins to drive an HD44780 display.

Even in 4-bit mode, the display wants more pins than I could spare. So I did what any reasonable person would do:

I multiplexed everything through one GPIO using RC timing.

By carefully choosing resistor and capacitor values, I could:

  • Encode clock, data, and select signals
  • Decode them on a 74HC595
  • Drive the display using time-based signaling
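The firmware side of that trick looked roughly like the Arduino-style sketch below. This is a hypothetical reconstruction of the idea, not the original code: pin numbers and timing constants are made up, and the real values depended entirely on the actual RC components.

/* Illustrative only: a short pulse decays through the RC network
 * before the '595 input sees it (acting as a bare clock edge),
 * while a long pulse holds the line high long enough to also be
 * latched as a data '1'. SIGNAL_PIN is assumed to be configured
 * as an output elsewhere. */
#define SIGNAL_PIN   2      /* hypothetical ATtiny85 pin      */
#define SHORT_PULSE  5      /* microseconds, hypothetical     */
#define LONG_PULSE   80     /* microseconds, hypothetical     */

static void send_bit(int bit)
{
    digitalWrite(SIGNAL_PIN, HIGH);
    delayMicroseconds(bit ? LONG_PULSE : SHORT_PULSE);
    digitalWrite(SIGNAL_PIN, LOW);
    delayMicroseconds(LONG_PULSE);   /* let the RC network settle */
}

static void send_byte(unsigned char b)
{
    for (int i = 7; i >= 0; i--)
        send_bit((b >> i) & 1);
    /* a deliberately longer gap lets the '595 latch the byte */
    delayMicroseconds(10 * LONG_PULSE);
}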

It worked. Mostly. But it was also:

  • Sensitive to temperature
  • Sensitive to component tolerances
  • Sensitive to how long the board had been powered on
  • Fundamentally analog pretending to be digital

Lesson #1:

If your protocol depends on analog behavior, you don’t really control it.

Abusing USB HID for Fun and (Almost) Profit

This is the part I still like the most.

The problem

A USB keyboard is input-only. You can’t send data to it.

So how do you update the password database?

The bad idea

Use the keyboard LEDs.

  • Caps Lock
  • Num Lock
  • Scroll Lock

They’re controlled by the host. And yes: you can read them from firmware.

The result

I implemented a synchronous serial protocol over HID LEDs.

  • Clocked
  • Deterministic
  • Host-driven
  • No timing guessing
  • No race conditions

And surprisingly: This was the most reliable part of the whole project. It was slow, sure. But passwords are small. And since the clock came from the host, it was rock solid.
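For flavor, here is a minimal sketch of what the host side of such a host-clocked protocol can look like on Linux. The console LED ioctl (KDSETLED) is real; the bit assignments (Caps Lock as data, Scroll Lock as clock) and the timing are my assumptions, not the project’s actual wire format.

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kd.h>

static void send_byte(int fd, unsigned char b)
{
    for (int i = 7; i >= 0; i--) {
        int data = (b >> i) & 1;
        int leds = data ? LED_CAP : 0;

        ioctl(fd, KDSETLED, leds);            /* set data line     */
        usleep(2000);
        ioctl(fd, KDSETLED, leds | LED_SCR);  /* clock high: the   */
        usleep(2000);                         /* device samples    */
        ioctl(fd, KDSETLED, leds);            /* clock low         */
        usleep(2000);
    }
}

int main(void)
{
    int fd = open("/dev/console", O_WRONLY);  /* needs root */
    if (fd < 0)
        return 1;
    send_byte(fd, 0xA5);                      /* hypothetical byte */
    close(fd);
    return 0;
}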

Lesson #2:

The ugliest hack is sometimes the most reliable one.

The Part Nobody Warns You About: Scancodes

The update tool was a small Linux application that sent password data to the device.

Here’s the catch:

Keyboards don’t send ASCII. They send scancodes. And:

  • PS/2 scancodes ≠ USB HID scancodes
  • Layout matters
  • Locale matters
  • Shift state matters

So the database wasn’t a list of characters. It was a list of HID scancodes.

That means:

  • The device was layout-dependent
  • The database was architecture-dependent
  • Portability was not free
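A simplified, hypothetical fragment of such a mapping makes the point concrete: the “character” disappears, and only a HID usage ID plus modifier state remains (US layout assumed; 'a'..'z' really do map to usages 0x04..0x1D).

#include <stdio.h>

struct hid_key {
    unsigned char usage;   /* HID usage ID               */
    unsigned char shift;   /* 1 if Shift must be held    */
};

static struct hid_key ascii_to_hid(char c)
{
    struct hid_key k = { 0, 0 };
    if (c >= 'a' && c <= 'z') {
        k.usage = 0x04 + (c - 'a');
    } else if (c >= 'A' && c <= 'Z') {
        k.usage = 0x04 + (c - 'A');
        k.shift = 1;
    } else if (c == '!') {
        k.usage = 0x1E;    /* the '1' key... on a US layout only! */
        k.shift = 1;
    }
    return k;
}

int main(void)
{
    struct hid_key k = ascii_to_hid('A');
    printf("usage=0x%02X shift=%d\n", k.usage, k.shift);
    return 0;
}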

This is one of those details nobody tells you until you trip over it yourself.

Lesson #3:

Text is an illusion. Keyboards don’t speak ASCII.

The USB Problem I Couldn’t Outsmart

Now for the real failure. The ATtiny85 has no USB hardware.

So USB had to be:

  • Bitbanged
  • Cycle-perfect
  • Timed in software
  • Extremely sensitive to clock drift

Sometimes it worked. Sometimes it didn’t enumerate. Sometimes it worked once and never again. Sometimes it depended on the USB host.

This wasn’t a bug. This was physics.

Lesson #4:

USB is not forgiving, and bitbanging it is an act of optimism.

The Hardware (Yes, It Actually Exists)

Despite everything:

  • I built two physical units
  • I etched the PCB myself (single-sided)
  • I assembled them by hand
  • They worked... Most of the time.

I still have them. They still boot. Sometimes.


Repository Structure (For the Curious)

The project is split into three parts:

PCB & Schematics

  • Single-layer board
  • Home-etched
  • All compromises visible
  • No hiding from physics

Host-Side Tool

  • Linux-based
  • Sends HID scancodes
  • Talks to the device via LED protocol
  • No ASCII anywhere

Firmware

  • Arduino-based
  • Third-party bootloader
  • USB bitbanging
  • Display driving
  • HID handling

What Actually Failed (and What Didn’t)

Failed

  • USB reliability
  • Display robustness
  • Timing assumptions
  • Environmental tolerance

Worked

  • HID LED protocol
  • Password logic
  • Conceptual design
  • Learning value

The irony

  • The part that looked insane... worked.
  • The part that looked standard... didn’t.

Wednesday, January 14, 2026

hc: an agentless, multi-tenant shell history sink (because you will forget that command)

For a long time, my daily workflow looked like this:
SSH into a server… do something clever… forget it… SSH into another server… regret everything.

I work in an environment where servers are borrowed from a pool. You get one, you use it, and sooner or later you give it back. This sounds efficient, but it creates a very specific kind of pain: every time I landed on a “new” machine, all my carefully crafted commands in the history were gone.

And of course, the command I needed was that one. The long one. The ugly one. The one that worked only once, three months ago, at 2 a.m.

A configuration management tool could probably handle this. In theory. But my reality is a bit messier.

The servers I use are usually borrowed, automatically installed, and destined to disappear again. I didn’t want to “improve” them by leaving behind shell glue and half-forgotten tweaks. Not because someone might reuse them, but because someone else would have to clean them up.

On top of that, many of these machines live behind VPNs that really don’t want to talk to the outside world or the collector living in my home lab. If SSH works, I’m happy. If it needs anything more than that, it’s already too much.

I wanted something different:

  • no agent
  • no permanent changes
  • no files left behind
  • no assumptions about the remote network

In short: leave no trace.

How hc was born

This is how hc (History Collector) started.

The very first version was a small netcat hack in 2023. It worked… barely. But the idea behind it was solid, so I kept iterating. Eventually, it grew into a proper Go service with a SQL backend (Postgres, for now).

The core idea of hc is simple:

The remote machine should not need to know anything about the collector.

No agent. No configuration file. No outbound connectivity.
Instead, the trick is an SSH reverse tunnel.

From my laptop, I open an SSH session like this:

  • a reverse tunnel exposes a local port on the remote machine
  • that port points back to my hc service
  • from the remote shell’s point of view, the collector is just 127.0.0.1
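In concrete terms, it boils down to something like this (host names and port numbers are made up; hc is assumed to listen on port 9000 on my laptop):

ssh -R 9000:127.0.0.1:9000 root@borrowed-server

The -R option tells the remote sshd to listen on port 9000 and to send every connection back through the tunnel to 127.0.0.1:9000 as resolved on my side, that is, straight to the hc collector.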

This was the “aha!” moment.

Because the destination is always localhost, the injected logging payload is always identical, no matter which server I connect to. The shell doesn’t know it’s talking to a central service… and it doesn’t care.


Injecting history without leaving scars

When I connect, I inject a small shell payload before starting the interactive session. This payload:

  • generates a session ID
  • defines helper functions
  • installs a PROMPT_COMMAND hook
  • forwards command history through the tunnel

Nothing is written to disk. When the SSH session ends, everything disappears.

A typical ingested line looks like this:

20240101.120305 - a1b2c3d4 - host.example.com [cwd=/root] > ls -la

This tells me:

  • when the command ran
  • from which host
  • in which directory
  • and what I actually typed

It turns out this is surprisingly useful when you manage many machines and your memory is… optimistic.

Minimal ingestion, flexible transport

hc is intentionally boring when it comes to ingestion… and I mean that as a compliment.

On the client side, it’s just standard Unix plumbing:

  • nc for plaintext logging on trusted networks
  • socat for TLS when you need encryption

No custom protocol, no magic framing. Just lines over a pipe.

This also makes debugging very easy. If something breaks, you can literally cat the traffic.

Multi-tenancy without leaking secrets

Security became more important as hc grew.

I wanted one collector, multiple users, and no accidental data mixing. hc supports:

  • TLS client certificates
  • API keys

For API keys, I chose a slightly unusual format:

]apikey[key.secret]

The server detects this pattern in memory, uses it to identify the tenant, and then removes it immediately. The stripped command is what gets stored, both in the database and in the append-only spool.

This way:

  • secrets never hit disk
  • grep output never leaks credentials
  • logs stay safe to share

Searching is a different problem (and that’s good)

Ingestion and retrieval are intentionally separate.

When I want to find a command, hc exposes a simple HTTP(S) GET endpoint. I deliberately chose GET instead of POST because it plays nicely with the Unix philosophy.

Example:

wget \
  --header="Authorization: Bearer my_key" \
  "https://hc.example.com/export?grep1=docker&color=always" \
  -O - | grep prune

This feels natural. hc becomes just another tool in the pipeline.

Shell archaeology: BusyBox, ash, and PS1 tricks

Working on hc also sent me down some unexpected rabbit holes.

For example: BusyBox ash doesn’t support PROMPT_COMMAND. Last year, I shared a workaround on Hacker News that required patching the shell at source level.

Then a user named tyingq showed me something clever:
you can embed runtime-evaluated expressions inside PS1, like:

PS1="\$(date) $ "

That expression is executed every time the prompt is rendered.

I’m currently experimenting with this approach to replace my previous patching strategy. If it works well enough, hc moves one step closer to being truly zero-artifact on every shell.

Where to find it (and what’s next)

The source code, together with the BusyBox research notes, is available online.

Right now, I’m working on:

  • a SQLite backend for single-user setups
  • more shell compatibility testing
  • better documentation around injection payloads

If you have opinions about:

  • the ]apikey[ stripping logic
  • using PS1 for high-volume logging
  • or weird shells I should test next

…I’d genuinely love to hear them.

Sunday, September 14, 2025

Schrödinger’s test: The /dev/mem case

Why I Went Down This Rabbit Hole

Back in 1993, when Linux 0.99.14 was released, /dev/mem made perfect sense. Computers were simpler, physical memory was measured in megabytes, and security basically boiled down to: “Don’t run untrusted programs.”

Fast-forward to today. We have gigabytes (or terabytes!) of RAM, multi-layered virtualization, and strict security requirements… And /dev/mem is still here, quietly sitting in the kernel, practically unchanged… A fossil from a different era. It’s incredibly powerful, terrifyingly dangerous, and absolutely fascinating.

My work on /dev/mem is part of a bigger effort by the ELISA Architecture working group, whose mission is to improve Linux kernel documentation and testing. This project is a small pilot in a broader campaign: build tests for old, fundamental pieces of the kernel that everyone depends on but few dare to touch.

In a previous blog post, “When kernel comments get weird”, I dug into the /dev/mem source code and traced its history, uncovering quirky comments and code paths that date back decades. That post was about exploration. This one is about action: turning that historical understanding into concrete tests to verify that /dev/mem behaves correctly… Without crashing the very systems those tests run on.

What /dev/mem Is and Why It Matters

/dev/mem is a character device that exposes physical memory directly to userspace. Open it like a file, and you can read or write raw physical addresses: no page tables, no virtual memory abstractions, just the real thing.

Why is this powerful? Because it lets you:

  • Peek at firmware data structures,
  • Poke device registers directly,
  • Explore memory layouts normally hidden from userspace.

It’s like being handed the keys to the kingdom… and also a grenade, with the pin halfway pulled.

A single careless write to /dev/mem can:

  • Crash the kernel,
  • Corrupt hardware state,
  • Or make your computer behave like a very expensive paperweight.

For me, that danger is exactly why this project matters. Testing /dev/mem itself is tricky: the tests must prove the driver works, without accidentally nuking the machine they run on.

STRICT_DEVMEM and Real-Mode Legacy

One of the first landmines you encounter with /dev/mem is the kernel configuration option STRICT_DEVMEM.

Think of it as a global policy switch:

  • If disabled, /dev/mem lets privileged userspace access almost any physical address: kernel RAM, device registers, firmware areas, you name it.
  • If enabled, the kernel filters which physical ranges are accessible through /dev/mem. Typically, it only permits access to low legacy regions, like the first megabyte of memory where real-mode BIOS and firmware tables traditionally live, while blocking everything else.

Why does this matter? Some very old software, like emulators for DOS or BIOS tools, still expects to peek and poke those legacy addresses as if running on bare metal. STRICT_DEVMEM exists so those programs can still work, but without giving them carte blanche access to all memory.

So when you’re testing /dev/mem, the presence (or absence) of STRICT_DEVMEM completely changes what your test can do. With it disabled, /dev/mem is a wild west. With it enabled, only a small, carefully whitelisted subset of memory is exposed.

A Quick Note on Architecture Differences

While /dev/mem always exposes what the kernel considers physical memory, the definition of physical itself can differ across architectures. For example, on x86, physical addresses are the real hardware addresses. On aarch64 with virtualization or secure firmware, EL1 may only see a subset of memory through a translated view, controlled by EL2 or EL3.

In this sense, STRICT_DEVMEM is really a filter: it relies on architecture-specific rules to decide which physical address ranges userspace may legitimately reach through /dev/mem, which is why its exact effect differs from one architecture to another.

32-Bit Systems and the Mystery of High Memory

On most systems, the kernel needs a direct way to access physical memory. To make that fast, it keeps a linear mapping: a simple, one-to-one correspondence between physical addresses and a range of kernel virtual addresses. If the kernel wants to read physical address 0x00100000, it just uses a fixed offset, like PAGE_OFFSET + 0x00100000. Easy and efficient.
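On classic 32-bit x86, the conversion really is that simple. The sketch below is modeled on the kernel’s own __va()/__pa() macros; the PAGE_OFFSET value shown is the traditional 3 GB/1 GB split, and the real value is configuration-dependent.

/* Sketch of the linear-map conversion, modeled on the kernel's
 * __va()/__pa() macros for a classic 32-bit x86 layout. */
#define PAGE_OFFSET 0xC0000000UL
#define __va(phys)  ((void *)((unsigned long)(phys) + PAGE_OFFSET))
#define __pa(virt)  ((unsigned long)(virt) - PAGE_OFFSET)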

But there’s a catch on 32-bit kernels: The kernel’s entire virtual address space is only 4 GB, and it has to share that with userspace. By convention, 3 GB is given to userspace, and 1 GB is reserved for the kernel, which includes its linear mapping.

Now here comes the tricky part: Physical RAM can easily exceed 1 GB. The kernel can’t linearly map all of it: there just isn’t enough virtual address space.

The extra memory beyond the first gigabyte is called highmem (short for high memory). Unlike the low 1 GB, which is always mapped, highmem pages are mapped temporarily, on demand, whenever the kernel needs them.

Why this matters for /dev/mem: /dev/mem depends on the permanent linear mapping to expose physical addresses. Highmem pages aren’t permanently mapped, so /dev/mem simply cannot see them. If you try to read those addresses, you’ll get zeros or an error, not because /dev/mem is broken, but because that part of memory is literally invisible to it.

For testing, this introduces extra complexity:

  • Some reads may succeed on lowmem addresses but fail on highmem.
  • Behavior on a 32-bit machine with highmem is fundamentally different from a 64-bit system, where all RAM is flat-mapped and visible.

Highmem is a deep topic that deserves its own article, but even this quick overview is enough to understand why it complicates /dev/mem testing.

How Reads and Writes Actually Happen

A common misconception is that a single userspace read() or write() call maps to one atomic access to the underlying device. In reality, the VFS layer and the device driver may split your request into multiple chunks, depending on alignment and page boundaries.

Why does this happen?

  • Many devices can only handle fixed-size or aligned operations.
  • For physical memory, the natural unit is a page (commonly 4 KB).

When your request crosses a page boundary, the kernel internally slices it into:

  1. A first piece up to the page boundary,
  2. Several full pages,
  3. A trailing partial page.

For /dev/mem, this is a crucial detail: A single read or write might look seamless from userspace, but under the hood it’s actually several smaller operations, each with its own state. If the driver mishandles even one of them, you could see skipped bytes, duplicated data, or mysterious corruption.

Understanding this behavior is key to writing meaningful tests.

Safely Reading and Writing Physical Memory

At this point, we know what /dev/mem is and why it’s both powerful and terrifying. Now we’ll move to the practical side: how to interact with it safely, without accidentally corrupting your machine or testing in meaningless ways.

My very first test implementation kept things simple:

  • Only small reads or writes,
  • Always staying within a single physical page,
  • Never crossing dangerous boundaries.

Even with these restrictions, /dev/mem testing turned out to be more like defusing a bomb than flipping a switch.

Why “success” doesn’t mean success (in this very specific case)

Normally, when you call a syscall like read() or write(), you can safely assume the kernel did exactly what you asked. If read() returns a positive number, you trust that the data in your buffer matches the file’s contents. That’s the contract between userspace and the kernel, and it works beautifully in everyday programming.

But here’s the catch: We’re not just using /dev/mem; we’re testing whether /dev/mem itself works correctly.

This changes everything.

If my test reads from /dev/mem and fills a buffer with data, I can’t assume that data is correct:

  • Maybe the driver returned garbage,
  • Maybe it skipped a region or duplicated bytes,
  • Maybe it silently failed in the middle but still updated the counters.

The same goes for writes: A return code of “success” doesn’t guarantee the write went where it was supposed to, only that the driver finished running without errors.

So in this very specific context, “success” doesn’t mean success. I need independent ways to verify the result, because the thing I’m testing is the thing that would normally be trusted.

Finding safe places to test: /proc/iomem

Before even thinking about reading or writing physical memory, I need to answer one critical question:

“Which parts of physical memory are safe to touch?”

If I just pick a random address and start writing, I could:

  • Overwrite the kernel’s own code,
  • Corrupt a driver’s I/O-mapped memory,
  • Trash ACPI tables that the system kernel depends on,
  • Or bring the whole machine down in spectacular fashion.

This is where /proc/iomem comes to the rescue. It’s a text file that maps out how the physical address space is currently being used. Each line describes a range of physical addresses and what they’re assigned to.

Here’s a small example:

00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
  000a0000-000dffff : PCI Bus 0000:00
    000c0000-000ce5ff : Video ROM
  000f0000-000fffff : System ROM
00100000-09c3efff : System RAM
09c3f000-09ffffff : Reserved
0a000000-0a1fffff : System RAM
0a200000-0a20efff : ACPI Non-volatile Storage
0a20f000-0affffff : System RAM
0b000000-0b01ffff : Reserved
0b020000-b696efff : System RAM
b696f000-b696ffff : Reserved
b6970000-b88acfff : System RAM
b88ad000-b9ff0fff : Reserved
  b9fd0000-b9fd3fff : MSFT0101:00
    b9fd0000-b9fd3fff : MSFT0101:00
  b9fd4000-b9fd7fff : MSFT0101:00
    b9fd4000-b9fd7fff : MSFT0101:00
b9ff1000-ba150fff : ACPI Tables
ba151000-bbc0afff : ACPI Non-volatile Storage
bbc0b000-bcbfefff : Reserved
bcbff000-bdffffff : System RAM
be000000-bfffffff : Reserved

By parsing /proc/iomem, my test program can:

  1. Identify which physical regions are safe to work with (like RAM already allocated to my process),
  2. Avoid regions that are reserved for hardware or critical firmware,
  3. Adapt dynamically to different machines and configurations.
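As a minimal sketch (assuming root, since unprivileged readers see zeroed addresses on modern kernels), scanning for usable RAM ranges is just line parsing:

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/iomem", "r");
    char line[256];
    unsigned long long start, end;

    if (!f)
        return 1;

    while (fgets(line, sizeof(line), f)) {
        /* Top-level entries have no leading spaces. */
        if (line[0] == ' ')
            continue;
        if (sscanf(line, "%llx-%llx", &start, &end) == 2 &&
            strstr(line, "System RAM"))
            printf("RAM: %#llx-%#llx\n", start, end);
    }
    fclose(f);
    return 0;
}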

This is especially important for multi-architecture support. While examples here often look like x86 (because /dev/mem has a long history there), the concept of mapping I/O regions isn’t x86-specific. On ARM, RISC-V, or others, you’ll see different labels… But the principle remains exactly the same.

In short: /proc/iomem is your treasure map, and the first rule of treasure hunting is “don’t blow up the ship while digging for gold.”

The Problem of Contiguous Physical Pages

Up to this point, my work focused on single-page operations. I wasn’t hand-picking physical addresses or trying to be clever about where memory came from. Instead, the process was simple and safe:

  1. Allocate a buffer in userspace, using mmap() so it’s page-aligned,
  2. Touch the page to make sure the kernel really backs it with physical memory,
  3. Walk /proc/self/pagemap to trace which physical pages back the virtual address in the buffer.
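Step 3 is worth showing in code. A minimal sketch, assuming 4 KB pages and enough privilege to read PFNs (unprivileged readers get zeros on recent kernels):

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define PAGE_SIZE 4096ULL

static uint64_t virt_to_phys(const void *vaddr)
{
    uint64_t entry = 0;
    uint64_t vpn = (uint64_t)vaddr / PAGE_SIZE;
    int fd = open("/proc/self/pagemap", O_RDONLY);

    if (fd < 0)
        return 0;
    /* One 64-bit entry per virtual page. */
    if (pread(fd, &entry, sizeof(entry), vpn * 8) != sizeof(entry)) {
        close(fd);
        return 0;
    }
    close(fd);

    if (!(entry & (1ULL << 63)))      /* bit 63: page present   */
        return 0;
    /* Bits 0-54 hold the page frame number. */
    return (entry & ((1ULL << 55) - 1)) * PAGE_SIZE
           + (uint64_t)vaddr % PAGE_SIZE;
}

int main(void)
{
    int x = 42;                       /* touched, so it is backed */
    printf("phys = %#lx\n", (unsigned long)virt_to_phys(&x));
    return 0;
}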

This gives me full visibility into how my userspace memory maps to physical memory. Since the buffer was created through normal allocation, it’s mine to play with; there’s no risk of trampling over the kernel or other userspace processes.

This worked beautifully for basic tests:

  • Pick a single page in the buffer,
  • Run a tiny read/write cycle through /dev/mem,
  • Verify the result,
  • Nothing explodes.

But then came the next challenge: What if a read or write crosses a physical page boundary?

Why boundaries matter

The Linux VFS layer doesn’t treat a read or write syscall as one giant, indivisible action. Instead, it splits large operations into chunks, moving through pages one at a time.

For example:

  • I request 10 KB from /dev/mem,
  • The first 4 KB comes from physical page A,
  • The next 4 KB comes from physical page B,
  • The last 2 KB comes from physical page C.

If the driver mishandles the transition between pages, I’d never notice unless my test forces it to cross that boundary. It’s like testing a car by only driving in a straight line: Everything looks fine… Until you try to turn the wheel.

To properly test /dev/mem, I need a buffer backed by at least two physically contiguous pages. That way, a single read or write naturally crosses from one physical page into the next… exactly the kind of situation where subtle bugs might hide.

And that’s when the real nightmare began.

Why this is so difficult

At first, this seemed easy. I thought:

“How hard can it be? Just allocate a buffer big enough, like 128 KB, and somewhere inside it, there must be two contiguous physical pages.”

Ah, the sweet summer child optimism. The harsh truth: modern kernels actively work against this happening by accident. It’s not because the kernel hates me personally (though it sure felt like it). It’s because of its duty to prevent memory fragmentation.

When you call brk() or mmap(), the kernel:

  1. Uses a buddy allocator to manage blocks of physical pages,
  2. Actively spreads allocations apart to keep them tidy,
  3. Reserves contiguous ranges for things like hugepages or DMA.

From the kernel’s point of view:

  • This keeps the system stable,
  • Prevents large allocations from failing later,
  • And generally makes life good for everyone.

From my point of view? It’s like trying to find two matching socks in a dryer while it is drying them.

Playing the allocation lottery

My first approach was simple: keep trying until luck strikes.

  1. Allocate a 128 KB buffer,
  2. Walk /proc/self/pagemap to see where all pages landed physically,
  3. If no two contiguous pages are found, free it and try again.

Statistically, this should work eventually. In reality? After thousands of iterations, I’d still end up empty-handed. It felt like buying lottery tickets and never even winning a free one.

The kernel’s buddy allocator is very good at avoiding fragmentation. Two consecutive physical pages are far rarer than you’d think, and that’s by design.

Trying to confuse the allocator

Naturally, my next thought was:

“If the allocator is too clever, let’s mess with it!”

So I wrote a perturbation routine:

  • Allocate a pile of small blocks,
  • Touch them so they’re actually backed by physical pages,
  • Free them in random order to create “holes.”

The hope was to trick the allocator into giving me contiguous pages next time. The result? It sometimes worked, but unpredictably: around 4,000 attempts gave me better than 80% success. Not reliable enough for a test suite where failures must mean a broken driver, not a grumpy kernel allocator.

The options I didn’t want

There are sure-fire ways to get contiguous pages:

  • Writing a kernel module and calling alloc_pages().
  • Using hugepages.
  • Configuring CMA regions at boot.

But all of these require special setup or kernel cooperation. My goal was a pure userspace test, so they were off the table.

A new perspective: software MMU

Finally, I relaxed my original requirement. Instead of demanding two pages that are both physically and virtually contiguous, I only needed them to be physically contiguous somewhere in the buffer.

From there, I could build a tiny software MMU:

  • Find a contiguous physical pair using /proc/self/pagemap,
  • Expose them through a simple linear interface,
  • Run the test as if they were virtually contiguous.
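A minimal sketch of that linear interface, assuming the pagemap walk has already identified two virtual pages whose physical frames are adjacent:

#include <stdint.h>

#define PAGE_SIZE 4096ULL

struct soft_mmu {
    /* Virtual addresses of the two pages whose physical frames
     * are adjacent; they may be far apart in virtual space. */
    unsigned char *page[2];
};

/* Read len bytes starting at linear offset off (0 .. 2*PAGE_SIZE),
 * hiding the virtual discontinuity behind a flat interface. */
static void mmu_read(struct soft_mmu *m, uint64_t off,
                     void *dst, uint64_t len)
{
    unsigned char *out = dst;
    while (len--) {
        *out++ = m->page[off / PAGE_SIZE][off % PAGE_SIZE];
        off++;
    }
}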

This doesn’t eliminate the challenge, but it makes it practical. No kernel hacks, no special boot setup, just a bit of clever user-space logic.

From Theory to Test Code

All this theory eventually turned into a real test tool, because staring at /proc/self/pagemap is fun… but only for a while. The test lives here:

github.com/alessandrocarminati/devmem_test

It’s currently packaged as a Buildroot module, which makes it easy to run on different kernels and architectures without messing up your main system. The long-term goal is to integrate it into the kernel’s selftests framework, so these checks can run as part of the regular Linux testing pipeline. For now, it’s a standalone sandbox where you can:

  • Experiment with /dev/mem safely (on a test machine!),
  • Play with /proc/self/pagemap and see how virtual pages map to physical memory,
  • Try out the software MMU idea without needing kernel modifications.

And do expect it to still be a work in progress.

Thursday, September 11, 2025

When Kernel Comments Get Weird: The Tale of read_mem

When Kernel Comments Get Weird: The Tale of drivers/char/mem.c

As part of the Elisa community, we spend a good chunk of our time spelunking through the Linux kernel codebase. It’s like code archeology: you don’t always find treasure, but you do find lots of comments left behind by developers from the ’90s that make you go, “Wait… really?”

One of the ideas we’ve been chasing is to make kernel comments a bit smarter: not only human-readable, but also machine-readable. Imagine comments that could be turned into tests, so they’re always checked against reality. Less “code poetry from 1993”, more “living documentation”.

Speaking of code poetry, here’s one gem we stumbled across in mem.c:

/*
 * The memory devices use the full 32/64 bits of the offset,
 * and so we cannot check against negative addresses: they are ok.
 * The return value is weird, though, in that case (0).
 */

This beauty has been hanging around since Linux 0.99.14… back when Bill Clinton was in his first year in office, “Mosaic” was the hot new browser, and the PDP-11 was still being produced and sold.

Back then, it made sense and reflected exactly what the code did.

Fast-forward thirty years, and the comment still kind of applies… but mostly in obscure corners of the architecture zoo. On the CPUs people actually use every day?

$ cat lseek.asm
BITS 64

%define SYS_read  0
%define SYS_write 1
%define SYS_open  2
%define SYS_lseek 8
%define SYS_exit  60
; flags
%define O_RDONLY  0
%define SEEK_SET  0

section .data
path: db "/dev/mem",0

section .bss
align 8
buf: resq 1

section .text
global _start
_start:
    mov rax, SYS_open
    lea rdi, [rel path]
    xor esi, esi
    xor edx, edx
    syscall
    mov r12, rax              ; save fd in r12

    mov rax, SYS_lseek
    mov rdi, r12
    mov rsi, 0x8000000000000001
    xor edx, edx
    syscall
    mov [rel buf], rax

    mov rax, SYS_write
    mov edi, 1
    lea rsi, [rel buf]
    mov edx, 8
    syscall

    mov rax, SYS_exit
    xor edi, edi
    syscall
$ nasm -f elf64 lseek.asm -o lseek.o
$ ld lseek.o -o lseek
$ sudo ./lseek | hexdump -C
00000000  01 00 00 00 00 00 00 80                           |........|
00000008
$ # this is not what I expect, let's double check
$ sudo gdb ./lseek
GNU gdb (Fedora Linux) 16.3-1.fc42
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./lseek...
(No debugging symbols found in ./lseek)
(gdb) b _start
Breakpoint 1 at 0x4000b0
(gdb) r
Starting program: /tmp/lseek

Breakpoint 1, 0x00000000004000b0 in _start ()
(gdb) x/30i $pc
=> 0x4000b0 <_start>:      mov    $0x2,%eax
   0x4000b5 <_start+5>:    lea    0xf44(%rip),%rdi        # 0x401000
   0x4000bc <_start+12>:   xor    %esi,%esi
   0x4000be <_start+14>:   xor    %edx,%edx
   0x4000c0 <_start+16>:   syscall
   0x4000c2 <_start+18>:   mov    %rax,%r12
   0x4000c5 <_start+21>:   mov    $0x8,%eax
   0x4000ca <_start+26>:   mov    %r12,%rdi
   0x4000cd <_start+29>:   movabs $0x8000000000000001,%rsi
   0x4000d7 <_start+39>:   xor    %edx,%edx
   0x4000d9 <_start+41>:   syscall
   0x4000db <_start+43>:   mov    %rax,0xf2e(%rip)        # 0x401010
   0x4000e2 <_start+50>:   mov    $0x1,%eax
   0x4000e7 <_start+55>:   mov    $0x1,%edi
   0x4000ec <_start+60>:   lea    0xf1d(%rip),%rsi        # 0x401010
   0x4000f3 <_start+67>:   mov    $0x8,%edx
   0x4000f8 <_start+72>:   syscall
   0x4000fa <_start+74>:   mov    $0x3c,%eax
   0x4000ff <_start+79>:   xor    %edi,%edi
   0x400101 <_start+81>:   syscall
   0x400103:               add    %al,(%rax)
   0x400105:               add    %al,(%rax)
   0x400107:               add    %al,(%rax)
   0x400109:               add    %al,(%rax)
   0x40010b:               add    %al,(%rax)
   0x40010d:               add    %al,(%rax)
   0x40010f:               add    %al,(%rax)
   0x400111:               add    %al,(%rax)
   0x400113:               add    %al,(%rax)
   0x400115:               add    %al,(%rax)
(gdb) b *0x4000c2
Breakpoint 2 at 0x4000c2
(gdb) b *0x4000db
Breakpoint 3 at 0x4000db
(gdb) c
Continuing.

Breakpoint 2, 0x00000000004000c2 in _start ()
(gdb) i r
rax            0x3                 3
rbx            0x0                 0
rcx            0x4000c2            4194498
rdx            0x0                 0
rsi            0x0                 0
rdi            0x401000            4198400
rbp            0x0                 0x0
rsp            0x7fffffffe3a0      0x7fffffffe3a0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x246               582
r12            0x0                 0
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x4000c2            0x4000c2 <_start+18>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x0                 0
gs_base        0x0                 0
(gdb) # fd is just fine rax=3 as expected.
(gdb) c
Continuing.

Breakpoint 3, 0x00000000004000db in _start ()
(gdb) i r
rax            0x8000000000000001  -9223372036854775807
rbx            0x0                 0
rcx            0x4000db            4194523
rdx            0x0                 0
rsi            0x8000000000000001  -9223372036854775807
rdi            0x3                 3
rbp            0x0                 0x0
rsp            0x7fffffffe3a0      0x7fffffffe3a0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x246               582
r12            0x3                 3
r13            0x0                 0
r14            0x0                 0
r15            0x0                 0
rip            0x4000db            0x4000db <_start+43>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x0                 0
gs_base        0x0                 0
(gdb) # According to that comment, rax should have been 0, but it is not.
(gdb) c
Continuing.
[Inferior 1 (process 186746) exited normally]
(gdb)

Not so much. Seeking to 0x8000000000000001… returns 0x8000000000000001, not the 0 anticipated in the comment. We’re basically facing the kernel version of that “Under Construction” GIF on websites from the 90s: still there, but mostly just nostalgic decoration now.

The Mysterious Line in read_mem

Let’s zoom in on one particular bit of code in read_mem:

phys_addr_t p = *ppos;

/* ... other code ... */

if (p != *ppos)
        return 0;

At first glance, this looks like a no-op; why would p be different from *ppos when you just copied it? It’s like testing if gravity still works by dropping your phone… spoiler: it does.

But as usual with kernel code, the weirdness has a reason.

The Problem: Truncation on 32-bit Systems

Here’s what’s going on:

  • *ppos is a loff_t, which is a 64-bit signed integer.
  • p is a phys_addr_t, which holds a physical address.

On a 64-bit system, both are 64 bits wide. The assignment is clean, the comparison can never be true, and compilers just toss the check out.

But on a 32-bit system, phys_addr_t is only 32 bits. Assign a big 64-bit offset to it, and boom, the top half vanishes. Truncated, like your favorite TV series canceled after season 1.

That if (p != *ppos) check is the safety net. It spots when truncation happens and bails out early, instead of letting some unlucky app read from la-la land.

Assembly Time: 64-bit vs. 32-bit

On 64-bit builds (say, AArch64), the compiler optimizes away the check.

┌ 736: sym.read_mem (int64_t arg2, int64_t arg3, int64_t arg4);
│ `- args(x1, x2, x3) vars(13:sp[0x8..0x70])
│ 0x08000b10      1f2003d5       nop
│ 0x08000b14      1f2003d5       nop
│ 0x08000b18      3f2303d5       paciasp
│ 0x08000b1c      fd7bb9a9       stp x29, x30, [sp, -0x70]!
│ 0x08000b20      fd030091       mov x29, sp
│ 0x08000b24      f35301a9       stp x19, x20, [var_10h]
│ 0x08000b28      f40301aa       mov x20, x1
│ 0x08000b2c      f55b02a9       stp x21, x22, [var_20h]
│ 0x08000b30      f30302aa       mov x19, x2
│ 0x08000b34      750040f9       ldr x21, [x3]
│ 0x08000b38      e10302aa       mov x1, x2
│ 0x08000b3c      e33700f9       str x3, [var_68h]          ; phys_addr_t p = *ppos;
│ 0x08000b40      e00315aa       mov x0, x21
│ 0x08000b44      00000094       bl valid_phys_addr_range
│ ┌─< 0x08000b48  40150034       cbz w0, 0x8000df0          ; if (!valid_phys_addr_range(p, count))
│ │ 0x08000b4c    00000090       adrp x0, segment.ehdr
│ │ 0x08000b50    020082d2       mov x2, 0x1000
│ │ 0x08000b54    000040f9       ldr x0, [x0]
│ │ 0x08000b58    01988152       mov w1, 0xcc0
│ │ 0x08000b5c    f76303a9       stp x23, x24, [var_30h]
[...]

Nothing to see here, move along. But on 32-bit builds (like old-school i386), the check shows up loud and proud in the assembly.

┌ 392: sym.read_mem (int32_t arg_8h);
│ `- args(sp[0x4..0x4]) vars(5:sp[0x14..0x24])
│ 0x080003e0      55             push ebp
│ 0x080003e1      89e5           mov ebp, esp
│ 0x080003e3      57             push edi
│ 0x080003e4      56             push esi
│ 0x080003e5      53             push ebx
│ 0x080003e6      83ec14         sub esp, 0x14
│ 0x080003e9      8955f0         mov dword [var_10h], edx
│ 0x080003ec      8b5d08         mov ebx, dword [arg_8h]
│ 0x080003ef      c745ec0000..   mov dword [var_14h], 0
│ 0x080003f6      8b4304         mov eax, dword [ebx + 4]
│ 0x080003f9      8b33           mov esi, dword [ebx]       ; phys_addr_t p = *ppos;
│ 0x080003fb      85c0           test eax, eax
│ ┌─< 0x080003fd  7411           je 0x8000410               ; if (!valid_phys_addr_range(p, count))
│ ┌┌──> 0x080003ff 8b45ec        mov eax, dword [var_14h]
│ ╎╎│ 0x08000402   83c414        add esp, 0x14
│ ╎╎│ 0x08000405   5b            pop ebx
│ ╎╎│ 0x08000406   5e            pop esi
│ ╎╎│ 0x08000407   5f            pop edi
│ ╎╎│ 0x08000408   5d            pop ebp
│ ╎╎│ 0x08000409   c3            ret
[...]

The CPU literally does a compare-and-jump to enforce it. So yes, this is a real guard, not some leftover fluff.

Return Value Oddities

Now, here’s where things get even funnier. If the check fails in read_mem, the function returns 0. That’s “no bytes read”, which in file I/O land is totally fine.

But in the twin function write_mem, the same situation returns -EFAULT. That’s kernel-speak for “Nope, invalid address, stop poking me”.

So, reading from a bad address? You get a polite shrug. Writing to it? You get a slap on the wrist. Fair enough, writing garbage into memory is way more dangerous than failing to read it. At first glance this asymmetry looks like something we need to fix up… but do we really?

Why does read_mem return 0 instead of an error?

This behavior comes straight from Unix I/O tradition.

In user space, tools like dd expect a read() call to return 0 to mean “end of file”. They loop until that happens and then exit cleanly.

Returning an error code instead would break that pattern and confuse programs that treat /dev/mem like a regular file. In other words, read_mem is playing nice with existing utilities: 0 here doesn’t mean “nothing went wrong”, it means “nothing left to read.”

Wrapping It Up

This little dive shows how a single “weird” line of code carries decades of context, architecture quirks, type definitions, and evolving assumptions. It also shows why comments like the one from 0.99.14 are dangerous: they freeze a moment in time, but reality keeps moving.

Our mission in Elisa Architecture WG is to bring comments back to life: keep them up-to-date, tie them to tests, and make sure they still tell the truth. Because otherwise, thirty years later, we’re all squinting at a line saying “the return value is weird though” and wondering if the developer was talking about code… or just their day.

And now, a brief word from our sponsors (a.k.a. me in a different hat): When I’m not digging up ancient kernel comments with the Architecture WG, I’m also leading the Linux Features for Safety-Critical Systems (LFSCS) WG. We’re cooking up some pretty exciting stuff there too.

So if you enjoy the kind of archaeology/renovation work we’re doing there, come check out LFSCS as well: same Linux, different adventure.

Sunday, August 31, 2025

Confessions of a Nano User: Tabs, Spaces, and the Forbidden Love of OSC 52

“Hi, my name is Alessandro, and… I use nano.”

There. I said it. After years of quietly pressing Ctrl+X, answering “Yes” to save, and living in fear of the VIM and EMACS inquisitors, I’ve finally come out. While the big kids fight eternal holy wars over modal editing and Lisp extensibility, some of us took the small editor from GNU and just… got work done. Don’t judge.

But even within our tiny community of nano users, we are not free of pain. Our cross to bear is called… tabs.

The mystery: why my tabs turned into spaces

The story begins innocently: I opened a file in nano, full of perfectly fine tab characters. But then, the moment I dared to use my mouse to copy some text from the terminal window… BAM! My tabs were gone. Replaced by spaces.

It didn’t matter if I used KDE Konsole or GNOME Terminal, the effect was the same: mouse copy -> spaces. I was betrayed.

Meanwhile, if I ran cat file.txt and selected text with the mouse, the tabs survived. It was as if the gods of whitespace personally mocked me.

First hypothesis: nano must be guilty!

Naturally, my first instinct was to point fingers at nano itself. After all, nano has options like tabsize and tabstospaces. Maybe nano secretly converts tabs into spaces when rendering text? Maybe I’d been living a lie, editing “tabs” that were never really tabs?

I started investigating, even hex-dumping what nano sends to the terminal. I made a file containing only a tab and a blank. What I found was not 09 (tab) bytes at all, but ANSI escape sequences like:

09 20                                                       # what you'd expect for <TAB><SPACE>

vs

1b 5b 33 38 3b 35 3b 32 30 38 6d 20 20 20 20 20 20 20 20   # The <TAB>
1b 5b 33 39 6d 1b 28 42 1b 5b 6d 20                        # The <SPACE>

That, dear reader, is ncurses at work.

The real culprit: ncurses, the decorator

Nano is innocent, it loves tabs just as much as I do. The real problem is ncurses, the library nano uses to paint text on the screen.

ncurses doesn’t just pass \t straight to the terminal. Instead, it calculates how many spaces are needed to reach the next tab stop and paints that many literal spaces, usually wrapped in shiny SGR sequences (color codes).

So when your terminal emulator builds its screen buffer, all it sees are decorated blanks. And when you drag your mouse to copy… guess what you get? Spaces. Always spaces.

Meanwhile, cat writes a literal \t to the terminal, and some emulators preserve that information for mouse copy (VTE-based ones like GNOME Terminal used to, though mileage varies). That’s why cat behaves “correctly” and nano doesn’t.

So yes: the real villain in this love story is not nano, but ncurses… The overzealous decorator.

Escape plan: bypass the screen, go straight to clipboard

If the terminal screen can’t be trusted, we need another path. Enter: OSC 52.

OSC 52 is an ANSI escape sequence that lets a program say:

“Hey terminal, please put this base64-encoded text directly into the system clipboard.”

Example:

printf '\033]52;c;%s\a' "$(printf 'Hello\tWorld' | base64 -w0)"

Paste somewhere else -> boom, you get Hello<TAB>World.

This bypasses ncurses, bypasses the screen, bypasses mouse selection entirely. The text, tabs and all, travels straight into your clipboard.

Limitations: it’s not all sunshine and rainbows

  • Terminal support: Only terminals that implement OSC 52 can do this. xterm, iTerm2, Alacritty, recent Konsole are good. VTE-based terminals (GNOME Terminal, Tilix, etc.)… nope, they deliberately don’t support OSC 52 (for “security reasons”).
  • Buffer size: Many implementations cap OSC 52 payloads at ~100 KB. Big selections won’t copy entirely.
  • Security paranoia: Some distros disable it, since malicious programs could silently overwrite your clipboard. (But honestly, what’s worse: malware, or spaces where you wanted tabs?)

My dream: nano with native OSC 52 support

Right now, the only workarounds are… well, kind of clumsy:

  • Write the buffer (or a marked region) out to a pipe using ^O | osc52-copy.
  • Or just step outside nano entirely and run cat file | osc52-copy.

But there’s no way in nano today to say “when I press ^K, also shove this into the clipboard”. nano simply doesn’t have a hook for that.

That’s why my dream is to add a proper set osc52 option. With it enabled, nano would take whatever you cut (or marked) and send it straight to the terminal clipboard using OSC 52. Ideally, it would be optional: nobody wants to suddenly discover nano has hijacked their clipboard without asking. And then there is the multitude of clipboards Linux users get to play with: system, primary, application…

Epilogue

So here I stand, a proud but slightly broken nano user, with tabs that keep turning into spaces when I least expect it. I’ve learned the truth: it’s not nano’s fault, but ncurses. I’ve found salvation in OSC 52, though only if my terminal plays along.

And who knows, maybe one day there’ll be a tiny patch upstream, and nano will finally get to shout “COPY WITH TABS!” directly into our clipboards. Until then… I’ll keep refining my proposal to bring this osc52 goal closer.

The patch below still lacks the option to toggle the feature on and off, but for the time being it demonstrates at least one possible approach.

Stay tuned

diff --git a/src/cut.c b/src/cut.c
index a2d4aecf..c9f12d86 100644
--- a/src/cut.c
+++ b/src/cut.c
@@ -24,6 +24,7 @@
 
 #include <string.h>
 
+#define MAX_OSC52_BUF 655536
 /* Delete the character at the current position, and
  * add or update an undo item for the given action. */
 void expunge(undo_type action)
@@ -249,6 +250,54 @@ void chop_next_word(void)
 }
 #endif /* !NANO_TINY */
 
+void osc52(void) {
+	static const char b64[] =
+		"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+
+	size_t cap = MAX_OSC52_BUF;
+	unsigned char *buf = malloc(cap);
+	if (!buf) return;
+
+	linestruct *current = cutbuffer;
+	int pos=0;
+//	while (current->next != NULL) {
+	while (current != NULL) {
+		int l = strlen(current->data);
+		if (pos + l > MAX_OSC52_BUF) break; //osc52 has a recomanded length < 100k Feel appropriate to restrict to 64k
+		memcpy(buf + pos, current->data, l);
+		pos+=l;
+		if (current->next) *(buf + pos++)='\n';
+		current = current->next;
+	}
+
+	printf("\033]52;c;");
+
+	for (size_t i = 0; i < pos; i += 3) {
+		unsigned int octet_a = i < pos ? buf[i] : 0;
+		unsigned int octet_b = (i+1) < pos ? buf[i+1] : 0;
+		unsigned int octet_c = (i+2) < pos ? buf[i+2] : 0;
+
+		unsigned int triple = (octet_a << 16) | (octet_b << 8) | octet_c;
+
+		putchar(b64[(triple >> 18) & 0x3F]);
+		putchar(b64[(triple >> 12) & 0x3F]);
+		if ((i+1) < pos)
+			putchar(b64[(triple >> 6) & 0x3F]);
+		else
+			putchar('=');
+		if ((i+2) < pos)
+			putchar(b64[triple & 0x3F]);
+		else
+			putchar('=');
+	}
+
+	putchar('\a');
+	fflush(stdout);
+
+	free(buf);
+	return;
+}
+
 /* Excise the text between the given two points and add it to the cutbuffer. */
 void extract_segment(linestruct *top, size_t top_x, linestruct *bot, size_t bot_x)
 {
@@ -365,6 +414,7 @@ void extract_segment(linestruct *top, size_t top_x, linestruct *bot, size_t bot_
 	/* If the text doesn't end with a newline, and it should, add one. */
 	if (!ISSET(NO_NEWLINES) && openfile->filebot->data[0] != '\0')
 		new_magicline();
+	osc52();
 }
 
 /* Meld the buffer that starts at topline into the current file buffer

UPDATE1: On Sep 1, 2025, I sent a regular patch to the nano maintainers; let's see what happens.

UPDATE2: After I sent the patch, Benno (nano’s maintainer) gently reminded me that I hadn’t done my homework well. Turns out I wrongly accused poor ncurses of tab treachery, but the real culprit is none other than nano itself. Yes, my beloved editor is the true tab betrayer! (See src/winio.c:1872.) Still, love is blind: I don’t care, I’m sticking with nano. At the end of the day, nano or ncurses, it doesn’t really matter: if you try to select and copy text from nano, you won’t get your tabs, so I still had my reasons for submitting. And since my patch hasn’t been rejected yet, there’s still a chance the OSC52 feature can hit nano’s codebase.