Saturday, March 1, 2025

The Tale of the Stubborn Cipher: A Debugging Saga

I’ve been a Red Hat guy for a while now, but lurking in my lab was a traitor: an old Ubuntu 20.04 machine still doing all my heavy lifting, mostly building kernels. Why? Because back in the day, before I saw the light of Red Hat, I was under the false impression that Fedora wasn’t great for cross-compilation. Turns out, I was dead wrong.

Fast-forward to today, and I need to work on LoongArch. Guess what? Ubuntu didn’t have the cross-compilation bits I needed… but Fedora did. Finally, I had a valid excuse to invest time, nuke that relic and bring it into the Fedora family.

But of course, nothing is ever that simple. See, I had an old encrypted disk on that setup, storing past projects, ancient treasures, and probably a few embarrassing bash scripts. No worries! I’ll just re-run the cryptsetup command on Fedora, and boom, I’m in…

Right?

Oh, how naive I was.

The Beginning: A Mysterious Failure

It all started with a simple goal: to mount a partition encrypted with aes-cbc-essiv:sha256 on Fedora. The same setup had worked perfectly on the old machine for years, but now Fedora refused to cooperate.

The process seemed straightforward: cryptsetup creates the mapping, but...

$ sudo cryptsetup create --cipher aes-cbc-essiv --key-size 256 --hash sha256 pippo /dev/sdb1
Enter passphrase for /dev/sdb1:
device-mapper: reload ioctl on pippo (253:2) failed: Invalid argument
$

A classic error message, vague and infuriating. This called for serious debugging.

The First Hypothesis: A Kernel Mishap?

I’m a kernel developer, and you know what they say: when all you have is a hammer, everything looks like a kernel bug. So, naturally, my first suspicion landed straight on the Linux kernel. Because, let’s be honest, if something’s broken, it’s probably the kernel’s fault… right? ESSIV wasn’t appearing in /proc/crypto, which seemed to back up that suspicion: Fedora is known for enforcing modern cryptographic policies, and legacy algorithms are often disabled by default.

To investigate, I read the source of the essiv.ko module. It turns out that registering the template with crypto_register_template(&essiv_tmpl) does not immediately make anything appear in /proc/crypto. Instead, /proc/crypto reflects the state of crypto_alg_list, which only gains an entry once an algorithm built from the template is used for the first time.

So while I was staring at /proc/crypto, expecting to see ESSIV magically appear, I was actually just looking at a list of algorithms that had already been used, not the registered templates. The kernel wasn’t necessarily broken: just playing hard to get.
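The distinction can be checked on a live system by filtering /proc/crypto by name. The helper below is a minimal sketch: the function name crypto_instances and its optional file argument are hypothetical, added only for illustration. An empty result means that no matching instance has been created yet, not that the template is missing.

```shell
# List algorithm instances whose name contains a substring.
# Usage: crypto_instances <substring> [file]   (file defaults to /proc/crypto)
crypto_instances() {
    awk -F': *' -v pat="$1" '
        $1 ~ /^name[ \t]*$/ && index($2, pat) { print $2 }
    ' "${2:-/proc/crypto}"
}
```

Running crypto_instances essiv before and after setting up a mapping should show the difference: nothing at first, then an entry along the lines of essiv(cbc(aes),sha256) once dm-crypt has instantiated it.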

I needed to be sure the upstream kernel code I was looking at was exactly the same as the code running on my machine. Fedora typically does not modify the upstream code, but I needed confirmation. Rather than hunting down Fedora’s kernel source repository, I decided to compare the binary modules from Fedora and upstream Linux, but…

Ah, binary reproducibility… A dream everyone chases but few actually catch. The idea of building a kernel module and getting the exact same binary sounds simple, but in reality, it’s like trying to bake the same cake twice without measuring anything.

What can make binaries different? The obvious culprit is the code itself, but that’s just the start. Data embedded in the binary can also change things. Compiler versions and plugins play a role… If the same source code gets translated differently, you’ll end up with different binaries, no matter how pure your intentions. Then come the non-code factors. A kernel module is an ELF container, and ELF files carry metadata: timestamps, cryptographic signatures, and other bits that make your module unique (and sometimes annoying to compare). Even the flags that mark a module as Out-Of-Tree can introduce differences.

So, when doing a binary comparison, it’s not just a matter of checking if the bytes match... you have to strip out the noise and focus on the meaningful differences. Here’s what I did:

$ objcopy -O binary --only-section=.text essiv.us.ko essiv.us.bin
$ objcopy -O binary --only-section=.text essiv.fedora.ko essiv.fedora.bin
$ cmp essiv.us.bin essiv.fedora.bin

I was lucky: the two .text sections were bitwise identical. No funny business in the kernel. Time to look elsewhere.

The Second Hypothesis: A Cryptsetup Mismatch?

To check if Fedora’s cryptsetup was behaving differently, the same encryption command was run on both the old machine and Fedora:

sudo cryptsetup -v create pippo --cipher aes-cbc-essiv:sha256 --key-size 256 /dev/sdb1
  • On the old machine, this worked fine, and the partition mounted successfully.
  • On Fedora, it created the mapping but refused to mount.

The Real Culprit: The Command Line Argument Order

At this point, every possible difference between the Ubuntu and Fedora commands was scrutinized.

And then, the discovery:

cryptsetup create --cipher aes-cbc-essiv:sha256 --key-size 256 --hash sha256 pippo /dev/sdb1

vs.

cryptsetup create pippo --cipher aes-cbc-essiv:sha256 --key-size 256 --hash sha256 /dev/sdb1

The first one fails mysteriously (without any syntax error). The second one works without throwing any error.

$ sudo cryptsetup create --cipher aes-cbc-essiv --key-size 256 --hash sha256 pippo /dev/sdb1
Enter passphrase for /dev/sdb1:
device-mapper: reload ioctl on pippo (253:2) failed: Invalid argument

Why? Because cryptsetup’s argument parser behaves differently depending on argument order. The correct order is:

cryptsetup create <name> <options> <device>

When the name (pippo) is placed before the options, everything just works. But if options come first, something breaks silently.

The Final Barrier: Key Derivation Algorithm Mismatch

With the argument order fixed, I ran one final check: the command no longer failed, but the filesystem still would not mount. The visible crypto parameters all looked fine, and yet something was off.

$ sudo dmsetup table pippo

On Fedora, it returned:

0 3907026944 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 8:17 0

On the old machine, it returned:

0 3907026944 crypt aes-cbc-essiv:sha256 0000000000000000000000000000000000000000000000000000000000000000 0 8:97 0

Identical, apart from the device number! This meant that the same crypto algorithm was being used, and I was providing the same passphrase. So, in theory, everything should have been correct.

And yet… mounting still failed.

The table output only confirmed that the same encryption algorithm was in play; it didn’t prove that the same key was actually being used, since the key is derived from the passphrase, the hashing algorithm, and other parameters.
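One detail worth spelling out: the 64 zeros in the fifth field of the output above are a mask, not the key. By default dmsetup table hides key material, and its --showkeys option prints the real key. The sketch below compares keys between machines without putting the raw key on screen. The helper name key_fingerprint is hypothetical, and it assumes the table layout shown above, where the key is the fifth field of a crypt line.

```shell
# Read a dm-crypt table line on stdin and print a SHA-256 fingerprint
# of its key field (field 5), so only the fingerprint is displayed.
key_fingerprint() {
    awk '$3 == "crypt" { printf "%s", $5 }' | sha256sum | cut -d' ' -f1
}

# Run on both machines and compare the two fingerprints:
#   sudo dmsetup table --showkeys pippo | key_fingerprint
```

If the fingerprints differ while the cipher field matches, the passphrase-to-key step is the place to look.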

A final comparison of the cryptsetup debug logs revealed the culprit:

Even though both systems used the same cipher specification (aes-cbc-essiv:sha256), they used different passphrase-to-key derivation methods internally. Fedora’s version of cryptsetup was not deriving the same encryption key.
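To make the mismatch concrete: a plain-mode mapping has no on-disk header, so the key is simply a hash of the passphrase, and a different hash yields a different key. The sketch below mimics plain-mode hashing as it reads in the cryptsetup sources: hash the passphrase; if the digest is shorter than the key, hash again with one more leading "A" per round and concatenate; then truncate to the key size. Treat it as an illustration under those assumptions, not a reference implementation. The names hex_digest and plain_key are hypothetical, and ripemd160 goes through openssl, where it may be unavailable unless the legacy provider is enabled.

```shell
# hex_digest <algo>: hash stdin, print hex. Uses coreutils (<algo>sum) when
# present, otherwise falls back to openssl.
hex_digest() {
    if command -v "${1}sum" >/dev/null 2>&1; then
        "${1}sum" | cut -d' ' -f1
    else
        openssl dgst "-$1" -binary | od -An -v -tx1 | tr -d ' \n'
    fi
}

# plain_key <algo> <key-bytes> <passphrase>: print the derived key as hex.
plain_key() {
    algo=$1 want=$(( $2 * 2 )) pass=$3   # want = key length in hex digits
    key="" prefix=""
    while [ "${#key}" -lt "$want" ]; do
        digest=$(printf '%s%s' "$prefix" "$pass" | hex_digest "$algo")
        [ -n "$digest" ] || return 1     # hash not available on this system
        key=$key$digest
        prefix=${prefix}A                # round N is prefixed with N letters "A"
    done
    printf '%s\n' "$key" | cut -c1-"$want"
}

# Same passphrase, different hash, different key:
#   plain_key sha256    32 'my passphrase'
#   plain_key ripemd160 32 'my passphrase'
```

With a 256-bit key, sha256 needs a single round, while ripemd160 (160-bit digest) needs two, so the two keys share nothing even though the cipher string in the table is the same.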

The Fix: Explicitly Specifying the Hash Algorithm (RIPEMD-160) and Mode

The final working command had to ensure that Fedora derived the key exactly like the old machine:

$ sudo cryptsetup create pippo --cipher aes-cbc-essiv:sha256 --key-size 256 --hash ripemd160 --type plain /dev/sda1

And finally:

$ sudo mount /dev/mapper/pippo /mnt/0
$ ls /mnt/0

Success! The partition mounted perfectly.
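A mismatch like this can hide in a default that was never typed on the command line. The output of cryptsetup --help ends with the build’s compiled-in defaults, including the plain-mode cipher and password hash. The sketch below pulls out just that line. The name plain_defaults is hypothetical, and matching on a line that starts with "plain:" is an assumption about the help layout, which may differ between versions.

```shell
# Print the compiled-in plain-mode defaults from `cryptsetup --help` output on stdin.
plain_defaults() {
    sed -n 's/^[[:space:]]*plain:[[:space:]]*//p'
}

# Compare the result on both machines:
#   cryptsetup --help | plain_defaults
```

If the two machines report different password hashing for plain mode, passing --hash explicitly, as in the command above, removes the ambiguity.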

The Conclusion: Lessons Learned

  1. Look at things carefully before blaming the kernel.
  2. Cryptographic defaults change across cryptsetup versions: be explicit!
  3. The order of command-line arguments in cryptsetup matters.
  4. Comparing dmsetup table outputs is not enough.
  5. Key derivation methods can differ, and the difference is not obvious!

After all the deep dives into kernel modules, crypto policies, and hashing algorithms, the entire issue boiled down to two things:

  1. Wrong argument order in cryptsetup.
  2. Key derivation differences between cryptsetup versions.

A truly fitting end to a classic Linux troubleshooting adventure.
