Introduction
In the world of Linux kernel development, one often encounters intriguing anomalies that spark curiosity and investigation. My journey into such peculiarities began with a previous deep dive into duplicate symbols within the Linux kernel. That exploration revealed how certain symbol names appear multiple times at different addresses. It was fun to discover that, among the many addresses sharing a name, there were also actual duplicates of the same function (same name and body), even though the majority of the symbols sharing a name were actually different objects.
Building on that foundation, my current investigation delves into another set of mysterious symbols: those that appear to be aliases for a given address in the kernel (multiple names for the same address), but whose origins are not immediately obvious. Their presence had significant consequences for my new effort.
I'm currently adding a new feature to ks-nav, a nifty tool that generates diagrams from the kernel binary image. The goal is to provide kernel analysts with valuable insights into the kernel code, because who doesn't love a good kernel investigation? The tool already produces call tree diagrams and visualizes subsystem interactions triggered by specific functions. My latest endeavor? To add functionality that reveals how global variables are used and shared among functions. The topic of this blog post springs from analyzing the output of this tool. Here's an image produced by investigating the global symbols shared starting from the function hugetlb_vma_lock_alloc.
The Problem of Macro Expansion and Symbol Aliasing
Unlike the previous investigation, where symbols were straightforward duplicates, the issue at hand involves a more complex phenomenon stemming from macro expansion. Macro expansion in the kernel can result in multiple symbols being generated with the same name, even though each of them is a different variable in memory. The same phenomenon can originate from other compiler transformations of the code, such as inlining, but whenever it happens the compiler must transform these names so that it can tell the objects apart. In practical terms, this just means that the compiler appends a number to the identifier to produce a new unique identifier. A simple example can clarify this:
    $ cat h.c
    #include <stdio.h>

    /* Each function owns a separate static variable named paperino. */
    int pippo(int i){
        static int paperino;
        if (i>=0) paperino=i;
        return paperino;
    }

    int pluto(int i){
        static int paperino;
        if (i>=0) paperino=i;
        return paperino;
    }

    int main(){
        printf("paperino= %d\n", pippo(55) );
        printf("paperino= %d\n", pippo(-1) );
        printf("paperino= %d\n", pluto(99) );
        printf("paperino= %d\n", pluto(-1) );
    }
    $ gcc -g h.c -o h
    $ ./h
    paperino= 55
    paperino= 55
    paperino= 99
    paperino= 99
    $ nm -n h
                     w __cxa_finalize@@GLIBC_2.2.5
                     w __gmon_start__
                     w _ITM_deregisterTMCloneTable
                     w _ITM_registerTMCloneTable
                     U __libc_start_main@@GLIBC_2.2.5
                     U printf@@GLIBC_2.2.5
    0000000000001000 t _init
    0000000000001060 T _start
    0000000000001090 t deregister_tm_clones
    00000000000010c0 t register_tm_clones
    0000000000001100 t __do_global_dtors_aux
    0000000000001140 t frame_dummy
    0000000000001149 T pippo
    000000000000116b T pluto
    000000000000118d T main
    0000000000001210 T __libc_csu_init
    0000000000001280 T __libc_csu_fini
    0000000000001288 T _fini
    0000000000002000 R _IO_stdin_used
    0000000000002014 r __GNU_EH_FRAME_HDR
    00000000000021ac r __FRAME_END__
    0000000000003db8 d __frame_dummy_init_array_entry
    0000000000003db8 d __init_array_start
    0000000000003dc0 d __do_global_dtors_aux_fini_array_entry
    0000000000003dc0 d __init_array_end
    0000000000003dc8 d _DYNAMIC
    0000000000003fb8 d _GLOBAL_OFFSET_TABLE_
    0000000000004000 D __data_start
    0000000000004000 W data_start
    0000000000004008 D __dso_handle
    0000000000004010 B __bss_start
    0000000000004010 b completed.8061
    0000000000004010 D _edata
    0000000000004010 D __TMC_END__
    0000000000004014 b paperino.2316
    0000000000004018 b paperino.2320
    0000000000004020 B _end
This example shows how the conflict generated by two variables sharing the same name, paperino, forced the compiler to differentiate them by appending a number. It is less well known, but static local variables defined in functions have static storage duration and are allocated just like global variables. Within each function's scope they do not conflict, but at the level of the compilation unit they do, and this is why the compiler mangles their names like that in the binary.
Back to the problem identified by the new ks-nav feature: in the diagram there are two global data symbols that are evidently mangled by the compiler, __key.11 and __already_done.1.
Let's start with the simpler one, just to get familiar with the phenomenon: the __already_done family of symbols.
The analysis showed that it comes from pr_warn_once.
pr_warn_once is a macro that ensures the warning message is printed only once. Each warning instance is tracked separately using a dedicated variable.
To illustrate how this works, let's track down how the pr_warn_once macro is expanded.
    #define pr_warn_once(fmt, ...) \
        printk_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)

    #define printk_once(fmt, ...) \
        DO_ONCE_LITE(printk, fmt, ##__VA_ARGS__)

    #define DO_ONCE_LITE(func, ...) \
        DO_ONCE_LITE_IF(true, func, ##__VA_ARGS__)

    #define DO_ONCE_LITE_IF(condition, func, ...) \
        ({ \
            bool __ret_do_once = !!(condition); \
            \
            if (__ONCE_LITE_IF(__ret_do_once)) \
                func(__VA_ARGS__); \
            \
            unlikely(__ret_do_once); \
        })

    #define __ONCE_LITE_IF(condition) \
        ({ \
            static bool __section(".data.once") __already_done; \
            bool __ret_cond = !!(condition); \
            bool __ret_once = false; \
            \
            if (unlikely(__ret_cond && !__already_done)) { \
                __already_done = true; \
                __ret_once = true; \
            } \
            unlikely(__ret_once); \
        })
The last expansion step finally shows where the symbol __already_done.1 comes from. It is easy to see that if more than one pr_warn_once is present in the same compilation unit, the compiler ends up with several __already_done instances that refer to different memory areas, hence it is forced to change these names.
This is how the __already_done.[0-9]+ symbol family is generated.
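The mechanism can be reproduced outside the kernel. What follows is a minimal userspace sketch, not the kernel's code: ONCE_LITE_IF, warn_once, first_site and second_site are names made up for this illustration, and the macros rely on GNU C statement expressions, so gcc or clang is assumed. Each expansion declares its own function-local static called already_done, so two call sites in one compilation unit yield two distinct objects with the same source-level name.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace imitation of the kernel's __ONCE_LITE_IF (illustrative only):
 * every expansion declares its own function-local static already_done. */
#define ONCE_LITE_IF(condition)                         \
	({                                              \
		static bool already_done;               \
		bool ret_once = false;                  \
		if ((condition) && !already_done) {     \
			already_done = true;            \
			ret_once = true;                \
		}                                       \
		ret_once;                               \
	})

/* Hypothetical helper: print msg only the first time this call site runs.
 * Evaluates to true when the message was actually printed. */
#define warn_once(msg)                                  \
	({                                              \
		bool fired = ONCE_LITE_IF(true);        \
		if (fired)                              \
			puts(msg);                      \
		fired;                                  \
	})

/* Two call sites in the same compilation unit: two separate statics. */
bool first_site(void)
{
	return warn_once("warning from the first site");
}

bool second_site(void)
{
	return warn_once("warning from the second site");
}
```

Calling first_site() twice prints its message only once, and second_site() still prints its own message afterwards because it has its own flag. Running nm on the resulting object should list two already_done entries with different numeric suffixes, although the exact suffixes depend on the compiler version.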
But if the compiler is so careful with names and addresses, how are the aliases I mentioned at the beginning even possible?
The Curious Case of __key Symbols
The __key family of symbols presents a different kind of anomaly.
These symbols are closely tied to the spin_lock_init macro and behave differently from the __already_done family.
The crux of the issue lies in how the compiler handles structures with no members in C.
In the context of the Linux kernel, when the lockdep feature is disabled, the lock_class_key structure becomes an empty struct (when lockdep is enabled it has members and a non-zero size).
This means that when the compiler allocates such a variable in the data or BSS section, it effectively allocates a zero-sized object. As a result, the next object allocated immediately afterward ends up sharing the same address as the zero-sized object. This is the cause of these alias-like symbols. They are not meant to be aliases, they just happen to be.
The __key symbols thus become aliases purely because lock_class_key is zero-sized when lockdep is disabled. This behavior is both unintended and inconsistent: enabling lockdep gives the __key symbols a non-zero size, thereby preventing them from aliasing with other symbols.
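The size difference is easy to observe in isolation. The sketch below uses stand-in types rather than the kernel's definitions: empty_key and full_key are hypothetical names, and the 16-byte payload simply mirrors the size readelf reports for a lockdep-enabled build. Empty structs are a GNU C extension, so gcc or clang is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for lock_class_key with lockdep disabled: no members at all.
 * Empty structs are a GNU C extension; gcc and clang give them size 0. */
struct empty_key { };

/* Stand-in for the lockdep-enabled case; the payload is illustrative. */
struct full_key { char payload[16]; };

static_assert(sizeof(struct empty_key) == 0,
	      "this sketch expects the GNU C empty-struct extension");

static struct empty_key key_a;
static struct empty_key key_b;
static int neighbour;

size_t empty_key_size(void) { return sizeof(struct empty_key); }
size_t full_key_size(void)  { return sizeof(struct full_key); }

/* Print the addresses. Whether they coincide is decided by the compiler
 * and the linker, so the output is something to inspect, not to rely on. */
void print_layout(void)
{
	printf("key_a     %p (size %zu)\n", (void *)&key_a, sizeof(key_a));
	printf("key_b     %p (size %zu)\n", (void *)&key_b, sizeof(key_b));
	printf("neighbour %p (size %zu)\n", (void *)&neighbour, sizeof(neighbour));
}
```

Calling print_layout() from a small main and comparing the result with nm -n on the binary is a quick way to check whether a given toolchain places the zero-sized objects at the same address as their neighbour.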
Here is an example of zero-sized __key objects, compared with the same objects when lockdep is enabled.
As it appears when lockdep is disabled:
    $ cat System.map| grep ffffffff83534360
    ffffffff83534360 b __key.11
    ffffffff83534360 b __key.12
    ffffffff83534360 b static_call_initialized
    $ readelf -Wa vmlinux |grep __key.1[12]
    11513: ffffffff83534360 0 OBJECT LOCAL DEFAULT 35 __key.12
    11514: ffffffff83534360 0 OBJECT LOCAL DEFAULT 35 __key.11
    19420: ffffffff83541710 0 OBJECT LOCAL DEFAULT 35 __key.12
    19421: ffffffff83541710 0 OBJECT LOCAL DEFAULT 35 __key.11
    45259: ffffffff835690b8 0 OBJECT LOCAL DEFAULT 35 __key.11
    47597: ffffffff83569b38 0 OBJECT LOCAL DEFAULT 35 __key.12
    47598: ffffffff83569b38 0 OBJECT LOCAL DEFAULT 35 __key.11
    51424: ffffffff8356dac0 0 OBJECT LOCAL DEFAULT 35 __key.12
readelf shows zero-sized objects, and the kernel's System.map shows the address collision between symbols.
As it appears when lockdep is enabled:
    $ readelf -Wa vmlinux |grep __key.1[12]
    6080: ffffffff837ae610 16 OBJECT LOCAL DEFAULT 35 __key.12
    6081: ffffffff837ae600 16 OBJECT LOCAL DEFAULT 35 __key.11
    8402: ffffffff842624d0 16 OBJECT LOCAL DEFAULT 35 __key.11
    8693: ffffffff842626b0 16 OBJECT LOCAL DEFAULT 35 __key.11
    8703: ffffffff842626c0 16 OBJECT LOCAL DEFAULT 35 __key.12
    8975: ffffffff84262790 16 OBJECT LOCAL DEFAULT 35 __key.12
    8976: ffffffff84262780 16 OBJECT LOCAL DEFAULT 35 __key.11
    10437: ffffffff84265030 16 OBJECT LOCAL DEFAULT 35 __key.11
    12666: ffffffff8426ba60 16 OBJECT LOCAL DEFAULT 35 __key.12
    12916: ffffffff8426bc20 16 OBJECT LOCAL DEFAULT 35 __key.12
    20464: ffffffff8427b900 16 OBJECT LOCAL DEFAULT 35 __key.11
    21593: ffffffff8427bb50 16 OBJECT LOCAL DEFAULT 35 __key.12
    21594: ffffffff8427bb40 16 OBJECT LOCAL DEFAULT 35 __key.11
    23931: ffffffff8427d240 16 OBJECT LOCAL DEFAULT 35 __key.12
    23933: ffffffff8427d230 16 OBJECT LOCAL DEFAULT 35 __key.11
    27527: ffffffff8428cf50 16 OBJECT LOCAL DEFAULT 35 __key.11
    27902: ffffffff8428d050 16 OBJECT LOCAL DEFAULT 35 __key.12
    27904: ffffffff8428d040 16 OBJECT LOCAL DEFAULT 35 __key.11
    28675: ffffffff8428e1b0 16 OBJECT LOCAL DEFAULT 35 __key.11
    32713: ffffffff842a0b10 16 OBJECT LOCAL DEFAULT 35 __key.12
    32714: ffffffff842a0b00 16 OBJECT LOCAL DEFAULT 35 __key.11
    33307: ffffffff842a2d10 16 OBJECT LOCAL DEFAULT 35 __key.11
    42165: ffffffff842adb60 16 OBJECT LOCAL DEFAULT 35 __key.12
    42167: ffffffff842adb50 16 OBJECT LOCAL DEFAULT 35 __key.11
    44247: ffffffff842ae950 16 OBJECT LOCAL DEFAULT 35 __key.11
    44865: ffffffff842aee00 16 OBJECT LOCAL DEFAULT 35 __key.12
    44887: ffffffff842aedf0 16 OBJECT LOCAL DEFAULT 35 __key.11
    45016: ffffffff842aeed0 16 OBJECT LOCAL DEFAULT 35 __key.12
    45017: ffffffff842aeec0 16 OBJECT LOCAL DEFAULT 35 __key.11
    48389: ffffffff842b0760 16 OBJECT LOCAL DEFAULT 35 __key.12
    48390: ffffffff842b0750 16 OBJECT LOCAL DEFAULT 35 __key.11
    49274: ffffffff842b1500 16 OBJECT LOCAL DEFAULT 35 __key.11
    51779: ffffffff842b2820 16 OBJECT LOCAL DEFAULT 35 __key.12
    51780: ffffffff842b2810 16 OBJECT LOCAL DEFAULT 35 __key.11
    52060: ffffffff842b2cb0 16 OBJECT LOCAL DEFAULT 35 __key.12
    52061: ffffffff842b2ca0 16 OBJECT LOCAL DEFAULT 35 __key.11
    55853: ffffffff842b95c0 16 OBJECT LOCAL DEFAULT 35 __key.12
    62007: ffffffff842cf910 16 OBJECT LOCAL DEFAULT 35 __key.12
    62009: ffffffff842cf900 16 OBJECT LOCAL DEFAULT 35 __key.11
    63425: ffffffff842d6580 16 OBJECT LOCAL DEFAULT 35 __key.12
    63426: ffffffff842d6570 16 OBJECT LOCAL DEFAULT 35 __key.11
    64498: ffffffff842d7230 16 OBJECT LOCAL DEFAULT 35 __key.12
    64499: ffffffff842d7220 16 OBJECT LOCAL DEFAULT 35 __key.11
    66813: ffffffff842d8710 16 OBJECT LOCAL DEFAULT 35 __key.12
    66814: ffffffff842d8700 16 OBJECT LOCAL DEFAULT 35 __key.11
    69350: ffffffff842d88c0 16 OBJECT LOCAL DEFAULT 35 __key.12
    69351: ffffffff842d88b0 16 OBJECT LOCAL DEFAULT 35 __key.11
    $ cat System.map| grep static_call_initialized
    ffffffff8426ba80 b static_call_initialized
    $ cat System.map| grep ffffffff8426ba80
    ffffffff8426ba80 b static_call_initialized
Because the lockdep structures are no longer zero-sized, the address conflict disappears.
Conclusion
The phenomena described above show how these lesser-known mechanisms induced a bug in the current implementation of the new ks-nav feature. It turns out ks-nav now needs a mechanism to detect zero-sized objects and exclude them from evaluation. There's still work to do, but at least now I know what to blame for the hiccup. Time to teach ks-nav a new trick!
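One possible shape for that trick, sketched in C under stated assumptions: struct sym and resolve_data_symbol are hypothetical names invented for this illustration and say nothing about how ks-nav stores symbols internally. The sample entries are modelled on the System.map excerpt shown earlier for the lockdep-disabled build, and the size given to static_call_initialized is an assumed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record; ks-nav's real data structures may differ. */
struct sym {
	const char *name;
	uint64_t addr;
	uint64_t size;
};

/* Return the first symbol at addr that actually occupies memory,
 * skipping zero-sized entries. Returns NULL when nothing qualifies. */
const struct sym *resolve_data_symbol(const struct sym *tab, size_t n,
				      uint64_t addr)
{
	assert(tab != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (tab[i].addr == addr && tab[i].size != 0)
			return &tab[i];
	}
	return NULL;
}

/* Entries modelled on the lockdep-disabled excerpt. The size of
 * static_call_initialized is an assumption made for illustration. */
const struct sym sample[] = {
	{ "__key.11",                0xffffffff83534360ULL, 0 },
	{ "__key.12",                0xffffffff83534360ULL, 0 },
	{ "static_call_initialized", 0xffffffff83534360ULL, 4 },
};
```

With this filter, a lookup of ffffffff83534360 resolves to static_call_initialized instead of one of the zero-sized __key entries. A real implementation would probably also want to decide what to report when every symbol at an address is zero-sized.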