Thursday, May 23, 2024

Investigate Obscure Kernel Symbols

Introduction

In the world of Linux kernel development, one often encounters intriguing anomalies that spark curiosity and investigation. My journey into exploring such peculiarities began with a previous deep dive into duplicate symbols within the Linux kernel. That exploration showed how certain symbol names appear multiple times at different addresses. It was fun to discover that, among the many addresses sharing a name, there were also actual duplicates of the same function (same name and body), even though the majority of those same-name symbols were actually different objects.

Building on that foundation, my current investigation delves into another set of mysterious symbols: those that appear to be aliases for a given address in the kernel (multiple names for the same address), but whose origins are not immediately obvious. Their presence had significant consequences for my new effort. I'm currently adding a new feature to ks-nav, a nifty tool that generates diagrams from the kernel binary image. The goal is to provide kernel analysts with valuable insights into the kernel code, because who doesn't love a good kernel investigation? The tool already produces call tree diagrams and visualizes subsystem interactions triggered by specific functions. My latest endeavor? To add functionality that reveals how global variables are used and shared among functions. The topic of this blog post springs from analyzing the output of this tool. Here's an image produced by investigating the global symbols shared starting from the function hugetlb_vma_lock_alloc.

The Problem of Macro Expansion and Symbol Aliasing

Unlike the previous investigation, where symbols were straightforward duplicates, the issue at hand involves a more complex phenomenon stemming from macro expansion. Macro expansion in the kernel can result in multiple symbols being generated with the same name, even though each of them is a different variable in memory. The same phenomenon can originate from other code transformations performed by the compiler, such as inlining. Whenever it happens, the compiler must transform the names so that it can tell these same-name symbols apart. In practical terms, the compiler appends a number to the identifier to produce a new unique identifier. A simple example can clarify this:
$ cat h.c
#include <stdio.h>

int pippo(int i){
        static int paperino;
        if (i>=0) paperino=i;
        return paperino;
}
int pluto(int i){
        static int paperino;
        if (i>=0) paperino=i;
        return paperino;
}

int main(){
        printf("paperino= %d\n", pippo(55) );
        printf("paperino= %d\n", pippo(-1) );
        printf("paperino= %d\n", pluto(99) );
        printf("paperino= %d\n", pluto(-1) );
}
$ gcc -g h.c -o h
$ ./h
paperino= 55
paperino= 55
paperino= 99
paperino= 99
$ nm -n h
                 w __cxa_finalize@@GLIBC_2.2.5
                 w __gmon_start__
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
                 U __libc_start_main@@GLIBC_2.2.5
                 U printf@@GLIBC_2.2.5
0000000000001000 t _init
0000000000001060 T _start
0000000000001090 t deregister_tm_clones
00000000000010c0 t register_tm_clones
0000000000001100 t __do_global_dtors_aux
0000000000001140 t frame_dummy
0000000000001149 T pippo
000000000000116b T pluto
000000000000118d T main
0000000000001210 T __libc_csu_init
0000000000001280 T __libc_csu_fini
0000000000001288 T _fini
0000000000002000 R _IO_stdin_used
0000000000002014 r __GNU_EH_FRAME_HDR
00000000000021ac r __FRAME_END__
0000000000003db8 d __frame_dummy_init_array_entry
0000000000003db8 d __init_array_start
0000000000003dc0 d __do_global_dtors_aux_fini_array_entry
0000000000003dc0 d __init_array_end
0000000000003dc8 d _DYNAMIC
0000000000003fb8 d _GLOBAL_OFFSET_TABLE_
0000000000004000 D __data_start
0000000000004000 W data_start
0000000000004008 D __dso_handle
0000000000004010 B __bss_start
0000000000004010 b completed.8061
0000000000004010 D _edata
0000000000004010 D __TMC_END__
0000000000004014 b paperino.2316
0000000000004018 b paperino.2320
0000000000004020 B _end
$

This example shows how the clash between two variables with the same name, paperino, forced the compiler to differentiate them by appending a number. It is less well known, but static local variables defined in functions are, in terms of storage, global variables: they live in the same data or BSS sections. Within each function's scope they do not generate any conflict, but in the compilation unit's symbol namespace they do, and this is why the compiler mangles the names like that in the binary.

Back to the problem identified by the new ks-nav feature: in the diagram there are two global data symbols that are evidently mangled by the compiler, __key.11 and __already_done.1. Let's start with the simpler one, just to get familiar with the phenomenon: the __already_done family of symbols. The analysis showed that it comes from pr_warn_once, a macro that ensures the warning message is printed only once. Each warning call site is tracked separately using a dedicated variable. To illustrate how this works, let's track down how the pr_warn_once macro is expanded.

step 1

  #define pr_warn_once(fmt, ...)                                  \
        printk_once(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
  

step 2

  #define printk_once(fmt, ...)                                   \
        DO_ONCE_LITE(printk, fmt, ##__VA_ARGS__)
  

step 3

  #define DO_ONCE_LITE(func, ...)                                         \
        DO_ONCE_LITE_IF(true, func, ##__VA_ARGS__)
  

step 4

  #define DO_ONCE_LITE_IF(condition, func, ...)                           \
        ({                                                              \
                bool __ret_do_once = !!(condition);                     \
                                                                        \
                if (__ONCE_LITE_IF(__ret_do_once))                      \
                        func(__VA_ARGS__);                              \
                                                                        \
                unlikely(__ret_do_once);                                \
        })
  

step 5

  #define __ONCE_LITE_IF(condition)                                       \
        ({                                                              \
                static bool __section(".data.once") __already_done;     \
                bool __ret_cond = !!(condition);                        \
                bool __ret_once = false;                                \
                                                                        \
                if (unlikely(__ret_cond && !__already_done)) {          \
                        __already_done = true;                          \
                        __ret_once = true;                              \
                }                                                       \
                unlikely(__ret_once);                                   \
        })
  

The last expansion step finally shows where the symbol __already_done.1 comes from. If more than one pr_warn_once is present in the same compilation unit, the compiler ends up with several __already_done instances, each referring to a different memory location, hence it is forced to change their names. This is how the __already_done.[0-9]+ symbol family is generated.

But if the compiler is so careful with names and addresses, how are the aliases I mentioned at the beginning even possible?

The Curious Case of __key Symbols

The __key family of symbols presents a different kind of anomaly. These symbols are closely tied to spin_lock_init and exhibit unique behavior compared to the __already_done family. The crux of the issue lies in how the compiler handles structures with no members in C. In the Linux kernel, when the lockdep feature is disabled, the lock_class_key structure becomes an empty struct (it only has members when lockdep is enabled). This means that when the compiler allocates such a variable in the data or BSS sections, it effectively allocates a zero-sized object. As a result, the next object allocated immediately afterward ends up sharing the same address as the zero-sized object. This is the cause of these alias-like symbols. They are not meant to be aliases; they just happen to be.

The __key symbols thus become aliases purely because lock_class_key is zero-sized when lockdep is disabled. This behavior is an unintended side effect and depends on the configuration: enabling lockdep gives the __key symbols a non-zero size, which prevents them from aliasing with other symbols.

Here is an example of zero-sized __key objects, compared with the same objects when lockdep is enabled:

As it appears when lockdep is disabled:

$ cat System.map| grep  ffffffff83534360
ffffffff83534360 b __key.11
ffffffff83534360 b __key.12
ffffffff83534360 b static_call_initialized
$ readelf -Wa vmlinux |grep __key.1[12]
 11513: ffffffff83534360     0 OBJECT  LOCAL  DEFAULT   35 __key.12
 11514: ffffffff83534360     0 OBJECT  LOCAL  DEFAULT   35 __key.11
 19420: ffffffff83541710     0 OBJECT  LOCAL  DEFAULT   35 __key.12
 19421: ffffffff83541710     0 OBJECT  LOCAL  DEFAULT   35 __key.11
 45259: ffffffff835690b8     0 OBJECT  LOCAL  DEFAULT   35 __key.11
 47597: ffffffff83569b38     0 OBJECT  LOCAL  DEFAULT   35 __key.12
 47598: ffffffff83569b38     0 OBJECT  LOCAL  DEFAULT   35 __key.11
 51424: ffffffff8356dac0     0 OBJECT  LOCAL  DEFAULT   35 __key.12
  

readelf shows zero-sized objects, and the kernel's System.map shows the collision between symbols.

As it appears when lockdep is enabled:

$ readelf -Wa vmlinux |grep __key.1[12]
  6080: ffffffff837ae610    16 OBJECT  LOCAL  DEFAULT   35 __key.12
  6081: ffffffff837ae600    16 OBJECT  LOCAL  DEFAULT   35 __key.11
  8402: ffffffff842624d0    16 OBJECT  LOCAL  DEFAULT   35 __key.11
  8693: ffffffff842626b0    16 OBJECT  LOCAL  DEFAULT   35 __key.11
  8703: ffffffff842626c0    16 OBJECT  LOCAL  DEFAULT   35 __key.12
  8975: ffffffff84262790    16 OBJECT  LOCAL  DEFAULT   35 __key.12
  8976: ffffffff84262780    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 10437: ffffffff84265030    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 12666: ffffffff8426ba60    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 12916: ffffffff8426bc20    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 20464: ffffffff8427b900    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 21593: ffffffff8427bb50    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 21594: ffffffff8427bb40    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 23931: ffffffff8427d240    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 23933: ffffffff8427d230    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 27527: ffffffff8428cf50    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 27902: ffffffff8428d050    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 27904: ffffffff8428d040    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 28675: ffffffff8428e1b0    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 32713: ffffffff842a0b10    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 32714: ffffffff842a0b00    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 33307: ffffffff842a2d10    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 42165: ffffffff842adb60    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 42167: ffffffff842adb50    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 44247: ffffffff842ae950    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 44865: ffffffff842aee00    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 44887: ffffffff842aedf0    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 45016: ffffffff842aeed0    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 45017: ffffffff842aeec0    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 48389: ffffffff842b0760    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 48390: ffffffff842b0750    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 49274: ffffffff842b1500    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 51779: ffffffff842b2820    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 51780: ffffffff842b2810    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 52060: ffffffff842b2cb0    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 52061: ffffffff842b2ca0    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 55853: ffffffff842b95c0    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 62007: ffffffff842cf910    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 62009: ffffffff842cf900    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 63425: ffffffff842d6580    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 63426: ffffffff842d6570    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 64498: ffffffff842d7230    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 64499: ffffffff842d7220    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 66813: ffffffff842d8710    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 66814: ffffffff842d8700    16 OBJECT  LOCAL  DEFAULT   35 __key.11
 69350: ffffffff842d88c0    16 OBJECT  LOCAL  DEFAULT   35 __key.12
 69351: ffffffff842d88b0    16 OBJECT  LOCAL  DEFAULT   35 __key.11

$ cat System.map| grep  static_call_initialized
ffffffff8426ba80 b static_call_initialized
$ cat System.map| grep  ffffffff8426ba80
ffffffff8426ba80 b static_call_initialized
  

As a consequence of the lockdep structures no longer being zero-sized, the address conflict has disappeared.

Conclusion

The phenomena described above highlight how these lesser-known mechanisms induced a bug in the current implementation of the new ks-nav feature. It turns out ks-nav now needs a mechanism to detect zero-sized objects and exclude them from evaluation. There's still work to do, but at least now I know what to blame for the hiccup. Time to teach ks-nav a new trick!
