Tuesday, October 8, 2024

Symbol suffixes compilers use to confuse developers

Ever taken a peek at the symbol table of an object or executable? Or, if you're feeling particularly adventurous, have you snooped around the Linux kernel symbol table (kallsyms, for those on a first-name basis)? If so, you've probably been baffled by the bizarre names the compiler uses to mangle your precious symbols. Well, you’re not alone. I’ve spent some quality time collecting bits and pieces of information from various corners of the internet, carefully cutting and pasting (with great skill, I might add) to create this handy page where everything is nicely grouped. If you're anything like me, you’ll appreciate having all this info in one convenient place. Enjoy the madness!

Suffix / Description
.constprop.<n> Constant propagation. Indicates that the function has been optimized using constant propagation, where constant values were propagated through the code. The <n> is usually a version number and can increment if the function is optimized multiple times.
.isra.<n> Interprocedural Scalar Replacement of Aggregates (ISRA). This optimization involves breaking down aggregates (like structs or arrays) passed to functions into individual scalar values. It helps improve register usage and reduces memory accesses. The <n> indicates the version or stage of the transformation.
.clone.<n> Function cloning. When the compiler creates a cloned version of a function to optimize it for specific use cases (e.g., with certain constant arguments, different calling conventions, or to assist in function inlining), it adds the .clone.<n> suffix. This is useful for tailoring the function to certain conditions, such as a specific set of input parameters.
.part.<n> Function partitioning. The .part.<n> suffix appears when the compiler splits a large function into smaller parts. This often happens due to the function being too complex for certain optimizations or for inlining purposes. The <n> refers to the specific part number.
.cold Cold code. This suffix is added to functions or parts of functions that are considered "cold" by the compiler. Cold code refers to parts of the program that are rarely executed, such as error handling code or unlikely branches in conditionals. The compiler may optimize these functions for size rather than speed, or move them to separate sections of memory to improve cache performance for "hot" code (code that is frequently executed).
.hot Hot code. Similar to .cold, this suffix indicates "hot" code, which is frequently executed. The compiler might apply aggressive optimizations focused on improving execution speed, such as loop unrolling, function inlining, or improved branch prediction.
.likely/.unlikely Likely or unlikely branches. These suffixes indicate that the compiler has predicted whether a particular branch of code is likely or unlikely to be executed, usually based on profiling data or heuristics. The likely branch is optimized for speed, while the unlikely branch might be optimized for size or minimized in terms of performance impact.
.lto_priv.<hash> Link-Time Optimization (LTO) private function. This suffix appears during link-time optimization (LTO), where functions are optimized across translation units (source files). The .lto_priv.<hash> suffix is added to private (non-exported) functions that have been internalized and optimized during the LTO phase. The <hash> is typically a unique identifier.
.omp_fn.<n> OpenMP function. Functions generated as part of OpenMP parallelization are often given this suffix. It indicates that the function was created or modified to handle a parallel region of code as specified by OpenMP directives. The <n> is the version or index of the OpenMP function.
.split.<n> Split functions. This suffix appears when a large function is split into smaller pieces, either for optimization reasons or due to certain compiler strategies (like function outlining). The <n> indicates the part or section number of the split function.
.inline Inlined function. Functions marked with this suffix have been aggressively inlined by the compiler. Sometimes, a specialized inlined version of the function is created, while the original remains intact for non-inlined calls.
.to/.from Conversion functions. These suffixes are used when functions are involved in some sort of conversion process, such as casting or transforming data structures from one form to another. .to typically refers to converting to a certain form, and .from refers to converting from a form.
.gcda Profiling data generation (related to GCov). This suffix is associated with functions that produce profiling data used by GCov (GNU's code coverage tool). These functions track execution counts and other statistics to generate code coverage information.
.llvm.<hash> Local linkage promoted to external linkage. With ThinLTO, you might see mangled names carrying this suffix. It appears because functions inlined across translation units need their local (internal) references promoted to global visibility, and the promoted symbols are renamed with a unique hash.
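
If you want to see a couple of these suffixes appear in practice, the sketch below shows the kind of code that tends to trigger them when built with GCC at -O2 or higher. Everything about it (the function names, the exact suffixes, whether they show up at all) depends on the compiler version and flags, so treat it as an illustration rather than a guarantee:

#include <stdio.h>
#include <stdlib.h>

/* A static helper kept out of line on purpose. Because the only call site
   passes the constant 10 for 'factor', GCC may emit a specialized copy
   named something like scale.constprop.0 with that argument baked in. */
static __attribute__((noinline)) int scale(int x, int factor)
{
    return x * factor;
}

int main(int argc, char **argv)
{
    (void)argv;

    /* A branch hinted as unlikely: GCC may move its body out of the hot
       part of main and emit it as a separate main.cold fragment. */
    if (__builtin_expect(argc > 64, 0)) {
        fprintf(stderr, "fatal: too many arguments\n");
        exit(1);
    }

    return scale(argc, 10);  /* 'factor' is always the constant 10 */
}

After compiling with gcc -O2, listing the symbols with nm or readelf -s is the quickest way to check what your toolchain actually produced.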

Constant Propagation: Overview

Constant Propagation is an important optimization technique used by compilers to improve the performance of generated code. It involves analyzing the code to identify constant values that are known at compile-time and propagating these constants throughout the code. By substituting variables with their constant values, the compiler can simplify expressions and potentially remove unnecessary calculations, improving both the runtime performance and memory usage of the program.

How Constant Propagation Works:

  1. Identify Constants: During the compilation process, the compiler looks for variables that are assigned constant values. For example:

    int x = 5; 
    int y = 2 * x;
    Here, x is a constant because it is assigned a known value 5.
  2. Propagation: Once the compiler identifies a constant, it replaces the variable with its constant value wherever it appears in subsequent code. Continuing the above example:

    int y = 2 * 5;
  3. Simplification: After propagation, the compiler can further simplify expressions that involve constants:

    int y = 10;    
  4. Dead Code Elimination: Sometimes, constant propagation leads to opportunities for other optimizations, such as dead code elimination. For instance, after propagating constants, conditional branches that always evaluate to true or false can be simplified, allowing the compiler to remove unnecessary branches:

    if (5 < 10) { 
        // This block is always executed, so the condition can be removed. 
    }

Benefits of Constant Propagation:

  • Improved Performance: Constant propagation can eliminate runtime calculations, reducing the overall number of operations in the code. This leads to faster execution times.
  • Reduced Code Size: Simplifying expressions and removing redundant code can reduce the size of the compiled binary.
  • Better Memory Usage: By eliminating unnecessary variables and operations, constant propagation can reduce memory consumption.

Example of Constant Propagation:

Consider the following simple C program:

void example() {
    int a = 10;
    int b = a + 5;
    int c = b * 2;
    printf("%d\n", c);
}

Without constant propagation, this program might be compiled as-is, performing the operations b = a + 5 and c = b * 2 at runtime. However, with constant propagation, the compiler could optimize the program as follows:

void example() {
    int c = 30;
    printf("%d\n", c);
}

Constant Propagation vs. Constant Folding:

  • Constant Propagation focuses on replacing variables that hold constant values with those constants wherever possible in the code.
  • Constant Folding is another optimization technique that involves evaluating constant expressions at compile-time rather than runtime. For example:

    int x = 5 + 3;

    Here, constant folding would replace 5 + 3 with 8 at compile-time, eliminating the need for the addition operation at runtime.

Both techniques often work together, with constant propagation creating opportunities for constant folding and vice versa.
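
As a tiny illustration of that interplay (the function names here are made up for the example), propagating the constant value of a exposes an expression that folding can then evaluate at compile time, so the whole body collapses to a single constant:

/* Original source */
int width(void) {
    int a = 5;          /* constant assignment */
    int b = a + 3;      /* propagation rewrites this as 5 + 3 */
    return b * 2;       /* folding computes 5 + 3 = 8, then 8 * 2 = 16 */
}

/* What the optimizer is allowed to reduce it to */
int width_optimized(void) {
    return 16;
}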

Types of Constant Propagation:

  1. Intra-Procedural Constant Propagation: This type of propagation occurs within a single function or block of code. The compiler tracks constants within the boundaries of the function or block, but it does not propagate values across different functions.
  2. Inter-Procedural Constant Propagation: This is a more advanced form of propagation where the compiler tracks and propagates constants across function boundaries. It requires more complex analysis but can result in better optimization, especially in large programs with function calls.
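
Here is a small sketch of the difference, with invented function names. The first case can be simplified by looking at a single function in isolation; the second requires the compiler to reason across the call from inter() into clamp():

/* Intra-procedural: everything needed to simplify the branch is visible
   inside this one function, so 'mode' can be replaced by 1 and the
   else-branch discarded. */
int intra(void) {
    int mode = 1;
    if (mode == 1)
        return 100;
    else
        return 200;   /* dead once 'mode' is known to be 1 */
}

/* Inter-procedural: 'limit' is only known to be constant by examining the
   callers; since the sole call passes 8, the compiler may create a
   specialized copy of clamp() with limit fixed to 8. */
static int clamp(int v, int limit) {
    return v > limit ? limit : v;
}

int inter(int v) {
    return clamp(v, 8);
}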

What .constprop.0 Means

  • Suffix Meaning: The .constprop.0 suffix is added by the compiler (usually by GCC or Clang) to signify that the function has been subjected to constant propagation optimization. The number (0 in .constprop.0) is just a version indicator and can be incremented if the function undergoes further stages of optimization.
  • Constant Propagation at the Function Level: When a compiler identifies that certain arguments to a function are constants, it can create a specialized version of the function where those constants are hardcoded. This allows the function to be optimized more aggressively. The suffix is attached to the optimized function to distinguish it from the original unoptimized version. For example, consider the following function:

    int add(int x, int y) {
        return x + y;
    } 

    If, during optimization, the compiler finds that this function is frequently called with constant values, say add(3, 5), it might create a specialized version of the function where the constants are propagated:

    int add.constprop.0() {
        return 8;  // 3 + 5 has been precomputed
    }

    In the compiled code, this new function might be named something like add.constprop.0 to reflect that it has been optimized based on constant propagation.

When Does Constant Propagation Trigger This?

The compiler performs constant propagation across function boundaries when it can determine that certain function parameters are constant in all or some of the calls to that function. This optimization is often triggered in conjunction with inlining, constant folding, and function specialization. Here’s how it works:

  1. Function Inlining: When the compiler decides to inline a function (replace the call to the function with the actual function code), it can propagate constant arguments into the inlined function. This can lead to opportunities for further simplification. If the function isn’t fully inlined for all calls, the compiler might create a specialized version with constant propagation applied for those constant cases.
  2. Function Specialization: If a function is called multiple times with certain arguments that are constant, the compiler might generate a specialized version of the function for those constant values. The .constprop.0 function is such a specialized version where constant propagation and potentially other optimizations (like dead code elimination or loop unrolling) have been applied.
  3. Rewriting Calls: After creating the specialized version of the function, the compiler rewrites calls to the original function with constant arguments to point to the optimized .constprop.0 version. This way, the optimized version is used where possible, but the original version remains available for cases where the arguments aren’t constant.

Benefits of .constprop.0 Functions

The creation of these specialized functions with constant propagation offers several benefits:

  • Performance Gains: The compiler can optimize away redundant computations and simplify the function, leading to faster execution. For example, expressions that depend on constants might be precomputed, conditional branches might be eliminated, and loops might be unrolled.
  • Reduced Code Size: In some cases, specialized functions with constant propagation can actually reduce the code size, as the compiler might remove code paths that are no longer needed (for example, dead branches or unnecessary variable assignments).
  • Better Cache Usage: Specialized versions of functions can be smaller and more cache-friendly since they focus only on the specific case where certain inputs are constant.

Example in Practice

Consider this C code:

int compute(int a, int b) {
    return a * 2 + b;
}

int main() {
    return compute(4, 5);
}

Without optimization, the compute function would be called at runtime with the arguments 4 and 5. However, during constant propagation, the compiler detects that 4 and 5 are constants and creates a specialized version of compute:

int compute.constprop.0() {
    return 13;  // Precomputed: 4 * 2 + 5
}

The call in main() would be replaced by a direct call to compute.constprop.0(), and no runtime multiplication or addition would be required.

Why Does the Original Function Stay?

The original, non-specialized version of the function typically stays in the binary if there are calls to it with non-constant arguments or if it cannot be fully optimized in all cases. The .constprop.0 function is just an optimized variant for cases where constants are known, so the compiler keeps both versions to handle different calling scenarios.
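
A minimal sketch of that situation, with invented names: one call site passes only constants and can be redirected to a specialized copy, while the other passes runtime values and still needs the original:

static int blend(int a, int b) {
    return a * 3 + b;
}

int fixed_case(void) {
    /* Constant arguments: the compiler may route this call to a
       specialized variant such as blend.constprop.0 that just returns 17. */
    return blend(4, 5);
}

int dynamic_case(int x, int y) {
    /* Runtime arguments: this call still needs the original blend(),
       so both versions can coexist in the binary. */
    return blend(x, y);
}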

Possible Reasons for .inline Suffix Existence:

  1. Partial Inlining:
    • What Happens: Sometimes, the compiler may choose to inline a function only in certain places (e.g., hot paths where performance is critical) while retaining the original non-inlined version for other calls. This can happen when the function is small enough to be inlined in performance-critical paths but also used in non-critical paths or in situations where inlining might increase code size too much.
    • Result: In this case, an inlined version may be created, but the original function with an .inline suffix might still be retained for non-inlined calls. This allows the compiler to balance performance and code size. A sketch of this situation follows this list.
  2. Inlining Across Translation Units (LTO):
    • What Happens: During Link-Time Optimization (LTO), functions might be inlined across different translation units (source files). However, the function might still need to be retained in its original form for other purposes (such as if it’s part of a shared library or called from another compilation unit that was not optimized in the same way).
    • Result: A version of the function with the .inline suffix could be preserved as an internal symbol, ensuring that the compiler can still generate callable code if needed, while simultaneously allowing aggressive inlining across units.
  3. Multiple Optimization Levels:
    • What Happens: The compiler might generate different versions of a function to optimize for specific use cases. For instance, it could create an inlined version for certain contexts and a standalone version for others, especially if different parts of the code are compiled with different optimization flags or under different constraints (e.g., space vs. speed optimizations).
    • Result: The .inline suffix would then be used to distinguish the inlined version from the original, non-inlined function, even though the function is still present as a callable entity.
  4. Debugging and Profiling:
    • What Happens: Compilers sometimes retain inlined function symbols in the binary even though the code has been inlined, for the purpose of debugging and profiling. Tools like gdb or performance profilers may need to refer to the original function for accurate stack traces, debugging information, or code coverage data.
    • Result: The compiler might keep a symbol with the .inline suffix so that debugging information remains available, even if the function body no longer exists in its original form.
  5. Function Attributes:
    • What Happens: Certain function attributes or calling conventions may require that a function symbol still exists in the binary, even if the function has been inlined elsewhere. For instance, a function might be declared inline but also weak (meaning it can be overridden) or have other attributes that necessitate keeping a symbol for linking purposes.
    • Result: The compiler may generate both an inlined version and retain a separate version of the function marked with .inline, to fulfill these attributes or constraints.
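
As a rough illustration of the first reason above (names invented for the example), the helper below can be inlined on the hot path but must also be kept as a standalone function because its address is taken; whether the retained copy carries an .inline-style suffix, or any suffix at all, depends entirely on the compiler and the options used:

/* Small helper that is a good inlining candidate. */
static inline int clamp_to_byte(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

/* Hot path: this call is likely to be inlined away entirely. */
int brighten(int pixel, int delta)
{
    return clamp_to_byte(pixel + delta);
}

int apply(int pixel, int (*filter)(int))
{
    return filter(pixel);
}

/* The helper's address is taken here, so an out-of-line copy of
   clamp_to_byte must also be emitted for the indirect call to target. */
int brighten_indirect(int pixel)
{
    return apply(pixel, clamp_to_byte);
}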

Scenario 1: The Function Is Declared inline

When a function is explicitly declared as inline in the source code:

  • Expectation: The programmer indicates that they would prefer the function to be inlined to avoid the overhead of a function call. This, however, is a hint, not a guarantee. The compiler can still choose not to inline the function, especially if inlining it would increase code size excessively or if the function is too complex.
  • Linkage and Visibility: Typically, inline functions are defined in headers or in multiple translation units because they should be available to multiple parts of the program. If you declare a function as inline but it has external linkage, the function is visible across multiple translation units, and the linker might still need to ensure that only one definition is used. As a result, the function may still need a symbol in the binary.
  • Compilers can generate a symbol for such inline functions, especially if they are not inlined in all cases. The symbol might be suffixed with .inline if the compiler creates a specialized version after attempting partial inlining.
  • Why retain a symbol?: Even though the function is marked inline, the compiler might not inline it everywhere. It might still create a regular function for some call sites while inlining others. The symbol could remain to provide an externally accessible version in case the inlining isn’t performed universally.
  • In Public Libraries or Interfaces: Despite being marked inline, such functions might still need an externally visible symbol so that code in other translation units, or users of a public library interface, can link against them.

Interprocedural Scalar Replacement of Aggregates (ISRA): Overview

Interprocedural Scalar Replacement of Aggregates (ISRA) is a compiler optimization technique aimed at improving performance by breaking down large data structures (such as arrays, structs, or classes, collectively called aggregates) into their individual scalar components (like integers or floating-point values). This allows the compiler to perform more efficient optimizations on those individual parts rather than working with the entire structure as a whole. Interprocedural means that this optimization can take place across function boundaries, not just within a single function.

Let’s explore ISRA in detail:

Key Concepts in ISRA

  1. Aggregate Data Structures:
    • Aggregate types refer to complex data structures such as structs, arrays, or classes, which group together multiple individual variables into a single entity.
    • For example, in C, a struct might look like this:

      struct Point {
          int x;
          int y;
      };
    • The Point structure holds two integers, x and y, as part of one entity. Passing and manipulating this entire structure at once can be inefficient, especially when only some of its fields are used in a function.
  2. Scalar Replacement:
    • Scalar replacement is the process of breaking down an aggregate into its individual scalar components, such as integers, floats, or pointers. This allows the compiler to work with these smaller, more manageable parts instead of the entire structure.
    • For example, the compiler could split struct Point into two scalar variables, int x and int y, allowing it to perform optimizations on x and y independently.

How ISRA Works

In the context of interprocedural optimization, ISRA looks at the data being passed between functions (i.e., across function boundaries) and determines whether the entire aggregate needs to be passed, or if the individual fields of the aggregate can be passed as independent scalars. Consider this simple example:

struct Point {
    int x;
    int y;
};

int computeDistance(struct Point p) {
    return p.x * p.x + p.y * p.y;
}

Without ISRA, the computeDistance function would take a struct Point argument by value, which means that both x and y are passed as part of the struct Point object. This may involve unnecessary memory loads, stores, and passing the entire structure on the stack.

What Happens During ISRA

ISRA optimizes this process by performing the following steps:

  1. Function Argument Decomposition: Instead of passing the entire struct Point as a single argument to computeDistance, ISRA breaks it down into its components. This means that instead of passing the structure p, the compiler will generate a version of the function that takes two int arguments, x and y:

    int computeDistance(int x, int y) {
        return x * x + y * y;
    }

  2. Across Function Boundaries: The key part of ISRA is that it works interprocedurally, meaning it doesn’t just happen within one function but across function calls. If a function calls computeDistance, the compiler can transform both the calling function and computeDistance so that they pass and work on the individual scalar values (x and y), instead of the entire struct Point. For example:

    void process() {
        struct Point p = {3, 4};
        int d = computeDistance(p);
    }

    ISRA would convert this into:

    void process() {
        int x = 3;
        int y = 4;
        int d = computeDistance(x, y);
    }

  3. Improved Register Utilization: By breaking down aggregates into their scalar components, the compiler can store and manipulate those values directly in CPU registers, which are much faster than accessing memory. In the example above, the x and y values can be kept in registers instead of being stored and loaded from memory, reducing the overhead of memory access.
  4. Dead Code Elimination: If only part of the structure is used, ISRA can also help eliminate unused fields. For instance, if a function only needs p.x but not p.y, the compiler can avoid passing or loading p.y entirely. This further reduces unnecessary computation and memory access.

Example of ISRA Optimization

Before ISRA:

struct Point {
    int x;
    int y;
};

int computeDistance(struct Point p) {
    return p.x * p.x + p.y * p.y;
}

void process() {
    struct Point p = {3, 4};
    int d = computeDistance(p);
}

After ISRA:

int computeDistance(int x, int y) {
    return x * x + y * y;
}

void process() {
    int x = 3;
    int y = 4;
    int d = computeDistance(x, y);
}

Benefits of ISRA

  1. Reduced Memory Traffic:
    • Since scalar values (like integers and floats) can often be passed in registers, ISRA reduces the need to read from or write to memory when working with aggregate data. This leads to faster execution because memory access is generally slower than register access.
  2. Smaller Code Size:
    • By eliminating the need to pass entire aggregates (especially if they are large), the generated code becomes smaller and more efficient, as the overhead of copying entire data structures is avoided.
  3. Better Cache Usage:
    • ISRA reduces memory accesses, which improves cache performance. By avoiding unnecessary loads and stores of the entire structure, it minimizes cache pollution, which can result in better overall performance.
  4. Improved Optimizations:
    • Once aggregates are replaced by scalars, the compiler can apply additional optimizations, such as constant propagation, dead code elimination, and register allocation, to individual fields, which can result in more efficient code.

Challenges and Limitations of ISRA

  • Large Structures: For very large structures, ISRA may not always be beneficial because breaking them down into many scalar values can lead to high register pressure. This is especially true on architectures with limited registers, where using too many registers for scalar values can degrade performance.
  • ABI Constraints: Some Application Binary Interfaces (ABI) dictate how functions should pass arguments (whether in registers or on the stack). ISRA optimizations must respect these rules, which may limit the extent to which aggregate structures can be scalarized.
  • Complex Structures: ISRA is easier to apply to simple aggregates (like structs with only a few fields), but it can be more complex or impractical for deeply nested or very large structures, especially if pointers are involved.
