Why does LLVM allocate a redundant variable?

Why does this matter — what's the actual problem?

I think the deeper answer you're looking for might be: LLVM's architecture is based around fairly simple frontends and many passes. The frontends have to generate correct code, but it doesn't have to be good code. They can do the simplest thing that works.

In this case, Clang generates a couple of instructions that turn out not to be used for anything. That's generally not a problem, because some part of LLVM will get rid of superfluous instructions. Clang trusts that to happen. Clang doesn't need to avoid emitting dead code; its implementation may focus on correctness, simplicity, testability, etc.

Because Clang is done with syntax analysis but LLVM hasn't even started with optimization.

The Clang front end has generated IR (Intermediate Representation) and not machine code. Those variables are SSAs (Single Static Assignments); they haven't been bound to registers yet and actually after optimization, never will be because they are redundant.

That code is a somewhat literal representation of the source. It is what clang hands to LLVM for optimization. Basically, LLVM starts with that and optimizes from there. Indeed, for version 10 and x86_64, llc -O2 will eventually generate:

main: # @main
  xor eax, eax
  ret

This %1 register was generated by clang to handle multiple return statements in a function. Imagine you were writing a function to compute an integer's factorial. Instead of this

int factorial(int n){
    int result;
    if(n < 2)
      result = 1;
    else{
      result = n * factorial(n-1);
    }
    return result;
}

You'd probably do this

int factorial(int n){
    if(n < 2)
      return 1;
    return n * factorial(n-1);
}

Why? Because Clang will insert that result variable that holds the return value for you. Yay. That's the reason for that %1 variable. Look at the ir for a slightly modified version of your code.

Modified code,

enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    if(d) return 1;
    return 0;
}

IR,

define dso_local i32 @main() #0 !dbg !15 {
    %1 = alloca i32, align 4
    %2 = alloca i32, align 4
    store i32 0, i32* %1, align 4
    store i32 2, i32* %2, align 4, !dbg !22
    %3 = load i32, i32* %2, align 4, !dbg !23
    %4 = icmp ne i32 %3, 0, !dbg !23
    br i1 %4, label %5, label %6, !dbg !25

 5:                                                ; preds = %0
   store i32 1, i32* %1, align 4, !dbg !26
   br label %7, !dbg !26

 6:                                                ; preds = %0
  store i32 0, i32* %1, align 4, !dbg !27
  br label %7, !dbg !27

 7:                                                ; preds = %6, %5
  %8 = load i32, i32* %1, align 4, !dbg !28
  ret i32 %8, !dbg !28
}

Now you see %1 making itself useful huh? Most functions with a single return statement will have this variable stripped by one of llvm's passes.

Why does LLVM allocate a redundant variable?

Tags:

C

Llvm

Llvm Codegen

Related

Recent Posts