How modern compilers optimize

So I wrote this C program and compiled it with the -O1, -O2 and -O3 flags on x86-64 gcc 6.3, just for fun (notice the unused function argument):

int square(int num) {
    int sum = 0;
    for(int i = 0; i < 10; i ++) {
        sum += i;
    }
    return sum;
}

With -O1 flag:

square(int):
        mov     eax, 0
.L2:
        add     eax, 1
        cmp     eax, 10
        jne     .L2
        mov     eax, 45
        ret
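
Read back into C, the -O1 listing looks roughly like the sketch below. This is my own hand translation, not compiler output, and the names square_o1 and check_square_o1 are made up for illustration. It shows that the sum has already been folded to 45 at -O1, but an empty counting loop is left behind: eax is used as the counter and then overwritten with the result.

```c
#include <assert.h>

/* Hand translation of the -O1 listing; square_o1 is a hypothetical name. */
int square_o1(int num) {
    (void)num;                 /* the argument is never read */
    int counter = 0;           /* mov eax, 0 */
    do {
        counter += 1;          /* add eax, 1 */
    } while (counter != 10);   /* cmp eax, 10 / jne .L2 */
    return 45;                 /* mov eax, 45 / ret */
}

/* Sanity check: the result does not depend on the argument. */
void check_square_o1(void) {
    assert(square_o1(0) == 45);
    assert(square_o1(123) == 45);
}
```

The loop does ten iterations of nothing, which is exactly the work that -O2 and -O3 remove.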

With -O2 flag:

square(int):
        mov     eax, 45
        ret

With -O3 flag:

square(int):
        mov     eax, 45
        ret

With Clang, any -O flag results in:

square(int):                             # @square(int)
        mov     eax, 45
        ret

It seems compilers are willing to evaluate side-effect-free code at compile time and emit only the value to be returned.
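
To make "side-effect-free" concrete, here is a small sketch in C. The function names and the volatile variable sink are my own, purely for illustration. The first function is the loop from above, the second is what the optimizer effectively turns it into, and the third adds a volatile store, which counts as an observable side effect, so I would expect a compiler to keep those stores even if it still works out the final value.

```c
#include <assert.h>

/* The loop from the post: constant bounds, no side effects. */
int sum_loop(void) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += i;
    }
    return sum;
}

/* What the optimized builds effectively reduce it to. */
int sum_folded(void) {
    return 45;
}

/* Hypothetical contrast: every write to a volatile object is observable. */
volatile int sink;

int sum_with_side_effect(void) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += i;
        sink = sum;   /* this store must actually happen, ten times */
    }
    return sum;
}

/* Both versions agree; 0 + 1 + ... + 9 is 45. */
void check_sums(void) {
    assert(sum_loop() == sum_folded());
    assert(sum_with_side_effect() == 45);
}
```

Comparing the assembly for sum_loop and sum_with_side_effect at -O2 is a quick way to see where the compiler draws the line.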

Update:
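icc was even weirder: instead of moving 0 into registers, it XORed each register with itself. That is the standard way to zero a register on x86, and it has a shorter encoding than a mov of an immediate 0, so I think it is at least as fast on the i7 processors I was compiling for.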

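rustc 1.9 emits slightly shittier code: it still sets up and tears down a frame pointer (push rbp / mov rbp, rsp / pop rbp) that this function never needs. The unused argument itself is not pushed anywhere. This happens despite rustc using the LLVM code generator. Note that the loop here starts at 1 rather than 0, which leaves the sum unchanged: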

pub fn square(num: i32) -> i32 {
  let mut sum:i32 = 0;
  for i in 1..10 {
    sum += i;
  }
  return sum;
}

With the -C opt-level=3 flag it emits:

example::square:
        push    rbp
        mov     rbp, rsp
        mov     eax, 45
        pop     rbp
        ret

The D-language compiler gdc 5.2.0 emits code that is as good as Clang's, plus a lot of module metadata. The good code is not surprising: gdc is built on the GCC back end (the LLVM-based D compiler is LDC), so it matches what gcc produced above.

int square(int num) {
  int sum = 0;
  for(int i =0; i < 10; i++) {
    sum += i;
  }
  return sum;
}

emits

int example.square(int):
        mov     eax, 45
        ret
void example.__modinit():
        mov     rax, QWORD PTR _Dmodule_ref[rip]
        mov     QWORD PTR _Dmodule_ref[rip], OFFSET FLAT:__mod_ref.3526
        mov     QWORD PTR __mod_ref.3526[rip], rax
        ret
__mod_ref.3526:
        .quad   0
        .quad   _D7example12__ModuleInfoZ
_D7example12__ModuleInfoZ:
        .long   4100
        .long   0
        .string "example"

x86 gccgo 4.9.1 at -O3 also emits optimized code, along with split-stack checks (the __morestack calls), lots of metadata and an init function for package main:

main.Square:
        cmp     rsp, QWORD PTR %fs:112
        jb      .L4
.L2:
        mov     eax, 45
        ret
.L4:
        xor     r10d, r10d
        xor     r11d, r11d
        call    __morestack
        ret
        jmp     .L2
__go_init_main:
        cmp     rsp, QWORD PTR %fs:112
        jb      .L7
        ret
.L7:
        xor     r10d, r10d
        xor     r11d, r11d
        call    __morestack
        ret
        ret
main.Square$descriptor:
        .quad   main.Square
