Henry S. Coelho

Comparing different compilation options

In this post, I will compare the code generated by the GCC compiler with different sets of compiler options, as well as with small changes to the source code.

Change 1

File size change: The file size went from 25 KB to 10 MB.

Change in headers: Running the "file" command against the generated files shows that the first one is "Dynamically linked", while the second one is "Statically linked".

Function call: In the second case, the <main> function calls a subroutine called _IO_printf, which is located inside the file. In the original file, the function called is printf@plt, a stub in the procedure linkage table that jumps to printf@GLIBC_2.2.5 in the shared C library, which is not inside the file.

Change 2

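Without the -fno-builtin option, the function called was not printf but puts. GCC treats printf as a built-in function, and since the string has no format specifiers and ends with a newline, it replaces the call with the simpler puts.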
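The condition can be sketched as a small C function. This is a hypothetical, simplified model (the name can_fold_to_puts is mine, not GCC's): as far as I know, GCC only folds a printf call with no extra arguments and an unused return value, and it has further cases this sketch ignores, such as turning a single-character string into putchar.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of when GCC can replace
 * printf(fmt) (no other arguments, return value unused) with
 * puts() on the same text minus the trailing newline. */
bool can_fold_to_puts(const char *fmt)
{
    assert(fmt != NULL);
    size_t len = strlen(fmt);

    /* A '%' may start a conversion specifier, so keep printf. */
    if (strchr(fmt, '%') != NULL)
        return false;

    /* puts() appends '\n' itself, so the text must end with one.
     * A lone "\n" is excluded: GCC handles that case with putchar. */
    return len >= 2 && fmt[len - 1] == '\n';
}
```

With the string from this program, can_fold_to_puts("Hello World!\n") returns true, matching the puts call seen without -fno-builtin; adding a %d to the string makes it return false.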

Change 3

Without the -g compiler option, the "file" command no longer reports "with debug_info" for the generated file. The file is also slightly smaller.

There is also a noticeable difference in the disassembly output: in the file compiled with -g, we can see the source lines of the program next to the assembly code.

000000000000068a <main>:
#include <stdio.h>

int main() {
 68a:    55                      push   %rbp
 68b:    48 89 e5                mov    %rsp,%rbp
    printf("Hello World!\n");
 68e:    48 8d 3d 9f 00 00 00    lea    0x9f(%rip),%rdi        # 734 <_IO_stdin_used+0x4>
 695:    b8 00 00 00 00          mov    $0x0,%eax
 69a:    e8 c1 fe ff ff          callq  560 <printf@plt>
 69f:    b8 00 00 00 00          mov    $0x0,%eax
}
 6a4:    5d                      pop    %rbp
 6a5:    c3                      retq   
 6a6:    66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 6ad:    00 00 00 

The file compiled without -g, on the other hand, only has the assembly code:

000000000000068a <main>:
 68a:    55                      push   %rbp
 68b:    48 89 e5                mov    %rsp,%rbp
 68e:    48 8d 3d 9f 00 00 00    lea    0x9f(%rip),%rdi        # 734 <_IO_stdin_used+0x4>
 695:    b8 00 00 00 00          mov    $0x0,%eax
 69a:    e8 c1 fe ff ff          callq  560 <printf@plt>
 69f:    b8 00 00 00 00          mov    $0x0,%eax
 6a4:    5d                      pop    %rbp
 6a5:    c3                      retq   
 6a6:    66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 6ad:    00 00 00 

Change 4

My CPU is an x86_64, so you can look up the registers for my machine in this link.

The first integer argument was placed in the esi register (the actual first argument, the pointer to the format string, goes in rdi); the second and third went into the edx and ecx registers; the fourth and fifth went into r8d and r9d; and the remaining arguments were pushed directly onto the stack with pushq. This behaviour confirms what is described in the page I linked:

First six arguments are in rdi, rsi, rdx, rcx, r8d, r9d; remaining arguments are on the stack.

 68e:    48 83 ec 08             sub    $0x8,%rsp
 692:    6a 00                   pushq  $0x0
 694:    6a 09                   pushq  $0x9
 696:    6a 08                   pushq  $0x8
 698:    6a 07                   pushq  $0x7
 69a:    6a 06                   pushq  $0x6
 69c:    41 b9 05 00 00 00       mov    $0x5,%r9d
 6a2:    41 b8 04 00 00 00       mov    $0x4,%r8d
 6a8:    b9 03 00 00 00          mov    $0x3,%ecx
 6ad:    ba 02 00 00 00          mov    $0x2,%edx
 6b2:    be 01 00 00 00          mov    $0x1,%esi
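The rule quoted above can be modelled as a small lookup. This is a sketch under simplifying assumptions: it only covers integer and pointer arguments (floating-point arguments use the %xmm registers instead), and the names arg_location and stack_arg_count are hypothetical. Argument 0 of this printf call is the pointer to the format string, which is why the first integer lands in %esi.

```c
#include <assert.h>
#include <stddef.h>

/* Where an integer or pointer argument is passed under the
 * System V AMD64 calling convention. Floating-point arguments,
 * which use %xmm0-%xmm7, are not modelled here. */
enum arg_loc { ARG_RDI, ARG_RSI, ARG_RDX, ARG_RCX, ARG_R8, ARG_R9, ARG_STACK };

/* Number of registers available for integer/pointer arguments. */
#define INT_ARG_REGS 6

/* Location of argument `index` (0-based). */
enum arg_loc arg_location(size_t index)
{
    return index < INT_ARG_REGS ? (enum arg_loc)index : ARG_STACK;
}

/* Printable name of a location, e.g. for building a table. */
const char *arg_location_name(enum arg_loc loc)
{
    static const char *const names[] = {
        "rdi", "rsi", "rdx", "rcx", "r8", "r9", "stack"
    };
    assert((unsigned)loc <= ARG_STACK);
    return names[loc];
}

/* How many of `n_args` arguments end up on the stack. */
size_t stack_arg_count(size_t n_args)
{
    return n_args > INT_ARG_REGS ? n_args - INT_ARG_REGS : 0;
}
```

Counting the format string and the ten integers in the listing gives eleven arguments, and stack_arg_count(11) returns 5, which matches the five pushq instructions.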

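After some research, I found out why rsp is also decreased by 8 before the pushes: the x86-64 System V ABI requires rsp to be a multiple of 16 when a function is called. Each pushq moves rsp by 8 bytes, so the 5 pushed values take 40 bytes, which is not a multiple of 16; the extra 8 bytes bring the total to 48.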
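The padding arithmetic can be written out as a short C function. This sketch assumes that rsp was already 16-byte aligned before the arguments were set up and that every pushq moves rsp by 8 bytes; the name alignment_padding is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of padding needed so that `pushed_bytes` of stack arguments
 * leave rsp a multiple of `alignment` (16 on x86-64 System V),
 * assuming rsp was already aligned before the arguments were pushed. */
size_t alignment_padding(size_t pushed_bytes, size_t alignment)
{
    /* The alignment must be a non-zero power of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    size_t remainder = pushed_bytes % alignment;
    return remainder == 0 ? 0 : alignment - remainder;
}
```

For the five pushes in the listing (5 × 8 = 40 bytes), alignment_padding(40, 16) returns 8, matching the sub $0x8,%rsp; with four pushes (32 bytes) no padding would be needed.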

Change 5

The changes in the disassembly are very simple. In the <main> section, instead of a call to the printf function, we have a call to the output function:

--- Original:

 68b:    48 89 e5                mov    %rsp,%rbp
  printf("Hello World! %d\n",
 68e:    be 01 00 00 00          mov    $0x1,%esi

--- Changed:

 6a3:    48 89 e5                mov    %rsp,%rbp
    output();
 6a6:    b8 00 00 00 00          mov    $0x0,%eax

Additionally, we have a subroutine called <output>:

000000000000068a <output>:
#include <stdio.h>

void output() {
 68a:    55                      push   %rbp
 68b:    48 89 e5                mov    %rsp,%rbp
    printf("Hello World!\n");
 68e:    48 8d 3d af 00 00 00    lea    0xaf(%rip),%rdi        # 744 <_IO_stdin_used+0x4>
 695:    b8 00 00 00 00          mov    $0x0,%eax
 69a:    e8 c1 fe ff ff          callq  560 <printf@plt>
}
 69f:    90                      nop
 6a0:    5d                      pop    %rbp
 6a1:    c3                      retq   

Change 6

The total size of the file is slightly larger, but according to the documentation, the -O3 option trades size for speed: we may end up with a larger file, but it should run faster.

One interesting change in the main function is that this line:

 695:    b8 00 00 00 00          mov    $0x0,%eax

is replaced with this one:

 58b:    31 c0                   xor    %eax,%eax

The XOR pattern is faster than MOV, and it is also shorter (2 bytes instead of 5, as the encodings above show). In this case, both instructions have the same result:

The first instruction clears register %eax (sets it to 0) by moving the number 0 into it. The second one XORs the register with itself, which also leaves the register cleared. This is why:

This is the truth table of an XOR gate:

A B Output
0 0 0
1 0 1
0 1 1
1 1 0

In other words: if the two input bits are the same, the output bit is 0; if they differ, it is 1.

Suppose the register holds the value 01001. If we XOR this value with itself, this is what we get:

01001
01001
-----
00000

This is a very fast way to clear a register.
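The truth table and the worked example can be checked in C. This only models the arithmetic, not the machine instruction, and the names xor_bit and clear_with_xor are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One-bit XOR, matching the truth table above:
 * the result is 1 only when the two inputs differ. */
unsigned xor_bit(unsigned a, unsigned b)
{
    assert(a <= 1 && b <= 1);   /* inputs must be single bits */
    return a ^ b;
}

/* Clears a value the way `xor %eax,%eax` clears a register:
 * every bit is XORed with itself, so every result bit is 0. */
uint32_t clear_with_xor(uint32_t x)
{
    return x ^ x;
}
```

clear_with_xor(0x9) (binary 01001, the value from the example) returns 0, as does any other input. A C compiler will usually fold x ^ x to the constant 0 on its own, which is why this is a model of the idea rather than a way to force the instruction.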