Henry S. Coelho

SPO600 Lab 2 - Comparing different compilation options

This lab compares the code that the GCC compiler generates with different sets of compiler options, as well as with small changes to the source code.

The requirements for this lab, as well as the differences between the files and compilations, can be found here.

Change 1

File size change: the file size went from 25 KB to 10 MB.

Change in headers: running the "file" command against the generated files shows that the first one is "Dynamically linked", while the second one is "Statically linked".

Function call: in the second case, the <main> function calls a subroutine called _IO_printf, which is located inside the file. In the original file, the function called is printf@plt, a stub in the procedure linkage table that jumps to printf@GLIBC_2.2.5. That subroutine is not inside the file; the dynamic linker resolves it from the shared C library at run time.
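For reference, here is a minimal sketch of the kind of program being compiled, reconstructed from the source lines that objdump shows in Change 3. The build commands in the comment are assumptions. The base flags (-g -O0 -fno-builtin) are inferred from the options removed in Changes 2 and 3 and from the -O3 comparison in Change 6. The -static option is the one that produces a statically linked file. The function name say_hello is hypothetical; the lab program makes this call directly in main().

```c
#include <stdio.h>

/*
 * Assumed build commands (flags inferred, not copied from the lab page):
 *   gcc -g -O0 -fno-builtin hello.c -o hello                  -> dynamically linked
 *   gcc -g -O0 -fno-builtin -static hello.c -o hello-static   -> statically linked
 *
 * In the dynamic build, the call below goes through printf@plt and is resolved
 * against the shared C library by the dynamic linker at run time. In the static
 * build, the linker copies printf and everything it depends on from the static
 * C library into the executable, which is why the file grows so much.
 */

/* Hypothetical helper: the lab program makes this call directly in main(). */
int say_hello(void)
{
    /* printf returns the number of characters written. */
    return printf("Hello World!\n");
}
```

Called from main(), say_hello() prints the line and returns 13.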

Change 2

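Without the -fno-builtin option, the function called was not "printf", but "puts". In that case GCC treats printf as a built-in function. The format string contains no conversion specifiers and ends with a newline, so GCC replaces the call with the simpler puts.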
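The two functions below write the same line. They are a sketch of the substitution GCC may make when builtins are enabled. The printf call is rewritten as puts with the trailing newline removed, because puts appends a newline itself. The function names are hypothetical.

```c
#include <stdio.h>

/* What the source says. */
int hello_printf(void)
{
    return printf("Hello World!\n");   /* returns 13, the number of characters written */
}

/*
 * Roughly what GCC emits for the call above when builtins are enabled:
 * the trailing '\n' is dropped because puts() adds its own newline.
 */
int hello_puts(void)
{
    return puts("Hello World!");       /* returns a non-negative value on success */
}
```

The format string can only be replaced this way because it contains no % conversions. A call such as printf("%d\n", x) still has to go through printf.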

Change 3

Without the -g compiler option, the output of the "file" command no longer reports "with debug_info" for the generated file. The file is also slightly smaller.

There is also a noticeable difference in the disassembly output. In the original file, the lines of the C source are interleaved with the assembly code, because the debug information maps each instruction back to the source line it came from:

000000000000068a <main>:
#include <stdio.h>

int main() {
 68a:    55                      push   %rbp
 68b:    48 89 e5                mov    %rsp,%rbp
    printf("Hello World!\n");
 68e:    48 8d 3d 9f 00 00 00    lea    0x9f(%rip),%rdi        # 734 <_IO_stdin_used+0x4>
 695:    b8 00 00 00 00          mov    $0x0,%eax
 69a:    e8 c1 fe ff ff          callq  560 <printf@plt>
 69f:    b8 00 00 00 00          mov    $0x0,%eax
}
 6a4:    5d                      pop    %rbp
 6a5:    c3                      retq   
 6a6:    66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 6ad:    00 00 00 

The file compiled without -g shows only the assembly code:

000000000000068a <main>:
 68a:    55                      push   %rbp
 68b:    48 89 e5                mov    %rsp,%rbp
 68e:    48 8d 3d 9f 00 00 00    lea    0x9f(%rip),%rdi        # 734 <_IO_stdin_used+0x4>
 695:    b8 00 00 00 00          mov    $0x0,%eax
 69a:    e8 c1 fe ff ff          callq  560 <printf@plt>
 69f:    b8 00 00 00 00          mov    $0x0,%eax
 6a4:    5d                      pop    %rbp
 6a5:    c3                      retq   
 6a6:    66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 6ad:    00 00 00 

Change 4

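My CPU is x86_64, so the registers and calling convention that apply to my machine are the ones described in this link.

The format string, which is printf's first argument, goes into %rdi; that instruction is not part of the excerpt below. The first integer argument was moved into %esi, the second and third into %edx and %ecx, and the fourth and fifth into %r8d and %r9d. The remaining arguments were pushed directly onto the stack with pushq, last argument first. The 32-bit register names (esi, edx, ecx, r8d, r9d) appear because the arguments are ints. This behaviour confirms what was described in the page I linked: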

First six arguments are in rdi, rsi, rdx, rcx, r8d, r9d; remaining arguments are on the stack.

 68e:    48 83 ec 08             sub    $0x8,%rsp
 692:    6a 00                   pushq  $0x0
 694:    6a 09                   pushq  $0x9
 696:    6a 08                   pushq  $0x8
 698:    6a 07                   pushq  $0x7
 69a:    6a 06                   pushq  $0x6
 69c:    41 b9 05 00 00 00       mov    $0x5,%r9d
 6a2:    41 b8 04 00 00 00       mov    $0x4,%r8d
 6a8:    b9 03 00 00 00          mov    $0x3,%ecx
 6ad:    ba 02 00 00 00          mov    $0x2,%edx
 6b2:    be 01 00 00 00          mov    $0x1,%esi

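After some research, I found out why %rsp is first reduced by 8 (sub $0x8,%rsp). The x86_64 System V ABI requires %rsp to be a multiple of 16 at every call instruction. Each pushq moves %rsp by 8 bytes, so the five pushes move it by 40 bytes, which is not a multiple of 16. The extra 8 bytes of padding bring the total to 48 and keep the stack aligned.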
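The same convention can be sketched with an ordinary function, assuming x86_64 Linux and the System V ABI. The function sum10 and its ten int parameters are hypothetical. With no format string occupying %rdi, the caller would pass a0 to a5 in %edi, %esi, %edx, %ecx, %r8d and %r9d, and place a6 to a9 on the stack.

The helper stack_padding, also hypothetical, reproduces the alignment arithmetic described above. It assumes %rsp is already 16-byte aligned before the padding and the pushes, as it is here right after push %rbp.

```c
#include <stddef.h>

/*
 * Hypothetical ten-argument function. On x86_64 (System V ABI) the caller passes:
 *   a0..a5 in %edi, %esi, %edx, %ecx, %r8d, %r9d
 *   a6..a9 on the stack, with a6 at the lowest address when the call happens
 */
int sum10(int a0, int a1, int a2, int a3, int a4,
          int a5, int a6, int a7, int a8, int a9)
{
    return a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9;
}

/*
 * Bytes of padding the caller subtracts from %rsp so that the stack is
 * 16-byte aligned at the call instruction, given how many 8-byte arguments
 * it pushes. Assumes %rsp is 16-byte aligned before padding and pushes.
 */
size_t stack_padding(size_t stack_args)
{
    size_t pushed = stack_args * 8;   /* each pushq moves %rsp by 8 bytes */
    return (16 - pushed % 16) % 16;   /* round the total up to a multiple of 16 */
}
```

With five stack arguments, stack_padding(5) returns 8, matching the sub $0x8,%rsp in the excerpt. With four, stack_padding(4) returns 0 because 32 bytes is already a multiple of 16.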

Change 5

The changes in the disassembly are small. In the <main> section, instead of a call to the printf function, there is a call to the output function:

--- Original:

 68b:    48 89 e5                mov    %rsp,%rbp
  printf("Hello World! %d\n",
 68e:    be 01 00 00 00          mov    $0x1,%esi

--- Changed:

 6a3:    48 89 e5                mov    %rsp,%rbp
    output();
 6a6:    b8 00 00 00 00          mov    $0x0,%eax

Additionally, we have a subroutine called <output>:

000000000000068a <output>:
#include <stdio.h>

void output() {
 68a:    55                      push   %rbp
 68b:    48 89 e5                mov    %rsp,%rbp
    printf("Hello World!\n");
 68e:    48 8d 3d af 00 00 00    lea    0xaf(%rip),%rdi        # 744 <_IO_stdin_used+0x4>
 695:    b8 00 00 00 00          mov    $0x0,%eax
 69a:    e8 c1 fe ff ff          callq  560 <printf@plt>
}
 69f:    90                      nop
 6a0:    5d                      pop    %rbp
 6a1:    c3                      retq   

Change 6

The total size of the file is slightly larger. According to the GCC documentation, -O3 trades size for speed: the resulting file may be larger, but its code is expected to run faster.

One interesting change in the main function is this line:

 695:    b8 00 00 00 00          mov    $0x0,%eax

It is replaced with this line:

 58b:    31 c0                   xor    %eax,%eax

Both lines set %eax to 0. The XOR form is shorter: 2 bytes (31 c0) instead of 5 (b8 00 00 00 00). Modern x86 processors also recognize it as a zeroing idiom, so it is at least as fast as the MOV.

The first line clears %eax by moving the constant 0 into it. The second XORs the register with itself, which also leaves it at 0. This is why:

This is the truth table of an XOR gate:

A B Output
0 0 0
1 0 1
0 1 1
1 1 0

In short: if the two input bits are the same, the output bit is 0; if they differ, it is 1.

Suppose the register holds the value 01001. XORing this value with itself, bit by bit, gives:

01001
01001
-----
00000

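Every bit is XORed with an identical bit, so the result is always 0, whatever value the register held. This makes XORing a register with itself a very fast way to clear it.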
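The truth table and the worked example above can be checked with a short C sketch. The function names are hypothetical. This is a C-level illustration of what xor %eax,%eax computes. When C code simply assigns 0, the compiler still decides which instruction it emits.

```c
#include <stdint.h>

/* XOR of two single bits: 1 when the inputs differ, 0 when they are equal. */
unsigned xor_bit(unsigned a, unsigned b)
{
    return (a ^ b) & 1u;
}

/*
 * What "xor %eax,%eax" computes: every bit of x is XORed with an identical
 * bit, so each result bit is 0 regardless of the starting value.
 */
uint32_t clear_with_xor(uint32_t x)
{
    x ^= x;
    return x;
}
```

For the value from the example, clear_with_xor(0x09) (binary 01001) returns 0, as does any other input.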