SPO600 Project [Part 1] - Planning (Stage 1)

Date posted: 2017-12-13

As a requirement for my SPO600 course, I need to complete and blog about a project for identifying a checksum/hash algorithm in an open-source project and improving its performance. The full requirements can be found here. This post is the Stage 1 of my project, which is composed of the following steps:

1- Finding an open source package that has a checksum/hash algorithm, implemented in a language that gets compiled to machine code (C, C++, Assembly, etc).

2- Benchmark the performance on AArch64.

3- Identify the strategy to improve the algorithm, for example: altered build options, algorithm optimization, changes in the code to allow better optimization by the compiler, and in-line assembly.

Finding an open source package

One of our professor's recommendations was a hash called MurmurHash. I found the algorithm very interesting: it is simple, very easy to port, and due to its simplicity, improving it can be very challenging. I decided to follow the recommendation, and I found that Nginx uses this algorithm. I don't think it will be an easy task to make it more performant, but I am up for the challenge.

Benchmark the performance on AArch64

Testing the performance of this algorithm should be relatively straight forward: the function can easily be detached from the file, which will allow me to put a wrapper around it specifically made for testing its performance. A few important factors when testing the performance of this algorithm:

So, here is my extracted algorithm:

// murmur_original.c

// The include below is just for me to define a few
// constants, if I ever need it
#include "toolbox.h"

uint32_t
original_hash(u_char *data, size_t len)
{
    uint32_t  h, k;

    h = 0 ^ len;

    while (len >= 4) {
        k  = data[0];
        k |= data[1] << 8;
        k |= data[2] << 16;
        k |= data[3] << 24;

        k *= 0x5bd1e995;
        k ^= k >> 24;
        k *= 0x5bd1e995;

        h *= 0x5bd1e995;
        h ^= k;

        data += 4;
        len -= 4;
    }

    switch (len) {
    case 3:
        h ^= data[2] << 16;
        /* fall through */
    case 2:
        h ^= data[1] << 8;
        /* fall through */
    case 1:
        h ^= data[0];
        h *= 0x5bd1e995;
    }

    h ^= h >> 13;
    h *= 0x5bd1e995;
    h ^= h >> 15;

    return h;
}

I duplicated this file and called it murmur_modified.c (changing the name of the function as well). I will use the other file to modify the function, so I can compare them side by side.

All I need to do is make a wrapper and call the function. This is the wrapper I made:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>
#include "murmur_original.c"
#include "murmur_modified.c"

#define DATA_LENGTH 500000002

int main()
{
  uint32_t        result_original;
  uint32_t        result_modified;
  u_char          tmp;
  u_char         *data_o;
  int             i;
  int             are_identical;
  struct timeval  timer_ori_before,
                  timer_ori_after,
                  timer_ori_result,
                  timer_mod_before,
                  timer_mod_after,
                  timer_mod_result;

  srand(time(NULL));

  // Creating random characters
  data_o = calloc(DATA_LENGTH, sizeof(u_char));
  for (i = 0; i < DATA_LENGTH; i++) {
    tmp = (u_char)((rand() % 26) + 'a');
    data_o[i] = tmp;
  }

  gettimeofday(&timer_ori_before, NULL);
  result_original = original_hash(data_o, sizeof(u_char) * DATA_LENGTH);
  gettimeofday(&timer_ori_after, NULL);

  gettimeofday(&timer_mod_before, NULL);
  result_modified = modified_hash(data_o, sizeof(u_char) * DATA_LENGTH);
  gettimeofday(&timer_mod_after, NULL);

  timersub(&timer_ori_after, &timer_ori_before, &timer_ori_result);
  timersub(&timer_mod_after, &timer_mod_before, &timer_mod_result);

  are_identical = result_original == result_modified;

  printf("Generated %d characters for hash functions
", DATA_LENGTH);
  if (are_identical)
    printf("SUCCESS! - All hashes are identical for both functions
");
  else
    printf("!!! ERROR !!! The hashes for both functions do not match
");

  printf("Time for original: %ld.%06ld
",
    (long int)timer_ori_result.tv_sec,
    (long int)timer_ori_result.tv_usec);

  printf("Time for modified: %ld.%06ld
",
    (long int)timer_mod_result.tv_sec,
    (long int)timer_mod_result.tv_usec);

  free(data_o);

  return 0;
}

Still not perfect, but a good start. This wrapper:

1- Creates a long array of random characters

2- Starts the timer for the original algorithm

3- Sends the array to the first hash function and records the result

4- Stops the timer for the original algorithm

5- Starts the timer for the modified algorithm

6- Sends the array to the second hash function and records the result

7- Stops the timer for the modified algorithm

8- Compares both results. If they are different, my modified hash function is broken and the test is invalid. Otherwise, they comparable

9- Print the time for the original and the modified function

Here are the results for the first test run (the "modified" version is still the original algorithm):

Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330580
Time for modified: 0.330509
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.333051
Time for modified: 0.332968
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330849
Time for modified: 0.330729
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330410
Time for modified: 0.330330
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330744
Time for modified: 0.330551
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330604
Time for modified: 0.330511
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330427
Time for modified: 0.330301
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330391
Time for modified: 0.330341
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330375
Time for modified: 0.330274
Generated 500000002 characters for hash functions
SUCCESS! - All hashes are identical for both functions
Time for original: 0.330385
Time for modified: 0.330292

Not bad. The results seem to be very uniform.

Identifying the strategies

First, about altering build options: Nginx has a very complete build configuration, and I don't think I am comfortable with messing with their building options. However, I do know that some compiler flags are available for AArch64 that may offer some hints to the compiler for optimizations. When I investigated the file /proc/cpuinfo for the machine, I saw the following line:

Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

Some AArch64 processors offer features to speed up hashing algorithms. This one in particular seem to have these features for algorithms such as SHA-1 and SHA-2, but not for Murmur. I am not particularly optimistic about compiler flags, but it seems like SIMD may be an option for optimization. I will try a few options.

Second, about algorithm optimization: the code is very straight-forward and there is not much room for improvement there. It is already using bitwise operations, and it has a very simple structure. I think it is safe to say that I will not find anything to improve in terms of algorithm performance.

Third, about changing the code to allow optimizations by the compiler: I see some room for improvement here. Maybe I can make some changes in the code so the compiler will vectorize (SIMD) a few operations. It may give me some milliseconds.

Lastly, inline-assembly: probably not a tool that should be used very often. I find inline-assembly really neat, but it has a fatal flaw: it kills portability. If I write inline-assembly for AArch64, it will not run on X86-64 at all. It is something I should avoid, but it may be useful if I really can't find anything else to improve and I still want better performance.

Other to-dos