Open-sourcing our project

With the goals of our project almost reached, we decided to go one step further: we will open-source our whole application (except for the proprietary details). In this post I will describe how we will do this and what we have done so far.

First, what is our project about? In short: our project is a more flexible and customizable version of Google Analytics. It records data from the users and gives you tools to analyze it. Among other things, this is what it can record by default:

  • When the user opens and closes the page
  • How far the user scrolled in the page
  • What links the user clicked
  • What texts the user selected
  • What texts and images the user copied
  • Which ads the user clicked
  • What forms the user started to fill and what forms they completed
  • What parts of the website the user shared and/or followed on social media
  • Whether the user commented or not
  • Miscellaneous information about the page the user visited (title, content, number of words, number of images, number of videos, number of paragraphs, etc.)
  • User location (latitude, longitude, city, country...)
  • Whether the user had cookies enabled or not
  • The user's operating system, device, browser, browser engine, and CPU
  • Miscellaneous information about the user (in case they are logged in: name, email, etc.)

This is the data that we record. Now, here is a problem: what do we do with all this information? Our tool also gives the website owner some ways to analyze it. Based on all that information, we can give you answers to questions like:

  • How far do users scroll on the articles written by author X?
  • How long do they stay on the articles before they close the page?
  • Is there a relation between the number of words/images and the time spent on the page?
  • What are the most successful articles, based on the number of shares and time spent on page?
  • How popular are the sections and authors in my website?
  • What is the optimal length for the articles?

We also provide a graphical representation of a timeline, in order to see all the actions a user did on the website, for example:

  • 12:00 Visited the home page
  • 12:03 Clicked the "Articles" link
  • 12:12 Scrolled to the middle of the page
  • 12:24 Scrolled to the bottom of the page
  • 12:29 Shared on Facebook
  • 12:31 Closed the page
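
A timeline like the one above boils down to filtering the recorded events for one user and sorting them by time. The following is only a rough sketch of that idea: the event shape (userId, time, action) and the function name buildTimeline are made up for illustration and are not the actual Rutilus data model.

```javascript
// Hypothetical event shape, for illustration only:
// { userId: string, time: number (ms since epoch), action: string }
function buildTimeline(events, userId) {
  return events
    // Keep only the events of the user we are interested in.
    .filter((event) => event.userId === userId)
    // filter() returned a new array, so sorting does not touch the input.
    .sort((a, b) => a.time - b.time)
    // Format each event as "HH:MM action" (UTC, to keep the sketch simple).
    .map((event) => {
      const date = new Date(event.time);
      const hours = String(date.getUTCHours()).padStart(2, '0');
      const minutes = String(date.getUTCMinutes()).padStart(2, '0');
      return `${hours}:${minutes} ${event.action}`;
    });
}
```

Given events recorded out of order for several users, buildTimeline(events, 'some-user-id') would return that user's actions as an ordered list of strings, ready to be drawn.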

We provide these pre-made questions for the website owner, but you can also build your own using a graphical tool we made (we wanted to make it as user-friendly as possible). I explained how we made it in my article Simulating Inner Joins on MongoDB. The results can easily be exported as a .csv file and used in Microsoft Excel to make graphs and pivot tables!

Ok, but here is another thing we can do with this tool: we can profile users based on the information gathered. For example, say we have an optional field in which registered users can provide their job: we can observe a user's browsing pattern and try to find users similar to them. If a user says they are a "designer" and reads a lot of articles about design, then maybe other people who mostly read articles about design are also designers.

With this possibility, we can now target content and ads at people who "are or look like engineers". I also wrote a blog post (User Affinity Tool: grouping and finding patterns for users) on how we accomplished this.

So what is going to be the name of our project? Well, what it does kind of looks like a census, right? We gather data about people in an area in order to profile them. This is why we decided to name our project Rutilus, after Gaius Marcius Rutilus, the first plebeian censor of ancient Rome.

Right, but how do we open-source something like this? This is what we are doing in order to make this happen:

Separating into modules

Our first problem was that our application, although fairly modular, was still too monolithic for what we intended: we want to offer people the full, pre-configured package that is ready to go and easy to configure, but we also want people to be able to make their own parts if they wish to do so. So we want our application to be like Lego bricks: you can pick the parts that you want, make your own (if you really want to), or just get the whole pack.

So we decided to separate it into 4 modules that are completely independent:

  • Observer: this is the module that goes into the client's browser. It gathers the information from the user and sends it via HTTP/WebSockets to the Logger module.

  • Logger: this is an API that connects to a MongoDB database and pushes the data into it.

  • Analytics: this is the interface that allows you to analyze the data that we gather: it gives you the affinity tool, used to profile users based on their browsing patterns, and the dashboard, used to query the database.

  • Heartbeat: this is a small helper module: it sends constant requests to both the Logger and Analytics modules and records their response times. If we are packing these modules in a container, it also gives you the option to throw an error and crash the container (making it restart) if one of the modules stops responding.
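
To make the Heartbeat idea more concrete, here is a minimal sketch of one round of checks. It is not the actual Heartbeat code: the names checkOnce, heartbeat, ping, timeoutMs and crashOnFailure are all assumptions made for this example. Each target is just an object with a name and an async ping function, which in practice would wrap a request to the Logger or Analytics module.

```javascript
// Hypothetical sketch: none of these names come from the real Heartbeat module.
// A target is { name, ping }, where ping() is an async function that resolves
// when the module answers and rejects when it does not.
async function checkOnce(target, { timeoutMs = 5000, now = Date.now } = {}) {
  const startedAt = now();
  let timer;
  // Reject if the target takes longer than timeoutMs to answer.
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
  });
  try {
    await Promise.race([target.ping(), timeout]);
    return { name: target.name, ok: true, responseMs: now() - startedAt };
  } catch (err) {
    return { name: target.name, ok: false, error: err.message };
  } finally {
    clearTimeout(timer);
  }
}

// Checks every target once and records the response times.
async function heartbeat(targets, { crashOnFailure = false, ...options } = {}) {
  const results = await Promise.all(
    targets.map((target) => checkOnce(target, options))
  );
  const failed = results.filter((result) => !result.ok);
  if (crashOnFailure && failed.length > 0) {
    // Left uncaught, this error ends the process, so a container
    // with a restart policy would bring everything back up.
    throw new Error('Not responding: ' + failed.map((r) => r.name).join(', '));
  }
  return results;
}
```

Calling heartbeat on a timer (for example with setInterval) would give the "constant requests" behaviour, and the crashOnFailure flag mirrors the optional "crash the container" setting described above.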

Making each module easy to launch and configure

Here is a question: can we make the modules so easy to start that it only takes one line of code? Yes, we can. And this is what we are doing. The only thing you will need in order to launch a module is to import and run it, passing the settings to it. For example:

require('rutilus-logger-node')({ port: 8080, ... });

This would launch the Logger module on port 8080. More settings are required, but they will all be easy to configure.

I think one of the biggest problems I had when I wanted to try a new tool was the configuration: they asked me for something I had no idea where to find, and I had to read two pages of cryptic documentation that would not help me at all. We will make sure this doesn't happen.
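
One way to keep configuration from turning cryptic is to merge the user's settings with sensible defaults and fail early with a clear message when something is missing. The sketch below illustrates that pattern only; it is not the real rutilus-logger-node code, and the setting name databaseUrl, as well as the functions resolveSettings and launch, are assumptions for this example.

```javascript
// Hypothetical sketch: not the actual rutilus-logger-node implementation.
const DEFAULTS = { port: 8080 };
const REQUIRED = ['databaseUrl']; // assumed setting name, for illustration only

function resolveSettings(userSettings = {}) {
  // User settings override the defaults.
  const settings = { ...DEFAULTS, ...userSettings };
  const missing = REQUIRED.filter((key) => settings[key] === undefined);
  if (missing.length > 0) {
    // Say exactly which settings are missing instead of failing later.
    throw new Error('Missing required settings: ' + missing.join(', '));
  }
  const port = settings.port;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error('Invalid port: ' + port);
  }
  return settings;
}

// A module built this way can export a single function,
// so launching it stays a one-liner for the person using it.
function launch(userSettings) {
  const settings = resolveSettings(userSettings);
  // Starting the actual server is omitted from this sketch.
  return settings;
}
```

With this shape, a typo or a forgotten setting produces a message naming the exact field, rather than an error from deep inside the module.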

We will also give people options to record custom fields: do you want to record the user's age? All it takes is another line of configuration and you are all set.

Publishing the modules on NPM

The convenience of having a package manager is, without a doubt, something we must take advantage of. Instead of making people download our source code in order to run their project, they can just run an npm command to install it:

npm install --save rutilus-logger-node

And they can start using them.

This required a bit of research from us: we wanted to make sure our package.json files were properly configured in order to publish our modules on NPM.

Packing everything in another ready-to-go module

Having the option to pick and choose our own modules is nice, but how about people who don't know Node.js, just want to get it done, and need a quick solution? We will also provide a solution for that.

We don't know the details yet because we haven't started working on it, but this is what I have in mind:

  1- Download a small .zip file with an NPM project
  2- Change one or two configuration files according to our documentation (again, it must be very easy to do)
  3- Run a few commands to deploy it on Amazon Web Services or Heroku
  4- Include the Observer module in your website

And done. This is what we are aiming for, and I am sure we can accomplish it.