hcoelho.com

my blog


Deployment of containers on AWS


We spent the past few days reading about AWS in order to deploy the two containers we developed: one container has only a Dockerfile with our database (MongoDB), and the other has the API that inserts data into this database (Node.js). In this post, I'll describe some things I would like to have known before I started the deployment; it was a very frustrating process, but once you learn how everything works, it becomes really easy.

First of all, the deployment is not a linear process: you have to know some details about your application before you start, and these details will not be obvious if you have never used AWS before. This is one of the reasons it was such a slow, painful process for us.

Looking back, I think the first step to deploy these containers is to upload the repositories, even though they are not properly configured yet: having them there gives you a better perspective on what to do. So, first step: push the Docker images to the EC2 Container Registry (ECR). The process is simple; it only takes four steps (three, after the first push), which are just commands copied and pasted into the command line, like the sketch below.
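The steps look roughly like this (a sketch only - the repository name, account ID, and region are placeholders, and the ECR console shows the exact commands for your own repository):

# 1. Retrieve a docker login command from AWS and run it
$(aws ecr get-login --region us-east-1)

# 2. Build the image
docker build -t myapi .

# 3. Tag the image with the address of the registry
docker tag myapi:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapi:latest

# 4. Push the image to the registry
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/myapi:latest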

After the containers are uploaded, we choose a machine that will run Docker with the containers, and here is the catch: we need a machine that is already optimized for the Container Service, otherwise it will not be a valid Container Instance and you will have to configure it yourself. To find machines optimized for ECS, we search for "ecs" when picking the machine image. After choosing the machine, we select the other specifications we need, such as storage, IPs, and so on - nothing too special here.

With the right machine instance, a default Cluster will be created in the Container Service. Here is the interesting part: the cluster is a set of services, which are responsible for (re)starting a set of tasks, which are groups of Docker containers to be run by the machine. Instead of starting from the service, we should now start from the task, add its containers, and work back to the service - then the deployment will be complete.

Creating a task is simple: we give it a name and a list of the repositories (the ones we uploaded in the beginning), but we also have to define how the containers will interact with each other and with the outside world. There were two special settings we had to configure:

1- The MongoDB container should be visible to the API. This can be done by linking them together: on the container for the API, we map the name of the database container to an alias (for instance: Mongo:MongoContainer); with this, the container of the API will receive some environment variables, such as MONGOCONTAINER_PORT, holding the address and port of the other container. We can use these to make the API connect to the database (the source code will probably have to be modified to read them); see the sketch after this list.

2- The MongoDB container should use an external drive for storage; otherwise, its data will be lost when the container is restarted. For this, we map the external directory (where we want the data to be stored) to the internal directory used by the database (for instance, /usr/mongodb:/mongo/db). Since we wanted to use an external device, we also had to make sure the device was mounted when the machine started.
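To make this concrete, here is a minimal sketch of a task definition with both settings - the family, names, images, memory values, and ports are hypothetical, not what we actually used:

{
  "family": "api-with-mongo",
  "containerDefinitions": [
    {
      "name": "Mongo",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/mongo:latest",
      "memory": 512,
      "essential": true,
      "mountPoints": [
        { "sourceVolume": "mongo-data", "containerPath": "/mongo/db" }
      ]
    },
    {
      "name": "api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:latest",
      "memory": 256,
      "essential": true,
      "links": ["Mongo:MongoContainer"],
      "portMappings": [
        { "containerPort": 3000, "hostPort": 80 }
      ]
    }
  ],
  "volumes": [
    { "name": "mongo-data", "host": { "sourcePath": "/usr/mongodb" } }
  ]
}

On the API side, the variables injected by the link can be used to build the connection string; for example:

// MONGOCONTAINER_PORT looks like "tcp://172.17.0.5:27017" (the alias,
// uppercased, is the prefix); we strip the protocol and use the rest
const address = process.env.MONGOCONTAINER_PORT.replace('tcp://', '');
const url = 'mongodb://' + address + '/myDatabase';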

After the task is set up, we make the service for the cluster: the service, in this case, contains the only task that we made. With the service properly configured, it will start and restart the tasks automatically: the deployment should now be ready.

It's easy to understand why we spent so much time trying to make it work (given the number of details and steps), but looking back, this modularity makes a lot of sense. The learning curve is very steep, but I am very impressed by how powerful this service is, and I am very inclined to start using it to deploy my own projects.

cdot aws containers ec2 ecs 

JavaScript and Non-blocking functions


One of the most interesting features of JavaScript must be its event-driven and asynchronous nature: operations can, but do not have to, block the next operation from being executed before the current one is done. For instance, the following snippet follows a very logical sequence:

console.log(1);
console.log(2);
console.log(3);
console.log(4);

// The output is: 1 2 3 4

However, we can make these functions execute in a different sequence by setting timeouts for them:

setTimeout(() => console.log(1), 75);
setTimeout(() => console.log(2), 0);
setTimeout(() => console.log(3), 50);
setTimeout(() => console.log(4), 25);

// The output is: 2 4 3 1

Why is this useful? Suppose we have operations that are costly to perform but don't have a high priority. Normally, they would block more important operations that don't even depend on them:

// Not very important operation
for (let i = 0; i < 1000000000; i++);
console.log('Not very important operation is done!');

// Very important operations
console.log('Super important operation');
console.log('This operation is also very important');

/* Output:
Not very important operation is done!
Super important operation
This operation is also very important
*/

In this case, we could simply move the costly, unimportant operation to the end of the file (it is not used by anything else, after all), but real life is not that easy: although we should prioritise the interaction with the user while leaving costly, unimportant operations for last, the interactions with the user are not predictable, so we cannot create a logical sequence that covers all cases. However, we can use the setTimeout function with a timeout of 0 (zero): the operation will be sent to the back of the queue of operations to perform. Like in this case:

// Not very important operation
setTimeout(() => {
    for (let i = 0; i < 1000000000; i++);
    console.log('Not very important operation is done!');
}, 0);

// Very important operations
console.log('Super important operation');
console.log('This operation is also very important');

/* Output:
Super important operation
This operation is also very important
Not very important operation is done!
*/

With this in mind, I started experimenting to find the best combination: a script that performs the most vital (and cheap) operations as soon as possible, but leaves the ones that would affect the user experience for last.

First, I made a simple webpage like this one:

page.html
<html>
  <head>
    <script src="1.js"></script>
  </head>
  <body>
    <div id="overall">
      ...around 100,000 auto-generated HTML elements here...
    </div>
    <script src="2.js"></script>
  </body>
</html>

1.js
alert('Script 1 ' + document.getElementById('overall').childNodes.length);
for (let a = 0; a < 1000000000; a++);
alert('Script 1 done');

2.js
alert('Script 2 ' + document.getElementById('overall').childNodes.length);
for (let b = 0; b < 1000000000; b++);
alert('Script 2 done');

(I will change the file 1.js during this post, but 2.js and page.html will stay the same)

The idea is simple: a very heavy webpage with one script in the header and one script at the end of the DOM; these scripts are just alerts saying how many elements are in the DOM. This was the order of what happened while loading the page:

1- Script 1, 0 * (alert in a blank page)
2- A few seconds of a blank page
3- Script 1 done
4- DOM is loaded
5- Script 2, 100002 (alert in a fully-loaded page)
6- A few seconds of loading, but with the page fully functional
7- Script 2 done

* The first read actually throws an error, because document.getElementById('overall') returns null before the DOM is parsed - but the moral is the same: the DOM is not loaded yet.

This is why it is recommended to put your script at the end of the page: it will not block your DOM from rendering. On top of that, if you are planning to do some DOM manipulation, you have to wait for the DOM to load anyway, otherwise there won't be anything to manipulate (duh).

However, this has a drawback: your script will only be called after the DOM is already rendered. In our case, we want to know how much time it took for the DOM to load, so this is not an acceptable alternative. What we can do instead is use an event to detect when the page gets loaded, and then execute the script:

1.js

alert('Doing some very fast and important work here...');
document.addEventListener("DOMContentLoaded", function () { 
  alert('Script 1 ' + document.getElementById('overall').childNodes.length); 
  for (let a = 0; a < 1000000000; a++); 
  alert('Script 1 done'); 
});

With this, the order of execution becomes:

1- Doing some very fast and important work here
2- DOM is loaded
3- Script 2, 100002 (alert in a fully-loaded page)
4- A few seconds of loading, but with the page fully functional
5- Script 2 done
6- Script 1, 100002 (alert in a fully-loaded page)
7- A few seconds of loading, but with the page fully functional
8- Script 1 done

Now another problem arises: what if there are several costly, but less important, functions inside that one? Say this is our 1.js now:

1.js

alert('Doing some very fast and important work here...');
document.addEventListener("DOMContentLoaded", function () {
  alert('Doing not very important operation...');
  for (let a = 0; a < 1000000000; a++);
  alert('Not very important operation done');

  alert('Super important operation');
  alert('This operation is also very important');
});

The order of operations would be:

1- Doing some very fast and important work here
2- DOM is loaded
3- Script 2, 100002 (alert in a fully-loaded page)
4- A few seconds of loading, but with the page fully functional
5- Script 2 done
6- Doing not very important operation...
7- Not very important operation done
8- Super important operation
9- This operation is also very important

Can we send the "Not very important operation" to the back of the queue again? Yes, we can, by using the setTimeout function I described before:

1.js
alert('Doing some very fast and important work here...');
document.addEventListener("DOMContentLoaded", function () {
  setTimeout(() => {
    alert('Doing not very important operation...');
    for (let a = 0; a < 1000000000; a++);
    alert('Not very important operation done');
  }, 0);

  alert('Super important operation');
  alert('This operation is also very important');
});

This is the order of operations we would get:

1- Doing some very fast and important work here
2- DOM is loaded
3- Script 2, 100002 (alert in a fully-loaded page)
4- A few seconds of loading, but with the page fully functional
5- Script 2 done
6- Super important operation
7- This operation is also very important
8- Doing not very important operation...
9- Not very important operation done

By using some timeouts and some events, I'm confident we will be able to make a client module that executes at the right time: without interfering with the user experience, but still doing the right operations at the right time.

cdot javascript async 

Approaches for database connection with Express.js


While experimenting with different databases and Node.js, I saw several different approaches for managing database connections and making sure they can handle a lot of traffic.

When using MySQL, for instance, the simplest way to make a connection would be like this:

const mysql = require('mysql');
const express = require('express');
const app = express();

const dbSettings = {
    host    : 'localhost',
    user    : 'root',
    password: 'password',
    database: 'myDatabase'
};

// Index route
app.get('/', (req, res) => {

    const connection = mysql.createConnection(dbSettings);
    connection.connect();

    // Here we get all rows from a table, end the connection,
    // and respond
    connection.query('SELECT * FROM Table', (err, doc) => {
        connection.end();

        if (err) {
            res.json(err);
        }
        else {
            res.json(doc);
        }
    });

});

This is a very simple way to connect to a database: for every request we get, we connect to the database, fetch the results, end the connection, and respond. But it has a drawback: opening a brand new connection for every request is slow, and a single connection cannot serve several requests at the same time.

To solve this problem, we can use a pool of connections - a cache of database connections that can be reused and can handle several requests at the same time. This is how it would look using a pool:

const mysql = require('mysql');
const express = require('express');
const app = express();

const dbSettings = {
    connectionLimit : 100,

    host    : 'localhost',
    user    : 'root',
    password: 'password',
    database: 'myDatabase'
};

const pool = mysql.createPool(dbSettings);

// Index route
app.get('/', (req, res) => {

    pool.getConnection((err, connection) => {
        if (err) {
            res.json(err);
            return;
        }

        // Here we get all rows from a table, end the connection,
        // and respond
        connection.query('SELECT * FROM Table', (err, doc) => {
            connection.release();

            if (err) {
                res.json(err);
            }
            else {
                res.json(doc);
            }
        });
    });
});

This is a much better way to handle connections in production.

For NoSQL, however, the pattern I found was a bit different: instead of starting the server before anything else, we first connect to the database and then start the server. The connection is kept alive until the application is terminated:

const mongodb = require('mongodb').MongoClient;
const express = require('express');
const app = express();

// Index route
app.get('/', (req, res) => {

    // Here we just get any document from a collection and respond
    app.locals.db.collection('myCollection').findOne({}, (err, doc) => {
        if (err) {
            res.json(err);
        }
        else {
            res.json(doc);
        }
    });

});

// Connecting to MongoDB before starting the server
mongodb.connect('mongodb://localhost:27017/myDatabase', (err, db) => {

    // Aborting in case of error
    if (err) {
        console.log('Unable to connect to Mongo.');
        process.exit();
    }

    // Making the connection available to the application instance
    app.locals.db = db;

    // After the connection has been established, we listen for connections
    app.listen(3000, () => console.log('Listening on port 3000'));

});

cdot databases