
Fixing memory problems with Node.js and Mongo DB


Now that the basic functionality of Rutilus is done, I spent some time addressing the memory limitations we were facing. In this post I will list the problems and how I solved them.

Observation: We were using Mongoose for these queries, and not the native Node.js driver.

1- Steps in the aggregation pipeline taking too much memory

From the MongoDB manual:

"Aggregations are operations that process data records and return computed results. MongoDB provides a rich set of aggregation operations that examine and perform calculations on the data sets. Running data aggregation on the MongoDB instance simplifies application code and limits resource requirements."

So, obviously, a pipeline such as the one below would need to have enough memory available to perform all those stages:

ZipCodes
  .aggregate([
    { $group: {
      _id: { state: "$state", city: "$city" },
      pop: { $sum:  "$pop" }
    }},
    { $sort: { pop: 1 }},
    { $group: {
      _id : "$_id.state",
      biggestCity:  { $last:  "$_id.city" },
      biggestPop:   { $last:  "$pop"      },
      smallestCity: { $first: "$_id.city" },
      smallestPop:  { $first: "$pop"      }
    }},
    { $project: {
      _id: 0,
      state: "$_id",
      biggestCity:  { name: "$biggestCity",  pop: "$biggestPop"  },
      smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
    }}
  ])
  .exec((err, docs) => {
    ...
  });

The problem we were having in this case was that we did not have enough memory to perform the stages, even though we did have enough memory for the output. In other words: the output was small and concise, but producing it required a lot of memory.

The solution for this was easy: we can simply tell MongoDB to temporarily use disk space to store intermediate data. It is probably slower, but it is better than not being able to run the query at all. To do this, we just needed to add an extra call (allowDiskUse) to that method chain:

ZipCodes
  .aggregate([
    ...
  ])
  .allowDiskUse(true) // < Allows MongoDB to use the disk temporarily
  .exec((err, docs) => {
    ...
  });

2- Result from aggregation pipeline exceeding maximum document size

For queries with a huge number of results, the aggregation pipeline would greet us with the lovely "exceeds maximum document size" error. This is because the result of an aggregation pipeline is returned as a single BSON document, which has a size limit of 16 MB.

There are two ways to solve this problem:

1- Piping the results to another collection with $out and querying it later (sketched below)

2- Getting a cursor to the first document and iterating through it
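
For the first option, a minimal sketch of the $out approach would look like the following (the zip_summary collection name is just an example):

const ZipCodes = require('./models/zipcodes'); // hypothetical Mongoose model

ZipCodes
  .aggregate([
    ...,
    // $out must be the last stage: instead of returning the
    // results, it writes them into the given collection
    { $out: "zip_summary" }
  ])
  .allowDiskUse(true)
  .exec((err) => {
    // the results can now be read back in pages with a normal
    // .find() query on the zip_summary collection
  });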

I picked the second method, and this is how I used it:

const cursor = ZipCodes
  .aggregate([
    ...
  ])
  .allowDiskUse(true)
  .cursor({ batchSize: 1000 }) // < Important
  .exec(); // < Returns a cursor

// The .toArray method of a cursor iterates through all documents
// and loads them into an array in memory
cursor.toArray((err, docs) => {
  ...
});

The batchSize option specifies how many documents we want returned in every batch; according to the MongoDB documentation, this does not affect the application, since the cursor handles the batching transparently.
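
If you do not actually need the whole result set in memory at once, you can also consume the cursor one document at a time instead of calling .toArray. A minimal sketch, assuming the cursor behaves like a native MongoDB driver cursor with an .each method:

// .each invokes the callback once per document; doc is null
// once the cursor is exhausted
cursor.each((err, doc) => {
  if (doc === null) {
    // done: every document has been visited
    return;
  }
  // process a single document here, without ever holding
  // the entire result set in memory
});

As long as each document can be processed independently, this also sidesteps the heap problem described in the next section.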

3- JavaScript Heap out of memory

After getting those beautiful millions of rows from the aggregation pipeline, we were greeted by another lovely error: "FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory". This happens when the Node.js heap runs out of memory (as you probably inferred from the description of the error).

According to some sources on the internet, the default memory limit for Node.js on 32-bit systems is 512 MB, and 1 GB for 64-bit systems. We can increase this limit when launching the Node.js application with the option --max_old_space_size, specifying how much memory we want in MB. For example:

node --max_old_space_size=8192 app.js

This will launch the app.js application with an 8 GB heap limit instead of the default 1 GB.
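
Before reaching for that flag, it can be worth confirming that the heap really is the bottleneck; Node.js exposes process.memoryUsage() for this. A quick sketch for watching it:

// log the current heap usage (in MB) every 5 seconds
setInterval(() => {
  const usedMb = process.memoryUsage().heapUsed / 1024 / 1024;
  console.log('Heap used: ' + Math.round(usedMb) + ' MB');
}, 5000);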


Benchmark: NodeJS x Perfect (Swift)


Yesterday (October 27th, 2016) I went to a presentation called "How to Completely Fail at Open-Sourcing", presented by Sean Stephens at FSOSS, the Free Software and Open Source Symposium hosted at Seneca College. Sean Stephens is the CEO of PerfectlySoft Inc, the company that developed Perfect, a library for server-side Swift development. This immediately caught my attention: I had been thinking about server-side development in Swift for a while, and it seems it has finally arrived.

During the presentation, Sean showed us some benchmarks where Swift (using the Perfect framework) beat NodeJS on several fronts. You can see more details in this post. Since I benchmarked PHP x NodeJS around a month ago, I decided to use a similar scenario and test Perfect x NodeJS. This is how I set it up:

I wanted 2 servers: one with Perfect, and the other with pure NodeJS. For every request, they would go to MongoDB, fetch all the results, append some text to the response, and send it back. I used siege as the stress tester to simulate concurrent connections.
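
In case you have never used siege, a 500-user run looks roughly like this (-c sets the number of concurrent users, -t the duration; the address is just a placeholder for wherever the container is listening):

# 500 concurrent users for one minute (example address)
siege -c 500 -t 1M http://172.17.0.2:8000/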

I set up a virtual machine with 1 processor core, 512 MB of RAM, and 20 GB of storage, running Debian Jessie. On this machine, I installed Docker and made 3 images:

1st image: MongoDB

Dockerfile
------------------
FROM ubuntu:16.04
MAINTAINER Docker
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
RUN echo "deb http://repo.mongodb.org/apt/ubuntu $(cat /etc/lsb-release | grep DISTRIB_CODENAME | cut -d= -f2)/mongodb-org/3.2 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-3.2.list
RUN apt-get update && apt-get install -y mongodb-org
VOLUME ["/data/db"]
WORKDIR "/data"
EXPOSE 27017
ENTRYPOINT ["/usr/bin/mongod"]

2nd image: NodeJS

Dockerfile
------------------
FROM node:wheezy
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY package.json /usr/src/app/
RUN npm install
COPY . /usr/src/app
EXPOSE 8000
CMD [ "node", "index.js" ]


index.js
------------------
var http = require('http');
var mongodb = require('mongodb');

// Connect once at startup and reuse the connection for every request
mongodb.connect('mongodb://10.46.52.207:27017/test', function (err, db) {
  if (err) { console.log(err); return; }
  http.createServer(function (req, res) {
    // build a string with the numbers from 1 to 1000
    var s = "";
    for (var i = 1; i <= 1000; i++) {
      s += '' + i;
    }
    // fetch every document in the collection and send it all back
    db.collection("test").find({}).toArray(function (err, docs) {
      res.end("Hello world" + JSON.stringify(docs) + s);
    });
  }).listen(8000);
});


package.json
------------------
{
  "name": "node",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "mongodb": "^2.2.10"
  }
}

3rd image: Perfect (Swift)

Dockerfile
------------------

# Copyright (C) 2016 PerfectlySoft Inc.
# Author: Shao Miller 

FROM perfectlysoft/ubuntu1510
RUN /usr/src/Perfect-Ubuntu/install_swift.sh --sure
RUN apt-get install libtool -y
RUN apt-get install dh-autoreconf -y
RUN git clone https://github.com/mongodb/mongo-c-driver.git
WORKDIR ./mongo-c-driver
RUN ./autogen.sh --with-libbson=bundled
RUN make
RUN make install
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . /usr/src/app
RUN swift build
CMD .build/debug/App --port 8000


Package.swift
------------------
import PackageDescription
let package = Package(
  name: "App",
  targets: [],
  dependencies: [
    .Package(
      url: "https://github.com/PerfectlySoft/Perfect-HTTPServer.git",
      majorVersion: 2,
      minor: 0),
    .Package(
      url:"https://github.com/PerfectlySoft/Perfect-MongoDB.git", 
      majorVersion: 2,
      minor: 0)
  ]
)


Sources/main.swift
------------------
import PerfectLib

import PerfectHTTP
import PerfectHTTPServer
import MongoDB

let server = HTTPServer()

var routes = Routes()
routes.add(method: .get, uri: "/", handler: {
  (request, response)->() in
    response.setHeader(.contentType, value: "text/html")

    let client = try! MongoClient(uri: "mongodb://10.46.52.207:27017")
    let db = client.getDatabase(name: "test")
    guard let collection = db.getCollection(name: "test") else { return }

    let fnd = collection.find(query: BSON())

    var arr = [String]()
    for x in fnd! {
      arr.append(x.asString)
    }

    defer {
      collection.close()
      db.close()
      client.close()
    }

    var s = ""

    for x in 1...1000 {
      s += String(x)
    }

    response.appendBody(string: "Hello world {\(arr.joined(separator: ","))}\(s)")
    response.completed()
  }
)

server.addRoutes(routes)

server.serverPort = 8000

do {
  try server.start()
} catch PerfectError.networkError(let err, let msg) {
  print("Network error thrown: \(err) \(msg)")
}

By the way, I'm sorry if my Swift code is not Swifty enough - I am just a JavaScript peasant. But anyway, these are the results I got:

                          500 users           1,000 users         1,500 users
                          NodeJS    Perfect   NodeJS    Perfect   NodeJS    Perfect
Number of hits            1284      1273      2293      2284      3641      3556
Availability (%)          100       100       100       100       100       100
Data transferred (MB)     4.08      4.26      7.28      7.64      11.56     11.9
Response time (s)         0.04      0.07      0.41      0.44      0.41      0.12
Transaction rate (/s)     84.89     86.25     161.37    161.19    250.76    250.78
Concurrency               3.85      5.84      65.67     71.08     102.82    30.17
Shortest transaction (s)  0         0         0         0         0         0
Longest transaction (s)   0.22      0.27      7.12      7.16      7.13      0.36

The results were remarkably similar; I actually double-checked to make sure I wasn't making requests to the same container. There are some discrepancies, but I would attribute them to statistical noise.

Given that we chose NodeJS for our project because of its resiliency, I think it is safe to say that Perfect is also a very good choice for APIs that are constantly under heavy load.


Benchmarking PHP5 x Node.js


Long story short: one thing we did today was think about which language/framework would be best for building an API: it should be stable under heavy load, fast, and capable of CPU-intensive operations. We ended up with two alternatives, PHP5 and Node.js, and decided to do a little benchmarking to find out which one would be best.

For the first test, we set up one virtual machine with Apache + PHP5 and another with Express + Node.js, and used Siege, a stress tester, to benchmark both servers. Siege creates several concurrent connections and produces statistics such as number of hits, MB transferred, transaction rate, etc. For both servers, we used 4 combinations of settings:

  1. 1 core and 1,000 concurrent users
  2. 1 core and 1,500 concurrent users
  3. 4 cores and 1,000 concurrent users
  4. 4 cores and 1,500 concurrent users

The tests consisted of a very simple task: receive the user's request, perform a SELECT query on a database, and return the raw results - we tried to keep the two tests as similar as possible. The database was PostgreSQL, located on another virtual machine.

This is the source code we used for the tests:

JavaScript

var express = require('express');
var pg = require('pg');

var config = {
  user: 'postgres',
  database: '...',
  password: '...',
  host: '...',
  max: 10,
  idleTimeoutMillis: 30000
};

var app = express();
var pool = new pg.Pool(config);

var query = 'SELECT * FROM testtable;';

function siege(req, res, next) {
    pool.connect(function (err, client, done) {
        if (err) throw err;

        client.query(query, function (err, result) {
            done();
            if (err) throw err;
            res.json(result.rows);
        });
    });
}

app.get('/siege', siege);

app.listen(3000, function () {
  console.log('Example app listening on port 3000!');
});

PHP

$connection = pg_connect("host=... dbname=... user=... password=...");
$result = pg_query($connection, "SELECT * FROM testtable");
echo json_encode(pg_fetch_all($result)); // fetch the rows; echoing the result resource would not print them
pg_close($connection);

These are the results:

Results (1 core)          1,000 users         1,500 users
                          Node.js   PHP       Node.js   PHP*
Number of hits            39,000    4,300     2,000     -
Availability (%)          100       95        66        -
MB transferred            11        0.06      0.56      -
Transaction rate (t/s)    1,300     148       800       -
Concurrency               655       355       570       -
Longest transfer (s)      0.96      28.14     1.16      -
Shortest transfer (s)     0.08      0.15      0.11      -

Results (4 cores)         1,000 users         1,500 users
                          Node.js   PHP       Node.js   PHP*
Number of hits            55,000    5,100     14,000    -
Availability (%)          100       98        93        -
MB transferred            16.02     0.07      4         -
Transaction rate (t/s)    1,800     170       1,700     -
Concurrency               19.6      424       73        -
Longest transfer (s)      0.4       28.16     1         -
Shortest transfer (s)     0         0         0         -

* Aborted (too many errors)

I was really expecting the opposite result; Node.js seems to be incredibly fast in comparison to PHP for these operations.

For the next test, we tried to focus on CPU-intensive operations by running the following algorithm, which counts the prime numbers below N (yes, it could be optimized, but the purpose of the test was to make it CPU-intensive):

JavaScript

var express = require('express');
var app = express();

app.get('/', function (req, res) {
    function isPrime(num) {
        for (var i = 2; i < num; i++) {
            if (num % i === 0) { return false; }
        }
        return true;
    }

    function display(n) {
        var count = 0;
        for (var i = 3; i < n; i += 2) {
            if (isPrime(i)) { count++; }
        }
        console.log(count);
    }
    display(70000);
    res.json({});
});

app.listen(3000, function () {
  console.log('Example app listening on port 3000!');
});

PHP

function isPrime($num) {
    for ($i = 2; $i < $num; $i++) {
        if ($num % $i === 0) { return false; }
    }
    return true;
}

function display($n) {
    $count = 0;
    for ($i = 3; $i < $n; $i += 2) {
        if (isPrime($i)) { $count++; }
    }
    echo $count;
}

display(70000);

My expectation was that PHP would perform much better for this kind of task. These were the results:

           70,000 numbers        100,000 numbers
           Node.js   PHP         Node.js   PHP
Seconds    2         26          2.5       Timed out after ~33 seconds

I don't know what to think anymore. I guess we are not using PHP.
