hcoelho.com

Organizing data flow with React + Redux


One part of our application had its frontend made with React, taking advantage of its reactivity to state changes, which is very helpful when you are building modern, responsive applications. However, we underestimated the complexity of this system, and maintaining it with React alone became very complicated and tiresome; this is when we decided to adopt one more paradigm: Redux.

This will not be a tutorial; instead, I only want to present a general idea of how all these tools work.

I will first give a quick introduction to how React works: the easiest way to understand it, for me, is to imagine it as a way to make custom HTML elements. For example, say you have the following pattern:

<div>
  <h1>Header</h1>
  <p>Body text goes here</p>
</div>

Wouldn't it be nice if instead of typing all these divs, h1s and ps, you were able to make a custom element with that format (maybe call it Section)? With React, it would be easy:

class Section extends React.Component {

  render() {
    return (
      <div>
        <h1>{this.props.title}</h1>
        <p>{this.props.children}</p>
      </div>
    );
  }

}

Props are parameters passed to the component (like HTML attributes or children); they can be accessed through the this.props object.

Now to render this element with React:

<Section title="Header">
    Body text goes here
</Section>

React also has the concept of state, which refers to the mutable data of a component. For example: a 60W lightbulb would have "60W" as a prop, but whether it is on or off depends on its state.

State is very easy to work with: we set the initial state in the constructor, and every time we need to modify it, we call this.setState with the new values. The component will update itself automatically.

class Lightbulb extends React.Component {

  constructor(props) {
    super(props);
    this.state = { isOn: false };
  }

  toggle = () => {
    this.setState({
      isOn: !this.state.isOn,
    });
  }

  render() {
    let message;
    if (this.state.isOn) {
      message = 'On!';
    }
    else {
      message = 'Off!';
    }

    return (
      <div>
        {message}
        <button onClick={this.toggle}>Click me!</button>
      </div>
    );
  }

}

But things start to get complicated when our application grows: sometimes we need to access the state of one component from another component, or the state needs to be shared; for this, we have to move the state out of the component and into its parent, so the component only receives its values as props.

The tendency, therefore, is for all the state to end up in the root component, and for the child components to only receive props: the state lives in the root component and is passed down the tree as props; similarly, whenever an event happens at the bottom of the tree, it has to bubble up to the top.
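
As a minimal sketch of this pattern (the component names here are my own), the parent owns the state and the child only receives values and callbacks as props:

// The parent owns the state...
class LightbulbPanel extends React.Component {

  constructor(props) {
    super(props);
    this.state = { isOn: false };
  }

  toggle = () => {
    this.setState({ isOn: !this.state.isOn });
  }

  render() {
    // ...and passes the value and the event handler down as props
    return <StatelessLightbulb isOn={this.state.isOn} onToggle={this.toggle} />;
  }

}

// The child has no state of its own: the events it triggers
// "bubble up" through the onToggle callback
class StatelessLightbulb extends React.Component {

  render() {
    return (
      <div>
        {this.props.isOn ? 'On!' : 'Off!'}
        <button onClick={this.props.onToggle}>Click me!</button>
      </div>
    );
  }

}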

This is when better paradigms start to appear: the most popular used to be Flux; now it is Redux.

Redux is more a paradigm than a library - you don't need to use the library itself, but it does provide some useful boilerplate code. It also embraces this tendency of all the state living in a single root, which is called the store: the store is an object that contains the state of the whole application. And this is an important detail: you do not modify the state that lives in the store - you create a new "version" of it, and the old versions can be kept around, which makes logging and debugging extremely easy (tooling such as the Redux DevTools can record the old states for you).

I would abstract the data flow of React + Redux into 5 simple steps:

  1. A component triggers an action (example: a button is clicked)
  2. The action is sent to the reducer (example: turn on the light)
  3. The reducer creates a new version of the state, based on the action (example: { lightsOn: true })
  4. The store gets updated with the new state
  5. The component gets re-rendered based on the new state

#1 A component triggers an action

To make the component trigger an action, we simply pass the function (action) as a prop - the component will then call it whenever the right event happens:


import { connect } from 'react-redux';

import * as actions from './actions'; // assuming our action creators live in './actions'

// In the lines below, we are binding the state from the store,
// as well as a function that dispatches the action to toggle
// the lights on/off. The "dispatch" function is provided
// by Redux - we only need to write the "toggleLight"
// action ourselves

const mapStateAsProps = (state) => ({
  isOn: state.isOn,
});

const mapDispatchAsProps = (dispatch) => ({
  toggle: () => {
    dispatch(actions.toggleLight());
  }
});

class Lightbulb extends React.Component {

  render() {
    let message;
    if (this.props.isOn) {
      message = 'On!';
    }
    else {
      message = 'Off!';
    }

    return (
      <div>
        {message}
        <button onClick={this.props.toggle}>Click me!</button>
      </div>
    );
  }

}

// The 'connect' function is provided by the react-redux
// library; it binds the props and methods to the React
// component. It is called here, after the Lightbulb class
// has been defined.
const LightbulbElement = connect(
    mapStateAsProps,
    mapDispatchAsProps,
)(Lightbulb);

And to render this element:

<LightbulbElement />

#2 The action is sent to the reducer

An action is sent to all reducers automatically every time we use the dispatch method described above. But what does that toggleLight action look like? Like this:

function toggleLight() {
  return {
    type: 'TOGGLE_LIGHT',
  };
}

Actions are usually objects with one or two properties: type and payload. The type property says what kind of action you are performing: every action should have a distinct type. The payload property contains any additional information you need to pass to the reducer.
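
For example, here is a sketch of an action creator that carries a payload (the changeColour name is my own, but the CHANGE_COLOUR type matches a reducer shown in the next step):

// A hypothetical action creator that carries a payload:
// the new colour we want the state to have
function changeColour(colour) {
  return {
    type: 'CHANGE_COLOUR',
    payload: colour,
  };
}

// Dispatching it would look like: dispatch(changeColour('red'));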

#3 The reducer creates a new version of the state, based on the action

Reducers are responsible for replacing the current state of the application with a new one. For every attribute in the state (for example, say our state object contains the attributes "isOn" and "colour"), we should have a distinct reducer - this will ensure that one reducer will not modify an attribute that does not belong to it.

In our case, since we only have one attribute (isOn), we create only one reducer; it checks the action type to make sure that piece of the state should be changed, and if it should, it creates a new version of the state and returns it:

// This function receives "state", which is the previous state in our store,
// and "action", which is the action dispatched
function isOnReducer(state = false, action) {
  switch (action.type) {

    case 'TOGGLE_LIGHT':
      return !state;

    default:
      return state;

  }
}

In another scenario, say we are receiving a payload and we are going to modify a piece of state that is an object:

function myOtherReducer(state = { colour: 'black', opacity: 1.0 }, action) {
  switch (action.type) {

    case 'CHANGE_COLOUR':
      // Notice that I am using the spread operator (...) to create a new object
      // and recover the values of the previous state; then overriding the colour
      // with what I received from the payload
      return { ...state, colour: action.payload };

    case 'CHANGE_OPACITY':
      return { ...state, opacity: action.payload };

    default:
      return state;

  }
}

#4 The store gets updated with the new state

This part is done automatically by Redux; we only need to give it our reducer:

import { createStore } from 'redux';

import { isOnReducer } from './reducers';

const store = createStore(isOnReducer);

export default store;
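
If we had more than one reducer (like isOnReducer and myOtherReducer above), we could combine them with Redux's combineReducers - each reducer then manages one key of the state object, which is also what makes state.isOn work in mapStateAsProps. A sketch, with slice names of my own choosing:

import { createStore, combineReducers } from 'redux';

import { isOnReducer, myOtherReducer } from './reducers';

// Each reducer manages its own slice of the state; the resulting
// state object looks like { isOn: ..., appearance: ... }
const rootReducer = combineReducers({
  isOn: isOnReducer,
  appearance: myOtherReducer,
});

const store = createStore(rootReducer);

export default store;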

#5 The component gets re-rendered based on the new state

This is also done automatically: the connect function detects whether the parts of the store that a component uses have changed - if they have, the component gets re-rendered.
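
For completeness, this is roughly how the store gets wired into the React tree, using the Provider component from react-redux (the file paths here are my own assumptions):

import React from 'react';
import ReactDOM from 'react-dom';
import { Provider } from 'react-redux';

import store from './store';                    // the store created in step 4
import { LightbulbElement } from './Lightbulb'; // the connected component from step 1

// Provider makes the store available to every connected
// component below it in the tree
ReactDOM.render(
  <Provider store={store}>
    <LightbulbElement />
  </Provider>,
  document.getElementById('root')
);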

cdot redux javascript react 

Prototyping a calculated field on MongoDB for quick access


The next phase of our project will be a content recommendation system for the users who visit our website: we will consider their past preferences (article category, for example) in order to recommend new content. This system needs to be fast and must not use the database unnecessarily, since it will run on every visit of every user. Considering that all the data we gather from our users is spread among several collections in our database, we cannot afford an expensive, slow operation with joins; we need a way to make this operation fast and cheap.

Calculated values are a great way to turn expensive and slow operations into very simple queries; however, they have a drawback: how do we keep them synchronized? Our solution for this problem was to use a collection that contains all the hits made by a user, which we called a "session" (a session contains many hits); every time the user makes a new hit, we use the information from this hit to update the history we keep in the session - this ensures that the calculated fields will always be up to date.

For example, assuming this is our current history for the user:

Session
{
    history: {
        visitedIds: [1, 2, 3, 4, 5, 6, 7, 8],
        articlesVisited: 5,
        videosVisited: 3,
    }
}

The history says that the user visited 5 articles and 3 videos; the IDs visited (of the articles and videos, assuming they are stored in the same collection) are 1, 2, 3, 4, 5, 6, 7, and 8.

If the user makes another hit on another article (say article #9), the history in the user's session would change to:

Session
{
    history: {
        visitedIds: [1, 2, 3, 4, 5, 6, 7, 8, 9],
        articlesVisited: 6,
        videosVisited: 3,
    }
}

Changes like these are very easy to make with MongoDB. To push the new ID into the array, we can simply use the $push (allows duplicate values) or the $addToSet (adds only unique values) operator:

db.sessions.update({
    _id: <session id>
}, {
    $addToSet: {
        "history.visitedIds": <article id>
        // In our case, the article id would be "9"
    }
});

Likewise, it is easy to increment values, such as the number of articles visited, with the $inc operator:

db.sessions.update({
    _id: <session id>
}, {
    $inc: {
        "history.<field to increment>": 1
        // In our case, the field to increment would be "articlesVisited"
    }
});

Joining them together:

db.sessions.update({
    _id: <session id>
}, {
    $addToSet: {
        "history.visitedIds": <article id>
    },

    $inc: {
        "history.<field to increment>": 1
    }
});

This takes care of keeping the calculated fields up to date with a simple operation.
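
To make the placeholders concrete: for the example above (the user hits article #9), the combined update would look like this (the sessionId variable is hypothetical):

db.sessions.update({
    _id: sessionId // hypothetical variable holding the session's _id
}, {
    $addToSet: {
        "history.visitedIds": 9 // the article the user just hit
    },

    $inc: {
        "history.articlesVisited": 1
    }
});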

Now we get to another detail: the calculated field we are keeping is not in the exact format we want. For example: instead of just the raw numbers of visits a user made, couldn't we have them as percentages? This would help us group users into clusters, if we so desire; for example:

Session 1
{
    history: {
        visitedIds: [1, 2, 3, 4, 5, 6],
        articlesVisited: 4,
        videosVisited: 2,
    }
}

Session 2
{
    history: {
        visitedIds: [1, 2, 3],
        articlesVisited: 2,
        videosVisited: 1,
    }
}

Despite the user from Session 2 having fewer visits than the user from Session 1, their preferences are actually similar: both visited twice as many articles as videos. We could abstract these preferences like this:

Session 1
{
    history: {
        visitedIds: [1, 2, 3, 4, 5, 6],
        articlesVisited: 4,
        videosVisited: 2,
    },
    affinity: {
        articles: 66.666,
        videos: 33.333,
    }
}

Session 2
{
    history: {
        visitedIds: [1, 2, 3],
        articlesVisited: 2,
        videosVisited: 1,
    },
    affinity: {
        articles: 66.666,
        videos: 33.333,
    }
}

This could be done after we pull the data, on the server, or directly in the database. If we do it in the database, we can use the aggregation framework on MongoDB to make this calculation:

First, we get the total number of visits. For this, we can use the $project operator to sum the number of visits on articles and videos:

db.test.aggregate([
    { $project: {

        _id: 1, // Keep the ID

        history: 1, // Keep the history

        // Creating the "totalVisits" field by adding the visits together
        totalVisits: { $add: [
            "$history.articlesVisited",
            "$history.videosVisited"
        ]}
    }}
])

This would be the result:

Session 1
{
    history: {
        visitedIds: [1, 2, 3, 4, 5, 6],
        articlesVisited: 4,
        videosVisited: 2,
    },
    totalVisits: 6,
}

Now that we have the total of visits, we can do some arithmetic ($multiply and $divide for multiplication and division) to find the percentage of the categories with another $project:

db.test.aggregate([
    { $project: {
        _id: 1,
        history: 1,
        totalVisits: { $add: [
            "$history.articlesVisited",
            "$history.videosVisited"
        ]}
    }},

    { $project: {

        _id: 1,

        history: 1,

        // We don't project the totalVisits here, if we want to omit it

        affinity: {
            articles: { $multiply: [
                { $divide: [
                    "$history.articlesVisited", "$totalVisits"
                ]},
                100
            ]},

            videos: { $multiply: [
                { $divide: [
                    "$history.videosVisited", "$totalVisits"
                ]},
                100
            ]}
        }
    }}
])

And this will be the result:

{
    history: {
        visitedIds: [1, 2, 3, 4, 5, 6],
        articlesVisited: 4,
        videosVisited: 2,
    },
    affinity: {
        articles: 66.666,
        videos: 33.333,
    }
}
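
For comparison, doing the same calculation on the server after pulling the session is just a few lines of JavaScript (the computeAffinity helper is hypothetical):

// A hypothetical helper that turns the raw visit counters
// into the "affinity" percentages shown above
function computeAffinity(history) {
  const totalVisits = history.articlesVisited + history.videosVisited;

  if (totalVisits === 0) {
    return { articles: 0, videos: 0 };
  }

  return {
    articles: (history.articlesVisited / totalVisits) * 100,
    videos: (history.videosVisited / totalVisits) * 100,
  };
}

// computeAffinity({ articlesVisited: 4, videosVisited: 2 })
// -> { articles: 66.66..., videos: 33.33... }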

In this example, the categories were "hard-coded": we will have more than just "articles" and "videos", but this example was only to show that what we are envisioning can be done - we just need a more elaborate schema and a more intelligent algorithm.

cdot mongo database 

Simulating Inner Joins on MongoDB


Probably one of the most important features of SQL, for more complicated queries, is the ability to join data from several tables and group it; for example: having a table of users and a table of messages, and then joining them to get the users as well as their messages. There are several types of joins in SQL:

[Diagram: the different types of SQL joins]

Now, for our project, we are using MongoDB, a NoSQL database - how do joins work, in this case? Say we have two collections on MongoDB that follow this schema:

Users: {
    _id: Number,
    name: String
}

Messages: {
    _id: Number,
    text: String,
    creator: Number // References _id in Users
}


And a sample of the data:

Users: [{
    _id: 100,
    name: "John",
}, {
    _id: 101,
    name: "Paul"
}]

Messages: [{
    _id: 200,
    text: "Hello, how are you?",
    creator: 101
}]

And now I want to get all the messages, as well as the creator's name. Is there an easy way to do this? There is: with Mongoose, we can declare these relationships in the schema and use the populate method to join the two pieces together:

Schema:
Users: {
    _id: { type: Number },
    name: { type: String }
}

Messages: {
    _id: { type: Number },
    text: { type: String },
    creator: { type: Number, ref: 'Users' } // References _id in Users
}


Joining:
Messages.find({})
        .populate('creator')
        .exec((err, docs) => {
            if (err) { throw err; }
            console.log(docs);
        });

This would give us an output similar to this:

[{
    _id: 200,
    text: "Hello, how are you?",
    creator: {
        _id: 101,
        name: "Paul"
    }
}]

Good enough, right? Ok. But the problem is that this is a left join: if there was a message without a creator, it would still be selected. So, what if I want an inner join? Short answer: you can't. MongoDB does not support inner joins. This is fine for most scenarios: you can simply filter the data afterwards to get rid of the incomplete documents; but it becomes a problem when you run into memory issues, which is exactly what we faced while developing one of our modules - and there it would be a really big problem. Luckily, we have algorithms on our side!
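
(As an aside, this is what "filtering afterwards" would look like with the populated output from the previous snippet:)

// Keep only the messages whose creator actually populated -
// Mongoose leaves the field as null when the referenced
// document does not exist
const innerJoined = docs.filter((doc) => doc.creator !== null);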

In our case, execution time is not a big issue, but we must do inner joins across many collections (often more than 5) and memory is a limiting factor, so we tried to get the best out of this scenario. I designed a module that does the inner joins manually for us and saves as much memory as possible; this is how I did it:

1- The most specific queries, on the most sparsely populated collections, happen first: if you are looking for "all the users that use IE6", it is a much better idea to "look for the ID of the IE6 browser in the database, and then fetch the users that have that ID in their entry" than to "get all the users, select all their browsers, and then keep only the ones that use IE6".

2- Every query we run builds up more conditions for the next one: if you want all the users that use IE6, as long as they live in Canada, you "find the ID of the IE6 browser, then find the addresses within Canada, and then query for the users - but only the ones that match the accepted addresses and browser", instead of simply getting all the users at the end and joining the information.

3- Leave extra information for the end: if, in addition to the users from the previous case, you also want their messages, first you find all the users that match those conditions and then you find their messages, instead of scanning all the messages and then joining them with the users that matched the conditions.

4- If a query returned too many results even with conditions, try it again later: following rule #2, it is likely that if you let other queries run first, you will end up with more conditions to refine the search even further. For example: if your first search for browsers returned too many results, but the next search (the one for users) returned only 1 result, your next query for browsers will only need to find the browser for that particular user.

Following these 4 rules, I managed to come up with a module that makes inner joins on MongoDB for our project: you pass it a JSON object with the conditions you want, and it runs the queries and joins the results for you automatically. For example:

stores.Users.execute({
    Users: {
        name: { contains: 'John' }
    },
    Browsers: {
        name: { contains: 'IE6' }
    },
    Address: {
        country: { matches: 'CA' }
    }
});

The snippet above would select all, and only, the users that have "John" in their names, live in Canada, and use IE6.

I can't believe it actually works.

cdot mongo join database