Administrate’s users, the Administrators usually manage a large number of learners which all belong to a specific Organization. The Organizations feature a hierarchy to represent subsidiary Organizations. This relationship is essentially a tree, which helps us represent real world organizations that are usually made up of various teams.
The Problem
In technical terms, there are various ways of representing the above problem.
Administrate uses Amazon Aurora under the hood, so it was representing the above relationship using the Adjacency List pattern. This means that finding the hierarchy involved lots of recursive queries to traverse the tree.
Amazon Aurora is MySQL compatible, which is not ideal for storing hierarchical relationships at the current version, so the above feature was being inefficient for large amounts of data. Also, the need for finding all the Contacts that are part of each Organization arose, which meant that we would need to query all Contacts for every Organization. That would mean that our existing implementation was not ideal to tackle this problem.
Enter the Nested Set Model
As described before, our Organization hierarchy is a non-binary tree which consists of a root node, child nodes and leaves. However, there is another way to think of a tree which fits our needs perfectly. By thinking of the tree as a set where each child node is a subset of the set, we can model the exact same relationship.
Before doing that though, we need to number all of the nodes in the tree. This can be done by traversing the tree and visiting each node twice; once for the “left” and another for the “right” value. The values would be set by increments of one. Starting numbering at 1 for the root node’s “left” value and continuing depth-first to increment each child’s node left by the parent’s left plus one until we get to a leaf, where we would start incrementing the “right” value and following our way back up, setting the “right” for every node until we reach our root node. Following that numbering, we could see that the difference between the “right” and “left” values is the width of the subtree of that node!
Now, thinking of the root node as a set on a numbered horizontal axis, where its “left” and “right” values correspond to the numbers on the axis, we can establish a relationship between those sets. We can follow with each subsequent child node by nesting the set inside our big root set. This leaves us with a set, with individual sets being nested within it, where every set has its boundaries set by the “left” and “right” values on the horizontal axis.
Implementation
Following the explanation above, we can persist this relationship by simply adding two additional columns on our table that correspond to the set’s boundaries. This makes SQL queries extremely easy. For example, to get all the children of a node, we can self join on our table, and then select the nodes where the children’s “left” is between the parent’s “left” and “right”. This provides really fast reads, as instead of recursively querying for data, we are just specifying a condition on two columns.
The challenging part, are the writes as when modifying our hierarchy, we need to re-calculate the entire set! Hopefully, in our case the read frequency is greater than the write so this solution was perfect for our use case. To add a node in the tree, we need to make space for it, which means that we need to expand our set and recalculate the “left” and “right” values to accommodate a new node.
Because Administrate is powered by various apps built at different times and varied technologies, we decided that the best solution to do this was at the root of our data, our MySQL database, using triggers.
The most important, but also easiest trigger is when we are inserting a new row. It’s important because we need to make space in our tree, but it’s also easy because a new row will always be a leaf, so the width of the tree will be incremented by one.
The challenge comes from updating or removing a node as that means that the set needs to be recalculated. For example, moving a node from a different tree to a new one, means that we need to clean up the old tree but also make space in the new one to accommodate the new node, which might also have a subtree attached to it.
Conclusion
We hope that this blog post has helped you identify a different way of representing a hierarchical relationship using MySQL. As everything, they all come with their own strengths and weaknesses so it’s important to consider the right solution for every problem.