Improving Software Development with Learnings from LSMs

Most developers out there must have heard of the likes of Cassandra and RocksDB. These are very popular pieces of technology which have seen a huge uptick in popularity in recent years due to their unique ability to handle large amounts of data. One of the primary reasons for their success can easily be attributed to LSMs and their core competency of handling large write volumes. LSM is an acronym for Log-Structured Merge Tree, so if you have not heard about it, you should definitely go and check out this article. Having said that, in this blog post we will take some learnings from LSMs and see how we can use them to improve feature development throughput.

Log-Structured Merge Trees

With the advent of large amounts of data, there has been a huge demand for databases which can handle large write throughput. Traditional databases like MySQL or PostgreSQL were built from the ground up under the assumption of providing the best read performance for different kinds of queries on a given amount of data. LSMs, on the other hand, were built from the ground up to improve write throughput.

In plain and simple words, LSMs comprise the following concepts:

  • Sequential Writes
  • Background Reconciliation of Batches
  • Immutable Data Structures

 

[Figure: LSM tree structure]

 

So as the data comes pouring in, LSMs buffer that data in memory and then write the buffered batches to disk sequentially. Once flushed, these buffered batches are immutable and hence cannot be updated. Because the on-disk entities are immutable, any new value of a key goes to the in-memory batch, and at read time all the data is read back from disk and reconciled before presenting the final result to the user. LSMs also have optimisations which reconcile the on-disk batches at regular intervals to improve read throughput.
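To make this concrete, here is a minimal, hypothetical sketch of that write/read path in plain Python. The class `TinyLSM` and the constant `MEMTABLE_LIMIT` are made up for illustration; this is not how Cassandra or RocksDB actually implement things, just the idea of buffering, immutable flushes, read-time reconciliation and background compaction.

```python
# Toy LSM sketch (illustrative only): writes hit an in-memory buffer,
# full buffers are flushed as immutable batches, reads reconcile
# newest-to-oldest, and compaction merges batches outside the write path.

MEMTABLE_LIMIT = 4  # flush threshold, kept tiny for demonstration


class TinyLSM:
    def __init__(self):
        self.memtable = {}   # in-memory buffer of recent writes
        self.sstables = []   # immutable "on-disk" batches, oldest first

    def put(self, key, value):
        # Writes only touch the in-memory buffer; an already-flushed
        # batch is never modified.
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        # The buffered batch is written out sequentially and becomes immutable.
        self.sstables.append(dict(self.memtable))
        self.memtable = {}

    def get(self, key):
        # Reads reconcile: the memtable wins, then newer batches shadow older ones.
        if key in self.memtable:
            return self.memtable[key]
        for batch in reversed(self.sstables):
            if key in batch:
                return batch[key]
        return None

    def compact(self):
        # Background reconciliation: merge all batches into one so that
        # future reads touch fewer structures.
        merged = {}
        for batch in self.sstables:
            merged.update(batch)  # later batches overwrite older values
        self.sstables = [merged] if merged else []


db = TinyLSM()
for i in range(10):
    db.put(f"key{i % 5}", i)   # repeated keys leave stale copies in old batches
print(db.get("key1"))          # reconciled at read time, the newest value wins
db.compact()                   # stale copies are discarded in the background
```

Note how compaction never sits in the `put` path; it is a separate step that only exists to keep reads cheap.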

RUM Conjecture

One of the important learnings from LSMs is that they rely heavily on sequential writes, even in the case of updates, to improve write throughput. On the contrary, MySQL or PostgreSQL do not write sequentially. The write pattern in MySQL or PostgreSQL depends on a whole variety of factors, ranging from the update frequency of a key to the indexes present in the tables. MySQL or PostgreSQL index structures are simply built to optimise read latency at the expense of write latency.

So it seems there is a tradeoff between read latency (MySQL, PostgreSQL) and write latency (Cassandra, RocksDB). The RUM Conjecture simplifies and explains these tradeoffs which come to light while designing database systems: given Read, Update and Memory overheads, optimising for any two comes at the cost of the third.

[Figure: The RUM Conjecture]

 

As the diagram above illustrates, given a limited amount of memory, a database either optimises for read performance or it optimises for write performance.
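As a rough illustration of that tradeoff (a hypothetical toy comparison in plain Python, not a benchmark of any real engine), compare an append-only log, which is cheap to write but expensive to read, with a store that keeps its keys sorted and pays on every write so that reads become a binary search:

```python
import bisect

# Two toy stores illustrating the read/write tradeoff. Names are made up.

class AppendOnlyLog:
    """Write-optimised: every write is a cheap sequential append."""
    def __init__(self):
        self.entries = []                    # (key, value) pairs in arrival order

    def put(self, key, value):
        self.entries.append((key, value))    # O(1) append, no reorganisation

    def get(self, key):
        # Reads pay the price: scan backwards until the latest value is found.
        for k, v in reversed(self.entries):
            if k == key:
                return v
        return None


class SortedStore:
    """Read-optimised: keys are kept sorted, like a simplified index page."""
    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        # Writes pay for the ordering: find the slot and shift elements.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        # Reads are a cheap binary search.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None
```

The log's `put` is a constant-time append while its `get` may scan everything; the sorted store flips that balance. This is exactly the tension the RUM conjecture formalises.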

Applying the learnings from RUM Conjecture to Software Development

Described more generally, the RUM conjecture simply states that:

  • You can optimise for high write throughput, or
  • You can optimise for high read throughput

Now I will take the above two statements from the RUM conjecture and apply them to a software developer. A software developer can either:

  • Optimise for high code / feature throughput.
    • This essentially translates to how the developer can get more and more features into the product. These features in turn enrich the product and make it sell-worthy among its peers.
  • Optimise for high code read throughput.
    • This translates to how fast a group of developers can write code which is easily readable and understandable in terms of design decisions and code flow.

Given the highly competitive times we are living in, time to market of a feature is one of the crucial aspects which decides the fate of many startups. Nowadays startups need a very low time to market if they want to survive. So given these predicaments, a startup normally chooses to optimise for FEATURE throughput, whereas big companies which already have a large market share do not focus as much on FEATURE throughput as they focus on CODE READ throughput.

Startups opting for high write / feature throughput should learn some tricks from the likes of LSMs to make sure that the FEATURE CODE is always well maintained and designed. As already mentioned in the section above, for an LSM to maintain high write throughput and good read throughput, it always writes sequentially in batches and reconciles these batches in the background. Most importantly, this reconciliation does not come into the critical path for writes. So startups aiming for high feature throughput should use the same model for feature development, i.e.:

  • Develop the feature as fast as possible with minimal overhead in terms of design. During the development of a feature and new product requirements, there will be many times when one has to decide between rewriting and refactoring a lot of code, or extending the current design to fit the new requirements. If we are going to follow the LSM model and focus on write throughput, then we need to get the feature out ASAP by using the existing design in place. Refactoring should ideally never come into the critical path of developing features.
  • Refactoring is an effort which should always keep going at regular intervals, in parallel. I often see refactoring as a way to improve the quality and readability of the code. The code refactoring step in the software development process is very similar to the reconciliation of batches in LSMs. Continuous refactoring in parallel is very important so that it does not impact the current set of features, if you want to maintain a high feature throughput.

One natural question which comes to mind is why one cannot develop features and refactor simultaneously, because at the end of the day the number of development hours is the same either way. But moving from a feature mindset to a refactoring mindset requires a context switch, starting all the way from the developer’s mindset to the testing methodology to the review process. Basically, we benefit much more if we do feature development in a single go and after that do the reconciliation of the code, i.e. refactoring. This is very much like how write throughput benefits from sequential writes.

[Figure: Phase 1: Write Phase]
[Figure: Phase 2: Compaction Phase]

Having the feature development process and refactoring as separate entities altogether can help us define the problem statement for each of them in a much better way. Think of it like this: when a developer is developing a feature, the sole end goal in their mind is to get the feature out of the door. And when the developer is working on refactoring the code base, the sole criterion in their head is to make the code base as readable as possible. But if we intermingle these two steps, the end goal gets fuzzy because now the developer has two orthogonal problems to optimise for, i.e. faster feature development as well as readable code. Hence, having two parallel efforts, one for faster feature development and one for code refactoring, should bear better results.

Conclusion

We live in a world where the time to market of a product is of utmost importance. So if there are ways in which we can improve our software development methodology by learning how LSMs achieve such high write throughput, that will help us get better at developing great products / features at a great pace. Isn’t this what everyone wants? 😉

P.S.: I have tried to follow this model of having separate development cycles for feature tasks and refactoring for quite some time and have seen some improvements in terms of execution speed.

