Microservices: Issues Associated with RDBMS and Strong Data Consistency
The goal of this article is to explain the issues associated with relational database management systems (RDBMS) and strong data consistency in a microservice ecosystem. It also serves as a primer for a series of articles about microservice architectural patterns (CQRS, Event Sourcing, Polyglot Persistence, etc.) that I want to share with the .NET community.
Relational Database Management Systems (RDBMS)
Software development often starts with thorough information gathering. Software developers, business analysts and product owners spend long hours in meeting rooms doing exactly that.
After carefully analyzing the information gathered, traditional development teams kick off application development by designing a relational database schema to store and organize the data that an OLTP application needs to present its current state. The ACID guarantees (Atomicity, Consistency, Isolation and Durability) are a big part of why starting from a relational schema is so popular with small and medium enterprises.
RDBMS served the enterprise systems of the previous computing era (the 70s, 80s and 90s) well, but things changed as the new millennium arrived. The explosion of the Internet's popularity pushed most large companies to offer online transaction processing systems to gain an edge over their competitors.
These large enterprises (Amazon, Facebook, eBay, Alibaba, Netflix, etc.) were so successful that they attracted millions of users to their web applications, which generated massive loads of data-related HTTP traffic in their systems. From a database administrator's point of view, this meant chaos, because an RDBMS is primarily designed to present "Consistent Data".
Strong Data Consistency
Implementing strong data consistency on a relational database is synonymous with implementing different forms of data LOCKING, which ensure that every read/pull transaction performed on the database presents the current and consistent state of the application to the requesting parties.
Data locking strategies are perfectly acceptable for small and medium data management, but the story flips for large-scale data management. Locking temporarily queues incoming read/pull requests, which directly reduces server throughput. For large enterprises like Amazon, a widely cited estimate put the cost of a one-second delay in page load time at a whopping 1.6 billion US dollars in sales per year.
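To make the queuing effect concrete, here is a minimal sketch, not a benchmark of any real database. The function name `serve_requests`, the request count and the 20 ms "work" per request are all assumptions made up for illustration. It simply compares ten simulated reads that must take the same lock against ten that may overlap.

```python
import threading
import time

def serve_requests(n_requests, work_seconds, lock=None):
    """Run n_requests concurrent 'reads' and return the total wall-clock seconds."""
    def handle():
        if lock is not None:
            with lock:                     # each request waits for the one before it
                time.sleep(work_seconds)
        else:
            time.sleep(work_seconds)       # requests are free to overlap

    threads = [threading.Thread(target=handle) for _ in range(n_requests)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Hypothetical workload: 10 requests, 20 ms of simulated work each.
locked = serve_requests(10, 0.02, lock=threading.Lock())
unlocked = serve_requests(10, 0.02)
print(f"with lock: {locked:.3f}s, without lock: {unlocked:.3f}s")
```

With the lock, the total time grows with the number of requests because they are served one after another. Without it, the total stays close to the cost of a single request.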
RDBMS Pain Points
Slow bootstrap time
Designing an RDBMS with strong consistency requires upfront analysis to design a schema. For large enterprises, that delay translates into a lost competitive edge.
Relational Models Hide Domain Intent
Relational models can hide the domain meaning of entities over time. For example, a single user record could stand for a Buyer, a Receiver or a System Administrator in the domain.
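A small sketch of that point, under made-up assumptions: the `user_row` shape, the `role_code` values and the `ROLE_TYPES` mapping are hypothetical and not taken from any real schema. The generic row only carries a numeric code, while the domain types say out loud what the person actually is.

```python
from dataclasses import dataclass

# Hypothetical row from a generic "users" table: the role is just a number.
user_row = {"id": 7, "name": "Ana", "role_code": 2}

@dataclass(frozen=True)
class Buyer:
    user_id: int
    name: str

@dataclass(frozen=True)
class Receiver:
    user_id: int
    name: str

@dataclass(frozen=True)
class SystemAdministrator:
    user_id: int
    name: str

# Assumed mapping from role codes to domain types, for illustration only.
ROLE_TYPES = {1: Buyer, 2: Receiver, 3: SystemAdministrator}

def to_domain(row):
    """Turn a generic user row into the domain type its role code stands for."""
    role_type = ROLE_TYPES[row["role_code"]]
    return role_type(user_id=row["id"], name=row["name"])

person = to_domain(user_row)
print(type(person).__name__)
```

Nothing in the row itself tells a reader that `2` means Receiver. That knowledge lives outside the schema.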
Schema Changes are Risky
Changing a database that multiple live applications currently consume is challenging and risky unless a thorough impact analysis is performed first.
Low throughput and availability
An RDBMS that manages large datasets is prone to locking and contention on database objects.
Geographical replication and dispersal adds to the throughput chaos
Databases designed to be hosted in a single location don't work well with geographical dispersal strategies, because federated servers often rely on quorums to achieve consistency across a geographically dispersed schema, and every quorum round trip adds latency.
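As a rough illustration of why quorums hurt once servers are spread out, the sketch below assumes a simple majority quorum and made-up round-trip times in milliseconds. The helper names `majority` and `quorum_commit_latency_ms` are hypothetical. A write is treated as committed once a majority of replicas have acknowledged it, so its latency is set by the slowest replica inside that majority.

```python
def majority(n_replicas):
    """Smallest number of replicas that forms a majority."""
    return n_replicas // 2 + 1

def quorum_commit_latency_ms(round_trips_ms):
    """Time until a majority of replicas have acknowledged a write."""
    needed = majority(len(round_trips_ms))
    return sorted(round_trips_ms)[needed - 1]

# Hypothetical round-trip times from the coordinating server to each replica.
single_site = [1, 1, 2]        # all replicas in one data center
geo_dispersed = [1, 80, 150]   # replicas on three continents

print(quorum_commit_latency_ms(single_site))    # 1
print(quorum_commit_latency_ms(geo_dispersed))  # 80
```

In this toy model, every strongly consistent write pays at least one long-distance round trip as soon as the majority can no longer be formed inside a single site.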
Database Management Systems that can disperse geographically are expensive
DBs that can be dispersed geographically with low locking can cost you some serious money. I know banks and Amazon are rich, but spending money on unnecessary stuff is not a good way to make more of it.
Building RDBMS often requires knowledge of SQL
Not all developers know SQL that well. Most of them can write working queries but are not aware of the intricacies of set-based programming, because they are more familiar with nasty nested loops.
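Here is a small sketch of the difference, using Python's built-in `sqlite3` module and a made-up `customers`/`orders` schema (the table and column names are assumptions for illustration). Both approaches compute the order total per customer. The first matches rows in nested loops on the client, the second lets the database engine do the join and aggregation in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# Row-by-row: pull both tables and match them in nested loops.
customers = conn.execute("SELECT id, name FROM customers").fetchall()
orders = conn.execute("SELECT customer_id, total FROM orders").fetchall()
looped = {}
for cust_id, name in customers:
    for order_cust_id, total in orders:
        if order_cust_id == cust_id:
            looped[name] = looped.get(name, 0.0) + total

# Set-based: a single statement, the engine handles the join and the sum.
set_based = dict(conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
"""))

print(looped)
print(set_based)
```

The two results are the same, but the looped version compares every customer with every order, which gets painful quickly as the tables grow.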
Developers like to program using their domain abstractions
As item two suggests, the majority of developers love working with their domain abstractions (Hello POCO and POJO fans!).
Accessing RDBMS requires knowledge of Object Relational Mappers
Pulling data from a database typically involves an ORM that transforms database layer objects into their domain layer counterparts. There are a lot of options out there, from full ORMs such as Entity Framework to micro-ORMs such as Dapper, both built on top of ADO.NET, and some of them don't perform well (EF, Ahem!!) on large data sets.
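To show what that transformation boils down to, here is a hand-rolled sketch with Python's built-in `sqlite3` module. It is not how Entity Framework or Dapper work internally. The `Product` class, the `products` table and the `fetch_products` helper are assumptions invented for this example.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Product:
    """Domain-layer object, independent of how the data is stored."""
    id: int
    name: str
    price: float

def fetch_products(conn):
    """Map each database row onto a Product, which is the core job of an ORM."""
    conn.row_factory = sqlite3.Row   # lets us read columns by name
    rows = conn.execute("SELECT id, name, price FROM products ORDER BY id")
    return [Product(id=r["id"], name=r["name"], price=r["price"]) for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 'Keyboard', 49.9)")

products = fetch_products(conn)
print(products)
```

A real ORM adds change tracking, query generation and caching on top of this mapping step, and that extra machinery is typically where the overhead on large data sets comes from.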
"It is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees: Consistency. Availability. Partition tolerance."
So WTH can we do about it?
Yes, the CAP theorem states that a distributed data store cannot provide Consistency, Availability and Partition Tolerance all at once, but we can bargain for "Eventual Consistency" in exchange for high availability and partition tolerance.
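A toy sketch of what that bargain looks like, assuming a single primary that acknowledges writes immediately and ships them to replicas later. The `Primary` and `Replica` classes and the `stock:42` key are hypothetical and do not model any particular product.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()            # changes not yet shipped to replicas

    def write(self, key, value):
        self.data[key] = value            # acknowledged right away, no waiting
        self.pending.append((key, value))

    def replicate(self):
        """Apply all pending changes to every replica, in write order."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value

replica = Replica()
primary = Primary([replica])

primary.write("stock:42", 9)
stale_read = replica.data.get("stock:42")   # None: the replica has not caught up yet
primary.replicate()
fresh_read = replica.data.get("stock:42")   # 9: the replica has converged

print(stale_read, fresh_read)
```

The write never waits for the replica, so the system stays available, and the price is a window during which a reader can see old data.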
So how can we implement eventual consistency?
Well, there are a number of microservice architectural patterns (Event Sourcing, CQRS, Compensating Transaction, Materialized Views, Bounded Context, Polyglot Persistence, etc.) that help implement an eventually consistent data strategy. I will do my best to create prototypes and present them in my upcoming articles.
Achieving strong consistency in a distributed system comes at the cost of the application's availability, which is a big "NO, NO, NO Lah" for large enterprises. Easing down to "Eventual Consistency" in exchange for high availability and fault tolerance is usually the preferred option, because it enables a large-scale distributed system to handle massive volumes of data.
- Microservices: Picking the .NET Framework for your containerized applications.
- Microservices: Why choose Containers over Virtual Machines.