Background
When I encountered C++ for the first time in a professional setting, I wondered why the language seemed convoluted. (Compared to my prior experience in C#). It felt like every programmer in the team was using a particular dialect of the language. (Some of the developers wrote C++ code in C style, some were "Template" heavy and so on)
When I read "Design and Evolution of C++" by the creator of the language it became very clear why the language was so. Here is a quote from the book that so eloquently captures the rationale...
"My preferences in literature have reinforced this unwillingness to make a decision based on theory and logic alone. In this sense, C++ owes as much to novelists and essayists such as Martin A. Hansen, Albert Camus, and George Orwell, who never saw a computer, as it does to computer scientists such as David Gries, Don Knuth, and Roger Needham. Often, when I was tempted to outlaw a feature I personally disliked, I refrained from doing so because I did not think I had the right to force my views on others. I know that much can be achieved in a relatively short time by the energetic pursuit of logic and by ruthless condemnation of "bad, outdated, and confused habits of thought." However, the human cost is often high. A high degree of tolerance and acceptance that different people do think in different ways and strongly prefer to do things differently is to me far preferable." - Bjarne Stroustrup.
The term Good is subjective. Some engineers prefer cookbook-style writing which focuses on the "how" part. I prefer to understand the "why" part, with the hope that eventually this will let me figure out the "how" part by myself.
I feel that a book's true worth lies not only in its content but in the transformative perspective it provides.
Present
Most of my time nowadays is spent stitching together Microservices and running them on AWS. I learn by:
Reading the documentation of a framework such as Spring Cloud and adhering to recommended best practices.
Going through AWS documentation for the specifics of some AWS service.
Understanding the bigger picture through guides published by InfoQ
Breaking stuff and fixing it.
I felt that I needed a better grounding in the underlying principles of distributed systems. Martin Kleppman's book "Designing Data-Intensive Applications" does a great job of getting the key ideas of distributed systems into one place.
If you are an Engineer who reads Leslie Lamport's papers during breakfast then this book is not for you. If you were fortunate to build Netflix and Google scale systems as part of your day job then this book is not for you.
However, if you are an Engineer building enterprise applications, exhausted battling the latest incarnation of Agile, attending meetings that spawn other meetings, updating documents because "the process dictates so" and still have some energy left at the end of the day to wade into distributed systems, then this book might be for you.
Disclaimer
While reading the book, I made notes as markdown files in a GIT repository. The summary below is like classroom notes. The entries are succinct, not at all original and have just enough information to jog memory.
Any errors found in the summary are attributed to the student (myself) rather than to the teacher (Martin). I encourage you to read the original book for its thoroughness, and expository nature, and also because it is a damned good book.
Here we go...
Client Server Protocols
REST: Model domain entities (Users, Products, Patients) etc as resources and leverage HTTP methods and headers the way they were intended to be used.
SOAP: This protocol is unavoidable if you are integrating with legacy web services.
MQTT : Quite popular in IOT scenarios. MQTT is built on top of TCP and offers a Publisher Subscriber paradigm & Quality of Service (QOS) guarantees.
Custom Protocol: Widely found in system software such as Postgres (Wire Protocol)
Data Formats (On Wire & On Disk)
Most popular wire formats: JSON, XML and CSV
Lesser known formats: Binary JSON formats such as Message Pack and BSON
Apache Thrift
Protocol Buffer: Nice language-neutral format from Google, very efficient payloads.
Apache Avro: Provides the option of writing schema along with the payload.
Apache Arrow: In memory format used in real-time analytics.
Parquet & Orc - On disk format. (Columnar format)
Expectations from a distributed system
Availability: The system should always be up. Availability is measured as a ratio = uptime / (uptime + downtime)
Scalability: Will adding more hardware increase the performance proportionally?
Reliability: What is the probability of the system failing? The mean time between failures is one measure of reliability. Measured as Num of hours the system was up/Num of failures.
Tradeoffs in a distributed system
A system can be either optimized for read or write. It is hard to do both.
A system can be designed to be either strongly consistent or eventually consistent.
The interactions with the system can be either synchronous or asynchronous.
The system should indicate how it behaves in case of network failures to the consumer
Performance
Processing time = The total time taken to process a request.
Latency = The time it takes for the request to reach the server.
Response time = Processing time + Latency.
Average response time is not a good measure of performance since one bad reading can skew the results. Instead, use percentile. If a system says the response time is 250 ms for the 90 percentile, it means 90% of the users experienced a response time of 250 ms or less.
Median response time indicates where the middle exists. (50th percentile)
Throughput is the number of transactions per second. For a batch-processing system, throughput is very important
Data Modeling
Relational Model: The shape of the data is defined in SQL (declarative by nature)
Document Model: The shape of the data is defined as JSON.
Key Value Model: Simplest data model with data identified by a key and represented as a value (either opaque blob or semi-structured JSON)
Graph Model: Model data as vertices and edges. [vertex = person, edge - represents the connection between two people.]. (Example IMDB)
Each vertex has a unique identifier, incoming edges, outgoing edges, and properties represented as key values.
Each edge has a unique identifier, start vertex, end vertex, and label to describe the relationship between vertices & properties represented as key values.
Just like a user would POST JSON document in a document database, in a graph database such as Neo4J the graph is described using Cypher and queried using Cypher.
Scaling data store
Use a single master for read/write and scale up the server by increasing the hardware specification.
Use read replicas - All writes go to a master node, and the data is then replicated onto replica nodes. Reads can be served from replicas.
Shard the database - Distribute the traffic across multiple nodes based on a shard key. The disadvantage of this approach is that adding a new node results in an “incorrect lookup”. To overcome this use consistent hashing (see below).
One way to avoid application-level sharding is to use a system that already handles it at the database level such as Citus. MySQL equivalent of this is Vitess
Use a peer-to-peer replication of data across nodes - No single master. Writes and reads can go to any node. This brings up interesting challenges...
Consistency - What should be the behavior if the nodes in the system do not see the same data?
Availability - Is the system always available for reads & writes?
How should the system behave in case of a network partition? Should the system refuse writes to preserve consistency or allow writes with eventual consistency?
Make consistency tunable - R + W > N.
Total number of nodes (N)
How many nodes should agree for the write to succeed? (W)
How many nodes should agree for a read to succeed? (R)
Consistent Hashing
Decide the number of virtual nodes - Say 100. Think of this as a circle from [0 - 100]
Then pass say something like the server IP through the hashing function. This would yield a value such as 30.
Use the hash value to decide a range of keys that the server would handle. [0-30]
If this server crashes, then the server that precedes this in the ring would automatically start handling the reads for the crashed node.
Replication
Hardware-level replication- Storage Area Network- Replication occurs at the block level.
Operating system level replication - DRBD - Replication occurs at the block level.
Database physical replication - Through WAL file - Information about which byte has changed on the disk is conveyed to the replica
In PostgreSQL, this can happen by either sending WAL files across to the other nodes or through streams.
Streaming replication is at the cluster level (everything gets replicated by default)
Read replicas cannot be modified
Unidirectional replication
Replication can be synchronous. Or a mix of synchronous & async.
Also, quorum based. N servers, first N servers in a sequence.
Database logical replication - Information about which row has changed in which table is conveyed to the replica.
Needed when partial replication is desired.
Postgres has built-in logical replication through publish/subscribe mechanism. Schema is not replicated, only data
pg_logical is an extension, that provides more such capabilities. Replicate only certain columns, replicate rows that match criteria and so on.
Trigger-based replication - Bucardo
Application level replication - You do all the hard work.
Replication Types
Master slave replication. Writes handled by Master/Leader. Reads can be handled by slaves/read replicas/followers
Replication can be synchronous or asynchronous
Semi-synchronous replication - To avoid the system grinding to a halt because a follower has not responded. Seems more like a quorum-based approach.
Common Problems in Replication
When to start replication when a new node is added? - Start with a well-known snapshot of the master, then request changes since that point in time. Eventually, the follower will catch up with the master.
How to elect a new leader, when the current leader has crashed?
What to do when the original leader comes back?
Split brain - What if both nodes believe to be the leaders?
What constitutes a leader's failure? What should be the time duration?
A user submits a comment, the comment goes to the leader, and the data is not synced with the follower. The user reads from the follower and becomes unhappy- [We need "read your writes consistency"]
It might be necessary to decide that some information is always read from the leader.
Some data might disappear if a read replica with a greater lag is queried. Monotonic reads solves this problem by routing a particular user's request always to the same replica.
Reordered writes which result in a confusing comment thread. The answer to a question might appear before the question.
Conflicts - Preserve conflicts and let the application code decide.
The book also covers BTrees, Log Structured Merge Trees, Batch and Stream Processing in quite some depth. Summarizing these pithily would not do justice to the topics. Please also read Alex Petrov's book for the gory details. (I have just dipped my feet in it)
Resources
Please go through the extremely generous bonus in the form of a GIT repository that captures the "references" used in the book.
As always I will keep updating this section if I come across helpful content.