Designing Data-Intensive Applications - Notes
These are notes I took while reading the well-known book Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann. Here they are:
Three concerns that are important in most software systems:
Reliability
“continuing to work correctly, even when things go wrong.” The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
Faults and failures:
Faults and failures are two different things. A fault is one component of the system deviating from its spec or expectation, whereas a failure is when the system as a whole stops providing the required service to the user. It is not possible to reduce the probability of a fault to zero, so it is usually best to design fault-tolerance mechanisms that prevent faults from causing failures.
The main types of faults are:
Hardware faults: mainly caused by hardware breaking down. One good way to tackle this is to add redundant hardware as a backup, and to use software fault-tolerance techniques in preference or in addition to hardware redundancy, so that the system can tolerate the loss of an entire machine.
Software faults: often unpredictable. The bugs that cause these faults often lie dormant for a long time until they are triggered by an unusual set of circumstances. What helps: carefully thinking about assumptions and interactions in the system; thorough testing; process isolation; allowing processes to crash and restart; and measuring, monitoring, and analysing system behaviour in production.
Human errors: mistakes made by the people who design, build, and operate the system. Good ways to reduce them are testing at all layers and good monitoring of the production environment. Also provide a sandbox environment where people can explore and experiment safely before anything touches real data in production.
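To make the idea of "preventing a fault from causing a failure" concrete, here is a minimal sketch of redundancy handled in software. Everything in it (the Replica class, ReplicaDown, fault_tolerant_read) is a hypothetical name made up for this note, not something from the book: a read is tried against each replica in turn, so one faulty replica does not stop the system from serving the user.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve a request (a fault)."""


class Replica:
    """Hypothetical in-memory replica used only for illustration."""

    def __init__(self, name, data, healthy=True):
        self.name = name
        self.data = data
        self.healthy = healthy

    def read(self, key):
        if not self.healthy:
            raise ReplicaDown(f"{self.name} is unavailable")
        return self.data[key]


def fault_tolerant_read(replicas, key):
    """Try each replica in turn; fail only if every replica is faulty."""
    errors = []
    for replica in replicas:
        try:
            return replica.read(key)
        except ReplicaDown as exc:
            # A single fault: record it and move on to the next replica.
            errors.append(str(exc))
    # Every replica faulted, so now the system as a whole has failed.
    raise RuntimeError("all replicas failed: " + "; ".join(errors))


notes = {"note-1": "buy milk"}
replicas = [
    Replica("replica-a", notes, healthy=False),  # simulated hardware fault
    Replica("replica-b", notes),
]
result = fault_tolerant_read(replicas, "note-1")
print(result)
```

The fault on replica-a is tolerated because replica-b can still answer. Only when all replicas are down does the caller see a failure.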
Reliability is important not just for critical applications but also for non-critical ones. Think of an application that stores notes: a user has written notes in it for a year and suddenly the database is corrupted. Would they know how to restore it?
There are situations in which we may choose to sacrifice reliability in order to reduce development cost (e.g., when developing a prototype product for an unproven market) or operational cost (e.g., for a service with a very narrow profit margin), but we should be very conscious of when we are cutting corners.
Scalability
Scalability refers to a system's ability to cope with increased load. Load is described with a few numbers called load parameters. Depending on the system, a load parameter may be requests per second to a web server, the ratio of reads to writes in a database, the number of simultaneously active users in a chat room, the hit rate on a cache, or something else.
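As a rough illustration of load parameters, the sketch below computes three of the ones mentioned above from a small request log. The Request record, its fields, and the sample numbers are all assumptions invented for this example. A real system would read these from its metrics or access logs.

```python
from dataclasses import dataclass


@dataclass
class Request:
    kind: str        # "read" or "write"
    cache_hit: bool  # only meaningful for reads


def load_parameters(requests, window_seconds):
    """Summarise a window of requests as a few load parameters."""
    reads = sum(1 for r in requests if r.kind == "read")
    writes = sum(1 for r in requests if r.kind == "write")
    hits = sum(1 for r in requests if r.kind == "read" and r.cache_hit)
    return {
        "requests_per_second": len(requests) / window_seconds,
        "read_write_ratio": reads / writes if writes else float("inf"),
        "cache_hit_rate": hits / reads if reads else 0.0,
    }


# Made-up sample: 8 reads (6 served from cache) and 2 writes in 5 seconds.
sample = (
    [Request("read", True)] * 6
    + [Request("read", False)] * 2
    + [Request("write", False)] * 2
)
params = load_parameters(sample, window_seconds=5)
print(params)
```

Which parameters matter depends on the architecture: a read-heavy system cares about the read/write ratio and the cache hit rate, while a chat service cares more about concurrent users.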
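Check out the Twitter example in the book. Reading a home timeline can be done either by querying a relational DB for the tweets of everyone the user follows, or by maintaining a cache of each user's home timeline that is updated whenever someone they follow posts. Twitter ended up with a hybrid of the two: most tweets are fanned out to the per-user caches at write time, while tweets from users with a very large number of followers are fetched separately at read time and merged in.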
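The hybrid approach can be sketched in a few lines. This is only a toy in-memory model written for these notes, not Twitter's actual implementation: the Timelines class, its method names, and the CELEBRITY_THRESHOLD value are all hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2  # hypothetical cutoff; tiny so the demo stays small


class Timelines:
    def __init__(self):
        self.followers = defaultdict(set)    # author -> users following them
        self.following = defaultdict(set)    # user -> authors they follow
        self.tweets = defaultdict(list)      # author -> [(seq, author, text)]
        self.home_cache = defaultdict(list)  # user -> [(seq, author, text)]
        self.seq = 0                         # global ordering of tweets

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > CELEBRITY_THRESHOLD

    def post(self, author, text):
        self.seq += 1
        tweet = (self.seq, author, text)
        self.tweets[author].append(tweet)
        if not self.is_celebrity(author):
            # Fan-out on write: push into every follower's cached timeline.
            for follower in self.followers[author]:
                self.home_cache[follower].append(tweet)

    def home_timeline(self, user):
        # Start from the cache, then merge celebrity tweets at read time.
        merged = set(self.home_cache[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.update(self.tweets[author])
        # Newest first, ordered by the global sequence number.
        return [(author, text) for _, author, text in sorted(merged, reverse=True)]


t = Timelines()
for fan in ("alice", "bob", "carol"):
    t.follow(fan, "celeb")   # celeb has 3 followers, above the threshold
t.follow("alice", "dave")    # dave has 1 follower, below the threshold
t.post("dave", "hello")      # fanned out to alice's cache on write
t.post("celeb", "big news")  # not fanned out; merged in on read
timeline = t.home_timeline("alice")
print(timeline)
```

The trade-off is where the work happens: fan-out makes reads cheap but makes one post by a heavily followed user very expensive, so those posts are handled on the read path instead.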
It is meaningless to say “X is scalable” or “Y doesn’t scale.” Rather, discussing scalability means considering questions like “If the system grows in a particular way, what are our options for coping with the growth?” and “How can we add computing resources to handle the additional load?”
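Describing the performance of a system:
Once the load is described, performance can be looked at in two ways. If a load parameter is increased and the system resources are kept unchanged, how is performance affected? And if a load parameter is increased, how much do the resources have to be increased to keep performance unchanged?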
Latency and response time
Response time is what the user sees: besides the actual time to process the request (the service time), it includes network delays, queueing delays, etc. Latency is the duration that a request spends waiting to be handled.
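The difference between the two is easier to see with numbers. Below is a small sketch of a single server handling requests in arrival order. The function name simulate_fifo, the fixed network_delay, and the sample requests are assumptions made up for illustration. Latency here is the time a request waits before the server picks it up, and response time adds the service time and the network delay on top.

```python
def simulate_fifo(requests, network_delay=0.0):
    """requests: list of (arrival_time, service_time), sorted by arrival."""
    server_free_at = 0.0
    results = []
    for arrival, service in requests:
        start = max(arrival, server_free_at)  # wait if the server is busy
        latency = start - arrival             # time spent waiting to be handled
        server_free_at = start + service
        results.append({
            "latency": latency,
            "service": service,
            "response": latency + service + network_delay,
        })
    return results


# Two requests arrive together, and a third arrives when the server is idle.
timings = simulate_fifo(
    [(0.0, 2.0), (0.0, 1.0), (5.0, 1.0)],
    network_delay=0.5,
)
for row in timings:
    print(row)
```

The second and third requests need the same service time, but the second one has a longer response time because it had to queue behind the first.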
There is a lot more to cover, which is very hard to fit in a blog post, but I believe almost all software engineers should go through this book once.
You can get it on Amazon, and also check out his YouTube channel.
Happy coding ❤ ☕