Consistency, Availability, and Tradeoffs

Let’s say you are building an application that stores user data, processes payments, or updates shared resources. When everything is running normally, all parts of the system respond correctly and quickly.

As the system grows, it becomes distributed. Multiple servers. Multiple databases. Multiple regions.

Sooner or later, something fails. A network link goes down. A server becomes unreachable. A database responds slowly. At this point, the system must make choices.

It cannot always be perfectly correct, perfectly fast, and perfectly available at the same time.

In this lesson, we’ll understand why these tradeoffs exist and how real systems handle them in practice.

1. CAP Theorem (The Practical Version)

The CAP theorem is often misunderstood. In simple terms, it says that when a distributed system suffers a network partition (some nodes cannot reach each other), it must choose between consistency and availability.

Consistency means every read reflects the most recent successful write, so all users see the same data at the same time.
Availability means every request to a working node gets a response, even during failures.

Network partitions are unavoidable in distributed systems, so giving up partition tolerance (the “P” in CAP) is not a realistic option.

So the real question is not “Which two do you choose?”
The real question is “What does your system do when a failure happens?”

Good system design is about making this choice intentionally.
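To make the choice concrete, here is a minimal sketch of two toy replicas that behave differently during a partition. The `Replica` class and the "CP" and "AP" mode labels are hypothetical names made up for this illustration; real databases expose this choice through their own settings.

```python
class Replica:
    """A toy replica that may be cut off from the primary by a partition."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.value = None         # last value replicated to this node
        self.partitioned = False  # True while the primary is unreachable

    def replicate(self, value):
        # Updates from the primary only arrive while the link is up.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse rather than risk a stale answer.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # Prefer availability: answer with what we have, possibly stale.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.replicate("balance=100")
    replica.partitioned = True
    replica.replicate("balance=40")  # lost: the partition blocks replication

print(ap.read())                     # answers, but with the old value
try:
    cp.read()
except RuntimeError as err:
    print(err)                       # refuses instead of answering
```

Neither replica is wrong. The AP replica stays responsive and risks stale data, while the CP replica protects correctness and gives up availability until the partition heals.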

2. Strong Consistency with a Real Example

Strong consistency means that once data is written, every read returns the latest value. This is important when correctness matters more than speed.

For example, in a banking system, if money is transferred, the updated balance must be immediately visible everywhere.

If one part of the system shows an old balance, users could overspend.

Strong consistency often requires coordination between nodes, which increases latency and reduces availability during failures.

Because of this cost, well-designed systems usually reserve strong consistency for the data that truly requires it.
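One common way to get this guarantee is a majority quorum: a write or read only succeeds if more than half of the replicas take part, so every read overlaps with the latest write. The sketch below is a simplified, single-process illustration of that idea. `QuorumStore` and its fields are hypothetical names, and it ignores concurrent writers and partial writes, which real systems must handle.

```python
class QuorumStore:
    """Toy key-less store replicated across n in-memory replicas."""

    def __init__(self, n=3):
        self.replicas = [{"version": 0, "value": None} for _ in range(n)]
        self.up = [True] * n          # which replicas are reachable
        self.quorum = n // 2 + 1      # strict majority

    def _reachable(self):
        return [r for r, ok in zip(self.replicas, self.up) if ok]

    def write(self, value):
        nodes = self._reachable()
        if len(nodes) < self.quorum:
            raise RuntimeError("write rejected: no majority reachable")
        # Any majority contains at least one replica with the latest version.
        version = max(r["version"] for r in nodes) + 1
        for r in nodes:
            r["version"], r["value"] = version, value

    def read(self):
        nodes = self._reachable()
        if len(nodes) < self.quorum:
            raise RuntimeError("read rejected: no majority reachable")
        # The highest version in a majority is the most recent write.
        return max(nodes, key=lambda r: r["version"])["value"]


store = QuorumStore(n=3)
store.write(100)
store.up[0] = False        # replica 0 misses the next write
store.write(40)
store.up[0] = True
store.up[1] = False        # read from replicas 0 (stale) and 2 (fresh)
print(store.read())        # the fresh value wins
```

Note the cost the lesson describes: once fewer than two of the three replicas are reachable, both `read` and `write` raise an error instead of answering.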

3. Eventual Consistency with a Real Example

Eventual consistency allows temporary differences between nodes. After a write, some users may see old data for a short time. Eventually, all nodes converge to the same value. This is acceptable in many systems.

For example, in social media, if a new post takes a few seconds to appear everywhere, the system is still correct.

Eventual consistency improves availability and performance, especially across regions, because a node can accept a write without waiting for distant nodes to confirm it. Many large-scale systems use eventual consistency for non-critical data.
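The convergence described above can be sketched with two in-memory nodes that sync in the background. This example assumes a last-write-wins rule based on timestamps, which is only one of several possible conflict-resolution strategies and depends on the nodes having comparable clocks. The `Node` class and its methods are hypothetical.

```python
class Node:
    """Toy replica that accepts writes locally and syncs later."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def put(self, key, value, ts):
        current = self.data.get(key)
        # Last write wins: keep the entry with the newer timestamp.
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Background replication: pull every entry and merge it.
        for key, (ts, value) in other.data.items():
            self.put(key, value, ts)


us, eu = Node(), Node()
us.put("post:1", "hello", ts=1)
print(eu.get("post:1"))    # None: the EU node has not heard about it yet
eu.sync_from(us)
print(eu.get("post:1"))    # now both regions agree
```

Between the write and the sync, readers in the second region see old data. That window is the "temporary difference" the lesson refers to.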

4. Choosing Between Strong and Eventual Consistency

The choice depends on the business requirement.

If showing stale data causes real harm, strong consistency is needed.
If slight delays are acceptable, eventual consistency simplifies scaling.

Many systems use both.

For example, payments use strong consistency, while notifications and feeds use eventual consistency.

This mixed approach is very common in real-world designs.

5. Distributed Locks and Why They Exist

Sometimes, multiple servers try to modify the same resource at the same time.

This can cause race conditions.

A distributed lock ensures that only one server performs a critical operation at a time.

For example, only one worker should process a specific payment or generate a unique invoice.

Distributed locks are useful but dangerous if misused.

If the lock holder crashes or never releases the lock, every worker waiting on it stalls.

They should be used only when absolutely necessary.
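A common mitigation for stuck locks is to give every lock an expiry time and an owner token, so a crashed holder cannot block others forever and only the owner can release. The sketch below shows that idea with an in-memory stand-in for a shared lock service. `LockStore`, its methods, and the injected `clock` are hypothetical, and a real lock must live in a store that all servers share and must update it atomically.

```python
import time
import uuid


class LockStore:
    """In-memory stand-in for a shared lock service (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.locks = {}  # name -> (owner_token, expires_at)

    def acquire(self, name, ttl):
        now = self.clock()
        held = self.locks.get(name)
        if held and held[1] > now:
            return None                   # someone else holds a live lock
        token = uuid.uuid4().hex          # unique per acquisition
        self.locks[name] = (token, now + ttl)
        return token

    def release(self, name, token):
        held = self.locks.get(name)
        if held and held[0] == token:     # only the current owner may release
            del self.locks[name]
            return True
        return False


store = LockStore()
token = store.acquire("invoice-42", ttl=10)
if token is not None:
    try:
        pass  # critical section: generate the invoice exactly once
    finally:
        store.release("invoice-42", token)
```

An expiry time trades one risk for another: if the critical section runs longer than the TTL, a second worker can acquire the lock while the first is still working. That is one more reason to use locks sparingly.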

6. Leader Election (When One Node Must Decide)

In some systems, one node needs to act as the coordinator.

This node is called the leader.

Leader election is the process of choosing that node.

For example, one node may be responsible for scheduling tasks or coordinating writes.

If the leader fails, another node is elected.

Leader election improves coordination but adds complexity.

It is commonly handled by specialized coordination systems, such as ZooKeeper or etcd, rather than custom logic.
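To see the basic idea of failover, here is a deliberately naive sketch in which the live node with the highest ID becomes the leader. The `elect_leader` function and the `cluster` dictionary are hypothetical, and the sketch assumes every node agrees on which nodes are alive. Real networks break that assumption, which is exactly why production systems delegate election to a coordination service.

```python
def elect_leader(nodes):
    """Return the highest ID among nodes marked alive, or None if all are down."""
    alive = [node_id for node_id, is_alive in nodes.items() if is_alive]
    return max(alive) if alive else None


cluster = {1: True, 2: True, 3: True}   # node ID -> is the node alive?
leader = elect_leader(cluster)
print("leader:", leader)

cluster[leader] = False                 # the leader fails
leader = elect_leader(cluster)
print("new leader:", leader)
```

If two halves of a partitioned cluster each ran this function on their own view, both could pick a leader at once. Avoiding that "split brain" is the hard part of the problem.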

7. Designing for Failures, Not Perfection

Failures are not rare events.

They are normal.

Good systems expect failures and handle them gracefully.

Timeouts prevent requests from waiting forever. Retries allow recovery from temporary issues. Backoff prevents retry storms. Circuit breakers stop repeated calls to failing services.

These patterns protect the system from cascading failures.

8. Why Timeouts and Backoff Matter

Without timeouts, services can hang indefinitely.

Without backoff, retries can overwhelm already failing systems. Exponential backoff multiplies the wait time after each failed attempt (for example 1 second, then 2, then 4), usually up to a cap. This gives systems time to recover. These simple techniques dramatically improve stability.
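A retry loop with exponential backoff can be sketched as follows. The function name and all parameters are hypothetical choices for this example. It assumes the operation enforces its own timeout and signals failure by raising `ConnectionError` or `TimeoutError`, and it adds random jitter so that many clients do not retry at the same instant.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1,
                       max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying temporary failures with growing waits."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise                       # out of attempts: give up
            # Double the ceiling each attempt, but never exceed max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: wait a random time up to the ceiling.
            sleep(random.uniform(0, ceiling))


attempts = {"count": 0}

def flaky_call():
    # Hypothetical operation that fails twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

waits = []                                  # record waits instead of sleeping
print(retry_with_backoff(flaky_call, sleep=waits.append))
```

Passing `sleep` as a parameter is a design choice that keeps the example quick to run and easy to test. In normal use the default `time.sleep` applies.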

9. Circuit Breakers in Simple Terms

A circuit breaker monitors failures. If calls to a service keep failing past a set threshold, the circuit breaker opens and stops sending requests, so callers fail fast instead of waiting. After some time, it allows limited requests to check if the service has recovered. This prevents one failing service from bringing down the entire system. Circuit breakers are a key part of resilient architectures.
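The behavior described above can be sketched as a small class. This is a minimal illustration, not a production implementation: the class name, the threshold and timeout parameters, and the injected `clock` are all hypothetical, and it is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0        # consecutive failures seen so far
        self.opened_at = None    # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open: call skipped")
            # Timeout elapsed: let this one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            trial_failed = self.opened_at is not None
            if trial_failed or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open)
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0)
print(breaker.call(lambda: "ok"))   # a healthy call passes straight through
```

While the circuit is open, `call` raises `CircuitOpenError` without touching the failing service, which is what gives that service room to recover.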

Final Thoughts

Distributed systems are about tradeoffs. You cannot avoid failures, but you can design for them. Understanding consistency, availability, and coordination helps you build systems that behave predictably under stress.

In system design interviews, interviewers look for a clear understanding of tradeoffs, realistic failure handling, and practical decision-making.
