Explain the CAP Theorem in Distributed Systems
Concept
The CAP Theorem, conjectured by Eric Brewer and later proven by Gilbert and Lynch, states that a distributed system can provide at most two of the following three guarantees simultaneously:
- Consistency (C) — Every read receives the most recent write or an error.
- Availability (A) — Every request received by a non-failing node gets a (non-error) response, without a guarantee that it reflects the most recent write.
- Partition Tolerance (P) — The system continues to operate despite network partitions (failures in communication between nodes).
It is impossible to achieve all three at once in a distributed environment. Since network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability.
1. The Trade-off Explained
When a network partition occurs (nodes cannot communicate):
- A CP system sacrifices availability to maintain consistency.
- An AP system sacrifices consistency to maintain availability.
- CA systems exist only in theory — they assume no partitions, which is unrealistic in real-world distributed setups.
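The CP/AP distinction above can be made concrete with a toy sketch. The `Register` class below is hypothetical (not any real database API): it models a single replica that, during a partition, cannot confirm it holds the latest value. The CP variant refuses to answer; the AP variant answers anyway, possibly with stale data.

```python
class Register:
    """Toy single-value replica. 'partitioned' means it cannot reach
    the other replicas to confirm it holds the latest write."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (illustrative labels)
        self.value = "v1"
        self.partitioned = False

    def read(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP: always answer, even though the value may be stale.
        return self.value

cp, ap = Register("CP"), Register("AP")
cp.partitioned = ap.partitioned = True

print(ap.read())          # AP stays available, possibly returning stale data
try:
    cp.read()
except RuntimeError as e:
    print(e)              # CP sacrifices availability and errors out
```

When the partition heals (`partitioned = False`), both variants behave identically; the trade-off only bites while nodes cannot communicate.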
2. Real-World Examples
| Type | Example Systems | Behavior |
|---|---|---|
| CP (Consistency + Partition Tolerance) | HBase, MongoDB (configured for strong consistency), ZooKeeper | Prioritize data correctness even if some nodes are unavailable. |
| AP (Availability + Partition Tolerance) | Cassandra, DynamoDB, Couchbase | Allow temporary inconsistency but remain operational during partitions. |
| CA (Consistency + Availability) | Single-node relational DB (not truly distributed) | Works only when partitions never occur. |
3. Visualizing the Trade-off
Triangle Representation:
```
              Consistency (C)
                    /\
                   /  \
              CA  /    \  CP
                 /      \
                /________\
  Availability (A)  AP  Partition Tolerance (P)
```
Each distributed database positions itself within this triangle based on design priorities.
4. Use Case Decisions
| Business Need | Best Choice | Example Scenario |
|---|---|---|
| Banking, transactions | CP | Must reject operations that risk data mismatch. |
| E-commerce inventory | AP | Availability is more critical; reconcile later. |
| Local-only systems | CA | Single machine or synchronous replicas only. |
5. Relationship to Modern Systems
Modern distributed systems try to mitigate CAP limitations using consistency models:
- Strong Consistency → All nodes agree immediately (e.g., Spanner).
- Eventual Consistency → Nodes sync over time (e.g., DynamoDB, Cassandra).
- Causal Consistency → Preserves cause-and-effect relationships.
These models allow fine-tuned balancing between performance, latency, and correctness.
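Eventual consistency hinges on a deterministic merge rule that every replica applies, so that all replicas converge on the same state once the partition heals. A minimal sketch, using last-write-wins (a common, if lossy, convergence rule; the function name and timestamps are illustrative):

```python
def lww_merge(a, b):
    """Last-write-wins: keep the (value, timestamp) pair with the
    newer timestamp. Both replicas applying this rule to the same
    pair of states reach the same result."""
    return a if a[1] >= b[1] else b

# During a partition, the two replicas accept conflicting writes:
replica1 = ("blue", 100)   # (value, logical timestamp)
replica2 = ("green", 105)

# After the partition heals, both sides apply the same merge rule
# and converge on the same state -- eventual consistency.
merged1 = lww_merge(replica1, replica2)
merged2 = lww_merge(replica2, replica1)
assert merged1 == merged2 == ("green", 105)
```

Last-write-wins silently discards the losing write ("blue" above); systems that cannot tolerate that loss use richer merge strategies such as vector clocks or CRDTs.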
6. Misconceptions
- CAP ≠ all or nothing: Real systems dynamically balance trade-offs using replication strategies and quorum-based protocols.
- Network partitions are not rare: They’re inevitable at scale — every distributed system must handle them gracefully.
- Eventual consistency ≠ weak consistency: It is a deliberate design that guarantees replicas converge once updates stop, not an absence of guarantees.
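The quorum-based protocols mentioned above rest on a simple overlap argument: with N replicas, if writes wait for W acknowledgements and reads query R replicas, then any read overlaps any acknowledged write whenever R + W > N. A one-line sketch of that check (the function name and parameter values are illustrative):

```python
def quorums_intersect(n, w, r):
    """With N replicas, W write acks, and R read responses, every
    read quorum overlaps every write quorum iff R + W > N, so reads
    are guaranteed to see the latest acknowledged write."""
    return r + w > n

# Dynamo-style settings often cited: N=3, W=2, R=2 -> overlapping quorums.
assert quorums_intersect(3, 2, 2) is True
# N=3, W=1, R=1 favors latency and availability but can serve stale reads.
assert quorums_intersect(3, 1, 1) is False
```

Tuning W and R is exactly the "dynamic balancing" of the first misconception: the same cluster can lean CP (high W, high R) or AP (low W, low R) per request.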
7. Interview Tip
- Clearly define each term (Consistency, Availability, Partition Tolerance).
- Explain what happens during a network partition — that’s when trade-offs matter.
- Cite real-world systems (Cassandra = AP, ZooKeeper = CP).
- Mention how modern architectures (e.g., Google Spanner) use synchronized clocks and quorum writes to soften CAP trade-offs.
Summary Insight
The CAP theorem defines the boundary of distributed reliability — every scalable system lives somewhere on the triangle between consistency, availability, and partition tolerance.