Explain Consistent Hashing and Its Role in Distributed Systems

Difficulty: Hard | Major: Software Engineering | Tags: Amazon, Meta

Concept

Consistent Hashing is a key technique in distributed systems that allows data to be evenly distributed across multiple nodes with minimal reorganization when nodes are added or removed.

It underpins systems such as Amazon's Dynamo (whose design informed DynamoDB), Apache Cassandra, Memcached client libraries, and CDNs, helping them scale out and tolerate node failures.


1. The Problem with Regular Hashing

In traditional hashing:

node = hash(key) % N
  • When a node joins or leaves the cluster, N (the number of nodes) changes.
  • Almost every key then maps to a different node (going from N to N + 1 nodes remaps about N / (N + 1) of all keys), causing massive cache invalidation or data rebalancing.

Consistent hashing solves this by minimizing the number of keys that need to move when nodes change.
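To see how severe the modular remapping is, the sketch below counts how many keys change nodes when a cluster grows from 4 to 5 nodes under `node = hash(key) % N`. It assumes MD5 as a stable hash (Python's built-in `hash()` for strings is randomized per process), and the key names are hypothetical.

```python
import hashlib


def stable_hash(key: str) -> int:
    # MD5 gives the same value in every process (unlike the built-in hash() of a str)
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def modulo_node(key: str, n: int) -> int:
    # Traditional placement: node = hash(key) % N
    return stable_hash(key) % n


def fraction_moved(keys, n_before: int, n_after: int) -> float:
    # Share of keys whose node index changes when the cluster is resized
    moved = sum(1 for k in keys if modulo_node(k, n_before) != modulo_node(k, n_after))
    return moved / len(keys)


keys = [f"user{i}" for i in range(10_000)]
print(f"4 -> 5 nodes: {fraction_moved(keys, 4, 5):.1%} of keys move")
```

With a uniformly distributed hash the expected share is N / (N + 1), so the printed figure should land near 80%.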


2. How Consistent Hashing Works

  1. Represent all possible hash values as a circle (hash ring).
  2. Each node is assigned a position on the ring using a hash function.
  3. Each data item (key) is also hashed to a position on the ring.
  4. A key is stored on the first node found by moving clockwise from the key's position, wrapping around past the top of the ring.
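The four steps above can be sketched as a small ring kept in a sorted list, with binary search finding the next node clockwise. This is a minimal illustration, not a production design: it assumes a 2^32 ring built from the first 4 bytes of an MD5 digest, one position per node (no virtual nodes yet), hypothetical node names, and it ignores the unlikely case of two nodes hashing to the same position.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # Map a string onto a 2**32 ring using the first 4 bytes of its MD5 digest
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Minimal consistent-hash ring: one position per node, no virtual nodes."""

    def __init__(self, nodes):
        self._positions = []  # sorted ring positions of the nodes
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def remove_node(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owners[pos]

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring is empty")
        # First node position >= the key's position, i.e. the next node clockwise
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Past the largest position, wrap around to the start of the ring
        return self._owners[self._positions[idx % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("User123"))
```

The sorted list makes each lookup O(log N) in the number of ring positions.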

Example:

Key("User123") → hash → falls between Node A and Node B → stored in Node B

When a node is added or removed:

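  • Only keys in the arc between that node and its predecessor (its counter-clockwise neighbor) change owner: a new node takes them over from its successor, and a removed node's keys pass to its successor.
  • All other keys stay where they are.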
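The remapping rule above can be checked directly: add a fourth node and confirm that every key that moved now belongs to the new node, while no key moved between the existing nodes. This sketch assumes the same MD5-based 2^32 ring as a simple illustration, uses hypothetical node and key names, and rebuilds the ring on every lookup purely for brevity.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # First 4 bytes of MD5 -> position on a 2**32 ring
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def owner(nodes, key: str) -> str:
    # Next node clockwise from the key's position, wrapping around the ring
    # (rebuilt on each call for brevity; a real ring would be cached)
    ring = sorted((ring_position(n), n) for n in nodes)
    positions = [p for p, _ in ring]
    idx = bisect.bisect_left(positions, ring_position(key))
    return ring[idx % len(ring)][1]


keys = [f"user{i}" for i in range(5_000)]
before = {k: owner(["node-a", "node-b", "node-c"], k) for k in keys}
after = {k: owner(["node-a", "node-b", "node-c", "node-d"], k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Running the comparison in reverse (removing `node-d`) restores exactly the original assignment, which is the removal case.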

3. Advantages

| Feature | Benefit |
|---|---|
| Minimal key movement | Only about 1/N of the data is remapped when a node joins or leaves. |
| Load balancing | Near-even key distribution when each node owns multiple virtual nodes. |
| Scalability | Nodes can join or leave without a full reshuffle. |
| Fault tolerance | Replicas can be placed on the next distinct nodes clockwise. |
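Replication across adjacent nodes can be sketched by walking clockwise from a key and collecting the first few distinct physical nodes, skipping extra virtual nodes of a node already chosen. This is similar in spirit to the preference list described in Amazon's Dynamo paper, but the code is only an illustration: the `name#i` virtual-node labels, 8 vnodes per node, and a replica count of 3 are assumptions.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # First 4 bytes of MD5 -> position on a 2**32 ring
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def build_ring(nodes, vnodes_per_node=8):
    # Each physical node appears at several points: "node-a#0", "node-a#1", ...
    return sorted((ring_position(f"{n}#{i}"), n) for n in nodes for i in range(vnodes_per_node))


def preference_list(ring, key: str, replicas: int):
    """Walk clockwise from the key and collect the first `replicas` distinct physical nodes."""
    positions = [p for p, _ in ring]
    start = bisect.bisect_left(positions, ring_position(key))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:  # skip further vnodes of a node already chosen
            chosen.append(node)
            if len(chosen) == replicas:
                break
    return chosen


ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
print(preference_list(ring, "User123", replicas=3))
```

The first entry is the key's primary owner; if it fails, the next entry already holds a copy.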

4. Virtual Nodes (vNodes)

To ensure balanced load distribution, each physical node is assigned multiple virtual nodes on the hash ring.

Example:

Node A → vNode1, vNode2, vNode3
Node B → vNode4, vNode5, vNode6

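If Node B fails, the key range of each of its vNodes passes to the next vNode clockwise. Because those successors belong to different physical nodes, Node A and the other survivors share B's load instead of a single neighbor absorbing all of it, which prevents hotspots.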
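The effect of virtual nodes can be measured by comparing each node's share of keys with 1 versus 100 vnodes per node, then removing `node-b` to see where its keys go. This sketch assumes the hypothetical `name#vnodeN` labeling, an MD5-based 2^32 ring, and made-up key names.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(value: str) -> int:
    # First 4 bytes of MD5 -> position on a 2**32 ring
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def build_ring(nodes, vnodes_per_node):
    # Each physical node is hashed at several points: "node-a#vnode0", "node-a#vnode1", ...
    ring = sorted((ring_position(f"{n}#vnode{i}"), n) for n in nodes for i in range(vnodes_per_node))
    return [p for p, _ in ring], [n for _, n in ring]


def lookup(ring, key: str) -> str:
    positions, owners = ring
    idx = bisect.bisect_left(positions, ring_position(key))
    return owners[idx % len(owners)]


def load_shares(ring, nodes, keys):
    # Fraction of keys owned by each physical node
    counts = Counter(lookup(ring, k) for k in keys)
    return {n: counts[n] / len(keys) for n in nodes}


nodes = ["node-a", "node-b", "node-c", "node-d"]
keys = [f"user{i}" for i in range(20_000)]

for v in (1, 100):
    shares = load_shares(build_ring(nodes, v), nodes, keys)
    print(f"{v:>3} vnode(s)/node: shares range {min(shares.values()):.1%} to {max(shares.values()):.1%}")

# Simulate Node B failing: which nodes take over its keys?
full = build_ring(nodes, 100)
without_b = build_ring([n for n in nodes if n != "node-b"], 100)
taken_over = Counter(lookup(without_b, k) for k in keys if lookup(full, k) == "node-b")
print("node-b's keys are now served by:", dict(taken_over))
```

With a single position per node the shares can differ widely; with 100 vnodes they should cluster around 25%, and B's keys should be split across all three survivors.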


5. Use Cases in System Design

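| System | Usage |
|---|---|
| Amazon Dynamo / Apache Cassandra | Data partitioning and replication. |
| Memcached clients / Redis Cluster | Cache sharding and rebalancing (Redis Cluster uses a related scheme of 16,384 fixed hash slots rather than a ring). |
| CDNs (Content Delivery Networks) | Mapping requested objects to cache servers. |
| Load balancers | Mapping requests to servers consistently. |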

6. Real-World Example — CDN Edge Caching

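  • Each cache node owns one or more ranges (arcs) of the hash ring.
  • When a user requests video.mp4, the object's name is hashed and the request is routed to the node that owns that position.
  • If that node goes down, only the objects it owned are reassigned, to the next node(s) clockwise; every other object keeps its cache node.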

This avoids the “cache miss storm” that would occur in simple modular hashing.
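The CDN scenario can be simulated with a hypothetical router class (`CacheRing`, edge node names, and 50 vnodes per node are all assumptions for illustration): route a batch of object paths, mark one edge node down, and count how many objects keep the same cache node.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    # First 4 bytes of MD5 -> position on a 2**32 ring
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class CacheRing:
    """Hypothetical CDN router mapping object paths to edge cache nodes via a hash ring."""

    def __init__(self, nodes, vnodes_per_node=50):
        self.nodes = set(nodes)
        self.vnodes_per_node = vnodes_per_node
        self._rebuild()

    def _rebuild(self) -> None:
        ring = sorted(
            (ring_position(f"{n}#{i}"), n) for n in self.nodes for i in range(self.vnodes_per_node)
        )
        self._positions = [p for p, _ in ring]
        self._owners = [n for _, n in ring]

    def mark_down(self, node: str) -> None:
        # Drop a failed node; only the arcs it owned change hands
        self.nodes.discard(node)
        self._rebuild()

    def route(self, path: str) -> str:
        idx = bisect.bisect_left(self._positions, ring_position(path))
        return self._owners[idx % len(self._owners)]


router = CacheRing(["edge-1", "edge-2", "edge-3", "edge-4"])
paths = [f"/videos/clip{i}.mp4" for i in range(2_000)] + ["/video.mp4"]
before = {p: router.route(p) for p in paths}
router.mark_down("edge-2")
after = {p: router.route(p) for p in paths}
still_hit = sum(1 for p in paths if before[p] == after[p])
print(f"{still_hit} of {len(paths)} objects keep their cache node")
```

Only objects previously cached on `edge-2` miss after the failure; the rest stay warm.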


7. Mathematical Insight

If there are N nodes and K keys in total, and load is spread evenly (for example with vNodes), adding or removing one node moves roughly:

Keys moved ≈ K / N

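More precisely, a node joining an N-node cluster takes over about K / (N + 1) keys, and a departing node hands off its own share of about K / N. This is far better than modular hashing, where going from N to N + 1 nodes remaps about K · N / (N + 1) keys, nearly all of them.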
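The two estimates can be checked empirically: grow a 10-node cluster to 11 nodes and compare the fraction of keys that move under a vnode ring versus under `hash(key) % N`. The sketch assumes MD5 hashing, 100 vnodes per node, and hypothetical node and key names.

```python
import bisect
import hashlib


def md5_int(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def build_ring(nodes, vnodes_per_node=100):
    # 32-bit ring positions taken from the top 4 bytes of the MD5 digest
    ring = sorted((md5_int(f"{n}#{i}") >> 96, n) for n in nodes for i in range(vnodes_per_node))
    return [p for p, _ in ring], [n for _, n in ring]


def lookup(ring, key: str) -> str:
    positions, owners = ring
    return owners[bisect.bisect_left(positions, md5_int(key) >> 96) % len(owners)]


def moved_fraction_consistent(keys, n: int) -> float:
    # Grow the ring from n to n + 1 nodes and count keys that change owner
    old = build_ring([f"node-{i}" for i in range(n)])
    new = build_ring([f"node-{i}" for i in range(n + 1)])
    return sum(lookup(old, k) != lookup(new, k) for k in keys) / len(keys)


def moved_fraction_modulo(keys, n: int) -> float:
    # Same resize with node = hash(key) % N
    return sum(md5_int(k) % n != md5_int(k) % (n + 1) for k in keys) / len(keys)


keys = [f"key{i}" for i in range(20_000)]
N = 10
print(f"consistent: {moved_fraction_consistent(keys, N):.1%} (K/(N+1) predicts {1 / (N + 1):.1%})")
print(f"modulo:     {moved_fraction_modulo(keys, N):.1%} (N/(N+1) predicts {N / (N + 1):.1%})")
```

The consistent-hashing figure should sit near 9%, the modular one near 91%.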


8. Comparison: Consistent Hashing vs Modular Hashing

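| Aspect | Modular Hashing | Consistent Hashing |
|---|---|---|
| Keys moved on node change | O(K), nearly all keys | O(K / N) |
| Load distribution | Even, given a uniform hash function | Uneven with one position per node; near-even with vNodes |
| Implementation complexity | Simple | Moderate |
| Typical use | Clusters whose size rarely changes | Large distributed systems with changing membership |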

9. Interview Tips

  • Be ready to draw or describe the hash ring.
  • Mention virtual nodes explicitly — they are crucial for balance.
  • Discuss key remapping efficiency and failure handling.
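  • Use concrete examples such as Cassandra or Amazon Dynamo/DynamoDB; Redis Cluster's fixed hash slots make a useful contrast.
  • If asked to implement, compute ring positions with a stable hash such as MD5 (Python's hashlib.md5), keep positions in a sorted array, and find a key's owner with binary search. Avoid Python's built-in hash() for strings, which is randomized per process.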

Summary Insight

Consistent Hashing is the backbone of scalable distributed systems — enabling stability, balance, and resilience even as nodes come and go.