Explain the Concept of Database Sharding and Partitioning

Medium | Hot | Major: software engineering | Amazon, Meta, Uber

Concept

Database sharding and partitioning are techniques to split large datasets across multiple storage units to enhance scalability, performance, and availability.
While related, they differ slightly in scope — partitioning divides data logically within one database, whereas sharding distributes data across multiple independent database instances.


1. Why Sharding or Partitioning Is Needed

As applications scale, a single database often becomes a bottleneck:

  • Too much data for one server’s memory or disk.
  • Write throughput limited by I/O.
  • Query latency increases as tables grow.

Sharding and partitioning address these issues by dividing and conquering.


2. Types of Partitioning

| Type | Description | Example |
| --- | --- | --- |
| Horizontal Partitioning | Rows of a table split into subsets; called sharding when the subsets live on separate databases. | User IDs 1–1000 → DB1, 1001–2000 → DB2 |
| Vertical Partitioning | Columns split into separate tables for different access patterns. | User profile vs authentication table |
| Functional Partitioning | Entirely different schemas by feature or service domain. | Orders DB vs Inventory DB |
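
To make the vertical partitioning row concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the load_profile helper are hypothetical and chosen only for illustration; a production schema would differ.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")

# Vertical partitioning: credentials live in their own table, apart from
# the profile columns that are read far more often.
conn.executescript("""
    CREATE TABLE user_profile (
        user_id INTEGER PRIMARY KEY,
        display_name TEXT NOT NULL
    );
    CREATE TABLE user_auth (
        user_id INTEGER PRIMARY KEY REFERENCES user_profile(user_id),
        password_hash TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO user_profile VALUES (1, 'Ada')")
conn.execute("INSERT INTO user_auth VALUES (1, 'not-a-real-hash')")


def load_profile(user_id):
    """Read profile data without touching the authentication table."""
    row = conn.execute(
        "SELECT display_name FROM user_profile WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0] if row else None
```

Profile reads never scan the credential columns, which is the access-pattern separation the table describes.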

3. How Sharding Works

Each shard is a self-contained database responsible for a subset of the data.
The application uses a shard key to route queries to the right shard.

Flow:

Application → Shard Router → Shard Database (based on key)

Example:

user_id % 4 → selects one of 4 shards
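
The routing flow above can be sketched in a few lines of Python. This is an illustrative sketch, not a real driver: each dictionary stands in for one independent database, and the names NUM_SHARDS, shard_for, put_user, and get_user are assumptions introduced here.

```python
NUM_SHARDS = 4

# Each dict stands in for one independent shard database.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    """Shard router: map the shard key to a shard index with modulo."""
    return user_id % NUM_SHARDS


def put_user(user_id: int, record: dict) -> None:
    """Write the record to the single shard that owns this user_id."""
    shards[shard_for(user_id)][user_id] = record


def get_user(user_id: int):
    """Read from the owning shard only; other shards are never queried."""
    return shards[shard_for(user_id)].get(user_id)
```

Because the router is a pure function of the shard key, every read and write for a given user lands on the same shard.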

4. Sharding Strategies

| Strategy | Description | Pros | Cons |
| --- | --- | --- | --- |
| Range-Based | Split data by range (e.g., user_id 1–1000). | Simple, predictable | Hotspot risk if data skewed |
| Hash-Based | Use hash function on shard key. | Even distribution | Hard to re-shard |
| Directory-Based | Maintain lookup table mapping keys to shards. | Flexible re-sharding | Adds lookup overhead |
| Geo-Sharding | Partition by geography or region. | Low latency for users | Complex cross-region queries |
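
The first three strategies can be compared side by side in a short sketch. The range boundaries, tenant names, and function names below are hypothetical; the sketch only shows how each strategy turns a key into a shard index.

```python
import bisect
import hashlib

# Range-based: inclusive upper bound of each shard's user_id range
# (shard 0 holds ids up to 1000, shard 1 up to 2000, shard 2 up to 3000).
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]


def range_shard(user_id: int) -> int:
    idx = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if idx == len(RANGE_UPPER_BOUNDS):
        raise KeyError(f"user_id {user_id} is outside every configured range")
    return idx


def hash_shard(key: str, num_shards: int = 4) -> int:
    """Hash-based: a stable hash spreads keys evenly across shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Directory-based: an explicit lookup table that can be edited to move a key.
directory = {"tenant-a": 0, "tenant-b": 2}


def directory_shard(key: str) -> int:
    return directory[key]
```

Moving "tenant-b" to another shard in the directory variant is a single table update (plus the data copy), which is the flexibility the table credits to that strategy.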

5. Benefits

  • Improved performance — parallel reads/writes across shards.
  • Increased capacity — each shard handles smaller datasets.
  • Fault isolation — shard failure affects only part of the system.
  • Scalable growth — add shards dynamically as data grows.
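
The parallel-read benefit can be illustrated with a scatter-gather query: the same sub-query runs on every shard concurrently and the partial results are merged. In this sketch the shards are in-memory dictionaries of order totals, and count_large_orders and scatter_gather are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor

# Each dict stands in for one shard, mapping order_id -> order total.
shards = [
    {1: 30, 5: 12},
    {2: 7},
    {3: 45, 7: 3},
]


def count_large_orders(shard: dict, threshold: int) -> int:
    """Sub-query executed independently on a single shard."""
    return sum(1 for total in shard.values() if total >= threshold)


def scatter_gather(threshold: int) -> int:
    """Fan the sub-query out to all shards in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: count_large_orders(s, threshold), shards)
        return sum(partials)


print(scatter_gather(10))  # 3
```

Each shard scans only its own subset, so the slowest shard, rather than the total data size, bounds the query time.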

6. Challenges

| Challenge | Description |
| --- | --- |
| Re-sharding | Moving data when a shard becomes too large. |
| Cross-shard queries | Joins and transactions are harder across shards. |
| Consistency | Maintaining ACID properties can be complex. |
| Operational overhead | More monitoring, backups, and schema sync required. |
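
The re-sharding cost is easy to quantify for plain modulo routing. The sketch below, using a hypothetical moved_fraction helper, counts how many keys change shards when the shard count grows from 4 to 5.

```python
def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes when going from old_n to new_n shards."""
    keys = list(keys)
    moved = sum(1 for k in keys if k % old_n != k % new_n)
    return moved / len(keys)


# Growing from 4 to 5 shards relocates most of the data.
print(moved_fraction(range(1000), 4, 5))  # 0.8
```

Only keys whose remainder is the same under both moduli stay put, which is why adding a single shard to a modulo scheme forces most rows to migrate.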

7. Real-World Example

Scenario: Global E-commerce Platform

  • Each customer’s data is assigned to a shard by customer_id % N.
  • North America is handled by shards A–D, Europe by shards E–H.
  • Orders, inventory, and payments are stored in separate functional databases.
  • Scaling is achieved by adding shards as the user base grows.
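
One way to combine the first two bullets is to pick the region first and then apply the modulo within that region's shard group. That combination is an assumption made for this sketch, as are the region codes and the route_customer name.

```python
# Shard groups per region, following the scenario's A–D and E–H split.
REGION_SHARDS = {
    "NA": ["A", "B", "C", "D"],
    "EU": ["E", "F", "G", "H"],
}


def route_customer(region: str, customer_id: int) -> str:
    """Geo-shard first, then spread customers within the region by modulo."""
    group = REGION_SHARDS[region]
    return group[customer_id % len(group)]


print(route_customer("NA", 6))  # C
print(route_customer("EU", 9))  # F
```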

Result:

  • Query latency is reduced by 60% (illustrative figure).
  • Database load is distributed evenly.
  • Maintenance downtime is reduced via shard-level rolling updates.

8. Sharding vs Replication

| Aspect | Sharding | Replication |
| --- | --- | --- |
| Purpose | Distribute data horizontally | Copy same data for redundancy |
| Data Stored | Unique subset per node | Identical copy of full dataset |
| Goal | Scale capacity and throughput | Improve availability and read performance |
| Example | User IDs split across shards | Primary–replica setup |
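
The two techniques are often combined: each shard owns a unique subset of the keys, and each shard is itself replicated. The sketch below is a simplified model with synchronous copying and in-memory dictionaries; ReplicatedShard, cluster, and shard_of are hypothetical names.

```python
import itertools


class ReplicatedShard:
    """One shard: a primary for writes plus replicas that serve reads."""

    def __init__(self, num_replicas: int = 2):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self._next_replica = itertools.cycle(range(num_replicas))

    def write(self, key, value):
        # Replication: the same data is copied to every replica.
        # Copying is synchronous here for simplicity.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate across replicas to spread the load.
        return self.replicas[next(self._next_replica)].get(key)


# Sharding: each ReplicatedShard holds a different subset of the keys.
cluster = [ReplicatedShard() for _ in range(2)]


def shard_of(key: int) -> ReplicatedShard:
    return cluster[key % len(cluster)]
```

Key 3 lives only on the second shard (sharding), while inside that shard it exists on the primary and on every replica (replication).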

9. Best Practices

  • Choose a stable shard key (immutable, evenly distributed).
  • Monitor shard health and disk growth.
  • Use connection pooling and consistent hashing for routing.
  • Plan for graceful re-sharding and automated migration scripts.
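
Consistent hashing, mentioned above for routing, can be sketched as a hash ring with virtual nodes. This is a minimal illustration rather than a production router; the HashRing class, the shard names, and the choice of 100 virtual nodes per shard are assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self._points = []  # sorted list of (ring position, node name)
        self._vnodes = vnodes
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node gets many virtual points so load spreads evenly.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point at or after the key's position, wrapping around the ring.
        idx = bisect.bisect_left(self._points, (_hash(key),))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["shard-0", "shard-1", "shard-2"])
owner = ring.lookup("user-42")
```

When a shard is added, the only keys that change owner are those that now fall to the new shard, which keeps re-sharding migrations far smaller than with plain modulo routing.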

10. Interview Tip

  • Clearly distinguish partitioning (within one DB) vs sharding (across many DBs).
  • Mention trade-offs: scalability vs complexity.
  • Cite real-world systems: MongoDB shards natively, while MySQL and PostgreSQL are typically sharded through tooling such as Vitess and Citus.

Summary Insight

Sharding and partitioning divide data to multiply performance. Done right, they unlock near-linear scalability; done wrong, they multiply operational pain.