Explain the Concept of Hashing and Hash Tables
Concept
Hashing is a technique used to map data of arbitrary size to fixed-size values called hash codes.
A Hash Table uses this mechanism to store and retrieve data efficiently in O(1) average time complexity.
Hash tables are foundational in databases, compilers, caching, and system design due to their speed and predictable access patterns.
1. How Hashing Works
- A hash function converts a key into an integer (hash code).
- The hash code is modded by the table size to determine the storage index.
- Values are stored at that computed index.
- On retrieval, the same function locates the data instantly.
Example (safe for MDX):
key = "user123"
hash = hash_function(key)
index = hash % table_size
table[index] = value
2. Hash Function Requirements
A good hash function should:
- Distribute keys uniformly to minimize collisions.
- Be fast to compute.
- Be deterministic — same input always gives same output.
- Produce values that fit within table size.
Common examples:
- Division method:
hash(key) = key % m - Multiplicative method:
hash(key) = floor(m * ((key * A) mod 1)) - Cryptographic hashes (SHA-256, MD5) for security use cases.
3. Collision Handling
Collisions occur when two keys map to the same index.
Techniques to handle collisions:
| Method | Description | Example |
|---|---|---|
| Chaining | Each table slot holds a linked list of key-value pairs. | Python dict, Java HashMap |
| Open Addressing | Find next available slot (probing). | Linear / Quadratic probing |
| Double Hashing | Uses a second hash function to resolve conflicts. | Advanced hash tables |
Example (safe for MDX):
Index 4 → [("John", 24), ("Jane", 31)]
4. Time and Space Complexity
| Operation | Average Case | Worst Case |
|---|---|---|
| Search | O(1) | O(n) (if collisions high) |
| Insert | O(1) | O(n) |
| Delete | O(1) | O(n) |
With a well-distributed hash function and a good load factor, performance remains near constant time.
5. Load Factor
The load factor (α) = number of elements / table size.
- When α exceeds a threshold (commonly 0.7), the hash table resizes (doubles its size and rehashes entries).
- This maintains efficiency and prevents long chains or probe sequences.
6. Real-World Applications
| Domain | Use Case |
|---|---|
| Databases | Indexing and key-based lookups |
| Caching | Storing precomputed results (e.g., Redis) |
| Compilers | Symbol tables for variable lookup |
| Security | Password hashing and checksums |
| Blockchain | Data integrity and transaction verification |
7. Hashing in Practice — Example: Python Dictionary
Python’s dict internally uses a resizable hash table with open addressing.
Keys are hashed, and collisions are handled via probing.
Example (safe for MDX):
users = {"alice": 25, "bob": 30}
# Hash("bob") → Index 6
# Retrieve in O(1)
8. Common Pitfalls
- Poor hash function → clustering and degraded performance.
- Uncontrolled growth → high memory usage.
- Mutable keys → inconsistent hashing behavior.
- Security concern: predictable hashes can be exploited in DoS attacks (hash flooding).
9. Interview Tip
- Always mention collision handling and load factor.
- Explain both chaining and open addressing.
- Be able to write a simple hash table implementation using arrays or lists.
- Discuss real-world optimizations (rehashing, resizing).
Summary Insight
Hashing transforms complexity into predictability — a simple mathematical idea powering fast lookups across computing.