Hash Function Internals and Collision Handling
Understand how Python hashes objects, how open addressing and chaining resolve collisions, and why average-case O(1) can degrade to O(n).
What Is a Hash Map?
A hash map (dictionary in Python) maps keys to values using a hash function that converts any key into an integer index into an underlying array. An ideal hash function spreads keys uniformly across the array, enabling O(1) average-case lookup, insertion, and deletion. The underlying array is called the hash table or bucket array.
In Python, dict is a highly optimised hash map. Understanding its internals helps you reason about worst-case behaviour and choose appropriate keys.
# Python dict is a hash map
hm = {}
hm['alice'] = 95
hm['bob'] = 87
hm['carol'] = 91
print(hm['alice']) # O(1) lookup: 95
print('bob' in hm) # O(1) membership: True
del hm['bob'] # O(1) deletion
print(hm) # {'alice': 95, 'carol': 91}Hash Functions and the __hash__ Method
Python calls __hash__(key) to compute an integer from the key, then takes that integer modulo the table size to find the bucket index. Built-in types like int, str, and tuple have fast built-in hash implementations. list and dict are not hashable (they are mutable, and mutating them would invalidate any stored hash).
A good hash function distributes keys uniformly, is deterministic, and is fast to compute. Python's string hash randomises across runs (a security feature) — use PYTHONHASHSEED=0 to disable for reproducibility in testing.
# Built-in hash in Python
print(hash(42)) # integer hashes to itself (CPython)
print(hash('hello')) # string hash (randomised per run)
print(hash((1, 2, 3))) # tuple hash: depends on contents
# Unhashable types
try:
hash([1, 2, 3]) # lists are mutable -> not hashable
except TypeError as e:
print('Error:', e)
# Custom class: define __hash__ and __eq__
class Point:
def __init__(self, x, y): self.x = x; self.y = y
def __hash__(self): return hash((self.x, self.y))
def __eq__(self, other): return self.x == other.x and self.y == other.y
points = {Point(1, 2): 'A', Point(3, 4): 'B'}
print(points[Point(1, 2)]) # 'A'All lessons in this course
- Hash Function Internals and Collision Handling
- Two-Sum and Its Many Variants
- Frequency Counting and Grouping
- Longest Consecutive Sequence and LRU Cache