Efficient Data Types for Memory Reduction
Downcast numeric columns to int32/float32 and convert string columns to Categorical to cut DataFrame memory by up to 70%.
Why Data Types Affect Performance
In pandas, every column has a dtype (data type) that determines how values are stored in memory and how fast operations run. By default, pandas loads data into wide types: int64 (8 bytes per value), float64 (8 bytes per value), and object (variable size, often 50-200 bytes per string). For DataFrames with millions of rows, choosing narrower types can cut memory by 50-80% and, depending on the workload, speed up operations by roughly 2-5x, because more values fit in the CPU cache.
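The per-value sizes quoted above can be checked directly. The following is a minimal sketch, assuming a 64-bit CPython build; the names `widths` and `words` are illustrative only. It prints the byte width of several fixed-width dtypes and compares the shallow and deep memory of a small object column.

```python
import numpy as np
import pandas as pd

# Fixed-width numeric dtypes: bytes used per value.
widths = {name: np.dtype(name).itemsize
          for name in ['int8', 'int16', 'int32', 'int64', 'float32', 'float64']}
for name, nbytes in widths.items():
    print(f'{name:>8}: {nbytes} byte(s) per value')

# An object column holds one pointer per row plus a separate Python object.
# deep=False counts only the pointers; deep=True also counts the strings.
words = pd.Series(['A', 'B', 'C', 'D'] * 1000, dtype=object)
shallow = words.memory_usage(index=False, deep=False)
deep = words.memory_usage(index=False, deep=True)
print(f'object column, pointers only: {shallow / len(words):.1f} bytes per row')
print(f'object column, with strings:  {deep / len(words):.1f} bytes per row')
```

The gap between the two object figures is the per-string overhead that fixed-width dtypes avoid.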
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame({
    'id': np.arange(1000000, dtype='int64'),
    'score': np.random.uniform(0, 100, 1000000).astype('float64'),
    'category': np.random.choice(['A','B','C','D'], 1000000)
})
mem = df.memory_usage(deep=True)
print('Memory per column:')
print(mem)
print(f'Total: {mem.sum() / 1e6:.1f} MB')
Downcasting Integer Columns
If a column contains integer values that fit in a smaller range, downcast it from int64 to a narrower integer type. pd.to_numeric(series, downcast='integer') selects the smallest signed integer type that can hold every value: int8 (-128 to 127, 1 byte), int16 (-32,768 to 32,767, 2 bytes), or int32 (about ±2.1 billion, 4 bytes). If no narrower type fits, the column stays int64. An int8 column uses one-eighth the memory of an int64 column.
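To make the range boundaries concrete, here is a small sketch that reports which dtype pd.to_numeric chooses for values on either side of each limit. The helper `downcast_dtype` is a hypothetical name introduced for illustration; it is not part of pandas.

```python
import pandas as pd

def downcast_dtype(values, mode='integer'):
    """Hypothetical helper: dtype name chosen when downcasting an int64 Series."""
    s = pd.Series(values, dtype='int64')
    return str(pd.to_numeric(s, downcast=mode).dtype)

cases = {
    'fits int8': [-128, 127],
    'just past int8': [0, 128],
    'fits int16': [-32768, 32767],
    'just past int16': [0, 32768],
    'just past int32': [0, 2**31],
}
for label, values in cases.items():
    print(f'{label:>16}: {downcast_dtype(values)}')

# downcast='unsigned' considers unsigned types, which suits non-negative data.
print(f"{'0..255 unsigned':>16}: {downcast_dtype([0, 255], mode='unsigned')}")
```

A single out-of-range value is enough to force the whole column into the next wider type, so it can be worth checking a column's min and max before relying on a particular result.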
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame({
    'age': np.random.randint(18, 90, 500000).astype('int64'),
    'score': np.random.randint(0, 100, 500000).astype('int64'),
    # dtype='int64' keeps the 2**31 upper bound valid where NumPy's default integer is 32-bit
    'large_id': np.random.randint(0, 2**31, 500000, dtype='int64')
})
# Downcast each int64 column to the smallest integer type that holds its values
for col in df.select_dtypes('int64').columns:
    df[col] = pd.to_numeric(df[col], downcast='integer')
print('Dtypes after downcasting:')
print(df.dtypes)
print(f'\nMemory: {df.memory_usage(deep=True).sum()/1e6:.2f} MB')
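The summary at the top of this lesson also names float32 and Categorical, which the integer example does not cover. The following is a rough sketch of both conversions, assuming float32's precision of about 7 significant digits is acceptable for the data and that the string column has only a few distinct values. The column names and row count are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    'score': rng.uniform(0, 100, n),  # float64 by default
    'category': pd.Series(rng.choice(['A', 'B', 'C', 'D'], n), dtype=object),
})
before = df.memory_usage(deep=True)

small = df.copy()
# downcast='float' converts to float32 when the values survive the conversion.
small['score'] = pd.to_numeric(small['score'], downcast='float')
# A categorical stores each distinct string once plus a small integer code per row.
small['category'] = small['category'].astype('category')
after = small.memory_usage(deep=True)

for col in ['score', 'category']:
    print(f'{col}: {before[col] / 1e6:.2f} MB -> {after[col] / 1e6:.2f} MB '
          f'({small[col].dtype})')
```

Categorical pays off when the number of distinct values is small relative to the number of rows; for a column of mostly unique strings it can use more memory than the original.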