Pandas & NumPy Academy · Lesson

Performance Benefits of Sorted Indices

Sort a MultiIndex with sort_index(), measure slice performance with timeit, and use is_monotonic_increasing as a guard.

Why Index Sorting Matters for Performance

A sorted index lets Pandas use binary search (O(log n)) instead of a full linear scan (O(n)) when looking up label ranges. For a DataFrame with one million rows, binary search finds the target range in about 20 comparisons (log2 of 1,000,000 is roughly 20), versus up to one million comparisons with a linear scan. As a result, slice operations on a sorted MultiIndex can be orders of magnitude faster than equivalent lookups on an unsorted one, and the difference becomes critical in production pipelines processing millions of rows.

import pandas as pd
import numpy as np

np.random.seed(42)
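# Create a DataFrame with a two-level MultiIndex (5 countries x 200 dates = 1,000 rows)
# The country labels are not in alphabetical order, so the resulting index is unsorted
countries = ['DE', 'UK', 'USA', 'FR', 'JP']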
dates = pd.date_range('2020-01-01', periods=200)
mi = pd.MultiIndex.from_product([countries, dates], names=['country', 'date'])
df = pd.DataFrame({'value': np.random.randn(len(mi))}, index=mi)

print(f'DataFrame shape: {df.shape}')
print(f'Index is sorted: {df.index.is_monotonic_increasing}')
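
The setup above can be extended into a rough timing comparison. The following is a minimal sketch, not a benchmark: it rebuilds the same 1,000-row frame, sorts it with sort_index(), and times a single-country lookup on both versions with timeit. The variable names (df_unsorted, df_sorted, t_unsorted, t_sorted) and the repetition count are assumptions chosen for illustration. On a frame this small, fixed overhead is likely to dominate, so any gap should only become clearly visible with much larger indexes.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Same shape as the lesson's frame; country labels are deliberately not alphabetical
countries = ['DE', 'UK', 'USA', 'FR', 'JP']
dates = pd.date_range('2020-01-01', periods=200)
mi = pd.MultiIndex.from_product([countries, dates], names=['country', 'date'])
df_unsorted = pd.DataFrame({'value': rng.standard_normal(len(mi))}, index=mi)

# sort_index() returns a new DataFrame whose index is sorted lexicographically
df_sorted = df_unsorted.sort_index()

print('unsorted is monotonic:', df_unsorted.index.is_monotonic_increasing)
print('sorted is monotonic:  ', df_sorted.index.is_monotonic_increasing)

# Time the same first-level lookup on both frames (200 repetitions each)
t_unsorted = timeit.timeit(lambda: df_unsorted.loc['FR'], number=200)
t_sorted = timeit.timeit(lambda: df_sorted.loc['FR'], number=200)
print(f'unsorted: {t_unsorted:.4f}s  sorted: {t_sorted:.4f}s')
```

Both lookups select the same rows; only the search strategy available to Pandas differs. Timings vary by machine and Pandas version, so treat the printed numbers as indicative only.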

Checking if an Index Is Sorted

Use df.index.is_monotonic_increasing to check whether the index is sorted in ascending order. It returns a boolean. For a MultiIndex, Pandas checks the ordering lexicographically across all levels. Always check this before performing slice operations with .loc[start:end] on a MultiIndex: slicing an unsorted MultiIndex typically raises UnsortedIndexError, and the exact behavior has varied across Pandas versions, so the check is the reliable way to avoid surprises.

import pandas as pd

# Sorted MultiIndex
tuples_sorted = [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
mi_sorted = pd.MultiIndex.from_tuples(tuples_sorted)
df_sorted = pd.DataFrame({'v': [10, 20, 30, 40]}, index=mi_sorted)

# Unsorted MultiIndex
tuples_unsorted = [('B', 2), ('A', 1), ('B', 1), ('A', 2)]
mi_unsorted = pd.MultiIndex.from_tuples(tuples_unsorted)
df_unsorted = pd.DataFrame({'v': [10, 20, 30, 40]}, index=mi_unsorted)

print('Sorted index is_monotonic_increasing:', df_sorted.index.is_monotonic_increasing)  # True
print('Unsorted index is_monotonic_increasing:', df_unsorted.index.is_monotonic_increasing)  # False
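
The check can also serve as a guard in front of slicing code. Below is a minimal sketch under stated assumptions: ensure_sorted_index is a hypothetical helper written for this lesson, not part of Pandas, and it simply returns the frame unchanged when the index is already sorted and a sorted copy otherwise. The try/except shows what is expected to happen without the guard on recent Pandas versions.

```python
import pandas as pd
from pandas.errors import UnsortedIndexError


def ensure_sorted_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return df as-is if its index is sorted, else a sorted copy."""
    if df.index.is_monotonic_increasing:
        return df
    return df.sort_index()


tuples_unsorted = [('B', 2), ('A', 1), ('B', 1), ('A', 2)]
df_unsorted = pd.DataFrame(
    {'v': [10, 20, 30, 40]},
    index=pd.MultiIndex.from_tuples(tuples_unsorted),
)

# Without the guard, a range slice on the unsorted index is expected to fail
try:
    df_unsorted.loc[('A', 2):('B', 1)]
except UnsortedIndexError as exc:
    print('Slice on unsorted index failed:', exc)

# With the guard, the slice runs against a sorted index
safe = ensure_sorted_index(df_unsorted)
result = safe.loc[('A', 2):('B', 1)]
print(result)
```

Skipping the sort when the index is already monotonic avoids an unnecessary copy; whether to sort once up front or guard at each call site depends on how the frame is used downstream.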
