Pandas & NumPy Academy · Lesson

Selecting Data from a MultiIndex

Retrieve data at outer and inner index levels using .loc with tuples, slices, and the pd.IndexSlice helper.

Navigating Hierarchical Data

Once you have a MultiIndex, you need efficient ways to retrieve subsets of data. Pandas provides three tools for this: .loc with tuples for exact label selection, slice() and pd.IndexSlice for range selection across levels, and .xs() for cross-section selection at a specific level value. Mastering these access patterns is what unlocks the full power of hierarchical indexing.

import pandas as pd
import numpy as np

# Setup: GDP data by country and year
df_flat = pd.DataFrame({
    'country': ['USA','USA','USA','UK','UK','UK','DE','DE','DE'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200, 4100, 4200, 4350]
})
df = df_flat.set_index(['country', 'year']).sort_index()
print(df)
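The snippet below is a minimal sketch of all three access patterns applied to the same GDP frame. The frame is rebuilt here so the snippet runs on its own. The variable names (uk_2022, recent, recent_alt, gdp_2021) are illustrative only, and the snippet assumes a recent pandas release. Range slicing on a MultiIndex generally needs a sorted index, and an unsorted one can raise an UnsortedIndexError. That is why the setup calls sort_index().

```python
import pandas as pd

# Rebuild the same GDP frame so this example is self-contained
df_flat = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'DE', 'DE', 'DE'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200, 4100, 4200, 4350]
})
df = df_flat.set_index(['country', 'year']).sort_index()

# 1. Exact label: a full (country, year) tuple picks out a single row
uk_2022 = df.loc[('UK', 2022), 'gdp_bn']
print(uk_2022)  # 3100

# 2. Range across levels: every country, years 2022 through 2023 (inclusive)
idx = pd.IndexSlice
recent = df.loc[idx[:, 2022:2023], :]
print(recent)

# The same selection spelled with plain slice() objects
recent_alt = df.loc[(slice(None), slice(2022, 2023)), :]

# 3. Cross-section: all countries for one value of the inner 'year' level
gdp_2021 = df.xs(2021, level='year')
print(gdp_2021)
```

pd.IndexSlice only provides a more readable way to build the tuple of slice objects, so idx[:, 2022:2023] and (slice(None), slice(2022, 2023)) select the same rows. Spelling out the column selector (the trailing `:`) removes any ambiguity about which axis each slice applies to.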

Selecting by Outer Level with .loc

Pass a single outer-level label to .loc[] to retrieve all rows for that group. With a two-level MultiIndex (country, year), df.loc['USA'] returns all USA rows as a DataFrame whose index contains only the inner level (year). This behaviour is often called level dropping: when you select a single value at an outer level, Pandas removes that level from the returned object's index.

import pandas as pd

df_flat = pd.DataFrame({
    'country': ['USA','USA','USA','UK','UK','UK'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200]
})
df = df_flat.set_index(['country', 'year']).sort_index()

# Select all USA rows
usa = df.loc['USA']
print('All USA rows:')
print(usa)
# The 'country' level has been dropped, so only the 'year' level remains
print('\nRemaining index level:', usa.index.name)
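To see level dropping directly, the sketch below selects USA three ways and compares the index names each one returns: a scalar label, a one-element list, and .xs() with drop_level=False. The variable names (dropped, kept, kept_xs) are illustrative. The behaviour shown follows standard .loc and .xs() semantics in recent pandas releases.

```python
import pandas as pd

# Same two-level (country, year) frame as above
df_flat = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200]
})
df = df_flat.set_index(['country', 'year']).sort_index()

# Scalar label: the 'country' level is dropped and only 'year' remains
dropped = df.loc['USA']
print(list(dropped.index.names))  # ['year']

# List of labels: the result keeps both index levels
kept = df.loc[['USA']]
print(list(kept.index.names))  # ['country', 'year']

# .xs() makes the choice explicit through its drop_level argument
kept_xs = df.xs('USA', level='country', drop_level=False)
print(list(kept_xs.index.names))  # ['country', 'year']
```

Wrapping the label in a list is a simple way to keep the outer level when downstream code expects the full MultiIndex.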
