Selecting Data from a MultiIndex
Retrieve data at outer and inner index levels using .loc with tuples, slices, and the pd.IndexSlice helper.
Navigating Hierarchical Data
Once you have a MultiIndex, you need efficient ways to retrieve subsets of data. Pandas provides three tools for this: .loc with tuples for exact label selection, slice() and pd.IndexSlice for range selection across levels, and .xs() for cross-section selection at a specific level value. Range selection with slices generally requires a sorted index, which is why the examples in this lesson call sort_index() after set_index().
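As a rough preview of the three tools named above, the sketch below applies each one to the same GDP data used in this lesson. The variable names (uk_2022, recent, in_2023) are illustrative only, and the block rebuilds the frame so that it can run on its own.

```python
import pandas as pd

# Same GDP data as in this lesson, indexed by (country, year) and sorted
df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'DE', 'DE', 'DE'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200, 4100, 4200, 4350],
}).set_index(['country', 'year']).sort_index()

# 1. Exact selection: a tuple supplies one label per index level
uk_2022 = df.loc[('UK', 2022), 'gdp_bn']

# 2. Range selection: pd.IndexSlice slices each level separately
#    (all countries, years 2022 through 2023 inclusive)
idx = pd.IndexSlice
recent = df.loc[idx[:, 2022:2023], :]

# 3. Cross-section: every country for one value of the inner level
in_2023 = df.xs(2023, level='year')

print(uk_2022)
print(recent)
print(in_2023)
```

Note that label-based slices in .loc include both endpoints, so 2022:2023 returns rows for both years.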
import pandas as pd
import numpy as np
# Setup: GDP data by country and year
df_flat = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'DE', 'DE', 'DE'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200, 4100, 4200, 4350]
})
df = df_flat.set_index(['country', 'year']).sort_index()
print(df)
Selecting by Outer Level with .loc
Pass a single outer-level label to .loc[] to retrieve all rows for that group. With a two-level MultiIndex (country, year), df.loc['USA'] returns all USA rows as a DataFrame with only the inner index (year) remaining. This behaviour is called level dropping: when you select a single label (rather than a list or slice) at an outer level, Pandas removes that level from the index of the returned object.
import pandas as pd
df_flat = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200]
})
df = df_flat.set_index(['country', 'year']).sort_index()
# Select all USA rows
usa = df.loc['USA']
print('All USA rows:')
print(usa)
print('\nRemaining index name:', usa.index.name)
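Level dropping applies when the outer label is passed as a scalar. The sketch below contrasts that with two ways of keeping the country level in the result; the names dropped, kept, and kept_xs are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'UK', 'UK', 'UK'],
    'year': [2021, 2022, 2023, 2021, 2022, 2023],
    'gdp_bn': [23000, 25000, 26000, 3000, 3100, 3200],
}).set_index(['country', 'year']).sort_index()

# Scalar label: the country level is dropped, only year remains
dropped = df.loc['USA']
print(list(dropped.index.names))

# One-element list: the result keeps both index levels
kept = df.loc[['USA']]
print(list(kept.index.names))

# .xs() with drop_level=False also keeps the selected level
kept_xs = df.xs('USA', level='country', drop_level=False)
print(list(kept_xs.index.names))
```

Keeping the level is useful when the result will later be concatenated or joined with other groups, since the country label stays attached to every row.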