Pandas & NumPy Academy · Lesson

Creating a MultiIndex

Build a hierarchical row index with pd.MultiIndex.from_tuples or set_index on multiple columns, and inspect its levels.

What Is a MultiIndex?

A MultiIndex (also called a hierarchical index) lets a pandas DataFrame or Series carry two or more levels of row labels. Think of it as a composite key in a database: instead of identifying a row by a single label, you identify it by a tuple such as (country, city) or (year, quarter). MultiIndexes are well suited to panel data, cross-sectional time series, and any dataset with a natural nested grouping structure, whether that nesting is two levels deep or more.

import pandas as pd

# A simple example: sales by country and city
data = {
    'sales': [100, 200, 150, 300, 80, 120]
}
index = pd.MultiIndex.from_tuples(
    [('USA', 'New York'), ('USA', 'Chicago'),
     ('UK', 'London'), ('UK', 'Manchester'),
     ('DE', 'Berlin'), ('DE', 'Munich')],
    names=['country', 'city']
)
df = pd.DataFrame(data, index=index)
print(df)
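Once the index exists, it can be inspected level by level. The following is a minimal sketch that rebuilds the same sales DataFrame and looks at its index through the nlevels, names, and levels attributes and the get_level_values method; the variable names (countries, london_sales) are illustrative only.

```python
import pandas as pd

# Rebuild the sales example so this snippet runs on its own
index = pd.MultiIndex.from_tuples(
    [('USA', 'New York'), ('USA', 'Chicago'),
     ('UK', 'London'), ('UK', 'Manchester'),
     ('DE', 'Berlin'), ('DE', 'Munich')],
    names=['country', 'city']
)
df = pd.DataFrame({'sales': [100, 200, 150, 300, 80, 120]}, index=index)

# How many levels the index has, and what they are called
print(df.index.nlevels)
print(list(df.index.names))

# The unique labels stored for the first level (country)
print(df.index.levels[0])

# The country label of every row, in row order
countries = df.index.get_level_values('country')
print(countries)

# A full tuple works like a composite key and identifies a single row
london_sales = df.loc[('UK', 'London'), 'sales']
print(london_sales)
```

Note that levels holds the unique labels of each level, while get_level_values returns one label per row, so the two are not interchangeable.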

Creating a MultiIndex with from_tuples

pd.MultiIndex.from_tuples(list_of_tuples, names=[...]) is the most explicit way to create a MultiIndex. Each tuple becomes one row label, and the element at each position in the tuple is the label for the corresponding level: the first element belongs to the first level, the second to the second level, and so on. The names parameter assigns a name to each level (e.g. ['year', 'quarter']). This method is useful when you already have a list of composite keys that you want to use as the row index.

import pandas as pd

# Multi-level time index: year x quarter
tuples = [
    (2022, 'Q1'), (2022, 'Q2'), (2022, 'Q3'), (2022, 'Q4'),
    (2023, 'Q1'), (2023, 'Q2'), (2023, 'Q3'), (2023, 'Q4')
]
mi = pd.MultiIndex.from_tuples(tuples, names=['year', 'quarter'])
revenue = [120, 135, 145, 160, 130, 148, 162, 175]
# The result is a Series, not a DataFrame, so the variable is named accordingly
s = pd.Series(revenue, index=mi, name='revenue_M')
print(s)
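The lesson summary also mentions set_index on multiple columns. When the key values already live in ordinary columns, passing a list of column names to set_index builds the MultiIndex for you. The sketch below assumes a small hypothetical flat table (flat) whose columns and values mirror the year and quarter example above, and checks that the result matches an index built with from_tuples.

```python
import pandas as pd

# Hypothetical flat table: the keys are ordinary columns to begin with
flat = pd.DataFrame({
    'year': [2022, 2022, 2023, 2023],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'revenue_M': [120, 135, 130, 148],
})

# Passing a list of column names moves those columns into a MultiIndex
indexed = flat.set_index(['year', 'quarter'])
print(indexed)

# The level names are taken from the column names
print(list(indexed.index.names))

# The same index built explicitly with from_tuples, for comparison
expected = pd.MultiIndex.from_tuples(
    [(2022, 'Q1'), (2022, 'Q2'), (2023, 'Q1'), (2023, 'Q2')],
    names=['year', 'quarter']
)
same_index = indexed.index.equals(expected)
print(same_index)
```

set_index returns a new DataFrame by default, so flat itself keeps its original columns.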
