Pandas & NumPy Academy · Lesson

pd.merge: Inner and Outer Joins

Merge two DataFrames on a shared key column using inner and outer joins, and understand which rows are kept or dropped.

What Is a Join?

A join combines two DataFrames by matching rows on a shared key column, the same concept as SQL's JOIN clause. Pandas implements joins through pd.merge(). Unlike pd.concat(), which simply stacks DataFrames along an axis, pd.merge() aligns rows from two different tables by matching values in one or more columns.
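
To make the contrast between stacking and joining concrete, here is a minimal sketch. The frames left and right and the column names key, a, and b are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Two small hypothetical tables that share a 'key' column.
left = pd.DataFrame({'key': [1, 2], 'a': ['x', 'y']})
right = pd.DataFrame({'key': [2, 3], 'b': ['p', 'q']})

# concat stacks the rows of both frames; columns missing from one
# frame are filled with NaN, and no matching on 'key' takes place.
stacked = pd.concat([left, right], ignore_index=True)

# merge matches rows whose 'key' values are equal; with the default
# how='inner', only key == 2 appears in both frames.
merged = pd.merge(left, right, on='key')

print(stacked.shape)  # (4, 3)
print(merged.shape)   # (1, 3)
```

The stacked result keeps all four input rows, while the merged result has a single row carrying columns from both frames.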

The Shared Key Column

For a merge to match any rows, both DataFrames need at least one column with shared values: the key column. For example, an orders table has a customer_id column, and a customers table also has a customer_id column. Merging on customer_id attaches customer details to each order row. Specify the key with the on parameter when both DataFrames share the same column name.

import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'customer_id': [101, 102, 101, 103],
    'amount': [250, 80, 320, 150]
})

customers = pd.DataFrame({
    'customer_id': [101, 102, 104],
    'name': ['Alice', 'Bob', 'Diana']
})
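
With these two tables, the inner and outer joins from the lesson title could be sketched as follows. The how and indicator arguments are standard pd.merge() parameters; the variable names inner and outer are just illustrative, and the tables are redefined so the snippet runs on its own.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'customer_id': [101, 102, 101, 103],
    'amount': [250, 80, 320, 150]
})

customers = pd.DataFrame({
    'customer_id': [101, 102, 104],
    'name': ['Alice', 'Bob', 'Diana']
})

# Inner join (the default): keep only rows whose customer_id
# appears in BOTH tables. Order 4 (customer 103) and Diana
# (customer 104) have no partner, so they are dropped.
inner = pd.merge(orders, customers, on='customer_id', how='inner')

# Outer join: keep every customer_id from either table. Cells with
# no match are filled with NaN. indicator=True adds a '_merge'
# column recording where each row came from.
outer = pd.merge(orders, customers, on='customer_id',
                 how='outer', indicator=True)

print(inner)
print(outer)
```

The inner result has three rows (the orders for customers 101 and 102). The outer result has five: those three, plus order 4 with a missing name, plus Diana with a missing order_id and amount. Row order in the outer result can vary between pandas versions, so avoid relying on it.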
