Machine Learning Academy · Lesson

Pandas for Data Manipulation

Learners will load CSV files into DataFrames, filter rows, select columns, handle missing values, and compute summary statistics.

What Is Pandas and Why Use It?

Pandas handles tabular data: rows and columns, like a spreadsheet. Its core structure, the DataFrame, is where you load and clean raw data before it ever reaches your model.

import pandas as pd

# Create a DataFrame manually
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 70000, 90000, 60000],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales']
})
print(df)
print('\nShape:', df.shape)  # (4, 4)
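
The lesson goals also mention selecting columns, filtering rows, and computing summary statistics. The sketch below shows one common way to do each on the same four-row DataFrame; the variable names (names_and_salaries, engineers, senior_high_earners, by_department) are illustrative choices, not part of the original lesson.

```python
import pandas as pd

# Same small DataFrame as above, rebuilt so this block runs on its own
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 70000, 90000, 60000],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales']
})

# Select columns: a list of names inside [] returns a new DataFrame
names_and_salaries = df[['name', 'salary']]

# Filter rows: a boolean mask keeps only the rows where the condition is True
engineers = df[df['department'] == 'Engineering']

# Combine conditions with & (and) or | (or); each condition needs parentheses
senior_high_earners = df[(df['age'] >= 30) & (df['salary'] > 60000)]

# Summary statistics: one number for a column, or one number per group
mean_salary = df['salary'].mean()
by_department = df.groupby('department')['salary'].mean()

print(names_and_salaries)
print(engineers)
print(senior_high_earners)
print('Mean salary:', mean_salary)
print(by_department)
```

Boolean masks use & and | rather than Python's and / or, because the comparison runs element by element across the whole column.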

Loading Data from CSV Files

Load a file in one line with pd.read_csv(). Then inspect it right away: .head() shows sample rows, .info() lists column types and non-null counts (which exposes missing values), and .describe() summarises the numeric columns.

import pandas as pd

# Load a CSV
df = pd.read_csv('titanic.csv')

# First inspection
print(df.head())       # first 5 rows
print(df.tail(3))      # last 3 rows
df.info()              # prints column types + non-null counts (returns None, so no print() needed)
print(df.describe())   # count, mean, std, min, quartiles, max for numeric cols
print(df.columns.tolist())  # column names
print(df.shape)        # (891, 12)
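
Once the non-null counts show gaps, the next step is usually to count and handle the missing values. The sketch below is one possible approach, not the only one. To stay runnable without titanic.csv, it reads a tiny made-up CSV from a string, so the column names and values are hypothetical stand-ins rather than real Titanic data.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk (not real Titanic data)
csv_text = """name,age,fare
Ann,22,7.25
Ben,,71.28
Cat,26,
Dan,35,8.05
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value
dropped = df.dropna()
print(dropped)

# Option 2: fill the gaps, here with each column's median or mean
filled = df.fillna({
    'age': df['age'].median(),
    'fare': df['fare'].mean(),
})
print(filled)
```

Dropping rows is simplest but throws data away; filling keeps every row at the cost of inventing plausible values. Which is appropriate depends on how much data is missing and what the model needs.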
