Pandas for Data Manipulation
Learners will load CSV files into DataFrames, filter rows, select columns, handle missing values, and compute summary statistics.
What Is Pandas and Why Use It?
Pandas handles tabular data: rows and columns, like a spreadsheet. Its core structure, the DataFrame, is where you load and clean raw data before it ever reaches your model.
import pandas as pd
import numpy as np
# Create a DataFrame manually
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 70000, 90000, 60000],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales']
})
print(df)
print('\nShape:', df.shape) # (4, 4)
Loading Data from CSV Files
Load a file in one line with pd.read_csv(). Then always inspect it: .head(), .info(), and .describe() reveal shape, types, and missing values in seconds.
import pandas as pd
# Load a CSV
df = pd.read_csv('titanic.csv')
# First inspection
print(df.head()) # first 5 rows
print(df.tail(3)) # last 3 rows
df.info() # column types + non-null counts (prints directly; wrapping it in print() would also print None)
print(df.describe()) # count, mean, std, min, max for numeric cols
print(df.columns.tolist()) # column names
print(df.shape) # (891, 12)
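The inspection step above shows non-null counts, but it can help to count missing values directly. The following is a minimal sketch, not part of the original lesson. It assumes a small made-up CSV held in a string (the column names and values are hypothetical stand-ins for a real file such as titanic.csv) so that it runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
# Empty fields in the age and cabin columns are parsed as NaN.
csv_text = """id,name,age,cabin
1,Ann,29.0,B5
2,Ben,,
3,Cat,38.0,C85
4,Dan,,E12
"""

# pd.read_csv accepts a file-like object as well as a file path.
sample = pd.read_csv(io.StringIO(csv_text))

# Number of missing values in each column.
missing_counts = sample.isna().sum()
print(missing_counts)

# Fraction of missing values in each column (between 0 and 1).
missing_share = sample.isna().mean()
print(missing_share)
```

With a real file, replacing io.StringIO(csv_text) with the file path should give the same kind of per-column summary.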