Pandas & NumPy Academy · Lesson

Useful Series Methods

Use describe(), value_counts(), unique(), and map() to summarise and transform a Series quickly.

Quick Summary with describe()

s.describe() generates a statistical summary in a single call: count, mean, standard deviation (std), min, 25th percentile (Q1), median (shown as 50%), 75th percentile (Q3), and max. For a string Series it returns count, unique, top (the most frequent value), and freq (how many times that value occurs) instead. It is a quick way to get a feel for a column's distribution when first exploring a dataset.

import pandas as pd

ages = pd.Series([25, 32, 19, 45, 28, 31, 22, 40])
print(ages.describe())
# count     8.000000
# mean     30.250000
# std       8.811...
# min      19.000000
# 25%      24.250000
# 50%      29.500000
# 75%      34.000000
# max      45.000000
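
The string case described above can be sketched the same way. This is a minimal illustration, assuming a small made-up Series named cities (the data and variable names are not from the lesson). It shows that describe() returns a Series, so each statistic can be read by its label.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
cities = pd.Series(["Paris", "Rome", "Paris", "Oslo", "Paris", "Rome"])

summary = cities.describe()
print(summary)

# describe() returns a Series, so each statistic can be read by label.
most_common = summary["top"]   # the most frequent value
times_seen = summary["freq"]   # how many times that value occurs
print(most_common, times_seen)
```

Reading the labels directly is handy when only one statistic is needed, for example the most common value in a column.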

value_counts() for Frequency Tables

s.value_counts() returns a new Series with the unique values as the index and their occurrence counts as the data, sorted in descending order of count. Missing values are excluded by default. Pass normalize=True to get proportions instead of raw counts. This is usually the first thing to run on a categorical column to understand its distribution.

import pandas as pd

colors = pd.Series(['red', 'blue', 'red', 'green', 'blue', 'red'])
print(colors.value_counts())
# red      3
# blue     2
# green    1
print(colors.value_counts(normalize=True).round(2))
# red      0.50
# blue     0.33
# green    0.17
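
Because the result is itself a Series, the usual indexing tools apply to it. The sketch below reuses the colors data from the example; the variable names (counts, most_common, red_count, shares, total_share) are illustrative choices, not part of the lesson.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'red', 'green', 'blue', 'red'])
counts = colors.value_counts()

# The index holds the unique values, ordered from most to least frequent.
most_common = counts.index[0]

# Look up the count for a single value by its label.
red_count = counts.loc['red']

# With normalize=True the proportions add up to 1 (up to float rounding).
shares = colors.value_counts(normalize=True)
total_share = shares.sum()

print(most_common, red_count, total_share)
```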
