Scala for Backend Engineering & Functional Programming · Lesson

Aggregations

Group rows and compute summary values with Spark DataFrames.

Why Aggregate?

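Aggregations summarize many rows into fewer: totals, averages, or counts, either over the whole dataset or per group. In Spark they are wide operations: rows that belong to the same group can sit in different partitions, so Spark generally has to shuffle data across the cluster to combine them.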
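As a minimal sketch of a per-group aggregation, the example below collapses three rows into one row per city. The object name `GroupedAggregates`, the helper `perCity`, the sample data, and the local `SparkSession` setup are all assumptions made for illustration; they are not part of the lesson.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object GroupedAggregates {
  // Hypothetical helper: one output row per distinct value of "city".
  def perCity(people: DataFrame): DataFrame =
    people
      .groupBy("city")
      .agg(
        count("*").as("rows"),    // number of rows in each group
        avg("age").as("avg_age")  // mean age within each group
      )

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup).
    val spark = SparkSession.builder()
      .appName("grouped-aggregates")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: (city, age).
    val people = Seq(("Oslo", 30), ("Oslo", 40), ("Lima", 20)).toDF("city", "age")

    val result = perCity(people)
    result.show()

    // Prints the physical plan; for a grouped aggregate it usually
    // includes an Exchange step, which is where the shuffle happens.
    result.explain()

    spark.stop()
  }
}
```

Keeping the aggregation in a function that takes and returns a DataFrame makes it easy to reuse on real data and to check against a small in-memory sample like the one above.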

Global Aggregates

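Compute a single summary over the whole DataFrame by calling agg with functions such as count, sum, avg, min, and max from org.apache.spark.sql.functions. Without a preceding groupBy, the result is a DataFrame with exactly one row.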

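import org.apache.spark.sql.functions._

// df is assumed to be an existing DataFrame with an "age" column.
df.agg(
  count("*").as("rows"),     // total number of rows
  avg("age").as("avg_age")   // mean of the age column
).show()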
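The remaining functions named above (sum, min, max) are used the same way. The sketch below is self-contained so it can be run locally; the object name `GlobalAggregates`, the helper `summarize`, the column names, and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object GlobalAggregates {
  // Hypothetical helper: single-row summary of the whole DataFrame.
  // Expects an integer "age" column.
  def summarize(df: DataFrame): DataFrame =
    df.agg(
      count("*").as("rows"),
      sum("age").as("total_age"),
      min("age").as("min_age"),
      max("age").as("max_age")
    )

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup).
    val spark = SparkSession.builder()
      .appName("global-aggregates")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: (name, age).
    val df = Seq(("Ann", 30), ("Bob", 40), ("Cy", 20)).toDF("name", "age")

    // A global aggregate always yields one row, so head() is safe here.
    val row = summarize(df).head()
    println(s"rows=${row.getAs[Long]("rows")}, total=${row.getAs[Long]("total_age")}")

    spark.stop()
  }
}
```

Reading the values with head() instead of show() is one way to pull the summary into ordinary Scala values for further use.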
