Aggregations
Group and aggregate.
Why Aggregate?
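Aggregations summarize many rows into fewer rows: totals, averages, or counts per group. In Spark they are wide transformations. Rows that belong to the same group must be brought together, which usually means shuffling data across the cluster. Spark reduces that cost by first computing partial aggregates within each partition and shuffling only those partial results.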
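As a rough illustration of a per-group aggregate, the sketch below groups a small in-memory DataFrame by department and computes a row count and an average age for each group. The local session setup, the sample rows, and the column names name, dept, and age are assumptions made for this example. Calling explain() prints the physical plan; for a groupBy you would typically find an Exchange (shuffle) step in it, although the exact plan text varies with the Spark version and settings such as adaptive query execution.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Local session for experimentation; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("group-aggregate-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: (name, dept, age)
val people = Seq(
  ("alice", "eng", 34),
  ("bob",   "eng", 28),
  ("carol", "ops", 45),
  ("dave",  "ops", 31)
).toDF("name", "dept", "age")

// One output row per distinct dept value.
val perDept = people
  .groupBy("dept")
  .agg(
    count("*").as("rows"),   // counts rows in each group (LongType)
    avg("age").as("avg_age") // mean age in each group (DoubleType)
  )

perDept.explain()              // typically shows an Exchange (shuffle) keyed on dept
perDept.orderBy("dept").show()

// Bring the small result back to the driver as a Map for inspection.
val byDept: Map[String, (Long, Double)] = perDept.collect().map { r =>
  r.getAs[String]("dept") -> (r.getAs[Long]("rows"), r.getAs[Double]("avg_age"))
}.toMap

spark.stop()
```

Collecting to the driver is only reasonable here because the grouped result is tiny; on real data you would usually keep working with the DataFrame or write it out.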
Global Aggregates
Compute a single summary row over the whole DataFrame by passing aggregate functions such as count, sum, avg, min, and max (from org.apache.spark.sql.functions) to agg.
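import org.apache.spark.sql.functions._

// One summary row for the whole DataFrame (assumes df has an age column)
df.agg(
  count("*").as("rows"),    // total number of rows
  avg("age").as("avg_age")  // mean of the age column
).show()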
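The snippet above assumes an existing DataFrame df with an age column. A self-contained sketch that also exercises sum, min, and max might look like the following; the sample rows and the column names are hypothetical. Because there is no groupBy, agg collapses the whole DataFrame into exactly one row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("global-aggregate-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with an integer age column.
val df = Seq(("alice", 34), ("bob", 28), ("carol", 45), ("dave", 31)).toDF("name", "age")

// No grouping: every function is evaluated over all rows.
val summary = df.agg(
  count("*").as("rows"),
  sum("age").as("total_age"), // sum of an integer column comes back as a long
  avg("age").as("avg_age"),   // avg returns a double
  min("age").as("min_age"),   // min/max keep the input type (int here)
  max("age").as("max_age")
)
summary.show()

// The result has exactly one row; read its fields by name.
val row      = summary.head()
val rows     = row.getAs[Long]("rows")
val totalAge = row.getAs[Long]("total_age")
val avgAge   = row.getAs[Double]("avg_age")
val minAge   = row.getAs[Int]("min_age")
val maxAge   = row.getAs[Int]("max_age")

spark.stop()
```

Note the result types: count and sum over integers produce longs and avg produces a double, so reading them back with the matching getAs type avoids a ClassCastException.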