Scala for Backend Engineering & Functional Programming · Lesson

RDDs and DataFrames

Spark data abstractions.

What Is Apache Spark?

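Apache Spark is a distributed engine for large-scale data processing. It splits a dataset into partitions spread across the machines of a cluster and runs computations on those partitions in parallel. Spark itself is written in Scala, so the Scala API is its native interface and gives concise, type-aware code.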
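To make the idea of splitting data and computing in parallel concrete, here is a minimal sketch. It assumes the spark-sql library is on the classpath and uses a local master ("local[4]", four worker threads in one JVM) in place of a real cluster; the object name ParallelSumSketch and the method name run are hypothetical, chosen only for this illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark itself.
object ParallelSumSketch {

  // Returns (number of partitions, sum of 1..1000).
  def run(): (Int, Long) = {
    // Assumption: "local[4]" stands in for a cluster by running
    // Spark in-process with 4 worker threads.
    val spark = SparkSession.builder()
      .appName("parallel-sum-sketch")
      .master("local[4]")
      .getOrCreate()

    try {
      // Distribute the numbers 1..1000 across 4 partitions.
      val numbers = spark.sparkContext.parallelize(1 to 1000, numSlices = 4)

      // Each partition is summed independently, then the partial
      // sums are combined into one result on the driver.
      val total = numbers.map(_.toLong).reduce(_ + _)

      (numbers.getNumPartitions, total)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (partitions, total) = run()
    println(s"partitions = $partitions, sum = $total")
  }
}
```

On a real cluster the same code would run unchanged apart from the master setting, which is usually supplied at submit time rather than hard-coded.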

The RDD

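The Resilient Distributed Dataset (RDD) is Spark's low-level abstraction: an immutable, partitioned collection that can be processed in parallel. Each RDD records its lineage, the chain of transformations that produced it, so a partition lost to a node failure can be recomputed from its source data instead of being restored from a replica.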
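The properties above can be sketched in a few lines. This example assumes the spark-sql library is available and runs with a local master; the object name RddLineageSketch and the method name evenSquares are hypothetical. It shows that transformations return new RDDs rather than modifying the original, and it uses toDebugString to print the lineage Spark would replay to rebuild a lost partition.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark itself.
object RddLineageSketch {

  // Returns the even squares of 1..10 plus the textual lineage of that RDD.
  def evenSquares(): (Seq[Int], String) = {
    // Assumption: a local master with 2 threads stands in for a cluster.
    val spark = SparkSession.builder()
      .appName("rdd-lineage-sketch")
      .master("local[2]")
      .getOrCreate()

    try {
      val sc = spark.sparkContext

      // A partitioned collection: 10 numbers split into 2 partitions.
      val base = sc.parallelize(1 to 10, numSlices = 2)

      // RDDs are immutable: map and filter each return a NEW RDD
      // and leave `base` untouched.
      val squares = base.map(n => n * n)
      val evens = squares.filter(_ % 2 == 0)

      // The lineage lists the parent RDDs that `evens` depends on.
      val lineage = evens.toDebugString

      // collect() is an action: it runs the job and brings the
      // results back to the driver in partition order.
      (evens.collect().toSeq, lineage)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (values, lineage) = evenSquares()
    println(values.mkString(", "))
    println(lineage)
  }
}
```

The printed lineage lists the RDD created by parallelize as the root, with the map and filter steps layered on top of it. That recorded chain is what allows recovery without replicating the data itself.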
