Scala for Backend Engineering & Functional Programming · Lesson

RDDs and DataFrames

Spark data abstractions.

What Is Apache Spark?

Apache Spark is a distributed engine for large-scale data processing. It splits data into partitions across a cluster and runs computations on those partitions in parallel. Spark itself is written largely in Scala, so the Scala API is its native one: concise and statically typed.

The RDD

The Resilient Distributed Dataset (RDD) is Spark's low-level abstraction: an immutable, partitioned collection that can be processed in parallel and rebuilt from lineage if a node fails.
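
As a rough illustration of these properties, the sketch below builds a small RDD in local mode, derives a new RDD from it, and prints its lineage. It assumes Spark is on the classpath; the names `RddSketch` and `evenSquares`, the `local[*]` master, and the choice of four partitions are illustrative assumptions, not part of the lesson.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names are chosen for illustration only.
object RddSketch {

  // Builds an RDD of 1..10 split into 4 partitions, then derives a new RDD.
  // The original RDD is never modified: filter and map each return a new one.
  def evenSquares(sc: SparkContext): RDD[Int] = {
    val numbers = sc.parallelize(1 to 10, numSlices = 4)
    numbers.filter(_ % 2 == 0).map(n => n * n)
  }

  def main(args: Array[String]): Unit = {
    // Local mode runs Spark inside this JVM, so no cluster is needed to try it.
    val spark = SparkSession.builder()
      .appName("rdd-sketch")
      .master("local[*]")
      .getOrCreate()

    val result = evenSquares(spark.sparkContext)

    println(result.getNumPartitions)          // 4
    println(result.collect().mkString(", "))  // 4, 16, 36, 64, 100

    // The lineage: the chain of steps Spark would replay to rebuild
    // a lost partition after a node failure.
    println(result.toDebugString)

    spark.stop()
  }
}
```

Nothing is computed until `collect()` is called; before that point Spark has only recorded the lineage that `toDebugString` prints.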
