This is the overview page for the Apache Spark Tutorial, which gives you a quick introduction to Apache Spark. The entire tutorial is written in Python (PySpark). If you are not familiar with Python, you can learn it with this Python Tutorial, which is aimed at people who already have software development experience. The goal of this tutorial is to get you started quickly with Spark and to cover its main aspects, such as Spark Dataframes, RDDs, Actions and the different Data Transformations.

Apache Spark Tutorial Content:

  1. Getting started with Apache Spark: the first part of the tutorial series
  2. Setting up the environment: how to use Jupyter and Spark
  3. Introduction to RDDs: the fundamental data structure of Spark and how to use it
  4. The first part on how to use Data Transformations on RDDs
  5. The second part on how to use Data Transformations on RDDs
  6. The third part on how to use Data Transformations on RDDs
  7. Actions in Spark: how to use Actions in Spark
  8. Spark Dataframes: how to work with the high-level API in Apache Spark
  9. Spark Dataframes: Filtering, Ordering and Grouping data
  10. Spark Dataframes: Aggregations in Apache Spark
  11. Spark Dataframes: Joining Data in Apache Spark
  12. Spark Dataframes: Limiting Data Results in Apache Spark
  13. Spark Dataframes: Dealing with wrong, corrupt or missing data in Spark
  14. Spark Dataframes: Working with Columns and Rows in Spark
  15. Spark Dataframes: Cubes and Rollups in Spark

If you want to learn everything about Apache Spark, make sure to visit the Spark Website. If you want to learn more about Data Science and Data Engineering, have a look at the other tutorials on those topics.


Apache Spark Source Code in Jupyter