In our last tutorial, we gave a brief introduction to Apache Spark. In this tutorial, we will look at how to set up an environment to work with Apache Spark. To get started, we first need to install Docker. If you don’t have it yet, you can find out how to install it from this link: https://docs.docker.com/install/

Once Docker is installed successfully, download the container for Spark via Kitematic. Select “all-spark-notebook” for our samples. Note that the download will take a while.
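If you prefer the command line to Kitematic, the same container can also be pulled directly from Docker Hub. The image name below, jupyter/all-spark-notebook from the Jupyter Docker Stacks, is our assumption about which image the Kitematic entry resolves to:

    docker pull jupyter/all-spark-notebook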

Download Apache Spark for Docker

Once the download has finished, it is time to start your Docker container. When you download the container via Kitematic, it is started by default. In the container logs, you can see the URL and port to which Jupyter is mapped. Open the URL and enter the token. When everything works as expected, you can now create new notebooks in Jupyter.
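If you prefer to start the container manually instead of through Kitematic, a minimal sketch looks like this. The port mapping and the container name are assumptions (8888 is Jupyter’s default port; Kitematic may map a different host port in your setup):

    # Start the notebook container; map Jupyter's default port 8888 to the host
    # (the container name "spark-notebook" is just an illustrative choice)
    docker run -d -p 8888:8888 --name spark-notebook jupyter/all-spark-notebook

    # The startup log prints the Jupyter URL together with the access token
    docker logs spark-notebook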

Enter the URL and the Token
Jupyter is running
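To check that Spark itself is working, you can run a small job in a fresh notebook. The following is a minimal sketch assuming the pyspark package that ships with the all-spark-notebook image; local[*] simply runs Spark on all local cores:

    # Create a local Spark session inside the notebook
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("SetupTest") \
        .getOrCreate()

    # Distribute a small range and sum it to confirm Spark executes jobs
    rdd = spark.sparkContext.parallelize(range(1, 101))
    print(rdd.sum())  # prints 5050 if everything is wired up correctly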