Hive Tutorial 2: Setting up the Hortonworks Sandbox with Docker


In the following tutorials, we will use the Hortonworks Sandbox to use Hive. Hortonworks is one of the Hadoop distributions (next to Cloudera and MapR) and a pre-configured environment. There is no need for additional setup or installations. Hortonworks is delivered via different VMs or also as a Docker container. We use this, as it is the easiest way (and you don’t need to install any VM tools). To get started, download the latest Docker environment for your computer/mac: https://www.docker.com/get-started

Follow the installation routine throughout, it is easy and straight forward. Once done, download the Hortonworks image fromhttps://hortonworks.com/downloads/#sandbox

As an install type, select “Docker” and make sure that you have the latest version. As of writing this article, the current version of HDP (Hortonworks Data Platform) is 3.0. Once you have finished the download, execute the Docker file (on Linux and Mac: docker-deploy-hdp30.sh). After that, the script pulls several repositories. This takes some time, as it is several GB in size – so be patient!

The script also installs the HDP proxy tool, which might cause some errors. If you have whitespaces in your directories, you need to edit the HDP proxy sh file (e.g. with vim) and set all paths under “”. Then, everything should be fine.

The next step is to change the admin password in your image. To do this, you need to SSH into the machine with the following command:

docker exec -it sandbox-hdp /bin/bash

Execute the following command:

ambari-admin-password-reset

Now re-type the passwords and the services will re-start. After that, you are ready to use HDP 3.0. To access your hdp, use your local ip (127.0.0.1) with port 8080. Now, you should see the Ambari Login screen. Enter “admin” and your password (the one you reset in the step before). You are now re-directed to your administration interface and should see the following screen:

HDP 3.0 with Ambari

You might see that most of your services are somewhat red. In order to get them to work, you need to restart them. This takes some time again, so you need to be patient here. Once your services turned green, we are ready to go.

Have fun exploring HDP – we will use it in the next tutorial.

I lead a team of Senior Experts in Data & Data Science as Head of Data & Analytics and AI at A1 Telekom Austria Group. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data & Data Science.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s