Hive Tutorial 3: Working with Databases


Actually, there are no “real” databases in Hive or Hadoop (unless you install HBase or so). All data is stored in files. However, with HiveQL, you get the feeling that it actually are databases. Therefore, we start by creating “databases” as a first start with Hive.

The syntax for creating databases is very easy:

CREATE DATABASE [IF NOT EXISTS] database_name [COMMENT] [LOCATION] [WITH DBPROPERTIES] 

The easiest way to write is “CREATE DATABASE db”. All other options are optional:

  • IF NOT EXISTS: The new database is only created if it doesn’t exist already. If you don’t use this option and the database already exists, an error would be displayed.
  • COMMENT: Provides a comment for a new database, in case this is needed for further explanation.
  • LOCATION: Specifies a hdfs path for the new database.
  • WITH DBPROPERTIES: Specifies some additional properties for the database.

Deleting a database is also very similar to this. You can do this with the following syntax:

DROP DATABASE [IF EXISTS] 
database_name [CASCADE or RESTRICT]

Also here, the statement “DROP DATABASE db” is the easiest one. All other options are optional:

  • IF EXISTS: Prior deletion, checks if the database actually exists. If this command isn’t used and the database doesn’t exist, an error will be displayed.
  • CASCADE: Deletes tables first. If a database is marked for deletion but contains tables, an error would be produced otherwise
  • RESTRICT: Standard behavior for deletion. Would run into an error if tables exist in a database.

Easy, isn’t it? Now, let’s have some fun with Hive and create some Databases. Start the container we’ve created last time with Docker. Starting takes some time. Also make sure to start the hdp-proxy container. If you run into a bad gateway error (502), just wait some time and re-try. After that, you should be able to access Ambari again. Scroll down to “Data Analytics Studio” and click on “Data Analytics Studio UI”. You are then re-directed to a UI where you can write queries. The following image shows the welcome screen. Note: you might get to an error page, since it might wants to re-direct you to a wrong URL. exchange the URL with “127.0.0.1:30800” and you should be fine.

Hortonworks Data Analytics Studio

First, let’s create a new database. We call our database “university”. Note that we will use this database also over the next tutorials. Simply click on “Compose Query”. You should now see the query editor. Enter the following code:

CREATE DATABASE IF NOT EXISTS university;

After clicking “Execute”, your database will be created. The following image shows this:

Data Analytics Studio Query Editor

We also add the “IF NOT EXISTS” statement in order to not run into errors. We can delete the database with the DROP statement:

DROP DATABASE IF EXISTS university;

Re-create the database in case you did the drop now. In the next tutorial, we will look into how to work with tables in Hive.

I lead a team of Senior Experts in Data & Data Science as Head of Data & Analytics and AI at A1 Telekom Austria Group. I also teach this topic at various universities and frequently speak at various Conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian Universities. In my home country (Austria) I am part of several organisations on Big Data & Data Science.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s