Actually, there are no “real” databases in Hive or Hadoop (unless you install HBase or so). All data is stored in files. However, with HiveQL, you get the feeling that it actually are databases. Therefore, we start by creating “databases” as a first start with Hive.
The syntax for creating databases is very easy:
CREATE DATABASE [IF NOT EXISTS] database_name [COMMENT] [LOCATION] [WITH DBPROPERTIES]
The easiest way to write is “CREATE DATABASE db”. All other options are optional:
- IF NOT EXISTS: The new database is only created if it doesn’t exist already. If you don’t use this option and the database already exists, an error would be displayed.
- COMMENT: Provides a comment for a new database, in case this is needed for further explanation.
- LOCATION: Specifies a hdfs path for the new database.
- WITH DBPROPERTIES: Specifies some additional properties for the database.
Deleting a database is also very similar to this. You can do this with the following syntax:
DROP DATABASE [IF EXISTS]
database_name [CASCADE or RESTRICT]
Also here, the statement “DROP DATABASE db” is the easiest one. All other options are optional:
- IF EXISTS: Prior deletion, checks if the database actually exists. If this command isn’t used and the database doesn’t exist, an error will be displayed.
- CASCADE: Deletes tables first. If a database is marked for deletion but contains tables, an error would be produced otherwise
- RESTRICT: Standard behavior for deletion. Would run into an error if tables exist in a database.
Easy, isn’t it? Now, let’s have some fun with Hive and create some Databases. Start the container we’ve created last time with Docker. Starting takes some time. Also make sure to start the hdp-proxy container. If you run into a bad gateway error (502), just wait some time and re-try. After that, you should be able to access Ambari again. Scroll down to “Data Analytics Studio” and click on “Data Analytics Studio UI”. You are then re-directed to a UI where you can write queries. The following image shows the welcome screen. Note: you might get to an error page, since it might wants to re-direct you to a wrong URL. exchange the URL with “127.0.0.1:30800” and you should be fine.
First, let’s create a new database. We call our database “university”. Note that we will use this database also over the next tutorials. Simply click on “Compose Query”. You should now see the query editor. Enter the following code:
CREATE DATABASE IF NOT EXISTS university;
After clicking “Execute”, your database will be created. The following image shows this:
We also add the “IF NOT EXISTS” statement in order to not run into errors. We can delete the database with the DROP statement:
DROP DATABASE IF EXISTS university;
Re-create the database in case you did the drop now. In the next tutorial, we will look into how to work with tables in Hive.