Hive is one of the easiest Hadoop tools to use. Its query language closely resembles SQL, so it is easy to learn for anyone with a strong SQL background. Apache Hive is a data-warehousing tool for Hadoop that focuses on large datasets and on imposing a structure on them.

Hive queries are written in HiveQL, which is very similar to SQL but not identical. As already mentioned, HiveQL is translated to MapReduce jobs and therefore comes with minor performance trade-offs. HiveQL can be extended with custom code and MapReduce queries, which is useful when additional performance is required.
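
One way to plug custom code into a query is Hive's TRANSFORM clause, which streams rows to an external script as tab-separated lines on standard input and reads the result rows back from standard output. The following is a minimal sketch of such a script in Python. The file name add_columns.py, the function names and the idea of appending the sum of two integer columns are assumptions chosen for illustration. The sketch also assumes that the input contains no NULL values.

```python
"""Hypothetical streaming script for Hive's TRANSFORM clause.

Assuming it is saved as add_columns.py, it could be called from Hive with:

    ADD FILE add_columns.py;
    SELECT TRANSFORM(column1, column2)
           USING 'python add_columns.py'
           AS (column1, column2, total)
    FROM dataset;
"""
import sys


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Assumes each row holds exactly two integer columns and appends
    their sum as a third column.
    """
    column1, column2 = line.rstrip("\n").split("\t")
    total = int(column1) + int(column2)
    return "\t".join([column1, column2, str(total)])


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive sends one row per line on standard input and expects
    # one result row per line on standard output.
    for line in stdin:
        if line.strip():  # skip blank lines
            stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Keeping the per-row logic in its own function makes the script easy to check locally before it is handed to Hive.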

The following listings show some Hive queries. The first listing shows how to query two columns from a dataset.

hive> SELECT column1, column2 FROM dataset;
2 5
4 9
5 7
5 9

Listing 2: simple Hive query

The next sample shows how to include a WHERE clause.

hive> SELECT DISTINCT column1 FROM dataset WHERE column2 = 91;

Listing 3: WHERE clause in Hive

HCatalog is an abstract table manager for Hadoop. Its goal is to make it easier for users to work with data: users see the data as if it were stored in a relational database. HCatalog can be accessed through a REST API.
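
HCatalog's REST interface is provided by the WebHCat server (formerly called Templeton). As a rough sketch of what a request looks like, the following Python code builds the URL that asks WebHCat to describe a table and, optionally, fetches the JSON answer. The host name, the database "default", the table "dataset" and the user "hive" are placeholders, and the port 50111 is only WebHCat's default, so all of them are assumptions to adapt to a real cluster.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def table_url(host, database, table, user, port=50111):
    """Build the WebHCat URL that describes one HCatalog table.

    All arguments are placeholders; 50111 is WebHCat's default port.
    """
    path = "/templeton/v1/ddl/database/{}/table/{}".format(
        quote(database, safe=""), quote(table, safe=""))
    # WebHCat expects the calling user in the user.name query parameter.
    query = urlencode({"user.name": user})
    return "http://{}:{}{}?{}".format(host, port, path, query)


def describe_table(host, database, table, user, port=50111):
    """Send the GET request and return the decoded JSON description.

    Needs a reachable WebHCat server, so it is not called here.
    """
    with urlopen(table_url(host, database, table, user, port)) as response:
        return json.loads(response.read().decode("utf-8"))


print(table_url("localhost", "default", "dataset", "hive"))
```

Separating URL construction from the network call means the request can be inspected without a running cluster.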