Hadoop Tutorial – Apache Hive and Apache HCatalog


Hive is one of the easiest Hadoop tools to use. Apache Hive is a data-warehousing tool for Hadoop that focuses on large datasets and on imposing a structure on them. Hive queries are written in HiveQL, which is very similar to SQL but not identical, so it is easy to learn for anyone with a strong SQL background. As already mentioned, HiveQL is translated into MapReduce jobs and therefore comes with minor performance trade-offs. HiveQL can be extended with custom code and MapReduce queries, which is useful when additional performance is required.

The following listings show some Hive queries. The first listing shows how to query two columns from a dataset.

hive> SELECT column1, column2 FROM dataset;
2 5
4 9
5 7
5 9

Listing 2: simple Hive query

The next sample shows how to include a where-clause.

hive> SELECT DISTINCT column1 FROM dataset WHERE column2 = 91;

Listing 3: where in
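The two query shapes above can also be tried without a Hadoop cluster. The following is a minimal sketch, not Hive itself: it runs the same SELECT statements against an in-memory SQLite database from Python, which works here only because these particular statements are plain SQL that HiveQL shares. The sample rows are an assumption reconstructed from the output of the first listing, the function names are hypothetical, and the filter value 9 is chosen purely for illustration so that the where-clause matches something.

```python
import sqlite3

# Assumed sample data: reconstructed from the output of the first listing.
ROWS = [(2, 5), (4, 9), (5, 7), (5, 9)]


def make_dataset():
    """Create an in-memory table named 'dataset' with two integer columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (column1 INTEGER, column2 INTEGER)")
    conn.executemany("INSERT INTO dataset VALUES (?, ?)", ROWS)
    return conn


def select_two_columns(conn):
    """Same shape as the first listing: select two columns from the table."""
    cur = conn.execute(
        "SELECT column1, column2 FROM dataset ORDER BY column1, column2"
    )
    return cur.fetchall()


def select_distinct_where(conn, value):
    """Same shape as the second listing: DISTINCT combined with a where-clause."""
    cur = conn.execute(
        "SELECT DISTINCT column1 FROM dataset WHERE column2 = ? ORDER BY column1",
        (value,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = make_dataset()
    print(select_two_columns(conn))        # [(2, 5), (4, 9), (5, 7), (5, 9)]
    print(select_distinct_where(conn, 9))  # [(4,), (5,)]
    conn.close()
```

The ORDER BY clauses are added only to make the output deterministic. In Hive the same statements would be compiled into MapReduce jobs, so this sketch says nothing about Hive's performance.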
