In the last tutorial about Hive, we had a look at how to insert data into hive. Now, that we have the data in Hive, we look at how to access this data. Querying data is very easy and can be done by some easy steps with the hive SELECT statement. In it’s easiest form, it looks like the following:
The Hive select statement
SELECT fieldnames FROM tablename;
- fieldnames: Name of the fields to query – e.g. ID, firstname, lastname
- tablename: Name of the table to query the fields from – e.g. students
Of course, there is much more than that. After the tablename, you could specify a “WHERE” statement. This statement is capable of filtering data on specific criterias. A sample would be to limit the number of results only to students that are younger than 18. The following describes the Where-Statement.
SELECT fieldnames FROM tablename WHERE wherestatement;
Often, you want to order data in a certain way. This can be achieved by the “ORDER BY” statement. With this statement, you can order by the specified fields. A sample would be to sort them based on age. The statement is written like this:
SELECT fieldnames FROM tablename ORDER BY orderstatement;
Often, you only want to have a certain number of results returned. This can be done with the “LIMIT” statement. E.g. you only want to have the 10 most relevant items returned:
SELECT fieldnames FROM tablename LIMIT number;
Another common case is to group results by specific fields. This is useful if you want to create some functions on that:
SELECT fieldnames FROM tablename GROUP BY fieldnames;
Let’s now look at some code below:
SELECT * FROM university.students;
SELECT * FROM university.students WHERE gender = "female";
SELECT * FROM university.students WHERE gender = "female" ORDER BY lastname;
SELECT * FROM (
SELECT lastname, gender FROM university.students) sq;
In the first query, we want to return all students from the table. The second query only returns female students. The third one is ordering them by age. The last query shows that queries can be based on queries.
There are much more functions that can be applied and they can be chained (e.g. ORDER BY and WHERE).
This tutorial is part of the Apache Hive Tutorials. For more information about Hive, you might also visit the official page.