This is the overview page for the Apache Hadoop tutorial. It gives you a quick overview of Apache Hadoop and is partly taken from the e-book about Hadoop, which you can get for free by subscribing to this blog here.
Apache Hadoop Tutorial Contents
- Getting started with Hadoop: Overview of the Hadoop technology and introduction
- The Hadoop technology stack: The different technologies that make up the Hadoop technology stack.
- Apache Ambari for Cluster Management: How to manage and administer a Hadoop cluster with Ambari
- Apache Zookeeper for distributed coordination: How Hadoop manages “itself” and insights into Zookeeper.
- Managing Workflows with Oozie: How Hadoop manages workflows internally
- The Hadoop distributed file system (HDFS): How the file system works in Hadoop and how files are distributed
- YARN in Hadoop: The resource manager at the core of the distributed system
- Apache HBase: The distributed key/value store in Hadoop
- Apache Accumulo: A distributed database in Hadoop
- MapReduce: the initial idea of how to process large amounts of data
- Apache Hive: Distributed SQL queries in Hadoop, because people need to work with SQL
- Pig: a data-flow oriented language in Hadoop
- Apache Storm: real-time data processing in Hadoop
- Apache Giraph: Graph processing in Hadoop
- Mahout: Machine learning and data science in Hadoop
- Apache Flume: Collecting and moving log data into Hadoop
- Sqoop: Importing large amounts of data between Hadoop and relational databases
- Apache Avro: Data serialisation in Hadoop
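To give a first taste of what the MapReduce chapter covers, the classic word-count example can be sketched in plain Python. This only mimics the map, shuffle, and reduce phases conceptually; a real Hadoop job would implement Mapper and Reducer classes in Java and run on YARN.

```python
# Illustrative sketch of the map -> shuffle -> reduce idea behind
# MapReduce, using plain Python (not the actual Hadoop API).
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in a line of input.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop is a framework", "Hadoop runs MapReduce jobs"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["hadoop"])  # "hadoop" appears once in each line
```

In a real cluster the map and reduce phases run in parallel across many nodes, and the shuffle moves data over the network; the chapters on MapReduce and YARN explain how Hadoop orchestrates this.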
There are several things you can learn next about Data Science and Data Engineering. A logical next step after this tutorial is the Apache Spark tutorial here. You might also want to learn the basics of Machine Learning in this tutorial, or read about Python for Data Science here. If you want to learn more about Apache Hive, you can do this here. The official Hadoop website is here.