An elephant is the logo for Hadoop

This is the overview page for the Apache Hadoop Tutorial, which gives you a quick introduction to Apache Hadoop. It is partly taken from the E-book about Hadoop, which you can get for free if you subscribe to this blog here.

Apache Hadoop Tutorial Contents

  1. Getting started with Hadoop: An introduction to and overview of the Hadoop technology
  2. The Hadoop technology stack: The different technologies that make up the Hadoop tech stack
  3. Apache Ambari for Cluster Management: How to manage and administer a Hadoop cluster easily with Ambari
  4. Apache ZooKeeper for distributed coordination: How Hadoop manages “itself”, with insights into ZooKeeper
  5. Managing Workflows with Oozie: How Hadoop manages workflows internally
  6. The Hadoop distributed file system (HDFS): How the file system works in Hadoop and how files are distributed
  7. YARN in Hadoop: The resource manager at the core of the distributed system
  8. Apache HBase: The distributed key/value store in Hadoop
  9. Apache Accumulo: A distributed database in Hadoop
  10. MapReduce: The initial idea of how to process large amounts of data
  11. Apache Hive: Distributed SQL queries in Hadoop, because people need to work with SQL
  12. Pig: A data-flow-oriented language in Hadoop
  13. Apache Storm: Real-time data processing in Hadoop
  14. Apache Giraph: Graph processing in Hadoop
  15. Mahout: Data Science in Hadoop
  16. Apache Flume: Collecting and moving log data into Hadoop
  17. Sqoop: Importing large amounts of data
  18. Apache Avro: Data serialisation in Hadoop

There is much more to learn about Data Science and Data Engineering. A logical next step after this tutorial is to learn about Apache Spark in this tutorial. You might also want to learn the basics of Machine Learning in this tutorial. If you want to know more about Python for Data Science, you can learn this here. If you want to learn more about Apache Hive, you can do this here. The official Hadoop website is here.

