Apache Hadoop Tutorial

Hi there,

this is the overview page for the Apache Hadoop Tutorial, which gives you a quick overview of Apache Hadoop. It is partly taken from the e-book about Hadoop, which you can get for free if you subscribe to this blog.

Table of Contents

  1. Getting started with Hadoop: Overview of the Hadoop technology
  2. The Hadoop technology stack: The different technologies that make up the Hadoop tech stack
  3. Apache Ambari for Cluster Management: How to manage a Hadoop cluster with Ambari
  4. Apache ZooKeeper for distributed coordination: How Hadoop manages “itself”
  5. Managing Workflows with Oozie: How Hadoop manages workflows internally
  6. The Hadoop distributed file system (HDFS): How the file system works in Hadoop
  7. Apache YARN in Hadoop: The core of the distributed system
  8. Apache HBase: The distributed key/value store in Hadoop
  9. Apache Accumulo: A distributed database in Hadoop
  10. MapReduce: the initial idea of how to process large amounts of data
  11. Apache Hive: distributed SQL queries in Hadoop
  12. Apache Pig: a data-flow oriented language in Hadoop
  13. Apache Storm: real-time data processing in Hadoop
  14. Apache Giraph: Graph processing in Hadoop
  15. Apache Mahout: Data Science in Hadoop
  16. Apache Flume: Collecting and moving log data in Hadoop
  17. Apache Sqoop: Import large amounts of data
  18. Apache Avro: Data serialisation in Hadoop
