This is the overview page for the Apache Hadoop tutorial. The tutorial gives you a quick overview of Apache Hadoop and is partly taken from the e-book about Hadoop, which you can get for free by subscribing to this blog here.
Table of Contents
- Getting started with Hadoop: Overview of the Hadoop technology
- The Hadoop technology stack: The different technologies that make up the Hadoop tech stack.
- Apache Ambari for Cluster Management: How to manage a Hadoop cluster with Ambari
- Apache Zookeeper for distributed coordination: How Hadoop manages “itself”.
- Managing Workflows with Oozie: How Hadoop manages workflows internally
- The Hadoop distributed file system (HDFS): How the file system works in Hadoop
- Apache Yarn in Hadoop: The core of the distributed system
- Apache HBase: The distributed key/value store in Hadoop
- Apache Accumulo: A distributed database in Hadoop
- MapReduce: the initial idea of how to process large amounts of data
- Apache Hive: distributed SQL queries in Hadoop
- Apache Pig: a data-flow oriented language in Hadoop
- Apache Storm: real-time data processing in Hadoop
- Apache Giraph: Graph processing in Hadoop
- Apache Mahout: Data Science in Hadoop
- Apache Flume: Collecting log data into Hadoop
- Apache Sqoop: Importing large amounts of data from relational databases
- Apache Avro: Data serialisation in Hadoop
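To give a first taste of the MapReduce idea mentioned in the list above, here is a minimal word-count sketch in plain Python. This is only a conceptual illustration of the three phases (map, shuffle, reduce), not Hadoop's actual Java API; all function names here are made up for the example.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop stores data", "Hadoop processes data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real Hadoop cluster, the map and reduce functions run in parallel on many machines and the shuffle moves data across the network; the later tutorial chapters on MapReduce and YARN cover how that works.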