Kick Start Apache Pig – Apache Pig e-book

Kick Start Hadoop: Apache Pig

Getting started with Data Science on Hadoop

Kick Start Hadoop: Apache Pig is an e-book on the Hadoop technology Apache Pig. The focus of the Kick Start series is to provide a very fast entry into a new technology. This e-book is useful if you need to build up knowledge of Pig within hours and don’t want to spend weeks learning the content. It is aimed at consultants, managers, trainers, students and sales staff who need an overview of Apache Pig. The book is all about getting you started fast, without spending days or even weeks trying to understand the technology.

This Kick Start is more technical than the others in the series, and some knowledge of Hadoop is necessary. It comes with many samples that can be downloaded from the book homepage, so you learn Pig by example and can start working with it in little time.

You can find the e-book on Amazon here.

You can download the sample files here: Sample Files

From the content:


1 Introduction
1.1 Overview on Big Data
1.2 What is Hadoop and why is it important for Big Data?
1.3 The Hadoop Stack
2 Getting Started
2.1 Using the Hortonworks VM
2.1.1 Starting and accessing the virtual machine
2.2 The Datasets
3 Data Types in Apache Pig
3.1 Basic Data Types
3.1.1 Creating Schemas for Data types
3.1.2 Expressions and Operators in Apache Pig
3.1.3 Casting Data in Apache Pig
3.1.4 Working with Null Values in Apache Pig
3.1.5 Working with Strings in Apache Pig
3.1.6 Working with Boolean values in Apache Pig
3.1.7 Working with Date and Time in Apache Pig
3.2 Complex Data Types
3.2.1 Operations on complex data types
3.2.2 Constructors
3.2.3 Dereference Operators
4 Accessing and Storing Data with Apache Pig
4.1 Basics
4.2 Handling Files
4.2.1 Text and Binary Files
4.2.2 Working with CSV Files
4.2.3 Working with JSON Files
4.2.4 Working with XML Files
4.3 Accessing Databases
4.3.1 Working with HCatalog
4.3.2 Working with HBase
5 Relational Statements in Apache Pig
5.1 Grouping and Filtering Data
5.1.1 Assert
5.1.2 Cogroup / Group
5.1.3 Distinct
5.1.4 Filter
5.1.5 Foreach
5.1.6 Limit
5.1.7 Order By
5.1.8 Rank
5.2 Joining Data
5.2.1 Cross
5.2.2 Join
5.2.3 Split
5.2.4 Union
6 Functions in Apache Pig
6.1 Mathematical Functions
6.2 Evaluation Functions
Table of Figures
Code Listings

Cover Image Copyright: Pete
Cover Image Licensed under the Creative Commons License 2.0