Posts

Two E-Books on Big Data and Hadoop are available at a special promotional price. The reduced price is only valid for one week, so make sure to order soon! The offer expires on the 21st of December, and both E-Books are available on the Kindle store. The two E-Books are:

  • Big Data (Introduction); $0.99 instead of $5: Get it here
  • Hadoop (Introduction); $0.99 instead of $5: Get it here

Have fun reading them!

I’ve created a new E-Book providing an overview of the Hadoop technology. The usual price is 4.99 USD, but it is available until the end of the week for only 0.99 USD, which is a massive discount for early buyers. The E-Book gives an overview of the Hadoop projects and is intended for those who need to get started fast with Hadoop. It focuses on explaining the technology stack rather than the details of each individual technology.
From the cover:
Kick Start: Hadoop is an e-book on the Hadoop technology. The focus of the Kick Start series is to provide a very fast entry into a new technology. This e-book is useful if you need to build up knowledge of Hadoop within hours and don’t want to spend weeks learning the content. It is aimed at consultants, managers, trainers, students and sales staff who need an overview of all Hadoop technologies but don’t need to understand the technical details. This book is all about getting you started fast, without the need to spend days or even weeks trying to understand the technology.
From the Index:
1 Introduction
1.1 Overview on Big Data
1.2 What is Hadoop and why is it important for Big Data?
1.3 The Hadoop Stack
2 Cluster Management with Hadoop
2.1 Apache Ambari
2.2 ZooKeeper
2.3 Oozie
3 Infrastructure and Support
3.1 The Hadoop File System (HDFS)
3.2 Hadoop Commons
3.3 Apache Yarn
4 Storing Data with Hadoop
4.1 HBase
4.2 Accumulo
4.3 Other Databases
5 Accessing Data with Hadoop
5.1 MapReduce for Native Data Access
5.2 SQL Tools in Hadoop with Apache Hive and Apache HCatalog
5.3 Scripting Data with Apache Pig
5.4 Accessing Streaming Data with Apache Storm
5.5 Accessing Real-Time Data with Apache S4
5.6 Graph Data in Hadoop with Apache Giraph and Tez
6 Data Science in Hadoop with Apache Mahout
7 Data Governance and Data Integration In Hadoop
7.1 Apache Falcon
7.2 Apache Flume
7.3 Apache Sqoop
7.4 Apache Avro
8 User Interface in Hadoop with Apache Hue
You can obtain the E-Book on Amazon for Kindle here:

Kick Start: Big Data is an E-Book about Big Data. A Kick Start is an e-book that can be read in a short amount of time, so readers can get started really fast without having to invest days in reading a book. The goal of the Kick Start series is to teach all the important things about a specific topic in a short and easy-to-read e-book. The first of this series is on Big Data. Readers will learn what Big Data is, what core technologies are involved and where they can go from there. Some of the technologies featured in this e-book are Hadoop, NoSQL databases, data storage techniques, data analytics techniques and many more.

Available in Amazon Stores:

Index:

1 Introduction to Big Data
1.1 Defining Big Data
1.2 Characteristics for Big Data
2 Challenges for Big Data
2.1 Storage Performance
2.2 Different Storage Systems
2.3 Data partitioning and concurrency
2.4 Moving Data for Analysis
3 Creating Big Data Applications
3.1 Big Data Analysis iteration
4 Big Data Management
4.1 Hardware Foundations
4.1.1 Storage devices
4.1.2 Raid Systems
4.1.3 Requirements for private and public Cloud Solutions
4.2 Data Storage and Software attributes
4.2.1 Data Quality Attributes
4.2.2 CAP Theorem
4.2.3 Relational Database Management Systems
4.2.4 NoSQL
4.2.5 Hybrid RDBMS/NoSQL Systems
5 Big Data Platforms
5.1 Apache Hadoop
5.1.1 Hadoop Projects
6 Big Data Analytics
6.1 Machine Learning
6.2 Data Mining
6.3 Apache Mahout
7 Big Data Utilization
8 Appendix
8.1 Table of Figures
8.2 Table of Listings
References
 

Cover Image Copyright: Pete (https://www.flickr.com/photos/comedynose/) Cover Image Licensed under the Creative Commons License 2.0 (https://creativecommons.org/licenses/by/2.0/)