Hadoop Tutorial – Apache HBase

HBase is one of the most popular databases in the Hadoop and NoSQL ecosystem. It is a highly scalable database that, in terms of the CAP theorem, favours consistency and partition tolerance. In case you aren’t familiar with the CAP theorem: it names three desirable properties of a distributed database, namely consistency, availability and partition tolerance. However, you can only fully guarantee two of them at the same time; the third one comes with a trade-off.

HBase stores data as key/value pairs. A table in HBase has no fixed schema: only its column families are declared up front, and the columns within them can differ from row to row, which gives you much more flexibility than a traditional relational database. HBase takes care of failover and sharding of the data for you.
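To make the key/value idea a bit more concrete, here is a minimal sketch of such a table in plain Python. It is a toy in-memory model, not the HBase client API: the class ToyTable, its methods and the sample data are all made up for illustration. It assumes that column families are fixed when the table is created, that columns inside a family can be added freely per row, and that each cell keeps timestamped versions.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of an HBase-style table (hypothetical, not the HBase API)."""

    def __init__(self, column_families):
        # Assumption: column families are declared once, at table creation.
        self.column_families = set(column_families)
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        column = f"{family}:{qualifier}"
        versions = self.rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        # Keep the newest version at the front of the list.
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row_key, family, qualifier):
        # Return the newest value of the cell, or None if the cell does not exist.
        versions = self.rows.get(row_key, {}).get(f"{family}:{qualifier}")
        return versions[0][1] if versions else None


table = ToyTable(["info"])
table.put("user1", "info", "name", "Ada", timestamp=1)
table.put("user1", "info", "name", "Ada L.", timestamp=2)
# A different row may use a column that "user1" never had.
table.put("user2", "info", "email", "ada@example.org", timestamp=1)
```

Note that the two rows do not share the same columns, and nothing had to be altered on the table to allow that. This is the flexibility compared to a relational schema.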

HBase uses HDFS for storage and ZooKeeper for coordination. Several region servers are controlled by a master server, as displayed in the next image.

Figure: Apache HBase
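The sharding across region servers can be sketched in a few lines of Python. HBase splits a table into regions by row-key range, and each region is served by one region server. The start keys, the server names and the function locate_region_server below are hypothetical and only illustrate the lookup idea; the real client finds regions through metadata that HBase maintains itself.

```python
import bisect

# Hypothetical layout: each region covers the row keys from its start key
# (inclusive) up to the next region's start key (exclusive).
REGION_START_KEYS = ["", "g", "n", "t"]
# Hypothetical assignment of those four regions to three region servers.
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate_region_server(row_key):
    """Return the (made-up) region server responsible for row_key."""
    # Find the last region whose start key is <= row_key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[index]
```

Because the regions are sorted by start key, the lookup is a binary search, and rows with neighbouring keys end up in the same region.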

I lead a team of Senior Experts in Data & Data Science as Head of Data & Analytics and AI at A1 Telekom Austria Group. I also teach this topic at various universities and frequently speak at conferences. In 2010 I wrote a book about Cloud Computing, which is often used at German & Austrian universities. In my home country (Austria) I am part of several organisations on Big Data & Data Science.
