Data representation is an often-mentioned characteristic of Big Data. It goes well with “Variety” in the definition stated above. Every piece of data is represented in some specific form, whatever that form may be. Well-known forms are XML, JSON, CSV and binary. Depending on the representation, there are different possibilities for expressing relations between data.

XML and JSON, for instance, allow us to define child objects or relations for data, whereas this is rather hard with CSV or binary. An example of such a relation is a dataset of the type “Person”. Each person consists of some attributes that identify the person (e.g. the last name, age, sex) and an address that is an independent entity. To retrieve this data as CSV or binary, you either have to run two queries or create a new entity for the query in which the data is merged. XML and JSON allow us to nest entities inside other entities.
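The difference between nesting and merging can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names and the `address_` column prefix are hypothetical choices, not something prescribed by any format.

```python
import csv
import io
import json

# Hypothetical nested record: a person that embeds an independent address entity.
person = {
    "lastname": "Meir-Huber",
    "age": 29,
    "address": {"zipcode": "1150", "city": "Vienna"},
}

# JSON keeps the nesting exactly as it is in the record.
nested_text = json.dumps(person)

# CSV has no notion of child objects, so the address is merged into the same row.
# The "address_" prefix is an assumed naming convention for the merged columns.
flat_row = {
    "lastname": person["lastname"],
    "age": person["age"],
    "address_zipcode": person["address"]["zipcode"],
    "address_city": person["address"]["city"],
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat_row))
writer.writeheader()
writer.writerow(flat_row)
flat_text = buffer.getvalue()

print(nested_text)
print(flat_text)
```

The nested form keeps the address as its own entity, while the flat form only works because the two entities were merged into one row beforehand.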

What is Data representation?

[Figure: data-entity]

The entity described in the figure would look like the following if represented in XML:

<person>
  <common>
    <firstname>Mario</firstname>
    <lastname>Meir-Huber</lastname>
    <age>29</age>
  </common>
  <address>
    <zipcode>1150</zipcode>
    <city>Vienna</city>
  </address>
</person>

Listing 1: XML representation of the entity “person”
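To show how such a nested XML entity can be read back, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The variable names are illustrative assumptions; the XML text is the “person” entity from the listing.

```python
import xml.etree.ElementTree as ET

# The "person" entity from the listing, embedded as a string.
xml_text = """
<person>
  <common>
    <firstname>Mario</firstname>
    <lastname>Meir-Huber</lastname>
    <age>29</age>
  </common>
  <address>
    <zipcode>1150</zipcode>
    <city>Vienna</city>
  </address>
</person>
""".strip()

root = ET.fromstring(xml_text)

# Nested entities are reached by following the path of child elements.
lastname = root.findtext("common/lastname")
city = root.findtext("address/city")

# XML itself carries no type information, so the age is converted explicitly.
age = int(root.findtext("common/age"))

print(lastname, age, city)
```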

The JSON representation of our model “Person” looks quite similar:

{
  "person": {
    "common": { "firstname": "Mario", "lastname": "Meir-Huber", "age": 29 },
    "address": { "zipcode": "1150", "city": "Vienna" }
  }
}

Listing 2: JSON representation of the entity “person”
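As a rough sketch of how the JSON form can be consumed, the snippet below parses the entity with Python's standard `json` module. It assumes the entity is written as strictly valid JSON (braces for objects, straight double quotes, lowercase keys); the variable names are illustrative only.

```python
import json

# The "person" entity written as strictly valid JSON.
json_text = """
{
  "person": {
    "common": {"firstname": "Mario", "lastname": "Meir-Huber", "age": 29},
    "address": {"zipcode": "1150", "city": "Vienna"}
  }
}
"""

data = json.loads(json_text)
person = data["person"]

# Nested entities become nested dictionaries; numbers keep their type.
full_name = f'{person["common"]["firstname"]} {person["common"]["lastname"]}'
city = person["address"]["city"]

print(full_name, person["common"]["age"], city)
```

Unlike the XML form, the age arrives as a number without any explicit conversion, because JSON distinguishes numbers from strings.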

The traditional way of data representation: SQL

If we now look at how this data could be represented as flat rows coming from a database, as CSV or binary would require, we need to join two different datasets. SQL supports this directly through joins. A possible representation could look like the following:

p.Firstname  p.Lastname  p.Age  a.Zipcode  a.City
Mario        Meir-Huber  29     1150       Vienna

Listing 3: SQL-based binary representation
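A join that produces such a flat row could be sketched as follows, using Python's built-in `sqlite3` module with an in-memory database. The table layout, the `address_id` foreign key and the column types are assumptions made for this illustration; the source does not specify a schema.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical layout for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, zipcode TEXT, city TEXT);
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        firstname TEXT,
        lastname TEXT,
        age INTEGER,
        address_id INTEGER REFERENCES address(id)
    );
    INSERT INTO address VALUES (1, '1150', 'Vienna');
    INSERT INTO person VALUES (1, 'Mario', 'Meir-Huber', 29, 1);
""")

# The join merges the two independent entities into one flat row per person.
rows = conn.execute("""
    SELECT p.firstname, p.lastname, p.age, a.zipcode, a.city
    FROM person AS p
    JOIN address AS a ON a.id = p.address_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```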

The representation of data isn’t limited to what has been described in this chapter so far. Several other formats are available, and others might arise in the future. However, data must have a clear and documented representation in a form that can be processed by the tools that build upon that data.

I hope you enjoyed the first part of this tutorial about big data technologies. This tutorial is part of the Big Data Tutorial. Make sure to read the entire series.
