Big Data 101: Partitioning

Partitioning is another factor for Big Data applications. It is one of the factors of the CAP theorem (see 1.6.1) and is also important for scaling applications. Partitioning describes the ability to distribute a database over different servers; in Big Data applications it is often not possible to store everything on one server (Josuttis, 2011). The partitioning factors illustrated in the Partitioning figure are described by Rys (2011). Functional partitioning essentially corresponds to the service-oriented architecture (SOA) approach (Josuttis, 2011): with SOA, different functions are provided by their own services. A web shop such as Amazon involves many different services: some handle the order workflow, others handle search, and so on. If a specific service such as the shopping cart is under high load, new instances of that service can be added on demand. This reduces the risk of an outage that would lead to losing money. Building a service-oriented …
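
A minimal sketch of distributing data over several servers, assuming simple hash partitioning (a common technique, not necessarily the scheme Rys (2011) describes): each record key is hashed, and the hash modulo the number of servers selects the server. The server names and keys are hypothetical. CRC32 from Python's zlib module is used because Python's built-in hash() of strings is randomized per process and would not give a stable placement.

```python
import zlib

# Hypothetical server names, for illustration only
servers = ["db-server-0", "db-server-1", "db-server-2"]

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def server_for(key: str) -> str:
    """Return the server responsible for storing the given key."""
    return servers[partition_for(key, len(servers))]

# Every customer record lands on exactly one server; the same key always maps to the same server
customers = ["alice", "bob", "carol", "dave"]
placement = {c: server_for(c) for c in customers}
print(placement)
```

With plain modulo placement, adding a server changes the target server of most keys; schemes such as consistent hashing reduce that data movement when the cluster grows.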

Big Data 101: Scalability

Scalability is another factor of Big Data applications described by Rys (2011). Big Data mainly involves highly scalable systems, so every Big Data application should be built in a way that eases scaling. Rys (2011) describes several needs for scaling: user load scalability, data load scalability, computational scalability and scale agility. The figure illustrates these different needs for scalability in Big Data environments as described by Rys (2011). Many applications, such as Facebook (Fowler, 2012), have a very large number of users. Applications should support such a large user base and should remain resilient to errors when they see unexpectedly high user numbers. Various techniques can be applied to support different needs, such as fast data access. A factor that often, but not only, comes with a high number of users is the data load. Rys (2011) notes that this data can be produced by a few or by many users. However, things such as sensors and other devices that …
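
To illustrate user load scalability and scale agility, the following sketch assumes a simple round-robin load balancer. The RoundRobinBalancer class and the instance names are hypothetical, not a specific product: it spreads requests evenly over service instances and lets a new instance be added at runtime when load rises unexpectedly.

```python
from collections import Counter
from itertools import count

class RoundRobinBalancer:
    """Hypothetical round-robin balancer spreading requests over service instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._counter = count()  # monotonically increasing request number

    def add_instance(self, name):
        # Scale out on demand: later requests are also routed to the new instance
        self.instances.append(name)

    def route(self):
        # Pick the next instance in turn, cycling through the pool
        return self.instances[next(self._counter) % len(self.instances)]

lb = RoundRobinBalancer(["cart-1", "cart-2"])
before = Counter(lb.route() for _ in range(6))  # 3 requests per instance
lb.add_instance("cart-3")                        # unexpected peak: add capacity
after = Counter(lb.route() for _ in range(6))   # 2 requests per instance
print(before, after)
```

Adding the third instance immediately lowers the number of requests each instance has to handle, which is the effect the scale-agility requirement asks for.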

Big Data 101: Data agility

Agility is an important factor for Big Data applications. Rys (2011) describes three different agility factors: model agility, operational agility and programming agility. Model agility describes how easy it is to change the data model. Traditionally, it is rather hard to change a schema in SQL systems, whereas other systems such as non-relational databases make such changes easy. In key/value stores such as DynamoDB (Amazon Web Services, 2013), for instance, changing the model is very easy. Databases in fast-changing systems such as social media applications, online shops and others require model agility, since updates to such systems occur frequently, often weekly or even daily (Paul, 2012). Operational agility matters because in distributed environments it is often necessary to change operational aspects of a system: new servers are added frequently, often with a different operating system or hardware. Database systems should remain tolerant of such operational changes, as this is crucial for growth. Database systems should support …
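
Model agility can be sketched with a toy in-memory key/value store. This class is hypothetical and is not the DynamoDB API: it only shows that items stored under different keys may carry different attributes, so a new attribute can be introduced without a schema migration, at the cost of readers having to tolerate older items that lack it.

```python
from typing import Any, Dict

class KeyValueStore:
    """Hypothetical toy key/value store: each value is a free-form attribute map."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, item: Dict[str, Any]) -> None:
        # No schema check: items under different keys may have different attributes
        self._items[key] = dict(item)

    def get(self, key: str) -> Dict[str, Any]:
        # Unknown keys return an empty item instead of raising an error
        return self._items.get(key, {})

store = KeyValueStore()
store.put("user:1", {"name": "Alice", "email": "alice@example.com"})
# A later release adds a "newsletter" attribute; older items are left unchanged
store.put("user:2", {"name": "Bob", "email": "bob@example.com", "newsletter": True})

# Readers supply a default for attributes that older items do not have
for key in ("user:1", "user:2"):
    print(key, store.get(key).get("newsletter", False))
```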

Big Data 101: Transformable and Filterable Data

Transformable: If data is transformed, it is changed to a different format or layout. This could mean a format change, e.g. from binary to JSON or XML, as well as a totally new representation. When someone looks at a specific dataset (which could, for instance, be filtered), not all of the data might be interesting. Let’s assume that a manager wants to filter for all customers younger than 18 in a specific district. The manager is probably not interested in the names of the customers but rather in their total number, so instead of returning a huge list of names, addresses and the like, a single number is returned. Or the online marketing department wants to target all customers matching specific criteria such as age; here the address might not be relevant, but names and e-mail addresses are. Transformability is also a necessary characteristic if data has to be exported to another database, e.g. for analytics. Filterable is a key characteristic to …
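
The customer examples in this paragraph can be sketched in Python. The Customer fields, districts and records below are hypothetical sample data. The sketch shows filtering with aggregation (a single count for the manager), filtering with projection (only names and e-mails for marketing), and a transformation of the projected records to JSON, e.g. for export to an analytics database.

```python
import json
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    email: str
    age: int
    district: str
    address: str

# Hypothetical sample data, for illustration only
customers = [
    Customer("Anna", "anna@example.com", 16, "North", "Main St 1"),
    Customer("Ben", "ben@example.com", 34, "North", "Main St 2"),
    Customer("Cara", "cara@example.com", 17, "South", "Oak Rd 5"),
    Customer("Dan", "dan@example.com", 15, "North", "Elm Ave 9"),
]

# Filter + aggregate: the manager only needs the number of under-18 customers in one district
minors_north = sum(1 for c in customers if c.age < 18 and c.district == "North")

# Filter + project: marketing needs names and e-mails of a target group, not addresses
marketing_list = [{"name": c.name, "email": c.email} for c in customers if c.age >= 16]

# Transform: serialize the projected records to JSON for export
export = json.dumps(marketing_list, indent=2)
print(minors_north)
print(export)
```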
