When people talk about Big Data, the term “Data Lake” comes up often. But what is a data lake?

Historically, most companies lived in “data ponds”. In a data pond architecture, each department within a company has its own data storage, often in different formats and technologies. HR, for instance, uses different technologies than the marketing department. The reasons for that vary, but it is mostly because the applications each department relies on are too different.

Across these data ponds, a company typically ended up with many different storage technologies and formats, such as SQL and NoSQL databases, XML files, unstructured data and many more.

The major difference in a data lake, which is the newer approach, is that all data is now seen as one thing, regardless of where it is stored, which department owns it and so on. All data within a company is the company’s entire knowledge. With technologies such as Hadoop, it becomes possible to use all available data. The Hadoop ecosystem offers many data integration and governance tools for working with different data types.

With the data lake, all existing data ponds are brought together in one place, which forms the data lake. The company or organisation gets a much better view of what data is available and can gain more comprehensive insight from it.
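To make the idea a bit more concrete, here is a minimal sketch in Python of three departmental ponds being read into one shared collection. It is only an in-memory illustration, not how Hadoop or any real data lake product works. The department names, the sample records and the helper functions (`read_csv_pond`, `read_json_pond`, `read_xml_pond`, `build_lake`) are all made up for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical departmental "ponds", each stored in its own format.
HR_CSV = "employee_id,name\n1,Ada\n2,Grace\n"
MARKETING_JSON = '[{"campaign": "spring", "clicks": 120}]'
FINANCE_XML = "<invoices><invoice id='7' amount='99.50'/></invoices>"


def read_csv_pond(text):
    """Parse CSV text into a list of dicts, one per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json_pond(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def read_xml_pond(text):
    """Turn each child element's attributes into a dict."""
    return [dict(node.attrib) for node in ET.fromstring(text)]


def build_lake(ponds):
    """Collect the records of every pond in one list, tagged with their source."""
    lake = []
    for owner, (reader, raw) in ponds.items():
        for record in reader(raw):
            lake.append({"source": owner, "data": record})
    return lake


lake = build_lake({
    "hr": (read_csv_pond, HR_CSV),
    "marketing": (read_json_pond, MARKETING_JSON),
    "finance": (read_xml_pond, FINANCE_XML),
})

# Every record is now reachable from one place, whatever its original format.
for entry in lake:
    print(entry["source"], entry["data"])
```

Each record keeps a note of which department it came from, so ownership is still visible even though the data now sits in one place. A real data lake would add storage, cataloguing and governance on top, which this sketch leaves out.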

Header image by Dave Bloggs, licensed under Creative Commons.
