A Deep Dive Into Data Lakes

672

In the age of Big Data, we’ve had to come up with new terms to describe large-scale data storage. We have databases, data warehouses and now data lakes.

While they all contain data, these terms describe different ways of storing and using that data. Before we discuss data lakes and why they are important, let’s examine how they differ from databases and data warehouses.

Let’s start here: A data warehouse is not a database. Although you could argue that they’re both relational data systems, they serve different purposes. Data warehousing allows you to pull data together from a number of different sources for analysis and reportiong. Data warehouses store vast amounts of historical data for complex queries across all data types being pulled together.

Data lakes are centralized storage and data repositories that allow you to work with a variety of different types of data. The cool thing here is that you don’t need to structure the data and it can be imported “as-is.” This allows you to work with raw data and run analytics, data visualization, big data processing, machine learning tools, AI, and much more. This level of data agility can actually give you some pretty cool competitive advantages.

 

Read more at Datacenter Frontier