Data warehouse and data lake are terms often used in the world of databases and database management.
However, these two terms are often confused and misused. This blog will clear up some of the confusion surrounding these two terms.
Before getting into the differences between warehouses and lakes, it helps to have a clear frame of reference and to understand the context in which these terms exist.
Data warehouses and lakes are not specific types of databases themselves; these two concepts are types of database architectures.
Warehouses and lakes play a large part in many organizations' business intelligence strategies today.
Data warehousing is a way to organize multiple databases across an organization in order to perform large-scale analytics on huge amounts of data.
In a data warehouse, you'll find highly structured, related, refined data that is very organized. Customer information is a typical example: names, birthdays, addresses and social security numbers.
A good example of an organization that would use this type of data architecture is American Express.
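To make "highly structured" a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse table. The table name, the column names and the sample customers are all made up for illustration, and a real warehouse would run on a dedicated platform rather than an in-memory SQLite database. Social security numbers are left out on purpose.

```python
import sqlite3


def build_customer_table(conn):
    # The schema is fixed up front: every row must have these columns.
    conn.execute(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            birthday    TEXT NOT NULL,  -- ISO date string, e.g. 1985-02-11
            address     TEXT NOT NULL
        )
        """
    )


def load_customers(conn, rows):
    # rows is a list of (name, birthday, address) tuples.
    conn.executemany(
        "INSERT INTO customers (name, birthday, address) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def customers_born_before(conn, cutoff):
    # Because the data is structured, analytics is a plain SQL query.
    cur = conn.execute(
        "SELECT name FROM customers WHERE birthday < ? ORDER BY name",
        (cutoff,),
    )
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
build_customer_table(conn)
load_customers(
    conn,
    [
        ("Ada Example", "1985-02-11", "1 Main St"),
        ("Bo Sample", "1992-07-30", "2 Oak Ave"),
    ],
)
print(customers_born_before(conn, "1990-01-01"))  # ['Ada Example']

# A row that does not fit the schema is rejected instead of stored.
try:
    load_customers(conn, [(None, "2000-01-01", "3 Elm Rd")])
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

The point of the sketch is the trade-off: the schema has to be decided before any data is loaded, and in return every row that makes it in can be queried right away.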
Below is an example of this type of architecture:
In a data lake, you'll find all types of data stored in its raw or native form.
Because the data is stored in its raw form, any type of data can be kept in this architecture style, even structured data. Examples include images, videos, music and text.
The image below offers an easy-to-understand illustration of data lake architecture.
A good example of an organization that would use this type of data architecture is Instagram.
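As a rough sketch of what "stored in its raw or native form" can look like, the snippet below drops a few different kinds of files into a folder tree without converting them or checking them against any schema. The land_raw helper, the raw/<source>/ folder layout and the sample payloads are assumptions made for this example. Real data lakes usually sit on object storage rather than a local temp directory.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str, payload: bytes) -> Path:
    # Write the bytes exactly as received: no parsing, no schema check.
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)
    return target


# A temporary directory stands in for the lake in this sketch.
lake = Path(tempfile.mkdtemp())

# Unstructured, semi-structured and structured data all land the same way.
photo = land_raw(lake, "uploads", "photo.jpg", b"\xff\xd8\xff\xe0 placeholder image bytes")
land_raw(lake, "comments", "c1.txt", "Nice shot!".encode("utf-8"))
land_raw(lake, "crm", "customer.json", json.dumps({"name": "Ada Example"}).encode("utf-8"))

stored = sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file())
print(stored)
```

Nothing here knows or cares what is inside the files. That is what makes a lake flexible, and it is also why a lake needs some management to stay useful.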
Note that a data lake runs the risk of turning into what's referred to as a data swamp.
When a data lake collects a massive amount of data that sits unused for long periods of time and isn't managed properly, it can simply become a mess: a frustrating swamp of unusable or irrelevant data, collected through years of imports, that is now of little use to the organization in question.
Talend has a great article that explains the concept of the data swamp in detail, as well as how to avoid it.
Data warehouses offer organizations a number of benefits. See below:
All of these listed points provide a high degree of ROI and increased business intelligence.
Data lakes also provide their own set of advantages. See below:
The result of all these advantages is the same as with a data warehouse: a high degree of ROI and increased business intelligence.
These two concepts exist concurrently in organizations today.
For example, a data lake may take in tons of raw, unfiltered, unorganized data and act as a landing pad of sorts, holding that data until it is turned over to be refined and stored in a data warehouse.
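The landing pad idea can be sketched in a few lines of Python. In this illustration a list of raw text lines plays the part of the lake, a refine step throws out anything unusable and tidies the rest, and the cleaned rows are loaded into an in-memory SQLite table that stands in for the warehouse. The sample records, the refine rules and the table layout are all assumptions made for the example, not a description of any particular product.

```python
import json
import sqlite3

# Raw lines as they might land in the lake: messy, partial, sometimes broken.
RAW_EVENTS = [
    '{"name": " ada example ", "birthday": "1985-02-11"}',
    "not json at all",
    '{"name": "Bo Sample"}',
    '{"name": "Cy Demo", "birthday": "1992-07-30"}',
]


def refine(raw_lines):
    # Keep only records that parse and carry both required fields.
    clean = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unreadable input stays behind in the lake
        if not isinstance(record, dict):
            continue
        name = str(record.get("name", "")).strip().title()
        birthday = record.get("birthday")
        if name and birthday:
            clean.append((name, birthday))
    return clean


def load_into_warehouse(conn, rows):
    # The warehouse side only ever sees rows that fit its schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(name TEXT NOT NULL, birthday TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
refined = refine(RAW_EVENTS)
load_into_warehouse(warehouse, refined)
print(refined)
```

Of the four raw lines, only the two complete records make it into the warehouse table. The other two are still sitting in the lake if someone wants to look at them later.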
When used correctly, these two database architectures can help organizations manage and organize huge amounts of data.
Data is the lifeblood of business. Therefore, how your company stores and manages it can mean the difference between success and failure. Be smart: know and understand the differences and advantages of these two data management strategies so you can optimize your organization's business intelligence strategies.