In today's digitized world, we create data at a tremendous speed. A study last year showed that at least 90% of the data that exists in the world was generated in the last 2 years. It has also been said that data is the world's most valuable asset.
Whenever something evolves at a rapid pace, there is a need for something new to keep up with it. One such thing that has emerged in recent times is the Data Lake. It is simply a lake of data: it receives data from various sources and stores it in its natural format. Though it looks similar to the existing concept of a Data Warehouse, a Data Lake has its own purpose to serve.
With growing competition in every field, there is always a need to provide the best to consumers. The major use of a Data Lake is that it receives data from every possible source, and we can generate business value out of it (for example, by gaining more insight through predictive analytics).
How does a Data Lake differ from a Data Warehouse?
A Data Warehouse contains relational, structured data from various known systems. The structure of the data is defined in advance, and incoming data is transformed before it is stored, by means of Extract, Transform & Load (ETL).
A Data Lake stores data exactly as it is received, both structured and unstructured. The stored data is transformed only when it is needed for processing, by means of Extract, Load & Transform (ELT).
Image Credits : https://softcrylic.com/blogs/power-bi-and-power-query-elt-workflows-vs-etl/
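To make the ETL vs. ELT difference concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions: the sample events, the `sales` table, and the function names (`etl_to_warehouse`, `elt_to_lake`, `transform_on_read`) are all hypothetical, an in-memory SQLite table stands in for the warehouse, and a plain list stands in for the lake.

```python
import json
import sqlite3

# Hypothetical raw events arriving from different sources (assumed sample data).
RAW_EVENTS = [
    '{"customer": "alice", "amount": "19.99", "source": "web"}',
    '{"customer": "bob", "amount": "5", "source": "mobile", "coupon": "SPRING"}',
    'free-text note: customer called about a refund',
]


def etl_to_warehouse(raw_events):
    """ETL: transform first, then load only rows that fit the fixed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    for line in raw_events:
        try:
            record = json.loads(line)
            row = (record["customer"], float(record["amount"]))
        except (ValueError, KeyError):
            # Data that does not match the predefined structure is dropped.
            continue
        conn.execute("INSERT INTO sales VALUES (?, ?)", row)
    return conn


def elt_to_lake(raw_events):
    """ELT: load everything exactly as received; no transformation yet."""
    return list(raw_events)


def transform_on_read(lake):
    """Transform only when needed: here, pull out coupon codes on demand."""
    coupons = []
    for line in lake:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # unstructured text stays in the lake, just skipped here
        if "coupon" in record:
            coupons.append(record["coupon"])
    return coupons


warehouse = etl_to_warehouse(RAW_EVENTS)
lake = elt_to_lake(RAW_EVENTS)
print(warehouse.execute("SELECT customer, amount FROM sales").fetchall())
print(len(lake), transform_on_read(lake))
```

In this sketch the warehouse keeps only the two rows that fit its schema, and the `coupon` field and the free-text note are lost at load time. The lake keeps all three items untouched, so a question nobody planned for (which coupons were used?) can still be answered later.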
Because a Data Lake retains more historical data, it becomes possible to produce better forecasts. It is also possible to combine data from numerous sources.
The potential challenge with a Data Lake is having too much data. It needs to be processed with the right approach and strategy, and the security of the data should also be taken into consideration. A typical organization would normally need both a Data Warehouse and a Data Lake, but according to a study, the ones that have a Data Lake outperform those without one.