Data Lakes → Data Swamps → Data Lakehouse

The Data Lakehouse - The next evolution

12/29/2023 · 1 min read

A data warehouse holds highly structured data.

A data lake holds both structured and unstructured data. The term originated in 2011 at data vendor Pentaho (now part of Hitachi) as a way to reduce the data silos that were forming in data warehouse-based ecosystems. The key idea was to collect and keep every type of data so that it would be available whenever someone wanted to use it. If all data is collected and stored but never used, a data swamp arises. Much like the “lake cleaning” needed to resolve data quality issues, a data swamp delights nobody.

The data warehouse and the data lake merge into the data lakehouse: Data Lake → Data Swamp → Data Lakehouse. Bill Inmon, Mary Levins and Ranjeet Srivastava published the book “Building the Data Lakehouse” in 2021.