Data warehouse, data lake, data lakehouse, data fabric, data mesh

Oct 6, 2022

These five terms come up often in discussions of the data stack, but they are not interchangeable. The first two (data warehouse and data lake) are mainly about data storage; data lakehouse and data fabric are more about data infrastructure; and data mesh is about building data products in a decentralized way.

Data Warehouse vs. Data Lake vs. Data Lakehouse: An Overview of Three Cloud Data Storage Patterns

As more companies rely on data to drive critical business decisions, improve product offerings, and serve customers…

www.striim.com

The data warehouse came first historically, at a time when the main use cases were business intelligence, reporting, and visualization. It provides a “source of truth” for decision-makers.
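To make the BI use case concrete, here is a minimal sketch of a warehouse-style star schema, using Python's built-in `sqlite3` module as a stand-in for a real warehouse engine (an assumption for illustration only). The table names `fact_sales`, `dim_product`, and `dim_date` and the sample rows are hypothetical; the point is that data is modeled and typed before it is loaded (schema-on-write), which suits reporting queries like this one.

```python
import sqlite3

# In-memory SQLite stands in for a warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER NOT NULL, month INTEGER NOT NULL);
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    date_id INTEGER NOT NULL REFERENCES dim_date(date_id),
    amount REAL NOT NULL
);
""")

# Hypothetical sample data: two products, two months.
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "books"), (2, "games")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(20220101, 2022, 1), (20220201, 2022, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", [
    (1, 20220101, 10.0), (1, 20220201, 15.0), (2, 20220101, 40.0), (2, 20220201, 5.0),
])

def monthly_revenue_by_category(conn):
    """Typical BI query: join the fact table to its dimensions and aggregate."""
    return conn.execute("""
        SELECT d.year, d.month, p.category, SUM(f.amount) AS revenue
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """).fetchall()

for row in monthly_revenue_by_category(conn):
    print(row)
```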

Disadvantages of the data warehouse:

Lack of data flexibility: it performs well with structured data but struggles with semi-structured and unstructured formats (images, audio, video), and it does not support machine learning use cases well.
Cost: it cannot store raw data cheaply, even though keeping raw data could enable future use cases.

The data lake solves these problems of the data warehouse: it stores structured, semi-structured, and unstructured data inexpensively and enables machine learning use cases. Its disadvantages are poor performance for business intelligence and reporting use cases, and a lack of data reliability.
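A rough sketch of both sides of the data lake, assuming a local temporary directory in place of cloud object storage and JSON Lines files as the raw format (both hypothetical choices): raw records land cheaply with no schema check, and a schema is applied only when the data is read (schema-on-read). The malformed records that slip in at write time are one way the reliability problem shows up.

```python
import json
import tempfile
from pathlib import Path

# A local temp directory stands in for cheap object storage (hypothetical layout).
lake = Path(tempfile.mkdtemp()) / "raw" / "events"
lake.mkdir(parents=True)

# Ingest: raw records are written as-is; nothing is validated at write time.
raw_lines = [
    '{"user": "a", "event": "click", "ts": 1}',
    '{"user": "b", "event": "view"}',  # missing field: the lake accepts it anyway
    'not valid json',                  # corrupt record: also accepted
]
(lake / "part-0000.jsonl").write_text("\n".join(raw_lines) + "\n")

def read_with_schema(path, required=("user", "event", "ts")):
    """Schema-on-read: validation happens only when the data is consumed."""
    good, bad = [], 0
    for file in sorted(path.glob("*.jsonl")):
        for line in file.read_text().splitlines():
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad += 1
                continue
            if isinstance(record, dict) and all(key in record for key in required):
                good.append(record)
            else:
                bad += 1
    return good, bad

records, rejected = read_with_schema(lake)
print(records, rejected)
```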

Therefore, the traditional approach is a two-tier architecture: raw data is stored in the data lake, machine learning workloads access it directly there, and curated data is processed and stored in the data warehouse. As a result, you need to maintain two systems built on different technologies.
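The two-tier setup can be sketched as follows, again with a local directory standing in for the lake and an in-memory SQLite database standing in for the warehouse (assumptions, not a recommended stack). The function names `ml_read_raw` and `etl_to_warehouse` are hypothetical. Note that the same data now lives in two systems, and the ETL step has to be kept in sync with both.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Tier 1: the data lake, a directory of raw JSON Lines files (hypothetical layout).
lake_dir = Path(tempfile.mkdtemp())
(lake_dir / "orders.jsonl").write_text(
    '{"order_id": 1, "country": "DE", "amount": 20.0}\n'
    '{"order_id": 2, "country": "FR", "amount": 35.5}\n'
    '{"order_id": 3, "country": "DE", "amount": 4.5}\n'
)

def ml_read_raw(lake_dir):
    """ML workloads read raw records straight from the lake."""
    rows = []
    for f in sorted(lake_dir.glob("*.jsonl")):
        rows.extend(json.loads(line) for line in f.read_text().splitlines() if line)
    return rows

# Tier 2: the warehouse (SQLite stands in for it here).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE revenue_by_country (country TEXT PRIMARY KEY, revenue REAL NOT NULL)")

def etl_to_warehouse(lake_dir, warehouse):
    """Curate lake data and load it into the warehouse: a second copy in a second system."""
    totals = {}
    for row in ml_read_raw(lake_dir):
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    with warehouse:  # commit the full reload as one transaction
        warehouse.execute("DELETE FROM revenue_by_country")
        warehouse.executemany("INSERT INTO revenue_by_country VALUES (?, ?)", sorted(totals.items()))

etl_to_warehouse(lake_dir, warehouse)
print(len(ml_read_raw(lake_dir)))
print(warehouse.execute("SELECT * FROM revenue_by_country ORDER BY country").fetchall())
```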

The data lakehouse uses open-source technologies to bring data warehouse performance and consistency (metadata, governance) to the data lake, at a lower cost than a data warehouse. For a comparison, see the article above and this one.
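Open table formats such as Delta Lake, Iceberg, and Hudi (listed in the appendix) add a metadata layer on top of plain data files in the lake. The sketch below is a toy illustration of that general idea, not any of those formats' actual protocol: each commit is an append-only JSON file recording which data files were added or removed, and readers replay the log to get a consistent snapshot, which also makes it possible to read an older version. The class name `ToyTableLog` is hypothetical, and the `.parquet` names are placeholders; no data files are written.

```python
import json
import tempfile
from pathlib import Path

class ToyTableLog:
    """Toy transaction log over immutable data files (not a real table format)."""

    def __init__(self, root):
        self.log_dir = Path(root) / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _versions(self):
        return sorted(int(p.stem) for p in self.log_dir.glob("*.json"))

    def commit(self, add=(), remove=()):
        """Append one commit; mode "x" fails if another writer already took this version."""
        versions = self._versions()
        version = versions[-1] + 1 if versions else 0
        entry = {"add": list(add), "remove": list(remove)}
        with open(self.log_dir / f"{version:020d}.json", "x") as f:
            json.dump(entry, f)
        return version

    def snapshot(self, version=None):
        """Replay the log up to `version` (default: latest) to get the live data files."""
        live = set()
        for v in self._versions():
            if version is not None and v > version:
                break
            entry = json.loads((self.log_dir / f"{v:020d}.json").read_text())
            live -= set(entry["remove"])
            live |= set(entry["add"])
        return live

table = ToyTableLog(tempfile.mkdtemp())
table.commit(add=["part-0.parquet", "part-1.parquet"])           # version 0: initial load
table.commit(add=["part-2.parquet"], remove=["part-0.parquet"])  # version 1: rewrite part-0
print(sorted(table.snapshot()))           # current state
print(sorted(table.snapshot(version=0)))  # read the table as of version 0
```

Creating each log file with mode `"x"` is a simple stand-in for the kind of atomic commit a real table format needs, so two writers cannot both claim the same version; a production implementation also has to deal with partial writes, retries, and checkpointing.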

Data mesh is a paradigm, while the lakehouse is a platform. Data mesh focuses on scaling data ownership. Traditionally, a central data platform team owns both the data infrastructure and the building of data pipelines. However, that team is usually neither the data owner nor the domain expert, so it has to work with domain teams and can become a bottleneck.

Data mesh gives data pipeline ownership back to the domain teams, while the central data platform team focuses on providing the data infrastructure and a data pipeline framework that different domain teams can leverage for high-quality development. That infrastructure can be built on a data lakehouse platform. Hence, as the article puts it, data mesh solves the scalability of data pipeline development, while the data lakehouse (or data fabric) solves the scalability of use cases.
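A minimal sketch of the division of labor described above, with all names (`PipelineSpec`, `Platform`, `active_customers`) hypothetical: the domain team supplies only its business logic, while the platform team's framework runs it, enforces a shared quality check, and publishes the result under a domain-scoped address.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]

@dataclass
class PipelineSpec:
    """What a domain team declares (hypothetical shape)."""
    domain: str
    name: str
    output_columns: List[str]
    transform: Callable[[List[Row]], List[Row]]

@dataclass
class Platform:
    """Central platform team: shared infrastructure plus a pipeline framework."""
    catalog: Dict[str, List[Row]] = field(default_factory=dict)

    def run(self, spec: PipelineSpec, source: List[Row]) -> str:
        rows = spec.transform(source)  # domain-owned business logic
        for row in rows:               # platform-owned quality gate
            if sorted(row) != sorted(spec.output_columns):
                raise ValueError(f"{spec.domain}.{spec.name}: unexpected columns {sorted(row)}")
        address = f"{spec.domain}.{spec.name}"
        self.catalog[address] = rows   # publish under a domain-scoped address
        return address

# The sales domain team owns its own transformation logic.
def active_customers(rows: List[Row]) -> List[Row]:
    return [{"customer_id": r["id"]} for r in rows if r["active"]]

platform = Platform()
address = platform.run(
    PipelineSpec("sales", "active_customers", ["customer_id"], active_customers),
    source=[{"id": 1, "active": True}, {"id": 2, "active": False}],
)
print(address, platform.catalog[address])
```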

Appendix

Data workloads

Workloads in Data Engineering

A brief introduction to various types of data workloads

towardsdatascience.com

Workload types: transactional (OLTP), analytical (OLAP), and translytical (HTAP, hybrid transactional/analytical processing).
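The difference between the first two workload types lies mostly in the shape of the queries. The sketch below runs both against one in-memory SQLite database: a transactional transfer that updates two rows by key inside a single transaction, and an analytical aggregate that scans the whole table. Translytical (HTAP) systems aim to serve both shapes on the same data without copying it elsewhere; SQLite is not such a system and is used here only as a convenient stand-in. Table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, region TEXT NOT NULL, balance REAL NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [(1, "EU", 100.0), (2, "EU", 50.0), (3, "US", 80.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Transactional (OLTP): touch a few rows by key, all-or-nothing."""
    with conn:  # commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balance_by_region(conn):
    """Analytical (OLAP): scan and aggregate many rows."""
    return conn.execute(
        "SELECT region, SUM(balance) FROM accounts GROUP BY region ORDER BY region"
    ).fetchall()

transfer(conn, src=1, dst=3, amount=30.0)
print(balance_by_region(conn))
```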

Data Warehouse and Data Lake and Data Mesh (Part 3)

Delta Lake open source and a lot of enterprise used now, it is not additional storage this is just additional software…

blog.devgenius.io

History of data lake, data warehouse, data fabric, data mesh

What is the Medallion Data Lakehouse Architecture all about?

Design Principle for designing a Modern Data Platform

medium.com

Different Data Warehousing Modeling Techniques and How to Implement them on the Databricks…

Using Data Vaults and Star Schemas on the Lakehouse The lakehouse is a new data platform paradigm that combines the…

www.databricks.com

Data Lakehouse: Concept, Key Features, and Architecture Layers

In 1901, a woman named Julia Davis Chandler published the recipe that changed the world for good. It was the very first…

www.altexsoft.com

Data Lake / Lakehouse Guide: Powered by Data Lake Table Formats (Delta Lake, Iceberg, Hudi) |…

Ever wanted or been asked to build an open-source Data Lake offloading data for analytics? Asked yourself what…

airbyte.com

http://cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf

https://www.oracle.com/a/ocom/docs/datamesh-ebook.pdf

Data Warehouse vs Data Vault vs Data Lake vs Delta Lake vs Data Fabric vs Data Mesh

A data warehouse is a central repository for all the data an organization collects and uses. It is structured and…

medium.com

What Is A Data Mesh Organizational Architecture?

Data Mesh is an increasingly popular concept among data platform specialists. Technological innovations and the…

www.plainconcepts.com

PlainConcepts


MAY 5, 2022

What Is A Data Mesh Organizational Architecture?

Data Mesh is an increasingly popular concept among data platform specialists. Technological innovations and the popularization of Big Data in companies are leading to new paradigms for data decentralization and consumption. In this sense, the Data Mesh organizational approach can help corporations looking to organize their data teams.

What Is Data Mesh

Data Mesh is a technical and organizational architecture approach aimed at the decentralization and large-scale management of an organization’s analytical data.

Why is Data Mesh Being Adopted

Blanca Mayayo, Product Owner of Sidra Data Platform at Plain Concepts, has previously worked as an engineer and product leader at companies such as Adidas, Nestlé, and Telefónica. In her opinion, several trends are leading companies to take an interest in a new way of managing data:

Companies want to differentiate themselves and provide value thanks to the data they possess.
They want all areas of the company to take advantage of it, in an effective and efficient way.
At the same time, data governance and data sovereignty aspects are becoming increasingly relevant.

Problems that Data Mesh can solve

Data Mesh helps address several data management problems that companies face, such as:

Lack of clear ownership or responsibility for the data. For example, in centralized data warehouses or data lakes, technical managers do not have the specialized business knowledge to take advantage of and optimize the data.
A lack of data metrics translates into distrust of the data when drawing conclusions or making decisions.
Difficulty in bringing engineering expertise to the rest of the organization. As a single team manages the centralized platform, this can lead to bottlenecks or friction between teams.

If these problems persist in the medium and long term, they lead to low use of data and make it difficult to innovate or add value.

Data Mesh Principles

The Data Mesh is built around four principles:

Domain-oriented ownership
Data as a product
Self-service data platform or infrastructure
Federated governance

The third and fourth principles are more technological in nature.

Domain-oriented ownership

A ‘domain’ is a department, section, or area of the company. Under the principle of domain-oriented ownership, responsibility for data moves beyond the centralized data platform team to the teams where the data is generated (for example, the commercial area, where customer information is ‘born’), which are also the teams best placed to extract broad, high-quality value from it.

Data as a product

The principle of data as a product in Data Mesh means conceiving data as a consumable product in the business.

These data products have input and output ports:

Input ports: Data-producing sources.
Output ports: In charge of exposing the data so that other parts of the company or end users can consume it.

Beyond that, the products have to be easy to use and come with metrics and metadata. They are also offered as packages that include not only the data and metadata but also the code and infrastructure used to produce them.
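One way to picture such a package is as a descriptor that travels with the data. The sketch below is a hypothetical shape, not a standard: the `DataProduct` and `Port` classes, their fields, and all example values (repository URL, table locations, metric names) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Port:
    name: str
    kind: str      # e.g. "table", "api", "stream" (illustrative values)
    location: str  # where data is read from (input) or exposed at (output)

@dataclass
class DataProduct:
    """Hypothetical descriptor bundling data, metadata, code, and infrastructure."""
    name: str
    owner_domain: str
    input_ports: List[Port]   # data-producing sources
    output_ports: List[Port]  # how other teams or end users consume the data
    schema: Dict[str, str]
    code_repo: str            # the code the product was produced with
    infrastructure: str       # e.g. a path to infrastructure-as-code
    metrics: Dict[str, float] = field(default_factory=dict)

customers = DataProduct(
    name="customers",
    owner_domain="commercial",
    input_ports=[Port("crm_export", "table", "crm.raw_customers")],
    output_ports=[Port("customers_v1", "table", "commercial.customers_v1")],
    schema={"customer_id": "int", "country": "string"},
    code_repo="git@example.com:commercial/customers.git",
    infrastructure="infra/customers.tf",
    metrics={"freshness_hours": 2.0, "null_rate": 0.01},
)
print([p.location for p in customers.output_ports])
```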

DATSIS Principles

Within the data mesh, these products are governed by the DATSIS principles:

Discoverable. The product has to be easily found through some tool, such as a data catalog.
Addressable. Access to the product must follow common, global conventions.
Trustworthy. To be trusted, the product must have quality and service standards.
Secure. Effective granular access policies to this data must be defined.
Interoperable. Ideally, products should follow open standards so that multiple interfaces can be used to search for and find the data.
Self-describing. The package must include the definition of the input and output ports, as well as a product schema and up-to-date documentation.
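The DATSIS properties can be turned into automated checks on a product descriptor. The sketch below assumes a hypothetical descriptor layout (a plain dict), a hypothetical address convention (`domain.product.vN`), and a hypothetical list of open formats; a real platform would define its own.

```python
import re
from typing import Dict, List, Set

OPEN_FORMATS = {"parquet", "csv", "json", "avro"}            # assumption for illustration
ADDRESS_PATTERN = re.compile(r"^[a-z_]+\.[a-z_]+\.v\d+$")   # hypothetical: domain.product.vN

def datsis_gaps(product: Dict, catalog: Set[str]) -> List[str]:
    """Return the DATSIS properties a (hypothetical) product descriptor fails."""
    address = product.get("address") or ""
    gaps = []
    if address not in catalog:
        gaps.append("Discoverable: not registered in the data catalog")
    if not ADDRESS_PATTERN.match(address):
        gaps.append("Addressable: address does not follow the global convention")
    if not product.get("quality_slos"):
        gaps.append("Trustworthy: no quality or service levels declared")
    if not product.get("access_policy"):
        gaps.append("Secure: no access policy defined")
    if product.get("format") not in OPEN_FORMATS:
        gaps.append("Interoperable: output format is not an open standard")
    if not all(product.get(k) for k in ("input_ports", "output_ports", "schema", "docs")):
        gaps.append("Self-describing: ports, schema, or documentation missing")
    return gaps

catalog = {"commercial.customers.v1"}
product = {
    "address": "commercial.customers.v1",
    "quality_slos": {"freshness_hours": 24},
    "access_policy": {"role:analyst": "read"},
    "format": "parquet",
    "input_ports": ["crm.raw_customers"],
    "output_ports": ["commercial.customers.v1"],
    "schema": {"customer_id": "int"},
    "docs": "",  # documentation still missing
}
print(datsis_gaps(product, catalog))
```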

https://pages.matillion.com/rs/992-UIW-731/images/Ebook-Guide-to-the-Lakehouse.pdf

https://www.databricks.com/wp-content/uploads/2020/10/The-Modern-Cloud-Data-Platform-For-Dummies-Databricks-Special-Edition.pdf

https://www.databricks.com/wp-content/uploads/2021/10/Big-Book-of-Data-Engineering-Final.pdf

Data platform

Maturity assessment

http://www.cs.uu.nl/research/techreps/repo/CS-2010/2010-021.pdf

Requirement gathering and effort estimation

https://openproceedings.org/2015/conf/edbt/paper-295.pdf

Key Business Intelligence Requirements for Every Business - Ubiq BI

Gathering Business Intelligence Requirements plays a key role in every BI project's success. If executed properly, it…

ubiq.co

How To Gather Business Intelligence Reporting Requirements - Ubiq BI

Defining requirements for business intelligence projects is critical. Most BI projects fail because of poor requirement…

ubiq.co

SQL Server Business Intelligence Requirements and Estimation

Overview In a typical Business Intelligence (BI) project with the Software Development Life Cycle (SDLC) methodology, a…

www.mssqltips.com


Business Intelligence Requirements 2023 | Template & Checklist

Business intelligence improves performance and boosts revenue by helping enterprises identify opportunities from…

Written by Xin Cheng
