Xin Cheng
Jan 19, 2021

Solving the Tower of Babel of Business Data with the Open Data Initiative and the Common Data Model

Motivation

Data is the new oil. However, data is hard to extract and is often locked in “silos”. Business-process data is especially hard to share: unlike open source technologies, it has no common standard, and every product that implements a business process speaks its own language through its own data model. This creates two challenges when leveraging data from a variety of sources:

  1. Application data models are hard to understand: there is not enough metadata describing them, and understanding them has traditionally relied on documentation that is separate from the data and hard to find.
  2. Different applications use different nomenclature and semantics (for the same entity: different attributes, different field names for the same attribute, or a different name for the entity itself), so their data is hard to bring together.

The result is a Tower of Babel of application data.

Enter the Open Data Initiative and the Common Data Model

The Common Data Model (CDM) is a set of standardized, extensible data schemas published by Microsoft and its partners.

Result: different applications speak the same language.

1. Humans can understand the data from its metadata.

2. Machines can expect a standardized schema for each entity.
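To make the second point concrete, here is a minimal sketch of what "speaking the same language" buys a machine consumer: two applications name the same customer fields differently, and each is mapped once onto a shared entity shape, after which downstream code can rely on the same field names. The common field list, the application names, and both field mappings are hypothetical and illustrative only; they are not taken from an actual CDM entity definition.

```python
from typing import Dict, List

# Hypothetical common entity shape (a simplified stand-in for a standardized schema).
COMMON_ACCOUNT_FIELDS: List[str] = ["accountId", "name", "city"]

# Hypothetical per-application mappings: source field name -> common field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "crm_app": {"cust_id": "accountId", "cust_name": "name", "town": "city"},
    "erp_app": {"KUNNR": "accountId", "NAME1": "name", "ORT01": "city"},
}


def to_common(app: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename an application-specific record into the common schema."""
    mapping = FIELD_MAPS[app]
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    # Every common field must be present so downstream code can rely on it.
    missing = [f for f in COMMON_ACCOUNT_FIELDS if f not in out]
    if missing:
        raise ValueError(f"{app} record is missing {missing}")
    # Return fields in the canonical order of the common schema.
    return {f: out[f] for f in COMMON_ACCOUNT_FIELDS}


if __name__ == "__main__":
    print(to_common("crm_app", {"cust_id": "42", "cust_name": "Contoso", "town": "Seattle"}))
    print(to_common("erp_app", {"KUNNR": "0000100", "NAME1": "Fabrikam", "ORT01": "Berlin"}))
```

With a shared schema such as CDM, the per-application mapping ideally moves to the producer side, so every consumer sees the common field names directly.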

How it works

Data producer

A producer writes its data into CDM folders together with a metadata/schema description (created with the CDM library). The data itself is stored in a standard format such as CSV or Parquet; the CDM library does not provide a way to write the data files themselves.
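A minimal sketch of what a producer leaves behind, using only the Python standard library: a plain CSV data partition plus JSON metadata describing it. The layout and field names (`manifestName`, `entities`, `entityPath`, `dataPartitions`, `definitions`, `hasAttributes`) are modeled on the CDM manifest format, but this is a deliberately simplified approximation, not a complete or validated CDM document; a real producer would generate the metadata with the CDM library. The `Customer` entity, its attributes, and the partition file name are hypothetical.

```python
import csv
import tempfile
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical entity: attribute names with CDM-style data type names.
ENTITY = "Customer"
ATTRIBUTES: List[Dict[str, str]] = [
    {"name": "customerId", "dataType": "string"},
    {"name": "fullName", "dataType": "string"},
    {"name": "creditLimit", "dataType": "decimal"},
]


def write_cdm_folder(root: Path, rows: List[List[str]]) -> Path:
    """Write a CSV partition plus simplified CDM-style metadata; return the manifest path."""
    # 1. Data: a plain, headerless CSV file (writing data is outside the CDM library's scope).
    data_dir = root / ENTITY
    data_dir.mkdir(parents=True, exist_ok=True)
    with (data_dir / "part-00000.csv").open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    # 2. Schema: an entity definition document describing the attributes.
    entity_doc = {
        "jsonSchemaSemanticVersion": "1.0.0",
        "definitions": [{"entityName": ENTITY, "hasAttributes": ATTRIBUTES}],
    }
    (root / f"{ENTITY}.cdm.json").write_text(json.dumps(entity_doc, indent=2), encoding="utf-8")

    # 3. Manifest: points consumers at the entity definition and its data partition(s).
    manifest = {
        "jsonSchemaSemanticVersion": "1.0.0",
        "manifestName": "default",
        "entities": [{
            "type": "LocalEntity",
            "entityName": ENTITY,
            "entityPath": f"{ENTITY}.cdm.json/{ENTITY}",
            "dataPartitions": [{"location": f"{ENTITY}/part-00000.csv"}],
        }],
    }
    manifest_path = root / "default.manifest.cdm.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_cdm_folder(Path(tmp), [["c1", "Ada Lovelace", "1000.00"]])
        print(sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*")))
```

The key design point is the separation: the data files stay in an ordinary format any tool can read, while the metadata files carry the schema that makes them self-describing.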

Sample producers

CDS/Dynamics 365

SAP

Adobe

Data consumer

A consumer reads entity data using the schema published alongside it in the CDM folder.
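A minimal sketch of the consumer side, assuming a headerless CSV partition and a simplified entity definition (a hypothetical `Customer` entity whose shape loosely mirrors a CDM entity document's `definitions` entry). The consumer does not guess what each column means: it takes the attribute names and types from the metadata and applies them to the raw rows. Real consumers, such as the Spark `com.microsoft.cdm` reader, work against the full CDM metadata; the type-name-to-converter table here covers only a few types and is an assumption of this sketch.

```python
import csv
import io
from decimal import Decimal
from typing import Any, Callable, Dict, List

# Simplified, hypothetical entity definition (not a complete CDM document).
ENTITY_DOC: Dict[str, Any] = {
    "definitions": [{
        "entityName": "Customer",
        "hasAttributes": [
            {"name": "customerId", "dataType": "string"},
            {"name": "fullName", "dataType": "string"},
            {"name": "creditLimit", "dataType": "decimal"},
        ],
    }]
}

# Map CDM-style type names to Python converters (only a few types, for illustration).
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "string": str,
    "integer": int,
    "decimal": Decimal,
}


def read_entity(entity_doc: Dict[str, Any], entity: str, csv_text: str) -> List[Dict[str, Any]]:
    """Parse headerless CSV rows using the attribute list from the metadata."""
    definition = next(d for d in entity_doc["definitions"] if d["entityName"] == entity)
    attrs = definition["hasAttributes"]
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        # The schema fixes the column count; reject rows that do not match it.
        if len(row) != len(attrs):
            raise ValueError(f"expected {len(attrs)} columns, got {len(row)}")
        records.append({a["name"]: CONVERTERS[a["dataType"]](v) for a, v in zip(attrs, row)})
    return records


if __name__ == "__main__":
    data = "c1,Ada Lovelace,1000.00\nc2,Alan Turing,250.5\n"
    for rec in read_entity(ENTITY_DOC, "Customer", data):
        print(rec)
```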

https://microsoft.github.io/CDM/

As the following notebook snippet shows, we can get a Spark DataFrame for an entity:

# Assumes `spark` is the notebook's SparkSession and that
# storageAccountName and container are defined in earlier cells.
readDf = (spark.read.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .load())

Azure Synapse Analytics, Spark, and Azure Data Factory all support processing CDM data.

Accelerators

Industry accelerators are foundational components within Microsoft Power Platform and Dynamics 365 that enable ISVs and other solution providers to quickly build industry vertical solutions. The accelerators extend the Common Data Model with new entities that provide a data schema for concepts within specific industries.
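The idea of extending a base entity can be sketched as follows. CDM entity documents express inheritance with an `extendsEntity` property; the resolution below is a deliberately naive sketch that simply prepends the base entity's attributes, whereas full resolution is handled by the CDM libraries and is considerably richer. The `BankAccount` industry entity and all attribute names are hypothetical, and rejecting redefined attributes is a choice made for this sketch only.

```python
from typing import Any, Dict, List

# Hypothetical, simplified entity definitions.
DEFINITIONS: Dict[str, Dict[str, Any]] = {
    "Account": {
        "entityName": "Account",
        "hasAttributes": [{"name": "accountId"}, {"name": "name"}],
    },
    # Hypothetical industry entity adding vertical-specific attributes to Account.
    "BankAccount": {
        "entityName": "BankAccount",
        "extendsEntity": "Account",
        "hasAttributes": [{"name": "iban"}, {"name": "branchCode"}],
    },
}


def resolve_attributes(name: str, defs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return an entity's full attribute list, inherited attributes first."""
    entity = defs[name]
    inherited = resolve_attributes(entity["extendsEntity"], defs) if "extendsEntity" in entity else []
    own = [a["name"] for a in entity["hasAttributes"]]
    # This sketch forbids redefining an inherited attribute to keep resolution simple.
    clashes = set(inherited) & set(own)
    if clashes:
        raise ValueError(f"{name} redefines inherited attributes: {sorted(clashes)}")
    return inherited + own


if __name__ == "__main__":
    print(resolve_attributes("BankAccount", DEFINITIONS))
```

Because the industry entity keeps every base attribute, a consumer that only understands `Account` can still read the shared fields of a `BankAccount` record.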

If a customer uses these accelerators, we can expect the output written to the CDM folder to follow the expected CDM format.

Appendix

Written by Xin Cheng

Multi/Hybrid-cloud, Kubernetes, cloud-native, big data, machine learning, IoT developer/architect, 3x Azure-certified, 3x AWS-certified, 2x GCP-certified
