Solving the Tower of Babel of Business Data with the Open Data Initiative and the Common Data Model
Motivation
Data is the new oil. However, it is hard to extract and is often locked in “silos”. Data about business processes, in particular, is not shared the way open source technologies are: every product that implements a business process has its own language for its data model. This creates two challenges when leveraging data from a variety of sources:
- Application data models are hard to understand: they traditionally rely on documentation that is separate from the data and hard to find, and they carry too little descriptive metadata.
- Different applications use different nomenclature and semantics (for the same entity there may be different attributes, different field names for the same attribute, or a different name for the entity itself), so their data is hard to bring together.
The result is a Tower of Babel of application data.
Enter the Open Data Initiative and the Common Data Model
The Common Data Model (CDM) includes a set of standardized, extensible data schemas that Microsoft and its partners have published.
Result: different applications speak the same language.
1. Humans can understand the data from its metadata.
2. Machines can expect a standardized schema for each entity.
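To make "speaking the same language" concrete, here is a minimal sketch assuming two hypothetical applications that store the same customer entity under different field names. Each application's fields are mapped onto one shared attribute list, so downstream code sees identical keys. The application names, field names, and the `Account` attribute list are all illustrative and are not taken from the published CDM schemas.

```python
from typing import Dict, Optional

# Hypothetical shared schema for an "Account" entity (illustrative, not the official CDM definition)
SHARED_ACCOUNT_ATTRIBUTES = ["accountId", "name", "city"]

# Hypothetical per-application mappings: application field name -> shared attribute name
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "app_a": {"CustID": "accountId", "CustName": "name", "Town": "city"},
    "app_b": {"id": "accountId", "company": "name", "location_city": "city"},
}


def to_shared_schema(app: str, record: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Rename one application's fields to the shared attribute names."""
    mapping = FIELD_MAPPINGS[app]
    renamed = {mapping[field]: value for field, value in record.items() if field in mapping}
    # Every shared attribute is present in the result; missing ones become None
    return {attr: renamed.get(attr) for attr in SHARED_ACCOUNT_ATTRIBUTES}


if __name__ == "__main__":
    print(to_shared_schema("app_a", {"CustID": "42", "CustName": "Contoso", "Town": "Seattle"}))
    print(to_shared_schema("app_b", {"id": "7", "company": "Fabrikam", "location_city": "Berlin"}))
```

With a published standard schema, this mapping work happens once in each producer instead of in every consumer.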
How it works
Data producer
A data producer writes its data, together with a metadata/schema description, into CDM folders. The metadata is written with the CDM library, while the data itself is stored in a standard format such as CSV or Parquet; the CDM library does not provide a way to write the data files.
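The producer's split between data files and a separate metadata description can be sketched as follows. This is a simplified illustration under stated assumptions: the `<entity>.metadata.json` file name and its JSON layout are hypothetical stand-ins, not the real CDM manifest (`*.manifest.cdm.json`) format, which a real producer would write with the CDM library. Only the data partition uses a standard format (CSV).

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_entity_folder(folder: Path, entity: str,
                        attributes: List[Dict[str, str]], rows: List[List[Any]]) -> Path:
    """Write rows as a headerless CSV partition plus a simplified JSON description of the entity.

    The JSON layout is a hypothetical stand-in, NOT the real CDM manifest format.
    """
    folder.mkdir(parents=True, exist_ok=True)
    data_file = folder / f"{entity}.csv"
    with data_file.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)  # data in a standard format (CSV), no header row
    metadata = {
        "entity": entity,
        "attributes": attributes,  # e.g. [{"name": "accountId", "dataType": "int64"}]
        "partitions": [data_file.name],  # location of the data file, relative to the folder
    }
    meta_file = folder / f"{entity}.metadata.json"
    meta_file.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return meta_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta = write_entity_folder(
            Path(tmp) / "sales",
            "Account",
            [{"name": "accountId", "dataType": "int64"}, {"name": "name", "dataType": "string"}],
            [[1, "Contoso"], [2, "Fabrikam"]],
        )
        print(meta.read_text(encoding="utf-8"))
```

Keeping names and types out of the CSV file and in the metadata is what lets a consumer interpret the data without application-specific documentation.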
Sample producers:
- CDS (Common Data Service)/Dynamics 365
- SAP
- Adobe
Data consumer
A data consumer reads an entity's data by interpreting it through the schema described in the metadata.
https://microsoft.github.io/CDM/
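The consumer side can be sketched in plain Python: attribute names and types come only from the metadata, never from the data file. This is a minimal sketch assuming a hypothetical, simplified metadata dict (not the real CDM manifest format) and a headerless CSV partition; the type names and the `CONVERTERS` table are illustrative assumptions.

```python
import csv
import io
from typing import Any, Callable, Dict, List

# Hypothetical simplified metadata, as a producer might describe an entity (not the real CDM manifest format)
METADATA: Dict[str, Any] = {
    "entity": "Account",
    "attributes": [
        {"name": "accountId", "dataType": "int64"},
        {"name": "name", "dataType": "string"},
        {"name": "revenue", "dataType": "double"},
    ],
}

# Assumed mapping from metadata type names to Python converters
CONVERTERS: Dict[str, Callable[[str], Any]] = {"int64": int, "string": str, "double": float}


def read_entity(csv_text: str, metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn headerless CSV rows into typed records using only the metadata for names and types."""
    attrs = metadata["attributes"]
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        records.append({a["name"]: CONVERTERS[a["dataType"]](value) for a, value in zip(attrs, row)})
    return records


if __name__ == "__main__":
    data = "1,Contoso,1500.5\n2,Fabrikam,980\n"
    for record in read_entity(data, METADATA):
        print(record)
```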
As the following notebook snippet shows, the Spark CDM connector can load an entity into a Spark DataFrame:
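# `spark` is the SparkSession provided by the notebook; replace the placeholders below
storageAccountName = "<storage-account>.dfs.core.windows.net"  # placeholder
container = "<container>"  # placeholder
# Load the TestEntity entity described by the manifest through the Spark CDM connector
readDf = (spark.read.format("com.microsoft.cdm")
    .option("storage", storageAccountName)
    .option("manifestPath", container + "/implicitTest/default.manifest.cdm.json")
    .option("entity", "TestEntity")
    .load())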
Azure Synapse Analytics, Spark, and Azure Data Factory all support processing CDM data.
Industry accelerators
Industry accelerators are basic components within the Microsoft Power Platform and Dynamics 365 that enable ISVs and other solution providers to quickly build industry-vertical solutions. The accelerators extend the Common Data Model with new entities that provide a data schema for concepts within specific industries.
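The idea of extending a standard schema with industry-specific entities can be illustrated with a small model in which an entity inherits its base entity's attributes and adds its own. This is a minimal sketch: `EntityDefinition` is a hypothetical class, not the CDM library's object model, and the `Contact`/`Patient` entities and their attribute names are illustrative rather than copied from any accelerator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityDefinition:
    """Simplified, hypothetical entity definition; not the CDM library's object model."""

    name: str
    attributes: List[str]
    extends: Optional["EntityDefinition"] = None  # base entity this one extends, if any

    def all_attributes(self) -> List[str]:
        """Return inherited attributes first, followed by this entity's own additions."""
        inherited = self.extends.all_attributes() if self.extends is not None else []
        return inherited + [a for a in self.attributes if a not in inherited]


if __name__ == "__main__":
    # Hypothetical base entity and a hypothetical industry-specific extension
    contact = EntityDefinition("Contact", ["contactId", "fullName", "email"])
    patient = EntityDefinition("Patient", ["medicalRecordNumber"], extends=contact)
    print(patient.all_attributes())
```

Because the extension keeps every base attribute, code written against the base entity still works on the industry entity.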
As a result, if a customer uses these accelerators, we can expect the data written to CDM folders to conform to the corresponding CDM schemas.
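Because a consumer can rely on that expected format, a pipeline can cheaply check incoming metadata against the attributes it needs before processing the data. This sketch assumes a simplified, hypothetical metadata layout (a dict whose `attributes` list holds `{"name": ...}` entries) rather than the real CDM manifest format; `missing_attributes` and the attribute names are illustrative.

```python
from typing import Any, Dict, List


def missing_attributes(metadata: Dict[str, Any], expected: List[str]) -> List[str]:
    """Return the expected attribute names that the entity's metadata does not declare."""
    declared = {a["name"] for a in metadata["attributes"]}
    return [name for name in expected if name not in declared]


if __name__ == "__main__":
    # Hypothetical metadata as a producer might describe it (simplified, not the real manifest format)
    metadata = {"entity": "Patient", "attributes": [{"name": "contactId"}, {"name": "fullName"}]}
    gaps = missing_attributes(metadata, ["contactId", "fullName", "medicalRecordNumber"])
    if gaps:
        print(f"Entity {metadata['entity']} is missing: {gaps}")
```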