Thinking about Enterprise Machine Learning Model Governance with Open-Source Solutions

Mar 16, 2020


We are all familiar with the following picture of the machine learning workflow. Training an ML model is just a small piece of it; successfully using a model in production involves many other tasks. Like other assets (code, data, infrastructure, etc.), models should be treated as important assets, and therefore there should be governance around them. Governance ensures that a model meets functional, compliance, security, and other thresholds before it enters production, and that each stage of the model's life (e.g. dev, staging, production) has the necessary controls around it. These controls can be automated through CI/CD or enforced through a human-in-the-loop (manual) gate. Likewise, when an ML model is deprecated, a proper procedure should be followed.

(Picture from the article below.)

An introduction to Kubeflow

Model construction and training are just a small part of supporting machine learning (ML) workflows. Other things you…

opensource.com

Since last year, MLOps has been getting more and more attention as more enterprises adopt machine learning. It is still a new area; the most independent article I have found so far is:

What are model governance and model operations?

Our surveys over the past couple of years have shown growing interest in machine learning (ML) among organizations from…

www.oreilly.com

At this point, many data scientists have started using MLflow or ModelDB to track models during the development phase. However, the open source versions of MLflow and ModelDB are geared toward experiment tracking for a team: they lack multi-tenancy, access control, and an audit trail, and I don't even see authentication on the near-term roadmap (https://github.com/mlflow/mlflow/issues/761).

To track the models as enterprise assets, we need systems that provide the following features:

Authentication: who is performing the operation
Authorization (access control): which actions a given user is allowed to perform on which resource
Audit trail: who has performed what (e.g. who approved and promoted a model to the target stage)
Dashboard: custom views for models, operations
Model stages: dev, staging, production, archived
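
To make these requirements a bit more concrete, here is a minimal sketch of how model stages, authorization, and an audit trail could fit together. It is an illustration only, not how MLflow, ModelDB, or any other tool implements them. The role names, the `PROMOTE_PERMISSIONS` mapping, and the `ModelVersion` class are all hypothetical, and authentication is assumed to have happened upstream, so `user` is already a verified identity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    DEV = "dev"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


# Hypothetical policy: which roles may move a model INTO a given stage.
PROMOTE_PERMISSIONS = {
    Stage.STAGING: {"data_scientist", "ml_engineer"},
    Stage.PRODUCTION: {"model_approver"},
    Stage.ARCHIVED: {"model_approver"},
}


@dataclass
class AuditEvent:
    user: str
    action: str
    timestamp: datetime


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: Stage = Stage.DEV
    audit_trail: list = field(default_factory=list)

    def promote(self, user: str, role: str, target: Stage) -> None:
        """Move the model to `target` if `role` is authorized; log either way."""
        transition = f"{self.stage.value} -> {target.value}"
        if role not in PROMOTE_PERMISSIONS.get(target, set()):
            # Denied attempts are recorded too, so the trail shows who tried what.
            self._record(user, f"DENIED promote {transition}")
            raise PermissionError(f"role {role!r} may not promote to {target.value}")
        self._record(user, f"promote {transition}")
        self.stage = target

    def _record(self, user: str, action: str) -> None:
        self.audit_trail.append(AuditEvent(user, action, datetime.now(timezone.utc)))
```

For example, `ModelVersion("churn", 1).promote("alice", "data_scientist", Stage.PRODUCTION)` raises `PermissionError` under this policy and still leaves one entry in the audit trail. A dashboard would then be a view over these records.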

Introducing the MLflow Model Registry--Machine Learning Model Hub

Watch the announcement and demo At today's Spark + AI Summit in Amsterdam, we announced the availability of the MLflow…

databricks.com

Naturally, JFrog Artifactory and Sonatype Nexus come into the picture. They belong to the software artifact repository management category and integrate with popular CI/CD tools such as Jenkins to manage software artifacts (e.g. pip packages, Docker images, npm packages, Maven artifacts). Besides the features above, they also provide REST APIs for external integration. Both come in open source and enterprise versions.

Since these tools do not yet have a notion of a machine learning model, we also need to store the following as artifacts associated with each model version:

Model documentation
Model validation data sets and accuracy, compliance, bias, or other metrics related to the model (all the information the business needs to make decisions on model lifecycle management, e.g. promotion to the staging, production, or archived stage)

As mentioned, we can attach these artifacts by leveraging the REST APIs of these tools.
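
As a rough sketch of what such an integration might look like, the snippet below bundles the documentation pointer, validation data pointer, and metrics into a JSON manifest and prepares an HTTP PUT that would store it next to the model version. The function names, the manifest fields, the `<base>/<repo>/<model>/<version>/manifest.json` path layout, and the bearer-token header are assumptions made for illustration; the actual endpoints and authentication scheme must be taken from the Artifactory or Nexus REST API documentation.

```python
import json
import urllib.request


def build_manifest(model_name, version, stage, metrics, doc_path, validation_data_path):
    """Collect what a reviewer needs for a lifecycle decision into one dict."""
    return {
        "model": model_name,
        "version": version,
        "stage": stage,
        "metrics": metrics,                       # e.g. accuracy, bias measures
        "documentation": doc_path,                # pointer to the model documentation
        "validation_data": validation_data_path,  # pointer to the validation data set
    }


def build_upload_request(base_url, repo, manifest, token):
    """Build (but do not send) a PUT request that uploads the manifest.

    The URL layout below is hypothetical; adapt it to the repository
    manager's real REST API.
    """
    path = f"{manifest['model']}/{manifest['version']}/manifest.json"
    url = f"{base_url.rstrip('/')}/{repo}/{path}"
    body = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the upload. Keeping request construction separate from sending makes it easy to call from a CI/CD job and to inspect in tests.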

Artifactory - Universal Artifact Management

As the first, and only, universal Artifact Repository Manager on the market, JFrog Artifactory fully supports software…

jfrog.com

Nexus Repository | Software Component Management

Know what's inside your software. Nexus Repository - The world's best way to organize, store, and distribute software…

www.sonatype.com

Some cloud providers claim to provide some ML model governance features. As MLOps matures, there will be more convergence in this area.

MLOps: ML model management - Azure Machine Learning

In this article, learn about how to use Azure Machine Learning to manage the lifecycle of your models. Azure Machine…

docs.microsoft.com

Production Model Governance | DataRobot

What Is Production Model Governance? When machine learning models become critical to business functions, new…

www.datarobot.com

As the following shows, even the definition of model governance differs slightly from source to source:

https://www.datatron.com/platform/model-governance