Thoughts on Enterprise Machine Learning Model Governance with Open-Source Solutions
We are all familiar with the following picture of the machine learning workflow: training an ML model is just a small piece of it. Successfully using a model in production involves many other tasks. Like other assets (code, data, infrastructure, etc.), models should be treated as important assets, so there should be governance around them. Governance ensures that a model meets functional, compliance, security, and other thresholds before it enters production, and that each stage of the model's life (e.g. dev, staging, production) has the necessary controls. These controls can be automated through CI/CD or enforced through a human-in-the-loop (manual) gate. Likewise, when an ML model is deprecated, a proper procedure should be followed.
(picture from below)
Since last year, MLOps has been getting more and more attention as more enterprises adopt machine learning. It is still a new area; the most independent article I have found so far:
As of now, many data scientists have started using MLflow or ModelDB to track models during the development phase. However, at this point, open-source MLflow and ModelDB are geared more toward experiment tracking for a single team: they lack multi-tenancy, access control, and audit trails. Even authentication does not appear to be on the near-term roadmap (see https://github.com/mlflow/mlflow/issues/761).
To track the models as enterprise assets, we need systems that provide the following features:
- Authentication: who is performing the operation
- Authorization (access control): which actions a given user is allowed to perform on which resource
- Audit trail: who has performed what (e.g. who has approved and promoted model to target stage)
- Dashboard: custom views for models, operations
- Model stages: dev, staging, production, archived
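To make the interplay of these features concrete, here is a minimal Python sketch of stage promotion with a role check and an audit trail. Everything in it is an assumption for illustration: the role names, the transition policy, and the `ModelVersion` and `AuditEvent` classes are hypothetical and do not come from any of the tools discussed here. It also assumes that `user` has already been authenticated by some external system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    DEV = "dev"
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


# Hypothetical policy: which stage transitions are allowed at all.
ALLOWED_TRANSITIONS = {
    Stage.DEV: {Stage.STAGING, Stage.ARCHIVED},
    Stage.STAGING: {Stage.DEV, Stage.PRODUCTION, Stage.ARCHIVED},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}

# Hypothetical roles: who may move a model *into* each stage.
ROLES_ALLOWED_INTO = {
    Stage.DEV: {"data_scientist", "ml_engineer"},
    Stage.STAGING: {"ml_engineer"},
    Stage.PRODUCTION: {"model_approver"},
    Stage.ARCHIVED: {"ml_engineer", "model_approver"},
}


@dataclass
class AuditEvent:
    """One audit trail entry: who moved which model version where, and when."""
    user: str
    model: str
    version: str
    from_stage: Stage
    to_stage: Stage
    timestamp: datetime


@dataclass
class ModelVersion:
    name: str
    version: str
    stage: Stage = Stage.DEV
    audit_trail: list = field(default_factory=list)

    def promote(self, user: str, role: str, target: Stage) -> None:
        """Move this model version to `target` if policy and role allow it."""
        if target not in ALLOWED_TRANSITIONS[self.stage]:
            raise ValueError(
                f"transition {self.stage.value} -> {target.value} is not allowed"
            )
        if role not in ROLES_ALLOWED_INTO[target]:
            raise PermissionError(
                f"role {role!r} may not promote into {target.value}"
            )
        # Only successful transitions are recorded in this sketch.
        self.audit_trail.append(
            AuditEvent(
                user=user,
                model=self.name,
                version=self.version,
                from_stage=self.stage,
                to_stage=target,
                timestamp=datetime.now(timezone.utc),
            )
        )
        self.stage = target
```

For example, `ModelVersion("churn", "1.0.0").promote("alice", "ml_engineer", Stage.STAGING)` succeeds and leaves one audit entry, while the same call with `Stage.PRODUCTION` as the target raises `ValueError`, because this policy does not allow skipping staging. A real system would probably also record denied attempts and persist the trail outside the process.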
Naturally, JFrog Artifactory and Sonatype Nexus come into the picture. Both belong to the software artifact repository management category. They integrate with popular CI/CD tools such as Jenkins to manage software artifacts (e.g. pip packages, Docker images, npm packages, Maven artifacts). Besides the features above, they also provide REST APIs for external integration. Both are available in open-source and enterprise versions.
Since these tools do not yet have a notion of a machine learning model, we also need to store the following as artifacts associated with each model version:
- Model documentation
- Model validation data sets and accuracy, compliance, bias, or other metrics related to the model (all the information the business needs to make model lifecycle decisions, e.g. promoting to the staging, production, or archived stage)
As mentioned, we can integrate these artifacts by leveraging the REST APIs of these tools.
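As a rough illustration of such an integration, the sketch below builds (but does not send) an HTTP `PUT` request that would deploy a model document to a repository and attach governance metadata to it. It assumes an Artifactory-style deploy endpoint where properties are appended to the path as `;key=value` matrix parameters, and bearer-token authentication. The host name, repository name, property names, and the helper `build_deploy_request` are all hypothetical, so check the REST API documentation of your repository manager before relying on any of them.

```python
import urllib.request
from urllib.parse import quote


def build_deploy_request(base_url, repo, path, payload, properties, token):
    """Build (without sending) a PUT request that deploys one file.

    Assumption: the server accepts Artifactory-style matrix parameters,
    i.e. ";key=value" pairs appended to the artifact path.
    """
    matrix = "".join(
        f";{quote(str(key), safe='')}={quote(str(value), safe='')}"
        for key, value in sorted(properties.items())
    )
    # quote() keeps "/" by default, so the repository path structure survives.
    url = f"{base_url.rstrip('/')}/{repo}/{quote(path)}{matrix}"
    request = urllib.request.Request(url, data=payload, method="PUT")
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical values, for illustration only.
request = build_deploy_request(
    base_url="https://artifactory.example.com/artifactory",
    repo="ml-models",
    path="churn/1.0.0/model-card.md",
    payload=b"# Churn model card\n",
    properties={"model.stage": "staging", "accuracy": 0.93},
    token="REPLACE_ME",
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would perform the upload. The same pattern could be repeated for validation data sets and metric reports, so that everything needed for a promotion decision sits next to the model version.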
Some cloud providers claim to offer ML model governance features. As MLOps matures, there will be more convergence in this area.
Judging from the following, even the definition of model governance differs slightly: