Git-backed Machine Learning Model Registry to bring order to chaos
Use your Git repositories to build a model registry with model versioning, lineage, and lifecycle management. With Iterative Studio, have the who, what, why, where and when questions of your team's model production at your fingertips. Read on to find out how.
- Tapa Dipti Sitaula
- July 26, 2022 • 4 min read
Without the right tools, model management can easily turn chaotic
Machine learning tasks are iterative by nature. Over time, you build several versions of your ML models, which could be in different stages of production-readiness. A version may be running in production, another version that seems to perform better may be in staging, and a couple more versions could be in active development by you and your teammates - using updated hyperparameters, datasets, or algorithms.
How do you keep track of all your models, their versions, and deployment statuses? How do you get answers to questions like these easily:
- Which model version is currently in production?
- When was the last time this model was updated?
If you are like some of the data scientists we know, you may have a Google Sheet or a Notion page with the list of all your models, their changes, deployment history, and so on. But this is highly error-prone and will probably get out of date very quickly. Or maybe you upload all your models to a cloud bucket and “attach” text reports to them. That is not very maintainable or searchable either. We’ve even seen people use sticky notes, or better yet, rely on their memory 😀.
Some of the more organized folks use Model Registries - tools created specifically to organize models into a central, searchable repository. While this is definitely better than using random files or sticky notes, one major problem persists: the data science and machine learning team members work in complete isolation from the software development and DevOps team members. This makes collaboration far more time-consuming than it should be.
Some even implement in-house systems, and maybe you are also planning to do so. But these can get expensive to develop and maintain.
We built the Iterative Studio Model Registry to solve these problems.
Iterative Studio Model Registry enables ML teams to collaborate on models by providing model organization, discovery, versioning, lineage (tracing the origin of the model), and the ability to manage deployment statuses such as development, staging, and production across multiple projects.
Iterative Studio Model Registry is built on top of Git, which means:
- You can reuse your existing Git infrastructure to manage ML models together with code, data, experiment pipelines, and deployment statuses.
- You can use GitOps for model deployment, and trigger model deployment from Iterative Studio, which you can also use to run your ML experiments.
- DS/ML folks and Software/DevOps folks can work together more easily, because they utilize the same tools and infrastructure.
A core philosophy at Iterative is open MLOps - we build tools that work with your infrastructure. Our toolstack is modular, so you can build your model registry on top of your existing cloud and DevOps infrastructure.
- GTO enables semantic versioning and stage transitions of artifacts using metadata files and Git tags.
- MLEM saves ML models and extracts model metadata including framework, methods, input / output data schema, and requirements.
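To make the idea of semantic versioning concrete, here is a minimal sketch in plain Python. It is not GTO's implementation; the `Version` class and its `parse` and `bump` methods are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.PATCH version; tuple ordering gives version ordering."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accept an optional leading "v", e.g. "v1.2.3" or "1.2.3".
        if text.startswith("v"):
            text = text[1:]
        major, minor, patch = text.split(".")
        return cls(int(major), int(minor), int(patch))

    def bump(self, part: str) -> "Version":
        # Bumping a part resets every less significant part to zero.
        if part == "major":
            return Version(self.major + 1, 0, 0)
        if part == "minor":
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown part: {part!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = Version.parse("v1.4.2")
next_minor = current.bump("minor")
```

Because `Version` is a tuple, versions compare naturally, so a registry can pick the newest one with `max(...)`.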
Iterative Studio Model Registry meets you where you are, through your favorite interface. Whether you like APIs, prefer a web interface, or work best in the command line, we've got you covered, whatever your role or preference, so your team can be most efficient.
Save your model files wherever works best for you, whether it’s in S3, GCP, or any other remote (or local) storage. Then, add them to the model registry in a non-intrusive, no-code fashion without modifying your ML training code. This saves you hours of valuable time.
A central dashboard of all your models facilitates transparency and discovery across every project by your whole team.
And on the model details page, you’ll find that information about the model is automatically detected and its history tracked.
Try our demo Model Registry to get a feel for Iterative Studio's Model Registry features.
To register a version, select the commit and provide the version number. To assign a stage, select the version and provide the stage name. It is as simple as that.
Here’s a brief explanation of how the model, version and stage information is stored in Git:
- The following entry in artifacts.yaml indicates that your image-synthesis model is stored in an
This model is used to classify images of different objects submitted by users. This version of the model has an accuracy of 95%.
- Random Forest
- image classification
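As a rough illustration of why keeping this metadata next to the model is useful, the sketch below mirrors such an entry as a plain Python dictionary and searches it by label. The field names `description` and `labels`, the `ARTIFACTS` variable, and the `models_with_label` helper are assumptions made for this example, not the actual file schema.

```python
from typing import Dict, List

# Hypothetical in-memory mirror of one registry entry. The field names
# are assumptions; the values are taken from the example entry above.
ARTIFACTS: Dict[str, dict] = {
    "image-synthesis": {
        "description": (
            "This model is used to classify images of different objects "
            "submitted by users. This version of the model has an "
            "accuracy of 95%."
        ),
        "labels": ["Random Forest", "image classification"],
    },
}


def models_with_label(artifacts: Dict[str, dict], label: str) -> List[str]:
    """Return the names of all models carrying `label` (case-insensitive)."""
    wanted = label.casefold()
    return sorted(
        name
        for name, meta in artifacts.items()
        if any(wanted == existing.casefold() for existing in meta.get("labels", []))
    )


matches = models_with_label(ARTIFACTS, "random forest")
```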
In the following example, the Git tag image-classifier-model@v2.0.0 indicates that you created version 2.0.0 of your image-classifier-model from the Git
The Git tag image-classifier-model#production#3 indicates that you assigned production stage to version 2.0.0 of your model.
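Because this information lives in Git tags, a question like "which version is in production?" can in principle be answered from tag names alone. The sketch below is a simplified illustration, not how Iterative Studio or GTO actually resolve stages. It assumes version tags look like `<model>@v<version>`, stage tags look like `<model>#<stage>#<n>` with a larger `n` meaning a more recent assignment, and a stage tag points at the same commit as the version it promotes. The commit IDs and the `version_in_stage` helper are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumed tag shapes (see the lead-in): "name@v1.2.3" and "name#stage#7".
VERSION_TAG = re.compile(r"^(?P<model>[^@#]+)@v?(?P<version>\d+\.\d+\.\d+)$")
STAGE_TAG = re.compile(r"^(?P<model>[^@#]+)#(?P<stage>[^#]+)#(?P<index>\d+)$")


def version_in_stage(
    tags: Iterable[Tuple[str, str]], model: str, stage: str
) -> Optional[str]:
    """Return the version most recently assigned to `stage`, or None.

    `tags` is an iterable of (tag name, commit id) pairs.
    """
    versions_by_commit: Dict[str, str] = {}
    latest: Optional[Tuple[int, str]] = None  # (index, commit id)
    for tag, commit in tags:
        version_match = VERSION_TAG.match(tag)
        if version_match and version_match["model"] == model:
            versions_by_commit[commit] = version_match["version"]
            continue
        stage_match = STAGE_TAG.match(tag)
        if stage_match and stage_match["model"] == model and stage_match["stage"] == stage:
            index = int(stage_match["index"])
            if latest is None or index > latest[0]:
                latest = (index, commit)
    if latest is None:
        return None
    # Map the commit of the latest stage tag back to a registered version.
    return versions_by_commit.get(latest[1])


# Hypothetical tags and commit ids, for illustration only.
TAGS = [
    ("image-classifier-model@v1.0.0", "aaa111"),
    ("image-classifier-model@v2.0.0", "bbb222"),
    ("image-classifier-model#production#2", "aaa111"),
    ("image-classifier-model#production#3", "bbb222"),
]

in_production = version_in_stage(TAGS, "image-classifier-model", "production")
```

In a real repository, the (tag, commit) pairs could come from the output of `git for-each-ref refs/tags`.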
Since its inception, Iterative Studio has brought together Git, DVC, and CML for seamless data and model management, experiment tracking, visualization and automation. Now, by harnessing the power of MLEM and GTO in its Model Registry, it makes your machine learning processes even more robust.
With the Iterative Studio Model Registry, your ML model organization is no longer in chaos. Collaborating on your ML projects becomes faster and your ML team members’ lives become much easier.
Start using Iterative Studio Model Registry today. And answer all the who, what, why, where and when questions of your team's model production directly from the information in your Git repository.