December ’20 Heartbeat
Monthly updates are here: read all about our brand-new video docs, the DVC Udemy course, open jobs with our team, and essential reading about Git-flow with DVC.
- Elle O'Brien
- December 18, 2020 • 3 min read
This holiday season, show your loved ones you care with our new shirt.
Welcome to the December Heartbeat! Let's dive in with some news from the team.
Our search continues for two roles:
A Senior Software Engineer for the core DVC team: someone with strong Python development skills who can build and ship essential DVC features.
A Developer Advocate to support and inspire developers by creating new content like blogs, tutorials, and videos, and to lead outreach through meetups and conferences.
Does this sound like you or someone you know? Get in touch!
As you may have heard last month, we've been working on adding complete video docs to the "Getting Started" section of the DVC site. We now have 100% coverage! We have videos that mirror the tutorials for:
Data versioning - how to use Git and DVC together to track different versions of a dataset
Data access - how to share models and datasets across projects and environments
Pipelines - how to create reproducible pipelines to transform datasets to features to models
Experiments - how to do a git diff for models that compares and visualizes metrics
The full playlist is on our YouTube channel, where, by the way, we've recently passed 2,000 subscribers! Thanks so much for your support. There's much more coming up soon.
We recently released a new blog post with GitLab all about using CML with GitLab CI.
You may notice that the tweet spelled our name differently, and since Twitter doesn't have an edit button, I think that means we're "Interative" now. Hurry up and get your merch!
We gave a workshop at a virtual meetup held by the Toronto Machine Learning Society, and you can catch a video recording if you missed it. This workshop was all about getting started with GitHub Actions and CML! It starts with a high-level overview and then moves into live coding.
There's no shortage of cool things to report from the community:
Now you can learn the fundamentals of machine learning engineering, from experiment tracking to data management to continuous integration, with DVC and Udemy! Data scientists/DVC ambassadors Mikhail Rozhkov and Marcel Ribeiro-Dantas created a course full of practical tips and tricks for learners of all levels.
Machine Learning Experiments and Engineering with DVC
Over the past couple of months we have started using DVC in our small team. With a handful of developers all coding, training models & committing in the same repository, we soon realized the need for a workflow.
In the post, Fabian outlines three strategies his team adopted:
Create a "debugging dataset" containing a subset of your data, with which you can test your complete DVC pipeline locally on a developer's machine
Use CI-Runners to execute the DVC pipeline on the full dataset
Adopt a naming convention for Git branches that correspond to machine learning experiments, in addition to the usual feature branches
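The first and third strategies are mechanical enough to sketch. The Python below is a minimal illustration under our own assumptions, not code from the post: it assumes the full dataset is a single CSV file, and the function names make_debug_subset and branch_kind, as well as the exp/ and feature/ branch prefixes, are hypothetical choices made for this example.

```python
import csv
import random
import re
from pathlib import Path
from typing import Optional

# Hypothetical branch convention: "exp/<name>" for ML experiments,
# "feature/<name>" for ordinary code changes.
BRANCH_PATTERN = re.compile(r"^(exp|feature)/[a-z0-9][a-z0-9._-]*$")


def make_debug_subset(src: Path, dst: Path, fraction: float = 0.01, seed: int = 42) -> int:
    """Copy the header plus a random fraction of data rows from src to dst.

    A fixed seed means every developer who runs this gets the same subset.
    Returns the number of data rows written (header excluded).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    written = 0
    dst.parent.mkdir(parents=True, exist_ok=True)
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
        for row in reader:
            # Keep each row independently with probability `fraction`.
            if rng.random() < fraction:
                writer.writerow(row)
                written += 1
    return written


def branch_kind(name: str) -> Optional[str]:
    """Return "exp" or "feature" if the branch follows the convention, else None."""
    match = BRANCH_PATTERN.match(name)
    return match.group(1) if match else None
```

A developer could then run the pipeline locally against the small file (for example, make_debug_subset(Path("data/full.csv"), Path("data/debug.csv"), fraction=0.05)), while CI runners use the full dataset. Sampling rows with a seeded generator keeps the subset reproducible without storing a list of row indices; a stratified sample would be a better fit if rare classes must appear in the debug data.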
Agree? Disagree? Fabian is actively soliciting feedback on his proposal (and possible solutions for some unresolved issues), so please read and chime in on our discussion board.
Git Flow for DVC
The AI Show on Channel 9, part of the Microsoft DevRel universe, put out an episode all about ML and scientific computing with Python featuring Tania Allard and Seth Juarez. Their episode includes how DVC can fit in this development toolkit, so check it out!
We'll end on a tweet we love:
This beautiful diagram, made by Joy Heron in response to a talk by Dr. Larysa Visengeriyeva about MLOps, is a wonderful encapsulation of the many considerations (at many scales) that go into ML engineering. Do you see DVC in there? 🕵️
Thank you for reading, and happy holidays to you! ❄️ 🎁 ☃️