July '22 Heartbeat
This month you will find:
🎙 MLOps World,
🚀 MLEM Release,
🔥 DVC extension for VS Code,
🥰 Guide to migrating from Git LFS to DVC,
✍🏼 New Docs and Blog content,
🚀 New demo videos, and more!
- Jeny De Figueiredo
- July 18, 2022 • 9 min read
This month our cover image is inspired by Community member Gift Ojebulu. Gift is a champion of Community and a leader in the data movement in Nigeria. He recently presented on DVC at the Open Source Africa Conference. He is also deeply involved in building the data Community in Africa through Data Fest Africa. We are lucky to have Gift as a member of our own Community.
I first must share my sincere apologies. With all that was going on in the Iterative Community last month, I ran out of time to finish the June Heartbeat. With even more time passing there's lots to write about; let's do this!
On June 1st we released our latest open source tool in the Iterative ecosystem.
MLEM is a model registry and deployment tool connected to your Git repo.
Together with DVC and GTO (Git Tag Ops), MLEM helps you maintain a model registry right in your Git repository. This brings us one step closer to fully syncing the software development and machine learning worlds. To learn more about MLEM, visit the website, ⭐️ the repository, read the blog post, or watch the video of Mike Sveshnikov's full presentation and demo on MLEM at our Release Party.
I started writing this Heartbeat on the plane heading back from MLOps World in Toronto. This conference was a real treat! It was wonderful to meet so many Community members already using DVC and also to see conference talks advocating for our tools that we didn't even know were going to happen! Many thanks to Interos' Stephen Brown and Amy Bachir for sharing about DVC and CML in the talk, A GitOps Approach to Machine Learning.
Additionally, it was great to finally meet in person all the people from the greater MLOps Community whom I'd previously known only virtually, including Demetrios Brinkmann of the MLOps Community Slack and our friends from DAGsHub and Tryolabs. I also met one of our Community Champions, Sami Jawhar, who presented at one of our most engaging meetups on record, asking the question What IS an experiment? You can find this great talk below.
The conference talks were great. I was able to attend three:
- Top 5 Lessons Learned in Helping Organizations Adopt MLOps practices from Shelbee Eigenbrode, Principal AI/ML Specialist
- Panel: What Every Product Manager Delivering AI Solutions Should Know, moderated by Jessie Lamontagne, Data Science Manager at Kinaxis (who was lucky enough to take home her very own DeeVee, see below); with Nahla Salem, Senior Product Manager at Yelp; Anneya Golob, Staff Data Scientist at Shopify; and Phillip Gorniki, Sr. Product Manager at Kinaxis. One quote from Nahla that stood out for me was, "If everything is a priority, nothing is a priority." That was a lesson I needed to take to heart, hence a bumped Heartbeat. 😢
Jessie Lamontagne of Kinaxis with DeeVee! (Source link)
I heard great feedback from attendees on the conference talks as well. In general, the conference had a fantastic, positive vibe, with great connections made through the event app, the conference itself, and the parties and networking opportunities 🥳🍻 We also thoroughly enjoyed being Expo Booth neighbors with Seldon (model serving) and Genesis Cloud (environmentally sustainable GPUs!). Finally, hats off to the organizers Faraz Thambi and Tina Aprile, who delivered an extremely well-thought-out and well-run in-person conference! If you didn't attend this year, you should definitely put it on your radar for next year, or attend their Toronto Machine Learning Summit in November! Plus, Toronto was fun! Check out our team dinner on the last night at the CN Tower.
Team dinner at the CN Tower - Pictured L to R: Gabriella Caraballo, Stephanie Roy, Mike Moynihan, Jorge Orpinel Perez (forward), me, Mikhail Sveshnikov, Milecia McGregor (forward), Max Aginsky, Alex Kim (forward), and Dmitry Petrov
We just released our DVC extension for VS Code! It was so fun to let the cat out of the bag to conference goers and watch their eyes light up! 😃 This was a foreshadowing of events to come at the release! It hadn't been a complete secret: Paige Bailey tweeted about it a while ago, and it has been on the VS Code Marketplace for a couple of months so that beta testers could try it out. But we finally, officially released the tool on June 12th.
And OH. MY. GOSH. The response has been amazing! Over 3,400 people have already watched the video below on YouTube, and more than 1,000 new users downloaded the DVC Extension for VS Code from the marketplace within the first couple of days!
You will find in this extension:
- tons of experiment tracking and table functionality beyond what the regular CLI offers
- live metrics tracking
- the ability to run and queue experiments directly from the experiment table or the command tree
- sorting, plus drag-and-drop movement of columns and column groups
- expanded plot viewing capabilities - zoom into plots and save them as PNGs or SVGs for your reporting needs
If you are a DVC and VS Code user, you will be a happy camper! Please try it and as always reach out with feedback! We want to make these tools better for you!
There's been lots of juicy content from the Community since the last Heartbeat. When I first started at Iterative over a year and a half ago, I would hope each month that there would be enough content from the Community to write about. This is no longer an issue; I sadly have to filter it now, so that these Heartbeats don't go on for days. If you've written something about our tools and it hasn't appeared in a Heartbeat, just know that we see it and we are grateful for all the Community's efforts to share about our tools! 🙏🏼
Alex Strick van Linschoten - More Data, More Problems: Using DVC to handle data versioning for a computer vision problem
Alex Strick van Linschoten brings us this great overview of DVC's versioning capabilities, based on his use of DVC in a redaction identifier project. He goes through the pluses of using DVC, which he describes as "be(ing) more or less unchallenged for what it does in the data versioning domain." He had previously used Git LFS and found it to be less robust, so he made the switch to DVC. In his post, he provides a tutorial on making the switch from Git LFS to DVC. We are so grateful to Alex for sharing this guide with the Community!
Also super worthy of mention is Alex's shout-out about our welcoming Community. We are thankful for this praise and for his contributions to our Community. 🙏🏼
Thanks for the shout-out Alex! (Source link)
MyMLOps.com provides a tool to help you build a cool diagram for your MLOps Stack. There's no about page there or indication of who made this for the greater MLOps Community, which is frankly a bit sus. Nevertheless, we were excited to see DVC included in the Experiment Tracking section, as it should be! We know there are other great experiment tracking tools out there, and we are happy to see that the larger Community is starting to recognize this capability in DVC! We like to think of it as taking a step beyond tracking to versioning. To learn more about experiment versioning, visit this blog piece from Technical Product Manager - DVC, Dave Berenbaum.
Our team had an internal discussion about the absence of our tools from certain categories: DVC and CML for artifact tracking, CML for Pipeline Orchestration Runtime Engine, and MLEM for Model Registry and Serving. But like everything in this space, things are changing constantly. Thanks to whoever you are out there that made this nifty tool!
MLOps tool stack diagram generator from MyMLOps.com (Source link)
Samson Zhang: MLOps: How DVC smartly manages your data sets for training your machine learning models on top of Git
Samson Zhang of LittleBigCode writes an in-depth article on Medium on how DVC manages large datasets. He discusses why DVC is needed and argues that it is a better option than MLflow, because MLflow does not optimize storage for file duplication the way DVC does. He also prefers it to Git LFS, for the same reasons Alex Strick van Linschoten gives in the piece above. Samson gives a thorough overview of the tool, how it works, and how to use it. He includes some best practices he has worked out while using the tool and explains how to set up a dataset registry, which he finds particularly useful with DVC.
DVC workflow, cache, and storage (Source link)
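The storage deduplication point from Samson's article can be illustrated with a toy example. What follows is a minimal sketch of content-addressed storage in Python, not DVC's actual implementation: the `ContentStore` class, its methods, the directory layout, and the choice of SHA-256 are all assumptions made for illustration. The idea is that a file is stored under the hash of its contents, so a file that is unchanged between two dataset versions is written only once.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


class ContentStore:
    """Toy content-addressed store (hypothetical, not DVC's real cache)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, digest: str) -> Path:
        # Assumed layout: first two hex characters form a subdirectory.
        return self.root / digest[:2] / digest[2:]

    def add(self, data: bytes) -> str:
        """Store data under its content hash and return that hash."""
        digest = hashlib.sha256(data).hexdigest()
        target = self._path_for(digest)
        if not target.exists():  # identical content is never written twice
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        """Read back the content stored under a given hash."""
        return self._path_for(digest).read_bytes()

    def object_count(self) -> int:
        """Number of distinct objects currently stored."""
        return sum(1 for p in self.root.rglob("*") if p.is_file())


def demo() -> Tuple[int, bool]:
    """Add two dataset versions that share one unchanged file."""
    with tempfile.TemporaryDirectory() as tmp:
        store = ContentStore(Path(tmp) / "cache")
        v1 = {"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n3,4\n"}
        v2 = {"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n5,6\n"}
        # A "version" here is just a manifest mapping file names to hashes.
        manifest_v1 = {name: store.add(blob) for name, blob in v1.items()}
        manifest_v2 = {name: store.add(blob) for name, blob in v2.items()}
        shared = manifest_v1["train.csv"] == manifest_v2["train.csv"]
        return store.object_count(), shared
```

Calling `demo()` adds four files across two versions, but the store ends up holding only three objects, because the unchanged `train.csv` maps to the same hash in both manifests.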
Dror Atariah is the first Community member to write about MLEM! 🎉 In his piece he gives a review of the tool and starts with a general overview. Giving it a try with the iris dataset, he ultimately builds a Docker image with MLEM to get predictions from a trained model served by MLEM in an API. You can try out his project in this repo!
As you can imagine, with new tools come new docs! The docs and product teams have been furiously busy making sure that you have the docs you need to try our new tools. Of note please find:
- MLEM Docs
- Machine Learning Model Registry in DVC.org docs as well as in the MLEM docs
- VS Code docs and walkthrough
Have you ever struggled, or are you struggling now, with syncing data with one of the cloud providers? We know that comes up a lot in the Discord server. So Milecia McGregor wrote three detailed pieces to help you out.
- Syncing Data to AWS S3
- Syncing Data to GCP
- Syncing Data to Azure Blob Storage
Whatever your flavor, she's got you covered. Look out for short videos covering the same topics this quarter.
Find more of your Discord questions answered in the latest editions of Community Gems. 💎
We have surpassed 1300 students in our Iterative Tools School! 🎉 We now have in place:
- Closed captions
- Course guides for each lesson. For some of these, you will find the video embedded into the lesson itself, but for the lessons that include code snippets, the guides are in PDF form so that you can copy and paste them to your heart's content! 😉
If you are already in the course, or active on social media, you may have noticed the amazing notes on the modules that Gema Perreño Piqueras has created (see below). 🚨 Spoiler alert: Gema's joining the DevRel team next week! So look forward to more great content from her.
Gema Perreño Piqueras' Course Notes (Source link)
We'll be at AI4 from August 16-18.
Dmitry Petrov will give a talk as well as participate in a panel discussion on MLOps. If you are attending, stop by the booth and say hi or check out one of the in-booth demos we will have on our tools throughout the day.
Additional conferences we will be attending this year:
- ODSC West - San Francisco
- Deep Learning World - Berlin
- MLOps Summit - Re-work - London
- Toronto Machine Learning Summit - Toronto
Use this link to find details of all the open positions. This month we are especially seeking a fit for the Senior Software Engineer (Dataset Label Management, Python) role, so if that fits you or someone else you know, get applying! 🚀
Iterative is Hiring (Source link)
Because I missed a month, there's just going to have to be two…
We were excited to see this project come up from Chansung using DVC, Iterative Studio, Hugging Face and Jarvis Labs AI. Looking forward to seeing how it develops! 🍿
Redrew for easier understanding of #git_mlops projects with @DVCorg and @jarvislabsai. The code needs to be cleaned, but it now deploys any model from any branches to @huggingface model and space repository. Basically @DVCorg is heavily used, but I just put the one logo in it. pic.twitter.com/Cj7Z7KPaOy
— chansung (@algo_diver) May 28, 2022
And we have this great Tweet thread from Leon Menkreo about how he's taken back control of his data, models, and predictions with DVC!
I took back control of my data, models, and predictions with Data Versioning 🔀
Everything you need to get started with DVC by @DVCorg in one mega 🧵:
⁉️ What is DVC?
🔀 DVC & Model Versioning
🐍 DVC in python
— Leon Menkreo (@LeonMenkreo) July 8, 2022
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.