March '22 Heartbeat
This month you will find:
🇺🇦 A special note on the war in Ukraine,
🧘🏻♀️ MLOps is a mess, but that's ok,
🥰 Tutorials and workflows from the Community,
🗣 Upcoming Events,
🔺 MLOps Maturity Models,
💻 Online Course(s) updates,
📖 New doc,
🚀 Info on our growing team, and more!
- Jeny De Figueiredo
- March 17, 2022 • 7 min read
While the war in Ukraine has impacted the world, it has also greatly impacted our company, as we have team members living in Ukraine and Russia, and many with family ties to both. Our hearts are with our Iterative family in Ukraine, and we are committed to doing everything we can to support the safety of our Ukrainian colleagues, as well as the transition of our Russian colleagues, during this crisis.
We as a company are against this war. We have donated to the humanitarian efforts to help the people of Ukraine and are matching our team members' donations as well. We are proud of the perseverance, care, and support coming from our team at this time.
If you are able, we ask that you consider these resources as ways to help. Our hope is that the world will find a quick and peaceful end to this war and Ukraine will be restored, even stronger than before. 🇺🇦
- A list of charities with direct connections to Ukrainian people endorsed by the Kyiv Independent. Everything on this list except for the “Charities that help the war effort” section is for humanitarian efforts only.
- Humanitarian Assistance to Ukrainians by National Bank of Ukraine
- UNICEF USA (2x additional match)
- UNICEF UK
- RedCross Ukraine (there are some concerns about this org - see one, two)
- RedCross UK
- International Medical Corps
- NOVA UKRAINE
- GOFUNDME / Support Ukrainian Refugees Arriving In Poland
- Doctors Without Borders
- Save the Children
- Project Hope
Mihail Eric writes a long but really
worthwhile piece entitled
MLOps is a Mess But That’s to be Expected.
In it he discusses the allure of seeking a machine learning career, only to run
smack into the giant wall of learning that encompasses the space, not the least
of which is the multitude of tools to pick through once you get there. He reviews
the state of machine learning and adds some history of DevOps for perspective on
MLOps.
You will find advice for newcomers and some thorough final thoughts and predictions, especially as they relate to “ML at a reasonable scale” companies.
Definitely worth your review!
Gartner Hype cycle for MLOps (Source link)
In this hilarious post, Kevin Lu teaches us how to use DVC to disconnect from our unhealthy, addictive relationships with our computers and make room for more human relationships! You don't want to miss the humor, productivity, and wisdom here, all while learning how each of DVC's commands helps your machine learning engineering exploits.
Thanakorn Panyapiang: Putting A Machine Learning model into production with Google Cloud Platform and DVC
Are you a data scientist new to putting models into production?
In this piece, Thanakorn Panyapiang describes various model deployment strategies for putting projects into production, including model-as-service, batch prediction, and model-on-edge. In his example he uses a batch prediction approach with an image segmentation model to identify clouds. He uses DVC as a model registry with Google Cloud Storage, and GitHub Actions to automate the Cloud Functions deployment. See all the steps he outlines in his piece to get real value out of your machine learning projects.
Data Pipeline (Source link: Author)
In the December Heartbeat, I told
you about Matt Upson's first post in his
series on using DVC, CML and Rasa together.
In this second post
he goes through some Rasa basics and gets the DVC pipeline set up, with its train
and test stages, params, dependencies, outs, and metrics. He also covers syncing
with DVC, making changes, the
dvc repro command, the
dvc.lock file, and
pushing to remote storage. We're looking forward to the next installment when we
will see how CML can be used to automatically train the model.
DVC metrics diff in Rasa project (Source link)
Sibanjan Das notes the trending of the MLOps keyword in his piece in DZone. Sibanjan gives an overview of MLOps and how it supports the AI/ML ecosystem to deliver return on investment for ML projects. He reviews the components of MLOps, including automated ML model building pipelines, model serving, model version control, model/data monitoring, and security and governance. He also discusses the MLOps maturity models of Google and Microsoft (see below). I found this part especially interesting as it mirrors what we see in our Community and how they develop using our tools as well. Finally, he outlines some tools that help in the process, including DVC.
Comparing Google's and Microsoft's maturity models (Source link)
Jagreet Kaur of Xenonstack authors a guide on applying DevOps to machine learning and on what the continuous development life cycle looks like for machine learning projects. Jagreet goes over all the fun continuous topics, including continuous integration, continuous testing, continuous retraining, and continuous deployment. She gives an overview of the use of TensorFlow, PyTorch, and Docker, as well as DVC for version control, experiment management, deployment, and collaboration. Additional resources from Xenonstack are provided for further review.
In this opinion piece in Towards Data Science, Yuqi Li gives an overview of the meaning and components of MLOps and identifies a number of good open-source tools in the space, which of course include DVC. He also outlines a number of reasons why MLOps should be open source. Among the reasons making the cut:
- No privacy concern
- Build Community around the tool
Examine these reasons to determine if open source makes sense for your MLOps work. We think you will find that it does.
If you’ve been in our Discord server, been to one of our Meetups, or interacted with us on Twitter, you’ve surely come across DVC Community All-Star Mert Bozkir. Mert has written a great piece entitled Community Driven Learning, in which he describes why it is the best way to learn. He outlines his reasoning, including the support, encouragement, and motivation you can get from the Community to be persistent in your learning efforts. He also includes eight communities that are great for learning, with invites included. Be sure to check it out!
Community Driven Learning (Source link: Unsplash by john_cameron)
We now have over 250 students taking the course and 10 students who have completed it! 🎉 Thank you to all who have given us feedback. We are actively working on making adjustments to the course and improving the next one.
We have a new look! The website for our online course, Iterative Tools for Data Scientists and Analysts, has been streamlined to show more clearly what our students need from the course!
We have already begun working on the second course, which will be more advanced (remember those maturity models outlined in the article from DZone above?) and will cover scenarios with CML. We are also working on creating an ebook for each video that will provide relevant information, diagrams, and links alongside the video content instead of batching them at the end of the module. The ebook format will also let you take your own notes as you study!
Mike Moynihan joins us from Brooklyn, NY as an Account Executive. He previously worked at Code Climate as the Manager of Business Development and an Account Executive. Mike's really into biking and will be participating in the 5-Boro Bike Tour in NYC this year. He's also a baker and has been baking bread and other baked goods consistently for about 3 years now. Finally, when not working or biking or baking, you may find him playing one of the video or board games in his 500-strong collection.
Rob De Wit joins our team from
Utrecht, the Netherlands as a Developer Advocate. Rob's first focus will be on
developing those new ebooks for our new online courses mentioned above. He has a
background in Information Sciences and previously worked at bol.com and
Devoteam. When not working, Rob likes photo and video editing, board games,
organizing meetups, and hiking (the Peaks of the Balkans are on his bucket
list). He also stays busy by learning Spanish and dabbling in local politics.
Be sure to join us at the
March Office Hours Meetup,
where Fabian Zills, PhD student at
University of Stuttgart, will present his
ZnTrack ("zinc track") project which creates, runs and benchmarks DVC pipelines
in Python and Jupyter Notebooks.
Find the repo here!
March Office Hours - ZnTrack
- We will be sponsoring ODSC East and MLOps World this year, so if you are attending, we'd love to meet you IRL! Stop by our booth!
- Milecia McGregor will be speaking at PythonWeb Conference on March 22nd on "Using Reproducible Experiments to Create Better Machine Learning Models."
- David de la Iglesia Castro will be presenting his workshop "Making MLOps Uncool Again" at MLOps World New York on March 29th and at PyCon Berlin April 11th.
- Community member Gift Ojeabulu will be giving a talk on "MLops Exploration with Git and DVC for Machine Learning Project" at Open Source Festival 2022 March 24-26.
- BatteryDev Hackathon will take place next week, and Milecia McGregor will hold Office Hours on March 21st for those needing help with DVC.
- Antoine Toubhans will be presenting his DVC integration with Streamlit at PyCon Berlin as well.
CML has a new command line reference that lets you prepare the Git repository
for CML operations. For more info,
check out the docs.
Even with our amazing new additions to the team, we're still hiring! Use this link to find details of all the positions and share with anyone you think may be interested! 🚀
Iterative is Hiring (Source link)
We were really excited to see the Sicara team all decked out in their DVC swag this month in this Tweet. If you haven't seen the video of Antoine Toubhans' integration with Streamlit, you can see it on our YouTube channel or catch the presentation at this year's PyCon Berlin.
How do you get some DVC swag, you ask? Write us some great content, contribute to our tools, or give a presentation at one of our Meetups! We'd love to have you!
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.