December '21 Heartbeat
This month you will find:
🥰 Tutorials and workflows from the Community,
🗣 DVC, CML and Rasa,
🎙 Speech Diarization,
🧐 Research paper on MLOps AntiPatterns,
📖 Docs updates,
💻 Online Course updates,
🚀 Info on our growing team, and more!
- Jeny De Figueiredo
- December 15, 2021 • 5 min read
We've made it to the end of the year! 2021 has been an amazing journey for us and our growing Community all over the world. There's lots of great news this month. Let's not waste a heartbeat and get right to it! 😉
Matthew Upson, founder of MantisNLP, an AI consultancy focused on NLP, and his team published the first blog post in a series showing how to use DVC and CML with Rasa to develop conversational AI. This post sets the scene for the more detailed parts to follow, and lays out how DVC is used to generate the DAG and log metrics, and how CML is used for testing. We're looking forward to the next installments!
The co-authored article, "'Who Said That?' A Technical Intro to Speaker Diarization," by Dario Cazzani and Alex Huang, machine learning engineers at Cisco, introduces speaker diarization, or "who spoke when," in audio recordings. Their team's solution takes you through the fingerprinting of voices, clustering to assign speaker labels, creating the needed data pipeline, and the integration with Webex.
In this process, the team benefits from using DVC to version data and models, and to collaborate easily with each other and the transcription team. More info on this project can be found in their repo.
Dario Cazzani and team's process for assigning speaker labels to audio files (Source link)
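To make the "clustering to assign speaker labels" step a little more concrete, here is a minimal sketch. It is not the Cisco team's implementation: the function names, the 0.8 similarity threshold, and the toy two-dimensional "voice fingerprints" are all assumptions made for illustration. Each audio segment joins the most similar existing speaker, or starts a new one.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_speakers(embeddings, threshold=0.8):
    """Greedy clustering of segment embeddings (hypothetical helper).

    Each segment is assigned to the most similar speaker centroid if the
    similarity reaches `threshold`; otherwise it becomes a new speaker.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of segments assigned to each speaker
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            # Update the running mean with the new segment.
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return labels


# Toy fingerprints: segments 0, 1, 3 point one way, segments 2, 4 another.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.95, 0.05], [0.0, 0.9]]
labels = assign_speakers(segments)
print(labels)
```

Real systems use high-dimensional embeddings from a trained model and more robust clustering, so treat this only as an illustration of the idea.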
Matthew Segal, in his post "DevOps in Academic Research," reviews his work applying tried-and-true DevOps practices to data science projects that used a Markov chain Monte Carlo (MCMC) technique to build models simulating the spread of tuberculosis and later, as the pandemic erupted, COVID-19.
The article covers mapping the workflow (see below), testing the codebase, smoke tests (with a guide link), continuous integration, and data management (where he recommends DVC).
Working to develop a pipeline (Source link)
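For readers new to smoke tests, here is a minimal sketch of the idea: run the code on a tiny input and check only that it finishes and returns sane output. The toy model below (`simulate_infections`) is hypothetical and stands in for a real simulation; it is not taken from Matthew's codebase.

```python
import random


def simulate_infections(days, initial_cases, growth_rate, seed=0):
    """Hypothetical toy model: noisy exponential growth of case counts."""
    rng = random.Random(seed)
    cases = [float(initial_cases)]
    for _ in range(days - 1):
        noise = rng.uniform(0.95, 1.05)
        cases.append(cases[-1] * (1 + growth_rate) * noise)
    return cases


def smoke_test():
    """Run the model on a tiny input and check it completes with sane output.

    A smoke test does not check that the numbers are right, only that
    nothing crashes and the result has the expected shape.
    """
    result = simulate_infections(days=5, initial_cases=10, growth_rate=0.1)
    assert len(result) == 5
    assert all(c > 0 for c in result)
    return True


print(smoke_test())
```

Because it runs in well under a second, a check like this is cheap enough to run on every commit in continuous integration.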
Well, Thoughtworks included DVC in its recent Thoughtworks Guide to MLOps Platforms. While being included is great, things move so fast that they seem to have missed our experiment capabilities and CML's CI/CD capabilities for machine learning. 🤔
And if they only knew what's to come! 🚀 Lots planned in the new year!
Dmitry Petrov's meme (Source Link)
In his post, What is MLOps - Everything You Need to Know to Get Started, Harshit Tyagi provides an overview of MLOps and why it's necessary for today's ML and AI projects heading to production. You will learn about the different pieces of the puzzle that make up MLOps, and review the machine learning life cycle. In the post, Harshit also provides a video covering the concepts as well as an interview with our CEO, Dmitry Petrov. Be sure to check it out!
Harshit Tyagi's ML Systems Engineering and Operations with their Stakeholders (Source link)
Nikhil Muralidhar et al., in their survey paper, Using AntiPatterns to avoid MLOps Mistakes, aim to develop a vocabulary for anti-patterns found in machine learning projects in the financial services industry. In the paper, they also give recommendations for achieving MLOps at an enterprise scale using processes for documentation and management. Luckily, our tools help you to solve some of these challenges!
Using AntiPatterns to avoid MLOps Mistakes
Amrit Ghimire joins our Studio team as a back-end developer from Pokhara, Nepal. Prior to joining Iterative, he led a team at Leapfrog, Inc. developing applications for a drug discovery company. Amrit likes to read and watch movies in his free time, and aims to read 3-4 books per month. He also enjoys working in Python and Rust, customizing Linux systems, and building personal command-line automations. Welcome Amrit! 🎉
As always, we're still hiring! Use this link to find details of all the positions including:
- Senior Software Engineer (ML, Labeling, Python)
- Senior Frontend Engineer (TypeScript, Node, React)
- Senior Software Engineer (ML, DevTools, Python)
- Senior Software Engineer (ML, Data Infra, GoLang)
- Field Data Scientist / Sales Engineer
- Developer Advocate (Machine Learning)
- Director / VP of Engineering (ML, DevTools)
- Director / VP of Product (ML, Data Infra, SaaS)
- Head of Talent
- Head of DevRel
- Account Executive (Sales)
Please pass this info on to anyone you know who may fit the bill. Come join our rocket ship! 🚀
The DVC team has been steadily adding to the Experiment Management section of our docs. We want to make sure that all your experiment versioning needs are met, and there's more to come! 🚀
And don't miss the latest Use Case on Machine Learning Experiment Tracking, which walks from traditional, painful note-taking to more advanced methods, and shows how DVC can take you to the next level!
Tired of this? Check out our docs! (Source link)
The course is in editing mode and this week we are getting the second cuts for review. The first course will focus on DVC for Data Scientists and Analysts. The course is on track to be out by the end of the year! It will be 100% FREE and available from our websites. We are so excited about how it's coming to life! 🚀
👀 Note that the Udemy channel in Discord has now changed to #iterative-online-course. We're getting ready!
Be sure to join us at the January Office Hours Meetup, where Gennaro Todesco, Senior Data Scientist at Billie.io, will present his workflow with DVC and CML, and his Neovim-DVC plugin. Tezan Sahu will follow, presenting a workflow from his series of tutorials that we shared in the September Heartbeat, including DVC, PyCaret, MLflow and FastAPI.
January Office Hours Meetup - 2 workflows
There were many candidates this month. Check out our Testimonials Wall of Love, which is now live on our Community Page and holds many of our favorite Tweets! If you'd like to give a shout-out for our tools, head here to make a video or written testimonial. We'd appreciate it! 🙏🏼
But for this month, this Tweet wins the coveted Tweet Love slot.
Playing with Data Science Version control (DVC) from @Iterativeai - amazing how much it has progressed since I looked at it a couple of years ago— Chris Samiullah (@ChrisSamiullah) November 19, 2021
And with that we close out the year! We send a huge thank you to all of our Community members who help us make our tools better. Thank you for your contributions, trust and feedback! We look forward to continuing to grow with you in 2022! Have a wonderful holiday season and Happy New Year! 🎉
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.