January '23 Heartbeat
This month you will find:
🎥 MLEM tutorial video from a Community member,
🥇 Top Python tools for 2022 from Tryolabs,
🎅🏼 Naughty or Nice MLEM project,
❣️ Unstructured Data Query Language coming,
🎥 Sami Jawhar's Running Parallel Pipelines with DVC & TPI Video,
🎥 Casper da Costa-Luis' MLOps Summit presentation video,
👀 DVC tutorial, and more!
- Jeny De Figueiredo
- January 17, 2023 • 4 min read
Happy New Year! We are looking forward to what’s going to be a stellar year for us and for all of you! We are hoping for peace to reign, the recession to subside, and success aplenty. 🤞🏼 Are you ready? Let’s do this!
We always start with DVC, but this month, in this new year, we’ll start with MLEM! We released MLEM in June of last year and have made some advances to it already. It seems the Community is learning about it and recognizing its benefits. We are thrilled to see that!
JCharis Jesse created the FIRST video tutorial from the Community for MLEM! In this well-explained and well-recorded video, Jesse takes you through what MLEM is and where it fits in the path from machine learning to production. He then shows the different options for saving a model, where to find the model metadata and how it works, how to load the ML model, examples of serving with FastAPI and Docker, and finally how to apply the model to data for prediction. If you are interested in using MLEM for serving your models, this will definitely help get you started! You can find a ton of other great content on his YouTube channel.
From our friends at Tryolabs, Alan Descoins and Facundo Lezama round out 2022 with Tryolabs’ annual picks for the best Python Libraries of 2022. To make the cut, a library must have been launched or gained popularity within the year. They have a list of top 10 picks that you will want to take a look at, including LineaPy, which helps you convert notebooks to production pipelines. MLEM also made the list in the category of Tools & Enablers.
Tryolabs Best Python Libraries of 2022 (Source link)
In the first part of a new series on DVC, Bex Tuychiev writes a fire 🔥 tutorial on DVC in Towards Data Science with a computer vision project using the German Traffic Sign Recognition Benchmark Dataset and TensorFlow. He guides you through getting the project properly set up, then shows how to start adding, tracking, pulling, and pushing files with DVC. Next, he goes over building the image classification model and concludes with how to create a shared cache if you are working on a large project with a team. Reproducibility and Collaboration for the win! We are looking forward to the next parts of the series!
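If the shared cache idea is new to you, it can help to picture it as content-addressed storage: a file is stored under a name derived from a hash of its contents, so teammates who track identical data end up pointing at a single stored copy. The sketch below is a simplified, standard-library illustration of that idea and not DVC's actual implementation. The helper names (`file_md5`, `cache_file`, `demo`), the two-character subdirectory layout, and the sample file are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_file(path: Path, cache_dir: Path) -> Path:
    """Store a copy of `path` in `cache_dir`, named after its content hash."""
    checksum = file_md5(path)
    # Hypothetical layout: the first two hex characters form a subdirectory.
    target = cache_dir / checksum[:2] / checksum[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return target


def demo():
    """Two teammates cache the same bytes; return (same_entry, n_files)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared_cache = root / "shared_cache"
        entries = []
        for user in ("alice", "bob"):
            workspace = root / user
            workspace.mkdir()
            image = workspace / "sign.png"
            image.write_bytes(b"fake traffic sign pixels")
            entries.append(cache_file(image, shared_cache))
        # Count how many files actually live in the shared cache.
        n_files = sum(1 for p in shared_cache.rglob("*") if p.is_file())
        return entries[0] == entries[1], n_files


if __name__ == "__main__":
    print(demo())
```

Both workspaces resolve to the same cache entry and only one file is stored, which is why a shared cache can save disk space and transfer time on a large team project.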
For a very nice comparison of Data Versioning Tools, look to Aryan Jadon’s recent post on the subject. He seems to hit them all, providing information about their benefits and the points on which to be cautious. Naturally, DVC makes this list, with the only caution being, “you need to use a Git repository to use DVC’s versioning features.” Isn’t Git a part of every modern tech stack? 😉 We are staying true to our mission to deliver the best developer experience for machine learning teams by creating an ecosystem of open, modular ML tools!
Deciding on Data Versioning Tools? (Source link by Mary Amato)
If you couldn’t make the December Meetup, good news! The video is already out! Sami Jawhar joined us to share a solution he built to run parallel pipelines with DVC and TPI to save time processing the massive amount of data they use in their brain research at Kernel. He describes the context of his situation as well as all of its constraints and finally the details of the solution, coined “Neuromancer” after the famous sci-fi novel. Get ready for some mind-blowing engineering! 🤯
In case you missed it while you were out for the holidays, Alex Guschin and Mike Sveshnikov, your friendly neighborhood MLEM creators, put together a fun project using MLEM to determine if you had been naughty or nice just ahead of Santa’s trot around the globe in 2022. In the blog post, you will learn how they DDoS’ed Santa’s website, trained a Christmas (decision) tree, and deployed an ML service with MLEM to Streamlit to see the predictions.
Our CML Product Manager, Casper da Costa-Luis, presented in November at MLOps Summit on “Painless cloud experiments without leaving your IDE.” The presentation is now available on YouTube. If “Full lifecycle management of computing resources (including GPUs and auto-respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s)… without needing to be a cloud expert” appeals to you, this talk is for you! He discusses how to move experiments seamlessly between a local laptop, a powerful cloud machine, and your CI/CD of choice.
Do you use Amazon S3, Azure Blob Storage, or Google Cloud Storage? We have a new solution for finding and managing your datasets of unstructured data like images, audio files, and PDFs! Extend your DVC environment with the first unstructured data query language (think SQL -> DQL) for machine learning. We are looking for beta customers for this new tool.
Our favorite Tweet this month is from Osman Bayram, who mentions he plans to use CML with a Hugging Face GPU. We are looking forward to that! 🍿 I'm seeing a lot of popcorn eating in our future. See you next month!
“i am going to try to make a github action to use hf GPU. possibly with iterative cml https://t.co/W1nWr0kdeO” (Osman Bayram 🐤, @the_osbm, December 22, 2022)
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.