October '22 Heartbeat
This month you will find:
🎙 Andrew Ng Intel Keynote talk,
🇺🇸 White House Blueprint for AI Bill of Rights,
🧐 CML in research,
🎥 Nadia Nahar video: Collaboration Challenges in ML-Enabled Systems,
🐉 DVC-Hydra integration,
🗣 CI/CD for Machine Learning upcoming webinar,
🚀 New hire, and more!
- Jeny De Figueiredo and one co-author
- October 20, 2022 • 6 min read
Welcome to October! As the days grow shorter or longer depending on your hemisphere, we bring you the latest and greatest from the Iterative Community.
At Intel’s Innovation conference, Andrew Ng gave a keynote on democratizing AI. He posits that while large companies have embraced AI, most smaller companies outside consumer-facing domains still struggle. He gives two main reasons for this: small datasets and the need for customization.
According to Ng, data-centric AI will be the key to unlocking that potential, forcing a paradigm shift away from code-centric AI. In this scenario, teams could take largely ready-built ML technology and focus on the data, making sure it captures all the necessary domain knowledge.
For example, a company that produces cornflakes and one that produces medication could take the same ML model and train it on their respective datasets. As long as each has the right tools and practices and provides a dataset that represents its domain, the same model can deliver effective results for both. If you want to see some of the tools Ng uses, make sure to check out his keynote.
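To make the "same model, different data" idea concrete, here is a minimal sketch. It uses one generic model (a nearest-centroid classifier written from scratch) and fits it separately on two datasets. Both toy datasets, their labels, and the function names `fit_centroids` and `predict` are hypothetical, invented only for illustration; Ng's keynote does not prescribe this model.

```python
from statistics import fmean


def fit_centroids(samples):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    # zip(*rows) turns the list of feature vectors into per-feature columns.
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical toy datasets, invented for illustration only.
cornflakes = [((0.9, 0.1), "ok"), ((0.8, 0.2), "ok"),
              ((0.2, 0.9), "defect"), ((0.1, 0.8), "defect")]
medication = [((5.0, 5.1), "in_spec"), ((4.9, 5.0), "in_spec"),
              ((6.5, 3.0), "out_of_spec"), ((6.8, 2.7), "out_of_spec")]

# Same model code, two domain-specific datasets.
cereal_model = fit_centroids(cornflakes)
pharma_model = fit_centroids(medication)
print(predict(cereal_model, (0.85, 0.15)))  # ok
print(predict(pharma_model, (6.6, 2.9)))    # out_of_spec
```

The model code never changes between the two companies; only the data does, which is the point of the data-centric argument.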
What do you think? Does the average data scientist need a different set of skills in the near future? Are you in one of these smaller industries that are starting to embrace AI? We'd love to read your thoughts! Join us in our discussion of this topic on Discord!
You may recall that in last month's Heartbeat we called your attention to the EU AI Act. The act proposes new rules that would require open source developers to adhere to guidelines across a spectrum of categories, including risk management, data governance, technical documentation and transparency, standards and accuracy, and cybersecurity. Not to be outdone, the US White House released a Blueprint for an AI Bill of Rights. The White House Office of Science and Technology Policy (OSTP) has defined five principles for these rights:
- Safe and Effective Systems
- Algorithmic Discrimination Protections
- Data Privacy
- Notice and Explanation
- Human Alternatives, Consideration, and Fallback
There's definitely some overlap with the EU AI Act, along with some catching up on data privacy. There's a lot to unpack, compare, and contrast between the two in scope and philosophy. It's encouraging to see these issues getting serious attention.
One way to connect AI rights to Andrew Ng's talk is as a sign that the AI space is maturing. As Ng argues, we are moving away from a frenzied focus on model development toward a data-centric approach and broader democratization, and that shift frees us to address these hard and important issues properly. More efficient tooling helps with this too. That's why we are here.
What do you think? Do the efficiencies we are gaining free up time and attention to bake protections into the process, or am I being too hopeful? Head to Discord and share your thoughts!
Did you hear? DVC has a new integration with Hydra. Now you can use Hydra composition to configure your DVC experiments. You can also append and remove parameters on the fly, and run a grid search over parameters. Random search functionality is coming; weigh in on the issue here. Find out more in David de la Iglesia's blog post.
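The override behavior described above can be sketched in a few lines. In DVC itself you pass overrides on the command line (for example `dvc exp run -S 'train.lr=0.001,0.01' --queue`; see the DVC docs for the exact flags in your version). The Python below is only a simplified, hypothetical model of Hydra-style override semantics, not DVC's or Hydra's implementation: `key=value` sets an existing key, `+key=value` appends a new one, `~key` removes one, and comma-separated values expand into a grid. The helper names `apply_override` and `expand_grid` and the `params` layout are assumptions.

```python
import copy
import itertools


def apply_override(params, override):
    """Apply one Hydra-style override to a nested dict (simplified sketch).

    'a.b=v' sets an existing key, '+a.b=v' appends a new key, '~a.b' removes
    a key. Values stay strings; the input dict is never mutated.
    """
    params = copy.deepcopy(params)
    if override.startswith("~"):
        *parents, leaf = override[1:].split(".")
        node = params
        for key in parents:
            node = node[key]
        del node[leaf]
        return params
    append = override.startswith("+")
    path, value = (override[1:] if append else override).split("=", 1)
    *parents, leaf = path.split(".")
    node = params
    for key in parents:
        node = node.setdefault(key, {}) if append else node[key]
    if not append and leaf not in node:
        raise KeyError(f"{path} does not exist; use '+{path}=...' to add it")
    node[leaf] = value
    return params


def expand_grid(overrides):
    """Expand 'key=v1,v2' sweeps (set/append only) into every combination."""
    choices = []
    for override in overrides:
        path, values = override.split("=", 1)
        choices.append([f"{path}={v}" for v in values.split(",")])
    return [list(combo) for combo in itertools.product(*choices)]


# Hypothetical base parameters, for illustration only.
base = {"train": {"lr": "0.001", "epochs": "10"}, "data": {"seed": "0"}}

# Grid search: 2 learning rates x 2 epoch counts = 4 experiment configs.
for combo in expand_grid(["train.lr=0.001,0.01", "train.epochs=10,20"]):
    config = base
    for override in combo:
        config = apply_override(config, override)
    print(config["train"])

print(apply_override(base, "+train.dropout=0.2")["train"])  # appended key
print(apply_override(base, "~data.seed")["data"])           # removed key
```

Raising an error when a plain `key=value` targets a missing key mirrors Hydra's convention that new keys must be added explicitly with `+`, which catches typos in parameter names early.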
If you missed the October Meetup with Nadia Nahar presenting her team's research on Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process, don't worry, there's a video! Catch it below!
Join us for our next meetup on November 16th. We will have Dmytro Filatov of DeepX presenting Continuous Computer Vision with DVC and CML, and Jelle Bouwman demoing the Iterative Studio Model Registry. Be sure to register here!
Continuous Computer Vision with DVC and CML plus Iterative Studio Model Registry Demo
Join Alex Kim on November 30th with ODSC to learn about CI/CD for Machine Learning. The webinar shows how CML helps ML and data science practitioners automate model training and evaluation using best practices and tools from software engineering, such as GitLab CI/CD (as well as GitHub Actions and Bitbucket Pipelines). The idea is to automatically train your model and test it in a production-like environment every time your data or code changes. In this talk, you'll learn how to:
- Automatically allocate cloud instances (AWS, Azure, GCP) to train ML models, and shut them down automatically when training is over
- Automatically generate reports with graphs and tables in pull/merge requests to summarize your model's performance, using any visualization library
- Transfer data between cloud storage and computing instances with DVC
- Customize your automation workflow with GitLab CI/CD
Sign up for the talk here.
Alex Kim's ODSC webinar: CI/CD for Machine Learning
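As a taste of the "reports in pull/merge requests" step, here is a minimal sketch of the kind of Markdown report a CI job might generate before posting it. With CML, posting is done by a CLI call such as `cml comment create report.md` (check the CML docs for your version). The helper names `metrics_table` and `write_report`, the report layout, and the metric values are all hypothetical; in a real pipeline the metrics would come from your training run, for example a metrics file tracked by DVC.

```python
def _fmt(value):
    """Format a metric value, or 'n/a' if it is missing."""
    return "n/a" if value is None else f"{value:.4f}"


def metrics_table(baseline, candidate):
    """Render a Markdown table comparing baseline and candidate metrics."""
    lines = [
        "| Metric | Baseline | Candidate | Change |",
        "| --- | ---: | ---: | ---: |",
    ]
    # Union of metric names, so a metric missing on one side still shows up.
    for name in sorted(baseline.keys() | candidate.keys()):
        old, new = baseline.get(name), candidate.get(name)
        delta = "n/a" if old is None or new is None else f"{new - old:+.4f}"
        lines.append(f"| {name} | {_fmt(old)} | {_fmt(new)} | {delta} |")
    return "\n".join(lines)


def write_report(path, baseline, candidate):
    """Write a small Markdown report that a CI step could post as a comment."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("## Model report\n\n")
        fh.write(metrics_table(baseline, candidate) + "\n")


# Hypothetical metrics, for illustration only.
baseline = {"accuracy": 0.91, "f1": 0.88}
candidate = {"accuracy": 0.93, "f1": 0.87}
print(metrics_table(baseline, candidate))
```

Comparing against a baseline, rather than reporting the candidate's numbers alone, makes regressions (like the drop in `f1` here) visible to reviewers directly in the pull request.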
It's Hacktoberfest month and we are participating! Find all the details in Mert Bozkir's blog post. But if you just want to jump in, find all the open Hacktoberfest issues. Follow along in the #hacktoberfest channel on Discord to stay up to date for the rest of the month, and be sure to read next month's Heartbeat to learn of the
Ivan Longin joins us as a Senior Software Engineer on the Iterative Studio team from Zadar, Croatia. When Ivan's not working, he likes to spend time doing outdoor activities, swimming in good weather, or just walking or, more often, running after his one-year-old! Been there three times over! ❤️ Welcome Ivan!
This month was full of great content. We wanted to give a shout-out to all of it, so we are trying out a more abbreviated list.
Thanks to all these amazing Community members that are sharing their knowledge! 🚀
- Data and Machine Learning Model Versioning with DVC by Ruben Winastwan. Nice visuals! ⭐️
- A great guide from Willem Meints - Managing Machine Learning Datasets with DVC. Also, find his tweets on Twitter
- Jorge Namour will give a webinar on Tracking Data with Git + DVC in Spanish on October 27th at this YouTube link.
- Some GitHub goodness: MLOps - tutorial with DVC, MLFlow, and Pycaret from Murilo Cunha, vspara, and virginiemar
- Updated Udemy course that includes DVC - Complete MLOps Bootcamp | From Zero to Hero in Python 2022
- How to Version Control Your Data and Models with DVC (Video included) by Khuyen Tran. Dig the DVC color-themed command line! 🤩
- NLP and CV with DVC! From UNet to BERT: Extraction of Important Information from Scientific Papers by Eman Shemsu
- [MLOps] How to use DVC (Data Version Control) data versioning in Korean 🇰🇷 by Minimin2
- Great guide from Déborah Mesquita - The ultimate guide to building maintainable Machine Learning pipelines using DVC (Video Included) ⭐️
- Also from Khuyen Tran: Create a Maintainable Data Pipeline with Prefect and DVC
- An in-depth tutorial covering data management, pipelines, and experimentation with DVC from Gleb Ivashkevich - Creating Reproducible Data Science Workflows with DVC ⭐️
- Data Version Control (DVC): Beginner's Guide by Ajmain Inqiad Alam
- There is now a DVC Wikipedia page!
- Great discussion of challenges in machine learning from Dmytro Samchuk - Machine Learning Done Right in Your Business.
- CML in research! 🤩 A Preliminary Investigation of MLOps Practices in GitHub, PDF by Fabio Calefato, Filippo Lanubile, and Luigi Quaranta
- Part III of Matt Upson's series: MLOps for Conversational AI with Rasa, DVC, and CML (Part III)!
- ZenML adds CML to its Awesome Data Science with Python list. 😎
- Alessandro Paticchio (Casavo) Using AI to automatically estimate the status of a façade. ⭐️
- CI/CD for Machine Learning Model Training with GitHub Actions by Zoumana Keita
A little belated but neverthless hugely interesting post by my co founders @m_a_upson in which he touches on some core tools we use at Mantis like @DVCorg, @Rasa_HQ and continuous machine learning. It comes with code 💻 so you can take some of what you will read and use 🚀 https://t.co/PHgLXtvckz
Nick Sorros (@nsorros), September 19, 2022
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.