The Sequence Scope: The MLOps Space is Getting Crowded and Confusing
Weekly newsletter with over 80,000 subscribers that discusses impactful ML research papers, cool tech releases, the money in AI, and real-life implementations.
The Sequence Scope is a summary of the most important published research papers, released technology and startup news in the AI ecosystem in the last week. This compendium is part of TheSequence newsletter. Data scientists, scholars, and developers from Microsoft Research, Intel Corporation, Linux Foundation AI, Google, Lockheed Martin, Cardiff University, Mellon College of Science, Warsaw University of Technology, Universitat Politècnica de València and other companies and universities are already subscribed to TheSequence.
📝 Editorial: The MLOps Space is Getting Crowded and Confusing
MLOps is one of the most popular and overloaded terms in modern machine learning. Typically associated with platforms that manage different aspects of ML models, the term is now used indiscriminately to describe everything from model training to monitoring. As a result, it has become confusing for organizations and data science teams trying to assemble MLOps capabilities in their ML pipelines. Just this week, I read separate press releases about funding rounds for MLOps startups like Comet and Snorkel, which operate in areas as different as model monitoring and data labeling, respectively. So don’t feel bad if you are confused about MLOps 😉
The overcrowding of MLOps is a result of the tremendous level of innovation in the machine learning space. To cut through the noise, it might help to think about three main categories of MLOps platforms:
In one group you can place the big cloud platforms such as AWS, Microsoft and Google that have built MLOps capabilities across their machine learning services.
The second relevant group is end-to-end MLOps runtimes such as Kubeflow or MLflow that manage many aspects of the lifecycle of machine learning solutions.
Finally, we have startups that focus on individual features of machine learning pipelines, like training or monitoring. The current machine learning market is fragmented enough that, at this point, it makes sense to bank on best-of-breed startups in individual categories while being aware that the MLOps market is poised for a decent level of consolidation in the near future.
🗓 Next week in TheSequence Edge:
Edge#79: What is Few-Shot Learning; Prototypical Networks as One of the Most Popular Few-Shot Learning Architectures; TorchMeta is the OpenAI Gym of Meta-Learning
Edge#80: Deep dive into a data labeling use case from Snorkel AI.
Now, let’s review the most important developments in the AI industry this week:
🔎 ML Research
Generating Text from Data
Amazon Research published a blog post explaining the research behind DataTuner, an open-source model that is able to generate text from structured datasets -> read more on Amazon Research blog
Casual Conversations Dataset
Facebook AI Research (FAIR) published a paper describing Casual Conversations, an open-source dataset to improve fairness in computer vision systems -> read more on FAIR blog
Using Transformers to Learn Organic Chemistry
IBM Research published a fascinating paper outlining a transformer model that was able to extract a grammar of organic chemistry by learning from chemical reactions -> read more on IBM Research blog
🤖 Cool AI Tech Releases
Lookout for Equipment
AWS announced the release of Lookout for Equipment, a new service that uses machine learning models optimized to protect customer equipment at scale -> read more in the AWS press release
Trifacta + Databricks
Trifacta announced native integration with the Databricks platform to enable data quality management on Databricks' Lakehouse platform -> read more in the original press release
💸 Money in AI
- MLOps startup Snorkel AI raised a $35 million Series B round and introduced Application Studio, a visual builder with templated solutions for common AI use cases and easy construction of new and custom use cases (currently in preview). Incubated at Stanford University in 2016, Snorkel became a very popular open-source project for data labeling. Snorkel Flow is an end-to-end platform built on the principles of the Snorkel project.
- MLOps startup Comet raised $13 million in a Series A funding round (we covered them in Edge#11). Comet is one of the machine learning platforms gaining traction within data science teams. The platform streamlines the creation of ML models and experiments across different frameworks.
- Streamlit raised $35 million in Series B funding. It’s an open-source app framework for ML and Data Science teams, helping them turn data scripts into data apps. All in Python.
- No-code data lake engineering platform Upsolver raised $25 million. They simplify transforming raw data into queryable data through a visual SQL UI, and automate hundreds of data lake engineering tasks to optimize performance.
- ML monitoring platform Aporia raised a $5 million seed round. They claim that the core of their platform is a strong ML monitoring engine topped by a flexible, collaborative UX that “turns monitor configuration and modification into an effortless — even fun — experience”.
- Synthetic data startup Synthesis AI raised $4.5 million in its funding round. Using a proprietary combination of generative neural networks and cinematic CGI pipelines, it creates vast amounts of perfectly labeled synthetic data to build more capable computer vision models.
- Computer vision development platform CrowdAI raised a $6.25 million Series A financing round. It’s an end-to-end, no-code platform that builds custom AI to automate visual inspection for clients and help them analyze imagery and video.