The Sequence Scope: The Challenge of Data-Efficient Machine Learning


This is a summary of the most important published research papers, released technology and startup news in the AI ecosystem in the last week. This compendium is part of TheSequence newsletter. Give it a try by subscribing below:


Supervised learning is the dominant paradigm in machine learning solutions. The idea of training a model on a labeled dataset in order to master a task seems intuitive. However, in practice, many supervised learning techniques require very large labeled datasets in order to generalize even very simple knowledge.

This challenge overwhelms startups and big companies alike and is one of the main roadblocks to the mainstream adoption of ML. We all love to hear about breakthroughs like AlphaGo or GPT-3, until we realize the enormous size of the training datasets used to create those models.

The idea of building machine learning methods that can operate with smaller labeled datasets is an active area of research, and there is no shortage of ideas. Techniques like semi-supervised learning attempt to use unlabeled datasets in the training process. Generative models look to create new labeled datasets from existing ones. Self-supervised learning aims to build models that learn from the data itself, without external labels. Transfer learning tries to reuse knowledge between tasks, while meta-learning has the ambitious goal of building models that learn to learn. Just this week, DeepMind proposed a new meta-learning technique to build more efficient reinforcement learning models. Some of the best minds in machine learning are paving the way to build more data-efficient models.
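To make one of these ideas concrete, here is a minimal sketch of self-training, one of the simplest semi-supervised techniques: a model is fit on a handful of labeled points, then confidently pseudo-labeled unlabeled points are folded into the training set. The toy nearest-centroid classifier, the 1-D data, and the confidence threshold below are all illustrative assumptions, not from any of the papers mentioned.

```python
# Self-training sketch (semi-supervised learning) on toy 1-D data.
# Everything here is an illustrative assumption: a nearest-centroid
# classifier stands in for a real model, and "confidence" is simply
# the distance to the closest class centroid.

def fit_centroids(labeled):
    # Compute one centroid per class from (x, label) pairs.
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    # Return (label, distance) for the closest centroid.
    label = min(centroids, key=lambda y: abs(x - centroids[y]))
    return label, abs(x - centroids[label])

def self_train(labeled, unlabeled, threshold=1.0, rounds=3):
    # Iteratively pseudo-label points whose centroid distance is
    # below the threshold and retrain on the enlarged labeled set.
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        centroids = fit_centroids(labeled)
        confident, rest = [], []
        for x in pool:
            y, dist = predict(centroids, x)
            (confident if dist < threshold else rest).append((x, y))
        if not confident:
            break
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return fit_centroids(labeled)

# Two clusters around 0 and 10, with only one labeled point each;
# the unlabeled points refine the centroids via pseudo-labels.
centroids = self_train(
    labeled=[(0.0, "a"), (10.0, "b")],
    unlabeled=[0.5, 0.8, 9.5, 9.2, 0.3, 9.9],
)
print(predict(centroids, 1.0)[0])  # -> "a"
```

The key design choice in any self-training loop is the confidence threshold: too loose and noisy pseudo-labels corrupt the model, too strict and the unlabeled data is never used.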

Next week in TheSequence Edge

July 28, Edge#7: the concept of generative models; Optimus, one of the most innovative research efforts in generative models, recently published by Microsoft Research; ART, an open-source framework that uses generative models to protect neural networks.

July 30, Edge#8: the concept of generative adversarial networks; the original GAN paper by Ian Goodfellow; deep dive into TF-GANs.

To stay up to date and receive TheSequence Edge every Tuesday and Thursday, please consider joining our community. Until August 15 you can subscribe with a permanent 20% discount. The Sunday edition of TheSequence Scope is always free.

Now, let’s review the most important developments in AI research and technology this week.

ML Research

Using Meta-Learning to Generate Reinforcement Learning Models

DeepMind researchers published a paper proposing a meta-learning method that can automatically generate reinforcement learning models -> read the original paper

Self-Supervised Learning for Image Classification

Facebook AI Research (FAIR) is at the forefront of self-supervised learning. Recently, they published a paper proposing a self-supervised learning method for training image classification models -> read more on the Facebook AI blog

Data-Efficient Reinforcement Learning

The prestigious Berkeley AI Lab published a couple of papers about techniques for making reinforcement learning operate with smaller training datasets -> read more on the Berkeley AI Lab blog

Cool AI Tech Releases

LinkedIn’s LIquid

LinkedIn’s engineering team detailed their work on LIquid, a new type of graph database -> read more on LinkedIn’s engineering blog

TensorFlow Lite XNNPACK Integration

The TensorFlow team unveiled support for the XNNPACK hardware-acceleration library. The integration enables 2–3x faster inference in TensorFlow Lite models -> read more on the TensorFlow blog

Useful Tweet

GPT-3 (generative pre-trained transformer) is the newest in the family of NLU (natural language understanding) models, though it has been around for a few months. It uses the transformer architecture that we covered in Edge#3. The hype around GPT-3 is huge, but language generators are still at a very nascent stage. There are great opportunities to join OpenAI and work on the development of this fascinating technology.


Money in AI

  • AI-based crowdsourcing startup StuffThatWorks raised a $9 million seed round. Its idea is to let people collaborate with each other to find the most effective treatments. The human input is enhanced by ML algorithms programmed to look for valuable insights.
  • Autonomous technology company and intelligence systems provider Sea Machines Robotics raised $15 million to accelerate deployment of its AI-powered situational awareness mechanism in the unmanned naval boat and ship market. They are hiring.
  • Educational platform Riiid has just raised a sizable $41.8 million for its AI test-prep solutions. After successful concept validation in Korea, Japan, and Vietnam, Riiid is going to expand across the U.S., South America, and the Middle East.
  • Big data analytics startup Quantexa, which built the machine learning platform “Contextual Decision Intelligence” (CDI), has raised $64.7 million. The platform gathers scattered data points and analyzes them to uncover risky activities, enhance customer intelligence, and address credit-risk and fraud challenges.

Written by

CEO of IntoTheBlock, Chief Scientist at Invector Labs, Guest lecturer at Columbia University, Angel Investor, Author, Speaker.
