The Sequence Scope: The Emerging Market of Data Labeling

The Sequence Scope is a summary of the most important published research papers, released technology and startup news in the AI ecosystem in the last week. This compendium is part of TheSequence newsletter. Give it a try by subscribing below:

📝 Editorial: The Emerging Market of Data Labeling

Metadata management has historically been one of the most boring markets in enterprise software. And it was, until machine learning came along. Supervised learning models need labeled datasets for training, and those are expensive to create and maintain. Suddenly, the boring metadata management space found a new purpose and, as a result, a new generation of startups emerged to solve the problems of data labeling for machine learning models. Venture capital dollars have been flowing into the data labeling space, making it one of the few areas of the machine learning market in which startups have a chance to compete with technology giants like Google, Amazon, or Microsoft.

Data labeling in machine learning is one of those things that is easy to trivialize until you need to do it at scale. Then the challenges are everywhere. Labeling text datasets is different from labeling images, which in turn is different from labeling video or audio. Furthermore, inspecting datasets with millions of records and attaching the appropriate labels raises many scaling issues. Finally, data labeling is rarely an isolated process and requires collaboration between multiple teams. Those challenges require a new type of solution, and we are seeing exciting platforms such as Labelbox, Snorkel.ai, and Scale AI drive innovation in the space. One thing is certain: data labeling is becoming a standalone and highly competitive market in the machine learning space.

Aug 11, Edge#11: the concept of meta-learning; Google’s famous paper about an algorithm for meta-learning that is model-agnostic; a deep dive into Comet.ml, which many people call the GitHub of machine learning.

Aug 13, Edge#12: the concept of model serving; a paper in which Google Research outlines the architecture of a serving pipeline for TensorFlow models; a review of MLflow, one of the most complete machine learning lifecycle management frameworks on the market.

To stay up to date and receive TheSequence Edge every Tuesday and Thursday, please consider joining our community. Until August 15 you can subscribe with a permanent 20% discount. The Sunday edition, TheSequence Scope, is always free.

Now, let’s review the most important developments in the AI industry this week.

🔎 ML Research

Advancing Reinforcement Learning in Gaming

Microsoft Research published three different papers detailing advancements in reinforcement learning for gaming scenarios ->read more on Microsoft Research blog

A Better Benchmark for AI Assistants

Researchers from ElementAI and Stanford University published a paper demonstrating that the market needs a better benchmark and methodology for language user interfaces ->read more in the research paper

Fooling Facial Recognition Systems

Researchers from McAfee published a paper showing how CycleGAN, a variation of generative adversarial networks (GANs), can fool a modern face-recognition algorithm into seeing someone who isn’t there ->read more on McAfee Research blog

🤖 Cool AI Tech Releases

DeText

LinkedIn open-sources DeText, a flexible framework for different natural language understanding tasks ->read more in this post from the LinkedIn engineering team

TransCoder

Facebook AI Research open-sources TransCoder, a framework that uses self-supervised learning to translate code between different programming languages ->read more on Facebook AI blog

MediaPipe Iris

Google open-sourced MediaPipe Iris, a new machine learning model for iris estimation, which is essential in many vision analysis applications ->read more on Google AI blog

💬 Useful Tweet

by Kirk Borne

💸 Money in AI

  • Expert System (founded in 1989), a veteran in natural language understanding (NLU) technologies, raised $29.4 million in funding. Its flagship software, Cogito Discover, uses the NLU engine to identify the content of documents in different formats and make it available for analysis and automation.
  • Health tech startup Infermedica raised $10.25 million in Series A funding. They offer symptom triage and advice to patients based on doctors’ expertise enhanced by their own ML algorithms. They also integrate with chatbots, patient portals, and EHRs.
  • Big data analytics platform StreetLight Data raised $15 million in its Series D round. It uses smartphones as sensors to measure activity on all streets, applying its ML algorithms to figure out how people move through cities: foot and bicycle traffic, the busiest times for transportation, and so on.
  • Another big data analytics startup, Isima, raised $10 million in funding to launch a data convergence platform called BiOS. The company asserts its solution can reduce or even eliminate disparate databases while improving overall speed and reliability. Its rival Quantexa recently raised $64.7 million.
  • Noise-canceling tech startup Krisp raised $5 million in Series A funding. Its ML system is trained to understand what is and isn’t a human voice in streaming audio and remove the rest, making the sound clearer.
  • Blood diagnostics startup Sight Diagnostics raised $71 million in funding. It “digitizes” blood into over 1,000 high-resolution colored microscope images and analyzes those scans with its own machine-vision-based technology, trained on half a petabyte of anonymized data from four years of clinical studies.
  • Zencity, a data-driven platform for municipalities, has just raised $13.5 million. Its algorithms analyze aggregated feedback from local communities to identify key topics and trends, in order to understand what impacts a community.
  • Deep learning tech startup Syntiant raised $35 million. It provides hardware that merges machine learning with semiconductor design for always-on voice applications. In practical terms, Syntiant makes the processors responsible for wake-word, command-word, and event detection in devices such as your Alexa.

If you find our newsletter useful, please consider supporting our efforts. Subscribe or make it a gift for those who can benefit from it. It’s the last week when you can get it with a permanent 20% discount.

Written by

CEO of IntoTheBlock, Chief Scientist at Invector Labs, Guest lecturer at Columbia University, Angel Investor, Author, Speaker.
