Google Video Intelligence API is a Major Milestone for the AI Space

Google Cloud is clearly committed to making artificial intelligence (AI) the foundational piece of its next-generation services. A few months ago, Google Cloud had few if any capabilities in the AI space. Since then, it has launched a series of AI and machine learning (ML) services that have made the cloud platform one of the top contenders in the market. Last week, during its Cloud Next conference, Google announced a new addition to its cognitive AI stack with the release of the Video Intelligence API.

The availability of a video AI API represents a major milestone for Google for a couple of reasons. For starters, the new API could become a robust differentiator against competitors such as Microsoft Cognitive Services or Watson Developer Cloud, which have limited or no video intelligence capabilities at the moment. Secondly, the Google Video Intelligence API represents an initial step into what could be one of the biggest markets for mainstream AI capabilities. Notice that I’ve used the term mainstream because, even though there have been many practical applications of video AI (notably self-driving cars and airport surveillance), they haven’t been available to mainstream developers. The broad availability of technologies such as the Google Video Intelligence API can open the door to one of the biggest AI markets seen so far.

The widespread availability and rich content of video data sources create an ideal environment for the proliferation of video AI capabilities such as the ones provided by the Google Video Intelligence API. However, the unique characteristics of video analytics scenarios make this space as challenging as it is exciting. To clarify some of these ideas, let’s explore some of the video AI techniques and capabilities that could soon be included in the Google Video Intelligence API or in competing offerings.

5 Key Capabilities of Video AI Services

1 — Object Recognition

Recognizing objects in video sequences is one of the most straightforward video AI techniques. This capability is the core focus of Google’s new Video Intelligence API. Some of the interesting scenarios addressed by this feature include object search and video cataloging.
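To make the object search and cataloging scenario a bit more concrete, here is a minimal sketch of how per-frame labels could be merged into a searchable index of time segments. It does not call the Video Intelligence API; the `build_label_index` function, the `max_gap` parameter and the sample detections are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def build_label_index(frame_labels, max_gap=1.0):
    """Merge per-frame labels into (start, end) segments for each label.

    frame_labels: list of (timestamp_seconds, set_of_labels), sorted by time.
    max_gap: if a label reappears more than this many seconds after its
             last sighting, a new segment is started (an assumed heuristic).
    """
    index = defaultdict(list)
    for timestamp, labels in frame_labels:
        for label in labels:
            segments = index[label]
            if segments and timestamp - segments[-1][1] <= max_gap:
                # Extend the current segment for this label.
                segments[-1] = (segments[-1][0], timestamp)
            else:
                # First sighting, or the gap was too long: open a new segment.
                segments.append((timestamp, timestamp))
    return dict(index)


# Hypothetical detections, standing in for the output of a recognition model.
frames = [
    (0.0, {"dog"}),
    (0.5, {"dog", "ball"}),
    (1.0, {"dog"}),
    (5.0, {"ball"}),
    (5.5, {"ball"}),
]

index = build_label_index(frames)
for label, segments in sorted(index.items()):
    print(label, segments)
```

With an index like this, a search for a label comes down to a dictionary lookup that returns the time ranges where the object appears.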

2 — Behavior & Intent Detection

Unlike images, videos include enough data points to detect the intentions or actions of the objects in a scene. For instance, a video AI API analyzing footage from a bank branch could identify relevant situations such as a person withdrawing cash from an ATM or a customer who has been waiting in line for a long time.
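As a rough illustration of the waiting-in-line example, the sketch below flags people who have spent a long time in a queue area. It assumes an upstream tracker already produces `(timestamp, person_id, zone)` observations; that data format, the `long_waiters` function and the threshold are hypothetical, and the logic deliberately ignores cases where someone leaves the queue and comes back.

```python
def long_waiters(observations, zone="queue", threshold=300.0):
    """Return {person_id: seconds_in_zone} for people at or above the threshold.

    observations: iterable of (timestamp_seconds, person_id, zone_name).
    Simplifying assumption: time in the zone is measured from the first to
    the last sighting there, as if the person never left in between.
    """
    first_seen = {}
    last_seen = {}
    for timestamp, person_id, obs_zone in observations:
        if obs_zone != zone:
            continue
        first_seen.setdefault(person_id, timestamp)
        last_seen[person_id] = timestamp

    waits = {pid: last_seen[pid] - first_seen[pid] for pid in first_seen}
    return {pid: wait for pid, wait in waits.items() if wait >= threshold}


# Hypothetical tracker output for a bank branch camera.
observations = [
    (0.0, "p1", "queue"),
    (10.0, "p2", "queue"),
    (60.0, "p2", "atm"),
    (400.0, "p1", "queue"),
]

flagged = long_waiters(observations)
print(flagged)
```

A real system would have to infer this kind of behavior from raw pixels, which is what makes the capability much harder than the bookkeeping shown here.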

3 — Sentiment Analysis

Video AI techniques can also enable sentiment analysis capabilities similar to those used in text analytics stacks. A video AI system could process streams from a casino’s cameras and determine whether players in a poker game are anxious, happy or acting suspiciously.

4 — Predictions

Video AI models can be used to predict the actions of objects in a video feed. Consider a self-driving car scenario in which the car must determine whether another vehicle is changing lanes or a pedestrian is about to cross the road.

5 — Bringing it all Together

Videos, more than any other cognitive data source, resemble real-life environments. Video data sources combine information in the form of audio, text and, of course, images. As a result, video AI capabilities orchestrate many other AI capabilities such as image processing, speech recognition and natural language processing. More about this in a future post.

Written by

CEO of IntoTheBlock, Chief Scientist at Invector Labs, Guest lecturer at Columbia University, Angel Investor, Author, Speaker.
