The AI Powering Imagen Video: Google’s New Text-to-Video Super Model

The new model can generate short, high-fidelity videos from textual inputs.

3 min read · Oct 24, 2022
Image Credit: Google Brain

I recently started an AI-focused educational newsletter that already has over 125,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

Text-to-video (TTV) synthesis is rapidly evolving into one of the new fronts of innovation in the deep learning space. Recently, Meta AI unveiled Make-A-Video, a new TTV model that builds on its Make-A-Scene text-to-image synthesis method. Shortly after, Google published a paper presenting Imagen Video, a TTV model that is able to generate short, high-fidelity videos from textual inputs.

As its name indicates, Imagen Video builds on Google’s own Imagen text-to-image synthesis models. In fact, one of the biggest contributions of Imagen Video was to…


Written by Jesus Rodriguez

Co-Founder and CTO of Sentora (fka IntoTheBlock), President of LayerLens, Faktory, and NeuralFabric. Founder of The Sequence. Lecturer at Columbia and Wharton.
