The AI Powering Imagen Video: Google’s New Text-to-Video Super Model
The new model can generate high-fidelity videos from textual inputs.
Text-to-video (TTV) synthesis is rapidly becoming one of the new frontiers of innovation in deep learning. Recently, Meta AI unveiled Make-A-Video, a TTV model that builds on its Make-A-Scene text-to-image synthesis method. Shortly after, Google published a paper presenting Imagen Video, a TTV model that can generate short, high-fidelity videos from textual inputs.
As its name indicates, Imagen Video builds on Google’s own Imagen text-to-image synthesis models. In fact, one of the biggest contributions of Imagen Video was to…