Inside Imagen: Google’s Impressive Text-to-Image Alternative to OpenAI’s DALL-E 2
Imagen provides a simpler architecture able to generate photorealistic images from language inputs.
Text-to-image (TTI) is one of the most innovative areas in multi-modal learning these days. The influence that transformer architectures have had on natural language understanding (NLU) and computer vision has catalyzed research in the TTI space. In the last few months, OpenAI has made headlines by publishing two papers on its DALL-E model, which can generate photorealistic, artistic images based on language. Recently, Google Brain published its research on Imagen, a simpler alternative to…