Inside DALL-E 2: OpenAI’s Upgraded Supermodel that can Generate Artistic Images from Text

The new model outperforms its predecessor by generating higher quality images from highly complex language descriptions.

Jesus Rodriguez
3 min read · Apr 11, 2022
Image Credit: OpenAI

In early 2021, OpenAI unveiled DALL-E, a neural network able to generate photorealistic images from text descriptions. The model quickly became one of the standards for text-to-image generation tasks. Last week, the AI powerhouse previewed DALL-E 2, a substantially improved version of the neural network that generates far more realistic images.

Just like its predecessor, DALL-E 2 learns relationships between text and image representations, which are the key to text-to-image generation. DALL-E 2 exhibits much stronger text comprehension than the previous version and can generate images at up to 4x the resolution. The contrast is strikingly visible in side-by-side outputs of the two models.

Image Credit: OpenAI
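
To make the idea of shared text-image representations concrete, here is a minimal sketch using OpenAI's open-source CLIP model, the same family of representations DALL-E 2 builds on, to score how well different captions describe an image. It assumes the `clip` package from OpenAI's GitHub repository is installed; the model variant, image path, and captions are placeholders.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # placeholder model variant

# Placeholder image and candidate captions.
image = preprocess(Image.open("photo.png")).unsqueeze(0).to(device)
captions = ["an astronaut riding a horse", "a bowl of soup", "a corgi playing a flute"]
text = clip.tokenize(captions).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Cosine similarity in the shared embedding space: a higher score means
    # the caption is a better description of the image.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarity = (image_features @ text_features.T).squeeze(0)

for caption, score in zip(captions, similarity.tolist()):
    print(f"{score:.3f}  {caption}")
```

A single embedding space for both modalities is what lets a text description act as a precise "target" for image generation rather than a loose hint.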

The Architecture

The magic behind DALL-E 2 is a technique called unCLIP. The original version of DALL-E was based on…
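
At a high level, unCLIP is a two-stage pipeline: a prior maps a CLIP text embedding to a corresponding CLIP image embedding, and a diffusion decoder renders an image conditioned on that embedding. The sketch below illustrates that data flow only; all names are hypothetical placeholders, not OpenAI's API.

```python
# High-level sketch of the unCLIP two-stage pipeline.
# clip_text_encoder, prior, and decoder are hypothetical callables
# standing in for the trained components.

def generate_image(caption, clip_text_encoder, prior, decoder):
    # 1. Embed the caption in CLIP's shared text-image space.
    text_emb = clip_text_encoder(caption)

    # 2. The prior predicts a plausible CLIP *image* embedding for the caption.
    image_emb = prior(text_emb)

    # 3. A diffusion decoder generates pixels conditioned on that image
    #    embedding (and optionally the caption); upsampling stages then
    #    raise the output to its final resolution.
    return decoder(image_emb, caption)
```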
