Inside DALL-E 2: OpenAI’s Upgraded Supermodel that can Generate Artistic Images from Text
The new model outperforms its predecessor by generating higher quality images from highly complex language descriptions.
In early 2021, OpenAI unveiled DALL-E, a neural network able to generate photorealistic images from text descriptions. The model quickly became one of the standards for text-to-image generation tasks. Last week, the AI powerhouse provided a preview of DALL-E 2, a much-improved version of the neural network that produces far more realistic images.
Just like its predecessor, DALL-E 2 learns relationships between text and image representations, which are the key to text-to-image generation tasks. DALL-E 2 exhibits much stronger text comprehension than the previous version and can generate images at up to 4x the resolution. The contrast is clearly visible when comparing the outputs of the two models.
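To get a feel for what "relationships between text and image representations" means in practice, here is a minimal sketch using the publicly released CLIP checkpoint via Hugging Face's transformers library. This is purely illustrative and not DALL-E 2's actual code: it scores how well a set of captions aligns with an image in the shared text-image embedding space (the example image URL and captions are arbitrary choices for the demo).

```python
# Illustrative sketch: scoring caption-image alignment with a CLIP-style model.
# This is NOT DALL-E 2's training or generation code, just a demo of the
# shared text-image embedding space that text-to-image models build on.
from PIL import Image
import requests
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Example image (COCO validation sample) and candidate captions.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
captions = ["a photo of two cats lying on a couch",
            "an astronaut riding a horse in space"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher similarity means the caption and image embeddings are more aligned.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(captions, probs[0].tolist())))
```

The first caption should receive nearly all of the probability mass, since its text embedding sits much closer to the image embedding than the unrelated one.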
The Architecture
The magic behind DALL-E 2 is based on a technique called unCLIP. The original version of DALL-E was based on…