

Artificial intelligence research group OpenAI has created a new version of DALL-E, its text-to-image generation program. DALL-E 2 features a higher-resolution and lower-latency version of the original system, which produces pictures depicting descriptions written by users. It also includes new capabilities, like editing an existing image. As with previous OpenAI work, the tool isn’t being directly released to the public. But researchers can sign up online to preview the system, and OpenAI hopes to later make it available for use in third-party apps.

The original DALL-E, a portmanteau of the artist “Salvador Dalí” and the robot “WALL-E,” debuted in January of 2021. It was a limited but fascinating test of AI’s ability to visually represent concepts, from mundane depictions of a mannequin in a flannel shirt to “a giraffe made of turtle” or an illustration of a radish walking a dog. At the time, OpenAI said it would continue to build on the system while examining potential dangers like bias in image generation or the production of misinformation. The company is attempting to address those issues using technical safeguards and a new content policy while also reducing its computing load and pushing forward the basic capabilities of the model.

A DALL-E 2 result for “Shiba Inu dog wearing a beret and black turtleneck.”

One of the new DALL-E 2 features, inpainting, applies DALL-E’s text-to-image capabilities on a more granular level. Users can start with an existing picture, select an area, and tell the model to edit it. You can block out a painting on a living room wall and replace it with a different picture, for instance, or add a vase of flowers on a coffee table. The model can fill in (or remove) objects while accounting for details like the direction of shadows in a room.
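As a rough sketch of how an inpainting request can look in code, assuming OpenAI’s `openai` Python package and placeholder file names, the mask’s transparent region marks the area the model should repaint:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# room.png is the starting picture; mask.png is the same image with the
# "blocked out" region made transparent, telling the model where to paint.
result = client.images.edit(
    image=open("room.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="a vase of flowers on a coffee table",
    n=1,
    size="1024x1024",
)
print(result.data[0].url)  # URL of the edited image
```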

Another feature, variations, is sort of like an image search tool for pictures that don’t exist. Users can upload a starting image and then create a range of variations similar to it. They can also blend two images, generating pictures that have elements of both. The generated images are 1,024 x 1,024 pixels, a leap over the 256 x 256 pixels the original model delivered.
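Variations map onto the API in a similar way. Another minimal sketch, again assuming the `openai` package and a placeholder file name:

```python
from openai import OpenAI

client = OpenAI()

# Upload a starting image and request several pictures similar to it.
result = client.images.create_variation(
    image=open("starting_image.png", "rb"),
    n=4,                 # how many variations to generate
    size="1024x1024",    # DALL-E 2's full output resolution
)
for item in result.data:
    print(item.url)
```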

DALL-E 2 builds on CLIP, a computer vision system that OpenAI also announced last year. “DALL-E 1 just took our GPT-3 approach from language and applied it to produce an image: we compressed images into a series of words and we just learned to predict what comes next,” says OpenAI research scientist Prafulla Dhariwal, referring to the GPT model used by many text AI apps. But the word-matching didn’t necessarily capture the qualities humans found most important, and the predictive process limited the realism of the images. CLIP was designed to look at images and summarize their contents the way a human would, and OpenAI iterated on this process to create “unCLIP,” an inverted version that starts with the description and works its way toward an image.
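CLIP itself is open source, so its image-caption matching is easy to demonstrate. A minimal sketch using OpenAI’s published CLIP package, with a made-up photo path and candidate captions:

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("photo.png")).unsqueeze(0).to(device)
captions = clip.tokenize(["a dog wearing a beret", "a bowl of fruit"]).to(device)

with torch.no_grad():
    # CLIP embeds the image and each caption into a shared space and scores
    # their similarity, in effect picking the caption that best summarizes
    # the image. unCLIP runs this relationship in reverse.
    logits_per_image, _ = model(image, captions)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # higher probability = better caption match
```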

DALL-E 2 generates the image using a process called diffusion, which Dhariwal describes as starting with a “bag of dots” and then filling in a pattern with greater and greater detail.
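The “bag of dots” intuition can be made concrete. The sketch below is not DALL-E 2’s actual sampler; it is a heavily simplified version of a standard diffusion sampling loop, with `predict_noise` standing in for the trained network that DALL-E 2 conditions on the text description:

```python
import numpy as np

def ddpm_sample(predict_noise, steps=1000, shape=(64, 64, 3)):
    """Start from pure Gaussian noise (the "bag of dots") and denoise
    step by step, filling in structure with greater and greater detail."""
    betas = np.linspace(1e-4, 0.02, steps)   # noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = np.random.randn(*shape)              # pure noise at the final timestep
    for t in reversed(range(steps)):
        eps = predict_noise(x, t)            # the network's guess at the noise
        # Subtract the predicted noise (the mean of the reverse step)...
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # ...then re-inject a little fresh noise, shrinking each step.
            x += np.sqrt(betas[t]) * np.random.randn(*shape)
    return x

# With a random stand-in for the network this produces only noise; a real
# model's predictions are what steer the dots toward a coherent image.
image = ddpm_sample(lambda x, t: np.random.randn(*x.shape), steps=50)
```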
