I wrote earlier about DALL-E, an image-generating algorithm recently developed by OpenAI. One part of DALL-E’s success is another algorithm called CLIP, which is essentially an art critic. Show CLIP a picture and a phrase, and it’ll return a score telling you how well it thinks the picture matches the phrase. You can see how that might be useful if you wanted to tell the difference between, say, a pizza and a calzone - you’d show it a picture of something and compare the scores for “this is a pizza” and “this is a calzone”.
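The pizza-versus-calzone idea can be sketched in a few lines. This is a toy stand-in, not CLIP itself: real CLIP runs the image and each caption through learned neural-network encoders and compares the resulting embeddings; here the embeddings are just made-up vectors, and the comparison is plain cosine similarity.

```python
import numpy as np

def cosine_score(image_embedding, text_embedding):
    """Score how well an image matches a phrase: higher = better match.
    CLIP compares learned embeddings like this; these vectors are stand-ins."""
    a = image_embedding / np.linalg.norm(image_embedding)
    b = text_embedding / np.linalg.norm(text_embedding)
    return float(a @ b)

# Hypothetical embeddings - in real CLIP these come from its image and text encoders.
photo = np.array([0.9, 0.1, 0.3])
pizza_caption = np.array([0.8, 0.2, 0.4])
calzone_caption = np.array([0.1, 0.9, 0.2])

# Compare the two captions' scores against the same photo; the higher one wins.
print(cosine_score(photo, pizza_caption) > cosine_score(photo, calzone_caption))
```

With these particular made-up vectors, the pizza caption scores higher - which is the whole trick: you never ask CLIP “what is this?”, you only ask it to rank captions.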
How you come up with the pictures and captions is up to you - CLIP is merely the judge. But if you have a way to generate images, you can use CLIP to tell whether you’re getting closer to or farther from whatever you’re trying to generate. One example of this is Ryan Murdock’s Big Sleep program, which uses CLIP to steer BigGAN’s image generation. If I give Big Sleep the phrase “a cat singing a sea shanty”, it’ll start with some random guesses and then tweak the image based on CLIP’s feedback, searching for an image that better fits the prompt.
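The “start with random guesses, then tweak based on CLIP’s feedback” loop is just gradient ascent on a score. Here’s a minimal sketch of that idea, with big assumptions: a toy quadratic function stands in for CLIP’s score (real Big Sleep decodes a BigGAN latent into an image and scores that image against the text prompt), and the gradient is computed analytically rather than by backpropagating through two neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for "how well does this match the prompt": distance to a target
# direction. (Big Sleep instead scores BigGAN's decoded image with CLIP.)
target = rng.normal(size=8)

def score(latent):
    return -np.sum((latent - target) ** 2)  # higher = better match

def grad(latent):
    return -2 * (latent - target)  # analytic gradient of the toy score

latent = rng.normal(size=8)        # start from a random guess
for step in range(200):
    latent += 0.05 * grad(latent)  # nudge the latent toward a better score

print(score(latent))  # close to the maximum of 0 after 200 steps
```

The real version is the same loop with the toy score swapped out for CLIP-judging-BigGAN, which is also why it can wander: the gradient only says “slightly better this way”, not “this is what a cat looks like”.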
So its first try at “a cat singing a sea shanty” might be this:
After maybe 20 minutes of crunching on a big processor, it finds a better image, looking something like this:
It isn’t nearly as good as the DALL-E results I’ve seen (although to be fair, they posted the best of tens to thousands of tries, and I haven’t seen DALL-E try “a cat singing a sea shanty”). But it’s an interesting way of exploring BigGAN’s latent space. Here’s “a giraffe in the Great British Bakeoff”.
I’ve mentioned before that BigGAN was trained on about 1000 categories of images from ImageNet, all hand-labeled by hired humans. “Giraffe” and “Great British Bakeoff” are not among the ImageNet categories, but CLIP can still give feedback about what looks like a giraffe, and what looks like an outdoor baking show, because CLIP was trained on a huge set of images and nearby text scraped from the internet. The upside of this is that CLIP knows how to nudge BigGAN into finding something vaguely like “My Little Pony Friendship Is Magic in outer space”, even if nothing like this was in the BigGAN training images.
The downside of this is that CLIP has seen a skewed picture of the world through its internet training data. In their paper, the makers of CLIP discuss a bunch of biases that CLIP learned, including classifying people as criminals based on their appearance or making assumptions about what a doctor looks like. If I had given it terrible prompts it would have helpfully done its best to fulfill them.
Many times, the CLIP-BigGAN combination steered out of control rather than arriving anywhere. “Spiderman delivering pizza” looked at first like it was refining a drawing of a technicolor emu:
then dissolved into sudden chaos:
heroically refined that chaos into something resembling Spiderman and a single slice:
and then dissolved suddenly into white nothingness from which it never recovered. This chaotic disaster happens when the search wanders off into extreme values and is usually accompanied by intense strobing colors and huge blank areas, or spots, or stripes. And it happened all the time. When I gave it the AI-generated drawing prompt “Coots of magic”, things were looking promising for a few minutes:
But in the very next step it collapsed into this:
A “professional high quality illustration of a cat singing a sea shanty” got this far:
before turning into this neon laserfest:
In other cases, progress would just stop somewhere weird.
“Tyrannosaurus Rex with lasers” became ever more finely textured but just looked like a dog.
This is why AI programmers tend to laugh at the idea that an AI left running long enough would eventually become superintelligent. It’s hard to get an AI to keep progressing.
I had particular difficulty trying to get it to find a photo of Bernie Sanders. In three different trials I would get neon blobs from the get-go. Finally CLIP-BigGAN went with a strategy that has occasionally paid off for it: if you can’t make it well, make it very tiny. I give you “a photo of Bernie Sanders sitting on a chair and wearing mittens”.
For more examples, including when I tried to get it NOT to generate a picture of a pink elephant, become an AI Weirdness supporter! Or become a free subscriber to get AI Weirdness in your inbox.
“A golden retriever in the Great British Bakeoff”