I’ve been working with an image-generating algorithm by Vadim Epstein called CLIP+FFT, which uses OpenAI’s CLIP algorithm to judge whether images match a given caption, and an FFT algorithm to come up with new images to present to CLIP. Give it any random phrase, and CLIP+FFT will try its best to come up with a matching image. And now there’s a version that will generate images to go with several phrases in a row and then fuse them into a video.
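For the curious, here is a minimal sketch of that judge-and-propose loop. It is not Vadim Epstein's code, and it makes some loud assumptions: `toy_score` is a hypothetical stand-in for CLIP that just compares against a target picture instead of a caption, and a simple random hill-climb replaces the gradient-based optimization a real setup would use. The only part it shares with the real thing is the idea of describing the image by its Fourier coefficients and letting a judge decide which changes to keep.

```python
import numpy as np

SIZE = 16  # hypothetical toy resolution; real runs use far larger images


def fft_to_image(coeffs):
    """Decode Fourier coefficients into a SIZE x SIZE image with values in (0, 1)."""
    pixels = np.fft.irfft2(coeffs, s=(SIZE, SIZE))
    return 1.0 / (1.0 + np.exp(-pixels))  # sigmoid keeps pixel values bounded


def toy_score(image, target):
    """Stand-in for CLIP (assumption): higher means a closer match to `target`."""
    return -float(np.mean((image - target) ** 2))


def optimize(target, steps=200, step_size=1.0, seed=0):
    """Hill-climb in frequency space, keeping only changes that raise the score."""
    rng = np.random.default_rng(seed)
    shape = (SIZE, SIZE // 2 + 1)  # shape of a real-input 2D FFT
    coeffs = np.zeros(shape, dtype=complex)
    best = toy_score(fft_to_image(coeffs), target)
    history = [best]
    for _ in range(steps):
        # Propose a random nudge to every frequency component.
        noise = rng.normal(size=shape) + 1j * rng.normal(size=shape)
        candidate = coeffs + step_size * noise
        score = toy_score(fft_to_image(candidate), target)
        if score > best:  # the judge only accepts improvements
            coeffs, best = candidate, score
        history.append(best)
    return fft_to_image(coeffs), history


# Toy "caption": a left-to-right brightness gradient.
target = np.tile(np.linspace(0.0, 1.0, SIZE), (SIZE, 1))
image, history = optimize(target)
print(f"score went from {history[0]:.4f} to {history[-1]:.4f}")
```

Swap the toy judge for something that scores an image against a line of text and you have the general shape of the idea, minus all the parts that make it actually work.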
Here’s the sea shanty The Wellerman, sung by Nathan Evans, Jonny Stewart, and others, and illustrated by CLIP+FFT.
Now, there are several interesting things going on here, once you get past the sheer AI fever dream horror of it. One thing you’ll notice is that I changed some of the lines from the standard lyrics. CLIP+FFT deals with each line independently, so even if we have been talking about a ship and a whale throughout the song, the AI doesn’t know that in “when down on her a right whale bore”, the “her” refers to a ship. I made similar tweaks in one or two places.
There was nothing I could do about the line “One day, when the tonguing is done”. Trying to be more precise about the whaling sense of “tonguing” would, if anything, have made the image more horrifying.
Having none of the “Wellerman is a ship” context, the AI interprets The Wellerman itself as some kind of eldritch oil well drilling supervillain.
I kind of like what happened to “The winds blew hard, her bow dipped down,” with golden locks of hair and bows everywhere. I mean, I like it in an “oh no this has gone terribly yet fascinatingly wrong” sort of way.
The image for “We’ll take our leave and go” is also interesting, since it illustrates “leave” in so many ways. Sometimes there are cars and suitcases, or people shaking hands. Interestingly, I see hints of European Union flags and British flags in many of them, signs that during training CLIP was learning to associate “leave” with Brexit.
The “bully boys” are hilarious, with classic glowering expressions and mean-kid haircuts. The AI is not used to the early-1900s meaning of “bully = awesome”.
You’ll notice that many of the frames have text, which I find charming, as if the AI is frowning to itself and muttering “tea. tea. Billy. tea.” or “blow. blow.” The less interpretable the phrase is in image form, the more likely the AI is to use text instead.
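In fact, CLIP's habit of treating the word and the object as equivalent has led to an interesting way of fooling its image recognition capabilities: stick a written label on an object, and CLIP may report what the label says instead of what the object is.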
I also had CLIP+FFT illustrate The Twelve Days of Christmas, and one of my favorite frames from it is “Ten Lords A-Leaping”.
To see the other illustrated Days of Christmas (including the weirdly human-faced swans), become a supporter of AI Weirdness! Or become a free subscriber to get new AI Weirdness posts in your inbox.