More AI Image Generation

Alert Gazette readers will recall that I gave a quick introduction to DreamStudio, an AI-powered image generator, currently running in beta form. The results of my admittedly rudimentary tests were less than impressive, although the application is still in the early stages.

I also spent a little time with a similar application (although the creators call it a system) cleverly named DALL-E 2. As Joanna Stern, the Wall Street Journal’s tech columnist, puts it, the name is a play on Pixar’s animated robot WALL-E and surrealist artist Salvador Dalí.

DALL-E 2, which, btw, is tedious to type, so I’ll hereafter refer to it as “D2” (no relation to the more famous D2), is the brainchild of OpenAI, LP, a “capped profit” limited partnership controlled by a non-profit entity, OpenAI. Confusing, right? It’s not important for our purposes. However, an interesting tidbit for any Texans in the audience, especially those with ties to West Texas, is that former US Representative and CIA officer Will Hurd is on the board of OpenAI.

D2 is the second generation of the DALL-E system, which was rolled out in January 2021. D2 was introduced earlier this year. According to OpenAI, D2 performs significantly better than its predecessor in terms of photorealism and in accurately matching its generated images to the text that was input to create them.

So, how does it work? Well, as with DreamStudio, you need an account, which is free to set up. You also start with a fixed number of free credits, which are used up as you generate images. Once the free credits are gone, you can buy more at a cost of about $0.13/credit.
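For the budget-minded, here's a rough back-of-the-envelope sketch of what a session might cost once the free credits run out. The function name and its parameters are my own invention for illustration, and it assumes each prompt uses exactly one credit, which is an assumption on my part rather than something spelled out above. The free-credit allowance is left as an input because the exact number isn't given here; the 50 in the example is purely hypothetical.

```python
def estimated_cost(prompts, free_credits, price_per_credit=0.13, credits_per_prompt=1):
    """Approximate out-of-pocket cost, in dollars, for a number of prompts.

    Assumptions (not confirmed by OpenAI's documentation here):
    - each prompt consumes `credits_per_prompt` credits (default 1)
    - paid credits cost roughly $0.13 apiece
    """
    credits_needed = prompts * credits_per_prompt
    # Only credits beyond the free allowance have to be paid for.
    paid_credits = max(0, credits_needed - free_credits)
    return round(paid_credits * price_per_credit, 2)


if __name__ == "__main__":
    # Hypothetical session: 60 prompts against a made-up allowance of 50 free credits.
    cost = estimated_cost(prompts=60, free_credits=50)
    print(f"Estimated cost: ${cost:.2f}")
```

In other words, as long as you stay within the free allowance the tab is zero, and after that every extra prompt runs about thirteen cents under these assumptions.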

Once you’ve logged in, you find an interface that’s significantly less complicated (and less powerful) than DreamStudio’s. In fact, there are no options for tweaking parameters to guide the AI. All you can do is enter a text string describing the image you’d like to generate.

D2 proceeds to create four new images for your perusal (DreamStudio gives you the option of creating up to nine). So, when I entered “a lazy ant doing yoga on an alien planet in the artistic style of Picasso,” here’s what D2 offered me:

Four images generated by DALL-E 2's AI engine
Original images are each 1024×1024 pixels

You may have a different opinion, but I like the results of this query. I’m not sure why I’d ever need an image with that description, but I am sure that whatever I came up with on my own wouldn’t be as imaginative…and it would take me a darned sight longer than the 30 seconds it took D2.

D2 has a feature called outpainting, in which an uploaded image is used as a base model, so to speak, for further manipulation by the AI engine. I gave it a try, and I can’t decide if I’m impressed or just weirded out. (DreamStudio has a similar feature.)

I uploaded a lovely photo of the lovely Abbye Fabulous, may she RIP, and turned the AI loose on it. Here’s the original photo:

Photo: Abbye lying on top of a brick wall

And here’s what the AI created when I told it to generate “a rainy day in New York City.” Again, this is a collage of the original four much larger images.

Four images generated by DALL-E 2's AI engine

Going clockwise from top left, my comments:

  • A bit unimaginative…where’s the rain? But the reflections in the standing water are a nice touch.
  • Have no idea what that animal-looking thing in the upper right corner is
  • Good composition…Abbye eyeballing a person who’s presumably eyeballing her
  • Kind of a grunge vibe, I guess; in the full-sized version you can see water pouring from the seams of the lower row of bricks

The AI did a good job carrying the brick pattern down into each image.

According to Joanna Stern’s column, both AI implementations have some built-in safeguards to disallow certain types of images…think porn or ultra-violent depictions. D2 has more restrictions than DreamStudio with regard to using images of public figures or celebrities (see also “deepfakes”). I didn’t test any of those restrictions.

As I mentioned in my DreamStudio post, AI-generated images from systems like these are considered to be in the public domain and thus are not copyright-protected. However, it might be a different matter if you incorporate an AI-generated image in whole or part into your own work of art. I’ll leave that question up to the IP attorneys in the audience.

So, where does this leave us? As a content creator, albeit not an artist or illustrator, I can’t see this technology (in its current state) being anything other than a fun and interesting toy to experiment with. For my purposes, at best it’s a quick and easy way to generate some throwaway images, such as the header at the top of this post. That’s not a bad thing, but it’s certainly not life-changing.

Of course, there are deeper issues…questions that probe the definition of art, and the role of humans in creating it. If I were truly an artist, would the use of AI in creating my art give me a sense of fulfillment? Would it be edifying to the viewers of that art if they knew how it was created? There’s something about pushing a button or clicking a link to create something that seems too easy…there’s more to the idea of suffering for one’s art than dealing with bandwidth issues.

This technology is not going away, and it’s going to become more sophisticated. I’m not an ethicist, and I’m not a Luddite, but I’ve lived long enough to understand that humans are capable of perverting the most benign aspects of life in unexpected ways, even as the same are applied to the solution of previously insurmountable problems. Again, borrowing from Ms. Stern’s excellent column, an AI-powered future should be both amazing and terrifying.

Image: Ant in the style of Frida Kahlo by DALL-E 2's AI engine
D2 generated this in response to “amazing and terrifying ant in the style of Frida Kahlo.”