What type of content do you primarily create?
They say a picture is worth a thousand words, so it’s no wonder text-to-image AI generators are so hard to use: They force you to translate a picture in your mind into a prompt that’s only a handful of lines.
On top of that, the unpredictable nature of these AI models means no prompt will ever generate the exact same image every time.
Unless you learn how to prompt them properly, the randomness that should be a feature of these AI tools can start to feel like a bug.
But with a bit of practice, and this beginner-friendly guide, AI image generation can become a powerful addition to your creative toolbox.
The good, the bad, and the weird-looking hands of AI image generation
Note: While there are several text-to-image AI models—like Midjourney, Stable Diffusion, and Adobe Firefly—we’ll mostly be focusing on DALL-E by OpenAI since it’s the most versatile and widely available option.
Before we get into writing prompts, it’s worth setting expectations around AI image generation: what it’s good for and what it’s bad at.
AI image generators work best when you play to their strengths, which are:
- Producing multiple variations and visual directions in a matter of seconds
- Visualizing concepts that would be difficult or expensive to bring to life otherwise
- Referencing famous artists, brands, and other real world examples
- Understanding all the creative jargon in different creative fields, from photography to color theory.
Text-to-image AI models are better than your average person—namely, me—at photography, graphic design, 3D modeling, and illustration. But on their own, they still struggle with tasks where precision and attention to detail are required.
Case in point: I’ve tried over a dozen prompts and different tools to generate a simple image of a chess board with all the pieces in their correct starting positions. It always gets at least one glaring detail wrong. Usually several.
Since random generation is at the core of what they do, text-to-image generators aren’t great at revising existing images based on feedback. Ask it to change one detail in an image—like make a T-shirt blue—and it'll often change that and a whole lot more.
In their current state, AI image models also generally fall short at reliably rendering text within images. They have a habit of adding extra letters, like on this birthday cake.
Credit where credit’s due though—these models have gotten a lot better at generating hands with the correct number of fingers, something they were notoriously bad at not too long ago.
So a lot of these quirks I just mentioned might get ironed out in future updates.
It's not perfect, but there are still plenty of scenarios where the ability to magically type an image into existence can be really handy (hopefully without any extra fingers).
Creative ways to use AI image generation
Since this technology is still young and constantly improving, new use cases are being discovered every day. It's safe to say text-to-image AI is no longer just a mildly amusing party trick.
Here are just a few ideas for how to use this breed of generative AI in your next creative project.
1. Bespoke stock photo substitutes
If you ever find yourself wishing for a specific stock photo that probably doesn’t exist, say “a slow loris with a gambling problem,” AI can generate a decent substitute in a snap—perfect for a throwaway joke in a YouTube video.
2. Backdrops for green screen edits
By combining the AI powers of image generation and video background removal—which your AI Underlord in Descript can do in a couple of clicks—you can drop yourself or a human subject into any scene you can imagine to help you tell your story better.
3. Background images
Another way to harness the randomness of AI to your benefit is by generating simple backgrounds consisting of shapes, colors, and illustrations based on your brand or style.
You can use these to create visually interesting backgrounds you can build upon with text and other graphical elements to create title cards, video templates, podcast audiograms, and more.
4. Cover images for podcast episodes and articles
If you manage a podcast or blog, AI can be a quick and cheap way to produce cover art and inline visuals.
For the best results, give the AI a specific concept to work with. Here’s a first attempt at generating a slot machine as a visual metaphor for the role that luck plays in generating AI images.
Prompt-writing best practices to generate better AI art
Ironically, generating AI imagery is kind of an art itself. There’s a lot of trial and error involved, and a fair bit of luck, since even the most prescriptive prompt will generate a different image every time.
So if you don't like the results, you can refine your prompt or pull the lever again to spin the reels until you hit the jackpot.
If you're generating images in Descript, Underlord will come up with multiple variations at a time, letting you pick one as the direction to regenerate more options.
Keep it short but specific
AI art prompts are usually around one to six sentences depending on how much creative control you want over the result. AI has no artistic taste of its own so it relies heavily on your instructions. An accidental "s" at the end of a noun can result in many when you only wanted one.
Give too little context and the AI model might take creative liberties where you don’t want it to; too much, and it might prioritize the wrong details in a long list of requirements.
It's kind of like delegating to an intern wearing a blindfold on the first day of the job. Minimize the potential for misinterpretation with explicit instructions—don't trust that it'll fill in the blanks with the correct assumptions.
Jargon is power so brush up on that art theory
One word can make a world of difference in the results you get from any given prompt.
Luckily, AI is fluent in jargon so there are plenty of keywords you can use to efficiently tell it what you want.
If there's one thing that will make you a better prompt writer, it's spending some time brushing up on your theory, from photography to illustration, and working those keywords into your prompt.
For anyone who slept through art class, the Descript blog has you covered:
- Lighting techniques ever video creator should know
- Types of shots pieced together: Unfolding the story before your eyes
- What is color grading? Learn the importance of styling footage
Let's use a generic prompt like "generate a raccoon" as an example to look at how adding just one keyword across different parameters can dramatically change the result you get.
There are obviously a lot more keywords you can explore across parameters like:
- Depth of field to control the range of distance that's in focus in a photo (e.g. shallow, deep, bokeh).
- Mood to control the impression and emotion you want to exude (e.g. melancholy, mysterious, serene, energetic).
- Lighting to control the intensity and type of light in your photo (e.g. soft, hard, or natural).
- Composition to control the layout and placement of specific elements (e.g. foreground, background, balanced, asymmetrical).
- Setting to control the time of day, weather, or location of an image (e.g. sunset, thunderstorm, December in Canada).
- References to artists, characters, or brands as short-hand for a distinct style (e.g. Van Gogh, Garfield, Pixar).
Use AI for the first draft, but polish the rest in post
More often than not, you'll generate an image that's so close to good enough—if only it didn't have that alien-language text, warped object, or really weird guy in the back.
You can't really get an AI image generator to target and fix these imperfections for you because it'll just go and generate a brand new image.
What you can do is apply some good old-fashioned editing:
- Covering up any text and replacing it with your own
- Cropping out weird-looking people or objects
- Removing the background and dropping your AI-generated subject into a different one
You can do some of these edits in Descript, or use a photo editing tool like Photoshop or Canva, where you can use generative AI to fill, replace, or remove specific areas of an image.
Get ChatGPT to help you write your prompt
Now that ChatGPT can "read" images, you can actually use it to help you write AI image prompts by uploading an existing image you want to use as a reference.
I uploaded an actual photo I took on my trip to Italy and split my own instructions into two parts so I could edit its AI image prompt if I wanted:
Describe this image and then write an AI art prompt that generates a similar image.
Here's the prompt it spat out:
"Create a serene lakeside scene during early morning with soft, golden sunlight. In the foreground, depict a graceful white swan gliding on calm water with gentle ripples. Ensure the sunlight reflects off the water's surface, creating a shimmering effect. In the background, include several anchored sailboats with tall masts. The far shore should have a line of trees and buildings, with misty mountains visible beyond, adding depth and tranquility to the scene. The overall atmosphere should be peaceful and idyllic."
I gave that prompt as is to Underlord in Descript and this was the result.
Make your prompt your canvas
Even as AI image generation improves over time (a safe bet given it's only a few years old), one thing probably won't change: It's really hard to describe a picture in words.
AI still relies heavily on human input, and despite all the talk about AI taking our jobs, it's only at its best when it's in creative hands.