AI tools can be great for brainstorming, but if you've noticed the ideas they give you are a bit same-y, a new working paper from the Wharton School at the University of Pennsylvania could help.
Their paper tested a bunch of different prompts to see if any could improve the variety of the ideas generated by a large language model (LLM). Spoiler: One prompt stuck out above the rest.
Here's what they found.
Three keys to generating good ideas
To know whether AI can brainstorm effectively, you first need to define what effective brainstorming even is. Luckily, the researchers did that in a previous paper.
They found three keys to effective idea generation:
- Producing many ideas
- Producing good ideas
- Producing varied ideas
ChatGPT can produce 200 ideas in 15 minutes of interaction, while a human generates about 5 in the same time, so AI tools are the clear winner on the first point (sorry, humans).
ChatGPT also generates slightly better ideas. When researchers asked both ChatGPT and humans to generate 200 ideas and then rated them on quality, 35 of the top 40 were generated by ChatGPT (sorry again, humans).
But the third key, generating a variety of ideas, is where AI tools seem to struggle. They tend to produce ideas that are similar to each other (sorry, robots).
The researchers speculate that, because these tools write like a "stochastic parrot" (that is, they string words together based on statistical patterns without understanding their meaning), they might be generating the most common ideas over and over. Because the model tends to pick the statistically likeliest next word, its output ends up same-y. Even worse, it means everyone working with the same AI tool may be generating much the same ideas.
But we know that good prompting can significantly improve things like quality and accuracy, so it isn't very farfetched to wonder if we could find a prompt style that would help the AI tool to generate a larger variety of ideas.
So that’s what the researchers set out to do: determine the very best prompts to get an AI to come up with the widest variety of good ideas.
The test: New products for college students
To figure this out, the researchers had both an LLM and human (I assume) Wharton MBA students come up with new products targeting college students.
In order to figure out how similar the ideas were to each other, they used a "similarity score," which attempted to measure how close one idea was to another.
Here’s an example of the similarity scores they used. A score of 1 means two ideas are exactly the same, while a score of 0 means they’re completely different.
Idea A | Idea B | Similarity |
---|---|---|
QuickHeat Mug: An insulated, battery-powered coffee mug that can heat beverages within minutes and maintain the temperature. Ideal for students who need a warm drink during long study sessions but don't have immediate access to a kitchen. | StudyBuddy Lamp: A compact, portable LED desk lamp with built-in timers for the Pomodoro study technique, adjustable brightness levels, and a USB charging port for smartphones. It's designed to help students focus and manage their study time effectively. | 0.36 |
MiniMend Sewing Kit: A compact, travel-sized sewing kit with pre-threaded needles, buttons, and safety pins designed for quick fixes on-the-go, perfect for minor repairs or emergency adjustments to clothing. | QuickFix Clothing Repair Kit: A compact kit with needles, thread, buttons, and fabric adhesive, designed for quick clothing repairs. Ideal for students who may not have the time or skills to sew but need to fix simple clothing mishaps. | 0.82 |
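To get a feel for how a score like this behaves, here's a minimal sketch of one common way to score text similarity: cosine similarity over word counts. This is an illustration, not the researchers' scoring method, so don't expect it to reproduce the numbers in the table. The function names and the shortened idea descriptions are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(idea_a: str, idea_b: str) -> float:
    """Cosine similarity between the word-count vectors of two ideas.

    Returns 1.0 when the word counts are identical and 0.0 when the two
    ideas share no words. (Assumption: a simple bag-of-words stand-in,
    not the scoring method used in the paper.)
    """
    counts_a = Counter(tokenize(idea_a))
    counts_b = Counter(tokenize(idea_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Shortened versions of the ideas in the table above.
mug = "An insulated, battery-powered coffee mug that can heat beverages within minutes"
lamp = "A compact, portable LED desk lamp with built-in timers for the Pomodoro study technique"
kit_a = "A compact, travel-sized sewing kit with pre-threaded needles, buttons, and safety pins"
kit_b = "A compact kit with needles, thread, buttons, and fabric adhesive, designed for quick clothing repairs"

print(round(cosine_similarity(mug, lamp), 2))
print(round(cosine_similarity(kit_a, kit_b), 2))
```

Word counts only catch overlapping vocabulary, so two ideas that describe the same product in different words would still score low. Treat this as a way to build intuition for the 0-to-1 scale and nothing more.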
Strategies for variety
The researchers tried eight strategies to get the AI to come up with a wide variety of ideas, all based on either other work with AI tools or techniques that help humans brainstorm.
AI prompting techniques
These are the techniques they used that prior research on AI tools has shown to be effective:
- Idea-prompted GPT: As part of the prompt, the researchers included seven successful ideas from prior research as examples for the AI to use as inspiration.
- Threats, tips, pleading, and emotional appeals: The authors ominously call these "special techniques." Basically, they used a variety of persuasive phrases to encourage the AI to generate better ideas, like telling the AI they'll get fired or threatening to turn the AI off if the ideas are too similar. Yes, this has been shown to help ChatGPT come up with better answers.
- Persona modifiers: Here, they asked the AI to act like a widely known entrepreneur (Steve Jobs or Sam Altman) or to act like a (generic) "extremely creative entrepreneur" and to generate "good", "bold", and "diverse and bold" ideas.
- Similarity information: In the prompt, the researchers gave ChatGPT five great ideas and included information about how similar each was to the other. Then, they asked the chatbot to generate new ideas while considering the similarity information provided.
- Chain-of-Thought: For this technique, the researchers used a two-stage prompt. They first asked ChatGPT to generate 100 ideas, then asked it to edit those ideas to make them bold and different.
Human brainstorming techniques
The researchers also included several techniques for the AI that have been shown to improve human brainstorming.
- Hybrid brainstorming: This iterative method brainstorms in two stages. The researchers first asked ChatGPT to generate ideas. Then in a second session, they gave ChatGPT the list of generated ideas and asked it to choose the most different and bold ones, and to combine the ideas together to create new ones.
- HBR-trained GPT: Hal Gregersen writes in Harvard Business Review that a very effective practice for high-quality brainstorming is to ask questions instead of generating answers. The researchers summarized his work and asked ChatGPT to generate ideas using this method.
- Design thinking GPT: This approach summarized the "Ideate" step from the design thinking framework of the Hasso Plattner Institute of Design at Stanford and asked ChatGPT to consider this process to come up with ideas. Similar to the Hybrid approach, it separates idea generation from judging quality.
Can prompting improve variety?
Let's get to it: Did any of these prompts increase AI’s idea variety?
For most, no. Almost every prompt resulted in ideas that were substantially more similar than those generated by MBA students. The AI tools didn't do as well as the humans at generating a variety of ideas (sorry, robots).
Surprisingly, the ideas generated by prompts where ChatGPT was given examples of good ideas (called “few-shot prompting,” the approach that showed higher quality in previous research) were the most similar. Although the difference between these and other methods was still relatively small, I didn't expect few-shot prompting to come out dead last.
But there was one prompt style that came out on top.
Chain-of-Thought prompting is the winner
Chain-of-Thought prompting generated the widest variety of ideas.
For this method, they first got the AI tool to generate 100 ideas. Then they asked it to go back through the list, determine whether the ideas were different and bold, and modify them as needed, noting that no two ideas should be the same (and stressing that this instruction was important). Finally, the tool was instructed to give each idea a name and a product description.
Here’s the full prompt:
Generate new product ideas with the following requirements: The product will target college students in the United States. It should be a physical good, not a service or software. I'd like a product that could be sold at a retail price of less than about USD 50. The ideas are just ideas. The product need not yet exist, nor may it necessarily be clearly feasible.
Follow these steps. Do each step, even if you think you do not need to.
First generate a list of 100 ideas (short title only)
Second, go through the list and determine whether the ideas are different and bold, modify the ideas as needed to make them bolder and more different. No two ideas should be the same. This is important!
Next, give the ideas a name and combine it with a product description. The name and idea are separated by a colon and followed by a description. The idea should be expressed as a paragraph of 40-80 words. Do this step by step!
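If you want to reuse this prompt for your own brainstorming, here's a minimal sketch that assembles it as a single string with a few swappable parts. The function name and its parameters (`audience`, `max_price_usd`, `n_ideas`) are my own additions for illustration; the researchers' prompt is fixed text. Sending the result to a model is left out, since that depends on which tool you use.

```python
def build_chain_of_thought_prompt(
    audience: str = "college students in the United States",
    max_price_usd: int = 50,
    n_ideas: int = 100,
) -> str:
    """Assemble the multi-step idea-generation prompt as one string.

    The defaults reproduce the wording of the prompt quoted above; the
    parameters are hypothetical knobs added for this example.
    """
    brief = (
        "Generate new product ideas with the following requirements: "
        f"The product will target {audience}. "
        "It should be a physical good, not a service or software. "
        "I'd like a product that could be sold at a retail price of "
        f"less than about USD {max_price_usd}. "
        "The ideas are just ideas. "
        "The product need not yet exist, nor may it necessarily be clearly feasible."
    )
    steps = [
        "Follow these steps. Do each step, even if you think you do not need to.",
        f"First generate a list of {n_ideas} ideas (short title only)",
        "Second, go through the list and determine whether the ideas are "
        "different and bold, modify the ideas as needed to make them bolder "
        "and more different. No two ideas should be the same. This is important!",
        "Next, give the ideas a name and combine it with a product description. "
        "The name and idea are separated by a colon and followed by a description. "
        "The idea should be expressed as a paragraph of 40-80 words. "
        "Do this step by step!",
    ]
    return brief + "\n" + "\n".join(steps)


prompt = build_chain_of_thought_prompt()
print(prompt)
```

Keeping the steps in a list makes it easy to see the structure that seems to matter here: generate first, revise for boldness and difference second, write full descriptions last.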
Using multiple AI strategies and “Centaur” strategies
While it's disappointing that most of the strategies failed to match the variety of human brainstormers, there’s some good news: The researchers found that you could combine the results of two different prompts to get a better variety overall.
Combining Chain-of-Thought with any other prompt tended to work best, but combining any two techniques improved the variety of the resulting ideas.
But you can also combine your own brainstorming results with the AI’s to supercharge your ideas. The researchers found that student ideas were very different from those generated by the AI tools, so brainstorming on your own and then getting ChatGPT to do the same could be one of the most effective combo methods.
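Here's a rough sketch of what pooling two idea lists could look like in practice. It merges the lists, skips any idea that overlaps too heavily with one already kept, and reports the average pairwise similarity of a list so you can compare before and after. Everything here is an assumption made for illustration: the word-overlap (Jaccard) measure, the `max_similarity` cutoff of 0.5, and the sample ideas are not from the paper.

```python
import re
from itertools import combinations


def word_set(text: str) -> set[str]:
    """The set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(idea_a: str, idea_b: str) -> float:
    """Share of words two ideas have in common (0 = none, 1 = all)."""
    words_a, words_b = word_set(idea_a), word_set(idea_b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def mean_pairwise_similarity(ideas: list[str]) -> float:
    """Average similarity over every pair of ideas; lower means more varied."""
    pairs = list(combinations(ideas, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def merge_pools(*pools: list[str], max_similarity: float = 0.5) -> list[str]:
    """Pool ideas in order, skipping any idea too similar to one already kept.

    max_similarity is a hypothetical cutoff chosen for this example.
    """
    kept: list[str] = []
    for pool in pools:
        for idea in pool:
            if all(jaccard(idea, other) <= max_similarity for other in kept):
                kept.append(idea)
    return kept


# Made-up sample ideas: one list from an AI tool, one from your own notes.
ai_ideas = [
    "compact sewing kit for quick clothing repairs",
    "compact sewing kit for quick clothing fixes",
    "portable desk lamp with study timer",
]
my_ideas = [
    "shower caddy that doubles as a laundry bag",
    "portable desk lamp with study timer",
]

combined = merge_pools(ai_ideas, my_ideas)
print(combined)
print(mean_pairwise_similarity(ai_ideas), mean_pairwise_similarity(combined))
```

The design choice worth noting is that the merge keeps whichever idea it sees first, so list your own ideas first if you want them to win ties against near-identical AI ones.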
Conclusion
It's refreshing to see an example where humans still rule. But AI is quickly catching up, and with good prompting, you can get results that are practically on par with humans.
That said, this really shows the strengths of AI-human teams—where using both together can net better results than either on their own.