Shawn Cole is the Head of Audio Design and Production at Pacific Content, where he leads a team of sound designers who help brands tell stories with audio. Pacific Content's work has received numerous awards from the Webbys, the Ambies, AdWeek, and most recently TribecaX. Shawn likes long walks on the beach and playing with robots.
These are dark times. Nary a week seems to pass without headlines about layoffs in media and tech.
Everyone is being asked to produce more with less. Less time, less money, fewer people.
It might feel like a new issue, but really it isn’t. Over the decades, those of us who work with audio and media have seen old techniques and methods evolve through new technology. And while it’s often a boon to efficiency, capitalism notices when we can do things faster with less human input and error. Necessity is the mother of invention, but who’s to say whether necessity comes from a desire to do more in less time or from the edict from above to do more with less.
We don’t record our mix as a live “performance” to a stereo recording device anymore, we just bounce it by pressing enter. We don’t actually cut tape anymore, we move colored waveforms (and now just words!) around our screen. Recording artists and composers produce world-class music in their bedrooms on a laptop instead of in a fancy recording studio with live musicians. The “projectionist” at a movie theater is the kid who presses play and then makes more popcorn, instead of a licensed, trained, well-paid professional.
The thing is, whether through necessity or by edict, the urge to find a better way to do things is still a very human impulse. At least that's how I feel about it.
So when I reached out to Descript’s handsome and affable head of business development Jay LeBoeuf a few years ago to talk about their fancy new tool, my excitement for future efficiencies helped mask my skepticism about the technology. It wasn’t fear, I just couldn’t imagine they’d be able to pull off what they were trying.
Fear is what I heard from my radio colleague in production when I told them about Descript’s Studio Sound function.
“I’ve got 20 audio producers who don’t want to lose their jobs to an automatic tool.”
Fear is what I heard from collaborators with acutely trained ears after listening to my first cloned “overdub” voice.
“Our clients will never go for it. It sounds too robotic. They’ll hate it.”
Fear is what I heard from those I encouraged to lean on robo-transcription instead of the pricey human-led variety.
“The robo-transcripts have too many errors, and there are too many voices. It’ll screw up the script.”
Don’t get me wrong, I totally understand where the fear comes from. I’m just an incredibly annoying, relentless optimist. Instead of letting fear seep into my psyche, I filled that space with excitement for all the things we could be doing if this tool actually worked.
And man, does it ever work.
Studio Sound uses AI to transform a crappy-sounding recording into what we imagine a studio recording should sound like. It’s not just noise reduction, or reverb reduction, or click removal, or beefy EQ, or enhancement. It’s all of those things.
I even took a terribly recorded phone call and used my best tools to improve the recording the old-fashioned way. Each fix was a process with an industry-leading tool, adjusted by hand (and ear). I was pretty proud of how great I got it to sound. Then I loaded that same crappy recording into Descript and pressed one button. The robot had me beat.
It doesn’t always beat me, and I still have to adjust the intensity slider to make it sound less broadcast-y and more podcast-y, but still. I showed it to my radio colleague and their fear disappeared. Like me, they knew that this would mean more time for their team to be creative, and less time in the weeds of restoration.
It also lets regular people make better recordings. Say Bob’s Used Car Sales & BBQ makes radio ads. Bob fancies himself his best salesperson. Bob records the voiceover for his ads at his desk, on his phone. Bob’s ads have never sounded very good. Now Bob's recording goes into the magic robot studio sound machine and Bob sounds like he left the used car lot/fire pit and recorded in the studio. Now the radio producers can focus on the wacky sound effects Bob has requested, instead of hand-applying lipstick to his boar-ing recording.
The Overdub feature started as a time saver, using a voice clone for a rough draft script. Now, in some cases, we use the AI version of the host’s voice for the first version the client hears. Or even in the final product, if we had to replace a word or two and the host is unavailable — saving the host a ton of time and hassle.
The AI transcription accuracy seems to never stop improving. It discerns between multiple voices, labels them correctly, and only seems to stumble with heavier accents or technobabble. Fair enough, sweet robot, those are tricky for humans too.
And now I find out that Descript doesn’t even need a green screen! AI can just magically transform what looked like a hostage video of the host recording in their closet into something that we can confidently post on social.
It’s hard to stay optimistic about AI tools when so many of our colleagues find themselves being laid off in droves. It might seem like we’re doomed to be replaced. I’d rather imagine a future where tools like Descript help us shred the repetitive and time-sucking parts of our job and let us focus on what we do best: creating beautifully compelling audio stories in collaboration with other brilliant humans, while the robot runs around vacuuming the floor.