The Cloudinary Customer Education team’s job is to help customers understand how to use Cloudinary’s image- and video-management software. Video is their primary tool. Here’s how they use Descript to make video that looks and sounds great, without all the hassle.
AI transcription so training videos hit home
Cloudinary’s challenge
Neither human nor AI transcription services were able to accurately transcribe Cloudinary’s training videos, which were loaded with technical terms, proprietary lingo, and acronyms (and frequently taught by non-native English speakers). Correcting the transcripts took hours, and blocked the Cloudinary team from doing what they wanted to be doing: making more, better videos to help their customers.
How Descript helped
Descript automatically transcribes video in seconds, with industry-leading accuracy. The Cloudinary team uses Descript’s transcription glossary feature to teach the AI those especially tricky terms and acronyms, so when it hears “JSON file,” Descript transcribes it that way, not as “Jason file.” Descript’s AI transcription also learns and gets better the more you use it. So after a few months, the Cloudinary team noticed they no longer had to correct one of their most common, important, and mistranscribed words: “Cloudinary.”
Of course, no transcription — human or AI — is or ever will be perfect; Descript makes the inevitable manual corrections easier too. The Cloudinary team can invite multiple editors to every video project, so instructors and subject-matter experts can quickly scrub inaccuracies using Descript’s effortless transcript-correction tool. All the changes happen in the cloud, in real time, so editors can work simultaneously, making corrections and leaving comments as they go.
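The glossary idea can be illustrated with a small post-processing sketch. This is not how Descript implements its transcription glossary; it is a minimal Python example, assuming a hypothetical `GLOSSARY` mapping and `apply_glossary` helper, that shows the general principle of mapping known mis-hearings to preferred terms. The "cloud in airy" entry is an invented example of a mis-transcription.

```python
import re

# Hypothetical glossary: each key is a likely mis-transcription, each value
# the preferred term. The entries are illustrative, not Descript's data.
GLOSSARY = {
    "jason file": "JSON file",
    "cloud in airy": "Cloudinary",
}

def apply_glossary(transcript: str, glossary: dict[str, str]) -> str:
    """Replace known mis-transcriptions with glossary terms, ignoring case."""
    for wrong, right in glossary.items():
        # Word boundaries keep the match from firing inside longer words.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        transcript = pattern.sub(right, transcript)
    return transcript

print(apply_glossary("Upload the Jason file to cloud in airy.", GLOSSARY))
# prints: Upload the JSON file to Cloudinary.
```

A lookup table like this only catches mis-hearings someone has already seen, which is one reason manual review by subject-matter experts still matters.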
Don’t underestimate the transcript
“Video is easy to mis-hear, especially when it’s a highly technical topic, but if your customers can hear it and read it on screen, they’re much more likely to retain the information.” — Sam Brace, Senior Director of Customer Education and Community
The results
Since introducing Descript into its video-editing workflow, the Cloudinary team has drastically reduced the time it takes to produce video: from 13 hours for a single episode of its video podcast to 4 hours, a time savings of roughly 70%. That’s enabled them to make more video, going from one podcast episode every few months to two episodes a month, with the same team, at the same cost. The same goes for the other videos the team makes.
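For readers who want to check the time-savings figure, here is a quick worked calculation in Python. The `time_savings` helper is a hypothetical name used only for this illustration; the 13-hour and 4-hour inputs come from the numbers above.

```python
def time_savings(before_hours: float, after_hours: float) -> float:
    """Return the fraction of time saved when going from before to after."""
    return (before_hours - after_hours) / before_hours

savings = time_savings(13, 4)  # 9 hours saved out of 13
print(f"{savings:.0%}")
# prints: 69%
```

Nine hours saved out of thirteen works out to about 69%, which rounds to the roughly 70% quoted.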
There’s another, harder-to-quantify value as well: Descript has replaced the miserable drudgery of post-production with a process that’s actually enjoyable.
Avoid post-production hell
“Without the right tools, it will take you forever to get your videos the way you need them to be. I’d tell anybody new to video for customer education, or learning and development: without Descript, you’re going to find yourself stuck in an editing abyss.”
– Sam Brace, Senior Director of Customer Education and Community
Come for the transcription, stay for the AI magic
The Cloudinary team came to Descript for better transcription. Then they discovered magical features they never expected, or ever dared to wish for — and transformed their workflow.
Filler word removal – “ums” and “uhs,” gone in a few clicks
Many of Cloudinary’s instructional videos are recorded as livestreams. Because they’re humans, the instructors use a lot of “ums,” “uhs,” “likes,” “you knows,” and other filler words. Most of those were okay for the video — the team wanted to keep it similar to the live version. But removing them from the transcript during editing was a tedious, time-consuming process for the production team.
Until Descript came along. Descript’s automatic filler-word detection and removal enabled Cloudinary’s editors to remove every unwanted filler word from the transcript in a few clicks. The same goes for any excessive filler words in the video. Descript also adds room tone automatically to smooth the cuts, and offers the option to quickly restore filler words where removal feels unnatural.
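To give a feel for what filler-word removal involves at the text level, here is a minimal Python sketch. It is an illustration only, not Descript's detection method: the `FILLERS` list and `remove_fillers` helper are assumptions, and the sketch handles only the transcript text, not the matching audio or video cuts. "Like" is left out on purpose, because a plain word list cannot tell filler uses from meaningful ones.

```python
import re

# Assumed filler list, based on the examples in the text. The multi-word
# filler comes first so it is tried before the single-word ones.
FILLERS = ("you know", "um", "uh")

def remove_fillers(text: str, fillers: tuple[str, ...] = FILLERS) -> str:
    """Drop standalone filler words and the whitespace that follows them."""
    alternatives = "|".join(re.escape(f) for f in fillers)
    # Word boundaries prevent matches inside words such as "umbrella".
    pattern = re.compile(r"\b(?:" + alternatives + r")\b\s*", re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Collapse any doubled spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", cleaned).strip()

print(remove_fillers("Um so you uh upload the image and you know transform it."))
# prints: so you upload the image and transform it.
```

A word list this simple will also strip legitimate uses of "you know," which is why the option to restore removed words is valuable.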
Regenerative AI for studio-quality audio wherever the experts are
You can send quality microphones to your subject matter experts, but you can’t be sure they’ll set them up right, or that the room they record in won’t be full of reverb, or that their neighbor’s leaf blower won’t start up midway through recording. But clean, clear audio is critical to helping customers comprehend the instruction they’re getting. So fixing those kinds of audio challenges used to be a big problem for Cloudinary’s team.
Descript’s Studio Sound solved it almost instantly. It uses AI voice regeneration to strip out background noise, reverb, and other stuff you don’t want, then reconstructs the voice audio so it sounds like it was recorded in a studio. Studio Sound works so well that Sam Brace no longer uses an external mic to record the Cloudinary podcast; he gets better sound by using his laptop mic, then applying Studio Sound in post-production.
AI voice regeneration for audio that’s divine
“We run Studio Sound over everything. It’s absolutely a must-use feature in our videos.” – Sam Brace, Senior Director of Customer Education and Community