I Ran OmniVoice Again, Timed the Dub, and Hit Some Bizarre Errors

This time I dubbed an English video into Korean. I timed each step of a single dub, and even ran into robotic noise where a voice should have been.

Following up on the last post, I spent a bit more time with OmniVoice. I was curious about two things: how long it actually takes to dub a single video, and what happens in the opposite direction from last time, turning an English video into Korean.

So I grabbed a short clip of a Trump speech (in English) and dubbed it into Korean. Here’s the original first:

Original: the English clip I wanted to dub into Korean

And here’s the result after OmniVoice dubbed it into Korean. It cloned the original voice and had it speak Korean:

OmniVoice: dubbed English → Korean, with the original voice cloned

How long does a single dub take

Taking one 22-second clip all the way from transcription through translation and voice synthesis to export took about 3 minutes in total. All of it ran on my MacBook, with no internet connection. Broken down by step:

Prep (pulling the audio out of the video and splitting voice from background): about 7 seconds
Transcription (turning the speech into text): about 29 seconds
Translation (English to Korean): about 90 seconds
Building the voice profile (registering the original voice): about 5 seconds
Voice synthesis + cloning: about 49 seconds
Export (merging it back into the video): about 2 seconds
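
If you want to check the arithmetic, here is a small sketch that adds up the step times from the breakdown above and shows where the time goes. The numbers are the ones I measured; the variable names are just mine, not anything from OmniVoice.

```python
# Step timings in seconds, copied from the breakdown above.
step_seconds = {
    "prep": 7,
    "transcription": 29,
    "translation": 90,
    "voice profile": 5,
    "synthesis + cloning": 49,
    "export": 2,
}
CLIP_SECONDS = 22  # length of the source clip

total = sum(step_seconds.values())
realtime_factor = total / CLIP_SECONDS  # processing time per second of video


def share(step):
    """Percentage of the total run spent in one step, rounded to a whole number."""
    return round(100 * step_seconds[step] / total)


# Print the steps from slowest to fastest.
for step, secs in sorted(step_seconds.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{step:<20} {secs:>3}s  {share(step):>3}%")
print(f"total: {total}s (~{total / 60:.1f} min), {realtime_factor:.1f}x the clip length")
```

The steps add up to 182 seconds, so roughly 8 seconds of processing per second of video on my machine, and translation alone is about half of the total.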

One fun detail: the first synthesis run takes longer, and running it again cuts the time roughly in half. The first run also pays the cost of loading the AI model into memory, and later runs don’t.

Which model ran at each step

The dub is split into steps, and a different model handles each one:

Splitting voice and background: Demucs
Transcription: WhisperX
Word timing: wav2vec2
Speaker separation (telling apart who’s talking): WavLM
Translation: gemma2:27b (better quality than the built-in translator)
Voice synthesis + cloning: OmniVoice
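
For anyone who wants to note down a similar setup, the list above can be written as a simple table in code. This is only my own bookkeeping sketch, not a config format that OmniVoice reads, and the order is the order I listed the steps in, not necessarily the exact execution order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    step: str   # what the stage does
    model: str  # which model handled it in my run


# Step-to-model pairs, copied from the list above.
PIPELINE = [
    Stage("Splitting voice and background", "Demucs"),
    Stage("Transcription", "WhisperX"),
    Stage("Word timing", "wav2vec2"),
    Stage("Speaker separation", "WavLM"),
    Stage("Translation", "gemma2:27b"),
    Stage("Voice synthesis + cloning", "OmniVoice"),
]


def model_for(step):
    """Return the model used for a step, or raise KeyError if the step is unknown."""
    for stage in PIPELINE:
        if stage.step == step:
            return stage.model
    raise KeyError(step)


for stage in PIPELINE:
    print(f"{stage.step}: {stage.model}")
```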

It wasn’t all smooth

Two things tripped me up along the way.

One, sometimes I got a crushed, staticky noise where a human voice should have been. This time I had it build Korean speech from an English voice sample, and that voice, trying to imitate a language it had never spoken, sometimes came out broken. I switched to a setting that refines the synthesis over more passes and re-ran it, and the Trump video came out fine.

Two, when I ran the translation, one sentence came out completely different from the original, so I had to go in and fix it by hand.

So, the takeaway

Among the open-source dubbing tools I’ve used, this one stood out for installation: it was about as easy as a single click. It also ran more smoothly than any of the others I’ve tried, which I appreciated. That said, the output quality isn’t at a level I’m happy with yet.

Has anyone else here used OmniVoice? I’d love to hear what kind of videos you tried and how the quality turned out. I ran it on a Mac, so I’m also curious to hear from people who’ve used it on other setups.
