The 3Play Way: AI Dubbing [TRANSCRIPT]

SOFIA LEIVA: Thank you for joining us today. My name is Sofia, and I’m on the marketing team at 3Play. I’m a Latina woman with black hair, and I’m wearing a black blazer. Today’s session, “The 3Play Way: AI Dubbing,” will be presented by our incredible product marketing manager, Jesse Ariss. Thank you for joining us today. And I’ll pass it off to Jesse.

JESSE ARISS: Thank you. Thank you so much, Sofia. Those are really kind words. My name is Jesse and I’m a product marketing manager at 3Play. I am a white male with, I don’t want to say balding, let’s call it a shaved head. And I’m in my early 40s.

Product marketing manager is such an amazing role, especially here at 3Play. I think the job is all about making those connections, figuring out how we can take all this cool technology that we’re seeing built every day, and not just make it useful, but turn it into a game changer for businesses like yours who are looking to get their content into new markets or improve comprehension and accessibility.

So I get to work every day with this really cool tech and see how it’s changing the world. And that’s exactly what we’re going to talk about, show you how it can do some heavy lifting for your communication strategy. This is going to be fun. I think it’s going to be fun. It will be fun. That’s a Jesse guarantee.

Now before we jump into AI dubbing, I also want to tell you a little bit about 3Play Media. You may have heard of 3Play Media, you may not have heard of 3Play Media. We’ve been around for well over 10 years in the accessibility and localization world. And our mission is simple yet powerful. It’s right there– making media accessible and understandable to everyone everywhere. That’s it. That’s it. It’s that simple.

But it’s so powerful because for the last 12, 13 years, we’ve been at the forefront of this. And again, in today’s world, which is obviously increasingly globalized, the importance of this multilingual communication cannot be overstated. We’re going to learn today how you can break down language barriers, but also foster inclusivity, and, for those folks who care, really boost engagement on your video content.

So, let’s take a look at where we are today. Let’s make the bed, set the table, whatever metaphor you want to use. Traditionally, and you may have experience in this, but dubbing has typically meant a full human process. So that’s transcribing, that’s translation, that’s a voice actor performing the dialogue.

And this is time-consuming. You need casting. You need recording. You need synchronization. You need to make sure that what’s being said is matching up with what’s being shown on the screen. Now, we’ve also started to see the emergence of pure AI dubbing, which uses artificial intelligence to do a lot of that, automatically translate and generate that dialogue with a synthetic voice.

This is a really great innovation, and it’s been around for about a year now. And it has really sped up the dubbing process. It’s cutting down on costs. But it does have its challenges. I personally don’t like to leave a robot unchecked. They’ve been known to make a few mistakes. You ever had a Siri mix up? What word is this?

If the machine is reading that, L-E-A-D, does it mean lead, like what my boss does, or lead like the metal I’m not supposed to be consuming, right? It can be confusing. And we think there’s a better way. We think there’s a way to take the speed and the cost savings of AI and combine them with the accuracy of traditional methods. So let’s just talk about that for a second.

This is how I see the current landscape. This is AI dubbing on one slide. And it’s a Wild West right now. The landscape is changing so quickly. It’s changing month by month. And it can really be divided into what I would consider three main categories.

So on the left here, you have basically the model builders. This is the tech behind it. This is the foundational technology, like the speech recognition or the voice synthesis. These are best suited for companies who have experience and are building their own solutions and have six- or seven-person teams who can work with this.

Now if you move further down, we have those turnkey AI applications, which I was speaking to. These combine those technologies that we’re seeing in the building blocks, and they’re using them to create fast and affordable dubbing. It’s excellent, to a point. Because there are things that it lacks. It lacks quality control. So I would recommend a tool like that for your low-stakes, short-term content like your Instagram Reels, for example.

Finally, on the right, we have full end-to-end service providers, like us, 3Play Media. That’s where we’re integrating AI with human expertise to give you that high-quality, reliable dubbing. And that’s for your long shelf life content, so your e-learning or your corporate training, which you know is going to have a lot of eyes on it.

We think that AI dubbing is the perfect middle ground. So taking that AI technology to do the heavy lifting, you get the speed, you get the scalability. But then we infuse human expertise into the process to ensure accuracy, cultural relevance, and quality. That’s us.

Our secret sauce is this hybrid model– AI driven with human assistance. So that’s going to give us high-quality dubbed content that’s got the best of both worlds. It’s affordable and timely. You get that AI efficiency without sacrificing the nuances that only humans can provide. This industry is still finding its footing. And we offer a balanced, reliable solution that stands out from the rest.

I want to just take you through what the process looks like, and then I’ll show you a few examples of it in action. So let’s go through this step by step. So it starts with getting your video. So you input your video. You feed your original video into our system.

Now, you can do that however you want. A lot of platforms, you have to log into their platform, create a user. You have to upload it using their platform. We’re not like that. You can do that if you want, but we’re willing to go the extra mile and work how you work.

So if you’re using a video platform already like Brightcove, for example, we can do all this within Brightcove. If you’re more old school and prefer the FTP or the SFTP, yeah, we can do that too. Doesn’t matter. We are going to work to your standards, not the other way around.

OK, so you’ve got the video in our system, however you’ve done that. Now the AI is going to analyze things like speech patterns, cadence, audio elements. This analysis is really crucial for creating a dubbed version that mirrors the original intent and style. Once we’ve got that, we translate your script into the target language using our natural language processing algorithms.

Then we create the voice, and we can make that voice match the original speaker’s characteristics so it sounds just like that original speaker. Or you can choose a different voice. You can choose your own brand voice so that you have the same voice throughout all your content to maintain consistency and never have to worry about hiring or finding that person again, if that’s the way you want to do it.

And then finally, and this is really, the magic, this is where our team of professional linguists and cultural experts review the translated content. So they’re going to check for linguistic accuracy. They’re going to check for idioms. They’re going to check for cultural nuances. I’ll give you an example.

We had a video where someone was talking. I think it was an educational video. And they were saying, it’s raining cats and dogs. Well, that doesn’t actually translate because there is no saying “it’s raining cats and dogs” in French or Spanish. In French, they might say it’s raining ropes. In Spanish, they might say it’s raining jugs or it’s raining pitchers.

So if we were to do a direct translation just using AI, yes, it may work, but it’s not going to make sense. And that’s the key. You need to make sure that the content you’re creating and dubbing is very accurate to those nuances, to those idioms, to those custom expressions. And that’s where our local native linguists and experts really come into play. And at the end of the day, you get a high-quality, culturally sensitive dubbed video that’s ready to engage audiences anywhere you want.
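
As a rough illustration of the workflow described above, the stages can be sketched as a small pipeline in which a machine translation pass is followed by a review pass that swaps literal idioms for local equivalents. This is only a sketch under stated assumptions, not 3Play's actual system: `DubJob`, `machine_translate`, `human_review`, `run_pipeline`, and the idiom table are all hypothetical names, the translation step is a placeholder, and the lookup table merely stands in for the judgment of a human linguist.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical idiom table standing in for a human linguist's judgment.
# French "il pleut des cordes" is literally "it's raining ropes";
# Spanish "está lloviendo a cántaros" is literally "it's raining by the pitcher".
IDIOM_EQUIVALENTS = {
    ("it's raining cats and dogs", "fr"): "il pleut des cordes",
    ("it's raining cats and dogs", "es"): "está lloviendo a cántaros",
}


@dataclass
class DubJob:
    """Illustrative record that travels through the stages."""
    source_text: str
    target_lang: str
    voice: str = "match-original"  # or a chosen brand voice
    translated_text: str = ""
    review_notes: list[str] = field(default_factory=list)


def machine_translate(job: DubJob) -> DubJob:
    # Placeholder for the automated translation step: it only tags the
    # text as a literal rendering instead of calling a real model.
    job.translated_text = f"[literal {job.target_lang}] {job.source_text}"
    return job


def human_review(job: DubJob) -> DubJob:
    # Stand-in for the linguist review: replace known idioms with the
    # expression a native speaker would actually use.
    key = (job.source_text.lower().rstrip("."), job.target_lang)
    if key in IDIOM_EQUIVALENTS:
        job.translated_text = IDIOM_EQUIVALENTS[key]
        job.review_notes.append("idiom replaced with local equivalent")
    return job


def run_pipeline(source_text: str, target_lang: str,
                 voice: str = "match-original") -> DubJob:
    job = DubJob(source_text, target_lang, voice=voice)
    for stage in (machine_translate, human_review):
        job = stage(job)
    return job


if __name__ == "__main__":
    result = run_pipeline("It's raining cats and dogs.", "fr")
    print(result.translated_text, result.review_notes)
```

The point of the sketch is the ordering: the automated pass runs first for speed, and the review pass gets the final say on the text before any voice is synthesized.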

So where does that put us? Well, we don’t need actors. We don’t need studios. We don’t need a production process. So right off the bat, we’re going to cut 50% off the cost of traditional dubbing. And we’re still going to be able to do it in a way that’s high quality.

And then I’m also hearing from a lot of people that I’m talking to that time is a critical factor. So it could be an upcoming marketing campaign. It could be training that you need to adapt to a new market. All this stuff, speed matters. So we’re talking about turning around content in days as opposed to weeks.

And then finally, I’ve said this a few times, but quality, right? Your content’s integrity is essential. So our AI ensures consistency in things like terminology. Maybe you work for a large retailer and you have a custom language for your employees. Don’t worry. We’ll make sure that we’ve got that, that we’ve got it correct, and that we’re saying it and rendering it correctly. This human review maintains all that intent, all that tone, so that your message can still remain impactful. So yeah, we combine all these things to create a solution that’s efficient and affordable, but meets the very high standards that you’re demanding.

But I want to actually show you what this looks like in action. So what I’m going to do is I’m going to push play. This is a training video for miners. And I’ll just clarify if the AI is listening. That’s miners, not minors– another perfect example. This is miners with the pick, like “Minecraft.” And this video was created to give them an overview of their rights and responsibilities.

It was created for a Spanish audience. There’s a large population of native Spanish speakers in the industry. So let’s take a look. This is the original video. This is in Spanish.

Usually, we’re taking English content and localizing it to other languages. But since this is primarily an English-speaking audience, I’m going to try it differently. So we’re going to start with the Spanish version. And then I’ll show you the experience that your viewers would have because you’ll be experiencing it in English. OK. Enough talking. Let’s press Play.

[VIDEO PLAYBACK]

– [SPEAKING SPANISH]

JESSE ARISS: Don’t worry, we’re not going to watch the whole–

– [SPEAKING SPANISH]

[END PLAYBACK]

JESSE ARISS: All right. So let’s take a look now at what this would look like in English. So this is fully synthesized.

[VIDEO PLAYBACK]

– Mining and quarrying is an important job with large machines and impressive equipment. It has always been hard work and can be dangerous. November 20, 1968– the disaster at Farmington–

JESSE ARISS: What you’re listening to is a synthetic voice meant to match the voice of the speaker. I think that’s really cool.

[END PLAYBACK]

Perfect. So that’s really cool. We took a Spanish video, we ran it through this process, and it created an English version of that video. Sounded just like the original speaker. The timing was all lined up. All the cultural nuances were there. It’s really, really exciting.

And just imagine how that can benefit all that content you have. Whether it’s that existing training content or older e-learning modules or whatever it is, you can now very quickly get that into new markets, which obviously, as you likely know, is very important and powerful.

Some of those markets, let’s talk about that actually for a second here, so there’s lots of areas where it can make a huge impact. We’re seeing multinational companies where employees speak many different languages. Maybe they’re opening up a new branch in a new region. They need to localize those training materials, those internal communications that are confidential, making sure that everyone is on the same page. So, that safety video that you saw can significantly improve compliance, and, as a matter of fact, have an impact on the bottom line, which in this case is reducing workplace incidents.

If you’re a university, a college, or you’re responsible for e-learning, or you have an online learning platform, offering courses in multiple languages opens new doors to a global base. Students, we know this, are more likely to engage with and complete courses presented in their native language. And we know and we’ve seen that it improves the overall satisfaction and success that they have.

Marketing– I talked about marketing a little bit. I’m a marketer through and through. And marketing is all about making connections. So by tailoring your campaigns to local cultures, now you’re increasing engagement. You’re increasing conversion rates. Hello, marketers. If you’re a marketer, that’s an important word for you, right?

Imagine a commercial or an ad in another market that speaks the local language and incorporates the cultural references. And you don’t really have to do anything other than just upload it to 3Play, wait a few days, and you’ve got it. It’s really, really powerful stuff we’re talking about here.

I’ve also seen subscription video on-demand services use this. So there are lots of providers out there who have YouTube-like channels and YouTube-like content. This is a great way, again, to enhance the satisfaction of your viewers, but also expand your subscriber base. So in all these applications, we’re offering a scalable, efficient way to reach new markets.

And it’s cliche, but don’t take my word for it, right? We recently helped a nonprofit organization called Arc of Aurora. Now they’re dedicated to supporting families with developmental disabilities. And they provide things like resources, advocacy, and education to a diverse community. But they’re a nonprofit. They have a limited budget.

And there are folks who have developmental disabilities in their community who don’t speak English. So Arc needed to communicate a lot of important information to a huge multilingual audience, but didn’t have the money, they didn’t have the time, they didn’t have the expertise, they didn’t have someone on their staff. And they had tight deadlines.

So we worked with them. We worked with them to implement AI dubbing for their informational videos. Now they were able to see a significant rise in understanding and comprehension and engagement with their organization from non-English speakers.

They received several emails praising this. And they received recognition for their efforts to be more inclusive, which really aligned nicely with their mission, which was to advance their mission of support and advocacy. They were now able to do this for a broader audience.

So, I want to cover quickly one more thing, because I think it’s important that you take something away here. Regardless of whether you work with us or work with someone else, these are the questions that you need to be asking your vendor. These are important questions to ask.

Write these down. Or better yet, you can actually scan this QR code. And we’ve created a full free PDF. There is no gate. You scan that. It’s going to go right to the PDF. We’re not going to make you fill a bunch of stuff out. Just scan that. You’ve got it. It’s your guide. These are the– I’ll come back to it in one second, but these are the questions that you’re going to need to ask.

Make sure you’re asking about the quality. How are you handling things like what we talked about with lead versus lead? How are you handling things like “it’s raining cats and dogs” on the cultural sensitivity? What’s your cost? If you’re hearing, oh, it’s about $1 a minute, that to me is a red flag that you might have some concerns or questions, more questions to ask around quality and accuracy.

Find out what the technology is. Ask about the reputation. Ask about their customer support. Are they just going to sell it to you and then you’re on your own or are they going to hold your hand and help you through this to make sure that you see success?

And also security– what are they doing with your data? Are they training their models with your data? For $1 a minute, you might want to ask yourself that question.

Here’s the QR code again. So, by partnering with 3Play, I think there’s a lot we can do here with delivering tons of value, helping you reach those goals of connecting with your global audiences in a way that’s fast, affordable, kind of cool, right, and definitely scalable. So I’m going to stop there for a minute. And I’m just going to see if folks have any questions for me.

SOFIA LEIVA: Thanks, Jesse. Yes, we’ve had several questions come in, and I encourage you all to continue putting your questions in the chat window. The first question we had here is, how do you handle songs and lyrics?

JESSE ARISS: Oh, songs and lyrics, that’s a really great question. So this comes up obviously a lot. You’d be surprised. In corporate training videos, there’s a lot of background audio. So we do that with our technology. Our IP basically pulls out the two tracks. So you’ve got a speaker track and you’ve got an audio track.

So if you have a song, if you have music, if you have corporate, like, what would you call that? You know that cheesy music that goes in the back of corporate videos? You can have that. You can keep your cheesy music, no problem. So we pull that out. We do all the work on the language, on the speaking track, and then we put it back in over the audio so you can have it mixed back together.

Now, if– great question– if the song has lyrics in it, we’re going to leave those lyrics alone. We’re just going to focus on what’s being spoken. We’re not going to translate Justin Bieber or whatever. We’re just going to keep him doing his best. And we’re only going to focus on what’s being spoken.

If you wanted a different audio, that’s something you would have to work with your video team. That would be involved in the mastering process. And there’s things like rights as well that we don’t want to touch that too much. Thank you.
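
The "pull the tracks apart, dub the speech, mix it back" idea can be sketched as below. This is a simplified illustration, not 3Play's proprietary separation technology: it assumes the speech and background stems have already been separated, represents audio as plain lists of float samples in the range -1.0 to 1.0, and uses a hypothetical `remix` function with hypothetical gain parameters.

```python
from __future__ import annotations


def remix(dubbed_speech: list[float], background: list[float],
          speech_gain: float = 1.0, background_gain: float = 0.5) -> list[float]:
    """Overlay a dubbed speech stem on the untouched background stem.

    The background (music, lyrics, ambience) is left exactly as it was;
    only the speech stem is replaced by the dubbed version.
    """
    length = max(len(dubbed_speech), len(background))
    mixed = []
    for i in range(length):
        # Treat the shorter stem as silence once it runs out.
        speech = dubbed_speech[i] if i < len(dubbed_speech) else 0.0
        music = background[i] if i < len(background) else 0.0
        sample = speech_gain * speech + background_gain * music
        # Hard-clip to the valid range so the sum cannot overflow.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


if __name__ == "__main__":
    print(remix([0.5, 0.5], [0.25, 0.25, 0.25]))
```

Keeping the background stem untouched is what lets the music, including any sung lyrics, carry through unchanged while only the spoken track is replaced.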

SOFIA LEIVA: Thank you. The next question we have is, how much control do I have over the voice style, tone, accent? Can I customize it to my brand voice?

JESSE ARISS: Yeah, absolutely. So if you have a voice already that you’re working with, a professional, or you have a voice, you can absolutely do that. If you don’t mean the literal voice, if you mean your brand nuances, we do that as well. If you want to customize that voice, let’s customize it.

Out of the gate, we’ll give you a couple of different voices to look at. But if you’re not happy with those, no problem. Let’s work together. Tell us what you’re looking for. Maybe provide a sample of what kind of voice you’re looking for and we’ll absolutely make it work.

We’re very, like I said, this is hands on. We’re going to work with you throughout the process to make sure that you have exactly the solution you’re looking for.

SOFIA LEIVA: Thank you. In the example that you shared, how long did it take to create that dubbed audio track?

JESSE ARISS: Yeah, great question. So that example was– we created that, I would say, back in the summer of this year. Now the technology has changed a lot since then. As I mentioned, it’s changing by the month. But I would say for a video like that, that’s about 13 minutes long, I would expect eight, nine business days to make sure that you’re getting it back with the absolute accuracy that you deserve.

So that timeline is getting shorter. And it’s significantly shorter than the 30 days or so required with human dubbing. But I admit, it’s a little bit longer than some of this instant AI stuff. But again, that’s because we’re making sure that it’s perfect the first time. And in that sense, you’re actually saving time because you’re not going back and forth so much and the quality is guaranteed.

SOFIA LEIVA: Thank you. The next question we have here is, which languages does your solution support?

JESSE ARISS: Yeah. Thank you. That’s obviously a very important question. And the list of languages that we support is vast. We are adding new languages all the time. But right now, our primary expertise, and I’m just checking my notes down here, but where we can deliver the absolute best experience for you is going to be in Spanish, French, German, Italian, Portuguese, Dutch.

And I’ve seen some great work we’ve been doing in Japanese as well. And Japanese, now that’s a very unique language with lots of nuances. And that’s been really fun to see what the team has been able to do there.

SOFIA LEIVA: I think we have time for a couple more questions. The next one we have here is, is the tech available for dubbing with the person speaking, i.e. moving their lips to match the different languages? Or is it only voiceover?

JESSE ARISS: Oh, that’s a good question. Yeah, so what we’re talking about here, just so everyone’s on the same page, is actually making the lips move to match what’s being said. And no, we’re not doing that today. It’s absolutely– the whole industry is looking at that right now. We’re exploring it.

But you have to imagine, if you’re working with movie studios or big corporations, to do something like that, you’re actually looking at mastering their original video file. So there are some nuances around that.

I will say, though, today what we are doing is we are following the dubbing best practices that have been in place for dozens and dozens of years with human voice actors, which is ensuring that what’s being said matches frame perfect with the head that’s being shown on screen, as well as trying our best to make the words match the lips.

Again, another example in that human process, you may have a German word that is 15 syllables, but in English, it’s two syllables. What do we do in those situations? Well, that’s where that expert linguist comes in. And we try to find another way to say that to get across the same intent while still keeping everything as close to the original intent as possible.

And that’s something that you’re not going to be able to see with that pure AI solution. It just can’t happen. So we’re pretty proud of that. Thank you.

SOFIA LEIVA: Yes, definitely. And the last question that we’ll get to today is an interesting use case. I have an instructor with a heavy accent. Can you do an English-to-English dubbing?

JESSE ARISS: Can we do– you know what, I want to try it. That’s a really cool use case. I’m going to check who asked that because I think that’s fascinating. And remember, it’s not about trying to embarrass anyone or anything like that. It’s just about making sure that the content can be understood. That’s it.

That’s why we’re all here. We’re making this great content. We want it to be understood. So if you have an issue where the content isn’t able to be understood, whether it’s a different language or whether it’s an accent, that’s something we can help with. So I definitely think I would like to reach out to that person and chat with them on some of the options available there. Because the product team might not like it, but I think we can help you out.

SOFIA LEIVA: Definitely. Well, that’s all the time we have for today. But if you have any additional questions, you can always reach out to us or connect with us via LinkedIn. And thank you again, Jesse, for chatting with us today. And thank you, everyone, for joining us, and I hope you have a wonderful rest of your day.