The 3Play Way: AI Audio Description for Higher Education [TRANSCRIPT]
ELISA LEWIS: So thank you, everyone, for joining today’s session, The 3Play Way– AI, or Artificial Intelligence, Audio Description for Higher Education. My name is Elisa Lewis. I’m on the marketing team here at 3Play Media, and I’ll be moderating today’s webinar. I’m a female in my 30s. I have fair skin and dark hair, and I’m wearing a purple sweater today.
I will pass it off to Stephanie Titlebaum and Daria Ferdine, who have prepared a great presentation for you this afternoon, and I will let them introduce themselves.
STEPHANIE TITLEBAUM: Thank you, Elisa. Hi, everyone. My name is Stephanie, and I am the Director of Product Management for our captioning and audio description accessibility solutions. I use she/her pronouns, and I’m a white woman with long brown hair, and today I’m wearing a navy blue sweater. Daria?
DARIA FERDINE: Awesome. Good afternoon, everyone. My name is Daria Ferdine. I am a Product Marketing Manager here at 3Play Media. I also use she/her pronouns. I’m a white woman with light brown hair, and today I’m wearing a gray sweater. Excited to chat through. Thanks so much for driving today, Elisa. And thank you to Patty, our ASL interpreter, as well. Really excited to be here and have this conversation.
The direct result of a few hundred customer conversations, at this point, with our higher ed partners over the past year around ADA Title II initiatives and strategies has brought us to today’s topic, where we will be discussing what we would say is one of the more traditionally intimidating aspects of scaling accessibility: audio description.
To level set, throughout this presentation we will be using the terms AD and audio description interchangeably. We’ll first talk through audio description as an umbrella, understanding its role in video accessibility and how it works. Then Stephanie is going to talk through the AI AD approach here at 3Play, the process that goes into that creation, and best practices for publishing today.
She’s also got a few demo examples up her sleeve that I know we’re all really excited to see, specifically for what is working well with educational content. And then we’re going to talk through the impact that this technology has on higher education specifically, with a really deep emphasis on what to consider when scaling accessibility. So let’s dive in. Let’s do it. Next slide here, please. Amazing.
Audio description– what is it? Also referred to as description, video description, or AD, it is defined as the verbal depiction of key visual elements in media and live productions. For individuals who are blind or have low vision, audio description is definitely the key to revealing the detailed information that sighted people consume without a thought.
AD helps provide information on visual context that is considered essential to the comprehension of the program. A lot of times in these cases, not providing AD would hinder blind and low-vision participants from gaining a complete understanding of the program or content. Really important stuff is what we’re talking about today.
A few key terms that we really want you to familiarize yourselves with as we talk through the AD paths today– “standard” versus “extended.” This is always a fun one. These considerations frequently create a dilemma for audio describers: how can I provide accurate and high-quality descriptions in the appropriate amount of time on my file?
Really, there are two main buckets that we always want to consider here. With “standard” AD, descriptions fit into the natural pauses of the source content. With “extended” AD, the source content is paused as needed to make room for the descriptions.
In simple terms, if your video has a lot of space to describe, standard audio description will most likely meet your needs. But if your video doesn’t have a lot of natural space to add narration or snippets, extended is definitely the way to go. Next slide. Thank you.
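That “enough natural space” judgment can be made concrete with caption timecodes. As a minimal sketch (a hypothetical heuristic, not 3Play’s actual algorithm), sum the silent gaps between dialogue cues and compare the total to the description time you need:

```python
# Illustrative heuristic only, not 3Play's actual algorithm:
# choose standard vs extended AD from caption timecodes.

def total_gap_seconds(cues, video_length):
    """Sum the silent gaps between dialogue cues (start, end), in seconds."""
    gaps = 0.0
    prev_end = 0.0
    for start, end in sorted(cues):
        if start > prev_end:
            gaps += start - prev_end
        prev_end = max(prev_end, end)
    # Trailing silence after the last cue also counts as usable space.
    gaps += max(0.0, video_length - prev_end)
    return gaps

def suggest_ad_type(cues, video_length, description_seconds_needed):
    """Recommend 'standard' if natural pauses can hold the description."""
    if total_gap_seconds(cues, video_length) >= description_seconds_needed:
        return "standard"
    return "extended"

# A dialogue-heavy 60-second clip with only 8 seconds of pauses:
cues = [(0.0, 20.0), (22.0, 40.0), (46.0, 60.0)]
print(suggest_ad_type(cues, 60.0, 15.0))  # extended
```

A real workflow would also weigh where the gaps fall, not just their total, but the basic trade-off is the one Daria describes.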
Like everything in accessibility, quality matters. Quality is a huge priority for our customers, also a huge priority for us and our team here at 3Play as well. Our describers follow best practices and standards set by the Described and Captioned Media Program, also called the DCMP. The DCMP Description Key was set to ensure high-quality descriptions.
According to the DCMP Description Key, which is what is shown on the slide here today, audio description really should have five major benchmarks. I’m going to walk through the benchmarks here. So they should be accurate– no errors in pronunciation, word selections, diction, or enunciation.
They should be equal. Equal access requires that meaning and intention of the program must be conveyed. It also means that the describer should not interject their personal interpretations or opinions. We want to make sure things are appropriate. Keep the intended audience in mind. Be neutral, simple, succinct. We want to stay consistent. The content as well as the voicing should match the style, the tone, the pace of your program.
And prioritized– content essential to comprehension and enjoyment is of the utmost importance. The description should portray only what is physically observable, rather than motivations or intentions, and should complement the original content. So those are the five buckets we want to keep in mind when we’re thinking about audio description quality. Thank you to our friends at the DCMP. Awesome.
But when it comes to navigating audio description, determining how and where to publish definitely remains a large challenge in higher education. Restrictions within player capabilities are often the culprit here. Rest assured, there are existing options intended to best support your team’s current workflow.
But as the market develops broader support for audio description, here are a couple of options that we do have available today. One, if your player currently supports audio tracks– woot woot– simply upload that audio-described MP4 file to your hosted video platform. You could also publish a secondary video with the audio-described track as an option.
You can also share an AD track directly: download that MP4 file and share it on your site. Or you could use 3Play’s Access Player– double access, double the value there– which is our homegrown accessibility media player that works with your existing media player of choice.
STEPHANIE TITLEBAUM: Great. Thank you, Daria. Now that we know what audio description is, let’s talk about how we’re doing audio description at 3Play Media. So the diagram displayed on screen here shows the different types of audio description that 3Play offers in each step of that description solution.
The top row of the diagram shows our historical AD solutions offered as professional AD today. You may also be familiar with these as voice artist audio description or synthesized audio description. In both solutions, we offer human scripting and the option to use AI voices or voice artists for that voiceover step. Our new service is simply AI for both the scripting and voiceover step. Let’s talk more about that on the next slide.
Earlier this year, we released a solution in partnership with universities sharing the same common concerns, as Daria mentioned earlier. They were all looking to specifically tackle their large backlogs of existing video. Leveraging our patented AI technology, we saw this as an opportunity to help higher education teams do more for less. AI audio descriptions are allowing schools to successfully scale their accessibility efforts and execute on their strategies.
So what is AI AD? It’s exactly that: scripted and voiced with AI tools to ensure we’re describing accurately, consistently, and appropriately, while also ensuring the right visual information on screen is prioritized in the limited time that the video allows. AI AD offers the same familiar order options as our professional AD services, with both standard and extended AD. Coming soon, we’ll be able to support an upgrade path if you want to add human review to the AI AD.
So how are we doing all of this? We’re constantly evolving which AI tools we use to analyze the full video through our patented process. We’re using content-based prompts to apply best practices, in combination with feedback from our team of expert describers and technology, to produce a quality description.
And then, looping this all back– why are we doing it? Historically, AD is slow and expensive, and that creates challenges for tackling large volumes of content. AI AD makes AD accessible as a solution, and it allows for optionality, with human review as an add-on. If we want to go to the next slide– thank you.
So let’s dive into that AI AD process a little deeper. The diagram on screen displays the different steps your video goes through, from upload to delivery, when AI AD is ordered. We’re going to start with the ingestion step, which happens when a customer orders the service and the video is uploaded to our platform. Next, we transcribe the video to create a transcript and captions. This step can be AI-only or AI with human editing.
We’re then going to use the timecodes produced to identify where we can insert the description in the video. This is critical to ensure the description does not overlap with the dialogue in the video, and it is especially important in standard AD, where we’re limited to those natural pauses in the dialogue.
The following steps get into the specific AD service. We’ll use our patented AI tooling, the uploaded video, and the produced transcript to generate the description script, then run that through our AI voice solution before finalizing the audio mix to ensure the AI AD track fits seamlessly with the audio in the video. And then lastly, we deliver that video back to you as a text output, audio file, or video file.
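The steps Stephanie walks through– ingest, transcribe, find the gaps, script, voice, mix, deliver– can be pictured as a simple pipeline. The stage names and stubbed data below are hypothetical, purely to illustrate the order of operations, not 3Play’s API:

```python
# Hypothetical sketch of the AI AD pipeline order described above.
# Every stage is a stub with invented names and data, not 3Play's API.

def ingest(video_path):
    return {"video": video_path, "steps": ["ingest"]}

def transcribe(job):
    # AI (optionally human-edited) transcript with timecodes.
    job["transcript"] = [(0.0, 5.0, "Welcome to the lecture.")]
    job["steps"].append("transcribe")
    return job

def find_description_slots(job):
    # Timecodes mark where description fits without overlapping dialogue.
    job["slots"] = [(5.0, 9.0)]
    job["steps"].append("find_slots")
    return job

def generate_script(job):
    # The patented AI tooling would use the video plus transcript here.
    job["script"] = [(5.0, "The lecturer stands at the podium.")]
    job["steps"].append("script")
    return job

def synthesize_and_mix(job):
    # AI voiceover, then an audio mix so the AD track sits in the gaps.
    job["steps"] += ["voice", "mix"]
    return job

def run_ai_ad(video_path):
    job = ingest(video_path)
    for stage in (transcribe, find_description_slots,
                  generate_script, synthesize_and_mix):
        job = stage(job)
    return job  # delivered back as a text, audio, or video output

print(run_ai_ad("lecture.mp4")["steps"])
```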
Next, we’re going to dive into some demos so you can all see what we’re working with. This first one– there are going to be three in total– will be an example of an online lecture, and in this one, we’re talking about iOS apps.
[VIDEO PLAYBACK]
– Aerial view of Stanford University campus. The camera flies over the Memorial Church and surrounding buildings.
– Stanford University.
– Text, “School of Engineering, Developing Apps for iOS, CS 193, P Lecture 1. One, Overview of iOS, Objective-C, September 23, 2013. For more Stanford Engineering courses, visit see.stanford.edu, space, scpd.stanford.edu.”
– All right, so welcome to Stanford CS193P, Fall of 2013-’14 academic year. This is our very first lecture.
– The lecturer stands at the podium.
– And we are going to be covering developing applications for iOS, specifically iOS 7. Today’s lecture kind of has a first part and a second part. The first part is a little bit of logistics. The second part is I’m going to dive right into the course material because there’s a lot to cover–
– The lecturer stands at the podium, gesturing with his hands.
– –so I need to start covering it. So what will I learn in this course? I think you know what you’re going to learn in this course. You’re going to learn how to build cool apps.
– Text– “How to build cool apps. Easy to build even very complex applications. Result lives in your pocket or backpack. Very easy to distribute your application through the App Store. Vibrant development community.” A picture-in-picture window of the lecturer appears in the bottom right corner.
– OK, iOS is a really cool platform for building apps. You probably already know that. The apps look cool. They’ve got animation. What’s really–
[END PLAYBACK]
STEPHANIE TITLEBAUM: Great. So this example does a really good job of capturing all of the on-screen text that could be displayed in a lecture-type video. This is especially important in lecture content. We want to make sure all students have equal opportunity to engage with the content.
But it also doesn’t do the best that it could with the AI– it wouldn’t be fair if we were showing you all perfect, shiny demos. In that first demo, you might have noticed some inconsistent tone and timing in the way the AI voice read all of the text on screen. It’s important to highlight that these are things we’re working on and improving as we iterate on the new service.
This next demo is going to be on more promotional content. You’ll notice that there’s no dialogue in this video, so pay attention to how the AD is inserted in this one.
[VIDEO PLAYBACK]
KidKraft, Made For Make Believe, trademark. Children run to a wooden playset. A boy swings upside down. A girl slides down a yellow spiral slide. A boy slides down a straight yellow slide. A girl opens the playhouse mailbox. Another girl sets the clock. A boy climbs a wooden ladder.
Castle wood playset. A boy pretends to cook on a play stove and talks on a play phone. Children sit in the playhouse.
[MUFFLE SOUND]
Close up on KidKraft sign on playset. View of the playhouse roof.
[END PLAYBACK]
I think that was all of it– yep. So this is an example of promotional content. There’s no dialogue, so the AI AD really has a lot of freedom to pick and choose which visual information and moments on screen get captured. On the other side, that freedom can also make it difficult to ensure it’s picking and choosing the right things. But I think this one’s a pretty good example of what it can do and how it can describe many different things happening throughout the video.
This last video is what would be used as supplementary course content in a lecture. It’s more of a visual, documentary-style video.
[VIDEO PLAYBACK]
– It’ll be me. I just made a promise that the rest of my life, I’d just choose to do good things for other people.
– The man in a green shirt looks off to the side. Text– “In 2010, he moved his family from the city to a small village outside of Jakarta.” A rural scene with a building and trees is visible in early morning or evening light. A makeshift wind turbine spins. A person walks past a house.
– Indonesia is already a developing country. And then my village, where I live, is probably one of the underdeveloped villages.
– A man, woman, and young girl walk down a dirt road. The man talks to an older man next to a clothesline. Two children play.
– I’m not saying you have to sell your houses, your cars and everything. Well, that’s what I did, but I’m not imposing the values to other people. That’s what I needed to do. What I’m saying is that maybe there’s somebody that needs to build a whatever– the houses or irrigation system or something or anything.
– A woman uses a pickax to break up dirt and roots on a muddy slope. A dark blue Ford Ranger pickup truck is parked near some trees and a brick wall. The truck is seen driving down a dirt road.
– There’s a need in the village. You go out and you be the answer to the need.
[END PLAYBACK]
STEPHANIE TITLEBAUM: Great. So in any of the demos we just played, you may have noticed that the audio description can sometimes come earlier or later than the visual information happening on screen. It’s important to remember that, from the point of view of someone using audio description, they’re just trying to get– and we’re trying to provide– equal access to the visual information happening in support of the dialogue.
This example was also extended: we were pausing the video to insert extra information because the dialogue was so important, and all of the details shown on screen were setting the scene for what the speaker was talking about.
I’ll hand it back to Daria.
DARIA FERDINE: Amazing. Thanks, Stephanie. Awesome. So I just wanted to go over what this technology means for higher education and for all of you here today. For one, scaling accessibility, quite literally, has never been more possible than it is now. Your teams have needed the ability to do more for less, like we mentioned earlier, and now is really the time to let technology drive scalability so that you can concentrate on what matters most– quality and accommodation needs specifically.
I do want to call out, just going off of the demos as well, that throughout the entire development process of this service, we have been deeply committed to making sure the voices of users were central. We most recently partnered with an organization called Knowbility on a round of user feedback that we actually just wrapped up. Shout out if anybody from Knowbility is on today’s call.
And we understand that the best way to create something truly meaningful is to listen, to learn, and to continuously improve. We have seen substantial improvement in this AI in the last two weeks based just on user feedback and keeping those user voices front and center.
Gathering invaluable feedback from those who matter most– the people who use it– is really a north star for us here at 3Play. By bringing their insights specifically into the design and refinement phases, we ensure that the final product isn’t just built for them, but built with them.
So the feedback that we got was actually super exciting. Users are really optimistic about the barriers this technology will help dismantle in creating more universal independence when it comes to interacting with video. So as far as executing those ADA Title II strategies go– deep breath– we’re all in this together. Leveraging AI is going to help tremendously when it comes to expediting workflows and staying within, or even below, budget.
But most importantly, it’s helping support your teams and move the needle forward on promoting inclusivity, ensuring fair treatment, and creating equal access to publicly available resources that hasn’t been there before. Overall, this AI is a really great stepping stone toward full compliance in 2026, and it would be a major miss not to utilize it. Can we go to the next slide?
Speaking of which– didn’t mean to jump scare anyone here, but I did want to resurface our ADA Title II compliance timeline view. We shared this on a general accessibility webinar last month, and it was a really big hit with a lot of you, but I wanted to show it with an updated view of where we’re at throughout the year.
At this point, we recommend that you be rounding out your solution-gathering phase and really start fine-tuning the prioritization of the content you’re looking at. Budgeting season is also underway for most of you, so getting really clear about the resources you will need to successfully execute your accessibility strategies from start to finish, in order to scale at your school, is going to be key.
Awesome, thank you. We really appreciate your time today and super look forward to partnering on all things video accessibility. We did want to allot time for questions today, so I think I will pull Elisa back in if that is–
ELISA LEWIS: Perfect.
DARIA FERDINE: Amazing, hi.
ELISA LEWIS: Hi, I’m back. Thanks, everyone. Lots of great questions already coming in through the chat and Q&A window. So definitely please keep those coming as we get started.
The first question that we have from an attendee is, can you speak more to what you mean by publishing AD? If I understand correctly, it is where people are recording and then uploading to a document, site, or publication.
DARIA FERDINE: Steph, you want to go?
STEPHANIE TITLEBAUM: Yeah, I just don’t want to talk over anyone. So when we talk about publishing audio description, we’re talking about publishing the described transcript or audio with the video. And as Daria spoke to before, we have a couple of solutions to do that.
If you are publishing lecture content in a lecture capture system or a video player, you can do that by embedding the audio track with the video if that player supports that, or you can also publish a second video. We have a couple other solutions, and we can share those resources after this.
ELISA LEWIS: Great. Thank you. Someone also asked in regards to publishing audio description, can you download a VTT file to upload for audio description in Panopto?
STEPHANIE TITLEBAUM: Yes. So our audio description outputs can be anything from a caption file, like a VTT, or a text transcript. It can also be an audio file, like an MP3, or video file, like an MP4. So anything that comes with media, we can probably produce.
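For anyone unfamiliar with the VTT format mentioned here, a description track is just timed text cues. This small sketch, with invented cue text rather than actual 3Play output, serializes timed descriptions into WebVTT:

```python
# Illustrative sketch: serialize timed description cues as WebVTT.
# The cue text below is invented for the example, not 3Play output.

def to_timestamp(seconds):
    """Format seconds as a WebVTT hh:mm:ss.mmm timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return "%02d:%02d:%06.3f" % (h, m, s)

def to_vtt(cues):
    """cues: list of (start_seconds, end_seconds, description_text)."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append("%s --> %s" % (to_timestamp(start), to_timestamp(end)))
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

cues = [(12.0, 15.5, "The lecturer stands at the podium."),
        (42.0, 46.0, "A slide lists course logistics.")]
print(to_vtt(cues))
```

A file like this is what a player such as Panopto would ingest as a text-based description track, alongside the MP3/MP4 options Stephanie mentions.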
ELISA LEWIS: Thank you. We have a couple questions around voice options. Someone is asking, are there voice options? And if so, how many? And then a quick follow to that is, can I request certain voice styles?
STEPHANIE TITLEBAUM: That is a great question. And I don’t actually know how many voices we have off the top of my head, but we have a variety of different voices that support different dialects as well as different gender or gender-neutral styles. These are all available through the settings in your 3Play Media account. You can sample them and also play with different speeds. Those that use audio description are typically used to a faster-than-natural speaking rate, so we also allow that optionality.
DARIA FERDINE: And we definitely welcome requests– if there are specific voices or accents that are really prevalent within your demographic that you would be interested in having for audio description, definitely bring that to the conversation, or feel free to throw them in the chat as well. That’s a good place for us to log that.
ELISA LEWIS: Great. Thank you both. Another question came in. Do you do gender-neutral descriptions for people? They’re specifically thinking of the kids playing on the playground, and whether their gender was important or any more important than the colors of their skin or their clothing.
STEPHANIE TITLEBAUM: I love that this question came up. It’s a big discussion that we’ve had internally, and we want to make sure that we’re always describing fairly and accurately. So we can cater to what your preferences are. We’ll be able to, in the future, provide guidelines that we want to follow to support either your brand or just general preferences. So something like this we could feed into the AI to generate more gender-neutral descriptions rather than identifying gender. It’s essentially customizable.
ELISA LEWIS: Great. Thank you. Someone else was curious. Is there any reason that this AI approach couldn’t work in the pre-K through 12 space?
STEPHANIE TITLEBAUM: I don’t think so. I don’t know that we’ve tested with any of that specific content, but we are definitely open to it. We’re finding that the educational content with someone speaking and there being images on screen works really well. We’re also finding that sometimes we need to use extended AD in those scenarios, just to make sure we do have enough space to describe all of that information.
DARIA FERDINE: Yeah, and I’ll piggyback off of that by saying the AI AD solution in particular was really built with these larger backlog projects in mind, where you have a lot of low-risk content. For videos that are high-visibility, that live within the first few clicks of your website, or that serve accommodation needs, 100%, we definitely recommend exploring a fully compliant professional solution.
ELISA LEWIS: Great. And on a similar note, someone’s asking, are there certain types of content that this solution works better on?
STEPHANIE TITLEBAUM: Definitely.
DARIA FERDINE: I could talk all day about this. Go ahead.
STEPHANIE TITLEBAUM: Definitely. We are seeing better outputs when we’re working with simpler, less highly technical content. Highly technical STEM courses can be a little bit challenging. We are continuously testing and improving our models, so what we produced about three weeks ago will already have improved three weeks from now.
So what I would say to that is definitely reach out to your 3Play rep or our 3Play team in general, and we can check out your content and test with it. As we get more familiar with the different types of highly-technical content, we get more comfortable with it.
DARIA FERDINE: But this is also why it’s really important to recognize the deadlines on things, so that you can be having these conversations about whether the AI is good enough for your content today, versus waiting until April 1, 2026, and scrambling to meet compliance.
ELISA LEWIS: Great point. Thank you. Someone else is asking about the difference between standard and extended, and they’re wondering if it will automatically be selected, whether the video is processed with standard or extended audio description.
STEPHANIE TITLEBAUM: Do you want me to take that one?
DARIA FERDINE: Yeah, go for it.
STEPHANIE TITLEBAUM: Cool. At this time, we’re not automatically detecting it. We do have an option with our professional AD solutions, where we use the gaps in the speech to analyze if we think it should be standard or extended. We have yet to build that out on the AI AD solution, but we hope to be able to model that workflow in the future.
ELISA LEWIS: Great. We have a few cost-related questions. The first question is, how much does this cost generally? And then the second question is, how does the AI audio description compare cost-wise to human audio description?
DARIA FERDINE: Go for it, Steph.
STEPHANIE TITLEBAUM: Cool. So standard AI AD costs around $1 a minute. This can definitely be adjusted or discounted if we’re talking about backlogs, so we’re flexible in working with what you’re doing and what your captioning requirements are, and bundling that all together. This is drastically less than a human description solution– our human description solution is up near $9 a minute, and comparing that to $1, you’re in pretty good shape there.
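As a back-of-envelope illustration using the roughly $1-per-minute and $9-per-minute figures quoted above (before any backlog or bundle discounts, and with a hypothetical backlog size):

```python
# Back-of-envelope comparison using the approximate rates quoted above
# (~$1/min AI AD vs ~$9/min professional AD, before any discounts).

AI_RATE = 1.0   # dollars per video minute (approximate)
PRO_RATE = 9.0  # dollars per video minute (approximate)

backlog_minutes = 500 * 30  # e.g. a hypothetical 500 half-hour lectures

ai_cost = backlog_minutes * AI_RATE
pro_cost = backlog_minutes * PRO_RATE
print(ai_cost)              # 15000.0
print(pro_cost)             # 135000.0
print(pro_cost - ai_cost)   # 120000.0 saved on this hypothetical backlog
```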
DARIA FERDINE: Yeah, one more thing I’ll add to that as well– a pitfall that I see a lot of my customers fall into is the decentralization of their different divisions and teams throughout their university. If you can, it very much behooves you to have a central person helping with these ADA Title II strategies from a more holistic view. If you have five different departments paying for five different captioning contracts, it’s much easier to get that cost a lot lower when you start centralizing your budgets.
And then also, of course, over time, that’s just a cost savings for your university as a whole. So if you need help on ways to position that to your university, on how to have those conversations with procurement or your budgeting department, please loop us into those conversations as well. We have a ton of experience with helping schools centralize this process specifically. More than happy to share best practices there.
ELISA LEWIS: Great. Thank you. We have a few questions coming in around how the AI audio description works. Someone’s asking, does your AI look at the content of the transcript and try not to repeat the content that the speaker is going to present? For example, it doesn’t need to read the content off the screen if the speaker says everything.
STEPHANIE TITLEBAUM: Yep, that’s a great question. So when we are producing the AI audio description, we are prompting our models as well as using the video itself and the transcript that we have. That transcript is providing context to the prompt to ensure that we don’t repeat anything in the description that is also said in the dialogue.
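The “transcript as context” idea can be pictured as a prompt assembled like the sketch below. The wording and the function name are invented for illustration; this is not 3Play’s actual (proprietary) prompting:

```python
# Illustrative sketch of feeding the transcript into a description prompt
# so the model avoids repeating spoken dialogue. The wording and names
# here are invented for the example, not 3Play's actual prompts.

def build_description_prompt(transcript_lines):
    transcript = "\n".join(transcript_lines)
    return (
        "Describe the key visual elements of this video segment.\n"
        "Do NOT repeat information already spoken in the dialogue below.\n"
        "Be objective, succinct, and prioritize what is essential.\n\n"
        "DIALOGUE:\n" + transcript
    )

prompt = build_description_prompt([
    "Welcome to Stanford CS193P.",
    "Today we cover developing applications for iOS.",
])
print(prompt.splitlines()[0])
```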
ELISA LEWIS: Thank you. Similarly, someone’s asking, does this solution scan to determine if the media should have audio description at all? Or does it generate audio description regardless of whether it’s actually needed or not? For example, a lecture capture video, where the professor does an excellent job describing the visual elements on their slide deck already has descriptions.
STEPHANIE TITLEBAUM: Yeah, that’s a great question, especially in the context of talking about large video backlogs. We understand that when you have thousands and thousands of videos, it’s hard to know what needs audio description. Under Title II, we do want to make sure those videos have audio description, so we would recommend providing it, and AI AD is a good solution in that context.
But to answer the question specifically, we are not doing anything to scan whether or not something needs audio description. In most cases, it probably should.
ELISA LEWIS: Great. And another question around how AI audio description works– someone’s asking, do you make sure that AI audio description doesn’t interrupt in the middle of a sentence?
STEPHANIE TITLEBAUM: Good question. So this is actually something we have improved over time. In our original audio description outputs a couple of months ago, we were seeing this interruption, and it was not a great experience. It was actually called out in the user study we did as being distracting. So in later improvements and later releases, we’ve improved this to make sure that in large paragraphs of speech, the audio description does less interrupting, especially when we’re looking at extended audio description.
ELISA LEWIS: Great. Thank you. And then there are a couple of questions about editing specifically. Someone is asking if you can review and edit the audio description. An example that they gave was in the example where they said the people’s gender, or maybe perhaps guessed, can this be changed manually if they were to get it wrong or, again, if someone went back and wanted it to be more gender neutral?
STEPHANIE TITLEBAUM: Definitely. We have an audio description editing tool in our platform already today, and it works great with AI AD. It allows you to go in, you can even do a search to pull up all the segments where maybe gender is mentioned, and you can quickly replace that or change it to your liking.
ELISA LEWIS: Someone else is asking if the demos are available anywhere so that they can share with their colleagues.
DARIA FERDINE: They are not posted on our site today. However, that’s definitely something we can send you, for sure.
ELISA LEWIS: And just a reminder, this session is recorded and you’ll receive the recording and slide deck, so you can absolutely share that recording out with colleagues and throughout your department.
DARIA FERDINE: –your question, send me an email, too. I got you.
ELISA LEWIS: Thank you. And then we have another question coming in. Someone’s asking, are there other language options?
STEPHANIE TITLEBAUM: Today, we do not have other language options for AI AD. We do have a Spanish audio description solution within professional audio description. We hope to be able to expand to other languages in the future; we just want to master and perfect our English AI AD offering before we move on.
DARIA FERDINE: Yeah, and I will say, too, just off of that, as a company we are investing heavily in localization and globalization, so we are very well positioned to support those languages in the future. It would be really helpful, my higher education friends, if there are specific languages that would be super useful for your demographic– please feel free to put those in the chat, or send them to me or Stephanie directly, so we know which ones to prioritize.
ELISA LEWIS: Great. Thank you. We’ve covered a lot of great questions today, as well as the great presentation from Stephanie and Daria. So I think we’re going to wrap it up here for the moment. I want to thank you so much, Daria and Stephanie, for the wonderful presentation and for sharing so much great information with us today. Thank you to our audience for participating, answering the poll questions, and asking great questions. I hope you all have a great rest of the day.