Editing Auto-Captions, Auto-Captions Only, or Re-Captioning

August 25, 2017 BY PATRICK LOFTUS
Updated: February 2, 2021

Image: An old 1950s-style wind-up robot with a speech bubble that says, “I think, therefore I ham.”
Nobody’s perfect. Not even computers…

While artificial intelligence still isn’t that great at understanding human speech, reliance on automatic captioning is pervasive. In fact, YouTube recently announced that its auto-captioning feature, which relies on artificial intelligence for speech recognition, has added automatic captions to over 1 billion videos.

Automatic captions, or auto-captions, rely on automatic speech recognition (ASR) software to convert spoken audio into on-screen text. The problem is that this technology cannot be relied on by itself to produce captions and transcripts that accurately reproduce the spoken audio, and inaccurate captions exclude the people who cannot hear it.

For our multi-industry report, the 2017 State of Captioning, we surveyed over 1,400 people and asked how captioning is done at their organizations.

One of the questions we asked in our survey was, “Do you use automatic captions?” The results are shown in the chart below:

Column chart showing use of automatic captions: 50.97% don't use automatic captions; 26.68% do, but with cleanup; 18.81% do for some videos; 3.53% do for all videos.

Most organizations (50.97%) do not use auto-captions at all. Of the remaining 49.03% that do use auto-captions, more than half (26.68% of all respondents) clean them up in post-production. However, a shockingly high 22.34% of respondents said their organization relies on auto-captions alone for some or all of their videos.
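The percentages in this breakdown can be re-derived from the four chart values. Here is a minimal Python sketch of that arithmetic; the dictionary keys and variable names are made up for illustration and do not come from the report itself.

```python
# Survey shares from the chart above, as percentages of all respondents.
shares = {
    "no_auto_captions": 50.97,
    "auto_with_cleanup": 26.68,
    "auto_for_some_videos": 18.81,
    "auto_for_all_videos": 3.53,
}

# Everyone who uses auto-captions in some form.
uses_auto = round(100 - shares["no_auto_captions"], 2)

# Respondents who publish auto-captions without cleanup (some or all videos).
auto_only = round(
    shares["auto_for_some_videos"] + shares["auto_for_all_videos"], 2
)

# Share of auto-caption users who clean the captions up before publishing.
cleanup_among_users = round(shares["auto_with_cleanup"] / uses_auto * 100, 1)

print(uses_auto)            # 49.03
print(auto_only)            # 22.34
print(cleanup_among_users)  # 54.4
```

In other words, roughly 54% of the organizations that use auto-captions edit them, which is where "more than half" comes from.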

To give organizations an idea of which process works best for their needs, we’re going to take a look at the major pros and cons of publishing videos with “cleaned-up” auto-captions, relying on auto-captions only, and re-captioning videos entirely in post-production.

Editing Auto-Captions

The most efficient captioning processes combine both speech recognition and human editing.

Speech recognition software is often free or very inexpensive. YouTube is probably the best-known source of auto-captions because it automatically adds them to user-uploaded videos and has a free interface that allows users to edit those captions for timing, spelling, and punctuation accuracy.

Image: YouTube’s free auto-caption editing interface.

To get free, accurate, high-quality captions using YouTube, you would have to:

1. Upload all your videos to YouTube.
2. Let the ASR produce a rough transcript.
3. Go into the editor and clean up the spelling, grammar, punctuation, and speaker labels.
4. Edit the time codes to align caption frames with the spoken audio.
Boom. Voilà. Free closed captions! Repeat steps 1 through 4 for each video.
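
Aligning time codes is often the most tedious part of that cleanup. When a whole caption file is early or late by a constant amount, the time codes can be shifted in bulk instead of one frame at a time. Here is a minimal Python sketch, assuming the captions have been downloaded as an SRT file (one common caption format). The `shift_srt` function and the sample captions are hypothetical, written for illustration, and are not part of any YouTube tool.

```python
import re
from datetime import timedelta

# Matches SRT timestamps such as 00:01:02,500 (hours:minutes:seconds,milliseconds).
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, offset_seconds: float) -> str:
    """Return srt_text with every timestamp moved by offset_seconds.

    Timestamps that would become negative are clamped to zero.
    """
    offset = timedelta(seconds=offset_seconds)

    def shift(match: re.Match) -> str:
        h, m, s, ms = (int(group) for group in match.groups())
        moved = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
        # Integer milliseconds, so no floating-point rounding surprises.
        total_ms = max(0, moved // timedelta(milliseconds=1))
        h, rest = divmod(total_ms, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return TIMESTAMP.sub(shift, srt_text)


sample = """1
00:00:01,000 --> 00:00:03,500
Nobody's perfect.

2
00:00:03,600 --> 00:00:05,000
Not even computers.
"""

# Delay every caption by three quarters of a second.
print(shift_srt(sample, 0.75))
```

A constant offset only fixes captions that drift by the same amount throughout. Frames that are individually early or late still need to be adjusted by hand in a caption editor.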

So, when planning on using a workflow that relies on editing auto-captions, consider these pros and cons…

Pros:

Free in most cases
Works well with very clean, simple audio
Captions can be downloaded in a few common formats

Cons:

Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
Editing can be a huge time commitment
All your videos will have to be on YouTube

Using Auto-Captions Only

As stated earlier, auto-captions are free. That is why they are so attractive, and why 22.34% of respondents’ organizations rely on them alone as a video captioning solution.

They are also infamously inaccurate. Not only is this frustrating for those of us who cannot or do not want to hear the audio, but it could also land your organization in a lot of legal trouble.

Organizations that rely exclusively on ASR to caption their videos risk excluding deaf and hard of hearing users who want to watch that video content, too. Harvard and MIT are currently engaged in a lawsuit brought by the National Association of the Deaf (NAD) for using inaccurate captions (made using only ASR) on their online course videos.
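
To put a number on how inaccurate a set of auto-captions is, one widely used measure is word error rate: the number of word substitutions, deletions, and insertions needed to turn the captions into a correct transcript, divided by the number of words in that transcript. The Python sketch below is an illustrative assumption, not the method used in our survey or by any particular vendor, and the `word_error_rate` function name is made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "I think therefore I am"      # what was actually said
auto_caption = "I think therefore I ham"  # what the ASR produced

print(word_error_rate(reference, auto_caption))  # 0.2
```

One wrong word out of five gives an error rate of 20%. This simple version ignores punctuation, speaker labels, and timing, all of which also affect caption quality.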

So, when planning on using a workflow that relies solely on auto-captions, consider these pros and cons…

Pros:

They’re free
Works well with very clean, simple audio
Captions can be downloaded in a few common formats

Cons:

Doesn’t work well with audio that has multiple speakers, difficult accents, fast speakers, or background noise
Proper spelling, grammar, punctuation, time-synchronization, and speaker labels will be lost
You exclude those who cannot hear and those who do not want the audio on
You can put your organization at legal risk for discriminating against deaf and hard of hearing users

Re-Captioning in Post-Production

This method works best if you’ve used live captions — which usually don’t have time stamps to align with the spoken audio — or have a video publishing process that isn’t compatible with using YouTube’s auto-captioning feature every time.

Editing Live Captions

There are some great live captioning services out there, and many of the good ones are very accurate. However, using a live captioning service and editing those captions in post-production is often expensive and fraught with difficulties.

Column chart showing how organizations caption a recorded video after a live-streamed event. 17.84% of organizations use live captions for the recording of a video; 30.48% edit and republish the live captions; 51.67% get them recaptioned.

Chart representing answers to the question, “Do you re-use live captions?”

Because live captions don’t translate well in post-production, most organizations (51.67%) get video recordings of live events re-captioned.

So the two best options are either to caption your videos in-house or to outsource the work to a premium captioning vendor to ensure quality standards are met.

So, when planning on using a workflow that relies on re-captioning in post-production, consider these pros and cons…

Pros:

Highest caption accuracy possible
With a captioning vendor, significant time is saved
Peace of mind knowing your caption quality will be consistent

Cons:

Not free if using a captioning vendor
Heavy time commitment for in-house captioning

Captioning needs are increasing year after year.

So, for organizations that want to accommodate both those who watch video without sound and those who cannot hear, it is in their best interest to seek the most efficient and scalable captioning solution that works for their purposes. Whether it’s in-house, through a third-party service, or a combination of both, what matters most is that your process produces captions that are accurate and high-quality.

—

Read our full report, the 2017 State of Captioning.

If you’re looking for a vendor that leverages both speech recognition and human editing to produce high-quality captions, check out our pricing today!
