The Current State of Automatic Speech Recognition: A Report

April 20, 2022 BY ELISA LEWIS

Read the 2022 State of Automatic Speech Recognition [Free Report]

3Play Media conducts annual research to determine the current state of automatic speech recognition (ASR) technology. The study looks at the general state of speech-to-text technology and evaluates how the top speech recognition engines perform specifically on captioning and transcription. Our research found that the accuracy of the technology has improved measurably since the company’s last report, published in January 2021.

3Play Media tested all engines using a large dataset representative of 3Play Media’s diverse customer base. Accuracy was evaluated against two measurements: Word Error Rate (WER) and Formatted Error Rate (FER), which includes formatting errors like grammar, speaker identification, and non-speech elements in addition to word errors.
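
For readers unfamiliar with the metric, WER is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that textbook definition only; it is not 3Play Media’s scoring pipeline. The function name `word_error_rate`, the lowercase and whitespace-split normalization, and the sample sentences are all assumptions made for illustration. An FER-style score would additionally count formatting errors such as punctuation and speaker labels, which this example does not attempt.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (substitutions + deletions + insertions) / reference word count.

    Normalization here (lowercasing, splitting on whitespace) is an
    illustrative assumption, not the method used in the report.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far (excluding the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("fox" -> "box") and one deletion ("the").
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown box jumps over lazy dog"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.3f}, accuracy: {1.0 - wer:.1%}")  # WER: 0.222, accuracy: 77.8%
```

Under this definition, 99% accuracy corresponds to a WER of 1%, or roughly one word error per 100 reference words.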

In both WER and FER measurements, Speechmatics with 3Play modeling and post-processing led the pack, followed by Speechmatics alone and Microsoft. Rev, Google VM, and Voicegain followed, each with respectable scores that were close enough to make these vendors hard to differentiate. Despite exciting improvement across the board, all engines performed well below the industry standard of 99% accuracy, confirming that ASR on its own still falls short of being “good enough” for compliance with closed captioning legal requirements.

“As the AI models driving ASR continue to evolve, many of the engines we evaluated have shown significant strides in their transcription accuracy over the last two years,” Chris Antunes, Co-CEO and Co-Founder, 3Play Media, said. “We run this report every year because we use ASR in our own transcription process, and we have a vested interest in making sure we’re utilizing the best engine on the market. Speechmatics remains a clear industry leader in both pre-recorded and live automated transcription, and applying 3Play’s mappings and post-processing resulted in an exciting improvement in word error rate of over 8%.”

The study showed a wide range in accuracy among the technology tested, with the highest and lowest performing engines differing by over 15 percentage points. This suggests that different engines are optimizing for different goals, and that some ASR engines will not perform well for transcription. Compared with other uses of speech-to-text technology, such as automated assistants that can train on a specific voice, transcription is a very difficult task, with variables like diverse sentence structure and spontaneous speech, specialized terminology, and complex patterns including multiple speakers, accents, and background noise.

Accuracy is critical in captioning for a number of reasons, the most important being that individuals who are d/Deaf or hard of hearing rely on captions as an accommodation. Accurate captions also improve viewer engagement – studies show that captions improve watch time, brand recognition, and comprehension. And, while customer experience has emerged as a critical driver for businesses, so has digital accessibility legislation: in 2021 alone, there were 10 accessibility lawsuits filed per day.

—

Read the full 2022 State of Captioning Report to learn more about our findings and discover what you can expect from both 3Play Media and ASR engines in 2022 and beyond.

