The “Human Touch” in Live Captioning Ensures Accuracy & Accessibility
Updated: December 6, 2021
Closed captions are an important factor in making video accessible for all audiences, and that includes live streamed events. Since the pandemic renewed the popularity of virtual events, more organizations are looking for live captioning solutions. Keep in mind, though, that not all live captions are created equal. Providing live captions is only the first step; it’s arguably even more important that those captions are highly accurate.
What are live captions?
Live closed captions are time-synchronized text that appears in real time. They can be used in many settings, such as virtual events, meetings, online classes, or performances.
It’s important that virtual events provide live captions in order to create a fully accessible experience for those who may be d/Deaf or hard of hearing, and captions also improve audience engagement and overall comprehension. Additionally, captions offer business benefits: they can boost search engine optimization and enhance the user experience.
However, to properly harness the benefits of live captions, they must be accurate. The accuracy of live captions often depends on how they were created: through automatic speech recognition (ASR) or by a human captioner.
Automatic vs. human-generated captions
There are several different ways to incorporate live captions, but the two most popular are live automatic captioning and live human captioning.
Live automatic captioning
While automatic captions are more readily available and less expensive (popular meeting platforms like Zoom can generate them), their accuracy rates are notoriously low.
Live automatic captions do not involve a human captioner and are generated by artificial intelligence (AI) technology such as ASR. Because of this, the likelihood of errors in punctuation, speaker identification, and grammar greatly increases. In addition, AI doesn’t have the same capacity for contextualization as a human being, meaning that when ASR misunderstands a word, there’s a possibility it will be substituted with something irrelevant, or omitted altogether.
Omission errors can drastically change the meaning of a sentence! Consider the following example:
“The flash flood warning for Suffolk County has been lifted.”
“The flash flood warning for Suffolk County has NOT been lifted.”
However, industry standards find it acceptable to omit stammers, false starts, and filler words like ‘um’ or ‘ah.’
While there is currently no definitive legal requirement for live captioning accuracy rates, existing federal and state captioning regulations for recorded content state that accessible accommodations must provide an equal experience to that of a hearing viewer. This requirement, coupled with the tendency of automatic captions toward low accuracy, means that live automatic captions alone are not sufficient to provide an equitable experience for d/Deaf or hard of hearing viewers.
Live human captioning
By comparison, live human captioning is significantly more accurate and reliable. While neither AI nor human captioners can provide 100% accuracy, the most effective methods of live captioning incorporate both in order to get as close as possible.
There are two primary ways to include humans in live captioning workflows: CART and voice writing. Communication Access Real-time Translation (or CART for short) employs a skilled transcriber operating a stenotype keyboard to produce captions in real time, or as close to it as possible. The process of voice writing, on the other hand, consists of a few more components:
- The original speaker at the live event
- A highly trained voice writer
- Specially tuned ASR software
Whichever method is used, the human touch is irreplaceable in producing accurate, real-time captions. Once again, the lack of common standards for live captioning makes appropriately measuring accuracy a somewhat subjective endeavor. However, there is a generally accepted formula used to evaluate accuracy:
Calculating caption accuracy
Accuracy (%) = ((Total words captioned - Incorrect words captioned) / Total words captioned) x 100
For example, if a captioner writes 10,000 words during a live event and 200 of those words were incorrect, the resulting accuracy rate would be 98%.
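As a minimal sketch, the formula above can be written as a small Python function. The function name `caption_accuracy` and its parameter names are hypothetical, chosen only for illustration; they do not come from any captioning tool or standard.

```python
def caption_accuracy(total_words: int, incorrect_words: int) -> float:
    """Return caption accuracy as a percentage.

    Implements: (total - incorrect) / total x 100.
    The names here are illustrative assumptions, not a standard API.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= incorrect_words <= total_words:
        raise ValueError("incorrect_words must be between 0 and total_words")
    # Multiply before dividing so whole-number percentages stay exact.
    return 100 * (total_words - incorrect_words) / total_words


# The worked example from the text: 10,000 words captioned, 200 incorrect.
print(caption_accuracy(10_000, 200))  # 98.0
```

Note that this simple count treats every error the same way, so a single omitted word that reverses the meaning of a sentence weighs no more than a harmless typo.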
While this is a useful working definition, the number of “incorrect words captioned” in this equation doesn’t account for punctuation mistakes, words that are omitted, or substitutions. As seen in the example above, these types of errors can impact the understanding of a d/Deaf or hard of hearing viewer. To remedy this oversight in calculation, the FCC’s Report and Order on closed captioning quality specifies:
No matter how you calculate accuracy, it’s undeniable that human involvement in your live captioning workflow increases your chances of producing accurate, comprehensible, and engaging captions at your next virtual event.