Real-Time Captioning: What It Is and Why You Need It

What Is Real-Time Captioning?


Captions are words displayed on the screen in conjunction with video content. It’s possible that you’ve turned them on at some point while watching Netflix or another streaming service. Real-time captioning, also known as live captioning, is simply the creation of captions to appear on a screen while content is being played or words are being spoken. This can be executed in one of two ways: stenography and speech recognition.



Stenography

Stenography has been used for decades to achieve the real-time capture and reproduction of spoken words. The most common use of stenography is in court reporting. In the 1980s, the United States courts system began using court reporters nationwide to both speed up record making and facilitate participation from jurors and litigants who were deaf or hard of hearing. Now, these same stenographers use their skills to make live captions for video content such as court proceedings, conferences, and live events.


Speech Recognition

In the past decade, there’s been an explosion of apps and services offering live captions using speech-to-text software. This is the same technology that enables us to use virtual assistants on our phones or dictate an email. In 2017, Microsoft tested its speech recognition software on what’s called the Switchboard conversational speech recognition task. Essentially, the test pitted Microsoft’s transcription system against human transcribers. As it turned out, the software performed as well as multiple transcribers working together, with a word error rate of only 5.1%, or roughly five mistakes for every 100 words spoken. Speech recognition software has continued to improve since that 2017 benchmark.
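To make the word error rate figure more concrete, here is a minimal sketch of how the metric is commonly calculated: count the word substitutions, deletions, and insertions needed to turn the software's transcript into a reference transcript, then divide by the number of words in the reference. The function name and sample sentences are illustrative assumptions, and formal benchmarks such as Switchboard also apply text normalization rules that this sketch leaves out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Illustrative helper, not taken from any captioning product. It only
    lowercases and splits on whitespace; real benchmarks normalize more.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i reference words vs. an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word inserted in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said vs. what the captions showed.
spoken = "the quick brown fox jumps over the lazy dog"
captioned = "the quick brown fox jumped over a lazy dog"
print(f"{word_error_rate(spoken, captioned):.1%}")  # prints 22.2%
```

In this example, two of the nine spoken words were captioned incorrectly, so the word error rate is 2/9. By the same measure, a rate of 5.1% means about 51 errors in a 1,000-word transcript.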


Why Use Live-Captioning Software?

You’re here, so you already know how prominent and widespread live video is today. People everywhere watch live streams all the time—from entertainment and news to home surveillance. Often, they’re watching while they’re on the go or multitasking. It’s not always convenient or possible for people to listen to the audio for every video they view because they could be at work or in a public space where it’s difficult to hear. Humans adore convenience, which is why live captioning is so important. If that’s not convincing enough, there are two great arguments for using real-time captioning that are tied to revenue.


To Reach a Bigger Audience

Adding captions can greatly expand your audience, especially if you’re able to offer them in multiple languages. As our world becomes more connected, the global distribution of content is becoming more common. Live events, such as concerts and conferences, appeal to audiences with shared tastes or vocations no matter where they live.


For Accessibility

By law, some live streaming content must include captions. The Americans with Disabilities Act (ADA) of 1990 ensures that people with disabilities are not excluded from public and private services because of a lack of accommodations. Captions serve as an auxiliary aid for people who are deaf or hard of hearing. Additionally, in 1993, the FCC began requiring that all televisions sold have a built-in decoder for closed-captioned programming. Then, in 2010, the FCC mandated that all content that includes captions when broadcast on TV also have captions when distributed online. Given that there are very few exceptions to the ADA rules, these requirements apply to the lion’s share of content.


What Are the Different Types of Captions, and Why Does It Matter?

It may surprise you that there’s an important difference between subtitles and closed captions. Captions are specifically for those who are deaf and hard of hearing, and for situations in which audiences won’t be able to hear the audio. Subtitles are for audiences that don’t know the language being spoken. That’s why subtitles only include the text of what’s being said, whereas captions typically also indicate music being played, sound effects, and in some cases, vocal inflections and emotions.

This distinction matters when you’re choosing a real-time captioning service because the type of content may dictate which kind of caption needs to be used, and certain audiences will prefer one over the other. The extra information conveyed in closed captions can be distracting, or even annoying, to people who can hear the audio and only need a translation. But if you’re broadcasting content that most people will watch with the sound off, for instance on muted phones, then you’ll probably want to include those extra cues.


How To Integrate Live Captions With Wowza

Wowza provides support for live-caption services in both our software and service products. Wowza Streaming Engine can ingest caption data from a variety of in-stream and file-based sources, or it can directly embed the correct caption format for outbound video based on the protocol. For complete details about how to incorporate live closed captioning with Wowza Streaming Engine, see the Wowza Streaming Engine documentation.

Wowza Video can ingest CEA-608 (digital) and onTextData captions. On the outbound side, Wowza Video embeds CEA-608 captions or onTextData depending on the protocol used for playback (HLS, HDS, or RTMP). For more information about how to add real-time captions to Wowza Video, see the Wowza Video documentation.



About Brittney Dougherty

Based in Denver, CO, Brittney Dougherty is a digital marketer at Wowza in charge of SEO and website updates. She has eight years of experience in B2B writing and research. She is known to frequently nerd out about marketing, gaming, Sci-Fi, and hiking.