Real-Time Captioning: What It Is and Why You Need It
What Is Real-Time Captioning?

Captions are words displayed on the screen in conjunction with video content. It’s possible that you’ve turned them on at some point while watching Netflix or another streaming service. Real-time captioning, also known as live captioning, is simply the creation of captions that appear on a screen while content is being played or words are being spoken. This can be done in one of two ways: stenography or speech recognition.
Stenography
Stenography has been used for decades to achieve the real-time capture and reproduction of spoken words. The most common use of stenography is in court reporting. In the 1980s, the United States courts system began using court reporters nationwide. This sped up record-making and facilitated participation from jurors and litigants who were deaf or hard of hearing. Now, these same stenographers use their skills to make live captions for video content such as court proceedings, conferences, and live events.
Speech Recognition
In the past decade, there’s been an explosion of apps and services offering live captions using speech-to-text software. This is the same technology that enables us to use virtual assistants on our phones or dictate an email. In 2017, Microsoft tested its speech recognition software on what’s called the Switchboard conversational speech recognition task. Essentially, the test pitted Microsoft’s transcription system against human transcribers. As it turned out, the software performed as well as multiple transcribers working together, with a word error rate of only 5.1%. Speech recognition software has only improved since it achieved that benchmark.
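The 5.1% figure is a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system’s transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not Microsoft’s evaluation code. The function name word_error_rate is hypothetical, and the normalization (lowercasing and splitting on whitespace) is a simplifying assumption; real benchmarks apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: lowercase + whitespace split is enough normalization here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    spoken = "the quick brown fox jumps over the lazy dog"
    captioned = "the quick brown fox jumped over lazy dog"
    print(f"WER: {word_error_rate(spoken, captioned):.1%}")
```

In this example the caption has one wrong word and one missing word against a nine-word reference, so the WER is 2/9, or roughly 22%.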
Why Use Live-Captioning Software?
You’re here, so you already know how prominent and widespread live video is today. People everywhere watch live streams all the time, from entertainment and news to home surveillance. Often, they’re watching while they’re on the go or multitasking. It’s not always convenient or possible for people to listen to the audio for every video they view. They could be at work or in a public space where it’s difficult to hear. People value convenience, which is why live captioning is so important. If that’s not convincing enough, there are two more arguments for using real-time captioning, both tied to revenue.
To Reach a Bigger Audience
Adding captions can greatly expand your audience, especially if you’re able to offer them in multiple languages. As our world becomes more connected, the global distribution of content is becoming more common. Live events, such as concerts and conferences, appeal to audiences with shared tastes or vocations no matter where they live.
For Accessibility
By law, some live streaming content must include captions. The Americans With Disabilities Act (ADA) of 1990 ensures that people with disabilities are not excluded from public or private services due to a lack of accommodations. Captions serve as an auxiliary aid for people who are deaf or hard of hearing. Additionally, in 1993, the FCC began requiring that all televisions sold have a built-in decoder for closed-captioned programming. Then, in 2010, the Twenty-First Century Communications and Video Accessibility Act (CVAA) directed the FCC to require that all content that includes captions when broadcast on TV also have captions when distributed online. Given that there are very few exceptions to the ADA rules, this applies to the lion’s share of content.
What Are the Different Types of Captions, and Why Does It Matter?
It may surprise you that there’s an important difference between subtitles and closed captions. Captions are specifically for those who are deaf or hard of hearing, and for situations in which audiences won’t be able to hear the audio. Subtitles are for audiences that don’t know the spoken language. That’s why subtitles only include the text of what’s being said. In contrast, captions typically also indicate music, sound effects, and in some cases, vocal inflections and emotions.
This distinction matters when you’re choosing a real-time captioning service. The content type may dictate which caption type you should use. Plus, certain audiences will prefer one over the other. The extra information conveyed in closed captions can be distracting, or even annoying, to people who can hear the audio and only need a translation. But if most people will watch your content with the sound off, for instance on muted phones, then you’ll probably want to include those extra cues.
What is the difference between closed captioning and real-time captioning?
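Closed captioning and real-time captioning describe different things, so they aren’t alternatives to each other. Closed captioning refers to what the captions are: text that viewers can turn on or off and that typically indicates music and sound effects along with speech. Real-time captioning refers to when the captions are created: while the content is being played or the words are being spoken, rather than ahead of time. A live stream can therefore carry closed captions that are produced in real time.
Either way, captions reach viewers through a video streaming server, which is software that ingests live or file-based video, processes it through transcoding and packaging, and delivers it to viewers over standard protocols like HLS, DASH, WebRTC, RTMP, and SRT. It sits between the video source and the viewer, handling the technical work of making video reliably available at scale across devices, network conditions, and geographic regions. Wowza Streaming Engine is a configurable video streaming server that runs on-premises, at the edge, in the cloud, or in fully air-gapped environments, giving organizations complete control over how video is ingested, secured, processed, and delivered.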
What is a Live Caption example?
A common Live Caption example is a televised news broadcast, where on-screen text transcribes the anchor’s words at the bottom of the screen within 1–3 seconds of being spoken. Other everyday examples include NFL games on streaming platforms, Zoom and Microsoft Teams meetings with auto-captions enabled, and YouTube Live streams with auto-generated captions.
How To Integrate Live Captions With Wowza
Wowza provides support for live-caption services in both our software and service products. Wowza Streaming Engine can ingest caption data from a variety of in-stream and file-based sources, or it can directly embed the correct caption format for outbound video based on the protocol. See the Wowza Streaming Engine documentation for complete details about how to incorporate live closed captioning.
Wowza Video can ingest CEA-608 (digital) and onTextData captions. On the outbound side, Wowza Video embeds CEA-608 captions or onTextData depending on the protocol used for playback (HLS, HDS, or RTMP). See the Wowza Video documentation for more information about how to add real-time captions.
How to get real-time subtitles?
Getting real-time subtitles requires connecting a live video source to an automatic speech recognition (ASR) service that transcribes the audio and injects synchronized text into the output stream. Common methods include enabling automated captions in platforms like Zoom, Microsoft Teams, and YouTube Live; integrating a speech-to-text service such as OpenAI Whisper, Azure AI Speech, Google Cloud Speech-to-Text, or Amazon Transcribe with a streaming server; or using built-in caption tools like Live Caption in Chrome and on Android. When you run the speech-to-text integration yourself, you then deliver the transcription as CEA-708, WebVTT, or IMSC1 caption tracks inside HLS, DASH, or SRT streams.
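To make the last step more concrete, here is a minimal sketch of turning timed transcript segments into a WebVTT document. It assumes the speech-to-text service has already returned each phrase with start and end times in seconds. The Segment class and the format_timestamp and to_webvtt functions are hypothetical names used for illustration only, not part of any Wowza or ASR vendor API. A production live pipeline would also need to split cues into segments and align their timing with the video, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """One transcribed phrase (hypothetical shape of an ASR result)."""
    start: float  # seconds from the start of the stream
    end: float    # seconds from the start of the stream
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(segments: Iterable[Segment]) -> str:
    """Build a WebVTT document with one cue per segment."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for segment in segments:
        start = format_timestamp(segment.start)
        end = format_timestamp(segment.end)
        lines.append(f"{start} --> {end}")
        lines.append(segment.text.strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    transcript = [
        Segment(0.0, 2.5, "Good evening."),
        Segment(2.5, 6.0, "Here are tonight's top stories."),
    ]
    print(to_webvtt(transcript))
```

The sketch does not validate cue text. WebVTT cue text cannot contain the string "-->" or blank lines, so real transcripts should be sanitized before they are written out.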