Injecting Closed Captions Using Wowza Streaming Engine


There are a number of reasons to use live closed captioning in your video streams. Beyond increasing inclusion, captions can improve comprehension for viewers — especially those with limited proficiency in the language. Plus, with so many viewers accessing content on the go, captions ensure that the message is conveyed despite any background noise. 

In this demo, Tim Dougherty shows how to push closed captions into Wowza Streaming Engine to merge with an incoming stream. The same workflow can carry captions in multiple languages to reach audiences across the world.

See how it works in the video above or follow along with the transcript below.

Full Video Transcript:

Tim Dougherty:

Hi, Tim with Wowza here. This is a brief functional demo to show you what it looks like when you might want to push closed caption data into Wowza Streaming Engine and merge it with an incoming stream. Take a look at this diagram I have here. Wowza Streaming Engine running in Amazon EC2. This one’s running on Linux. It’s also very doable on Windows, this workflow. I’m using OBS to push RTMP into Wowza Streaming Engine. Very simple, straightforward encode. I’ll show you the setup in just a moment for OBS. I’m watching it here on this WebRTC client. If you see some pausing, I’m actually using a server that’s several states away. That’s my mistake, but it won’t affect this demo too much.

But I’m going to open up a console with a Telnet client, push in ASCII data via TCP. I’ll be doing that up here in this console. And again, I’m watching it via WebRTC. Now it’s consumed directly using an M3U8 URL, but it’s also important to note that Wowza Streaming Engine can push an RTMP stream with closed caption data loaded that can go into another closed caption supporting RTMP provider endpoint. So there’s a lot of flexibility here. Another aspect of this, I do have a player running, I’ll fire it up in just a moment. This is my URL. FQDN for my server live application. I am using a SMIL file, which loads the playlist. So the playlist has the closed caption track and it has the video and audio track. If you’re using adaptive bitrate streaming (ABR), you’ll have to put your ABR renditions in there as well. It’s pretty straightforward, repeatable setup but that’s essentially what I have going. I’ll go ahead and start the stream here. This is the standard.

So you can hear the audio looping through from the encode and then I’ll turn on captions. So we’re ready to view playback over here. Just real brief on Wowza Streaming Engine, stream one coming in from the encoder, that’s the IP address of the encoder, and then I’ve got an Opus transcode going on. But there really isn’t anything special in the Wowza UI. Just take a quick look at the OBS. I’ve got my headset in the feed so that you can hear my voice and kind of compare that to the captions and then see the latency. You can see over here with WebRTC, that the latency is really low. So I could be a captionist watching this screen here in a remote location, typing into my caption software. In this case, I’m again just using a Telnet session, but captioning and stenography, they have a very specialized task and they use software that supports Telnet and other formats.

So, anyway, let’s go ahead and get started over here in the console. I’m going to log in. I’m going to call it English. And that’s how we identify the languages. You could have multiple translators or captionists running. So you could have English and German and Russian and Catalan and Mandarin. There really isn’t a significant limit to it, but that’s how we identify. So I type in English. Password is pass. This is very configurable from a security standpoint, and we’re ready to get started. So I have this muted now. I’m going to watch what’s coming down here. I’m just going to type and then talk.

So, “welcome to the auction. Nice Cobra. This is a very exciting event. Reserve is off. This car is not for the faint of heart. Very fast. What do you think, Bill? They going to sell this? I don’t know.”

Now I see the caption data coming over here in the player. I’m going to unmute and just listen to the audio and watch the captions.

“This is a very exciting event. Reserve is off. This car is not for the faint of heart. Very fast. What do you think Bill? They going to sell this? I don’t know.”

So that’s it end to end. And now there are some nuances to formatting and when the data is sent and obviously a lot of practice is involved. So this is a very simple demonstration. Having worked with caption companies in the past, doing this very exact same use case, there is a lot of specialization there and timing captions, et cetera. So don’t hold Wowza responsible for some subtle timing differences. That really is up to the specialized art, if you will, of being a stenographer and a captionist.

But I do want to highlight that what you saw here is a simple Telnet session right here. It’s pushing data into Wowza Streaming Engine, and you just saw it come up right here in the player. So if you’re interested in learning more, let us know. We’d love to hear from you. We’ll get you in touch with whomever you need to speak to inside of our company, myself included, to be able to talk to you about this workflow. We’re very excited that we can support this functionality, and we hope to extend that excitement and functionality to you. Thank you very much for watching. Have a great rest of your day.
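
For readers who want to script what Tim types by hand, the Telnet session in the demo can be approximated with a plain TCP client. The following is a minimal sketch, not Wowza's documented interface: it assumes the server accepts the language name and the password as the first two lines, as in the demo ("English", then "pass"), followed by one caption per line. The host name, port, and the function names build_session and push_captions are hypothetical, and the sketch does not wait for or parse any login prompts.

```python
import socket
from typing import Iterable


def encode_line(text: str) -> bytes:
    """Encode one line as ASCII ending in CR LF, the Telnet newline.

    Characters outside ASCII are replaced with '?', since the demo
    pushes plain ASCII data.
    """
    return text.encode("ascii", errors="replace") + b"\r\n"


def build_session(language: str, password: str, lines: Iterable[str]) -> bytes:
    """Build the bytes for one caption session.

    Assumption (taken from the demo, not from documentation): the
    language name and password are sent first, then the caption lines.
    """
    parts = [encode_line(language), encode_line(password)]
    parts.extend(encode_line(line) for line in lines)
    return b"".join(parts)


def push_captions(host: str, port: int, language: str, password: str,
                  lines: Iterable[str], timeout: float = 5.0) -> int:
    """Send a caption session over TCP and return the number of bytes sent.

    host and port are placeholders for wherever the caption listener
    is configured; they are not values given in the demo.
    """
    payload = build_session(language, password, lines)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)


if __name__ == "__main__":
    # Show the bytes that would be sent, without opening a connection.
    demo = build_session("English", "pass",
                         ["Welcome to the auction.", "Nice Cobra."])
    print(demo)
```

A real captioning tool would send text as it is typed rather than in one batch, and timing would still depend on the captionist, as noted in the transcript. Running one session per language (for example "English" and "German") mirrors how the demo identifies caption tracks by login name.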


About Traci Ruether

Traci Ruether is a Colorado-based B2B tech writer with a background in streaming and network infrastructure. Aside from writing, Traci enjoys cooking, gardening, and spending quality time with her kith and kin.