The whisperSpeechToText module for Wowza Streaming Engine™ media server software can be used to receive audio from an incoming source stream and send that raw audio to OpenAI Whisper. Whisper's speech recognition service processes the audio data and returns captions for display alongside your live stream. For available models and languages, see the whisper project on GitHub.
The module automatically enables captions for WebVTT output, which we generally recommend. However, it's also possible to configure it for CEA-608/708 captions. When used with the Whisper service, the module is only capable of transcribing audio into captions. It doesn't translate the source audio into different language tracks.
You can get the whisperSpeechToText source code from the wse-plugin-caption-handlers repository on GitHub.
Prerequisites
To work with the whisperSpeechToText module, you must meet the following prerequisites:
- You must have Wowza Streaming Engine 4.9.4 or later installed and use Java 21.
- If you plan to preview the module using Docker Compose, install and run Docker Desktop.
- If you're not using Docker Compose to preview the module, you need to manually set up your own Whisper server.
Usage
You can preview the whisperSpeechToText module using our Docker Compose deployment, or you can manually install the module in your existing Wowza Streaming Engine installation. Depending on your use case, follow either the Preview the module with Docker Compose workflow or the Set up the module without Docker Compose workflow.
A successful setup utilizes Whisper's automatic speech recognition (ASR) system to convert audio from a source stream into text, which is then injected into the Wowza Streaming Engine live stream as onTextData. Once the onTextData is inserted into the stream, you can configure Wowza Streaming Engine to output CEA-608/708 or WebVTT captions.
For most modern use cases, we recommend using WebVTT captions since they provide rich styling and customization options, full UTF-8 encoding for internationalization, and native support in multiple browsers and players.
Preview the module with Docker Compose
To preview this module, you can use our docker-compose.yaml deployment. We describe a similar process in the Trial Wowza Streaming Engine using a Docker Compose deployment article, where you can find additional information about environment variables.
This Docker Compose workflow is pre-configured to start a Wowza Streaming Engine instance with the whisperSpeechToText module installed and set up to use Whisper's ASR services. It also installs and sets up a Whisper server that automatically detects the language of the audio input and transcribes the audio into WebVTT captions in the detected language.
If you're trying to manually add the module to an existing installation of Wowza Streaming Engine, continue with the Install the module section instead.
To use the Docker Compose preview deployment, follow these steps.
- Install Docker Desktop, which includes the Docker Engine and the Docker Compose plugin.
- Make sure Docker Desktop and Docker Engine are running.
- Clone the wse-plugin-caption-handlers repo:
git clone git@github.com:WowzaMediaSystems/wse-plugin-caption-handlers.git
- Change the directory to the wse-plugin-caption-handlers repo:
cd wse-plugin-caption-handlers
- Set the WSE_LICENSE_KEY variable used by the docker-compose.yaml file to your Wowza Streaming Engine license key:
export WSE_LICENSE_KEY=[your-license-key]
Note: A license key set with export applies only to the current terminal session, so you must set it again in each new session and after a server reboot. For a more consistent experience, you can add the license key directly to the docker-compose.yaml file or use a .env file to store sensitive data.
- From your local wse-plugin-caption-handlers repo, run:
docker compose up
- Open a new browser tab and go to:
http://localhost:8088/login.htm?host=http://wse.docker:8087
Note: When you click the Server link, confirm the http://wse.docker:8087 URL displays.
- Log in to Wowza Streaming Engine using the credentials from the docker-compose.yaml file.
- Go to Applications and click the whisper application.
- Check the Modules tab for the whisper application, which includes the whisperSpeechToText module.
- Go to the Properties tab and view the Custom properties. They are pre-configured to work with the Whisper ASR service.
- Start a stream and send it to your Wowza Streaming Engine server using the following server and stream key. For more about publishing live streams, see Connect a live source to Wowza Streaming Engine.
rtmp://wse-demo.wowza.com/whisper/myStream
- To test playback and see the automatically generated WebVTT captions, go to our Wowza Test Player and use this URL:
http://localhost:1935/whisper/myStream_delayed/playlist.m3u8
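The note about keeping the license key between sessions can be sketched as follows. This is a minimal example that assumes the docker-compose.yaml file reads the key through ${WSE_LICENSE_KEY} variable substitution, which the export step suggests but which you should confirm in your copy of the file. Docker Compose loads a .env file from the project directory automatically. The write_license_env function name is hypothetical, and the function overwrites any existing .env file in the target directory.

```shell
# Write the license key to a .env file next to docker-compose.yaml so that
# Docker Compose can pick it up on every run.
# Assumption: the compose file references ${WSE_LICENSE_KEY}.
write_license_env() {
  key="$1"
  dir="${2:-.}"
  if [ -z "$key" ]; then
    echo "usage: write_license_env <license-key> [repo-dir]" >&2
    return 1
  fi
  # Use a restrictive umask so a newly created file is readable only by you.
  ( umask 077 && printf 'WSE_LICENSE_KEY=%s\n' "$key" > "$dir/.env" )
}

# Example (run from the wse-plugin-caption-handlers repo):
#   write_license_env "[your-license-key]"
#   docker compose up
```

If you use this approach, keep the .env file out of version control because it contains your license key.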
Set up the module without Docker Compose
If you already have Wowza Streaming Engine installed and don't plan to use the Docker Compose deployment to preview the pre-configured whisperSpeechToText module, you can install and configure the standalone module using the steps in this section.
Install the module
To manually install the standalone module without using our Dockerized solution, follow these steps.
- Download the wse-plugin-caption-handlers-[version].jar file from the latest plugin release version.
- Copy the wse-plugin-caption-handlers-[version].jar file to the [install-dir]/lib folder in your Wowza Streaming Engine installation.
- Enable the Wowza Streaming Engine Transcoder for your live application.
- Restart Wowza Streaming Engine.
- Continue to the Enable the module and Configure module properties sections.
Enable the module
To enable this module, add the following module definition to your application configuration. See Configure modules for details.
Name | Description | Fully qualified class name |
whisperSpeechToText | WhisperSpeechToText | com.wowza.wms.plugin.captions.ModuleWhisperCaptions |
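If you edit the configuration by hand rather than through Wowza Streaming Engine Manager, the module definition above corresponds to a Module entry inside the Modules container of [install-dir]/conf/[application-name]/Application.xml. The following shell sketch writes that entry to a scratch file so you can review it and paste it into Modules, typically at the end of the list. The function name and the whisper-module.xml file name are hypothetical.

```shell
# Write the module definition to a scratch file for review.
# The three values come from the module definition table.
write_module_snippet() {
  out="${1:-whisper-module.xml}"
  cat > "$out" <<'EOF'
<Module>
    <Name>whisperSpeechToText</Name>
    <Description>WhisperSpeechToText</Description>
    <Class>com.wowza.wms.plugin.captions.ModuleWhisperCaptions</Class>
</Module>
EOF
}

# Example:
#   write_module_snippet
#   cat whisper-module.xml
```

Restart the application after you change Application.xml so the new module is loaded.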
Configure module properties
After enabling the module, you can adjust the default settings by adding the following Custom properties to your live application. See Configure properties for details.
Required properties
Path | Name | Type | Value | Description |
/Root/Application | whisperCaptionsEnabled | Boolean | true | If the whisperSpeechToText module is configured, set this property to enable it. The default value is false. |
/Root/Application | whisperSocketHost | String | localhost | Specify the hostname or IP address where the Whisper service is hosted. |
/Root/Application | whisperSocketPort | String | 3000 | Specify the network port on which the Whisper server is actively listening for incoming connections. |
Optional properties
Path | Name | Type | Value | Description |
/Root/Application | captionHandlerDebug | Boolean | true | Enables extra debug logging for troubleshooting. |
/Root/Application | captionHandlerStreamDelay | String | 10000 | Defines the delay between the source stream and output stream in milliseconds. The default value is 30000 (or 30 seconds). |
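For hand-edited configurations, properties with the /Root/Application path correspond to Property entries in the Properties container that sits directly under Application in Application.xml. The sketch below writes the three required properties and the optional stream delay to a scratch file, using the sample values from the tables. The function name, the output file name, and the host and port arguments are hypothetical; adjust them to match your Whisper server.

```shell
# Write the custom properties to a scratch file for review.
# Property names and types come from the tables; values are samples.
write_properties_snippet() {
  out="${1:-whisper-properties.xml}"
  host="${2:-localhost}"
  port="${3:-3000}"
  cat > "$out" <<EOF
<Property>
    <Name>whisperCaptionsEnabled</Name>
    <Value>true</Value>
    <Type>Boolean</Type>
</Property>
<Property>
    <Name>whisperSocketHost</Name>
    <Value>${host}</Value>
    <Type>String</Type>
</Property>
<Property>
    <Name>whisperSocketPort</Name>
    <Value>${port}</Value>
    <Type>String</Type>
</Property>
<Property>
    <Name>captionHandlerStreamDelay</Name>
    <Value>10000</Value>
    <Type>String</Type>
</Property>
EOF
}

# Example:
#   write_properties_snippet whisper-properties.xml localhost 3000
```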
Configure captioning properties
The whisperSpeechToText module enables WebVTT captions and defaults to the detected language. If you plan to use embedded captions, such as CEA-608/708, you have to disable the captionLiveIngestLanguages closed-captioning property.
- From the Properties tab of your Wowza Streaming Engine live application, click Closed Captions.
- Click Edit.
- Disable the captionLiveIngestLanguages property.
- Click Save.
- Restart your application.
- See Configure closed captioning for Wowza Streaming Engine live streams for more information.
Set up a Whisper server
If you're not using the Docker workflow to preview this module, you must independently set up a Whisper server to process audio data and return captions. We provide the whisper_streaming GitHub repository, which includes a Docker container for running a standalone Whisper service. This project builds upon this resource. To run the Whisper server, follow these steps.
- Clone the whisper_streaming repo:
git clone git@github.com:WowzaMediaSystems/whisper_streaming.git
- Change the directory to the whisper_streaming repo:
cd whisper_streaming
- To run the Whisper server in standalone mode, comment out the trial and manager services in the docker-compose.yaml.
- Make sure that the image in the docker-compose.yaml file is set to wowza/whisper_streaming:latest.
- From your local whisper_streaming repo, run:
docker compose up
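Before you start a stream, it can help to confirm that something is listening at the address you set in whisperSocketHost and whisperSocketPort. The following sketch assumes that bash and the coreutils timeout command are available, and that the Whisper container publishes the sample port from the properties table (3000), which you should confirm in the docker-compose.yaml file. The whisper_reachable function name is hypothetical. The check only verifies that a TCP connection can be opened; it doesn't validate the Whisper service itself.

```shell
# Return success if a TCP connection to host:port can be opened.
# bash opens a socket when a redirection targets /dev/tcp/HOST/PORT.
whisper_reachable() {
  host="${1:-localhost}"
  port="${2:-3000}"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if whisper_reachable localhost 3000; then
  echo "Whisper server port is open"
else
  echo "Nothing is listening on localhost:3000 yet"
fi
```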
Test playback
Use the steps in this section to publish your source stream to Wowza Streaming Engine and to verify that the module is working as expected.
- Start a stream and send it to your Wowza Streaming Engine server using the following server/port and stream key. For more about publishing live streams, see Connect a live source to Wowza Streaming Engine.
rtmp://[server-ip-address]:1935/[application-name]/myStream
- Check the Incoming Streams page and confirm that your live stream and its renditions are listed.
Note: The [stream-name]_160p, [stream-name]_360p, and [stream-name]_source renditions include WebVTT captions. They're transcoded versions of the [stream-name]_delayed stream.
- Go to our Wowza Test Player to test playback with the automatically generated WebVTT captions using the following URL:
http://[server-ip-address]:[port]/[application-name]/myStream_delayed/playlist.m3u8
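The publishing and playback URLs in these steps follow a fixed pattern, which the following sketch builds from the server address, application name, and stream name. The helper function names are hypothetical, and the commented ffmpeg and curl commands are only one possible way to publish a test stream and check the playlist; they assume both tools are installed and that sample.mp4 is a local file of your choosing.

```shell
# Build the RTMP publishing URL: rtmp://[server]:1935/[application]/[stream]
publish_url() {
  printf 'rtmp://%s:1935/%s/%s\n' "$1" "$2" "$3"
}

# Build the HLS playback URL for the delayed rendition of the stream.
playlist_url() {
  printf 'http://%s:%s/%s/%s_delayed/playlist.m3u8\n' "$1" "$2" "$3" "$4"
}

# Example (192.0.2.10 is a placeholder address):
#   ffmpeg -re -stream_loop -1 -i sample.mp4 -c:v libx264 -c:a aac \
#     -f flv "$(publish_url 192.0.2.10 live myStream)"
#   curl -sf "$(playlist_url 192.0.2.10 1935 live myStream)"
```

Because the output stream is delayed relative to the source, the playlist request may fail until the configured captionHandlerStreamDelay has passed.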