Use alternative audio or video tracks with MPEG-DASH streams in Wowza Streaming Engine

The MPEG-DASH protocol and Wowza Streaming Engine™ media server software allow you to include multiple audio and video tracks within an MPEG-DASH presentation (or MPD). Including multiple tracks allows viewers to select between multiple languages or aspect ratios during stream playback. This article describes how to use SMIL files with custom tags in Wowza Streaming Engine to enable track selection features in multi-track streams.

Notes:
  • Wowza Streaming Engine version 4.6.0 or later is required.
  • Not all MPEG-DASH players fully support multiple language tracks or the role feature.
  • Multi-language audio and video tracks require advanced configuration for each stream. It's not currently possible to do the full configuration in Wowza Streaming Engine Manager.
  • Similar SMIL file functionality can be implemented programmatically through the Wowza Streaming Engine Java API AMLST feature.
  • For more information about MPEG-DASH streaming, see our MPEG-DASH guide.

Audio and video track setup


For everything to be synchronized, all alternative tracks must be timecode-aligned. This means that they must come from the same encoder or from separate encoders that produce aligned streams.

For live streams, Wowza Streaming Engine can ingest a single MPEG-TS stream that has multiple tracks, and then separate them into individual tracks. It's also possible to ingest streams from separate encoders, as long as all tracks are timecode-aligned.

For video-on-demand (VOD) streams, separate audio and video files can be used or files with muxed audio and video as long as all tracks are timecode-aligned. It's also possible to use a single file with all of the tracks in the file; however, you must use the ModuleMultiTrackVOD module to separate out the tracks.
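
If you take the single-file approach, ModuleMultiTrackVOD is enabled by adding it to the <Modules> list in the application's Application.xml file. The following is a minimal sketch of such an entry; the class path shown is an assumption and should be verified against the ModuleMultiTrackVOD documentation for your installation.

<Module>
    <Name>ModuleMultiTrackVOD</Name>
    <Description>Separates the tracks in multi-track VOD files</Description>
    <!-- Class path is an assumption; confirm it against the module documentation -->
    <Class>com.wowza.wms.plugin.collection.module.ModuleMultiTrackVOD</Class>
</Module>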

MPEG-DASH SMIL file parameters


Wowza Streaming Engine uses custom Synchronized Multimedia Integration Language (SMIL) files to link multiple video or audio tracks (or renditions) together within an MPEG-DASH live or on-demand stream. Renditions that are defined using <video> tags in the SMIL file override information in the MPEG-DASH MPD.

This section describes how MPEG-DASH streams can be customized using parameters in SMIL file video definitions. For more information about common attributes and parameters that you can use in your SMIL file to control settings in your MPEG-DASH MPD, see Understanding SMIL file syntax.
 
Note: Defining a source (src) and a bitrate (audio, video, or system bitrate) as attributes or parameters in each <video> element is mandatory to create a valid SMIL file.
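
For reference, the following is a minimal sketch of a valid SMIL file with a single rendition. Each <video> definition supplies both a source and a bitrate; the file name and bitrate shown here are placeholder values.

<smil>
<head>
</head>
<body>
<switch>
<video src="mp4:sample.mp4" system-bitrate="1000000">
</video>
</switch>
</body>
</smil>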

Audio-only, Video-only

Add the videoOnly parameter to the <video> definition to make Wowza Streaming Engine ignore any audio tracks in the source media file.

<video src="sample.mp4" video-bitrate="8000000" width="1280" height="720">
        <param name="videoOnly" value="TRUE" valuetype="data"/>
</video>

Add the audioOnly parameter to the <video> definition to ignore any video tracks in the source media file.

<video src="sample.mp4" system-language="en" audio-bitrate="96000">
<param name="audioOnly" value="TRUE" valuetype="data"/>
</video>
Note: Do not set both the audioOnly and videoOnly parameters to TRUE in the same <video> element, as this will cause unpredictable behavior.

Bitrate

The audioBitrate and videoBitrate parameters set the bitrate value of the audio or video track that is selected first from the <video> definition's source media and save that value in the MPD. Bitrates can also be set as attributes (system-bitrate, video-bitrate, and audio-bitrate). If a bitrate parameter and an attribute of the same type are both defined in the same <video> definition, the parameter value takes precedence over the attribute value.
 
<video src="mp4:main.stream" title="English-high">
<param name="audioBitrate" value="105768" valuetype="data"/>
<param name="audioOnly" value="TRUE" valuetype="data"/>
</video>
<video src="mp4:main.stream" title="Main" system-bitrate="2274910">
<param name="videoOnly" value="TRUE" valuetype="data"/>
</video>
Note: All SMIL file <video> elements for MPEG-DASH streams must include at least one bitrate parameter (systemBitrate, audioBitrate, or videoBitrate) or the <video> definition is ignored and an error is written to your Wowza Streaming Engine log file. If you don't want to set an explicit bitrate, use only the systemBitrate parameter: this legacy parameter isn't used by Wowza Streaming Engine for MPEG-DASH streams, so the bitrate is calculated from the media's bitstream data instead.

Codecs

The optional videoCodecId and audioCodecId parameters override the codec information read from the source media and are saved in the codecs attribute in the MPEG-DASH MPD. You may want to override codec IDs when the encoder defines incorrect codec information. These parameters can also be used to distinguish between Adaptation Sets with the same bitrate and source. Renditions with codec IDs of profiles in the same codec family are grouped into the same Adaptation Set.

Codec IDs must conform to RFC 6381, Section 3.2. Wowza Streaming Engine uses these standards when generating the codec string from the source media's codec data or when a SMIL file is generated using the method in Generate a SMIL file with an HTTP provider using the Wowza Streaming Engine Java API. If you're manually creating a SMIL file but don't know the correct codec strings, omit the audioCodecId and videoCodecId parameters from the SMIL file and Wowza Streaming Engine will generate the correct IDs automatically.
 
<video src="mp4:main.stream" title="Main" width="768" height="432">
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
</video>

Language

The audioLanguage and videoLanguage parameters define the language of the audio or video track that is selected first in the <video> definition's media file. For example, setting a specific language on a video track is useful when your stream contains embedded captions and you need to create Adaptation Sets in the MPD for each language.

The system-language attribute is a legacy setting that can also be used to set the language of the audio or video track that is selected first in the <video> definition's media file. If the media file contains both audio and video tracks, system-language is applied to the audio track. To apply the language to the video track instead, set the videoOnly parameter to TRUE.

audioLanguage and videoLanguage override system-language (if it's set), and all three settings override language information read from the source media.
 
<video src="mp4:video.mp4" system-Language="eng" height="240">
<param name="videoBitrate" value="200000" valuetype="data"></param>
<param name="audioBitrate" value="44100" valuetype="data"></param>
</video>
<video src="mp4:audio_eng.mp4" title="English">
<param name="audioBitrate" value="128000" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>

Role

The role, audioRole, and videoRole parameters define primary and secondary language tracks in the media file.

Set the role to main to identify the primary language track (for example, the most important commentator or dialog track). Multiple audio tracks can be set to main. Apply the main role to video tracks to indicate that those tracks contain embedded captions meant to be paired with the primary audio language track.

Set the role to dub to identify secondary language tracks (for example, alternate translations of the primary language track or supporting commentary). Apply the dub role to video tracks to indicate that those tracks contain embedded captions meant to be paired with the translated or secondary audio tracks.

<video src="mp4:audio_eng.mp4" title="English">
       <param name="audioOnly" value="TRUE" valuetype="data"/>
       <param name="audioRole" value="main" valuetype="data"/>
</video>
<video src="mp4:audio_de.mp4" title="Deutsch">
       <param name="audioOnly" value="TRUE" valuetype="data"/>
       <param name="audioRole" value="dub" valuetype="data"/>
</video>

For more information on setting roles, see Using roles for multiple languages.

Aspect ratio

Add the aspectRatio parameter to a <video> definition to override the aspect ratio calculated from the source media. The aspectRatio parameter is saved in the par (picture aspect ratio) property in the corresponding Adaptation Set within the MPEG-DASH MPD. If you don't set an aspect ratio, the rendition is placed in the Adaptation Set with a par that matches the media's calculated display width and height.
 
<video src="mp4:video.mp4" title="English">
<param name="videoBitrate" value="800000" valuetype="data"/>
<param name="aspectRatio" value="16:9" valuetype="data"/>
</video>

Playback stream duration

You can use the begin and dur attributes to define a specific section of the source media to stream. The begin attribute sets the position (in milliseconds) into the stream at which to begin playback. The dur attribute sets the duration (in seconds) of the stream interval that you want to play.
 
<video src="mp4:video.mp4" begin="250000" dur="240" title="English">
<param name="videoBitrate" value="800000" valuetype="data"/>
</video>

SMIL file settings not used in MPEG-DASH presentations

Wowza Streaming Engine ignores the following parameters and attributes when generating an MPEG-DASH MPD, but you can add these settings to your SMIL file <video> definitions for your reference:
 
  • The width and height attributes.
  • The title attribute.
In addition, the keyFrameOnly parameter is not supported for MPEG-DASH presentations.

How SMIL files affect the MPEG-DASH MPD


Wowza Streaming Engine combines source media data with SMIL file settings to create the Media Presentation Description (MPD) file that is used to stream your content. Renditions are saved in Representation objects in the MPD. These Representations are grouped into Adaptation Sets depending on their parameters and attributes. Understanding how Wowza Streaming Engine separates audio and video tracks into renditions and how to control track selection helps you build a SMIL file that creates your desired streaming experience.

Adaptation Sets and Representations

The parameters provided in a SMIL file have a direct impact on how Representations are grouped into Adaptation Sets in the MPEG-DASH MPD. Wowza Streaming Engine uses the following rules when creating an MPD from a SMIL file (a simplified MPD sketch follows the list):
 
  • Audio Representations with the same role, language, and container/MIME type are grouped into the same audio Adaptation Set.
  • Video Representations with the same role, picture aspect ratio (par), language, and container/MIME type are grouped into the same video Adaptation Set.
  • Audio or video Representations with codec IDs that are part of the same codec family are grouped into the same Adaptation Set. For example, mp4a.40.2 and mp4a.40.5 are codec IDs for two profiles of the AAC codec family. Renditions using these codec IDs remain in separate Representations that are grouped into the same Adaptation Set for that codec family.
  • Within each Adaptation Set, renditions are differentiated by bitrate and codec ID. If two renditions have the exact same bitrate and codec ID, only one is listed in the Adaptation Set.
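
The following abbreviated MPD sketch illustrates these rules for a presentation with an English (main) audio track, a German (dub) audio track, and a single video track. It's a simplified illustration only; an MPD generated by Wowza Streaming Engine contains additional attributes and segment information.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="audio/mp4" lang="en">
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="main"/>
      <Representation id="audio_en" codecs="mp4a.40.2" bandwidth="128000"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="de">
      <Role schemeIdUri="urn:mpeg:dash:role:2011" value="dub"/>
      <Representation id="audio_de" codecs="mp4a.40.2" bandwidth="128000"/>
    </AdaptationSet>
    <AdaptationSet mimeType="video/mp4" par="16:9">
      <Representation id="video_720p" codecs="avc1.4d400c" bandwidth="1500000" width="1280" height="720"/>
    </AdaptationSet>
  </Period>
</MPD>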

Using roles for multiple languages

The role, audioRole, and videoRole parameters are used to group renditions into Adaptation Sets in the MPD (an Adaptation Set is created for each role used in the SMIL file).

Use audioRole and videoRole when the media contains both video and audio tracks, and use role when the rendition media is audio-only or video-only. Renditions with the same role in the SMIL file are all placed together in an Adaptation Set that's unique to that role in the MPD. All renditions with no role are also placed together in an Adaptation Set. Configuring multiple renditions with the same role may be preferred when you want the player, rather than the viewer, to choose a default rendition (Adaptation Sets are usually chosen by the viewer).

If your source media files or live stream contain multi-language audio tracks, we recommend that you define the following SMIL file parameters for each audio track in the source media:
 
  • audioRole – Tells Wowza Streaming Engine to put the track into the same Adaptation Set as other renditions of the same role (main or dub).
  • audioLanguage – Tells Wowza Streaming Engine to put the track into the same Adaptation Set as other renditions with the same language. Set this parameter's value to a two- or three-letter language code.

You can also use the audioOnly parameter if the video from the stream isn't present or should be excluded from the MPD.

If your source media files or live stream contain multi-language video tracks, we recommend that you define the following SMIL file parameters for each video track that contains a different video language (or embedded captions):

  • videoRole – Tells Wowza Streaming Engine to put the track into the same Adaptation Set as other renditions of the same role (main or dub).
  • videoLanguage – Tells Wowza Streaming Engine to put the track into the same Adaptation Set as other renditions with the same language. Set this parameter's value to a two- or three-letter language code.

You can also use the videoOnly parameter if the audio from the stream isn't present or should be excluded from the MPD.

Selecting tracks with audioindex and videoindex

To select audio and video tracks within a source media file that contains multiple audio or video tracks, add the audioindex or videoindex query parameter to the stream URL in the <video> definition's src attribute as required. The attributes and parameters in the <video> definition will apply to the selected track.

The default value of each index parameter is 0 which uses the first track of each type. To use the second track of a track type, set that type's index to 1. To use the third track, set the index to 2, and so on. Using index parameters is only required if you want to use a track other than the first track of that type (and therefore are defining an index of 1 or higher).
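
For example, a rendition that uses the second video track of a source file might be defined as follows (the file name, dimensions, and bitrate are placeholder values):

<video src="mp4:myVideo.mp4?videoindex=1" width="656" height="274">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoBitrate" value="800000" valuetype="data"/>
</video>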
 
Note: You must enable the ModuleMultiTrackVOD module to select specific audio and video tracks using audioindex and videoindex.

VOD SMIL file examples


Selecting multiple audio tracks from a single file

The following example uses a file containing a single video track and two separate audio tracks. Specific audio or video tracks are selected from the source media file using the audioindex and videoindex query parameters in the src attribute.

single-file-multilang-audio.smil

<smil>
<head>
</head>
<body>
<switch>
<video src="mp4:myVideo.mp4?audioindex=0" title="English">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="128000" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>
<video src="mp4:myVideo.mp4?audioindex=1" title="Deutsch">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="128000" valuetype="data"/>
<param name="audioLanguage" value="de" valuetype="data"/>
</video>
<video src="mp4:myVideo.mp4" width="656" height="274">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="800000" valuetype="data"/>
</video>
</switch>
</body>
</smil>
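
To test playback, point an MPEG-DASH player at the manifest for the SMIL file rather than at an individual media file. Assuming a default installation and an application named vod, the playback URL would look something like the following (the server address, application name, and SMIL file name are placeholders):

http://[wowza-ip-address]:1935/vod/smil:single-file-multilang-audio.smil/manifest.mpd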

Selecting multiple audio tracks from different files

The following example uses a single video-only file and two separate audio-only files. All files are timecode-aligned.

multi-file-multilang-audio.smil

<smil>
<head>
</head>
<body>
<switch>
<video src="mp4:audio_eng.mp4" title="English">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="main" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="128000" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>
<video src="mp4:audio_de.mp4" title="Deutsch">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="dub" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="128000" valuetype="data"/>
<param name="audioLanguage" value="de" valuetype="data"/>
</video>
<video src="mp4:video.mp4" width="656" height="274">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="800000" valuetype="data"/>
</video>
</switch>
</body>
</smil>

Selecting multiple audio and video tracks from different files

The following example uses a set of four files containing audio and video tracks. In this example, only the audio from two of the files (media1.mp4 and media2.mp4) and the video from the other two files (media3.mp4 and media4.mp4) are made available for playback.

multi-file-multi-audio-multi-video.smil

<smil>
<head>
</head>
<body>
<switch>
<video src="mp4:media1.mp4" title="English">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="128000" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>
<video src="mp4:media2.mp4" title="Deutsch">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="128000" valuetype="data"/>
<param name="audioLanguage" value="de" valuetype="data"/>
</video>
<video src="mp4:media3.mp4" width="1280" height="720">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="1500000" valuetype="data"/>
</video>
<video src="mp4:media4.mp4" width="656" height="274">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="800000" valuetype="data"/>
</video>
</switch>
</body>
</smil>

Live SMIL file example


Selecting multiple audio tracks

The following example uses an MPEG-TS stream that has two audio tracks and one video track to create an MPEG-DASH MPD with multiple language tracks. Each audio and video track in the source stream requires a separate .stream file, as well as SMIL files to start and play the set of .stream files as a group.

Creating .stream and SMIL files for stream startup

First, create a separate .stream file for each audio and video track in the source stream. (If the source is a single MPEG-TS stream, you must enable port sharing.) Each .stream file must list an audio packet identifier (PID) and video PID for the rendition. If a PID isn't listed, the first one of each type is used. To reduce the number of packetizers created, you can create separate audio-only and video-only renditions. To create either an audio-only or video-only rendition, set the other PID to a non-existent value (1 or 0).

main.stream - English and main video

{
  uri : "udp://0.0.0.0:10004",
  mpegtsVideoPID : "0x104",
  mpegtsAudioPID : "0xfc"
}

de.stream - German and main video

{
  uri : "udp://0.0.0.0:10004",
  mpegtsVideoPID : "0x104",
  mpegtsAudioPID : "0xfd"
}
Note: If the video isn't already encoded as H.264 or the audio isn't already encoded as AAC, the streams must be transcoded to these codecs before they can be used. See Transcoding below for more information.

The separate .stream files should be started as a group so that they all start packetizing at the same time. To do this, create a SMIL file (startup.smil) to start the streams. The following example SMIL file contains the two separate audio streams (English and German) and two video streams that contain the same main video track.

startup.smil

<smil>
<head></head>
    <body>
        <switch>
            <video src="mp4:main.stream" system-bitrate="2380678">
            </video>
            <video src="mp4:de.stream" system-bitrate="2380678">
            </video>
        </switch>
    </body>
</smil>

This SMIL file is used to start the streams as a group by using the Startup Streams page found in the Server options in Wowza Streaming Engine Manager. It also allows the streams to be reset as a group if there's a problem. However, this SMIL file isn't used to play the streams.

Creating the SMIL file to play the streams

Create a second SMIL file to contain the additional information required to play the streams that were started by the first SMIL file.

The following example SMIL file contains the video track and the English audio track from the main.stream stream and the Deutsch audio track from the de.stream stream. The English audio track is assigned the main role, and the Deutsch audio track is assigned the dub role.

alternative-audio.smil

<smil>
<head>
</head>
<body>
<switch>
<video src="mp4:main.stream" title="English and video" width="768" height="432">
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="2274910" valuetype="data"/>
<param name="audioRole" value="main" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="105768" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>
<video src="mp4:de.stream" title="Deutsch">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="dub" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="105768" valuetype="data"/>
<param name="audioLanguage" value="de" valuetype="data"/>
</video>
</switch>
</body>
</smil>

The following example SMIL file is an alternative to the SMIL file above. In the SMIL file below, the audio and video from main.stream are defined as separate renditions: one configures the audio parameters of main.stream and the other configures its video parameters. Both SMIL file examples create the same result in the MPD.

alternative-audio-split.smil

<smil>
<head>
</head>
<body>
<switch>
<video src="mp4:main.stream" title="English">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="main" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="105768" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>
<video src="mp4:de.stream" title="Deutsch">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="dub" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="105768" valuetype="data"/>
<param name="audioLanguage" value="de" valuetype="data"/>
</video>
<video src="mp4:main.stream" title="video" width="768" height="432">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="2274910" valuetype="data"/>
</video>
</switch>
</body>
</smil>

Transcoding

You can also use Transcoder to provide adaptive bitrate renditions alongside alternative audio or video tracks. Each track that requires transcoding only needs to be transcoded once; you can then combine the transcoded tracks with the original tracks in the SMIL file.

alternative-audio-abr.smil

<smil>
<head>
</head>
<body>
<switch>
<video src="mp4:main.stream" title="English-high">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="main" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="105768" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>
<video src="mp4:main2.stream" title="English-low">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="main" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="98304" valuetype="data"/>
<param name="audioLanguage" value="en" valuetype="data"/>
</video>
<video src="mp4:de.stream"title="Deutsch">
<param name="audioOnly" value="TRUE" valuetype="data"/>
<param name="audioRole" value="dub" valuetype="data"/>
<param name="audioCodecId" value="mp4a.40.2" valuetype="data"/>
<param name="audioBitrate" value="105768" valuetype="data"/>
<param name="audioLanguage" value="de" valuetype="data"/>
</video>
<video src="mp4:main.stream" title="Main" width="768" height="432">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="2274910" valuetype="data"/>
</video>
<video src="mp4:main.stream_360p" title="Main" width="640" height="360">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="1800000" valuetype="data"/>
</video>
<video src="mp4:main.stream_240p" title="Main" width="320" height="240">
<param name="videoOnly" value="TRUE" valuetype="data"/>
<param name="videoCodecId" value="avc1.4d400c" valuetype="data"/>
<param name="videoBitrate" value="800000" valuetype="data"/>
</video>
</switch>
</body>
</smil>

This SMIL file has two extra video tracks that were transcoded (main.stream_360p and main.stream_240p). The SMIL file also contains an extra audio-only track that provides a low-bitrate English audio stream. This SMIL file creates an MPEG-DASH MPD with one English audio Adaptation Set containing two bitrate renditions, one Deutsch audio Adaptation Set containing a single bitrate rendition, and one video Adaptation Set containing three bitrate renditions. A player or viewer can combine any of these audio renditions with any of the video renditions.