Video Formats for On-Demand Streaming
When it comes to delivering edited content, whether it’s for playback in your house of worship, for over-the-air (OTA) broadcast, or even just for over-the-top (OTT) delivery, there are three critical factors to consider: collections, compression, and container formats.
As a contributing editor for Streaming Media Magazine over the past 22 years, and as a film and non-linear video editor for almost a decade prior to that, I’ve seen, used, and written about dozens of compression solutions. Every few years we get an advance in standards-based video compression, and a contribution to the growing glut of acronyms, from the newest AV1 and VVC to older 3GPP and even older H.261 or Apple Animation codecs.
Despite this, there’s been surprisingly little work on video container formats in that same time period. This blog post explores both the background and future of container formats. From there, I’ll suggest an approach to storing content in a way that lessens the transcoding and repackaging requirements of audio and video tracks for streaming playout.
Why Do Video Container Formats Matter?
From a consumer standpoint, the fact that the majority of file formats used in 2020 to store video are more than two decades old is good news, since these container formats are understood by operating systems as far back as Windows Vista and the original Mac OS X, as well as every major smartphone and tablet operating system since the Palm OS. It’s also beneficial from a collections (or archiving) standpoint, with organizations such as the National Archives and its Video Preservation Lab recommending standard-definition (SD) videos be stored uncompressed in the Audio Video Interleave (AVI) container file.
From a content delivery standpoint, though, the choice of a container format is almost as important as the video compression contained inside the video file. That’s because the wrong compression-container pairing can add significant expense to an OTT workflow.
For instance, the AVI video format mentioned above — which shouldn’t be confused with the more recent AV1 open-source video codec — could contain over fifteen different compression schemes. In addition, there are Type 1 and Type 2 AVI containers, with most non-linear editing solutions only supporting Type 1, as well as open-sourced versions known as AVI 2.0 containers. The QuickTime MOV file extension is another container format that can today support over fifty audio, still image, and video compression schemes — some of which are almost 30 years old (JPEG is one example).
The Importance of Matching the Container Format and Codec
Even if you’re using an industry-standard video codec like Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), a codec-container mismatch creates issues for the streaming protocol in use, be it Apple HTTP Live Streaming (HLS) or MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH).
For both emerging and legacy codecs, the wrong container could necessitate transcoding, re-containerizing, and then repackaging the content before it can be streamed. While this is not a showstopper (thanks to transcoding engines in products like the Wowza Streaming Engine), it does present a critical inflection point in the workflow process. The tradeoff comes when a content owner needs to factor in a significant computational burden for transcoding, as well as increased storage capacity on every delivery server, for every piece of content that’s requested.
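To make that inflection point concrete, here is a minimal sketch of the decision a workflow has to make for each stored file. The compatibility sets and the function name are hypothetical and purely illustrative; real support depends on the player, the protocol version, and the media server in use.

```python
# Hypothetical compatibility sets, for illustration only.
STREAMABLE_CODECS = {"h264", "hevc", "aac"}
STREAMABLE_CONTAINERS = {"mp4"}


def required_steps(video_codec, audio_codec, container):
    """Return the processing steps needed before a file can be streamed."""
    steps = []
    codecs_ok = (video_codec in STREAMABLE_CODECS
                 and audio_codec in STREAMABLE_CODECS)
    if not codecs_ok:
        # Wrong codec: the expensive path, every frame must be re-encoded.
        steps.append("transcode")
    if not codecs_ok or container not in STREAMABLE_CONTAINERS:
        # Wrong container (or freshly transcoded tracks): rewrap the tracks.
        steps.append("re-containerize")
    # Segmenting for HTTP delivery is always needed.
    steps.append("package")
    return steps


print(required_steps("h264", "aac", "mp4"))
print(required_steps("h264", "aac", "avi"))
print(required_steps("mpeg2video", "mp3", "avi"))
```

Under these assumptions, only the first file skips straight to packaging. The second needs to be rewrapped, and the third incurs the full computational cost of a transcode.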
How Do You Choose the Best Video Format?
So what’s the best approach when it comes to storing content for on-demand streaming? The simple answer is to think about which container formats are best suited for segmented streaming delivery over HTTP.
QuickTime MOV: The Trailblazer
I mentioned QuickTime’s MOV above as a legacy container format, but it turns out that it was a robust enough container format for Apple to use in the iTunes Store during its initial forays into AVC video encoding and AAC audio encoding. Apple shifted from just MOV (which could contain audio, video, interactive elements, and subtitling) to the more descriptive container formats of M4V for video and M4A for audio. Along the way, the company added digital rights management (DRM), which I’ll cover in another blog post next month. This robustness led to significant adoption beyond just iTunes and the Mac operating system.
The QuickTime MOV container format was then selected by the ISO standards body to play a larger role in everything from OTT to localized playback. Thanks to the efforts of the ISO’s MPEG standards working group, the QuickTime MOV container format became what’s now known as the ISO Base Media File Format or MPEG-4 Part 12. This happened around the same time that MPEG adopted H.264 into the MPEG-4 standard as MPEG-4 Part 10, leading to an intended combination of AVC and the container format formerly known as QuickTime.
MP4: The Next Evolution
The container format was extended in MPEG-4 Part 14 to what is now known as MP4. Additional work has been done on two key benefits of MP4 — the ability to store multiple audio and video tracks within a single container, as well as to do byte-range addressing — to extend MP4 for use in streaming.
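Byte-range addressing is practical because an MP4 file is a sequence of self-describing "boxes," each starting with a 4-byte big-endian size and a 4-byte type code. The sketch below walks the top-level boxes of an in-memory byte string and reports each one's offset and length. The helper names and the tiny sample file are my own, built only for illustration; a real file would carry a populated moov box and actual media data.

```python
import struct


def iter_boxes(data):
    """Yield (type, offset, size) for each top-level ISO BMFF box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed box, stop rather than loop forever
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


def make_box(box_type, payload=b""):
    """Build a minimal box for the demo below (illustrative helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = (make_box(b"ftyp", b"isom\x00\x00\x02\x00")
          + make_box(b"moov")
          + make_box(b"mdat", b"\x00" * 32))

for box in iter_boxes(sample):
    print(box)
```

Because every box declares its own length, a server or client can work out exactly which bytes hold a given piece of the file without reading the rest.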
HTTP streaming works by sending hundreds or even thousands of small files from the media server to the client’s playback device. These small files initially were known as segments and needed to be pre-packaged into a legacy OTA container format known as an MPEG-2 Transport Stream (MPEG-TS, or M2TS, or even .ts for short), which itself is based on a legacy telecom protocol known as ATM.
Because of the legacy nature of ATM (and the subsequent adoption of transport streams by satellite providers to deliver video uplinks and downlinks), the packet sizes contained a significant amount of overhead (additional data needed to confirm that the packets had been delivered) and were better suited for static playback on a living room television set rather than the dynamic nature of OTT streaming where hundreds of different software players could add significant complexity.
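As a rough illustration of that overhead, the following sketch parses the fixed 4-byte header that begins every 188-byte transport stream packet and computes the share of each packet that header consumes. The sample packet is fabricated for the demo, and the figure is only a lower bound: PES headers, adaptation-field stuffing, and the program tables repeated throughout the stream add more.

```python
TS_PACKET_SIZE = 188  # every transport stream packet is this long
TS_HEADER_SIZE = 4    # fixed header at the start of each packet
SYNC_BYTE = 0x47      # first byte of every packet


def parse_ts_header(packet):
    """Decode a few fields from the fixed header of one TS packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        # The 13-bit packet identifier spans bytes 1 and 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "continuity_counter": packet[3] & 0x0F,
    }


def min_header_overhead():
    """Fraction of each packet used by the fixed header alone."""
    return TS_HEADER_SIZE / TS_PACKET_SIZE


# Fabricated packet: header bytes followed by 184 bytes of empty payload.
packet = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
print(parse_ts_header(packet))
print(f"{min_header_overhead():.1%}")
```

Roughly 2% may not sound like much, but it is paid on every packet of every rendition, before any of the other per-stream signaling is counted.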
Fragmented MP4: Today’s Standard
That’s where MP4 byte-range addressing comes into play, in what is now known as fragmented MP4 (fMP4). This solution — where the elementary audio and video files remain intact, but the streaming engine repackages small segments of the requested audio and video tracks just in time for delivery — was the basis of Adobe HTTP Dynamic Streaming and Microsoft Smooth Streaming, which eventually formed the basis for MPEG-DASH.
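In a fragmented MP4, media is stored as repeating pairs of moof (fragment metadata) and mdat (media data) boxes after the initial ftyp and moov boxes. A minimal sketch of how those pairs map to HTTP byte ranges follows. The box layout here is invented for illustration, and the function assumes each mdat immediately follows its moof.

```python
def fragment_ranges(layout):
    """Return an HTTP Range value for each moof+mdat pair in a box layout.

    layout is a list of (type, offset, size) tuples in file order.
    """
    ranges = []
    for (kind, offset, size), (next_kind, _, next_size) in zip(layout, layout[1:]):
        if kind == "moof" and next_kind == "mdat":
            # HTTP byte ranges are inclusive, hence the minus one.
            end = offset + size + next_size - 1
            ranges.append(f"bytes={offset}-{end}")
    return ranges


# Invented layout: an init section followed by two fragments.
layout = [
    ("ftyp", 0, 24),
    ("moov", 24, 700),
    ("moof", 724, 100),
    ("mdat", 824, 5000),
    ("moof", 5824, 100),
    ("mdat", 5924, 4800),
]

for value in fragment_ranges(layout):
    print({"Range": value})
```

Each value can be sent as the Range header of an ordinary HTTP GET, so a single stored file can serve as the source for many small segment requests.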
One key benefit of this fMP4 approach is what’s called “late binding,” where all the audio and video fragments can be delivered separately from one another to the client playback device and then bound together just prior to playback. Besides making it easier to deliver alternate audio or video tracks from within the MP4 container, this also significantly reduces the dozens of permutations of bitrates, video files, and multiple audio tracks that plagued early adopters of MPEG-TS-based HLS streaming.
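The arithmetic behind that reduction is simple. When audio and video are multiplexed together, every combination has to be packaged as its own stream; with late binding, each track is packaged once. The rendition counts below are hypothetical.

```python
def muxed_variants(video_renditions, audio_tracks):
    """Streams to package when every audio/video pairing is multiplexed."""
    return video_renditions * audio_tracks


def late_bound_variants(video_renditions, audio_tracks):
    """Streams to package when tracks are delivered separately."""
    return video_renditions + audio_tracks


# Hypothetical ladder: six video bitrates and three audio languages.
print(muxed_variants(6, 3))       # 18
print(late_bound_variants(6, 3))  # 9
```

The gap widens with every rendition or language added, since one side of the comparison grows multiplicatively and the other only additively.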
Ironically, even though Apple’s QuickTime formed the basis of MP4 and Apple had a role on the committee that approved MPEG-DASH, it wasn’t until several years after fMP4 was in use that Apple shifted HLS away from MPEG-TS to a fragmented MP4 approach.
As such, we’re now at a point in the industry where fragmented MP4 forms the basis of all current standards-based OTT on-demand streaming solutions that use HTTP for delivery. It’s worth noting that MP4 is not limited to AVC on the video compression front. Not only can HEVC (H.265 or MPEG-H Part 2) be used in an MP4 container format, but legacy MPEG-1 and MPEG-2 video codecs can also be used.
Likewise, audio from these legacy standards (MPEG-2 Part 3 or what we’d typically call MP3 files these days) can also be used in the MP4 format, alongside newer audio codecs such as Advanced Audio Coding (AAC) and its high-efficiency derivatives.
What About AV1, CMAF, and Low Latency?
Traci Ruether has an excellent blog post on the state of low latency, in which she touches on both CMAF and low-latency efforts by Apple to shorten the segment lengths in HLS — thereby allowing for faster startup times and more rapid shifts between bitrates.
I’ll cover live streaming formats in a blog post next month as well, hopefully tying together the benefits of fragmented MP4 with the need for speedy encoding and delivery that’s quickly becoming table stakes for live event streaming servers.
AV1 has been engineered to fit within the ISO Base Media File Format, so it can be contained in an MP4 container, but it behaves differently from standards-based MPEG codecs and has limited support in media servers. As AV1 gains traction, we’ll revisit the topic of additional container formats, such as Matroska (MKV), which has mainly been used for bit-for-bit copies of content for localized playback but holds potential for streaming.
About Tim Siglin
Tim Siglin, who has over two decades of streaming media design and consulting experience, and an additional 10 years in video conferencing and media production, has written for Streaming Media magazine and other publications for 23 years. He has an MBA in International Entrepreneurship and currently serves as the founding executive director of Help Me Stream Research Foundation, a 501(c)(3) dedicated to assisting NGOs in emerging markets with the technologies needed to deliver critical educational messages to under-served populations.