Scalable Video Coding for WebRTC
January 12, 2023
The Web Real-Time Communications (WebRTC) technology promises ultra-low latency streaming, but with a catch (so to speak). The technology is famously difficult to scale, and the more developers have tried, the harder it’s been for them to hold on to the speed that makes up the core of WebRTC’s appeal. Enter: supplementary servers and adapted workflows that circumvent these roadblocks.
So, what does all of this have to do with Scalable Video Coding (SVC), a codec extension first released in 2007 (several years before WebRTC was initially launched)? The short story is that SVC was designed to improve stream adaptability while WebRTC was still a twinkle in Google's eye, and it has since shown great promise in addressing WebRTC's scalability concerns. For the long story, keep reading.
Table of contents
- What is Scalable Video Coding?
- Scalable Video Coding and WebRTC
- How Does Scalable Video Coding Work?
- Benefits of Scalable Video Coding
- Getting Started with Scalable Video Coding
What is Scalable Video Coding?
Let's pull back from WebRTC for a minute for a brief history lesson. SVC started as an extension for the H.264 (also known as MPEG-4 Part 10, or AVC) codec. As a video codec, H.264 compresses raw video data for storage and delivery across a network. Built into its 1s and 0s is a set of rules for how this data is compressed, including standards for frame rate and quality. To change the bitrate of this data for streaming to different devices, you'd need to decompress and recompress the data at different sizes. This is pretty standard for a media server, whose job often entails transcoding encoded files for improved playback. However, it takes time.
With the addition of SVC, H.264 can encode the raw video data in layers so that the stream can be "peeled back" to access lower bitrates without having to decompress the data. Basically, a single stream is sent and whittled away as needed for playback across different devices. No transcoding required.
Scalable Video Coding and WebRTC
Although H.264 is a supported video codec for WebRTC, SVC with H.264 is not. For whatever reason, SVC only became available to WebRTC with Google's own VP9 codec and, more recently, with the AV1 codec (developed by the Alliance for Open Media, of which Google is a founding member). And it's a good thing it did, as SVC's unique approach to creating adaptable streams using fewer resources lends itself well to the speed vs. scalability conundrum.
Of course, you'll need more than just the VP9 or AV1 video codec if you want to use SVC with WebRTC, unless your plan is to just send bloated streams at a single high bitrate to all playback devices. Typical peer-to-peer WebRTC does not require a media server, which is usually considered a bonus for smaller-scale streams. However, SVC requires a selective forwarding unit (SFU) media server to do the important work of peeling back the encoded stream and sending appropriate bitrates to playback devices for the best possible stream quality.
How Does Scalable Video Coding Work?
Okay, so we've been throwing a lot of jargon at you, and despite that, you probably already have a rough sense of how SVC promotes stream adaptability and scalability. That said, let's break it down to better understand how it differs from workflows with similar goals.
Imagine your stream in three parts: publisher, SFU, and a collection of playback devices.
The publisher refers to you or, rather, whatever device you're using to capture and encode your video. You can send this video to the SFU for distribution as one or multiple encoded streams, each encoded at a specific bitrate. In the case of SVC, you have a single stream encoded at multiple bitrate layers, such that if you were to peel back one layer, you'd still have an intact stream at a lower bitrate underneath.
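In a browser, this single layered stream can be requested through the W3C WebRTC-SVC extension, which adds a `scalabilityMode` field to each send encoding. The sketch below only builds the transceiver init object, so it runs outside a browser. The helper name `svcTransceiverInit` and the default `"L3T3"` mode (three spatial and three temporal layers) are our own assumptions, and which modes actually work depends on the browser and on VP9 or AV1 being negotiated.

```typescript
// Local stand-ins for the DOM types RTCRtpEncodingParameters and
// RTCRtpTransceiverInit, trimmed to the fields this sketch uses, so it
// compiles without the DOM typings.
interface SvcEncoding {
  scalabilityMode: string;
}

interface SendTransceiverInit {
  direction: "sendonly";
  sendEncodings: SvcEncoding[];
}

// Hypothetical helper. With SVC there is ONE encoded stream carrying every
// layer, so sendEncodings has exactly one entry.
function svcTransceiverInit(mode: string = "L3T3"): SendTransceiverInit {
  return {
    direction: "sendonly",
    sendEncodings: [{ scalabilityMode: mode }],
  };
}

const init = svcTransceiverInit();
console.log(JSON.stringify(init));

// In a browser page (assuming VP9 or AV1 ends up negotiated):
//   const pc = new RTCPeerConnection();
//   pc.addTransceiver(cameraTrack, init);
```

The point of the single entry is the contrast with simulcast: the layers live inside one stream rather than in separate encodings.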
This streaming onion is sent to the SFU. It’s the SFU’s job to peel away the stream layers, creating a stream suitable for optimal playback on a given device.
The SFU then sends the adjusted stream along to the appropriate playback device. Depending on the available bandwidth and other limitations of the target devices, some could receive a lower quality stream while others get the whole onion (so to speak).
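The SFU's per-viewer decision can be sketched as picking the highest layer that fits each viewer's estimated bandwidth. The layer names, bitrates, and bandwidth figures below are illustrative assumptions, and real SFUs also weigh packet loss, screen size, and layout; this shows only the core idea.

```typescript
// One decodable operating point of the layered stream. Bitrates are
// cumulative: forwarding a layer means forwarding it plus everything beneath it.
interface Layer {
  name: string;
  cumulativeKbps: number;
}

// Hypothetical three-layer ladder, sorted from lowest to highest (illustrative numbers).
const layers: Layer[] = [
  { name: "low (320x180)", cumulativeKbps: 150 },
  { name: "medium (640x360)", cumulativeKbps: 500 },
  { name: "high (1280x720)", cumulativeKbps: 1500 },
];

// Return the highest layer whose cumulative bitrate fits the estimate.
// Falls back to the base layer, the smallest stream the SFU can forward,
// when nothing fits. Assumes the ladder is sorted ascending.
function selectLayer(ladder: Layer[], estimatedKbps: number): Layer {
  let chosen = ladder[0];
  for (const layer of ladder) {
    if (layer.cumulativeKbps <= estimatedKbps) {
      chosen = layer;
    }
  }
  return chosen;
}

// Estimated downlink bandwidth per viewer, as the SFU might measure it.
const viewers: Array<[string, number]> = [
  ["phone", 400],
  ["laptop", 2000],
  ["tablet", 900],
];

for (const [viewer, kbps] of viewers) {
  console.log(`${viewer}: ${selectLayer(layers, kbps).name}`);
}
```

With these numbers the phone gets the base layer, the tablet the middle layer, and the laptop the whole onion.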
What Bitrate Layers Should I Use?
WebRTC bandwidth estimation allows publishers to determine target output resolutions based on the available bandwidth between them and a recipient's device. With an SFU sitting in the middle, your publisher can't see end-user bandwidth, but it can use the same tactics to determine a target output resolution based on the available bandwidth between it and the SFU. This target resolution becomes the top layer of the onion, and all other layers are determined by running that target resolution through an algorithm.
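The exact algorithm is encoder-specific, but a common scheme divides each spatial layer's dimensions by a fixed ratio. The WebRTC-SVC scalability modes such as `"L3T3"` use a 2:1 ratio between spatial layers, while the `"h"` variants (for example `"L3T3h"`) use 1.5:1. The sketch below assumes that scheme; the function name is hypothetical, and real encoders may round dimensions differently (for example to even numbers).

```typescript
interface Resolution {
  width: number;
  height: number;
}

// Derive spatial layer resolutions from the top (target) resolution by
// dividing by `ratio` once per step down. Returns the lowest layer first.
function spatialLayers(top: Resolution, count: number, ratio: number = 2): Resolution[] {
  const result: Resolution[] = [];
  for (let i = count - 1; i >= 0; i--) {
    const scale = Math.pow(ratio, i);
    result.push({
      width: Math.round(top.width / scale),
      height: Math.round(top.height / scale),
    });
  }
  return result;
}

const derived = spatialLayers({ width: 1280, height: 720 }, 3);
console.log(derived.map((r) => `${r.width}x${r.height}`).join(", "));
// → 320x180, 640x360, 1280x720
```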
What Does Lowering the Bitrate Really Mean?
Bits equal digital information. Bitrate refers to the rate at which bits are transferred over a network, typically measured in kilobits per second (Kbps). A higher video bitrate carries more information per second of video, which generally translates to higher quality.
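To make the units concrete, the small conversion below turns a bitrate into a data volume. It uses decimal units (1 kilobit = 1,000 bits, 1 byte = 8 bits), as bitrates are usually quoted; the 2,500 Kbps and 500 Kbps figures are just example values.

```typescript
// Data volume, in megabytes, of `seconds` of video at `kbps` kilobits per second.
function megabytesFor(kbps: number, seconds: number): number {
  const bits = kbps * 1000 * seconds;
  return bits / 8 / 1000000;
}

console.log(megabytesFor(2500, 60)); // one minute at 2,500 Kbps → 18.75
console.log(megabytesFor(500, 60));  // one minute at 500 Kbps → 3.75
```

Dropping from the higher to the lower example bitrate cuts the data a viewer must receive by a factor of five, which is the kind of saving a lower layer provides.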
That said, the media publisher isn’t just dropping chunks of video to create lower layers of the onion. The process for choosing how to reduce a video is more nuanced. To understand this, it helps to think of video information in three categories: spatial, temporal, and signal quality.
- Spatial Resolution – Think of this as the amount of spatial information in a given frame (i.e. pixels). The higher the spatial resolution, the clearer the image.
- Temporal Resolution – This is the amount of temporal information in a given video (i.e. frame rate). The higher the temporal resolution, the smoother the video.
- Signal Quality – Naturally, bits and frames contribute to the overall quality of a video. Signal quality specifically refers to fidelity, or the degree to which the original image is preserved.
When a stream encoded into layers needs to reach a lower bitrate, packets (information) are dropped according to these three measures. A given setup may prioritize certain factors over others or opt for a hybrid approach.
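In WebRTC, the layering a publisher asks for is named by a scalability mode string from the WebRTC-SVC spec, such as `"L3T3"` (three spatial, three temporal layers), `"L2T2h"` (1.5:1 spatial scaling), or `"L3T3_KEY"`. These strings describe only the spatial (`L`) and temporal (`T`) dimensions. The parser below is a hypothetical helper that handles just those layered forms. The frame-rate helper assumes the common dyadic structure in which each added temporal layer doubles the frame rate; that is typical, not guaranteed.

```typescript
interface ScalabilityMode {
  spatialLayers: number;
  temporalLayers: number;
  spatialRatio: number; // width/height ratio between adjacent spatial layers
}

// Parse layered modes such as "L3T3", "L2T2h", or "L3T3_KEY_SHIFT".
// Simulcast-style "S" modes are rejected because they are not one layered stream.
function parseMode(mode: string): ScalabilityMode {
  const m = /^L(\d)T(\d)(h)?(_KEY(_SHIFT)?)?$/.exec(mode);
  if (m === null) {
    throw new Error(`unsupported scalability mode: ${mode}`);
  }
  return {
    spatialLayers: Number(m[1]),
    temporalLayers: Number(m[2]),
    spatialRatio: m[3] ? 1.5 : 2,
  };
}

// Frame rate available when forwarding temporal layers 0..t, assuming each
// added temporal layer doubles the frame rate.
function temporalFrameRates(fullFps: number, temporalLayers: number): number[] {
  const rates: number[] = [];
  for (let t = 0; t < temporalLayers; t++) {
    rates.push(fullFps / Math.pow(2, temporalLayers - 1 - t));
  }
  return rates;
}

const mode = parseMode("L3T3");
console.log(mode);
console.log(temporalFrameRates(30, mode.temporalLayers)); // 7.5, 15, 30 fps
```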
Benefits of Scalable Video Coding
The magic of video streaming is that it can reach just about any device, anywhere. The challenge of video streaming is that it's often expected to reach any device, anywhere. Streaming experts are constantly searching for better and more reliable ways to tailor their streams to a variety of playback device and bandwidth limitations. So is SVC right for you, or should you explore a different method?
SVC and Adaptive Bitrate Streaming
Adaptive bitrate (ABR) streaming is considered by many to be the gold standard if you want to stream to a high volume of varied devices. This method transcodes the source into several renditions at different bitrates, each split into short segments. As the player downloads segments, it measures the available bandwidth and requests ensuing segments at a higher or lower bitrate accordingly. This method is very effective for providing the highest possible quality without sacrificing stream reliability.
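The core of a throughput-based ABR decision can be sketched as follows (in HLS and DASH this logic typically lives in the player). The rendition ladder, the simple averaging, and the 0.8 safety factor are illustrative assumptions; production players use more elaborate estimators and buffer-based rules.

```typescript
interface Rendition {
  height: number;
  kbps: number;
}

// Hypothetical ladder produced by transcoding, sorted ascending (illustrative values).
const renditions: Rendition[] = [
  { height: 360, kbps: 800 },
  { height: 720, kbps: 2500 },
  { height: 1080, kbps: 5000 },
];

// Pick the rendition for the next segment from recent throughput samples.
// `safety` < 1 leaves headroom so a small dip does not stall playback.
function nextRendition(ladder: Rendition[], samplesKbps: number[], safety: number = 0.8): Rendition {
  // Simple mean of the samples; real players often smooth this further.
  const avg = samplesKbps.reduce((sum, s) => sum + s, 0) / samplesKbps.length;
  const budget = avg * safety;
  let choice = ladder[0]; // lowest rendition as the fallback
  for (const r of ladder) {
    if (r.kbps <= budget) {
      choice = r;
    }
  }
  return choice;
}

console.log(nextRendition(renditions, [3500, 3000, 3400]).height); // budget 2,640 Kbps → 720
```

Note that every rendition in the ladder has to be produced by transcoding first, which is where the latency discussed next comes from.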
For many use cases, this is a solid solution. However, proponents of WebRTC streaming find that live transcoding significantly increases latency. Since WebRTC's main selling point is speed, this presents a problem.
SVC, on the other hand, doesn't require transcoding. The SFU forwards or drops layers of already encoded data without decoding or re-encoding it. In short, it's faster than ABR, but not as nuanced, particularly where end-user bandwidth fluctuations are concerned.
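Why no transcoding is needed becomes clearer if SFU forwarding is modeled as a per-packet filter on layer IDs. The `SvcPacket` shape below is a simplified, hypothetical model: real SFUs read spatial and temporal IDs from the VP9 payload descriptor or the AV1 Dependency Descriptor header extension. A production SFU also rewrites RTP sequence numbers after dropping packets and waits for a suitable switch point (often a keyframe) before moving a viewer up a layer, both of which this sketch leaves out.

```typescript
// Simplified SVC RTP packet. The payload stays encoded end to end.
interface SvcPacket {
  seq: number;
  spatialId: number;  // 0 = lowest resolution
  temporalId: number; // 0 = base frame rate
  payload: Uint8Array;
}

// Forward only packets at or below the viewer's target layers. Nothing is
// decoded or re-encoded; lowering the target simply drops packets.
function forwardFor(packets: SvcPacket[], maxSpatial: number, maxTemporal: number): SvcPacket[] {
  return packets.filter((p) => p.spatialId <= maxSpatial && p.temporalId <= maxTemporal);
}

// Synthetic stream: two frames (temporal layer 0, then 1), each carrying
// three spatial layers, one packet per layer.
const stream: SvcPacket[] = [];
let seq = 0;
for (let t = 0; t < 2; t++) {
  for (let s = 0; s < 3; s++) {
    stream.push({ seq: seq++, spatialId: s, temporalId: t, payload: new Uint8Array(100) });
  }
}

// A viewer limited to the middle resolution at the base frame rate.
console.log(forwardFor(stream, 1, 0).map((p) => p.seq)); // seq 0 and 1
```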
SVC and WebRTC Simulcasting
WebRTC simulcasting works very similarly to SVC. They both involve sending a handful of bitrate options to an SFU media server, which selects the best possible bitrate for a given end-user device. However, WebRTC simulcasting doesn’t create a single layered stream. Instead, it creates a few different streams at a few different bitrates.
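For comparison with a single layered encoding, simulcast in the browser API is expressed as several entries in `sendEncodings`, each with its own `rid`, resolution scale, and bitrate cap. The sketch only builds that array; the `rid` names and bitrate caps are our own illustrative choices, and `simulcastEncodings` is a hypothetical helper.

```typescript
// Local stand-in for the DOM's RTCRtpEncodingParameters (subset used here).
interface SimulcastEncoding {
  rid: string;
  scaleResolutionDownBy: number;
  maxBitrate: number; // bits per second
}

// Three independently encoded streams, each a separate RTP stream.
function simulcastEncodings(): SimulcastEncoding[] {
  return [
    { rid: "q", scaleResolutionDownBy: 4, maxBitrate: 150000 },  // quarter resolution
    { rid: "h", scaleResolutionDownBy: 2, maxBitrate: 500000 },  // half resolution
    { rid: "f", scaleResolutionDownBy: 1, maxBitrate: 1500000 }, // full resolution
  ];
}

console.log(simulcastEncodings().map((e) => e.rid).join(","));

// In a browser page:
//   pc.addTransceiver(cameraTrack, { direction: "sendonly", sendEncodings: simulcastEncodings() });
```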
SVC and WebRTC simulcasting have similar benefits and workflow demands. However, encoding and sending three independent streams can take more publisher CPU and bandwidth than a single layered stream. As such, SVC is a more efficient, less resource-intensive method for achieving the same bandwidth options. That said, the bloated nature of a layered stream does incur a bitrate penalty. In other words, it may need to stream at a higher bitrate to maintain the same quality as a comparable single-layer stream. SVC is also limited with the VP8 codec for WebRTC, which supports temporal layers but not spatial ones.
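The publisher-side trade-off can be put into rough numbers: simulcast uploads every stream in full, while SVC uploads one layered stream whose size is about the top layer plus the coding overhead described above. The ladder and especially the 15% overhead are placeholder assumptions, not measured figures; the real penalty depends on codec, encoder, and content.

```typescript
// Upload needed for simulcast: the sum of all independently encoded streams.
function simulcastUploadKbps(streamsKbps: number[]): number {
  return streamsKbps.reduce((sum, k) => sum + k, 0);
}

// Upload needed for SVC: the top layer's bitrate plus an overhead percentage
// (the "bitrate penalty" of layering). Integer math keeps the result exact.
function svcUploadKbps(topLayerKbps: number, overheadPct: number): number {
  return (topLayerKbps * (100 + overheadPct)) / 100;
}

const ladderKbps = [150, 500, 1500]; // illustrative ladder
const overheadPct = 15;              // placeholder assumption

console.log(`simulcast: ${simulcastUploadKbps(ladderKbps)} Kbps`); // 2150
console.log(`SVC:       ${svcUploadKbps(1500, overheadPct)} Kbps`); // 1725
```

Under these assumptions SVC saves publisher bandwidth, but each viewer receiving the top layer pays the overhead, which is the penalty the paragraph above describes.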
Getting Started with Scalable Video Coding
Wowza's Real-Time Streaming at Scale solution makes WebRTC scalable to a million viewers across a multitude of platforms. In our workflow, publisher data is sent to our Wowza Video API. It then funnels through a custom content delivery network (CDN) for large-scale delivery. This CDN acts as an SFU, typically for a WebRTC simulcast workflow. However, our support for both the VP9 and AV1 codecs opens up WebRTC workflows to scalable video coding. Whatever your WebRTC needs, our video experts can get you on the right path.