Encoding Ladders: What You Need to Know
August 23, 2022
This article introduces the concept of an encoding ladder: what it is, what it does, and how to create one. It concludes with a look at the finer details of creating and deploying encoding ladders.
Meet Your Encoding Ladder
All live and video-on-demand (VOD) streaming experiences begin with a single source file. To distribute that file, producers transcode it into multiple files, each optimized for viewers watching on different devices and at different connection speeds. This set of files, each a “rung” with its own bitrate and resolution, is called an encoding ladder, and during playback the player switches among rungs as the viewer’s bandwidth changes. All adaptive bitrate (ABR) technologies, like Apple’s HTTP Live Streaming (HLS) and the standards-based MPEG-DASH, use encoding ladders to distribute live and VOD content.
Over the years, Apple has done a fabulous job documenting and presenting sample encoding ladders, starting with the venerable Tech Note TN2224 (now taken down) and continuing today in the HTTP Live Streaming (HLS) Authoring Specification for Apple Devices. The ladder shown in Table 1 is the H.264 ladder from the Apple document, and it’s the most common starting point for new streamers creating their first encoding ladder.
Encoding Ladders with Wowza Products
You create encoding ladders whenever you stream using HLS or DASH with Wowza Video or Wowza Streaming Engine. For VOD experiences, you create the ladder yourself and upload it to your Wowza product. For live, you can either create the streams in your own encoder and deliver them to the server, or use Wowza Transcoder to create the encoding ladder from your source. Figure 1 shows the second approach: the Transcoder ingests a live stream and creates an encoding ladder with the source as the top rung.
At a high level, creating an encoding ladder involves four steps.
1. Setting the floor.
2. Setting the ceiling.
3. Choosing intermediate bitrates.
4. Choosing resolutions for all rungs.
Setting the Floor
The floor is the lowest bitrate you intend to support. In the Apple ladder it’s 145 kbps, which delivers pretty low-quality video. Some producers set the floor higher, essentially concluding that if they can’t deliver at least a reasonable-quality experience, they won’t bother.
If you’re currently streaming and have access to delivery logs, check how frequently your lowest-quality streams are being retrieved. If you seldom deliver streams below 500 kbps, you might set your floor at that level. If you’re delivering lots of streams below that number, you probably want to keep supporting those viewers.
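As a rough sketch of that log check, the Python below assumes you can extract the rung bitrate (in kbps) of each delivered segment from your logs. The log format, the `share_below` and `suggest_floor` helpers, and the 2% cutoff are all hypothetical choices for illustration, not a Wowza feature.

```python
def share_below(delivered_kbps: list[int], threshold_kbps: int) -> float:
    """Fraction of delivered segments whose rung bitrate is below threshold_kbps."""
    if not delivered_kbps:
        return 0.0
    below = sum(1 for rate in delivered_kbps if rate < threshold_kbps)
    return below / len(delivered_kbps)


def suggest_floor(delivered_kbps: list[int], candidate_kbps: int,
                  current_floor_kbps: int, max_share: float = 0.02) -> int:
    """Raise the floor to candidate_kbps only if few deliveries fall below it.

    max_share (2% by default) is a hypothetical cutoff; choose one that fits
    your audience and business.
    """
    if share_below(delivered_kbps, candidate_kbps) <= max_share:
        return candidate_kbps
    return current_floor_kbps


# Hypothetical log extract: the rung bitrate (kbps) of each delivered segment.
log_sample = [145, 1100, 2000, 3000, 4500, 6000, 7800, 7800, 7800, 6000] * 10
print(share_below(log_sample, 500))                            # 0.1
print(suggest_floor(log_sample, 500, current_floor_kbps=145))  # 145
```

Here 10% of deliveries fall below 500 kbps, so the sketch keeps the existing 145 kbps floor rather than stranding those viewers.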
Setting the Ceiling
The ceiling is the highest bitrate you want to support; Apple’s ladder peaks at 7800 kbps. Not only is the top rung the most expensive rung to deliver, but in many regions, like the US, Europe, and Scandinavia, it’s almost always the stream delivered to the most viewers. For this reason, economics almost always dictate your top rate: whatever your monetization plan, you need to make sure you can afford to distribute at this rate.
In a useful example, Amazon estimates the cost of distributing an 8 Mbps feed at just under $0.30 per hour. Your cost will vary, but you should determine it and make sure it’s affordable when setting the ceiling.
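The arithmetic behind an estimate like this is simple: bitrate times seconds gives bits, which convert to gigabytes, which you multiply by your per-GB delivery price. This minimal sketch assumes decimal gigabytes, ignores protocol overhead, and uses a placeholder price of $0.08/GB (an assumption, not a quoted CDN or Amazon rate); substitute the price from your own contract.

```python
def gb_per_hour(bitrate_mbps: float) -> float:
    """Decimal gigabytes transferred when one viewer watches one hour at bitrate_mbps."""
    megabits = bitrate_mbps * 3600   # Mbps x seconds in an hour
    return megabits / 8 / 1000       # megabits -> megabytes -> gigabytes


def cost_per_viewer_hour(bitrate_mbps: float, price_per_gb: float) -> float:
    """Delivery cost of one viewer-hour at the given bitrate and per-GB price."""
    return gb_per_hour(bitrate_mbps) * price_per_gb


print(gb_per_hour(8))                            # 3.6
print(round(cost_per_viewer_hour(8, 0.08), 3))   # 0.288 with the placeholder price
```

Multiplying the per-viewer-hour figure by the viewer-hours you expect at the top rung shows whether a given ceiling fits your budget.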
Choosing Intermediate Bitrates
Let’s say you set the floor at 500 kbps and the ceiling at 5 Mbps. How do you choose the bitrates for the rungs in between? The general rule is to keep each “jump” between adjacent rungs to no more than 1.5x to 2x. So the rung directly below the 5 Mbps top rung should fall anywhere from 2.5 Mbps (5 ÷ 2) to 3.333 Mbps (5 ÷ 1.5).
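The window for each lower rung follows directly from the jump rule: divide the bitrate above by the largest allowed jump to get the lowest option, and by the smallest allowed jump to get the highest. A small sketch (the function name is hypothetical):

```python
def next_rung_range(upper_kbps: float, min_jump: float = 1.5,
                    max_jump: float = 2.0) -> tuple[float, float]:
    """Bitrate window for the rung directly below upper_kbps.

    A jump of max_jump gives the lowest acceptable bitrate;
    a jump of min_jump gives the highest.
    """
    return upper_kbps / max_jump, upper_kbps / min_jump


low, high = next_rung_range(5000)
print(low, round(high, 1))  # 2500.0 3333.3
```

Applying the same function to whichever bitrate you pick for that rung gives the window for the next one down, and so on until you reach the floor.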
The smaller the jump (say, 1.5x), the more rungs you produce, which increases encoding and storage costs but may slightly improve the quality of experience, since each viewer is more likely to receive a rung close to their available bandwidth. The larger the jump (say, 2x), the lower your encoding and storage costs, but the quality of experience may drop slightly.
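To see the cost side of that tradeoff, you can compute how many rungs a given maximum jump requires between the floor and the ceiling: the number of steps is the logarithm of ceiling/floor in base max_jump, rounded up. This sketch uses the 500 kbps to 5 Mbps example; `rungs_needed` is a hypothetical helper.

```python
import math


def rungs_needed(floor_kbps: float, ceiling_kbps: float, max_jump: float) -> int:
    """Fewest rungs spanning floor..ceiling with no jump larger than max_jump."""
    steps = math.log(ceiling_kbps / floor_kbps) / math.log(max_jump)
    # Subtract a tiny epsilon so exact powers (e.g. 8x range with 2x jumps)
    # aren't rounded up by floating-point noise.
    return math.ceil(steps - 1e-9) + 1


for jump in (1.5, 1.75, 2.0):
    print(jump, rungs_needed(500, 5000, jump))
# 1.5 7
# 1.75 6
# 2.0 5
```

Going from 2x to 1.5x jumps adds two rungs for this range, and each extra rung is another file to encode and store.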
Table 2 shows a reasonable middle ground between 1.5x and 2x. Nothing magical here, just working the numbers and rounding to even bitrates.
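One way to “work the numbers” is to space the rungs evenly on a log scale, so every jump is the same size, and then round. The sketch below is an illustration under those assumptions, not a reconstruction of Table 2: it uses five rungs (the fewest that keep every jump at or under 2x between 500 kbps and 5 Mbps) and rounds intermediate rungs to the nearest 100 kbps, both arbitrary choices.

```python
def geometric_ladder(floor_kbps: int, ceiling_kbps: int, rungs: int,
                     round_to: int = 100) -> list[int]:
    """Space rungs evenly on a log scale, rounding intermediate rungs to round_to kbps."""
    if rungs < 2:
        raise ValueError("need at least two rungs")
    ratio = (ceiling_kbps / floor_kbps) ** (1 / (rungs - 1))  # constant jump
    ladder = [floor_kbps]
    for i in range(1, rungs - 1):
        ladder.append(round(floor_kbps * ratio ** i / round_to) * round_to)
    ladder.append(ceiling_kbps)
    return ladder


ladder = geometric_ladder(500, 5000, rungs=5)
print(ladder)  # [500, 900, 1600, 2800, 5000]
jumps = [round(upper / lower, 2) for lower, upper in zip(ladder, ladder[1:])]
print(jumps)   # [1.8, 1.78, 1.75, 1.79]
```

The unrounded jump is about 1.78x; rounding nudges individual jumps up or down, so it’s worth rechecking them afterward, as the last two lines do.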
Choosing the Resolutions for All Rungs
You want your top rung to deliver the best experience. As long as your top bitrate is about 4 Mbps or higher, you should be able to deliver very good quality at 1080p. Below that, consider 900p (1600×900) or even 720p. From there, decrease resolutions as shown in Table 2 until you reach a minimum of about 480×270. There’s seldom any reason to go lower than this.
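Mapping bitrates to resolutions can be expressed as a simple lookup. In this sketch, only the roughly 4 Mbps threshold for 1080p and the 480×270 minimum come from the guidance above; the intermediate cut points are hypothetical, and you could add a 1600×900 band if your top rung sits below 4 Mbps.

```python
# Hypothetical bitrate bands: (minimum kbps, (width, height)), highest first.
# The 4000 kbps threshold for 1080p follows the guidance above; the other cut
# points are illustrative assumptions, not Apple's or Wowza's values.
BANDS = [
    (4000, (1920, 1080)),
    (2500, (1280, 720)),
    (1200, (960, 540)),
    (700, (640, 360)),
    (0, (480, 270)),  # floor resolution; seldom worth going lower
]


def resolution_for(bitrate_kbps: int) -> tuple[int, int]:
    """Return the resolution of the highest band this bitrate reaches."""
    for min_kbps, resolution in BANDS:
        if bitrate_kbps >= min_kbps:
            return resolution
    return BANDS[-1][1]  # safety net for nonsensical (negative) bitrates


for rate in [500, 900, 1600, 2800, 5000]:
    width, height = resolution_for(rate)
    print(f"{rate} kbps -> {width}x{height}")
```

For the five-rung example this yields 480×270, 640×360, 960×540, 1280×720, and 1920×1080 from bottom to top.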
You may want to consider the source content when setting these resolutions. For example, hard-to-encode sports content might look better at slightly smaller resolutions, while easy-to-encode animation will likely look better at larger resolutions. This leads to the next point: per-title encoding.
Per-Title Is Better
Per-title encoding (also called content-aware or context-adaptive encoding) creates a unique encoding ladder for each video file. Netflix debuted this technique in late 2015, and it’s without question the most efficient way to encode and distribute video.
Why? Because each video file is unique. You see this in Figure 2, from the Netflix announcement. Using the peak signal-to-noise ratio (PSNR) metric as a quality gauge, the figure shows that some files exceed 48 dB at well under 2.5 Mbps (the High-Quality files at the top), while others don’t reach 38 dB even at 20 Mbps (the Low-Quality files at the bottom).
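PSNR compares the encoded frame with the source sample by sample: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible sample value (255 for 8-bit video) and MSE is the mean squared error. The toy function below applies that formula to flat lists of 8-bit samples rather than real decoded frames, which is enough to show the scale: an average error of one code value per sample works out to about 48 dB.

```python
import math


def psnr(reference: list[int], distorted: list[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


print(round(psnr([10, 20, 30, 40], [11, 19, 31, 39]), 2))  # 48.13
```

Real measurements run on full decoded frames and are typically averaged across the video, but the logarithmic scale is the same: each 10 dB drop means a tenfold increase in mean squared error.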
Use an encoding ladder ideal for high-quality files and your low-quality files will look terrible. Create an encoding ladder that optimizes the quality of your low-quality files, and you’ll waste bandwidth on all other files. As Netflix concluded, “Given this diversity, a one-size-fits-all scheme obviously cannot provide the best video quality for a given title and member’s allowable bandwidth.”
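A heavily simplified way to see why per-title pays off: given trial encodes of each title, pick the top rung as the lowest bitrate that reaches a target quality, capped at your ceiling. This is not Netflix’s algorithm, and the titles, rate-quality points, and 38 dB target below are made up for illustration.

```python
# Hypothetical rate-quality measurements per title: (bitrate_kbps, PSNR_dB),
# e.g. from trial encodes. Values are illustrative, not Netflix's data.
TRIALS = {
    "easy_animation": [(1000, 44.0), (2000, 47.5), (3000, 48.5), (5000, 49.5)],
    "hard_sports": [(1000, 33.0), (2000, 35.5), (3000, 37.0), (5000, 39.5)],
}


def top_rung_for(trials: list[tuple[int, float]], target_db: float,
                 ceiling_kbps: int) -> int:
    """Lowest trial bitrate reaching target_db, capped at ceiling_kbps."""
    for bitrate, quality in sorted(trials):  # ascending by bitrate
        if bitrate > ceiling_kbps:
            break
        if quality >= target_db:
            return bitrate
    return ceiling_kbps  # target not reached within budget: use the full ceiling


for title, trials in TRIALS.items():
    print(title, top_rung_for(trials, target_db=38.0, ceiling_kbps=5000))
# easy_animation 1000
# hard_sports 5000
```

The easy title hits the target at 1000 kbps, while the hard one needs the full 5000 kbps ceiling; a static ladder would give both the same top rung, wasting bandwidth on one or shortchanging the other.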
Apple’s Authoring Specification also recognizes this reality in a note accompanying its encoding ladders: “The above bit rates are initial encoding targets for typical content delivered via HLS. Apple recommends that you evaluate them against your specific content and encoding workflow, then adjust accordingly.”
Note that the opposite of per-title is static, as in “the producer eschewed per-title encoding and uses a static encoding ladder.” In terms of implementation, there are techniques for creating your own per-title encoding ladder, but they are generally cumbersome, time-consuming, and expensive. At this point, most encoding tools and cloud services offer a per-title encoding option; you just need to choose it and use it.
Different Codecs: Different Ladders
As a final note, understand that whether you opt for per-title or static, your encoding ladder should change based on the codec you’re using. If you check the Apple Authoring Specification, you’ll see that Apple recommends different ladders for H.264 and HEVC; the same holds true for VP9 and AV1, and will for newer codecs as they come along.
Why? More advanced codecs are typically more efficient, which means you can deliver higher resolutions at lower bitrates. They also tend to work most efficiently at higher resolutions, which means you can often eliminate the lowest resolutions altogether. If you compare Apple’s H.264 and HEVC ladders, you’ll see both effects. You might also check out this article, which helped convince Apple that its H.264 and HEVC ladders needed to be different.