HLS Latency Sucks, But Here’s How to Fix It
February 12, 2017 by
Since Android started to support HTTP Live Streaming (HLS) in its Honeycomb release, HLS has taken over the world for streaming video. But while HLS surges in popularity, we continue to hear complaints from customers about tuning and reducing latency. The basic message we hear: “HLS latency sucks.”
It’s not all doom and gloom. HLS has a lot of things going right, and it can’t be ignored or dismissed as a viable option for your streaming decisions.
First, HLS provides excellent quality. Remember when streaming sucked because it was always buffering? HLS solved that problem by using chunks to ensure that your stream can be played back seamlessly, in high quality, without causing the "Spinning Beach Ball of Death." Second, HLS reduced the cost of delivering content. Using affordable HTTP infrastructure, content owners could easily justify delivering their content to online audiences and expand potential viewership. HLS is also ubiquitous, delivering to more devices and players, which means that it’s cheaper for the consumer to watch without needing a specialized device. Whether you’re using iOS, Android, an HTML5 player, or even some set-top boxes (Roku, Apple TV, etc.), HLS streaming is available.
Lastly, HLS scales well. Like the MPEG-DASH (Dynamic Adaptive Streaming over HTTP) standard, HLS uses a packetized content distribution model that cuts and then reassembles video chunks based on the manifest file (HLS uses an .m3u8 playlist). It also provides CDNs and encoding/transcoding software providers with a relatively common platform to standardize across their infrastructure, and allows for edge-based adaptive bit rate (ABR) transcoding.
But for many of the same reasons that HLS is great, it also has faults when it comes to latency. The most likely sources of added latency in ABR delivery include encoding, transcoding, distribution, and the default playback buffer requirements for HLS.
Changing an adaptive stream in HLS requires a new buffer to be built. At the time of this article, Apple defaults to 10-second content chunks and a certain number of packets to create a meaningful playback buffer. This results in about 30 seconds of glass-to-glass delay from capture to final packet assembly. But when you introduce CDNs for greater scalability, you inject another 15-30 seconds of latency so the servers can cache the content in-flight, not to mention any last-mile congestion that might slow down a stream.
In some recent events streamed using HLS, such as Twitter sporting events and presidential debates, you could see a delay of up to 90 seconds by the time that the consumer watches something on his or her mobile device.
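The arithmetic behind these figures can be sketched roughly as follows. This is a back-of-the-envelope estimate, not a measurement: the three-chunk playback buffer is an assumption (the text only says "a certain number"), and the function name `estimate_hls_latency` is hypothetical.

```python
def estimate_hls_latency(chunk_seconds, buffered_chunks, cdn_seconds=0):
    """Rough delay estimate: the player's buffer plus any CDN caching delay."""
    return chunk_seconds * buffered_chunks + cdn_seconds


# Assumption: the player buffers three 10-second chunks before playback.
base = estimate_hls_latency(10, 3)

# The CDN range quoted above adds 15 to 30 seconds on top of that.
with_cdn_low = estimate_hls_latency(10, 3, cdn_seconds=15)
with_cdn_high = estimate_hls_latency(10, 3, cdn_seconds=30)

print(base, with_cdn_low, with_cdn_high)
```

Under the same assumption, one-second chunks would bring the buffer portion of the delay down to about three seconds, which is why chunk duration is the first thing to tune.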
HLS isn’t a viable option when interactivity or broadcast-like speed matters. Nobody wants to see spoilers in their Twitter feed while watching a game on their phone. Likewise, you don’t want large delays in interactivity with game streaming or UGC broadcasters, like in a Facebook Live or Twitch stream. That’s because consumers today expect their content to arrive as fast as satellite or cable feeds, regardless of what is realistic for a streaming app.
So, when latency and user experience matter, how can you tune HLS to make a difference?
With Wowza Streaming Engine™ media server software, you can stream lower-latency Apple HLS content smoothly. The process requires several customizations to the way Wowza Streaming Engine manages chunks and packetization, but there are four simple steps to help tune your workflows in Wowza Streaming Engine to deliver lower-latency HLS streams.
- Reduce your chunk size
Currently, in the HLS Cupertino default settings, Apple recommends a minimum segment duration of 6 seconds. We have seen success reducing the duration to half a second. To do this, set the chunkDurationTarget to your desired length (in milliseconds). HLS chunks are only created on key frame boundaries, so if you reduce the chunk size, you need to ensure it is a multiple of the key frame interval, or adjust the key frame interval to suit.
- Increase the number of chunks
Wowza Streaming Engine stores chunks to build a significant buffer, should there be a drop in connectivity. The default value is 10, but for reduced-latency streaming we recommend storing 50 seconds of chunks. For one-second chunks, set the MaxChunkCount to 50; if you're using half-second chunks, the value should be 100.
- Modify playlist chunk counts
The number of items in an HLS playlist defaults to three, but for lower-latency scenarios, we recommend sending 12 seconds of data to the player. This prevents the loss of chunks between playlist requests. For one-second chunks, set the PlaylistChunkCount value to 12; if you're using half-second chunks, the value should be doubled (24).
- Set the minimum number of chunks
The last thing you want to adjust is how many chunks must be received before playback begins. We recommend a minimum of 6 seconds of chunks to be delivered. To configure this in Wowza Streaming Engine, use the custom CupertinoMinPlaylistChunkCount property. For one-second chunks, set it to 6, or 12 for half-second chunks.
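The four steps above reduce to simple arithmetic on the chunk duration, which the following sketch illustrates. The helper `low_latency_settings` is hypothetical, and the dictionary keys simply mirror the property names as written in this article; the exact spelling expected by your Wowza Streaming Engine configuration may differ, so treat the output as values to transcribe rather than a ready-made config.

```python
def low_latency_settings(chunk_ms, key_frame_interval_ms):
    """Derive the recommended chunk counts for a given chunk duration (ms)."""
    if chunk_ms <= 0 or key_frame_interval_ms <= 0:
        raise ValueError("durations must be positive")
    # Chunks are only cut on key frame boundaries, so the chunk duration
    # must be a whole multiple of the key frame interval.
    if chunk_ms % key_frame_interval_ms != 0:
        raise ValueError("chunk duration must be a multiple of the key frame interval")

    def chunks_for(seconds):
        # Integer ceiling, so the window covers at least `seconds` of media.
        return (seconds * 1000 + chunk_ms - 1) // chunk_ms

    return {
        "chunkDurationTarget": chunk_ms,                  # step 1
        "MaxChunkCount": chunks_for(50),                  # step 2: 50 s stored
        "PlaylistChunkCount": chunks_for(12),             # step 3: 12 s in playlist
        "CupertinoMinPlaylistChunkCount": chunks_for(6),  # step 4: 6 s before playback
    }


# One-second chunks with a one-second key frame interval.
print(low_latency_settings(1000, 1000))
# Half-second chunks with a half-second key frame interval.
print(low_latency_settings(500, 500))
```

Rejecting a chunk duration that is not a multiple of the key frame interval is a deliberate choice here: it surfaces the mismatch at planning time instead of leaving the server to produce chunks of a different length than intended.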
Downstream Impacts (risks)
Low latency streaming with HLS doesn’t come without inherent risks.
First, smaller chunk sizes may result in playback errors if you fail to increase the number of seconds built into the playlist. If the connection is interrupted when the player requests the next playlist, playback may stall because the playlist doesn’t arrive.
Additionally, by increasing the number of segments that are needed to create and deliver low-latency streams, you also increase the server cache requirements. To alleviate this concern, ensure that your server has a large enough cache, or built-in elasticity. You will also need to account for greater CPU and GPU utilization resulting from the increased number of keyframes. This requires careful planning for load-balancing, with the understanding that increased computing and caching overhead incurs a higher cost of operation.
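To get a feel for how this overhead grows, the sketch below counts segments and key frames produced per minute of video. It is illustrative only: the function name `per_minute_overhead` is hypothetical, and the two-second key frame interval used for the 10-second baseline is an assumption, not a figure from this article.

```python
def per_minute_overhead(chunk_seconds, key_frame_interval_seconds):
    """Return (segments, key frames) produced for each minute of video."""
    segments = 60 / chunk_seconds
    key_frames = 60 / key_frame_interval_seconds
    return segments, key_frames


# Baseline: 10-second chunks, assumed 2-second key frame interval.
print(per_minute_overhead(10, 2))
# Low latency: half-second chunks need a key frame at least every half second.
print(per_minute_overhead(0.5, 0.5))
```

Every segment is a separate object to cache and serve, and every key frame costs more to encode than the frames between them, so both counts feed directly into the capacity planning described above.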
Lastly, as chunk sizes get smaller, the overall quality of the video playback can be impacted. This may result in either full 4K video failing to reach the player, or small playback glitches with an increased risk of packet loss. Essentially, as you increase the number of bits (markers on the chunks), you require more processor power for smooth playback; otherwise, you get packet loss and interruptions.
HLS has a lot going right for it when you want to reach millions of devices around the world. But providing low-latency events at scale takes a lot of work. Certain use cases, like those that deliver to fewer than 1,000 viewers, may benefit from adjusting the latency settings, but these adjustments should always be used with caution.