
All Things Streaming

Streaming industry news, how-to’s, and more

Webinar: What Is Low Latency Streaming?

July 12, 2017 by Holly Regan

Reducing latency has become a top priority for many streaming media broadcasters and content providers. But achieving low-latency streaming can be a challenge. In this presentation from Streaming Media East 2017, Jamie Sherry, senior product manager at Wowza, and Mike Talvensaari, vice president of product and UX, explain everything you need to know about latency, and how to minimize it when streaming at scale.

 

In this session, we’ll cover:

  • Definitions of latency and related terms, and why latency can be a problem.

  • The four primary latency scenarios, and their corresponding use cases.

  • A visual demonstration of streaming at different latencies.

  • How streaming at scale impacts latency, and what can be done to mitigate it.

  • Low-latency protocols and technologies.

  • Techniques for reducing latency at various workflow stages.

 

Download the slides from this presentation. Learn more about low-latency streaming.

 

Full Video Transcription

 

Mike Talvensaari:                    

What is latency? I'm going to start with some definitions here of latency, and a bunch of other terms that are important to the conversation. So latency is the time interval between a stimulation and a response. Or from a more general point of view, it's the time delay between cause and effect of some change in the system. So, in terms of streaming, it's the delay between initial capture and the viewer. I like to think of it as the delay between reality, and what you see in your player or on your television.

Other related terms: Time to first frame. This is related to latency, but it is not latency. So, this is the delay between when you click the play button on your player, and when video appears. You could have zero latency, if that was even possible, once the video appears, and that is unrelated to the time to first frame. It could take five seconds for the first video to appear, but that's not latency. People will perceive that as a bad experience, and they'll even say your latency is terrible, but what they really mean is the click delay, the time to first frame, is terrible.

The broadcast delay—this is something that some people don't even realize is happening. There is a five- to seven-second delay in everything you've been watching for your entire life, and this is an intentional delay, so that they can delete profanity, bloopers or violence. So, this is latency, but it's intentionally introduced into the system for broadcast.

Quality—generally, the higher-quality video you have, the higher the resolution is, the more data there is to send; and when there's more data to send, your latency goes up.

Scale—now scale is how far away your people are, or your viewers are; and how many viewers there are; and how geographically dispersed they are, as well. If you have more viewers and more data to send, or it's going farther, latency is generally going to increase. We'll talk more about both of those later.

And people use bandwidth and throughput, which are my next two terms, interchangeably—but bandwidth is how much traffic your infrastructure can ideally handle, in a perfect world. That's how much you can handle. Throughput is how much is really delivering.

So when you purchase your cable modem package, and it says you're going to get 100 megs down, that's the bandwidth. But your throughput's probably going to be 75 megs, and you'll think you're getting ripped off. And then bitrate is how many bits of data are being processed over time.

Why is there latency? So, I mentioned a couple things: distance and scale. But really, latency is introduced at every step in the system. This is a typical workflow diagram for satellites, and I put Wowza in the middle, because we're Wowza, but this could be any media server, software or cloud-based service in the middle. But you have delay from the satellite. I don't know how many have seen the Louis C.K. routine where he says, "give it a minute, it's going to space." That's what this is doing. Space is introducing latency. It's pretty far to space. So, there's going to be some latency there related to the speed of light, which you can't really change. Then you're going into an IRD, and into the capture card.

Both of those are going to introduce some latency, probably minuscule. Wowza Streaming Engine introduces a little bit of latency. And if I'm going direct out to the viewer from the media server software, I have some latency related to transmission, and that is related to distance: how far away they are from the server. If I'm going to a CDN, they will also introduce some latency, related to just sending the bits out to all the different edges in preparation for delivery.

So, pretty much every step of the way introduces some latency, and you really have to think about latency at every step when designing your system. And—I'll show a demo of this later—the protocol used for delivery, ingest and delivery, also impacts latency. Some protocols are more latent than others.
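As a rough illustration of latency adding up at every step, the sketch below sums a per-stage budget. Only the satellite hop is computed from physical constants (geostationary altitude and the speed of light); every other stage value is a hypothetical placeholder, not a measurement from the talk.

```python
# Physical constants; everything in `stages` except the satellite hop
# is an assumed placeholder value, not a measured figure.
SPEED_OF_LIGHT_KM_S = 299_792.458
GEOSTATIONARY_ALTITUDE_KM = 35_786.0


def satellite_hop_seconds(altitude_km: float = GEOSTATIONARY_ALTITUDE_KM) -> float:
    """Minimum up-and-down propagation delay for a satellite directly overhead."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_S


def total_latency_seconds(stages: dict) -> float:
    """End-to-end latency is the sum of the delay added at each stage."""
    return sum(stages.values())


stages = {
    "satellite hop": satellite_hop_seconds(),
    "IRD + capture card": 0.05,  # assumed
    "media server": 0.25,        # assumed
    "CDN distribution": 1.0,     # assumed
    "player buffer": 3.0,        # assumed
}

for name, seconds in stages.items():
    print(f"{name:20s} {seconds:6.3f} s")
print(f"{'total':20s} {total_latency_seconds(stages):6.3f} s")
```

Even with a zero-cost server and player, the satellite hop alone contributes roughly a quarter of a second that no tuning can remove.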

Why is latency a problem? I get this question all the time, especially with Flash dying and people worrying about RTMP, which is pretty good as far as latency, while HLS and DASH are perceived to be not as good at latency. We get this question all the time at trade show booths and with customers. And I like to say that it's often not a problem, but people think it is.

People think they need really low-latency streaming all the time. But they don't necessarily need it. So think about that in designing your system, because if you want low latency, it's just going to cost more. You're going to have to buy more expensive devices; you're going to have more servers, higher quality, higher bandwidth. But you often don't need lower latency. A 30-second delay is not often a problem. So, for a lot of use cases, latency doesn't actually matter.

So, really think about, "do I need low latency? Does my use case need low latency? If I'm streaming high school sports, does it really matter if Grandma in California sees it 30 seconds delayed, or not?" Probably doesn't matter. HTTP streaming, which is the most popular form of streaming now, intentionally introduces latency for improved reliability. So Apple HLS and MPEG-DASH both, by default out of the box, require three chunks to be loaded. The chunks are 10 seconds. That gives you a huge buffer of time for network hiccups. Buffering has been greatly reduced over the years due to things like this, and so reliability is improved by some of these things.
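The arithmetic behind that default can be sketched in a couple of lines. The function name is hypothetical, and the model only counts the chunks a player waits for before starting; it ignores encoding, network and CDN delay.

```python
def segment_buffer_latency(chunk_seconds: float, chunks_required: int = 3) -> float:
    """Seconds of media a player must hold before playback starts."""
    return chunk_seconds * chunks_required


print(segment_buffer_latency(10))  # three 10-second chunks, as described in the talk
print(segment_buffer_latency(2))   # a shorter, hypothetical 2-second chunk
```

Shrinking the chunk shrinks the built-in delay proportionally, at the cost of a smaller cushion for network hiccups.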

But, this talk is about latency, and for some live streams, latency is critical. If you're doing real-time communication, it's critical. Anything over a second is painful. If you're doing online gaming, any sort of remote control devices, gambling, auctions—latency is critical for certain use cases. So just really think about the use cases, and whether latency is important for your use case.

I want to define—we hear terms like low latency, ultra-low latency, real-time. What do we really mean when we talk about these things? So, I'm going to start at the left. Common HTTP latencies for HLS and DASH. These are upwards of 45 seconds; 30 to 45 seconds, depending on if you're going through a CDN, and this is really for one-way streams to large audiences. I gotta have massive scale. People want to see it in HD or 4K. And so I can deal with latency there; I really want really good reliability. But that's the latency.

Reduced latency is kind of five to 20 seconds, five to 18 seconds—and this would be live-streaming of news and sports and content for OTT providers. Low latency, five to seven seconds, is typical latency for HD cable TV, and so we call “low latency” anything faster than that. And this is for live streams, game streaming, esports—sometimes you want to try to match that five-second delay if you're streaming television content, so it matches up with what people are seeing if they're watching on TV. You want to be shouting "GOAL!" at the same time as your neighbor if you're watching a sporting event. And then near-real time is sub-one second. So this would be for real-time communications, conferencing, any sort of telepresence, real-time device control—I'm controlling a PTZ camera, I want to hit the button and see it pan right immediately. So, that's near-real time. So, that's some definitions.
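The four tiers can be written down as a small lookup, shown here as a minimal sketch. The speakers give overlapping ranges rather than hard cut-offs, so the exact thresholds below (1, 5 and 20 seconds) are assumptions chosen for illustration.

```python
def latency_tier(seconds: float) -> str:
    """Map an end-to-end delay to one of the four tiers from the talk.

    Thresholds are illustrative assumptions; the talk gives approximate,
    overlapping ranges rather than exact boundaries.
    """
    if seconds < 0:
        raise ValueError("latency cannot be negative")
    if seconds < 1:
        return "near-real time"
    if seconds < 5:
        return "low latency"
    if seconds <= 20:
        return "reduced latency"
    return "common HTTP latency"


for delay in (0.2, 3, 12, 45):
    print(f"{delay:>5} s -> {latency_tier(delay)}")
```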

I'm going to go into a quick demo that will hopefully work. Never know what's going to happen. What happened to my mouse? Oh, I need to get out of this. There we go. Okay, so this is a demo of what different protocols look like ... oh, it's not going to load. Let's try one more time. Flip to the other one. There we go.

Okay, this shows you the protocol does matter. So I've got a camera here. And I'm going to wave at it, and you're going to see this cascade across. So, top-left—the first two are going out remote on the network. So the first one on the left is WebRTC, transcoded. This is going out to a cloud server. The second one, which we'll talk a little bit more later, is a low-latency API preview we have in Wowza Streaming Cloud. One of these players didn't load. When you try to embed eight players on a page, the browser doesn't really like it. The third one is RTMP, tuned—so I’m streaming RTMP to a Flash player, but I've tuned it for lower latency. And that's probably around three to five seconds. The first two are around three seconds.

Standard RTMP is around five to eight seconds. HLS tuned didn't load; that would be about 10 seconds, and MPEG-DASH is probably 10 seconds. The bottom two are standard latencies for what you're going to see on HLS and MPEG-DASH. This is no tuning in the player, no tuning in the server. So, I'll make another face and we'll watch it cascade across. So, this really shows you protocol does matter, so you have to think about this. And you can see the RTMP ones, those require a Flash player. The latency is pretty good on those. And that's why people are so up in arms about the death of Flash. Everyone wants Flash to die. Flash is horrible for the browser and performance, but you've got to have something to replace it with. So, the first two, WebRTC and the API preview (Wowza Streaming Cloud's still loading), are pretty good as far as latency.

 

Jamie Sherry:        

It’s at the bottom.

 

Mike Talvensaari:                    

But yeah, the bottom two, you can see it takes them 30-plus seconds to come through. So that shows that protocol matters. The first two I said are remote; the rest, the other ones, are all local to this machine: the camera going into Wirecast, going into Wowza Streaming Engine running on this box. So there's no network involved. Back to the demo. Back to the slide. There we go.

Okay, what factors impact latency? I've talked a little bit about some of these, but really, it's quality and scale. We look at this as kind of a triangle. I can pick one of these things—I can't have low latency and large scale and high quality. You can get somewhere in the middle—have sort of low latency, and sort of high quality, and sort of large scale, but it's really a trade-off. As you increase the quality of your stream, as you increase scale, either through audience size or distance, you're going to increase latency. So, on the quality and scale side, you're really looking at resolution; is the stream two-way? Am I running multi-threaded? What's the frame rate?

Actually, on that demo, because I had so many streams running in one box through a network connection, I reduced it to 15 frames a second, and I was worried about the network here, so it's only a 700K stream, but I knew it was going to be a bunch of little players. So that's how I got lower latency, by decreasing my quality, because I knew what my playback experience would be. If I want smoother playback, what's my chunk size and how much buffering do I have? On the tuned ones, the HLS one didn't load, but the RTMP-tuned, that was setting a low-latency setting in Wowza Streaming Engine and also reducing the buffer size. That was actually setting the JW Player buffer to zero, as opposed to the standard RTMP one, where I had a buffer of three.

So, buffering really matters. But with the buffer of zero, you don't have any room for error. Any sort of network hiccup in the player, and you're going to see buffering in the player. So, it's going to be a degraded playback experience for your audience. On the scale side, distance is one of the keys. You are not going to go low latency from New Zealand to New York and have it be as good as going from Pittsburgh to New York.

Number of participants, number of viewers: both of those burden your server, burden the pipes. Number of streams: how complex is your setup? Am I just going from a camera into a server and out to the viewer, or do I have a CDN in the middle? Do I have various encoders or other processing going on in the middle? And the network variability is huge. As you get more distance, you're probably going through more servers, you have more chance for network variability, and then am I going to diverse endpoints? I could be going to a desktop browser, or a smart TV, a smartphone, a Roku device. Those are all going to add processing time, and you're probably going to see varying latency on all of those devices. Quality: I mentioned this before. Higher quality is higher resolution is more data to send. So, if your network conditions remain the same, if you have a higher-quality stream, you're going to see higher latency. So, if you're going to add higher quality, you've got to beef up your network.
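A back-of-the-envelope sketch of why more data takes longer on the same network: the time to push a piece of media through a link scales with its bitrate. The function name and the example bitrates are assumptions; the 75-megabit throughput figure echoes the cable-modem example earlier in the talk. The model ignores propagation delay and protocol overhead.

```python
def transfer_seconds(media_seconds: float, bitrate_mbps: float,
                     throughput_mbps: float) -> float:
    """Time to move `media_seconds` of video encoded at `bitrate_mbps`
    over a link that really delivers `throughput_mbps`."""
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    return media_seconds * bitrate_mbps / throughput_mbps


# Same 10-second chunk, same network, two hypothetical quality levels.
print(transfer_seconds(10, bitrate_mbps=3, throughput_mbps=75))
print(transfer_seconds(10, bitrate_mbps=15, throughput_mbps=75))
```

Multiplying the bitrate by five multiplies the transfer time by five, unless throughput is raised to match.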

Trade-offs: there are ways to mitigate latency. I can reduce buffer, I mentioned this before, but that makes your stream more vulnerable to network hiccups. Reducing buffer will improve latency, but will probably degrade reliability. I can reduce the frame rate. I mentioned this for my streams; I made it 15 frames a second just because I knew I was going to have eight streams going into one server. But 15 frames a second may be good enough; if you're doing slideware, I don't need 30 frames a second to show this slide. You can probably do it with one frame a second. But 15 frames a second is good for talking heads, but it’s going to be terrible for any sort of live sports or action. And of course, I can improve my network, but that's going to be expensive. So, I can pay for more infrastructure. I'll have incurred more cost, but that will help as well.

On the scale side, I mentioned distance, complexity and variability. As distance, number of streams and viewers increase, your networks are going to be more imperfect: they'll have more variability and more degradation. Again, trade-offs: you can increase the buffer size. This is going to add latency, but increasing the buffer will give you more reliability. Your stream will be more resilient to any sort of network hiccups. If I've got a 10-second buffer, then I can handle a pretty long network hiccup. If I've got a one-second buffer, I can only handle short network hiccups. TCP versus UDP: UDP is lighter-weight, so it's generally lower-latency, but there's no error-checking, no monitoring. UDP will just drop frames on the floor. So, that's a trade-off as well. There are some technologies, like SRT and others, that will help UDP be more TCP-like. But you still have no guarantees on the data you receive. And then you can increase throughput again. You can always improve your network to achieve better scale and better latency.
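The buffer trade-off can be sketched with a deliberately simple model: a network hiccup freezes playback only for the part of the outage that outlasts the buffer. The outage durations are hypothetical, and the model assumes the buffer fully refills between hiccups.

```python
def stall_seconds(buffer_seconds: float, outages: list) -> float:
    """Total time playback freezes across a series of network outages.

    Simplifying assumption: the buffer is completely refilled before
    each new outage begins.
    """
    return sum(max(0.0, outage - buffer_seconds) for outage in outages)


outages = [0.5, 2.0, 6.0]  # hypothetical hiccup lengths, in seconds

for buffer in (0, 1, 10):
    print(f"{buffer:>2} s buffer -> {stall_seconds(buffer, outages):.1f} s stalled")
```

The larger buffer rides out every hiccup in this example, but it also adds its full length to the latency.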

And now I'm going to hand that off to Jamie, who is going to talk more about the history of latency through the ages, and talk more about how to achieve low latency at scale.

 

Jamie Sherry:                  

Thanks, Mike; hello, everyone. So, where has latency been? So, I bring up legacy technologies because these... how many people ever used Windows Media? Anyone ever use Windows Media? Okay, so I've been doing this a long time, and I used Windows Media; that and Real and QuickTime were the choices back in the day, about 10-15 years ago. I bring these up because they were thinking about latency back then, and these are really well-done technologies. They are no longer used, but they're great examples to rely on when we talk about the future of latency and other aspects of streaming. So, when you talk about Windows Media, they had their own protocol called MMS, and they leveraged RTSP, which is a low-latency streaming protocol designed specifically for media. And they used HTTP, as well. Both had the ability to use UDP and TCP. So, they actually started on UDP, and then failed over to TCP. In general, you could set them up to do whatever you want, but that's how the default behavior was. And the general way that they would talk about reducing latency back then was to look at buffer management.

So the encoder and the server and the player were kind of the pieces you had, in a simplistic way, for streaming back then. The encoder buffer size could be reduced, the player buffer size could be reduced, and you could tweak stuff in the server, too. The player is interesting: back then you had a desktop player. You had Windows Media Player, and you didn't have Flash. You weren't in the browser, although Windows Media could be embedded in the browser. So, you had a little bit of a limitation there, but eventually they allowed you to configure the buffer dynamically in the player, for example, when you leveraged Silverlight, or newer technologies from Microsoft back in the day. They would also talk about codecs, like a low-delay audio codec selection that you could use to try to reduce latency as well. It's not just about the buffer; it's not just about the network; it's about the data, as well.

So, a decent-quality codec that is optimized for low latency would help, as well. And then they had techniques that Flash employs, as well, and that others use as well. Things called FastStart and Advanced FastStart, which actually gave you the ability to kind of get the start delay down, which Mike talked about. As well as things like fast recovery and fast reconnect, and they actually implemented some forward error correction. And these are all techniques to try to keep up while you're streaming, and not lose your quality or have buffer issues when you’re actually playing the content, if you had network changes. They would talk about 15 to 20 seconds of latency by default, but they could get down, on a good network, to two to three seconds.

Mike talked a lot about a good network. I'm going to keep emphasizing that, too. A lot of this has to do with having a good network. If you have an unpredictable network like the Internet, and you don’t have ways to smartly navigate around through the Internet, you're going to end up with some limitations.

The RTSP protocol, developed by Real Networks and a few others (Netscape, of all people, was another one), was a really good protocol at the time. Real, another leader in the space that is now gone, relied on RTP and RTCP. RTP is actually the transport; RTSP is not responsible for the actual media delivery. And RTSP had an advantage of being able to use UDP, and the ability to send frames one at a time, in real time, which got the latency down to sub-one second, down as low as potentially what I've got here, which is 125 milliseconds, which is actually pretty good.

Now we move forward in time a little bit. We've got the Real-Time Messaging Protocol, developed and open-sourced by Adobe. This is the Flash streaming protocol. I'm sure you all know that. It's still widely used, and will continue to be used on the publish side. The playback side, as you all know, is hitting a dead end, again, after the first dead end with Steve Jobs on iOS. It's now hitting a dead end on the desktop.

But people have a lot of great success with this protocol. Again, it's a stateful protocol that has one to three seconds of latency on average, and you can definitely get below one second. We've had people tell us they went from Amazon in Ireland to Vancouver in sub-two seconds, which is impressive, and they're a particular customer that needed to use this for a sports-related, time-based use case, like low-latency sports betting.

Okay, and then we jump to HTTP, as you all know. Most popular format out there today. HLS: 30-plus seconds is what iOS needs; on a 10-second chunk, they need three chunks, and that's 30 seconds. So, not low-latency in any way, shape or form. And MPEG-DASH comes along, and it is a little bit better. It's variable, because it just depends, and it doesn't have these restrictions that Apple has on iOS devices. But those are all really too high for anyone who wants to use these in a low-latency application.

Okay—where are we going? So, I'm sure you've all heard of at least WebRTC. It's been around for a long time, actually, seven- or eight-plus years, maybe more. Designed for real-time audio and video and data, over good to less-reliable connections; RTC stands for “real-time communications.” So, natural thing here is two-way chat conversations, as an example for that, and obviously, if you're going to talk with somebody, you need it to be low-latency, or have low latency. You don't want to wait for somebody to hear what you said, and they have to wait the same. So, you can use TCP or UDP—again, we talked about network stuff before. A lot of these, we'll talk a little bit more about later. There are advantages to using TCP and UDP, and disadvantages, and things like that. And my opinion is, it's not a perfect world with either today, and that's why you'll see some of these other technologies show up on one of my next slides here.

WebRTC actually uses multiple protocols under the hood that are all related to, or are based on, RTP or RTCP; typical latency of one second or less, as low as 200 milliseconds or lower, is what you're going to see there. WebSocket is another. This is something that is a two-way communication protocol that works in the browser, and it works really well. If you've heard of MSE, which is what people are using to do HTML5 playback for audio and video, it works well with MSE, as well. So, it works on top of TCP. And I wouldn't say it's totally agnostic, but it can be leveraged with all sorts of streaming protocols and formats. So, some examples here are RTMP and WebRTC; Haivision SRT; and we have a protocol called WOWZ, which looks very much like RTMP. All these things can be used with WebSocket in conjunction, and you can get as low as, again, about 200 milliseconds. Your mileage may vary, but that's the general number that we're putting out there. So HTTP, again, those other protocols, the challenges there are ... go back real quick.

So, the challenge with some of these, as you may or may not have experienced personally, is if you're trying to replace Flash, for example, and you want to use WebRTC because it's got good latency—even if you're not doing two-way situations, people actually use RTMP successfully with two-way conversations and other two-way video delivery. But Flash does a lot of great things. So, if you have, what I call, advanced features on top of your video—you're doing captions, you're doing encryption, you're doing all sorts of things, advertising, or monetization and otherwise—you're going to run into trouble trying to use WebRTC to replace RTMP. So, that's something you should all keep in mind, if you don't know that or haven't experienced that.

So, what we see with these technologies is not only a lack of parity on functionality—being able to do what you want to do with your use cases, or what you're trying to do with streaming—the other side of it is, while a lot of these have security, which is good, the scalability of these requires a lot different infrastructure, and a lot more expensive infrastructure, than HTTP. As we all know, HTTP's popular because of caching infrastructures that are already out there—like CDNs can just leverage those, and use them for the delivery. But then there's the latency challenge.

So, techniques that people have been considering, and are using and considering further, to try to reduce HTTP latency. The natural thing people say is, "Okay, I have a 10-second chunk. Why don't I just reduce that to something much less than 10 seconds?" Apple actually will tell you (I don't want to misquote them, because they don't like that) down to two seconds, I believe, and then DASH, down to something much lower as well.

And so you can reduce chunk size, but I don't personally feel like that's enough. I think, as Mike said, if you have a network-degrade situation happening, you're going to end up with a user experience that's sub-par. You're either going to simply miss data, which means you won't see part of the video, or there are other degradations of quality, too, and so you've got to make more changes than just doing chunk reduction. Along with chunk reduction, you have to do keyframe or GOP adjustments, as well. But some techniques people have been talking about leverage CMAF. I don't know if you're familiar with that. It's Common Media, I forget what the “a” stands for, Format [Common Media Application Format]. So, this is an ability to try to merge the worlds of HLS and DASH a little closer. People are using that in combination with HTTP/1.1 and what they call chunked transfer coding to basically create what I call “chunks of chunks.” It's kind of a horrible name, but you basically can reduce the chunks down to sub-chunks, and you can then start delivering faster, before an actual full chunk is delivered to a player. So you can kind of speed things up there.
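The "chunks of chunks" idea rests on the HTTP/1.1 chunked transfer coding, where each piece of the body is sent with its own hexadecimal length prefix, so a server can start sending a media chunk before the whole chunk exists. Below is a minimal sketch of that wire framing only; the sub-chunk byte strings are hypothetical stand-ins for media fragments, and a real packager and server involve far more than this.

```python
def encode_chunked(pieces) -> bytes:
    """Frame byte strings using HTTP/1.1 chunked transfer coding.

    Each piece becomes: <length in hex> CRLF <data> CRLF.
    A zero-length chunk followed by a blank line ends the body.
    """
    body = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would terminate the body early
        body += f"{len(piece):x}".encode("ascii") + b"\r\n" + piece + b"\r\n"
    body += b"0\r\n\r\n"
    return bytes(body)


# Hypothetical sub-chunks of one media chunk, sent as they become available.
sub_chunks = [b"abc", b"0123456789abcdef"]
print(encode_chunked(sub_chunks))
```

Because the sender never has to declare a total length up front, the player can begin receiving the first sub-chunk while later ones are still being encoded.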

You can also optimize parameters within the DASH spec. Some examples here are the availability start time and buffer time. Again, these come simply back to just reducing buffer time, reducing start times, which again have trade-offs. If you're going to do that and you're not going to maintain a high enough buffer, as you normally would (for example, Flash goes up to eight seconds by default), then you run the risk, again, of issues with the playback, if you don't have enough buffer.

The last part relates to decode—people don't talk a lot about it, it seems, but when you talk about impacts in the workflow for latency management, you have the encoding, you have the delivery, you have decoding as key parts. The decoding is another part of this, and people are trying techniques—there's a technology, or technique, really, called H.264 GDR that's part of the encoding enhancements that can be used to help with decode on the other side.

These are some of the things that people are trying to do with reducing HTTP latency, because people would love to just continue to use CDNs, and use caching infrastructures if they can. But it is, again, an up-and-coming area, and people are still experimenting.

Couple more here to talk about, just examples. By the way, there are numerous others that I didn't mention. We certainly could talk more about those. So, Google has what they call “QUIC,” which is Quick UDP Internet Connections; this is what it stands for. This is a technology that uses UDP with TCP fallback. It does focus on reliability and low latency, but has an angle of security built in, like some of the others do as well, like WebRTC and SRT, for example. It basically tries to be like TCP, without being TCP. So, in the sense of congestion control, and connection, and retries, and round-trips, it's trying to reduce all these things intelligently by stacking information, where it stores it, and is trying to bring more along when it does different things in terms of requests for content. And it was built with HTTP/2 in mind, so it's optimized for when an HTTP/2 network is in use, but it does work on HTTP as well.

The next one, SRT. So Haivision created a technology called SRT: Secure Reliable Transport. It's a transport protocol focused on reliability. It has security built in, as well. So, reliability first; low latency, second; but both go hand-in-hand, and so the idea here is, if you think about an ingest from a server or a camera or an encoder, going into a server infrastructure or an origin infrastructure, and you have either an unpredictable network or a network that can really go downhill quick, this protocol will help maintain the connection, maintain a certain amount of quality and security while you are experiencing that network change, and will allow you to continue to deliver, and deliver with a decent amount of quality. So, again, as it says here, things it accounts for: packet loss; jitter, which is packets coming in and out of order; fluctuating bandwidth, and the like.

So, we at Wowza, not to pitch our wares again too much, but we have been working on low latency, as well. So, we currently have what we call an API-only preview feature inside Wowza Streaming Cloud, which is our service model, our service offering. We are using our WOWZ protocol, which again, briefly said, is an RTMP-like protocol. We're combining that with WebSocket to deliver into an origin-edge architecture, a scaled-down origin-edge architecture. So, what we're touting in this preview right now is sub-three seconds, end-to-end, and what that really means right now is origin-to-player. Eventually, we'll add the encoder or camera into that, as well. But for now, we're saying that’s origin-to-player, and that's a globally distributed architecture as well, I should say. So, it's meant to be highly scalable, highly resilient, things like that. And again, it's using those technologies to deliver to the player. Okay.

So, reducing workflow-component latency. So, again, I talked a little bit about—we're talking about these protocols, the things you can and can't do with them. I think the overarching thing is, depending on your use case, what you're trying to achieve, and how latency is part of that use case, there are different techniques you can use. There isn't one silver bullet here. You can choose one of the many protocols or technologies to try to solve these things. They have trade-offs, they have positives and negatives.

But you really need to think about it at every step, as I said. It's not just about the network; it's not just about the buffer management, which I do have listed here. It's content creation, too; it's the features you need. If you need to do transcoding, for example, that's going to introduce latency. That can be minor, but it can be major, too. If you're going to manipulate content in other ways, with metadata or with encryption or anything else, usually these things are okay. But again, it just depends. And it all depends, topology-wise, on where you're doing these things in relation to how you're then delivering it to the user, which I know doesn't sound easy, but that's the way this goes right now.

So, content creation, things like codecs: again, I mentioned the H.264 GDR stuff. Bitrate and resolution, these all impact things. Streaming workflow and devices: whatever devices you're hitting, on whatever networks they are, Wi-Fi, 4G, all those things have network fluctuations. That's the public internet; if you're on private connections, with enterprise or corporate stuff, it can be different, or can be better, because you usually have more control there. But the workflow, again, reflects on what you're trying to do with the content before you actually get it to the consumer.

Buffer management in the encoder and the player, specifically, and also the server: again, you can reduce the buffers in these areas to almost nothing. But again, you run the risk that, if you have a network fluctuation or degradation, you may have a negative impact on the playback.

Network considerations really boil down to two areas: how you optimize the deliveries, and the transport protocols that we talked about, and the protocols that might sit on top of that. You know, HTTP versus RTMP versus RTSP, and these transports like QUIC or SRT.

And then, how you reach your audience. As Mike said, again, size, location, all those things matter in terms of the latency. The front-runners that people are really trying today are things like WebRTC and WebSocket. There are definitely a lot of offerings out there that have, from the ground up, written services and software that use these under the hood to do what they do. And a lot of them are doing really good things. With these pieces you just need to make sure, again, when you're dealing with proprietary protocols (these are standards-based, but essentially, when you're not dealing with HTTP), that on the playback side you have what I call aware clients. Basically, you can't just drop an HTML5 video tag in a browser and have it work. You need to actually write some code around that to make it possible. And then the server scale-out infrastructure piece: depending on what protocol you use, scaling it, in either an origin-edge kind of mid-tier topology or otherwise, requires a lot of servers and a lot of specifics to those protocols.

CDNs are definitely investigating, trying things and doing things in these spaces. They want to keep their caching infrastructure going, and they want to leverage it to support new use cases. So, what I call tuned HTTP, which Mike demoed, which is to reduce chunk size and other aspects of the content, is what I would call a front-runner today. But there are CDNs considering WebRTC. There are, again, other options out there, that I'll talk about in a second, that don't use HTTP. They use other technologies. All right.
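To put rough numbers on the tuned-HTTP idea, here is a minimal sketch of how chunk size feeds into delay. It assumes the player holds a fixed number of whole chunks before it starts playing; the function name, parameters and sample values are all hypothetical and not taken from the presentation or from any particular player.

```python
def segment_latency_seconds(segment_seconds, segments_buffered,
                            encode_seconds=0.0, delivery_seconds=0.0):
    """Rough estimate of delay for chunked HTTP streaming.

    Assumption (hypothetical model): the player waits for
    `segments_buffered` complete chunks before starting playback,
    so that buffer dominates the delay. Encode and delivery times
    are simply added on top.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    if segments_buffered < 0:
        raise ValueError("segments_buffered cannot be negative")
    return encode_seconds + delivery_seconds + segment_seconds * segments_buffered


if __name__ == "__main__":
    # Illustrative values only: same player buffer depth, two chunk sizes.
    for seconds in (10, 2, 1):
        estimate = segment_latency_seconds(seconds, segments_buffered=3)
        print(f"{seconds}s chunks x 3 buffered -> about {estimate:.0f}s behind live")
```

In this simplified model, shrinking the chunk from 10 seconds to 1 second cuts the buffer-driven delay by the same factor, which is the trade the speaker describes: less delay, but less cushion against network fluctuation.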

It's kind of a sideways segue, but: measuring latency. You saw the demo, and you could count on your hand, or whatever, to see the latency. People often ask, “How do you measure latency?” There's a lot of research out there actually suggesting different ways to look at this. I think it's pretty interesting. It's definitely not an easy thing to do, especially at scale. And as you probably know, latency is measured in milliseconds or seconds, although some actually use frames of video to measure it.
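Since latency gets quoted both in time units and in frames, a small conversion helper can make the two comparable. This is only a sketch; the function names are made up for illustration, and it assumes a constant frame rate.

```python
def frames_to_ms(frames, fps):
    """Convert a frame count to milliseconds at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames * 1000.0 / fps


def ms_to_frames(milliseconds, fps):
    """Convert milliseconds to a (possibly fractional) frame count."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return milliseconds * fps / 1000.0


if __name__ == "__main__":
    # Illustrative values: the same delay expressed both ways.
    print(frames_to_ms(15, 30))   # 15 frames at 30 fps, in milliseconds
    print(ms_to_frames(500, 60))  # 500 ms at 60 fps, in frames
```

The same number of frames represents a different amount of time at different frame rates, so a frame-based figure only means something alongside the frame rate.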

What we do a lot of is visual tests. We'll put a clock in the video, and then you can measure that against the system clock of the machine you're playing the content back on. As long as your system clock is synced with an NTP server, and I suppose as long as the other clock is as well, you should be able to tell how it's going. The challenge there is drift, even with NTP. Especially on our laptops, I notice that even going to Apple's time servers, it can drift pretty quickly, and in a range that varies from milliseconds to seconds. I've actually had three seconds of drift in a matter of 15 minutes. It's not always a reliable way to look at that.
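The arithmetic behind the visual test can be sketched as follows. This assumes you can read the burned-in clock off a frame and note the playback machine's clock at the same instant; the function name and the `viewer_clock_offset` parameter are hypothetical, and the sample times are invented to show how a few seconds of drift would distort the reading.

```python
from datetime import datetime, timedelta


def observed_latency(burned_in, viewer_clock, viewer_clock_offset=timedelta(0)):
    """Latency from a clock overlay read against the viewer's system clock.

    burned_in:            time shown by the clock inside the video frame
    viewer_clock:         playback machine's clock when that frame is shown
    viewer_clock_offset:  how far the viewer's clock runs ahead of the
                          capture-side clock (assumed known; zero if both
                          are perfectly synced)
    """
    return (viewer_clock - viewer_clock_offset) - burned_in


if __name__ == "__main__":
    frame_time = datetime(2017, 7, 12, 12, 0, 0)
    wall_time = datetime(2017, 7, 12, 12, 0, 4, 500000)

    # Trusting both clocks blindly.
    print(observed_latency(frame_time, wall_time).total_seconds())

    # Same reading if the viewer's clock had drifted 3 seconds ahead.
    drift = timedelta(seconds=3)
    print(observed_latency(frame_time, wall_time, drift).total_seconds())
```

With these made-up numbers the naive reading is 4.5 seconds, but it shrinks to 1.5 seconds once the drift is accounted for, which is why an unsynced clock makes the visual test unreliable.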

When you talk about traditional streaming protocols like RTMP and others, you can actually put markers in there, and use timecode to measure that, as well. So there are techniques other than visual ones that people are using. And some people are trying to put signatures in video. I read about an audio signature, or some other piece like that. But the bottom line is, they all come back to timecodes, and you're still measuring a delta in time. So you can get as fancy as you want, but you can do a visual test and get the same result.
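As a sketch of the "delta in time" idea, the snippet below compares a timecode embedded at the source with the timecode the source is generating when that marker arrives at the player. It assumes non-drop-frame `HH:MM:SS:FF` timecode and an integer frame rate; the function names are hypothetical, and how the marker is actually carried in the stream is outside the scope of this example.

```python
def timecode_to_seconds(timecode, fps):
    """Convert non-drop-frame 'HH:MM:SS:FF' timecode to seconds.

    Assumes an integer frame rate; drop-frame timecode is not handled.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError("frame field must be between 0 and fps - 1")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def timecode_delta(sent, received, fps):
    """Seconds elapsed between the embedded marker and its arrival."""
    return timecode_to_seconds(received, fps) - timecode_to_seconds(sent, fps)


if __name__ == "__main__":
    # Illustrative values: marker stamped at 01:00:00:00, seen at 01:00:02:15.
    print(timecode_delta("01:00:00:00", "01:00:02:15", fps=30))
```

However the marker is carried, the result is still a subtraction of two timestamps, which is the speaker's point about fancier methods giving the same answer as a visual test.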

Okay, options for low latency at scale. So, again, I touched on the three components at the top. It's an obvious answer to how you would go about this. You're either going to build it, or you're going to buy it, or you're going to have a mix of pieces that you might build and/or buy. But again, when you build it, the major areas you need to focus on (and that's really dumbed down) are the encoder, the server and the player, and again, you need to look at the network as well. And certainly there's a variety of things within those components you need to focus on: buffers, protocols and all those other things.

So, if you're going to buy, the 800-pound gorilla on the block, Akamai, is working on low latency. They're definitely looking at ways to solve these things. Some other examples: of course, there's us. We have this preview I mentioned, so we have a low-latency option going on right now in our service. But there are others on the market, depending on what you are trying to do. Agora.io is an example, with their low-latency CDN. They focus a lot on two-way conversation use cases, but they have the ability to deliver in other ways, as well, for low-latency situations.

The folks from Phoenix P2P are here as well. They have a booth in the exhibit hall. They have a service as well that's scalable low-latency, so there is no shortage of people trying to solve these problems. And I do think it's early days, but I think that eventually, you're going to see a flood of people trying to solve these problems, and they will.

Holly Regan

Holly Regan has over a decade of experience as a professional writer and editor. Her work has been featured in major online publications, including The New York Times, Entrepreneur and The Huffington Post. At Wowza, she serves as the content marketing specialist and editor-in-chief.