Ingest Scaling: How To Scale Your Streams in the First Mile


A scaling strategy is an essential part of any streaming solution so that you can quickly accommodate any fluctuations in stream demand. A sudden change in capacity can result from either an increase in streams being published to your platform or a fluctuation in the number of viewer connections. Because a server can handle only so many concurrent streams before it starts to slow down or get overloaded, it’s important not to overwhelm a server. For this reason, it’s best to have a scaling plan prepared in advance so that you’re not caught off guard.

When creating a scaling strategy, you should carefully consider the server load from incoming streams (ingest) as well as the server load for distributing those streams. If you’re new to video streaming terminology, ingest refers to any incoming streams from publishers, and egress typically refers to the streams going out to viewers. Should you have a need for increased capacity in the first mile, a proper scaling plan can handle the excess load by creating a cluster of additional servers. This is known as scaling up. (Strictly speaking, adding servers is horizontal scaling, or “scaling out,” but this post uses “scaling up” for any increase in capacity.)

Photo courtesy of Raskenlund

Because a significant amount of information exists for scaling your streams at playback, this blog will focus on the options for scaling your streams at ingest — or what is often referred to as “the first mile” in streaming.


Server Capacity

The number of concurrent streams a server can comfortably manage varies depending on server hardware, network configuration, stream type, stream bitrate, and connection types. The only sure way to determine the limits for a particular configuration is to perform load tests, which show when performance is likely to degrade. If you’re likely to reach those limits, you can scale your streaming configuration to handle the increased load. This is typically accomplished by adding servers to, or removing servers from, a cluster in order to distribute the load.
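As a rough illustration of reading load-test results, the sketch below takes measurements recorded at increasing stream counts and returns the highest count at which the server was still healthy. The `LoadSample` fields, the 80% CPU limit, and the sample numbers are all assumptions made for this example, not values from any particular media server.

```python
from typing import Iterable, NamedTuple


class LoadSample(NamedTuple):
    """One load-test measurement (hypothetical fields for illustration)."""
    streams: int          # concurrent ingest streams during the measurement
    cpu_percent: float    # average CPU usage observed
    dropped_frames: int   # frames dropped during the measurement


def safe_capacity(samples: Iterable[LoadSample], cpu_limit: float = 80.0) -> int:
    """Return the largest stream count measured before performance degraded.

    Degradation is assumed to mean CPU above cpu_limit or any dropped frames.
    """
    capacity = 0
    for sample in sorted(samples, key=lambda s: s.streams):
        if sample.cpu_percent > cpu_limit or sample.dropped_frames > 0:
            break  # first unhealthy measurement: stop counting
        capacity = sample.streams
    return capacity


# Made-up load-test results for a single server.
results = [
    LoadSample(10, 30.0, 0),
    LoadSample(20, 55.0, 0),
    LoadSample(30, 78.0, 0),
    LoadSample(40, 92.0, 12),
]
print(safe_capacity(results))
```

In practice you would likely track more signals than these two (network throughput, memory, latency), but the idea of stopping at the first degraded measurement stays the same.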

Equally important in your scaling strategy is being prepared to scale down. When your platform experiences a decrease in stream demand and you have a decreased need for capacity, removing servers from the cluster can save costs and resources.


Scaling Strategy

So, you’re ready to design your scaling plan. But where do you start?

Streaming media expert Karel Boek of Raskenlund shares his thoughts on a good scaling strategy. As a first step, he suggests that you categorize your video streams based on potential duration. For example, Boek often separates shorter social media streams of a known duration (say, one minute long) from longer sporting-event broadcasts or live concerts that may have an unknown duration or finish time. Once you have the streams organized into groups, you can assign a server to each of those groups. We’ll explain why this can be so helpful in just a bit.
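A minimal sketch of this kind of categorization might look like the following. The 10-minute cutoff, the category names, and the pool names are hypothetical choices for illustration; the article does not prescribe specific values.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff between "short" and "long" streams; tune for your platform.
SHORT_LIMIT_MINUTES = 10.0

# Hypothetical server pools, one per group of streams.
POOL_FOR_CATEGORY = {
    "short": "ingest-short",
    "long": "ingest-long",
    "open_ended": "ingest-long",  # unknown duration is treated like a long stream
}


@dataclass(frozen=True)
class StreamInfo:
    name: str
    expected_minutes: Optional[float]  # None means the duration is unknown


def categorize(stream: StreamInfo) -> str:
    """Group a stream by its expected duration."""
    if stream.expected_minutes is None:
        return "open_ended"
    if stream.expected_minutes <= SHORT_LIMIT_MINUTES:
        return "short"
    return "long"


def assign_pool(stream: StreamInfo) -> str:
    """Return the server pool that should receive this stream."""
    return POOL_FOR_CATEGORY[categorize(stream)]


print(assign_pool(StreamInfo("social-clip", 1.0)))
print(assign_pool(StreamInfo("live-concert", None)))
```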



Once your streams are categorized, the next thing you’ll want to consider is whether or not you’d like to use an autoscaler. You can use a cloud autoscaling platform, which does the scaling up or down automatically for you based on a feedback loop within the algorithm. Several options exist on the market, including Wowza®, AWS, Microsoft Azure, and Google Cloud. When you consider your options, Boek suggests that you look at the costs associated with stream volume, the amount of transcoding required, and the parameters/metrics used for scaling down. Alternatively, you can choose to skip a cloud autoscaling platform or program and instead build and customize your own scaling workflow.


Scaling Architecture

Load balancers are a common component in scaling architecture. Their job is to route incoming streams from the publishers across multiple servers when a single server installation is unable to provide the required capacity from the increase in load. Scaling involves using two or more servers, and when there are multiple servers arranged this way, they are referred to as a server pool or server cluster.

Depending on hardware and infrastructure, virtual servers can be added to or removed from the cluster either manually, based on estimated peak loads, or automatically, based on real-time metrics.


Load Balancers

A proper load-balancing solution should allow for the easy addition and removal of servers from the cluster.

Let’s look at an example for first-mile scaling. It’s a Wednesday morning, and your streaming platform usually averages 10 incoming streams from various publishers, all of which are typically handled by just one server. But suddenly, your platform has 30 incoming streams, and it’s beginning to overwhelm the server’s capacity. If all of the server’s resources continue to be consumed, this will result in poor stream quality — or worse. Some streams may be dropped, never reaching the viewer.

A critical component in scaling architecture is a Listener (Watcher), which regularly receives metrics updates from the media server(s) in a cluster. As soon as the current metrics are received, the Listener runs its algorithms to quickly identify whether there’s a need for increased capacity. If it determines that a server is at peak load, it’ll make a rapid decision to add another server to the cluster, distributing some of those 30 incoming streams based on the category to which you assigned them. This will reduce the potential for any stream interruptions.
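To make the idea concrete, here is a small sketch of how such a Listener could turn metrics into a scale-up decision. It is not how any specific product implements its watcher; the `ServerMetrics` fields, the 75% target utilization, and the per-server limit of 20 streams are assumptions for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServerMetrics:
    """A metrics update from one media server (hypothetical fields)."""
    server_id: str
    active_streams: int
    max_streams: int  # limit found by load testing; assumed to be positive


class Listener:
    """Keeps the latest metrics per server and sizes the cluster from them."""

    def __init__(self, target_utilization: float = 0.75) -> None:
        self.target_utilization = target_utilization
        self.latest: Dict[str, ServerMetrics] = {}

    def on_metrics(self, metrics: ServerMetrics) -> None:
        """Record the most recent update from a server."""
        self.latest[metrics.server_id] = metrics

    def servers_to_add(self) -> int:
        """Return how many servers to add so no server exceeds the target."""
        if not self.latest:
            return 0
        total_streams = sum(m.active_streams for m in self.latest.values())
        # Size conservatively against the smallest server in the cluster.
        per_server = min(m.max_streams for m in self.latest.values()) * self.target_utilization
        needed = math.ceil(total_streams / per_server)
        return max(0, needed - len(self.latest))


listener = Listener()
listener.on_metrics(ServerMetrics("media-1", active_streams=10, max_streams=20))
print(listener.servers_to_add())  # a normal morning
listener.on_metrics(ServerMetrics("media-1", active_streams=30, max_streams=20))
print(listener.servers_to_add())  # the sudden jump to 30 streams
```

Sizing against a target utilization rather than the hard limit leaves headroom for streams that arrive while the new server is still starting.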

In high-volume use cases, it can be common to have several media servers in a cluster with a very large number of incoming streams that are all assigned to different servers. So, how then does a CDN or player attempting to connect know which server it needs to communicate with for a specific stream?

As Karel Boek explains, “The Load balancer must know which of the media servers the stream is running on. Any incoming connection attempts will then be routed to the correct media server running that stream.”
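One simple way to picture this is a lookup table that the load balancer updates whenever a stream starts or stops. The sketch below is an in-memory illustration only; the class name, method names, and server addresses are hypothetical, and a production setup would typically keep this mapping in a shared store.

```python
from typing import Dict, Optional


class StreamRegistry:
    """Remembers which media server each published stream is running on."""

    def __init__(self) -> None:
        self._server_for_stream: Dict[str, str] = {}

    def register(self, stream_name: str, server: str) -> None:
        """Record the server chosen when a publisher starts a stream."""
        self._server_for_stream[stream_name] = server

    def unregister(self, stream_name: str) -> None:
        """Forget a stream once it has ended (no error if it is unknown)."""
        self._server_for_stream.pop(stream_name, None)

    def route(self, stream_name: str) -> Optional[str]:
        """Return the server for an incoming connection, or None if unknown."""
        return self._server_for_stream.get(stream_name)


registry = StreamRegistry()
registry.register("concert-main", "media-2.example.internal")
print(registry.route("concert-main"))
print(registry.route("not-published"))
```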


Scaling Down

According to Boek, “Just as important as scaling up is scaling down,” and he strongly encourages you to get as much information about your streams as possible so that you can properly determine which streams go to which servers.

How does this help with scaling down? Boek states that if you know that certain streams are short lived, such as social streams lasting just a few minutes, you can route them to a media server for short sessions only. During a period of low demand, when you can reduce capacity, the load balancer can decide which server in a cluster can be shut down. Because you know that these social media streams will be ending in just a short time, the server they have been routed to will most likely be the server that gets removed from the cluster. Boek further explains that you can use additional stream information, such as protocol, resolution, and duration, to categorize your streams. Having as much information as possible about your streams can help you when it’s time to decide which servers can be removed in your scale-down strategy.
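The scale-down choice described here can be sketched as picking the server that will become empty soonest. In this illustration each server is represented by the remaining minutes of its active streams, with `None` standing for an unknown end time; the data layout and server names are assumptions for the example.

```python
import math
from typing import Dict, List, Optional


def drain_time(remaining_minutes: List[Optional[float]]) -> float:
    """Minutes until every stream on a server has ended.

    A stream with an unknown end time (None) makes the server's drain
    time infinite, so that server is never preferred for removal.
    """
    if not remaining_minutes:
        return 0.0  # an idle server can be removed immediately
    return max(math.inf if r is None else r for r in remaining_minutes)


def pick_server_to_remove(cluster: Dict[str, List[Optional[float]]]) -> str:
    """Return the server that will be free of streams first."""
    if not cluster:
        raise ValueError("cluster is empty")
    # Sorting the names first makes ties resolve deterministically.
    return min(sorted(cluster), key=lambda server: drain_time(cluster[server]))


cluster = {
    "ingest-long": [45.0, None],   # a match plus a concert with no end time
    "ingest-short": [1.0, 3.0],    # social clips ending within minutes
}
print(pick_server_to_remove(cluster))
```

Once a server is chosen, the load balancer would stop sending new streams to it and shut it down after the last stream ends.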


Helpful Tips

Because there will always be streaming situations that present unpredictable capacity requirements, Boek shares his tips on building a scaling strategy. He urges the use of artificial intelligence, metrics, and viewership trends because all of these tools can give you a better sense of when you may need to scale up or scale down. These resources may even provide you with the opportunity for setting capacity thresholds. For example, when your server is at 75% capacity, your threshold settings will automatically launch a new virtual machine.
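A threshold rule like the 75% example can be expressed in a few lines. The sketch below also adds a lower threshold so the cluster does not flip between launching and removing machines; that 30% value, and the function and parameter names, are assumptions for illustration rather than recommendations.

```python
def scaling_action(
    active_streams: int,
    max_streams: int,
    scale_up_at: float = 0.75,    # the 75% example threshold
    scale_down_at: float = 0.30,  # hypothetical lower threshold
) -> str:
    """Return "scale_up", "scale_down", or "hold" for the current load."""
    if max_streams <= 0:
        raise ValueError("max_streams must be positive")
    utilization = active_streams / max_streams
    if utilization >= scale_up_at:
        return "scale_up"      # launch a new virtual machine
    if utilization <= scale_down_at:
        return "scale_down"    # a machine can be drained and removed
    return "hold"


print(scaling_action(15, 20))
print(scaling_action(10, 20))
print(scaling_action(4, 20))
```

Keeping a gap between the two thresholds is a common way to avoid repeatedly starting and stopping servers when load hovers around a single cutoff.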

Boek also suggests that you study as many metrics as possible in order to implement trend lines. This means that if you notice a need for increased capacity on certain days, you can schedule additional servers during those peak times. In addition, you can schedule the removal of servers at times when you expect slower demand. Really knowing your audience and paying attention to the news or upcoming events can also be helpful. If an upcoming event has the potential to result in higher-than-normal viewership, you can be prepared to scale up.
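As a very simple stand-in for a trend line, the sketch below averages historical peak stream counts per weekday and converts each average into a number of servers to schedule. The history values and the figure of 15 streams per server are made up for the example; a real forecast would use far more data and likely a proper time-series model.

```python
import math
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def servers_by_weekday(
    history: List[Tuple[date, int]],
    streams_per_server: int,
) -> Dict[str, int]:
    """Map each weekday seen in history to the servers to schedule for it.

    history holds (day, peak concurrent ingest streams) pairs.
    """
    peaks: Dict[int, List[int]] = defaultdict(list)
    for day, peak_streams in history:
        peaks[day.weekday()].append(peak_streams)  # Monday is 0
    return {
        WEEKDAYS[index]: max(1, math.ceil(mean(values) / streams_per_server))
        for index, values in sorted(peaks.items())
    }


# Made-up history: busy Wednesdays, quiet Saturday.
history = [
    (date(2024, 1, 3), 30),
    (date(2024, 1, 10), 34),
    (date(2024, 1, 6), 8),
]
print(servers_by_weekday(history, streams_per_server=15))
```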

A final piece of advice is to monitor your scaling strategy consistently and be sure to make adjustments as needed. For example, if you have 120 servers in a cluster and you see that 60 of them are idle, then you may end up paying for streaming resources you didn’t actually need.

There are various types of load-balancing solutions that can be used individually or together, depending on how large and distributed the server cluster is. Depending on the type of streaming your platform will provide, some load-balancing solutions are better than others. Your workflow and type of streaming will dictate the best choice for you. 


Karel Boek has over 20 years of industry experience in the OTT and video streaming space. He founded Raskenlund in 2008; it is a streaming media solutions company offering full-service streaming architecture.


About Rose Power

Rose Power is the developer community manager for Wowza Media Systems. Passionate about building relationships with the dev community, Rose strives to deliver quality resources for a positive user experience built on trust. When not working, she can be found playing the ukulele around a fire or hiking the mountains of Colorado with her pup.