Wowza Streaming Engine Transcoder performance benchmark

Depending on your hardware configuration, the Transcoder in Wowza Streaming Engine™ media server software offers both accelerated and non-accelerated video encoding.

This article presents performance benchmark numbers captured for software CPU (MainConcept) transcoding, NVIDIA GPU accelerated transcoding, and AMD Xilinx accelerated transcoding. These numbers are for guidance and reference only. They represent performance in basic scenarios, compared across similarly sized AWS instances.

Our tests address a simple RTMP-to-HLS streaming scenario. More complicated use cases, such as those involving DVR recording, WebRTC setups, or transcoding CMAF HLS over HEVC, can drastically change these benchmarks. Your results may also vary depending on network traffic, source file composition, configuration, overall operating system overhead, and other factors.

Note: For information about how to measure the transcoding benchmarks for your Wowza Streaming Engine configuration, see Capture Transcoder benchmark statistics in Wowza Streaming Engine.

Recommendations

CPU transcoding is a good fit for simple workflows with smaller viewership and limited distribution. For example, it may be more cost-efficient when streaming a weekly video with a single adaptive bitrate (ABR) transcode to 200 viewers within a small geographic area.

CPU transcoding at scale isn't recommended because it's inefficient and can lead to higher server hardware and maintenance costs. GPU-accelerated encoding may provide a better, more cost-efficient alternative for more complex use cases.

Testing methodology


When running Wowza Streaming Engine at load, we recommend keeping total CPU usage at or below 85 percent. This leaves sufficient headroom for network interruptions or other unexpected issues that may occur when streaming.
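To turn a measured benchmark into a rough stream count under the 85 percent guideline, one simple approach is to assume CPU load grows linearly with the number of ingested streams and scale down from the measured point. The sketch below is a hypothetical helper (`max_ingests_under_ceiling` is not part of Wowza Streaming Engine), and the linear-scaling assumption is ours; real load is rarely perfectly linear. It uses the c5.12xlarge 1080p30 result from the CPU (MainConcept) table in this article: 10 ingests at 97 percent CPU.

```python
def max_ingests_under_ceiling(measured_ingests: int, measured_util_pct: int,
                              ceiling_pct: int = 85) -> int:
    """Estimate how many ingests fit under a CPU utilization ceiling.

    ASSUMPTION (hypothetical model): CPU utilization grows linearly with the
    number of ingested streams, so each ingest costs
    measured_util_pct / measured_ingests percent of total CPU.
    """
    if measured_ingests <= 0 or measured_util_pct <= 0:
        raise ValueError("measurements must be positive")
    # Integer arithmetic computes floor(ceiling / per-ingest cost) without
    # floating-point rounding surprises.
    return (ceiling_pct * measured_ingests) // measured_util_pct


# c5.12xlarge, 1080p30 source: 10 ingests at 97 percent CPU (CPU benchmark table)
estimate = max_ingests_under_ceiling(10, 97)
print(f"Estimated ingests at or below 85% CPU: {estimate}")
```

Treat the estimate as a starting point for your own load test rather than a guaranteed capacity.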

Notes:

Before executing the tests, the test servers were tuned using the guidelines in Tune Wowza Streaming Engine for optimal performance.

All tests were conducted using the following guidelines and steps:

  • We loaded the hardware with as many incoming streams as possible until Wowza Streaming Engine reached approximately 90 percent CPU utilization. At that point, we collected benchmarks for 10 minutes while ensuring minimal skipped frames.
  • When determining if a server can handle an additional stream, our approach differed based on the hardware:
     
    • For NVIDIA GPU transcoding, we used the NVIDIA System Management Interface (nvidia-smi) command line utility to display the GPU load. The 90 percent threshold served as an arbitrary value, high enough for processing while not completely overloading the GPU. Above this load, we experienced more skipped frames.
    • For CPU (MainConcept) transcoding, we used the same approach as the NVIDIA GPU tests, aiming for 90 percent CPU usage while monitoring load with console commands.
    • For AMD Xilinx transcoding, the card determines the amount of resources it can work with, measured in pixels per second. Each card can handle two 4K60fps streams (2 x 3840 x 2160 x 60 pixels per second) or eight 1080p60fps streams (8 x 1920 x 1080 x 60 pixels per second). We ran this hardware up to 100 percent utilization without encountering any issues. The number of streams a card can handle can be calculated directly from each stream's resolution and frame rate.

  • We monitored skipped frames and confirmed that the server didn't crash after reaching 90 percent usage (or the 100 percent threshold for the AMD Xilinx U30 card). Some skipped frames are expected and tolerated at that load, but we verified that they stayed within acceptable levels, close to zero. We also checked that the ALLFRAMESOFF message didn't appear.
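The pixel-rate calculation described for the AMD Xilinx U30 card can be written out as a short sketch. It assumes, as stated above, a per-card budget equal to eight 1080p60 streams (the same total as two 4K60 streams) and counts only whole streams of a single resolution and frame rate. It doesn't model the extra encoding work for transcoded renditions, so measured capacity can come in lower; for example, the vt1.3xlarge 24 fps result in the AMD Xilinx table is 18 ingests, while the budget alone allows 20. The names `U30_PIXELS_PER_SECOND`, `pixel_rate`, and `max_streams` are illustrative only.

```python
# ASSUMPTION: per-card budget = eight 1080p60 streams, as described above.
U30_PIXELS_PER_SECOND = 8 * 1920 * 1080 * 60  # 995,328,000 pixels per second


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels per second produced by one stream."""
    return width * height * fps


def max_streams(width: int, height: int, fps: int, cards: int = 1) -> int:
    """Whole streams of one resolution/frame rate that fit in the pixel budget."""
    return (cards * U30_PIXELS_PER_SECOND) // pixel_rate(width, height, fps)


for label, (w, h, fps) in {
    "4K60": (3840, 2160, 60),
    "1080p60": (1920, 1080, 60),
    "1080p30": (1920, 1080, 30),
    "1080p24": (1920, 1080, 24),
}.items():
    print(f"{label}: {max_streams(w, h, fps)} streams per card")
```

Integer floor division is used so that a partial stream is never counted as capacity.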

Input test streams

When measuring and analyzing performance benchmarks for the Transcoder in Wowza Streaming Engine, we used the following source files. You can use these files to standardize your process and benchmark your own instances:

All metadata is available when you view the file properties. Benchmarks were captured in August 2023 using an H.264/AAC 1080p source with 720p, 360p, and 140p transcoded renditions. These benchmarks don't account for hardware changes to any of the mentioned instance types after this time.

CPU (MainConcept) benchmarks

The following table summarizes benchmark test results for three Amazon EC2 C5 instances using CPU (MainConcept) transcoding. Download CPU (MainConcept) testing benchmarks

Wowza Streaming Engine release: 4.8.24+4

Implementation: CPU (MainConcept)

Operating system: Ubuntu 22.04.2

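Hardware: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz

EC2 instance type | 1080p source FPS | vCPUs | Instance memory (GB) | Stream ingests | CPU utilization (%) | Transcoded streams
----------------- | ---------------- | ----- | -------------------- | -------------- | ------------------- | ------------------
c5.4xlarge        | 60               | 16    | 32                   | 2              | 99                  | 6
c5.4xlarge        | 30               | 16    | 32                   | 4              | 99                  | 12
c5.4xlarge        | 24               | 16    | 32                   | 5              | 99                  | 15
c5.12xlarge       | 60               | 48    | 96                   | 6              | 96                  | 18
c5.12xlarge       | 30               | 48    | 96                   | 10             | 97                  | 30
c5.12xlarge       | 24               | 48    | 96                   | 13             | 99                  | 39
c5.24xlarge       | 60               | 96    | 192                  | 10             | 88                  | 30
c5.24xlarge       | 30               | 96    | 192                  | 18             | 93                  | 54
c5.24xlarge       | 24               | 96    | 192                  | 20             | 90                  | 60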
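Because the C5 instances differ in size, normalizing the CPU (MainConcept) results by vCPU count makes them easier to compare. The sketch below re-enters the published rows and computes transcoded streams per vCPU, a derived metric that isn't part of the original benchmarks. It also checks that each ingest yields three transcoded streams, one per 720p, 360p, and 140p rendition. In these results, c5.24xlarge delivers the fewest transcoded streams per vCPU at every frame rate (for example, 0.5625 at 30 fps versus 0.75 on c5.4xlarge), although its runs also ended at lower CPU utilization (88 to 93 percent). The names `CPU_RESULTS` and `streams_per_vcpu` are illustrative.

```python
from fractions import Fraction

# (instance type, source FPS, vCPUs, stream ingests, transcoded streams),
# copied from the CPU (MainConcept) benchmark table
CPU_RESULTS = [
    ("c5.4xlarge", 60, 16, 2, 6),
    ("c5.4xlarge", 30, 16, 4, 12),
    ("c5.4xlarge", 24, 16, 5, 15),
    ("c5.12xlarge", 60, 48, 6, 18),
    ("c5.12xlarge", 30, 48, 10, 30),
    ("c5.12xlarge", 24, 48, 13, 39),
    ("c5.24xlarge", 60, 96, 10, 30),
    ("c5.24xlarge", 30, 96, 18, 54),
    ("c5.24xlarge", 24, 96, 20, 60),
]

RENDITIONS_PER_INGEST = 3  # 720p, 360p, and 140p renditions per 1080p source


def streams_per_vcpu(vcpus: int, transcoded: int) -> Fraction:
    """Transcoded output streams per vCPU, kept exact with Fraction."""
    return Fraction(transcoded, vcpus)


for instance, fps, vcpus, ingests, transcoded in CPU_RESULTS:
    # Each ingest should produce one transcoded stream per rendition.
    assert transcoded == ingests * RENDITIONS_PER_INGEST
    ratio = streams_per_vcpu(vcpus, transcoded)
    print(f"{instance} @ {fps} fps: {float(ratio):.4f} transcoded streams per vCPU")
```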

NVIDIA GPU benchmarks

The following table summarizes the benchmark test results for five Amazon EC2 G4 instances using NVIDIA GPU accelerated transcoding. Download NVIDIA GPU testing benchmarks

Wowza Streaming Engine release: 4.8.24+4

Implementation: NVIDIA GPU

Operating system: Ubuntu 22.04.2

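Driver version: 525.85.05

CUDA version: 12.0

CPU: 2nd Generation Intel Xeon Scalable Processors (Cascade Lake P-8259L)

GPU: NVIDIA Tesla T4

EC2 instance type | 1080p source FPS | GPUs | vCPUs | Instance memory (GB) | Stream ingests | NVENC encoding utilization (%) | NVDEC decoding utilization (%) | CPU utilization (%) | GPU utilization (%) | Transcoded streams
----------------- | ---------------- | ---- | ----- | -------------------- | -------------- | ------------------------------ | ------------------------------ | ------------------- | ------------------- | ------------------
g4dn.12xlarge     | 60               | 4    | 48    | 192                  | 36             | 90                             | 50                             | 20                  | 20                  | 108
g4dn.12xlarge     | 30               | 4    | 48    | 192                  | 70             | 90                             | 50                             | 20                  | 20                  | 210
g4dn.12xlarge     | 24               | 4    | 48    | 192                  | 90             | 90                             | 40                             | 20                  | 20                  | 270
g4dn.xlarge       | 60               | 1    | 4     | 16                   | 10             | 99                             | 50                             | 50                  | 25                  | 30
g4dn.xlarge       | 30               | 1    | 4     | 16                   | 20             | 99                             | 50                             | 60                  | 25                  | 60
g4dn.xlarge       | 24               | 1    | 4     | 16                   | 25             | 99                             | 50                             | 60                  | 25                  | 75
g4dn.2xlarge      | 60               | 1    | 8     | 32                   | 10             | 90                             | 50                             | 30                  | 20                  | 30
g4dn.2xlarge      | 30               | 1    | 8     | 32                   | 20             | 99                             | 50                             | 30                  | 20                  | 60
g4dn.2xlarge      | 24               | 1    | 8     | 32                   | 27             | 99                             | 50                             | 30                  | 20                  | 81
g4dn.8xlarge      | 60               | 1    | 32    | 128                  | 10             | 90                             | 50                             | 10                  | 20                  | 30
g4dn.8xlarge      | 30               | 1    | 32    | 128                  | 20             | 90                             | 50                             | 10                  | 20                  | 60
g4dn.8xlarge      | 24               | 1    | 32    | 128                  | 28             | 99                             | 50                             | 10                  | 22                  | 84
g4dn.16xlarge     | 60               | 1    | 64    | 256                  | 11             | 99                             | 55                             | 6                   | 24                  | 33
g4dn.16xlarge     | 30               | 1    | 64    | 256                  | 22             | 99                             | 55                             | 6                   | 25                  | 66
g4dn.16xlarge     | 24               | 1    | 64    | 256                  | 28             | 100                            | 54                             | 6                   | 24                  | 84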
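Since the G4 instances differ mainly in GPU count, normalizing the NVIDIA results by GPU shows how evenly capacity scales. The sketch below re-enters the published ingest counts and computes ingests per Tesla T4, a derived metric that isn't part of the original benchmarks. In this data, the four-GPU g4dn.12xlarge handled 9, 17.5, and 22.5 ingests per GPU at 60, 30, and 24 fps, slightly fewer than the 10 to 11, 20 to 22, and 25 to 28 ingests on the single-GPU instances. The names `GPU_RESULTS` and `ingests_per_gpu` are illustrative.

```python
from fractions import Fraction

# (instance type, source FPS, GPUs, total stream ingests),
# copied from the NVIDIA GPU benchmark table
GPU_RESULTS = [
    ("g4dn.12xlarge", 60, 4, 36),
    ("g4dn.12xlarge", 30, 4, 70),
    ("g4dn.12xlarge", 24, 4, 90),
    ("g4dn.xlarge", 60, 1, 10),
    ("g4dn.xlarge", 30, 1, 20),
    ("g4dn.xlarge", 24, 1, 25),
    ("g4dn.2xlarge", 60, 1, 10),
    ("g4dn.2xlarge", 30, 1, 20),
    ("g4dn.2xlarge", 24, 1, 27),
    ("g4dn.8xlarge", 60, 1, 10),
    ("g4dn.8xlarge", 30, 1, 20),
    ("g4dn.8xlarge", 24, 1, 28),
    ("g4dn.16xlarge", 60, 1, 11),
    ("g4dn.16xlarge", 30, 1, 22),
    ("g4dn.16xlarge", 24, 1, 28),
]


def ingests_per_gpu(gpus: int, ingests: int) -> Fraction:
    """Stream ingests handled per GPU, kept exact with Fraction."""
    return Fraction(ingests, gpus)


for instance, fps, gpus, ingests in GPU_RESULTS:
    per_gpu = float(ingests_per_gpu(gpus, ingests))
    print(f"{instance} @ {fps} fps: {per_gpu:g} ingests per T4 GPU")
```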

AMD Xilinx benchmarks

The following table summarizes the benchmark test results for two Amazon EC2 VT1 instances using AMD Xilinx accelerated transcoding. Download AMD Xilinx testing benchmarks

Wowza Streaming Engine release: 4.8.24+4

Implementation: AMD Xilinx U30 media accelerator card

Operating system: Ubuntu 22.04

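Driver version: 3.0.1

EC2 instance type | 1080p source FPS | U30 cards | vCPUs | Instance memory (GB) | Stream ingests | Encoding utilization (%) | Decoding utilization (%) | CPU utilization (%) | GPU utilization (%) | Transcoded streams
----------------- | ---------------- | --------- | ----- | -------------------- | -------------- | ------------------------ | ------------------------ | ------------------- | ------------------- | ------------------
vt1.6xlarge       | 60               | 2         | 24    | 48                   | 16             | 57.7                     | 100                      | 16                  | N/A                 | 48
vt1.6xlarge       | 30               | 2         | 24    | 48                   | 32             | 57.7                     | 100                      | 20                  | N/A                 | 96
vt1.6xlarge       | 24               | 2         | 24    | 48                   | 36             | 52                       | 90                       | 16                  | N/A                 | 108
vt1.3xlarge       | 60               | 1         | 12    | 24                   | 8              | 57.7                     | 100                      | 15                  | N/A                 | 24
vt1.3xlarge       | 30               | 1         | 12    | 24                   | 16             | 57.7                     | 100                      | 17                  | N/A                 | 48
vt1.3xlarge       | 24               | 1         | 12    | 24                   | 18             | 52                       | 90                       | 14                  | N/A                 | 54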
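As a cross-check of the pixels-per-second model described in the testing methodology, the sketch below computes what fraction of each configuration's pixel budget the 1080p ingests consume. With the per-card budget assumed to be eight 1080p60 streams, the results (100, 100, and 90 percent at 60, 30, and 24 fps on both instances) match the decoding utilization figures in the table, which suggests that figure tracks the card's pixel-rate capacity. That is an inference from the published numbers, not documented AMD behavior. The names `CARD_PIXELS_PER_SECOND`, `AMD_RESULTS`, and `pixel_load` are illustrative.

```python
from fractions import Fraction

# ASSUMPTION: per-card budget = eight 1080p60 streams (see Testing methodology).
CARD_PIXELS_PER_SECOND = 8 * 1920 * 1080 * 60

# (instance type, source FPS, accelerator cards, total stream ingests),
# copied from the AMD Xilinx benchmark table
AMD_RESULTS = [
    ("vt1.6xlarge", 60, 2, 16),
    ("vt1.6xlarge", 30, 2, 32),
    ("vt1.6xlarge", 24, 2, 36),
    ("vt1.3xlarge", 60, 1, 8),
    ("vt1.3xlarge", 30, 1, 16),
    ("vt1.3xlarge", 24, 1, 18),
]


def pixel_load(fps: int, cards: int, ingests: int,
               width: int = 1920, height: int = 1080) -> Fraction:
    """Fraction of the combined per-card pixel budget used by the ingests."""
    return Fraction(ingests * width * height * fps, cards * CARD_PIXELS_PER_SECOND)


for instance, fps, cards, ingests in AMD_RESULTS:
    load = pixel_load(fps, cards, ingests)
    print(f"{instance} @ {fps} fps: {float(load):.0%} of pixel budget")
```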

More resources