Passthrough vs Transcoding vs Streaming: Why The Same Server Supports Different Stream Counts
Quick Summary
Passthrough streaming repackages a stream without re-encoding it and skips the expensive GPU stages. Transcoding decodes and re-encodes the video and engages the full GPU pipeline. A full streaming workflow adds an adaptive bitrate ladder, multi-format delivery, and sometimes AI analysis, which activates every variable at once and forces them to contend for shared CUDA cores and GPU memory. A single Wowza Streaming Engine server handles very different stream counts depending on the workflow it runs, because each workflow engages a different subset of hardware components and streaming variables. Matching the server to the workflow, then load testing that exact pipeline, is what produces an accurate capacity estimate.

Hardware components like the GPU, CPU, system RAM, and VRAM cap a server’s capacity, and streaming variables like bitrate, frame rate, resolution, and codec draw against that ceiling. Neither set acts in isolation. The type of streaming workflow running on the server determines which variables engage and how hard each one works. The same server that delivers a high stream count in one workflow can fall to a fraction of that in another, with no change to the hardware or the source.
Passthrough vs Transcoding vs Streaming Workflows At A Glance
Three workflow archetypes cover most Wowza Streaming Engine deployments, ordered by how much of the pipeline they run:
- Passthrough repackages a stream into a new delivery format without decoding or re-encoding the video.
- Transcoding decodes the source and re-encodes it to change resolution, frame rate, codec, or bitrate.
- Streaming adds an adaptive bitrate ladder, multi-format packaging, cross-device delivery, and often an AI analysis stage on top of transcoding.
A single deployment can run all three at once across different applications, which is why capacity planning starts with the mix of workflows a server handles rather than a single number.
What Is Passthrough Streaming?
Passthrough streaming workflows take an incoming stream, leave the encoded video frames exactly as they arrive, and rewrap them in a different container or delivery format. This is also sometimes called transmuxing. A common example is an RTMP contribution feed repackaged into HLS for delivery, with the H.264 video passed straight through. The server performs no decoding, scaling, or re-encoding. This workflow is common for low-latency relay and scale-out edge distribution.
Because the pixels are never touched, a passthrough streaming workflow engages almost none of the expensive variables. The load lands on the CPU for transmuxing and packaging, on system RAM for buffering, and on the network for egress. NVDEC, the CUDA cores, NVENC, and VRAM stay largely out of the picture. Resolution, frame rate, and codec choice do not drive compute, because the server never processes a frame. The result is the highest stream count per server, with CPU and outbound bandwidth as the practical ceilings. Teams optimizing a passthrough deployment focus on CPU headroom and network capacity rather than GPU resources.
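As a rough illustration of why CPU and outbound bandwidth set the passthrough ceiling, the sketch below estimates how many concurrent sessions a server could serve. The function name, the 80% headroom factor, and every figure are hypothetical placeholders, not Wowza Streaming Engine values. Measured load-test numbers should replace them.

```python
import math


def passthrough_session_ceiling(nic_mbps, stream_mbps, cpu_session_limit, headroom=0.8):
    """Estimate concurrent passthrough sessions (hypothetical model).

    nic_mbps          -- outbound network capacity in Mbit/s
    stream_mbps       -- bitrate of one delivered stream in Mbit/s
    cpu_session_limit -- sessions the CPU can package, taken from a load test
    headroom          -- fraction of the NIC to plan against (assumed 80%)
    """
    if stream_mbps <= 0:
        raise ValueError("stream_mbps must be positive")
    network_limit = math.floor(nic_mbps * headroom / stream_mbps)
    # Whichever resource runs out first is the practical ceiling.
    return min(network_limit, cpu_session_limit)


# Illustrative figures only: a 10 Gbit/s NIC, 5 Mbit/s streams,
# and a CPU assumed to package 2,500 sessions.
print(passthrough_session_ceiling(10_000, 5, 2_500))
```

Note that no GPU term appears anywhere in the estimate, which mirrors the point above: resolution, frame rate, and codec only matter here through the bitrate they produce.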
What Is Transcoding?
Transcoding is the process of converting already-compressed video from one codec, bitrate, or resolution to another. It sits in the middle of a streaming workflow, taking video that has already been encoded by a camera or capture device, decoding it into raw frames, optionally scaling or filtering them, and then encoding them again into one or more new outputs. This produces new versions of the video suited to different delivery targets. Transcoding is what makes adaptive bitrate streaming possible. Without it, every viewer would receive the same quality regardless of their connection. This process can convert a single high-resolution contribution feed into lower-resolution renditions, change a codec, or normalize mismatched sources into a consistent format.
GPU resources are used most in the transcoding chain. On NVIDIA GPUs, NVDEC decodes the source, the CUDA cores handle scaling and filtering, and NVENC encodes each output, with every active session holding frames in VRAM. Resolution and frame rate determine the per-frame and per-second work, codec conversion sets encode complexity, and each rendition in a bitrate ladder consumes another NVENC session. The CPU continues to play a role, but its load often sits below the GPU’s. Teams optimizing a transcoding deployment tune the rendition count, resolution, and frame rate to fit within the GPU’s session and memory limits. This workflow is common for adaptive bitrate generation and source format normalization.
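The session and memory limits described above can be turned into a back-of-the-envelope estimate. This is a minimal sketch under stated assumptions: the function name is hypothetical, every session is assumed to hold the same amount of VRAM, and the session limit and memory figures are invented for illustration. Real limits depend on the GPU model, driver, resolution, and codec.

```python
def transcode_source_ceiling(renditions_per_source, encode_session_limit,
                             vram_mb, vram_per_session_mb):
    """Estimate how many sources one GPU can transcode (hypothetical model).

    Each source is assumed to use one decode session plus one encode
    session per rendition, with equal VRAM held by every session.
    """
    if renditions_per_source < 1:
        raise ValueError("need at least one rendition per source")
    # Limit 1: each rendition in the ladder takes one encode session.
    by_sessions = encode_session_limit // renditions_per_source
    # Limit 2: every active session (decode + encodes) holds frames in VRAM.
    sessions_per_source = 1 + renditions_per_source
    by_vram = vram_mb // (sessions_per_source * vram_per_session_mb)
    return min(by_sessions, by_vram)


# Illustrative figures only: 32 encode sessions, 16 GB VRAM, 300 MB per session.
print(transcode_source_ceiling(4, 32, 16_384, 300))  # four-rendition ladder
print(transcode_source_ceiling(1, 32, 16_384, 300))  # single rendition
```

With these made-up inputs the four-rendition ladder is capped by encode sessions, while the single-rendition case is capped by VRAM, which shows how the binding limit can move as the rendition count changes.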
How Transcoding Fits into A Streaming Pipeline
In a typical pipeline, a camera or software encoder pushes a source stream to a streaming server. The server optionally transcodes the source into multiple renditions, packages those renditions into viewer-friendly protocols, and delivers them to a CDN or directly to viewers. Transcoding sits between ingest and packaging, and it can be skipped entirely if the source format already matches what viewers need.
When Transcoding Is Necessary
Transcoding is needed when viewers require multiple quality levels for adaptive playback, when the source format differs from what target viewers can decode, when bitrates need to be reduced for bandwidth-constrained networks, or when codecs need conversion, such as H.264 to H.265. Most production live streams use transcoding to support adaptive bitrate delivery across diverse devices and networks.
When Transcoding Can Be Skipped
Passthrough delivery, or streaming without transcoding, works when the source already matches what viewers can play and bandwidth is sufficient. Internal video distribution, single-quality public streams, and contribution feeds between professional systems often use passthrough because it’s faster, cheaper, and scales to higher concurrent viewer counts per server.
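The conditions above for when transcoding is needed and when it can be skipped reduce to a short decision rule. The sketch below is illustrative only: the `Source` and `Audience` types, their fields, and `needs_transcoding` are hypothetical names, not part of any Wowza API.

```python
from dataclasses import dataclass


@dataclass
class Source:
    codec: str            # e.g. "h264"
    bitrate_kbps: int


@dataclass
class Audience:
    supported_codecs: set         # codecs the target players can decode
    max_bitrate_kbps: int         # what the viewers' networks can sustain
    needs_adaptive_bitrate: bool  # multiple quality levels required?


def needs_transcoding(source, audience):
    """Return True when passthrough alone cannot serve this audience."""
    if audience.needs_adaptive_bitrate:
        return True  # a ladder of renditions has to be generated
    if source.codec not in audience.supported_codecs:
        return True  # codec conversion required
    if source.bitrate_kbps > audience.max_bitrate_kbps:
        return True  # bitrate must be reduced
    return False     # source already matches: pass it through


internal = Audience({"h264"}, 8_000, needs_adaptive_bitrate=False)
print(needs_transcoding(Source("h264", 5_000), internal))  # False
```

Running a check like this per application is one way to separate the streams that can be passed through from the ones that must be transcoded.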
Hardware Implications
Transcoding is the hardware-intensive part of streaming. Each transcoded output rendition consumes CPU or GPU cycles, while passthrough delivery uses primarily network bandwidth. A server that can pass through hundreds of streams might transcode only dozens of source streams at the same hardware tier, which is why deployment planning starts by asking how many streams need transcoding versus how many can be passed through.
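To make that planning question concrete, here is a deliberately conservative sketch that sizes a fleet from a workflow mix. It treats passthrough and transcoding as drawing on one shared budget per server, which overstates contention because passthrough leans on CPU and network while transcoding leans on the GPU. The function name and the per-server capacities are hypothetical and should come from load tests.

```python
import math
from fractions import Fraction


def servers_needed(passthrough_streams, transcode_streams,
                   passthrough_per_server, transcode_per_server):
    """Estimate server count for a workflow mix (conservative, hypothetical).

    Each stream is charged the fraction of a server it would use if the
    server ran only that workflow, and the fractions are summed.
    """
    load = (Fraction(passthrough_streams, passthrough_per_server)
            + Fraction(transcode_streams, transcode_per_server))
    return math.ceil(load)


# Illustrative capacities: 400 passthrough or 40 transcoded sources per server.
print(servers_needed(300, 30, 400, 40))
```

Exact fractions are used instead of floats so that a mix landing precisely on a server boundary is not rounded up by accident.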
What Is Streaming?
Streaming refers to the complete live video pipeline: ingesting source streams, optionally transcoding, packaging into delivery protocols, distributing to viewers, and recording for replay. A streaming server performs all of these roles, often simultaneously for multiple source streams.
A full streaming workflow runs the complete pipeline of ingest, decoding, and encoding, and adds a further layer, largely on the delivery side. This layer can include generating adaptive bitrate ladders, packaging for cross-device and cross-platform delivery, and AI analysis for object detection, metadata logging, or alerting. The ‘full’ qualifier distinguishes server-based streaming workflows from narrower point solutions that handle only one stage, such as a standalone transcoder or a CDN-only delivery service.
What Is An Intelligent Video Streaming Workflow?
In intelligent video streaming workflows, the server ingests the source, decodes it, runs computer vision models against the decoded frames, transcodes the video into an adaptive bitrate ladder, packages the output into multiple delivery formats, and delivers it. Because all the stages run at once, the variables stop behaving independently and begin contending for shared hardware.
How Does AI Analysis Compete With Transcoding?
AI inference runs on the same GPU as transcoding. Computer vision models consume CUDA cores for processing and occupy VRAM to hold their weights, which is the same pool of resources transcoding uses for scaling, filtering, and active sessions. Adding an analysis stage to a transcoding workflow reduces the transcoding headroom on that card, and similarly, adding transcoding load reduces the headroom available for analysis. The two workloads share the resource, so they share the ceiling. A deployment that runs both should size the GPU for the combined demand.
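The shared ceiling can be illustrated with VRAM alone. In this sketch, model weights are subtracted from the card’s memory before transcoding sessions are counted. The function name and all figures are hypothetical, and a real sizing exercise would also account for CUDA core contention, which this simple model ignores.

```python
def transcode_sessions_remaining(total_vram_mb, model_vram_mb,
                                 vram_per_session_mb, reserve_mb=0):
    """Sessions that still fit in VRAM after analysis models are loaded.

    Hypothetical model: weights and sessions draw from one memory pool.
    """
    free_mb = total_vram_mb - model_vram_mb - reserve_mb
    if free_mb <= 0:
        return 0
    return free_mb // vram_per_session_mb


# Illustrative figures only: 16 GB card, 300 MB per transcoding session.
print(transcode_sessions_remaining(16_384, 0, 300))      # transcoding only
print(transcode_sessions_remaining(16_384, 4_096, 300))  # plus a 4 GB model
```

The same arithmetic works in the other direction: every transcoding session added leaves less room for model weights and inference buffers.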
How Does Frame Rate Impact AI Analysis?
The frame rate sent to the analysis stage sets how many frames the models process each second, and it is separate from the delivery frame rate. Many analysis workflows process fewer frames per second than they deliver, because detecting an object or event does not require every frame. Lowering the analysis frame rate frees CUDA cores and VRAM for transcoding without changing the quality of the delivered stream. This single setting is often the most effective optimization in a combined workflow.
The result is the lowest stream count per server of the three workflows, because every variable is active and several compete for the same CUDA cores and VRAM. This is the workflow where sizing by load testing the actual pipeline matters most.
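The idea of analyzing fewer frames than are delivered can be sketched as simple frame decimation. This is an illustrative generator, not how any particular analysis engine schedules frames. The function name and the frame rates are assumptions.

```python
def frames_for_analysis(frame_indices, delivery_fps, analysis_fps):
    """Yield the subset of frame indices to send to the analysis stage.

    Every frame is still delivered to viewers. Only the frames yielded
    here would be handed to the computer vision models.
    """
    if not 0 < analysis_fps <= delivery_fps:
        raise ValueError("analysis_fps must be in (0, delivery_fps]")
    stride = delivery_fps / analysis_fps  # frames between analyzed frames
    next_pick = 0.0
    for index in frame_indices:
        if index >= next_pick:
            yield index
            next_pick += stride


# One second of a 30 fps stream, analyzed at an assumed 5 fps.
picked = list(frames_for_analysis(range(30), delivery_fps=30, analysis_fps=5))
print(picked)
```

In this example the models see one frame in six, so the inference work per stream drops to roughly a sixth while the delivered stream keeps all 30 frames per second.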
What Variables Bottleneck Streaming Workflows?
Each streaming workflow engages a different set of variables, so the bottleneck depends on the workflow and what resources it draws from the server. The hardware sets the ceiling, the stream variables draw against it, and the workflow decides which variables reach their limit first.
| Variable | Resource Usage: Passthrough | Resource Usage: Transcoding | Resource Usage: Streaming + Analysis |
| --- | --- | --- | --- |
| CPU | Heavy | Moderate | Heavy |
| Encode (NVENC) | Idle | Heavy | Heavy |
| Decode (NVDEC) | Idle | Moderate | Moderate |
| CUDA | Idle | Moderate | Heavy |
| VRAM | Light | Moderate | Heavy |
| Network egress | Heavy | Heavy | Heavy |
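The table can also be encoded as data so that a monitoring or planning script knows which resources to watch for each workflow. This is a minimal sketch that transcribes the qualitative levels above. The dictionary layout, workflow keys, and function name are hypothetical.

```python
# Qualitative usage levels transcribed from the table above.
USAGE = {
    "CPU":            {"passthrough": "heavy", "transcoding": "moderate", "streaming+analysis": "heavy"},
    "NVENC":          {"passthrough": "idle",  "transcoding": "heavy",    "streaming+analysis": "heavy"},
    "NVDEC":          {"passthrough": "idle",  "transcoding": "moderate", "streaming+analysis": "moderate"},
    "CUDA":           {"passthrough": "idle",  "transcoding": "moderate", "streaming+analysis": "heavy"},
    "VRAM":           {"passthrough": "light", "transcoding": "moderate", "streaming+analysis": "heavy"},
    "Network egress": {"passthrough": "heavy", "transcoding": "heavy",    "streaming+analysis": "heavy"},
}


def resources_at_level(workflow, level="heavy"):
    """List the resources a workflow uses at the given level."""
    return [name for name, row in USAGE.items() if row[workflow] == level]


for workflow in ("passthrough", "transcoding", "streaming+analysis"):
    print(workflow, "->", resources_at_level(workflow))
```

The heavily used resources for a workflow are the first candidates to watch during a load test, since they are the ones most likely to reach their limit first.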
Frequently Asked Questions
What is the difference between passthrough and transcoding?
Passthrough, also called transmuxing, repackages a stream into a new delivery format without decoding or re-encoding the video. It uses mostly CPU and network resources. Transcoding decodes and re-encodes the video to change resolution, frame rate, codec, or bitrate, which engages the GPU’s decode and encode blocks and its memory.
What is the difference between transcoding and streaming?
Streaming is the workflow that delivers live or on-demand video to viewers, encompassing ingest, processing, packaging, and distribution. Transcoding is one specific step inside that workflow, converting video from one format, bitrate, or resolution to another. A full streaming server handles transcoding alongside ingest, recording, packaging, and distribution to viewers. Transcoding alone produces converted output, but it doesn’t deliver that output to anyone.
Does passthrough streaming use the GPU?
Passthrough uses little to no GPU, because it never decodes or re-encodes the video. The decode and encode blocks and most GPU memory stay idle, and the load falls on the CPU for packaging and on the network for delivery.
Do I need transcoding to stream video?
No. A streaming server can pass video through to viewers in the same format and bitrate the source provides, without transcoding. This is called passthrough delivery, or transmuxing. Transcoding becomes necessary when viewers need multiple quality levels for adaptive streaming, when source and viewer formats don’t match, or when bandwidth constraints require lower bitrates. For single-source, single-quality workflows, passthrough is faster and uses far less hardware.
Is transcoding the same as encoding?
No, transcoding and encoding are not the same. Encoding is the original conversion from raw video into a compressed format. Transcoding is converting already-compressed video from one format or bitrate to another. The hardware and tools overlap heavily, but the workflows differ. Encoding happens once at the source (camera, hardware encoder, software like OBS), whereas transcoding happens later, usually at a streaming server, to adapt the stream for different viewers.
What is the most resource-intensive streaming workflow?
The most resource-intensive streaming workflow combines AI analysis, transcoding, an adaptive bitrate ladder, and multi-format delivery. In this full streaming workflow, every hardware and stream variable is active at once, and several compete for the same GPU resources.
Why does adding AI video analysis reduce transcoding capacity?
AI video analysis can reduce transcoding capacity because both processes run on the same GPU and draw from the same CUDA cores and memory. Models hold their weights in GPU memory and use compute for processing, which leaves fewer resources for transcoding sessions. As a result, the two workloads share a single capacity ceiling.
How does the streaming workflow affect how many streams a server can handle?
The type of streaming workflow determines which hardware components and streaming variables are required, drawing resources from the server until it reaches capacity. Passthrough skips the GPU and reaches the highest stream counts, transcoding is GPU-bound on encode sessions and memory, and full streaming with analysis reaches the lowest counts because every variable is active and contends for shared hardware.
When should I transcode video and when should I just stream it?
Transcode video when viewers need adaptive bitrate quality, when source and target formats differ, or when bandwidth is constrained. Skip transcoding when the source already matches viewer needs, for example, when an RTMP source is going to HLS viewers on good connections at the same bitrate. Passthrough delivery scales to far more concurrent viewers per server because it spends no CPU or GPU cycles decoding or re-encoding video. Its load is limited to packaging and network egress.
