RTP vs RTSP: The Difference Between Streaming’s Control and Transport Layers

Two of the most commonly confused acronyms in video streaming are RTP and RTSP. They almost always appear together, and in casual conversation, people use them interchangeably. But they do very different jobs.

The Real-Time Transport Protocol (RTP) is the protocol that carries video and audio data between endpoints. The Real-Time Streaming Protocol (RTSP) is the control protocol that manages the streaming session, telling the server when to start, pause, or stop delivering media.

This article explains what makes up these common streaming protocols and how they work together.


RTP vs. RTSP: A Side-by-Side Comparison

| | RTP | RTSP |
|---|---|---|
| Protocol name | Real-Time Transport Protocol | Real-Time Streaming Protocol |
| What it does | Carries audio and video packets from source to destination | Controls the streaming session (setup, playback, teardown) |
| OSI layer | Transport / Session (Layer 4–5) | Application (Layer 7) |
| Delivers media? | Yes | No; delegates to RTP |
| Session control? | No | Yes: PLAY, PAUSE, SETUP, DESCRIBE, TEARDOWN |
| Typical transport | UDP (default); can be interleaved over TCP | TCP (port 554) |
| Specification | RFC 3550 (2003; originally RFC 1889, 1996) | RFC 2326 (1998); RFC 7826 (v2.0, 2016) |
| Works independently? | Yes. RTP can push media without session negotiation | No. RTSP requires a transport protocol (typically RTP) for media delivery |
| Encryption | SRTP (Secure RTP) | RTSPS (RTSP over TLS) |
| Primary use cases | VoIP, video conferencing, media delivery within streaming workflows | IP camera control, surveillance, media server session management |
| QoS feedback | No. RTP relies on companion protocol RTCP | No. RTSP relies on RTCP |
| Latency | Sub-300ms in optimal conditions | Adds negligible overhead; combined RTSP/RTP typically under 2 seconds on LAN |

RTSP doesn’t deliver media at all. It controls the session, while RTP handles the actual transport. RTP can work on its own in push mode, where a sender transmits directly to a known receiver without any session negotiation, but RTSP cannot work without RTP or another transport protocol underneath it. If RTSP is referenced in a product spec or camera datasheet, it almost always implies the full RTSP/RTP stack.

Also, both protocols lack built-in encryption and Quality of Service (QoS) monitoring. Those capabilities come from companion protocols: SRTP for encrypted media, RTCP for quality feedback, and RTSPS for an encrypted control channel. You’ll often see the entire family referenced together as the RTSP/RTP/RTCP stack.

What Is the Real-Time Transport Protocol (RTP)?

The Real-Time Transport Protocol (RTP) is a network protocol standardized by the Internet Engineering Task Force (IETF) in RFC 3550. Its job is to deliver audio and video data packets from one endpoint to another as quickly as possible, even when network conditions are imperfect.

When a video stream is transmitted, the raw media is broken into packets, each stamped with the information receivers need to reconstruct a coherent playback experience. RTP adds three critical pieces of metadata to each packet:

  1. A timestamp, so the receiver can synchronize playback timing and detect jitter
  2. A sequence number, so the receiver can identify lost or out-of-order packets
  3. A payload type identifier, so the receiver knows which codec is being used to decode the data
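To make those three fields concrete, here is a minimal Python sketch that reads them from the 12-byte fixed header defined in RFC 3550. The function name, the `RtpHeader` type, and the sample packet are illustrative assumptions, and the sketch ignores header extensions and CSRC lists.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, a 16-bit sequence number,
    # a 32-bit timestamp, and a 32-bit synchronization source (SSRC).
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,          # top two bits of the first byte
        payload_type=second & 0x7F,  # low seven bits (the top bit is the marker)
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 96, sequence 1000, timestamp 90000.
sample = struct.pack("!BBHII", 0x80, 96, 1000, 90000, 0x1234ABCD) + b"payload"
header = parse_rtp_header(sample)
```

Payload type 96 sits in the dynamic range, so its meaning is not fixed by RTP itself; the receiver learns which codec it maps to from the session description.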

What makes RTP distinct from more familiar protocols like HTTP is its philosophy around reliability. HTTP-based delivery (the foundation of HLS and DASH) prioritizes completeness: it typically runs over TCP, which retransmits lost packets and delivers data in order, so playback waits until the missing data arrives. Conversely, RTP prioritizes timeliness. It typically runs over UDP, which means packets are sent without waiting for acknowledgment from the receiver. If a packet is lost or arrives late, the stream keeps moving. The result is a slight visual or audio artifact, rather than a buffering stall. This tradeoff suits real-time surveillance, video conferencing, or any workflow where latency matters more than pixel-perfect delivery.
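Because nothing is retransmitted, the receiver has to notice loss on its own, and the sequence number is what makes that possible. The sketch below is a simplified illustration, not a production receiver: `count_lost` is a hypothetical helper that counts forward gaps in 16-bit sequence numbers and simply ignores duplicates and late packets.

```python
def count_lost(sequence_numbers):
    """Count gaps in a series of 16-bit RTP sequence numbers.

    Simplification: a packet that shows up late is ignored, so it stays
    counted as lost. Real receivers keep a reordering window instead.
    """
    lost = 0
    previous = None
    for seq in sequence_numbers:
        if previous is None:
            previous = seq
            continue
        # Modulo arithmetic handles wraparound from 65535 back to 0.
        gap = (seq - previous) % 65536
        if gap == 0 or gap >= 32768:
            continue  # duplicate or late packet: skipped in this sketch
        lost += gap - 1
        previous = seq
    return lost


# Packets 1 and 2 never arrive, and the counter wraps past 65535.
missing = count_lost([65534, 65535, 0, 3])
```

The stream keeps playing either way; a count like this only tells the receiver how much was skipped.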

RTP is also codec agnostic. It doesn’t matter whether a stream is H.264, H.265, VP8, AAC, or Opus. RTP wraps whatever encoded payload it receives and transports it. This flexibility is one reason it has remained foundational to so many different applications, including IP camera systems running 24/7 surveillance feeds.

What Is the Real-Time Streaming Protocol (RTSP)?

Where RTP is the transport layer that moves media data, RTSP is the application-layer protocol that tells the streaming server what to do with it. The IETF’s own specification describes RTSP as a “network remote control for multimedia servers.” RTSP doesn’t touch media data. It establishes sessions, negotiates transport parameters, and issues playback commands.

RTSP was standardized in 1998 (RFC 2326), with an updated version 2.0 published in 2016 (RFC 7826) to improve NAT traversal and reduce round-trip communication overhead. It was developed collaboratively by RealNetworks, Netscape, and Columbia University in the same era that produced many of the foundational internet protocols still in use today.

In practice, an RTSP session follows a structured request-response sequence. A client (that could be a video management system, a media server like Wowza Streaming Engine, or a software player like VLC) connects to the RTSP server over TCP on port 554. From there, the client issues a series of commands that set up and control the stream:

  • OPTIONS: Queries the server for which commands it supports
  • DESCRIBE: Requests a description of the available media streams, returned in SDP (Session Description Protocol) format, including codec, resolution, and available stream URIs
  • SETUP: Negotiates the transport mechanism for media delivery (UDP ports for RTP/RTCP, or TCP interleaved mode)
  • PLAY: Tells the server to begin sending media via RTP
  • PAUSE: Temporarily halts media delivery without tearing down the session
  • TEARDOWN: Ends the session and releases resources
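On the wire, each of these commands is a short text message that looks a lot like HTTP. As a rough sketch, assuming RTSP 1.0 and a hypothetical camera URL, a request could be assembled like this (`build_request` is an illustrative helper, not part of any library):

```python
def build_request(method, uri, cseq, headers=None):
    """Assemble an RTSP/1.0 request as bytes.

    Every request carries a CSeq header; the server echoes it back so the
    client can match responses to requests.
    """
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Hypothetical URL; real cameras publish their own stream paths.
describe = build_request(
    "DESCRIBE",
    "rtsp://camera.example/stream1",
    2,
    {"Accept": "application/sdp"},
)
```

The same helper covers OPTIONS, SETUP, PLAY, PAUSE, and TEARDOWN; only the method name and the extra headers change.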

The Key Difference Between RTSP and HTTP

This stateful architecture is a key distinction from HTTP. HTTP is stateless: the server answers each request on its own and keeps no memory of prior interactions. RTSP maintains a persistent session with a unique identifier, so the server tracks the state of each client throughout the interaction. That enables VCR-like control over a live or recorded stream, which is why RTSP has remained the standard for surveillance and IP camera applications: operators need to issue real-time commands against active video feeds.
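In RTSP 1.0 that identifier travels in a Session header, which the server returns from SETUP and the client repeats on later requests. Here is a small sketch of reading it, assuming the `Session: <id>;timeout=<seconds>` form from RFC 2326; the helper name and sample values are made up for illustration.

```python
def parse_session_header(value):
    """Split an RTSP Session header value into (session_id, timeout_seconds)."""
    session_id, _, params = value.partition(";")
    timeout = 60  # RFC 2326 default when no timeout parameter is given
    for param in params.split(";"):
        name, _, raw = param.strip().partition("=")
        if name.lower() == "timeout" and raw.isdigit():
            timeout = int(raw)
    return session_id.strip(), timeout


# Hypothetical value from a SETUP response.
session_id, timeout = parse_session_header("12345678;timeout=30")
```

A client would then send `Session: 12345678` with PLAY, PAUSE, and TEARDOWN so the server can tie each command to the right stream.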

RTSP was designed for session control, not for last-mile delivery to end users. Modern browsers don’t support RTSP playback natively. There’s no way to point a Chrome or Safari window at an rtsp:// URL and get video. This is the gap media servers fill. Wowza Streaming Engine ingests RTSP/RTP from cameras, then transmuxes the stream into browser-friendly formats like HLS for scalable delivery or WebRTC for sub-second latency. The camera speaks RTSP. The viewer’s browser speaks HLS or WebRTC. The media server translates between them.

How RTP and RTSP Work Together in a Streaming Workflow

In nearly every IP camera and surveillance deployment, these two protocols work in tandem. RTSP handles the negotiation and control layer, RTP handles the media transport layer, and RTP Control Protocol (RTCP) runs alongside RTP providing QoS feedback. The full session lifecycle follows the same structured sequence every time a client connects to a camera or media server.

Step 1: Connection

The client (in this case, the media server) opens a TCP connection to the camera’s RTSP server on port 554. This TCP connection will carry all RTSP control messages for the duration of the session.
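As a small illustration of this step, the sketch below works out which host and port to connect to from an rtsp:// URL, falling back to port 554 when the URL doesn’t name one. The URLs and the `rtsp_endpoint` helper are assumptions for the example; the actual connection (for instance with `socket.create_connection`) is left out so the snippet runs without a camera.

```python
from urllib.parse import urlsplit

DEFAULT_RTSP_PORT = 554


def rtsp_endpoint(url):
    """Return the (host, port) a client should open its TCP connection to."""
    parts = urlsplit(url)
    if parts.scheme != "rtsp":
        raise ValueError(f"not an rtsp:// URL: {url!r}")
    if not parts.hostname:
        raise ValueError(f"URL has no host: {url!r}")
    # Use the explicit port when present, otherwise the RTSP default.
    return parts.hostname, parts.port or DEFAULT_RTSP_PORT


# Hypothetical camera addresses.
default_endpoint = rtsp_endpoint("rtsp://camera.example/stream1")
custom_endpoint = rtsp_endpoint("rtsp://camera.example:8554/stream1")
```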

Step 2: Options

The client sends an OPTIONS request to discover which RTSP methods the camera supports. The camera responds with a list, typically DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN, and sometimes GET_PARAMETER for keepalive purposes. This confirms both sides speak the same language before proceeding.
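The list of supported methods comes back in a Public header. A minimal sketch of pulling it out of a response, using a made-up sample response and a hypothetical `supported_methods` helper:

```python
def supported_methods(response_text):
    """Return the methods listed in the Public header of an OPTIONS response."""
    # Skip the status line, then scan headers until the blank line.
    for line in response_text.split("\r\n")[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "public":
            return {m.strip().upper() for m in value.split(",") if m.strip()}
    return set()


# Illustrative response; real cameras may list more or fewer methods.
sample_response = (
    "RTSP/1.0 200 OK\r\n"
    "CSeq: 1\r\n"
    "Public: DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN, GET_PARAMETER\r\n"
    "\r\n"
)
methods = supported_methods(sample_response)
```

A client could check `"GET_PARAMETER" in methods` here to decide which request to use for keepalives later in the session.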

Step 3: Describe

The client sends a DESCRIBE request for the camera’s stream URI. The camera responds with an SDP (Session Description Protocol) block. This is a structured text payload that describes what the client needs to know about the available media, such as the codec (e.g., H.264), the number of available streams, and, where the camera advertises them, resolution, frame rate, and bitrate. The SDP also specifies the RTP payload type, which tells the client how to decode incoming packets.
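The payload type mapping lives in `a=rtpmap` lines of the SDP. The sketch below extracts it from a hand-written sample; the SDP text is illustrative rather than captured from a real camera, and `parse_rtpmap` is a hypothetical helper that ignores every other SDP attribute.

```python
def parse_rtpmap(sdp_text):
    """Map RTP payload type -> (encoding name, clock rate) from a=rtpmap lines."""
    mapping = {}
    for line in sdp_text.splitlines():
        if not line.startswith("a=rtpmap:"):
            continue
        # Format: a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
        payload_type, _, encoding = line[len("a=rtpmap:"):].partition(" ")
        fields = encoding.split("/")
        if payload_type.isdigit() and len(fields) >= 2 and fields[1].isdigit():
            mapping[int(payload_type)] = (fields[0], int(fields[1]))
    return mapping


# Illustrative SDP with one video and one audio stream.
sample_sdp = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=Example camera
t=0 0
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:track1
m=audio 0 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/48000/2
a=control:track2
"""
codecs = parse_rtpmap(sample_sdp)
```

With this mapping in hand, a receiver that sees payload type 96 in an RTP header knows to hand the payload to an H.264 decoder.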

Step 4: Setup

This is where transport negotiation happens, and it’s one of the most consequential steps in the workflow. The client sends a SETUP request specifying how it wants to receive the media. Two modes are available:

  1. UDP mode (default).
    The client proposes a pair of UDP ports: one for RTP media data, one for RTCP control feedback. The camera confirms with its own port pair. Media will flow over these dedicated UDP channels, separate from the RTSP control connection. Because UDP doesn’t wait for acknowledgments or retransmit lost packets, this mode has lower latency. However, it requires those UDP ports to be open and reachable, which can be a problem in firewalled or NAT-heavy environments.
  2. TCP interleaved mode.
    The client requests that RTP and RTCP data be interleaved directly into the existing RTSP TCP connection. Each RTP packet gets a small framing header (a dollar sign prefix, a one-byte channel identifier, and a two-byte length field). The packet is sent inline alongside RTSP control messages, and everything flows over the single TCP connection on port 554. This is more reliable through firewalls and NAT because there’s only one connection to manage, but it introduces the risk of head-of-line blocking. If a single TCP packet is lost, the entire connection stalls while it’s retransmitted, potentially causing latency spikes.
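The interleaved framing is simple enough to show in a few lines. This sketch wraps a payload in the four-byte header described above and reads it back out of a byte buffer; the helper names are illustrative, and a real client would also need to handle RTSP text responses arriving on the same connection.

```python
import struct


def frame_interleaved(channel, payload):
    """Prefix a packet with '$', a one-byte channel, and a two-byte length."""
    return b"$" + struct.pack("!BH", channel, len(payload)) + payload


def read_interleaved(buffer):
    """Return (channel, payload, remaining_bytes), or None if more data is needed."""
    if len(buffer) < 4:
        return None
    if buffer[0:1] != b"$":
        raise ValueError("not an interleaved frame; likely an RTSP message")
    channel, length = struct.unpack("!BH", buffer[1:4])
    if len(buffer) < 4 + length:
        return None  # the rest of the frame has not arrived yet
    return channel, buffer[4:4 + length], buffer[4 + length:]


# Channel 0 is commonly negotiated for RTP and channel 1 for RTCP.
frame = frame_interleaved(0, b"abc")
parsed = read_interleaved(frame + b"$")
```

Returning the leftover bytes matters because TCP is a byte stream: one read can contain several frames, or only part of one.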

The choice between UDP and TCP interleaved is one of the most important architectural decisions in any scaled deployment. For cameras on a controlled LAN with predictable network conditions, UDP is almost always the right call. For cameras behind corporate firewalls, across VPN tunnels, or on unpredictable public networks, TCP interleaved provides the reliability needed to maintain a stable connection. Wowza Streaming Engine natively supports a media server configuration that attempts UDP first and falls back to TCP interleaved if the UDP connection fails.
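A generic version of that "UDP first, then TCP interleaved" idea can be sketched with the Transport header values a client would offer in SETUP. This is an assumption-laden illustration, not how Wowza Streaming Engine implements it: `try_setup` is a hypothetical callback standing in for "send SETUP with this Transport header and confirm that media arrives", and the port number is arbitrary.

```python
def transport_candidates(rtp_port):
    """Transport header values to try, in order: UDP first, then TCP interleaved."""
    return [
        # UDP: RTP on rtp_port, RTCP on the next port up.
        f"RTP/AVP;unicast;client_port={rtp_port}-{rtp_port + 1}",
        # TCP interleaved: RTP on channel 0, RTCP on channel 1.
        "RTP/AVP/TCP;unicast;interleaved=0-1",
    ]


def negotiate(try_setup, rtp_port=50000):
    """Call try_setup(transport) for each candidate until one returns True."""
    for transport in transport_candidates(rtp_port):
        if try_setup(transport):
            return transport
    return None


# Simulate a network where UDP is blocked and only TCP interleaved works.
chosen = negotiate(lambda transport: "TCP" in transport)
```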

Step 5: Play

The client sends a PLAY request, and the camera begins streaming. RTP packets start flowing over whichever transport was negotiated in SETUP (UDP or TCP interleaved). Each packet carries its timestamped, sequenced media payload. Simultaneously, RTCP packets begin flowing in both directions. The camera sends Sender Reports (with synchronization data like NTP timestamps and packet counts), and the client sends Receiver Reports (with packet loss statistics, jitter measurements, and the timing fields the sender uses to estimate round-trip time). This feedback loop allows both sides to monitor stream health without interrupting the media flow.
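The jitter figure in a Receiver Report comes from a running estimate defined in RFC 3550: for each packet, compare how far apart two packets arrived with how far apart their RTP timestamps say they were sent, then smooth the difference with a gain of 1/16. A minimal sketch, assuming arrival times have already been converted to RTP timestamp units and ignoring timestamp wraparound:

```python
def update_jitter(jitter, prev_arrival, prev_rtp_ts, arrival, rtp_ts):
    """One step of the RFC 3550 interarrival jitter estimate.

    All arguments are in RTP timestamp units (e.g. 90 kHz ticks for video).
    """
    # Difference between arrival spacing and sending spacing.
    d = (arrival - prev_arrival) - (rtp_ts - prev_rtp_ts)
    # Exponential smoothing with gain 1/16, as specified in RFC 3550.
    return jitter + (abs(d) - jitter) / 16.0


# Two packets sent 3000 ticks apart that arrive 3160 ticks apart.
jitter = update_jitter(0.0, prev_arrival=0, prev_rtp_ts=0, arrival=3160, rtp_ts=3000)
```

The 160-tick deviation moves the estimate only a sixteenth of the way, which keeps a single delayed packet from dominating the reported value.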

Step 6: Steady State

Once PLAY is issued, the stream runs continuously. The RTSP control connection stays open and is only used for keepalive messages (typically GET_PARAMETER or OPTIONS requests sent at regular intervals to prevent the session from timing out) or any mid-session control commands. If the client needs to pause the stream, it sends PAUSE over RTSP and the camera stops transmitting RTP packets while keeping the session alive. If it needs to resume, it sends PLAY again.
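How often to send keepalives depends on the session timeout the server announced. The sketch below uses half the timeout as the interval, which is a common rule of thumb rather than anything mandated here; the helper names, URL, and session ID are hypothetical.

```python
def keepalive_interval(timeout_seconds, safety_factor=0.5):
    """Seconds to wait between keepalives, never less than one second."""
    return max(1.0, timeout_seconds * safety_factor)


def build_keepalive(uri, cseq, session_id):
    """Build an empty GET_PARAMETER request that refreshes the session."""
    return (
        f"GET_PARAMETER {uri} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        f"Session: {session_id}\r\n"
        "\r\n"
    ).encode("ascii")


# With a 60-second session timeout, send a keepalive every 30 seconds.
interval = keepalive_interval(60)
request = build_keepalive("rtsp://camera.example/stream1", 7, "12345678")
```

If the camera did not list GET_PARAMETER among its supported methods, an OPTIONS request carrying the same Session header is the usual substitute.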

Step 7: Teardown

When the client is done, it sends a TEARDOWN request. The camera stops transmitting RTP, closes the RTP/RTCP channels, and releases the session. The TCP control connection is closed.

The sequence from OPTIONS to TEARDOWN is what people mean when they refer to RTSP streaming. It’s not one protocol doing everything. It’s RTSP managing the session lifecycle, RTP carrying the media, and RTCP providing the feedback channel, all working in tandem.
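One way to picture the lifecycle is as a small state machine on the client side. The sketch below is a simplified model loosely based on the Init, Ready, and Playing states in RFC 2326; it leaves out recording and treats any out-of-order command as an error, which real servers may handle more leniently.

```python
TRANSITIONS = {
    ("INIT", "SETUP"): "READY",
    ("READY", "PLAY"): "PLAYING",
    ("PLAYING", "PAUSE"): "READY",
    ("READY", "TEARDOWN"): "INIT",
    ("PLAYING", "TEARDOWN"): "INIT",
}

# These methods do not change session state in this simplified model.
STATELESS = {"OPTIONS", "DESCRIBE", "GET_PARAMETER"}


def next_state(state, method):
    """Return the state after issuing `method`, or raise if it is not allowed."""
    if method in STATELESS:
        return state
    try:
        return TRANSITIONS[(state, method)]
    except KeyError:
        raise ValueError(f"{method} is not valid in state {state}") from None


def run(methods, state="INIT"):
    """Apply a sequence of methods and return the final state."""
    for method in methods:
        state = next_state(state, method)
    return state


final = run(["OPTIONS", "DESCRIBE", "SETUP", "PLAY", "PAUSE", "PLAY", "TEARDOWN"])
```

A full session ends back in the initial state, while something like `next_state("INIT", "PLAY")` raises because no transport has been set up yet.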

Streaming with RTP and RTSP Using Wowza

RTP and RTSP aren’t new or flashy, but they are the protocols the vast majority of the world’s IP cameras actually speak. That isn’t changing anytime soon. Instead of replacing them, build smarter infrastructure behind them that can ingest RTSP/RTP feeds reliably at scale, transmux to modern delivery formats, and add intelligence to streams that were originally designed for passive observation.

Wowza Streaming Engine supports the full RTSP/RTP protocol stack. Whether you’re connecting ten cameras or ten thousand, you can manage streams via REST API, deliver to any device via HLS or WebRTC, and integrate AI-driven analytics into your processing pipeline. Contact our team to learn more.

FAQs: RTP vs RTSP

What ports do RTP and RTSP use?

RTSP uses TCP port 554 by default. RTP uses dynamic UDP ports that are negotiated during the SETUP phase of the session.

What is RTP over TCP (interleaved mode)?

RTP over TCP, also called interleaved mode, sends RTP and RTCP packets through the same TCP connection as RTSP. This is useful in firewalled or NAT-restricted environments.

Is RTP secure by default?

No. RTP does not include built-in encryption. Secure implementations use SRTP (Secure RTP) to encrypt media streams.

What is RTSPS?

RTSPS is RTSP over TLS, which encrypts the control channel between client and server to improve security.

Does RTP guarantee packet delivery?

No. RTP typically runs over UDP, which does not guarantee delivery or ordering and does not protect against duplicate packets. Skipping those guarantees helps reduce latency.


About Don Kianian

Don Kianian is a seasoned marketing professional and content strategist with deep expertise in video production technology and media workflows. He has spent more than 10 years building content, fostering awareness, and driving demand for complex technology and media solutions. He holds a Master of Science in Marketing from Santa Clara University and a Professional Certificate in Data Analytics from Google. Prior to Wowza, Don led Marketing efforts for Sherpa Digital Media, which was later acquired by Telestream. As a thought leader in the media production and video streaming space, Don hosted and produced "The Wirecast Show" in 2022-2023, joined as a featured guest in interviews to secure prominent industry analyst coverage, and has helped secure numerous awards at NAB, IBC, and Streaming Media events.