RTP & RTSP: Answering Frequently Asked Questions About Streaming Protocols
In a previous blog, we discussed the differences between RTP and RTSP. Understanding how RTP and RTSP relate to each other is critical if you’re building or managing any kind of IP camera infrastructure, surveillance system, or video streaming workflow that depends on real-time media transport. Getting that relationship right is the foundation for transport decisions (TCP vs. UDP), security (SRTP and RTSPS), scalability, and how your streams ultimately get delivered to viewers through modern formats like HLS or WebRTC.
This article takes a broader look and examines the landscape of common streaming protocols.

The Complete Protocol Stack: RTP, RTCP, RTSP, and SRTP
It’s not just RTP and RTSP. RTCP and SRTP are two other protocols that work with RTP and RTSP. These protocols operate together as a system, each handling a specific responsibility in the media server.
RTCP or Real-Time Control Protocol
This is RTP’s companion protocol, defined in the same RFC (3550). While RTP carries media data, RTCP carries metadata about that media data. That includes Quality of Service (QoS) feedback that the sender and receiver use to monitor stream health. RTCP messages include:
- Sender Reports (the camera’s view: NTP timestamps, packet counts, byte counts)
- Receiver Reports (the client’s view: packet loss fraction, cumulative loss, jitter, and the timing fields used to compute round-trip time)
- BYE messages, sent when a source leaves the session
Without RTCP, a media server ingesting hundreds of camera feeds would have no visibility into which streams are degrading, which cameras are dropping packets, or whether jitter is creeping up on a particular network segment.
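To make the Receiver Report fields concrete, here is a minimal Python sketch that pulls loss and jitter out of a raw RTCP packet. It assumes the RFC 3550 wire layout (an 8-byte header followed by 24-byte report blocks); the names `ReportBlock` and `parse_receiver_report` are made up for this illustration and are not part of any library.

```python
import struct
from dataclasses import dataclass

RTCP_RR = 201  # RTCP packet type for a Receiver Report (RFC 3550)


@dataclass
class ReportBlock:
    ssrc: int             # source this block reports on
    fraction_lost: float  # share of packets lost since the last report, 0.0-1.0
    cumulative_lost: int  # total packets lost (signed 24-bit on the wire)
    highest_seq: int      # extended highest sequence number received
    jitter: int           # interarrival jitter, in RTP timestamp units
    lsr: int              # last Sender Report timestamp
    dlsr: int             # delay since last Sender Report


def parse_receiver_report(packet: bytes) -> list[ReportBlock]:
    """Parse the report blocks of a single RTCP Receiver Report packet."""
    if len(packet) < 8:
        raise ValueError("packet too short for an RTCP header")
    first, packet_type, _length = struct.unpack_from("!BBH", packet, 0)
    if first >> 6 != 2 or packet_type != RTCP_RR:
        raise ValueError("not an RTCP Receiver Report")
    count = first & 0x1F  # number of report blocks in this packet
    blocks = []
    for i in range(count):
        offset = 8 + 24 * i  # skip the header and the reporter's SSRC
        ssrc, lost, highest, jitter, lsr, dlsr = struct.unpack_from(
            "!IIIIII", packet, offset
        )
        cumulative = lost & 0xFFFFFF
        if cumulative & 0x800000:  # sign-extend the 24-bit counter
            cumulative -= 1 << 24
        blocks.append(
            ReportBlock(ssrc, (lost >> 24) / 256, cumulative, highest, jitter, lsr, dlsr)
        )
    return blocks
```

A monitoring loop could call this on each incoming RTCP packet and flag any stream whose `fraction_lost` or `jitter` crosses a threshold you choose.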
SRTP or Secure Real-Time Transport Protocol
SRTP is defined in RFC 3711, and layers encryption, message authentication, and replay protection on top of standard RTP. It uses AES for encryption and HMAC-SHA1 for authentication, securing the media payload without fundamentally changing how RTP operates. The same timestamping, sequencing, and transport mechanisms still apply, and the key negotiation typically happens during the RTSP SETUP phase, depending on the deployment.
Now, when you see RTSPS referenced on a camera spec sheet or in a media server’s documentation, it’s RTSP running over TLS (Transport Layer Security), which encrypts the control channel. Combined with SRTP for media encryption, you get end-to-end protection: the session negotiation, playback commands, and audio/video data are all encrypted. This is increasingly common in ONVIF-compliant cameras and is a supported ingest workflow for Wowza Streaming Engine.
Here’s how the four protocols, plus the RTSPS variant, map together:
| Protocol | What it handles | Layer | Runs over |
| RTSP | Session control: SETUP, PLAY, PAUSE, TEARDOWN | Application | TCP (port 554) |
| RTP | Media transport: audio and video packets | Transport | UDP (default) or TCP interleaved |
| RTCP | Quality feedback: packet loss, jitter, sync | Transport | UDP (port adjacent to RTP) |
| SRTP | Encrypted media transport | Transport | UDP or TCP interleaved |
| RTSPS | Encrypted session control | Application | TLS over TCP (port 322) |
How the RTSP/RTP Stack Compares to RTMP, SRT, and WebRTC
The RTSP/RTP stack isn’t the only option for ingest. RTMP, SRT, and WebRTC each reflect a fundamentally different streaming architecture design philosophy. Understanding where they overlap and diverge is essential for choosing the right protocol for your deployment.
| | RTSP/RTP | RTMP | SRT | WebRTC |
| Architecture | Separated: RTSP controls, RTP transports | Unified: RTMP handles both control and transport | Unified: SRT transport with built-in error correction | Unified: Peer-to-peer media exchange over WebRTC |
| Transport | UDP (default) or TCP | TCP only | UDP with ARQ retransmission | UDP with DTLS encryption |
| Latency | 2-5 seconds | 2-5 seconds | ~1 second | Sub-second |
| Reliability | UDP: fast but lossy; TCP: reliable but with blocking risk | Reliable (TCP handshake), but limited bandwidth recovery | Designed for lossy networks. ARQ recovers packets without TCP overhead | Adaptive bitrate adjusts dynamically |
| Encryption | Optional (SRTP + RTSPS) | Optional (RTMPS over TLS) | AES-128/256 built in | DTLS + SRTP mandatory |
| Browser support | None | None (Flash deprecated) | None (requires media server) | Native in all modern browsers |
| Primary use | IP camera ingest and surveillance | Live ingest to streaming platforms | Broadcast contribution over public internet | Real-time communication, low-latency playback |
| Maintained? | Yes (RFCs and ONVIF) | No (Adobe dropped support Dec 2020) | Yes (open-source SRT Alliance) | Yes (W3C and IETF) |
RTSP/RTP vs. RTMP
The most common comparison, and one that often generates more confusion than clarity, is between RTSP/RTP and Real-Time Messaging Protocol (RTMP). The fundamental difference comes down to the architecture:
- RTMP bundles control and transport into a single protocol over TCP
- RTSP/RTP separates control and transport across distinct layers and can use UDP or TCP
In practice, RTMP dominates live streaming ingest to consumer platforms like YouTube, Twitch, and Facebook because it’s simple to configure and widely supported by software encoders like OBS. RTSP/RTP, by contrast, dominates IP camera and surveillance workflows for three reasons: it’s the protocol cameras actually speak, it’s required for ONVIF compliance, and its support for UDP transport delivers the lower latency that real-time monitoring demands.
So, if you’re streaming a live event to a social platform, RTMP is almost certainly the right ingest protocol. However, RTSP/RTP is the standard if you’re ingesting feeds from a fleet of IP cameras into a media server.
RTSP/RTP vs. SRT
SRT (Secure Reliable Transport) is the most direct modern competitor to RTSP/RTP for first-mile contribution. It was designed specifically for reliable, low-latency transport over unpredictable public networks. SRT runs over UDP but adds ARQ (Automatic Repeat reQuest) retransmission. This recovers lost packets without the head-of-line (HOL) blocking penalty of TCP. Also, SRT has AES encryption built in, eliminating the need to bolt on SRTP separately. For new deployments that involve cameras or encoders transmitting across the public internet, such as remote sites, distributed field operations, and multi-region architectures, SRT is increasingly the better choice.
That being said, SRT doesn’t replace RTSP/RTP in existing camera fleets. The vast majority of installed IP cameras speak RTSP/RTP natively and don’t support SRT. Replacing them isn’t practical or cost-effective at scale. Nor is it necessary with a flexible media server.
RTSP/RTP vs. WebRTC
Web Real-Time Communication (WebRTC) solves a different problem entirely. It’s a peer-to-peer protocol built for real-time, bidirectional communication in web browsers. This is ideal for video calls, telehealth sessions, and interactive broadcasts. WebRTC is the only protocol on this list with native browser support, and its mandatory DTLS/SRTP encryption makes it secure by default. Where WebRTC intersects with RTSP/RTP is on the delivery side. A media server can ingest RTSP/RTP from IP cameras and deliver to operator dashboards via WebRTC, achieving sub-second latency from camera to browser. This RTSP-to-WebRTC workflow has become the standard architecture for real-time surveillance dashboards and traffic operations centers.
These protocols are optimized for different streaming workflows and use cases. In most enterprise or government deployments, it doesn’t come down to just one protocol. By and large, RTSP/RTP gets used for first-mile ingest from cameras. Then, a media server processes and transmuxes so HLS or WebRTC can handle last-mile delivery to viewers. The media server is the connective tissue that makes this multi-protocol architecture work.
RTP and RTSP for IP Cameras: Why They Still Matter
The overwhelming majority of IP cameras deployed in the field today speak RTSP/RTP natively. Not RTMP, not SRT, and not WebRTC. IP camera and surveillance infrastructure is where the RTSP/RTP stack lives. It is the foundational layer that makes modern video intelligence possible. This is driven by three realities.
ONVIF Compliance Depends On RTSP
ONVIF (Open Network Video Interface Forum) is the interoperability standard that allows cameras from different manufacturers (Axis, Bosch, Hanwha, Hikvision, Dahua, and dozens of others) to work with third-party video management software, network video recorders, and media servers through a common interface.
ONVIF’s media streaming profiles rely on RTSP for session control and RTP for media transport. If a camera is ONVIF-compliant, it supports RTSP/RTP. This means any system designed to integrate cameras from mixed vendors at scale is built on RTSP/RTP by default. This is a typical scenario in state DOT deployments, municipal surveillance programs, and enterprise security operations.
Rip-And-Replace Isn’t An Option
Government agencies, transportation departments, and enterprise security teams have thousands (sometimes tens of thousands) of IP cameras already deployed. Many of these IP cameras have been in service for years, and will continue operating for years to come. These cameras produce RTSP/RTP streams. Replacing them to adopt a newer protocol isn’t a realistic option when the hardware is functioning and budgets are constrained.
The practical strategy is to modernize what sits behind the cameras. The media server infrastructure that ingests, processes, and delivers those feeds can add modern capabilities without touching the cameras themselves.
Reliable Ingest Powers Downstream ROI
The real value is in what happens after ingest. This is the shift that’s driving the RTSP revival. Legacy RTSP/RTP camera feeds that were originally designed for simple human observation are now being routed through media servers capable of much more than just transmuxing to HLS.
Modern architectures add intelligent processing at the ingest layer. This Video Intelligence Framework powers computer vision models to run object detection, vehicle classification, or automated counting directly on the ingested stream. The camera doesn’t need to be smart. The infrastructure behind it does. And operators can make smarter decisions as a result.
The Modern RTSP/RTP Architecture
In a contemporary deployment, the architecture typically follows a layered pattern:
At the edge, IP cameras stream via RTSP/RTP to either a centralized origin media server, or a distributed set of regional ingest nodes close to the camera clusters. The media server handles session management (RTSP), receives the media (RTP), and monitors quality (RTCP). TCP interleaved mode ensures reliable connectivity for cameras behind firewalls or on constrained networks. UDP provides the lowest latency path for cameras on stable LANs.
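For readers curious what TCP interleaved mode looks like on the wire, here is a minimal Python sketch of the framing described in RFC 2326: each RTP or RTCP packet is prefixed with a `$` byte, a one-byte channel ID, and a two-byte big-endian length. The function name `split_interleaved` is hypothetical, and the sketch assumes the buffer contains only interleaved frames (a real client also has to handle RTSP text responses arriving on the same connection).

```python
import struct

FRAME_MARKER = 0x24  # ASCII '$', which starts every interleaved frame


def split_interleaved(buffer: bytes) -> tuple[list[tuple[int, bytes]], bytes]:
    """Split a TCP byte buffer into (channel, payload) frames.

    Returns the complete frames plus any trailing bytes that belong to a
    frame that has not fully arrived yet.
    """
    frames = []
    pos = 0
    while len(buffer) - pos >= 4:
        if buffer[pos] != FRAME_MARKER:
            raise ValueError("expected '$' at the start of a frame")
        channel, length = struct.unpack_from("!BH", buffer, pos + 1)
        end = pos + 4 + length
        if end > len(buffer):
            break  # incomplete frame: wait for more data from the socket
        frames.append((channel, buffer[pos + 4:end]))
        pos = end
    return frames, buffer[pos:]
```

By convention the even channel (often 0) carries RTP and the next odd channel carries RTCP, mirroring the even/odd port pairing used over UDP. The actual channel numbers are whatever the SETUP exchange agreed on.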
At the processing layer, the media server transmuxes the incoming RTSP/RTP feeds into delivery-ready formats. HLS provides broad device compatibility and CDN distribution. WebRTC provides sub-second delivery for traffic operations centers, security dashboards, or field responders. DASH provides adaptive bitrate delivery where variable bandwidth is a concern. This is also where AI inference can be applied to detect objects, trigger alerts based on predefined rules, or inject event metadata (like ID3 tags) for downstream systems to consume.
At the delivery layer, the processed streams reach end users in whatever format their device supports. A DOT operator monitoring a freeway corridor sees a WebRTC feed nearly instantly. A public-facing traffic camera page serves HLS through a CDN to thousands of concurrent viewers. An evidence management system records a transmuxed stream to storage with a complete chain of custody.
RTP and RTSP: Frequently Asked Questions
Is RTP the same as RTSP?
No. They handle different layers of the streaming workflow. RTP is a transport protocol that carries audio and video data packets between endpoints. RTSP is an application-layer control protocol that manages the streaming session: setup, playback, pause, and teardown. They work together in most deployments, but they aren’t interchangeable.
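To show what “carrying data packets” means in practice, here is a small Python sketch that decodes the 12-byte fixed RTP header defined in RFC 3550. The `RtpHeader` type and `parse_rtp_header` function are illustrative names, not a library API, and the sketch ignores the padding bit for brevity.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool         # often flags the last packet of a video frame
    payload_type: int    # codec identifier, e.g. a dynamic type such as 96
    sequence: int        # increments by one per packet; gaps reveal loss
    timestamp: int       # media clock, used for playout and sync
    ssrc: int            # identifies the stream source
    payload_offset: int  # index where the media payload begins


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and locate the start of the payload."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, sequence, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    version = b0 >> 6
    if version != 2:
        raise ValueError("unsupported RTP version")
    offset = 12 + 4 * (b0 & 0x0F)  # skip any CSRC identifiers
    if b0 & 0x10:  # extension bit set: skip the header extension too
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    return RtpHeader(
        version, bool(b1 & 0x80), b1 & 0x7F, sequence, timestamp, ssrc, offset
    )
```

Notice there is nothing here about starting, pausing, or stopping a stream. Those verbs belong to RTSP.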
Does RTSP use RTP?
Yes. RTSP relies on RTP to deliver the actual media content. When a client sends a PLAY command via RTSP, the server begins transmitting audio and video using RTP. RTSP handles the control plane; RTP handles the data plane.
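The control-plane side of that exchange can be sketched in a few lines of Python. This example only builds the text of a typical RTSP 1.0 request sequence; it does not open a socket or read responses. The helper names, the `trackID=0` suffix, and the idea of passing in a session ID up front are assumptions for illustration (in a real session the track URL comes from the SDP returned by DESCRIBE and the session ID from the SETUP response).

```python
from typing import Optional


def build_request(method: str, url: str, cseq: int,
                  headers: Optional[dict] = None) -> str:
    """Format a single RTSP/1.0 request as text."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the request


def session_requests(url: str, session_id: str, client_rtp_port: int) -> list[str]:
    """Return the requests a client typically sends, in order."""
    track = f"{url}/trackID=0"  # hypothetical track URL for this sketch
    rtcp_port = client_rtp_port + 1
    transport = f"RTP/AVP;unicast;client_port={client_rtp_port}-{rtcp_port}"
    return [
        build_request("OPTIONS", url, 1),
        build_request("DESCRIBE", url, 2, {"Accept": "application/sdp"}),
        build_request("SETUP", track, 3, {"Transport": transport}),
        build_request("PLAY", url, 4, {"Session": session_id, "Range": "npt=0.000-"}),
        build_request("TEARDOWN", url, 5, {"Session": session_id}),
    ]
```

Every one of these messages travels over the RTSP connection. The media itself only starts flowing, over RTP, once the server accepts PLAY.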
Can RTP work without RTSP?
It can. RTP supports a push mode where a sender transmits media directly to a known receiver address without any session negotiation. This is common in some multicast and broadcast scenarios.
What port does RTSP use?
RTSP uses TCP port 554 by default. The encrypted variant, RTSPS, uses port 322. These are IANA-registered port assignments.
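As a quick illustration, the Python sketch below resolves the host and port a client would connect to for a given URL, falling back to those defaults when the URL doesn’t name a port. The function name `rtsp_endpoint` and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"rtsp": 554, "rtsps": 322}  # defaults cited above


def rtsp_endpoint(url: str) -> tuple[str, int, bool]:
    """Return (host, port, use_tls) for an rtsp:// or rtsps:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an RTSP URL: {url!r}")
    if parts.hostname is None:
        raise ValueError("URL has no host")
    # An explicit port in the URL overrides the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port, parts.scheme == "rtsps"
```

Many cameras and servers are configured with non-default ports such as 8554, which is why the explicit port takes precedence.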
What port does RTP use?
RTP doesn’t have a fixed port. Instead, port numbers are dynamically negotiated during the RTSP SETUP phase. By convention, RTP uses an even-numbered UDP port and its companion RTCP channel uses the next odd-numbered port (e.g., RTP on 6970, RTCP on 6971). In TCP interleaved mode, RTP data flows over the existing RTSP TCP connection on port 554 instead of using separate ports.
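Those negotiated values travel in the RTSP `Transport` header. Here is a minimal Python sketch that extracts them, assuming the RFC 2326 header syntax. The function name `parse_transport` is hypothetical, and treating a single port as “RTP on N, RTCP on N+1” is a convention this sketch assumes rather than a guarantee.

```python
def parse_transport(value: str) -> dict:
    """Extract port or channel pairs from an RTSP Transport header value."""
    spec, *params = [part.strip() for part in value.split(";")]
    result = {
        "protocol": spec,
        # "RTP/AVP/TCP" signals interleaved delivery over the RTSP connection
        "tcp_interleaved": spec.upper().endswith("/TCP"),
    }
    for param in params:
        name, sep, raw = param.partition("=")
        if sep and name in ("client_port", "server_port", "interleaved"):
            first, _, second = raw.partition("-")
            low = int(first)
            # Assume the companion RTCP port/channel is the next number up.
            high = int(second) if second else low + 1
            result[name] = (low, high)
    return result
```

For a UDP session, `client_port` and `server_port` give the RTP/RTCP pairs on each side. For an interleaved session, the `interleaved` pair gives channel IDs instead, and no extra ports are opened.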
Which is faster, RTP or RTSP?
The two protocols do different things. RTP is what carries the media, and it can achieve sub-300ms latency over UDP in optimal conditions. RTSP is the control protocol and adds negligible latency to the workflow. Its contribution is a few round-trip exchanges during session setup, not during ongoing media delivery. The combined RTSP/RTP stack typically delivers under two seconds of end-to-end latency on a local network.
Is RTSP still used in 2026?
Very much so. RTSP remains the dominant ingest protocol for IP cameras and is required for ONVIF compliance. While it’s rarely used for last-mile delivery to browsers (that role belongs to HLS and WebRTC), its position as the standard for first-mile camera contribution is secure for the massive installed base of RTSP-speaking cameras in government, transportation, and enterprise security deployments.
Which is more secure, RTP or RTSP?
Neither provides encryption by default. Securing a deployment requires adding SRTP (which encrypts the RTP media stream) and RTSPS (which encrypts the RTSP control channel with TLS). For environments subject to compliance requirements like CJIS or HIPAA, both should be used together to achieve end-to-end encryption from camera to server.
What is the difference between RTSP and RTSPS?
RTSPS is RTSP wrapped in TLS encryption. Standard RTSP sends control commands (SETUP, PLAY, TEARDOWN, etc.) in plaintext over TCP port 554. RTSPS encrypts that control channel, defaulting to port 322. Note that RTSPS only secures the control channel; the media stream itself still requires SRTP for encryption.
Answering The “RTP vs RTSP” Question
The critical point is that RTSP/RTP was designed for ingest and first-mile contribution. RTP and RTSP excel at getting the video from the camera to the server reliably and with minimal latency. Everything after that is the media server’s responsibility. That division of labor makes a video infrastructure stack durable. The cameras don’t need firmware updates to support new delivery formats or AI capabilities. Video Intelligence lives in the infrastructure behind the cameras.
Wowza Streaming Engine supports this full workflow natively: RTSP/RTP ingest (including RTSPS and SRTP for encrypted feeds), transmuxing to HLS, WebRTC, and DASH, REST API-driven stream management for automated camera provisioning, and custom module support for integrating AI inference and metadata injection into the processing pipeline. For teams managing large camera fleets, this means a single server platform that bridges the gap between legacy RTSP/RTP cameras and modern requirements without a rip-and-replace of existing hardware. Contact our team to learn more.
