WebRTC Signaling Servers: Everything You Need to Know
July 15, 2020
You may have heard somewhere that WebRTC is a peer-to-peer protocol. What might be less apparent at first glance is that this doesn’t mean you don’t need servers to get WebRTC to work. The most basic servers you need are WebRTC signaling servers. These servers make sure that your device can get connected to other devices you want to communicate with.
I am using the term ‘device’ and not ‘user’ for two main reasons:
- You might not be connecting to a person on the other end, but rather to a machine or a video stream (a surveillance camera source or a video recorder sink).
- Users may have multiple devices. And while you want to connect to a user, WebRTC understands devices.
Before we begin though, you may want to familiarize yourself with the terminology by reading this if you are new to WebRTC.
How Does WebRTC Signaling Work?
Understanding what a WebRTC signaling server does probably starts with this diagram:
I use this illustration in my courses to explain how calls get connected in WebRTC.
The server at the top is how the two users find each other. Both are somehow connected to that server, which is left out of the scope of WebRTC. It can be two people registered to a social network, a doctor and a patient logging into a scheduled visitation, a person browsing a website and trying to “call” the site’s owner, etc. The options here are endless.
The exchange illustrated in lines 1-4 is the offer-answer mechanism that is part of WebRTC. These messages aren’t WebRTC messages, but rather proprietary ones that contain SDP. What happens here is that WebRTC creates SDP blobs. These are message fragments that the application needs to signal to the other device to connect a session. It does that by using its own signaling protocol and a WebRTC signaling server.
That 5th line in the diagram denotes the actual media, which is sent directly between the devices. To get there, the devices have to first communicate via the signaling server.
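To make the server's role concrete, here is a minimal sketch of the relay logic at the heart of a signaling server. The `Envelope` shape, the field names, and the `SignalingRelay` class are all illustrative, not part of any standard; the transport (WebSocket or otherwise) is abstracted away as a delivery callback per device.

```typescript
// Minimal sketch of a signaling relay. The server forwards messages between
// devices and never inspects the SDP payload itself.
type Envelope = {
  from: string;            // sender's device id
  to: string;              // recipient's device id
  kind: "offer" | "answer" | "ice-candidate" | "bye";
  payload: string;         // opaque SDP blob or serialized ICE candidate
};

class SignalingRelay {
  private devices = new Map<string, (msg: Envelope) => void>();

  // A device announces itself and provides a way to receive messages.
  register(id: string, deliver: (msg: Envelope) => void): void {
    this.devices.set(id, deliver);
  }

  unregister(id: string): void {
    this.devices.delete(id);
  }

  // Forward a message to its addressee; returns false if the recipient
  // is unknown or offline.
  route(msg: Envelope): boolean {
    const deliver = this.devices.get(msg.to);
    if (!deliver) return false;
    deliver(msg);
    return true;
  }
}
```

Note that the payload stays opaque to the server: the application decides what "offer" and "answer" mean, which is exactly why signaling servers don't do anything WebRTC-specific.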
What Is a Signaling Server Then?
A WebRTC signaling server is a server that manages the connections between devices. It doesn’t deal with the media traffic itself, but rather takes care of… signaling. This includes enabling one user to find another in the network, negotiating the connection itself, resetting the connection if needed, and closing it down.
The interesting thing is that WebRTC signaling servers don’t do anything that is WebRTC-specific. They just assist in passing the messages around by the logic dictated by your application.
For signaling, you can pick standard protocols such as SIP over WebSocket, XMPP, or even MQTT. What most developers do, though, is use proprietary signaling protocols that fit their own application. Why? Because they need a communication mechanism between users anyway, and that is broader in scope than what WebRTC or pure voice or video communication requires. Think of a dating application: the application already has some kind of “signaling” to connect people, so adding video calling over the same signaling protocol makes more sense than adding yet another alternative.
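As an illustration of this "reuse the app's own protocol" approach, here is a hypothetical message union for such an application, where WebRTC signaling is just two more message kinds riding alongside the messages the app already exchanges. All type and field names here are made up for illustration.

```typescript
// The app's existing messages...
type ChatMessage = { kind: "chat"; from: string; text: string };
type MatchNotification = { kind: "match"; userA: string; userB: string };

// ...and the WebRTC signaling riding on the same protocol.
type SdpMessage = { kind: "sdp"; from: string; sdp: string };
type IceMessage = { kind: "ice"; from: string; candidate: string };

type AppMessage = ChatMessage | MatchNotification | SdpMessage | IceMessage;

// One dispatcher handles everything; in a real client the "sdp" and "ice"
// cases would hand their payload to the local RTCPeerConnection.
function describeMessage(msg: AppMessage): string {
  switch (msg.kind) {
    case "chat":  return `chat from ${msg.from}`;
    case "match": return `matched ${msg.userA} and ${msg.userB}`;
    case "sdp":   return `SDP (offer/answer) from ${msg.from}`;
    case "ice":   return `ICE candidate from ${msg.from}`;
  }
}
```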
NAT Traversal and Exchanging ICE Candidates
Part of what needs to be done involves getting through firewalls and NAT devices. This is done using a protocol called ICE, which collects, exchanges, and then attempts to connect a session using ICE candidates. ICE candidates are potential addresses through which the devices might reach each other: either 1) directly, using a private IP address or a public IP address obtained via a STUN server, or 2) indirectly, through TURN servers. ICE then pairs up local and remote candidates and runs connectivity checks on each pair until a working one is found.
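The pairing step ICE performs on the exchanged candidates can be sketched as follows. The types and the priority scheme here are heavily simplified relative to the actual ICE specification (RFC 8445), and `pairCandidates` / `typePreference` are illustrative names only.

```typescript
// A candidate is a potential transport address: a host (local) address,
// a server-reflexive (public, via STUN) address, or a relayed (TURN) one.
type Candidate = { type: "host" | "srflx" | "relay"; address: string; port: number };

// Rough preference per candidate type: direct paths are tried before relayed.
const typePreference = { host: 126, srflx: 100, relay: 0 };

// Form every local/remote combination, ordered so that the most direct
// pairs are checked first, roughly as ICE's connectivity checks do.
function pairCandidates(local: Candidate[], remote: Candidate[]): [Candidate, Candidate][] {
  const pairs: [Candidate, Candidate][] = [];
  for (const l of local) for (const r of remote) pairs.push([l, r]);
  return pairs.sort(
    (a, b) =>
      typePreference[b[0].type] + typePreference[b[1].type] -
      (typePreference[a[0].type] + typePreference[a[1].type])
  );
}
```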
The diagram above shows the signaling messages that hold the ICE candidates that are being exchanged. These can hold different types of IP addresses that are then used for connectivity checks.
As with the initial offer-answer SDP messages, these ICE candidates are also sent with the help of the WebRTC signaling server by wrapping them in signaling messages.
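A sketch of that wrapping, assuming the same kind of JSON envelope as the rest of the application's signaling (field names are illustrative). The payload mirrors the fields a browser's `RTCIceCandidate` exposes, which the remote side needs in order to reconstruct the candidate.

```typescript
// The pieces of an ICE candidate the remote side needs.
type IceCandidatePayload = {
  candidate: string;        // e.g. "candidate:1 1 UDP 2122252543 192.0.2.1 54321 typ host"
  sdpMid: string | null;
  sdpMLineIndex: number | null;
};

// Wrap a candidate in the application's own signaling message...
function wrapCandidate(from: string, to: string, c: IceCandidatePayload): string {
  return JSON.stringify({ kind: "ice", from, to, payload: c });
}

// ...and unwrap it on the receiving side, where a real client would pass
// the payload to RTCPeerConnection.addIceCandidate().
function unwrapCandidate(raw: string): IceCandidatePayload {
  const msg = JSON.parse(raw);
  if (msg.kind !== "ice") throw new Error(`expected an ice message, got ${msg.kind}`);
  return msg.payload as IceCandidatePayload;
}
```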
In most resources, the tendency is to focus on the STUN and TURN servers used for the ICE negotiation, and less on the fact that the negotiation itself is facilitated by the signaling server.
Other WebRTC Servers
By now, you’ve seen two types of servers:
- WebRTC signaling servers
- STUN and TURN servers, used for ICE negotiation
But to get WebRTC to work, you’ll often need four types of WebRTC servers. The other two types are application servers (the usual web servers used to develop applications) and media servers.
Media servers aren’t necessary in every WebRTC deployment, but they are quite common in many such deployments. They play a crucial role in group sessions as well as one-to-many broadcasts.
In our context here, let’s see where media servers fit in these broadcast scenarios.
Why Media Servers Are Required for One-to-Many WebRTC Broadcasts
When streaming WebRTC media to a large group of devices, you need to use a media server.
Let’s assume that you want to generate a 1 Mbps video stream using WebRTC and broadcast it live to 100 viewers. Without a media server, the broadcasting device would need a 100 Mbps uplink connection, something that is both uncommon and wasteful of resources. Add to that the challenge of having a single device sustain such a load across a large number of open connections, and you can see why this is not a feasible alternative.
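The arithmetic behind that claim is straightforward; a tiny helper makes it explicit (the function name is just for illustration):

```typescript
// Uplink bandwidth the broadcaster needs without a media server: every
// viewer receives its own copy of the stream directly from the broadcaster.
function requiredUplinkMbps(streamMbps: number, viewers: number): number {
  return streamMbps * viewers;
}

const direct = requiredUplinkMbps(1, 100);    // broadcasting directly: 100 Mbps uplink
const withServer = requiredUplinkMbps(1, 1);  // via a media server: a single 1 Mbps upload
```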
The solution in such cases is to make use of streaming media servers. The broadcaster’s device sends its media stream towards the media server, which in turn takes care of streaming that content to its viewers. A nice thing about this approach is that the media server can also take care of transcoding and even repackage the WebRTC stream into other protocols. In these cases, the media server can also act as the signaling server.