WebRTC Security: CORS, DTLS, SRTP, OMG!


The web isn’t as simple as it used to be. The days of coding a simple HTML page in Notepad (or BBEdit) and uploading it via FTP to your web server are over. Even JavaScript isn’t the simple language it started as (who’s using the once ubiquitous jQuery these days?). The modern JavaScript development process introduces a slurry of new requirements, ranging from pulling code libraries from npm to transpiling from ES6 so older browsers can interpret your code. Adding to the chaos, the myriad of JS frameworks available can baffle and overwhelm even seasoned developers. Not only is the development process more complex, so too are the deployment and security restrictions enforced by modern browsers. Cross-origin resource sharing (CORS) and the need for SSL everywhere are just part of the modern web.

On top of all that, when you add WebRTC security into the mix, your head just might explode. WebRTC is remarkably secure, allowing for fully encrypted streams, but with this comes complexity. To add to that complexity, not all browsers implement the same security strategy for WebRTC. Let’s delve into what is actually going on with WebRTC and security and how that relates to setting up Red5 Pro for live one-to-many streaming at scale.

CORS

CORS, or cross-origin resource sharing, is the W3C standard that governs when a page served from one origin may access resources on another. It provides a mechanism by which a browser and server can interact to determine whether or not it is safe to allow the cross-origin request. This has implications for WebRTC as well. In order for a stream to connect to another peer, you start by negotiating that connection via a signaling protocol. The WebRTC spec doesn’t specify how you send these signaling messages, so you can choose to do it over HTTP or with WebSockets. Either way, if the server you connect to for signaling isn’t the same as the web server your page is served from, you will need to deal with cross-origin rules: CORS headers for HTTP requests, and for WebSockets the Origin header that the browser sends with the handshake, which the server can check before accepting the connection. The way we implement signaling for Red5 Pro is with WebSockets.
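
To make the cross-origin decision concrete, here is a minimal sketch of the check a signaling server might run against the Origin header the browser sends. It is not Red5 Pro’s actual implementation; the allowlist, the example.com origins, and the function names are all made up for illustration.

```typescript
// Hypothetical allowlist: origins whose pages may talk to this signaling server.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set([
  "https://www.example.com",
  "https://broadcast.example.com",
]);

// Origins are compared as exact strings (scheme + host + optional port).
// A WebSocket server could call this with the handshake's Origin header.
function isOriginAllowed(
  origin: string | undefined,
  allowed: ReadonlySet<string> = ALLOWED_ORIGINS
): boolean {
  return origin !== undefined && allowed.has(origin);
}

// For HTTP-based signaling: the CORS response headers to send back.
// An empty object means "send no CORS headers", so the browser will
// refuse to expose the response to the calling page.
function corsHeadersFor(
  origin: string | undefined,
  allowed: ReadonlySet<string> = ALLOWED_ORIGINS
): Record<string, string> {
  if (origin === undefined || !allowed.has(origin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": origin,
    // Tells caches that the response differs per requesting origin.
    Vary: "Origin",
  };
}
```

Note that the scheme is part of the origin, so http://www.example.com and https://www.example.com are treated as two different origins.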

HTTPS and Secure WebSockets (WSS)

As mentioned above, as long as the server returns the right CORS headers to the page that hosts your broadcaster, you will be all set, right? Not so fast, chief. Chrome introduced a new requirement about a year ago that only allows access to getUserMedia if the page is served from a secure origin. This means if you want someone to be able to stream their camera and microphone, then you need to serve your page via HTTPS. You are probably already doing this if you have a production website, but in case you don’t know how this is done, let’s quickly cover it.

In order for a site to deliver its content over HTTPS you need to 1) use a domain name to access the site, and 2) have a certificate from a trusted certificate authority installed on your web server. This way the browser can validate the domain against the cert issued by an authority it trusts, which then allows a key exchange with your web server to set up SSL encryption. Your page is then delivered encrypted, rather than as plain HTML/JavaScript text that anyone along the network path could intercept.
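
If you want your page to fail gracefully instead of silently, you can check for a secure context before asking for the camera. The sketch below assumes the standard browser properties window.isSecureContext and navigator.mediaDevices.getUserMedia; the CaptureEnv type and the captureBlockReason function are hypothetical names, declared so the snippet also compiles outside a browser.

```typescript
// Minimal structural type for the parts of `window` this check reads.
// In a browser you would pass `window` itself.
interface CaptureEnv {
  isSecureContext: boolean;
  navigator: { mediaDevices?: { getUserMedia?: unknown } };
}

// Returns a human-readable reason why camera/mic capture cannot work,
// or null if nothing obvious is blocking it.
function captureBlockReason(env: CaptureEnv): string | null {
  if (!env.isSecureContext) {
    return "Page is not a secure context; serve it over HTTPS.";
  }
  if (typeof env.navigator.mediaDevices?.getUserMedia !== "function") {
    return "navigator.mediaDevices.getUserMedia is not available in this browser.";
  }
  return null;
}
```

In a page, captureBlockReason(window) returning null means it is reasonable to go ahead and call getUserMedia; the user can of course still deny the permission prompt.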

So back to our WebRTC use case. Since the HTML page has to be delivered via HTTPS to the browser in order for the user to access a camera and microphone on Chrome, this also means that any server you communicate with from that page needs to be secure too. WebSockets, while not HTTP, fall under these same criteria. The way SSL is done over WebSockets is through WSS. That last S, as I’m sure you’ve already gathered, stands for Secure. The same kind of cert and domain that you set up for HTTP traffic can be used in exactly the same way for WebSocket communication. Since Red5 Pro uses WebSockets for signaling, this means that the WebSocket server also needs a cert installed and a domain associated with it. The WebSocket server, in this case, is Red5 Pro. While installing a cert isn’t the easiest thing to do, luckily we have really easy to follow documentation on how to do it with some of the most popular cert providers.
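
As a small illustration of the rule that a secure page needs a secure socket, here is a sketch that picks the WebSocket scheme to match the page. The host, port, and path in the usage note are placeholders, not Red5 Pro’s real endpoint, and signalingUrl is a hypothetical helper.

```typescript
// Picks the WebSocket scheme that matches the page's protocol.
// `pageProtocol` is the value of window.location.protocol, e.g. "https:".
function signalingUrl(
  pageProtocol: string,
  host: string,
  port?: number,
  path: string = "/"
): string {
  // Browsers block ws:// connections from pages loaded over https://.
  const scheme = pageProtocol === "https:" ? "wss" : "ws";
  const authority = port === undefined ? host : `${host}:${port}`;
  const normalizedPath = path.charAt(0) === "/" ? path : `/${path}`;
  return `${scheme}://${authority}${normalizedPath}`;
}
```

For example, signalingUrl("https:", "stream.example.com", 443, "signal") yields "wss://stream.example.com:443/signal". In a browser you would pass window.location.protocol as the first argument and hand the result to new WebSocket(...).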

Once you have a cert installed on your web server and another one on the Red5 Pro server, the browser allows access to the camera and mic, and the signaling to Red5 Pro is allowed over secure WebSockets. Keep in mind that the next couple of sections are a very low-level description of what’s going on, and if you want, you can simply skip reading them. Red5 Pro’s HTML5 Streaming SDK takes care of all the signaling and setting up the stream with the server for you. However, if you want a better understanding of what’s going on under the covers, please read on.

You are almost there! Next up, let’s look at what happens with signaling and the streams themselves. But first, a diagram:

[Diagram: Layout]

Signaling in More Detail

Now that you have a secure connection to the Red5 Pro server from your web page, the two sides begin the negotiation over the WebSocket. This is called signaling. In the simplest terms, signaling allows the browser and the server to set up a connection to each other so they can send and receive video/audio. Since WebRTC is a peer-to-peer (P2P) protocol by design, when you make a connection to a Red5 Pro server, Red5 acts as one of the peers in the topology. This approach allows the Red5 Pro server to become a peer client communicating with the browser; it pulls the browser’s video and audio and relays them to the rest of the Red5 streaming pipeline. On the flip side, a WebRTC subscriber client wanting to watch a stream also makes a P2P connection to Red5 Pro in the same manner, and once the connection is negotiated, Red5 pushes the video and audio down to the browser for viewing.

During the signaling phase, the browser and server exchange data back and forth in an attempt to set up the connection used to push and receive the streaming audio and video. The signaling data being exchanged comes in two types:

  • SDP – session control messages that cover media capabilities
  • ICE candidates – network configuration details used for making the P2P connection

SDP Exchange

Let’s start with SDPs. Session Description Protocol, or SDP as it’s known, is a format for describing the capabilities of a media-capable device. In our case, that’s the Red5 Pro server and the browser. Since the subject of WebRTC signaling and SDP exchange has been covered numerous times before, and the focus of this article is on security, I’m going to oversimplify what’s going on here. Basically, the browser sends the server a list of its capabilities, like which codecs it can use, the resolutions it can produce, and a lot of other detailed information needed to set up the stream. The server then responds with what it can handle. In our case, we encourage the client to broadcast with H.264 to minimize transcoding across multiple platforms and services. Once the two sides agree on how they can communicate, the process moves to the ICE candidates phase.
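
To give a feel for what “a list of its capabilities” looks like on the wire, here is a rough sketch that pulls the codec list out of an SDP blob by reading its a=rtpmap lines. The sample SDP is a trimmed, hand-written fragment rather than a complete offer, and listCodecs and offersCodec are hypothetical helper names.

```typescript
interface CodecInfo {
  payloadType: number;
  name: string;
  clockRate: number;
}

// Extracts codecs from lines of the form
// "a=rtpmap:<payload type> <name>/<clock rate>[/<channels>]".
function listCodecs(sdp: string): CodecInfo[] {
  const codecs: CodecInfo[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:(\d+) ([^/\s]+)\/(\d+)/.exec(line);
    if (match) {
      codecs.push({
        payloadType: Number(match[1]),
        name: String(match[2]),
        clockRate: Number(match[3]),
      });
    }
  }
  return codecs;
}

// Case-insensitive check for a codec name such as "H264".
function offersCodec(sdp: string, name: string): boolean {
  const wanted = name.toUpperCase();
  return listCodecs(sdp).some((c) => c.name.toUpperCase() === wanted);
}

// Hand-written fragment for illustration only (not a complete offer).
const sampleSdp = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");
```

With the fragment above, offersCodec(sampleSdp, "h264") is true, which is the kind of check you might run before encouraging a client to broadcast with H.264.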

ICE Candidates

Another aspect of establishing the P2P connection with the server is the exchange of what’s known as ICE candidates. ICE (Interactive Connectivity Establishment) is a protocol used to establish connections between devices across the internet. The information in an ICE candidate includes whether to use UDP or TCP for transmission, the IP address and port at which the client can be reached, and other details for making a direct connection to the peer.
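
Here is a sketch of what is inside one of those candidates. It parses the leading fields of a candidate line (foundation, component, transport, priority, address, port, and type) and ignores any trailing extensions. The sample line uses documentation-range addresses, and parseCandidate is a hypothetical helper, not part of any SDK.

```typescript
interface IceCandidateInfo {
  foundation: string;
  component: number; // 1 = RTP, 2 = RTCP
  transport: string; // "udp" or "tcp"
  priority: number;
  address: string;
  port: number;
  type: string; // host | srflx | prflx | relay
}

// Parses "candidate:<foundation> <component> <transport> <priority>
// <address> <port> typ <type> ..." and returns null if the line does not match.
function parseCandidate(line: string): IceCandidateInfo | null {
  const match =
    /^(?:a=)?candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)/.exec(
      line.trim()
    );
  if (!match) {
    return null;
  }
  return {
    foundation: String(match[1]),
    component: Number(match[2]),
    transport: String(match[3]).toLowerCase(),
    priority: Number(match[4]),
    address: String(match[5]),
    port: Number(match[6]),
    type: String(match[7]),
  };
}

// Illustrative candidate: a server-reflexive (srflx) address learned via STUN.
const sampleCandidate =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 10.0.0.5 rport 46154";
```

A type of host means a local interface address, srflx means a public address discovered through STUN, and relay means the traffic would go through a TURN server.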

ICE also relies on two companion protocols known as STUN (Session Traversal Utilities for NAT) and TURN (Traversal Using Relays around NAT). STUN lets a client behind a firewall/NAT discover its public address so that a direct connection can be attempted, and TURN is used if a direct P2P connection can’t be made that way. TURN in this case routes the traffic through a middleman server, AKA a TURN server. With Red5 Pro, our media server is publicly reachable rather than hidden behind a NAT (like most servers on the internet), so the need to route through a TURN server is typically non-existent. However, you definitely will need to use a STUN server, as many of the world’s computers/devices sit behind firewalls. Luckily for you, our HTML5 examples that use our streaming SDK default to public STUN servers hosted by Google and Mozilla to get you up and running quickly. That said, when you go to deploy your application to the public, you will want to set up and host your own STUN/TURN server. Curious about how to do that? We’ve got you covered.
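
For reference, this is roughly what the STUN/TURN settings look like when handed to the browser. The sketch mirrors the shape of the standard RTCConfiguration dictionary accepted by new RTCPeerConnection(...), but declares its own types so it compiles without DOM typings. The Google STUN address is an assumption about a commonly used public server, and every example.com host and credential below is a placeholder.

```typescript
// Local stand-ins for the browser's RTCIceServer / RTCConfiguration shapes.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}
interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical settings for a TURN server you host yourself.
interface TurnSettings {
  url: string; // e.g. "turn:turn.example.com:3478"
  username: string;
  credential: string;
}

// Builds a config with the given STUN servers and, optionally, one TURN relay.
function buildPeerConfig(stunUrls: string[], turn?: TurnSettings): PeerConfig {
  const iceServers: IceServer[] = [];
  if (stunUrls.length > 0) {
    iceServers.push({ urls: stunUrls });
  }
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

// Development: a public STUN server only (assumed address; check your SDK's defaults).
const devConfig = buildPeerConfig(["stun:stun.l.google.com:19302"]);

// Production: your own STUN/TURN host (placeholder names and credentials).
const prodConfig = buildPeerConfig(["stun:stun.example.com:3478"], {
  url: "turn:turn.example.com:3478",
  username: "demo-user",
  credential: "demo-secret",
});
```

In a browser, either object can be passed straight to new RTCPeerConnection(...). If you are using an SDK, it will most likely accept the same information through its own configuration options instead.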

DTLS and SRTP

Now that the Red5 Pro server and the browser client know how to connect to each other, the next step is to establish a secure connection using the info in the ICE candidates. One of the great things about WebRTC, as mentioned earlier, is that the spec forces all traffic to be encrypted.

Encrypting the video and audio channels, though, requires a step to get going. This is where DTLS (Datagram Transport Layer Security) comes in. DTLS (for those hardcore security geeks out there) is a variant of TLS modified to work over datagram transports such as UDP. At this step, the two peers run a DTLS handshake to agree on the first keys used to encrypt and decrypt the stream. Then the browser is able to start streaming the video and audio over SRTP.

SRTP (Secure Real-time Transport Protocol) is the transport protocol that WebRTC uses to send and receive encrypted video and audio. DTLS supplies the keying material that SRTP uses for the encryption. SRTP derives its working keys from that material, and if the keys need to be refreshed during a session, a new DTLS handshake provides fresh ones. The two protocols work closely in tandem to keep the stream secure throughout the session, and because of this, a lot of folks just lump them together as DTLS/SRTP.
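
You can actually see DTLS/SRTP show up in the SDP from the signaling phase: each media line names a secure transport profile, and an a=fingerprint line carries the hash of the certificate used in the DTLS handshake. The sketch below summarizes those two things for a given SDP. The sample fragments are hand-written, the fingerprint value is a shortened placeholder rather than a real digest, and summarizeSdpSecurity is a hypothetical helper.

```typescript
interface SecuritySummary {
  mediaSections: number; // number of m=audio / m=video lines found
  allUseDtlsSrtp: boolean; // every media line uses a DTLS-SRTP profile
  fingerprintAlgorithms: string[]; // e.g. ["sha-256"], from a=fingerprint lines
}

function summarizeSdpSecurity(sdp: string): SecuritySummary {
  const protos: string[] = [];
  const algorithms: string[] = [];
  for (const line of sdp.split(/\r?\n/)) {
    // m=<media> <port> <proto> <formats...>
    const media = /^m=(?:audio|video) \d+ (\S+)/.exec(line);
    if (media) {
      protos.push(String(media[1]));
    }
    // a=fingerprint:<hash function> <certificate fingerprint>
    const fingerprint = /^a=fingerprint:(\S+) /.exec(line);
    if (fingerprint) {
      const algorithm = String(fingerprint[1]).toLowerCase();
      if (algorithms.indexOf(algorithm) === -1) {
        algorithms.push(algorithm);
      }
    }
  }
  // Profiles that mean "SRTP keyed by DTLS", over UDP or TCP.
  const secureProfile = /^(?:UDP\/TLS|TCP\/DTLS)\/RTP\/SAVPF?$/;
  return {
    mediaSections: protos.length,
    allUseDtlsSrtp:
      protos.length > 0 && protos.every((p) => secureProfile.test(p)),
    fingerprintAlgorithms: algorithms,
  };
}

// Hand-written fragments for illustration only.
const secureSdp = [
  "a=fingerprint:sha-256 AB:CD:EF:01", // shortened placeholder, not a real digest
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "m=video 9 UDP/TLS/RTP/SAVPF 102",
].join("\r\n");
const plainSdp = "m=audio 49170 RTP/AVP 0";
```

For secureSdp the summary reports two media sections, all using DTLS-SRTP, with a sha-256 fingerprint. The plainSdp line describes unencrypted RTP, which is not something you should expect to see from a WebRTC browser.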

One thing to note: most of the focus here was on describing the peer connection from a broadcasting client to the server peer. However, everything described above works in reverse as well. The Red5 Pro server relays streams out to WebRTC subscriber clients watching the video in exactly the same manner, so each outgoing stream is encrypted the same way.

As you can see, there are a few different layers built into WebRTC to ensure that streams are fully encrypted and the connections established are secure. Understandably, with that security comes a bit of complexity. However, we’ve done our best to minimize the amount of work required for a developer to implement a smooth, protected, and low-latency stream using Red5 Pro. Frequent reports of internet hacking have highlighted the increasing need for cybersecurity. This was a big part of Red5 Pro’s decision to integrate with WebRTC. If you have any questions, concerns, or counterpoints, please shoot us a message or get on a call.

As always, happy coding!