2015-11-17

Intro to WebSockets

The websockets logo. I think.

We're going to do a little three-part introduction to WebSockets, which provide a way to communicate back and forth with the web server without all the overhead of a standard HTTP connection.

And, in the course of it, we'll be writing a simple chat server and client.

We're going to be piggybacking on the previous blog entry, on writing A NodeJS Webserver. In particular, that webserver will be used for generic webserving on this project, and will also be what the WebSockets run on.

So you might as well start there, if you haven't already.

Part 0: A NodeJS Webserver
Part 1: Intro to WebSockets, conceptually
Part 2: Writing the client side chat code for the browser in JavaScript
Part 3: Writing the server-side chat code in NodeJS

If you want to jump right on in, get the full source on GitHub.

WebSockets: Your Friends in Cyberspace

How's that for a catchy 90s title? No? Whatever.

WebSockets are named after the famous Unix sockets API. With this API, you could do a number of things, one of which was communicate over the Internet. Because of this, many other network communication APIs were called "sockets", even though they were only indirectly related.

Free plug: for more information on network programming in C, see Beej's Guide to Network Programming. 😀

In any case, WebSockets perform much the same function: they allow people to send and receive information to and from the web server.

How's it different from a regular HTTP request?

Why not just use HTTP with AJAX?

As outlined in the webserver blog entry, HTTP requests tend to be one-off request-response transmissions, after which the communication is ended. There's a lot of overhead setting up and tearing down these connections. (HTTP Persistent Connections mitigate this by reusing one TCP connection for several requests.) And there's also a lot of overhead with HTTP headers that can be avoided.

And, perhaps most importantly, HTTP isn't designed for the server to send anything unsolicited. HTTP wants the client to say something, and then the server to say something back. But what if you want the server to be able to send things on its own? In the past, this has led to hackish solutions, like using a Flash proxy, or using long polling, or abusing the HTTP persistent connection.

So how do WebSockets solve those problems?

You might not believe it, because you're naturally overly-skeptical, but a WebSocket connection begins life as a regular HTTP connection. But after a couple milliseconds, it's had enough of that nonsense and requests a connection upgrade to the WebSockets protocol. This protocol does away with those pesky HTTP headers, and, of course, keeps the connection open to avoid teardown overhead.

Then, if the server agrees that all this is a good idea, the connection is established and the fun can begin!
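
If you're curious what "agreeing" looks like on the wire: per RFC 6455, the client's upgrade request carries `Upgrade: websocket` and `Connection: Upgrade` headers plus a random `Sec-WebSocket-Key`, and a willing server answers `101 Switching Protocols` with a `Sec-WebSocket-Accept` value derived from that key. Your WebSocket library does all of this for you, so the sketch below is purely for illustration, and the function name `computeAcceptValue` is made up.

```javascript
const crypto = require('crypto');

// Fixed GUID that RFC 6455 defines for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: derive the Sec-WebSocket-Accept header value
// from the Sec-WebSocket-Key the client sent.
function computeAcceptValue(secWebSocketKey) {
    return crypto
        .createHash('sha1')
        .update(secWebSocketKey + WEBSOCKET_GUID)
        .digest('base64');
}

// Sample key from the handshake example in RFC 6455.
console.log(computeAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='));
// prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The point of the exercise is just to prove that the server really speaks WebSockets and isn't some confused plain-HTTP server echoing things back.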

Once it's set up, the server can send messages out over the socket, and the client on the browser will get an event letting it know data has arrived. This is how the server pushes unrequested data.

Likewise, the client can also send messages over the socket, and the server will handle them in whichever way it sees fit. For this toy project, we'll be coding the server in NodeJS, and it also gets an event on an incoming message. (Other servers might handle messages in callbacks or with other mechanisms, depending on the server and language involved.)
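
On the browser side, that looks roughly like the sketch below. The `open` and `message` events and the `send()` method are part of the standard browser WebSocket API; the helper name `wireUpSocket`, the greeting text, and the URL in the comment are assumptions made up for illustration.

```javascript
// Hypothetical helper: attach handlers to a browser-style WebSocket.
// `socket` is anything with addEventListener() and send(), such as
// the object returned by `new WebSocket(url)` in the browser.
function wireUpSocket(socket, onText) {
    // Fires once the upgrade has succeeded and the socket is usable.
    socket.addEventListener('open', () => {
        socket.send('Hello, server!');
    });

    // Fires whenever the server pushes data, requested or not.
    socket.addEventListener('message', (event) => {
        onText(event.data);
    });
}

// In a browser, usage might look like this (the URL is an assumption):
//
//   const socket = new WebSocket('ws://localhost:8080/');
//   wireUpSocket(socket, (text) => console.log('Server says:', text));
```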

This isn't a difference, but a sameness: the WebSocket continues to use the same port as the web server (normally port 80, or 443 for HTTPS), which helps it get through firewalls that might not have other ports open. Very convenient. Almost too convenient... like they planned it that way...

Yeah, but is it well-supported?

WebSockets, like many other web technologies, is 100% supported on all browsers and servers that support it completely. Among browsers and servers that do not support it, penetration remains low.

Thanks, I'll be here all night. Really.

To find out if your browser supports it, try the fantastic website CanIUse. As you can see, it's pretty well-supported as long as you rightfully don't include IE 9 or earlier.

If you need support on earlier browsers, then you'll have to use some kind of library that supports one or more of those hackish solutions I mentioned earlier. One of the more popular ones is Socket.IO, which will actually use WebSockets if they're available.

But this blog is about learning a specific tech as opposed to being as practical as possible, so we'll just stick with WebSockets.

On the server side, there are libraries for NodeJS (we'll be using this websocket package, itself one of many), Go (if you're writing a server in Go), and undoubtedly many other languages and servers.

A Practical Example

Let's set up a client and server, and have the client politely ask if it can set up a WebSocket with the server.

Communication with Server Denied. What gives, AngryServer?

Well, that was anticlimactic. What happened?

It could have been a number of things. Maybe the server just plain doesn't support WebSockets. Or maybe the server detected that the request was coming from an unexpected website. Or maybe the client has asked for an unsupported protocol.

Let's tackle these in turn.

What if the server doesn't support WebSockets?

In that case, when the client asks for an upgrade to WebSockets, the server merely replies "no", and that's the end of it.

What if the request was from an unexpected site?

This one's a little bit trickier to explain, but in a nutshell, WebSockets are not constrained by the same-origin policy. This means someone could write JavaScript, host it on another server, and then initiate a WebSocket connection to your server. You might, or might not, want to allow this.

When the connection is being established, the server has an opportunity to see where the page that holds the connector's JS is hosted. If it's hosted somewhere the server doesn't like, the connection can be denied.

Although the origin of the script can be spoofed by non-web-browser clients, this check is still very useful against cross-site attacks that run in someone else's browser. A malicious user might, for example, inject script in someone else's blog comments that, when read by an unsuspecting third party, would cause that third party's browser to hit your WebSocket server and do... bad things with it. (The exact nature of the badness depends on what data your server's handing out.)

A common case here would be to only allow WebSocket connections from sites that you own.
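
Here's a minimal sketch of that kind of check. Browsers send the page's origin in the `Origin` header of the upgrade request; how you get at that header depends on your server library, so this only shows the decision itself. The allowlist contents and the name `originIsAllowed` are assumptions for illustration.

```javascript
// Hypothetical allowlist: the only sites whose pages may connect.
const ALLOWED_ORIGINS = new Set([
    'https://example.com',
    'https://chat.example.com',
]);

// Return true if the Origin sent during the handshake is one we
// trust. Non-browser clients can send any Origin they like (or none
// at all), so this protects browser users rather than the server.
function originIsAllowed(origin) {
    return typeof origin === 'string' && ALLOWED_ORIGINS.has(origin);
}

console.log(originIsAllowed('https://example.com'));      // prints true
console.log(originIsAllowed('https://evil.example.net')); // prints false
console.log(originIsAllowed(undefined));                  // prints false
```

Matching the whole origin string exactly is deliberate: substring or prefix checks are easy to fool with a hostname like `example.com.evil.example.net`.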

What if the request asked for an unsupported protocol?

See the section titled "Protocols", way, way below.

Protocols

A protocol droid. This has nothing to do with anything we're about to discuss.

What is a protocol? It's just an agreement from both sides about the details of which language they're going to speak. In this case, we're going to identify the protocol by a name you just make up. For a chat program, I might call it beej-chat-protocol.

Even though both sides are speaking the WebSockets protocol deep down, you can specify higher-level protocols for your own use. This is completely free-form and up to you.

Why would you bother? Why not just code the client and server the same way and not name the protocol at all?

Well, let's say you wrote a client and server that both spoke the same chat protocol. Everything's running swimmingly. But then you discover that a customer wants some features that your chat program doesn't support.

So you think, "No problem. We'll just update the server and client to support that." But then it turns out you have another customer who likes the old chat program and doesn't want it updated.

Time for cleverness. What you do is define two protocols, beej-chat-protocol and beej-chat-protocol-v2, and program the server to know both of them. Then the old client can keep using beej-chat-protocol just as always, and the new client can start using beej-chat-protocol-v2.

When the connection is being established, the client tells the server the protocols it knows, and the server replies with the protocol that both are going to use. Or the server can shut down the connection if the client doesn't request any protocols it knows.
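
The server's half of that negotiation could be sketched like this. The function name `chooseProtocol` and the rule of preferring the newest protocol are my own assumptions; a real server library will hand you the client's requested list in its own way.

```javascript
// Protocols this hypothetical server knows, most preferred first.
const SUPPORTED_PROTOCOLS = ['beej-chat-protocol-v2', 'beej-chat-protocol'];

// Pick the first protocol we support that the client also asked for.
// Returns null when there's no overlap, in which case the server
// would refuse the connection.
function chooseProtocol(requestedProtocols) {
    for (const name of SUPPORTED_PROTOCOLS) {
        if (requestedProtocols.includes(name)) {
            return name;
        }
    }
    return null;
}

console.log(chooseProtocol(['beej-chat-protocol']));
// prints beej-chat-protocol
console.log(chooseProtocol(['beej-chat-protocol', 'beej-chat-protocol-v2']));
// prints beej-chat-protocol-v2
console.log(chooseProtocol(['some-other-protocol']));
// prints null
```

On the browser side, the list of protocols the client knows is passed as the second argument to the `WebSocket` constructor, either as a single string or an array of strings.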

Protocol Design

This is a topic unto itself, but basically, for a given protocol, we need to define the data that is transmitted and received for any action that can be taken on the part of the client, server, or user.

When the user types "Hello!" into the chat, what data, exactly, is transmitted to the server?

WebSockets can send binary data and text data. Some protocols will use binary. Some will use text. Binary tends to be more terse and thus faster to transmit, but really, it's all up to you.

For the chat program, we'll keep it simple. We'll say that all the data that's transferred will be in the form of a JSON string. (So it will be transmitted as text, not binary.)

The JSON will have a type property (a string), and a payload property (an object). The structure of the payload will depend on the type.

(None of this is written in stone tablets, anywhere. I'm just making it up. That's how protocols are created.)

Here's a sample transmission from the beej-chat-protocol that represents a message from a user:

{
    "type": "chat-message",

    "payload": {
        "username": "Beej",
        "message": "Hello, Goats!"
    }
}

The beej-chat-protocol will also define messages of type chat-join and chat-leave, but we'll leave those definitions to your imagination, given the chat-message description, above.

Once the protocol is defined, it must be implemented on both the client and server so they're speaking the same language, oui?

Connection to Server Accepted. Now we're cookin' with gas!

Normal communications

Once the WebSocket is connected, then data can be sent from the client to the server (or vice-versa), in the format specified in the protocol.

(Again, the protocol's not written in stone anywhere, so technically it merely very much should be in the format specified by the protocol. The computer won't burst into flame if you don't obey the protocol, but you might cause yourself debugging pain later. And other developers will scowl when they see your code.)

But at this point, everything's pretty simple. The client can build up a JSON packet, convert it to a string, and send that to the server.
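
As a rough sketch, building and stringifying a chat-message packet might look like this. The helper name `encodeChatMessage` is made up; the packet layout is the type-plus-payload structure described earlier.

```javascript
// Hypothetical helper: build a beej-chat-protocol packet and turn it
// into the JSON string that actually goes over the socket.
function encodeChatMessage(username, message) {
    const packet = {
        type: 'chat-message',
        payload: { username, message },
    };
    return JSON.stringify(packet);
}

const wireText = encodeChatMessage('Beej', 'Hello, Goats!');
console.log(wireText);
// prints {"type":"chat-message","payload":{"username":"Beej","message":"Hello, Goats!"}}

// With an open browser WebSocket named `socket`, sending would be:
//   socket.send(wireText);
```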

When it arrives, the server will receive an event with the data attached. The server turns the JSON string back into a JS object, looks at the type, and decides what to do with the payload.

For example, it might receive a packet of type chat-message, and it knows that it should broadcast that packet out to all connected clients so they can display the chat message.
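
Here's one way the server-side handling could be sketched. Everything here is an assumption for illustration: `handleIncoming` is a made-up name, and `clients` is assumed to be an array of connection objects that each have a `send(text)` method. Real WebSocket libraries name and shape these things differently.

```javascript
// Hypothetical server-side handler. Returns true if the packet was
// understood and acted on, false if it was ignored.
function handleIncoming(rawText, clients) {
    let packet;
    try {
        packet = JSON.parse(rawText);
    } catch (err) {
        return false; // Not JSON, so not our protocol; ignore it.
    }

    if (packet === null || typeof packet !== 'object') {
        return false; // Valid JSON, but not a packet object.
    }

    switch (packet.type) {
    case 'chat-message': {
        // Relay the packet to everyone who's connected.
        const outgoing = JSON.stringify(packet);
        for (const client of clients) {
            client.send(outgoing);
        }
        return true;
    }

    default:
        return false; // Unknown type: ignore it.
    }
}
```

Quietly ignoring junk is a design choice for this toy; a stricter server might close the connection on a malformed packet instead.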

At that point, the same thing happens in reverse. The server sends the message, and the client gets an event saying the message has arrived. The client looks at the type and decides what to do with the payload. If it's a chat-message, for example, it would display the message on the screen.

Closing the Connection

The connection can be closed from either the server or the client side. It can be explicit (where the server or client deliberately closes the connection in code), or implicit (when the server crashes, or the browser tab is closed).

And there are events on the client and server for the end of a connection, similar to how there are events for regular data. You can listen for those, and do the Right Thing when they occur.

Errors

Similar to regular communications and close events, there's an error event that can be caught and handled. Often, a close event follows on its heels.
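
A sketch of doing the Right Thing on the server might look like the following. It assumes `connection` is a Node EventEmitter that emits `close` and `error` events, which is the general shape Node WebSocket libraries tend to use; the exact event arguments vary by library, so none are relied on beyond the error object. The names `watchConnection`, `clients`, and `log` are made up.

```javascript
// Hypothetical helper: clean up when a connection goes away.
function watchConnection(connection, username, clients, log) {
    connection.on('error', (err) => {
        // Just note it; a 'close' event usually follows shortly.
        log(`${username}: error: ${err && err.message}`);
    });

    connection.on('close', () => {
        // Forget the connection so we stop broadcasting to it.
        const index = clients.indexOf(connection);
        if (index !== -1) {
            clients.splice(index, 1);
        }
        log(`${username}: disconnected`);
    });
}
```

Doing the cleanup in the close handler rather than the error handler means it runs for clean shutdowns and crashes alike.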

In Conclusion

That's the overview of how these beasts work. In the next episode we'll write some actual client-side code!

Continue to Part 2: Writing the client side chat code for the browser in JavaScript

Beej's Bit Bucket

⚡ Tech and Programming Fun