Chapter 8. The Real-Time Web

Why is the real-time Web so important? We live in a real-time world, so it’s only natural that the Web is moving in that direction. Users clamor for real-time communication, data, and search. Our expectations for how quickly the Internet should deliver us information have changed—delays of minutes in breaking news stories are now unacceptable. Major companies like Google, Facebook, and Twitter have been quick to catch onto this, offering real-time functionality in their services. This is a growing trend that’s only going to get bigger.

Real Time’s History

Traditionally, the Web was built around the request/response model of HTTP: a client requests a web page, the server delivers it, and nothing happens again until the client requests another page. Then, Ajax came along and made web pages feel a bit more dynamic—requests to the server could now be made in the background. However, if the server had additional data for clients, there was no way of notifying them after the page was loaded; live data couldn’t be pushed to clients.

Lots of solutions were devised. The most basic was polling: just asking the server over and over again for any new information. This gave users the perception of real time. In practice, it introduced latency and performance problems because servers had to process a huge number of connections a second, with both TCP handshake and HTTP overheads. Although polling is still used, it’s by no means ideal.

Then, more advanced transports were devised under the umbrella term Comet. These techniques consisted of iframes (forever frames), xhr-multipart, htmlfiles, and long polling. With long polling, a client opens an XMLHttpRequest (XHR) connection that the server holds open instead of answering immediately, leaving the client waiting. When the server has new data, it’s duly sent down the connection, which then closes. The client immediately opens a new connection and the whole process repeats itself, essentially allowing server push.

Comet techniques were unstandardized hacks, and as such, browser compatibility was a problem. On top of that, there were performance issues. Every connection to the server contained a full set of HTTP headers, so if you needed low latency, this could be quite a problem. That’s not to knock Comet, though—it was a valid solution when there were no other alternatives.

Browser plug-ins, such as Adobe Flash and Java, were also used for server push. These would allow raw TCP socket connections with servers, which could be used for pushing real-time data out to clients. The caveat was that these plug-ins weren’t guaranteed to be installed, and they often suffered from firewall issues, especially on corporate networks.

There are now alternative solutions as part of the HTML5 specification. However, it will be a while before all the browsers, particularly Internet Explorer, are up to speed with the current developments. Until then, Comet will remain a useful tool in any frontend developer’s arsenal.

WebSockets

WebSockets are part of the HTML5 specification, providing bidirectional, full-duplex sockets over TCP. This means that servers can push data to clients without developers resorting to long polling or browser plug-ins, which is quite an improvement. Although a number of browsers have implemented support, the protocol is still in flux due to security issues. However, that shouldn’t put you off; the teething problems will soon get ironed out and the spec will be finalized. In the meantime, browsers that don’t support WebSockets can fall back to legacy methods like Comet or polling.

WebSockets have significant advantages over previous server push transports because they are full-duplex, aren’t over HTTP, and persist once opened. The real drawback to Comet was the overhead of HTTP—every request also had a full set of HTTP headers. Then there was the overhead of multiple extraneous TCP handshakes, which was significant at high levels of requests.

With WebSockets, once a handshake is completed between client and server, messages can be sent back and forth without the overhead of HTTP headers. This greatly reduces bandwidth usage, thus improving performance. Since there is an open connection, servers can reliably push updates to clients as soon as new data becomes available (no polling is required). In addition, the connection is duplex, so clients can also send messages back to the server, again without the overhead of HTTP.

This is what Google’s Ian Hickson, the HTML5 specification lead, said about WebSockets:

Reducing kilobytes of data to 2 bytes...and reducing latency from 150ms to 50ms is far more than marginal. In fact, these two factors alone are enough to make WebSockets seriously interesting to Google.

So, let’s look at WebSocket support in the browsers:

  • Chrome >= 4

  • Safari >= 5

  • iOS >= 4.2

  • Firefox >= 4*

  • Opera >= 11*

Although Firefox and Opera have WebSocket implementations, they’re currently disabled due to recent security scares. This will all get sorted out though, probably by the time this book goes to print. In the meantime, you can gracefully degrade with older technologies like Comet and Adobe Flash. IE support is nowhere on the map at the moment, and it probably won’t be added until after IE9.

Detecting support for WebSockets is very straightforward:

var supported = ("WebSocket" in window);
if (supported) alert("WebSockets are supported");

From a browser perspective, the WebSocket API is clear and logical. You instantiate a new socket using the WebSocket class, passing the socket server endpoint—in this case, ws://example.com:

var socket = new WebSocket("ws://example.com");

Then, we can add some event listeners to the socket:

// The connection has connected
socket.onopen = function(){ /* ... */ }

// The connection has some new data (available as event.data)
socket.onmessage = function(event){ /* ... */ }

// The connection has closed
socket.onclose = function(){ /* ... */ }

When the server sends some data, onmessage will be called. Clients, in turn, can call the send() function to transmit data back to the server. Clearly, we should call that only after the socket has connected and the onopen event has fired:

socket.onmessage = function(event){
  // The payload sent by the server is in event.data
  console.log("New data - ", event.data);
};

socket.onopen = function(){
  socket.send("Why, hello there");
};
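
One way to honor that ordering without scattering checks through an application is to queue outgoing messages until the socket has opened. The following is a minimal sketch rather than a definitive implementation: createQueuedSender is a hypothetical helper (it is not part of the WebSocket API), and it assumes only that the object it receives has a send() function and an assignable onopen, as a browser WebSocket does.

```javascript
// Hypothetical helper: buffers messages until the socket's onopen fires.
function createQueuedSender(socket) {
  // readyState 1 means the connection is already open
  var open = socket.readyState === 1;
  var queue = [];

  socket.onopen = function () {
    open = true;
    // Flush anything queued while the connection was being established
    while (queue.length) socket.send(queue.shift());
  };

  return function send(message) {
    if (open) {
      socket.send(message);
    } else {
      queue.push(message);
    }
  };
}

// Usage in a browser (assumed endpoint):
//   var send = createQueuedSender(new WebSocket("ws://example.com"));
//   send("Why, hello there"); // held back until the connection opens
```

Note that the helper takes over the socket’s onopen handler, so any other work that should happen on connection would need to go through it.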

When sending and receiving messages, only strings are supported. However, it’s simple enough to serialize and deserialize the message strings into JSON, creating your own protocol:

var rpc = {
  test: function(arg1, arg2) { /* ... */ }
};

socket.onmessage = function(event){
  // Parse the JSON string carried in event.data
  var msg = JSON.parse(event.data);

  // Invoke RPC function
  rpc[msg.method].apply(rpc, msg.args);
};

Above, we’ve created a remote procedure call (RPC) script. Our server can send some simple JSON, like the following, to invoke functions on the client side:

{"method": "test", "args": [1, 2]}

Notice we’re restricting invocation to the rpc object. This is important for security reasons—we don’t want to expose clients to hackers by evaluating arbitrary JavaScript.
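
That restriction is only as strong as the property lookup: a plain rpc[msg.method] also finds inherited properties such as toString or constructor. A slightly more defensive version might look like the sketch below. The dispatch helper and the add function are hypothetical names chosen for illustration; the message format is the one used in this section.

```javascript
// Stand-in handlers; a real application would define its own
var rpc = {
  add: function (a, b) { return a + b; }
};

// Hypothetical helper: parse a JSON message and invoke an allowed function.
function dispatch(handlers, json) {
  var msg = JSON.parse(json);

  // Only call functions defined directly on the handlers object,
  // never ones inherited from Object.prototype
  var known = Object.prototype.hasOwnProperty.call(handlers, msg.method) &&
              typeof handlers[msg.method] === "function";
  if (!known) throw new Error("Unknown RPC method: " + msg.method);

  // Treat a missing or malformed argument list as "no arguments"
  var args = Array.isArray(msg.args) ? msg.args : [];
  return handlers[msg.method].apply(handlers, args);
}

// Usage in a browser:
//   socket.onmessage = function (event) { dispatch(rpc, event.data); };
```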

To terminate the connection, just call the close() function:

var socket = new WebSocket("ws://localhost:8000/server");
socket.close();

You’ll notice when instantiating a WebSocket that we’re using the WebSocket scheme, ws://, rather than http://. WebSockets also allow encrypted connections via TLS using the wss:// scheme. By default, WebSockets will use port 80 for nonencrypted connections and port 443 for encrypted ones. You can override this by providing a custom port in the URL. Keep in mind that not all ports are available to clients; firewalls may block the more uncommon ones.
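
To make the port rules concrete, here is a small sketch that works out which port a given WebSocket URL will use. The effectivePort function is a hypothetical helper, not part of any API; it relies on the standard URL class available in current browsers and in Node.js.

```javascript
// Hypothetical helper: report which port a WebSocket URL will connect to.
function effectivePort(address) {
  var url = new URL(address);

  if (url.protocol !== "ws:" && url.protocol !== "wss:") {
    throw new Error("Not a WebSocket URL: " + address);
  }

  // An explicit port in the URL overrides the scheme default
  if (url.port) return Number(url.port);

  // Otherwise: 443 for encrypted connections, 80 for nonencrypted ones
  return url.protocol === "wss:" ? 443 : 80;
}
```

For example, effectivePort("ws://localhost:8000/server") gives 8000, while effectivePort("wss://example.com") gives 443.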

At this stage, you may be thinking, “I can’t possibly use this in production—the standard’s a moving target and there’s no IE support.” Well, those are valid concerns, but luckily there’s a solution. Web-socket-js is a WebSocket implementation powered by Adobe Flash. You can use this library to provide legacy browsers a WebSocket fallback to Flash, a plug-in that’s almost ubiquitously available. It mirrors the WebSocket API exactly, so when WebSockets have better penetration, you’ll only need to remove the library—not change the code.

Although the client-side API is fairly straightforward, things aren’t quite so simple server side. The WebSocket protocol has been through several incompatible iterations: drafts 75 and 76. Servers need to take account of both drafts by detecting the type of handshake clients use.

WebSockets work by first performing an HTTP “upgrade” request to your server. If your server has WebSocket support, it will perform the WebSocket handshake and a connection will be initiated. Included in the upgrade request is information about the origin domain (where the request is coming from). Clients can make WebSocket connections to any domain—it’s the server that decides which clients can connect, often by using a whitelist of allowed domains.
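
The whitelist check itself can be very small. The sketch below assumes the server exposes the upgrade request’s headers as an object with lowercased names, as Node.js does; isOriginAllowed and the allowedOrigins list are hypothetical names and values for illustration. Refusing requests that carry no origin at all is a policy choice made here, not a requirement.

```javascript
// Assumption: a hypothetical list of origins that may connect
var allowedOrigins = ["http://example.com", "https://example.com"];

// Hypothetical helper: decide whether an upgrade request may proceed.
// `headers` is assumed to use lowercased names, as Node.js exposes them.
function isOriginAllowed(headers, whitelist) {
  var origin = headers.origin;
  if (!origin) return false; // no origin supplied, so refuse
  return whitelist.indexOf(origin.toLowerCase()) !== -1;
}
```

Keep in mind that only browsers are obliged to send an honest origin; a non-browser client can claim any value, so this check protects your users’ browsers rather than authenticating the client.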

From conception, WebSockets were designed to work well with firewalls and proxies, using popular ports and HTTP headers for the initial connection. However, things rarely work out so simply in the wild Web. Some proxies change the WebSockets upgrade headers, breaking them. Others don’t allow long-lived connections and will time out after a while. In fact, the most recent update to the protocol draft (version 76) unintentionally broke compatibility with reverse-proxies and gateways. There are a few steps you can take to give your WebSockets the best chance of success:

  • Use secured WebSocket connections (wss). Proxies won’t meddle with encrypted connections, and you get the added advantage that the data is safe from eavesdroppers.

  • Use a TCP load balancer in front of your WebSocket servers, rather than an HTTP one. Consider an HTTP balancer only if it actively advertises WebSocket support.

  • Don’t assume that if a browser has WebSocket support, it will work. Instead, time out connections if they aren’t established quickly, gracefully degrading to a different transport like Comet or polling.
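
The last point can be sketched as follows. This is one possible shape under stated assumptions, not a definitive implementation: connectWithTimeout, createSocket, onReady, and onFallback are all hypothetical names, and the socket is assumed to behave like a browser WebSocket (assignable onopen and onclose, plus a close() function).

```javascript
// Hypothetical helper: try a socket, fall back if it is not open in time.
//   createSocket()   returns a WebSocket-like object
//   onReady(socket)  is called once the socket opens
//   onFallback()     is called if the socket closes or the timeout elapses first
function connectWithTimeout(createSocket, timeoutMs, onReady, onFallback) {
  var socket = createSocket();
  var settled = false;

  function fallback() {
    if (settled) return;
    settled = true;
    clearTimeout(timer);
    socket.onopen = socket.onclose = null;
    socket.close();
    onFallback();
  }

  var timer = setTimeout(fallback, timeoutMs);

  socket.onopen = function () {
    if (settled) return;
    settled = true;
    clearTimeout(timer);
    socket.onclose = null; // the caller can attach its own handler now
    onReady(socket);
  };

  // A close before open means the connection was never established
  socket.onclose = fallback;

  return socket;
}

// Usage in a browser (assumed endpoint and fallback function):
//   connectWithTimeout(
//     function () { return new WebSocket("wss://example.com"); },
//     3000,
//     function (socket) { /* use the WebSocket */ },
//     function () { /* start long polling instead */ }
//   );
```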

So, what server options are there? Luckily, there is a multitude of implementations in languages like Ruby, Python, and Java. Make sure any implementation supports at least draft 76 of the protocol, as this is most common in clients.

Node.js and Socket.IO

Node.js is the newest kid on the block, but one of the most exciting. Node.js is an evented JavaScript server, powered by Google’s V8 JS engine. As such, it’s incredibly fast and is great for services that have a large number of connected clients, like a WebSocket server.

Socket.IO is a Node.js library for WebSockets. What’s interesting, though, is that it goes far beyond that. Here’s a blurb from its site:

Socket.IO aims to make real-time apps possible in every browser and mobile device, blurring the differences between the different transport mechanisms.

Socket.IO will try and use WebSockets if they’re supported, but it will fall back to other transports if necessary. The list of supported transports is very comprehensive and offers a lot of browser compatibility.

  • WebSocket

  • Adobe Flash Socket

  • ActiveX HTMLFile (IE)

  • XHR with multipart encoding

  • XHR with long-polling

  • JSONP polling (for cross-domain)

Socket.IO’s browser support is brilliant. Server push can be notoriously difficult to implement, but the Socket.IO team has gone through all that pain for you, ensuring compatibility with most browsers. As such, it works in the following browsers:

  • Safari >= 4

  • Chrome >= 5

  • IE >= 6

  • iOS

  • Firefox >= 3

  • Opera >= 10.61

Although the server side to Socket.IO was initially written for Node.js, there are now implementations in other languages, like Ruby (Rack), Python (Tornado), Java, and Google Go.

A quick look at the API will demonstrate how simple and straightforward it is. The client-side API looks very similar to the WebSocket one:

var socket = new io.Socket();

socket.on("connect", function(){
  socket.send('hi!');
});

socket.on("message", function(data){
  alert(data);
});

socket.on("disconnect", function(){});

Behind the scenes, Socket.IO will work out the best transport to use. As written in its readme file, Socket.IO is “making creating real-time apps that work everywhere a snap.”

If you’re looking for something a bit higher level than Socket.IO, you may be interested in Juggernaut, which builds upon it. Juggernaut has a channel interface: clients can subscribe to channels and servers can publish to them, in other words, the PubSub pattern. The library can manage scaling, publishing to specific clients, TLS, and more.

If you need hosted solutions, look no further than Pusher. Pusher lets you leave behind the hassle of managing your own server so that you can concentrate on the fun part: developing web applications. For clients, it is as simple as including a JavaScript file in the page and subscribing to a channel. When it comes to publishing messages, it’s just a case of sending an HTTP request to their REST API.

Real-Time Architecture

It’s all very well being able to push data to clients in theory, but how does that integrate with a JavaScript application? Well, if your application is modeled correctly, it’s actually remarkably straightforward. We’re going to go through all the steps involved in making your application real time, specifically following the PubSub pattern. The first thing to understand is the process that updates go through to reach clients.

A real-time architecture is event-driven. Typically, events are driven by user interaction: a user changes a record and events are propagated throughout the system until data is pushed to connected clients, updating them. When you’re thinking about making your application real time, you need to consider two things:

  • Which models need to be real time?

  • When those models’ instances change, which users need notifying?

It may be that when a model changes, you want to send notifications to all connected clients. This would be the case for a real-time activity feed on the home page, for example, where every client sees the same information. However, the more common use case is when you have a resource associated with a particular set of users. You need to notify those users when that resource changes.

Let’s consider an example scenario of a chat room:

  1. A user posts a new message to the room.

  2. An Ajax request is sent off to the server, and a Chat record is created.

  3. Save callbacks fire on the Chat model, invoking our method to update clients.

  4. We search for all users associated with the Chat record’s room—these are the ones we need to notify.

  5. An update detailing what’s happened (Chat record created) is pushed to the relevant users.

The process details are specific to your chosen backend. However, if you’re using Rails, Holla is a good example. When Message records are created, the JuggernautObserver updates relevant clients.
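
To make steps 2 through 5 concrete without tying them to a particular backend, here is a rough sketch in plain JavaScript. Everything in it is a stand-in: roomMembers and chats are hypothetical in-memory substitutes for database records, createChat is a hypothetical function playing the role of the save callback, and publish is whatever function your PubSub layer provides for pushing to a channel.

```javascript
// Hypothetical in-memory stand-ins for a real backend's records
var roomMembers = { lobby: ["user-1", "user-2"], staff: ["user-3"] };
var chats = [];

// Create the record, find the room's users, and push an update to each.
// `publish(channel, message)` is supplied by whichever PubSub layer is used.
function createChat(roomId, body, publish) {
  // Step 2: a Chat record is created
  var chat = { id: String(chats.length + 1), room: roomId, body: body };
  chats.push(chat);

  // Step 3: the "save callback" builds an update describing the change
  var update = {
    klass: "Chat",
    type: "create",
    id: chat.id,
    record: { body: chat.body }
  };

  // Steps 4 and 5: notify every user associated with the record's room
  (roomMembers[roomId] || []).forEach(function (userId) {
    publish("/observer/" + userId, update);
  });

  return chat;
}
```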

That brings us to the next question: how can we send notifications to specific users? Well, an excellent way of doing so is with the PubSub pattern: clients subscribe to particular channels and servers publish to those channels. A user just subscribes to a unique channel containing an identifier, perhaps the user’s database ID; then, the server simply needs to publish to that unique channel to send notifications to that specific user.

For example, a particular user could subscribe to the following channel:

/observer/0765F0ED-96E6-476D-B82D-8EBDA33F4EC4

where the string of letters and digits is a unique identifier for the currently logged-in user. To send notifications to that particular user, the server just needs to publish to that same channel.
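
The pattern itself is small enough to sketch in a few lines. The hub below lives in a single process and has no network transport, so it only illustrates the subscribe and publish halves of the idea; createHub and userChannel are hypothetical names, and real libraries will have their own APIs.

```javascript
// A minimal in-memory PubSub hub; a sketch, not a network transport.
function createHub() {
  // No prototype, so channel names can never collide with built-ins
  var channels = Object.create(null);

  return {
    subscribe: function (channel, callback) {
      (channels[channel] = channels[channel] || []).push(callback);
    },
    publish: function (channel, message) {
      (channels[channel] || []).forEach(function (callback) {
        callback(message);
      });
    }
  };
}

// Hypothetical naming scheme for per-user channels
function userChannel(userId) {
  return "/observer/" + userId;
}
```

A client would subscribe to userChannel(id) for its own user, and the server would publish to that same channel to reach only that user.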

You may be wondering how the PubSub pattern works with transports like WebSockets and Comet. Fortunately, there are already a lot of solutions, such as Juggernaut and Pusher, both mentioned previously. PubSub is a common abstraction on top of WebSockets, and its API should be fairly similar to whatever service or library you end up choosing.

Once notifications have been pushed to clients, you’ll see the real beauty of the MVC architecture. Let’s go back to our chat example. The notification we sent out to clients could look like this:

{
  "klass":  "Chat",
  "type":   "create",
  "id":     "3",
  "record": {"body": "New chat"}
}

It contains the model that’s changed, the type of change, and any relevant attributes. Using this, our client can create a new Chat record locally. As the client’s models are bound to the UI, the interface is automatically updated to reflect the new chat message.
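
On the client, handling such a notification can be as simple as looking up the model by name and applying the change. The sketch below uses plain objects in place of a real model library, so it shows only the routing: models, has, and applyUpdate are hypothetical names, and the "update" and "destroy" types are assumptions that extend the "create" type shown above.

```javascript
// Hypothetical client-side store: one map of records per model name
var models = { Chat: {} };

function has(object, key) {
  return Object.prototype.hasOwnProperty.call(object, key);
}

// Hypothetical helper: apply a pushed notification to the local models.
// Returns true if the update was applied, false if it was ignored.
function applyUpdate(store, update) {
  if (!has(store, update.klass)) return false; // not a model we track
  var records = store[update.klass];

  switch (update.type) {
    case "create":
    case "update": // assumed type: merge attributes into the local record
      var record = has(records, update.id) ? records[update.id] : { id: update.id };
      Object.keys(update.record || {}).forEach(function (key) {
        record[key] = update.record[key];
      });
      records[update.id] = record;
      return true;
    case "destroy": // assumed type: drop the local record
      delete records[update.id];
      return true;
    default:
      return false;
  }
}
```

With a real model layer, the create and update branches would call the model’s own methods so that its change events fire and the bound views redraw.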

What’s brilliant is that none of this is specific to the Chat model. If we want to make another model real time, it’s just a case of adding another observer server side, making sure clients are updated when it changes. Our backend and client-side models are now tied together. Any changes to the backend models get automatically propagated to all the relevant clients, updating their UI. With this architecture, the application is truly real time. Any interaction a user makes is instantly broadcast to other users.

Perceived Speed

Speed is a critical but often neglected part of UI design. It makes a huge difference to the user experience (UX) and can have a direct impact on revenue. Companies such as the following study its implications all the time:

Amazon

100 ms of extra load time caused a 1% drop in sales (source: Greg Linden, Amazon).

Google

500 ms of extra load time caused 20% fewer searches (source: Marissa Mayer, Google).

Yahoo!

400 ms of extra load time caused a 5–9% increase in the number of people who clicked “back” before the page even loaded (source: Nicole Sullivan, Yahoo!).

Perceived speed is just as important as actual speed because this is what users are going to notice. So, the key is to make users think an application is fast, even if in reality it isn’t. The ability to do this is one of the benefits of JavaScript applications—UI doesn’t block, even if a background request is taking a while.

Let’s take the chat room scenario again. A user sends a new message, firing off an Ajax request to the server. We could wait until the message performs a roundtrip through the server and clients before appending it to the chat log. However, that would introduce a couple of seconds’ latency between the time a user submitted a new message and when it appeared in her chat log. The application would seem slow, which would definitely hurt the user experience.

Instead, why not create the new message locally, thereby immediately adding it to the chat log? From a user’s perspective, it seems like the message has been sent instantly. Users won’t know (or care) that the message hasn’t yet been delivered to other clients in the chat room. They’ll just be happy with a fast and snappy user experience.
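
A minimal sketch of this approach follows. The postChat function is a hypothetical helper, chatLog is a plain array standing in for whatever the UI is bound to, and send stands in for the Ajax request; it is assumed to call back with an error argument on failure and with nothing on success.

```javascript
// Hypothetical helper: add the message locally first, then send it.
// `send(message, callback)` stands in for the Ajax request.
function postChat(chatLog, body, send) {
  var message = { body: body, status: "pending" };
  chatLog.push(message); // visible to the user immediately

  send(message, function (error) {
    // Record the outcome so the UI can flag a message that never arrived
    message.status = error ? "failed" : "sent";
  });

  return message;
}
```

Tracking a status is a design choice rather than a requirement: it lets the interface stay snappy while still giving you a way to tell the user when a message could not be delivered.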

Aside from interactions, one of the slowest parts of Web applications is loading in new data. It’s important to do intelligent preloading to try to predict what a user will need before he actually asks for it. Then, cache the data in memory; if the user needs it subsequently, you shouldn’t have to request it again from the server. Upon startup, the application should preload commonly used data. Users are more likely to forgive a slower initial load than delays once the application’s loaded.
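
Preloading and caching fit together naturally, as the following sketch suggests. It is an illustration under assumptions, not a finished cache: createCache is a hypothetical helper, load stands in for a request to the server, and there is no expiry or handling of two simultaneous requests for the same key.

```javascript
// Hypothetical helper: an in-memory cache in front of a loader function.
// `load(key, callback)` stands in for a request to the server.
function createCache(load) {
  var cache = Object.create(null);

  function get(key, callback) {
    if (key in cache) return callback(cache[key]); // no request needed

    load(key, function (value) {
      cache[key] = value;
      callback(value);
    });
  }

  return {
    get: get,
    // Preloading is just a get() whose result isn't needed yet
    preload: function (keys) {
      keys.forEach(function (key) {
        get(key, function () {});
      });
    }
  };
}
```

On startup, the application might call preload() with the keys it expects to need; later calls to get() for those keys are then answered from memory.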

You should always give users feedback when they interact with your application, usually with some sort of visual indicator. In business jargon this is called expectation management: making sure clients are aware of a project’s status and ETA. The same applies to UX: users will be more patient if they’re given an indication that something’s happening. While users are waiting for new data, show them a message or a spinner. If a file’s being uploaded, show a progress bar and an estimated duration. All this gives a perception of speed, improving the user experience.