HTTP

One of the core tasks of Node.js is to act as a web server. This is such a key part of the system that when Ryan Dahl started the project, he rewrote the HTTP stack for V8 to make it nonblocking. Although both the API and the internals for the original HTTP implementation have morphed a lot since it was created, the core activities are still the same. The Node implementation of HTTP is nonblocking and fast. Much of the code has moved from C into JavaScript.

HTTP uses a pattern that is common in Node. Pseudoclass factories provide an easy way to create a new server.^[7] The http.createServer() method provides us with a new instance of the HTTP Server class, which is the class we use to define the actions taken when Node receives incoming HTTP requests. There are a few other main pieces of the HTTP module and other Node modules in general. These are the events the Server class fires and the data structures that are passed to the callbacks. Knowing about these three types of class allows you to use the HTTP module well.

HTTP Servers

Acting as an HTTP server is probably the most common current use case for Node. In Chapter 1, we set up an HTTP server and used it to serve a very simple request. However, HTTP is a lot more multifaceted than that. The server component of the HTTP module provides the raw tools to build complex and comprehensive web servers. In this chapter, we are going to explore the mechanics of dealing with requests and issuing responses. Even if you end up using a higher-level server such as Express, many of the concepts it uses are extensions of those defined here.

As we’ve already seen, the first step in using HTTP servers is to create a new server using the http.createServer() method. This returns a new instance of the Server class, which has only a few methods because most of the functionality is going to be provided through using events. The http server class has six events and three methods. The other thing to notice is how most of the methods are used to initialize the server, whereas events are used during its operation.

Let’s start by creating the smallest basic HTTP server code we can in Example 4-7.

Example 4-7. A simple, and very short, HTTP server

require('http').createServer(function(req,res){res.writeHead(200, {}); 
res.end('hello world');}).listen(8125);

This example is not good code. However, it illustrates some important points. We’ll fix the style shortly. The first thing we do is require the http module. Notice how we can chain methods to access the module without first assigning it to a variable. Many things in Node return a function,^[8] which allows us to invoke those functions immediately. From the included http module, we call createServer. This doesn’t have to take any arguments, but we pass it a function to attach to the request event. Finally, we tell the server created with createServer to listen on port 8125.

We hope you never write code like this in real situations, but it does show the flexibility of the syntax and the potential brevity of the language. Let’s be a lot more explicit about our code. The rewrite in Example 4-8 should make it a lot easier to understand and maintain.

Example 4-8. A simple, but more descriptive, HTTP server

var http = require('http');
var server = http.createServer();
var handleReq = function(req,res){
  res.writeHead(200, {});
  res.end('hello world');
};
server.on('request', handleReq);
server.listen(8125);

This example implements the minimal web server again. However, we’ve started assigning things to named variables. This not only makes the code easier to read than when it’s chained, but also means you can reuse it. For example, it’s not uncommon to use http more than once in a file. You want to have both an HTTP server and an HTTP client, so reusing the module object is really helpful. Even though JavaScript doesn’t force you to think about memory, that doesn’t mean you should thoughtlessly litter unnecessary objects everywhere. So rather than use an anonymous callback, we’ve named the function that handles the request event. This is less about memory usage and more about readability. We’re not saying you shouldn’t use anonymous functions, but if you can lay out your code so it’s easy to find, that helps a lot when maintaining it.

Note

Remember to look at Part I of the book for more help with programming style. Chapters 1 and 2 deal with programming style in particular.

Because we didn’t pass the request event listener as part of the factory method for the http Server object, we need to add an event listener explicitly. Calling the on method from EventEmitter does this. Finally, as with the previous example, we call the listen method with the port we want to listen on. The http class provides other functions, but this example illustrates the most important ones.

The http server supports a number of events, which are associated with either the TCP or HTTP connection to the client. The connection and close events indicate the buildup or teardown of a TCP connection to a client. It’s important to remember that some clients will be using HTTP 1.1, which supports keepalive. This means that their TCP connections may remain open across multiple HTTP requests.

The request, checkContinue, upgrade, and clientError events are associated with HTTP requests. We’ve already used the request event, which signals a new HTTP request.

The checkContinue event indicates a special event. It allows you to take more direct control of an HTTP request in which the client streams chunks of data to the server. As the client sends data to the server, it will check whether it can continue, at which point this event will fire. If an event handler is created for this event, the request event will not be emitted.

The upgrade event is emitted when a client asks for a protocol upgrade. The http server will deny HTTP upgrade requests unless there is an event handler for this event.

Finally, the clientError event passes on any error events sent by the client.

The HTTP server can throw a few events. The most common one is request, but you can also get events associated with the TCP connection for the request as well as other parts of the request life cycle.

When a new TCP stream is created for a request, a connection event is emitted. This event passes the TCP stream for the request as a parameter. The stream is also available as a request.connection variable for each request that happens through it. However, only one connection event will be emitted for each stream. This means that many requests can happen from a client with only one connection event.

HTTP Clients

Node is also great when you want to make outgoing HTTP connections. This is useful in many contexts, such as using web services, connecting to document store databases, or just scraping websites. You can use the same http module when doing HTTP requests, but should use the http.ClientRequest class. There are two factory methods for this class: a general-purpose one and a convenience method. Let’s take a look at the general-purpose case in Example 4-9.

Example 4-9. Creating an HTTP request

var http = require('http');

var opts = {
  host: 'www.google.com'
  port: 80,
  path: '/',
  method: 'GET'
};

var req = http.request(opts, function(res) {
  console.log(res);
  res.on('data', function(data) {
    console.log(data);
  });
});

req.end();

The first thing you can see is that an options object defines a lot of the functionality of the request. We must provide the host name (although an IP address is also acceptable), the port, and the path. The method is optional and defaults to a value of GET if none is specified. In essence, the example is specifying that the request should be an HTTP GET request to http://www.google.com/ on port 80.

The next thing we do is use the options object to construct an instance of http.ClientRequest using the factory method http.request(). This method takes an options object and an optional callback argument. The passed callback listens to the response event, and when a response event is received, we can process the results of the request. In the previous example, we simply output the response object to the console. However, it’s important to notice that the body of the HTTP request is actually received via a stream in the response object. Thus, you can subscribe to the data event of the response object to get the data as it becomes available (see the section Readable streams for more information).

The final important point to notice is that we had to end() the request. Because this was a GET request, we didn’t write any data to the server, but for other HTTP methods, such as PUT or POST, you may need to. Until we call the end() method, request won’t initiate the HTTP request, because it doesn’t know whether it should still be waiting for us to send data.

Making HTTP GET requests

Since GET is such a common HTTP use case, there is a special factory method to support it in a more convenient way, as shown in Example 4-10.

Example 4-10. Simple HTTP GET requests

var http = require('http');

var opts = {
  host: 'www.google.com'
  port: 80,
  path: '/',
};

var req = http.get(opts, function(res) {
  console.log(res);
  res.on('data', function(data) {
    console.log(data);
  });
});

This example of http.get() does exactly the same thing as the previous example, but it’s slightly more concise. We’ve lost the method attribute of the config object, and left out the call request.end() because it’s implied.

If you run the previous two examples, you are going to get back raw Buffer objects. As described later in this chapter, a Buffer is a special class defined in Node to support the storage of arbitrary, binary data. Although it’s certainly possible to work with these, you often want a specific encoding, such as UTF-8 (an encoding for Unicode characters). You can specify this with the response.setEncoding() method (see Example 4-11).

Example 4-11. Comparing raw Buffer output to output with a specified encoding

> var http = require('http');
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) { 
... console.log(res); 
... res.on('data', function(c) { console.log(c); }); 
... });
> <Buffer 3c 21 64 6f 63 74 79 70

...

65 2e 73 74>
<Buffer 61 72 74 54 69

...

69 70 74 3e>

>
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) { 
... res.setEncoding('utf8'); 
... res.on('data', function(c) { console.log(c); }); 
... });
> <!doctype html><html><head><meta http-equiv="content-type

...

load.t.prt=(f=(new Date).getTime());
})();
</script>

>

In the first case, we do not pass ClientResponse.setEncoding(), and we get chunks of data in Buffers. Although the output is abridged in the printout, you can see that it isn’t just a single Buffer, but that several Buffers have been returned with data. In the second example, the data is returned as UTF-8 because we specified res.setEncoding('utf8'). The chunks of data returned from the server are still the same, but are given to the program as strings in the correct encoding rather than as raw Buffers. Although the printout may not make this clear, there is one string for each of the original Buffers.

Uploading data for HTTP POST and PUT

Not all HTTP is GET. You might also need to call POST, PUT, and other HTTP methods that alter data on the other end. This is functionally the same as making a GET request, except you are going to write some data upstream, as shown in Example 4-12.

Example 4-12. Writing data to an upstream service

var options = {
  host: 'www.example.com',
  port: 80,
  path: '/submit',
  method: 'POST'
};

var req = http.request(options, function(res) {
  res.setEncoding('utf8');
  res.on('data', function (chunk) {
    console.log('BODY: ' + chunk);
  });
});

req.write("my data");
req.write("more of my data");

req.end();

This example is very similar to Example 4-10, but uses the http.ClientRequest.write() method. This method allows you to send data upstream, and as explained earlier, it requires you to explicitly call http.ClientRequest.end() to indicate you’re finished sending data. Whenever ClientRequest.write() is called, the data is sent upstream (it isn’t buffered), but the server will not respond until ClientRequest.end() is called.

You can stream data to a server using ClientRequest.write() by coupling the writes to the data event of a Stream. This is ideal if you need to, for example, send a file from disk to a remote server over HTTP.

The ClientResponse object

The ClientResponse object stores a variety of information about the request. In general, it is pretty intuitive. Some of its obvious properties that are often useful include statusCode (which contains the HTTP status) and header (which is the response header object). Also hung off of ClientResponse are various streams and properties that you may or may not want to interact with directly.

URL

The URL module provides tools for easily parsing and dealing with URL strings. It’s extremely useful when you have to deal with URLs. The module offers three methods: parse, format, and resolve. Let’s start by looking at Example 4-13, which demonstrates parse using Node REPL.

Example 4-13. Parsing a URL using the URL module

> var URL = require('url');
> var myUrl = "http://www.nodejs.org/some/url/?with=query&param=that&are=awesome
#alsoahash";
> myUrl
'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
> parsedUrl = URL.parse(myUrl);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query: 'with=query&param=that&are=awesome'
, pathname: '/some/url/'
}
> parsedUrl = URL.parse(myUrl, true);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query:
   { with: 'query'
   , param: 'that'
   , are: 'awesome'
   }, pathname: '/some/url/'
}
>

The first thing we do, of course, is require the URL module. Note that the names of modules are always lowercase. We’ve created a url as a string containing all the parts that will be parsed out. Parsing is really easy: we just call the parse method from the URL module on the string. It returns a data structure representing the parts of the parsed URL. The components it produces are:

href
protocol
host
auth
hostname
port
pathname
search
query
hash

The href is the full URL that was originally passed to parse. The protocol is the protocol used in the URL (e.g., http://, https://, ftp://, etc.). host is the fully qualified hostname of the URL. This could be as simple as the hostname for a local server, such as print server, or a fully qualified domain name such as www.google.com. It might also include a port number, such as 8080, or username and password credentials like un:pw@ftpserver.com. The various parts of the hostname are broken down further into auth, containing just the user credentials; port, containing just the port; and hostname, containing the hostname portion of the URL. An important thing to know about hostname is that it is still the full hostname, including the top-level domain (TLD; e.g., .com, .net, etc.) and the specific server. If the URL were http://sport.yahoo.com/nhl, hostname would not give you just the TLD (yahoo.com) or just the host (sport), but the entire hostname (sport.yahoo.com). The URL module doesn’t have the capability to split the hostname down into its components, such as domain or TLD.

The next set of components of the URL relates to everything after the host. The pathname is the entire filepath after the host. In http://sports.yahoo.com/nhl, it is /nhl. The next component is the search component, which stores the HTTP GET parameters in the URL. For example, if the URL were http://mydomain.com/?foo=bar&baz=qux, the search component would be ?foo=bar&baz=qux. Note the inclusion of the ?. The query parameter is similar to the search component. It contains one of two things, depending on how parse was called.

parse takes two arguments: the url string and an optional Boolean that determines whether the queryString should be parsed using the querystring module, discussed in the next section. If the second argument is false, query will just contain a string similar to that of search but without the leading ?. If you don’t pass anything for the second argument, it defaults to false.

The final component is the fragment portion of the URL. This is the part of the URL after the #. Commonly, this is used to refer to named anchors in HTML pages. For instance, http://abook.com/#chapter2 might refer to the second chapter on a web page hosting a whole book. The hash component in this case would contain #chapter2. Again, note the included # in the string. Some sites, such as http://twitter.com, use more complex fragments for AJAX applications, but the same rules apply. So the URL for the Twitter mentions account, http://twitter.com/#!/mentions, would have a pathname of / but a hash of #!/mentions.

querystring

The querystring module is a very simple helper module to deal with query strings. As discussed in the previous section, query strings are the parameters encoded at the end of a URL. However, when reported back as just a JavaScript string, the parameters are fiddly to deal with. The querystring module provides an easy way to create objects from the query strings. The main methods it offers are parse and decode, but some internal helper functions, —such as escape, unescape, unescapeBuffer, encode, and stringify, are also exposed. If you have a query string, you can use parse to turn it into an object, as shown in Example 4-14.

Example 4-14. Parsing a query string with the querystring module in Node REPL

> var qs = require('querystring');
> qs.parse('a=1&b=2&c=d');
{ a: '1', b: '2', c: 'd' }
>

Here, the class’s parse function turns the query string into an object in which the properties are the keys and the values correspond to the ones in the query string. You should notice a few things, though. First, the numbers are returned as strings, not numbers. Because JavaScript is loosely typed and will coerce a string into a number in a numerical operation, this works pretty well. However, it’s worth bearing in mind for those times when that coercion doesn’t work.

Additionally, it’s important to note that you must pass the query string without the leading ? that demarks it in the URL. A typical URL might look like http://www.bobsdiscount.com/?item=304&location=san+francisco. The query string starts with a ? to indicate where the filepath ends, but if you include the ? in the string you pass to parse, the first key will start with a ?, which is almost certainly not what you want.

This library is really useful in a bunch of contexts because query strings are used in situations other than URLs. When you get content from an HTTP POST that is x-form-encoded, it will also be in query string form. All the browser manufacturers have standardized around this approach. By default, forms in HTML will send data to the server in this way also.

The querystring module is also used as a helper module to the URL module. Specifically, when decoding URLs, you can ask URL to turn the query string into an object for you rather than just a string. That’s described in more detail in the previous section, but the parsing that is done uses the parse method from querystring.

Another important part of querystring is encode (Example 4-15). This function takes a query string’s key-value pair object and stringifies it. This is really useful when you’re working with HTTP requests, especially POST data. It makes it easy to work with a JavaScript object until you need to send the data over the wire and then simply encode it at that point. Any JavaScript object can be used, but ideally you should use an object that has only the data that you want in it because the encode method will add all properties of the object. However, if the property value isn’t a string, Boolean, or number, it won’t be serialized and the key will just be included with an empty value.

Example 4-15. Encoding an object into a query string

> var myObj = {'a':1, 'b':5, 'c':'cats', 'func': function(){console.log('dogs')}}
> qs.encode(myObj);
'a=1&b=5&c=cats&func='
>

^[7]When we talk about a pseudoclass, we are referring to the definition found in Douglas Crockford’s JavaScript: The Good Parts (O’Reilly). From now on, we will use “class” to refer to a “pseudoclass.”

^[8]This works in JavaScript because it supports first-class functions.

Table of Contents for Node: Up and Running