One of the core tasks of Node.js is to act as a web server. This is such a key part of the system that when Ryan Dahl started the project, he rewrote the HTTP stack for V8 to make it nonblocking. Although both the API and the internals for the original HTTP implementation have morphed a lot since it was created, the core activities are still the same. The Node implementation of HTTP is nonblocking and fast. Much of the code has moved from C into JavaScript.
HTTP uses a pattern that is common in Node.
Pseudoclass factories provide an easy way to create a new server.[7] The http.createServer()
method provides us with a new instance of the HTTP
Server class, which is the class we use to define the
actions taken when Node receives incoming HTTP requests. Two other pieces
are central to the HTTP module, and to Node modules in general:
the events the Server class
fires and the data structures that are passed to the callbacks. Knowing
about all three (the class, its events, and the callback data structures) allows you to use the HTTP module
well.
Acting as an HTTP server is probably the most common current use case for Node. In Chapter 1, we set up an HTTP server and used it to serve a very simple request. However, HTTP is a lot more multifaceted than that. The server component of the HTTP module provides the raw tools to build complex and comprehensive web servers. In this chapter, we are going to explore the mechanics of dealing with requests and issuing responses. Even if you end up using a higher-level server such as Express, many of the concepts it uses are extensions of those defined here.
As we’ve already seen, the first step in
using HTTP servers is to create a new server using the http.createServer() method. This returns a new instance of the Server class, which
has only a few methods because most of the functionality is going to be
provided through using events. The http server class has six events and three
methods. The other thing to notice is how most of the methods are used
to initialize the server, whereas events are used during its
operation.
Let’s start by creating the smallest basic HTTP server code we can in Example 4-7.
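A sketch of such a server, reconstructed from the description that follows rather than copied from the original listing, might look like this. The response body and the server variable are assumptions of the sketch; listen() returns the server, and the result is kept only so that the server can be shut down later.

```javascript
// Chain straight off require(): the http module itself is never assigned.
// The handler passed to createServer() is attached to the 'request' event.
var server = require('http').createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello world\n');
}).listen(8125);
```

With the script running, a request to port 8125 on the local machine should return the plain-text greeting.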
This example is not
good code. However, it illustrates some important points. We’ll fix the
style shortly. The first thing we do is require the http module. Notice how we can chain methods
to access the module without first assigning it to a variable. Many
things in Node return an object or a function,[8] which allows us to use the result immediately.
From the included http module, we
call createServer. This doesn’t have
to take any arguments, but we pass it a function to attach to the
request event. Finally, we tell the
server created with createServer to
listen on port 8125.
We hope you never write code like this in real situations, but it does show the flexibility of the syntax and the potential brevity of the language. Let’s be a lot more explicit about our code. The rewrite in Example 4-8 should make it a lot easier to understand and maintain.
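A more explicit version, again a sketch based on the surrounding description rather than the original listing, could be written as follows. The handler name handleRequest and the response body are assumptions.

```javascript
var http = require('http');

// Create the server without a listener; it is attached explicitly below.
var server = http.createServer();

// Named handler for the 'request' event, easier to find than an
// anonymous callback buried in a chain.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello world\n');
}

// on() comes from EventEmitter and registers the listener.
server.on('request', handleRequest);
server.listen(8125);
```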
This example implements the minimal web
server again. However, we’ve started assigning things to named
variables. This not only makes the code easier to read than when it’s
chained, but also means you can reuse it. For example, it’s not uncommon
to use http more than once in a file.
You might want both an HTTP server and an HTTP client, so reusing the
module object is really helpful. Even though JavaScript doesn’t force
you to think about memory, that doesn’t mean you should thoughtlessly
litter unnecessary objects everywhere. We’ve also named the function that
handles the request event rather than using an anonymous callback. This is less about memory usage
and more about readability. We’re not saying you shouldn’t use anonymous
functions, but if you can lay out your code so it’s easy to find, that
helps a lot when maintaining it.
Remember to look at Part I of the book for more help with programming style. Chapters 1 and 2 deal with programming style in particular.
Because we didn’t pass the request event listener as part of the factory
method for the http Server object, we
need to add an event listener explicitly. Calling the on method from EventEmitter does this. Finally, as with the
previous example, we call the listen method with the port we want to
listen on. The http module provides
other functions, but this example illustrates the most important
ones.
The http
server supports a number of events, which are associated with either the
TCP or HTTP connection to the client. The connection and close events indicate the buildup or teardown of a TCP connection to a
client. It’s important to remember that some clients will be using HTTP
1.1, which supports keepalive. This means that their TCP connections may
remain open across multiple HTTP requests.
The request, checkContinue, upgrade, and clientError events are associated with HTTP requests. We’ve already used the
request event, which signals a new
HTTP request.
The checkContinue event is a special case.
It allows you to take more direct control of an HTTP request in which
the client asks for permission before streaming its data to the server. Such a client
sends an Expect: 100-continue header and waits to hear whether it can continue, at which
point this event will fire. If an event handler is created for this
event, the request event will
not be emitted for that request.
The upgrade event is emitted when a client asks
for a protocol upgrade. The http
server will deny HTTP upgrade requests unless there is an event handler
for this event.
Finally, the clientError event passes on any error events
sent by the client.
The HTTP server can emit a few events. The
most common one is request, but you
can also get events associated with the TCP connection for the request as well as
other parts of the request life cycle.
When a new TCP stream is created for a
request, a connection event is
emitted. This event passes the TCP stream for the request as a
parameter. The stream is also available as the request.connection property of each request
that happens through it. However, only one connection event will be emitted for each
stream. This means that many requests
can happen from a client with only one connection
event.
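The relationship between connection and request events can be observed with a small experiment. The sketch below is an illustration under assumptions, not part of the original chapter: it starts a server on an ephemeral local port, sends two sequential requests through a client agent limited to one kept-alive socket, and reports how many of each event the server saw. The countEvents helper, the agent settings, and the port choice are all hypothetical.

```javascript
var http = require('http');

// Sends two sequential requests over one keep-alive socket and reports
// how many 'connection' and 'request' events the server emitted.
function countEvents(done) {
  var counts = { connection: 0, request: 0 };

  var server = http.createServer(function (req, res) {
    counts.request += 1;
    res.end('ok');
  });
  server.on('connection', function (socket) {
    counts.connection += 1;
  });

  // A single socket, kept alive between requests (assumed client setup).
  var agent = new http.Agent({ keepAlive: true, maxSockets: 1 });

  function send(remaining) {
    if (remaining === 0) {
      agent.destroy(); // drop the kept-alive socket so the server can close
      server.close(function () { done(counts); });
      return;
    }
    var options = {
      host: '127.0.0.1',
      port: server.address().port,
      path: '/',
      agent: agent
    };
    http.get(options, function (res) {
      res.resume(); // discard the body
      res.on('end', function () { send(remaining - 1); });
    });
  }

  server.listen(0, '127.0.0.1', function () { send(2); });
}

countEvents(function (counts) {
  console.log(counts);
});
```

If the socket is reused as intended, the server should report one connection and two requests.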
Node is also great when you want to make
outgoing HTTP connections. This is useful in many contexts, such as
using web services, connecting to document store databases, or just
scraping websites. You can use the same http module when making HTTP requests, but
you work with the http.ClientRequest
class rather than the Server class. There are two factory methods for this class: a
general-purpose one and a convenience method. Let’s take a look at the
general-purpose case in Example 4-9.
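A sketch of the general-purpose case follows. It is not the original listing: so that it runs without network access, it points the options object at a throwaway local server instead of www.google.com, and the requestDemo wrapper, the local server, and its reply are assumptions made for illustration.

```javascript
var http = require('http');

function requestDemo(done) {
  // Throwaway local server so the sketch runs offline (an assumption; the
  // example in the text targets www.google.com on port 80).
  var target = http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain', 'Connection': 'close' });
    res.end(req.method + ' ' + req.url);
  });

  target.listen(0, '127.0.0.1', function () {
    // The options object describes the request.
    var options = {
      host: '127.0.0.1',
      port: target.address().port,
      path: '/',
      method: 'GET'
    };

    // http.request() returns a ClientRequest; the callback listens for
    // the 'response' event.
    var req = http.request(options, function (res) {
      var body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        target.close();
        done(null, res.statusCode, body);
      });
    });
    req.on('error', done);

    // Nothing to write for a GET, but the request must still be ended.
    req.end();
  });
}

requestDemo(function (err, status, body) {
  if (err) throw err;
  console.log(status, body);
});
```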
The first thing you can see is that an
options object defines a lot of the
functionality of the request. We must provide the host name (although an IP address is also
acceptable), the port, and the
path. The method is optional and defaults to a value of
GET if none is specified. In essence,
the example is specifying that the request should be an HTTP GET request to http://www.google.com/ on port 80.
The next thing
we do is use the options object to
construct an instance of http.ClientRequest using
the factory method http.request().
This method takes an options
object and an optional callback argument. The passed callback listens to
the response event, and when a response
event is received, we can process the results of the request. In the
previous example, we simply output the response object to the console.
However, it’s important to notice that the body of the HTTP response is
actually received via a stream in the response object. Thus, you can subscribe to
the data event of the response object to get the data as it becomes
available (see the section Readable streams for more
information).
The final important point to notice is that
we had to end() the request. Because this was a GET request, we didn’t write any data to the
server, but for other HTTP methods,
such as PUT or POST, you may need to. Until we call the
end() method, request won’t initiate the HTTP request, because it doesn’t know whether
it should still be waiting for us to send data.
Since GET is such a common HTTP use case, there is a special factory method to support it in
a more convenient way, as shown in Example 4-10.
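A sketch of the convenience method, under the same assumption of a throwaway local server standing in for a real site (the getDemo wrapper and the server reply are hypothetical):

```javascript
var http = require('http');

function getDemo(done) {
  // Local stand-in server (assumed) that reports the method it received.
  var target = http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain', 'Connection': 'close' });
    res.end('method was ' + req.method);
  });

  target.listen(0, '127.0.0.1', function () {
    // No method property and no call to end(): http.get() implies both.
    var options = { host: '127.0.0.1', port: target.address().port, path: '/' };

    http.get(options, function (res) {
      var body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        target.close();
        done(body);
      });
    });
  });
}

getDemo(function (body) {
  console.log(body);
});
```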
This example of http.get() does exactly the same thing as
the previous example, but it’s slightly more concise. We’ve lost the
method attribute of the config
object, and left out the call request.end() because it’s implied.
If you run the previous two examples, you
are going to get back raw Buffer
objects. As described later in this chapter, a Buffer is a special
class defined in Node to support the storage of arbitrary, binary
data. Although it’s certainly possible to work with these, you often
want a specific encoding, such as UTF-8 (an encoding for Unicode
characters). You can specify this with the response.setEncoding() method (see Example 4-11).
Example 4-11. Comparing raw Buffer output to output with a specified encoding
> var http = require('http');
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) {
... console.log(res);
... res.on('data', function(c) { console.log(c); });
... });
> <Buffer 3c 21 64 6f 63 74 79 70
...
65 2e 73 74>
<Buffer 61 72 74 54 69
...
69 70 74 3e>
>
> var req = http.get({host:'www.google.com', port:80, path:'/'}, function(res) {
... res.setEncoding('utf8');
... res.on('data', function(c) { console.log(c); });
... });
> <!doctype html><html><head><meta http-equiv="content-type
...
load.t.prt=(f=(new Date).getTime());
})();
</script>
>
In the first case, we do not call ClientResponse.setEncoding(), and we get
chunks of data in Buffers. Although
the output is abridged in the printout, you can see that it isn’t just
a single Buffer, but that several
Buffers have been returned with
data. In the second example, the data is returned as UTF-8 because we
specified res.setEncoding('utf8').
The chunks of data returned from the server are still the same, but
are given to the program as strings
in the correct encoding rather than as raw Buffers. Although the printout may not make
this clear, there is one string for
each of the original Buffers.
Not all HTTP is GET. You might also need to call POST,
PUT, and other HTTP methods that alter data on the other
end. This is functionally the same as making a GET request, except you are going to write
some data upstream, as shown in Example 4-12.
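A sketch of writing data upstream follows. It is an illustration rather than the original listing: the local echo server, the postDemo wrapper, and the strings being written are assumptions.

```javascript
var http = require('http');

function postDemo(done) {
  // Local echo server (assumed): replies with the method and body it was sent.
  var target = http.createServer(function (req, res) {
    var received = '';
    req.setEncoding('utf8');
    req.on('data', function (chunk) { received += chunk; });
    req.on('end', function () {
      res.writeHead(200, { 'Content-Type': 'text/plain', 'Connection': 'close' });
      res.end(req.method + ':' + received);
    });
  });

  target.listen(0, '127.0.0.1', function () {
    var options = {
      host: '127.0.0.1',
      port: target.address().port,
      path: '/',
      method: 'POST'
    };

    var req = http.request(options, function (res) {
      var body = '';
      res.setEncoding('utf8');
      res.on('data', function (chunk) { body += chunk; });
      res.on('end', function () {
        target.close();
        done(body);
      });
    });

    // Each write() sends data upstream; end() signals that we are finished.
    req.write('my data');
    req.write('more of my data');
    req.end();
  });
}

postDemo(function (body) {
  console.log(body);
});
```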
This example
is very similar to Example 4-10, but uses
the http.ClientRequest.write() method. This
method allows you to send data upstream, and as explained earlier, it
requires you to explicitly call http.ClientRequest.end() to indicate
you’re finished sending data. Whenever ClientRequest.write() is called, the data is
sent upstream (it isn’t buffered), but the server will not respond
until ClientRequest.end() is
called.
You can stream data to a server using
ClientRequest.write() by coupling
the writes to the data event of a
Stream. This is ideal if you need
to, for example, send a file from disk to a remote server over
HTTP.
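Coupling a stream’s data events to write() might be sketched as follows. The upload and uploadDemo helpers, the local server, and the in-memory stream (standing in for a file stream such as one from fs.createReadStream()) are all assumptions, and backpressure is ignored to keep the sketch short.

```javascript
var http = require('http');
var Readable = require('stream').Readable;

// Forwards every chunk a readable stream emits to an HTTP request, then
// ends the request once the stream is exhausted.
function upload(source, options, callback) {
  var req = http.request(options, function (res) {
    var body = '';
    res.setEncoding('utf8');
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () { callback(null, body); });
  });
  req.on('error', callback);
  source.on('data', function (chunk) { req.write(chunk); });
  source.on('end', function () { req.end(); });
}

function uploadDemo(done) {
  // Local server (assumed) that reports how many bytes it received.
  var target = http.createServer(function (req, res) {
    var size = 0;
    req.on('data', function (chunk) { size += chunk.length; });
    req.on('end', function () {
      res.writeHead(200, { 'Connection': 'close' });
      res.end('received ' + size + ' bytes');
    });
  });

  target.listen(0, '127.0.0.1', function () {
    var options = {
      host: '127.0.0.1',
      port: target.address().port,
      path: '/upload',
      method: 'PUT'
    };
    // In-memory stand-in for a file read from disk.
    var source = Readable.from(['first chunk, ', 'second chunk']);
    upload(source, options, function (err, body) {
      target.close();
      done(err, body);
    });
  });
}

uploadDemo(function (err, body) {
  if (err) throw err;
  console.log(body);
});
```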
The ClientResponse object stores a variety of information about the response. In general,
it is pretty intuitive. Some of its obvious properties that are often
useful include statusCode (which contains the HTTP
status) and headers (which is
the response header object). Also hung off of ClientResponse are various streams and
properties that you may or may not want to interact with directly.
The URL
module provides tools for easily parsing and dealing with URL
strings. It’s extremely useful when you have to deal with URLs. The
module offers three methods: parse,
format, and resolve. Let’s start by looking at Example 4-13,
which demonstrates parse
using Node REPL.
Example 4-13. Parsing a URL using the URL module
> var URL = require('url');
> var myUrl = "http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash";
> myUrl
'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
> parsedUrl = URL.parse(myUrl);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query: 'with=query&param=that&are=awesome'
, pathname: '/some/url/'
}
> parsedUrl = URL.parse(myUrl, true);
{ href: 'http://www.nodejs.org/some/url/?with=query&param=that&are=awesome#alsoahash'
, protocol: 'http:'
, slashes: true
, host: 'www.nodejs.org'
, hostname: 'www.nodejs.org'
, hash: '#alsoahash'
, search: '?with=query&param=that&are=awesome'
, query:
{ with: 'query'
, param: 'that'
, are: 'awesome'
}, pathname: '/some/url/'
}
>
The first thing we do, of course, is require
the URL module. Note that the names
of modules are always lowercase. We’ve created a url as a string containing all the parts that
will be parsed out. Parsing is really easy: we just call the parse method from the URL module on the string. It returns a data
structure representing the parts of the parsed URL. The components it
produces are href, protocol, host, auth, hostname, port, pathname, search, query, and hash.
The href
is the full URL that was originally
passed to parse. The protocol is the
protocol used in the URL, including the trailing colon but not the slashes (e.g.,
http:, https:, or ftp:). host is the fully qualified hostname of the
URL. This could be as simple as the
hostname for a local server, such as
printserver, or a fully qualified domain name such as www.google.com. It might also include a port
number, such as 8080, or username and
password credentials like un:pw@ftpserver.com. The various parts of the
hostname are broken down further into auth, containing just the user credentials;
port, containing just the port; and
hostname, containing the hostname
portion of the URL. An important
thing to know about hostname is that
it is still the full hostname, including the top-level domain (TLD;
e.g., .com or .net) and the specific server. If the
URL were http://sports.yahoo.com/nhl, hostname would not give you just the domain
(yahoo.com) or just the host
(sports), but the entire hostname
(sports.yahoo.com). The URL module doesn’t have the capability to
split the hostname down into its components, such as domain or
TLD.
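The breakdown into auth, port, and hostname is easy to check in code. This is a minimal sketch, assuming the url.parse() API described here (newer Node releases treat it as a legacy interface) and a made-up FTP address built from the credentials example above.

```javascript
var url = require('url');

// Hypothetical address combining credentials, host, and port.
var ftp = url.parse('ftp://un:pw@ftpserver.com:8080/pub/file.txt');
console.log(ftp.auth);     // 'un:pw'
console.log(ftp.port);     // '8080' (a string, not a number)
console.log(ftp.hostname); // 'ftpserver.com'

// hostname is always the whole name, never just the domain or the host.
var yahoo = url.parse('http://sports.yahoo.com/nhl');
console.log(yahoo.hostname); // 'sports.yahoo.com'
console.log(yahoo.pathname); // '/nhl'
```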
The next set of components of the URL
relates to everything after the host.
The pathname is the entire filepath
after the host. In http://sports.yahoo.com/nhl, it is /nhl. The next component is the search component, which stores the HTTP GET parameters in the URL. For example,
if the URL were http://mydomain.com/?foo=bar&baz=qux, the
search component would be ?foo=bar&baz=qux. Note the inclusion of
the ?. The query component is similar to the search component. It contains one of two
things, depending on how parse was
called.
parse
takes two arguments: the url string
and an optional Boolean that determines whether the queryString should be parsed using the
querystring module, discussed in the
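next section. If the second argument is false, query will just contain a string similar to
that of search but without the
leading ?. If it is true, query will instead contain an object of key-value pairs, as in the second call in Example 4-13. If you don’t pass anything
for the second argument, it defaults to false.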
The final component is the fragment portion of the URL. This is the part
of the URL after the #. Commonly,
this is used to refer to named anchors in HTML pages. For instance, http://abook.com/#chapter2 might refer to the
second chapter on a web page hosting a whole book. The hash component in this case would contain
#chapter2. Again, note the included
# in the string. Some sites, such as
http://twitter.com, use more complex
fragments for AJAX applications, but the same rules apply. So the URL
for the Twitter mentions page, http://twitter.com/#!/mentions, would have a
pathname of / but a hash of #!/mentions.
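The search, query, and hash components can be compared side by side. In this minimal sketch the combined address is an assumption, assembled from the examples used in the text.

```javascript
var url = require('url');

// Hypothetical URL with both a query string and a fragment.
var address = 'http://mydomain.com/?foo=bar&baz=qux#chapter2';

// Second argument omitted: query stays a plain string.
var plain = url.parse(address);
console.log(plain.search); // '?foo=bar&baz=qux'
console.log(plain.query);  // 'foo=bar&baz=qux'
console.log(plain.hash);   // '#chapter2'

// Second argument true: query is parsed into an object.
var parsed = url.parse(address, true);
console.log(parsed.query.foo); // 'bar'
console.log(parsed.query.baz); // 'qux'

// A fragment that looks like a path is still only a fragment.
var mentions = url.parse('http://twitter.com/#!/mentions');
console.log(mentions.pathname); // '/'
console.log(mentions.hash);     // '#!/mentions'
```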
The querystring module is a very simple helper module to deal with query strings.
As discussed in the previous section, query strings are the parameters
encoded at the end of a URL. However, when reported back as just a
JavaScript string, the parameters are fiddly to deal with. The querystring module provides an easy way to
create objects from the query strings. The main methods it offers are parse and
stringify (also available under the aliases decode and encode), but some internal helper
functions, such as escape,
unescape, and unescapeBuffer, are also exposed. If you have a
query string, you can use parse to
turn it into an object, as shown in Example 4-14.
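A minimal sketch of parsing, using a query string borrowed from the URL discussed later in this section (the variable names are assumptions):

```javascript
var querystring = require('querystring');

var params = querystring.parse('item=304&location=san+francisco');

console.log(params.item);        // '304'
console.log(typeof params.item); // 'string'
console.log(params.location);    // 'san francisco'

// Coercion works for arithmetic, but + concatenates instead of adding.
console.log(params.item * 2);    // 608
console.log(params.item + 1);    // '3041'
```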
Here, the module’s parse function turns the query string into an
object in which the properties are the keys and the values correspond to
the ones in the query string. You should notice a few things, though.
First, the numbers are returned as strings, not numbers. Because
JavaScript is loosely typed and will coerce a string into a number in a
numerical operation, this works pretty well. However, it’s worth bearing
in mind for those times when that coercion doesn’t work.
Additionally, it’s important to note that
you must pass the query string without the leading ? that demarks it in the URL. A typical URL
might look like http://www.bobsdiscount.com/?item=304&location=san+francisco.
The query string starts with a ? to
indicate where the filepath ends, but if you include the ? in the string you pass to parse, the first key will start with a
?, which is almost certainly not what
you want.
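The effect of the leading ? is easy to demonstrate. This sketch also shows one way to avoid it, by letting the URL module split off the query string first; the variable names are assumptions.

```javascript
var querystring = require('querystring');
var url = require('url');

// Passing the ? along corrupts the first key.
var withMark = querystring.parse('?item=304&location=san+francisco');
console.log(Object.keys(withMark)); // [ '?item', 'location' ]

// url.parse() exposes the query string without the ?.
var address = 'http://www.bobsdiscount.com/?item=304&location=san+francisco';
var clean = querystring.parse(url.parse(address).query);
console.log(Object.keys(clean));    // [ 'item', 'location' ]
```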
This library is really useful in a bunch of
contexts because query strings are used in situations other than URLs.
When you get content from an HTTP
POST that is form-encoded (application/x-www-form-urlencoded), it
will also be in query string form. All the browser manufacturers have
standardized around this approach. By default, forms in HTML will send
data to the server in this way also.
The querystring module is also used as a helper
module to the URL module.
Specifically, when decoding URLs, you can ask URL to turn the query string into an object
for you rather than just a string. That’s described in more detail in
the previous section, but the parsing that is done uses the parse method from querystring.
Another important part of querystring is encode (Example 4-15).
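A minimal sketch of encoding, with a hypothetical payload object chosen to include a nested object as well as the simple value types:

```javascript
var querystring = require('querystring');

// Hypothetical data: a number, a string, a Boolean, and a nested object.
var payload = {
  item: 304,
  location: 'san francisco',
  inStock: true,
  extra: { nested: 1 }
};

var encoded = querystring.encode(payload);
console.log(encoded);
// 'item=304&location=san%20francisco&inStock=true&extra='
```

The nested object is not serialized; its key appears with an empty value.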
This function takes a query string’s key-value pair object and
stringifies it. This is really useful when you’re working with HTTP requests, especially POST data. It makes it easy to work with a
JavaScript object until you need to send the data over the wire and then
simply encode it at that point. Any JavaScript object can be used, but
ideally you should use an object that has only the data that you want in
it because the encode method will add
all properties of the object. However, if the property value isn’t a
string, Boolean, or number, it won’t be serialized and the key will just
be included with an empty value.
[7] When we talk about a pseudoclass, we are referring to the definition found in Douglas Crockford’s JavaScript: The Good Parts (O’Reilly). From now on, we will use “class” to refer to a “pseudoclass.”
[8] This works in JavaScript because it supports first-class functions.