Chapter 1 provided a first look at a Node application with the traditional (and always entertaining) Hello, World application. The examples in the chapter made use of a couple of modules from what is known as the Node core: the API providing much of the functionality necessary for building Node applications.
In this chapter, I’m going to provide more detail on the Node core system. It’s not an exhaustive overview, since the API is quite large and dynamic in nature. Instead, we’ll focus on key elements of the API, and take a closer look at those that we’ll use in later chapters and/or are complex enough to need a more in-depth review.
Topics covered in this chapter include:
Node global objects, such as global, process, and Buffer
The timer methods, such as setTimeout
A quick overview of socket and stream modules and functionality
The Utilities object, especially the part it plays in Node inheritance
The EventEmitter object and events
Node.js documentation for the current stable release is available at http://nodejs.org/api/.
There are several objects available to all Node applications without the user having to incorporate any module. The Node.js website groups these items under the descriptive label of globals.
We’ve been using one global, require, to include modules into our
applications. We’ve also made extensive use of another global, console, to log messages to the console. Other
globals are essential to the underlying implementation of Node, but aren’t
necessarily anything we’d access or
need to know about directly. Some, though, are important enough for us to
take a closer look at, because they help define key aspects of how Node
works.
In particular, we’re going to explore:
The global object—that is,
the global namespace
The process object, which
provides essential functionality, such as wrappers for the three
STDIO (Standard IO) streams, and functionality to
transform a synchronous function into an asynchronous callback
The Buffer class, a global
object that provides raw data storage and manipulation
Child processes
Modules useful for domain resolution and URL processing
global is the global namespace object. In some ways, it’s similar to window in a browser environment, in that it provides access to global properties and methods and doesn’t have to be explicitly referenced by name.
From REPL, you can print out the global object to the console like so:
> console.log(global)
What prints out is the interface for all of the other global objects, as well as a good deal of information about the system in which you’re running.
I mentioned that global is like the window object in a browser, but there are key differences, and not just in the methods and properties available. The window object in a browser is truly global in nature. If you define a global variable in client-side JavaScript, it’s accessible by the web page and by every single library. However, if you create a variable at the top-level scope in a Node module (a variable outside any function), it only becomes global to the module, not to all of the modules.
You can actually see what happens to the global object when you define a module/global
variable in REPL. First, define the top-level variable:
> var test = "This really isn't global, as we know global";
Then print out global:
> console.log(global);
You should see your variable, as a new property of global, at the bottom. For another interesting
perspective, assign global to a
variable, but don’t use the var
keyword:
gl = global;
The global object interface is
printed out to the console, and at the bottom you’ll see the local
variable assigned as a circular reference:
> gl = global;
...
gl: [Circular],
_: [Circular] }
Any other global object or method, including require, is part of the global object’s interface.
When Node developers discuss context, they’re really referring to
the global object. In Example 2-1 in Chapter 2, the code accessed the context object when creating a custom REPL
object. The context object is a global object. When an application creates a
custom REPL, it exists within a new context, which in this case means it
has its own global object. The way to
override this and use the existing global object is to create a custom REPL and
set the useGlobal flag to
true, rather than the default of
false.
Modules exist in their own global namespace, which means that if you define a top-level variable in one module, it is not available in other modules. More importantly, it means that only what is explicitly exported from the module becomes part of whatever application includes the module. In fact, you can’t access a top-level module variable in an application or other module, even if you deliberately try.
To demonstrate, the following code contains a very simple module
that has a top-level variable named globalValue, and functions to set and return
the value. In the function that returns the value, the global object is printed out using a
console.log method
call.
var globalValue;
exports.setGlobal = function(val) {
globalValue = val;
};
exports.returnGlobal = function() {
console.log(global);
return globalValue;
};
We might expect that in the printout of the global object we’ll see globalValue, as we do when we set a variable
in our applications. This doesn’t happen, though.
Start a REPL session and issue a require call to include the new module:
> var mod1 = require('./mod1.js');
Set the value and then ask for the value back:
> mod1.setGlobal(34);
> var val = mod1.returnGlobal();
The console.log method
prints out the global object before
returning its globally defined value. We can see at the bottom the new
variable holding a reference to the imported module, but val is undefined because the variable hasn’t yet been
set. In addition, the output includes no reference to that module’s own
top-level globalValue:
mod1: { setGlobal: [Function], returnGlobal: [Function] },
_: undefined,
val: undefined }
If we ran the command again, then the outer application variable
would be set, but we still wouldn’t see globalValue:
mod1: { setGlobal: [Function], returnGlobal: [Function] },
_: undefined,
val: 34 }
The only access we have to the module data is by whatever means the module provides. For JavaScript developers, this means no more unexpected and harmful data collisions because of accidental or intentional global variables in libraries.
Each Node application is an instance of a Node process object, and as such, comes with
certain built-in functionality.
Many of the process object’s
methods and properties provide identification or information about the
application and its environment. The process.execPath property
returns the execution path for the Node application; process.version
provides the Node version; and process.platform
identifies the server platform:
console.log(process.execPath);
console.log(process.version);
console.log(process.platform);
This code returns the following in my system (at the time of this writing):
/usr/local/bin/node
v0.6.9
linux
The process object also wraps
the STDIO streams stdin, stdout, and stderr. Both stdin and stdout are
asynchronous, and are readable and writable, respectively. stderr, however, is a
synchronous, blocking stream.
To demonstrate how to read and write data from stdin and stdout, in Example 3-1 the Node
application listens for data in stdin, and repeats the
data to stdout. The stdin stream is paused
by default, so we have to issue a resume call before
sending data.
process.stdin.resume();
process.stdin.on('data', function (chunk) {
process.stdout.write('data: ' + chunk);
});
Run the application using Node, and then start typing into the terminal. Every time you type something and press Enter, what you typed is reflected back to you.
Another useful process method is
memoryUsage, which
tells us how much memory the Node application is using. This could be
helpful for performance tuning, or just to satisfy your general
curiosity about the application. The response has the following
structure:
{ rss: 7450624, heapTotal: 2783520, heapUsed: 1375720 }
The heapTotal and
heapUsed properties
refer to the V8 engine’s memory usage.
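A quick sketch of memoryUsage in action; the array size is arbitrary, chosen only to make the heap growth visible:

```javascript
// Snapshot heap usage, allocate a large array, and snapshot again
// to watch heapUsed grow.
var before = process.memoryUsage();

var big = [];
for (var i = 0; i < 1000000; i++) {
   big.push(i);
}

var after = process.memoryUsage();
console.log('heapUsed before: ' + before.heapUsed);
console.log('heapUsed after:  ' + after.heapUsed);
```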
The last process method I’m
going to cover is process.nextTick. This
method attaches a callback function that’s fired during the next tick
(loop) in the Node event loop.
You would use process.nextTick if you
wanted to delay a function for some reason, but you wanted to delay it
asynchronously. A good example would be if you’re creating a new
function that has a callback function as a parameter and you want to
ensure that the callback is truly asynchronous. The following code is a
demonstration:
var asynchFunction = function (data, callback) {
   process.nextTick(function() {
      callback(data);
   });
};
If we just called the callback function directly, then the action would be synchronous. With process.nextTick, the callback function won’t be called until the next tick in the event loop, rather than right away.
You could use setTimeout with a zero
(0) millisecond delay instead of
process.nextTick:
setTimeout(function() {
   callback(data);
}, 0);
However, setTimeout isn’t as
efficient as process.nextTick. When
they were tested against each other, process.nextTick was
called far more quickly than setTimeout with a
zero-millisecond delay. You might also use process.nextTick if
you’re running an application that has a function performing some
computationally complex, and time-consuming, operation. You could break
the process into sections, each called via process.nextTick, to
allow other requests to the Node application to be processed without
waiting for the time-consuming process to finish.
Of course, the converse of this is that you don’t want to break up a process that you need to ensure executes sequentially, because you may end up with unexpected results.
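As a sketch of that chunking pattern (the function and variable names here are illustrative, not part of the Node API), here is a summation split across ticks so other events can be serviced between chunks:

```javascript
// Sum a large array in fixed-size chunks, yielding to the event
// loop between chunks via process.nextTick.
function sumInChunks(values, chunkSize, callback) {
   var total = 0, index = 0;
   function doChunk() {
      var end = Math.min(index + chunkSize, values.length);
      for (; index < end; index++) {
         total += values[index];
      }
      if (index < values.length) {
         process.nextTick(doChunk);   // schedule the next chunk
      } else {
         callback(total);             // all chunks done
      }
   }
   process.nextTick(doChunk);
}

var numbers = [];
for (var i = 1; i <= 1000; i++) {
   numbers.push(i);
}

sumInChunks(numbers, 100, function(total) {
   console.log('sum: ' + total);   // 500500
});
```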
The Buffer class, also
a global object, is a way of handling binary data in Node. In the
section Servers, Streams, and Sockets later
in the chapter, I’ll cover the fact that streams are often binary data
rather than strings. To convert the binary data to a string, the data
encoding for the stream socket is changed using setEncoding.
As a demonstration, you can create a new buffer with the following:
var buf = new Buffer(string);
If the buffer holds a string, you can pass in an optional second parameter with the encoding. Possible encodings are:
ascii: Seven-bit ASCII
utf8: Multibyte encoded Unicode characters
ucs2: Two bytes, little-endian-encoded Unicode characters
base64: Base64 encoding
hex: Encodes each byte as two hexadecimal characters
You can also write a string to an existing buffer, providing an
optional offset, length, and encoding:
buf.write(string); // offset defaults to 0, length defaults to buffer.length - offset, encoding is utf8
Data sent between sockets is transmitted as a buffer (in binary
format) by default. To send a string instead, you either need to call
setEncoding directly on
the socket, or specify the encoding in the function that writes to the
socket. By default, the TCP (Transmission Control Protocol) socket.write method
sets the second parameter to utf8,
but the socket returned in the connectionListener
callback to the TCP createServer function
sends the data as a buffer, not a string.
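A short sketch of buffers and encodings; it round-trips the same bytes through three of the encodings listed earlier:

```javascript
// Create a buffer from a utf8 string, then read the same bytes
// back out in different encodings.
var buf = new Buffer('Hello, World', 'utf8');

console.log(buf.toString('utf8'));    // Hello, World
console.log(buf.toString('base64'));  // SGVsbG8sIFdvcmxk
console.log(buf.toString('hex'));     // 48656c6c6f2c20576f726c64
```

Later Node releases deprecate the new Buffer constructor in favor of Buffer.from, but the encodings behave the same way.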
The timer functions in client-side JavaScript are part of
the global window object. They’re not
part of the JavaScript language itself, but have become such a ubiquitous part of JavaScript
development that the Node developers incorporated them into the Node core
API.
The timer functions operate in Node much as they do in the browser. Note, though, that the V8 engine Node is built on doesn’t provide timers itself; Node supplies its own browser-compatible implementations as part of the core.
The Node setTimeout function takes
a callback function as first parameter, the delay time (in milliseconds)
as second parameter, and an optional list of arguments:
// timer to open file and read contents to HTTP response object
function on_OpenAndReadFile(filename, res) {
   console.log('opening ' + filename);
   // open and read in file contents
   fs.readFile(filename, 'utf8', function(err, data) {
      if (err)
         res.write('Could not find or open file for reading\n');
      else {
         res.write(data);
      }
      // response is done
      res.end();
   });
}

setTimeout(on_OpenAndReadFile, 2000, filename, res);
In the code, the callback function on_OpenAndReadFile opens and reads a file to the
HTTP response when the function is called after approximately 2,000
milliseconds have passed.
As the Node documentation carefully notes, there’s no guarantee
that the callback function will be invoked in exactly
n milliseconds (whatever
n is). This is no different than the use of
setTimeout in a
browser—we don’t have absolute control over the environment, and factors
could slightly delay the timer.
The function clearTimeout clears a
preset setTimeout. If you need
to have a repeating timer, you can use setInterval to call a
function every n
milliseconds—n being the second parameter
passed to the function. Clear the interval with clearInterval.
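A brief sketch of the pair in use; the interval length and tick count are arbitrary:

```javascript
// Fire a callback every 100 milliseconds, stopping after five ticks.
var count = 0;
var timer = setInterval(function() {
   count++;
   console.log('tick ' + count);
   if (count === 5) {
      clearInterval(timer);   // without this, the interval fires forever
      console.log('done');
   }
}, 100);
```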
Much of the Node core API has to do with creating services that listen to specific types of communications. In the examples in Chapter 1, we used the HTTP module to create an HTTP web server. Other methods can create a TCP server, a TLS (Transport Layer Security) server, and a UDP (User Datagram Protocol)/datagram socket. I’ll cover TLS in Chapter 15, but in this section I want to introduce the TCP and UDP Node core functionality. First, though, I’ll offer a brief introduction to the terms used in this section.
A socket is an endpoint in a communication, and a network socket is an endpoint in a communication between applications running on two different computers on the network. The data flows between the sockets in what’s known as a stream. The data in the stream can be transmitted as binary data in a buffer, or in Unicode as a string. Both types of data are transmitted as packets: parts of the data split off into specifically sized pieces. There is a special kind of packet, a finish packet (FIN), that is sent by a socket to signal that the transmission is done. How the communication is managed, and how reliable the stream is, depends on the type of socket created.
We can create a basic TCP server and client with the Node Net module. TCP forms the basis for most Internet applications, such as web service and email. It provides a way of reliably transmitting data between client and server sockets.
Creating the TCP server is a little different than creating the
HTTP server in Example 1-1
in Chapter 1. We create
the server, passing in a callback function. The TCP server differs from
the HTTP server in that, rather than passing a requestListener, the TCP callback function’s
sole argument is an instance of a socket listening for incoming
connections.
Example 3-2 contains the code to create a TCP server. Once the server socket is created, it listens for two events: when data is received, and when the client closes the connection.
var net = require('net');
var server = net.createServer(function(conn) {
console.log('connected');
conn.on('data', function (data) {
console.log(data + ' from ' + conn.remoteAddress + ' ' +
conn.remotePort);
conn.write('Repeating: ' + data);
});
conn.on('close', function() {
console.log('client closed connection');
});
}).listen(8124);
console.log('listening on port 8124');
There is an optional parameter for createServer: allowHalfOpen. Setting
this parameter to true instructs the
socket not to send a FIN when it receives a FIN packet from the client.
Doing this keeps the socket open for writing (not reading). To close the
socket, you’d then need to explicitly use the end method. By default,
allowHalfOpen is
false.
Notice how a callback function is attached to the two events via
the on method. Many objects
in Node that emit events provide a way to attach a function as an event
listener by using the on method. This method
takes the name of the event as first parameter, and the function
listener as the second.
Node objects that inherit from a special object, the EventEmitter, expose the on method for event handling, as discussed later in this chapter.
The TCP client is just as simple to create as the server, as shown
in Example 3-3. The call
to the setEncoding method on
the client changes the encoding for the received data. As discussed
earlier in the section Buffer, data is transmitted as
a buffer, but we can use setEncoding to read it
as a utf8 string. The socket’s
write method is used to
transmit the data. It also attaches listener functions to two events:
data, for received
data, and close, in case the
server closes the connection.
var net = require('net');
var client = new net.Socket();
client.setEncoding('utf8');
// connect to server
client.connect(8124, 'localhost', function () {
console.log('connected to server');
client.write('Who needs a browser to communicate?');
});
// prepare for input from terminal
process.stdin.resume();
// when receive data, send to server
process.stdin.on('data', function (data) {
client.write(data);
});
// when receive data back, print to console
client.on('data',function(data) {
console.log(data);
});
// when server closed
client.on('close',function() {
console.log('connection is closed');
});
The data being transmitted between the two sockets is typed in at
the terminal, and transmitted when you press Enter. The client
application first sends the string you just typed, which the TCP server
writes out to the console. The server repeats the message back to the
client, which in turn writes the message out to the console. The server
also prints out the IP address and port for the client using the
socket’s remoteAddress and
remotePort properties.
Following is the console output for the server after several strings
were sent from the client (with the IP address edited out for
security):
Hey, hey, hey, hey-now. from #ipaddress 57251
Don't be mean, we don't have to be mean. from #ipaddress 57251
Cuz remember, no matter where you go, from #ipaddress 57251
there you are. from #ipaddress 57251
The connection between the client and server is maintained until
you kill one or the other using Ctrl-C. Whichever socket is still open
receives a close event that’s
printed out to the console. The server can also serve more than one
connection from more than one client, since all the relevant functions
are asynchronous.
As I mentioned earlier, TCP is the underlying transport mechanism for much of the functionality we use on the Internet today, including HTTP, which we’ll cover next.
You had a chance to work with the HTTP module in Chapter 1. We created servers using the
createServer method,
passing in the function that will act as the requestListener.
Requests are processed as they come, asynchronously.
In a network, TCP is the transportation layer and HTTP is the
application layer. If you scratch around in the modules included with
Node, you’ll see that when you create an HTTP server, you’re inheriting
functionality from the TCP-based net.Server.
For the HTTP server, the underlying connection is a
socket, while the http.ServerRequest
object is a readable stream and the http.ServerResponse is
a writable stream. HTTP adds another level of complexity because of the
chunked transfer encoding it supports. The chunked
transfer encoding allows transfer of data when the exact size of the
response isn’t known until it’s fully processed. Instead, a zero-sized
chunk is sent to indicate the end of the response. This type of encoding is
useful when you’re processing a request such as a large database query
output to an HTML table: writing the data can begin before the rest of
the query data has been received.
More on streams in the upcoming section titled, appropriately enough, Streams, Pipes, and Readline.
The TCP examples earlier in this chapter, and the HTTP examples in Chapter 1, were both coded to work with network sockets. However, all of the server/socket modules can also connect to a Unix socket, rather than a specific network port. Unlike a network socket, a Unix or IPC (interprocess communication) socket enables communication between processes within the same system.
To demonstrate Unix socket communication, I duplicated Example 1-3’s code, but
instead of binding to a port, the new server binds to a Unix socket, as
shown in Example 3-4. The
application also makes use of readFileSync, the
synchronous version of the function to open a file and read its
contents.
// create server
// and callback function
var http = require('http');
var fs = require('fs');
http.createServer(function (req, res) {
var query = require('url').parse(req.url).query;
console.log(query);
var file = require('querystring').parse(query).file;
// content header
res.writeHead(200, {'Content-Type': 'text/plain'});
// write a series of numbers to the client
for (var i = 0; i<100; i++) {
res.write(i + '\n');
}
// open and read in file contents
var data = fs.readFileSync(file, 'utf8');
res.write(data);
res.end();
}).listen('/tmp/node-server-sock');
The client is based on a code sample provided in the Node core
documentation for the http.request object at
the Node.js site. The http.request object, by
default, makes use of http.globalAgent, which
supports pooled sockets. The size of this pool is five sockets by
default, but you can adjust it by changing the agent.maxSockets
value.
The client accepts the chunked data returned from the server, printing out to the console. It also triggers a response on the server with a couple of minor writes, as shown in Example 3-5.
var http = require('http');
var options = {
method: 'GET',
socketPath: '/tmp/node-server-sock',
path: "/?file=main.txt"
};
var req = http.request(options, function(res) {
console.log('STATUS: ' + res.statusCode);
console.log('HEADERS: ' + JSON.stringify(res.headers));
res.setEncoding('utf8');
res.on('data', function (chunk) {
console.log('chunk o\' data: ' + chunk);
});
});
req.on('error', function(e) {
console.log('problem with request: ' + e.message);
});
// write data to request body
req.write('data\n');
req.write('data\n');
req.end();
I didn’t use the asynchronous file read function with the
http.request object
because the connection is already closed when the asynchronous function
is called and no file contents are returned.
Before leaving this section on the HTTP module, be aware that much of the behavior you’ve come to expect with Apache or other web servers isn’t built into a Node HTTP server. For instance, if you password-protect your website, Apache will pop up a window asking for your username and password; a Node HTTP server will not. If you want this functionality, you’re going to have to code for it.
Chapter 15 covers the SSL version of HTTP, HTTPS, along with Crypto and TLS/SSL.
TCP requires a dedicated connection between the two endpoints of the communication. UDP is a connectionless protocol, which means there’s no guarantee of a connection between the two endpoints. For this reason, UDP is less reliable and robust than TCP. On the other hand, UDP is generally faster than TCP, which makes it more popular for real-time uses, as well as technologies such as VoIP (Voice over Internet Protocol), where the TCP connection requirements could adversely impact the quality of the signal.
Node core supports both types of sockets. In the last couple of sections, I demonstrated the TCP functionality. Now, it’s UDP’s turn.
The UDP module identifier is dgram:
require('dgram');
To create a UDP socket, use the createSocket method,
passing in the type of socket—either udp4 or udp6. You can also pass in a callback function
to listen for events. Unlike messages sent with TCP, messages sent using
UDP must be sent as buffers, not strings.
Example 3-6
contains the code for a demonstration UDP client. In it, data is
accessed via process.stdin, and then
sent, as is, via the UDP socket. Note that we don’t have to set the
encoding for the string, since the UDP socket accepts only a buffer, and
the process.stdin data
is a buffer. We do, however, have to convert the
buffer to a string, using the buffer’s toString method, in
order to get a meaningful string for the console.log method call that echoes the
input.
var dgram = require('dgram');
var client = dgram.createSocket("udp4");
// prepare for input from terminal
process.stdin.resume();
process.stdin.on('data', function (data) {
console.log(data.toString('utf8'));
client.send(data, 0, data.length, 8124, "examples.burningbird.net",
function (err, bytes) {
if (err)
console.log('error: ' + err);
else
console.log('successful');
});
});
The UDP server, shown in Example 3-7, is even simpler
than the client. All the server application does is create the socket,
bind it to a specific port (8124), and listen for the message event. When a
message arrives, the application prints it out using console.log, along with the IP address
and port of the sender. Note especially that no encoding is necessary to
print out the message—it’s automatically converted from a buffer to a
string.
We didn’t have to bind the socket to a port. However, without binding to a known port, the operating system would assign the socket an arbitrary port, and the client would have no fixed address to send its messages to.
var dgram = require('dgram');
var server = dgram.createSocket("udp4");
server.on ("message", function(msg, rinfo) {
console.log("Message: " + msg + " from " + rinfo.address + ":"
+ rinfo.port);
});
server.bind(8124);
I didn’t call the close method on either
the client or the server after sending/receiving the message. However,
no connection is being maintained between the client and server—just the
sockets capable of sending a message and receiving
communication.
The communication stream between the sockets discussed in
the previous sections is an implementation of the underlying abstract
stream interface. Streams can be
readable, writable, or both, and all streams are instances of EventEmitter, discussed
in the upcoming section Events and EventEmitter.
It’s important to take away from this section that all of these
communication streams, including process.stdin and
process.stdout, are
implementations of the abstract stream interface. Because of this underlying
interface, there is basic functionality available in all streams in
Node:
You can change the encoding for the stream data with
setEncoding.
You can check whether the stream is readable, writable, or both.
You can capture stream events, such as data received or connection closed, and attach callback functions for each.
You can pause and resume the stream.
You can pipe data from a readable stream to a writable stream.
The last capability is one we haven’t covered yet. A simple way to demonstrate a pipe is to open a REPL session and type in the following:
> process.stdin.resume();
> process.stdin.pipe(process.stdout);
...and then enjoy the fact that everything you type from that point on is echoed back to you.
If you want to keep the output stream open for continued data,
pass an option, { end: false }, to
the output stream:
process.stdin.pipe(process.stdout, { end : false });
There is one additional object that provides a specific
functionality to readable streams: readline. You include
the Readline module with code like the following:
var readline = require('readline');
The Readline module allows line-by-line reading of a stream. Be
aware, though, that once you include this module, the Node program
doesn’t terminate until you close the interface and the stdin stream. The Node
site documentation contains an example of how to begin and terminate a
Readline interface, which I adapted in Example 3-8. The application
asks a question as soon as you run it, and then outputs the answer. It
also listens for any “command,” which is really any line that terminates
with \n. If the command is
.leave, it leaves the
application; otherwise, it just repeats the command and prompts the user
for more. A Ctrl-C or Ctrl-D key combination also causes the application
to terminate.
var readline = require('readline');
// create a new interface
var interface = readline.createInterface(process.stdin, process.stdout, null);
// ask question
interface.question(">>What is the meaning of life? ", function(answer) {
console.log("About the meaning of life, you said " + answer);
interface.setPrompt(">>");
interface.prompt();
});
// function to close interface
function closeInterface() {
console.log('Leaving interface...');
process.exit();
}
// listen for .leave
interface.on('line', function(cmd) {
if (cmd.trim() == '.leave') {
closeInterface();
return;
} else {
console.log("repeating command: " + cmd);
}
interface.setPrompt(">>");
interface.prompt();
});
interface.on('close', function() {
closeInterface();
});
Here’s an example session:
>>What is the meaning of life? ===
About the meaning of life, you said ===
>>This could be a command
repeating command: This could be a command
>>We could add eval in here and actually run this thing
repeating command: We could add eval in here and actually run this thing
>>And now you know where REPL comes from
repeating command: And now you know where REPL comes from
>>And that using rlwrap replaces this Readline functionality
repeating command: And that using rlwrap replaces this Readline functionality
>>Time to go
repeating command: Time to go
>>.leave
Leaving interface...
This should look familiar. Remember from Chapter 2 that we can use rlwrap to override the
command-line functionality for REPL. We use the following to trigger its
use:
env NODE_NO_READLINE=1 rlwrap node
And now we know what the flag is triggering—it’s instructing REPL
not to use Node’s Readline module for command-line processing, but to
use rlwrap instead.
This is a quick introduction to the Node stream modules. Now it’s time to change course, and check out Node’s child processes.
Operating systems provide access to a great deal of functionality, but much of it is only accessible via the command line. It would be nice to be able to access this functionality from a Node application. That’s where child processes come in.
Node enables us to run a system command within a new child process, and listen in on its input/output. This includes being able to pass arguments to the command, and even pipe the results of one command to another. The next several sections explore this functionality in more detail.
All but the last example demonstrated in this section use Unix commands. They work on a Linux system, and should work in a Mac. They won’t, however, work in a Windows Command window.
There are four different techniques you can use to create
a child process. The most common one is using the spawn method. This
launches a command in a new process, passing in any arguments. In the
following, we create a child process to call the Unix pwd command to print
the current directory. The command takes no arguments:
var spawn = require('child_process').spawn,
pwd = spawn('pwd');
pwd.stdout.on('data', function (data) {
console.log('stdout: ' + data);
});
pwd.stderr.on('data', function (data) {
console.log('stderr: ' + data);
});
pwd.on('exit', function (code) {
console.log('child process exited with code ' + code);
});
Notice the events that are captured on the child process’s
stdout and stderr. If no error
occurs, any output from the command is transmitted to the child
process’s stdout, triggering a
data event on the
process. If an error occurs, such as in the following where we’re
passing an invalid option to the command:
var spawn = require('child_process').spawn,
pwd = spawn('pwd', ['-g']);
Then the error gets sent to stderr, which prints
out the error to the console:
stderr: pwd: invalid option -- 'g'
Try `pwd --help' for more information.
child process exited with code 1
The process exited with a code of 1, which signifies that an error occurred. The
exit code varies depending on the operating system and error. When no
error occurs, the child process exits with a code of 0.
The earlier code demonstrated sending output to the child
process’s stdout and stderr, but what about
stdin? The Node documentation for
child processes includes an example of directing data to stdin. It’s used to
emulate a Unix pipe (|) whereby the result of one command is immediately
directed as input to another command. I adapted the example in order to
demonstrate one of my favorite uses of the Unix pipe—being able to look
through all subdirectories, starting in the local directory, for a file
with a specific word (in this case, test) in its
name:
find . -ls | grep test
Example 3-9
implements this functionality as child processes. Note that the first
command, which performs the find, takes two
arguments, while the second takes just one: the search
term. The data that grep searches arrives via its stdin. Also note that,
unlike the example in the Node documentation, the grep child process’s
stdout encoding is
changed via setEncoding. Otherwise,
when the data is printed out, it would be printed out as a
buffer.
var spawn = require('child_process').spawn,
find = spawn('find',['.','-ls']),
grep = spawn('grep',['test']);
grep.stdout.setEncoding('utf8');
// direct results of find to grep
find.stdout.on('data', function(data) {
grep.stdin.write(data);
});
// now run grep and output results
grep.stdout.on('data', function (data) {
console.log(data);
});
// error handling for both
find.stderr.on('data', function (data) {
console.log('grep stderr: ' + data);
});
grep.stderr.on('data', function (data) {
console.log('grep stderr: ' + data);
});
// and exit handling for both
find.on('exit', function (code) {
if (code !== 0) {
console.log('find process exited with code ' + code);
}
// go ahead and end grep process
grep.stdin.end();
});
grep.on('exit', function (code) {
if (code !== 0) {
console.log('grep process exited with code ' + code);
}
});
When you run the application, you’ll get a listing of all files in the current directory and any subdirectories that contain test in their filename.
All of the example applications up to this point work the same in Node 0.8 as in Node 0.6. Example 3-9 is an exception because of a change in the underlying API.
In Node 0.6, the exit event would not be
emitted until the child process exits and all STDIO pipes are closed. In
Node 0.8, the event is emitted as soon as the child process finishes.
This causes the application to crash, because the grep child process’s STDIO pipe is closed when
it tries to process its data. For the application to work in Node 0.8,
the application needs to listen for the close event on the
find child process, rather than the
exit event:
// and exit handling for both
find.on('close', function (code) {
if (code !== 0) {
console.log('find process exited with code ' + code);
}
// go ahead and end grep process
grep.stdin.end();
});
In Node 0.8, the close event is emitted
when the child process exits and all STDIO pipes are closed.
In addition to spawning a child process, you can also use
child_process.exec and child_process.execFile to run a command in a
shell and buffer the results. The only difference between the two is that execFile executes an application directly from a file, rather
than running a command in a shell.
The first parameter in the two methods is either the command or
the file and its location; the second parameter is options for the
command; and the third is a callback function. The callback function
takes three arguments: error, stdout, and stderr. The data is buffered to stdout if no error occurs.
If the executable file contains:
#!/usr/local/bin/node
console.log(global);
the following application prints out the buffered results:
var execfile = require('child_process').execFile,
child;
child = execfile('./app.js', function(error, stdout, stderr) {
if (error == null) {
console.log('stdout: ' + stdout);
}
});
The last child process method is child_process.fork.
This variation of spawn is for
spawning Node processes.
What sets the child_process.fork
process apart from the others is that there’s an actual communication
channel established to the child process. Note, though, that each
process requires a whole new instance of V8, which takes both time and
memory.
Earlier I warned you that child processes that invoke Unix system commands won’t work with Windows, and vice versa. I know this sounds obvious, but not everyone knows that, unlike with JavaScript in browsers, Node applications can behave differently in different environments.
It wasn’t until recently that the Windows binary installation of
Node even provided access to child processes. You also need to invoke
whatever command you want to run via the Windows command interpreter,
cmd.exe.
Example 3-10
demonstrates running a Windows command. In the application, Windows
cmd.exe is used to
create a directory listing, which is then printed out to the console via
the data event handler.
var cmd = require('child_process').spawn('cmd', ['/c', 'dir\n']);
cmd.stdout.on('data', function (data) {
console.log('stdout: ' + data);
});
cmd.stderr.on('data', function (data) {
console.log('stderr: ' + data);
});
cmd.on('exit', function (code) {
console.log('child process exited with code ' + code);
});
The /c flag passed as the
first argument to cmd.exe instructs it to
carry out the command and then terminate. The application doesn’t work
without this flag. You especially don’t want to pass in the /K flag, which tells
cmd.exe to execute the
command and then remain; if you do, your application won’t
terminate.
I provide more demonstrations of child processes in Chapter 9 and Chapter 12.
The DNS module provides DNS resolution using c-ares, a C library that provides asynchronous DNS requests. It’s used by Node with some of its other modules, and can be useful for applications that need to discover domains or IP addresses.
To discover the IP address given a domain, use the dns.lookup method and
print out the returned IP address:
var dns = require('dns');
dns.lookup('burningbird.net',function(err,ip) {
if (err) throw err;
console.log(ip);
});
The dns.reverse method
returns an array of domain names for a given IP address:
dns.reverse('173.255.206.103', function(err,domains) {
domains.forEach(function(domain) {
console.log(domain);
});
});
The dns.resolve method
returns an array of record types by a given type, such as A, MX, NS, and so on. In the
following code, I’m looking for the name server domains for my domain
name, burningbird.net:
var dns = require('dns');
dns.resolve('burningbird.net', 'NS', function(err,domains) {
domains.forEach(function(domain) {
console.log(domain);
});
});
This returns:
ns1.linode.com
ns3.linode.com
ns5.linode.com
ns4.linode.com
We used the URL module in Example 1-3 in Chapter 1. This simple module provides a way of parsing a URL and returning an object with all of the URL components. Passing in the following URL:
var url = require('url');
var urlObj = url.parse('http://examples.burningbird.net:8124/?file=main');
returns the following JavaScript object:
{ protocol: 'http:',
slashes: true,
host: 'examples.burningbird.net:8124',
port: '8124',
hostname: 'examples.burningbird.net',
href: 'http://examples.burningbird.net:8124/?file=main',
search: '?file=main',
query: 'file=main',
pathname: '/',
path: '/?file=main' }
Each of the components can then be discretely accessed like so:
var qs = urlObj.query; // get the query string
Calling the url.format method
performs the reverse operation:
console.log(url.format(urlObj)); // returns original URL
The URL module is often used with the Query String module. The latter module is a simple utility module that provides functionality to parse a received query string, or prepare a string for use as a query string.
To chunk out the key/value pairs in the query string, use the
querystring.parse method.
The following:
var querystring = require('querystring');
var vals = querystring.parse('file=main&file=secondary&type=html');
results in a JavaScript object that allows for easy access of the individual query string values:
{ file: [ 'main', 'secondary' ], type: 'html' }
Since file is given twice in the
query string, both values are grouped into an array, each of which can be
accessed individually:
console.log(vals.file[0]); // returns main
You can also convert an object into a query string, using querystring.stringify:
var qryString = querystring.stringify(vals);
The Utilities module provides several useful functions. You include this module with:
var util = require('util');
You can use Utilities to test if an object is an array (util.isArray) or regular expression (util.isRegExp), and to format a string (util.format). A new experimental addition to the
module provides functionality to pump data from a readable stream to a
writable stream (util.pump):
util.pump(process.stdin, process.stdout);
However, I wouldn’t type this into REPL, as anything you type from that point on is echoed as soon as you type it—making the session a little awkward.
I make extensive use of util.inspect to get a
string representation of an object. I find it’s a great way to discover
more about an object. The first required argument is the object; the
second optional argument is whether to display the nonenumerable
properties; the third optional argument is the number of times the object
is recursed (depth); and the fourth, also optional, is whether to style
the output in ANSI colors. If you assign a value of null to the depth, it recurses indefinitely (the
default is two times)—as much as necessary to exhaustively inspect the
object. From experience, I’d caution you to be careful using null for the depth because you’re going to get a
large output.
You can use util.inspect in REPL, but
I recommend a simple application, such as the following:
var util = require('util');
var jsdom = require('jsdom');
console.log(util.inspect(jsdom, true, null, true));
When you run it, pipe the result to a file:
node inspectjsdom.js > jsdom.txt
Now you can inspect and reinspect the object interface at your
leisure. Again, if you use null for
depth, expect a large output file.
The Utilities module provides several other methods, but the one
you’re most likely to use is util.inherits. The
util.inherits function takes two
parameters, constructor and
superConstructor. The
result is that the constructor will inherit the functionality from the
superconstructor.
Example 3-11
demonstrates all the nuances associated with using util.inherits. The explanation of the code
follows.
Example 3-11 and its explanation cover some core JavaScript functionality you might already be familiar with. However, it’s important that all readers come away from this section with the same understanding of what’s happening.
var util = require('util');
// define original object
function first() {
var self = this;
this.name = 'first';
this.test = function() {
console.log(self.name);
};
}
first.prototype.output = function() {
console.log(this.name);
};
// inherit from first
function second() {
second.super_.call(this);
this.name = 'second';
}
util.inherits(second,first);
var two = new second();
function third(func) {
this.name = 'third';
this.callMethod = func;
}
var three = new third(two.test);
// all three should output "second"
two.output();
two.test();
three.callMethod();
The application creates three objects named first, second, and third, respectively.
The first object has two methods:
test and output. The test method is defined directly in the object,
while the output method is added later
via the prototype object. The reason I
used both techniques for defining a method on the object is to demonstrate
an important aspect of inheritance with util.inherits (well, of JavaScript, but enabled
by util.inherits).
The second object contains the
following line:
second.super_.call(this);
If we eliminate this line from the second object constructor, any call to output on the second object would succeed, but a call to
test would generate an error and force
the Node application to terminate with a message about test being undefined.
The call method chains the
constructors between the two objects, ensuring that the superconstructor
is invoked as well as the constructor. The superconstructor is the
constructor for the inherited object.
We need to invoke the superconstructor since the test method doesn’t exist until first is created. However, we didn’t need the
call method for the output method, because it’s defined directly on
the first object’s prototype object. When the second object inherits properties from the
first, it also inherits the newly added
method.
If we look under the hood of util.inherits, we see where super_ is defined:
exports.inherits = function(ctor, superCtor) {
ctor.super_ = superCtor;
ctor.prototype = Object.create(superCtor.prototype, {
constructor: {
value: ctor,
enumerable: false,
writable: true,
configurable: true
}
});
});
super_ is assigned as a property
to the second object when util.inherits is called:
util.inherits(second, first);
The third object in the application, third, also has a name property. It doesn’t inherit from either
first or second, but does expect a function passed to it
when it’s instantiated. This function is assigned to its own callMethod property. When the code creates an
instance of this object, the two object
instance’s test method is passed to the
constructor:
var three = new third(two.test);
When three.callMethod is called,
“second” is output, not “third” as you might expect at first glance. And
that’s where the self reference in the
first object comes in.
In JavaScript, this is the object
context, which can change as a method is passed around, or passed to an
event handler. The only way you can preserve data for an object’s method
is to assign this to an object
variable—in this case, self—and then
use the variable within any functions in the object.
Running this application results in the following output:
second
second
second
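The self technique can be isolated into a minimal sketch, using a hypothetical Widget object of my own (not from the example above). When a method is detached from its instance, this no longer points at the instance, but a captured self still does:

```javascript
function Widget() {
  var self = this;              // capture the instance
  this.name = 'widget';
  this.show = function () {
    return self.name;           // survives being detached
  };
  this.showThis = function () {
    return this.name;           // depends entirely on the call site
  };
}

var w = new Widget();
var detached = w.show;
var detachedThis = w.showThis;

console.log(detached());        // widget
console.log(detachedThis());    // undefined: this is no longer w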
Much of this is most likely familiar to you from client-side
JavaScript development, though it’s important to understand the Utilities
part in the inheritance. The next section, which provides an overview of
Node’s EventEmitter, features
functionality that is heavily dependent on the inheritance behavior just
described.
Scratch underneath the surface of many of the Node core
objects, and you’ll find EventEmitter.
Anytime you see an object emit an event, and an event handled with
on, you’re seeing
EventEmitter in action. Understanding
how EventEmitter works and how to use
it are two of the more important components of Node development.
The EventEmitter object is what
provides the asynchronous event handling to objects in Node. To
demonstrate its core functionality, we’ll try a quick test
application.
First, include the Events module:
var events = require('events');
Next, create an instance of EventEmitter:
var em = new events.EventEmitter();
Use the newly created EventEmitter instance to do two essential tasks:
attach an event handler to an event, and emit the actual event. The
on event handler is
triggered when a specific event is emitted. The first parameter to the
method is the name of the event, the second a function to process the
event:
em.on('someevent', function(data) { ... });
The event is emitted on the object, based on some criteria, via the
emit method:
if (somecriteria) {
em.emit('someevent', data);
}
In Example 3-12, we
create an EventEmitter instance that
emits an event, timed, every three
seconds. In the event handler function for this event, a message with a
counter is output to the console.
var eventEmitter = require('events').EventEmitter;
var counter = 0;
var em = new eventEmitter();
setInterval(function() { em.emit('timed', counter++); }, 3000);
em.on('timed', function(data) {
console.log('timed ' + data);
});
Running the application outputs timed event messages to the console until the application is terminated.
This is an interesting example, but not particularly helpful. What
we need is the ability to add EventEmitter functionality to our existing
objects—not use instances of EventEmitter throughout our applications.
To add this necessary EventEmitter functionality to an object, use the
util.inherits method,
described in the preceding section:
util.inherits(someobj, EventEmitter);
By using util.inherits with the
object, you can call the emit method within the
object’s methods, and code event handlers on the object instances:
someobj.prototype.somemethod = function() { this.emit('event'); };
...
someobjinstance.on('event', function() { });
Rather than attempt to decipher how EventEmitter works in the abstract sense, let’s
move on to Example 3-13,
which shows a working example of an object inheriting EventEmitter’s functionality. In the
application, a new object, inputChecker, is created. The constructor takes
two values, a person’s name and a filename. It assigns the person’s name
to an object variable, and also creates a reference to a writable stream
using the File System module’s createWriteStream method
(for more on the File System module, see the sidebar Readable and Writable Stream).
The object also has a method, check, that checks incoming data for specific
commands. One command (wr:) triggers a
write event, another (en:) an end
event. If no command is present, then an echo event is triggered.
The object instance provides event handlers for all three events. It
writes to the output file for the write event, it echoes the input for the
commandless input, and it terminates the application with an end event,
using the process.exit method.
All input comes from standard input (process.stdin).
var util = require('util');
var eventEmitter = require('events').EventEmitter;
var fs = require('fs');
function inputChecker (name, file) {
this.name = name;
this.writeStream = fs.createWriteStream('./' + file + '.txt',
{'flags' : 'a',
'encoding' : 'utf8',
'mode' : 0666});
};
util.inherits(inputChecker,eventEmitter);
inputChecker.prototype.check = function check(input) {
var command = input.toString().trim().substr(0,3);
if (command == 'wr:') {
this.emit('write',input.substr(3,input.length));
} else if (command == 'en:') {
this.emit('end');
} else {
this.emit('echo',input);
}
};
// testing new object and event handling
var ic = new inputChecker('Shelley','output');
ic.on('write', function(data) {
this.writeStream.write(data, 'utf8');
});
ic.on('echo', function( data) {
console.log(this.name + ' wrote ' + data);
});
ic.on('end', function() {
process.exit();
});
process.stdin.resume();
process.stdin.setEncoding('utf8');
process.stdin.on('data', function(input) {
ic.check(input);
});
Note that the EventEmitter functionality also includes the
process.stdin.on event
handler method, since process.stdin is one of
the many Node objects that inherit from EventEmitter.
We don’t have to chain the constructors from the new object to
EventEmitter, as demonstrated in the
earlier example covering util.inherits, because
the functionality we need—on and emit—consists of prototype methods, not object
instance properties.
The on method is really a
shortcut for the EventEmitter.addListener
method, which takes the same parameters. So this:
ic.addListener('echo', function( data) {
console.log(this.name + ' wrote ' + data);
});
is exactly equivalent to:
ic.on('echo', function( data) {
console.log(this.name + ' wrote ' + data);
});
To listen for only the first occurrence of an event, use once:
ic.once(event, function);
When you exceed 10 listeners for an event, you’ll get a warning by
default. Use setMaxListeners, passing
in a number, to change the limit. Use a value of zero
(0) for an unlimited number of
listeners.
Many of the core Node objects, as well as third-party modules, make
use of EventEmitter. In Chapter 4, I’ll demonstrate how to convert the
code in Example 3-13 into a
module.