Chapter 1 provided a first look at a Node application with the traditional (and always entertaining) Hello, World application. The examples in the chapter made use of a couple of modules from what is known as the Node core: the API providing much of the functionality necessary for building Node applications.
In this chapter, I’m going to provide more detail on the Node core system. It’s not an exhaustive overview, since the API is quite large and dynamic in nature. Instead, we’ll focus on key elements of the API, and take a closer look at those that we’ll use in later chapters and/or are complex enough to need a more in-depth review.
Topics covered in this chapter include:
Node global objects, such as global, process, and Buffer
The timer methods, such as setTimeout
A quick overview of socket and stream modules and functionality
The Utilities object, especially the part it plays in Node inheritance
The EventEmitter object and events
Node.js documentation for the current stable release is available at http://nodejs.org/api/.
There are several objects available to all Node applications without the user having to incorporate any module. The Node.js website groups these items under the descriptive label of globals.
We’ve been using one global, require, to include modules into our
applications. We’ve also made extensive use of another global, console, to log messages to the console. Other
globals are essential to the underlying implementation of Node, but aren’t
necessarily anything we’d access or
need to know about directly. Some, though, are important enough for us to
take a closer look at, because they help define key aspects of how Node
works.
In particular, we’re going to explore:
The global object—that is,
the global namespace
The process object, which
provides essential functionality, such as wrappers for the three
STDIO (Standard IO) streams, and functionality to
transform a synchronous function into an asynchronous callback
The Buffer class, a global
object that provides raw data storage and manipulation
Child processes
Modules useful for domain resolution and URL processing
global is the global namespace object. In some ways, it’s similar to window in a browser environment, in that it provides access to global properties and methods and doesn’t have to be explicitly referenced by name.
From REPL, you can print out the global object to the console like so:
> console.log(global)
What prints out is the interface for all of the other global objects, as well as a good deal of information about the system in which you’re running.
I mentioned that global is like the window object in a browser, but there are key differences, and not just in the methods and properties available. The window object in a browser is truly global in nature. If you define a global variable in client-side JavaScript, it’s accessible by the web page and by every single library. However, if you create a variable at the top-level scope in a Node module (a variable outside any function), it only becomes global to the module, not to all of the modules.
You can actually see what happens to the global object when you define a module/global
variable in REPL. First, define the top-level variable:
> var test = "This really isn't global, as we know global";
Then print out global:
> console.log(global);
You should see your variable, as a new property of global, at the bottom. For another interesting
perspective, assign global to a
variable, but don’t use the var
keyword:
gl = global;
The global object interface is
printed out to the console, and at the bottom you’ll see the local
variable assigned as a circular reference:
> gl = global;
...
gl: [Circular],
_: [Circular] }
Any other global object or method, including require, is part of the global object’s interface.
When Node developers discuss context, they’re really referring to
the global object. In Example 2-1 in Chapter 2, the code accessed the context object when creating a custom REPL
object. The context object is a global object. When an application creates a
custom REPL, it exists within a new context, which in this case means it
has its own global object. The way to
override this and use the existing global object is to create a custom REPL and
set the useGlobal flag to
true, rather than the default of
false.
Modules exist in their own global namespace, which means that if you define a top-level variable in one module, it is not available in other modules. More importantly, it means that only what is explicitly exported from the module becomes part of whatever application includes the module. In fact, you can’t access a top-level module variable in an application or other module, even if you deliberately try.
To demonstrate, the following code contains a very simple module
that has a top-level variable named globalValue, and functions to set and return
the value. In the function that returns the value, the global object is printed out using a
console.log method
call.
var globalValue;
exports.setGlobal = function(val) {
globalValue = val;
};
exports.returnGlobal = function() {
console.log(global);
return globalValue;
};
We might expect that in the printout of the global object we’ll see globalValue, as we do when we set a variable
in our applications. This doesn’t happen, though.
Start a REPL session and issue a require call to include the new module:
> var mod1 = require('./mod1.js');
Set the value and then ask for the value back:
> mod1.setGlobal(34);
> var val = mod1.returnGlobal();
The console.log method
prints out the global object before
returning its globally defined value. We can see at the bottom the new
variable holding a reference to the imported module, but val is undefined because the variable hasn’t yet been
set. In addition, the output includes no reference to that module’s own
top-level globalValue:
mod1: { setGlobal: [Function], returnGlobal: [Function] },
_: undefined,
val: undefined }
If we ran the command again, then the outer application variable
would be set, but we still wouldn’t see globalValue:
mod1: { setGlobal: [Function], returnGlobal: [Function] },
_: undefined,
val: 34 }
The only access we have to the module data is by whatever means the module provides. For JavaScript developers, this means no more unexpected and harmful data collisions because of accidental or intentional global variables in libraries.
Each Node application is an instance of a Node process object, and as such, comes with
certain built-in functionality.
Many of the process object’s
methods and properties provide identification or information about the
application and its environment. The process.execPath property
returns the execution path for the Node application; process.version
provides the Node version; and process.platform
identifies the server platform:
console.log(process.execPath);
console.log(process.version);
console.log(process.platform);
This code returns the following in my system (at the time of this writing):
/usr/local/bin/node
v0.6.9
linux
The process object also wraps
the STDIO streams stdin, stdout, and stderr. Both stdin and stdout are
asynchronous, and are readable and writable, respectively. stderr, however, is a
synchronous, blocking stream.
To demonstrate how to read and write data from stdin and stdout, in Example 3-1 the Node
application listens for data in stdin, and repeats the
data to stdout. The stdin stream is paused
by default, so we have to issue a resume call before
sending data.
process.stdin.resume();
process.stdin.on('data', function (chunk) {
process.stdout.write('data: ' + chunk);
});
Run the application using Node, and then start typing into the terminal. Every time you type something and press Enter, what you typed is reflected back to you.
Another useful process method is
memoryUsage, which
tells us how much memory the Node application is using. This could be
helpful for performance tuning, or just to satisfy your general
curiosity about the application. The response has the following
structure:
{ rss: 7450624, heapTotal: 2783520, heapUsed: 1375720 }
The heapTotal and
heapUsed properties
refer to the V8 engine’s memory usage.
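A quick sketch of memoryUsage in action; the array size is arbitrary, chosen only to make the heap growth visible:

```javascript
// Snapshot heap usage, allocate a large array, and snapshot again
// to watch heapUsed grow.
var before = process.memoryUsage();

var big = [];
for (var i = 0; i < 1000000; i++) {
   big.push(i);
}

var after = process.memoryUsage();
console.log('heapUsed before: ' + before.heapUsed);
console.log('heapUsed after:  ' + after.heapUsed);
```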
The last process method I’m
going to cover is process.nextTick. This
method attaches a callback function that’s fired during the next tick
(loop) in the Node event loop.
You would use process.nextTick if you
wanted to delay a function for some reason, but you wanted to delay it
asynchronously. A good example would be if you’re creating a new
function that has a callback function as a parameter and you want to
ensure that the callback is truly asynchronous. The following code is a
demonstration:
var asynchFunction = function (data, callback) {
   process.nextTick(function() {
      callback(data);
   });
};
If we just called the callback function directly, then the action would be synchronous. With process.nextTick, the callback function won’t be called until the next tick in the event loop, rather than right away.
You could use setTimeout with a zero
(0) millisecond delay instead of
process.nextTick:
setTimeout(function() {
   callback(data);
}, 0);
However, setTimeout isn’t as
efficient as process.nextTick. When
they were tested against each other, process.nextTick was
called far more quickly than setTimeout with a
zero-millisecond delay. You might also use process.nextTick if
you’re running an application that has a function performing some
computationally complex, and time-consuming, operation. You could break
the process into sections, each called via process.nextTick, to
allow other requests to the Node application to be processed without
waiting for the time-consuming process to finish.
Of course, the converse of this is that you don’t want to break up a process that you need to ensure executes sequentially, because you may end up with unexpected results.
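As a sketch of that chunking pattern (the function and variable names here are illustrative, not part of the Node API), here is a summation split across ticks so other events can be serviced between chunks:

```javascript
// Sum a large array in fixed-size chunks, yielding to the event
// loop between chunks via process.nextTick.
function sumInChunks(values, chunkSize, callback) {
   var total = 0, index = 0;
   function doChunk() {
      var end = Math.min(index + chunkSize, values.length);
      for (; index < end; index++) {
         total += values[index];
      }
      if (index < values.length) {
         process.nextTick(doChunk);   // schedule the next chunk
      } else {
         callback(total);             // all chunks done
      }
   }
   process.nextTick(doChunk);
}

var numbers = [];
for (var i = 1; i <= 1000; i++) {
   numbers.push(i);
}

sumInChunks(numbers, 100, function(total) {
   console.log('sum: ' + total);   // 500500
});
```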
The Buffer class, also
a global object, is a way of handling binary data in Node. In the
section Servers, Streams, and Sockets later
in the chapter, I’ll cover the fact that streams are often binary data
rather than strings. To convert the binary data to a string, the data
encoding for the stream socket is changed using setEncoding.
As a demonstration, you can create a new buffer with the following:
var buf = new Buffer(string);
If the buffer holds a string, you can pass in an optional second parameter with the encoding. Possible encodings are:
ascii: Seven-bit ASCII
utf8: Multibyte encoded Unicode characters
ucs2: Two bytes, little-endian-encoded Unicode characters
base64: Base64 encoding
hex: Encodes each byte as two hexadecimal characters
You can also write a string to an existing buffer, providing an
optional offset, length, and encoding:
buf.write(string); // offset defaults to 0, length defaults to buffer.length - offset, encoding is utf8
Data sent between sockets is transmitted as a buffer (in binary
format) by default. To send a string instead, you either need to call
setEncoding directly on
the socket, or specify the encoding in the function that writes to the
socket. By default, the TCP (Transmission Control Protocol) socket.write method
sets the second parameter to utf8,
but the socket returned in the connectionListener
callback to the TCP createServer function
sends the data as a buffer, not a string.
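A short sketch of buffers and encodings; it round-trips the same bytes through three of the encodings listed earlier:

```javascript
// Create a buffer from a utf8 string, then read the same bytes
// back out in different encodings.
var buf = new Buffer('Hello, World', 'utf8');

console.log(buf.toString('utf8'));    // Hello, World
console.log(buf.toString('base64'));  // SGVsbG8sIFdvcmxk
console.log(buf.toString('hex'));     // 48656c6c6f2c20576f726c64
```

Later Node releases deprecate the new Buffer constructor in favor of Buffer.from, but the encodings behave the same way.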
The timer functions in client-side JavaScript are part of
the global window object. They’re not
part of the JavaScript language itself, but have become such a ubiquitous part of JavaScript
development that the Node developers incorporated them into the Node core
API.
The timer functions operate in Node much as they do in the browser. Note, though, that the V8 engine Node is built on doesn’t provide timers itself; Node supplies its own browser-compatible implementations as part of the core.
The Node setTimeout function takes
a callback function as first parameter, the delay time (in milliseconds)
as second parameter, and an optional list of arguments:
// timer to open file and read contents to HTTP response object
function on_OpenAndReadFile(filename, res) {
   console.log('opening ' + filename);
   // open and read in file contents
   fs.readFile(filename, 'utf8', function(err, data) {
      if (err)
         res.write('Could not find or open file for reading\n');
      else {
         res.write(data);
      }
      // response is done
      res.end();
   });
}

setTimeout(on_OpenAndReadFile, 2000, filename, res);
In the code, the callback function on_OpenAndReadFile opens and reads a file to the
HTTP response when the function is called after approximately 2,000
milliseconds have passed.
As the Node documentation carefully notes, there’s no guarantee
that the callback function will be invoked in exactly
n milliseconds (whatever
n is). This is no different than the use of
setTimeout in a
browser—we don’t have absolute control over the environment, and factors
could slightly delay the timer.
The function clearTimeout clears a
preset setTimeout. If you need
to have a repeating timer, you can use setInterval to call a
function every n
milliseconds—n being the second parameter
passed to the function. Clear the interval with clearInterval.
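A brief sketch of the pair in use; the interval length and tick count are arbitrary:

```javascript
// Fire a callback every 100 milliseconds, stopping after five ticks.
var count = 0;
var timer = setInterval(function() {
   count++;
   console.log('tick ' + count);
   if (count === 5) {
      clearInterval(timer);   // without this, the interval fires forever
      console.log('done');
   }
}, 100);
```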
Much of the Node core API has to do with creating services that listen to specific types of communications. In the examples in Chapter 1, we used the HTTP module to create an HTTP web server. Other methods can create a TCP server, a TLS (Transport Layer Security) server, and a UDP (User Datagram Protocol)/datagram socket. I’ll cover TLS in Chapter 15, but in this section I want to introduce the TCP and UDP Node core functionality. First, though, I’ll offer a brief introduction to the terms used in this section.
A socket is an endpoint in a communication, and a network socket is an endpoint in a communication between applications running on two different computers on the network. The data flows between the sockets in what’s known as a stream. The data in the stream can be transmitted as binary data in a buffer, or in Unicode as a string. Both types of data are transmitted as packets: parts of the data split off into specifically sized pieces. There is a special kind of packet, a finish packet (FIN), that is sent by a socket to signal that the transmission is done. How the communication is managed, and how reliable the stream is, depends on the type of socket created.
We can create a basic TCP server and client with the Node Net module. TCP forms the basis for most Internet applications, such as web service and email. It provides a way of reliably transmitting data between client and server sockets.
Creating the TCP server is a little different than creating the
HTTP server in Example 1-1
in Chapter 1. We create
the server, passing in a callback function. The TCP server differs from
the HTTP server in that, rather than passing a requestListener, the TCP callback function’s
sole argument is an instance of a socket listening for incoming
connections.
Example 3-2 contains the code to create a TCP server. Once the server socket is created, it listens for two events: when data is received, and when the client closes the connection.
var net = require('net');
var server = net.createServer(function(conn) {
console.log('connected');
conn.on('data', function (data) {
console.log(data + ' from ' + conn.remoteAddress + ' ' +
conn.remotePort);
conn.write('Repeating: ' + data);
});
conn.on('close', function() {
console.log('client closed connection');
});
}).listen(8124);
console.log('listening on port 8124');
There is an optional parameter for createServer: allowHalfOpen. Setting
this parameter to true instructs the
socket not to send a FIN when it receives a FIN packet from the client.
Doing this keeps the socket open for writing (not reading). To close the
socket, you’d then need to explicitly use the end method. By default,
allowHalfOpen is
false.
Notice how a callback function is attached to the two events via
the on method. Many objects
in Node that emit events provide a way to attach a function as an event
listener by using the on method. This method
takes the name of the event as first parameter, and the function
listener as the second.
Node objects that inherit from a special object, the EventEmitter, expose the on method for event handling, as discussed later in this chapter.
The TCP client is just as simple to create as the server, as shown
in Example 3-3. The call
to the setEncoding method on
the client changes the encoding for the received data. As discussed
earlier in the section Buffer, data is transmitted as
a buffer, but we can use setEncoding to read it
as a utf8 string. The socket’s
write method is used to
transmit the data. It also attaches listener functions to two events:
data, for received
data, and close, in case the
server closes the connection.
var net = require('net');
var client = new net.Socket();
client.setEncoding('utf8');
// connect to server
client.connect(8124, 'localhost', function () {
console.log('connected to server');
client.write('Who needs a browser to communicate?');
});
// prepare for input from terminal
process.stdin.resume();
// when receive data, send to server
process.stdin.on('data', function (data) {
client.write(data);
});
// when receive data back, print to console
client.on('data',function(data) {
console.log(data);
});
// when server closed
client.on('close',function() {
console.log('connection is closed');
});
The data being transmitted between the two sockets is typed in at
the terminal, and transmitted when you press Enter. The client
application first sends the string you just typed, which the TCP server
writes out to the console. The server repeats the message back to the
client, which in turn writes the message out to the console. The server
also prints out the IP address and port for the client using the
socket’s remoteAddress and
remotePort properties.
Following is the console output for the server after several strings
were sent from the client (with the IP address edited out for
security):
Hey, hey, hey, hey-now. from #ipaddress 57251
Don't be mean, we don't have to be mean. from #ipaddress 57251
Cuz remember, no matter where you go, from #ipaddress 57251
there you are. from #ipaddress 57251
The connection between the client and server is maintained until
you kill one or the other using Ctrl-C. Whichever socket is still open
receives a close event that’s
printed out to the console. The server can also serve more than one
connection from more than one client, since all the relevant functions
are asynchronous.
As I mentioned earlier, TCP is the underlying transport mechanism for much of the functionality we use on the Internet today, including HTTP, which we’ll cover next.
You had a chance to work with the HTTP module in Chapter 1. We created servers using the
createServer method,
passing in the function that will act as the requestListener.
Requests are processed as they come, asynchronously.
In a network, TCP is the transportation layer and HTTP is the
application layer. If you scratch around in the modules included with
Node, you’ll see that when you create an HTTP server, you’re inheriting
functionality from the TCP-based net.Server.
For the HTTP server, the underlying connection is a
socket, while the http.ServerRequest
object is a readable stream and the http.ServerResponse is
a writable stream. HTTP adds another level of complexity because of the
chunked transfer encoding it supports. The chunked
transfer encoding allows transfer of data when the exact size of the
response isn’t known until it’s fully processed. Instead, a zero-sized
chunk is sent to indicate the end of the response. This type of encoding is
useful when you’re processing a request such as a large database query
output to an HTML table: writing the data can begin before the rest of
the query data has been received.
More on streams in the upcoming section titled, appropriately enough, Streams, Pipes, and Readline.
The TCP examples earlier in this chapter, and the HTTP examples in Chapter 1, were both coded to work with network sockets. However, all of the server/socket modules can also connect to a Unix socket, rather than a specific network port. Unlike a network socket, a Unix or IPC (interprocess communication) socket enables communication between processes within the same system.
To demonstrate Unix socket communication, I duplicated Example 1-3’s code, but
instead of binding to a port, the new server binds to a Unix socket, as
shown in Example 3-4. The
application also makes use of readFileSync, the
synchronous version of the function to open a file and read its
contents.
// create server
// and callback function
var http = require('http');
var fs = require('fs');
http.createServer(function (req, res) {
var query = require('url').parse(req.url).query;
console.log(query);
var file = require('querystring').parse(query).file;
// content header
res.writeHead(200, {'Content-Type': 'text/plain'});
// write a series of numbers to the client
for (var i = 0; i<100; i++) {
res.write(i + '\n');
}
// open and read in file contents
var data = fs.readFileSync(file, 'utf8');
res.write(data);
res.end();
}).listen('/tmp/node-server-sock');
The client is based on a code sample provided in the Node core
documentation for the http.request object at
the Node.js site. The http.request object, by
default, makes use of http.globalAgent, which
supports pooled sockets. The size of this pool is five sockets by
default, but you can adjust it by changing the agent.maxSockets
value.
The client accepts the chunked data returned from the server, printing out to the console. It also triggers a response on the server with a couple of minor writes, as shown in Example 3-5.
var http = require('http');
var options = {
method: 'GET',
socketPath: '/tmp/node-server-sock',
path: "/?file=main.txt"
};
var req = http.request(options, function(res) {
console.log('STATUS: ' + res.statusCode);
console.log('HEADERS: ' + JSON.stringify(res.headers));
res.setEncoding('utf8');
res.on('data', function (chunk) {
console.log('chunk o\' data: ' + chunk);
});
});
req.on('error', function(e) {
console.log('problem with request: ' + e.message);
});
// write data to request body
req.write('data\n');
req.write('data\n');
req.end();
I didn’t use the asynchronous file read function with the
http.request object
because the connection is already closed when the asynchronous function
is called and no file contents are returned.
Before leaving this section on the HTTP module, be aware that much of the behavior you’ve come to expect with Apache or other web servers isn’t built into a Node HTTP server. For instance, if you password-protect your website, Apache will pop up a window asking for your username and password; a Node HTTP server will not. If you want this functionality, you’re going to have to code for it.
Chapter 15 covers the SSL version of HTTP, HTTPS, along with Crypto and TLS/SSL.
TCP requires a dedicated connection between the two endpoints of the communication. UDP is a connectionless protocol, which means there’s no guarantee of a connection between the two endpoints. For this reason, UDP is less reliable and robust than TCP. On the other hand, UDP is generally faster than TCP, which makes it more popular for real-time uses, as well as technologies such as VoIP (Voice over Internet Protocol), where the TCP connection requirements could adversely impact the quality of the signal.
Node core supports both types of sockets. In the last couple of sections, I demonstrated the TCP functionality. Now, it’s UDP’s turn.
The UDP module identifier is dgram:
require('dgram');
To create a UDP socket, use the createSocket method,
passing in the type of socket—either udp4 or udp6. You can also pass in a callback function
to listen for events. Unlike messages sent with TCP, messages sent using
UDP must be sent as buffers, not strings.
Example 3-6
contains the code for a demonstration UDP client. In it, data is
accessed via process.stdin, and then
sent, as is, via the UDP socket. Note that we don’t have to set the
encoding for the string, since the UDP socket accepts only a buffer, and
the process.stdin data
is a buffer. We do, however, have to convert the
buffer to a string, using the buffer’s toString method, in
order to get a meaningful string for the console.log method call that echoes the
input.
var dgram = require('dgram');
var client = dgram.createSocket("udp4");
// prepare for input from terminal
process.stdin.resume();
process.stdin.on('data', function (data) {
console.log(data.toString('utf8'));
client.send(data, 0, data.length, 8124, "examples.burningbird.net",
function (err, bytes) {
if (err)
console.log('error: ' + err);
else
console.log('successful');
});
});
The UDP server, shown in Example 3-7, is even simpler
than the client. All the server application does is create the socket,
bind it to a specific port (8124), and listen for the message event. When a
message arrives, the application prints it out using console.log, along with the IP address
and port of the sender. Note especially that no encoding is necessary to
print out the message—it’s automatically converted from a buffer to a
string.
We didn’t have to bind the socket to a port. However, without binding to a known port, the operating system would assign the socket an arbitrary port, and the client would have no fixed address to send its messages to.
var dgram = require('dgram');
var server = dgram.createSocket("udp4");
server.on ("message", function(msg, rinfo) {
console.log("Message: " + msg + " from " + rinfo.address + ":"
+ rinfo.port);
});
server.bind(8124);
I didn’t call the close method on either
the client or the server after sending/receiving the message. However,
no connection is being maintained between the client and server—just the
sockets capable of sending a message and receiving
communication.
The communication stream between the sockets discussed in
the previous sections is an implementation of the underlying abstract
stream interface. Streams can be
readable, writable, or both, and all streams are instances of EventEmitter, discussed
in the upcoming section Events and EventEmitter.
It’s important to take away from this section that all of these
communication streams, including process.stdin and
process.stdout, are
implementations of the abstract stream interface. Because of this underlying
interface, there is basic functionality available in all streams in
Node:
You can change the encoding for the stream data with
setEncoding.
You can check whether the stream is readable, writable, or both.
You can capture stream events, such as data received or connection closed, and attach callback functions for each.
You can pause and resume the stream.
You can pipe data from a readable stream to a writable stream.
The last capability is one we haven’t covered yet. A simple way to demonstrate a pipe is to open a REPL session and type in the following:
> process.stdin.resume();
> process.stdin.pipe(process.stdout);
...and then enjoy the fact that everything you type from that point on is echoed back to you.
If you want to keep the output stream open for continued data,
pass an option, { end: false }, to
the output stream:
process.stdin.pipe(process.stdout, { end : false });
There is one additional object that provides a specific
functionality to readable streams: readline. You include
the Readline module with code like the following:
var readline = require('readline');
The Readline module allows line-by-line reading of a stream. Be
aware, though, that once you include this module, the Node program
doesn’t terminate until you close the interface and the stdin stream. The Node
site documentation contains an example of how to begin and terminate a
Readline interface, which I adapted in Example 3-8. The application
asks a question as soon as you run it, and then outputs the answer. It
also listens for any “command,” which is really any line that terminates
with \n. If the command is
.leave, it leaves the
application; otherwise, it just repeats the command and prompts the user
for more. A Ctrl-C or Ctrl-D key combination also causes the application
to terminate.
var readline = require('readline');
// create a new interface
var interface = readline.createInterface(process.stdin, process.stdout, null);
// ask question
interface.question(">>What is the meaning of life? ", function(answer) {
console.log("About the meaning of life, you said " + answer);
interface.setPrompt(">>");
interface.prompt();
});
// function to close interface
function closeInterface() {
console.log('Leaving interface...');
process.exit();
}
// listen for .leave
interface.on('line', function(cmd) {
if (cmd.trim() == '.leave') {
closeInterface();
return;
} else {
console.log("repeating command: " + cmd);
}
interface.setPrompt(">>");
interface.prompt();
});
interface.on('close', function() {
closeInterface();
});
Here’s an example session:
>>What is the meaning of life? ===
About the meaning of life, you said ===
>>This could be a command
repeating command: This could be a command
>>We could add eval in here and actually run this thing
repeating command: We could add eval in here and actually run this thing
>>And now you know where REPL comes from
repeating command: And now you know where REPL comes from
>>And that using rlwrap replaces this Readline functionality
repeating command: And that using rlwrap replaces this Readline functionality
>>Time to go
repeating command: Time to go
>>.leave
Leaving interface...
This should look familiar. Remember from Chapter 2 that we can use rlwrap to override the
command-line functionality for REPL. We use the following to trigger its
use:
env NODE_NO_READLINE=1 rlwrap node
And now we know what the flag is triggering—it’s instructing REPL
not to use Node’s Readline module for command-line processing, but to
use rlwrap instead.
This is a quick introduction to the Node stream modules. Now it’s time to change course, and check out Node’s child processes.
Operating systems provide access to a great deal of functionality, but much of it is only accessible via the command line. It would be nice to be able to access this functionality from a Node application. That’s where child processes come in.
Node enables us to run a system command within a new child process, and listen in on its input/output. This includes being able to pass arguments to the command, and even pipe the results of one command to another. The next several sections explore this functionality in more detail.
All but the last example demonstrated in this section use Unix commands. They work on a Linux system, and should work in a Mac. They won’t, however, work in a Windows Command window.
There are four different techniques you can use to create
a child process. The most common one is using the spawn method. This
launches a command in a new process, passing in any arguments. In the
following, we create a child process to call the Unix pwd command to print
the current directory. The command takes no arguments:
var spawn = require('child_process').spawn,
pwd = spawn('pwd');
pwd.stdout.on('data', function (data) {
console.log('stdout: ' + data);
});
pwd.stderr.on('data', function (data) {
console.log('stderr: ' + data);
});
pwd.on('exit', function (code) {
console.log('child process exited with code ' + code);
});
Notice the events that are captured on the child process’s
stdout and stderr. If no error
occurs, any output from the command is transmitted to the child
process’s stdout, triggering a
data event on the
process. If an error occurs, such as in the following where we’re
passing an invalid option to the command:
var spawn = require('child_process').spawn,
pwd = spawn('pwd', ['-g']);
Then the error gets sent to stderr, which prints
out the error to the console:
stderr: pwd: invalid option -- 'g'
Try `pwd --help' for more information.
child process exited with code 1
The process exited with a code of 1, which signifies that an error occurred. The
exit code varies depending on the operating system and error. When no
error occurs, the child process exits with a code of 0.
The earlier code demonstrated sending output to the child
process’s stdout and stderr, but what about
stdin? The Node documentation for
child processes includes an example of directing data to stdin. It’s used to
emulate a Unix pipe (|) whereby the result of one command is immediately
directed as input to another command. I adapted the example in order to
demonstrate one of my favorite uses of the Unix pipe—being able to look
through all subdirectories, starting in the local directory, for a file
with a specific word (in this case, test) in its
name:
find . -ls | grep test
Example 3-9
implements this functionality as child processes. Note that the first
command, which performs the find, takes two
arguments, while the second takes just one: the search
term. The data that grep searches arrives via its stdin. Also note that,
unlike the example in the Node documentation, the grep child process’s
stdout encoding is
changed via setEncoding. Otherwise,
when the data is printed out, it would be printed out as a
buffer.
var spawn = require('child_process').spawn,
find = spawn('find',['.','-ls']),
grep = spawn('grep',['test']);
grep.stdout.setEncoding('utf8');
// direct results of find to grep
find.stdout.on('data', function(data) {
grep.stdin.write(data);
});
// now run grep and output results
grep.stdout.on('data', function (data) {
console.log(data);
});
// error handling for both
find.stderr.on('data', function (data) {
console.log('grep stderr: ' + data);
});
grep.stderr.on('data', function (data) {
console.log('grep stderr: ' + data);
});
// and exit handling for both
find.on('exit', function (code) {
if (code !== 0) {
console.log('find process exited with code ' + code);
}
// go ahead and end grep process
grep.stdin.end();
});
grep.on('exit', function (code) {
if (code !== 0) {
console.log('grep process exited with code ' + code);
}
});
When you run the application, you’ll get a listing of all files in the current directory and any subdirectories that contain test in their filename.
All of the example applications up to this point work the same in Node 0.8 as in Node 0.6. Example 3-9 is an exception because of a change in the underlying API.
In Node 0.6, the exit event would not be
emitted until the child process exits and all STDIO pipes are closed. In
Node 0.8, the event is emitted as soon as the child process finishes.
This causes the application to crash, because the grep child process’s STDIO pipe is closed when
it tries to process its data. For the application to work in Node 0.8,
the application needs to listen for the close event on the
find child process, rather than the
exit event:
// and exit handling for both
find.on('close', function (code) {
if (code !== 0) {
console.log('find process exited with code ' + code);
}
// go ahead and end grep process
grep.stdin.end();
});
In Node 0.8, the close event is emitted
when the child process exits and all STDIO pipes are closed.
In addition to spawning a child process, you can also use
child_process.exec and child_process.execFile to run a command in a
shell and buffer the results. The only difference between the two is that execFile executes an application directly from a file, rather
than running a command in a shell.
The first parameter in the two methods is either the command or
the file and its location; the second parameter is options for the
command; and the third is a callback function. The callback function
takes three arguments: error, stdout, and stderr. The data is buffered to stdout if no error occurs.
If the executable file contains:
#!/usr/local/bin/node
console.log(global);
the following application prints out the buffered results:
var execfile = require('child_process').execFile,
child;
child = execfile('./app.js', function(error, stdout, stderr) {
if (error == null) {
console.log('stdout: ' + stdout);
}
});
The last child process method is child_process.fork.
This variation of spawn is for
spawning Node processes.
What sets the child_process.fork
process apart from the others is that there’s an actual communication
channel established to the child process. Note, though, that each
process requires a whole new instance of V8, which takes both time and
memory.
Earlier I warned you that child processes that invoke Unix system commands won’t work with Windows, and vice versa. I know this sounds obvious, but not everyone knows that, unlike with JavaScript in browsers, Node applications can behave differently in different environments.
It wasn’t until recently that the Windows binary installation of
Node even provided access to child processes. You also need to invoke
whatever command you want to run via the Windows command interpreter,
cmd.exe.
Example 3-10
demonstrates running a Windows command. In the application, Windows
cmd.exe is used to
create a directory listing, which is then printed out to the console via
the data event handler.
var cmd = require('child_process').spawn('cmd', ['/c', 'dir\n']);
cmd.stdout.on('data', function (data) {
console.log('stdout: ' + data);
});
cmd.stderr.on('data', function (data) {
console.log('stderr: ' + data);
});
cmd.on('exit', function (code) {
console.log('child process exited with code ' + code);
});
The /c flag passed as the
first argument to cmd.exe instructs it to
carry out the command and then terminate. The application doesn’t work
without this flag. You especially don’t want to pass in the /K flag, which tells
cmd.exe to execute the
command and then remain; if you do, your application won’t
terminate.
I provide more demonstrations of child processes in Chapter 9 and Chapter 12.
The DNS module provides DNS resolution using c-ares, a C library that provides asynchronous DNS requests. It’s used by Node with some of its other modules, and can be useful for applications that need to discover domains or IP addresses.
To discover the IP address given a domain, use the dns.lookup method and
print out the returned IP address:
var dns = require('dns');
dns.lookup('burningbird.net',function(err,ip) {
if (err) throw err;
console.log(ip);
});
The dns.reverse method
returns an array of domain names for a given IP address:
dns.reverse('173.255.206.103', function(err,domains) {
domains.forEach(function(domain) {
console.log(domain);
});
});
The dns.resolve method
returns an array of record types by a given type, such as A, MX, NS, and so on. In the
following code, I’m looking for the name server domains for my domain
name, burningbird.net:
var dns = require('dns');
dns.resolve('burningbird.net', 'NS', function(err,domains) {
domains.forEach(function(domain) {
console.log(domain);
});
});
This returns:
ns1.linode.com
ns3.linode.com
ns5.linode.com
ns4.linode.com
We used the URL module in Example 1-3 in Chapter 1. This simple module provides a way of parsing a URL and returning an object with all of the URL components. Passing in the following URL:
var url = require('url');
var urlObj = url.parse('http://examples.burningbird.net:8124/?file=main');
returns the following JavaScript object:
{ protocol: 'http:',
slashes: true,
host: 'examples.burningbird.net:8124',
port: '8124',
hostname: 'examples.burningbird.net',
href: 'http://examples.burningbird.net:8124/?file=main',
search: '?file=main',
query: 'file=main',
pathname: '/',
path: '/?file=main' }
Each of the components can then be discretely accessed like so:
var qs = urlObj.query; // get the query string
Calling the url.format method
performs the reverse operation:
console.log(url.format(urlObj)); // returns original URL
The URL module is often used with the Query String module. The latter module is a simple utility module that provides functionality to parse a received query string, or prepare a string for use as a query string.
To chunk out the key/value pairs in the query string, use the
querystring.parse method.
The following:
var querystring = require('querystring');
var vals = querystring.parse('file=main&file=secondary&type=html');
results in a JavaScript object that allows for easy access of the individual query string values:
{ file: [ 'main', 'secondary' ], type: 'html' }
Since file is given twice in the
query string, both values are grouped into an array, each of which can be
accessed individually:
console.log(vals.file[0]); // returns main
You can also convert an object into a query string, using querystring.stringify:
var qryString = querystring.stringify(vals);
The Utilities module provides several useful functions. You include this module with:
var util = require('util');
You can use Utilities to test if an object is an array (util.isArray) or regular expression (util.isRegExp), and to format a string (util.format). A new experimental addition to the
module provides functionality to pump data from a readable stream to a
writable stream (util.pump):
util.pump(process.stdin, process.stdout);
However, I wouldn’t type this into REPL, as anything you type from that point on is echoed as soon as you type it—making the session a little awkward.
I make extensive use of util.inspect to get a
string representation of an object. I find it’s a great way to discover
more about an object. The first required argument is the object; the
second optional argument is whether to display the nonenumerable
properties; the third optional argument is the number of times the object
is recursed (depth); and the fourth, also optional, is whether to style
the output in ANSI colors. If you assign a value of null to the depth, it recurses indefinitely (the
default is two times)—as much as necessary to exhaustively inspect the
object. From experience, I’d caution you to be careful using null for the depth because you’re going to get a
large output.
You can use util.inspect in REPL, but
I recommend a simple application, such as the following:
var util = require('util');
var jsdom = require('jsdom');
console.log(util.inspect(jsdom, true, null, true));
When you run it, pipe the result to a file:
node inspectjsdom.js > jsdom.txt
Now you can inspect and reinspect the object interface at your
leisure. Again, if you use null for
depth, expect a large output file.
The Utilities module provides several other methods, but the one
you’re most likely to use is util.inherits. The
util.inherits function takes two
parameters, constructor and
superConstructor. The
result is that the constructor will inherit the functionality from the
superconstructor.
Example 3-11
demonstrates all the nuances associated with using util.inherits. The explanation of the code
follows.
Example 3-11 and its explanation cover some core JavaScript functionality you might already be familiar with. However, it’s important that all readers come away from this section with the same understanding of what’s happening.
var util = require('util');
// define original object
function first() {
var self = this;
this.name = 'first';
this.test = function() {
console.log(self.name);
};
}
first.prototype.output = function() {
console.log(this.name);
};
// inherit from first
function second() {
second.super_.call(this);
this.name = 'second';
}
util.inherits(second,first);
var two = new second();
function third(func) {
this.name = 'third';
this.callMethod = func;
}
var three = new third(two.test);
// all three should output "second"
two.output();
two.test();
three.callMethod();
The application creates three objects named first, second, and third, respectively.
The first object has two methods:
test and output. The test method is defined directly in the object,
while the output method is added later
via the prototype object. The reason I
used both techniques for defining a method on the object is to demonstrate
an important aspect of inheritance with util.inherits (well, of JavaScript, but enabled
by util.inherits).
The second object contains the
following line:
second.super_.call(this);
If we eliminate this line from the second object constructor, any call to output on the second object would succeed, but a call to
test would generate an error and force
the Node application to terminate with a message about test being undefined.
The call method chains the
constructors between the two objects, ensuring that the superconstructor
is invoked as well as the constructor. The superconstructor is the
constructor for the inherited object.
We need to invoke the superconstructor since the test method doesn’t exist until first is created. However, we didn’t need the
call method for the output method, because it’s defined directly on
the first object’s prototype object. When the second object inherits properties from the
first, it also inherits the newly added
method.
If we look under the hood of util.inherits, we see where super_ is defined:
exports.inherits = function(ctor, superCtor) {
ctor.super_ = superCtor;
ctor.prototype = Object.create(superCtor.prototype, {
constructor: {
value: ctor,
enumerable: false,
writable: true,
configurable: true
}
});
});
super_ is assigned as a property
to the second object when util.inherits is called:
util.inherits(second, first);
The third object in the application, third, also has a name property. It doesn’t inherit from either
first or second, but does expect a function passed to it
when it’s instantiated. This function is assigned to its own callMethod property. When the code creates an
instance of this object, the two object
instance’s test method is passed to the
constructor:
var three = new third(two.test);
When three.callMethod is called,
“second” is output, not “third” as you might expect at first glance. And
that’s where the self reference in the
first object comes in.
In JavaScript, this is the object
context, which can change as a method is passed around, or passed to an
event handler. The only way you can preserve data for an object’s method
is to assign this to an object
variable—in this case, self—and then
use the variable within any functions in the object.
Running this application results in the following output:
second
second
second
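The self technique can be isolated into a minimal sketch, using a hypothetical Widget object of my own (not from the example above). When a method is detached from its instance, this no longer points at the instance, but a captured self still does:

```javascript
function Widget() {
  var self = this;              // capture the instance
  this.name = 'widget';
  this.show = function () {
    return self.name;           // survives being detached
  };
  this.showThis = function () {
    return this.name;           // depends entirely on the call site
  };
}

var w = new Widget();
var detached = w.show;
var detachedThis = w.showThis;

console.log(detached());        // widget
console.log(detachedThis());    // undefined: this is no longer w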
Much of this is most likely familiar to you from client-side
JavaScript development, though it’s important to understand the Utilities
part in the inheritance. The next section, which provides an overview of
Node’s EventEmitter, features
functionality that is heavily dependent on the inheritance behavior just
described.
Scratch underneath the surface of many of the Node core
objects, and you’ll find EventEmitter.
Anytime you see an object emit an event, and an event handled with
on, you’re seeing
EventEmitter in action. Understanding
how EventEmitter works and how to use
it are two of the more important components of Node development.
The EventEmitter object is what
provides the asynchronous event handling to objects in Node. To
demonstrate its core functionality, we’ll try a quick test
application.
First, include the Events module:
var events = require('events');
Next, create an instance of EventEmitter:
var em = new events.EventEmitter();
Use the newly created EventEmitter instance to do two essential tasks:
attach an event handler to an event, and emit the actual event. The
on event handler is
triggered when a specific event is emitted. The first parameter to the
method is the name of the event, the second a function to process the
event:
em.on('someevent', function(data) { ... });
The event is emitted on the object, based on some criteria, via the
emit method:
if (somecriteria) {
em.emit('someevent', data);
}
In Example 3-12, we
create an EventEmitter instance that
emits an event, timed, every three
seconds. In the event handler function for this event, a message with a
counter is output to the console.
var eventEmitter = require('events').EventEmitter;
var counter = 0;
var em = new eventEmitter();
setInterval(function() { em.emit('timed', counter++); }, 3000);
em.on('timed', function(data) {
console.log('timed ' + data);
});
Running the application outputs timed event messages to the console until the application is terminated.
This is an interesting example, but not particularly helpful. What
we need is the ability to add EventEmitter functionality to our existing
objects—not use instances of EventEmitter throughout our applications.
To add this necessary EventEmitter functionality to an object, use the
util.inherits method,
described in the preceding section:
util.inherits(someobj, EventEmitter);
By using util.inherits with the
object, you can call the emit method within the
object’s methods, and code event handlers on the object instances:
someobj.prototype.somemethod = function() { this.emit('event'); };
...
someobjinstance.on('event', function() { });
Rather than attempt to decipher how EventEmitter works in the abstract sense, let’s
move on to Example 3-13,
which shows a working example of an object inheriting EventEmitter’s functionality. In the
application, a new object, inputChecker, is created. The constructor takes
two values, a person’s name and a filename. It assigns the person’s name
to an object variable, and also creates a reference to a writable stream
using the File System module’s createWriteStream method
(for more on the File System module, see the sidebar Readable and Writable Stream).
The object also has a method, check, that checks incoming data for specific
commands. One command (wr:) triggers a
write event, another (en:) an end
event. If no command is present, then an echo event is triggered.
The object instance provides event handlers for all three events. It
writes to the output file for the write event, it echoes the input for the
commandless input, and it terminates the application with an end event,
using the process.exit method.
All input comes from standard input (process.stdin).
var util = require('util');
var eventEmitter = require('events').EventEmitter;
var fs = require('fs');
function inputChecker (name, file) {
this.name = name;
this.writeStream = fs.createWriteStream('./' + file + '.txt',
{'flags' : 'a',
'encoding' : 'utf8',
'mode' : 0666});
};
util.inherits(inputChecker,eventEmitter);
inputChecker.prototype.check = function check(input) {
var command = input.toString().trim().substr(0,3);
if (command == 'wr:') {
this.emit('write',input.substr(3,input.length));
} else if (command == 'en:') {
this.emit('end');
} else {
this.emit('echo',input);
}
};
// testing new object and event handling
var ic = new inputChecker('Shelley','output');
ic.on('write', function(data) {
this.writeStream.write(data, 'utf8');
});
ic.on('echo', function( data) {
console.log(this.name + ' wrote ' + data);
});
ic.on('end', function() {
process.exit();
});
process.stdin.resume();
process.stdin.setEncoding('utf8');
process.stdin.on('data', function(input) {
ic.check(input);
});
Note that the EventEmitter functionality also includes the
process.stdin.on event
handler method, since process.stdin is one of
the many Node objects that inherit from EventEmitter.
We don’t have to chain the constructors from the new object to
EventEmitter, as demonstrated in the
earlier example covering util.inherits, because
the functionality we need—on and emit—consists of prototype methods, not object
instance properties.
The on method is really a
shortcut for the EventEmitter.addListener
method, which takes the same parameters. So this:
ic.addListener('echo', function( data) {
console.log(this.name + ' wrote ' + data);
});
is exactly equivalent to:
ic.on('echo', function( data) {
console.log(this.name + ' wrote ' + data);
});
To listen for only the first occurrence of an event, use once:
ic.once(event, function);
When you exceed 10 listeners for an event, you’ll get a warning by
default. Use setMaxListeners, passing
in a number, to change the limit. Use a value of zero
(0) for an unlimited number of
listeners.
Many of the core Node objects, as well as third-party modules, make
use of EventEmitter. In Chapter 4, I’ll demonstrate how to convert the
code in Example 3-13 into a
module.