Table of Contents for
Node.js 8 the Right Way


Node.js 8 the Right Way by Jim Wilson. Published by Pragmatic Bookshelf, 2018
  1. Title Page
  2. Node.js 8 the Right Way
  3. Node.js 8 the Right Way
  4. Node.js 8 the Right Way
  5. Node.js 8 the Right Way
  6.  Acknowledgments
  7.  Preface
  8. Why Node.js the Right Way?
  9. What’s in This Book
  10. What This Book Is Not
  11. Code Examples and Conventions
  12. Online Resources
  13. Part I. Getting Up to Speed on Node.js 8
  14. 1. Getting Started
  15. Thinking Beyond the Web
  16. Node.js’s Niche
  17. How Node.js Applications Work
  18. Aspects of Node.js Development
  19. Installing Node.js
  20. 2. Wrangling the File System
  21. Programming for the Node.js Event Loop
  22. Spawning a Child Process
  23. Capturing Data from an EventEmitter
  24. Reading and Writing Files Asynchronously
  25. The Two Phases of a Node.js Program
  26. Wrapping Up
  27. 3. Networking with Sockets
  28. Listening for Socket Connections
  29. Implementing a Messaging Protocol
  30. Creating Socket Client Connections
  31. Testing Network Application Functionality
  32. Extending Core Classes in Custom Modules
  33. Developing Unit Tests with Mocha
  34. Wrapping Up
  35. 4. Connecting Robust Microservices
  36. Installing ØMQ
  37. Publishing and Subscribing to Messages
  38. Responding to Requests
  39. Routing and Dealing Messages
  40. Clustering Node.js Processes
  41. Pushing and Pulling Messages
  42. Wrapping Up
  43. Node.js 8 the Right Way
  44. Part II. Working with Data
  45. 5. Transforming Data and Testing Continuously
  46. Procuring External Data
  47. Behavior-Driven Development with Mocha and Chai
  48. Extracting Data from XML with Cheerio
  49. Processing Data Files Sequentially
  50. Debugging Tests with Chrome DevTools
  51. Wrapping Up
  52. 6. Commanding Databases
  53. Introducing Elasticsearch
  54. Creating a Command-Line Program in Node.js with Commander
  55. Using request to Fetch JSON over HTTP
  56. Shaping JSON with jq
  57. Inserting Elasticsearch Documents in Bulk
  58. Implementing an Elasticsearch Query Command
  59. Wrapping Up
  60. Node.js 8 the Right Way
  61. Part III. Creating an Application from the Ground Up
  62. 7. Developing RESTful Web Services
  63. Advantages of Express
  64. Serving APIs with Express
  65. Writing Modular Express Services
  66. Keeping Services Running with nodemon
  67. Adding Search APIs
  68. Simplifying Code Flows with Promises
  69. Manipulating Documents RESTfully
  70. Emulating Synchronous Style with async and await
  71. Providing an Async Handler Function to Express
  72. Wrapping Up
  73. 8. Creating a Beautiful User Experience
  74. Getting Started with webpack
  75. Generating Your First webpack Bundle
  76. Sprucing Up Your UI with Bootstrap
  77. Bringing in Bootstrap JavaScript and jQuery
  78. Transpiling with TypeScript
  79. Templating HTML with Handlebars
  80. Implementing hashChange Navigation
  81. Listing Objects in a View
  82. Saving Data with a Form
  83. Wrapping Up
  84. 9. Fortifying Your Application
  85. Setting Up the Initial Project
  86. Managing User Sessions in Express
  87. Adding Authentication UI Elements
  88. Setting Up Passport
  89. Authenticating with Facebook, Twitter, and Google
  90. Composing an Express Router
  91. Bringing in the Book Bundle UI
  92. Serving in Production
  93. Wrapping Up
  94. Node.js 8 the Right Way
  95. 10. BONUS: Developing Flows with Node-RED
  96. Setting Up Node-RED
  97. Securing Node-RED
  98. Developing a Node-RED Flow
  99. Creating HTTP APIs with Node-RED
  100. Handling Errors in Node-RED Flows
  101. Wrapping Up
  102. A1. Setting Up Angular
  103. A2. Setting Up React
  104. Node.js 8 the Right Way

Processing Data Files Sequentially

By now your lib/parse-rdf.js is a robust module that can reliably convert RDF content into JSON documents. All that remains is to walk through the Project Gutenberg catalog directory and collect all the JSON documents.

More concretely, we need to do the following:

  1. Traverse down the data/cache/epub directory looking for files ending in .rdf.
  2. Read each RDF file.
  3. Run the RDF content through parseRDF.
  4. Collect the JSON serialized objects into a single, bulk file for insertion.

The NoSQL database we’ll be using is Elasticsearch, a document datastore that indexes JSON objects. Soon, in Chapter 6, Commanding Databases, we’ll dive deep into Elasticsearch and how to effectively use it with Node.js. You’ll learn how to install it, configure it, and make the most of its HTTP-based APIs.

For now, though, our focus is just on transforming the Gutenberg data into an intermediate form for bulk import.

Conveniently, Elasticsearch has a bulk-import API that lets you pull in many records at once. Although we could insert them one at a time, it is significantly faster to use the bulk API.

The format of the file we need to create is described on Elasticsearch’s Bulk API page.[44] It’s an LDJ file consisting of actions and the source objects on which to perform each action.

In our case, we’re performing index operations—that is, inserting new documents into an index. Each source object is the book object returned by parseRDF. Here’s an example of an action followed by its source object:

 {"index":{"_id":"pg11"}}
 {"id":11,"title":"Alice's Adventures in Wonderland","authors":...}

And here’s another one:

 {"index":{"_id":"pg132"}}
 {"id":132,"title":"The Art of War","authors":...}

In each case, an action is a JSON object on a line by itself, and the source object is another JSON object on the next line. Elasticsearch’s bulk API allows you to chain any number of these together like so:

 {"index":{"_id":"pg11"}}
 {"id":11,"title":"Alice's Adventures in Wonderland","authors":...}
 {"index":{"_id":"pg132"}}
 {"id":132,"title":"The Art of War","authors":...}

The _id field of each index operation is the unique identifier that Elasticsearch will use for the document. Here I’ve chosen to use the string pg followed by the Project Gutenberg ID. This way, if we ever wanted to store documents from another source in the same index, they shouldn’t collide with the Project Gutenberg book data.
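
A file in this chained format is exactly what Elasticsearch's bulk endpoint consumes in a single HTTP request. We'll do the actual import in Chapter 6, Commanding Databases, but as a rough sketch—assuming a local Elasticsearch instance listening on port 9200 and an index/type named books/book, both placeholders for now—the request would look something like this:

 $ curl -X POST 'http://localhost:9200/books/book/_bulk' \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @../data/bulk_pg.ldj

Note the --data-binary flag: unlike -d, it preserves the newlines that separate each action from its document.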

To find and open each of the RDF files under the data/cache/epub directory, we'll use a module called node-dir. Install and save it as usual:

 $ npm install --save --save-exact node-dir@0.1.16

This module comes with a handful of useful methods for walking a directory tree. The method we'll use is readFiles, which operates sequentially on the files it encounters during the walk.

Let’s use this method to find all the RDF files and send them through our RDF parser. Open a text editor and enter this:

 'use strict';
 
 const dir = require('node-dir');
 const parseRDF = require('./lib/parse-rdf.js');
 
 const dirname = process.argv[2];
 
 const options = {
   match: /\.rdf$/,          // Match file names that end in '.rdf'.
   exclude: ['pg0.rdf'],     // Ignore the template RDF file (ID = 0).
 };
 
 dir.readFiles(dirname, options, (err, content, next) => {
   if (err) throw err;
   const doc = parseRDF(content);
   console.log(JSON.stringify({ index: { _id: `pg${doc.id}` } }));
   console.log(JSON.stringify(doc));
   next();
 });

Save the file as rdf-to-bulk.js in your databases project directory. This short program walks down the provided directory looking for files that end in .rdf, excluding the template RDF file pg0.rdf.

As the program reads each file's content, it runs that content through the RDF parser. For output, it produces JSON-serialized action and document lines suitable for Elasticsearch's bulk API.

Run the program, and let’s see what it produces.

 $ node rdf-to-bulk.js ../data/cache/epub/ | head

If all went well, you should see 10 lines consisting of interleaved actions and documents—like the following, which has been truncated to fit on the page.

 {"index":{"_id":"pg1"}}
 {"id":1,"title":"The Declaration of Independence of the United States of Ame...
 {"index":{"_id":"pg10"}}
 {"id":10,"title":"The King James Version of the Bible","authors":[],"subject...
 {"index":{"_id":"pg100"}}
 {"id":100,"title":"The Complete Works of William Shakespeare","authors":["Sh...
 {"index":{"_id":"pg1000"}}
 {"id":1000,"title":"La Divina Commedia di Dante: Complete","authors":["Dante...
 {"index":{"_id":"pg10000"}}
 {"id":10000,"title":"The Magna Carta","authors":["Anonymous"],"subjects":["M...

Because the head command closes the pipe after echoing the first few lines, Node.js may sometimes throw an exception the next time the program writes to standard output, sending the following to the standard error stream:

 events.js:160
  throw er; // Unhandled 'error' event
  ^
 
 Error: write EPIPE
  at exports._errnoException (util.js:1022:11)
  at WriteWrap.afterWrite [as oncomplete] (net.js:804:14)

To mitigate this error, you can capture error events on the process.stdout stream. Try adding the following line to rdf-to-bulk.js and rerunning it.

 process.stdout.on('error', err => process.exit());

Now, when head closes the pipe, the next attempt to use console.log will trigger the error event listener and the process will exit silently. If you’re worried about output errors other than EPIPE, you can check the err object’s code property and take action as appropriate.

 process.stdout.on('error', err => {
   if (err.code === 'EPIPE') {
     process.exit();
   }
   throw err;  // Or take any other appropriate action.
 });

At this point we’re ready to let rdf-to-bulk.js run for real. Use the following command to capture this LDJ output in a new file called bulk_pg.ldj.

 $ node rdf-to-bulk.js ../data/cache/epub/ > ../data/bulk_pg.ldj

This will run for quite a while, as rdf-to-bulk.js traverses the epub directory, parses each file, and tacks on the Elasticsearch action for it. When it’s finished, the bulk_pg.ldj file should be about 11 MB.
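
If you'd like a quick sanity check before moving on, the file should contain an even number of lines—one action line plus one document line per book—though the exact count and size will depend on how current your catalog download is:

 $ wc -l ../data/bulk_pg.ldj
 $ ls -lh ../data/bulk_pg.ldj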