Elasticsearch is a distributed, document-oriented, NoSQL database. In recent years, Elasticsearch has been gaining in popularity relative to other JSON-based document datastores, and for good reason. Built on Apache Lucene,[47] Elasticsearch provides a rich suite of querying capabilities, including full-text search, stemming, and fuzzy search. With Elasticsearch you can also execute a variety of aggregation queries, apply filters, and perform numeric comparisons.
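To make those capabilities a little more concrete, here is a minimal sketch of a query body combining a fuzzy full-text match, a numeric range filter, and a terms aggregation. The field names title, downloads, and subjects are assumptions about how book documents might be mapped, not a schema from this chapter, and the function name buildBookSearch is hypothetical.

```javascript
'use strict';

// Builds an Elasticsearch query body (Query DSL) as a plain object.
// Assumed fields: title (text), downloads (hypothetical numeric field),
// subjects (text, with a .keyword subfield from default dynamic mapping).
function buildBookSearch(terms, minDownloads) {
  return {
    query: {
      bool: {
        // Full-text match on the title; fuzziness 'AUTO' tolerates small typos.
        must: {
          match: { title: { query: terms, fuzziness: 'AUTO' } },
        },
        // Numeric comparison applied as a filter, which does not affect scoring.
        filter: {
          range: { downloads: { gte: minDownloads } },
        },
      },
    },
    // Aggregation: count matching documents per exact subject value.
    aggs: {
      top_subjects: { terms: { field: 'subjects.keyword' } },
    },
  };
}

const body = buildBookSearch('sherlock holmes', 1000);
console.log(JSON.stringify(body, null, 2));
```

A body like this would be sent to an index's _search endpoint; the filter clause is used for the numeric comparison because exact yes/no conditions do not need relevance scoring.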
No one tool is best for all jobs, of course, and Elasticsearch is no exception. But given that our Project Gutenberg documents are textual in nature—including titles of books, author names, and subject strings—Elasticsearch is a natural fit. Once the documents are stored in Elasticsearch, we’ll be able to build our own RESTful APIs on top of it, starting in the next chapter.
The scalability and reliability of Elasticsearch come from its clustered architecture. By splitting each index into shards and replicating those shards across nodes, an Elasticsearch cluster can survive the loss of individual nodes and can often execute a query against several shards in parallel. Properly configuring and tuning an Elasticsearch cluster is a huge topic, well outside the scope of this book. Fortunately, the default configuration settings are sufficient for our exploratory use case.
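The arithmetic behind sharding and replication is simple enough to sketch. The snippet below builds the settings object you might send when creating an index and counts how many shard copies the cluster must place. The defaults of 5 primary shards and 1 replica match Elasticsearch 5.x; the function names are hypothetical.

```javascript
'use strict';

// Settings body for index creation. 5 primaries and 1 replica are the
// Elasticsearch 5.x defaults.
function indexSettings(numberOfShards = 5, numberOfReplicas = 1) {
  return {
    settings: {
      number_of_shards: numberOfShards,
      number_of_replicas: numberOfReplicas,
    },
  };
}

// Each primary shard gets `number_of_replicas` extra copies,
// so the total is primaries * (1 + replicas).
function totalShardCopies({ settings }) {
  return settings.number_of_shards * (1 + settings.number_of_replicas);
}

const defaults = indexSettings();
console.log(totalShardCopies(defaults)); // 10
```

Elasticsearch never places a replica on the same node as its primary, so on a single-node development setup like ours the replicas stay unassigned and the cluster reports yellow health. That is expected and harmless for exploration.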
As for interacting with Elasticsearch, it’s all about making proper HTTP requests. Doing this will give us the opportunity to talk about HTTP and RESTful practices—information that will be handy in Chapter 7, Developing RESTful Web Services, when we’ll implement our own RESTful web services on top of Elasticsearch. And the techniques you’ll use here apply to any other RESTful APIs you use with Node.js, as well.
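One RESTful idea worth seeing early is that each document is a resource with its own path, and the HTTP verb says what to do with it. The following sketch maps a few document operations onto verbs and paths in the Elasticsearch 5.x style, where a document lives at /{index}/{type}/{id}. The action labels and the names books, book, and pg132 are hypothetical, chosen only for illustration.

```javascript
'use strict';

// Hypothetical action labels mapped to the HTTP verbs Elasticsearch uses.
const VERBS = {
  put: 'PUT',       // index (create or replace) the document
  get: 'GET',       // retrieve the document
  exists: 'HEAD',   // check existence without fetching the body
  remove: 'DELETE', // delete the document
};

// Returns the method and path for an operation on a single document.
function documentRoute(action, index, type, id) {
  const method = VERBS[action];
  if (!method) {
    throw new Error(`Unknown action: ${action}`);
  }
  const path = [index, type, id].map(encodeURIComponent).join('/');
  return { method, path: `/${path}` };
}

// Searching targets a collection endpoint rather than a single document.
function searchRoute(index) {
  return { method: 'POST', path: `/${encodeURIComponent(index)}/_search` };
}

console.log(documentRoute('get', 'books', 'book', 'pg132'));
// { method: 'GET', path: '/books/book/pg132' }
```

Keeping the verb separate from the path is the same separation we'll rely on when designing our own web services.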
Of course, to do anything with Elasticsearch, you’ll need to install it. Let’s do that now.
Elasticsearch is built on Java 8, which means you’ll need to install a Java Runtime Environment if you haven’t already. For production use, Elastic recommends using Oracle’s Java Development Kit (JDK) version 1.8.0_73 or higher. Instructions on how to install Java 8 are available on Oracle’s website.[48]
You can run java -version from the command line to confirm that Java is installed and ready.
```
$ java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
```
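If you'd rather check the version programmatically, here is a minimal sketch that parses the first line of java -version output and compares it against the 1.8.0_73 minimum recommended above. It only handles the legacy 1.x.y_zz version format used by Java 8; the function names are hypothetical.

```javascript
'use strict';

// Extracts { major, minor, update } from output such as
// 'openjdk version "1.8.0_91"'. Returns null if the format is not recognized.
function parseJavaVersion(output) {
  const match = /version "1\.(\d+)\.(\d+)_(\d+)"/.exec(output);
  if (!match) {
    return null;
  }
  const [, major, minor, update] = match.map(Number);
  return { major, minor, update };
}

// True if `version` is at least `min` (default: Java 8 update 73).
function meetsMinimum(version, min = { major: 8, minor: 0, update: 73 }) {
  if (!version) return false;
  if (version.major !== min.major) return version.major > min.major;
  if (version.minor !== min.minor) return version.minor > min.minor;
  return version.update >= min.update;
}

const v = parseJavaVersion('openjdk version "1.8.0_91"');
console.log(v, meetsMinimum(v));
```

Note that java -version writes its output to standard error, so if you capture it with child_process.execFile, read the stderr argument of the callback rather than stdout.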
Once you have Java installed, it’s time to download and install Elasticsearch.
We’ll be using version 5.2 of Elasticsearch, available from Elastic’s download page.[49] Once you download the archive, unzip it and run bin/elasticsearch from the command line. You should see a lot of output containing something like the following (much of the output is omitted here for brevity).
```
$ bin/elasticsearch
[INFO ][o.e.n.Node ] [] initializing ...
... many lines omitted ...
[INFO ][o.e.h.HttpServer ] [kAh7Q7Z] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node ] [kAh7Q7Z] started
[INFO ][o.e.g.GatewayService ] [kAh7Q7Z] recovered [0] indices into cluster_state
```
Notice the publish_address and bound_addresses listed toward the end of the output. By default, Elasticsearch binds TCP port 9200 for its HTTP endpoint.
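A quick way to confirm that the HTTP endpoint is really listening is to request the root path. A node answers GET / with a JSON document that includes its name, the cluster name, and a version object. The sketch below uses only Node's built-in http module; the function names checkNode and summarizeRoot are hypothetical, and localhost:9200 assumes the default binding shown in the log.

```javascript
'use strict';

const http = require('http');

// Pulls the interesting fields out of the JSON returned by GET /.
function summarizeRoot(json) {
  const info = JSON.parse(json);
  return {
    nodeName: info.name,
    clusterName: info.cluster_name,
    version: info.version && info.version.number,
  };
}

// Requests the root document and passes (err, summary) to the callback.
function checkNode(callback) {
  http.get('http://localhost:9200/', res => {
    let data = '';
    res.setEncoding('utf8');
    res.on('data', chunk => { data += chunk; });
    res.on('end', () => {
      let summary;
      try {
        summary = summarizeRoot(data);
      } catch (err) {
        return callback(err);
      }
      callback(null, summary);
    });
  }).on('error', callback); // e.g. ECONNREFUSED if nothing is listening
}
```

With the server from the previous step running, calling checkNode with a Node-style callback should report the node name, the cluster name, and a version number beginning with 5.2.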
Many settings can be specified when you set up an Elasticsearch cluster. We haven’t specified any here, so the node binds only to loopback addresses and runs in development mode. A full discussion of the Elasticsearch cluster settings is outside the scope of this book, but you can read about them on Elastic’s Important System Configuration page.[50]
With Elasticsearch running, we can now implement a command-line utility program for it.