Node.js 8 the Right Way

Shaping JSON with jq

jq is a command-line program for querying and manipulating JSON objects. Technically speaking, it’s not a Node.js thing, but it’s so useful for working with JSON (and JSON is so prevalent in Node.js development) that it’s absolutely worth knowing. You’ll be a better developer knowing how to use jq to explore and manipulate your JSON data.

You can find instructions for downloading and installing jq on the jq website,^[56] or use the package manager of your choice. Once you have it installed, you should be able to see the version number on the command line if you run it with the -V option.

	$ jq -V
	jq-1.5-1-a5b5cbe

The examples in this book assume you’re using version 1.5.x.

jq reads JSON from standard input and operates on it according to a query string you provide. This string uses jq’s custom domain-specific language for articulating transformations (more on this in a bit).

The simplest query is the string containing a single dot (.), which means output the object as is. To try it out, pipe the output from your esclu command into jq with that string argument.

	$ ./esclu li -j | jq '.'
	{
	  "books": {
	    "aliases": {},
	    "mappings": {},
	    "settings": {
	      "index": {
	        "creation_date": "1484650920414",
	        "number_of_shards": "5",
	        "number_of_replicas": "1",
	        "uuid": "3t4pwCBmTwyVKMe_0j26kg",
	        "version": {
	          "created": "5010199"
	        },
	        "provided_name": "books"
	      }
	    }
	  }
	}

Already it’s looking better, because by default jq will format its output using pretty indentation. Now let’s try another jq function, keys, which extracts the keys of an object as an array.

	$ ./esclu li -j | jq 'keys'
	[
	  "books"
	]
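
For readers who think in JavaScript, the pretty-printing performed by the '.' query can be approximated in Node.js with JSON.parse and JSON.stringify. This is only a sketch of the idea, not a description of how jq works internally, and the raw string is a trimmed, hypothetical stand-in for the esclu output.

```javascript
// Compact JSON as a program might emit it (a trimmed, hypothetical stand-in
// for the esclu output; only two of the real keys are kept).
const raw = '{"books":{"aliases":{},"mappings":{}}}';

// Parse the text, then serialize it again with two-space indentation,
// which is the indentation width jq uses by default.
const pretty = JSON.stringify(JSON.parse(raw), null, 2);

console.log(pretty);
```

The third argument to JSON.stringify sets the indentation width; leaving it out would produce the compact single-line form again.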

You may have noticed that while the JSON and non-JSON outputs of the list-indices command both contain interesting information, they're not quite the same. For example, two interesting fields from the non-JSON output are docs.count (the number of documents) and store.size (the number of bytes on disk used by this index).

We can get the same information in JSON form, but we have to go to Elasticsearch’s _stats API endpoint to get it. Be warned, though: _stats provides a lot of information that we’ll have to trawl through to find what we need.

To start, let’s take a peek at the first few lines of _stats output after nicely formatting it with jq. For the purpose of this book, these examples use the program head to show only the first N lines of output. In your own terminal, you could use an interactive paging program like less instead.

	$ ./esclu get _stats | jq '.' | head -n 20
	{
	  "_shards": {
	    "total": 10,
	    "successful": 5,
	    "failed": 0
	  },
	  "_all": {
	    "primaries": {
	      "docs": {
	        "count": 0,
	        "deleted": 0
	      },
	      "store": {
	        "size_in_bytes": 650,
	        "throttle_time_in_millis": 0
	      },
	      "indexing": {
	        "index_total": 0,
	        "index_time_in_millis": 0,
	        "index_current": 0,

OK, from this output we can tell a couple of things. First, the return value of _stats is an object with at least two keys: _shards and _all. In Elasticsearch, leading underscores are reserved, and in particular _all usually means all indices.

We can also see that under _all, the path primaries.docs.count is a number (currently zero, since we haven’t inserted any documents), and primaries.store.size_in_bytes is 650.

To see what else is in this object, let’s use the keys function again.^[57] Much like JavaScript’s Object.keys, it returns an array containing the keys of an object.

	$ ./esclu get _stats | jq 'keys'
	[
	  "_all",
	  "_shards",
	  "indices"
	]
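
One difference from Object.keys is worth seeing in code: jq's keys returns the keys in sorted order, whereas Object.keys follows insertion order for string keys like these. The sketch below uses a hypothetical stand-in object with empty placeholder values, since only the key names matter here.

```javascript
// Hypothetical stand-in for the top level of the _stats response. The empty
// objects are placeholders, and the keys are deliberately out of order.
const stats = { _shards: {}, indices: {}, _all: {} };

// Object.keys follows insertion order for string keys like these...
const insertionOrder = Object.keys(stats);

// ...whereas jq's keys returns them sorted, so sort to mimic it.
const jqLike = Object.keys(stats).sort();

console.log(insertionOrder); // [ '_shards', 'indices', '_all' ]
console.log(jqLike);         // [ '_all', '_shards', 'indices' ]
```

For plain ASCII key names such as these, the default JavaScript sort gives the same ordering as the jq output; other key names may sort differently, so treat this as an approximation.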

In addition to _all and _shards, there is also indices. We can take a look at that by using a jq filter,^[58] which is a string that describes a path into an object. The filter .indices will return the value for the key indices. We’ll have to use head again to truncate the output.

	$ ./esclu get _stats | jq '.indices' | head -n 20
	{
	  "books": {
	    "primaries": {
	      "docs": {
	        "count": 0,
	        "deleted": 0
	      },
	      "store": {
	        "size_in_bytes": 650,
	        "throttle_time_in_millis": 0
	      },
	      "indexing": {
	        "index_total": 0,
	        "index_time_in_millis": 0,
	        "index_current": 0,
	        "index_failed": 0,
	        "delete_total": 0,
	        "delete_time_in_millis": 0,
	        "delete_current": 0,
	        "noop_update_total": 0,

The keys of the indices object are the names of the indices we’ve created. So far, the only key is books. Under each index, the structure looks roughly the same as the _all object we inspected earlier.
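
A path filter such as .indices behaves a lot like a chain of property lookups in JavaScript. As a rough sketch, the helper below walks a dotted path one key at a time. Note that getPath is a hypothetical function written for this illustration (it is not part of jq or Node.js), and the sample object is trimmed down from the _stats output.

```javascript
// Trimmed sample modeled on the _stats output; most fields are omitted.
const stats = {
  indices: {
    books: {
      primaries: {
        docs: { count: 0, deleted: 0 },
        store: { size_in_bytes: 650, throttle_time_in_millis: 0 }
      }
    }
  }
};

// Hypothetical helper: follow a dotted path such as 'indices.books' one key
// at a time. A missing key yields null, much as jq's .foo does for an
// object that lacks that key.
function getPath(obj, path) {
  return path.split('.').reduce((value, key) => {
    const isObject = value !== null && typeof value === 'object';
    const hasKey = isObject && Object.prototype.hasOwnProperty.call(value, key);
    return hasKey ? value[key] : null;
  }, obj);
}

console.log(getPath(stats, 'indices.books.primaries.docs.count')); // 0
console.log(getPath(stats, 'indices.movies'));                     // null
```

The sketch is more forgiving than jq in one respect: it returns null when a path runs into a number or a string, whereas jq reports an error if you try to index such a value with a key.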

Using jq, we can combine filters and functions by piping one expression into another using the pipe operator (|). For example, we can see what keys the books object has by piping the output of the .indices.books filter into the keys function. Try this:

	$ ./esclu get _stats | jq '.indices.books | keys'
	[
	  "primaries",
	  "total"
	]
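
If it helps, you can think of jq's pipe as function composition: each expression is a function, and the pipe feeds the output of one into the next. Here is a minimal JavaScript sketch of that idea. The pipe helper and the function names are hypothetical, and the data is a placeholder with empty objects, since only the key names matter.

```javascript
// Placeholder data: only the shape matters, so the nested objects are empty.
const stats = { indices: { books: { total: {}, primaries: {} } } };

// Each jq expression becomes a function from one value to the next.
const indicesBooks = (input) => input.indices.books; // .indices.books
const keys = (input) => Object.keys(input).sort();   // keys (jq sorts them)

// Hypothetical helper: pass a value through each function in turn, left to
// right, the way jq's | operator chains expressions.
const pipe = (...fns) => (input) => fns.reduce((value, fn) => fn(value), input);

// Rough equivalent of the jq program '.indices.books | keys'.
const booksKeys = pipe(indicesBooks, keys);

console.log(booksKeys(stats)); // [ 'primaries', 'total' ]
```

This models only the simple case where every expression produces exactly one value. jq expressions can also produce several outputs or none, which this sketch doesn't attempt to capture.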

With jq, you can also compose new objects using filters and functions. For example, we could create a custom JSON report containing the total number of all documents in Elasticsearch and the total size in bytes for those documents.

	$ ./esclu get _stats | \
	  jq '._all.primaries | { count: .docs.count, size: .store.size_in_bytes }'
	{
	  "count": 0,
	  "size": 650
	}

The expression here tells jq to start by applying the filter ._all.primaries. The resulting object is piped into an object constructor, which is a set of curly braces wrapping the desired content. In this case, we want to construct an object with a count key containing the value under .docs.count, and a size key with the value under .store.size_in_bytes.
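
For comparison, the same report could be built in JavaScript with an ordinary object literal, which is roughly what jq's object constructor corresponds to. The sketch assumes a stats object trimmed down from the _stats output; the variable names are made up for this example.

```javascript
// Trimmed sample modeled on the _all portion of the _stats output.
const stats = {
  _all: {
    primaries: {
      docs: { count: 0, deleted: 0 },
      store: { size_in_bytes: 650, throttle_time_in_millis: 0 }
    }
  }
};

// Equivalent of the ._all.primaries filter.
const primaries = stats._all.primaries;

// Equivalent of { count: .docs.count, size: .store.size_in_bytes }.
const report = {
  count: primaries.docs.count,
  size: primaries.store.size_in_bytes
};

console.log(JSON.stringify(report, null, 2));
```

The jq version does this in one line at the terminal, which is why it is so handy for quick exploration before you commit to writing any code.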

You can do more than this, of course, but that’s enough jq for now. Keep this tool in mind as you deal with JSON data in the future—it makes it easy to poke around, and the jq manual does a superb job of explaining the tool’s features.^[59]
