Now that the documents are in the index, we can start querying for them. First we’ll take a look around using the existing get command, and then we’ll implement a specific command just for querying.
The Elasticsearch API endpoint we want to hit is /_search. We can already hit this endpoint using the get command. Let’s try that now.
| | $ ./esclu get '_search' | jq '.' | head -n 20 |
| | { |
| | "took": 3, |
| | "timed_out": false, |
| | "_shards": { |
| | "total": 5, |
| | "successful": 5, |
| | "failed": 0 |
| | }, |
| | "hits": { |
| | "total": 53212, |
| | "max_score": 1, |
| | "hits": [ |
| | { |
| | "_index": "books", |
| | "_type": "book", |
| | "_id": "pg100", |
| | "_score": 1, |
| | "_source": { |
| | "id": 100, |
| | "title": "The Complete Works of William Shakespeare", |
Peeking at the head of the JSON response gives a good idea of what’s in it. As with the bulk API response, we see a took field, which indicates how long the request took to execute in milliseconds.
The results of the query are in the hits object, which contains three fields: total, max_score, and hits. The total field shows that all documents matched the query (we’ll perform more specific queries in a bit). The max_score field indicates the score value of the highest-scoring match. And the internal hits key points to an array of individual results.
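To make the shape of the response concrete, here is a minimal Node.js sketch that pulls these fields out of a parsed response. The response object is a hand-trimmed stand-in for the output above, and summarizeHits is a hypothetical helper (not part of esclu). It assumes total is a plain number, as it is in the responses shown in this chapter.

```javascript
'use strict';

// Hypothetical, trimmed-down response shaped like the output above.
const response = {
  took: 3,
  timed_out: false,
  hits: {
    total: 53212,
    max_score: 1,
    hits: [
      {
        _index: 'books',
        _type: 'book',
        _id: 'pg100',
        _score: 1,
        _source: {id: 100, title: 'The Complete Works of William Shakespeare'},
      },
    ],
  },
};

// summarizeHits is an illustrative helper, not part of esclu.
const summarizeHits = ({took, hits}) => ({
  tookMs: took,                        // request duration in milliseconds
  total: hits.total,                   // number of documents that matched
  maxScore: hits.max_score,            // score of the best match
  returned: hits.hits.length,          // results actually in this response
  ids: hits.hits.map(hit => hit._id),  // IDs from the inner hits array
});

console.log(summarizeHits(response));
```

Note that total and returned are different things: total counts every match, while the inner hits array holds only the results included in this response.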
Note that by default the _search API will return only the top 10 results. This can be increased by specifying the size URL parameter.
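As a small sketch of what specifying size looks like, the following builds a search URL with Node.js's built-in URL class. The host, port, and index in the base address are assumptions for illustration; use whatever your own setup points to.

```javascript
'use strict';
const {URL} = require('url');

// Assumed base address; adjust to wherever your Elasticsearch listens.
const searchUrl = new URL('http://localhost:9200/books/_search');

// Ask for the top 25 results instead of the default 10.
searchUrl.searchParams.set('size', '25');

console.log(searchUrl.href);
```

With esclu, the same idea should amount to appending ?size=25 to the path you pass to the get command.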
To dig into the results further, let’s again use the handy command-line tool jq.
Using jq, we can dig down into the hits result from a query and see that the _source object is the original document. Try out the following command, which contains a fairly deep filter expression.
| | $ ./esclu get '_search' | jq '.hits.hits[]._source' | head -n 20 |
| | { |
| | "id": 100, |
| | "title": "The Complete Works of William Shakespeare", |
| | "authors": [ |
| | "Shakespeare, William" |
| | ], |
| | "subjects": [ |
| | "English drama -- Early modern and Elizabethan, 1500-1600" |
| | ] |
| | } |
| | { |
| | "id": 1000, |
| | "title": "La Divina Commedia di Dante: Complete", |
| | "authors": [ |
| | "Dante Alighieri" |
| | ], |
| | "subjects": [] |
| | } |
| | { |
| | "id": 10000, |
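Here, the jq expression .hits.hits[]._source is a compact way of describing the following steps: start with the top-level response object, take its hits property, take that object's own hits array, iterate over each element of the array, and extract each element's _source property.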
The output is a stream of JSON objects, one after the other. To get a true JSON array instead, we can wrap the whole jq expression in brackets.
| | $ ./esclu get '_search' | jq '[ .hits.hits[]._source ]' | head -n 20 |
| | [ |
| | { |
| | "id": 100, |
| | "title": "The Complete Works of William Shakespeare", |
| | "authors": [ |
| | "Shakespeare, William" |
| | ], |
| | "subjects": [ |
| | "English drama -- Early modern and Elizabethan, 1500-1600" |
| | ] |
| | }, |
| | { |
| | "id": 1000, |
| | "title": "La Divina Commedia di Dante: Complete", |
| | "authors": [ |
| | "Dante Alighieri" |
| | ], |
| | "subjects": [] |
| | }, |
| | { |
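If you find it easier to think in JavaScript than in jq, the two filter expressions above correspond roughly to the following sketch. The response object here is a hypothetical, trimmed-down version of the real output.

```javascript
'use strict';

// Hypothetical parsed response with two hits, trimmed from the output above.
const response = {
  hits: {
    hits: [
      {
        _id: 'pg100',
        _source: {id: 100, title: 'The Complete Works of William Shakespeare'},
      },
      {
        _id: 'pg1000',
        _source: {id: 1000, title: 'La Divina Commedia di Dante: Complete'},
      },
    ],
  },
};

// Like jq '[ .hits.hits[]._source ]': a single array of source documents.
const sources = response.hits.hits.map(hit => hit._source);

// Like jq '.hits.hits[]._source': emit the documents one after another.
for (const source of sources) {
  console.log(JSON.stringify(source, null, 2));
}
```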
Through the _search API, if you pass a query parameter, q, Elasticsearch will use its value to find documents. For example, say we were interested in books by Mark Twain. We could search for documents whose authors array contains the term Twain using the query expression authors:Twain, like this:
| | $ ./esclu get '_search/?q=authors:Twain' | jq '.' | head -n 30 |
| | { |
| | "took": 3, |
| | "timed_out": false, |
| | "_shards": { |
| | "total": 5, |
| | "successful": 5, |
| | "failed": 0 |
| | }, |
| | "hits": { |
| | "total": 229, |
| | "max_score": 6.302847, |
| | "hits": [ |
| | { |
| | "_index": "books", |
| | "_type": "book", |
| | "_id": "pg1837", |
| | "_score": 6.302847, |
| | "_source": { |
| | "id": 1837, |
| | "title": "The Prince and the Pauper", |
| | "authors": [ |
| | "Twain, Mark" |
| | ], |
| | "subjects": [ |
| | "London (England) -- Fiction", |
| | "Historical fiction", |
| | "Boys -- Fiction", |
| | "Poor children -- Fiction", |
| | "Social classes -- Fiction", |
| | "Impostors and imposture -- Fiction", |
Elasticsearch’s query string syntax is a DSL with many useful features like wildcards, Boolean AND/OR operators, negation, and even regular expressions. We’ll explore some of these in future chapters, but a full treatment is outside the scope of this book. You can read about these options on Elasticsearch’s query string syntax page.[61]
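To give a flavor of these features, here is a sketch of a few query expressions and how each could be URL-encoded into a _search path. The operators come from the query string syntax; the field names are the ones in our book documents. The specific search terms and the toSearchPath helper are made up for illustration, and no results are shown since they depend on your index.

```javascript
'use strict';

// Illustrative query expressions (the search terms are arbitrary examples).
const examples = [
  'authors:Twain AND subjects:children',    // Boolean AND
  'authors:Twain OR authors:Dickens',       // Boolean OR
  'authors:Twain AND NOT subjects:fiction', // negation
  'title:adventur*',                        // wildcard
  'title:/huck.*/',                         // regular expression
];

// toSearchPath is a hypothetical helper: it encodes spaces and other
// special characters so the expression is safe to place in a URL.
const toSearchPath = query => `_search?q=${encodeURIComponent(query)}`;

for (const query of examples) {
  console.log(toSearchPath(query));
}
```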
Sometimes when you query Elasticsearch, you may not be interested in retrieving the whole source document for each match. Continuing the previous example, say we wanted to find just the title of each book by Mark Twain. For this, Elasticsearch supports another DSL for specifying a source filter.[62] Here we could use the source filter expression _source=title.
| | $ ./esclu get '_search?q=authors:Twain&_source=title' | jq '.' | head -n 30 |
| | { |
| | "took": 2, |
| | "timed_out": false, |
| | "_shards": { |
| | "total": 5, |
| | "successful": 5, |
| | "failed": 0 |
| | }, |
| | "hits": { |
| | "total": 229, |
| | "max_score": 6.302847, |
| | "hits": [ |
| | { |
| | "_index": "books", |
| | "_type": "book", |
| | "_id": "pg1837", |
| | "_score": 6.302847, |
| | "_source": { |
| | "title": "The Prince and the Pauper" |
| | } |
| | }, |
| | { |
| | "_index": "books", |
| | "_type": "book", |
| | "_id": "pg19987", |
| | "_score": 6.302847, |
| | "_source": { |
| | "title": "Chapters from My Autobiography" |
| | } |
| | }, |
Now the _source objects contain only the title key. As you might imagine, this again pairs well with jq, which lets you extract just those specific strings.
(Note the trailing backslash to indicate a continuing line.)
| | $ ./esclu get '_search?q=authors:Twain&_source=title' | \ |
| | jq '.hits.hits[]._source.title' |
| | "The Prince and the Pauper" |
| | "Chapters from My Autobiography" |
| | "The Awful German Language" |
| | "Personal Recollections of Joan of Arc — Volume 1" |
| | "Personal Recollections of Joan of Arc — Volume 2" |
| | "In Defence of Harriet Shelley" |
| | "The Innocents Abroad" |
| | "The Mysterious Stranger, and Other Stories" |
| | "The Curious Republic of Gondour, and Other Whimsical Sketches" |
| | "Jenkkejä maailmalla I\nHeidän toivioretkensä Pyhälle Maalle" |
Now that we’ve covered a bit of what you can do with the _search API, let’s add a final command to esclu to issue these kinds of queries.
Using your understanding of Elasticsearch’s _search API endpoint, it’s time to add one final command to esclu before closing out the chapter. The command will be called query, with the alias q for short. It will take any number of optional query parts, so the user won’t have to wrap the query in quotes.
Here’s an example usage of the q command that we’ll be adding:
| | $ ./esclu q authors:Twain AND subjects:children |
As before, we’ll want it to understand the --index flag to limit the query to a particular index. But additionally, it would be nice if we could specify an optional source filter expression to limit the output documents.
Begin by adding the --filter option alongside the other options in the initial program setup stanza toward the top of the file.
| | program |
| | // Other options... |
| | .option('-f, --filter <filter>', 'source filter for query results'); |
Finally, add this new query command just before the program.parse line at the bottom.
| | program |
| | .command('query [queries...]') |
| | .alias('q') |
| | .description('perform an Elasticsearch query') |
| | .action((queries = []) => { |
| | const options = { |
| | url: fullUrl('_search'), |
| | json: program.json, |
| | qs: {}, |
| | }; |
| | |
| | if (queries && queries.length) { |
| | options.qs.q = queries.join(' '); |
| | } |
| | |
| | if (program.filter) { |
| | options.qs._source = program.filter; |
| | } |
| | |
| | request(options, handleResponse); |
| | }); |
This code sets up a query command in much the same fashion as other commands you’ve created while going through this chapter. The parameter declaration [queries...] tells Commander that we expect any number of arguments to this command (even zero).
Inside the action callback, virtually all of the work focuses on building out the query string for the URL by adding properties to options.qs. request will take the properties of options.qs and encode them into query string parameters to append to the URL.
If the user provided any query parts on the command line (the arguments following q or query), then we concatenate them with spaces and assign the result to the q parameter. For example, if the user entered esclu q Mark Twain, then the q parameter would become the string "Mark Twain". When request encodes options.qs, this would become ?q=Mark%20Twain, appended to the end of the URL. Elasticsearch would then use this q parameter to search for matching documents. Likewise, if the user provided a filter on the command line with the -f flag, we would pass it along as the _source parameter in the query string.
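To see this logic in isolation, here is a sketch that lifts the body of the action callback into a standalone function. The name buildQs is hypothetical; in esclu the same steps happen inline, reading the filter from program.filter.

```javascript
'use strict';

// buildQs is a hypothetical helper mirroring the body of the action callback.
const buildQs = (queries = [], filter) => {
  const qs = {};

  // Join the individual query parts with spaces, if there are any.
  if (queries.length) {
    qs.q = queries.join(' ');
  }

  // Pass the source filter through as the _source parameter.
  if (filter) {
    qs._source = filter;
  }

  return qs;
};

console.log(buildQs(['Mark', 'Twain']));          // query parts only
console.log(buildQs(['Mark', 'Twain'], 'title')); // query parts plus a filter
console.log(buildQs());                           // empty query, no parameters
```

When there are no query parts and no filter, the object stays empty, so no query string parameters are added at all.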
Now, continuing with the previous example, if the user only wanted to get back the title field of matching documents, they could add -f title to the command line. request would then encode both options together as ?q=Mark%20Twain&_source=title.
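You can reproduce this encoding without making a request. The sketch below uses Node.js's built-in querystring module as a stand-in for whatever request does internally; that substitution is an assumption, but for this input it yields the same string.

```javascript
'use strict';
const querystring = require('querystring');

// The same parameters the query command would place in options.qs.
const qs = {q: 'Mark Twain', _source: 'title'};

// Spaces become %20, and the key/value pairs are joined with &.
const encoded = querystring.stringify(qs);

console.log(`_search?${encoded}`);
```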
After you save the file, give the new query command a try. The simplest query is the empty query, which matches all documents.
| | $ ./esclu q | jq '.' | head -n 30 |
| | { |
| | "took": 4, |
| | "timed_out": false, |
| | "_shards": { |
| | "total": 5, |
| | "successful": 5, |
| | "failed": 0 |
| | }, |
| | "hits": { |
| | "total": 53212, |
| | "max_score": 1, |
| | "hits": [ |
| | { |
| | "_index": "books", |
| | "_type": "book", |
| | "_id": "pg100", |
| | "_score": 1, |
| | "_source": { |
| | "id": 100, |
| | "title": "The Complete Works of William Shakespeare", |
| | "authors": [ |
| | "Shakespeare, William" |
| | ], |
| | "subjects": [ |
| | "English drama -- Early modern and Elizabethan, 1500-1600" |
| | ] |
| | } |
| | }, |
| | { |
| | "_index": "books", |
To abbreviate output, we can focus on just the title and authors fields.
| | $ ./esclu q -f title,authors | jq '.' | head -n 30 |
| | { |
| | "took": 5, |
| | "timed_out": false, |
| | "_shards": { |
| | "total": 5, |
| | "successful": 5, |
| | "failed": 0 |
| | }, |
| | "hits": { |
| | "total": 53212, |
| | "max_score": 1, |
| | "hits": [ |
| | { |
| | "_index": "books", |
| | "_type": "book", |
| | "_id": "pg100", |
| | "_score": 1, |
| | "_source": { |
| | "title": "The Complete Works of William Shakespeare", |
| | "authors": [ |
| | "Shakespeare, William" |
| | ] |
| | } |
| | }, |
| | { |
| | "_index": "books", |
| | "_type": "book", |
| | "_id": "pg1000", |
| | "_score": 1, |
| | "_source": { |
Now using jq we can focus on just the source objects.
| | $ ./esclu q -f title,authors | jq '.hits.hits[]._source' | head -n 30 |
| | { |
| | "title": "The Complete Works of William Shakespeare", |
| | "authors": [ |
| | "Shakespeare, William" |
| | ] |
| | } |
| | { |
| | "title": "La Divina Commedia di Dante: Complete", |
| | "authors": [ |
| | "Dante Alighieri" |
| | ] |
| | } |
| | { |
| | "title": "The Magna Carta", |
| | "authors": [ |
| | "Anonymous" |
| | ] |
| | } |
| | { |
| | "title": "My First Years as a Frenchwoman, 1876-1879", |
| | "authors": [ |
| | "Waddington, Mary King" |
| | ] |
| | } |
| | { |
| | "title": "A Voyage to the Moon\r\nWith Some Account of the Manners and ... |
| | "authors": [ |
| | "Tucker, George" |
| | ] |
| | } |
Taking advantage of the joining of query parts, you can specify a complex query without wrapping it in quotes.
| | $ ./esclu q authors:Shakespeare AND subjects:Drama -f title |\ |
| | jq '.hits.hits[]._source.title' |
| | "The Tragedy of Othello, Moor of Venice" |
| | "The Tragedy of Romeo and Juliet" |
| | "The Tempest" |
| | "The Comedy of Errors" |
| | "Othello" |
| | "As You Like It" |
| | "The Two Gentlemen of Verona" |
| | "The Merchant of Venice" |
| | "Two Gentlemen of Verona" |
| | "All's Well That Ends Well" |
Be aware that if your query contains any characters that your shell treats as special, you should wrap the whole query in quotes. For example, to do a multiword query with Elasticsearch you can wrap the expression in double quotes (q title:"United States"). But your shell may strip out these quotes unless they’re wrapped in another set of quotes (q 'title:"United States"').
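The difference is easier to see by looking at the arguments the program actually receives. This sketch assumes a POSIX-style shell such as bash and simulates the two argument lists by hand; toQuery is a hypothetical helper that joins the parts the same way the query command does.

```javascript
'use strict';

// What the shell passes for: ./esclu q title:"United States"
// The double quotes group the words but are removed by the shell.
const bare = ['title:United States'];

// What the shell passes for: ./esclu q 'title:"United States"'
// The outer single quotes preserve the inner double quotes.
const quoted = ['title:"United States"'];

// Join the parts with spaces, as the query command does.
const toQuery = parts => parts.join(' ');

console.log(toQuery(bare));   // the phrase quotes are gone
console.log(toQuery(quoted)); // the phrase quotes survive
```

Only the second form reaches Elasticsearch with the double quotes intact, so only that one is treated as a single multiword expression.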