An Elasticsearch Tutorial: Getting Started
Installing Elasticsearch
The requirements for Elasticsearch are simple: Java 8 (specific version recommended: Oracle JDK version 1.8.0_131). Take a look at this Logstash tutorial to ensure that you are set. Also, you will want to make sure your operating system is on the Elastic support matrix, otherwise you might run up against strange and unpredictable issues. Once that is done, you can start by installing Elasticsearch.
You can download Elasticsearch as a standalone distribution or install it using the apt and yum repositories. We will install Elasticsearch on an Ubuntu 16.04 machine running on AWS EC2 using apt.
First, you need to add Elastic’s signing key so you can verify the downloaded package (skip this step if you’ve already installed packages from Elastic):
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
For Debian, we then need to install the apt-transport-https package:
sudo apt-get install apt-transport-https
The next step is to add the repository definition to your system:
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list
All that’s left to do is to update your repositories and install Elasticsearch:
sudo apt-get update
sudo apt-get install elasticsearch
Configuring Elasticsearch
Elasticsearch configurations are done using a configuration file whose location depends on your operating system. In this file, you can configure general settings (e.g. node name), as well as network settings (e.g. host and port), where data is stored, memory, log files, and more.
For development and testing purposes, the default settings will suffice, yet it is recommended that you research which settings you should define manually before going into production.
For example, and especially if you are installing Elasticsearch in the cloud, it is best practice to bind Elasticsearch to either a private IP or localhost:
sudo vim /etc/elasticsearch/elasticsearch.yml

network.host: "localhost"
http.port: 9200
Running Elasticsearch
Elasticsearch will not run automatically after installation and you will need to manually start it. How you run Elasticsearch will depend on your specific system. On most Linux and Unix-based systems you can use this command:
sudo service elasticsearch start
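On systemd-based distributions (which includes Ubuntu 16.04), you can use systemctl instead; the second line is optional and makes Elasticsearch start automatically on boot:

sudo systemctl start elasticsearch.service
sudo systemctl enable elasticsearch.service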
And that’s it! To confirm that everything is working fine, simply point curl or your browser to http://localhost:9200, and you should see something like the following output:
{ "name" : "33QdmXw", "cluster_name" : "elasticsearch", "cluster_uuid" : "mTkBe_AlSZGbX-vDIe_vZQ", "version" : { "number" : "6.1.2", "build_hash" : "5b1fea5", "build_date" : "2018-01-10T02:35:59.208Z", "build_snapshot" : false, "lucene_version" : "7.1.0", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "tagline" : "You Know, for Search" }
To debug the process of running Elasticsearch, use the Elasticsearch log files located (on Deb) in /var/log/elasticsearch/.
Creating an Elasticsearch Index
Indexing is the process of adding data to Elasticsearch. When you feed data into Elasticsearch, it is placed into Apache Lucene indexes, which Elasticsearch uses under the hood to store and retrieve its data. Although you do not need to know much about Lucene, it does help to know how it works when you start getting serious with Elasticsearch.
Elasticsearch exposes a REST API, so you can use either the POST or the PUT method to add data to it. Use PUT when you know, or want to specify, the id of the data item, and POST if you want Elasticsearch to generate an id for it:
curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
  "timestamp": "2018-01-24 12:34:56",
  "message": "User logged in",
  "user_id": 4,
  "admin": false
}
'

curl -XPUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d'
{
  "id": 4,
  "username": "john",
  "last_login": "2018-01-25 12:34:56"
}
'
And the response:
{"_index":"logs","_type":"my_app","_id":"ZsWdJ2EBir6MIbMWSMyF","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1} {"_index":"app","_type":"users","_id":"4","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}
The data for the document is sent as a JSON object. You might be wondering how we can index data without defining its structure. Well, with Elasticsearch, as with any other NoSQL database, there is no need to define the structure of the data beforehand. To ensure optimal performance, though, you can define Elasticsearch mappings according to data types. More on this later.
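To give an early taste of mappings, here is a minimal sketch of what an explicit mapping could look like when creating a new index in Elasticsearch 6.x. The index name logs-mapped is hypothetical and the date format is an assumption; the field types simply mirror the log document indexed above:

curl -XPUT 'localhost:9200/logs-mapped' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "my_app": {
      "properties": {
        "timestamp": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" },
        "message":   { "type": "text" },
        "user_id":   { "type": "integer" },
        "admin":     { "type": "boolean" }
      }
    }
  }
}
'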
If you are using any of the Beats shippers (e.g. Filebeat or Metricbeat), or Logstash, those parts of the ELK Stack will automatically create the indices.
To see a list of your Elasticsearch indices, use:
curl -XGET 'localhost:9200/_cat/indices?v&pretty'

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2018.01.23 y_-PguqyQ02qOqKiO6mkfA   5   1      17279            0      9.9mb          9.9mb
yellow open   app                 GhzBirb-TKSUFLCZTCy-xg   5   1          1            0      5.2kb          5.2kb
yellow open   .kibana             Vne6TTWgTVeAHCSgSboa7Q   1   1          2            0      8.8kb          8.8kb
yellow open   logs                T9E6EdbMSxa8S_B7SDabTA   5   1          1            0      5.7kb          5.7kb
The list in this case includes the indices we created above, a Kibana index and an index created by a Logstash pipeline.
Elasticsearch Querying
Once you have indexed your data into Elasticsearch, you can start searching and analyzing it. The simplest query you can run is one that fetches a single item. For a deeper treatment, read our article focused exclusively on Elasticsearch queries.
Once again, via the Elasticsearch REST API, we use GET:
curl -XGET 'localhost:9200/app/users/4?pretty'
And the response:
{ "_index" : "app", "_type" : "users", "_id" : "4", "_version" : 1, "found" : true, "_source" : { "id" : 4, "username" : "john", "last_login" : "2018-01-25 12:34:56" } }
The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.
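If you only want the original document without the meta fields, Elasticsearch also exposes a dedicated _source endpoint for the same document:

curl -XGET 'localhost:9200/app/users/4/_source?pretty'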
We also use GET to run searches by calling the _search endpoint:
curl -XGET 'localhost:9200/_search?q=logged'

{
  "took" : 173,
  "timed_out" : false,
  "_shards" : { "total" : 16, "successful" : 16, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "logs",
        "_type" : "my_app",
        "_id" : "ZsWdJ2EBir6MIbMWSMyF",
        "_score" : 0.2876821,
        "_source" : {
          "timestamp" : "2018-01-24 12:34:56",
          "message" : "User logged in",
          "user_id" : 4,
          "admin" : false
        }
      }
    ]
  }
}
The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:
- took: The time in milliseconds the search took
- timed_out: Whether the search timed out
- _shards: The number of Lucene shards searched, and their success and failure rates
- hits: The actual results, along with meta information for the results
The search we did above is known as a URI Search, and is the simplest way to query Elasticsearch. By providing only a word, ES will search all of the fields of all the documents for that word. You can build more specific searches by using Lucene queries:
- username:johnb – Looks for documents where the username field is equal to “johnb”
- john* – Looks for documents that contain terms that start with john followed by zero or more characters, such as “john,” “johnb,” and “johnson”
- john? – Looks for documents that contain terms that start with john followed by only one character. Matches “johnb” and “johns” but not “john.”
There are many other ways to search including the use of boolean logic, the boosting of terms, the use of fuzzy and proximity searches, and the use of regular expressions.
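As illustrative sketches of such searches (the field names come from the documents indexed above, and + stands for a space in a URL query string):

curl -XGET 'localhost:9200/_search?q=username:john*+OR+user_id:4&pretty'
curl -XGET 'localhost:9200/_search?q=username:jhon~&pretty'

The first combines boolean logic with a wildcard; the second is a fuzzy search that still matches “john” despite the typo.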
Elasticsearch Query DSL
URI searches are just the beginning. Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results that you require.
The Query DSL contains two kinds of clauses: 1) leaf query clauses that look for a value in a specific field, and 2) compound query clauses (which might contain one or several leaf query clauses or other compound clauses).
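For instance, here is a minimal sketch of a compound bool clause wrapping two leaf clauses (a match and a term), run against the logs index from earlier; the exact mix of clauses is purely illustrative:

curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match": { "message": "logged" } },
      "must_not": { "term": { "admin": true } }
    }
  }
}
'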
Elasticsearch Query Types
Query types include:
- Geo queries
- “More like this” queries
- Scripted queries
- Full text queries
- Shape queries
- Span queries
- Term-level queries
- Specialized queries
Elasticsearch merged queries and filters into a single Query DSL back in version 2.x, but it still differentiates the two by context: the DSL distinguishes between a filter context and a query context for query clauses. Clauses in a filter context test documents in a boolean fashion: does the document match the filter, yes or no? No relevance score is calculated, which is one reason filters are generally faster than queries. Clauses in a query context, by contrast, also calculate a relevance score according to how closely a document matches the query, and that score determines the ordering of the results:
curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "User logged in"
    }
  }
}
'
And the result:
{ "took" : 28, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.8630463, "hits" : [ { "_index" : "logs", "_type" : "my_app", "_id" : "ZsWdJ2EBir6MIbMWSMyF", "_score" : 0.8630463, "_source" : { "timestamp" : "2018-01-24 12:34:56", "message" : "User logged in", "user_id" : 4, "admin" : false } } ] } } '
Creating an Elasticsearch Cluster
Maintaining an Elasticsearch cluster can be time-consuming, especially if you are doing DIY ELK. But, given Elasticsearch’s powerful search and analytics capabilities, such clusters are indispensable. We offer a deeper dive on the subject in our Elasticsearch cluster tutorial, so treat this section as a springboard for that more thorough walk-through.
What is an Elasticsearch cluster, precisely? Elasticsearch clusters group multiple Elasticsearch nodes and/or instances together. Of course, you can always choose to maintain a single Elasticsearch instance or node inside a given cluster. The main point of such a grouping lies in the cluster’s distribution of tasks, searching, and indexing across its nodes. Node options include data nodes, master nodes, client nodes, and ingest nodes.
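In Elasticsearch 6.x, these roles are toggled with boolean flags in each node’s elasticsearch.yml. A minimal sketch (a coordinating-only “client” node sets all three to false):

node.master: true
node.data: true
node.ingest: true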
Installing nodes can involve a lot of configurations, which our aforementioned tutorial covers. But here’s the basic Elasticsearch cluster node installation:
First and foremost, install Java:
sudo apt-get install default-jre
Next, add Elastic’s signing key:
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Next, update your repositories and install Elasticsearch:
sudo apt-get update && sudo apt-get install elasticsearch
You will have to create and/or set up each Elasticsearch node’s own elasticsearch.yml config file (sudo vim /etc/elasticsearch/elasticsearch.yml); a sketch of the relevant cluster settings appears below.
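As a sketch, the cluster-related settings on one node might look like the following; the cluster and node names match the sample response below, while the IP addresses are hypothetical placeholders for the other nodes in the cluster:

cluster.name: elasticsearch-cluster-demo
node.name: es-node-1
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["172.31.50.123", "172.31.50.124"]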
From there, start Elasticsearch on each node and then check your Elasticsearch cluster status (for example, with curl -XGET 'localhost:9200/_cluster/state?pretty'). Responses will look something like this:
{ "cluster_name" : "elasticsearch-cluster-demo", "compressed_size_in_bytes" : 255, "version" : 7, "state_uuid" : "50m3ranD0m54a531D", "master_node" : "IwEK2o1-Ss6mtx50MripkA", "blocks" : { }, "nodes" : { "m4-aw350m3-n0D3" : { "name" : "es-node-1", "ephemeral_id" : "x50m33F3mr--A11DnuM83r", "transport_address" : "172.31.50.123:9200", "attributes" : { } }, }, }
Elasticsearch cluster health will be next on your list. Periodically check your cluster’s health with the following API call:
curl -X GET "localhost:9200/_cluster/health?wait_for_status=yellow&local=false&level=shards&pretty"
This example sets the parameter local to false (which is actually the default). This shows you the status of the master node; to check the local node instead, change the parameter to true.
The level parameter will, by default, show you cluster-level health, but finer granularities include indices and shards (as in the example above).
There are additional optional parameters for timeouts:
- timeout
- master_timeout

…or, to wait for certain events to occur:
- wait_for_active_shards
- wait_for_events
- wait_for_no_initializing_shards
- wait_for_no_relocating_shards
- wait_for_nodes
- wait_for_status
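For instance, a sketch of a health check that blocks until the cluster reaches green status, giving up after 30 seconds:

curl -X GET "localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty"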
Of course, with Logz.io, creating an Elasticsearch (now OpenSearch) cluster is as easy as starting a free trial. And scaling up your Elasticsearch cluster requires nothing from the user.
Removing Elasticsearch Data
Deleting items from Elasticsearch is just as easy as entering data into Elasticsearch. The HTTP method to use this time is—surprise, surprise—DELETE:
$ curl -XDELETE 'localhost:9200/app/users/4?pretty'

{
  "_index" : "app",
  "_type" : "users",
  "_id" : "4",
  "_version" : 2,
  "result" : "deleted",
  "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 },
  "_seq_no" : 1,
  "_primary_term" : 1
}
To delete an index, use:
$ curl -XDELETE 'localhost:9200/logs?pretty'
To delete all indices (use with extreme caution):
$ curl -XDELETE 'localhost:9200/_all?pretty'
The response in both cases should be:
{ "acknowledged" : true }
To delete a single document:
$ curl -XDELETE 'localhost:9200/index/type/document'