Where Elasticsearch begins

Elasticsearch is probably the most popular search engine at the moment, with a mature community, support and a mountain of information on the web. However, that information comes in inconsistent fragments.


The first and biggest misconception is "you need search, so take Elastic!". In reality, if you need fast search for a small or even a fairly large project, it pays to study the topic in more detail first, and you may well decide against this particular system.


The second problem is that when you start from scratch, getting the big picture is not easy. There is plenty of information, but the right order in which to study it only becomes clear in hindsight. You end up running from books to the documentation and back again, googling the subtleties along the way, just to understand what Elasticsearch is, why it works the way it does, when it is worth using and when to choose something simpler.


In this article I try to explain, step by step, what seems to me the essence of Elasticsearch: why it was invented and how it works.


For clarity, let's invent a task for ourselves: implement search over all materials and users of a collective blog. The system lets you create tags, communities, geotags and everything else that helps categorize a huge amount of information.



{
    "title" : "   Elasticsearch",
    "author" : {
        "name": "Roman"
    },
    "content" : "Elasticsearch, ,    ...",
    "tags":[
        "elasticsearch"
    ],
    "ps": ",      JSON, BSON  XML"
}



Elasticsearch




Elasticsearch is an open-source search engine built on Apache Lucene.


Elasticsearch Lucene





, Lucene. , . , ? . Lucene , (shard) . .



A shard in Elasticsearch is a separate instance of Lucene.

Index — , . , .

Elasticsearch    SQL        MongoDB
Index            Database   Database
Mapping/Type     Table      Collection
Field            Column     Field
Object (JSON)    Tuple      Object (BSON)


. . , . ! Elasticsearch . , , . . , .

By default an index has 5 shards; the template below sets index.number_of_shards: 1 for all indices.
PUT _template/all
{
  "template": "*",
  "settings": {
    "number_of_shards": 1
  }
}



2^32 = 4294967296

. Elasticsearch , . , 10 .




, Lucene(). , , , . — .


— CRUD- . data node coordinating node. .



A running instance of Elasticsearch is called a node. A cluster is a group of nodes.

:
  • cluster.name


elasticsearch.yml . , .

, http.port, 9200 9300 . , .

. , data- .


, . "" . . , , 2007 .


6.7 Elasticsearch . — hot, warm cold.

Hot nodes are usually given SSDs, while warm and cold nodes use HDDs. The memory/disk ratio for each:
  • hot — 1:30
  • warm — 1:100
  • cold — 1:500
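To make the ratios concrete, here is a minimal sketch in Python. It assumes the ratios above are read as memory to disk (1 GB of RAM per N GB of disk); the dictionary and function names are hypothetical and not part of Elasticsearch.

```python
# Ratios from the list above, read as RAM : disk (an assumption).
TIER_DISK_PER_GB_RAM = {"hot": 30, "warm": 100, "cold": 500}


def disk_for_node(ram_gb: int, tier: str) -> int:
    """Return the disk size in GB that matches the tier's ratio."""
    return ram_gb * TIER_DISK_PER_GB_RAM[tier]


if __name__ == "__main__":
    # Example: the same 64 GB machine used in each tier.
    for tier in ("hot", "warm", "cold"):
        print(tier, disk_for_node(64, tier), "GB")
```

Under that reading, the same amount of memory serves far more data on a cold node than on a hot one, which is why the tiers differ in disk type as well as in size.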


data node node.data: true, , .

. , MapReduce. :


  • Map — ,
  • Reduce — worker-


. , worker-( data-).


, coordinating- , , , .




Elasticsearch . , coordinating-, . , , false.


coordinating-. , / . , .


, coordinating- . data-, . , .




master-node. , : , , . .


Elasticsearch master node. node.master: true.

Master- , . , . 10 only-master .




data-, .





The number of replicas is set with number_of_replicas. For example (<index> and someVal are placeholders):
PUT /<index>/_settings
{
  "index": {
    "number_of_replicas": someVal
  }
}
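
As a rough illustration, a request like the one above can be built from Python with only the standard library. The index name my-index, the host localhost:9200 and the helper name are assumptions made for this sketch; the request is constructed but not sent.

```python
import json
import urllib.request


def replicas_request(index: str, replicas: int,
                     host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request."""
    body = {"index": {"number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = replicas_request("my-index", 2)  # hypothetical index name
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply the setting: urllib.request.urlopen(req)
```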



primary shard, replica shard, .


, , flush commit Lucene, .



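Elasticsearch reports the state of the cluster as the cluster health status. It takes one of three values:
  • green: all primary and replica shards are assigned
  • yellow: all primary shards are assigned, but some replicas are not
  • red: at least one primary shard is not assigned, so part of the data is unavailable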
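
The status can be derived from which shard copies are unassigned; the sketch below encodes that rule in Python. The Shard type and the function name are hypothetical, and a real cluster reports the status itself through the cluster health API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Shard:
    primary: bool   # True for a primary shard, False for a replica
    assigned: bool  # True if the shard copy is allocated to a node


def cluster_health(shards: Sequence[Shard]) -> str:
    """Derive green / yellow / red from shard allocation."""
    if any(s.primary and not s.assigned for s in shards):
        return "red"     # at least one primary is unassigned
    if any(not s.assigned for s in shards):
        return "yellow"  # primaries are fine, some replica is unassigned
    return "green"       # every primary and replica is assigned


if __name__ == "__main__":
    # One primary with one replica whose node has gone away.
    print(cluster_health([Shard(True, True), Shard(False, False)]))
```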






Elasticsearch node.master: true , , , . :
  • node.master: true
  • node.data: true

. , . , . - , . split-brain problem.





, , 50%+ . , . .


, . :
quorum_size = master_eligible_nodes/2 + 1
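
Read with integer division, the quorum rule above looks like this in Python. The function and parameter names are hypothetical, chosen only for the sketch.

```python
def quorum_size(master_eligible_nodes: int) -> int:
    """Smallest number of votes that is more than half of the nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 7):
        print(n, "nodes ->", quorum_size(n), "votes")
```

With 3 nodes the quorum is 2, so if the cluster splits into 2 + 1 only one side can elect a master. With 2 nodes the quorum is also 2, so losing either node stops elections altogether.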




Elasticsearch also lets you add a node that only votes in elections and can never become master itself: node.voting_only: true.

Elasticsearch . . , 7 4, , 3 , 4.






HTTP, . HTTP . HTTP API , ES, . . JSON., . .
ES Native. , .JVM. ES. . .

Elasticsearch.


, . — Elasticsearch , .


I have tried to explain, briefly and in order, how Elasticsearch works and why it works this way. I deliberately left out the Elastic ecosystem, plugins, queries, tokenization, mapping and the rest. I also did not cover ingest and machine learning nodes: in my opinion they provide additional features rather than the basics.


Additional materials


Elastic Stack and Product Documentation


Sizing Elasticsearch


Elasticsearch 5.x Cookbook, Third Edition

