200 TB+ Elasticsearch Cluster


Many people run into Elasticsearch. But what happens when you want to use it to store logs "in especially large volumes"? And, on top of that, to survive the failure of one of several data centers painlessly? Which architecture is worth building, and which pitfalls will you hit along the way?


elasticsearch -, : , .


— , . , Manticore Search, Sphinx search, Elasticsearch. , - ...search, . .


, , Elasticsearch. , - , -, .



:


  • Graylog. , , .
  • : 50-80 , - , , 2-3
  • , , : .
  • , , , .
  • , — - .
  • , : , , - (, ).

, .



-, - Elasticsearch ( ).


- 18 — , , .


: Podman , one-cloud. 2 , 2.0 GHz v4 , .


:




:


  • 3-4 VIP - Graylog, , .
  • VIP LVS.
  • Graylog, GELF, syslog.
  • Elasticsearch.
  • , , -.



, , .


Elasticsearch — master, coordinator, data node. , .


Master
, , , cluster wide housekeeping.


Coordinator
- : . , , , master, , .


Data node
, , , .


Graylog
- Kibana Logstash ELK-. Graylog UI . Graylog Kafka Zookeeper, Graylog . Graylog (Kafka) Elasticsearch , . Logstash, Graylog Elasticsearch.


, Graylog service discovery, Elasticsearch , .


:



. , .



, , , .


: Elasticsearch data nodes.


— , Elasticsearch. , Lucene index. Lucene index, , .



, «» -.


, ( ) -. -, , replication factor, ( ). , -, , , , .


30 .


:



- — . — primary-, . — replica-. -.


, -. , , , :



, .. , 48 ( : 48 ).


:


- , , , , . “” . “ ” , .


, - , . - , - . .


latency , SSD. , , 56 . 56 - , , Elasticsearch . Elasticsearch thread pool , - " - ".


, - 20 , 1 360 . , 48 , 15 . 2 .



, .


, Graylog - . , 2-3 .


, Graylog , : « , — ».


Master : « 71», -, primary-shard 71.


replica-shard, -.



Graylog . , Elasticsearch round-robin primary-shard replica-shard.



180 , , , , «» -. , , , .


48 300-400ms, , leading wildcard.


«» Elasticsearch: Java



, , .


, Elasticsearch Java.



, Lucene, background job', Lucene . , OutOfMemoryError-. , , , .


, Lucene- . . ( heap.size RAM), - off-heap , - ~500MB, .


: RAM , , .



4-5 , - 10-20.


, , off-heap Elasticsearch . , direct buffer pools , , explicit GC Elasticsearch.


- , . .


: Java . 16 (-XX:MaxDirectMemorySize=16g), , explicit GC , , .
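This section mentions direct buffer pools and a 16 GB cap set with `-XX:MaxDirectMemorySize=16g`. As a minimal sketch, one way to watch direct buffer usage against that cap is to read the `jvm.buffer_pools.direct` section of a nodes stats response (`GET _nodes/stats/jvm`). The function name, the sample payload, and the 80% warning threshold are assumptions for illustration and do not come from the article.

```python
# Sketch: flag nodes whose direct buffer pool usage approaches the configured cap.
# Assumes `stats` is the parsed JSON body of GET _nodes/stats/jvm.

GIB = 1024 ** 3
MAX_DIRECT_BYTES = 16 * GIB  # mirrors -XX:MaxDirectMemorySize=16g from the text


def direct_buffer_report(stats, cap_bytes=MAX_DIRECT_BYTES, warn_ratio=0.8):
    """Return {node_name: (used_bytes, is_near_cap)} for every node in stats."""
    report = {}
    for node in stats.get("nodes", {}).values():
        used = node["jvm"]["buffer_pools"]["direct"]["used_in_bytes"]
        report[node["name"]] = (used, used >= warn_ratio * cap_bytes)
    return report


# Hypothetical sample payload, trimmed to the fields the function reads.
sample_stats = {
    "nodes": {
        "a1": {"name": "data-1",
               "jvm": {"buffer_pools": {"direct": {"used_in_bytes": 4 * GIB}}}},
        "b2": {"name": "data-2",
               "jvm": {"buffer_pools": {"direct": {"used_in_bytes": 15 * GIB}}}},
    }
}

report = direct_buffer_report(sample_stats)
```

In practice the `stats` argument would come from whatever HTTP client already talks to the cluster; the sketch deliberately leaves the transport out.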



, «, » , .


, mmapfs, . , mmapfs , mapped-. - , GC safepoint, , . , master , . 5-10 garbage collector, , . “, ” - .


, niofs, , Elastic , hybridfs, . .



, . 2-3 , .


Full GC, - , . GC : , , , — .


, , - , . , , .


, , - , - Elasticsearch, , .


, , , . GC , . GC, , .


, , — JDK13 Shenandoah. , .


Java .


«» Elasticsearch:



, , - .


: - «» , , Graylog es_rejected_execution.


- , thread_pool.write.queue - , Elasticsearch , 200 . Elasticsearch . - .
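The `es_rejected_execution` errors mentioned above come from a bounded queue in front of the write thread pool: once the queue is full, new requests are refused instead of being buffered. The following is a minimal sketch of that behaviour using only the Python standard library. The capacity of 200 is the figure from the text; the burst size and the function name are assumptions, and the code models the idea only, not Elasticsearch's actual implementation.

```python
import queue


def submit_burst(capacity, burst_size):
    """Offer burst_size tasks to a bounded queue with no consumer running.

    Returns (accepted, rejected). A full queue rejects immediately,
    which is the analogue of an es_rejected_execution error.
    """
    pending = queue.Queue(maxsize=capacity)
    accepted = rejected = 0
    for task_id in range(burst_size):
        try:
            pending.put_nowait(task_id)
            accepted += 1
        except queue.Full:
            rejected += 1
    return accepted, rejected


accepted, rejected = submit_burst(capacity=200, burst_size=250)
print(f"accepted={accepted} rejected={rejected}")
```

With no consumer draining the queue, everything beyond the capacity is rejected, which is why a slow data node can surface as rejections on the client side.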


, : 300 , , Full GC.


, , , Graylog , , 3 , . , , Elasticsearch , , ( ), , , .


, - - , Elastic, - — - Graylog.


, , : , , .


. , , , , , , .


, Elasticsearch , - round-robin (, primary-shard, , ), replica-shard, . , use_adaptive_replica_selection: true.
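The paragraph above names `use_adaptive_replica_selection: true` as the alternative to plain round-robin between primary and replica shards. As a hedged sketch, the request body for the cluster settings update API (`PUT _cluster/settings`) could be built as follows. The full key `cluster.routing.use_adaptive_replica_selection` is the documented Elasticsearch setting name; the helper name and the choice between persistent and transient scope are assumptions made for this example.

```python
import json


def adaptive_replica_selection_body(enabled=True, persistent=True):
    """Build the JSON body for PUT _cluster/settings."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"cluster.routing.use_adaptive_replica_selection": enabled}}


body = adaptive_replica_selection_body()
print(json.dumps(body))
```

The resulting document can be sent with any HTTP client; the sketch stops at building the body so that it stays independent of a particular client library.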


:



query time , .


, -.


:


  • - master, .
  • .
  • : - - primary-, replica- -, .
  • , , .

, - :



:



?


- .


?


, TaskBatcher, , . , replica primary, - - — TaskBatcher, .


- , - - « - - -».


- , . , , . , , .


, - , full GC. - , , .


, 6.4.0, , 10 - 360 , .


:



6.4.0, , - . «» . : 2, 3 10 ( , ) -, - , , , B, C, D.


- - - , - 20-30 , - .


, , , « » . , , 7.2.


, - , , , , - primary-shard ( replica-shard - primary, ).


, «», - stale . , , - , -, - - . .


- 5 . .


:


  • 360 - 700 .
  • 60 -.
  • 40 , 6.4.0 — -, ,
  • , .
  • heap.size, 31 : , leading wildcard - , circuit breaker Elasticsearch.
  • , , , .


, , :


  • - , , - . - - , 2-3 , 2, 3, 4 — , - , .
  • , pending-. , , - , , , replica- primary, - .
  • garbage collector, .
  • , , « ».
  • , heap, RAM I/O.

Thread Pool Elasticsearch. Elasticsearch , , thread_pool.management. , , _cat/shards , . , , thread_pool.management , , 5 , , .
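Since this paragraph singles out `thread_pool.management` and its limit of 5, a small check for saturated management pools may be useful. The sketch below assumes rows shaped like the output of `GET _cat/thread_pool/management?format=json&h=node_name,name,active,queue,max`; the function name and the sample rows are hypothetical.

```python
def saturated_management_pools(rows):
    """Return node names whose management pool has every thread busy."""
    busy = []
    for row in rows:
        # Skip other pools and rows without an upper bound on threads.
        if row.get("name") != "management" or row.get("max") is None:
            continue
        if int(row["active"]) >= int(row["max"]):
            busy.append(row["node_name"])
    return busy


# Hypothetical rows; the cat APIs return numbers as strings in JSON output.
sample_rows = [
    {"node_name": "data-1", "name": "management", "active": "5", "queue": "12", "max": "5"},
    {"node_name": "data-2", "name": "management", "active": "1", "queue": "0", "max": "5"},
]

busy_nodes = saturated_management_pools(sample_rows)
```

A node that shows up here repeatedly is one where monitoring-style requests are likely queuing behind each other.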


: ! , .


, - , , , , .


