"Hadoop. ZooKeeper ”from the Mail.Ru Group Technostream seriesβ€œ Distributed Processing of Large Data in Hadoop ”

I suggest that you familiarize yourself with the transcript of the lecture "Hadoop. ZooKeeper" from the series "Methods of distributed processing of large amounts of data in Hadoop"


What is ZooKeeper, its place in the Hadoop ecosystem. The truth about distributed computing. Scheme of a standard distributed system. The complexity of coordinating distributed systems. Typical coordination problems. The principles embodied in the design of ZooKeeper. ZooKeeper data model. Znode flags. Sessions. Client API Primitives (configuration, group membership, simple locks, leader election, locking without herd effect). ZooKeeper Architecture. ZooKeeper DB. ZAB. Request handler.




Today let's talk about ZooKeeper. This thing is very useful. It, like any Apache Hadoop product, has a logo. It depicts a man.


, , , . . - - . . ZooKeeper – , . , , .


, , . , MapReduce , , . , , . MapReduce - , . MapReduce , , , . , ZooKeeper.


, Hadoop, Yahoo! Apache. HBase. JIRA HBase, , - , . . . ZooKeeper, , , , . , Hadoop. , , , .



, - , . , , . , , ZooKeeper, . . – , . HDFS, MapReduce , . , ZooKeeper. - , .



? , , , . , , , , , , - , . TCP, , . TCP . . - . . , , . , - , , . .


, , latency. , . Latency – - , , .


. – . -, , , . , . . . , , . .


. , - . , - Hadoop. . , . , - , , . - . .


, , -, . , , cat . – Vim . , . Vim , , , - . , , .



, .


, , , . – ? , . , ? , , - . - , , - , - . , -, - , ?


, . , – , - .


, , , - , – ? - race condition, , , - ? - . .


– . , , , .


, , , , . , , , . - , , , . , , . . . . - , , .


ZooKeeper. – , , , .



, . , – HDFS, HBase. -, , slave-. , , , .



– Coordination Service, . . , - backup stanby , . , . backup. - , , backup. Coordination Service. , .



– , . , . - , . , - slaves, , , . .



, , -slaves, . , . , Cassandra, .


, . .



- , , , , . – , , , . - , . . .



(), , , .



– partial failures. , , - , , , , , . . .


, . – , . . , .


ZooKeeper , .



, , , , , . , shared-nothing. , , , , , .


, shared memory . context switch, . . , , .



, , , . - . , , , , , , . , , , , . , .


. , , , , . . , .


. , , .


? , ? , , , . -, , - , - . - , , - . , . . , , .


– group membership. - , – , . , , , , . , , .


– leader election, , . – , - , . , , . , , , .


– mutually exclusive access. . , , - , , , . - . , , -. , , locks.


ZooKeeper . , .



. - , - . , , - . .


.


- , , .



ZooKeeper . – standalone, . . , . – 100 , , 100 . , ZooKeeper. high availability. ZooKeeper . , . , . – , , – . , .


. , , . .


, - . - . , , , . . – . – -? . watch mechanism, , - . .



Client – , ZooKeeper.


Server – ZooKeeper.


Znode – ZooKeeper. znode ZooKeeper , .


. – update/write, - . .


, , , ZooKeeper.



ZooKeeper . , . , . znodes.


znode - , , , 10 . znode - .



Znode . . znode , .


. – ephemeral flag. Znode - . , . , . , - . , .


– sequential flag. znode. , 1_5. , p_1, – p_2. , , , , – sequential.


znode. , .



– watch flag. , - . , . ZooKeeper , . , - . , - , .


, . , , .



. , , . - .


- . , - . , , .



, API . , , create znode . znode, . - , . . znode.


– . , , znode . , znode, , , , .


, Β«-1Β».



– znode. true, , false.


flag watch, .


flag , . .


– getData. , znode . flag watch. , . , , .



SetData. version. , znode .


Β«-1Β», .


– getChildren. znode, . , flag watch.


sync , , .


, , , write, - , , , . , , , , .



. . , .


, , update/write, . create, setData, sync, delete. read – exists, getData, getChildren.



, . , -. . , . . ZooKeeper? , ? , , , , ?


ZooKeeper . , znode. , , . , . , , .


getData . true. - , , , , . , - , true. , .


SetData. , Β«-1Β», . . , , , . , . , , , . , , , - . , , . , , . .



– group membership. , , . , . -, - , .


? workers create. . sequential , . children , , .




, Java-. , main. , . host, , . . . – .


? API, . . ZooKeeper. hosts. , , 5 . , connectedSignal. , . , - . persistent. , , . . . , . , close , . , - , ZooKeeper .



- ? . , - , . , , lock1. , , lock . , getData , . . , watcher , , . . , lock, , lock , , . . . , . lock , , - .



, . , . ? , . . , . . - , . , .


- , , , . , . - , , .


, , herd effect, . . , , , , , .



, . , lock, hert effect. , id lock. , lock , , , , lock. , lock. , id, lock, . , .


id, lock, watcher , - . . . lock. , id lock, . , lock, - .



ZooKeeper? 4 . – Request. ZooKeeper Atomic Broadcast. Commit Log, . In-memory Replicated DB, . . , .


, Request Processor. In-memory .



. instances ZooKeeper .


, Commit log. , , , log . .



ZooKeeper Atomic Broadcast – , .


ZAB ZooKeeper. , - . , . , . , , , . . broadcasting , .


write request. , transactional update. .


And here it is worth noting that the idempotency of updates for the same operation is guaranteed. What it is? This thing, if performed twice, it will have the same state, i.e. the request itself will not change from this. And you need to do this so that in case of a fall you can restart the operation, thereby rolling the changes that have fallen off at the moment. In this case, the state of the system will become the same, that is, it should not be such that a series of the same, for example, update processes, leads to different final states of the system.









Source: https://habr.com/ru/post/undefined/


All Articles