"Hadoop. ZooKeeper" from the Mail.Ru Group Technostream series "Methods for Distributed Processing of Large Volumes of Data in Hadoop"

I suggest you read the transcript of the lecture "Hadoop. ZooKeeper" from the series "Methods for Distributed Processing of Large Volumes of Data in Hadoop".


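What ZooKeeper is and where it sits in the Hadoop ecosystem. The truth about distributed computing. The layout of a typical distributed system. Why coordinating distributed systems is hard. Typical coordination problems. The principles behind ZooKeeper's design. The ZooKeeper data model. Znode flags. Sessions. Client API. Primitives (configuration, group membership, simple locks, leader election, locks without the herd effect). ZooKeeper architecture. The ZooKeeper database. ZAB. The request processor.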




Today let's talk about ZooKeeper. It is a very useful thing. Like any Apache Hadoop product, it has a logo. It depicts a man.


, , , . . - - . . ZooKeeper – , . , , .


, , . , MapReduce , , . , , . MapReduce - , . MapReduce , , , . , ZooKeeper.


, Hadoop, Yahoo! Apache. HBase. JIRA HBase, , - , . . . ZooKeeper, , , , . , Hadoop. , , , .



, - , . , , . , , ZooKeeper, . . – , . HDFS, MapReduce , . , ZooKeeper. - , .



? , , , . , , , , , , - , . TCP, , . TCP . . - . . , , . , - , , . .


, , latency. , . Latency – - , , .


. – . -, , , . , . . . , , . .


. , - . , - Hadoop. . , . , - , , . - . .


, , -, . , , cat . – Vim . , . Vim , , , - . , , .



, .


, , , . – ? , . , ? , , - . - , , - , - . , -, - , ?


, . , – , - .


, , , - , – ? - race condition, , , - ? - . .


– . , , , .


, , , , . , , , . - , , , . , , . . . . - , , .


ZooKeeper. – , , , .



, . , – HDFS, HBase. -, , slave-. , , , .



– Coordination Service, . . , - backup stanby , . , . backup. - , , backup. Coordination Service. , .



– , . , . - , . , - slaves, , , . .



, , -slaves, . , . , Cassandra, .


, . .



- , , , , . – , , , . - , . . .



(), , , .



– partial failures. , , - , , , , , . . .


, . – , . . , .


ZooKeeper , .



, , , , , . , shared-nothing. , , , , , .


, shared memory . context switch, . . , , .



, , , . - . , , , , , , . , , , , . , .


. , , , , . . , .


. , , .


? , ? , , , . -, , - , - . - , , - . , . . , , .


– group membership. - , – , . , , , , . , , .


– leader election, , . – , - , . , , . , , , .


– mutually exclusive access. . , , - , , , . - . , , -. , , locks.


ZooKeeper . , .



. - , - . , , - . .


.


- , , .



ZooKeeper . – standalone, . . , . – 100 , , 100 . , ZooKeeper. high availability. ZooKeeper . , . , . – , , – . , .


. , , . .


, - . - . , , , . . – . – -? . watch mechanism, , - . .



Client – , ZooKeeper.


Server – ZooKeeper.


Znode – ZooKeeper. znode ZooKeeper , .


. – update/write, - . .


, , , ZooKeeper.



ZooKeeper . , . , . znodes.


znode - , , , 10 . znode - .



Znode . . znode , .


. – ephemeral flag. Znode - . , . , . , - . , .


– sequential flag. znode. , 1_5. , p_1, – p_2. , , , , – sequential.


znode. , .



– watch flag. , - . , . ZooKeeper , . , - . , - , .


, . , , .



. , , . - .


- . , - . , , .



, API . , , create znode . znode, . - , . . znode.


– . , , znode . , znode, , , , .


, «-1».



– znode. true, , false.


flag watch, .


flag , . .


getData. , znode . flag watch. , . , , .



SetData. version. , znode .


«-1», .
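The version argument of setData can be illustrated with a small in-memory model. This is only a sketch, not the ZooKeeper client API: the `Znode` class, `set_data` function, and `BadVersionError` exception are hypothetical names made up for the illustration. It relies on the documented behaviour that an update succeeds only when the supplied version matches the znode's current version, and that "-1" matches any version.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the znode's version."""


class Znode:
    """Hypothetical in-memory stand-in for a znode: data plus a version counter."""

    def __init__(self, data=b""):
        self.data = data
        self.version = 0  # a freshly created znode starts at version 0


def set_data(node, data, version):
    """Conditional update: apply only if `version` matches, or if it is -1."""
    if version != -1 and version != node.version:
        raise BadVersionError(
            f"expected version {version}, znode is at {node.version}"
        )
    node.data = data
    node.version += 1  # every successful write bumps the version
    return node.version
```

A client that read the znode at version 0 and writes with `version=0` will fail if someone else updated the znode in between, which is what makes a safe read-modify-write cycle possible.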


getChildren. znode, . , flag watch.


sync , , .


, , , write, - , , , . , , , , .



. . , .


, , update/write, . create, setData, sync, delete. read – exists, getData, getChildren.



, . , -. . , . . ZooKeeper? , ? , , , , ?


ZooKeeper . , znode. , , . , . , , .


getData . true. - , , , , . , - , true. , .


SetData. , «-1», . . , , , . , . , , , . , , , - . , , . , , . .



group membership. , , . , . -, - , .


? workers create. . sequential , . children , , .




, Java-. , main. , . host, , . . . – .


? API, . . ZooKeeper. hosts. , , 5 . , connectedSignal. , . , - . persistent. , , . . . , . , close , . , - , ZooKeeper .



- ? . , - , . , , lock1. , , lock . , getData , . . , watcher , , . . , lock, , lock , , . . . , . lock , , - .



, . , . ? , . . , . . - , . , .


- , , , . , . - , , .


, , herd effect, . . , , , , , .



, . , lock, hert effect. , id lock. , lock , , , , lock. , lock. , id, lock, . , .


id, lock, watcher , - . . . lock. , id lock, . , lock, - .
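The core of the lock without the herd effect can be sketched as a pure function. This is an illustration under stated assumptions, not code from the lecture: it assumes each client has created an ephemeral sequential znode under the lock node, and that child names look like `lock-0000000003` (the `lock-` prefix and the function names are hypothetical). Each client watches only the znode immediately before its own, so a release wakes up exactly one waiter.

```python
def sequence_number(name):
    """Extract the numeric suffix that a sequential znode name carries."""
    return int(name.rsplit("-", 1)[1])


def node_to_watch(children, my_node):
    """Return the child this client should set a watch on.

    Returns None when `my_node` has the lowest sequence number,
    which means this client currently holds the lock.
    """
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(my_node)
    if index == 0:
        return None
    # Watch only the immediate predecessor, not the whole lock node.
    return ordered[index - 1]
```

When the watched znode disappears, the client lists the children again and repeats the check, because the predecessor may have failed rather than released the lock.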



ZooKeeper? 4 . – Request. ZooKeeper Atomic Broadcast. Commit Log, . In-memory Replicated DB, . . , .


, Request Processor. In-memory .



. instances ZooKeeper .


, Commit log. , , , log . .



ZooKeeper Atomic Broadcast – , .


ZAB ZooKeeper. , - . , . , . , , , . . broadcasting , .


write request. , transactional update. .


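It is worth mentioning here that updates for the same operation are guaranteed to be idempotent. What does that mean? If an update is applied twice, the system ends up in the same state as if it had been applied once; the repeated application changes nothing. This is needed so that after a crash the operation can be replayed, re-applying the changes that were in flight when the failure happened. The system state will be the same in either case: replaying the same sequence of updates must not lead to a different final state.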
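Idempotence can be shown with a minimal sketch. The `to_txn` and `apply_txn` functions and the dictionary-based state are hypothetical and are not ZooKeeper internals. The assumption is that a write request is first converted into a transaction recording the resulting data and version, rather than an instruction such as "increment the version", so replaying it after a crash is harmless.

```python
def to_txn(state, path, new_data):
    """Turn a 'set data' request into a transaction describing the result."""
    _, version = state[path]
    # The transaction stores the final version, not "version + 1" as an action.
    return (path, new_data, version + 1)


def apply_txn(state, txn):
    """Apply a transaction by overwriting the znode with the recorded result."""
    path, new_data, new_version = txn
    state[path] = (new_data, new_version)
    return state


state = {"/config": (b"v1", 0)}
txn = to_txn(state, "/config", b"v2")

applied_once = apply_txn(dict(state), txn)
applied_twice = apply_txn(apply_txn(dict(state), txn), txn)
```

Here `applied_once` and `applied_twice` are equal, so replaying the log after a failure cannot push the version past 1 or alter the data.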









Source: https://habr.com/ru/post/undefined/

