ZooKeeper

Introduction

ZooKeeper is a service for coordinating the processes of distributed applications. At its core it is a key/value table with hierarchical keys. It is not designed as a general-purpose data store, although clients may still keep small amounts of their own metadata in it.

How does it achieve this? By incorporating elements of group messaging, shared registers, and distributed lock services into a replicated, centralized service. To combine these features, ZooKeeper uses wait-free shared registers and per-client FIFO execution guarantees together with an event-driven mechanism (watches), which keeps the service simple and powerful at the same time.

The idea of ZooKeeper was mainly inspired by Chubby, a locking service with strong synchronization guarantees. ZooKeeper, however, was designed to avoid blocking primitives such as locks. Instead it implements an API that manipulates simple wait-free data objects, organized hierarchically as in a file system.

With these primitives, consensus can be implemented for any number of processes, provided two guarantees hold: FIFO client ordering of all operations (implemented by a simple pipelined architecture) and linearizable writes (provided by Zab, a leader-based atomic broadcast protocol). One advantage of ZooKeeper over Chubby is that it uses watches to keep clients informed of updates, so it does not have to block updates while invalidating client caches, as Chubby does.

ZooKeeper Service

  • Znodes - the abstraction exposed to clients: a set of data nodes organized into a hierarchical name space.
    • Regular. Clients create and delete them explicitly.
    • Ephemeral. Clients create them; the system deletes them automatically when the creating session ends.
  • ZooKeeper implements watches to allow clients to be notified of changes to a znode of interest without polling.
  • Every API method has both a synchronous and an asynchronous version; the asynchronous API lets a client keep multiple operations outstanding at the same time (a short API sketch follows this list).
  • Guarantees.
    • Linearizable writes.
    • FIFO client order.
    • Together, these two guarantees make it possible to handle shared configuration updates correctly (for example, by a newly elected leader).
    • Liveness and durability guarantees.
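
The following minimal sketch, using the standard Java client API, illustrates the points above: regular vs. ephemeral znodes, a synchronous read that leaves a watch behind, and an asynchronous call with a callback. The connect string, znode paths, and class name are assumptions chosen for illustration, not taken from the paper.

  import java.util.List;
  import org.apache.zookeeper.AsyncCallback;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.WatchedEvent;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class BasicUsage {
      public static void main(String[] args) throws Exception {
          // Connect to an ensemble; the watcher passed here receives session events.
          ZooKeeper zk = new ZooKeeper("localhost:2181", 5000,
                  (WatchedEvent e) -> System.out.println("session event: " + e));

          // Regular znode: stays until explicitly deleted.
          zk.create("/app1", "config-v1".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

          // Ephemeral znode: removed automatically when this session ends.
          zk.create("/app1/worker-1", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

          // Synchronous read with watch=true: we are notified once when /app1 changes.
          byte[] config = zk.getData("/app1", true, null);
          System.out.println("config: " + new String(config));

          // Asynchronous read: the callback runs later, so many calls can be in flight.
          AsyncCallback.ChildrenCallback cb = (rc, path, ctx, children) ->
                  System.out.println("children of " + path + ": " + children);
          zk.getChildren("/app1", false, cb, null);

          Thread.sleep(1000);   // give the async callback time to run
          zk.close();
      }
  }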

ZooKeeper usage

Two features of ZooKeeper allow it to support the use cases listed below:

  • Two types of nodes: regular and ephemeral.
  • Watches.

Where exactly can ZooKeeper be applied?

  • Configuration Management
    ZooKeeper can be used for dynamic configuration of a distributed application. Suppose the configuration of a process is stored in a znode and the process reads it with the watch flag set to true; then each time the configuration is updated, the process is notified and can re-read it (see the configuration sketch after this list).
  • Rendezvouz
    When the final configuration is not known a priori, a rendezvous znode can be created. The scheduler fills this node with the final configuration once it has been decided, and clients watching the node are notified.
  • Group Membership
    Ephemeral nodes are a natural fit for group membership. When a process that is a member of the group starts, it creates an ephemeral child under the znode that represents the group, and it can store all necessary information (addresses, ports) in that child. When the process fails or ends, the ephemeral znode representing it is deleted automatically. A process that wants to be notified about changes in the group simply sets a watch on the group's parent znode (see the membership sketch after this list).
  • Simple Locks
    ZooKeeper is not a lock service, but it can be used as one. The lock is represented by a lock znode. To acquire the lock, a client creates an ephemeral child under the lock znode; if the create succeeds, the lock is acquired. Releasing the lock (or the death of the holder's session) deletes the child, all clients watching the lock znode are notified that the critical section is free, and each of them retries the create.
  • Simple Locks without Herd Effect
    This is the simple lock, but requests are served in arrival order: each client creates a sequential ephemeral child under the lock znode and watches only the child immediately preceding its own, so notification of a lock release is received only by the one client watching that event rather than by the whole herd (see the lock sketch after this list).
  • Read/Write Locks
    The write lock works just like the previous lock, while the read lock is slightly different because it can be shared by several clients: a read lock is blocked only by write locks requested earlier, and if only read locks are held a new read lock is granted immediately. A herd effect can appear here, since several clients waiting for read locks are woken when a write lock is released, but that is exactly the desired behavior - all of them may proceed (see the read-lock sketch after this list).
  • Double Barrier
    A double barrier is implemented by creating children under a designated barrier znode. A process enters the barrier by adding a child under that znode, and deletes its child to signal that it has finished its computation. The computation starts once the required number of processes have registered their children, and a process may exit the barrier once it observes that no children remain under the barrier znode (see the barrier sketch after this list).
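
The configuration-management pattern can be written as a small helper. This is a sketch under assumptions: the ConfigWatcher class, the znode path handling, and the console output are illustrative choices, not part of the paper.

  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooKeeper;

  public class ConfigWatcher {
      private final ZooKeeper zk;
      private final String configPath;

      public ConfigWatcher(ZooKeeper zk, String configPath) {
          this.zk = zk;
          this.configPath = configPath;
      }

      // Read the configuration and leave a one-shot watch behind; when the znode
      // changes, the watcher fires and we read (and re-register the watch) again.
      public byte[] readConfig() throws Exception {
          Watcher onChange = event -> {
              try {
                  byte[] updated = readConfig();
                  System.out.println("configuration updated: " + new String(updated));
              } catch (Exception e) {
                  e.printStackTrace();
              }
          };
          return zk.getData(configPath, onChange, null);   // getData(path, watcher, stat)
      }
  }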
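
Group membership with ephemeral nodes can look like the sketch below; GroupMember and the path layout are assumed names used only for illustration.

  import java.util.List;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class GroupMember {
      // Join the group: the ephemeral child disappears automatically
      // if this process crashes or its session expires.
      public static void join(ZooKeeper zk, String groupPath, String memberId, byte[] info)
              throws Exception {
          zk.create(groupPath + "/" + memberId, info,
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      }

      // List the current members and set a watch on the group znode,
      // so we are notified when the membership changes.
      public static List<String> members(ZooKeeper zk, String groupPath) throws Exception {
          return zk.getChildren(groupPath, true);
      }
  }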
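
The lock without herd effect follows the sequential-ephemeral-node recipe: queue up, and watch only the predecessor. The sketch below is a rough Java rendering of that recipe; the SimpleLock class and the "lock-" prefix are illustrative assumptions.

  import java.util.Collections;
  import java.util.List;
  import java.util.concurrent.CountDownLatch;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class SimpleLock {
      // Blocks until the lock rooted at lockPath is acquired; returns our znode path.
      public static String lock(ZooKeeper zk, String lockPath) throws Exception {
          // Join the queue with a sequential ephemeral child.
          String me = zk.create(lockPath + "/lock-", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
          String myName = me.substring(me.lastIndexOf('/') + 1);

          while (true) {
              List<String> children = zk.getChildren(lockPath, false);
              Collections.sort(children);

              // The lowest sequence number holds the lock.
              if (myName.equals(children.get(0))) {
                  return me;
              }

              // Otherwise watch only the child just before ours (no herd effect).
              String predecessor = children.get(children.indexOf(myName) - 1);
              CountDownLatch gone = new CountDownLatch(1);
              if (zk.exists(lockPath + "/" + predecessor, event -> gone.countDown()) != null) {
                  gone.await();   // wait until the predecessor's znode goes away, then recheck
              }
          }
      }

      // Releasing the lock is just deleting our znode (-1 means any version).
      public static void unlock(ZooKeeper zk, String myNode) throws Exception {
          zk.delete(myNode, -1);
      }
  }

The plain simple lock corresponds to the degenerate case in which every waiter watches the lock znode itself, so all of them race again whenever it is released.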
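
For read/write locks, the write side is the SimpleLock above (with a "write-" prefix); the read side only has to wait for write requests queued ahead of it. Below is a sketch of the read side under the same assumptions (ReadLock and the prefixes are illustrative names).

  import java.util.Collections;
  import java.util.List;
  import java.util.concurrent.CountDownLatch;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class ReadLock {
      // Acquire a shared read lock: only earlier *write* requests block us,
      // so concurrent readers never wait for each other.
      public static String readLock(ZooKeeper zk, String lockPath) throws Exception {
          String me = zk.create(lockPath + "/read-", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
          String mySeq = me.substring(me.lastIndexOf('-') + 1);

          while (true) {
              // Find the write request with the highest sequence number below ours.
              // Sequence suffixes are zero-padded, so string comparison matches numeric order.
              List<String> children = zk.getChildren(lockPath, false);
              Collections.sort(children);
              String blockingWriter = null;
              for (String child : children) {
                  String seq = child.substring(child.lastIndexOf('-') + 1);
                  if (child.startsWith("write-") && seq.compareTo(mySeq) < 0) {
                      blockingWriter = child;
                  }
              }
              if (blockingWriter == null) {
                  return me;                      // no earlier writer: read lock granted
              }
              // Wait for that writer to disappear, then recheck.
              CountDownLatch gone = new CountDownLatch(1);
              if (zk.exists(lockPath + "/" + blockingWriter, event -> gone.countDown()) != null) {
                  gone.await();
              }
          }
      }
  }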
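
A double barrier can be sketched as follows; the DoubleBarrier class, the barrier path, and the use of a latch to wait for watch events are assumptions made for the example.

  import java.util.List;
  import java.util.concurrent.CountDownLatch;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  public class DoubleBarrier {
      // Enter: register under the barrier znode, then wait until `size` processes have entered.
      public static void enter(ZooKeeper zk, String barrierPath, String name, int size)
              throws Exception {
          zk.create(barrierPath + "/" + name, new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
          while (true) {
              CountDownLatch changed = new CountDownLatch(1);
              List<String> children = zk.getChildren(barrierPath, event -> changed.countDown());
              if (children.size() >= size) {
                  return;              // everyone has arrived, start the computation
              }
              changed.await();         // wait for the membership to change, then recheck
          }
      }

      // Leave: delete our child, then wait until all children are gone.
      public static void leave(ZooKeeper zk, String barrierPath, String name) throws Exception {
          zk.delete(barrierPath + "/" + name, -1);
          while (true) {
              CountDownLatch changed = new CountDownLatch(1);
              List<String> children = zk.getChildren(barrierPath, event -> changed.countDown());
              if (children.isEmpty()) {
                  return;              // everyone has finished
              }
              changed.await();
          }
      }
  }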

ZooKeeper Applications

  • The Fetching Service
    The crawler stores its configuration metadata in ZooKeeper so that it can recover from master failures; ZooKeeper is also used for leader election.
  • Katta
    Katta is a distributed indexer. ZooKeeper is used for group membership, leader election, and configuration management.
  • Yahoo! Message Broker
    A distributed publish-subscribe system. ZooKeeper is used here for configuration metadata, failure detection, and group membership.

ZooKeeper Implementation

The main components of the implementation are the following:

  • Request Processor
  • Atomic Broadcast
    Based on Zab, ZooKeeper's leader-based atomic broadcast protocol.
  • Replicated Database
  • Client-Server Interactions

Related Work

  • Chubby
    Chubby is a lock service that manages advisory locks for distributed applications. It allows clients to connect only to the master server.
  • ISIS
    A group-communication toolkit used to build fault-tolerant services; related systems include:
    • Horus
    • Ensemble
    • Totem
  • State-machine Replication and Paxos
  • Boxwood
    It provides distributed lock servers and is based on Paxos.
  • Sinfonia
    Its main contribution is mini-transactions for building scalable distributed systems.
  • DepSpace
    Its main goal is to provide Byzantine fault tolerance.

Summary and Critique

ZooKeeper uses wait-free protocols to implement process coordination in distributed systems. A central design goal is the absence of blocking primitives such as locks, which allows the performance of the system to be improved significantly. Thanks to features such as watches and the different node types, ZooKeeper supports the following applications:

  • Configuration Management
  • Leader Election
  • Group Membership

A weakness of ZooKeeper is that intermediate changes are dropped: watches are one-shot, so between receiving an event and re-setting the watch the object may change several times, and the client observes only the latest state rather than every change. In effect, ZooKeeper is a state-based system, not an event-based one.

Another drawback of ZooKeeper is that the application's state machine is tied to the ZooKeeper state machine, so when a ZooKeeper server dies, the application has to re-establish all of its watches on the new server.

Also, the complete replication of data on every server limits the total amount of data that can be managed by ZooKeeper. Likewise, serializing all updates through a single leader is a potential performance bottleneck.
