Apache Zookeeper is a well-known framework for providing primitives for distributed consensus. It provides features such as the following:
- Recipes that allow a set of nodes in a distributed system to conduct elections for leadership, to partition work in an equitable manner, and so on.
- It can be used as a highly-available state store for persisting small amounts of metadata. Zookeeper, like HBase, is strongly consistent and accepts a write only once it's been safely persisted to the majority of nodes in the Zookeeper cluster.
- Zookeeper implements a heartbeat mechanism between the Zookeeper client and the server nodes. So, if a Zookeeper client instance stops sending a heartbeat, the server will consider that client instance to be dead.
- Zookeeper allows clients to set watches on specific events of interest. When those specific events of interest come to pass, interested clients can be notified.
- If a given Zookeeper client instance's heartbeat stopped, depending on how things are set up, it's possible for other client instances to be notified so that corrective action can be taken.
- Zookeeper represents state variables as nodes in a logical file system. These nodes are called znodes. Watches can be kept on any znode. When there is any change to the znode, all client instances that are maintaining watches on it will be notified.
HBase uses Zookeeper for a number of things:
- The state of all regions, the key range they represent, and the RegionServers they are assigned to are kept in Zookeeper
- All RegionServers in the cluster, both the ones that are active and the ones that appear dead, are tracked in Zookeeper
- Various other pieces of metadata related to replication, region splits, merges, and transitions are also kept track of in Zookeeper, and are leveraged heavily for various operations in HBase