ZooKeeper – Primer
Distributed collaborative applications involve a set of processes or agents interacting with one another to accomplish a common goal. They run in wide-area environments with little or no knowledge of the infrastructure and almost no control over the available resources. They also need to sequence and order events and ensure that actions are atomic. Above all, the application must guard against nightmarish bugs such as race conditions, deadlocks and partial failures.
ZooKeeper helps you build a distributed application by acting as a coordination service.
It is reliable and highly available. It exposes a simple set of primitives on which distributed applications can build higher-level services for
- configuration maintenance,
- leader election and other niche needs.
What lies beneath?
ZooKeeper maintains a shared hierarchical namespace modeled after standard file systems. The namespace consists of data registers, called znodes. They are similar to files and directories.
Note: Znodes keep their data primarily in memory, backed by a transaction log and snapshots on disk for reliability. This means the data a znode holds must fit in memory, so it must be small: at most 1 MB by default. In return, you get high throughput and low latency.
Znodes are identified by unique absolute paths, which are "/"-delimited Unicode strings. To help achieve uniqueness, ZooKeeper provides sequential znodes: when one is created, ZooKeeper appends a monotonically increasing counter, maintained by the parent znode, to the requested path. The counter is written as a 10-digit, zero-padded number, so a request for "/zoo-1/tiger/white-" that is assigned sequence 5 becomes "/zoo-1/tiger/white-0000000005".
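To make the naming rule concrete, here is a toy, in-memory sketch in Java. It is not the ZooKeeper API; the class SequentialNames and its nextName method are hypothetical. It assumes two things about the real server: the counter lives on the parent znode (so siblings share it), and the suffix is zero-padded to 10 digits. The real counter can also advance for reasons this toy ignores, so clients should not assume the numbers are gap-free.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model of sequential znode naming (hypothetical class, not the ZooKeeper API).
public class SequentialNames {
    // One counter per parent path, mirroring the per-parent counter in ZooKeeper.
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns the requested path with the parent's next counter value appended,
    // zero-padded to 10 digits.
    public String nextName(String path) {
        int slash = path.lastIndexOf('/');
        String parent = slash <= 0 ? "/" : path.substring(0, slash);
        int seq = counters.merge(parent, 1, Integer::sum) - 1; // 0 on first use
        return path + String.format(Locale.ROOT, "%010d", seq);
    }

    public static void main(String[] args) {
        SequentialNames names = new SequentialNames();
        System.out.println(names.nextName("/zoo-1/tiger/white-")); // /zoo-1/tiger/white-0000000000
        System.out.println(names.nextName("/zoo-1/tiger/white-")); // /zoo-1/tiger/white-0000000001
        System.out.println(names.nextName("/zoo-1/tiger/black-")); // /zoo-1/tiger/black-0000000002
    }
}
```

Because siblings share one counter, "black-" picks up where "white-" left off. That shared counter is what makes the suffix unique among the parent's children.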
- A client can create a znode, store up to 1 MB of data in it and give it as many child znodes as it wants.
- Reads and writes of a znode's data are always atomic: the data is read or written in its entirety, or the operation fails.
- There are no renames and no append semantics available.
- Each znode has an Access Control List (ACL) that restricts who can do what.
- Znodes maintain version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates.
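The version numbers are what make coordinated updates possible: a writer can say "replace the data only if it is still at version N". The Java sketch below is a toy single-znode model, not the ZooKeeper client. The class VersionedZnode and the constant MAX_DATA_BYTES are hypothetical. It assumes ZooKeeper's setData convention that an expected version of -1 matches any version. It also shows whole-payload replacement, since there is no append.

```java
import java.nio.charset.StandardCharsets;

// Toy single-znode model (hypothetical, not the ZooKeeper API) showing
// whole-payload writes and version-checked (check-and-set) updates.
public class VersionedZnode {
    // Assumption: approximates the ~1 MB default limit; the real limit comes from
    // the jute.maxbuffer setting and applies to the whole request, not just the data.
    static final int MAX_DATA_BYTES = 1024 * 1024;

    private byte[] data = new byte[0];
    private int version = 0; // a newly created znode starts at data version 0

    public synchronized byte[] getData() { return data.clone(); }

    public synchronized int getVersion() { return version; }

    // Replaces the entire payload if expectedVersion matches (-1 means "any version").
    // A mismatch returns false here; the real client throws KeeperException.BadVersionException.
    public synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (newData.length > MAX_DATA_BYTES) {
            throw new IllegalArgumentException("payload too large: " + newData.length + " bytes");
        }
        if (expectedVersion != -1 && expectedVersion != version) {
            return false;
        }
        data = newData.clone();
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedZnode znode = new VersionedZnode();
        int seen = znode.getVersion();
        boolean first = znode.setData("v1".getBytes(StandardCharsets.UTF_8), seen);
        boolean stale = znode.setData("v2".getBytes(StandardCharsets.UTF_8), seen);
        System.out.println(first + " " + stale + " version=" + znode.getVersion()); // true false version=1
    }
}
```

The second write fails because another write already moved the version on. A real client would then re-read the data and retry, rather than silently overwrite.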
Znodes can be one of two types: ephemeral and persistent. Once set, the type can’t be changed.
- Ephemeral znodes are deleted by ZooKeeper when the creating client’s session gets closed, while persistent znodes stay as long as not deleted explicitly.
- Ephemeral znodes can’t have children.
- Both types of znodes are visible to every client that their ACLs permit.
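The ephemeral rules above can be modelled with a small in-memory tree. This Java sketch is a toy, not the ZooKeeper API: ZnodeTree, create and closeSession are hypothetical names. It borrows one real detail, the ephemeralOwner stat field, which holds the creating session's id for an ephemeral znode and 0 for a persistent one.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy znode tree (hypothetical, not the ZooKeeper API) illustrating ephemeral vs persistent rules.
public class ZnodeTree {
    public enum Mode { PERSISTENT, EPHEMERAL }

    // ephemeralOwner mirrors the stat field: creating session id, or 0 if persistent.
    private static final class Node {
        final Mode mode;
        final long ephemeralOwner;
        Node(Mode mode, long ephemeralOwner) { this.mode = mode; this.ephemeralOwner = ephemeralOwner; }
    }

    private final Map<String, Node> nodes = new TreeMap<>();

    public ZnodeTree() {
        nodes.put("/", new Node(Mode.PERSISTENT, 0L));
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    // The mode is fixed at creation; nothing in this class can change it later.
    public void create(String path, Mode mode, long sessionId) {
        Node parent = nodes.get(parentOf(path));
        if (parent == null) {
            throw new IllegalStateException("parent does not exist: " + path);
        }
        if (parent.mode == Mode.EPHEMERAL) {
            // The real server rejects this with KeeperException.NoChildrenForEphemeralsException.
            throw new IllegalStateException("ephemeral znodes can't have children: " + path);
        }
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node already exists: " + path);
        }
        nodes.put(path, new Node(mode, mode == Mode.EPHEMERAL ? sessionId : 0L));
    }

    // Simulates a session closing: its ephemeral znodes vanish, persistent ones stay.
    public void closeSession(long sessionId) {
        nodes.values().removeIf(n -> n.mode == Mode.EPHEMERAL && n.ephemeralOwner == sessionId);
    }

    public boolean exists(String path) {
        return nodes.containsKey(path);
    }

    public static void main(String[] args) {
        ZnodeTree tree = new ZnodeTree();
        tree.create("/zoo-1", Mode.PERSISTENT, 7L);
        tree.create("/zoo-1/worker", Mode.EPHEMERAL, 7L);
        tree.closeSession(7L);
        System.out.println(tree.exists("/zoo-1") + " " + tree.exists("/zoo-1/worker")); // true false
    }
}
```

This disappearing behaviour is why ephemeral znodes suit liveness tracking. A process registers itself with an ephemeral znode, and if its session dies the registration cleans itself up.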
Up and Running
There is already plenty of literature on installing ZooKeeper on Linux, so I am going to focus on how to install it on Windows.
- Download and install Cygwin. http://www.cygwin.com/
- Download stable release of ZooKeeper. http://hadoop.apache.org/zookeeper/releases.html
- Unzip ZooKeeper to some directory, say, D:/iLabs/zookeeper-3.3.1
- Add a new environment variable ZOOKEEPER_INSTALL and point it to D:/iLabs/zookeeper-3.3.1
- Edit PATH variable and append $ZOOKEEPER_INSTALL/bin to it.
- Now start Cygwin.
Now, start the ZooKeeper server.
$ zkServer.sh start
Ouch! It threw an error:
ZooKeeper exited abnormally because it could not find its configuration file, zoo.cfg, which it expects in the $ZOOKEEPER_INSTALL/conf directory. This is a standard Java properties file.
Go ahead and create a zoo.cfg file in the conf directory. Open it and add the following properties:
# The number of milliseconds of each tick
tickTime=2000
# The directory where the snapshot is stored.
dataDir=D:/iLabs/zoo-data/
# The port at which the clients will connect
clientPort=2181
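Since zoo.cfg is a standard Java properties file, java.util.Properties can read it, which is handy for checking what a config actually contains. The sketch below is a stand-in, not ZooKeeper's own config parser; the class ZooCfgReader and its parse and load methods are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Reads zoo.cfg-style text with java.util.Properties (a stand-in, not ZooKeeper's parser).
public class ZooCfgReader {
    // Parses properties text; lines starting with '#' are comments and are skipped.
    public static Properties parse(String text) {
        Properties props = new Properties();
        try (Reader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Reads a config file from disk, e.g. $ZOOKEEPER_INSTALL/conf/zoo.cfg.
    public static Properties load(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(file)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
                "# The number of milliseconds of each tick",
                "tickTime=2000",
                "# The directory where the snapshot is stored.",
                "dataDir=D:/iLabs/zoo-data/",
                "# The port at which the clients will connect",
                "clientPort=2181");
        Properties p = parse(cfg);
        int tickTime = Integer.parseInt(p.getProperty("tickTime"));
        int clientPort = Integer.parseInt(p.getProperty("clientPort"));
        System.out.println("tickTime=" + tickTime + " dataDir=" + p.getProperty("dataDir")
                + " clientPort=" + clientPort);
    }
}
```

One practical consequence of the properties format on Windows: a backslash is an escape character, so a value written as D:\iLabs\zoo-data loads as D:iLabszoo-data. Forward slashes, as used in the dataDir line above, avoid that trap.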
Go back to Cygwin, and issue the same command again. This time ZooKeeper should load properly.
Now, connect to ZooKeeper. You should probably open a new Cygwin window, and issue the following command.
This connects to your ZooKeeper server running at localhost:2181 by default and opens the zk console.
Let's create a znode, say /zoo-1:
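[zk: localhost:2181<CONNECTED> 1] create /zoo-1 "Hello World!" null
Without any flags, create makes a persistent znode; the -s flag would make it sequential (ZooKeeper would append a counter to the name) and -e would make it ephemeral. Hello World! is the data you assign to the znode /zoo-1, and null is its ACL.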
To see all znodes,
[zk: localhost:2181<CONNECTED> 2] ls /
[zoo-1, zookeeper]
This means there are two znodes at the root level, /zoo-1 and /zookeeper. ZooKeeper uses the /zookeeper sub-tree to store management information, such as quota information.
For more commands, type help. To explore the command-line tools further, see: http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html