Wednesday, July 18, 2012

Cassandra Architecture - Data Partitioning

Before starting a Cassandra cluster, you must choose how the data will be divided across the nodes in the cluster. This means choosing a partitioner for the cluster. Cassandra uses a ring architecture. The ring is divided into as many ranges as there are nodes, with each node being responsible for one or more ranges of the overall data. Before a node can join the ring, it must be assigned a token. The token determines the node’s position on the ring and the range of data it is responsible for.
Once the partitioner is chosen, it is very difficult to change it without reloading all the data. This makes it very important to choose and configure the correct partitioner before initializing the cluster.
The important distinction between the partitioners is order preservation (OP). Users can define their own partitioners by implementing IPartitioner, or they can use one of the native partitioners.

Random Partitioner
RandomPartitioner is the default choice for Cassandra. It uses an MD5 hash function to map row keys to tokens, so the keys are distributed evenly across the cluster. The row key determines which node a row is placed on. The consistent hashing algorithm used by random partitioning ensures that when nodes are added to the cluster, the minimum possible set of data is affected. The hashing algorithm creates an MD5 hash value of the row key in the range 0 to 2^127. Each node in the cluster is then assigned a token, which is a hash value in that range. The token determines which row keys are placed on the node. For example, the row below with row key ‘Prajeesh’ is assigned a hash key like 98002736AD65AB, which determines the node that holds the range in which the row is stored.
[Figure: example row for row key ‘Prajeesh’, containing the value ‘Scrum Master’]

Notice that the keys are not in order. With RandomPartitioner, the keys are evenly distributed across the ring using hashes, but you sacrifice order, which means any range query needs to query all nodes in the ring.
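
The mapping from row key to token to node can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra's actual implementation: the helper names (token_for, balanced_tokens, owner_of) are hypothetical, the token is derived here by reducing the MD5 digest modulo 2^127 (the real partitioner may derive it differently), and the nodes are assumed to have evenly spaced tokens.

```python
import hashlib
from bisect import bisect_left
from typing import List

TOKEN_SPACE = 2 ** 127  # size of the token range described in the text


def token_for(row_key: str) -> int:
    """Map a row key to a token (assumption: MD5 digest modulo the token space)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced tokens, one per node, in ascending order."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


def owner_of(token: int, node_tokens: List[int]) -> int:
    """Index of the first node whose token is >= the given token.

    Tokens larger than every node token wrap around to the first node,
    which is what makes the token space a ring.
    """
    return bisect_left(node_tokens, token) % len(node_tokens)


node_tokens = balanced_tokens(4)
for key in ["Albert", "Amy", "Prajeesh"]:
    t = token_for(key)
    print(key, t, "-> node", owner_of(t, node_tokens))
```

Keys that sit next to each other alphabetically get unrelated tokens, so neighbouring keys usually end up scattered across the ring. That is why a range query under this scheme cannot be limited to a few nodes.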

Order Preserving Partitioners
The Order Preserving Partitioners preserve the order of the row keys as they are mapped into the token space. This allows range scans over rows, meaning you can scan rows as though you were moving a cursor through a traditional index. For example, if your application has user names as the row key, you can scan rows for users whose names fall between Albert and Amy. This type of query would not be possible with randomly partitioned row keys, since the keys are stored in the order of their MD5 hash (not sequentially).
An advantage of using an order preserving partitioner (OPP) is that range queries are simpler, since the query does not need to consult every node in the ring to fetch the data. It can go directly to the nodes that hold the requested range, based on the order of the row keys.
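
To make the range scan concrete, here is a minimal Python sketch of how an ordered ring narrows a query to a few nodes. The four-node ring, its single-letter tokens, and the function names are hypothetical, chosen only for illustration. Each token is treated as the highest row key its node owns.

```python
from bisect import bisect_left

# Hypothetical four-node ring; each token is the highest row key its node owns.
NODE_TOKENS = ["F", "M", "S", "Z"]


def owner(row_key, node_tokens=NODE_TOKENS):
    """Index of the first node whose token is >= row_key, wrapping around the ring."""
    return bisect_left(node_tokens, row_key) % len(node_tokens)


def nodes_for_range(start_key, end_key, node_tokens=NODE_TOKENS):
    """Indexes of the nodes whose ranges overlap [start_key, end_key].

    Assumes start_key <= end_key and that neither key sorts after the
    last token, so the scan does not wrap around the ring.
    """
    first = owner(start_key, node_tokens)
    last = owner(end_key, node_tokens)
    return list(range(first, last + 1))


print(nodes_for_range("Albert", "Amy"))   # [0]
print(nodes_for_range("Albert", "Nina"))  # [0, 1, 2]
```

In this toy ring the Albert to Amy scan touches only the first node, while a wider scan touches just the nodes whose ranges it crosses.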
A disadvantage of using OPP is that the ring can become unbalanced over time. If your application tends to write or update a sequential block of rows at a time, the writes are not distributed across the cluster and all go to a single node. That node ends up holding more data than the rest, which disturbs the even distribution of data across nodes.
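
The hot spot is easy to reproduce in a small simulation. The Python sketch below writes 1000 sequential keys to a hypothetical four-node ring under both schemes and counts the rows per node. The tokens, the key format, and the MD5-modulo token derivation are assumptions made for illustration, not Cassandra's exact behaviour.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

TOKEN_SPACE = 2 ** 127
NODE_COUNT = 4

# Hypothetical node tokens for each scheme.
HASHED_TOKENS = [i * TOKEN_SPACE // NODE_COUNT for i in range(NODE_COUNT)]
ORDERED_TOKENS = ["f", "m", "s", "z"]


def md5_token(row_key):
    """Assumption: token is the MD5 digest reduced modulo the token space."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def owner(token, node_tokens):
    """Index of the first node whose token is >= the given token, wrapping around."""
    return bisect_left(node_tokens, token) % len(node_tokens)


def write_distribution(row_keys):
    """Count rows per node under hashed and order-preserving placement."""
    hashed = Counter(owner(md5_token(k), HASHED_TOKENS) for k in row_keys)
    ordered = Counter(owner(k, ORDERED_TOKENS) for k in row_keys)
    return hashed, ordered


sequential_keys = ["user%04d" % i for i in range(1000)]
hashed_counts, ordered_counts = write_distribution(sequential_keys)
print("hashed :", dict(hashed_counts))
print("ordered:", dict(ordered_counts))
```

With these tokens every "user…" key sorts between "s" and "z", so the order-preserving placement sends all 1000 rows to the node at index 3, while the hashed placement spreads them over all four nodes.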
