animesh kumar

Running water never grows stale. Keep flowing!

Cassandra – Data Model

with 7 comments

[tweetmeme source=”anismiles” only_single=false

Cassandra is a completely different concept for a database. And if you are coming from the RDBMS background, you sure are going to have a tough time understanding the fundamentals, for example, you won’t find anything like tables, columns, constraints, indexes, queries etc. at least not in the since sense of relational databases. Cassandra has an altogether different approach toward DataModels.


The column is the lowest/smallest data container in Cassandra. It’s a tuple (triplet) that contains a name, a value and a timestamp.

user: {
    name: user,
    value: animesh.kumar,
    timestamp: 98989L

Name and Value are both binary (technically byte[]) and can be of any length.


To understand SuperColumn, try to look at it as a tuple which instead of containing binary values, contains a Map of unbounded Columns.

homeAddress: {
    name: homeAddress,
    value: {
        street: {
            timestamp: 98989L
        city: {
            timestamp: 98989L
            value:Bombay Hospital,
            timestamp: 98989L
            timestamp: 98989L

SuperColumn can be summarized as a column of Columns. It has got a Map styled containers to hold unbounded number of Columns (key has to be the same as the name of the Column). Also notice that SuperColumns don’t have a timestamp component.


Now, since we got two elementary DataModels, i.e. Column and SuperColumn, we need some mechanisms to hold them together, or group them.

Column Family

ColumnFamily is a structure that can keep an infinite number of rows – for most people with an RDBMS background – which is the structure that resembles a Table the most.

A ColumnFamily has:

  1. A name (think of the name of a Table),
  2. A map with a key (Row Identifier, like Primary Key) and
  3. A value which is a Map containing Columns.

For the Map with the columns: the key has to be the same as the name of the Column.

Profile = {

SuperColumn Family

SuperColumnFamily is easy after you have gotten through ColumnFamily. Instead of having Columns in the inner most Map we have SuperColumns. So it just adds an extra dimension.

AddressBook = {
    smile.animesh: {  // 1st row.
        ashish: {
        papa: {
    }, // end row
    ashish: {     // 2nd row.


A Keyspace is the outer most grouping of the data. From an RDBMS point of view you can compare this to your database schema, normally you have one per application.

A Keyspace contains the ColumnFamilies. There is no relationship between the ColumnFamiliies. They are just separate containers. They are NOT like tables in MySQL – you can’t join them, neither can you enforce any constraint. They are just separate containers.

Update: May 19, 2010 at 6:12 pm IST

Okay! So we have learnt the basics. You might need some time before you can start thinking in Cassandra DataModel’s terms.
Anyhow, let’s revise what we learnt in brief:

  1. Column is the basic data holding entity,
  2. SuperColumn contains a Map of Columns,
  3. ColumnFamily is where Cassandra stores all Columns; it loosely resembles databases’ Table.
  4. SuperColumn Family is just a ColumnFamily of SuperColumns.

Phew! It hasn’t yet got digested fully. It will take some time. 🙂

Next thing to learn about Cassandra is that it does NOT have any SQL like query features, so you can NOT sort the data when you are fetching it. Rather Cassandra sorts the data as soon as the data is put into the Cassandra clusters, and always remains sorted. Columns are sorted by their names, and the mechanism of sorting can be defined and controlled at the ColumnFamily’s definition, using “CompareWith” attribute.

Cassandra comes with following sorting options, though you can write your own sorting behavior it you need.

  1. BytesTypeSimple sort by byte value. No validation is performed.
  2. AsciiType:   Like BytesType, but validates that the input can be parsed as US-ASCII.
  3. UTF8TypeA string encoded as UTF8
  4. LongTypeA 64bit long
  5. LexicalUUIDType: A 128bit UUID, compared lexically (by byte value)
  6. TimeUUIDType: A 128bit version 1 UUID, compared by timestamp

Let’s try to understand it using some examples. Let’s say we have a raw Column set, i.e. which is not yet stored in Cassandra.

9:  {name: 9,  value: Ronald},
3:  {name: 3,  value: John},
15: {name: 15, value: Eric}

And, suppose that we have a ColumnFamily with UTF8Type sorting option.

<ColumnFamily CompareWith="UTF8Type" Name="Names"/>

Then, Cassandra will sort like,

15: {name: 15,  value: Eric},
3:  {name: 3,    value: John},
9:  {name: 9,   value: Ronald}

And with another ColumnFamily with LongType sorting option,

<ColumnFamily CompareWith="LongType" Name="Names"/>

Result will be like,

3:  {name: 3,  value: John},
9:  {name: 9,  value: Ronald},
15: {name: 15, value: Eric}

The same rules of sorting also get applied to SuperColumns. However, in this case we also need to specify a second sorting rule using the "CompareSubcolumnsWith" attribute for internal Columns’ sorting behavior.

For example consider following definition:

<ColumnFamily ColumnType="Super" CompareWith="UTF8Type"
CompareSubcolumnsWith="LongType" Name="Posts"/>

In this case, SuperColumns will be sorted by UTF8Type policy, and Columns by LongType policy.

If your need asks for custom sorting policies, you can easily write one.

  1. Create a Class extending org.apache.cassandra.db.marshal.AbstractType class.
  2. Package this class in a Java Archive and add it to the /lib folder of your Cassandra installation.
  3. Specify the fully qualified classname in the CompareSubcolumnsWith or CompareWith attribute. That’s it.

So, that’s all about Cassandra DataModels. Now, in our next step, we will write Cassandra client and see how deep the rabbit hole goes!


Written by Animesh

May 18, 2010 at 6:12 am

Posted in Technology

Tagged with , , ,

7 Responses

Subscribe to comments with RSS.

  1. […] Lazy-Loading/Selective-Loading of entity properties and many more. Possibly related posts: (automatically generated)Connecting to Cassandra – 1Cassandra – Data Model […]

  2. […] This post was mentioned on Twitter by Abdiel Sanchez, socmediadev. socmediadev said: Cassandra – Data Model: #apache #cassandra #database […]

  3. Great article.


    August 5, 2011 at 7:26 pm

  4. very nice explanation… thanks


    October 18, 2012 at 9:58 am

  5. Fantastic explanation…


    April 2, 2013 at 10:58 pm

  6. Nice blog it helps me … But i am very new to cassandra . SO please update full examples on spring+cassandra which includes jpa+controller+service+serviceimpl+dao+daoimpl +project hierarchy.

    Thank u in advance..


    January 22, 2014 at 4:40 pm

  7. Too Good for Newbie ………………..

    prakash gatade

    February 3, 2014 at 3:15 pm

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: