Posts Tagged ‘NoSql’
Fiddling with Cassandra 0.7-beta2
I have been dilly-dallying with Cassandra 0.7 for quite some time. My intention was to build Cassandra 0.7 support into Kundera (a JPA 1.0 compliant ORM library for Cassandra). I must admit that I was often upset about the lack of documentation on Cassandra and on the libraries I had planned to use, Pelops and Hector. So I decided to post my findings in the hope that they help you.
Now since Cassandra 0.7 beta-2 has been released, I will concentrate my talk around this release.
Installing Cassandra 0.7
- Download 0.7.0-beta2 (released on 2010-10-01) from here: http://cassandra.apache.org/download/
- Extract the archive to some location, say D:\apache-cassandra-0.7.0-beta2
- Set CASSANDRA_HOME environment variable to D:\apache-cassandra-0.7.0-beta2
- You can also update your PATH variable to include %CASSANDRA_HOME%\bin
- Now, to start the server you would need to run this command:
> cassandra -start
That’s it.
Okay, now that you've got the basics right, I would like to tell you a few important things about this new Cassandra release.
- Unlike the 0.6.x versions, 0.7.x employs YAML instead of XML; that is, you are going to find cassandra.yaml instead of storage-conf.xml.
- 0.7 allows you to manage the entire cluster, Keyspaces, Column Families, everything, from the Thrift API.
- There is also support for Apache Avro. (I haven’t explored this though, so no more comment)
- 0.7 comes with secondary index features. What does it mean? It means, you can look for your data not just by Row Identifier, but also by Column Values. Interesting huh?
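To make the secondary index idea a bit more concrete, here is a toy sketch in plain Java. It is only an in-memory illustration of "look up rows by column value", not how Cassandra implements it; the class SecondaryIndexSketch and its methods are names I made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory model of a secondary index; NOT Cassandra's implementation.
final class SecondaryIndexSketch {
    // row key -> (column name -> column value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // column name -> (column value -> row keys currently holding that value)
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    // Writes a column and keeps the value index in sync.
    void put(String rowKey, String column, String value) {
        Map<String, String> row = rows.computeIfAbsent(rowKey, k -> new HashMap<>());
        String old = row.put(column, value);
        Map<String, Set<String>> byValue = index.computeIfAbsent(column, k -> new HashMap<>());
        if (old != null) {
            // Drop the stale index entry for the overwritten value.
            Set<String> keys = byValue.get(old);
            keys.remove(rowKey);
            if (keys.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, k -> new TreeSet<>()).add(rowKey);
    }

    // Primary lookup: by row identifier.
    Map<String, String> getByKey(String rowKey) {
        return rows.get(rowKey);
    }

    // Secondary lookup: all row keys whose column currently has the given value.
    Set<String> findByValue(String column, String value) {
        Map<String, Set<String>> byValue = index.get(column);
        if (byValue == null || !byValue.containsKey(value)) {
            return new TreeSet<>();
        }
        return new TreeSet<>(byValue.get(value));
    }
}
```

The point of the second map is that a lookup by value does not have to scan every row; the price is extra bookkeeping on every write.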
If you look into cassandra.yaml, you will find a default Keyspace1 and a few Column Families too, but Cassandra doesn't load them. I am not sure why. In theory, everything defined in the YAML file should have been created at startup. I am going to dig into this. Anyway, for now, let's create a keyspace and a few Column Families ourselves. We can use the Thrift API (or the Cassandra client, which uses Thrift itself) or the JMX interface.
Dealing with Cassandra Client
Cassandra comes with a command line interface tool cassandra-cli. This tool is really really impressive. You should certainly spend some time with it.
- Start the client,
> cassandra-cli
- Connect to server,
> [default@unknown] connect localhost/9160
- Create a new keyspace (I picked this up from cassandra.yaml),
> [default@unknown] create keyspace Keyspace1 with replication_factor=1
- Create Column Families,
> [default@unknown] use Keyspace1
> [default@Keyspace1] create column family Standard1 with column_type = 'Standard' and comparator = 'BytesType'
- Describe keyspace,
> [default@Keyspace1] describe keyspace Keyspace1
And so on. Use ‘help’ to learn more about cassandra-cli.
JConsole
As I mentioned above, you can also use JMX to check what Keyspaces and Column Families exist in your server. But there is a little problem. Cassandra does not come with the mx4j-tools.jar, so you need to download and copy this jar to Cassandra’s lib folder. Download it from here: http://www.java2s.com/Code/Jar/MNOPQR/Downloadmx4jtoolsjar.htm
Now, just run ‘jconsole’ and pick ‘org.apache.cassandra.thrift.CassandraDaemon’ process.
Java clientèle
Well, there are two serious contenders, Pelops and Hector. Both have released experimental support for version 0.7. I had worked with Pelops earlier, so I thought it was time to give Hector a chance.
- Download Hector (Sync release with Cassandra 0.7.0-beta2) from here: http://github.com/rantav/hector/downloads
You can also use 'git clone' to download the latest source.
- Hector is a Maven project. To compile the source into a jar, just extract the release and run,
> mvn package
My first program
To start with Hector, I thought I would write a very small program to insert a Column and then fetch it back. If you remember, in the previous section we already created a keyspace 'Keyspace1' and a Column Family 'Standard1', and now we are going to make use of them.
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.exceptions.HectorException;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;
import me.prettyprint.hector.api.query.QueryResult;

public class HectorFirstExample {

    public static void main(String[] args) throws Exception {
        String keyspaceName = "Keyspace1";
        String columnFamilyName = "Standard1";
        String serverAddress = "localhost:9160";

        // Create Cassandra cluster
        Cluster cluster = HFactory.getOrCreateCluster("Cluster-Name", serverAddress);
        // Create Keyspace
        Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);

        try {
            // Mutation (row keys are Strings)
            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
            // Insert a new column with row-id 'id-1'
            mutator.insert("id-1", columnFamilyName, HFactory.createStringColumn("Animesh", "Kumar"));

            // Look up the same column
            ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
            columnQuery.setColumnFamily(columnFamilyName).setKey("id-1").setName("Animesh");
            QueryResult<HColumn<String, String>> result = columnQuery.execute();

            System.out.println("Read HColumn from cassandra: " + result.get());
        } catch (HectorException e) {
            e.printStackTrace();
        }
    }
}
That was simple. By the way, ‘Nate McCall‘ has written a set of example classes to help us understand Hector with Cassandra 0.7. Check it out here: http://github.com/zznate/hector-examples
I am working towards introducing Cassandra 0.7 support in Kundera, and will be publishing my findings intermittently.
Kundera: now JPA 1.0 Compatible
If you are new to Kundera, you should read Kundera: knight in the shining armor! to get a brief idea about it.
Kundera has reached a major milestone lately, so I thought I would sum up the developments here. First and foremost, Kundera is now JPA 1.0 compatible. Though it doesn't support relationships yet, it does support easy JPA-style @Entity declarations and linear JPA queries. 🙂 Didn't you always want to search over Cassandra?
To begin with let’s see what the changes are.
- Kundera does not have the @CassandraEntity annotation anymore. It now expects JPA @Entity.
- Kundera specific @Id has been replaced with JPA @Id.
- Kundera specific @Column has been replaced with JPA @Column.
- @ColumnFamily, @SuperColumnFamily and @SuperColumn are still there, and are expected to be there for a long time to come, because JPA doesn’t have any of these ideas.
- @Index is introduced to control indexing of an entity bean. You can safely ignore it and let Kundera do the defaults for you.
I recommend reading the Entity annotation rules discussed in the earlier post. Apart from the points mentioned above, everything remains the same: https://anismiles.wordpress.com/2010/06/30/kundera-knight-in-the-shining-armor/#general-rules
How to define an entity class?
@Entity // makes it an entity class
@ColumnFamily("Authors") // assign ColumnFamily type and name
public class Author {

    @Id // row identifier
    String username;

    @Column(name = "email") // override column-name
    String emailAddress;

    @Column
    String country;

    @Column(name = "registeredSince")
    Date registered;

    String name;

    public Author() { // must have a default constructor
    }

    // getters, setters etc.
}
There is an important deviation from JPA specification here.
- Unlike JPA you must explicitly annotate fields/properties you want to persist. Any field/property that is not @Column annotated will be ignored by Kundera.
- In short, the paradigm is reversed here. JPA assumes everything persist-able unless explicitly defined @Transient. Kundera expects everything transient unless explicitly defined @Column.
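If you want to see the "opt-in" rule in action, here is a small reflection sketch. It is my own illustration and not Kundera's actual scanning code: the nested @Column annotation is a local stand-in for javax.persistence.Column (so the example compiles without JPA on the classpath), and OptInPersistenceSketch and persistentColumns are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OptInPersistenceSketch {

    // Local stand-in for javax.persistence.Column (assumption: name() is all we need here).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String name() default "";
    }

    static class Author {
        String username;                 // would carry @Id in a real entity
        @Column(name = "email")
        String emailAddress;             // persisted as "email"
        @Column
        String country;                  // persisted under the field name
        String name;                     // not annotated, so it is ignored
    }

    // Returns the sorted column names of all fields that opted in via @Column.
    static List<String> persistentColumns(Class<?> type) {
        List<String> columns = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            if (column == null) {
                continue; // transient by default
            }
            columns.add(column.name().isEmpty() ? field.getName() : column.name());
        }
        Collections.sort(columns); // reflection does not guarantee field order
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(persistentColumns(Author.class));
    }
}
```

For the Author class above, only "country" and "email" come back; the un-annotated name field never shows up.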
How to instantiate EntityManager?
Kundera needs a few properties before you can bootstrap it.
# kundera.properties
# Cassandra nodes to which Kundera will connect
kundera.nodes=localhost
# Cassandra port
kundera.port=9160
# Cassandra keyspace which Kundera will use
kundera.keyspace=Blog
# Whether or not EntityManager can have sessions, that is L1 cache.
sessionless=false
# Cassandra client implementation. It must implement com.impetus.kundera.CassandraClient
kundera.client=com.impetus.kundera.client.PelopsClient
You can define these properties in a java Map object, or in JPA persistence.xml or in a property file “kundera.properties” kept in the classpath.
- Instantiating with persistence.xml > Just replace the provider with com.impetus.kundera.ejb.KunderaPersistence which extends JPA PersistenceProvider. And either provide Kundera specific properties in the xml file or keep “kundera.properties” in the classpath.
- Instantiating in standard J2SE environment, with explicit Map object.
Map map = new HashMap();
map.put("kundera.nodes", "localhost");
map.put("kundera.port", "9160");
map.put("kundera.keyspace", "Blog");
map.put("sessionless", "false");
map.put("kundera.client", "com.impetus.kundera.client.PelopsClient");

EntityManagerFactory factory = new EntityManagerFactoryImpl("test", map);
EntityManager manager = factory.createEntityManager();
- Instantiating in standard J2SE environment, with the "kundera.properties" file. Pass null to EntityManagerFactoryImpl and it will automatically look for the property file.
EntityManagerFactory factory = new EntityManagerFactoryImpl("test", null);
EntityManager manager = factory.createEntityManager();
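The "explicit Map, otherwise kundera.properties from the classpath" behaviour can be pictured roughly like this. This is a sketch under my own assumptions, not the code inside EntityManagerFactoryImpl: KunderaConfigSketch and resolve are hypothetical names, and I am assuming the property file is read with plain java.util.Properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

final class KunderaConfigSketch {
    static final String FILE_NAME = "kundera.properties";

    // Returns the explicit map if one is given; otherwise reads FILE_NAME from the classpath.
    static Map<String, String> resolve(Map<String, String> explicit, ClassLoader loader) {
        if (explicit != null) {
            return new HashMap<>(explicit);
        }
        Map<String, String> result = new HashMap<>();
        try (InputStream in = loader.getResourceAsStream(FILE_NAME)) {
            if (in == null) {
                throw new IllegalStateException(FILE_NAME + " not found on the classpath");
            }
            Properties props = new Properties();
            props.load(in);
            for (String key : props.stringPropertyNames()) {
                result.put(key, props.getProperty(key));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Calling resolve(null, SomeClass.class.getClassLoader()) corresponds to passing null to the factory: the file on the classpath becomes the only source of configuration.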
Entity Operations
Once you have EntityManager object you are good to go, applying all your JPA skills. For example, if you want to find an Entity object by key,
try {
    Author author = manager.find(Author.class, "smile.animesh");
} catch (PersistenceException pe) {
    pe.printStackTrace();
}
Similarly, there are other JPA methods for various operations: merge, remove etc.
JPA Query
Note: Kundera uses Lucene to index your Entities. Beneath Lucene, Kundera uses Lucandra to store the indexes in Cassandra itself. One fun implication of using Lucene is that apart from regular JPA queries, you can also run Lucene queries. 😉
Here are some indexing fundamentals:
- By default, all entities are indexed along with all @Column properties.
- If you do not want to index an entity, annotate it like @Index(index=false)
- If you do not want to index a @Column property of an entity, annotate it like @Index(index=false)
That’s it. Here is an example of JPA query:
// write a JPA Query
String jpaQuery = "SELECT a from Author a";

// create Query object
Query query = manager.createQuery(jpaQuery);

// get results
List<Author> list = query.getResultList();
for (Author a : list) {
    System.out.println(a.getUsername());
}
Kundera also supports multiple “where” clauses with “AND”, “OR”, “=” and “like” operations.
// find all Authors with email like anismiles
String jpaQuery_for_emails_like = "SELECT a from Author a WHERE a.emailAddress like anismiles";

// find all Authors with email like anismiles or username like anim
String jpaQuery_for_email_or_name = "SELECT a from Author a WHERE a.emailAddress like anismiles OR a.username like anim";
I think this will enable you to play around with Kundera. I will be writing up more on how Kundera indexes various entities and how you can execute Lucene Queries in subsequent posts.
Kundera’s next milestones will be:
- Implementation of JPA listeners, @PrePersist @PostPersist etc.
- Implementation of Relationships, @OneToMany, @ManyToMany etc.
- Implementation of Transactional support, @Transactional
Kundera: knight in the shining armor!
The idea behind Kundera is to make working with Cassandra drop-dead simple, and fun. Kundera does not reinvent the wheel by making another client library; rather, it leverages the existing libraries and builds, on top of them, a wrap-around API that helps developers do away with unnecessary boilerplate and write neater, cleaner code that reduces complexity and improves quality. And above all, improves productivity.
Download Kundera here: http://code.google.com/p/kundera/
Note: Kundera is now JPA 1.0 compatible, and there are some ensuing changes. You should read about it here: https://anismiles.wordpress.com/2010/07/14/kundera-now-jpa-1-0-compatible/
Objectives:
- To completely remove unnecessary details, such as Column lists, SuperColumn lists, byte arrays, Data encoding etc.
- To be able to work directly with Domain models just with the help of annotations
- To eliminate “code plumbing”, so as to keep the flow of data processing clear and obvious
- To completely separate out Cassandra and its obvious concerns from application-level logics for robust application development
- To include the latest Cassandra developments without breaking anything, anywhere in the business layer
Cassandra Data Models
At the very basic level, Cassandra has Column and SuperColumn to hold your data. Column is a tuple with a name, value and a timestamp; while SuperColumn is Column of Columns. Columns are stored in a ColumnFamily, and SuperColumns in SuperColumnFamily. The most important thing to note is that Cassandra is not your old relational database, it is a flat system. No joins, No foreign keys, nothing. Everything you store here is 100% de-normalized.
Read more details here: https://anismiles.wordpress.com/2010/05/18/cassandra-data-model/
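If it helps, the model described above can be pictured as nested sorted maps. The sketch below is only a mental model in plain Java, not Cassandra's storage format; DataModelSketch and its members are names I invented for the illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Mental model only: Cassandra's data model pictured as nested sorted maps.
final class DataModelSketch {

    // A Column is a (name, value, timestamp) tuple.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // ColumnFamily: row key -> column name -> Column
    final Map<String, Map<String, Column>> authors = new TreeMap<>();

    // SuperColumnFamily: row key -> super column name -> column name -> Column
    final Map<String, Map<String, Map<String, Column>>> posts = new TreeMap<>();

    // Stores a column directly under a row (ColumnFamily case).
    void insert(String rowKey, Column column) {
        authors.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column.name, column);
    }

    // Stores a column under a super column of a row (SuperColumnFamily case).
    void insert(String rowKey, String superColumn, Column column) {
        posts.computeIfAbsent(rowKey, k -> new TreeMap<>())
             .computeIfAbsent(superColumn, k -> new TreeMap<>())
             .put(column.name, column);
    }

    public static void main(String[] args) {
        DataModelSketch model = new DataModelSketch();
        long now = System.currentTimeMillis();
        model.insert("Eric Long", new Column("country", "United Kingdom", now));
        model.insert("cats-are-funny-animals", "post", new Column("title", "Cats are funny animals", now));
        System.out.println(model.authors.get("Eric Long").get("country").value);
        System.out.println(model.posts.get("cats-are-funny-animals").get("post").get("title").value);
    }
}
```

Notice there is nothing in either map that points from one row to another: no joins, no foreign keys, exactly as described above.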
Using Kundera
Kundera is now JPA 1.0 compatible. It defines a handful of its own annotations, on top of the JPA ones, to describe your Entity objects. Here are the basic rules:
General Rules
- Entity classes must have a default no-argument constructor.
- Entity classes must be annotated with @Entity. (The earlier @CassandraEntity annotation has been dropped in favor of JPA @Entity.)
- Entity classes for ColumnFamily must be annotated with @ColumnFamily("column-family-name")
- Entity classes for SuperColumnFamily must be annotated with @SuperColumnFamily("super-column-family-name")
- Each entity must have a field annotated with @Id
- The @Id field must be of String type. (Since you can define sorting strategies in Cassandra's storage-conf file, keeping @Id of String type makes life simpler, you will see later)
- There must be 1 and only 1 @Id per entity.
Note: Kundera works only at property level for now, so all method level annotations are ignored. Idea: keep life simple. 🙂
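The general rules above are easy to check mechanically. Here is a rough validator sketch, again my own illustration rather than Kundera's code: @Entity and @Id are local stand-ins for the javax.persistence annotations, and EntityRulesSketch, violations and the two sample classes are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class EntityRulesSketch {

    // Local stand-ins for javax.persistence.Entity and javax.persistence.Id.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Entity {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Id {}

    @Entity
    static class Author {          // follows all the rules
        @Id String username;
        Author() {}
    }

    static class BrokenPost {      // no @Entity, no default constructor, non-String @Id
        @Id long id;
        BrokenPost(long id) { this.id = id; }
    }

    // Returns a human-readable list of broken rules; empty means the class looks fine.
    static List<String> violations(Class<?> type) {
        List<String> problems = new ArrayList<>();
        if (!type.isAnnotationPresent(Entity.class)) {
            problems.add("missing @Entity");
        }
        try {
            type.getDeclaredConstructor();
        } catch (NoSuchMethodException e) {
            problems.add("missing no-argument constructor");
        }
        int ids = 0;
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Id.class)) {
                ids++;
                if (field.getType() != String.class) {
                    problems.add("@Id field '" + field.getName() + "' is not a String");
                }
            }
        }
        if (ids != 1) {
            problems.add("expected exactly one @Id field, found " + ids);
        }
        return problems;
    }
}
```

Only fields are inspected, which matches the note above that method-level annotations are ignored.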
ColumnFamily Rules
1. You must define the name of the column family in @ColumnFamily, like @ColumnFamily("Authors"). Kundera will link this entity class with the "Authors" column family.
2. Entities annotated with @ColumnFamily are scanned for properties with @Column annotations.
3. Each such field will qualify to become a Cassandra Column with
   - Name: name of the property
   - Value: value of the property
4. By default the name of the column will be the name of the property. However, if you fancy changing the name, you can override it like @Column(name="fancy-name")
   @Column(name = "email") // override column-name
   String emailAddress;
5. Properties of type Integer, String, Long and Date are supported natively; everything else will be serialized before it gets saved, and de-serialized when it is read. Serialization has some inherent limitations; that is why Kundera discourages you from using custom objects as Cassandra Column properties. However, you are free to do as you want. Just read the serialization tweaks before insanity reigns over you, 😉
6. Kundera also supports Collection and Map properties. However, there are a few things you must take care of:
   - You must initialize any Collection or Map properties, like
     List<String> list = new ArrayList<String>();
     Set<String> set = new HashSet<String>();
     Map<String, String> map = new HashMap<String, String>();
   - Type parameters follow the same rule, described in #5.
   - If you don't explicitly define the type parameter, elements will be serialized/de-serialized before saving and retrieving.
   - There is no guarantee that the Collection element order will be maintained.
   - Collection and Map both will create as many columns as the number of elements they have.
   - Collection will break into Columns like,
     - Name~0: Element at index 0
     - Name~1: Element at index 1 and so on.
     Name follows rule #4.
   - Map will break into Columns like,
     - Name~key1: Element at key1
     - Name~key2: Element at key2 and so on.
     Again, name follows rule #4.
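The "Name~index" and "Name~key" naming scheme for Collection and Map properties can be sketched in a few lines. This is just an illustration of the naming convention described above, assuming String elements; CollectionColumnsSketch, fromList and fromMap are hypothetical names, not Kundera methods.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrates how one Collection/Map property fans out into many columns.
final class CollectionColumnsSketch {

    // A list property "name" becomes columns name~0, name~1, ...
    static Map<String, String> fromList(String name, List<String> values) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            columns.put(name + "~" + i, values.get(i));
        }
        return columns;
    }

    // A map property "name" becomes columns name~key1, name~key2, ...
    static Map<String, String> fromMap(String name, Map<String, String> values) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            columns.put(name + "~" + entry.getKey(), entry.getValue());
        }
        return columns;
    }
}
```

So a tags list holding "cats" and "animals" turns into two columns, tags~0 and tags~1, one per element.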
SuperColumnFamily Rules
- You must define the name of the super column family in @SuperColumnFamily, like @SuperColumnFamily("Posts"). Kundera will link this entity class with the "Posts" super column family.
- Entities annotated with @SuperColumnFamily are scanned for properties for 2 annotations:
- @Column and
- @SuperColumn
- Only properties annotated with both annotations are picked up, and each such property qualifies to become a Column and fall under SuperColumn.
- You can define the name of the column like you did for ColumnFamily.
- However, you must define the name of the SuperColumn a particular Column must fall under, like @SuperColumn(column = "super-column-name")
@Column
@SuperColumn(column = "post") // column 'title' will fall under super-column 'post'
String title;
- Everything else works the same as described for ColumnFamily above.
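To picture how properties end up grouped under super columns, here is one more reflection sketch. As before, this is my own approximation and not Kundera's implementation: @Column and @SuperColumn are local stand-ins for the real annotations, and SuperColumnLayoutSketch and layout are made-up names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SuperColumnLayoutSketch {

    // Local stand-ins for the JPA @Column and Kundera @SuperColumn annotations.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String name() default "";
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface SuperColumn {
        String column();
    }

    static class Post {
        @Column @SuperColumn(column = "post") String title;
        @Column @SuperColumn(column = "post") String body;
        @Column @SuperColumn(column = "tags") List<String> tags = new ArrayList<>();
        @Column String draftNote; // only one of the two annotations, so it is skipped
    }

    // Groups column names by the super column they fall under.
    static Map<String, List<String>> layout(Class<?> type) {
        Map<String, List<String>> bySuperColumn = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            Column column = field.getAnnotation(Column.class);
            SuperColumn superColumn = field.getAnnotation(SuperColumn.class);
            if (column == null || superColumn == null) {
                continue; // both annotations are required
            }
            String name = column.name().isEmpty() ? field.getName() : column.name();
            bySuperColumn.computeIfAbsent(superColumn.column(), k -> new ArrayList<>()).add(name);
        }
        for (List<String> names : bySuperColumn.values()) {
            Collections.sort(names); // reflection does not guarantee field order
        }
        return bySuperColumn;
    }
}
```

For the sample Post class this yields two groups, "post" with body and title, and "tags" with the tags property; draftNote is dropped because it lacks @SuperColumn.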
Up and running in 5 minutes
Let’s learn by example. We will create a simple Blog application. We will have Posts, Tags and Authors.
Cassandra data model for “Authors” might be like,
ColumnFamily: Authors = {
    "Eric Long": {    // row 1
        "email": { name: "email", value: "eric (at) long.com" },
        "country": { name: "country", value: "United Kingdom" },
        "registeredSince": { name: "registeredSince", value: "01/01/2002" }
    },
    ...
}
And data model for “Posts” might be like,
SuperColumnFamily: Posts = {
    "cats-are-funny-animals": {    // row 1
        "post": {    // super-column
            "title": { "Cats are funny animals" },
            "body": { "Bla bla bla… long story…" },
            "author": { "Ronald Mathies" },
            "created": { "01/02/2010" }
        },
        "tags": {
            "0": { "cats" },
            "1": { "animals" }
        }
    },
    // row 2
}
Create a new Cassandra Keyspace: “Blog”
<Keyspace Name="Blog">
    <!-- family definitions -->

    <!-- Necessary for Cassandra -->
    <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
    <ReplicationFactor>1</ReplicationFactor>
    <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
Create 2 column families: SuperColumnFamily for “Posts” and ColumnFamily for “Authors”
<Keyspace Name="Blog">
    <!-- family definitions -->
    <ColumnFamily CompareWith="UTF8Type" Name="Authors"/>
    <ColumnFamily ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="UTF8Type" Name="Posts"/>

    <!-- Necessary for Cassandra -->
    <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
    <ReplicationFactor>1</ReplicationFactor>
    <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>
Create entity classes
Author.java
@Entity // makes it an entity class
@ColumnFamily("Authors") // assign ColumnFamily type and name
public class Author {

    @Id // row identifier
    String username;

    @Column(name = "email") // override column-name
    String emailAddress;

    @Column
    String country;

    @Column(name = "registeredSince")
    Date registered;

    String name;

    public Author() { // must have a default constructor
    }

    ... // getters/setters etc.
}
Post.java
@Entity // makes it an entity class
@SuperColumnFamily("Posts") // assign column-family type and name
public class Post {

    @Id // row identifier
    String permalink;

    @Column
    @SuperColumn(column = "post") // column 'title' will be stored under super-column 'post'
    String title;

    @Column
    @SuperColumn(column = "post")
    String body;

    @Column
    @SuperColumn(column = "post")
    String author;

    @Column
    @SuperColumn(column = "post")
    Date created;

    @Column
    @SuperColumn(column = "tags") // column 'tag' will be stored under super-column 'tags'
    List<String> tags = new ArrayList<String>();

    public Post() { // must have a default constructor
    }

    ... // getters/setters etc.
}
Note the annotations, match them against the rules described above. Please see how “tags” property has been initialized. This becomes very important because Kundera uses Java Reflection to read and populate the entity classes. Anyways, once we have entity classes in place…
Instantiate EntityManager
Kundera now works as a JPA provider, and here is how you can instantiate EntityManager. https://anismiles.wordpress.com/2010/07/14/kundera-now-jpa-1-0-compatible/#entity-manager
EntityManager manager = new EntityManagerImpl();
manager.setClient(new PelopsClient());
manager.getClient().setKeySpace("Blog");
And that’s about it. You are ready to rock-and-roll like a football. Sorry, I just got swayed with FIFA fever. 😉
Supported Operations
Kundera supports JPA EntityManager based operations, along with JPA queries. Read more here: https://anismiles.wordpress.com/2010/07/14/kundera-now-jpa-1-0-compatible/#entity-operations
Save entities
Post post = ... // new post object
try {
manager.save(post);
} catch (IllegalEntityException e) { e.printStackTrace(); }
catch (EntityNotFoundException e) { e.printStackTrace(); }
If the entity is already saved in Cassandra database, it will be updated, else a new entity will be saved.
Load entity
try {
Post post = manager.load(Post.class, key); // key is the identifier, for our case, "permalink"
} catch (IllegalEntityException e) { e.printStackTrace(); }
catch (EntityNotFoundException e) { e.printStackTrace(); }
Load multiple entities
try {
List posts = manager.load(Post.class, key1, key2, key3...); // key is the identifier, "permalink"
} catch (IllegalEntityException e) { e.printStackTrace(); }
catch (EntityNotFoundException e) { e.printStackTrace(); }
Delete entity
try {
manager.delete(Post.class, key); // key is the identifier, "permalink"
} catch (IllegalEntityException e) { e.printStackTrace(); }
catch (EntityNotFoundException e) { e.printStackTrace(); }
Wow! Was it fun? Was it easy? I’m sure it was. Keep an eye on Kundera, we will be rolling out sooner-than-you-imagine more features like,
- Transaction support
- More fine-grained methods for better control
- Lazy-Loading/Selective-Loading of entity properties and many more.