I have spent the last few days playing with Cassandra, and I have attempted to
create a simple "Java->Thrift->Cassandra" Discussion Group Server (because the
world needs another one) to teach myself the data model and try everything out.
With all the great blog posts on Cassandra out there, I am now able to
read/write/delete/modify a nested discussion server. YEA!!!
I decided to have two simple ColumnFamilies. The first one is called Posts:
Post = {
    '7561a442-24e2-11df-8924-001ff3591711': {                 //UUID
        'id': '7561a442-24e2-11df-8924-001ff3591711',         //ID == UUID
        'parent_id': '89da3178-24e2-11df-8924-001ff3591711',  //Parent Post UUID
        'author': 'a4a70900-24e1-11df-8924-001ff3591711',     //User's UUID
        'subject': 'This is a forum post',                    //Subject
        'body': 'Forum post body. This is awesome!',          //Body
        '_ts': '89da3178-24e2-11df-8924-001ff3596713',        //TimeUUID
    },
}
Here the row key is a simple UUID and the columns hold the fields of a
forum, post, or reply. A forum has a hardcoded parent UUID, which I store in
Java, while posts and replies are tied to their parent posts/forums/etc. by
parent_id. This ColumnFamily is sorted by UTF8Type, but that doesn't really
matter here, because I always look a row up by its key and always fetch all of
its columns (6 of them).
All queries go through the second ColumnFamily, called Threads:
Thread = {
    '7561a442-24e2-11df-8924-001ff3591711': {     //Parent thread UUID
        #timestamp of post: post UUID
        '89da3178-24e2-11df-8924-001ff3596713': '7561a442-24e2-11df-8924-001ff3591711',  //TimeUUID column name -> post UUID value
    },
}
Given a parent UUID, I can look up its row in Threads, which gives me the list
of posts/replies at that level sorted by TimeUUID. The column name is the
post's TimeUUID and the value is the post's UUID. This ColumnFamily is sorted
by TimeUUIDType.
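Because Threads uses a TimeUUID comparator, its columns come back in the order of the timestamps embedded in the version-1 UUIDs, not in the order of their string form. A minimal sketch of that ordering in plain Java, using java.util.UUID.timestamp(); the class name is made up, and breaking ties with UUID.compareTo is an assumption made only so the ordering is total (it is not necessarily how Cassandra breaks ties):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

// Sketch (hypothetical helper): order version-1 UUIDs by their embedded
// 60-bit timestamp, roughly what a TimeUUIDType column family does.
class TimeUuidOrder {
    static final Comparator<UUID> BY_TIME =
        Comparator.<UUID>comparingLong(UUID::timestamp)
                  .thenComparing(Comparator.<UUID>naturalOrder()); // assumed tie-break

    static List<UUID> sortByTime(List<UUID> ids) {
        List<UUID> sorted = new ArrayList<>(ids);
        // UUID.timestamp() throws UnsupportedOperationException for non-version-1 UUIDs.
        sorted.sort(BY_TIME);
        return sorted;
    }

    public static void main(String[] args) {
        List<UUID> ids = List.of(
            UUID.fromString("89da3178-24e2-11df-8924-001ff3596713"),
            UUID.fromString("7561a442-24e2-11df-8924-001ff3591711"),
            UUID.fromString("a4a70900-24e1-11df-8924-001ff3591711"));
        for (UUID id : sortByTime(ids)) {
            System.out.println(id);
        }
    }
}
```

With the three UUIDs from the examples above, this prints a4a70900-... first (its time_mid field 24e1 is earlier), then 7561a442-..., then 89da3178-...; a plain string sort would have put a4a70900-... last.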
Thus I can walk the tree (of any depth) of Forum/Post/Replies with the Thread
table.
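The tree walk can be sketched against a hypothetical in-memory copy of Threads, where each parent maps to its child post IDs already in TimeUUID column order (as a slice of that row would return them). ThreadWalk and the short IDs in the usage below are made up for illustration; a real version would fetch each row with get_slice instead of reading a map:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the Threads ColumnFamily:
// parent ID -> child post IDs, in TimeUUID column order.
class ThreadWalk {
    private final Map<String, List<String>> threads = new HashMap<>();

    void addChild(String parentId, String postId) {
        threads.computeIfAbsent(parentId, k -> new ArrayList<>()).add(postId);
    }

    List<String> children(String parentId) {
        return threads.getOrDefault(parentId, Collections.emptyList());
    }

    // Pre-order walk: emit a post, then its replies, to any depth,
    // indenting two spaces per level.
    void walk(String parentId, int depth, List<String> out) {
        for (String child : children(parentId)) {
            out.add("  ".repeat(depth) + child);
            walk(child, depth + 1, out);
        }
    }
}
```

For a forum "F" with posts "p1" and "p2" and a reply "r1" to "p1", walk("F", 0, out) fills out with "p1", "  r1", "p2".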
I have this all working on a single Cassandra node, and it works great. Inserts
go to both ColumnFamilies, while deletes use the Threads ColumnFamily to
recursively delete all child posts, the column in the parent's Threads row, and
all associated data in Posts.
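The recursive delete can be sketched the same way, again over a hypothetical in-memory copy of Threads. This sketch only builds the list of removes so the ordering is easy to see; DeletePlan and the operation strings are made up for illustration. It visits children before their parent, which is one reasonable choice: if a run dies half-way, the remaining posts are still linked from their parents' Threads rows, so the delete can simply be re-run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plan the removes for one post and its whole subtree,
// children first (post-order), then unlink the post from its parent.
class DeletePlan {
    private final Map<String, List<String>> threads = new HashMap<>();

    void addChild(String parentId, String postId) {
        threads.computeIfAbsent(parentId, k -> new ArrayList<>()).add(postId);
    }

    List<String> plan(String parentId, String postId) {
        List<String> ops = new ArrayList<>();
        collect(postId, ops);
        // Last step: remove the column in the parent's Threads row that points at postId.
        ops.add("remove Threads[" + parentId + "] column for " + postId);
        return ops;
    }

    private void collect(String postId, List<String> ops) {
        for (String child : threads.getOrDefault(postId, Collections.emptyList())) {
            collect(child, ops);
        }
        ops.add("remove Threads[" + postId + "]"); // the post's own list of replies
        ops.add("remove Posts[" + postId + "]");   // the post's columns
    }
}
```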
Any comments on whether this is a good or terrible data model so far are
welcome. :)
My question comes from the fact that during this process I have
written/read/deleted many "key->Columns" to these ColumnFamilies (many of which
failed half-way through), so I decided to write a "clean" script to remove all
data from these ColumnFamilies (much like a TRUNCATE TABLE command in SQL).
I use the following Java code:
import java.util.ArrayList;
import java.util.List;
import org.apache.cassandra.thrift.*;

//ask only for the ID column of each key we find
List<byte[]> l_columns = new ArrayList<byte[]>();
l_columns.add(Transcoder.encode(ID));
SlicePredicate l_slicePredicate = new SlicePredicate();
l_slicePredicate.setColumn_names(l_columns);
//get 100 keys at a time; empty start and end keys cover the whole range
KeyRange keyRange = new KeyRange(100);
keyRange.setStart_key("");
keyRange.setEnd_key("");
List<KeySlice> l_keySlices =
    p_context.getClient().get_range_slices("Discussions", new ColumnParent("Posts"),
        l_slicePredicate, keyRange, ConsistencyLevel.ONE);
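Note that a single call like the one above returns at most 100 keys. A common way to cover a larger range is to page: reuse the last key of each page as the next start_key, and skip that key when it comes back again, since the start key is inclusive. In this sketch RangeSource and RangePager are hypothetical names; with the real client, fetch would set start_key and count on a KeyRange and call get_range_slices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one get_range_slices call: returns up to `count`
// keys starting at, and including, `startKey` ("" means the start of the range).
interface RangeSource {
    List<String> fetch(String startKey, int count);
}

class RangePager {
    // Collects every key in the range by paging through it.
    static List<String> allKeys(RangeSource source, int pageSize) {
        // With pageSize 1, each later page could hold only the already-seen
        // start key and the loop would never advance.
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> keys = new ArrayList<>();
        String start = "";
        boolean firstPage = true;
        while (true) {
            List<String> page = source.fetch(start, pageSize);
            // Later pages normally begin with the previous page's last key; skip it.
            int from = (!firstPage && !page.isEmpty() && page.get(0).equals(start)) ? 1 : 0;
            for (int i = from; i < page.size(); i++) {
                keys.add(page.get(i));
            }
            if (page.size() < pageSize) {
                break; // a short page means we reached the end of the range
            }
            start = page.get(page.size() - 1);
            firstPage = false;
        }
        return keys;
    }
}
```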
I get back ALL of the keys I ever wrote to the server, and most of them have no
columns associated with them. In fact, if I query one of those keys with
SlicePredicate l_slicePredicate = new SlicePredicate();
//an empty start and finish asks for all columns in the row
SliceRange l_sliceRange = new SliceRange();
l_sliceRange.setStart(new byte[] {});
l_sliceRange.setFinish(new byte[] {});
l_slicePredicate.setSlice_range(l_sliceRange);
List<ColumnOrSuperColumn> l_result =
    p_context.getClient().get_slice("Discussions", <KEY FROM GET_RANGE_SLICES>,
        new ColumnParent("Posts"), l_slicePredicate, ConsistencyLevel.ONE);
it returns an empty list (the same as when I give it a key it has never seen).
It is OK with me if get_range_slices returns keys with no columns (although it
makes it a little harder to explain to others; is there garbage collection
that will clean these out in the future?). However, I am stuck on how to simply
truncate the table without looping through all the keys, looking for the ones
that still have columns, and then deleting each of those key->value entries.
It is also possible that I am not deleting correctly. For that I simply do:
//a ColumnPath with only the ColumnFamily set removes the whole row
p_context.getClient().remove("Discussions", p_postUUID.toString(),
    new ColumnPath("Posts"), l_rightNow, ConsistencyLevel.ALL);
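One thing worth checking here is the unit of l_rightNow. A delete only shadows columns whose timestamp is less than or equal to the delete's timestamp, so if the delete timestamp uses a smaller unit than the insert timestamps (for example milliseconds for the remove but microseconds for the inserts), the remove can succeed without hiding anything. The following is a toy model of that rule, not Cassandra code; TombstoneModel and nowMicros() are hypothetical, and microseconds is just one convention, to be used for inserts and deletes alike:

```java
// Toy model (not Cassandra code) of how a delete's timestamp is compared
// with the timestamps of existing columns.
class TombstoneModel {
    // A column survives a delete only if it was written with a later timestamp;
    // on a tie the delete wins.
    static boolean survives(long columnTimestamp, long deleteTimestamp) {
        return columnTimestamp > deleteTimestamp;
    }

    // One possible convention (an assumption): microseconds since the epoch,
    // used for BOTH inserts and removes.
    static long nowMicros() {
        return System.currentTimeMillis() * 1000L;
    }
}
```

For example, a column written at 1267000000000000 (microseconds) survives a delete stamped 1267000001000 (milliseconds, one second later in wall-clock terms), because the millisecond value is numerically far smaller.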
I am just trying to understand what I am getting back and compare it against
what I expected. I am also still trying to write a simple "clean" command.
If you read this far, thanks! If you can add some clarity, it would help me.
I have tried to find the answer in the archives and blog posts, but I didn't
see anything.
Thanks,
Kevin