Happy New Year all!

I'm working on a solution for the following scenario: I have tuples coming to a 
Cassandra bolt. The tuples are of this form: TupleData(String name, Int count, 
Long time). The time field is unique only within a batch, not overall, because 
some tuples may arrive late with the same name and time but a different count.

For example:
I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111).
Then the bolt may receive (x1,5,1111).
After these are put in Cassandra, column family x1 should have value 8 for time 
1111 and column family x2 should have value 4 for time 1111.

Caching aside, the Cassandra bolt needs to check whether there is already a 
count in the db for the tuple with the given name and time. If one exists, it 
should retrieve it, increment it by the newly received value, and update the db 
entry with the new value. (At this point I'm not sure whether update or 
delete+reinsert is speedier.)
If no db entry exists, it should add the new tuple.
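
To make the intended behaviour concrete, here is a minimal sketch of the 
check-then-update flow described above. It assumes an in-memory map as a 
stand-in for Cassandra; CountStore, lookup and upsert are hypothetical names 
for illustration and are not part of storm-cassandra or any Cassandra client. 
It also ignores concurrency, which a real bolt would have to handle.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Cassandra store: an in-memory map
// keyed by (name, time). Not thread-safe; illustration only.
class CountStore {
    private final Map<String, Integer> rows = new HashMap<>();

    // Builds a composite key from the tuple's name and time.
    private static String key(String name, long time) {
        return name + "@" + time;
    }

    // Returns the stored count, or null if no entry exists yet.
    Integer lookup(String name, long time) {
        return rows.get(key(name, time));
    }

    // Looks up the existing count, adds the new count to it if present,
    // otherwise stores the new count as-is. Returns the value written.
    int upsert(String name, int count, long time) {
        Integer existing = lookup(name, time);
        int updated = (existing == null) ? count : existing + count;
        rows.put(key(name, time), updated);
        return updated;
    }
}
```

Feeding the three example tuples through upsert in order leaves 8 stored for 
(x1, 1111) and 4 for (x2, 1111), which is the result I'm after.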

I've looked at the Cassandra bolt code at 
https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt
which is the same as the Cassandra bolt from storm-contrib.

There is a class CassandraCounterBatchingBolt, but after looking at it I don't 
believe it does a lookup in the db before saving the value, which leads me to 
believe that it will not work for this case.

What I'm looking for seems pretty basic, and I wonder if there is a Cassandra 
bolt that does a db lookup before updating the db. Does such an open-source 
bolt exist?
Otherwise I'm thinking of building mine on top of CassandraBatchingBolt.

-Adrian
