Hi Brian
Thanks for the reply. I'm currently using raw Storm and doing OK without Trident.

For me the use case is a bit simpler because I use fields grouping (fieldsGrouping), so all tuples with the same "ID" go to the same Cassandra bolt task. That means no other bolt task will update the values for the tuples read by the current one, which eliminates some of the headaches with race conditions.
(http://aphyr.com/posts/294-call-me-maybe-cassandra/)
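To make the fields grouping point concrete: in raw Storm the wiring would look something like `builder.setBolt("cassandra", new MyCassandraBolt(), 4).fieldsGrouping("spout", new Fields("name"))`, where `MyCassandraBolt` and the component ids are made up. The sketch below is a simplified model, not Storm's routing code, of the property that matters here: the target task is a deterministic function of the grouping field, so the same ID always reaches the same task. The hash-modulo scheme is an assumption for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the guarantee fields grouping provides: the receiving
// task is a pure function of the grouping field values, so every tuple with
// the same ID lands on the same bolt task. NOT Storm's actual routing code;
// the hash-modulo choice here is an illustrative assumption.
final class FieldsGroupingModel {
    private FieldsGroupingModel() {}

    /** Picks a task index in [0, numTasks) from the grouping field values. */
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Two tuples whose grouping field is "x1" are routed to the same task,
        // regardless of what their other fields (count, time) contain.
        int first = chooseTask(Arrays.asList("x1"), numTasks);
        int second = chooseTask(Arrays.asList("x1"), numTasks);
        System.out.println("x1 -> task " + first + ", again -> task " + second);
    }
}
```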

I am also thinking of using a time-expiring cache (from Guava) as a write-through cache in front of the DB. That should make things even simpler and give Cassandra time to propagate the writes to the replicas before the next read of that same value happens.
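A rough sketch of that write-through idea, using only the standard library so it runs on its own. It approximates what Guava's `CacheBuilder.newBuilder().expireAfterWrite(...)` offers but is not Guava's API; the class name is hypothetical and the backing `Map` stands in for the real Cassandra reads and writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.LongSupplier;

// Minimal, single-threaded sketch of a write-through cache whose entries
// expire a fixed time after they were written. Hypothetical class; it only
// approximates Guava's expireAfterWrite behaviour. The backing Map is a
// placeholder for the database.
final class WriteThroughTtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long writtenAt;
        Entry(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final Map<K, V> store;                   // stand-in for the database
    private final Map<K, Entry<V>> recent = new HashMap<>();
    private final long ttl;                          // same unit as the clock
    private final LongSupplier clock;                // e.g. System::nanoTime

    WriteThroughTtlCache(Map<K, V> store, long ttl, LongSupplier clock) {
        this.store = Objects.requireNonNull(store);
        this.ttl = ttl;
        this.clock = Objects.requireNonNull(clock);
    }

    /** Writes to the store first, then remembers the value with its write time. */
    void put(K key, V value) {
        store.put(key, value);
        recent.put(key, new Entry<>(value, clock.getAsLong()));
    }

    /** Returns the cached value while it is fresh, otherwise reads the store. */
    V get(K key) {
        Entry<V> e = recent.get(key);
        if (e != null && clock.getAsLong() - e.writtenAt < ttl) {
            return e.value;
        }
        recent.remove(key);                          // drop the expired entry, if any
        return store.get(key);
    }
}
```

With `ttl` set somewhat longer than the expected replication lag, re-reads of a key that was just written come from the bolt's own memory rather than from a replica that may not have the write yet. This only works because fields grouping keeps each key on a single task.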

Well, these are just thoughts at this point. I still have to look into it more.

Thanks!
Adrian

From: Brian O'Neill [mailto:boneil...@gmail.com] On Behalf Of Brian O'Neill
Sent: January-03-14 4:10 PM
To: user@storm.incubator.apache.org
Subject: Re: Cassandra bolt


Adrian,

See the email I just sent out to Laurent.

We have the exact same use case, and we are evaluating the use of lightweight 
transactions (available in C* 2.0) to accomplish what you described without 
falling into all the traps involved in a read-before-write counter update.
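Roughly, a lightweight transaction turns the read-modify-write into a compare-and-set: read the current count, then write the new total only if the stored value is still what was read (`UPDATE ... SET count = ? WHERE ... IF count = ?`), or insert only if the row is absent (`INSERT ... IF NOT EXISTS`), retrying when the condition fails. The sketch below models that loop with `ConcurrentMap.putIfAbsent` and `ConcurrentMap.replace` standing in for the CQL statements; it shows the retry logic only and does not talk to Cassandra. The class name is hypothetical.

```java
import java.util.concurrent.ConcurrentMap;

// Sketch of the compare-and-set loop that lightweight transactions enable,
// modeled on an in-memory ConcurrentMap (hypothetical class name):
//   putIfAbsent(k, v)     ~ INSERT ... IF NOT EXISTS
//   replace(k, old, new)  ~ UPDATE ... SET count = new WHERE ... IF count = old
// The CQL mapping is illustrative; a real bolt would send those statements
// through a driver and check the [applied] column of the result.
final class CasIncrement {
    private CasIncrement() {}

    /** Adds delta to the count for key, retrying until the conditional write applies. */
    static long increment(ConcurrentMap<String, Long> table, String key, long delta) {
        while (true) {
            Long current = table.get(key);                // read
            if (current == null) {
                // No row yet: conditional insert, succeeds only if still absent.
                if (table.putIfAbsent(key, delta) == null) {
                    return delta;
                }
            } else {
                long updated = current + delta;
                // Conditional update: fails if another writer changed the row.
                if (table.replace(key, current, updated)) {
                    return updated;
                }
            }
            // Lost a race: loop, re-read, and try again.
        }
    }
}
```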

I think a CassandraState implementation built on top of CQL may suffice.
I can probably get something published out to github by Monday or Tuesday.

Are you in a position to use Trident?
Or are you using raw Storm?

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive * King of Prussia, PA * 19406
M: 215.588.6024 * @boneill42 (http://www.twitter.com/boneill42) *
healthmarketscience.com


From: Adrian Mocanu <amoc...@verticalscope.com>
Reply-To: <user@storm.incubator.apache.org>
Date: Friday, January 3, 2014 at 4:00 PM
To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
Subject: Cassandra bolt

Happy New Year all!

I'm working on a solution for the following scenario: I have tuples coming into a Cassandra bolt. The tuples have the form TupleData(String name, Int count, Long time). The time field is unique within a batch but not overall, because some tuples may arrive late with the same name and time but a different count.

For example:
I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111)
Then the bolt may receive (x1,5,1111).
After these are stored in Cassandra, column family x1 should have the value 8 for time 1111 and column family x2 should have the value 4 for time 1111.
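The expected result can be checked with a small worked example. `TupleData` mirrors the tuple form given above (with `Int`/`Long` as Java `int`/`long`); the nested map (name, then time, then count) is an assumed stand-in for the per-name column families.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Worked version of the example: the late tuple (x1,5,1111) must be added to
// the existing (x1,3,1111) rather than overwrite it. The nested map
// (name -> time -> count) is only a stand-in for per-name column families.
final class TupleSumExample {
    static final class TupleData {
        final String name;
        final int count;
        final long time;
        TupleData(String name, int count, long time) {
            this.name = name;
            this.count = count;
            this.time = time;
        }
    }

    /** Sums counts per (name, time); a repeated key adds instead of replacing. */
    static Map<String, Map<Long, Integer>> sum(List<TupleData> tuples) {
        Map<String, Map<Long, Integer>> totals = new TreeMap<>();
        for (TupleData t : tuples) {
            totals.computeIfAbsent(t.name, n -> new TreeMap<>())
                  .merge(t.time, t.count, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<TupleData> arrivals = new ArrayList<>();
        arrivals.add(new TupleData("x1", 3, 1111L));
        arrivals.add(new TupleData("x2", 4, 1111L));
        arrivals.add(new TupleData("x1", 5, 1111L));   // arrives late
        System.out.println(sum(arrivals));             // {x1={1111=8}, x2={1111=4}}
    }
}
```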

Caching aside, the Cassandra bolt needs to check whether a count already exists in the DB for a tuple with the given name and time. If it does, the bolt retrieves it, increments it by the newly received value, and updates the DB entry with the new value. (At this point I'm not sure whether an update or a delete+reinsert is faster.)
If no DB entry exists, the bolt adds the new tuple.
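The lookup-then-increment step could be sketched as follows. `CountStore`, `InMemoryCountStore`, and `ReadThenIncrement` are hypothetical names, and the in-memory store stands in for Cassandra. On the update versus delete+reinsert question: CQL writes are upserts (an `INSERT` or `UPDATE` on an existing primary key overwrites the value), so a single write covers both the insert and the update branch. The sketch is only safe when one task owns each (name, time) key, as with fields grouping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Read-before-write step (hypothetical names; a real bolt would back
// CountStore with Cassandra queries). Only safe if a single task owns each
// (name, time) key, e.g. via fields grouping.
interface CountStore {
    Integer read(String name, long time);          // null when no entry exists
    void write(String name, long time, int count); // insert or overwrite
}

final class InMemoryCountStore implements CountStore {
    private final Map<String, Integer> rows = new HashMap<>();

    private static String key(String name, long time) {
        return name + "@" + time;
    }

    @Override public Integer read(String name, long time) {
        return rows.get(key(name, time));
    }

    @Override public void write(String name, long time, int count) {
        rows.put(key(name, time), count);
    }
}

final class ReadThenIncrement {
    private ReadThenIncrement() {}

    /** Looks up the existing count, adds the new one, and writes the total back. */
    static int apply(CountStore store, String name, long time, int count) {
        Objects.requireNonNull(store);
        Integer existing = store.read(name, time);
        int total = (existing == null) ? count : existing + count;
        store.write(name, time, total);  // one upsert covers insert and update
        return total;
    }
}
```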

I've looked at the Cassandra bolt code at
https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt
which is the same as the Cassandra bolt from storm-contrib.

There is a class CassandraCounterBatchingBolt, but after looking at it I don't believe it looks up the existing value in the DB before saving the new one, which leads me to believe it will not work for this.

What I'm looking for seems pretty basic, and I wonder whether there is a Cassandra bolt that does a DB lookup before updating the DB. Does an open-source bolt like that exist?
Otherwise I'm thinking of building my own on top of CassandraBatchingBolt.
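If the bolt is built on a batching base class, one option is to sum tuples per (name, time) within a batch before touching the database, so each key costs one lookup per batch instead of one per tuple. The class below is a hypothetical stdlib sketch, not the CassandraBatchingBolt API; the `sink` callback is where the lookup-and-increment against Cassandra would go.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical batching layer (not the CassandraBatchingBolt API): counts are
// summed per "name@time" in memory and handed to a sink once batchSize tuples
// have been added, so each key needs one read-before-write per batch.
final class AggregatingBatcher {
    private final int batchSize;
    private final Consumer<Map<String, Integer>> sink;
    private Map<String, Integer> pending = new LinkedHashMap<>();
    private int added = 0;

    AggregatingBatcher(int batchSize, Consumer<Map<String, Integer>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one tuple, adding its count to any pending count for the same key. */
    void add(String name, long time, int count) {
        pending.merge(name + "@" + time, count, Integer::sum);
        if (++added >= batchSize) {
            flush();
        }
    }

    /** Hands the summed batch to the sink and starts a new, empty batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        Map<String, Integer> batch = pending;
        pending = new LinkedHashMap<>();
        added = 0;
        sink.accept(batch);
    }
}
```

Acking the buffered input tuples only after the flush succeeds keeps Storm's replay guarantee, but a replayed batch would be added again, since the increment is not idempotent.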

-Adrian
