Astyanax is performing the increment using counter columns. In storm-cassandra, the code for incrementing the column value is here:
AstyanaxClient.java:422

    mutation.withRow(columnFamily, rowKey)
            .incrementCounterColumn(columnName, incrementAmount);

This uses the counter column mechanism exposed by Astyanax. For more information, see:
https://github.com/Netflix/astyanax/wiki/Working-with-counter-columns

This should work, apart from the caveats already mentioned. Cassandra is addressing those under:
https://issues.apache.org/jira/browse/CASSANDRA-4775

-brian

---
Brian O'Neill
Chief Architect
Health Market Science
@boneill42 <http://www.twitter.com/boneill42>
healthmarketscience.com

From: Adrian Mocanu <amoc...@verticalscope.com>
Reply-To: <user@storm.incubator.apache.org>
Date: Monday, January 6, 2014 at 10:21 AM
To: "user@storm.incubator.apache.org" <user@storm.incubator.apache.org>
Subject: RE: Cassandra bolt

Hi,

I am actually looking into using CassandraCounterBatchingBolt, but at the moment I'm not sure how Cassandra handles these eventual consistency issues, so I need to research that.
The reason I mention this issue is that I cannot find anywhere in the code where a read happens before a write, which bothers me. Maybe Cassandra does it with counter columns? IDK.

The issue I'm talking about is updating the same counter consecutively, but faster than the updates propagate to the other Cassandra nodes.
Example:
Say I have 3 Cassandra nodes. The counters on each of these nodes are 0.
Node1:0, node2:0, node3:0

An increment comes in: 5
5 -> Node1:0, node2:0, node3:0

The increment starts at node2; 5 still needs to propagate to node1 and node3.
Node1:0, node2:5, node3:0

In the meantime, another increment arrives before the previous increment is propagated:
3 -> Node1:0, node2:5, node3:0

Assuming 3 starts at a different node than where 5 started, we have:
Node1:3, node2:5, node3:0

Now if 3 gets propagated to the other nodes AS AN INCREMENT and not as a new value (and the same for 5), then eventually they would all equal 8, and this is what I want.
If 3 overwrites 5 (because it has a later timestamp), this is problematic and not what I want.

Will see what the Cassandra group says... or if the creators of CassandraCounterBatchingBolt are on this group, please let me know :)

Thanks
Adrian

From: Vladi Feigin [mailto:vladi...@gmail.com]
Sent: January-04-14 2:00 AM
To: user@storm.incubator.apache.org
Subject: Re: Cassandra bolt

Hi Adrian,

Why don't you use C* counters? It looks like your scenario is a fit for them.
I think CassandraCounterBatchingBolt provides what you need.

Vladi

On Fri, Jan 3, 2014 at 11:00 PM, Adrian Mocanu <amoc...@verticalscope.com> wrote:
>
> Happy New Year all!
>
> I'm working on a solution for the following scenario: I have tuples coming to
> a cassandra bolt. The tuples are of this form:
> TupleData(String name, Int count, Long time)
> The time field is unique per batch only, not overall, because some tuples may
> come in late with the same name and time but a different count.
>
> For example:
> I can receive these tuples for the same time: (x1,3,1111), (x2,4,1111)
> Then the bolt may receive (x1,5,1111)
> After these are put in cassandra, column family x1 should have value 8 for
> time 1111 and column family x2 should have value 4 for time 1111
>
> Caching aside, the cassandra bolt needs to check if there is a count already
> in the db for the tuple with the given name and time.
> If it does exist, then retrieve it, increment it with the newly received
> value, and update the db entry with the new value. (At this point I'm not
> sure if update or delete+reinsert is speedier.)
> If no db entry exists, then add the new tuple.
>
> I've looked at the cassandra bolts code from
> https://github.com/hmsonline/storm-cassandra/tree/master/src/main/java/com/hmsonline/storm/cassandra/bolt
> which is the same as the cassandra bolt from storm-contrib.
>
> There is a class CassandraCounterBatchingBolt, but after looking at it I don't
> believe it does the lookup in the db before saving the value, which
> leads me to believe that this will not work.
>
> What I'm looking for seems pretty basic and I wonder if there is a cassandra
> bolt that does a db lookup before updating the db. Does such a bolt exist
> open-sourced? Otherwise I'm thinking of building mine on top of
> CassandraBatchingBolt.
>
> -Adrian
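
For reference, the two outcomes in the three-node example from this thread (increments 5 and 3 propagated as deltas, versus the later write overwriting the earlier one) can be sketched as a toy model in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, replication is simulated with arrays, and it does not use or describe the actual Cassandra or Astyanax implementation.

```java
import java.util.Arrays;

// Toy model only: hypothetical names, no Cassandra or Astyanax code involved.
public class CounterPropagationSketch {

    // Every replica eventually applies every increment as a delta.
    // Addition is commutative, so arrival order does not change the result.
    static long[] propagateAsIncrements(int replicas, long[] increments) {
        long[] counters = new long[replicas];
        for (long delta : increments) {
            for (int r = 0; r < replicas; r++) {
                counters[r] += delta;
            }
        }
        return counters;
    }

    // Every replica keeps only the value carrying the latest timestamp
    // (last write wins), so earlier values are lost.
    static long[] propagateAsOverwrites(int replicas, long[] values, long[] timestamps) {
        long[] counters = new long[replicas];
        long[] latestSeen = new long[replicas];
        Arrays.fill(latestSeen, Long.MIN_VALUE);
        for (int i = 0; i < values.length; i++) {
            for (int r = 0; r < replicas; r++) {
                if (timestamps[i] > latestSeen[r]) {
                    counters[r] = values[i];
                    latestSeen[r] = timestamps[i];
                }
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        long[] increments = {5, 3};   // the two updates from the example
        long[] timestamps = {1, 2};   // 3 arrives after 5

        // Desired behaviour: all three nodes converge to 8.
        System.out.println(Arrays.toString(propagateAsIncrements(3, increments)));

        // Problematic behaviour: all three nodes end up at 3.
        System.out.println(Arrays.toString(propagateAsOverwrites(3, increments, timestamps)));
    }
}
```

The first call models what the thread hopes counter columns do; the second models the overwrite case that would make a read before each write necessary.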