If you are deleting the messages after processing, it sounds like you
are using Cassandra as a work queue.

Here are some links for implementing a distributed queue in Cassandra:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html
http://comments.gmane.org/gmane.comp.db.cassandra.user/16633

There is a placeholder on the use cases wiki for this, but no info:
http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

We were looking to do the same thing, but in the end decided to go with Kafka.
Given your throughput requirements, Kafka might be a good option for
you as well.
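
To make the Kafka suggestion a bit more concrete: the property that helps here is that messages are split into partitions, and within a group of consumers each partition is read by only one process at a time, so workers never have to race on a "being processed" flag. Below is a rough toy model of that idea in plain Python. It is not Kafka's API or its actual assignment algorithm; the partition count, the CRC32 hashing, the round-robin assignment, and every function and worker name are assumptions made up for illustration.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 8  # assumed value; chosen up front, independent of worker count


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a message key to a partition with a stable hash (hypothetical scheme)."""
    return crc32(key.encode("utf-8")) % num_partitions


def assign(workers, num_partitions=NUM_PARTITIONS):
    """Give every partition exactly one owner, round-robin over the sorted workers."""
    ordered = sorted(workers)
    return {p: ordered[p % len(ordered)] for p in range(num_partitions)}


def route(messages, workers):
    """Group (key, payload) pairs by the worker that owns each key's partition."""
    owners = assign(workers)
    batches = defaultdict(list)
    for key, payload in messages:
        batches[owners[partition_for(key)]].append((key, payload))
    return dict(batches)


messages = [("msg-%d" % i, "payload-%d" % i) for i in range(1000)]

# Two workers, then a third added "dynamically": only the assignment changes,
# nobody hands out message ranges by hand.
two = route(messages, ["worker-a", "worker-b"])
three = route(messages, ["worker-a", "worker-b", "worker-c"])
```

Because each partition has a single owner, every message lands in exactly one worker's batch, and adding a worker only means recomputing the assignment. In a real deployment that rebalancing is the messaging system's job rather than something you would write yourself.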

-brian


On Fri, Aug 3, 2012 at 2:18 PM, Philip Nelson
<philipomailbox-c...@yahoo.com> wrote:
> Hello,
>
> I am using a Column Family in Cassandra to store incoming messages, which 
> arrive at a high rate (100s of thousands per second). I then have a process 
> wake up periodically to work on those messages, and then delete them. I'd 
> like to understand how I could have multiple processes running, each pulling 
> off a bunch of messages in parallel. It would be nice to be able to add 
> processes dynamically, and not have to explicitly assign message ranges to 
> various processes.
>
> Any suggestions on how to ensure that each process pulls off a different 
> bunch of messages? Any recommended design patterns? I was going to look at 
> qsandra too, for inspiration. Would this be worthwhile?
>
> If this were a relational database, I would have the processes lock the table 
> (or perhaps a row), set flags on a row indicating that it's being 
> "processed", and then unlock. Processes would choose messages by SELECTing on 
> unflagged messages. I'm not sure how this might map to Cassandra. I realise 
> it may not. Even if I configure the cluster such that setting a flag on a row 
> requires all nodes to be written, two processes could still race setting that 
> flag, right?
>
> I am open to the idea that it might help to store the messages in wide rows, 
> if that helps.
>
> Thanks,
>
> Philip



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
