If you are deleting the messages after processing, it sounds like you are using Cassandra as a work queue.
Here are some links for implementing a distributed queue in Cassandra:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Distributed-work-queues-td5226248.html
http://comments.gmane.org/gmane.comp.db.cassandra.user/16633

There is a placeholder on the use cases wiki for this, but no info:
http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

We were looking to do the same thing, but in the end decided to go with Kafka. Given your throughput requirements, Kafka might be a good option for you as well.

-brian

On Fri, Aug 3, 2012 at 2:18 PM, Philip Nelson <philipomailbox-c...@yahoo.com> wrote:
> Hello,
>
> I am using a Column Family in Cassandra to store incoming messages, which
> arrive at a high rate (100s of thousands per second). I then have a process
> wake up periodically to work on those messages, and then delete them. I'd
> like to understand how I could have multiple processes running, each pulling
> off a bunch of messages in parallel. It would be nice to be able to add
> processes dynamically, and not have to explicitly assign message ranges to
> various processes.
>
> Any suggestions on how to ensure that each process pulls off a different
> bunch of messages? Any recommended design patterns? I was going to look at
> qsandra too, for inspiration. Would this be worthwhile?
>
> If this was a relational database, I would have the processes lock the table
> (or perhaps a row), set flags on a row indicating that it's being
> "processed", and then unlock. Processes would choose messages by SELECTing on
> unflagged messages. I'm not sure how this might map to Cassandra. I realise
> it may not. Even if I configure the cluster such that setting a flag on a row
> requires all nodes to be written, two processes could still race setting that
> flag, right?
>
> I am open to the idea that it might help to store the messages in wide rows,
> if that helps.
>
> Thanks,
>
> Philip

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/