How to process new rows in parallel?

Philip Nelson Fri, 03 Aug 2012 11:18:40 -0700

Hello,

I am using a Column Family in Cassandra to store incoming messages, which 
arrive at a high rate (hundreds of thousands per second). I then have a process 
wake up periodically to work on those messages, and then delete them. I'd like 
to understand how I could have multiple processes running, each pulling off a 
bunch of messages in parallel. It would be nice to be able to add processes 
dynamically, and not have to explicitly assign message ranges to various 
processes.


Any suggestions on how to ensure that each process pulls off a different bunch 
of messages? Any recommended design patterns? I was going to look at qsandra 
too, for inspiration. Would this be worthwhile?

If this were a relational database, I would have the processes lock the table 
(or perhaps a row), set flags on a row indicating that it's being "processed", 
and then unlock. Processes would choose messages by SELECTing on unflagged 
messages. I'm not sure how this might map to Cassandra. I realise it may not. 
Even if I configure the cluster such that setting a flag on a row requires all 
nodes to be written, two processes could still race setting that flag, right?

I am open to storing the messages in wide rows, if that helps.

Thanks,

Philip
