Hello, I am using a Column Family in Cassandra to store incoming messages, which arrive at a high rate (hundreds of thousands per second). A process wakes up periodically, works on those messages, and then deletes them. I'd like to understand how I could have multiple processes running, each pulling off a batch of messages in parallel. It would be nice to be able to add processes dynamically without having to explicitly assign message ranges to each process.
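One way to frame the "add processes without assigning ranges" requirement is a minimal sketch, assuming a design the question does not yet have: each message is written into one of a fixed number of bucket rows, and each worker works out which buckets it owns from the current list of live workers using rendezvous (highest-random-weight) hashing. `NUM_BUCKETS`, `bucket_for`, `owner` and `buckets_owned` are hypothetical names, and how workers learn the live-worker list is left as an assumption; nothing here is a Cassandra API.

```python
import hashlib
from typing import List

NUM_BUCKETS = 64  # hypothetical: number of bucket rows messages are spread over


def bucket_for(message_id: str) -> int:
    """Map a message to a bucket, e.g. stored under a row key like 'queue:<bucket>'."""
    digest = hashlib.sha1(message_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def weight(worker: str, bucket: int) -> int:
    """Deterministic pseudo-random score for a (worker, bucket) pair."""
    digest = hashlib.sha1(f"{worker}:{bucket}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(bucket: int, workers: List[str]) -> str:
    """Rendezvous hashing: the worker with the highest weight owns the bucket."""
    return max(workers, key=lambda w: weight(w, bucket))


def buckets_owned(worker: str, workers: List[str]) -> List[int]:
    """Buckets this worker should read, given the same worker list as everyone else."""
    return [b for b in range(NUM_BUCKETS) if owner(b, workers) == worker]


if __name__ == "__main__":
    workers = ["worker-1", "worker-2", "worker-3"]
    before = {b: owner(b, workers) for b in range(NUM_BUCKETS)}
    workers.append("worker-4")  # a new process joins
    after = {b: owner(b, workers) for b in range(NUM_BUCKETS)}
    moved = [b for b in range(NUM_BUCKETS) if before[b] != after[b]]
    # Every bucket that changed hands went to the newcomer.
    print(len(moved), all(after[b] == "worker-4" for b in moved))
```

The attraction of this scheme is that adding a worker only moves the buckets the new worker wins, and no central range table is needed. The catch is that workers must agree on the member list: while membership is changing, two workers can briefly believe they own the same bucket, so processing would still need to tolerate the occasional duplicate.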
Any suggestions on how to ensure that each process pulls off a different batch of messages? Any recommended design patterns? I was also going to look at qsandra for inspiration; would that be worthwhile?

If this were a relational database, I would have each process lock the table (or perhaps a row), set a flag on the rows it takes to mark them as being "processed", and then unlock. Processes would choose messages by SELECTing unflagged rows. I'm not sure how this maps to Cassandra, and I realise it may not. Even if I configure the cluster so that setting a flag on a row requires all nodes to be written, two processes could still race to set that flag, right?

I am also open to storing the messages in wide rows, if that would help.

Thanks,
Philip
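The race asked about above can be illustrated with a toy, in-memory model; this is a sketch under stated assumptions, not Cassandra code. `LWWStore` is a hypothetical stand-in for a store where each write carries a timestamp and the highest timestamp wins, and writes never check the existing value. `racy_claim` replays one bad interleaving of "read the flag, then write it", and `LockedQueue` mirrors the relational lock/select/flag/unlock pattern with an in-process mutex standing in for the table lock.

```python
import threading
from typing import Dict, List, Optional, Tuple


class LWWStore:
    """Toy last-write-wins store: the highest timestamp wins; writes never
    look at the current value, so they cannot fail with 'already claimed'."""

    def __init__(self) -> None:
        self._cells: Dict[str, Tuple[int, str]] = {}

    def read(self, key: str) -> Optional[str]:
        cell = self._cells.get(key)
        return None if cell is None else cell[1]

    def write(self, key: str, value: str, ts: int) -> None:
        current = self._cells.get(key)
        if current is None or ts > current[0]:
            self._cells[key] = (ts, value)


def racy_claim(store: LWWStore, key: str) -> List[str]:
    """Replays one interleaving of read-then-write flagging by two workers."""
    # Both workers read the flag before either has written it.
    a_sees = store.read(key)
    b_sees = store.read(key)
    winners = []
    # Each saw "unflagged", so each writes its own claim and carries on.
    if a_sees is None:
        store.write(key, "worker-a", ts=1)
        winners.append("worker-a")
    if b_sees is None:
        store.write(key, "worker-b", ts=2)
        winners.append("worker-b")
    return winners


class LockedQueue:
    """Relational-style claiming: lock, select unflagged, flag, unlock."""

    def __init__(self, message_ids: List[str]) -> None:
        self._lock = threading.Lock()
        self._owner: Dict[str, Optional[str]] = {m: None for m in message_ids}

    def claim_batch(self, worker: str, n: int) -> List[str]:
        with self._lock:  # "lock the table"
            free = [m for m, o in self._owner.items() if o is None][:n]
            for m in free:  # set the "processing" flag
                self._owner[m] = worker
        return free  # lock released on leaving the with-block


def drain(queue: LockedQueue, worker: str, out: Dict[str, List[str]]) -> None:
    """Keep claiming small batches until nothing unflagged is left."""
    got: List[str] = []
    while True:
        batch = queue.claim_batch(worker, n=5)
        if not batch:
            break
        got.extend(batch)
    out[worker] = got


if __name__ == "__main__":
    store = LWWStore()
    # Both workers believe they own msg-1; the store only remembers worker-b.
    print(racy_claim(store, "msg-1"), store.read("msg-1"))

    queue = LockedQueue([f"msg-{i}" for i in range(100)])
    results: Dict[str, List[str]] = {}
    threads = [threading.Thread(target=drain, args=(queue, f"w{i}", results))
               for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    claimed = [m for batch in results.values() for m in batch]
    print(len(claimed), len(set(claimed)))  # equal counts: no double claims
```

The point of the model is that requiring every replica to acknowledge a write makes the flag visible everywhere, but it does not make the check and the write a single atomic step; only something playing the role of the lock closes that window.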