Hi,

I have a Cassandra cluster of 3 nodes holding around 300 million rows of items. The replication factor is 3, and read/write consistency is QUORUM. The row key for an item is a 15-digit random number.

I want to scan all rows in the database and generate the sum of items that have the value "available" in the column state and the value "batch1" in the column batch.

I want to do this processing in multiple threads: for instance, one thread generates the sum for one portion of the data, another thread generates the sum for a different, disjoint portion, and later I add up the totals from these threads to get the final sum.

What are the possible ways to achieve this? Can I use the concept of virtual nodes here? Each node owns a set of virtual nodes. Can I get the data owned by a particular node, generate the sums on different nodes by iterating over the data from their virtual nodes, and later compute the total by adding up the sums from all virtual nodes?
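
To make the multi-threaded part concrete, here is a rough Python sketch of what I have in mind. It rests on several assumptions: that the cluster uses Murmur3Partitioner (tokens from -2^63 to 2^63-1), which I have not confirmed; that "sum" means a count of matching rows; and that `fetch_rows` is a hypothetical callback of my own, not a driver API, which would run a token-range query against placeholder names `items` and `key`. It only shows how the disjoint portions could be formed and the partial sums combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed token bounds for Murmur3Partitioner (not confirmed for my cluster).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(parts):
    """Split [MIN_TOKEN, MAX_TOKEN] into `parts` disjoint, contiguous
    (start, end) ranges with both ends inclusive."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    ranges = []
    start = MIN_TOKEN
    for i in range(1, parts + 1):
        end = MIN_TOKEN + (total * i) // parts - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def count_range(fetch_rows, token_range):
    """Count rows in one token range with state 'available' and batch 'batch1'.

    `fetch_rows(start, end)` is hypothetical: it should yield (state, batch)
    pairs, e.g. from a query along the lines of
    SELECT state, batch FROM items
    WHERE token(key) >= ? AND token(key) <= ?
    where `items` and `key` are placeholder names.
    """
    start, end = token_range
    return sum(
        1
        for state, batch in fetch_rows(start, end)
        if state == "available" and batch == "batch1"
    )


def parallel_count(fetch_rows, parts):
    """Run one worker thread per token range and add up the partial counts."""
    ranges = split_token_range(parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = pool.map(lambda r: count_range(fetch_rows, r), ranges)
        return sum(partials)
```

Since the ranges are disjoint and together cover the whole token space, each row would be counted by exactly one thread. This splits the ring evenly rather than following the ranges each node's virtual nodes actually own, so it does not by itself keep each thread's reads local to one node.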
Regards, Gaurav