Re: how to scan all rows of cassandra using multiple threads

2015-02-26 Thread mck
> Can I get data owned by a particular node and this way generate sum
> on different nodes by iterating over data from virtual nodes and later
> generate total sum by doing sum of data from all virtual nodes.

You're pretty much describing a map/reduce job using CqlInputFormat.
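For readers who want the idea without Hadoop, here is a minimal sketch of the same split-and-sum pattern in plain Python. It assumes the cluster uses Murmur3Partitioner (token range -2^63 to 2^63-1). The table name, the partition key name, and the `count_range` callable are hypothetical placeholders, not anything from this thread. In a real job `count_range` would run the generated CQL through a driver and count the matching rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds for Murmur3Partitioner (an assumption about the cluster).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Return num_splits inclusive (lo, hi) ranges that cover the whole ring."""
    step = (MAX_TOKEN - MIN_TOKEN + 1) // num_splits
    ranges = []
    lo = MIN_TOKEN
    for i in range(num_splits):
        # The last range absorbs any remainder so the ring is fully covered.
        hi = MAX_TOKEN if i == num_splits - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def range_query(table, partition_key, lo, hi):
    """Build the CQL for one token range (table and key names are hypothetical)."""
    return (
        f"SELECT state FROM {table} "
        f"WHERE token({partition_key}) >= {lo} AND token({partition_key}) <= {hi}"
    )


def parallel_sum(count_range, num_splits=64, workers=8):
    """The "map" step calls count_range(lo, hi) per range; the "reduce" step sums."""
    ranges = split_token_ring(num_splits)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: count_range(*r), ranges)
        return sum(partials)
```

Each worker scans one slice of the token ring and returns a partial count, and the partial counts are added at the end. Using many more splits than threads keeps each query small, which helps with paging and timeouts on a table of this size.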

Re: how to scan all rows of cassandra using multiple threads

2015-02-25 Thread Clint Kelly
Hi Gaurav,

I recommend you just run a MapReduce job for this computation. Alternatively, you can look at the code for the C* MapReduce input format:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java

That should give you what you need to

how to scan all rows of cassandra using multiple threads

2015-02-24 Thread Gaurav Bhatnagar
Hi, I have a Cassandra cluster of 3 nodes holding around 300 million rows of items. I have a replication factor of 3 with read/write consistency set to QUORUM. I want to scan all rows of the database to compute the sum of items having the value "available" in the column named state and the value "batch1" in column na