Look for the thread "Re: Fastest way to map/parallel read all values in a 
table?" on the mailing list; this was discussed recently. You can run several 
parallel processes, each one reading a slice of the data, by splitting the full 
min/max murmur3 token (hash) range into sub-ranges.
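
A rough sketch of that idea in Python, assuming the cluster uses the default 
Murmur3Partitioner (tokens from -2**63 to 2**63 - 1). The keyspace, table and 
partition key names, and the fetch callable, are hypothetical placeholders; in 
practice fetch would run the query with your driver of choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Token bounds of the Murmur3Partitioner (signed 64-bit range).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ranges(num_slices):
    """Split the full token range into contiguous, inclusive (start, end) slices."""
    if num_slices < 1:
        raise ValueError("num_slices must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 possible tokens
    step, remainder = divmod(total, num_slices)
    ranges = []
    start = MIN_TOKEN
    for i in range(num_slices):
        # Spread the remainder over the first slices so sizes differ by at most 1.
        size = step + (1 if i < remainder else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def slice_query(keyspace, table, partition_key, start, end):
    """Build the CQL statement that reads one token slice (names are placeholders)."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) >= {start} "
        f"AND token({partition_key}) <= {end}"
    )


def run_in_parallel(queries, fetch, workers=8):
    """Run fetch(query) for every slice concurrently; results keep slice order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, queries))
```

Each slice is independent, so the same queries could just as well be handed to 
separate processes or machines instead of threads. More slices than workers 
usually helps, since slices rarely hold exactly the same amount of data.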

At the company where I used to work, we developed a system that runs custom 
Python processes on demand to process Cassandra data (among other things) so 
that we could do exactly this. I hope it will be released as open source soon; 
a lot of people seem to run into this same problem.

If you use DataStax Enterprise (the commercial Cassandra distribution), you can 
use Hive, AFAIK. Another good option is to run a Hadoop or Spark job over your 
cluster and do the processing you want there, but I think it can be hard to get 
good results that way, mainly because these tools work fine but are "auto 
magic". It's hard to control where intermediate data is stored, for example.


From: user@cassandra.apache.org 
Subject: Re: How to speed up SELECT * query in Cassandra

Is there a simple way (or even a complicated one) to speed up a SELECT * 
FROM [table] query?
I need to get all rows from one table every day. I split the data into one 
table per day, but the query is still quite slow (200 million records).

I was thinking about running this query in parallel, but I don't know if that 
is possible.
