The fastest way I am aware of is to run the queries in parallel against
multiple Cassandra nodes and make sure that you only ask each node for keys
it is responsible for. Otherwise, the node has to forward your query to a
node that owns the data, which is much slower and creates unnecessary
objects (and thus GC pressure).

You can take advantage of the token range information manually if the
driver does not take it into account for you. Then you can tune the
concurrency and the batch size of a single query against one node.
Basically, what you (or the driver) should do is transform the query into
a series of queries of the form
"SELECT * FROM table WHERE token(pk) > start AND token(pk) <= stop",
where pk is the partition key of the table.

I will need to look up the actual code, but the idea should be clear :)
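
In the meantime, here is a rough sketch of the splitting step in Python. It
only builds the CQL strings and does not talk to a cluster. It assumes the
Murmur3Partitioner token range (-2^63 to 2^63-1); the table name "events",
the partition key "id" and both helper functions are made up for
illustration. Routing each query to a node that owns the range is not shown.

```python
# Token bounds assumed for Murmur3Partitioner; other partitioners differ.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(num_splits):
    """Split the full token ring into num_splits contiguous (start, end) pairs."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic keeps the bounds exact; floats would lose precision.
    bounds = [MIN_TOKEN + total * i // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def build_queries(table, partition_key, num_splits):
    """Build one CQL string per sub-range (hypothetical helper)."""
    queries = []
    for i, (start, end) in enumerate(split_token_range(num_splits)):
        # The first range includes its lower bound so no token is skipped;
        # every other range is (start, end] so ranges never overlap.
        lower_op = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({partition_key}) {lower_op} {start} "
            f"AND token({partition_key}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    for query in build_queries("events", "id", 4):
        print(query)
```

Each resulting string could then be run concurrently, for example with the
asynchronous execute call of your driver, and the number of splits is the
knob to play with for concurrency.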

Jirka H.


On 02/11/2015 11:26 AM, Ja Sam wrote:
> Is there a simple way (or even a complicated one) to speed up a
> SELECT * FROM [table] query?
> I need to get all rows from one table every day. I split the tables and
> created one for each day, but the query is still quite slow (200 million
> records).
>
> I was thinking about running this query in parallel, but I don't know if
> that is possible.
