Hello Jeremy,

What you are basically doing is asking Cassandra to run a distributed full
scan over all the partitions across the cluster, so it is normal that the
nodes are somewhat... stressed.

How did you make the query? Are you using the Thrift or the CQL3 API?

Please note that there is another way to get all partition keys: SELECT
DISTINCT <partition_key> FROM ... More details here:
www.datastax.com/dev/blog/cassandra-2-0-1-2-0-2-and-a-quick-peek-at-2-0-3

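
If even SELECT DISTINCT turns out to be too heavy as a single query, one option is to walk the token ring in slices so that each request only touches part of the cluster. Below is a minimal sketch that only builds the CQL strings. It assumes the Murmur3Partitioner (tokens from -2^63 to 2^63 - 1) and that your Cassandra version accepts a token() restriction together with DISTINCT, which I have not checked for 2.0.x. The helper names and the keyspace/table/column names are placeholders, not anything from your schema.

```python
# Token bounds for Murmur3Partitioner (assumption: the cluster uses it).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def token_slices(splits):
    """Cut the full token range into `splits` contiguous inclusive (lo, hi) slices."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    starts = [MIN_TOKEN + (span * i) // splits for i in range(splits)]
    slices = []
    for i, lo in enumerate(starts):
        # Each slice ends right before the next one starts; the last ends at MAX_TOKEN.
        hi = starts[i + 1] - 1 if i + 1 < splits else MAX_TOKEN
        slices.append((lo, hi))
    return slices


def distinct_key_queries(keyspace, table, key_column, splits):
    """Build one SELECT DISTINCT statement per token slice (hypothetical helper)."""
    template = (
        "SELECT DISTINCT {col} FROM {ks}.{tbl} "
        "WHERE token({col}) >= {lo} AND token({col}) <= {hi}"
    )
    return [
        template.format(col=key_column, ks=keyspace, tbl=table, lo=lo, hi=hi)
        for lo, hi in token_slices(splits)
    ]


if __name__ == "__main__":
    # Placeholder names; run each statement separately with your driver of choice.
    for query in distinct_key_queries("my_keyspace", "my_table", "id", 8):
        print(query)
```

Running the statements one after another, rather than all at once, keeps the amount of work in flight on each node small. The number of slices is something to tune against your own cluster.
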
I ran an application today that attempted to fetch 20,000+ unique row keys
in one query against a set of completely empty column families. On a 4-node
cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
heap), every single node immediately ran out of memory and became
unresponsive, to the point where I had to kill -9 the cassandra processes.

Now clearly this query is not the best idea in the world, but the effects
of it are a bit disturbing. What could be going on here? Are there any
other query pitfalls I should be aware of that have the potential to
explode the entire cluster?

-j
