Hi AJ,

On Wed, Jun 8, 2011 at 9:29 AM, AJ <a...@dude.podzone.net> wrote:

> Is there a performance hit when dropping a CF?  What if it contains .5 TB of
> data?  If not, is there a quick and painless way to drop a large amount of
> data w/minimal perf hit?

Dropping a CF is quick - it snapshots the files (which creates hard
links) and removes the CF definition.  To actually delete the data,
remove the snapshot files from your data directory.

> Is there a performance hit running multiple keyspaces on a cluster versus
> only one keyspace given a constant total data size?  Is there some quantity
> limit?

There is a tiny amount of memory used per keyspace, but unless you have
very many of them you won't notice any impact from running multiple
keyspaces.

There is, however, a difference between using multiple column families
and putting everything in the same column family separated by e.g. a key
prefix.  For example, if you have a large data set and a small one,
queries against the small one will be quicker if it is in its own column
family.
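To make that concrete, here's a rough sketch using the pycassa Python
client (the keyspace and column family names are invented):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])

    # Option A: everything in one CF, rows distinguished by a key prefix.
    # Reads against the small data set go through the same SSTables and
    # caches as the large one.
    combined = pycassa.ColumnFamily(pool, 'Everything')
    combined.insert('small:user42', {'name': 'aj'})
    combined.insert('big:event123', {'payload': '...'})

    # Option B: the small data set in its own CF.  Its SSTables stay
    # small, so reads touch far less data, and it can be tuned (caching,
    # compaction) independently of the big CF.
    small = pycassa.ColumnFamily(pool, 'SmallData')
    small.insert('user42', {'name': 'aj'})
    print(small.get('user42'))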

> Using a Random Partitioner, but with a RF = 1, will the rows still be
> spread-out evenly on the cluster or will there be an affinity to a single
> node (like the one receiving the data from the client)?

The rows will be spread out the same way - RF=1 doesn't affect the
load balancing.
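The partitioner places each row by the MD5 hash of its key, and RF only
controls how many copies are made, so placement has nothing to do with
which node the client happens to talk to.  Here's a simplified Python
sketch of the idea - the 4-node ring, its tokens and the keys are made
up, and the real token range and replica search differ in the details:

    import hashlib

    RING_SIZE = 2 ** 127

    def token(key):
        # RandomPartitioner derives the row's token from the MD5 of its key.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

    # Hypothetical, evenly balanced 4-node ring.
    nodes = sorted((i * RING_SIZE // 4, "node%d" % (i + 1)) for i in range(4))

    def first_replica(key):
        # The first replica goes to the node with the smallest token >= the
        # row's token, wrapping around the ring - RF never enters into it.
        t = token(key)
        for node_token, name in nodes:
            if node_token >= t:
                return name
        return nodes[0][1]

    for k in ("user1", "user2", "user3", "user4", "user5"):
        print("%s -> %s" % (k, first_replica(k)))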

> I see a lot of mention of using RAID-0, but not RAID-5/6.  Why?  Even though
> Cass can tolerate a down node due to data loss, it would still be more
> efficient to just rebuild a bad hdd live, right?

There's a trade-off - RAID-0 will give better performance, but rebuilds
happen over the network.  With RF > 1, RAID-0 is enough that you're
unlikely to lose data, but, as you say, replacing a failed node will be
slower than rebuilding a bad disk in a RAID-5/6 array.

> Maybe perf related:  Will there be a problem having multiple keyspaces on a
> cluster all with different replication factors, from 1-3?

No.

Richard.

-- 
Richard Low
Acunu | http://www.acunu.com | @acunu
