Hi AJ,

On Wed, Jun 8, 2011 at 9:29 AM, AJ <a...@dude.podzone.net> wrote:
> Is there a performance hit when dropping a CF? What if it contains 0.5 TB
> of data? If not, is there a quick and painless way to drop a large amount
> of data with minimal perf hit?

Dropping a CF is quick - it snapshots the files (which creates hard links)
and removes the CF definition. To actually delete the data, remove the
snapshot files from your data directory.

> Is there a performance hit running multiple keyspaces on a cluster versus
> only one keyspace given a constant total data size? Is there some quantity
> limit?

There is a tiny amount of memory used per keyspace, but unless you have
very many keyspaces you won't notice any impact of running multiple
keyspaces.

There is, however, a difference between running multiple column families
and putting everything in the same column family separated by e.g. a key
prefix. For example, if you have a large data set and a small one, it will
be quicker to query the small one if it is in its own column family.

> Using a Random Partitioner, but with a RF = 1, will the rows still be
> spread out evenly on the cluster, or will there be an affinity to a single
> node (like the one receiving the data from the client)?

The rows will be spread out the same way - RF=1 doesn't affect the load
balancing.

> I see a lot of mention of using RAID-0, but not RAID-5/6. Why? Even though
> Cass can tolerate a down node due to data loss, it would still be more
> efficient to just rebuild a bad hdd live, right?

There's a trade-off: RAID-0 will give better performance, but rebuilds
happen over the network. With RF > 1, RAID-0 is enough that you're
unlikely to lose data, but, as you say, replacing a failed node will be
slower.

> Maybe perf related: Will there be a problem having multiple keyspaces on a
> cluster all with different replication factors, from 1-3?

No.

Richard.

--
Richard Low
Acunu | http://www.acunu.com | @acunu
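P.S. To see why the snapshot step is cheap, here is a minimal sketch of the
hard-link mechanism: the snapshot shares the same bytes on disk, so creating
it copies nothing, and the data survives until the last link is removed. The
filenames below are made up and only stand in for SSTable files, not a real
Cassandra data directory:

```shell
set -e
dir=$(mktemp -d)

# Pretend this is an SSTable belonging to the CF being dropped.
echo "sstable contents" > "$dir/cf-data.db"

# "Snapshot": a hard link - instant, no data copied.
ln "$dir/cf-data.db" "$dir/snapshot-cf-data.db"

# "Drop" the CF file; the bytes are still reachable via the snapshot link.
rm "$dir/cf-data.db"
cat "$dir/snapshot-cf-data.db"

# Deleting the snapshot link is what actually frees the space.
rm -r "$dir"
```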
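P.P.S. The load-balancing point can be illustrated with a toy ring: under
RandomPartitioner a row's position depends only on md5(key), so RF changes
how many nodes hold each row, never which token range the row hashes into.
The 4-node ring, the evenly spaced tokens, and the successor-walk replica
placement below are a deliberately simplified sketch, not Cassandra's actual
replication code:

```python
import hashlib
from bisect import bisect_right

# Four hypothetical nodes with evenly spaced tokens on a 0..2**127 ring.
tokens = sorted((2**127 // 4) * i for i in range(4))

def token(key: bytes) -> int:
    # RandomPartitioner-style: position derived from md5 of the row key.
    return int(hashlib.md5(key).hexdigest(), 16) % (2**127)

def replicas(key: bytes, rf: int) -> list:
    # Primary node from the key's token, then a simple walk round the ring.
    i = bisect_right(tokens, token(key)) % len(tokens)
    return [(i + j) % len(tokens) for j in range(rf)]

counts = [0, 0, 0, 0]
for k in range(10000):
    counts[replicas(str(k).encode(), 1)[0]] += 1

# With RF=1 each row lands on exactly one node, yet the spread across the
# four nodes is still roughly uniform; raising RF leaves the primary
# replica for any given key unchanged.
print(counts)
```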