Hello!

Thank you.

Regarding 1), I hope that smaller files will be easier to monitor. Also, if we have a disk failure, we could, for example, delete just the one affected file and run repair. Actually, one CF per customer would be best (it would be easy to delete or back up a single customer's data, since customers are totally independent), but Cassandra likely doesn't support 15,000 CFs per keyspace.

Regarding 3) - yes, I understand.

One related question: if we have the choice, should we prefer
5 nodes, 16 cores/16 GB/8 TB disk space each
or
10 nodes, 8 cores/8 GB/4 TB disk space each?

When is it worth running multiple Cassandra instances per node? We currently run 6 instances on 3 nodes, and it works much better than 3 instances on the same 3 nodes. Is that the rule or the exception?

On 28.05.2010 07:11, Jonathan Ellis wrote:
2) is correct, but for 1) I'm not sure what manageability improvements
you anticipate from dealing with multiple entities instead of one.
I'm not sure what you're thinking of for 3), but routing is done by key
only.
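The point that routing is done by key only can be illustrated with a rough sketch of RandomPartitioner-style placement. It assumes RandomPartitioner is in use, that a key's token is the absolute value of its MD5 digest read as a signed 128-bit integer, and that each node owns the keys whose tokens fall up to and including its own token. The ring, the node names and the `token_for`/`owner` helpers are hypothetical. The column family name is not an input, so splitting mail bodies across several CFs does not change which node stores a given key.

```python
import hashlib
from bisect import bisect_left


def token_for(key: bytes) -> int:
    # Sketch of a RandomPartitioner-style token: MD5 digest read as a
    # signed big-endian integer, then made non-negative.
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner(key: bytes, ring: dict) -> str:
    # `ring` is a hypothetical mapping {node_token: node_name}.
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the lowest token at the end of the ring.
    tokens = sorted(ring)
    i = bisect_left(tokens, token_for(key))
    return ring[tokens[i % len(tokens)]]


# Hypothetical 3-node ring with evenly spaced tokens.
ring = {0: "node1", 2**127 // 3: "node2", 2 * 2**127 // 3: "node3"}

# Only the key is hashed; whether the row lives in MailBody_1 or
# MailBody_2 (hypothetical CF names) plays no part in placement.
print(owner(b"customer42:msg1", ring))
```

Placing data on particular nodes per shard would therefore have to be done through the keys (and tokens) themselves, not through the choice of CF.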

2010/5/27 Maxim Kramarenko <maxi...@trackstudio.com>:
Hello!

We have a mail archive with one large CF for mail bodies. In our case, it's
easy to shard the data into 5-10 CFs by customer ID. We would like to do this
because:

1) We get more manageable instances, because we have many small CFs instead
of one multi-TB CF on each node.

2) Better disk space usage (we only need to reserve 50% of the size of the
largest shard for compaction).

3) We can manage node load not only by token, but also by defining which
shards are available on each node.
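A minimal sketch of the sharding in 1) and 2), assuming integer customer IDs, a hypothetical `MailBody_<n>` naming scheme, a hypothetical shard count of 8 (within the 5-10 range above) and, as point 2) does, that compaction headroom only has to cover 50% of the largest CF. The sizes in the usage lines are made up for illustration, not measurements.

```python
NUM_SHARDS = 8  # hypothetical; the message mentions 5-10 CFs


def shard_cf(customer_id: int, num_shards: int = NUM_SHARDS) -> str:
    # Hypothetical naming scheme: customer N goes to CF "MailBody_<N mod shards>".
    return f"MailBody_{customer_id % num_shards}"


def compaction_headroom(cf_sizes_tb, fraction: float = 0.5) -> float:
    # Free space to reserve, following point 2): a fraction of the largest CF,
    # assuming only one CF is being compacted at a time.
    return fraction * max(cf_sizes_tb)


print(shard_cf(42))                     # MailBody_2
print(compaction_headroom([4.0]))       # one 4 TB CF -> 2.0
print(compaction_headroom([0.5] * 8))   # eight 0.5 TB shards -> 0.25
```

Because the mapping is a plain modulo, changing the number of shards later would move customers into different CFs, so the shard count is worth fixing up front.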

Are my assumptions correct? Any negative side effects?
