We are engaging in both strategies at the same time:

1) We call it functional sharding: we write to clusters targeted according
to the type of data being written.  Because different data types often
have different workloads, this has the nice side effect of letting us tune
each cluster to its workload.  Your ability to grow in this dimension is
limited by the number of business object types you're recording.

2) We write to clusters sharded by time.  Our objects are network security
events, so there's always an element of time.  We encode that time into
deterministic object IDs, so on the read path we can extract the time
component from the ID and identify which shard to direct the request to
(there's a rough sketch of this after this list).  This idea should work
any time you're able to use surrogate keys instead of natural keys.  If
you are using natural keys, you may be facing an unpleasant migration
should you need to increase the number of shards in this dimension.
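
To make both of those concrete, here's a rough sketch of the kind of ID
scheme and routing I mean.  All names, bucket sizes, and cluster labels
below are made up for illustration; this is not our production code:

    import time
    import uuid

    CLUSTER_FAMILY = {"netflow": "flow", "ids_alert": "alert"}  # strategy 1
    BUCKET_SECONDS = 7 * 24 * 3600   # strategy 2: one shard per week, say

    def make_object_id(event_time=None):
        # Deterministic, time-prefixed surrogate key: the time component
        # can be recovered from the ID alone, no lookup table required.
        t = int(time.time() if event_time is None else event_time)
        return "%010d-%s" % (t, uuid.uuid4().hex)

    def cluster_for(object_type, object_id):
        # Read path: the functional type picks the cluster family, and
        # the time component embedded in the ID picks the shard within it.
        t = int(object_id.split("-", 1)[0])
        return "%s-%d" % (CLUSTER_FAMILY[object_type], t // BUCKET_SECONDS)

    oid = make_object_id(1541404800)               # 2018-11-05 08:00 UTC
    print(oid, "->", cluster_for("netflow", oid))  # -> flow-2548

Because the keys are surrogates we control, growing the time dimension
just means new buckets map to new shards going forward; IDs already
written keep routing to the shards that hold them.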

Our reason for engaging in the second strategy was not purely Cassandra's
fault; rather, we were using DSE with a search workload, and rebuilding
Solr indexes during streaming operations (such as adding nodes to an
existing cluster) consumed enough resources that we found it prohibitive.
That's because the bootstrapping node was also taking a production write
workload, and we didn't want to run our cluster with enough headroom that
a node could bootstrap and take production writes at the same time.

For vanilla Cassandra workloads we have run clusters with well over 100
nodes without any appreciable trouble.  I'm curious whether you can share
the documents reporting that clusters over 100 nodes cause trouble for
users.  I wonder if it's related to the node failure rate combined with
vnodes, such that several concurrent node failures take part of the ring
offline too reliably.
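
To put a number on that hunch, here's a quick Monte Carlo sketch (my own
illustration: random token placement standing in for vnodes and a
SimpleStrategy-style replica walk, not a claim about any documented
incident):

    import random

    def p_quorum_loss(nodes=100, tokens_per_node=256, failures=2,
                      rf=3, trials=300):
        # Estimate the chance that at least one token range drops below
        # QUORUM (i.e. >= rf-1 of its rf replicas down) when `failures`
        # random nodes die at once.
        hits = 0
        for _ in range(trials):
            ring = sorted((random.random(), n)
                          for n in range(nodes)
                          for _ in range(tokens_per_node))
            owners = [n for _, n in ring]
            down = set(random.sample(range(nodes), failures))
            for i in range(len(owners)):
                # Replicas of the range ending at token i: the next rf
                # distinct nodes walking clockwise around the ring.
                replicas = []
                j = i
                while len(replicas) < rf:
                    o = owners[j % len(owners)]
                    if o not in replicas:
                        replicas.append(o)
                    j += 1
                if len(down & set(replicas)) >= rf - 1:
                    hits += 1
                    break
        return hits / trials

    print(p_quorum_loss(tokens_per_node=256))  # vnodes: close to 1.0
    print(p_quorum_loss(tokens_per_node=1))    # one token/node: far lower

With 256 tokens per node, any two failed nodes almost certainly share a
replica set somewhere on the ring, so two concurrent failures take some
range below quorum nearly every time; with one token per node that
probability is dramatically lower.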

On Mon, Nov 5, 2018 at 7:38 AM onmstester onmstester
<onmstes...@zoho.com.invalid> wrote:

> Hi,
>
> One of my applications requires a cluster with more than 100 nodes, but
> I've read documents recommending clusters with fewer than 50 or 100 nodes
> (Netflix runs hundreds of clusters with fewer than 100 nodes each).
> Is it a good idea to use multiple clusters for a single application, just
> to decrease maintenance problems and system complexity and improve
> performance?
> If so, which of the policies below is more suitable for distributing data
> among clusters, and why?
> 1. each cluster would be responsible for a specific subset of tables only
> (table sizes are almost equal, so the calculations are easy here); for
> example, inserts to table X would go to cluster Y
> 2. shard data at the loader level by some business-logic grouping of the
> data; for example, all rows with some column starting with X would go to
> cluster Y
>
> I would appreciate you sharing your experiences working with big
> clusters, the problems encountered, and the solutions.
>
> Thanks in Advance