Thanks for the responses! We'll definitely go for powerful servers to
reduce the total count. Beyond a dozen or so servers there doesn't
seem to be much point in increasing the count further just for
replication/redundancy. I'm assuming we will use leveled compaction,
which means that we'll most likely run out of CPU before we run out of
I/O. At least that has been my experience so far. I'm glad to hear
that 100+ nodes isn't that unusual anymore in the Cassandra world.
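
For reference, here's the kind of per-table setting I mean. This is
just a sketch assuming the DataStax Python driver and made-up
host/keyspace/table names, so adjust for whatever client you actually
use:

    from cassandra.cluster import Cluster  # DataStax Python driver

    session = Cluster(['10.0.0.1']).connect()
    # Leveled compaction spends extra CPU on compaction so that reads
    # touch fewer SSTables, hence CPU tends to run out before I/O.
    session.execute("""
        ALTER TABLE metrics.events
        WITH compaction = {'class': 'LeveledCompactionStrategy'}
    """)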

On 1/21/2012 3:38 AM, Eric Czech wrote:
> I'd also add that one of the biggest complications to arise from
> having multiple clusters is that read-biased client applications
> would need to be aware of all the clusters and either aggregate
> result sets or include logic to choose the right cluster for a
> particular query.
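>
> Something like this client-side shim is what I have in mind. Just a
> sketch, assuming the DataStax Python driver and made-up cluster,
> host, and keyspace names:
>
>     from cassandra.cluster import Cluster
>
>     # Hypothetical mapping from data partition to the cluster
>     # that owns it.
>     SESSIONS = {
>         'us_east': Cluster(['10.0.1.1']).connect('app'),
>         'us_west': Cluster(['10.0.2.1']).connect('app'),
>     }
>
>     def query_all(cql, params=None):
>         # Fan the read out to every cluster and aggregate results.
>         rows = []
>         for session in SESSIONS.values():
>             rows.extend(session.execute(cql, params))
>         return rows
>
>     def query_one(partition, cql, params=None):
>         # Or route the read to the one cluster that owns the data.
>         return SESSIONS[partition].execute(cql, params)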
>
> And from a more operational perspective, I think you'd have a tough
> time finding monitoring applications (like OpsCenter) that would
> support multiple clusters within the same viewport.  Having used
> multiple clusters in the past, I can definitely tell you that from an
> administrative, operational, and development standpoint, one cluster
> is almost certainly better than many.
>
> Oh, and I'm positive that there are other Cassandra deployments out
> there with well beyond 100 nodes, so I don't think you're really
> treading on dangerous ground here.
>
> I'd definitely say that you should try to use a single cluster if
> possible.
>
> On Fri, Jan 20, 2012 at 9:34 PM, Maxim Potekhin <potek...@bnl.gov> wrote:
>
>     You can also scale not "horizontally" but "diagonally",
>     i.e. RAID SSDs together and use multicore CPUs. This means
>     you'll get the same performance with fewer nodes, making the
>     cluster far easier to manage.
>
>     SSDs by themselves will give you an order-of-magnitude
>     improvement in I/O.
>
>     On 1/19/2012 9:17 PM, Thorsten von Eicken wrote:
>
>         We're embarking on a project where we estimate we will need
>         on the order of 100 Cassandra nodes. The data set is
>         perfectly partitionable, meaning we have no queries that need
>         access to all the data at once. We expect to run with RF=2 or
>         3. Is there some notion of an ideal cluster size? Or, asked
>         differently, would it be easier to run one large cluster or a
>         bunch of, say, 16-node clusters? Everything we've done to
>         date has fit into 4-5 node clusters.
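>
>         (For concreteness, by RF I mean the keyspace replication
>         factor. A minimal sketch in CQL via the Python driver, with
>         made-up host and keyspace names:)
>
>             from cassandra.cluster import Cluster
>
>             session = Cluster(['10.0.0.1']).connect()
>             # RF=3 keeps three copies of each row across the cluster.
>             session.execute("""
>                 CREATE KEYSPACE app WITH replication =
>                     {'class': 'SimpleStrategy',
>                      'replication_factor': '3'}
>             """)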
