Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

daemeon reiydelle Thu, 02 Apr 2015 09:53:13 -0700

Jack did a superb job of explaining all of your issues, and his last
sentence seems to fit your needs (and my experience) very well. The only
other point I would add is to ascertain whether your usage patterns call
for microservices that abstract away data locality, even if the initial
deployment is a no-op against a single cluster. This depends on whether you
expect a rapid stream of special-purpose business functions. A second
question is about data access: does Pig meet your data access response-time
requirements? Many clients find Hadoop ideally suited to a sophisticated
ECTL (extract, cleanup, transformation, and load) model feeding fast,
schema-oriented repositories such as MySQL. It all depends on the use case
and on the growth and fragmentation expectations for your business model(s).
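
To make the data-locality point concrete, here is a minimal sketch of the
kind of indirection I mean. Everything in it is hypothetical (the app names,
the addresses, the helper functions); it is not tied to any particular
driver. Each app asks a lookup for "its" cluster and keyspace, and on day
one both answers point at the same cluster:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where one application's data lives (all values are hypothetical)."""
    contact_points: tuple  # hosts of the cluster serving this app
    keyspace: str          # keyspace owned by this app


# Assumed addresses, for illustration only.
SHARED_CLUSTER = ("10.0.0.1", "10.0.0.2")

# Initial deployment: both apps resolve to the same cluster, so the
# indirection is effectively a no-op.
PLACEMENTS = {
    "existing_app": Placement(SHARED_CLUSTER, "existing_app"),
    "new_app": Placement(SHARED_CLUSTER, "new_app"),
}


def placement_for(app: str) -> Placement:
    """Look up the cluster and keyspace for an app by name."""
    return PLACEMENTS[app]


def move_to_cluster(app: str, contact_points: tuple) -> None:
    """Repoint one app at a different cluster; callers stay unchanged."""
    PLACEMENTS[app] = Placement(contact_points, PLACEMENTS[app].keyspace)
```

If the two apps later conflict, only the lookup changes; the services that
call placement_for() do not. Moving the data itself is a separate job that
this sketch does not cover.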


Good luck.

PS: Jack, thanks for your succinct comment.




On Thu, Apr 2, 2015 at 6:33 AM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> There is an old saying in the software industry: The structure of a system
> follows from the structure of the organization that created it (Conway's
> Law). Seriously, the first and main question on your end is who owns the
> applications in terms of executive management: if management makes a
> decision that dramatically affects one app's impact on the cluster, is it
> likely that they will have done so with the concurrence of the management
> who owns the other app? Trust me, you do not want to be in the middle when
> two managers are in dispute over whose app is more important. IOW, if one
> manager owns both apps, you are probably safe, but if two different
> managers might have differing views of each other's priorities, tread with
> caution.
>
> In any case, be prepared to move one of the apps to a different cluster if
> and when usage patterns cause them to conflict.
>
> There is also the concept of DevOps, where the app developers also own
> operations. You really can't have two separate development teams administer
> operations for one set of hardware.
>
> If you are dedicated to operations for both app teams and the teams seem
> to be reasonably compatible, then it could be fine.
>
> In short, sure, technically a single cluster can support any number of
> keyspaces, but mostly it will come down to whether there might be an
> excess of contention for load and operations of the cluster in production.
>
> And then there are little things like software upgrades: one app might
> really need a disruptive or risky upgrade, or need to bounce the entire
> cluster, but then the other app may be impacted even though it had no need
> for the upgrade or the bounce.
>
> Are the apps synergistic in some way, such that there is an architectural
> benefit from running on the same hardware?
>
> In the end, the simplest solution is typically the better solution, unless
> any of these other factors loom too large.
>
>
> -- Jack Krupansky
>
> On Thu, Apr 2, 2015 at 9:06 AM, Ian Rose <ianr...@fullstory.com> wrote:
>
>> Hi all -
>>
>> We currently have a single cassandra cluster that is dedicated to a
>> relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
>> for another, unrelated, system, and my debate is whether to just add the
>> new tables to our existing cassandra cluster or whether to spin up an
>> entirely new, separate cluster for this new system.
>>
>> Does anyone have pros/cons to share on this?  It appears from watching
>> talks and such online that the big users (e.g. Netflix, Spotify) tend to
>> favor multiple, single-purpose clusters, and thus that was my initial
>> preference.  But we are (for now) nowhere close to them in traffic so I'm
>> wondering if running an entirely separate cluster would be a premature
>> optimization which wouldn't pay for the (nontrivial) overhead in
>> configuration management and ops.  While we are still small it might be
>> much smarter to reuse our existing cluster so that I can get it done
>> faster...
>>
>> Thanks!
>> - Ian
>>
>>
>
