Jack did a superb job of covering all of your issues, and his last sentence seems to fit your needs (and my experience) very well. The only other point I would add is to ascertain whether your usage patterns commend microservices that abstract away data locality, even if the initial deployment is a no-op onto a single cluster. This depends on whether you foresee a rapid stream of special-purpose business functions. A second question is about data access: does Pig support the data access response times you need? Many clients find Hadoop ideally suited to a sophisticated ECTL (extract, cleanup, transformation, and load) model feeding fast, schema-oriented repositories such as MySQL. It all depends on the use case, the growth and fragmentation expectations for your business model(s), etc.

Good luck.

PS: Jack, thanks for your succinct comment.

On Thu, Apr 2, 2015 at 6:33 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> There is an old saying in the software industry: The structure of a system
> follows from the structure of the organization that created it (Conway's
> Law). Seriously, the main, first question for your end is who owns the
> applications in terms of executive management, such that if management
> makes a decision that dramatically affects the app's impact on the cluster,
> is it likely that they will have done so with the concurrence of management
> who owns the other app. Trust me, you do not want to be in the middle when
> two managers are in dispute over whose app is more important. IOW, if one
> manager owns both apps, you are probably safe, but if two different
> managers might have differing views of each other's priorities, tread with
> caution.
>
> In any case, be prepared to move one of the apps to a different cluster if
> and when usage patterns cause them to conflict.
>
> There is also the concept of devOps, where the app developers also own
> operations. You really can't have two separate development teams administer
> operations for one set of hardware.
>
> If you are dedicated to operations for both app teams and the teams seem
> to be reasonably compatible, then it could be fine.
>
> In short, sure, technically a single cluster can support any number of
> key spaces, but mostly it will come down to whether there might be an
> excess of contention for load and operations of the cluster in production.
>
> And then little things like software upgrades - one app might really need
> a disruptive or risky upgrade or need to bounce the entire cluster, but
> then the other app may be impacted even though it had no need for the
> upgrade or be bounced.
>
> Are the apps synergistic in some way, such that there is an architectural
> benefit from running on the same hardware?
>
> In the end, the simplest solution is typically the better solution, unless
> any of these other factors loom too large.
>
> -- Jack Krupansky
>
> On Thu, Apr 2, 2015 at 9:06 AM, Ian Rose <ianr...@fullstory.com> wrote:
>
>> Hi all -
>>
>> We currently have a single cassandra cluster that is dedicated to a
>> relatively narrow purpose, with just 2 tables. Soon we will need cassandra
>> for another, unrelated, system, and my debate is whether to just add the
>> new tables to our existing cassandra cluster or whether to spin up an
>> entirely new, separate cluster for this new system.
>>
>> Does anyone have pros/cons to share on this? It appears from watching
>> talks and such online that the big users (e.g. Netflix, Spotify) tend to
>> favor multiple, single-purpose clusters, and thus that was my initial
>> preference. But we are (for now) nowhere close to them in traffic so I'm
>> wondering if running an entirely separate cluster would be a premature
>> optimization which wouldn't pay for the (nontrivial) overhead in
>> configuration management and ops. While we are still small it might be
>> much smarter to reuse our existing clusters so that I can get it done
>> faster...
>>
>> Thanks!
>> - Ian