On Wed, Oct 20, 2010 at 7:29 AM, Stack <[email protected]> wrote:
> Hey Dan:
>
> On Wed, Oct 20, 2010 at 2:09 AM, Dan Harvey <[email protected]>
> wrote:
> > Hey,
> >
> > We're just looking into ways to run multiple instances/versions of HBase
> for
> > testing/development and were wondering how other people have gone about
> > doing this.
> >
>
> Development of replication feature has made it so tests now can put up
> multiple concurrent clusters.  See TestHBaseClusterUtility, which
> starts up three clusters in a single JVM, each homed on its own
> directory in a single zookeeper instance and each running its own
> hdfs (having them share an hdfs should work too, though it might need
> some HBaseTestingUtility fixup).
>
> At SU, there are multiple clusters: a serving cluster for low-latency
> (replicating to backup cluster) and then a cluster for MR jobs, dev
> clusters, etc.  Generally these don't share hdfs though again
> clusters with like SLAs could.
>
> > If we used just one hadoop cluster then we can have a different paths /
> user
> > for each hbase instance, and then have a set of zookeeper nodes for each
> > instance (or run multiple zk's on each server binding to different hosts
> for
> > each instance..).
>
> You could do that. Have all share same zk ensemble (Run one per
> datacenter?)
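To make the shared-ensemble idea concrete, here is a minimal sketch of an hbase-site.xml for one instance sharing a single HDFS and a single zookeeper ensemble with other instances. The property names (hbase.rootdir, zookeeper.znode.parent, hbase.zookeeper.quorum) are standard HBase settings; the paths, instance name, and host names are made-up examples:

```xml
<!-- hbase-site.xml for a hypothetical instance "dev1"; assumes one
     shared HDFS and one shared zookeeper ensemble -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- each instance gets its own directory on the shared HDFS -->
    <value>hdfs://namenode:8020/hbase-dev1</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <!-- each instance is homed on its own znode in the shared ensemble -->
    <value>/hbase-dev1</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
  </property>
</configuration>
```

Each additional instance would repeat this with its own rootdir and parent znode, so the instances stay isolated while sharing the same HDFS and zookeeper hardware.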
>
> > If we used multiple hadoop clusters then the only difference would be
> just
> > using different hdfs for storing the data.
> >
> > Does anyone have experiences with problems or benefits to either of the
> > above?
> >
> > I'm tempted to go towards the single cluster for more efficient use of
> > hardware but I'm not sure if that's a good idea or not.
> >
>
> At SU the cluster serving the frontend is distinct from the cluster
> running the heavy-duty MR jobs. When a big MR job started up, the
> front-end latency tended to suffer.  There might be some ratio of HDFS
> nodes to HBase nodes that would let the low-latency and MR clusters
> share HDFS, but I've not done the work to figure it out.
>
I think a low-latency requirement (even a loose one) rules out running
any heavy M/R job in the same cell, and here is the reason: if a heavy
M/R task starts running on a machine, it may peg the CPU, evict memory,
and so on, which makes access to the data served by that RS much higher
latency than normal.
Any comment?
>
> St.Ack
>