Hi,

Sorry for not responding: I'm not on the list very often.

This seems to be of interest to some of you, so we will publish the script on GitHub so that everybody can test and improve it.
More info later...

Regards,

On 24/12/12 21:23, anil gupta wrote:
Hi Vincent,

I don't know Python, but I am interested in learning about your solution. It
would be great if you could also share the logic for balancing the cluster.

Thanks,
Anil Gupta

On Mon, Dec 24, 2012 at 9:53 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

On Mon, Dec 24, 2012 at 8:27 AM, Ivan Balashov <ibalas...@gmail.com>
wrote:

Vincent Barat <vbarat@...> writes:

Hi,

Balancing regions between RSs is handled correctly by HBase: I mean
that your RSs always manage the same number of regions (the balancer
takes care of it).

Unfortunately, balancing all the regions of one particular table
between the RSs of your cluster is not always easy, since HBase (as
of 0.90.3), when it splits a region, always creates the new one on
the same RS. This means that if you start with a table that has only
1 region and then insert lots of data into it, new regions will
always be created on the same RS (if your insert is an M/R job, you
saturate this RS). Eventually the balancer will decide to move some
of these regions to other RSs, which limits the issue, but this is
not controllable.

Here at Capptain, we solved this problem by developing a special
Python script, based on the HBase shell, that fully balances the
regions of all tables across all RSs. It ensures that the regions of
each table are uniformly deployed on all RSs of the cluster, with a
minimum of region transitions.
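
To illustrate what "uniform per table, with few transitions" can mean
in practice, here is a minimal sketch in Python. It is not the
Capptain script (which has not been published in this thread); the
function name plan_moves, the input layout and the quota rule are all
assumptions made for the example. It only computes a list of moves
and does not talk to HBase.

```python
from collections import defaultdict


def plan_moves(assignments, servers):
    """Plan region moves so each table is spread evenly over `servers`.

    assignments: dict mapping region name -> (table name, current server).
    servers: list of region server names that may host regions.
    Returns a list of (region, source server, destination server).
    Hypothetical helper for illustration, not an HBase API.
    """
    # Group regions per table, then per hosting server.
    by_table = defaultdict(lambda: defaultdict(list))
    for region, (table, server) in sorted(assignments.items()):
        by_table[table][server].append(region)

    moves = []
    for table in sorted(by_table):
        hosted = by_table[table]
        total = sum(len(regions) for regions in hosted.values())
        base, extra = divmod(total, len(servers))

        # Servers that already host the most regions of this table get
        # the larger quotas, so fewer regions need to move.
        ranked = sorted(servers, key=lambda s: -len(hosted.get(s, [])))
        quota = {s: base + (1 if i < extra else 0)
                 for i, s in enumerate(ranked)}

        # Regions above a server's quota (or on a server that is not
        # in `servers`) are candidates to move.
        surplus = []
        for server, regions in hosted.items():
            keep = quota.get(server, 0)
            surplus.extend((region, server) for region in regions[keep:])

        # Hand surplus regions to servers that are below their quota.
        for server in servers:
            deficit = quota[server] - len(hosted.get(server, []))
            for _ in range(max(deficit, 0)):
                region, source = surplus.pop()
                moves.append((region, source, server))
    return moves
```

Each planned move could then be applied with the HBase shell "move"
command, using the encoded region name and the target server name.
Note that this sketch balances each table independently, so it does
not by itself guarantee an even total region count per RS.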

Is it possible to describe the logic at high level on what you did?

It is fast, and even though it can trigger a lot of region
transitions, there is very little impact at runtime and it can be
run safely.

If you are interested, just let me know, I can share it.

Regards,

Vincent,

I would very much like to see and possibly use the script that you
mentioned. We've just run into the same issue (after the table
was truncated, it was re-created with only 1 region, and
after data loading and manual splits we ended up having all
regions on the same RS).

If you could share the script, it would be really appreciated,
and I believe not only by me.

Thanks,
Ivan








