Hi,
sure, we are seeing the following:
- regions are unavailable for much less time, so clients are no longer failing (a few of them still occasionally fail with RetriesExhaustedException caused by "failed setting up proxy", but they are rare)
- on the other hand the cluster is a little imbalanced; this is caused by slow rebalancing, which stops as soon as the load is back within hbase.regions.slop (see the sketch below)
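
For later readers: my understanding is that the balancer only moves regions off a server while its region count exceeds the cluster average by more than the slop fraction, so a small residual imbalance within the slop is expected. A minimal sketch of that check in Java, with made-up names and example numbers, not the actual HMaster code:

    public class SlopCheck {
      // A server counts as overloaded only above average * (1 + slop).
      static boolean overloaded(int regionsOnServer, double avgRegions, double slop) {
        return regionsOnServer > avgRegions * (1.0 + slop);
      }

      public static void main(String[] args) {
        // Example: ~30k regions over 24 servers => avg ~1250; with slop 0.1
        // anything up to 1375 regions per server is tolerated.
        System.out.println(overloaded(1300, 1250.0, 0.1)); // false
        System.out.println(overloaded(1400, 1250.0, 0.1)); // true
      }
    }

So with a slop of 0.1, a server can sit roughly 10% above the mean indefinitely.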

Jan

On 8.1.2011 22:11, M. C. Srivas wrote:
If you made the change, can you share your experience/results?

On Wed, Dec 15, 2010 at 12:04 AM, Jan Lukavský <jan.lukav...@firma.seznam.cz> wrote:

    We can give it a try. We currently use 512 MiB per region; is there an
    upper bound for this value that we should not cross? Are there any
    side effects to expect if we raise it to, say, 1 GiB? I suppose at
    least slightly slower random gets?
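
    For reference, the knob in question is hbase.hregion.max.filesize (the
    store size at which a region splits). A minimal sketch of setting it to
    1 GiB from Java; in practice it belongs in hbase-site.xml on the
    servers, and the value here is just an example:

        import org.apache.hadoop.hbase.HBaseConfiguration;

        public class RegionSizeExample {
          public static void main(String[] args) {
            // 0.20-era constructor; newer releases use HBaseConfiguration.create().
            HBaseConfiguration conf = new HBaseConfiguration();
            // Regions split once their largest store grows past this size.
            conf.setLong("hbase.hregion.max.filesize", 1024L * 1024L * 1024L); // 1 GiB
            System.out.println(conf.getLong("hbase.hregion.max.filesize", -1L));
          }
        }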

    Thanks,
     Jan


    On 14.12.2010 18:50, Stack wrote:

        Can you do w/ fewer regions?  1k plus per server is pushing it,
        I'd say.  Can you up your region sizes, for instance?
        St.Ack

        On Mon, Dec 13, 2010 at 8:36 AM, Jan Lukavský
        <jan.lukav...@firma.seznam.cz> wrote:

            Hi all,

            we are using HBase 0.20.6 on a cluster of about 25 nodes with
            about 30k regions, and we are experiencing an issue which causes
            running M/R jobs to fail. When we restart a single RegionServer,
            the following happens:
             1) all regions of that RS get reassigned to the remaining (say
            24) nodes
             2) when the restarted RegionServer comes back up, HMaster closes
            about 60 regions on each of the 24 nodes and assigns them back to
            the restarted node

            Step 1) is usually very quick (if we can assign 10 regions per
            heartbeat, we get 240 regions per heartbeat on the whole
            cluster). Step 2) seems problematic, because first about 1200
            regions get unassigned, and then they get slowly assigned to the
            single RS (again at 10 regions per heartbeat). During this time,
            clients in map tasks connected to these regions throw
            RetriesExhaustedException.
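
            Back-of-the-envelope, assuming hbase.regionserver.msginterval is
            at a default of about 3 s (an assumption, adjust for your site):
            draining 1200 regions back at 10 per heartbeat takes about 120
            heartbeats, i.e. several minutes, which easily exceeds the
            default client retry budget:

                // Rough math behind the outage window; all values assumed.
                public class ReassignTime {
                  public static void main(String[] args) {
                    int regions = 1200;          // regions moving back to one RS
                    int perHeartbeat = 10;       // assignments per heartbeat
                    double heartbeatSecs = 3.0;  // assumed msginterval
                    double secs = regions / (double) perHeartbeat * heartbeatSecs;
                    // ~360 s, i.e. ~6 minutes of regions in transition.
                    System.out.printf("~%.0f s (~%.1f min)%n", secs, secs / 60.0);
                  }
                }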

            I'm aware that we can limit the number of regions closed per
            RegionServer heartbeat with hbase.regions.close.max, but this
            config option seems a bit unsatisfactory, because as the cluster
            grows we get more and more regions unassigned in a single cluster
            heartbeat (say we limit it to 1; then we still get 24 unassigned
            regions, but only 10 assigned per heartbeat). This led us to a
            solution which seems quite simple: we have introduced a new
            config option which limits the number of regions in transition.
            When regionsInTransition.size() crosses this boundary, we
            temporarily stop the load balancer (a rough sketch follows
            below). This seems to resolve our issue, because no region stays
            unassigned for a long time and clients manage to recover within
            their number of retries.
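
            Roughly, the guard looks like this (a minimal sketch of the
            idea, not our actual patch; the class name, option name and
            threshold are made up for illustration):

                import java.util.Map;

                // Hypothetical guard: skip the balance pass while too many
                // regions are already in transition.
                public class BalancerGuard {
                  // Would come from a new option, e.g. "hbase.regions.transition.max".
                  private final int maxRegionsInTransition;

                  public BalancerGuard(int maxRegionsInTransition) {
                    this.maxRegionsInTransition = maxRegionsInTransition;
                  }

                  public boolean balanceAllowed(Map<String, ?> regionsInTransition) {
                    return regionsInTransition.size() < maxRegionsInTransition;
                  }
                }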

            My question is: is this a general issue for which a new config
            option should be proposed, or am I missing something and we
            could have resolved it by tuning some other existing config
            option?

            Thanks.
             Jan





--

Jan Lukavský
programmer
Seznam.cz, a.s.
Radlická 608/2
15000, Praha 5

jan.lukav...@firma.seznam.cz
http://www.seznam.cz
