Regarding #1, my main concern is that if we poll the region load at a "bad"
time and get back an abnormally high or low value, the balancer could
overreact. For example, if a region's most recent readRequestsCount is 100
and you've been seeing 5 for the previous 9 polls, the weighted "average"
comes out to 52.5, whereas a plain average would be 14.5. A temporary spike
in requests to a region could make it look much worse than it will be going
forward and cause a region move that is actually unnecessary.
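
To make the arithmetic concrete, here is a minimal sketch in Python. It is
not the HBase code, and the function names are made up for illustration. It
assumes the weighting works the way I described in my first mail: samples
are folded oldest to newest, and each new sample gets half of the total
weight.

```python
def exponentially_weighted_cost(samples):
    """Fold samples oldest-to-newest; each new sample gets weight 0.5.

    Illustrative only: mirrors the weighting described in this thread,
    not the actual getRegionLoadCost implementation.
    """
    cost = 0.0
    for i, sample in enumerate(samples):
        if i == 0:
            cost = float(sample)  # the first sample seeds the running value
        else:
            cost = 0.5 * cost + 0.5 * sample  # halve the weight of the history
    return cost


def simple_average(samples):
    """Plain arithmetic mean, for comparison."""
    return sum(samples) / len(samples)


# Nine quiet polls followed by one spike, oldest first.
samples = [5] * 9 + [100]
print(exponentially_weighted_cost(samples))  # 52.5
print(simple_average(samples))               # 14.5
```

With four samples the same fold gives weights of .125, .125, .25 and .5
from oldest to newest, which matches the queue-size-4 example in my
original question.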

On Fri, Jan 13, 2017 at 2:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #2, you're more than welcome to attach a patch to the JIRA.
>
> For #1, the last time I tried to trace which JIRA introduced the formula,
> I ended up at one from Elliott that just moved that line of code.
> I can spend more time in the future on this.
>
> What downside have you observed for #1?
>
> Cheers
>
> On Fri, Jan 13, 2017 at 2:07 PM, Timothy Brown <t...@siftscience.com>
> wrote:
>
> > I tried it out on our staging cluster and saw that the total number of
> > requests per region server was a bit more balanced with our current
> > weights for the read and write costs. I did not attempt to calculate the
> > exact requests per second but rather looked at a relative rate by
> > averaging the increase in reads and writes over the interval at which
> > the RegionLoad is currently polled. This should have the same desired
> > effect of balancing the number of requests across the cluster. If you
> > don't mind, I would like to take a stab at the JIRA you've created.
> >
> > For #1, any idea if this is the desired behavior?
> >
> > Thanks,
> > Tim
> >
> > On Fri, Jan 13, 2017 at 10:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Logged HBASE-17462 for #2.
> > >
> > > FYI
> > >
> > > On Thu, Jan 12, 2017 at 8:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > For #2, I think MemstoreSizeCostFunction belongs to the same category
> > > > if we are to adopt a moving average.
> > > >
> > > > Some factors to consider:
> > > >
> > > > The data structure used by StochasticLoadBalancer should be concise.
> > > > The number of regions in a cluster can be expected to approach 1
> > > > million. We cannot afford to store a long history of read / write
> > > > requests in the master.
> > > >
> > > > Cost calculation must be efficient: the balancer goes through many
> > > > cost functions, and each one is expected to return quickly. Otherwise
> > > > we would not come up with proper region movement plan(s) in time.
> > > >
> > > > Cheers
> > > >
> > > > On Wed, Jan 11, 2017 at 5:51 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > >> For #2, I think it makes sense to try out using request rates for
> > > > >> cost calculation.
> > > > >>
> > > > >> If the experiment result turns out to be better, we can consider
> > > > >> using such a measure.
> > > >>
> > > >> Thanks
> > > >>
> > > > >> On Wed, Jan 11, 2017 at 5:34 PM, Timothy Brown
> > > > >> <t...@siftscience.com> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> I have a couple of questions about the StochasticLoadBalancer.
> > > >>>
> > > > >>> 1) In CostFromRegionLoadFunction.getRegionLoadCost, the cost
> > > > >>> weights later samples of the RegionLoad more than earlier ones.
> > > > >>> For example, with a queue size of 4 it would be
> > > > >>> (.5 * load1 + .25 * load2 + .125 * load3 + .125 * load4), where
> > > > >>> load1 is the most recent sample. Is this the intended behavior?
> > > >>>
> > > > >>> 2) Would it make more sense to calculate the ReadRequestCost and
> > > > >>> WriteRequestCost as rates? Right now it looks like the cost is
> > > > >>> just based on the total number of read/write requests a region
> > > > >>> has gotten over its lifetime.
> > > >>>
> > > >>> -Tim
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>
