I tried it out on our staging cluster and saw that the total number of
requests per region server was a bit more balanced with our current weights
for the read and write costs. I did not attempt to calculate the exact
requests per second but rather looked at a relative rate by averaging the
increase in reads and writes over the interval at which the RegionLoad is
currently polled. This should have the same desired effect of balancing the
number of requests across the cluster. If you don't mind, I would like to
take a stab at the JIRA you've created.
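
As a rough sketch of the relative-rate idea (not the actual patch): given
the cumulative request counts seen at successive RegionLoad polls, average
the increase per poll interval. The class and method names are hypothetical,
and treating a drop in the counter as a reset is an assumption.

```java
/** Hypothetical helper for illustration only; not part of HBase. */
final class RequestRateSketch {
    private RequestRateSketch() {}

    /**
     * Returns the average increase per poll interval of a cumulative
     * request counter, given its values ordered oldest first.
     */
    static double averageIncrease(long[] cumulativeCounts) {
        if (cumulativeCounts.length < 2) {
            return 0.0; // one sample (or none) gives no interval to measure
        }
        long total = 0;
        for (int i = 1; i < cumulativeCounts.length; i++) {
            long delta = cumulativeCounts[i] - cumulativeCounts[i - 1];
            // Assumption: a negative delta means the counter was reset,
            // so that interval contributes zero rather than a negative rate.
            total += Math.max(delta, 0L);
        }
        return (double) total / (cumulativeCounts.length - 1);
    }
}
```

Because the poll interval is fixed, this per-interval average is
proportional to requests per second, so it ranks regions the same way
without needing timestamps.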

For #1, any idea if this is the desired behavior?
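
For reference, weights of that shape arise if each new sample is folded in
as cost = 0.5 * cost + 0.5 * sample. A minimal sketch, assuming that folding
rule and samples ordered oldest first (the names are hypothetical and this
is not the HBase source):

```java
/** Hypothetical illustration of the sample weighting; not HBase code. */
final class RegionLoadWeightSketch {
    private RegionLoadWeightSketch() {}

    /** Folds samples (oldest first) by repeatedly halving the running cost. */
    static double weightedCost(double[] samplesOldestFirst) {
        double cost = 0.0;
        boolean first = true;
        for (double sample : samplesOldestFirst) {
            if (first) {
                cost = sample; // the first sample seeds the running cost
                first = false;
            } else {
                cost = 0.5 * cost + 0.5 * sample;
            }
        }
        return cost;
    }
}
```

With four samples the newest ends up with weight .5, the next .25, and the
two oldest .125 each, which matches the weights in the example quoted below.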

Thanks,
Tim

On Fri, Jan 13, 2017 at 10:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Logged HBASE-17462 for #2.
>
> FYI
>
> On Thu, Jan 12, 2017 at 8:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > For #2, I think MemstoreSizeCostFunction belongs to the same category if
> > we are to adopt a moving average.
> >
> > Some factors to consider:
> >
> > The data structure used by StochasticLoadBalancer should be concise. The
> > number of regions in a cluster can be expected to approach 1 million. We
> > cannot afford to store a long history of read / write requests in the
> > master.
> >
> > Efficiency of cost calculation should be high - there are many cost
> > functions the balancer goes through, so each cost function is expected
> > to return quickly. Otherwise we would not come up with proper region
> > movement plan(s) in time.
> >
> > Cheers
> >
> > On Wed, Jan 11, 2017 at 5:51 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> For #2, I think it makes sense to try out using request rates for cost
> >> calculation.
> >>
> >> If the experiment result turns out to be better, we can consider using
> >> such a measure.
> >>
> >> Thanks
> >>
> >> On Wed, Jan 11, 2017 at 5:34 PM, Timothy Brown <t...@siftscience.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have a couple of questions about the StochasticLoadBalancer.
> >>>
> >>> 1) In CostFromRegionLoadFunction.getRegionLoadCost the cost weights
> >>> later samples of the RegionLoad more than previous ones. For example,
> >>> with a queue size of 4 it would be (.5 * load1 + .25*load2 +
> >>> .125*load3 + .125*load4). Is this the intended behavior?
> >>>
> >>> 2) Would it make more sense to calculate the ReadRequestCost and
> >>> WriteRequestCost as rates? Right now it looks like the cost is just
> >>> based off the total number of read/write requests a region has gotten
> >>> over its lifetime.
> >>>
> >>> -Tim
> >>>
> >>
> >>
> >
>
