Regarding #1, my main concern is that if we poll the region load at a "bad" time and get back an abnormally high or low value, the balancer could overreact. For example, if a region's most recent readRequestsCount is 100 and you've seen 5 for the previous 9 polls, the "average" that comes out is 52.5 instead of 14.5. This could just be a temporary spike in requests to the region, which makes it look much worse than it is likely to be going forward and could cause the region to be moved unnecessarily.
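To make the arithmetic concrete, here is a minimal sketch of the weighting as I described it in #1, next to a plain mean. This is not the HBase source: the function names are made up for illustration, and I am assuming the samples are folded in oldest to newest, with each new sample taking half the weight.

```python
def weighted_cost(samples):
    """Halving-weight average over samples ordered oldest to newest.

    Hypothetical helper (not an HBase API). The newest sample gets weight
    0.5, the one before it 0.25, and so on; the two oldest samples end up
    with equal weight, matching .5*load1 + .25*load2 + .125*load3 + .125*load4
    for a queue of 4.
    """
    cost = None
    for load in samples:
        # The first sample seeds the value; each later one is blended 50/50.
        cost = load if cost is None else 0.5 * cost + 0.5 * load
    return 0.0 if cost is None else cost


def plain_average(samples):
    """Unweighted mean of the same samples, for comparison."""
    return sum(samples) / len(samples) if samples else 0.0


# Nine polls at 5 followed by a single spike to 100 (oldest first).
samples = [5] * 9 + [100]
print(weighted_cost(samples))  # 52.5
print(plain_average(samples))  # 14.5
```

With these weights a single outlier in the newest slot always contributes half of the result, no matter how long the queue is.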
On Fri, Jan 13, 2017 at 2:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #2, you're more than welcome to attach patch on the JIRA.
>
> For #1, last time I tried to trace which JIRA introduced the formula but
> ended up with one Elliott did which just moved that line of code.
> I can spend more time in the future on this.
>
> What downside have you observed for #1 ?
>
> Cheers
>
> On Fri, Jan 13, 2017 at 2:07 PM, Timothy Brown <t...@siftscience.com>
> wrote:
>
> > I tried it out on our staging cluster and saw that the total number of
> > requests per region server a bit more balanced with our current weights for
> > the read and write costs. I did not attempt to calculate the exact requests
> > per second but rather looked at a relative rate by averaging the increase
> > in reads and writes over the interval that the RegionLoad is currently
> > polled. This should have the same desired effect of balancing the number of
> > requests across the cluster. If you don't mind, I would like to take a stab
> > at the JIRA you've created.
> >
> > For #1, any idea if this is the desired behavior?
> >
> > Thanks,
> > Tim
> >
> > On Fri, Jan 13, 2017 at 10:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Logged HBASE-17462 for #2.
> > >
> > > FYI
> > >
> > > On Thu, Jan 12, 2017 at 8:49 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > For #2, I think MemstoreSizeCostFunction belongs to the same category if
> > > > we are to adopt moving average.
> > > >
> > > > Some factors to consider:
> > > >
> > > > The data structure used by StochasticLoadBalancer should be concise. The
> > > > number of regions in a cluster can be expected to approach 1 million. We
> > > > cannot afford to store long history of read / write requests in master.
> > > >
> > > > Efficiency of cost calculation should be high - there're many cost
> > > > functions the balancer goes through, it is expected for each cost function
> > > > to return quickly. Otherwise we would not come up with proper region
> > > > movement plan(s) in time.
> > > >
> > > > Cheers
> > > >
> > > > On Wed, Jan 11, 2017 at 5:51 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > >> For #2, I think it makes sense to try out using request rates for cost
> > > >> calculation.
> > > >>
> > > >> If the experiment result turns out to be better, we can consider using
> > > >> such measure.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Wed, Jan 11, 2017 at 5:34 PM, Timothy Brown <t...@siftscience.com>
> > > >> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> I have a couple of questions about the StochasticLoadBalancer.
> > > >>>
> > > >>> 1) In CostFromRegionLoadFunction.getRegionLoadCost the cost is weights
> > > >>> later samples of the RegionLoad more than previous ones. For example,
> > > >>> with a queue size of 4 it would be (.5 * load1 + .25*load2 + .125*load3 +
> > > >>> .125*load4). Is this the intended behavior?
> > > >>>
> > > >>> 2) Would it make more sense to calculate the ReadRequestCost and
> > > >>> WriteRequestCost as rates? Right now it looks like the cost is just based
> > > >>> off the total number of read/write requests a region has gotten over its
> > > >>> lifetime.
> > > >>>
> > > >>> -Tim