Clara Xiong created HBASE-26178:
-----------------------------------

             Summary: Improve data structure for BalanceClusterState to improve 
computation speed for large cluster
                 Key: HBASE-26178
                 URL: https://issues.apache.org/jira/browse/HBASE-26178
             Project: HBase
          Issue Type: Bug
            Reporter: Clara Xiong


With ~800 node and ~500 regions per node on our large production cluster, 
balancer cannot complete within hours even after we just add 2% servers after 
maintenance. 

The unit tests with larger number of regions are taking longer and longer with 
changes to balancer with recent changes too, evident with the increment of the 
time limit recent PR's included.

It is time to replace some of the data structure for better time complexity 
including:

int[][] regionsPerServer; // serverIndex -> region list

int[][] regionsPerHost; // hostIndex -> list of regions
int[][] regionsPerRack; // rackIndex -> region list

// serverIndex -> sorted list of regions by primary region index
ArrayList<HashSet<Integer>> primariesOfRegionsPerServer;

// hostIndex -> sorted list of regions by primary region index
int[][] primariesOfRegionsPerHost;

// rackIndex -> sorted list of regions by primary region index
int[][] primariesOfRegionsPerRack;

which needs O(n) time to update for every move test iteration. (n = number of 
regions per server/host/rack).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to