If I understand you right, you are asking about how region splitting works ... See http://hbase.apache.org/book/regions.arch.html section 9.7.4
In a nutshell, the parent region on your RS1 will split into two daughter regions on the same RS1. If you have the load balancer turned on, the master can then "reassign" the daughter regions to other RegionServers based on the number of regions being served by each RS. This is unrelated to how many requests RSn may be receiving; the "region load" in that decision is currently just the number of regions per RS.

The scheme you describe below would only work in a very "static" data / region assignment scenario, where a region always sticks to the same RS until you manually move it around (load balancer turned off, region size tuned up). See the two sketches after the quoted message below.

This is a highly recommended read:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

If you are worried about latency, I hope you have also read up on the block cache and the MemStore, and on sizing them appropriately for your workload.

--Suraj

On Fri, Jun 29, 2012 at 10:15 AM, Ramchander Varadarajan
<ram...@yahoo-inc.com> wrote:
> Hi all,
>
> We are evaluating HBase to store some metadata information on a very
> large scale. As of now, our architecture looks like this.
>
> Machine 1:
> Runs Client 1
> Runs Region Server 1
> Runs Data Node 1
>
> Machine n:
> Runs Client n
> Runs Region Server n
> Runs Data Node n
>
> Now, say we have only one region for the data set at the moment and it
> is maxing out, and that region is on Region Server 1. If a flood of new
> requests comes in to Machine n and it tries to store the data, will
> Region Server n store it locally on its Data Node n, or will the
> requests be routed to Region Server 1 and a new region created there
> after it splits?
>
> The reason I ask is that I want to see if a client can be made sticky
> to a region server. That way, if a user with id 1111 comes in, he will
> be sent to Client 1 all the time, because we know Region Server 1 will
> have his region. We will know that by using his id to figure that out
> upfront. Just trying to minimize the latency further. (Of course I
> understand that if nodes are down, there will be ways to route the
> traffic to another host to handle the users that fall in that bucket.)
>
> thanks in advance
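By the way, the client API already exposes the row-to-region mapping, so a client can ask which RegionServer currently serves a given id instead of hard-coding that mapping yourself. A minimal sketch, assuming the 0.92+ Java client; the table name "user_metadata" and the row key "1111" are placeholders for your own schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // "user_metadata" is a placeholder table name
    HTable table = new HTable(conf, "user_metadata");
    try {
      // Ask the cluster which region (and RegionServer) holds row "1111"
      HRegionLocation loc = table.getRegionLocation(Bytes.toBytes("1111"));
      System.out.println("Row 1111 is served by " + loc.getHostname() + ":"
          + loc.getPort() + " (region "
          + loc.getRegionInfo().getRegionNameAsString() + ")");
    } finally {
      table.close();
    }
  }
}

Keep in mind the answer can change whenever the balancer moves a region or the region splits, so this is a lookup to repeat rather than cache forever (the client library already caches and refreshes these locations for normal reads and writes).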
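And if you do go the static-assignment route, the usual knobs are turning the balancer off and placing regions by hand. A rough sketch, again assuming the 0.92+ client; the encoded region name and the destination server name below are made-up placeholders (move() expects the encoded region name and the target's "host,port,startcode" server name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PinRegion {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Stop the master from reassigning regions on its own
      admin.balanceSwitch(false);
      // Move one region to a specific RegionServer; both arguments are
      // placeholders -- take the encoded region name from the master UI
      // and the destination's "host,port,startcode" server name
      admin.move(Bytes.toBytes("1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d"),
                 Bytes.toBytes("rs1.example.com,60020,1340000000000"));
    } finally {
      admin.close();
    }
  }
}

That only really makes sense with region sizes tuned up so splits stay rare, as noted above.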