Was in a meeting ... In 0.94, if you look at HConnectionManager#processBatchCallback(), you would see:
MultiAction<R> actions = actionsByServer.get(loc);
if (actions == null) {
  actions = new MultiAction<R>();
  actionsByServer.put(loc, actions);
}

where:

Map<HRegionLocation, MultiAction<R>> actionsByServer =
    new HashMap<HRegionLocation, MultiAction<R>>();

And HRegionLocation#hashCode() is defined as:

public int hashCode() {
  return this.serverName.hashCode();
}

So the grouping happens at the region server level.

Cheers

On Wed, Jul 31, 2013 at 11:00 AM, Pablo Medina <pablomedin...@gmail.com> wrote:

> Isn't that a job for the multiGet on the client side? I mean, when you
> provide a list of gets, the client groups them by region and region
> server, and then submits a job to its executor in order to call the
> region servers in parallel. Is that what you mean, right?
>
> 2013/7/31 Ted Yu <yuzhih...@gmail.com>
>
> > From the information Demian provided in the first email:
> >
> > bq. a table containing 20 million keys split automatically by HBase
> > into 4 regions and balanced across 3 region servers
> >
> > I think the number of regions should be increased through (manual)
> > splitting so that the data is spread more evenly across servers.
> >
> > If the Gets are scattered across the whole key space, there is some
> > optimization the client can do: namely, group the Gets by region
> > boundary and issue a multi get per region.
> >
> > Please also refer to http://hbase.apache.org/book.html#rowkey.design,
> > especially 6.3.2.
> >
> > Cheers
> >
> > On Wed, Jul 31, 2013 at 10:14 AM, Dhaval Shah
> > <prince_mithi...@yahoo.co.in> wrote:
> >
> > > Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems
> > > like the 500 Gets are executed sequentially on the region server.
> > >
> > > Also, 3k requests per minute = 50 requests per second. Assuming your
> > > requests take 1 sec (which seems really long, but who knows), then
> > > you need at least 50 threads/region server handlers to handle these.
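
To make that grouping concrete, here is a minimal, self-contained sketch of the same pattern: actions are bucketed into one batch per server, so Gets that land on the same region server share a single multi call. The locateServer() function and the server names are made-up stand-ins for the client's real region lookup, not HBase API:

```java
import java.util.*;

public class GroupBatches {
    // Stand-in for the client's region lookup: maps a row key to the
    // server hosting its region (hypothetical boundaries, for illustration).
    static String locateServer(String rowKey) {
        if (rowKey.compareTo("g") < 0) return "rs1.example.com";
        if (rowKey.compareTo("p") < 0) return "rs2.example.com";
        return "rs3.example.com";
    }

    // Mirrors the actionsByServer pattern quoted above:
    // one batch per server location, built lazily.
    static Map<String, List<String>> groupByServer(List<String> rowKeys) {
        Map<String, List<String>> actionsByServer = new HashMap<>();
        for (String key : rowKeys) {
            String loc = locateServer(key);
            List<String> actions = actionsByServer.get(loc);
            if (actions == null) {
                actions = new ArrayList<>();
                actionsByServer.put(loc, actions);
            }
            actions.add(key);
        }
        return actionsByServer;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "zebra", "banana", "kiwi");
        Map<String, List<String>> batches = groupByServer(keys);
        System.out.println(batches.size());                 // 3 servers involved
        System.out.println(batches.get("rs1.example.com")); // [apple, banana]
    }
}
```

The point is that the number of RPCs is bounded by the number of distinct servers, not by the number of keys.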
> > > The default for that number on some older versions of HBase is 10,
> > > which means you are running out of threads. Which brings up the
> > > following questions:
> > > What version of HBase are you running?
> > > How many region server handlers do you have?
> > >
> > > Regards,
> > > Dhaval
> > >
> > > ----- Original Message -----
> > > From: Demian Berjman <dberj...@despegar.com>
> > > To: user@hbase.apache.org
> > > Cc:
> > > Sent: Wednesday, 31 July 2013 11:12 AM
> > > Subject: Re: help on key design
> > >
> > > Thanks for the responses!
> > >
> > > > why don't you use a scan
> > > I'll try that and compare it.
> > >
> > > > How much memory do you have for your region servers? Have you
> > > > enabled block caching? Is your CPU spiking on your region servers?
> > > Block caching is enabled. CPU and memory don't seem to be a problem.
> > >
> > > We think we are saturating a region because of the quantity of keys
> > > requested. In that case my question would be: is asking for 500+
> > > keys per request a normal scenario?
> > >
> > > Cheers,
> > >
> > > On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina
> > > <pablomedin...@gmail.com> wrote:
> > >
> > > > The scan can be an option if the cost of scanning undesired cells
> > > > and discarding them through filters is better than accessing those
> > > > keys individually. I would say that as the number of 'undesired'
> > > > cells decreases, the scan's overall performance/efficiency
> > > > increases. It all depends on how the keys are designed to be
> > > > grouped together.
> > > >
> > > > 2013/7/30 Ted Yu <yuzhih...@gmail.com>
> > > >
> > > > > Please also go over http://hbase.apache.org/book.html#perf.reading
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah
> > > > > <prince_mithi...@yahoo.co.in> wrote:
> > > > >
> > > > > > If all your keys are grouped together, why don't you use a
> > > > > > scan with start/end key specified?
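
For intuition on what a start/end-key scan buys here: if the grouped keys sort contiguously, one range read replaces N point lookups. Below is a stand-in sketch using a sorted map to model the start-inclusive / stop-exclusive semantics of an HBase scan; the key names are invented, and this is plain Java, not the HBase client API:

```java
import java.util.*;

public class RangeScanSketch {
    // Models Scan(startRow, stopRow): start inclusive, stop exclusive,
    // one sequential pass over the contiguous group.
    static List<String> scanRange(NavigableMap<String, String> rows,
                                  String start, String stop) {
        return new ArrayList<>(rows.subMap(start, true, stop, false).keySet());
    }

    public static void main(String[] args) {
        // Rows sorted by key, as a region stores them.
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("user42#order001", "v1");
        rows.put("user42#order002", "v2");
        rows.put("user42#order003", "v3");
        rows.put("user99#order001", "v4");

        // '$' sorts just after '#', so this bounds exactly the user42 group.
        System.out.println(scanRange(rows, "user42#", "user42$"));
        // [user42#order001, user42#order002, user42#order003]
    }
}
```

With a tight grouping like this, the scan touches only the wanted cells; the looser the grouping, the more a scan reads and discards, which is the trade-off discussed below.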
> > > > > > A sequential scan can theoretically be faster than MultiGet
> > > > > > lookups (assuming your grouping is tight; you can also use
> > > > > > filters with the scan to get better performance).
> > > > > >
> > > > > > How much memory do you have for your region servers? Have you
> > > > > > enabled block caching? Is your CPU spiking on your region
> > > > > > servers?
> > > > > >
> > > > > > If you are saturating the resources on your *hot* region
> > > > > > server, then yes, having more region servers will help. If
> > > > > > not, then something else is the bottleneck and you probably
> > > > > > need to dig further.
> > > > > >
> > > > > > Regards,
> > > > > > Dhaval
> > > > > >
> > > > > > ________________________________
> > > > > > From: Demian Berjman <dberj...@despegar.com>
> > > > > > To: user@hbase.apache.org
> > > > > > Sent: Tuesday, 30 July 2013 4:37 PM
> > > > > > Subject: help on key design
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I would like to explain our use case of HBase, the row key
> > > > > > design, and the problems we are having, so anyone can give us
> > > > > > a hand:
> > > > > >
> > > > > > The first thing we noticed is that our data set is small
> > > > > > compared to other cases we have read about on the list and in
> > > > > > forums. We have a table containing 20 million keys, split
> > > > > > automatically by HBase into 4 regions and balanced across 3
> > > > > > region servers. We have designed our key to keep together the
> > > > > > set of keys requested by our app. That is, when we request a
> > > > > > set of keys, we expect them to be grouped together to improve
> > > > > > data locality and block cache efficiency.
> > > > > >
> > > > > > The second thing we noticed, compared to other cases, is that
> > > > > > we retrieve a bunch of keys per request (approx. 500).
> > > > > > Thus, during our peaks (3k requests per minute), we have a
> > > > > > lot of requests going to a particular region server and
> > > > > > asking for a lot of keys. That results in poor response times
> > > > > > (on the order of seconds). Currently we are using multi gets.
> > > > > >
> > > > > > We think an improvement would be to spread the keys
> > > > > > (introducing a randomized component in them) across more
> > > > > > region servers, so each region server will have to handle
> > > > > > fewer keys and probably fewer requests. That way the multi
> > > > > > gets will be spread over the region servers.
> > > > > >
> > > > > > Our questions:
> > > > > >
> > > > > > 1. Is this design of asking for so many keys on each request
> > > > > > correct? (if you need high performance)
> > > > > > 2. What about splitting across more region servers? Is it a
> > > > > > good idea? How could we accomplish this? We thought about
> > > > > > applying some hashing...
> > > > > >
> > > > > > Thanks in advance!
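
On question 2, the hashing idea sketched at the end of the original email is usually done by salting: prefix each row key with a small, deterministic bucket derived from a hash of the key, then pre-split the table on the bucket boundaries. A minimal sketch in plain Java, assuming a made-up bucket count and key format (not any HBase API):

```java
public class SaltedKeys {
    // Illustrative bucket count; in practice pick something like the
    // number of region servers, or a small multiple of it.
    static final int BUCKETS = 16;

    // Deterministic salt: the same logical key always lands in the same
    // bucket, so readers can recompute the full key without any lookup.
    static String saltedKey(String rowKey) {
        int bucket = (rowKey.hashCode() & Integer.MAX_VALUE) % BUCKETS;
        return String.format("%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        String key = "flight:EZE-MIA:2013-07-30"; // hypothetical app key
        System.out.println(saltedKey(key));
        // Deterministic: recomputing yields the identical salted key.
        System.out.println(saltedKey(key).equals(saltedKey(key))); // true
    }
}
```

The trade-off to keep in mind: salting spreads the write and read load across servers, but it destroys exactly the key grouping the original design relied on, so a logical group now fans out into up to BUCKETS separate lookups (or BUCKETS parallel scans), and plain range scans over the original key order are no longer possible.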