If you split that one hot region and move one of the halves to another region server, you move half of the load off the hot region server. The set of hot keys will then be spread over 2 region servers instead of one.
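For illustration, something like the following should force such a split at a chosen key. This is only a rough sketch against the pre-1.0 HBaseAdmin API (exact signatures vary a bit across versions); the table name and split key are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class SplitHotRegion {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          // Split the region of "mytable" that contains this key, at this key.
          // Pick a key roughly in the middle of the hot key range.
          admin.split("mytable", "row-midpoint-key");
        } finally {
          admin.close();
        }
      }
    }

After the split, the balancer (or the shell's 'move' command) can place one of the daughter regions on a different region server. The shell's "split 'mytable', 'row-midpoint-key'" does the same thing interactively.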
2013/7/31 Michael Segel <mse...@segel.com>

> 4 regions on 3 servers?
> I'd say that they were already balanced.
>
> The issue is that when they do their get(s) they are hitting one region.
> So more splits isn't the answer.
>
>
> On Jul 31, 2013, at 12:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> From the information Demian provided in the first email:
>>
>> bq. a table containing 20 million keys split automatically by HBase
>> in 4 regions and balanced in 3 region servers
>>
>> I think the number of regions should be increased through (manual)
>> splitting so that the data is spread more evenly across servers.
>>
>> If the Get's are scattered across the whole key space, there is some
>> optimization the client can do. Namely, group the Get's by region
>> boundary and issue a multi get per region.
>>
>> Please also refer to http://hbase.apache.org/book.html#rowkey.design,
>> especially 6.3.2.
>>
>> Cheers
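Regarding the grouping Ted suggests above, a sketch of it could look like the code below (written against the pre-1.0 HTable client API; the table handle and row keys are whatever the application already has). Each per-region batch could then be issued from its own thread:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;

    public class GroupedMultiGet {
      // Group the requested rows by the region that hosts them and issue
      // one multi-get per region instead of one big multi-get for all rows.
      public static Map<String, Result[]> getByRegion(HTable table, List<byte[]> rows)
          throws Exception {
        Map<String, List<Get>> byRegion = new HashMap<String, List<Get>>();
        for (byte[] row : rows) {
          String region =
              table.getRegionLocation(row).getRegionInfo().getRegionNameAsString();
          List<Get> gets = byRegion.get(region);
          if (gets == null) {
            gets = new ArrayList<Get>();
            byRegion.put(region, gets);
          }
          gets.add(new Get(row));
        }
        Map<String, Result[]> results = new HashMap<String, Result[]>();
        for (Map.Entry<String, List<Get>> entry : byRegion.entrySet()) {
          // One multi-get per region; these calls are independent of each other.
          results.put(entry.getKey(), table.get(entry.getValue()));
        }
        return results;
      }
    }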
>> On Wed, Jul 31, 2013 at 10:14 AM, Dhaval Shah
>> <prince_mithi...@yahoo.co.in> wrote:
>>
>>> Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems
>>> like the 500 Gets are executed sequentially on the region server.
>>>
>>> Also, 3k requests per minute = 50 requests per second. Assuming your
>>> requests take 1 sec (which seems really long, but who knows), you need
>>> at least 50 threads/region server handlers to handle these. The default
>>> for that number on some older versions of HBase is 10, which means you
>>> are running out of threads. Which brings up the following questions:
>>> What version of HBase are you running?
>>> How many region server handlers do you have?
>>>
>>> Regards,
>>> Dhaval
>>>
>>>
>>> ----- Original Message -----
>>> From: Demian Berjman <dberj...@despegar.com>
>>> To: user@hbase.apache.org
>>> Cc:
>>> Sent: Wednesday, 31 July 2013 11:12 AM
>>> Subject: Re: help on key design
>>>
>>> Thanks for the responses!
>>>
>>>> why don't you use a scan
>>> I'll try that and compare it.
>>>
>>>> How much memory do you have for your region servers? Have you enabled
>>>> block caching? Is your CPU spiking on your region servers?
>>> Block caching is enabled. CPU and memory don't seem to be a problem.
>>>
>>> We think we are saturating a region because of the quantity of keys
>>> requested. In that case my question would be whether asking for 500+
>>> keys per request is a normal scenario.
>>>
>>> Cheers,
>>>
>>>
>>> On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina
>>> <pablomedin...@gmail.com> wrote:
>>>
>>>> The scan can be an option if the cost of scanning undesired cells and
>>>> discarding them through filters is better than accessing those keys
>>>> individually. I would say that as the number of 'undesired' cells
>>>> decreases, the scan's overall performance/efficiency increases. It all
>>>> depends on how the keys are designed to be grouped together.
>>>>
>>>> 2013/7/30 Ted Yu <yuzhih...@gmail.com>
>>>>
>>>>> Please also go over http://hbase.apache.org/book.html#perf.reading
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah
>>>>> <prince_mithi...@yahoo.co.in> wrote:
>>>>>
>>>>>> If all your keys are grouped together, why don't you use a scan with
>>>>>> start/end key specified? A sequential scan can theoretically be
>>>>>> faster than MultiGet lookups (assuming your grouping is tight; you
>>>>>> can also use filters with the scan to give better performance).
>>>>>>
>>>>>> How much memory do you have for your region servers? Have you
>>>>>> enabled block caching? Is your CPU spiking on your region servers?
>>>>>>
>>>>>> If you are saturating the resources on your *hot* region server,
>>>>>> then yes, having more region servers will help. If not, then
>>>>>> something else is the bottleneck and you probably need to dig
>>>>>> further.
>>>>>>
>>>>>> Regards,
>>>>>> Dhaval
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Demian Berjman <dberj...@despegar.com>
>>>>>> To: user@hbase.apache.org
>>>>>> Sent: Tuesday, 30 July 2013 4:37 PM
>>>>>> Subject: help on key design
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I would like to explain our use case of HBase, the row key design
>>>>>> and the problems we are having, so anyone can give us a hand:
>>>>>>
>>>>>> The first thing we noticed is that our data set is quite small
>>>>>> compared to other cases we read about on the list and forums. We
>>>>>> have a table containing 20 million keys, split automatically by
>>>>>> HBase into 4 regions and balanced across 3 region servers. We have
>>>>>> designed our key to keep together the set of keys requested by our
>>>>>> app. That is, when we request a set of keys we expect them to be
>>>>>> grouped together to improve data locality and block cache
>>>>>> efficiency.
>>>>>>
>>>>>> The second thing we noticed, compared to other cases, is that we
>>>>>> retrieve a bunch of keys per request (approx. 500). Thus, during our
>>>>>> peaks (3k requests per minute), we have a lot of requests going to a
>>>>>> particular region server and asking for a lot of keys. That results
>>>>>> in poor response times (on the order of seconds). Currently we are
>>>>>> using multi gets.
>>>>>>
>>>>>> We think an improvement would be to spread the keys (introducing a
>>>>>> randomized component in them) across more region servers, so each
>>>>>> region server will have to handle fewer keys and probably fewer
>>>>>> requests. That way the multi gets will be spread over the region
>>>>>> servers.
>>>>>>
>>>>>> Our questions:
>>>>>>
>>>>>> 1. Is this design of asking for so many keys on each request
>>>>>> correct? (if you need high performance)
>>>>>> 2. What about splitting across more region servers? Is it a good
>>>>>> idea? How could we accomplish this? We thought of applying some
>>>>>> hashing...
>>>>>>
>>>>>> Thanks in advance!
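On question 2 (the "randomized component" / hashing): a common way to do that without losing point lookups is a deterministic salt prefix. A minimal sketch follows; the bucket count is just an example, the table would be pre-split into one region per bucket, and reads must recompute the same prefix. Note that contiguous scans over the logical keys are lost with this layout; a full logical scan becomes one scan per bucket.

    import java.util.Arrays;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKey {
      // Assumption: choose a bucket count at least as large as the number
      // of region servers you want the load spread across.
      private static final int BUCKETS = 16;

      // Prefix the logical key with a one-byte bucket derived from its hash.
      // The same logical key always gets the same prefix, so Gets still work.
      public static byte[] toRowKey(byte[] logicalKey) {
        int bucket = (Arrays.hashCode(logicalKey) & Integer.MAX_VALUE) % BUCKETS;
        return Bytes.add(new byte[] { (byte) bucket }, logicalKey);
      }
    }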