Hmm... So that theory is out. Anything strange in the logs?
You have 13 region servers and 13 data nodes colocated on the same machines, I 
assume.

The Gets are actually sent to the involved region servers in parallel, so 
anything more than a few milliseconds is suspect.
How big are the rows/columns that you are retrieving?
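
For reference, the pattern you describe would look roughly like this against the
0.94 client API -- just a sketch, and the table, family, and qualifier names are
placeholders, not your actual schema:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanThenBatchGet {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable first = new HTable(conf, "first_table");    // placeholder table names
    HTable second = new HTable(conf, "second_table");
    try {
      // 1. Scan the first table and collect the rowkeys matching the filter.
      Scan scan = new Scan();
      scan.setCaching(1000);                           // fetch rows in larger batches per RPC
      scan.setFilter(new SingleColumnValueFilter(
          Bytes.toBytes("cf"), Bytes.toBytes("q"),
          CompareOp.EQUAL, Bytes.toBytes("some-value")));
      List<Get> gets = new ArrayList<Get>();
      ResultScanner scanner = first.getScanner(scan);
      for (Result r : scanner) {
        Get g = new Get(r.getRow());
        g.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q2"));  // only the columns you need
        gets.add(g);
      }
      scanner.close();

      // 2. One batched call; the client groups the Gets per region server
      //    and sends them in parallel.
      Result[] results = second.get(gets);
      System.out.println("fetched " + results.length + " rows");
    } finally {
      first.close();
      second.close();
    }
  }
}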


Does this change in any way if you major_compact your table?
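
(From the shell that is just   major_compact '<table>'  ; programmatically, a
rough sketch against the 0.94 admin API -- the table name is a placeholder:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // This only requests the compaction; it runs asynchronously on the region servers.
    admin.majorCompact("your_table");   // placeholder table name
    admin.close();
  }
}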


-- Lars



________________________________
 From: kiran <kiran.sarvabho...@gmail.com>
To: user@hbase.apache.org; lars hofhansl <la...@apache.org> 
Sent: Tuesday, March 5, 2013 10:42 PM
Subject: Re: Miserable Performance of gets
 

Here is the describe output of the table on which I am doing the batch gets:

{NAME => 'XXXXXXX', FAMILIES => [{NAME => 'XXXXX', DATA_BLOCK_ENCODING => 'NONE',
 BLOOMFILTER => 'ROW', TTL => '2147483647', IN_MEMORY => 'false',
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
 MIN_VERSIONS => '1', COMPRESSION_COMPACT => 'SNAPPY', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}  ENABLED: true




On Wed, Mar 6, 2013 at 12:08 PM, kiran <kiran.sarvabho...@gmail.com> wrote:

Yes, I mistook it for the region size. The region size was set to 20GB instead of 
the default 10GB. Our HBase block size is the default 64KB. Our HDFS block size is 
128MB. Our memstore flush size is 512MB.
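
For reference, the per-table equivalents of those settings look roughly like this
in the 0.94 API (a sketch only; cluster-wide the corresponding settings are
hbase.hregion.max.filesize and hbase.hregion.memstore.flush.size in hbase-site.xml,
and the table/family names below are placeholders):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

public class TableSettings {
  public static void main(String[] args) {
    HTableDescriptor desc = new HTableDescriptor("example_table");  // placeholder
    desc.setMaxFileSize(20L * 1024 * 1024 * 1024);      // 20GB max region size
    desc.setMemStoreFlushSize(512L * 1024 * 1024);      // 512MB memstore flush size

    HColumnDescriptor cf = new HColumnDescriptor("cf"); // placeholder family
    cf.setBlocksize(64 * 1024);                         // 64KB HBase block size (the default)
    desc.addFamily(cf);
    // desc would then be passed to HBaseAdmin.createTable(desc) or a modifyTable call.
  }
}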
>
>
>
>
>On Wed, Mar 6, 2013 at 10:59 AM, lars hofhansl <la...@apache.org> wrote:
>
>Arghh... The default is 64K (kilobytes). :)
>>
>>
>>You might have mixed up the region size with the block size. If this is the 
>>actual HBase block size, this behavior is perfectly explained (scans are fast 
>>because fewer blocks are loaded; gets are slow because the entire 20GB - or 
>>probably at least an HDFS block of 128MB - has to be brought in).
>>
>>
>>If you can, please attach the output of   describe '<table>'   run in the 
>>shell in order to confirm.
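>>
>>(A minimal sketch, assuming the 0.94 client API, of checking the configured
>>block size programmatically instead -- the table name is a placeholder:)
>>
>>import org.apache.hadoop.conf.Configuration;
>>import org.apache.hadoop.hbase.HBaseConfiguration;
>>import org.apache.hadoop.hbase.HColumnDescriptor;
>>import org.apache.hadoop.hbase.HTableDescriptor;
>>import org.apache.hadoop.hbase.client.HTable;
>>
>>public class PrintBlockSizes {
>>  public static void main(String[] args) throws Exception {
>>    Configuration conf = HBaseConfiguration.create();
>>    HTable table = new HTable(conf, "your_table");    // placeholder table name
>>    HTableDescriptor desc = table.getTableDescriptor();
>>    for (HColumnDescriptor cf : desc.getColumnFamilies()) {
>>      // BLOCKSIZE is per column family; the default is 65536 (64KB)
>>      System.out.println(cf.getNameAsString() + " BLOCKSIZE=" + cf.getBlocksize());
>>    }
>>    table.close();
>>  }
>>}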
>>
>>
>>
>>-- Lars
>>
>>
>>
>>________________________________
>> From: kiran <kiran.sarvabho...@gmail.com>
>>To: user@hbase.apache.org; lars hofhansl <la...@apache.org>
>>Sent: Tuesday, March 5, 2013 9:24 PM
>>
>>Subject: Re: Miserable Performance of gets
>>
>>Lars,
>>
>>We set the HBase block size to 20GB....
>>
>>Anoop,
>>
>>We have about 13 regionservers and in the worst case these gets may be
>>distributed across all the regionservers...
>>
>>
>>
>>On Wed, Mar 6, 2013 at 10:43 AM, lars hofhansl <la...@apache.org> wrote:
>>
>>> Can you tell us more about your setup?
>>> What does   describe '<your-table>'   in the shell display?
>>>
>>> If I had to make a wild guess, I'd say you made the HBase block size (not
>>> the HDFS block size) too big.
>>>
>>>
>>> Thanks.
>>>
>>> -- Lars
>>>
>>>
>>>
>>> ________________________________
>>>  From: kiran <kiran.sarvabho...@gmail.com>
>>> To: user@hbase.apache.org
>>> Sent: Tuesday, March 5, 2013 9:06 PM
>>> Subject: Re: Miserable Performance of gets
>>>
>>> Version is 0.94.1
>>>
>>> Yes, the gets are issued against the second table after scanning the first table
>>>
>>>
>>> On Wed, Mar 6, 2013 at 10:27 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> > Which HBase version are you using ?
>>> >
>>> > bq. But even for 20 gets
>>> > These were issued against the second table ?
>>> >
>>> > Thanks
>>> >
>>> > On Tue, Mar 5, 2013 at 8:36 PM, kiran <kiran.sarvabho...@gmail.com>
>>> wrote:
>>> >
>>> > > Dear All,
>>> > >
>>> > > I have had a miserable experience with gets (batch gets) in HBase. I have
>>> > > two tables with different rowkeys; the columns are distributed across the
>>> > > two tables.
>>> > >
>>> > > Currently I scan over the first table and collect all the rowkeys matching
>>> > > my filter, then issue a batch get on the other table to retrieve some
>>> > > columns. But even for 20 gets the performance is miserable (almost a second
>>> > > or two, which is not acceptable), while scanning even a few thousand rows
>>> > > completes in milliseconds.
>>> > >
>>> > > My concern is: if about 20 gets take a second or two,
>>> > > how can it scale ??
>>> > > Will the performance be the same even if I issue 1000 gets ??
>>> > > Is it advisable in HBase to avoid gets ??
>>> > >
>>> > > I could also include all the columns in a single table and just scan, but
>>> > > before doing that I need to really understand the issue...
>>> > >
>>> > > Is scanning a better solution for scalability and performance ???
>>> > >
>>> > > Is it advisable not to do joins or normalization in NoSQL databases, i.e. to
>>> > > include all the data in one table and not join against another table ??
>>> > >
>>> > >
>>> > > --
>>> > > Thank you
>>> > > Kiran Sarvabhotla
>>> > >
>>> > > -----Even a correct decision is wrong when it is taken late
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Thank you
>>> Kiran Sarvabhotla
>>>
>>> -----Even a correct decision is wrong when it is taken late
>>>
>>
>>
>>
>>--
>>Thank you
>>Kiran Sarvabhotla
>>
>>-----Even a correct decision is wrong when it is taken late
>
>
>-- 
>
>Thank you
>Kiran Sarvabhotla
>
>-----Even a correct decision is wrong when it is taken late
>
>


-- 

Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late
