256M = HBase default MAX_FILESIZE
64K = default HBase block size
64M = HDFS default block size
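
To make the difference between these sizes concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the ~4k record size mentioned further down the thread, ignores key/value overhead and compression, and assumes each fetch starts at a block boundary. The function names are made up for illustration and are not part of any HBase API.

```python
import math

KB = 1024
MB = 1024 * KB

HBASE_BLOCK_SIZE = 64 * KB   # BLOCKSIZE => '65536'
HDFS_BLOCK_SIZE = 64 * MB    # HDFS default block size
MAX_FILESIZE = 256 * MB      # MAX_FILESIZE => '268435456'
RECORD_SIZE = 4 * KB         # assumption: ~4k per record, as reported in the thread


def rows_per_block(record_size, block_size=HBASE_BLOCK_SIZE):
    """Rough count of whole rows in one HBase block (no overhead, no compression)."""
    return max(1, block_size // record_size)


def blocks_per_fetch(caching, record_size, block_size=HBASE_BLOCK_SIZE):
    """Rough count of HBase blocks that one scanner fetch of `caching` rows spans."""
    return math.ceil(caching * record_size / block_size)


if __name__ == "__main__":
    print("rows per HBase block:", rows_per_block(RECORD_SIZE))
    print("HBase blocks per max-size file:", MAX_FILESIZE // HBASE_BLOCK_SIZE)
    print("HDFS blocks per max-size file:", MAX_FILESIZE // HDFS_BLOCK_SIZE)
    for caching in (1, 16, 50, 1000):
        print("caching", caching, "->", blocks_per_fetch(caching, RECORD_SIZE), "block(s)")
```

Under these assumptions about 16 records fit in one HBase block, so any caching value up to 16 stays within a single block, while a value of 50 spans roughly 4 blocks.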

If you look at a table definition in the HBase master UI, you can see the
settings for your table, like this:
{NAME => 'inrdb_rir_stats', MAX_FILESIZE => '268435456', FAMILIES => [{NAME => 
VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 
'false', BLOCKCACHE => 'true'}, {NAME => 'meta', BLOOMFILTER => 'NONE', 
'2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 

Also, have a look here to see how HBase stores data:

On 11 nov 2010, at 14:11, Michael Segel wrote:

> Correct me if I'm wrong, but isn't hbase's default block size 256MB while 
> hadoop's default blocksize is 64MB?
>> Subject: Re: scan performance improvement
>> Date: Thu, 11 Nov 2010 13:08:56 +0000
>> Not that block size (that's the HDFS one), but the HBase block size. You set 
>> it at table creation or it uses the default of 64K.
>> The description of hbase.client.scanner.caching says:
>> Number of rows that will be fetched when calling next
>> on a scanner if it is not served from memory. Higher caching values
>> will enable faster scanners but will eat up more memory and some
>> calls of next may take longer and longer times when the cache is empty.
>> That means that it will pre-fetch that number of rows, if the next row does 
>> not come from memory. So if your rows are small enough to fit 100 of them in 
>> one block, it doesn't matter whether you pre-fetch 1, 50 or 99, because it 
>> will only go to disk when it exhausts the whole block, which sticks in block 
>> cache. So, it will still fetch the same amount of data from disk every time. 
>> If you increase the number to a value that is certain to load multiple 
>> blocks at a time from disk, it will increase performance.
>> On 11 nov 2010, at 12:55, Oleg Ruchovets wrote:
>>> Yes, I thought about a large number, and you said it depends on the block size.
>>> Good point.
>>> One record is ~4k,
>>> block size is:
>>> <property>
>>> <name>dfs.block.size</name>
>>> <value>268435456</value>
>>> <description>HDFS blocksize of 256MB for large file-systems.
>>> </description>
>>> </property>
>>> What number should I choose? Assuming
>>> I am afraid that using a number equal to one block will lead to a
>>> SocketTimeoutException. Am I right?
>>> Thanks Oleg.
>>> On Thu, Nov 11, 2010 at 1:30 PM, Friso van Vollenhoven wrote:
>>>> How small is small? If it is bytes, then setting the value to 50 is not so
>>>> much different from 1, I suppose. If 50 rows fit in one block, it will just
>>>> fetch one block whether the setting is 1 or 50. You might want to try a
>>>> larger value. It should be fine if the records are small and you need them
>>>> all on the client side anyway.
>>>> It also depends on the block size, of course. When you only ever do full
>>>> scans on a table and little random access, you might want to increase that.
>>>> Friso
>>>> On 11 nov 2010, at 12:15, Oleg Ruchovets wrote:
>>>>> Hi,
>>>>> To improve client performance I changed
>>>>> hbase.client.scanner.caching from 1 to 50.
>>>>> After running the client with the new value
>>>>> (hbase.client.scanner.caching = 50), execution time did not improve at all.
>>>>> I have ~9 million small records.
>>>>> I have to do a full scan, so it brings all 9 million records to the client.
>>>>> My assumption was that this change would bring a significant improvement,
>>>>> but it did not.
>>>>> Additional information:
>>>>> I scan a table which has 100 regions
>>>>> 5 servers
>>>>> 20 maps
>>>>> 4 concurrent maps
>>>>> The scan process takes 5.5 - 6 hours. To me that seems like too much time.
>>>>> Am I right? And how can I improve it?
>>>>> I changed the value in all hbase-site.xml files and restarted HBase.
>>>>> Any suggestions?
