256M = default MAX_FILESIZE (HBase, per region)
64K  = default HBase block size
64M  = default HDFS block size
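Given those defaults, a quick back-of-envelope calculation (just arithmetic, using the ~4K row size Oleg mentions below) shows how many rows share one HBase block and how many blocks a single scanner fetch of 50 rows spans:

```java
// Back-of-envelope: rows per HBase block and blocks spanned by one
// scanner fetch. The ~4K row size is Oleg's figure from this thread;
// the 64K block size is the HBase default listed above.
public class ScanCachingMath {
    public static void main(String[] args) {
        int rowSize = 4 * 1024;             // ~4K per record
        int hbaseBlockSize = 64 * 1024;     // default HBase block size

        int rowsPerBlock = hbaseBlockSize / rowSize;
        System.out.println("rows per HBase block: " + rowsPerBlock);

        int caching = 50;                   // hbase.client.scanner.caching
        // Round up: how many blocks one fetch of `caching` rows touches.
        int blocksPerFetch =
                (caching * rowSize + hbaseBlockSize - 1) / hbaseBlockSize;
        System.out.println("blocks per fetch of 50 rows: " + blocksPerFetch);
    }
}
```

So with 4K rows, 16 rows fit in a block and a caching value of 50 already spans about 4 blocks; any value at or below 16 only ever re-reads the block that is already in the block cache.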
If you look at a table definition in the HBase master UI, you can see the settings for your table. Like this:

{NAME => 'inrdb_rir_stats', MAX_FILESIZE => '268435456', FAMILIES => [
  {NAME => 'data', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
  {NAME => 'meta', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

Also, have a look here to see how HBase stores data:
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html


On 11 nov 2010, at 14:11, Michael Segel wrote:

>
> Correct me if I'm wrong, but isn't HBase's default block size 256MB while
> Hadoop's default block size is 64MB?
>
>
>> From: fvanvollenho...@xebia.com
>> To: user@hbase.apache.org
>> Subject: Re: scan performance improvement
>> Date: Thu, 11 Nov 2010 13:08:56 +0000
>>
>> Not that block size (that's the HDFS one), but the HBase block size. You set
>> it at table creation or it uses the default of 64K.
>>
>> The description of hbase.client.scanner.caching says:
>> Number of rows that will be fetched when calling next
>> on a scanner if it is not served from memory. Higher caching values
>> will enable faster scanners but will eat up more memory and some
>> calls of next may take longer and longer times when the cache is empty.
>>
>> That means that it will pre-fetch that number of rows if the next row does
>> not come from memory. So if your rows are small enough to fit 100 of them in
>> one block, it doesn't matter whether you pre-fetch 1, 50 or 99, because it
>> will only go to disk when it exhausts the whole block, which sticks in the
>> block cache. So it will still fetch the same amount of data from disk every time.
>> If you increase the number to a value that is certain to load multiple
>> blocks at a time from disk, it will increase performance.
>>
>>
>> On 11 nov 2010, at 12:55, Oleg Ruchovets wrote:
>>
>>> Yes, I thought about a large number, so as you said it depends on block size.
>>> Good point.
>>>
>>> I have one record of ~4K.
>>> The block size is:
>>>
>>> <property>
>>>   <name>dfs.block.size</name>
>>>   <value>268435456</value>
>>>   <description>HDFS blocksize of 256MB for large file-systems.
>>>   </description>
>>> </property>
>>>
>>> What is the number that I have to choose?
>>> I am afraid that using a number which equals one block brings a
>>> socketTimeOutException? Am I right?
>>>
>>> Thanks, Oleg.
>>>
>>>
>>> On Thu, Nov 11, 2010 at 1:30 PM, Friso van Vollenhoven <
>>> fvanvollenho...@xebia.com> wrote:
>>>
>>>> How small is small? If it is bytes, then setting the value to 50 is not so
>>>> much different from 1, I suppose. If 50 rows fit in one block, it will just
>>>> fetch one block whether the setting is 1 or 50. You might want to try a
>>>> larger value. It should be fine if the records are small and you need them
>>>> all on the client side anyway.
>>>>
>>>> It also depends on the block size, of course. When you only ever do full
>>>> scans on a table and little random access, you might want to increase that.
>>>>
>>>> Friso
>>>>
>>>>
>>>> On 11 nov 2010, at 12:15, Oleg Ruchovets wrote:
>>>>
>>>>> Hi,
>>>>> To improve client performance I changed
>>>>> hbase.client.scanner.caching from 1 to 50.
>>>>> After running the client with the new value (hbase.client.scanner.caching = 50),
>>>>> it didn't improve execution time at all.
>>>>>
>>>>> I have ~9 million small records.
>>>>> I have to do a full scan, so it brings all 9 million records to the client.
>>>>> My assumption was that this change would bring a significant improvement,
>>>>> but it did not.
>>>>>
>>>>> Additional information:
>>>>> I scan a table which has 100 regions
>>>>> 5 servers
>>>>> 20 maps
>>>>> 4 concurrent maps
>>>>> The scan process takes 5.5 - 6 hours. As for me, that is too much time. Am I
>>>>> right? And how can I improve it?
>>>>>
>>>>> I changed the value in all hbase-site.xml files and restarted HBase.
>>>>>
>>>>> Any suggestions?
>>>>
>>>>
>>
>
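For what it's worth, a rough calculation of the aggregate scan rate from the numbers in this thread (9 million rows of ~4K each in about 5.5 hours; just arithmetic, not a measurement):

```java
// Rough aggregate throughput for the scan described in the thread:
// ~9 million rows of ~4K each, completing in about 5.5 hours.
public class ScanThroughput {
    public static void main(String[] args) {
        long rows = 9_000_000L;
        long rowBytes = 4L * 1024;
        long totalBytes = rows * rowBytes;       // ~36.9 GB in total
        double seconds = 5.5 * 3600;             // 5.5 hours
        double mbPerSec = totalBytes / seconds / (1024 * 1024);
        System.out.printf("total ~%.1f GB, aggregate ~%.2f MB/s%n",
                totalBytes / 1e9, mbPerSec);
    }
}
```

That comes out to under 2 MB/s aggregate across 5 servers, which is far below sequential disk read rates, so the time is more likely going into per-next() RPC round trips and task overhead than into disk I/O. That is consistent with the advice above: raise the caching value well past one block's worth of rows (via hbase.client.scanner.caching, or per scan with Scan.setCaching()) so each round trip carries several blocks of data.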