Re: scan performance improvement

2010-11-11 Thread Friso van Vollenhoven
How small is small? If it is bytes, then setting the value to 50 is not so much different from 1, I suppose. If 50 rows fit in one block, it will just fetch one block whether the setting is 1 or 50. You might want to try a larger value. It should be fine if the records are small and you need the…
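
For reference, a minimal sketch of setting the caching value for a single scan, assuming the 0.90-era Java client API (the table name and the value 500 are made up for illustration):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachingScan {
        public static void main(String[] args) throws IOException {
            HTable table = new HTable(HBaseConfiguration.create(), "mytable");
            Scan scan = new Scan();
            // Rows buffered per next() round trip; overrides the
            // hbase.client.scanner.caching default for this scan only.
            scan.setCaching(500);
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // process each row here
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }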

Re: scan performance improvement

2010-11-11 Thread Oleg Ruchovets
Yes, I thought about a large number, and you said it depends on the block size. Good point. I have one record of ~4K, and the block size is: dfs.block.size = 268435456 (an HDFS block size of 256MB for large file systems). What number should I choose? I am afraid that using a number which is…
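
Back-of-the-envelope, with ~4 KB rows the caching value translates into a per-RPC payload roughly like this (the caching values are only examples):

    4 KB/row x caching 50    =  ~200 KB transferred per next() RPC
    4 KB/row x caching 1000  =    ~4 MB transferred per next() RPC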

Re: scan performance improvement

2010-11-11 Thread Friso van Vollenhoven
Not that block size (that's the HDFS one), but the HBase block size. You set it at table creation or it uses the default of 64K. The description of hbase.client.scanner.caching says: "Number of rows that will be fetched when calling next on a scanner if it is not served from memory. Higher caching values will enable faster scanners but will eat up more memory."
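
For illustration, the family-level block size can be given at table creation in the shell (the table and family names and the 16K value here are made up):

    create 'mytable', {NAME => 'cf', BLOCKSIZE => '16384'}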

RE: scan performance improvement

2010-11-11 Thread Michael Segel
Correct me if I'm wrong, but isn't HBase's default block size 256MB while Hadoop's default block size is 64MB?

Re: scan performance improvement

2010-11-11 Thread Friso van Vollenhoven
Also, have a look here to see how HBase stores data: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

Re: scan performance improvement

2010-11-11 Thread Oleg Ruchovets
Great, thank you for the explanation. My table schema is: {NAME => 'URLs_sanity', FAMILIES => [{NAME => 'gs', VERSIONS => '1', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'meta-data', VERSIONS => '1', COMPRESS…

Re: scan performance improvement

2010-11-11 Thread Friso van Vollenhoven
> Great, thank you for the explanation. My table schema is: {NAME => 'URLs_sanity', …

Re: scan performance improvement

2010-11-11 Thread Ryan Rawson
I'd be careful about adjusting the HFile block size; we took 64K after benchmarking a bunch of things, and it seemed to be a good performance point. As for scanning small rows, I'd go with a caching size of 1000-3000. When I set my scanners to that, I can pull 50k+ rows/sec from one client.
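
If you want a default in that range for all scans rather than setting it per scan, hbase.client.scanner.caching can be set in hbase-site.xml on the client (the value 1000 is only an example):

    <property>
      <name>hbase.client.scanner.caching</name>
      <value>1000</value>
    </property>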

Re: scan performance improvement

2010-11-11 Thread Oleg Ruchovets
Hi, I didn't change the block size (it is still 64K). I am running a test configured with a caching size of 3600. The test is still running, but I already see that there is NO performance improvement. How can I check that HBase is actually using the changed caching size? Can I see it in the logs or with some debugging?
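
One crude client-side check, sketched against the same 0.90-era API (nothing here reads server logs; the caching values are just examples): run the identical scan once with caching 1 and once with caching 3000 and compare wall-clock time. If the setting is being picked up, the high-caching run should be dramatically faster because it makes far fewer next() round trips.

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class CachingCheck {
        // Scan the whole table with the given caching value and report timing.
        static void timedScan(HTable table, int caching) throws IOException {
            Scan scan = new Scan();
            scan.setCaching(caching);
            long start = System.currentTimeMillis();
            ResultScanner scanner = table.getScanner(scan);
            long rows = 0;
            try {
                for (Result r : scanner) {
                    rows++;
                }
            } finally {
                scanner.close();
            }
            System.out.println("caching=" + caching + ": " + rows + " rows in "
                    + (System.currentTimeMillis() - start) + " ms");
        }

        public static void main(String[] args) throws IOException {
            HTable table = new HTable(HBaseConfiguration.create(), "URLs_sanity");
            timedScan(table, 1);     // one row per round trip
            timedScan(table, 3000);  // thousands of rows per round trip
            table.close();
        }
    }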