Ok. I see it in TableInputFormat:

    // false by default, full table scans generate too much BC churn
    scan.setCacheBlocks((conf.getBoolean(SCAN_CACHEBLOCKS, false)));
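If I read it right, though, that branch only runs when TableInputFormat builds the Scan from individual job properties; a Scan serialized through initTableMapperJob keeps its own setting, which defaults to true. What I set up manually in my own jobs looks roughly like this (just a sketch; MyMapper's body, "mytable" and the caching value are placeholders to adapt):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class FullScanJob {

      // Identity mapper, only for illustration.
      static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          context.write(row, value);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "full-table-scan");
        job.setJarByClass(FullScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // rows per RPC; tune for your row size
        scan.setCacheBlocks(false);  // don't churn the block cache on a full scan

        TableMapReduceUtil.initTableMapperJob(
            "mytable",                     // placeholder table name
            scan,
            MyMapper.class,
            ImmutableBytesWritable.class,  // mapper output key
            Result.class,                  // mapper output value
            job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }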
So we need to do it in initTableMapperJob too, I guess...

Thanks,

JM

2014-04-11 16:53 GMT-04:00 lars hofhansl <la...@apache.org>:

> Yep. For all of our M/R jobs we do indeed disable the caching of blocks.
> In fact TableInputFormat sets cache blocks to false currently anyway.
>
> -- Lars
>
> ------------------------------
> *From:* Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> *To:* user <user@hbase.apache.org>; lars hofhansl <la...@apache.org>
> *Sent:* Friday, April 11, 2014 6:54 AM
> *Subject:* Re: BlockCache for large scans.
>
> Hi Lars,
>
> So just to continue on that: when we do MR jobs with HBase, this should
> be disabled too, since we will read the entire table, right? Is this done
> by default, or is it something the client should set up manually? In my
> own code I set this up manually. I looked into
> TableMapReduceUtil.initTableMapperJob and there is nothing there. Should
> we not just set cacheBlocks to false in initTableMapperJob directly?
>
> JM
>
> 2014-04-10 14:50 GMT-04:00 lars hofhansl <la...@apache.org>:
>
> Generally (and this is database lore, not just HBase), if you use an
> LRU-type cache, your working set does not fit into the cache, and you
> repeatedly scan this working set, you have created the worst-case
> scenario: the database does all the work of caching the blocks, and
> subsequent scans will need blocks that were just evicted towards the end
> of the previous scan.
>
> For large scans where it is likely that the entire scan does not fit
> into the block cache, you should absolutely disable caching of the
> blocks traversed by the scan (i.e. scan.setCacheBlocks(false)). Index
> blocks are not affected; they are cached regardless.
>
> -- Lars
>
> ________________________________
> From: gortiz <gor...@pragsis.com>
> To: user@hbase.apache.org
> Sent: Wednesday, April 9, 2014 11:37 PM
> Subject: Re: BlockCache for large scans.
>
> But I think there's a direct relation between improving performance for
> large scans and memory for the memstore. As far as I understand, the
> memstore just works as a cache for write operations.
>
> On 09/04/14 23:44, Ted Yu wrote:
> > Didn't quite get what you mean, Asaf.
> >
> > If you're talking about HBASE-5349, please read the release note of
> > HBASE-5349.
> >
> > By default, the memstore min/max range is initialized to the memstore
> > percent:
> >
> >   globalMemStorePercentMinRange = conf.getFloat(
> >       MEMSTORE_SIZE_MIN_RANGE_KEY, globalMemStorePercent);
> >   globalMemStorePercentMaxRange = conf.getFloat(
> >       MEMSTORE_SIZE_MAX_RANGE_KEY, globalMemStorePercent);
> >
> > Cheers
> >
> > On Wed, Apr 9, 2014 at 3:17 PM, Asaf Mesika <asaf.mes...@gmail.com>
> > wrote:
> >
> >> The JIRA says it's enabled automatically. Is there an official
> >> document explaining this feature?
> >>
> >> On Wednesday, April 9, 2014, Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >>> Please take a look at http://www.n10k.com/blog/blockcache-101/
> >>>
> >>> For D, hbase.regionserver.global.memstore.size is specified as a
> >>> percentage of the heap, unless you enable HBASE-5349 'Automagically
> >>> tweak global memstore and block cache sizes based on workload'.
> >>>
> >>> On Wed, Apr 9, 2014 at 12:24 AM, gortiz <gor...@pragsis.com> wrote:
> >>>
> >>>> I've been reading the Definitive Guide and HBase in Action a
> >>>> little. I found this question from Cloudera that I'm not sure
> >>>> about, after looking at some benchmarks and documentation for
> >>>> HBase. Could someone explain it to me a little?
> >>>> I think that when you do a large scan you should disable the block
> >>>> cache, because the blocks are going to be swapped in and out a lot,
> >>>> so you don't get anything from the cache. I guess you are also
> >>>> penalized, since you're spending memory, GC and CPU on this task.
> >>>>
> >>>> *You want to do a full table scan on your data. You decide to
> >>>> disable block caching to see if this improves scan performance.
> >>>> Will disabling block caching improve scan performance?*
> >>>>
> >>>> A. No. Disabling block caching does not improve scan performance.
> >>>>
> >>>> B. Yes. When you disable block caching, you free up that memory for
> >>>> other operations. With a full table scan, you cannot take advantage
> >>>> of block caching anyway because your entire table won't fit into
> >>>> cache.
> >>>>
> >>>> C. No. If you disable block caching, HBase must read each block
> >>>> index from disk for each scan, thereby decreasing scan performance.
> >>>>
> >>>> D. Yes. When you disable block caching, you free up memory for
> >>>> MemStore, which improves scan performance.
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
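For reference, the client-side pattern Lars describes above (disabling block caching on a plain, non-MR large scan) is the same idea. A minimal sketch against the 0.94/0.98-era client API, with "mytable" and the caching value as placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class LargeScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // placeholder table name
        try {
          Scan scan = new Scan();
          scan.setCaching(1000);       // rows per RPC round trip
          scan.setCacheBlocks(false);  // don't insert this scan's data blocks into the LRU cache
          ResultScanner scanner = table.getScanner(scan);
          try {
            for (Result r : scanner) {
              // process r ...
            }
          } finally {
            scanner.close();
          }
        } finally {
          table.close();
        }
      }
    }

As Lars notes, index blocks are cached regardless, so lookups stay cheap while the scan's data blocks skip the block cache.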