Ok. I see it in TableInputFormat:

    // false by default, full table scans generate too much BC churn
    scan.setCacheBlocks((conf.getBoolean(SCAN_CACHEBLOCKS, false)));
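If I read it right, though, that branch only runs when TableInputFormat builds the Scan from individual job properties; a Scan serialized through initTableMapperJob keeps its own setting, which defaults to true. What I set up manually in my own jobs looks roughly like this (just a sketch; MyMapper's body, "mytable" and the caching value are placeholders to adapt):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class FullScanJob {

      // Identity mapper, only for illustration.
      static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          context.write(row, value);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "full-table-scan");
        job.setJarByClass(FullScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // rows per RPC; tune for your row size
        scan.setCacheBlocks(false);  // don't churn the block cache on a full scan

        TableMapReduceUtil.initTableMapperJob(
            "mytable",                     // placeholder table name
            scan,
            MyMapper.class,
            ImmutableBytesWritable.class,  // mapper output key
            Result.class,                  // mapper output value
            job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }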
So we need to do it in initTableMapperJob too, I guess...

Thanks,

JM

2014-04-11 16:53 GMT-04:00 lars hofhansl <la...@apache.org>:

> Yep. For all of our M/R jobs we do indeed disable the caching of blocks.
> In fact TableInputFormat sets cache blocks to false currently anyway.
>
> -- Lars
>
> ------------------------------
> *From:* Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> *To:* user <user@hbase.apache.org>; lars hofhansl <la...@apache.org>
> *Sent:* Friday, April 11, 2014 6:54 AM
> *Subject:* Re: BlockCache for large scans.
>
> Hi Lars,
>
> So just to continue on that: when we do MR jobs with HBase, this should
> be disabled too, since we will read the entire table, right? Is this done
> by default, or is it something the client should set up manually? In my
> own code I set this up manually. I looked into
> TableMapReduceUtil.initTableMapperJob and there is nothing there. Should
> we not just set cacheBlocks to false in initTableMapperJob directly?
>
> JM
>
> 2014-04-10 14:50 GMT-04:00 lars hofhansl <la...@apache.org>:
>
> Generally (and this is database lore, not just HBase), if you use an
> LRU-type cache, your working set does not fit into the cache, and you
> repeatedly scan this working set, you have created the worst-case
> scenario: the database does all the work of caching the blocks, and
> subsequent scans will need blocks that were just evicted towards the end
> of the previous scan.
>
> For large scans where it is likely that the entire scan does not fit
> into the block cache, you should absolutely disable caching of the
> blocks traversed by the scan (i.e. scan.setCacheBlocks(false)). Index
> blocks are not affected; they are cached regardless.
>
> -- Lars
>
> ________________________________
> From: gortiz <gor...@pragsis.com>
> To: user@hbase.apache.org
> Sent: Wednesday, April 9, 2014 11:37 PM
> Subject: Re: BlockCache for large scans.
>
> But I think there's a direct relation between improving performance for
> large scans and memory for the memstore. As far as I understand, the
> memstore just works as a cache for write operations.
>
> On 09/04/14 23:44, Ted Yu wrote:
> > Didn't quite get what you mean, Asaf.
> >
> > If you're talking about HBASE-5349, please read the release note of
> > HBASE-5349.
> >
> > By default, the memstore min/max range is initialized to the memstore
> > percent:
> >
> >   globalMemStorePercentMinRange = conf.getFloat(
> >       MEMSTORE_SIZE_MIN_RANGE_KEY, globalMemStorePercent);
> >   globalMemStorePercentMaxRange = conf.getFloat(
> >       MEMSTORE_SIZE_MAX_RANGE_KEY, globalMemStorePercent);
> >
> > Cheers
> >
> > On Wed, Apr 9, 2014 at 3:17 PM, Asaf Mesika <asaf.mes...@gmail.com>
> > wrote:
> >
> >> The JIRA says it's enabled automatically. Is there an official
> >> document explaining this feature?
> >>
> >> On Wednesday, April 9, 2014, Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >>> Please take a look at http://www.n10k.com/blog/blockcache-101/
> >>>
> >>> For D, hbase.regionserver.global.memstore.size is specified as a
> >>> percentage of the heap, unless you enable HBASE-5349 'Automagically
> >>> tweak global memstore and block cache sizes based on workload'.
> >>>
> >>> On Wed, Apr 9, 2014 at 12:24 AM, gortiz <gor...@pragsis.com> wrote:
> >>>
> >>>> I've been reading the Definitive Guide and HBase in Action a
> >>>> little. I found this question from Cloudera that I'm not sure
> >>>> about, after looking at some benchmarks and documentation for
> >>>> HBase. Could someone explain it to me a little?
> >>>> I think that when you do a large scan you should disable the block
> >>>> cache, because the blocks are going to be swapped in and out a lot,
> >>>> so you don't get anything from the cache. I guess you are also
> >>>> penalized, since you're spending memory, GC and CPU on this task.
> >>>>
> >>>> *You want to do a full table scan on your data. You decide to
> >>>> disable block caching to see if this improves scan performance.
> >>>> Will disabling block caching improve scan performance?*
> >>>>
> >>>> A. No. Disabling block caching does not improve scan performance.
> >>>>
> >>>> B. Yes. When you disable block caching, you free up that memory for
> >>>> other operations. With a full table scan, you cannot take advantage
> >>>> of block caching anyway because your entire table won't fit into
> >>>> cache.
> >>>>
> >>>> C. No. If you disable block caching, HBase must read each block
> >>>> index from disk for each scan, thereby decreasing scan performance.
> >>>>
> >>>> D. Yes. When you disable block caching, you free up memory for
> >>>> MemStore, which improves scan performance.
>
> --
> *Guillermo Ortiz*
> /Big Data Developer/
>
> Telf.: +34 917 680 490
> Fax: +34 913 833 301
> C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
>
> _http://www.bidoop.es_
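For reference, the client-side pattern Lars describes above (disabling block caching on a plain, non-MR large scan) is the same idea. A minimal sketch against the 0.94/0.98-era client API, with "mytable" and the caching value as placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class LargeScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // placeholder table name
        try {
          Scan scan = new Scan();
          scan.setCaching(1000);       // rows per RPC round trip
          scan.setCacheBlocks(false);  // don't insert this scan's data blocks into the LRU cache
          ResultScanner scanner = table.getScanner(scan);
          try {
            for (Result r : scanner) {
              // process r ...
            }
          } finally {
            scanner.close();
          }
        } finally {
          table.close();
        }
      }
    }

As Lars notes, index blocks are cached regardless, so lookups stay cheap while the scan's data blocks skip the block cache.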