Related: please see HBASE-13408, "HBase In-Memory Memstore Compaction". FYI.
On Mon, Aug 24, 2015 at 10:32 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> The split policy also uses the flush size to estimate how to split
> tables.
>
> It's sometimes fine to raise this number a bit, say to 256 MB, but 512 MB
> is pretty high, and 800 MB is even more.
>
> Big memstores take more time to flush and can block writes if the flushes
> are not fast enough. If yours are fast enough, then you might be able to
> stay with 512 MB. I don't think 800 MB is a good idea.
>
> JM
>
> 2015-08-24 13:23 GMT-04:00 Vladimir Rodionov <vladrodio...@gmail.com>:
>
> > 1. How many regions per RS?
> > 2. What is your dfs.block.size?
> > 3. What is your hbase.regionserver.maxlogs?
> >
> > A flush can be requested when:
> >
> > 1. The region size exceeds hbase.hregion.memstore.flush.size.
> > 2. The region's memstore is too old (the periodic memstore flusher
> >    checks the age of the memstore; the default is 1 hour). Controlled by
> >    hbase.regionserver.optionalcacheflushinterval (in ms).
> > 3. There are too many unflushed changes in a region. Controlled by
> >    hbase.regionserver.flush.per.changes; the default is 30,000,000.
> > 4. The WAL is rolling prematurely. Controlled by
> >    hbase.regionserver.maxlogs and dfs.block.size.
> >
> > The optimal configuration satisfies:
> >
> >   hbase.regionserver.maxlogs * dfs.block.size * 0.95 >
> >   hbase.regionserver.global.memstore.upperLimit * HBASE_HEAPSIZE
> >
> > I recommend you enable DEBUG logging and analyze the MemStoreFlusher,
> > PeriodicMemstoreFlusher, and HRegion flush-related log messages to get
> > an idea of why a flush was requested on a region, and what the region
> > size was at that time.
> >
> > I think in your case it is either premature WAL rolling or too many
> > changes in a memstore.
> >
> > -Vlad
> >
> > On Wed, May 27, 2015 at 1:53 PM, Gautam Borah <gautam.bo...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > The default value of hbase.hregion.memstore.flush.size is 128 MB.
> > > Could anyone kindly explain what the impact would be if we increased
> > > this to a higher value such as 512 MB, 800 MB, or more?
> > >
> > > We have a very write-heavy cluster. We also run periodic endpoint
> > > coprocessor based jobs, every 10 minutes, that operate on the data
> > > written in the last 10-15 minutes. We are trying to manage the
> > > memstore flush operations so that the hot data remains in the
> > > memstore for at least 30-40 minutes or longer, so that the job hits
> > > disk only every 3rd or 4th time it tries to operate on the hot data
> > > (it does scans).
> > >
> > > We have a region server heap size of 20 GB and have set:
> > >
> > > hbase.regionserver.global.memstore.lowerLimit = .45
> > > hbase.regionserver.global.memstore.upperLimit = .55
> > >
> > > We observed that with hbase.hregion.memstore.flush.size=128MB, only
> > > 10% of the heap is utilized by the memstores before they flush.
> > >
> > > With hbase.hregion.memstore.flush.size=512MB, we are able to increase
> > > the heap utilization by the memstores to 35%.
> > >
> > > It would be very helpful for us to understand the implications of a
> > > higher hbase.hregion.memstore.flush.size for a long-running cluster.
> > >
> > > Thanks,
> > > Gautam
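
The settings discussed in the thread live in hbase-site.xml. A sketch of the relevant fragment; the property names are the ones the thread cites, but the values below are illustrative defaults and Gautam's numbers, not recommendations:

```xml
<!-- hbase-site.xml: flush-related settings from the thread.
     Values are illustrative, not tuning advice. -->
<configuration>
  <!-- Flush a region's memstore once it reaches this size (bytes). -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value> <!-- 128 MB default -->
  </property>
  <!-- Periodic flusher: flush memstores older than this (ms); default 1 h. -->
  <property>
    <name>hbase.regionserver.optionalcacheflushinterval</name>
    <value>3600000</value>
  </property>
  <!-- Flush after this many unflushed changes in a region. -->
  <property>
    <name>hbase.regionserver.flush.per.changes</name>
    <value>30000000</value>
  </property>
  <!-- WAL rolling: maxlogs * dfs.block.size bounds unflushed WAL data. -->
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>32</value>
  </property>
  <!-- Global memstore bounds as fractions of the region server heap
       (Gautam's values). -->
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.55</value>
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.lowerLimit</name>
    <value>0.45</value>
  </property>
</configuration>
```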
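
Vlad's sizing rule for WAL rolling can be checked with a quick calculation. A minimal sketch (the function name and the maxlogs values are hypothetical examples, not from the thread; the heap size and upperLimit match Gautam's setup, and 128 MB is a common dfs.block.size):

```python
# Vlad's rule of thumb: total WAL capacity should exceed the global
# memstore upper limit, otherwise WALs roll prematurely and force flushes:
#   hbase.regionserver.maxlogs * dfs.block.size * 0.95 >
#   hbase.regionserver.global.memstore.upperLimit * HBASE_HEAPSIZE

def wal_capacity_ok(maxlogs, block_size_bytes, upper_limit, heap_bytes):
    """Return True if WAL capacity covers the global memstore upper limit."""
    wal_capacity = maxlogs * block_size_bytes * 0.95
    memstore_cap = upper_limit * heap_bytes
    return wal_capacity > memstore_cap

MB = 1024 ** 2
GB = 1024 ** 3

# With a 20 GB heap and upperLimit = 0.55 (Gautam's numbers), a default-ish
# maxlogs of 32 and 128 MB HDFS blocks give ~3.8 GB of WAL capacity versus
# an 11 GB memstore cap, so WALs would roll (and trigger flushes) early:
print(wal_capacity_ok(32, 128 * MB, 0.55, 20 * GB))  # False
print(wal_capacity_ok(96, 128 * MB, 0.55, 20 * GB))  # True
```

This is consistent with Vlad's suspicion that premature WAL rolling, rather than the flush size itself, may be capping memstore heap utilization at 10%.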