Hi, Thanks Zheng for pinging here. As far as I know, I have not delved deeper into this offset lock and its soft reference. I think after Zheng's suggestion the STW came down a lot after making the block size 64 KB, because the number of blocks is reduced and so is the number of soft references. But the pause still seems too long for the user. I think it is worth checking the impact of this now, particularly since we suggest bigger-sized bucket caches. Will be back.
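For a quick sense of scale, the block-count arithmetic can be sketched like this (a back-of-the-envelope sketch only; the 70 GB cache size and the 16 KB / 64 KB block sizes are the ones from the thread below, and the class name is made up):

    public class BlockCountEstimate {
        public static void main(String[] args) {
            long cacheBytes = 70L * 1024 * 1024 * 1024;  // 70 GB BucketCache, per the thread
            for (long blockKB : new long[] {16, 64}) {
                // Each cached block carries a soft-referenced offset lock, so the
                // block count is an upper bound on the SoftReferences the GC must
                // process during the remark pause.
                long blocks = cacheBytes / (blockKB * 1024);
                System.out.printf("%d KB blocks -> up to %,d blocks%n", blockKB, blocks);
            }
        }
    }

That is up to 4,587,520 soft-referenced locks at 16 KB versus 1,146,880 at 64 KB, which matches the figures quoted below.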
Regards,
Ram

On Mon, Sep 30, 2019 at 9:03 AM OpenInx <[email protected]> wrote:

> OK, the huge number of SoftReferences from the offsetLock for each block
> still seems to be the main problem.
> I'm not sure whether there is some G1 option that can help to optimize
> the long STW.
> One solution I can imagine for now: limit the bucket cache size for a
> single RS, say the 70g bucket cache may need to be separated into two RS.
>
> As far as I know, Anoop & Ram have some good practice with using a huge
> bucket cache. Pinging Anoop & Ramkrishna:
> any thoughts about this GC issue?
>
>
> On Mon, Sep 30, 2019 at 11:09 AM zheng wang <[email protected]> wrote:
>
> > Even if set to 64KB, there are still more than 1 million softRefs, and
> > it will still take too long.
> >
> > This "GC ref-proc" processed about 500,000 softRefs and cost 700ms:
> >
> > 2019-09-18T03:16:42.088+0800: 125161.477: [GC remark
> > 2019-09-18T03:16:42.088+0800: 125161.477: [Finalize Marking, 0.0018076 secs]
> > 2019-09-18T03:16:42.089+0800: 125161.479: [GC ref-proc
> > 2019-09-18T03:16:42.089+0800: 125161.479: [SoftReference, 499278 refs, 0.1382086 secs]
> > 2019-09-18T03:16:42.228+0800: 125161.617: [WeakReference, 3750 refs, 0.0049171 secs]
> > 2019-09-18T03:16:42.233+0800: 125161.622: [FinalReference, 1040 refs, 0.0009375 secs]
> > 2019-09-18T03:16:42.234+0800: 125161.623: [PhantomReference, 0 refs, 21921 refs, 0.0058014 secs]
> > 2019-09-18T03:16:42.239+0800: 125161.629: [JNI Weak Reference, 0.0001070 secs]
> > , 0.6667733 secs]
> > 2019-09-18T03:16:42.756+0800: 125162.146: [Unloading, 0.0224078 secs]
> > , 0.6987032 secs]
> >
> >
> > ------------------ Original Message ------------------
> > From: "OpenInx" <[email protected]>
> > Sent: Monday, September 30, 2019, 10:27 AM
> > To: "Hbase-User" <[email protected]>
> > Subject: Re: a problem of long STW because of GC ref-proc
> >
> > A 100% get workload is not the right reason for choosing 16KB, I think,
> > because if you read a block, there's a larger possibility that we will
> > read the adjacent cells in the same block... I think caching a 16KB
> > block or caching a 64KB block in BucketCache won't make a big
> > difference? (But if your cell byte size is quite small, then there will
> > be many cells encoded in a 64KB block, and a block with a smaller size
> > will be better, because we search the cells in a block one by one,
> > meaning O(N) complexity.)
> >
> >
> > On Mon, Sep 30, 2019 at 10:08 AM zheng wang <[email protected]> wrote:
> >
> > > Yes, it will be mitigated by your advice, but there are only get
> > > requests in our business, so 16KB is better.
> > > IMO, the offset locks will always be in use, so is a strong reference
> > > a better choice?
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "OpenInx" <[email protected]>
> > > Sent: Monday, September 30, 2019, 9:46 AM
> > > To: "Hbase-User" <[email protected]>
> > > Subject: Re: a problem of long STW because of GC ref-proc
> > >
> > > It seems your block size is very small (16KB), so there will be
> > > 70*1024*1024/16 = 4587520 blocks (at most) in your BucketCache.
> > > For each block, the RS will maintain a soft-reference idLock and a
> > > BucketEntry in its bucket cache. So maybe you can try to enlarge the
> > > block size?
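To make the soft-referenced idLock pattern mentioned above concrete, here is a minimal sketch, assuming a plain ConcurrentHashMap keyed by block offset. The class and method names are hypothetical; this is an illustration of the idea being discussed, not HBase's actual implementation:

    import java.lang.ref.SoftReference;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Minimal sketch: one soft-referenced read/write lock per block offset.
    // Every live entry contributes a SoftReference that the GC must walk
    // during the "[SoftReference, N refs, ...]" phase of the remark pause.
    public class SoftOffsetLockPool {
        private final ConcurrentHashMap<Long, SoftReference<ReentrantReadWriteLock>> pool =
            new ConcurrentHashMap<>();

        public ReentrantReadWriteLock getLock(long offset) {
            while (true) {
                SoftReference<ReentrantReadWriteLock> ref = pool.computeIfAbsent(
                    offset, k -> new SoftReference<>(new ReentrantReadWriteLock()));
                ReentrantReadWriteLock lock = ref.get();
                if (lock != null) {
                    return lock;               // referent still alive, hand it out
                }
                pool.remove(offset, ref);      // GC cleared it; drop the stale entry and retry
            }
        }
    }

With millions of cached blocks, a map like this holds a similar number of SoftReferences, which is what shows up in the remark logs in this thread.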
> > > On Sun, Sep 29, 2019 at 10:14 PM zheng wang <[email protected]> wrote:
> > >
> > > > Hi~
> > > >
> > > > My live cluster env config below:
> > > > hbase version: cdh6.0.1 (apache hbase 2.0.0)
> > > > hbase config: bucketCache(70g), blocksize(16k)
> > > >
> > > > java version: 1.8.0_51
> > > > java config: heap(32g), -XX:+UseG1GC -XX:MaxGCPauseMillis=100
> > > > -XX:+ParallelRefProcEnabled
> > > >
> > > > About every 1-2 days, a regionServer would hit an old-gen GC that
> > > > costs 1~2s in the remark phase:
> > > >
> > > > 2019-09-29T01:55:45.186+0800: 365222.053: [GC remark
> > > > 2019-09-29T01:55:45.186+0800: 365222.053: [Finalize Marking, 0.0016327 secs]
> > > > 2019-09-29T01:55:45.188+0800: 365222.054: [GC ref-proc
> > > > 2019-09-29T01:55:45.188+0800: 365222.054: [SoftReference, 1264586 refs, 0.3151392 secs]
> > > > 2019-09-29T01:55:45.503+0800: 365222.370: [WeakReference, 4317 refs, 0.0024381 secs]
> > > > 2019-09-29T01:55:45.505+0800: 365222.372: [FinalReference, 9791 refs, 0.0037445 secs]
> > > > 2019-09-29T01:55:45.509+0800: 365222.376: [PhantomReference, 0 refs, 1963 refs, 0.0018941 secs]
> > > > 2019-09-29T01:55:45.511+0800: 365222.378: [JNI Weak Reference, 0.0001156 secs]
> > > > , 1.4554361 secs]
> > > > 2019-09-29T01:55:46.643+0800: 365223.510: [Unloading, 0.0211370 secs]
> > > > , 1.4851728 secs]
> > > >
> > > > The SoftReferences seem to be used by the offsetLock in BucketCache.
> > > > There are two questions:
> > > > 1: The SoftReference proc cost 0.31s, but why did the whole GC
> > > > ref-proc cost 1.45s?
> > > > 2: Is SoftReference a good choice to use here?
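As to question 2 above, one strongly referenced alternative sometimes used for this kind of per-offset locking is a fixed-size striped lock array. The sketch below only illustrates the trade-off (the class is hypothetical, not HBase code): memory stays bounded and there are no per-block references for the GC to process, at the cost of unrelated offsets occasionally contending on a shared lock.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Sketch: a fixed array of strongly referenced locks; offsets are hashed
    // onto stripes, so the lock count is constant regardless of cache size.
    public class StripedOffsetLocks {
        private final ReentrantReadWriteLock[] stripes;

        public StripedOffsetLocks(int numStripes) {
            stripes = new ReentrantReadWriteLock[numStripes];
            for (int i = 0; i < numStripes; i++) {
                stripes[i] = new ReentrantReadWriteLock();
            }
        }

        public ReentrantReadWriteLock getLock(long offset) {
            // Distinct offsets may map to the same stripe and share a lock.
            int h = Long.hashCode(offset);
            return stripes[(h & Integer.MAX_VALUE) % stripes.length];
        }
    }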
