Hi, Thanks Zheng for pinging here. As far as I know, I have not delved deeper into this offset lock and its soft reference. I think after Zheng's suggestion the STW came down a lot after making the block size 64 KB, because the number of blocks is reduced and so is the number of soft references. But the pause still seems too long for the user. I think it is worth checking the impact of this now, particularly since we suggest bigger-sized bucket caches. Will be back.
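For a quick sense of scale, the block-count arithmetic can be sketched like this (a back-of-the-envelope sketch only; the 70 GB cache size and the 16 KB / 64 KB block sizes are the ones from the thread below, and the class name is made up):

    public class BlockCountEstimate {
        public static void main(String[] args) {
            long cacheBytes = 70L * 1024 * 1024 * 1024;  // 70 GB BucketCache, per the thread
            for (long blockKB : new long[] {16, 64}) {
                // Each cached block carries a soft-referenced offset lock, so the
                // block count is an upper bound on the SoftReferences the GC must
                // process during the remark pause.
                long blocks = cacheBytes / (blockKB * 1024);
                System.out.printf("%d KB blocks -> up to %,d blocks%n", blockKB, blocks);
            }
        }
    }

That is up to 4,587,520 soft-referenced locks at 16 KB versus 1,146,880 at 64 KB, which matches the figures quoted below.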
Regards,
Ram

On Mon, Sep 30, 2019 at 9:03 AM OpenInx <[email protected]> wrote:

> OK, the huge number of SoftReferences from the offsetLock for each block
> still seems to be the main problem.
> I'm not sure whether there is some G1 option that can help to optimize
> the long STW.
> One solution I can imagine for now: limit the bucket cache size for a
> single RS, say the 70g bucket cache may need to be separated into two RS.
>
> As far as I know, Anoop & Ram have some good practice with using a huge
> bucket cache. Pinging Anoop & Ramkrishna:
> any thoughts about this GC issue?
>
>
> On Mon, Sep 30, 2019 at 11:09 AM zheng wang <[email protected]> wrote:
>
> > Even if set to 64KB, there are still more than 1 million softRefs, and
> > it will still take too long.
> >
> > This "GC ref-proc" processed about 500,000 softRefs and cost 700ms:
> >
> > 2019-09-18T03:16:42.088+0800: 125161.477: [GC remark
> > 2019-09-18T03:16:42.088+0800: 125161.477: [Finalize Marking, 0.0018076 secs]
> > 2019-09-18T03:16:42.089+0800: 125161.479: [GC ref-proc
> > 2019-09-18T03:16:42.089+0800: 125161.479: [SoftReference, 499278 refs, 0.1382086 secs]
> > 2019-09-18T03:16:42.228+0800: 125161.617: [WeakReference, 3750 refs, 0.0049171 secs]
> > 2019-09-18T03:16:42.233+0800: 125161.622: [FinalReference, 1040 refs, 0.0009375 secs]
> > 2019-09-18T03:16:42.234+0800: 125161.623: [PhantomReference, 0 refs, 21921 refs, 0.0058014 secs]
> > 2019-09-18T03:16:42.239+0800: 125161.629: [JNI Weak Reference, 0.0001070 secs]
> > , 0.6667733 secs]
> > 2019-09-18T03:16:42.756+0800: 125162.146: [Unloading, 0.0224078 secs]
> > , 0.6987032 secs]
> >
> >
> > ------------------ Original Message ------------------
> > From: "OpenInx" <[email protected]>
> > Sent: Monday, September 30, 2019, 10:27 AM
> > To: "Hbase-User" <[email protected]>
> > Subject: Re: a problem of long STW because of GC ref-proc
> >
> > A 100% get workload is not the right reason for choosing 16KB, I think,
> > because if you read a block, there's a larger possibility that we will
> > read the adjacent cells in the same block... I think caching a 16KB
> > block or caching a 64KB block in BucketCache won't make a big
> > difference? (But if your cell byte size is quite small, then there will
> > be many cells encoded in a 64KB block, and a block with a smaller size
> > will be better, because we search the cells in a block one by one,
> > meaning O(N) complexity.)
> >
> >
> > On Mon, Sep 30, 2019 at 10:08 AM zheng wang <[email protected]> wrote:
> >
> > > Yes, it will be mitigated by your advice, but there are only get
> > > requests in our business, so 16KB is better.
> > > IMO, the offset locks will always be in use, so is a strong reference
> > > a better choice?
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "OpenInx" <[email protected]>
> > > Sent: Monday, September 30, 2019, 9:46 AM
> > > To: "Hbase-User" <[email protected]>
> > > Subject: Re: a problem of long STW because of GC ref-proc
> > >
> > > It seems your block size is very small (16KB), so there will be
> > > 70*1024*1024/16 = 4587520 blocks (at most) in your BucketCache.
> > > For each block, the RS will maintain a soft-reference idLock and a
> > > BucketEntry in its bucket cache. So maybe you can try to enlarge the
> > > block size?
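To make the soft-referenced idLock pattern mentioned above concrete, here is a minimal sketch, assuming a plain ConcurrentHashMap keyed by block offset. The class and method names are hypothetical; this is an illustration of the idea being discussed, not HBase's actual implementation:

    import java.lang.ref.SoftReference;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Minimal sketch: one soft-referenced read/write lock per block offset.
    // Every live entry contributes a SoftReference that the GC must walk
    // during the "[SoftReference, N refs, ...]" phase of the remark pause.
    public class SoftOffsetLockPool {
        private final ConcurrentHashMap<Long, SoftReference<ReentrantReadWriteLock>> pool =
            new ConcurrentHashMap<>();

        public ReentrantReadWriteLock getLock(long offset) {
            while (true) {
                SoftReference<ReentrantReadWriteLock> ref = pool.computeIfAbsent(
                    offset, k -> new SoftReference<>(new ReentrantReadWriteLock()));
                ReentrantReadWriteLock lock = ref.get();
                if (lock != null) {
                    return lock;               // referent still alive, hand it out
                }
                pool.remove(offset, ref);      // GC cleared it; drop the stale entry and retry
            }
        }
    }

With millions of cached blocks, a map like this holds a similar number of SoftReferences, which is what shows up in the remark logs in this thread.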
> > > On Sun, Sep 29, 2019 at 10:14 PM zheng wang <[email protected]> wrote:
> > >
> > > > Hi~
> > > >
> > > > My live cluster env config below:
> > > > hbase version: cdh6.0.1 (apache hbase 2.0.0)
> > > > hbase config: bucketCache(70g), blocksize(16k)
> > > >
> > > > java version: 1.8.0_51
> > > > java config: heap(32g), -XX:+UseG1GC -XX:MaxGCPauseMillis=100
> > > > -XX:+ParallelRefProcEnabled
> > > >
> > > > About every 1-2 days, a regionServer would hit an old-gen GC that
> > > > costs 1~2s in the remark phase:
> > > >
> > > > 2019-09-29T01:55:45.186+0800: 365222.053: [GC remark
> > > > 2019-09-29T01:55:45.186+0800: 365222.053: [Finalize Marking, 0.0016327 secs]
> > > > 2019-09-29T01:55:45.188+0800: 365222.054: [GC ref-proc
> > > > 2019-09-29T01:55:45.188+0800: 365222.054: [SoftReference, 1264586 refs, 0.3151392 secs]
> > > > 2019-09-29T01:55:45.503+0800: 365222.370: [WeakReference, 4317 refs, 0.0024381 secs]
> > > > 2019-09-29T01:55:45.505+0800: 365222.372: [FinalReference, 9791 refs, 0.0037445 secs]
> > > > 2019-09-29T01:55:45.509+0800: 365222.376: [PhantomReference, 0 refs, 1963 refs, 0.0018941 secs]
> > > > 2019-09-29T01:55:45.511+0800: 365222.378: [JNI Weak Reference, 0.0001156 secs]
> > > > , 1.4554361 secs]
> > > > 2019-09-29T01:55:46.643+0800: 365223.510: [Unloading, 0.0211370 secs]
> > > > , 1.4851728 secs]
> > > >
> > > > The SoftReferences seem to be used by the offsetLock in BucketCache.
> > > > There are two questions:
> > > > 1: The SoftReference proc cost 0.31s, but why did the whole GC
> > > > ref-proc cost 1.45s?
> > > > 2: Is SoftReference a good choice to use here?
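As to question 2 above, one strongly referenced alternative sometimes used for this kind of per-offset locking is a fixed-size striped lock array. The sketch below only illustrates the trade-off (the class is hypothetical, not HBase code): memory stays bounded and there are no per-block references for the GC to process, at the cost of unrelated offsets occasionally contending on a shared lock.

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Sketch: a fixed array of strongly referenced locks; offsets are hashed
    // onto stripes, so the lock count is constant regardless of cache size.
    public class StripedOffsetLocks {
        private final ReentrantReadWriteLock[] stripes;

        public StripedOffsetLocks(int numStripes) {
            stripes = new ReentrantReadWriteLock[numStripes];
            for (int i = 0; i < numStripes; i++) {
                stripes[i] = new ReentrantReadWriteLock();
            }
        }

        public ReentrantReadWriteLock getLock(long offset) {
            // Distinct offsets may map to the same stripe and share a lock.
            int h = Long.hashCode(offset);
            return stripes[(h & Integer.MAX_VALUE) % stripes.length];
        }
    }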
