Thanks for reporting back, Bikrant. Glad that turned out to be the issue.

________________________________
From: OpenSource Dev <dev.opensou...@gmail.com>
To: user@hbase.apache.org; lars hofhansl <la...@apache.org>
Sent: Saturday, September 14, 2013 11:21 PM
Subject: Re: High cpu usage on a region server
We patched HBase 0.94.6 with HBASE-9428, and the difference is like night and day. Read latency has been very consistent, and we haven't seen any CPU load issues in the last 24+ hours.

Thank you all for helping us resolve this issue.

Bikrant

On Thu, Sep 12, 2013 at 10:25 AM, lars hofhansl <la...@apache.org> wrote:
> Not that I am aware of. Reducing the HFile block size will lessen this problem
> (but then cause other issues).
>
> It's just a fix to the RegexStringComparator. You can just recompile that and
> deploy it to the RegionServers (you need to make sure it's on the classpath
> before the HBase jars).
> It's probably easier to roll a new release. It's a shame we did not see this
> earlier.
>
> -- Lars
>
> ________________________________
> From: OpenSource Dev <dev.opensou...@gmail.com>
> To: user@hbase.apache.org; lars hofhansl <la...@apache.org>
> Sent: Thursday, September 12, 2013 9:52 AM
> Subject: Re: High cpu usage on a region server
>
> Thanks Lars.
>
> Are there any other workarounds for this issue until we get the fix?
> If not, we might have to apply the patch and roll out a custom package.
>
> On Thu, Sep 12, 2013 at 8:36 AM, lars hofhansl <la...@apache.org> wrote:
>> Yep... Very likely HBASE-9428:
>>
>> 8 threads:
>>    java.lang.Thread.State: RUNNABLE
>>         at java.util.Arrays.copyOf(Arrays.java:2786)
>>         at java.lang.StringCoding.decode(StringCoding.java:178)
>>         at java.lang.String.<init>(String.java:483)
>>         at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
>>         ...
>>
>> 4 threads:
>>    java.lang.Thread.State: RUNNABLE
>>         at sun.nio.cs.ISO_8859_1$Decoder.decodeArrayLoop(ISO_8859_1.java:79)
>>         at sun.nio.cs.ISO_8859_1$Decoder.decodeLoop(ISO_8859_1.java:106)
>>         at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544)
>>         at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:140)
>>         at java.lang.StringCoding.decode(StringCoding.java:179)
>>         at java.lang.String.<init>(String.java:483)
>>         at org.apache.hadoop.hbase.filter.RegexStringComparator.compareTo(RegexStringComparator.java:96)
>>
>> It's also consistent with what you see: lots of garbage (hence tweaking your
>> GC options had a significant effect).
>> The fix is in 0.94.12, which is in RC right now and will probably be released
>> early next week.
>>
>> -- Lars
>>
>> ________________________________
>> From: OpenSource Dev <dev.opensou...@gmail.com>
>> To: user@hbase.apache.org
>> Sent: Thursday, September 12, 2013 8:15 AM
>> Subject: Re: High cpu usage on a region server
>>
>> A server started getting busy last night, but this time it took ~5 hours
>> to go from 15% busy to 75% busy. It is not running flat-out at 80% yet,
>> but this is still very high compared to the other servers, which are running
>> under ~25% CPU usage. The only change I made yesterday was adding
>> "-XX:+UseParNewGC" to the HBase startup command.
>>
>> http://pastebin.com/VRmujgyH
>>
>> On Wed, Sep 11, 2013 at 2:28 PM, Stack <st...@duboce.net> wrote:
>>> Can you thread dump the busy server and pastebin it?
>>> Thanks,
>>> St.Ack
>>>
>>> On Wed, Sep 11, 2013 at 1:49 PM, OpenSource Dev
>>> <dev.opensou...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using HBase 0.94.6 (CDH 4.3) for OpenTSDB. So far I have had no
>>>> issues with writes/puts. The system handles up to 800k puts per second
>>>> without issue; on average we do 250k puts per second.
>>>>
>>>> I am having problems with reads. I've isolated where the
>>>> problem is but have not been able to find the root cause.
>>>>
>>>> I have 16 machines running HBase region servers, each with ~35 regions.
>>>> Once in a while, CPU usage goes flat-out at 80% on one region server.
>>>> These are the things I've noticed in Ganglia:
>>>>
>>>> hbase.regionserver.request - evenly distributed; no spikes on the busy server
>>>> hbase.regionserver.blockCacheSize - between 500MB and 1000MB
>>>> hbase.regionserver.compactionQueueSize - avg 2 or less
>>>> hbase.regionserver.blockCacheHitRatio - 30% on the busy node, >60% on other nodes
>>>>
>>>> The JVM heap size is set to 16GB, and I'm using -XX:+UseParNewGC
>>>> -XX:+UseConcMarkSweepGC.
>>>>
>>>> I've noticed the system load moves to a different region, sometimes
>>>> within a minute, if the busy region is restarted.
>>>>
>>>> Any suggestions on what could be causing the load, and/or what other
>>>> metrics I should check?
>>>>
>>>> Thank you!
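The hot frames in Lars's stack traces are RegexStringComparator.compareTo building a String for every cell, with eight threads sitting in Arrays.copyOf under StringCoding.decode. Together with his remark that a smaller HFile block size lessens the problem, this is consistent with the JVM copying the whole backing buffer (an entire HFile block) on each comparison instead of only the cell's bytes. The sketch below is a hypothetical stand-alone illustration under that assumption, not the actual HBASE-9428 patch: it copies only the cell's byte range before decoding, so the work per comparison scales with the cell size rather than the block size. The class SliceRegexMatch and the method matchesSlice are made-up names; ISO-8859-1 is used because that charset appears in the second stack trace.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration (not HBase code): regex-match one cell that lives
 * inside a larger shared buffer, such as an HFile block.
 */
public class SliceRegexMatch {
    // Assumed charset, chosen to match the ISO_8859_1 decoder in the stack trace.
    private static final Charset CHARSET = StandardCharsets.ISO_8859_1;

    /** Decodes only bytes [offset, offset + length) and reports whether the pattern is found. */
    public static boolean matchesSlice(Pattern pattern, byte[] buffer, int offset, int length) {
        // Copy just the cell's bytes first, so the String constructor never
        // sees (and therefore cannot copy) the rest of the shared buffer.
        byte[] cell = Arrays.copyOfRange(buffer, offset, offset + length);
        // find() looks for a match anywhere in the decoded cell value.
        return pattern.matcher(new String(cell, CHARSET)).find();
    }

    public static void main(String[] args) {
        // Simulate a 64 KB block; only the 8 bytes at offset 1000 form our cell.
        byte[] block = new byte[64 * 1024];
        byte[] value = "cpu.busy".getBytes(CHARSET);
        int offset = 1000;
        System.arraycopy(value, 0, block, offset, value.length);

        Pattern pattern = Pattern.compile("^cpu\\.");
        System.out.println(matchesSlice(pattern, block, offset, value.length)); // true
        System.out.println(matchesSlice(pattern, block, 0, 8));                 // false
    }
}
```

Copying the slice costs one small allocation per cell, which is still garbage, but it is bounded by the cell size; on a JVM whose String constructor already decodes only the requested range, passing offset and length directly to new String would avoid even that copy.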