That one long GC aside, look at the timings of the others as well. Even the smaller GCs are taking up the majority of each second.
For a heap that size you might want to try a java version over java7u60 and use the G1GC. Otherwise there are a bunch of resources on the web including in the refguide for how to tune GCs for hbase. On Fri, May 29, 2015 at 11:49 AM, rahul malviya <malviyarahul2...@gmail.com> wrote: > This is a sample from gc log file. At the end I see long gc pauses. Is > there a way I can tune this ? > > 2015-04-29T22:46:12.387+0000: 98061.660: [GC2015-04-29T22:46:12.387+0000: > 98061.661: [ParNew: 572757K->63867K(580608K), 0.6549550 secs] > 13294553K->12811090K(20001132K), 0.6551600 secs] [Times: user=8.09 > sys=0.07, real=0.65 secs] > 2015-04-29T22:46:13.165+0000: 98062.438: [GC2015-04-29T22:46:13.165+0000: > 98062.439: [ParNew: 579901K->55337K(580608K), 1.1941730 secs] > 13327124K->12846913K(20001132K), 1.1944150 secs] [Times: user=5.39 > sys=0.08, real=1.20 secs] > 2015-04-29T22:46:14.485+0000: 98063.759: [GC2015-04-29T22:46:14.485+0000: > 98063.759: [ParNew: 571496K->62270K(580608K), 0.8990460 secs] > 13363071K->12891412K(20001132K), 0.8992880 secs] [Times: user=7.07 > sys=0.05, real=0.90 secs] > 2015-04-29T22:46:15.519+0000: 98064.793: [GC2015-04-29T22:46:15.519+0000: > 98064.793: [ParNew: 578417K->41496K(580608K), 0.8849310 secs] > 13407559K->12907053K(20001132K), 0.8851300 secs] [Times: user=4.98 > sys=0.08, real=0.88 secs] > 2015-04-29T22:46:16.547+0000: 98065.821: [GC2015-04-29T22:46:16.547+0000: > 98065.821: [ParNew: 557652K->50791K(580608K), 0.6345050 secs] > 13423210K->12940694K(20001132K), 0.6347240 secs] [Times: user=7.99 > sys=0.04, real=0.63 secs] > 2015-04-29T22:46:17.273+0000: 98066.547: [GC2015-04-29T22:46:17.273+0000: > 98066.547: [ParNew: 566889K->48425K(580608K), 0.6372610 secs] > 13456792K->12966994K(20001132K), 0.6375250 secs] [Times: user=3.72 > sys=0.05, real=0.64 secs] > 2015-04-29T22:46:18.035+0000: 98067.309: [GC2015-04-29T22:46:18.035+0000: > 98067.309: [ParNew: 564444K->52607K(580608K), 0.7116560 secs] > 13483013K->12997986K(20001132K), 0.7119250 secs] [Times: user=7.16 > sys=0.05, real=0.72 secs] > 2015-04-29T22:46:18.875+0000: 98068.149: [GC2015-04-29T22:46:18.875+0000: > 98068.149: [ParNew: 568635K->54078K(580608K), 0.5286160 secs] > 13514014K->13026734K(20001132K), 0.5288800 secs] [Times: user=4.70 > sys=0.05, real=0.53 secs] > 2015-04-29T22:46:19.521+0000: 98068.795: [GC2015-04-29T22:46:19.521+0000: > 98068.795: [ParNew: 570106K->64416K(580608K), 1.0709370 secs] > 13542763K->13091755K(20001132K), 1.0712320 secs] [Times: user=9.57 > sys=0.08, real=1.08 secs] > 2015-04-29T22:46:20.751+0000: 98070.024: [GC2015-04-29T22:46:20.751+0000: > 98070.024: [ParNew: 580387K->47393K(580608K), 0.5688620 secs] > 13607725K->13099551K(20001132K), 0.5690960 secs] [Times: user=6.48 > sys=0.05, real=0.57 secs] > 2015-04-29T22:46:21.393+0000: 98070.667: [GC2015-04-29T22:46:21.393+0000: > 98070.667: [ParNew: 563351K->26473K(580608K), 0.7063100 secs] > 13615509K->13429978K(20001132K), 0.7065560 secs] [Times: user=10.49 > sys=0.07, real=0.71 secs] > 2015-04-29T22:46:22.214+0000: 98071.487: [GC2015-04-29T22:46:22.214+0000: > 98071.488: [ParNew: 292415K->39128K(580608K), 0.0293500 > secs]2015-04-29T22:46:22.243+0000: 98071.517: [CMS: > 13403504K->7807614K(19420524K), 284.4275150 secs] > 13695919K->7807614K(20001132K), [CMS Perm : 30509K->30432K(51000K)], > 284.4594370 secs] [Times: user=5.55 sys=2.50, real=284.42 secs] > Heap > par new generation total 2146944K, used 868806K [0x0000000163600000, > 0x00000001f4f90000, 0x00000001f4f90000) > eden space 1908416K, 45% used [0x0000000163600000, 0x00000001986718f8, > 0x00000001d7db0000) > from space 238528K, 0% used [0x00000001d7db0000, 0x00000001d7db0000, > 0x00000001e66a0000) > to space 238528K, 0% used [0x00000001e66a0000, 0x00000001e66a0000, > 0x00000001f4f90000) > concurrent mark-sweep generation total 19420524K, used 7807614K > [0x00000001f4f90000, 0x00000006964eb000, 0x00000007fae00000) > concurrent-mark-sweep perm gen total 51000K, used 30498K > [0x00000007fae00000, 0x00000007fdfce000, 0x0000000800000000) > > Thanks, > Rahul > > > On Fri, May 29, 2015 at 8:41 AM, Bryan Beaudreault < > bbeaudrea...@hubspot.com > > wrote: > > > > > > > 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We > > > slept > > > 14912ms instead of 3000ms, this is likely due to a long garbage > > collecting > > > pause and it's usually bad, see > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > > > > > I would check your gc logs for long gc pauses. > > > > On Fri, May 29, 2015 at 11:38 AM, rahul malviya < > > malviyarahul2...@gmail.com> > > wrote: > > > > > Hi All, > > > > > > In our cluster region server logs are filled with response too slow > > > message. This is causing jobs to slow down. How can I debug what is the > > > reason for this slowness. > > > > > > We have enabled short circuit reads and region server has 27GB RAM. > > > > > > Here is a trace when regionserver starts. > > > > > > Thu Aug 14 20:23:51 GMT 2014 Starting regionserver on nodex > > > core file size (blocks, -c) 0 > > > data seg size (kbytes, -d) unlimited > > > scheduling priority (-e) 0 > > > file size (blocks, -f) unlimited > > > pending signals (-i) 966365 > > > max locked memory (kbytes, -l) 64 > > > max memory size (kbytes, -m) unlimited > > > open files (-n) 32768 > > > pipe size (512 bytes, -p) 8 > > > POSIX message queues (bytes, -q) 819200 > > > real-time priority (-r) 0 > > > stack size (kbytes, -s) 10240 > > > cpu time (seconds, -t) unlimited > > > max user processes (-u) 966365 > > > virtual memory (kbytes, -v) unlimited > > > file locks (-x) unlimited > > > 2014-08-14 20:23:53,341 WARN org.apache.hadoop.conf.Configuration: > > > fs.default.name is deprecated. Instead, use fs.defaultFS > > > 2014-08-14 20:23:53,342 WARN org.apache.hadoop.conf.Configuration: > > > mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id > > > 2014-08-14 20:23:53,884 WARN org.apache.hadoop.conf.Configuration: > > > slave.host.name is deprecated. Instead, use > > > mapreduce.tasktracker.host.name > > > 2014-08-14 20:24:03,999 WARN org.apache.hadoop.conf.Configuration: > > > hadoop.native.lib is deprecated. Instead, use io.native.lib.available > > > 2014-08-14 20:26:47,605 ERROR > > > org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: > Inconsistent > > > configuration. Previous configuration for using table name in metrics: > > > true, new configuration: false > > > 2014-08-14 20:28:23,491 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":18725,"call":"next(-8041903839443097981, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.248:58716 > > > > > > > > > ","starttimems":1408048084720,"queuetimems":0,"class":"HRegionServer","responsesize":5031595,"method":"next"} > > > 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We > > slept > > > 14912ms instead of 3000ms, this is likely due to a long garbage > > collecting > > > pause and it's usually bad, see > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired > > > 2014-08-14 21:42:28,477 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":16968,"call":"next(5487686201525374976, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.249:36657 > > > > > > > > > ","starttimems":1408052531504,"queuetimems":0,"class":"HRegionServer","responsesize":1959532,"method":"next"} > > > 2014-08-14 21:42:56,923 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":10591,"call":"next(5487686201525374976, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.249:40818 > > > > > > > > > ","starttimems":1408052566327,"queuetimems":1,"class":"HRegionServer","responsesize":2987578,"method":"next"} > > > 2014-08-14 21:44:24,372 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":10656,"call":"next(5487686201525374976, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.249:41993 > > > > > > > > > ","starttimems":1408052653710,"queuetimems":1,"class":"HRegionServer","responsesize":3039779,"method":"next"} > > > 2014-08-14 21:45:50,598 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":12418,"call":"next(5487686201525374976, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.249:45197 > > > > > > > > > ","starttimems":1408052738174,"queuetimems":10,"class":"HRegionServer","responsesize":2476903,"method":"next"} > > > 2014-08-14 21:46:15,187 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":23766,"call":"next(5487686201525374976, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.249:49425 > > > > > > > > > ","starttimems":1408052751414,"queuetimems":0,"class":"HRegionServer","responsesize":5681175,"method":"next"} > > > 2014-08-14 21:47:09,041 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":12320,"call":"next(5487686201525374976, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.249:50269 > > > > > > > > > ","starttimems":1408052816698,"queuetimems":1,"class":"HRegionServer","responsesize":2986949,"method":"next"} > > > 2014-08-14 21:49:23,833 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":11389,"call":"next(1227841280814011139, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.122:41976 > > > > > > > > > ","starttimems":1408052952415,"queuetimems":0,"class":"HRegionServer","responsesize":3160025,"method":"next"} > > > 2014-08-14 21:49:23,869 WARN org.apache.hadoop.ipc.HBaseServer: > Exception > > > while changing ops : java.nio.channels.CancelledKeyException > > > 2014-08-14 21:49:23,900 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":11428,"call":"next(9103372947568217267, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.41:35241 > > > > > > > > > ","starttimems":1408052952469,"queuetimems":0,"class":"HRegionServer","responsesize":1809158,"method":"next"} > > > 2014-08-14 21:49:23,902 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":11415,"call":"next(-3120240140302998196, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.195:46046 > > > > > > > > > ","starttimems":1408052952468,"queuetimems":0,"class":"HRegionServer","responsesize":1826929,"method":"next"} > > > 2014-08-14 21:49:24,050 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":11438,"call":"next(3799907609071248384, 10), rpc > > > version=1, client version=29, > methodsFingerPrint=-1368823753","client":" > > > 17.170.176.154:42797 > > > > > > > > > ","starttimems":1408052952459,"queuetimems":0,"class":"HRegionServer","responsesize":2628568,"method":"next"} > > > 2014-08-14 21:49:24,057 WARN org.apache.hadoop.ipc.HBaseServer: > > > (responseTooSlow): > > > {"processingtimems":11843,"call":"next(-1679362783893333095, 10), rpc > > > version=1, client version=29, > > > methodsFingerPrint=-1368823753","client":"17.170 > > > > > > Thanks, > > > Rahul > > > > > >