Re: Response Too Slow in RegionServer Logs

Vladimir Rodionov Fri, 29 May 2015 09:15:36 -0700

Is there any reason for 27G heap? It seems you run M/R job? If yes, then I
would recommend you trying M/R over snapshots.  Combination of large HBase
heap and M/R is very hard to tune if possible at all
You can also try reducing number of map tasks and check your MR job
resource consumption
On May 29, 2015 8:58 AM, "Bryan Beaudreault" <bbeaudrea...@hubspot.com>
wrote:


> That one long GC aside, look at the timings of the others as well.  Even
> the smaller GCs are taking up the majority of each second.
>
> For a heap that size you might want to try a java version over java7u60 and
> use the G1GC.  Otherwise there are a bunch of resources on the web
> including in the refguide for how to tune GCs for hbase.
>
>
> On Fri, May 29, 2015 at 11:49 AM, rahul malviya <
> malviyarahul2...@gmail.com>
> wrote:
>
> > This is a sample from gc log file. At the end I see long gc pauses. Is
> > there a way I can tune this ?
> >
> > 2015-04-29T22:46:12.387+0000: 98061.660: [GC2015-04-29T22:46:12.387+0000:
> > 98061.661: [ParNew: 572757K->63867K(580608K), 0.6549550 secs]
> > 13294553K->12811090K(20001132K), 0.6551600 secs] [Times: user=8.09
> > sys=0.07, real=0.65 secs]
> > 2015-04-29T22:46:13.165+0000: 98062.438: [GC2015-04-29T22:46:13.165+0000:
> > 98062.439: [ParNew: 579901K->55337K(580608K), 1.1941730 secs]
> > 13327124K->12846913K(20001132K), 1.1944150 secs] [Times: user=5.39
> > sys=0.08, real=1.20 secs]
> > 2015-04-29T22:46:14.485+0000: 98063.759: [GC2015-04-29T22:46:14.485+0000:
> > 98063.759: [ParNew: 571496K->62270K(580608K), 0.8990460 secs]
> > 13363071K->12891412K(20001132K), 0.8992880 secs] [Times: user=7.07
> > sys=0.05, real=0.90 secs]
> > 2015-04-29T22:46:15.519+0000: 98064.793: [GC2015-04-29T22:46:15.519+0000:
> > 98064.793: [ParNew: 578417K->41496K(580608K), 0.8849310 secs]
> > 13407559K->12907053K(20001132K), 0.8851300 secs] [Times: user=4.98
> > sys=0.08, real=0.88 secs]
> > 2015-04-29T22:46:16.547+0000: 98065.821: [GC2015-04-29T22:46:16.547+0000:
> > 98065.821: [ParNew: 557652K->50791K(580608K), 0.6345050 secs]
> > 13423210K->12940694K(20001132K), 0.6347240 secs] [Times: user=7.99
> > sys=0.04, real=0.63 secs]
> > 2015-04-29T22:46:17.273+0000: 98066.547: [GC2015-04-29T22:46:17.273+0000:
> > 98066.547: [ParNew: 566889K->48425K(580608K), 0.6372610 secs]
> > 13456792K->12966994K(20001132K), 0.6375250 secs] [Times: user=3.72
> > sys=0.05, real=0.64 secs]
> > 2015-04-29T22:46:18.035+0000: 98067.309: [GC2015-04-29T22:46:18.035+0000:
> > 98067.309: [ParNew: 564444K->52607K(580608K), 0.7116560 secs]
> > 13483013K->12997986K(20001132K), 0.7119250 secs] [Times: user=7.16
> > sys=0.05, real=0.72 secs]
> > 2015-04-29T22:46:18.875+0000: 98068.149: [GC2015-04-29T22:46:18.875+0000:
> > 98068.149: [ParNew: 568635K->54078K(580608K), 0.5286160 secs]
> > 13514014K->13026734K(20001132K), 0.5288800 secs] [Times: user=4.70
> > sys=0.05, real=0.53 secs]
> > 2015-04-29T22:46:19.521+0000: 98068.795: [GC2015-04-29T22:46:19.521+0000:
> > 98068.795: [ParNew: 570106K->64416K(580608K), 1.0709370 secs]
> > 13542763K->13091755K(20001132K), 1.0712320 secs] [Times: user=9.57
> > sys=0.08, real=1.08 secs]
> > 2015-04-29T22:46:20.751+0000: 98070.024: [GC2015-04-29T22:46:20.751+0000:
> > 98070.024: [ParNew: 580387K->47393K(580608K), 0.5688620 secs]
> > 13607725K->13099551K(20001132K), 0.5690960 secs] [Times: user=6.48
> > sys=0.05, real=0.57 secs]
> > 2015-04-29T22:46:21.393+0000: 98070.667: [GC2015-04-29T22:46:21.393+0000:
> > 98070.667: [ParNew: 563351K->26473K(580608K), 0.7063100 secs]
> > 13615509K->13429978K(20001132K), 0.7065560 secs] [Times: user=10.49
> > sys=0.07, real=0.71 secs]
> > 2015-04-29T22:46:22.214+0000: 98071.487: [GC2015-04-29T22:46:22.214+0000:
> > 98071.488: [ParNew: 292415K->39128K(580608K), 0.0293500
> > secs]2015-04-29T22:46:22.243+0000: 98071.517: [CMS:
> > 13403504K->7807614K(19420524K), 284.4275150 secs]
> > 13695919K->7807614K(20001132K), [CMS Perm : 30509K->30432K(51000K)],
> > 284.4594370 secs] [Times: user=5.55 sys=2.50, real=284.42 secs]
> > Heap
> >  par new generation   total 2146944K, used 868806K [0x0000000163600000,
> > 0x00000001f4f90000, 0x00000001f4f90000)
> >   eden space 1908416K,  45% used [0x0000000163600000, 0x00000001986718f8,
> > 0x00000001d7db0000)
> >   from space 238528K,   0% used [0x00000001d7db0000, 0x00000001d7db0000,
> > 0x00000001e66a0000)
> >   to   space 238528K,   0% used [0x00000001e66a0000, 0x00000001e66a0000,
> > 0x00000001f4f90000)
> >  concurrent mark-sweep generation total 19420524K, used 7807614K
> > [0x00000001f4f90000, 0x00000006964eb000, 0x00000007fae00000)
> >  concurrent-mark-sweep perm gen total 51000K, used 30498K
> > [0x00000007fae00000, 0x00000007fdfce000, 0x0000000800000000)
> >
> > Thanks,
> > Rahul
> >
> >
> > On Fri, May 29, 2015 at 8:41 AM, Bryan Beaudreault <
> > bbeaudrea...@hubspot.com
> > > wrote:
> >
> > > >
> > > > 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > > > slept
> > > > 14912ms instead of 3000ms, this is likely due to a long garbage
> > > collecting
> > > > pause and it's usually bad, see
> > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > >
> > >
> > > I would check your gc logs for long gc pauses.
> > >
> > > On Fri, May 29, 2015 at 11:38 AM, rahul malviya <
> > > malviyarahul2...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > In our cluster region server logs are filled with response too slow
> > > > message. This is causing jobs to slow down. How can I debug what is
> the
> > > > reason for this slowness.
> > > >
> > > > We have enabled short circuit reads and region server has 27GB RAM.
> > > >
> > > > Here is a trace when regionserver starts.
> > > >
> > > > Thu Aug 14 20:23:51 GMT 2014 Starting regionserver on nodex
> > > > core file size          (blocks, -c) 0
> > > > data seg size           (kbytes, -d) unlimited
> > > > scheduling priority             (-e) 0
> > > > file size               (blocks, -f) unlimited
> > > > pending signals                 (-i) 966365
> > > > max locked memory       (kbytes, -l) 64
> > > > max memory size         (kbytes, -m) unlimited
> > > > open files                      (-n) 32768
> > > > pipe size            (512 bytes, -p) 8
> > > > POSIX message queues     (bytes, -q) 819200
> > > > real-time priority              (-r) 0
> > > > stack size              (kbytes, -s) 10240
> > > > cpu time               (seconds, -t) unlimited
> > > > max user processes              (-u) 966365
> > > > virtual memory          (kbytes, -v) unlimited
> > > > file locks                      (-x) unlimited
> > > > 2014-08-14 20:23:53,341 WARN org.apache.hadoop.conf.Configuration:
> > > > fs.default.name is deprecated. Instead, use fs.defaultFS
> > > > 2014-08-14 20:23:53,342 WARN org.apache.hadoop.conf.Configuration:
> > > > mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> > > > 2014-08-14 20:23:53,884 WARN org.apache.hadoop.conf.Configuration:
> > > > slave.host.name is deprecated. Instead, use
> > > > mapreduce.tasktracker.host.name
> > > > 2014-08-14 20:24:03,999 WARN org.apache.hadoop.conf.Configuration:
> > > > hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> > > > 2014-08-14 20:26:47,605 ERROR
> > > > org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics:
> > Inconsistent
> > > > configuration. Previous configuration for using table name in
> metrics:
> > > > true, new configuration: false
> > > > 2014-08-14 20:28:23,491 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":18725,"call":"next(-8041903839443097981, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.248:58716
> > > >
> > > >
> > >
> >
> ","starttimems":1408048084720,"queuetimems":0,"class":"HRegionServer","responsesize":5031595,"method":"next"}
> > > > 2014-08-14 21:35:16,740 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > > slept
> > > > 14912ms instead of 3000ms, this is likely due to a long garbage
> > > collecting
> > > > pause and it's usually bad, see
> > > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2014-08-14 21:42:28,477 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":16968,"call":"next(5487686201525374976, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.249:36657
> > > >
> > > >
> > >
> >
> ","starttimems":1408052531504,"queuetimems":0,"class":"HRegionServer","responsesize":1959532,"method":"next"}
> > > > 2014-08-14 21:42:56,923 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":10591,"call":"next(5487686201525374976, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.249:40818
> > > >
> > > >
> > >
> >
> ","starttimems":1408052566327,"queuetimems":1,"class":"HRegionServer","responsesize":2987578,"method":"next"}
> > > > 2014-08-14 21:44:24,372 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":10656,"call":"next(5487686201525374976, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.249:41993
> > > >
> > > >
> > >
> >
> ","starttimems":1408052653710,"queuetimems":1,"class":"HRegionServer","responsesize":3039779,"method":"next"}
> > > > 2014-08-14 21:45:50,598 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":12418,"call":"next(5487686201525374976, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.249:45197
> > > >
> > > >
> > >
> >
> ","starttimems":1408052738174,"queuetimems":10,"class":"HRegionServer","responsesize":2476903,"method":"next"}
> > > > 2014-08-14 21:46:15,187 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":23766,"call":"next(5487686201525374976, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.249:49425
> > > >
> > > >
> > >
> >
> ","starttimems":1408052751414,"queuetimems":0,"class":"HRegionServer","responsesize":5681175,"method":"next"}
> > > > 2014-08-14 21:47:09,041 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":12320,"call":"next(5487686201525374976, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.249:50269
> > > >
> > > >
> > >
> >
> ","starttimems":1408052816698,"queuetimems":1,"class":"HRegionServer","responsesize":2986949,"method":"next"}
> > > > 2014-08-14 21:49:23,833 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":11389,"call":"next(1227841280814011139, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.122:41976
> > > >
> > > >
> > >
> >
> ","starttimems":1408052952415,"queuetimems":0,"class":"HRegionServer","responsesize":3160025,"method":"next"}
> > > > 2014-08-14 21:49:23,869 WARN org.apache.hadoop.ipc.HBaseServer:
> > Exception
> > > > while changing ops : java.nio.channels.CancelledKeyException
> > > > 2014-08-14 21:49:23,900 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":11428,"call":"next(9103372947568217267, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.41:35241
> > > >
> > > >
> > >
> >
> ","starttimems":1408052952469,"queuetimems":0,"class":"HRegionServer","responsesize":1809158,"method":"next"}
> > > > 2014-08-14 21:49:23,902 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":11415,"call":"next(-3120240140302998196, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.195:46046
> > > >
> > > >
> > >
> >
> ","starttimems":1408052952468,"queuetimems":0,"class":"HRegionServer","responsesize":1826929,"method":"next"}
> > > > 2014-08-14 21:49:24,050 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":11438,"call":"next(3799907609071248384, 10), rpc
> > > > version=1, client version=29,
> > methodsFingerPrint=-1368823753","client":"
> > > > 17.170.176.154:42797
> > > >
> > > >
> > >
> >
> ","starttimems":1408052952459,"queuetimems":0,"class":"HRegionServer","responsesize":2628568,"method":"next"}
> > > > 2014-08-14 21:49:24,057 WARN org.apache.hadoop.ipc.HBaseServer:
> > > > (responseTooSlow):
> > > > {"processingtimems":11843,"call":"next(-1679362783893333095, 10), rpc
> > > > version=1, client version=29,
> > > > methodsFingerPrint=-1368823753","client":"17.170
> > > >
> > > > Thanks,
> > > > Rahul
> > > >
> > >
> >
>

Re: Response Too Slow in RegionServer Logs

Reply via email to