Next time this happens, can you take jstack of the region server and pastebin 
it ?

Thanks

On Jul 7, 2014, at 11:06 PM, Rural Hunter <ruralhun...@gmail.com> wrote:

> Hi,
> 
> I'm using hbase-0.96.2. I saw sometimes my region servers don't accept 
> connections from clients. this could last like 10 minutes to half hour. I was 
> not able to connect to the 60020 port even with telnet command when it 
> happened. After a while, the problem disappeared and the region server 
> behaves normally. My region server is configured with 8G heap and the usage 
> always kepes around at 3-5G. I checked the log and don't see error log. The 
> log seems normal:
> 
> 2014-07-08 10:13:26,192 INFO  [MemStoreFlusher.0] 
> regionserver.DefaultStoreFlusher: Flushed, sequenceid=1450565470, 
> memsize=937.9 K, hasBloomFilter=true, into tmp file 
> hdfs://master:9000/hbase/data/default/tbdoc/f7635647bcae9291971843131c4add0b/.tmp/d1a4e8fa945444958141dcdaa07320ab
> 2014-07-08 10:13:26,208 INFO  [MemStoreFlusher.0] regionserver.HStore: Added 
> hdfs://master:9000/hbase/data/default/tbdoc/f7635647bcae9291971843131c4add0b/cf1/d1a4e8fa945444958141dcdaa07320ab,
>  entries=4494, sequenceid=1450565470, filesize=146.8 K
> 2014-07-08 10:13:26,209 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Finished memstore flush of ~957.3 K/980232, currentsize=0/0 for region 
> tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b. in 40ms, 
> sequenceid=1450565470, compaction requested=true
> 2014-07-08 10:13:26,209 INFO 
> [regionserver60020-smallCompactions-1397552664633] regionserver.HRegion: 
> Starting compaction on cf1 in region 
> tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b.
> 2014-07-08 10:13:26,209 INFO 
> [regionserver60020-smallCompactions-1397552664633] regionserver.HStore: 
> Starting compaction of 3 file(s) in cf1 of 
> tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b. into 
> tmpdir=hdfs://master:9000/hbase/data/default/tbdoc/f7635647bcae9291971843131c4add0b/.tmp,
>  totalSize=705.6 K
> 2014-07-08 10:13:26,390 INFO 
> [regionserver60020-smallCompactions-1397552664633] regionserver.HStore: 
> Completed compaction of 3 file(s) in cf1 of 
> tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b. into 
> 69b7c19f59994dffbb236faa03dc1090(size=693.8 K), total size for store is 1.1 
> G. This selection was in queue for 0sec, and took 0sec to execute.
> 2014-07-08 10:13:26,390 INFO 
> [regionserver60020-smallCompactions-1397552664633] 
> regionserver.CompactSplitThread: Completed compaction: Request = 
> regionName=tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b., 
> storeName=cf1, fileCount=3, fileSize=705.6 K, priority=6, 
> time=47043081604802310; duration=0sec
> 2014-07-08 10:29:06,383 INFO  [regionserver60020.periodicFlusher] 
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting 
> flush for region 
> tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
>  after a delay of 10113
> 2014-07-08 10:29:16,383 INFO  [regionserver60020.periodicFlusher] 
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting 
> flush for region 
> tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
>  after a delay of 3456
> 2014-07-08 10:29:16,537 INFO  [MemStoreFlusher.0] 
> regionserver.DefaultStoreFlusher: Flushed, sequenceid=1450574203, 
> memsize=425.0 K, hasBloomFilter=true, into tmp file 
> hdfs://master:9000/hbase/data/default/tbguid/ded050ffeacd83613a0e3db7baaee4f0/.tmp/cabc231951cb473695c3bac045456cee
> 2014-07-08 10:29:16,549 INFO  [MemStoreFlusher.0] regionserver.HStore: Added 
> hdfs://master:9000/hbase/data/default/tbguid/ded050ffeacd83613a0e3db7baaee4f0/cf1/cabc231951cb473695c3bac045456cee,
>  entries=1984, sequenceid=1450574203, filesize=83.0 K
> 2014-07-08 10:29:16,549 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Finished memstore flush of ~429 K/439296, currentsize=0/0 for region 
> tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
>  in 53ms, sequenceid=1450574203, compaction requested=true
> 2014-07-08 10:29:16,550 INFO 
> [regionserver60020-smallCompactions-1397552664633] regionserver.HRegion: 
> Starting compaction on cf1 in region 
> tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
> 2014-07-08 10:29:16,550 INFO 
> [regionserver60020-smallCompactions-1397552664633] regionserver.HStore: 
> Starting compaction of 3 file(s) in cf1 of 
> tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
>  into 
> tmpdir=hdfs://master:9000/hbase/data/default/tbguid/ded050ffeacd83613a0e3db7baaee4f0/.tmp,
>  totalSize=9.0 M
> 2014-07-08 10:29:17,053 INFO 
> [regionserver60020-smallCompactions-1397552664633] regionserver.HStore: 
> Completed compaction of 3 file(s) in cf1 of 
> tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
>  into e923e5ad17c54bf4934ef07e3d0b5496(size=9.0 M), total size for store is 
> 582.3 M. This selection was in queue for 0sec, and took 0sec to execute.
> 
> I suspect it's the gc problem so I checked the gc log and found it kept print 
> logs like this(both when the problem happens or not):
> 2014-07-08T10:26:45.273+0800: 7233757.674: [GC 7233757.674: [ParNew: 
> 112969K->8702K(118016K), 0.0652040 secs] 5262052K->5160850K(8375552K), 
> 0.0653760 secs] [Times: user=0.00 sys=0.00, real=0.07 secs]
> 2014-07-08T10:26:46.836+0800: 7233759.237: [GC 7233759.237: [ParNew: 
> 113662K->6949K(118016K), 0.0342530 secs] 5265810K->5163641K(8375552K), 
> 0.0343870 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-07-08T10:26:48.422+0800: 7233760.823: [GC 7233760.823: [ParNew: 
> 111909K->4855K(118016K), 0.0269810 secs] 5268601K->5165318K(8375552K), 
> 0.0271120 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-07-08T10:26:49.896+0800: 7233762.296: [GC 7233762.296: [ParNew: 
> 109815K->7721K(118016K), 0.0251810 secs] 5270278K->5168184K(8375552K), 
> 0.0253300 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:26:51.334+0800: 7233763.734: [GC 7233763.735: [ParNew: 
> 112681K->8670K(118016K), 0.0233590 secs] 5273144K->5171935K(8375552K), 
> 0.0237200 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-07-08T10:26:52.701+0800: 7233765.102: [GC 7233765.102: [ParNew: 
> 113390K->7880K(118016K), 0.0266700 secs] 5276656K->5175725K(8375552K), 
> 0.0268250 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-07-08T10:26:54.166+0800: 7233766.566: [GC 7233766.566: [ParNew: 
> 112840K->7022K(118016K), 0.0249410 secs] 5280685K->5178575K(8375552K), 
> 0.0250850 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:26:55.586+0800: 7233767.986: [GC 7233767.987: [ParNew: 
> 111982K->6807K(118016K), 0.0317690 secs] 5283535K->5182134K(8375552K), 
> 0.0319390 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-07-08T10:26:57.006+0800: 7233769.407: [GC 7233769.407: [ParNew: 
> 111767K->7417K(118016K), 0.0199740 secs] 5287094K->5185610K(8375552K), 
> 0.0201160 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:26:58.520+0800: 7233770.921: [GC 7233770.921: [ParNew: 
> 112377K->6285K(118016K), 0.0219640 secs] 5290570K->5188051K(8375552K), 
> 0.0220970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:00.077+0800: 7233772.478: [GC 7233772.478: [ParNew: 
> 111245K->8191K(118016K), 0.0215280 secs] 5293011K->5189957K(8375552K), 
> 0.0216740 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:01.571+0800: 7233773.971: [GC 7233773.972: [ParNew: 
> 113151K->8510K(118016K), 0.0185730 secs] 5294917K->5193731K(8375552K), 
> 0.0187220 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:02.958+0800: 7233775.359: [GC 7233775.359: [ParNew: 
> 113470K->9384K(118016K), 0.0252470 secs] 5298691K->5197072K(8375552K), 
> 0.0253900 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:04.416+0800: 7233776.817: [GC 7233776.817: [ParNew: 
> 114344K->6721K(118016K), 0.0203040 secs] 5302032K->5200076K(8375552K), 
> 0.0204510 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:05.891+0800: 7233778.291: [GC 7233778.291: [ParNew: 
> 111681K->6341K(118016K), 0.0434460 secs] 5305036K->5203021K(8375552K), 
> 0.0436020 secs] [Times: user=0.00 sys=0.00, real=0.05 secs]
> 2014-07-08T10:27:07.497+0800: 7233779.898: [GC 7233779.898: [ParNew: 
> 111301K->9838K(118016K), 0.0177230 secs] 5307981K->5206518K(8375552K), 
> 0.0178770 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:08.904+0800: 7233781.305: [GC 7233781.305: [ParNew: 
> 114788K->7575K(118016K), 0.0231210 secs] 5311469K->5209993K(8375552K), 
> 0.0233670 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:10.305+0800: 7233782.706: [GC 7233782.706: [ParNew: 
> 112535K->7616K(118016K), 0.0226150 secs] 5314953K->5213487K(8375552K), 
> 0.0227620 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:11.942+0800: 7233784.343: [GC 7233784.343: [ParNew: 
> 112576K->7835K(118016K), 0.0244290 secs] 5318447K->5217156K(8375552K), 
> 0.0245700 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-07-08T10:27:13.565+0800: 7233785.966: [GC 7233785.966: [ParNew: 
> 112795K->7254K(118016K), 0.0206670 secs] 5322116K->5220093K(8375552K), 
> 0.0207760 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:15.107+0800: 7233787.507: [GC 7233787.508: [ParNew: 
> 112101K->5551K(118016K), 0.0213290 secs] 5324940K->5222345K(8375552K), 
> 0.0214360 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-07-08T10:27:16.818+0800: 7233789.218: [GC 7233789.218: [ParNew: 
> 110511K->8559K(118016K), 0.0326800 secs] 5327305K->5225353K(8375552K), 
> 0.0328540 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 
> I also anaylzed the gc log and found the max pause of gc is only around 2 
> seconds. So I don't see any gc problem. Can anyone point out the possible 
> cause of this problem?
> 

Reply via email to