Hi

We are running OpenTSDB 2.2 with HBase 1.1.2 and are having problems
with RegionServers shutting down sporadically, allegedly because of GC
pauses.

We run 2 OpenTSDB machines and 30 region servers with 8 GB heaps. The
region servers are co-located with data nodes and YARN jobs. Each
region server receives around 1000 req/s.

Even though the logs say it's a GC pause, our monitoring doesn't show
an actual pause. The rather suspicious log line [1] says wal.FSHLog:
Slow sync cost: 56257 ms, just after the GC pause detector has warned
and the region server is aborted. CPU, memory and network look fine.
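
For anyone who wants to line the slow syncs up against GC timings, here
is a minimal sketch of how those lines could be pulled out of a region
server log. The pattern is based only on the log line quoted above; the
function name slow_syncs and the 1000 ms threshold are hypothetical
choices for illustration, not anything HBase provides.

```python
import re

# Pattern taken from the line quoted above:
#   "wal.FSHLog: Slow sync cost: 56257 ms"
SLOW_SYNC = re.compile(r"wal\.FSHLog: Slow sync cost: (\d+) ms")


def slow_syncs(lines, threshold_ms=1000):
    """Return (line_number, cost_ms) for each slow sync at or above threshold_ms.

    threshold_ms is an arbitrary cut-off chosen for this sketch.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = SLOW_SYNC.search(line)
        if match:
            cost_ms = int(match.group(1))
            if cost_ms >= threshold_ms:
                hits.append((number, cost_ms))
    return hits
```

It can be run over a log with something like
slow_syncs(open("hbase.log")), and the resulting line numbers checked
against the timestamps of the pause detector warnings.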

We have had this problem for a long time and have been troubleshooting
thoroughly, but we are still clueless.

Any advice would be helpful.

Cheers,
-Kristoffer

[1] https://www.dropbox.com/s/m2cuutcdh81itay/hbase.log?dl=0