Hi
Almost every night the HBase master shuts down. In the logs I can see:
gc.log:
2017-03-23T01:59:27.239+0200: 41752.366: [GC (Allocation Failure)
2017-03-23T01:59:27.239+0200: 41752.366: [ParNew:
159203K->11611K(166464K), 0.0115189 secs] 177260K->29669K(536512K),
0.0117362 secs] [Times: user=0.08 sys=0.00, real=0.01 secs]
Heap
par new generation total 166464K, used 137930K [0x00000000c0000000,
0x00000000cb4a0000, 0x00000000d5550000)
eden space 147968K, 85% used [0x00000000c0000000,
0x00000000c7b5b8b8, 0x00000000c9080000)
from space 18496K, 62% used [0x00000000ca290000, 0x00000000cade6fa8,
0x00000000cb4a0000)
to space 18496K, 0% used [0x00000000c9080000, 0x00000000c9080000,
0x00000000ca290000)
concurrent mark-sweep generation total 370048K, used 18057K
[0x00000000d5550000, 0x00000000ebeb0000, 0x0000000100000000)
Metaspace used 55061K, capacity 56096K, committed 56400K,
reserved 1099776K
class space used 5899K, capacity 6255K, committed 6264K, reserved
1048576K
master.log:
2017-03-23 02:02:09,178 WARN
[master/nn3/192.168.80.51:16000-EventThread]
client.ConnectionManager$HConnectionImplementation: This client just
lost it's session with ZooKeeper, closing it. It will be recreated next
time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2017-03-23 02:02:10,579 FATAL [main-EventThread] master.HMaster: Master
server abort: loaded coprocessors are:
[org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor,
org.apache.hadoop.hbase.backup.master.BackupController,
org.apache.hadoop.hbase.security.visibility.VisibilityController]
2017-03-23 02:02:10,857 FATAL [main-EventThread] master.HMaster:
master:16000-0x15adbb9b9db078a,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181, baseZNode=/hbase-unsecure
master:16000-0x15adbb9b9db078a received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2017-03-23 02:02:10,090 INFO [main-SendThread(nn3:2181)]
zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session
0x15adbb9b9db078a has expired, closing socket connection
2017-03-23 02:02:09,181 WARN [nn3:16000.activeMasterManager-EventThread]
client.ConnectionManager$HConnectionImplementation: This client just
lost it's session with ZooKeeper, closing it. It will be recreated next
time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:585)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:517)
at org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:534)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2017-03-23 02:02:10,894 INFO [nn3:16000.activeMasterManager-EventThread]
client.ConnectionManager$HConnectionImplementation: Closing zookeeper
sessionid=0x25adbb9ba62075d
2017-03-23 02:02:10,894 INFO [nn3:16000.activeMasterManager-EventThread]
zookeeper.ClientCnxn: EventThread shut down
2017-03-23 02:02:10,876 INFO
[master/nn3/192.168.80.51:16000-EventThread]
client.ConnectionManager$HConnectionImplementation: Closing zookeeper
sessionid=0x25adbb9ba62075c
2017-03-23 02:02:10,897 INFO
[master/nn3/192.168.80.51:16000-EventThread] zookeeper.ClientCnxn:
EventThread shut down
2017-03-23 02:02:10,925 INFO [main-EventThread]
regionserver.HRegionServer: STOPPED: master:16000-0x15adbb9b9db078a,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181, baseZNode=/hbase-unsecure
master:16000-0x15adbb9b9db078a received expired from ZooKeeper, aborting
2017-03-23 02:02:10,935 INFO [main-EventThread] zookeeper.ClientCnxn:
EventThread shut down
2017-03-23 02:02:11,005 INFO [master/nn3/192.168.80.51:16000]
regionserver.HRegionServer: Stopping infoServer
2017-03-23 02:02:11,624 INFO
[nn3,16000,1490185417271_splitLogManager__ChoreService_1]
master.SplitLogManager$TimeoutMonitor: Chore: SplitLogManager Timeout
Monitor was stopped
2017-03-23 02:02:11,628 WARN [nn3,16000,1490185417271_ChoreService_1]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/backup-masters
2017-03-23 02:02:12,104 INFO [master/nn3/192.168.80.51:16000]
mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2017-03-23 02:02:12,286 INFO [master/nn3/192.168.80.51:16000]
procedure2.ProcedureExecutor: Stopping the procedure executor
2017-03-23 02:02:12,336 INFO [master/nn3/192.168.80.51:16000]
wal.WALProcedureStore: Stopping the WAL Procedure Store
2017-03-23 02:02:13,044 WARN [nn3,16000,1490185417271_ChoreService_1]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/backup-masters
2017-03-23 02:02:14,497 INFO [master/nn3/192.168.80.51:16000]
regionserver.HRegionServer: stopping server nn3,16000,1490185417271
2017-03-23 02:02:14,514 INFO [master/nn3/192.168.80.51:16000]
regionserver.HRegionServer: stopping server nn3,16000,1490185417271; all
regions closed.
2017-03-23 02:02:14,532 INFO [master/nn3/192.168.80.51:16000]
hbase.ChoreService: Chore service for: nn3,16000,1490185417271 had
[[ScheduledChore: Name: CatalogJanitor-nn3:16000 Period: 300000 Unit:
MILLISECONDS], [ScheduledChore: Name: LogsCleaner Period: 60000 Unit:
MILLISECONDS], [ScheduledChore: Name:
nn3,16000,1490185417271-ExpiredMobFileCleanerChore Period: 86400 Unit:
SECONDS], [ScheduledChore: Name:
nn3,16000,1490185417271-MobCompactionChore Period: 604800 Unit:
SECONDS], [ScheduledChore: Name:
nn3,16000,1490185417271-ClusterStatusChore Period: 60000 Unit:
MILLISECONDS], [ScheduledChore: Name:
nn3,16000,1490185417271-BalancerChore Period: 300000 Unit:
MILLISECONDS], [ScheduledChore: Name: HFileCleaner Period: 60000 Unit:
MILLISECONDS], [ScheduledChore: Name:
nn3,16000,1490185417271-RegionNormalizerChore Period: 1800000 Unit:
MILLISECONDS]] on shutdown
2017-03-23 02:02:14,630 INFO [master/nn3/192.168.80.51:16000]
master.MasterMobCompactionThread: Waiting for Mob Compaction Thread to
finish...
2017-03-23 02:02:14,644 INFO [master/nn3/192.168.80.51:16000]
master.MasterMobCompactionThread: Waiting for Region Server Mob
Compaction Thread to finish...
2017-03-23 02:02:14,671 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:02:15,684 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:02:17,684 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:02:21,685 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:02:29,685 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:02:45,686 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:03:17,686 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:04:21,686 WARN [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181,
exception=org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
2017-03-23 02:04:21,687 ERROR [master/nn3/192.168.80.51:16000]
zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 7 attempts
2017-03-23 02:04:21,687 WARN [master/nn3/192.168.80.51:16000]
zookeeper.ZKUtil: master:16000-0x15adbb9b9db078a,
quorum=bigdata33:2181,bigdata36:2181,nn3:2181, baseZNode=/hbase-unsecure
Unable to get data of znode /hbase-unsecure/master
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase-unsecure/master
...
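The spacing of the "Session expired for /hbase-unsecure/master" warnings above looks like a doubling back-off between retries. A minimal sketch to check that, with the timestamps copied by hand from the master.log excerpt; the names `RETRY_WARNINGS`, `parse` and `gaps_seconds` are made up for this illustration and are not part of HBase:

```python
from datetime import datetime

# Timestamps of the RecoverableZooKeeper warnings for /hbase-unsecure/master,
# copied by hand from the master.log excerpt in this post.
RETRY_WARNINGS = [
    "2017-03-23 02:02:14,671",
    "2017-03-23 02:02:15,684",
    "2017-03-23 02:02:17,684",
    "2017-03-23 02:02:21,685",
    "2017-03-23 02:02:29,685",
    "2017-03-23 02:02:45,686",
    "2017-03-23 02:03:17,686",
    "2017-03-23 02:04:21,686",
]


def parse(stamp):
    """Parse a log4j-style timestamp such as '2017-03-23 02:02:14,671'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def gaps_seconds(stamps):
    """Return the gap between consecutive timestamps, rounded to whole seconds."""
    times = [parse(s) for s in stamps]
    return [round((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


gaps = gaps_seconds(RETRY_WARNINGS)
print(gaps)  # [1, 2, 4, 8, 16, 32, 64]
```

So the master seems to spend a bit over two minutes retrying against a session that has already expired before it gives up with "getData failed after 7 attempts".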
hbase-site.xml:
<configuration>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
<property>
<name>hbase.bulkload.staging.dir</name>
<value>/apps/hbase/staging</value>
</property>
<property>
<name>hbase.client.keyvalue.maxsize</name>
<value>1048576</value>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>35</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>100</value>
</property>
<property>
<name>hbase.client.scanner.timeout.period</name>
<value>600000</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.visibility.VisibilityController,org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.visibility.VisibilityController,org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint,org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>
<property>
<name>hbase.coprocessor.regionserver.classes</name>
<value>org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor</value>
</property>
<property>
<name>hbase.hregion.majorcompaction</name>
<value>604800000</value>
</property>
<property>
<name>hbase.hregion.majorcompaction.jitter</name>
<value>0.50</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>10737418240</value>
</property>
<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>4</value>
</property>
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>134217728</value>
</property>
<property>
<name>hbase.hregion.memstore.mslab.enabled</name>
<value>true</value>
</property>
<property>
<name>hbase.hstore.blockingStoreFiles</name>
<value>10</value>
</property>
<property>
<name>hbase.hstore.compaction.max</name>
<value>10</value>
</property>
<property>
<name>hbase.hstore.compactionThreshold</name>
<value>3</value>
</property>
<property>
<name>hbase.local.dir</name>
<value>${hbase.tmp.dir}/local</value>
</property>
<property>
<name>hbase.master.info.bindAddress</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hbase.master.info.port</name>
<value>16010</value>
</property>
<property>
<name>hbase.master.loadbalance.bytable</name>
<value>true</value>
</property>
<property>
<name>hbase.master.port</name>
<value>16000</value>
</property>
<property>
<name>hbase.master.ui.readonly</name>
<value>false</value>
</property>
<property>
<name>hbase.regionserver.global.memstore.size</name>
<value>0.4</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>30</value>
</property>
<property>
<name>hbase.regionserver.info.port</name>
<value>16030</value>
</property>
<property>
<name>hbase.regionserver.port</name>
<value>16020</value>
</property>
<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.WALCellCodec</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://nn3:8020/apps/hbase/data</value>
</property>
<property>
<name>hbase.rpc.protection</name>
<value>authentication</value>
</property>
<property>
<name>hbase.rpc.timeout</name>
<value>90000</value>
</property>
<property>
<name>hbase.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.superuser</name>
<value>hbase</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/tmp/hbase-${user.name}</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>bigdata33,bigdata36,nn3</value>
</property>
<property>
<name>hbase.zookeeper.useMulti</name>
<value>true</value>
</property>
<property>
<name>hfile.block.cache.size</name>
<value>0.4</value>
</property>
<property>
<name>hfile.format.version</name>
<value>3</value>
</property>
<property>
<name>phoenix.query.timeoutMs</name>
<value>60000</value>
</property>
<property>
<name>replication.executor.workers</name>
<value>2</value>
</property>
<property>
<name>replication.sleep.before.failover</name>
<value>60000</value>
</property>
<property>
<name>zookeeper.recovery.retry</name>
<value>6</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
</property>
<property>
<name>zookeeper.znode.parent</name>
<value>/hbase-unsecure</value>
</property>
<property>
<name>zookeeper.znode.replication</name>
<value>replication</value>
</property>
<property>
<name>zookeeper.znode.replication.peers</name>
<value>peers</value>
</property>
<property>
<name>zookeeper.znode.replication.peers.state</name>
<value>peer-state</value>
</property>
<property>
<name>zookeeper.znode.replication.rs</name>
<value>rs</value>
</property>
</configuration>
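Since the file is long, the ZooKeeper-related settings can be pulled out on their own. A rough sketch, assuming the usual Hadoop-style `<property><name>/<value>` layout shown above; `SAMPLE` is a trimmed copy with values taken from this post, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# Trimmed sample in the same layout as the hbase-site.xml in this post;
# the values are copied from it, the rest of the file is left out.
SAMPLE = """<configuration>
<property>
<name>hbase.master.port</name>
<value>16000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>bigdata33,bigdata36,nn3</value>
</property>
<property>
<name>zookeeper.recovery.retry</name>
<value>6</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
</property>
</configuration>"""


def load_properties(xml_text):
    """Map each <name> to its <value> for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}


def zookeeper_settings(props):
    """Keep only the properties whose name mentions ZooKeeper."""
    return {k: v for k, v in sorted(props.items()) if "zookeeper" in k}


props = load_properties(SAMPLE)
zk = zookeeper_settings(props)
for name, value in zk.items():
    print(f"{name} = {value}")
```

To run it against the real file, `ET.parse(path).getroot()` could replace `ET.fromstring(...)`; the path depends on the installation.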
Any hints?
--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
https://www.facebook.com/allan.tuuring
+372 51 48 780