Hi all,

I am currently experimenting with running an HA HDFS cluster in Docker containers. I have successfully created an HA cluster using Ubuntu as the base image for the namenode, datanode and journalnodes. The ZooKeeper instance runs on an image built on Alpine Linux and works pretty well.

I then tried running the namenode, datanode and journalnodes with Alpine Linux as the base image. The datanode and journalnodes seem to work fine. However, although the namenodes start correctly, they disconnect from ZooKeeper quite often and transition into neutral mode. As a result, the namenodes stay in "startup mode" forever.

These are the logs from the active namenode:
16/08/25 22:40:28 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
16/08/25 22:40:28 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
16/08/25 22:40:28 INFO fs.TrashPolicyDefault: The configured checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Total number of blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of invalid blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of under-replicated blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of over-replicated blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of blocks being written = 0
16/08/25 22:40:28 INFO hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 687 msec
16/08/25 22:40:28 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
16/08/25 22:40:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: `/tmp': File exists
16/08/25 22:40:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command getfileinfo is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command mkdirs is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command listStatus is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command * is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command setPermission is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x156c3dc679a0003, likely server has closed socket, closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection to server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection established to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid = 0x156c3dc679a0003, negotiated timeout = 5000
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Old node exists: 0a036e733112036e6e311a136d3965646435312d6e6e312e6d39656464353120d43e28d33e
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it.
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Writing znode /mycluster/ns1/ActiveBreadCrumb to indicate that the local node is the most recent active...
16/08/25 22:41:41 INFO ha.ZKFailoverController: Trying to make NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 active...
16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command getfileinfo is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command mkdirs is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command listStatus is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command * is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command setPermission is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:42:28 INFO namenode.FSNamesystem: Roll Edit Log from 172.18.0.8
16/08/25 22:42:28 INFO namenode.FSEditLog: Rolling edit logs
16/08/25 22:42:28 INFO namenode.FSEditLog: Ending log segment 1
16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4 Total time for transactions(ms): 16 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 40 17
16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4 Total time for transactions(ms): 16 Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 50 18
16/08/25 22:42:28 INFO namenode.FileJournalManager: Finalizing edits file /var/lib/hadoop/name/current/edits_inprogress_0000000000000000001 -> /var/lib/hadoop/name/current/edits_0000000000000000001-0000000000000000004
16/08/25 22:42:28 INFO namenode.FSEditLog: Starting log segment at 5

From the standby namenode:
16/08/25 22:40:33 INFO ha.HealthMonitor: Entering state SERVICE_HEALTHY
16/08/25 22:40:33 INFO ha.ZKFailoverController: Local service NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 entered state: SERVICE_HEALTHY
16/08/25 22:40:33 INFO ha.ZKFailoverController: ZK Election indicated that NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
16/08/25 22:40:33 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby state
16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x156c3dc679a0004, likely server has closed socket, closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection to server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection established to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid = 0x156c3dc679a0004, negotiated timeout = 5000
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
16/08/25 22:41:41 INFO ha.ZKFailoverController: ZK Election indicated that NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby state
16/08/25 22:42:28 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:42:28 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@3b1f16f0 expecting start txid #1
16/08/25 22:42:28 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 1
16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 1
16/08/25 22:42:29 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 133 edits # 4 loaded in 0 seconds
16/08/25 22:42:29 INFO ha.EditLogTailer: Loaded 4 edits starting from txid 0
16/08/25 22:44:29 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:44:29 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c92162 expecting start txid #5
16/08/25 22:44:29 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 5
16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 5
16/08/25 22:44:29 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:44:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 4
16/08/25 22:46:29 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:46:29 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@1f890896 expecting start txid #7
16/08/25 22:46:29 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 7
16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 7
16/08/25 22:46:29 INFO namenode.FSImage: Edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:46:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 6
16/08/25 22:48:29 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:48:29 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4da07bde expecting start txid #9
16/08/25 22:48:29 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 9
16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 9
16/08/25 22:48:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:48:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 8
16/08/25 22:50:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:50:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@350cea58 expecting start txid #11
16/08/25 22:50:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 11
16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 11
16/08/25 22:50:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:50:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 10
16/08/25 22:52:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:52:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6a6c8f51 expecting start txid #13
16/08/25 22:52:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 13
16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 13
16/08/25 22:52:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:52:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 12
16/08/25 22:54:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:54:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@8212238 expecting start txid #15
16/08/25 22:54:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 15
16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 15
16/08/25 22:54:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:54:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 14
16/08/25 22:56:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:56:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@58d1e7d8 expecting start txid #17
16/08/25 22:56:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 17
16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 17
16/08/25 22:56:30 INFO namenode.FSImage: Edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:56:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 16
16/08/25 22:58:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:58:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@791c80ff expecting start txid #19
16/08/25 22:58:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 19
16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 19
16/08/25 22:58:30 INFO namenode.FSImage: Edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:58:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 18

Any ideas where I can start debugging this? It seems to be a problem specific to the ZooKeeper failover controller (ZKFC). I tried this Docker image, which starts a standalone cluster with 2 namenodes, datanodes and journal nodes but no ZooKeeper failover controller, and it works correctly on Alpine Linux: https://github.com/gustavonalle/yarn-docker
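
One rough way to quantify the drops is a small Python sketch like this (a hypothetical helper, `zk_outages.py`, not part of Hadoop). It scans a NameNode/ZKFC log for the `ActiveStandbyElector` "Session disconnected" / "Session connected" messages quoted above, reports how long each outage lasted, and lists the `negotiated timeout` values logged by `ClientCnxn`. It assumes the `yy/MM/dd HH:mm:ss` timestamps seen in these logs and only matches the exact message text shown here.

```python
"""Hypothetical helper (zk_outages.py): summarize ZKFC ZooKeeper session drops
found in a Hadoop NameNode/ZKFC log. Not part of Hadoop."""
import re
import sys
from datetime import datetime

# Assumption: default Hadoop log4j timestamp layout as in the logs above, "yy/MM/dd HH:mm:ss".
TS = r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
DISCONNECTED = re.compile(TS + r" INFO ha\.ActiveStandbyElector: Session disconnected")
CONNECTED = re.compile(TS + r" INFO ha\.ActiveStandbyElector: Session connected")
TIMEOUT = re.compile(r"negotiated timeout = (\d+)")


def parse_ts(stamp):
    """Convert a log timestamp such as '16/08/25 22:41:40' to a datetime."""
    return datetime.strptime(stamp, "%y/%m/%d %H:%M:%S")


def negotiated_timeouts(log_text):
    """Return every ZooKeeper session timeout (ms) reported by the client."""
    return [int(ms) for ms in TIMEOUT.findall(log_text)]


def zk_outages(log_text):
    """Pair each 'Session disconnected' with the next 'Session connected'.

    Searches the whole text rather than line by line, so it also works on
    logs whose entries were flattened onto one line. A 'connected' with no
    preceding drop (e.g. the first connection at startup) is skipped.
    Returns a list of (disconnected_at, reconnected_at, seconds_down) tuples.
    """
    events = [(m.start(), "down", parse_ts(m.group(1))) for m in DISCONNECTED.finditer(log_text)]
    events += [(m.start(), "up", parse_ts(m.group(1))) for m in CONNECTED.finditer(log_text)]
    events.sort(key=lambda e: e[0])  # restore the order the entries appear in the log
    outages, down_at = [], None
    for _, kind, stamp in events:
        if kind == "down":
            down_at = stamp
        elif down_at is not None:
            outages.append((down_at, stamp, (stamp - down_at).total_seconds()))
            down_at = None
    return outages


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            text = f.read()
        print(f"{path}: negotiated ZK timeouts (ms): {negotiated_timeouts(text)}")
        for down, up, secs in zk_outages(text):
            print(f"  session down {down:%H:%M:%S} -> up {up:%H:%M:%S} ({secs:.0f}s)")
```

Usage, assuming the two NameNode logs were saved as `nn1.log` and `nn2.log` (hypothetical names): `python zk_outages.py nn1.log nn2.log`. On the excerpts above it should report a one-second drop starting at 22:41:40 on both NameNodes, with a negotiated timeout of 5000 ms.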

Cheers,
Francis
