After I explicitly opened ports 2181 and 3888 on all the nodes, the NiFi instances start and run, but apparently, there is still no communication between them. Here is what gets written over and over in the nifi-app.log files:
2018-10-11 08:16:53,074 INFO [main] o.a.nifi.groups.StandardProcessGroup Template[id=f8a45adb-e68f-46c5-b627-4c9805ba74e7] added to StandardProcessGroup[identifier=31f52f8c-015d-1000-05e9-6fe2f3320429]
2018-10-11 08:16:53,080 INFO [main] o.a.nifi.groups.StandardProcessGroup Template[id=63489abd-fb73-4d26-9814-48e40511d77d] added to StandardProcessGroup[identifier=31f52f8c-015d-1000-05e9-6fe2f3320429]
2018-10-11 08:16:53,162 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2018-10-11 08:16:53,512 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 0.0.0.0:8008
2018-10-11 08:17:00,781 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2018-10-11 08:17:00,781 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2018-10-11 08:17:05,802 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2018-10-11 08:17:05,804 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@4240468b Connection State changed to SUSPENDED
2018-10-11 08:17:05,804 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Is there anything else I missed?

From: Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov>
Sent: Thursday, October 11, 2018 6:50 AM
To: users@nifi.apache.org
Subject: RE: NiFi fails on cluster nodes

How do I do that? The nifi.properties file on each node includes ‘nifi.state.management.embedded.zookeeper.start=true’, so I assume ZooKeeper does start.

From: ashmeet kandhari <ashmeetkandhar...@gmail.com>
Sent: Thursday, October 11, 2018 4:36 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

Can you check whether the ZooKeeper node is up and running and can connect to the NiFi nodes?

On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov> wrote:

Hello,

We have three NiFi 1.7.1 nodes originally configured as independent instances, each on its own server. There is no firewall between them.
When I tried to build a cluster following instructions here <https://mintopsblog.com/2017/11/12/apache-nifi-cluster-configuration/>, NiFi failed to start on all of them, despite the fact that I even set nifi.cluster.protocol.is.secure=false in the nifi.properties file on each node. Here is the error in the log files:

2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at java.net.Socket.connect(Socket.java:538)
    at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
    at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
    at org.apache.nifi.NiFi.<init>(NiFi.java:102)
    at org.apache.nifi.NiFi.<init>(NiFi.java:71)
    at org.apache.nifi.NiFi.main(NiFi.java:292)
2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).

Without clustering, the instances had no problem starting. Since this is our first experiment building a cluster, I’m not sure where to look for clues.

Thanks in advance,
Alexander
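
On the question in this thread of how to check whether ZooKeeper is up and reachable from the NiFi nodes, one possible approach is a small TCP probe run from each server against the others. The sketch below is only an illustration, not an official NiFi or ZooKeeper tool. The host names are hypothetical placeholders, and the port list assumes the conventional ZooKeeper layout (2181 for clients, 2888 and 3888 from server.N=host:2888:3888 entries); the real values are whatever zookeeper.properties on each node says. The "ruok" check assumes the ZooKeeper version in use still answers four-letter commands on the client port.

```python
import socket

# Hypothetical host names; replace with the three NiFi servers.
NODES = ["nifi-node1", "nifi-node2", "nifi-node3"]

# Assumed conventional ZooKeeper ports: 2181 (client), 2888 (quorum),
# 3888 (leader election). Normally only the current leader listens on the
# quorum port, so "unreachable" on 2888 for a follower is expected.
ZK_PORTS = [2181, 2888, 3888]


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def zk_ruok(host, port=2181, timeout=3.0):
    """Send ZooKeeper's 'ruok' command; a running server replies 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(16) == b"imok"
    except OSError:
        return False


def report(nodes=NODES, ports=ZK_PORTS):
    """Print the reachability of every node/port pair and the ruok result."""
    for host in nodes:
        for port in ports:
            state = "open" if port_open(host, port) else "unreachable"
            print(f"{host}:{port} {state}")
        answer = "imok" if zk_ruok(host) else "no answer"
        print(f"{host} ruok -> {answer}")
```

Calling report() on each of the three servers in turn shows which node/port pairs accept connections from that server. A port that is open from the node itself but unreachable from the other two would point at a bind-address or network problem rather than at ZooKeeper failing to start.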