Hi Jeremy,

The issue we are facing is that we need to keep nifi.web.http.host blank in order to have a working swarm setup, but this conflicts with the way NiFi does cluster communication. Let me try to explain:
I have 2 NiFi instances (cluster nodes) in a Docker swarm, connected to ZooKeeper (also running in the swarm):

- stack1_nifi1 running on port 8080 on centos-a
- stack1_nifi2 running on port 8085 on centos-b

(stack1_nifi1 and stack1_nifi2 are swarm service names and are made available in the Docker network via DNS.)

My NiFi config:

    # Leave blank so that it binds to all possible interfaces
    nifi.web.http.host=
    # 8080 here; 8085 on the other node
    nifi.web.http.port=8080
    nifi.cluster.is.node=true
    # Define the cluster node (hostname) address to uniquely identify this node.
    # stack1_nifi1 here; stack1_nifi2 on the other node
    nifi.cluster.node.address=stack1_nifi1
    nifi.cluster.node.protocol.port=10001

In the NiFi logs I notice this:

    2017-03-17 11:44:45,298 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at stack1_nifi2:10001; will use this address for sending heartbeat messages
    2017-03-17 11:44:45,433 INFO [Process Cluster Protocol Request-1] o.a.n.c.c.flow.PopularVoteFlowElection Vote cast by localhost:8085; this flow now has 1 votes

In the first line the cluster node address is used, but in the second one it seems the nifi.web.http.host is used. So the nodeIds are not built from nifi.cluster.node.address, but seem to fall back to the empty nifi.web.http.host entry (which defaults to localhost). The same thing can be seen here:

    2017-03-17 11:44:50,517 INFO [main] o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from {localhost:8080=NodeConnectionStatus[nodeId=localhost:8080, state=CONNECTING, updateId=3], localhost:8085=NodeConnectionStatus[nodeId=localhost:8085, state=CONNECTING, updateId=5]} to {localhost:8080=NodeConnectionStatus[nodeId=localhost:8080, state=CONNECTING, updateId=3], localhost:8085=NodeConnectionStatus[nodeId=localhost:8085, state=CONNECTING, updateId=5]}

Shouldn't NiFi always use the nifi.cluster.node.address to generate the nodeIds?
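To make the behavior in those logs concrete, here is a small sketch (this is NOT NiFi's actual source code, just an illustration of the nodeId behavior the log lines suggest): the identifier appears to be derived from the web host/port, falling back to "localhost" when nifi.web.http.host is blank, rather than using nifi.cluster.node.address.

```python
# Illustration only -- models the nodeId fallback observed in the logs,
# not NiFi's real implementation.

def node_id(web_http_host: str, web_http_port: int) -> str:
    """Build a nodeId the way the logs suggest NiFi does:
    from the web host/port, defaulting to localhost when the
    configured host is left blank."""
    host = web_http_host.strip() or "localhost"
    return f"{host}:{web_http_port}"

# With nifi.web.http.host left blank on both nodes (our config):
print(node_id("", 8080))  # localhost:8080 -- matches the log output
print(node_id("", 8085))  # localhost:8085 -- matches the log output

# What we would want is for the cluster node address to be used instead:
print(node_id("stack1_nifi1", 8080))  # stack1_nifi1:8080
```

This is why both nodes show up as localhost:8080 and localhost:8085 in the NodeClusterCoordinator status map, even though each has a distinct nifi.cluster.node.address.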
It should also use that setting to send replication requests, I guess:

    2017-03-10 06:03:59,014 WARN [Replicate Request Thread-7] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/current-user to localhost:8085 due to {}

My NiFi cluster seems to be up and running (I see heartbeats going back and forth), but I cannot access the UI due to the replicate error above. The NiFi instance running on centos-a:8080 is trying to send a request to localhost:8085, where it should go to centos-b:8085 (and in order to do that, it should use the nifi.cluster.node.address).

Jeremy Dyer wrote
> Raf - Ok so good news and bad news. Good news its working for me. Bad news
> its working for me =) Here is the complete list of things that I changed.
> Hopefully this can at least really help narrow down what is causing the
> issue.
>
> - I ran on a single machine. All that was available to me while at the
> airport.
> - I added a "network" section to the end of the docker-compose.yml file. I
> think you might already have that and this was just a snippet in your
> gist?
> - I removed the COPY from the Dockerfile around the custom processors
> since I don't have those.
>
> In my mind the most likely issue is something around Docker swarm
> networking.

--
View this message in context: http://apache-nifi-users-list.2361937.n4.nabble.com/Nifi-1-1-0-cluster-on-Docker-Swarm-tp1229p1266.html
Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
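P.S. For anyone following along: the "network" section Jeremy mentions adding to docker-compose.yml would look something like the sketch below. This is a hypothetical fragment under standard Compose syntax (service and network names are illustrative and not from Jeremy's actual file), not his exact setup:

```yaml
# Hypothetical docker-compose.yml fragment: attach both NiFi services
# to a shared overlay network so swarm DNS can resolve the service names.
version: "3"
services:
  nifi1:
    image: apache/nifi   # image name assumed, not confirmed in the thread
    ports:
      - "8080:8080"
    networks:
      - nifi-net
  nifi2:
    image: apache/nifi
    ports:
      - "8085:8085"
    networks:
      - nifi-net
networks:
  nifi-net:
    driver: overlay      # overlay driver for multi-host swarm networking
```

With a shared overlay network like this, stack1_nifi1 and stack1_nifi2 can resolve each other by service name, which is what the nifi.cluster.node.address values above rely on.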