Hi Jeremy,

The issue we are facing is that we need to keep nifi.web.http.host blank
in order to have a working swarm setup, but this conflicts with the way NiFi
does cluster communication. Let me try to explain:

I have two NiFi instances (cluster nodes) in a Docker swarm, connected to
ZooKeeper (also running in the Docker swarm).

- stack1_nifi1 running on port 8080 on centos-a
- stack1_nifi2 running on port 8085 on centos-b

(stack1_nifi1 and stack1_nifi2 are swarm service names and are resolvable
inside the Docker network via DNS).
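
For reference, the stack is deployed roughly like this (the image tag,
ZooKeeper service definition, and network name below are illustrative — only
the service names and published ports match my actual setup):

```yaml
version: '3'
services:
  nifi1:
    image: apache/nifi:1.1.0     # illustrative tag
    ports:
      - "8080:8080"
  nifi2:
    image: apache/nifi:1.1.0
    ports:
      - "8085:8085"
  zookeeper:
    image: zookeeper:3.4         # illustrative
networks:
  default:
    driver: overlay
```

Deployed as `stack1`, so the services get the DNS names stack1_nifi1,
stack1_nifi2 and stack1_zookeeper on the overlay network.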

My NiFi config:

# Leave blank so that it binds to all possible interfaces
nifi.web.http.host=
nifi.web.http.port=8080  #(8085 on the other node)

nifi.cluster.is.node=true
# Define the cluster node (hostname) address to uniquely identify this node.
nifi.cluster.node.address=stack1_nifi1 #(stack1_nifi2 on the other node)
nifi.cluster.node.protocol.port=10001


In the NiFi logs I notice this:

2017-03-17 11:44:45,298 INFO [main]
o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster
Coordinator is located at stack1_nifi2:10001; will use this address for
sending heartbeat messages
2017-03-17 11:44:45,433 INFO [Process Cluster Protocol Request-1]
o.a.n.c.c.flow.PopularVoteFlowElection Vote cast by localhost:8085; this
flow now has 1 votes

In the first line the cluster node address is used, but in the second it
seems the nifi.web.http.host is used. So the nodeIds are not built from
nifi.cluster.node.address; instead they seem to fall back to the empty
nifi.web.http.host entry, which defaults to localhost.


The same thing can be seen here:

2017-03-17 11:44:50,517 INFO [main] o.a.n.c.c.node.NodeClusterCoordinator
Resetting cluster node statuses from
{localhost:8080=NodeConnectionStatus[nodeId=localhost:8080,
state=CONNECTING, updateId=3],
localhost:8085=NodeConnectionStatus[nodeId=localhost:8085, state=CONNECTING,
updateId=5]} to {localhost:8080=NodeConnectionStatus[nodeId=localhost:8080,
state=CONNECTING, updateId=3],
localhost:8085=NodeConnectionStatus[nodeId=localhost:8085, state=CONNECTING,
updateId=5]}

Shouldn't NiFi always use nifi.cluster.node.address to generate the
nodeIds?

I would guess it should also use that setting when sending replication requests:

2017-03-10 06:03:59,014 WARN [Replicate Request Thread-7]
o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET
/nifi-api/flow/current-user to localhost:8085 due to {}

My NiFi cluster seems to be up and running (I see heartbeats going back and
forth), but I cannot access the UI because of the replication error above.

The NiFi instance running on centos-a:8080 is trying to send a request to
localhost:8085, when it should go to centos-b:8085. (In order to do that, it
should use the nifi.cluster.node.address.)
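
If I understand the behaviour correctly, the only workaround would be to put
the swarm service name into nifi.web.http.host as well, something like the
sketch below (untested on my side, and exactly what I'd like to avoid, since a
non-blank host presumably stops NiFi from binding to all interfaces, which is
what the swarm setup needs):

# Workaround sketch: identify the node by its swarm DNS name
# (stack1_nifi2 / 8085 on the other node)
nifi.web.http.host=stack1_nifi1
nifi.web.http.port=8080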





Jeremy Dyer wrote
> Raf - Ok so good news and bad news. Good news its working for me. Bad news
> its working for me =) Here is the complete list of things that I changed.
> Hopefully this can at least really help narrow down what is causing the
> issue.
> 
> - I ran on a single machine. All that was available to me while at the
> airport.
> - I added a "network" section to the end of the docker-compose.yml file. I
> think you might already have that and this was just a snippet in your
> gist?
> - I removed the COPY from the Dockerfile around the custom processors
> since
> I don't have those.
> 
> In my mind the most likely issue is something around Docker swarm
> networking.





--
View this message in context: 
http://apache-nifi-users-list.2361937.n4.nabble.com/Nifi-1-1-0-cluster-on-Docker-Swarm-tp1229p1266.html
Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
