RE: NiFi fails on cluster nodes

Saip, Alexander (NIH/CC/BTRIS) [C] Mon, 15 Oct 2018 06:39:55 -0700

Hi Mike and Bryan,

So far, I’ve installed and started ZooKeeper 3.4.13 and restarted a single NiFi
node. Here is the error from the NiFi log:


2018-10-15 09:19:48,371 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
2018-10-15 09:19:48,425 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 0.0.0.0:8008
2018-10-15 09:19:48,452 ERROR [Process Cluster Protocol Request-2] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
2018-10-15 09:19:48,456 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate

This is likely outside of NiFi, but does it mean that we need to install a
certificate in ZooKeeper? Right now, both apps are running on the same box.

Thank you.

From: Mike Thomsen <mikerthom...@gmail.com>
Sent: Monday, October 15, 2018 9:02 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

See the properties that start with "nifi.zookeeper."
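To see what a node currently has for those settings, here is a minimal Python
sketch (not a NiFi tool; the function names are made up for illustration) that
lists the "nifi.zookeeper." entries from a nifi.properties file. It assumes the
simple key=value lines of the stock file and does not handle Java-properties
line continuations or escapes.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments.

    Assumption: no line continuations or escaped characters, which is
    enough for a stock nifi.properties file.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def zookeeper_properties(text, prefix="nifi.zookeeper."):
    """Return only the properties whose names start with the given prefix."""
    return {k: v for k, v in parse_properties(text).items() if k.startswith(prefix)}
```

For example, calling zookeeper_properties() on the contents of
/opt/nifi-1.7.1/conf/nifi.properties on each node shows at a glance whether
nifi.zookeeper.connect.string is set, and whether it is identical everywhere.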

On Mon, Oct 15, 2018 at 8:58 AM Saip, Alexander (NIH/CC/BTRIS) [C]
<alexander.s...@nih.gov> wrote:
Mike,

I wonder if you could point me to instructions on how to configure a cluster
with an external ZooKeeper instance? The NiFi Admin Guide talks exclusively
about the embedded one.

Thanks again.

From: Mike Thomsen <mikerthom...@gmail.com>
Sent: Friday, October 12, 2018 10:17 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

It very well could become a problem down the road. ZooKeeper usually runs on
a dedicated machine because you want it to always have enough resources to
communicate within its quorum, reconcile configuration changes, and feed
configuration details to clients.
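The quorum is a strict majority of the ZooKeeper servers in the ensemble. A
small Python sketch of that arithmetic (function names are hypothetical) shows
why ensembles are usually sized with an odd number of servers: going from 3 to
4 servers raises the majority needed from 2 to 3 but still tolerates only one
failed server.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 6):
    print(f"{n} server(s): quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")

# 1 server(s): quorum=1, tolerates 0 failure(s)
# 2 server(s): quorum=2, tolerates 0 failure(s)
# 3 server(s): quorum=2, tolerates 1 failure(s)
# 4 server(s): quorum=3, tolerates 1 failure(s)
# 5 server(s): quorum=3, tolerates 2 failure(s)
```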

That particular message is just a warning. From what I can tell, it's telling
you that no cluster coordinator has been elected and that NiFi is going to try
to do something about that. It's usually a problem with embedded ZooKeeper
because, by default, each node points to the instance of ZooKeeper it fires up.

For a development environment, a VM with 2GB of RAM and 1-2 CPU cores should be 
enough to run an external ZooKeeper.

On Fri, Oct 12, 2018 at 9:47 AM Saip, Alexander (NIH/CC/BTRIS) [C]
<alexander.s...@nih.gov> wrote:
Thanks Mike. We will get an external ZooKeeper instance deployed. I guess
co-locating it with one of the NiFi nodes shouldn’t be an issue, or would it?
We are chronically short of hardware. BTW, does the following message in the
logs point to some sort of problem with the embedded ZooKeeper?

2018-10-12 08:21:35,838 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2018-10-12 08:21:35,838 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2018-10-12 08:21:42,090 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b Connection State changed to SUSPENDED

From: Mike Thomsen <mikerthom...@gmail.com>
Sent: Friday, October 12, 2018 8:33 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

Also, in a production environment NiFi should have its own dedicated ZooKeeper
cluster to be on the safe side. You should not reuse ZooKeeper quorums (e.g.,
have HBase and NiFi point to the same quorum).

On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen
<mikerthom...@gmail.com> wrote:
Alexander,

I am pretty sure your problem is here: 
nifi.state.management.embedded.zookeeper.start=true

That spins up an embedded ZooKeeper, which is generally intended to be used for 
local development. For example, HBase provides the same feature, but it is 
intended to allow you to test a real HBase client application against a single 
node of HBase running locally.

Try these steps:

1. Set up an external ZooKeeper instance (or 3 in a quorum; use an odd number
of servers).
2. Update nifi.properties on each node to use the external ZooKeeper setup.
3. Restart all of them.
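Step 2 mostly comes down to two nifi.properties settings: turning off the
embedded ZooKeeper and pointing nifi.zookeeper.connect.string at the external
ensemble. The Python sketch below rewrites them in the file text; the helper is
hypothetical (not a NiFi tool) and the ZooKeeper host names are placeholders.
It does not touch conf/state-management.xml, whose ZooKeeper state provider has
its own "Connect String" property that should point at the same ensemble.

```python
def set_properties(text, updates):
    """Return nifi.properties text with the given keys set to new values.

    Existing key=value lines are rewritten in place, so comments and
    ordering are preserved; keys that are missing are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for raw in text.splitlines():
        key, sep, _ = raw.partition("=")
        key = key.strip()
        # Only rewrite real (uncommented) assignments of keys we were asked to set.
        if sep and not key.startswith(("#", "!")) and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(raw)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


# Hypothetical external ensemble; replace with your own ZooKeeper hosts.
ZK_HOSTS = ["zk1.example.org:2181", "zk2.example.org:2181", "zk3.example.org:2181"]

EXTERNAL_ZK_SETTINGS = {
    # Stop each node from launching its own embedded ZooKeeper.
    "nifi.state.management.embedded.zookeeper.start": "false",
    # Every node should point at the same comma-separated ensemble.
    "nifi.zookeeper.connect.string": ",".join(ZK_HOSTS),
}
```

For example, set_properties(text, EXTERNAL_ZK_SETTINGS) returns the updated
text, which can be written back to each node's nifi.properties before the
restart in step 3.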

See if that works.

Mike

On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C]
<alexander.s...@nih.gov> wrote:
nifi.cluster.node.protocol.port=11443 by default on all nodes; I haven’t
touched that property. Yesterday, we discovered some issues preventing two of
the boxes from communicating. Now they can talk. Ports 11443, 2181 and 3888
are explicitly open in iptables, but clustering still doesn’t happen. The log
files are filled with errors like this:

2018-10-12 07:59:08,494 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Is there anything else we should check?

From: Nathan Gough <thena...@gmail.com>
Sent: Thursday, October 11, 2018 9:12 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

You may also need to explicitly open ‘nifi.cluster.node.protocol.port’ on all 
nodes to allow cluster communication for cluster heartbeats etc.

From: ashmeet kandhari <ashmeetkandhar...@gmail.com>
Reply-To: <users@nifi.apache.org>
Date: Thursday, October 11, 2018 at 9:09 AM
To: <users@nifi.apache.org>
Subject: Re: NiFi fails on cluster nodes

Hi Alexander,

Can you verify that the 3 nodes can communicate with one another, either with
a TCP ping or by running NiFi in standalone mode on each and checking that you
can reach it from the other 2 servers?
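A "TCP ping" here just means checking that a TCP connection to a given host
and port can be opened. A minimal Python sketch, meant to be run from each
server in turn, could look like this; the host names are placeholders, and the
port list is the one mentioned in this thread (11443 for the cluster protocol,
2181 and 3888 for ZooKeeper).

```python
import socket


def tcp_ping(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


# Hypothetical host names; replace with the three NiFi servers.
NODES = ["nifi-node1.example.org", "nifi-node2.example.org", "nifi-node3.example.org"]
PORTS = [11443, 2181, 3888]


def check_all(nodes=NODES, ports=PORTS):
    """Return (host, port, reachable) for every host/port combination."""
    return [(host, port, tcp_ping(host, port)) for host in nodes for port in ports]
```

A False result means the connection was refused, timed out, or the name did not
resolve; a True result only shows that something is listening on that port and
nothing in between is blocking it.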

On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C]
<alexander.s...@nih.gov> wrote:
How do I do that? The nifi.properties file on each node includes
‘nifi.state.management.embedded.zookeeper.start=true’, so I assume ZooKeeper
does start.
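One quick way to check that a ZooKeeper server is up is its "ruok"
four-letter command, which a healthy server answers with "imok". Below is a
minimal Python sketch, assuming four-letter-word commands are enabled on the
server (newer ZooKeeper releases may require whitelisting them with
4lw.commands.whitelist) and that it listens on the usual client port 2181. The
function name is hypothetical.

```python
import socket


def zookeeper_ruok(host, port=2181, timeout=3.0):
    """Send ZooKeeper's "ruok" command and report whether it answered "imok".

    Assumes four-letter-word commands are enabled on the server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until we have the 4-byte answer or the server closes the socket.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
        return reply == b"imok"
    except OSError:
        # Nothing listening, connection refused, timeout, or name not resolvable.
        return False
```

Running it against each node's client port from the other servers checks both
that ZooKeeper is up and that it is reachable over the network.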

From: ashmeet kandhari <ashmeetkandhar...@gmail.com>
Sent: Thursday, October 11, 2018 4:36 AM
To: users@nifi.apache.org
Subject: Re: NiFi fails on cluster nodes

Can you check whether the ZooKeeper node is up and running and can connect to
the NiFi nodes?

On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C]
<alexander.s...@nih.gov> wrote:
Hello,

We have three NiFi 1.7.1 nodes originally configured as independent instances,
each on its own server. There is no firewall between them. When I tried to
build a cluster following the instructions at
https://mintopsblog.com/2017/11/12/apache-nifi-cluster-configuration/, NiFi
failed to start on all of them, even though I set
nifi.cluster.protocol.is.secure=false in the nifi.properties file on each node.
Here is the error in the log files:

2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
        at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
        at org.apache.nifi.NiFi.<init>(NiFi.java:102)
        at org.apache.nifi.NiFi.<init>(NiFi.java:71)
        at org.apache.nifi.NiFi.main(NiFi.java:292)
2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).

Without clustering, the instances had no problem starting. Since this is our 
first experiment building a cluster, I’m not sure where to look for clues.

Thanks in advance,

Alexander
