Not sure what's going on here, but NiFi does not require a cert to set up ZooKeeper.
Mike

On Mon, Oct 15, 2018 at 9:39 AM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov> wrote:

Hi Mike and Bryan,

I've installed and started ZooKeeper 3.4.13 and re-started a single NiFi node so far. Here is the error from the NiFi log:

2018-10-15 09:19:48,371 ERROR [Process Cluster Protocol Request-1] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
2018-10-15 09:19:48,425 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 0.0.0.0:8008
2018-10-15 09:19:48,452 ERROR [Process Cluster Protocol Request-2] o.a.nifi.security.util.CertificateUtils The incoming request did not contain client certificates and thus the DN cannot be extracted. Check that the other endpoint is providing a complete client certificate chain
2018-10-15 09:19:48,456 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed marshalling 'CONNECTION_REQUEST' protocol message due to: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate

It is likely extraneous to NiFi, but does this mean that we need to install a cert into ZooKeeper? Right now, both apps are running on the same box.

Thank you.

*From:* Mike Thomsen <mikerthom...@gmail.com>
*Sent:* Monday, October 15, 2018 9:02 AM
*To:* users@nifi.apache.org
*Subject:* Re: NiFi fails on cluster nodes

http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

See the properties that start with "nifi.zookeeper."
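For reference, the "nifi.zookeeper." properties referred to above look roughly like the following when pointing NiFi at an external ZooKeeper. This is a sketch, not a complete configuration; the host names are placeholders:

```properties
# nifi.properties -- external ZooKeeper (placeholder host names)
nifi.state.management.embedded.zookeeper.start=false
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
```

The connect string lists every member of the quorum so NiFi can fail over between them.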
On Mon, Oct 15, 2018 at 8:58 AM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov> wrote:

Mike,

I wonder if you could point me to instructions on how to configure a cluster with an external instance of ZooKeeper? The NiFi Admin Guide talks exclusively about the embedded one.

Thanks again.

*From:* Mike Thomsen <mikerthom...@gmail.com>
*Sent:* Friday, October 12, 2018 10:17 AM
*To:* users@nifi.apache.org
*Subject:* Re: NiFi fails on cluster nodes

It very well could become a problem down the road. The reason ZooKeeper is usually on a dedicated machine is that you want it to have enough resources to always communicate within a quorum, reconcile configuration changes, and feed configuration details to clients.

That particular message is just a warning. From what I can tell, it's telling you that no cluster coordinator has been elected and that NiFi is going to try to do something about it. It's usually a problem with embedded ZooKeeper because, by default, each node points to the ZooKeeper instance it fires up itself.

For a development environment, a VM with 2 GB of RAM and 1-2 CPU cores should be enough to run an external ZooKeeper.

On Fri, Oct 12, 2018 at 9:47 AM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov> wrote:

Thanks, Mike. We will get an external ZooKeeper instance deployed. I guess co-locating it with one of the NiFi nodes shouldn't be an issue, or will it? We are chronically short of hardware. BTW, does the following message in the logs point to some sort of problem with the embedded ZooKeeper?

2018-10-12 08:21:35,838 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2018-10-12 08:21:35,838 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2018-10-12 08:21:42,090 INFO [Curator-Framework-0] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2018-10-12 08:21:42,092 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@17900f5b Connection State changed to SUSPENDED

*From:* Mike Thomsen <mikerthom...@gmail.com>
*Sent:* Friday, October 12, 2018 8:33 AM
*To:* users@nifi.apache.org
*Subject:* Re: NiFi fails on cluster nodes

Also, in a production environment NiFi should have its own dedicated ZooKeeper cluster, to be on the safe side. You should not reuse ZooKeeper quora (e.g. have HBase and NiFi point to the same quorum).

On Fri, Oct 12, 2018 at 8:29 AM Mike Thomsen <mikerthom...@gmail.com> wrote:

Alexander,

I am pretty sure your problem is here: *nifi.state.management.embedded.zookeeper.start=true*

That spins up an embedded ZooKeeper, which is generally intended for local development. For example, HBase provides the same feature, but it is meant to let you test a real HBase client application against a single node of HBase running locally.

What you need to try is these steps:

1. Set up an external ZooKeeper instance (or set up three in a quorum; the count must be odd).
2. Update nifi.properties on each node to use the external ZooKeeper setup.
3. Restart all of them.

See if that works.
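As a sketch of step 1 above: a minimal three-node ZooKeeper quorum shares an identical zoo.cfg like the one below (host names and dataDir are placeholders). Each server additionally needs a `myid` file in its dataDir containing just its own server number (1, 2, or 3):

```properties
# zoo.cfg -- minimal 3-node quorum (placeholder host names)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Port 2181 serves clients (NiFi), 2888 is follower-to-leader traffic, and 3888 is leader election.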
Mike

On Fri, Oct 12, 2018 at 8:13 AM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov> wrote:

*nifi.cluster.node.protocol.port=11443* by default on all nodes; I haven't touched that property. Yesterday, we discovered some issues preventing two of the boxes from communicating. Now, they can talk okay. Ports 11443, 2181 and 3888 are explicitly open in *iptables*, but clustering still doesn't happen. The log files are filled with errors like this:

2018-10-12 07:59:08,494 ERROR [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:728)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:857)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Is there anything else we should check?
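One quick thing to check from each box is raw TCP reachability of the other nodes on every port involved. A small bash sketch (host names are placeholders; note it also probes ZooKeeper's quorum port 2888, which is not in the list of ports opened above):

```shell
#!/usr/bin/env bash
# Probe cluster ports from this node. Host names below are placeholders --
# substitute the actual NiFi/ZooKeeper boxes.

check_port() {
  local host=$1 port=$2
  # /dev/tcp is a bash pseudo-device, so no netcat is required
  if timeout 3 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OPEN ${host}:${port}"
  else
    echo "CLOSED ${host}:${port}"
  fi
}

for host in nifi-node1.example.com nifi-node2.example.com; do
  check_port "$host" 11443   # nifi.cluster.node.protocol.port
  check_port "$host" 2181    # ZooKeeper client port
  check_port "$host" 2888    # ZooKeeper follower/quorum port
  check_port "$host" 3888    # ZooKeeper leader-election port
done
```

Running it on each node in turn shows whether iptables (or anything else in the path) is still blocking a direction of traffic.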
*From:* Nathan Gough <thena...@gmail.com>
*Sent:* Thursday, October 11, 2018 9:12 AM
*To:* users@nifi.apache.org
*Subject:* Re: NiFi fails on cluster nodes

You may also need to explicitly open 'nifi.cluster.node.protocol.port' on all nodes to allow cluster communication for cluster heartbeats, etc.

*From:* ashmeet kandhari <ashmeetkandhar...@gmail.com>
*Reply-To:* <users@nifi.apache.org>
*Date:* Thursday, October 11, 2018 at 9:09 AM
*To:* <users@nifi.apache.org>
*Subject:* Re: NiFi fails on cluster nodes

Hi Alexander,

Can you verify that the three nodes can reach each other (a TCP ping)? Or run NiFi in standalone mode and see if you can ping each node from the other two servers, just to be sure they can communicate with one another.

On Thu, Oct 11, 2018 at 11:49 AM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov> wrote:

How do I do that? The *nifi.properties* file on each node includes *nifi.state.management.embedded.zookeeper.start=true*, so I assume ZooKeeper does start.

*From:* ashmeet kandhari <ashmeetkandhar...@gmail.com>
*Sent:* Thursday, October 11, 2018 4:36 AM
*To:* users@nifi.apache.org
*Subject:* Re: NiFi fails on cluster nodes

Can you see if the ZooKeeper node is up and running and can connect to the NiFi nodes?

On Wed, Oct 10, 2018 at 7:34 PM Saip, Alexander (NIH/CC/BTRIS) [C] <alexander.s...@nih.gov> wrote:

Hello,

We have three NiFi 1.7.1 nodes originally configured as independent instances, each on its own server. There is no firewall between them. When I tried to build a cluster following the instructions here <https://mintopsblog.com/2017/11/12/apache-nifi-cluster-configuration/>, NiFi failed to start on all of them, despite the fact that I even set *nifi.cluster.protocol.is.secure=false* in the *nifi.properties* file on each node.
Here is the error in the log files:

2018-10-10 13:57:07,506 INFO [main] org.apache.nifi.NiFi Launching NiFi...
2018-10-10 13:57:07,745 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Determined default nifi.properties path to be '/opt/nifi-1.7.1/./conf/nifi.properties'
2018-10-10 13:57:07,748 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader Loaded 125 properties from /opt/nifi-1.7.1/./conf/nifi.properties
2018-10-10 13:57:07,755 INFO [main] org.apache.nifi.NiFi Loaded 125 properties
2018-10-10 13:57:07,762 INFO [main] org.apache.nifi.BootstrapListener Started Bootstrap Listener, Listening for incoming requests on port 43744
2018-10-10 13:59:15,056 ERROR [main] org.apache.nifi.NiFi Failure to launch NiFi due to java.net.ConnectException: Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at org.apache.nifi.BootstrapListener.sendCommand(BootstrapListener.java:100)
        at org.apache.nifi.BootstrapListener.start(BootstrapListener.java:83)
        at org.apache.nifi.NiFi.<init>(NiFi.java:102)
        at org.apache.nifi.NiFi.<init>(NiFi.java:71)
        at org.apache.nifi.NiFi.main(NiFi.java:292)
2018-10-10 13:59:15,058 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server...
2018-10-10 13:59:15,059 INFO [Thread-1] org.apache.nifi.NiFi Jetty web server shutdown completed (nicely or otherwise).
Without clustering, the instances had no problem starting. Since this is our first experiment building a cluster, I'm not sure where to look for clues.

Thanks in advance,

Alexander
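For anyone following the thread: the per-node cluster entries in nifi.properties that the discussion keeps circling back to look roughly like this. A sketch with placeholder values, not a complete configuration; each node uses its own address:

```properties
# nifi.properties -- per-node cluster settings (placeholder values)
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-node1.example.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.protocol.is.secure=false
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.zookeeper.connect.string=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```

If nifi.cluster.protocol.is.secure is set to true instead, every node must present a client certificate, which is the source of the bad_certificate errors shown earlier in the thread.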