If restarting is helping you then generally it should be easily found.  Is
mem usage spiking?  Are thread dumps revealing?

On Tue, Aug 17, 2021 at 6:12 AM Ryan Hendrickson <
ryan.andrew.hendrick...@gmail.com> wrote:

> We've rolled back to 1.11.4 in a couple scenarios.  We have also setup a
> cron job to restart 1.13.2 nodes once a day.
>
> Ryan
>
> On Tue, Aug 17, 2021 at 1:25 AM Axel Schwarz <axelkop...@emailn.de> wrote:
>
>> Hey Ryan,
>>
>> that sounds awefully familiar. What we successfully battled so far is the
>> load balancing problem.
>> You can find the whole plot of this drama in the mailing list archive,
>> title is "No Load Balancing since 1.13.2"
>>
>> Of course I will keep this thread updated, but unfortunately we had to
>> make the decision to role back completely to 1.12.1 because we just cannot
>> afford investing more time into this right now. But we'll certainly come
>> back to this later. We have to...
>>
>> --- Ursprüngliche Nachricht ---
>> Von: Ryan Hendrickson <ryan.andrew.hendrick...@gmail.com>
>> Datum: 17.08.2021 03:32:25
>> An: users@nifi.apache.org
>> Betreff: Re: Disfunctional cluster with version 1.13.2
>>
>> > Axel,
>> >    We've had significant issues with 1.13.2 in a Cluster as well.  We're
>> >
>> > working on a test config... Issues range from abandoned FlowFiles,
>> single
>> >
>> > Nodes locking the entire cluster, load balance relationships not
>> working,
>> >
>> > and undocumented nifi properties.  We're reluctant to move to 1.14.0
>> > because we haven't seen anything specifically fixed in it.
>> >
>> >    Please keep the community up-to-date on your findings.
>> >
>> > Ryan
>> >
>> > On Mon, Aug 16, 2021 at 11:00 AM Pierre Villard <
>> pierre.villard...@gmail.com>
>> >
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > What's the version of ZK?
>> > >
>> > > Thanks,
>> > > Pierre
>> > >
>> > > Le jeu. 12 août 2021 à 09:55, Axel Schwarz <axelkop...@emailn.de>
>> > a
>> > > écrit :
>> > >
>> > >> Dear all,
>> > >>
>> > >> after successfully battling the load balancing and installing Version
>> >
>> > >> 1.13.2 again in our 3 node production environment, we experienced
>> > another
>> > >> failure in the cluster resulting in a complete cut-off of the flow
>> > just
>> > >> 1,5h after the update.
>> > >> We noticed it just by trying to access the webinterface, which
>> > >> immediately after login showed something like:
>> > >>
>> > >> "Cannot replicate request to Node nifiHost1.contoso.com:8443
>> > because the
>> > >> node is not connected"
>> > >>
>> > >> There was nothing we could do through the webinterface aside from
>> > staring
>> > >> at this message and when looking at the live logs, there was nothing
>> >
>> > >> suspicious. The log moved on as if nothing happened.
>> > >> After a restart of the cluster everything was working fine again,
>> > but we
>> > >> saw, that the entire flow wasn't working for some period of time.
>> > This
>> > >> alone is really uncool, as we running a cluster for exactly that
>> > reason:
>> > >> The flow should keep working, even if some node decides to
>> malfunction
>> > for
>> > >> whatever reason.
>> > >>
>> > >> Digging a little deeper into the logs showed two noticable problems:
>> >
>> > >>
>> > >> 1. The Zookeeper is restarting every few minutes. Which in the log
>> > always
>> > >> looks like this:
>> > >>
>> > >> ------------------------------------(nifiHost1.contoso.com)
>> > >>
>> nifi-app.log------------------------------------------------------------------------------------
>> >
>> > >>
>> > >> 2021-08-11 12:02:39,187 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> 2021-08-11 12:02:39,187 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer shutting down
>> > >> 2021-08-11 12:02:39,194 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.FinalRequestProcessor shutdown of request processor
>> > complete
>> > >> 2021-08-11 12:02:39,196 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.SyncRequestProcessor Shutting down
>> > >> 2021-08-11 12:02:39,196 INFO [SyncThread:1]
>> > >> o.a.z.server.SyncRequestProcessor SyncRequestProcessor exited!
>> > >> 2021-08-11 12:02:39,199 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer minSessionTimeout set to 4000
>> >
>> > >> 2021-08-11 12:02:39,200 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer maxSessionTimeout set to 40000
>> >
>> > >> 2021-08-11 12:02:39,200 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Created server with tickTime
>> > 2000
>> > >> minSessionTimeout 4000 maxSessionTimeout 40000 datadir
>> > >> /opt/nifi/logs/zookeeper/version-2 snapdir
>> > >> /opt/nifi/state/zookeeper/version-2
>> > >> 2021-08-11 12:02:39,276 ERROR [Timer-Driven Process
>> > >> Thread-27-EventThread] org.apache.zookeeper.ClientCnxn Error while
>> > calling
>> > >> watcher
>> > >> java.lang.NullPointerException: null
>> > >>         at
>> > >>
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:535)
>> >
>> > >>         at
>> > >> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>> >
>> > >> 2021-08-11 12:02:39,277 INFO [main-EventThread]
>> > >> o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>> > >> 2021-08-11 12:02:40,001 INFO [main-EventThread]
>> > >> o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
>> > >> 2021-08-11 12:02:40,006 INFO [Curator-ConnectionStateManager-0]
>> >
>> > >> o.a.n.c.l.e.CuratorLeaderElectionManager
>> > >>
>> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1d37e350
>> >
>> > >> Connection State changed to SUSPENDED
>> > >> 2021-08-11 12:02:40,006 INFO [Curator-ConnectionStateManager-0]
>> >
>> > >> o.a.n.c.l.e.CuratorLeaderElectionManager
>> > >>
>> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1908642e
>> >
>> > >> Connection State changed to SUSPENDED
>> > >> 2021-08-11 12:02:40,006 INFO [Curator-ConnectionStateManager-0]
>> >
>> > >> o.a.n.c.l.e.CuratorLeaderElectionManager
>> > >>
>> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1d37e350
>> >
>> > >> Connection State changed to RECONNECTED
>> > >> 2021-08-11 12:02:40,006 INFO [Curator-ConnectionStateManager-0]
>> >
>> > >> o.a.n.c.l.e.CuratorLeaderElectionManager
>> > >>
>> org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@1908642e
>> >
>> > >> Connection State changed to RECONNECTED
>> > >> 2021-08-11 12:02:40,014 INFO [main-EventThread]
>> > >> o.a.c.framework.imps.EnsembleTracker New config event received:
>> > {}
>> > >> 2021-08-11 12:02:40,127 INFO [Clustering Tasks Thread-2]
>> > >> o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2021-08-11
>> >
>> > >> 12:02:39,176 and sent to nifiHost3.contoso.com:54321 at 2021-08-11
>> >
>> > >> 12:02:40,127; send took 951 millis
>> > >> 2021-08-11 12:02:40,186 ERROR [Timer-Driven Process
>> > >> Thread-27-EventThread] org.apache.zookeeper.ClientCnxn Error while
>> > calling
>> > >> watcher
>> > >> java.lang.NullPointerException: null
>> > >>         at
>> > >>
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:535)
>> >
>> > >>         at
>> > >> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
>> >
>> > >>
>> > >>
>> > >>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >
>> > >>
>> > >> This happens, as mentioned, every few minutes and looks like this
>> > when
>> > >> condensed through grep:
>> > >>
>> > >> ------------------------------------(nifiHost1.contoso.com)
>> nifi-app.log
>> >
>> > >> | grep "ZooKeeperServer Shutting
>> > >> down"---------------------------------------------
>> > >>
>> > >> ./nifi-app.log:4237:2021-08-11 12:02:39,187 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:4378:2021-08-11 12:03:22,084 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:4447:2021-08-11 12:03:57,639 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:4563:2021-08-11 12:04:18,921 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:4629:2021-08-11 12:04:33,199 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:4650:2021-08-11 12:05:03,939 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:4801:2021-08-11 12:05:39,348 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:5327:2021-08-11 12:06:56,736 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:7250:2021-08-11 12:21:41,464 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:7703:2021-08-11 12:24:54,244 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:8148:2021-08-11 12:27:21,093 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:8190:2021-08-11 12:27:37,820 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:8350:2021-08-11 12:28:37,417 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:8797:2021-08-11 12:33:25,184 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:8813:2021-08-11 12:33:39,672 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:8971:2021-08-11 12:34:22,003 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:8984:2021-08-11 12:34:36,511 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:9349:2021-08-11 12:36:29,802 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:9376:2021-08-11 12:36:41,541 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:9491:2021-08-11 12:37:12,390 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:9504:2021-08-11 12:37:25,888 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:9676:2021-08-11 12:39:22,775 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:9842:2021-08-11 12:40:49,396 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:10251:2021-08-11 12:43:38,807 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:10397:2021-08-11 12:44:36,326 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:11405:2021-08-11 12:52:26,384 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:11629:2021-08-11 12:53:33,054 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:11827:2021-08-11 12:54:43,783 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:11990:2021-08-11 12:56:23,389 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:12085:2021-08-11 12:57:21,410 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:12823:2021-08-11 13:01:23,674 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:13087:2021-08-11 13:04:44,524 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:14610:2021-08-11 13:16:15,432 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:14635:2021-08-11 13:16:37,724 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:14751:2021-08-11 13:17:38,169 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:15136:2021-08-11 13:17:59,547 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:15203:2021-08-11 13:18:30,695 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:15516:2021-08-11 13:21:43,631 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:15743:2021-08-11 13:22:49,630 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:15950:2021-08-11 13:23:50,574 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:16035:2021-08-11 13:24:30,913 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:16396:2021-08-11 13:28:01,873 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:16470:2021-08-11 13:29:12,550 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:16541:2021-08-11 13:29:38,490 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:16568:2021-08-11 13:29:53,554 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:16678:2021-08-11 13:31:13,437 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> ./nifi-app.log:16705:2021-08-11 13:31:28,195 INFO
>> > >> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled)]
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >>
>> > >>
>> > >>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >
>> > >>
>> > >> So my first question is: Is this behaviour of zookeeper normal?
>> > Well it
>> > >> is just an INFO, but I would love to here that this is a completely
>> > normal
>> > >> phenomenon. :D
>> > >>
>> > >> 2. After all these zookeeper restarts, nifi finally throws the
>> following
>> >
>> > >> error along with the "last" zookeeper restart:
>> > >>
>> > >> ------------------------------------(nifiHost1.contoso.com)
>> > >>
>> nifi-app.log------------------------------------------------------------------------------------
>> >
>> > >>
>> > >> 2021-08-11 13:31:13,083 INFO [main-EventThread]
>> > >> o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>> > >> 2021-08-11 13:31:13,437 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> 2021-08-11 13:31:13,437 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer shutting down
>> > >> 2021-08-11 13:31:13,439 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.FinalRequestProcessor shutdown of request processor
>> > complete
>> > >> 2021-08-11 13:31:13,440 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.SyncRequestProcessor Shutting down
>> > >> 2021-08-11 13:31:13,440 INFO [SyncThread:1]
>> > >> o.a.z.server.SyncRequestProcessor SyncRequestProcessor exited!
>> > >> 2021-08-11 13:31:13,450 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer minSessionTimeout set to 4000
>> >
>> > >> 2021-08-11 13:31:13,450 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer maxSessionTimeout set to 40000
>> >
>> > >> 2021-08-11 13:31:13,450 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Created server with tickTime
>> > 2000
>> > >> minSessionTimeout 4000 maxSessionTimeout 40000 datadir
>> > >> /opt/nifi/logs/zookeeper/version-2 snapdir
>> > >> /opt/nifi/state/zookeeper/version-2
>> > >> 2021-08-11 13:31:13,462 ERROR [Load-Balance Server Thread-5]
>> > >> o.a.n.c.q.c.s.ConnectionLoadBalanceServer Failed to communicate
>> > with Peer
>> > >> nifiHost2.contoso.com/192.168.1.11:19500
>> > >> java.io.EOFException: Expected to receive a Transaction Completion
>> >
>> > >> Indicator from Peer nifiHost2.contoso.com but encountered EOF
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.StandardLoadBalanceProtocol.completeTransaction(StandardLoadBalanceProtocol.java:299)
>> >
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.StandardLoadBalanceProtocol.receiveFlowFiles(StandardLoadBalanceProtocol.java:275)
>> >
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.StandardLoadBalanceProtocol.receiveFlowFiles(StandardLoadBalanceProtocol.java:133)
>> >
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.ConnectionLoadBalanceServer$CommunicateAction.run(ConnectionLoadBalanceServer.java:162)
>> >
>> > >>         at java.lang.Thread.run(Thread.java:748)
>> > >> 2021-08-11 13:31:13,464 ERROR [Load-Balance Server Thread-6]
>> > >> o.a.n.c.q.c.s.ConnectionLoadBalanceServer Failed to communicate
>> > with Peer
>> > >> nifiHost2.contoso.com/192.168.1.11:19502
>> > >> java.io.EOFException: Expected to receive a Transaction Completion
>> >
>> > >> Indicator from Peer nifiHost2.contoso.com but encountered EOF
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.StandardLoadBalanceProtocol.completeTransaction(StandardLoadBalanceProtocol.java:299)
>> >
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.StandardLoadBalanceProtocol.receiveFlowFiles(StandardLoadBalanceProtocol.java:275)
>> >
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.StandardLoadBalanceProtocol.receiveFlowFiles(StandardLoadBalanceProtocol.java:133)
>> >
>> > >>         at
>> > >>
>> org.apache.nifi.controller.queue.clustered.server.ConnectionLoadBalanceServer$CommunicateAction.run(ConnectionLoadBalanceServer.java:162)
>> >
>> > >>         at java.lang.Thread.run(Thread.java:748)
>> > >> 2021-08-11 13:31:13,589 INFO [Process Cluster Protocol Request-1]
>> >
>> > >> o.a.n.c.c.node.NodeClusterCoordinator Status of
>> > >> nifiHost1.contoso.com:8443 changed from NodeConnectionStatus[nodeId=
>> >
>> > >> nifiHost1.contoso.com:8443, state=CONNECTED, updateId=16] to
>> > >> NodeConnectionStatus[nodeId=nifiHost1.contoso.com:8443,
>> > >> state=CONNECTING, updateId=16]
>> > >> 2021-08-11 13:31:13,793 INFO [Process Cluster Protocol Request-7]
>> >
>> > >> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
>> >
>> > >> 88980d85-b956-4713-9686-50d76a407871 (type=RECONNECTION_REQUEST,
>> >
>> > >> length=2209040 bytes) from nifiHost1.contoso.com:8443 in 142 millis
>> >
>> > >> 2021-08-11 13:31:13,794 INFO [Reconnect to Cluster]
>> > >> o.a.nifi.controller.StandardFlowService Processing reconnection
>> > request
>> > >> from cluster coordinator.
>> > >> 2021-08-11 13:31:13,794 INFO [Reconnect to Cluster]
>> > >> o.a.n.c.l.e.CuratorLeaderElectionManager Election Listener for Role
>> > Primary
>> > >> Node disabled
>> > >> 2021-08-11 13:31:13,802 INFO [main-EventThread]
>> > >> o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
>> > >> 2021-08-11 13:31:28,195 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Shutting down
>> > >> 2021-08-11 13:31:28,196 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer shutting down
>> > >> 2021-08-11 13:31:28,198 INFO [pool-22-thread-1]
>> > >> o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of
>> > FlowFile
>> > >> Repository
>> > >> 2021-08-11 13:31:28,202 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.FinalRequestProcessor shutdown of request processor
>> > complete
>> > >> 2021-08-11 13:31:28,203 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.z.server.SyncRequestProcessor Shutting down
>> > >> 2021-08-11 13:31:28,203 INFO [SyncThread:1]
>> > >> o.a.z.server.SyncRequestProcessor SyncRequestProcessor exited!
>> > >> 2021-08-11 13:31:28,204 INFO [Timer-Driven Process Thread-21]
>> > >> o.a.n.p.store.WriteAheadStorePartition Successfully rolled over
>> > Event
>> > >> Writer for Provenance Event Store Partition[directory=/opt/pr] due
>> > to
>> > >> MAX_TIME_REACHED. Event File was 6.7 MB and contained 1626 events.
>> >
>> > >> 2021-08-11 13:31:28,208 INFO [pool-22-thread-1]
>> > >> o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead
>> > Log with
>> > >> 10 Records and 0 Swap Files in 8 milliseconds (Stop-the-world time
>> > = 1
>> > >> milliseconds), max Transaction ID 27772177643
>> > >> 2021-08-11 13:31:28,208 INFO [pool-22-thread-1]
>> > >> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed
>> > FlowFile
>> > >> Repository with 10 records in 8 milliseconds
>> > >> 2021-08-11 13:31:28,295 INFO [main-EventThread]
>> > >> o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
>> > >> 2021-08-11 13:31:29,808 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer minSessionTimeout set to 4000
>> >
>> > >> 2021-08-11 13:31:29,809 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer maxSessionTimeout set to 40000
>> >
>> > >> 2021-08-11 13:31:29,809 INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2181
>> )(secure=disabled)]
>> >
>> > >> o.a.zookeeper.server.ZooKeeperServer Created server with tickTime
>> > 2000
>> > >> minSessionTimeout 4000 maxSessionTimeout 40000 datadir
>> > >> /opt/nifi/logs/zookeeper/version-2 snapdir
>> > >> /opt/nifi/state/zookeeper/version-2
>> > >> 2021-08-11 13:31:32,291 INFO [main-EventThread]
>> > >> o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
>> > >>
>> > >>
>> > >>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >
>> > >>
>> > >> After that, the log continues with normal messages about heartbeats
>> > and
>> > >> provenance events but no error occurs anymore. That exactly was
>> > the time,
>> > >> the whole flow cut-off.
>> > >>
>> > >> I've got to add, that we've never experienced that flow cut-off
>> > behaviour
>> > >> with version 1.12.1, but the zookeeper restarts occur there as well
>> > and as
>> > >> often as seen above.
>> > >> But when first updating to version 1.13.2 back in June, we
>> experienced
>> >
>> > >> this flow cut-off multiple times beside the not working load
>> balancing.
>> >
>> > >> Back then I thought this was connected to each other, but now we
>> > had a
>> > >> fully functional load balancing and the cut-off happened
>> nevertheless.
>> >
>> > >>
>> > >> I'm thankful for any hint. :)
>> > >>
>> > >>
>> > >>
>> > >>
>> >
>>
>>
>>

Reply via email to