Also, which version of ZooKeeper and which image are you using? (I've found
that some versions and images provide better stability than others.)
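A minimal Python sketch for checking this. It assumes that each ZooKeeper server answers the `srvr` four-letter-word command on client port 2181. In ZooKeeper 3.5+, `srvr` is normally on the default `4lw.commands.whitelist`, though a particular image may override that. The host names come from the EnsembleTracker log lines quoted below. `send_four_letter_word` and `parse_srvr` are hypothetical helpers, not part of any library.

```python
import socket

# Assumption: host names copied from the EnsembleTracker log lines in this thread.
ZK_HOSTS = [
    "zk-0.zk-hs.ki.svc.cluster.local",
    "zk-1.zk-hs.ki.svc.cluster.local",
    "zk-2.zk-hs.ki.svc.cluster.local",
]
ZK_CLIENT_PORT = 2181  # client port shown in the quoted dynamic config


def send_four_letter_word(host: str, port: int, cmd: str = "srvr",
                          timeout: float = 5.0) -> str:
    """Send a ZooKeeper four-letter-word command and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_srvr(reply: str) -> dict:
    """Turn the 'Key: value' lines of a srvr reply into a dict."""
    fields = {}
    for line in reply.splitlines():
        # Split on the first colon only; values may contain more colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    for host in ZK_HOSTS:
        try:
            info = parse_srvr(send_four_letter_word(host, ZK_CLIENT_PORT))
            print(f"{host}: version={info.get('Zookeeper version')} "
                  f"mode={info.get('Mode')}")
        except OSError as exc:  # DNS failure, refused connection, timeout
            print(f"{host}: unreachable ({exc})")
```

Run it from a pod inside the cluster. It should print each server's version string and its `Mode` (for example leader or follower), which also shows whether one server is currently acting as leader.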


Cheers,

Chris Sampson

On Tue, 29 Sep 2020, 17:34 Sushil Kumar, <skm....@gmail.com> wrote:

> Hello Wyll
>
> It may be helpful if you can send nifi.properties.
>
> Thanks
> Sushil Kumar
>
> On Tue, Sep 29, 2020 at 7:58 AM Wyll Ingersoll <
> wyllys.ingers...@keepertech.com> wrote:
>
>>
>> I have a 3-node NiFi (1.11.4) cluster in a Kubernetes environment (as a
>> StatefulSet), using an external ZooKeeper ensemble (also 3 nodes) to manage
>> state.
>>
>> Whenever even one node (pod/container) goes down or is restarted, it can
>> throw the whole cluster into a bad state that forces me to restart ALL of
>> the pods to recover. This seems wrong. The problem seems to be that when
>> the primary node goes away, the remaining 2 nodes never try to take over.
>> Instead, I have to restart each of them individually until one of them
>> becomes the primary; then the other 2 eventually join and sync up.
>>
>> When one of the nodes refuses to sync up, I often see the errors below in
>> the log, and the only way to get it back into the cluster is to restart it.
>> The node showing these errors never seems to be able to rejoin or resync
>> with the other 2 nodes.
>>
>>
>> 2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster]
>> o.a.nifi.controller.StandardFlowService Handling reconnection request
>> failed due to: org.apache.nifi.cluster.ConnectionException: Failed to
>> connect node to cluster due to: java.lang.NullPointerException
>>
>> org.apache.nifi.cluster.ConnectionException: Failed to connect node to
>> cluster due to: java.lang.NullPointerException
>>     at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
>>     at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
>>     at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
>>     at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.NullPointerException: null
>>     at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
>>     ... 4 common frames omitted
>>
>> 2020-09-29 10:18:53,326 INFO [Reconnect to Cluster]
>> o.a.c.f.imps.CuratorFrameworkImpl Starting
>>
>> 2020-09-29 10:18:53,327 INFO [Reconnect to Cluster]
>> org.apache.zookeeper.ClientCnxnSocket jute.maxbuffer value is 4194304 Bytes
>>
>> 2020-09-29 10:18:53,328 INFO [Reconnect to Cluster]
>> o.a.c.f.imps.CuratorFrameworkImpl Default schema
>>
>> 2020-09-29 10:18:53,807 INFO [Reconnect to Cluster-EventThread]
>> o.a.c.f.state.ConnectionStateManager State change: CONNECTED
>>
>> 2020-09-29 10:18:53,809 INFO [Reconnect to Cluster-EventThread]
>> o.a.c.framework.imps.EnsembleTracker New config event received:
>> {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;
>> 0.0.0.0:2181, version=0,
>> server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;
>> 0.0.0.0:2181,
>> server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;
>> 0.0.0.0:2181}
>>
>> 2020-09-29 10:18:53,810 INFO [Curator-Framework-0]
>> o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
>>
>> 2020-09-29 10:18:53,813 INFO [Reconnect to Cluster-EventThread]
>> o.a.c.framework.imps.EnsembleTracker New config event received:
>> {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;
>> 0.0.0.0:2181, version=0,
>> server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;
>> 0.0.0.0:2181,
>> server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;
>> 0.0.0.0:2181}
>>
>> 2020-09-29 10:18:54,323 INFO [Reconnect to Cluster]
>> o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election
>> Role 'Primary Node' becuase that role is not registered
>>
>> 2020-09-29 10:18:54,324 INFO [Reconnect to Cluster]
>> o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election
>> Role 'Cluster Coordinator' becuase that role is not registered
>>
>>
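The `New config event received` lines above are ZooKeeper's dynamic configuration. In the 3.5+ format, each server entry reads `server.<id>=<host>:<quorum port>:<election port>[:<role>];[<client address>:]<client port>`. A small sketch that decodes such a line, which can help confirm that all three ZooKeeper servers are listed as participants with the expected ports. `ZkServer` and `parse_ensemble_config` are hypothetical names. The parser assumes the simple host-name form shown in the log (no IPv6 literals) and a well-formed string.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ZkServer:
    server_id: int
    host: str
    quorum_port: int
    election_port: int
    role: str
    client_address: Optional[str]  # None when the entry gives only a port
    client_port: Optional[int]     # None when the entry has no ';' part


def parse_ensemble_config(event: str) -> Tuple[List[ZkServer], Optional[str]]:
    """Decode a '{server.1=..., version=0, ...}' dynamic-config string."""
    servers: List[ZkServer] = []
    version: Optional[str] = None
    for entry in event.strip().strip("{}").split(","):
        key, _, value = entry.strip().partition("=")
        if key == "version":
            version = value.strip()
        elif key.startswith("server."):
            # <host>:<quorum>:<election>[:<role>] ; [<client addr>:]<client port>
            server_part, _, client_part = value.partition(";")
            fields = server_part.strip().split(":")
            role = fields[3] if len(fields) > 3 else "participant"
            client_addr, _, client_port = client_part.strip().rpartition(":")
            servers.append(ZkServer(
                server_id=int(key[len("server."):]),
                host=fields[0],
                quorum_port=int(fields[1]),
                election_port=int(fields[2]),
                role=role,
                client_address=client_addr or None,
                client_port=int(client_port) if client_port else None,
            ))
    # The log lists servers in arbitrary order; sort by id for readability.
    servers.sort(key=lambda s: s.server_id)
    return servers, version


if __name__ == "__main__":
    event = (
        "{server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, "
        "version=0, "
        "server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, "
        "server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181}"
    )
    servers, version = parse_ensemble_config(event)
    print(f"config version={version}")
    for s in servers:
        print(f"  server.{s.server_id}: {s.host} role={s.role} client port={s.client_port}")
```

The sample string is the event from the quoted log with the email line wrapping removed.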