Jon,

Node Failure:
 Generally speaking, you have to care about two things: first, flow
execution, and second, data in flight.
 For flow execution, NiFi clustering will take care of re-assigning
the primary node and cluster coordinator as needed.
 For data, we do not at present offer distributed data durability.
The current model is predicated on using reliable storage such as
RAID, EBS, etc.
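 For example, a minimal sketch (the mount paths below are purely
illustrative and assume durable volumes such as RAID or EBS are
already mounted) is to point the repositories in nifi.properties at
that storage:

    # nifi.properties -- illustrative paths, not the defaults
    nifi.flowfile.repository.directory=/mnt/durable1/flowfile_repository
    nifi.content.repository.directory.default=/mnt/durable2/content_repository
    nifi.provenance.repository.directory.default=/mnt/durable3/provenance_repository

 The usual idea being that a replacement node can re-attach those same
volumes and pick up the queued data.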
  There is also a very clear and awesome-looking K8s-based path that
will make this work really nicely with persistent volumes and elastic
scaling.  No clear timeline yet, but those are discussions/JIRAs/
contributions I hope to start or participate in soon.

How scalable is the NiFi scaling model:
  Usually NiFi clusters run from a few nodes up to maybe 10-20 or so.
Some have been larger, but generally if you need that much flow
management it often makes more sense to have separate clusters
dedicated to particular domains of expertise anyway.  So say 3-10
nodes, each handling 100,000 events per second at around 100 MB per
second (conservatively), and you can see why even a fairly small
cluster can handle pretty massive volumes.
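  Rough back-of-the-envelope math using those conservative numbers,
assuming a 5-node cluster purely for illustration:

    100 MB/s per node x 5 nodes  ~= 500 MB/s sustained
    500 MB/s x 86,400 s/day      ~= 43 TB/day
    100,000 events/s x 5 nodes    = 500,000 events/s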

RPGs feeding back:
- A cluster RPG feeding back to itself caused issues previously, but I
believe recent releases have improved this significantly.

UI actions causing issues:
There have been similar reports, especially for some of the really
massive flows we've seen in terms of number of components and
concurrent users.  These JIRAs, once sorted, will help a lot: [1],
[2], [3].

Heterogeneous cluster nodes:
- This should actually work quite well, and it is a major reason why
NiFi and the S2S protocol support/honor backpressure.  Nodes that can
take on more work take on more work, and nodes that cannot push back.
You also want to ensure you're using good, scalable protocols to
source data into the cluster.  If many of your data-sourcing steps use
protocols that force them to run 'primary node only', that primary
node ends up doing more work than the others, and I have seen uneven
behavior in such cases.  Yes, you can then redistribute using S2S/RPG,
which we recommend (see the sketch below), but still... try to design
away from 'primary node only' when possible.
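
  For example, one common way to keep the primary node from becoming a
hot spot (a sketch assuming an SFTP source; substitute the list/fetch
processors for whatever protocol you actually use):

    ListSFTP (primary node only, emits just the listing)
      -> RPG / S2S input port (spreads the listing across the cluster)
      -> FetchSFTP (runs on every node, does the heavy transfer work)

  The listing itself is tiny, so only the cheap step is pinned to the
primary node while the expensive fetching is spread across all nodes.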


Thanks
Joe


[1] https://issues.apache.org/jira/browse/NIFI-950
[2] https://issues.apache.org/jira/browse/NIFI-5064
[3] https://issues.apache.org/jira/browse/NIFI-5066

On Fri, Apr 13, 2018 at 5:49 PM, Jon Logan <jmlo...@buffalo.edu> wrote:
> All, I had a few general questions regarding Clustering, and was looking for
> any sort of advice or best-practices information --
>
> - documentation discusses failure handling primarily from a NiFi crash
> scenario, but I don't recall seeing any information on entire node-failure
> scenarios. Is there a way that this is supposed to be handled?
> - at what point should we expect pain in scaling? I am particularly
> concerned about the all-to-all relationship that seems to exist if you
> connect a cluster RPG to itself, as all nodes need to distribute all data to
> all other nodes. We have also been having some issues when things are
> not as responsive as NiFi would like -- namely, the UI seems to get very
> upset and crash
> - do UI actions (incl read-only) require delegation to all nodes underneath?
> I suspect this is the case as otherwise you wouldn't be able to determine
> queue sizes?
> - is there a way to have a cluster with heterogeneous node sizes?
>
>
> Thanks in advance!
