Hi Jon,

Just as a note on your unrelated question: I opened NIFI-4026 a few months ago but haven't had time to work on it so far.
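
To make the "group-by type distribution" idea concrete, here is a minimal sketch of key-based partitioning, where every flow file carrying the same attribute value is sent to the same node. It only illustrates the concept; it is not how the RPG distributes data today, nor a description of what NIFI-4026 would implement. The function `pick_node`, the attribute name `customer.id`, and the node names are all hypothetical.

```python
import hashlib
from typing import Sequence


def pick_node(key: str, nodes: Sequence[str]) -> str:
    """Map a grouping key to one node; the same key always maps to the same node."""
    # Use a stable hash. Python's built-in hash() is salted per process,
    # so it would give different answers on different machines.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical cluster nodes and flow file attributes, for illustration only.
nodes = ["node-1", "node-2", "node-3"]
flow_files = [
    {"customer.id": "a"},
    {"customer.id": "b"},
    {"customer.id": "a"},
]

# Flow files 0 and 2 share a key, so they land on the same node.
assignments = [pick_node(ff["customer.id"], nodes) for ff in flow_files]
```

Note that plain modulo hashing like this remaps most keys whenever the number of nodes changes; a consistent-hashing scheme would reduce that, at the cost of more code.
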
[1] https://issues.apache.org/jira/browse/NIFI-4026

2018-04-17 20:34 GMT+02:00 Jon Logan <jmlo...@buffalo.edu>:

> Thanks Joe, just a few follow-up questions:
>
> re:durability -- is this something that people have just been accepting as
> a risk and hoping for the best? Or is this something people build their
> applications around -- i.e. using durability outside of the NiFi system
> boundary and push it into a database, etc?
>
> re:heterogeneous -- you can join nodes of differing hardware specs, but it
> seems like you will end up causing your lighter-weight nodes to explode as
> there's no way to configure how many tasks and how much to have processing
> "in-flight" on the node different than the other nodes? i.e. if I know my
> large nodes can handle 3 of a cpu-intensive task, that's going to cause
> issues for smaller nodes. This is an even bigger problem for differing
> memory sizes.
>
> And an unrelated question to the previous -- is there a way to skew or
> influence how a RPG distributes its tasks? Say, you wanted to do a group-by
> type distribution?
>
>
> Thanks again!
> Jon
>
>
> On Fri, Apr 13, 2018 at 2:17 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
>> Jon,
>>
>> Node Failure:
>> You have to care about two things generally speaking. First is the
>> flow execution and second is data in-flight.
>> For flow execution NiFi clustering will take care of re-assigning the
>> primary node and cluster coordinator as needed.
>> For data we do not at present offer distributed data durability. The
>> current model is predicated on using reliable storage such as RAID,
>> EBS, etc.
>> There is a very clear and awesome-looking K8S-based path though that
>> will make this work really nicely with persistent volumes and elastic
>> scaling. No clear timeline but discussions/JIRA/contributions I hope
>> to start or participate in soon.
>>
>> How scalable is the NiFi scaling model:
>> Usually NiFi clusters are a few nodes to maybe 10-20 or so. Some
>> have been larger but generally if you're needing that much flow
>> management then often it makes more sense to have clusters dedicated
>> along various domains of expertise anyway. So say 3-10 nodes with
>> each handling 100,000 events per second around say 100MB per second
>> (conservatively) and you can see why a single fairly small cluster can
>> handle pretty massive volumes.
>>
>> RPGs feeding back:
>> - This caused issues previously but I believe in recent releases has
>> improved significantly.
>>
>> UI Actions Causing issues:
>> There have been reports similar to this especially for some of the
>> really massive flows we've seen in terms of number of components and
>> concurrent users. These JIRAs when sorted will help a lot [1], [2],
>> [3].
>>
>> Heterogeneous cluster nodes:
>> - This should work quite well actually and is a major reason why NiFi
>> and the S2S protocol support/honor backpressure. Nodes that can
>> take on more work take on more work and nodes that cannot push back.
>> You also want to ensure you're using good and scalable protocols to
>> source data into the cluster. If you find you're using a lot of
>> protocols requiring you to make many data sourcing steps run 'primary
>> node only' then that will require that primary node to do more work
>> than others and I have seen uneven behavior in such cases. Yes, you
>> can then route using S2S/RPG which we recommend but still...try to
>> design away from 'primary node only' when possible.
>>
>>
>> Thanks
>> Joe
>>
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-950
>> [2] https://issues.apache.org/jira/browse/NIFI-5064
>> [3] https://issues.apache.org/jira/browse/NIFI-5066
>>
>> On Fri, Apr 13, 2018 at 5:49 PM, Jon Logan <jmlo...@buffalo.edu> wrote:
>> > All, I had a few general questions regarding Clustering, and was looking for
>> > any sort of advice or best-practices information --
>> >
>> > - documentation discusses failure handling primarily from a NiFi crash
>> > scenario, but I don't recall seeing any information on entire node-failure
>> > scenarios. Is there a way that this is supposed to be handled?
>> > - at what point should we expect pain in scaling? I am particularly
>> > concerned about the all-to-all relationship that seems to exist if you
>> > connect a cluster RPG to itself, as all nodes need to distribute all data to
>> > all other nodes. We have also been having some issues when things are
>> > not as responsive as NiFi would like -- namely, the UI seems to get very
>> > upset and crash.
>> > - do UI actions (incl read-only) require delegation to all nodes underneath?
>> > I suspect this is the case as otherwise you wouldn't be able to determine
>> > queue sizes?
>> > - is there a way to have a cluster with heterogeneous node sizes?
>> >
>> >
>> > Thanks in advance!
>> >
>
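
As a back-of-the-envelope check on Joe's sizing figures quoted above (3-10 nodes, each handling 100,000 events per second at about 100 MB per second), the aggregate numbers can be worked out as in the sketch below. The per-day figure is an added assumption: it simply multiplies by 86,400 seconds as if the rate were sustained all day, which the original message does not claim.

```python
# Per-node figures taken from the quoted message.
EVENTS_PER_NODE_PER_SEC = 100_000
MB_PER_NODE_PER_SEC = 100

# Assumption for illustration: the rate is sustained for a full day.
SECONDS_PER_DAY = 86_400


def cluster_throughput(nodes: int) -> dict:
    """Aggregate throughput for a cluster, assuming perfectly even load."""
    mb_per_sec = nodes * MB_PER_NODE_PER_SEC
    return {
        "events_per_sec": nodes * EVENTS_PER_NODE_PER_SEC,
        "mb_per_sec": mb_per_sec,
        "tb_per_day": mb_per_sec * SECONDS_PER_DAY / 1_000_000,  # decimal TB
    }


for n in (3, 10):
    print(n, cluster_throughput(n))
```

Under these assumptions a 3-node cluster moves about 300 MB per second (roughly 26 TB per day) and a 10-node cluster about 1 GB per second (roughly 86 TB per day).
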