Hi all,

Our 7-node cluster had one underperforming node today. That node's queue exceeded its object backpressure threshold, which caused the other six nodes to stop processing data while they waited for the slow node to finish its work.

Question 1: Is this expected behavior?
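As an aside, the "skip nodes that are already at backpressure" behavior (the DistributeLoad-style "Next Available" strategy asked about below) can be sketched in a few lines. This is purely an illustration of the pattern in question, not NiFi's actual implementation:

```python
def next_available(queue_depths, threshold, start=0):
    """Return the index of the next node (scanning round-robin from
    `start`) whose queue depth is below `threshold`; None if every
    node has already hit backpressure."""
    n = len(queue_depths)
    for i in range(n):
        idx = (start + i) % n
        if queue_depths[idx] < threshold:
            return idx
    return None

# Example: node 0 is at its 70,000-object threshold, so it is skipped
# and the FlowFile goes to node 1 instead.
print(next_available([70_000, 12_000, 5_000], threshold=70_000))  # -> 1
```

Under plain round-robin (no skipping), node 0 would still receive its turn and the sender would block on its backpressure, which is the behavior we observed.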
To remedy the situation, we enabled round-robin load balancing on that relationship. We observed FlowFiles moving to the other nodes, but this unnecessarily saturates the network.

Question 2: When a FlowFile is load-balanced from one node to another, is the entire Content Claim transferred, or just the portion belonging to that FlowFile?

Question 3: When load balancing begins, how many threads can it use, and how many files can be moved in parallel?

Question 4: When load balancing with Round Robin, will it skip a node whose relationship has already exceeded its object backpressure threshold, similar to how DistributeLoad's "Next Available" strategy works?

Thanks,
Ryan

On Mon, Aug 9, 2021 at 4:59 PM Ryan Hendrickson <[email protected]> wrote:

> Hi all,
> To confirm, when using a NiFi cluster, are a relationship's "Back
> Pressure Object Threshold" and "Size Threshold" applied per node, or
> cluster-wide?
>
> For example, if we have a 10-node cluster and set the Back Pressure
> Object Threshold to 100, would we then expect the relationship to queue
> up to 1,000 FlowFiles before exceeding the threshold?
>
> We have the following setup:
> Update Attribute -----Relationship----> JoltTransform
>
> In our case, we set a 70,000 object threshold and have 7 servers in the
> cluster.
>
> When hovering over the relationship's status bar, it says: "Queue: 100%
> full (based on 70,000 object threshold)"
>
> Two things don't make sense about that message:
> 1. The relationship only has ~350,000 FlowFiles in it; to be 100% full
> cluster-wide, it would need 490,000 (7 x 70,000).
> 2. There are 7 nodes in the cluster, so shouldn't "based on xx object
> threshold" say "based on 490,000 object threshold"?
>
> We also have a 2GB "Size Threshold" set on the relationship. The
> relationship hover text reads: "Queue 36% full (based on 2GB data size
> threshold)".
>
> What also doesn't make sense is that the math doesn't work out if you
> check it yourself. We have 7 nodes x 2GB each, for a 14GB cluster-wide
> limit.
>
> Taking the reported 3.45 GB in the queue and dividing by 14 GB gives
> ~25%. That's a full 11 percentage points off from the 36% in the hover
> text.
>
> We are running NiFi 1.13.2 on these servers. All servers appear to be
> communicating and processing data, per the UI's cluster overview
> (thread count, queue size, status, etc.).
>
> Any thoughts on this would be appreciated.
>
> Thanks,
> Ryan
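For reference, the arithmetic in the quoted message can be double-checked in a few lines, using the figures quoted above and assuming the cluster-wide interpretation of the thresholds (which is exactly the interpretation in question):

```python
# Check the queue percentages quoted above under the assumption that
# the per-node thresholds add up cluster-wide across 7 nodes.
nodes = 7

queued_objects = 350_000             # observed FlowFile count
object_capacity = nodes * 70_000     # 490,000 if cluster-wide
print(f"{queued_objects / object_capacity:.0%}")  # ~71%, yet the UI says 100%

queued_gb = 3.45                     # observed queue size in GB
size_capacity_gb = nodes * 2         # 14 GB if cluster-wide
print(f"{queued_gb / size_capacity_gb:.0%}")      # ~25%, yet the UI says 36%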
