Thanks for the reply, Joe.

I'm glad I wasn't missing something obvious. I'm afraid I'm stuck with
the file size limitation, but I'll have a word with the guys who configure
the load balancer to see what affinity options they have.

Thanks

Brian

------ Original Message ------
From: "Joe Witt" <[email protected]>
To: [email protected]; "Kiran" <[email protected]>
Sent: 15/02/2017 21:36:41
Subject: Re: MergeContent across a NiFi cluster

Brian,

Great use case, and you're right that we don't have an easy way of
handling this now.  If you do have a load balancer in front of the
receiving NiFi cluster and it supports affinity of some kind, then I
believe you can set a header on the HTTP POST from a flowfile
attribute.  That attribute would be on each split and would hold the
hash of the full object the split came from.  If the load balancer
ensured all splits with a matching header landed on the same machine,
then you'd be in business.  There are some load balancers that do this
(I'm thinking of a commercial one).  But I admit that is a lot of
moving parts to keep in mind.  We need to improve our site-to-site
feature to do things like automatically split content for you and
handle the partitioning/affinity logic I suggested.  You might also
consider avoiding the splitting for now to keep things super simple,
though I recognize that brings its own tradeoffs.
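
The affinity idea above can be sketched in a few lines of Python. This
is only an illustration of the routing logic, not NiFi or load
balancer configuration: the header name X-Affinity-Key, the node names
and the helper functions are all made up, and a real balancer would
have its own way of hashing on a header.

```python
import hashlib
from typing import List


def affinity_key(original_content: bytes) -> str:
    """Hash of the full, unsplit object; every split carries the same value."""
    return hashlib.sha256(original_content).hexdigest()


def split(content: bytes, max_size: int) -> List[bytes]:
    """Cut content into fragments of at most max_size bytes."""
    return [content[i:i + max_size] for i in range(0, len(content), max_size)]


def pick_node(key: str, nodes: List[str]) -> str:
    """Map an affinity key to one node, as a header-hashing balancer might."""
    return nodes[int(key, 16) % len(nodes)]


NODES = ["node1", "node2", "node3", "node4"]  # hypothetical receiving cluster

payload = b"x" * 2500
key = affinity_key(payload)          # computed once, before splitting
fragments = split(payload, 1000)     # three fragments

# Each fragment would be POSTed with the same header value, for example
# {"X-Affinity-Key": key} (hypothetical header name), so the balancer
# makes the same choice for every fragment.
targets = {pick_node(key, NODES) for _ in fragments}
```

Because the key is derived from the whole object rather than from each
fragment, all three fragments map to a single node.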

Great case for us to work on/rally around though.

Thanks
Joe

On Wed, Feb 15, 2017 at 4:29 PM, Kiran <[email protected]> wrote:
Hello,

I need to send data from one organisation to another but there are data
size limits between them (this isn't my choice and has been enforced on
me). I've got a 4 node NiFi cluster in each organisation.

The sending NiFi cluster has the following data flow:
Ingest the data by various means
   -> Compress Data using CompressContent
     -> If file size > X amount I use SplitContent
       -> HTTPS POST to load balancer sitting in front of the NiFi
          cluster in the other organisation

On the receiving NiFi cluster I wanted to:
-> Receive the data
   -> MergeContent
     -> Do whatever else with the data...

The problem I can't get round is that if I split the content into 3
fragments and send them to the receiving NiFi instance, then because
it's behind a load balancer I can't guarantee that the 3 fragments are
received by the same node.

Q1) I'm assuming that for MergeContent to work all the fragments of a
single piece of data have to arrive on the same NiFi node, or is there
an option to have it work across a cluster?

Q2) How long does the MergeContent processor wait for all the
fragments? If one of the fragments gets lost does it time out after a
certain period?
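
To make the two questions concrete, here is a minimal Python sketch of
what reassembling fragments on a single node involves. It is not NiFi
code. It assumes each fragment arrives with fragment.identifier,
fragment.index and fragment.count attributes (which I believe is what
MergeContent's Defragment strategy keys on), and it models the wait
with a max_age_seconds value in the spirit of the processor's Max Bin
Age property; check the processor documentation for the real
behaviour. FragmentBin and receive are invented names.

```python
import time


class FragmentBin:
    """Collects the fragments of one original object on a single node."""

    def __init__(self, count, max_age_seconds, now=None):
        self.count = count
        self.max_age_seconds = max_age_seconds
        self.created = time.monotonic() if now is None else now
        self.parts = {}

    def add(self, index, data):
        self.parts[index] = data

    def is_complete(self):
        return len(self.parts) == self.count

    def is_expired(self, now=None):
        # A bin that never completes should not be held forever.
        now = time.monotonic() if now is None else now
        return now - self.created > self.max_age_seconds

    def merge(self):
        # Reassemble in fragment.index order, not arrival order.
        return b"".join(self.parts[i] for i in sorted(self.parts))


def receive(bins, attrs, data, max_age_seconds=300, now=None):
    """Add a fragment to this node's bins; return merged bytes when complete."""
    ident = attrs["fragment.identifier"]
    bin_ = bins.get(ident)
    if bin_ is None:
        bin_ = FragmentBin(int(attrs["fragment.count"]), max_age_seconds, now)
        bins[ident] = bin_
    bin_.add(int(attrs["fragment.index"]), data)
    if bin_.is_complete():
        del bins[ident]
        return bin_.merge()
    return None
```

The bins dictionary is local to one node in this sketch. If fragments 1
and 3 reach one node and fragment 2 reaches another, neither bin ever
completes and both eventually expire, which is the situation described
above.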

I was thinking one way to solve this is to have the HTTPListener on the
receiving NiFi only listening on the primary node, which would ensure all
the fragments arrive on the same node. The downside would be that I end
up with idle NiFi nodes.

Is there anything obvious that I've missed that would solve my issue?

Thanks in advance,

Brian
