Chakri,

Would love to hear what you've learned and how that differed from the docs themselves. Site-to-site has proven difficult to set up, so we're clearly not there yet in having the right operator/admin experience.
Thanks
Joe

On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla <chakrader.dewaraga...@lifelock.com> wrote:
> I was able to get site-to-site to work.
> I tried to follow your instructions to distribute data across the
> nodes.
>
> GenerateFlowFile (On Primary) -> RPG
> RPG -> Input Port -> PutFile (Time driven scheduling)
>
> However, data is only written to one slave (Secondary slave). Primary slave
> has no data.
>
> Image screenshot:
> http://tinyurl.com/jjvjtmq
>
> From: Chakrader Dewaragatla <chakrader.dewaraga...@lifelock.com>
> Date: Sunday, January 10, 2016 at 11:26 AM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> Bryan – Thanks – I am trying to set up site-to-site.
> I have two slaves and one NCM.
>
> My properties are as follows:
>
> On both slaves:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> On NCM:
>
> nifi.remote.input.socket.port=10880
> nifi.remote.input.secure=false
>
> When I try to drop a remote process group (with http://<NCM IP>:8080/nifi),
> I see the following error for the two nodes:
>
> [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site communication
> [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site communication
>
> Do you have any insight into why it's trying to connect to 8080 on the
> slaves? When does port 10880 come into the picture? I remember trying to
> set up site-to-site a few months back and succeeding.
>
> Thanks,
> -Chakri
>
> From: Bryan Bende <bbe...@gmail.com>
> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> Date: Saturday, January 9, 2016 at 11:22 AM
> To: "users@nifi.apache.org" <users@nifi.apache.org>
> Subject: Re: Nifi cluster features - Questions
>
> The sending node (where the remote process group is) will distribute the
> data evenly across the two nodes, so an individual file will only be sent to
> one of the nodes.
> You could think of it as if a separate NiFi instance were sending directly
> to a two-node cluster: it would distribute the data evenly across the two
> nodes. In this case it just so happens to all be within the same cluster.
>
> The most common use case for this scenario is the List and Fetch processors,
> such as the ones for HDFS. You can perform the listing on the primary node,
> and then distribute the results so the fetching takes place on all nodes.
>
> On Saturday, January 9, 2016, Chakrader Dewaragatla
> <chakrader.dewaraga...@lifelock.com> wrote:
>>
>> Bryan – Thanks, how do the nodes distribute the load for an input port? As
>> the port is open and listening on two nodes, does it copy the same files
>> to both nodes?
>> I need to try this setup to see the results, appreciate your help.
>>
>> Thanks,
>> -Chakri
>>
>> From: Bryan Bende <bbe...@gmail.com>
>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Date: Friday, January 8, 2016 at 3:44 PM
>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>> Subject: Re: Nifi cluster features - Questions
>>
>> Hi Chakri,
>>
>> I believe the DistributeLoad processor is more for load balancing when
>> sending to downstream systems. For example, if you had two HTTP endpoints,
>> you could have the first relationship from DistributeLoad going to a
>> PostHTTP that posts to endpoint #1, and the second relationship going to a
>> second PostHTTP that goes to endpoint #2.
>>
>> If you want to distribute the data within the cluster, then you need to
>> use site-to-site. The way you do this is the following...
>>
>> - Add an Input Port connected to your PutFile.
>> - Add GenerateFlowFile scheduled on primary node only, connected to a
>>   Remote Process Group. The Remote Process Group should be connected to the
>>   Input Port from the previous step.
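As a toy illustration of the two steps above (not NiFi code): only the primary node generates FlowFiles, and the Remote Process Group hands each one to exactly one node's Input Port. The sketch assumes strict round-robin placement, which is a simplification; the node names and the `distribute` helper are hypothetical, and real site-to-site placement is decided by NiFi itself.

```python
from collections import Counter
from itertools import cycle


def distribute(flowfiles, nodes):
    """Assign each FlowFile to exactly one node, rotating through the nodes.

    Assumption: strict round-robin. This only models the "one file goes to
    one node" behaviour described in the thread, not NiFi's actual logic.
    """
    targets = cycle(nodes)
    return {flowfile: next(targets) for flowfile in flowfiles}


# Hypothetical names: a two-node cluster and ten FlowFiles from the primary.
nodes = ["node-1", "node-2"]
flowfiles = [f"flowfile-{i}" for i in range(10)]

assignment = distribute(flowfiles, nodes)
per_node = Counter(assignment.values())

for node in nodes:
    print(node, per_node[node])
```

With ten FlowFiles and two nodes, each node receives five, and no FlowFile is copied to both.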
>>
>> So both nodes have an input port listening for data, but only the primary
>> node produces a FlowFile and sends it to the RPG, which then redistributes
>> it back to one of the Input Ports.
>>
>> In order for this to work you need to set nifi.remote.input.socket.port in
>> nifi.properties to some available port, and you probably want
>> nifi.remote.input.secure=false for testing.
>>
>> -Bryan
>>
>>
>> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
>> <chakrader.dewaraga...@lifelock.com> wrote:
>>>
>>> Mark – I have set up a two-node cluster and tried the following:
>>> GenerateFlowFile processor (Run only on primary node) -> DistributeLoad
>>> processor (RoundRobin) -> PutFile
>>>
>>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
>>> >> run on primary node only).
>>> From your comment above, it should put files on both nodes. It puts files
>>> on the primary node only. Any thoughts?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <marka...@hotmail.com>
>>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> Date: Wednesday, October 7, 2015 at 11:28 AM
>>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Chakri,
>>>
>>> Correct - when NiFi instances are clustered, they do not transfer data
>>> between the nodes. This is very different from what you might expect from
>>> something like Storm or Spark, as the key goals and design are quite
>>> different. We have discussed providing the ability to allow the user to
>>> indicate that they want to have the framework do load balancing for
>>> specific connections in the background, but it's still in more of a
>>> discussion phase.
>>>
>>> Site-to-Site is simply the capability that we have developed to transfer
>>> data between one instance of NiFi and another instance of NiFi.
>>> So currently, if we want to do load balancing across the cluster, we
>>> would create a site-to-site connection (by dragging a Remote Process
>>> Group onto the graph) and give that site-to-site connection the URL of
>>> our cluster. That way, you can push data to your own cluster, effectively
>>> providing a load balancing capability.
>>>
>>> If you were to just run ListenHTTP without setting it to Primary Node,
>>> then every node in the cluster would be listening for incoming HTTP
>>> connections. So you could then use a simple load balancer in front of
>>> NiFi to distribute the load across your cluster.
>>>
>>> Does this help? If you have any more questions we're happy to help!
>>>
>>> Thanks
>>> -Mark
>>>
>>>
>>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
>>> <chakrader.dewaraga...@lifelock.com> wrote:
>>>
>>> Mark - Thanks for the notes.
>>>
>>> >> The other option would be to have a ListenHTTP processor run on
>>> >> Primary Node only and then use Site-to-Site to distribute the data to
>>> >> other nodes.
>>> Let's say I have a 5-node cluster and a ListenHTTP processor on the
>>> Primary node. Is the data collected on the primary node not transferred
>>> to other nodes by default for processing, even though all nodes are part
>>> of one cluster?
>>> If the ListenHTTP processor is running with the default scheduling
>>> (without an explicit setting to run on the primary node), how is the data
>>> transferred to the rest of the nodes? Does site-to-site come into play
>>> when I make one processor run on the primary node?
>>>
>>> Thanks,
>>> -Chakri
>>>
>>> From: Mark Payne <marka...@hotmail.com>
>>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> Date: Wednesday, October 7, 2015 at 7:00 AM
>>> To: "users@nifi.apache.org" <users@nifi.apache.org>
>>> Subject: Re: Nifi cluster features - Questions
>>>
>>> Hello Chakro,
>>>
>>> When you create a cluster of NiFi instances, each node in the cluster is
>>> acting independently and in exactly the same way.
>>> I.e., if you have 5 nodes, all 5 nodes will run exactly the same flow.
>>> However, they will be pulling in different data and therefore operating
>>> on different data.
>>>
>>> So if you pull in 10 1-gig files from S3, each of those files will be
>>> processed on the node that pulled the data in. NiFi does not currently
>>> shuffle data around between nodes in the cluster (you can use
>>> site-to-site to do this if you want to, but it won't happen
>>> automatically). If you set the number of Concurrent Tasks to 5, then you
>>> will have up to 5 threads running for that processor on each node.
>>>
>>> The only exception to this is the Primary Node. You can schedule a
>>> Processor to run only on the Primary Node by right-clicking on the
>>> Processor, and going to the Configure menu. In the Scheduling tab, you
>>> can change the Scheduling Strategy to Primary Node Only. In this case,
>>> that Processor will only be triggered to run on whichever node is elected
>>> the Primary Node (this can be changed in the Cluster management screen by
>>> clicking the appropriate icon in the top-right corner of the UI).
>>>
>>> The GetFile/PutFile will run on all nodes (unless you schedule it to run
>>> on primary node only).
>>>
>>> If you are attempting to have a single input running HTTP and then push
>>> that out across the entire cluster to process the data, you would have a
>>> few options. First, you could just use an HTTP Load Balancer in front of
>>> NiFi. The other option would be to have a ListenHTTP processor run on
>>> Primary Node only and then use Site-to-Site to distribute the data to
>>> other nodes.
>>>
>>> For more info on site-to-site, you can see the Site-to-Site section of
>>> the User Guide at
>>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
>>>
>>> If you have any more questions, let us know!
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
>>> <chakrader.dewaraga...@lifelock.com> wrote:
>>>
>>> Nifi Team – I would like to understand the advantages of a Nifi
>>> clustering setup.
>>>
>>> Questions:
>>>
>>> - How does a workflow work on multiple nodes? Does it share resources
>>>   between nodes?
>>>   Let's say I need to pull 10 1-gig files from S3: how does the workload
>>>   distribute? With concurrent tasks set to 5, does it spawn 5 tasks per
>>>   node?
>>>
>>> - How do I "isolate" a processor to the master node (or one node)?
>>>
>>> - With GetFile/PutFile processors on a cluster setup, does it get/put on
>>>   the primary node? How do I force a processor to look in one of the
>>>   slave nodes?
>>>
>>> - How can we have a workflow where the input side receives requests
>>>   (HTTP) and the rest of the pipeline runs in parallel on all the nodes?
>>>
>>> Thanks,
>>> -Chakro
>>>
>>> ________________________________
>>> The information contained in this transmission may contain privileged and
>>> confidential information. It is intended only for the use of the person(s)
>>> named above. If you are not the intended recipient, you are hereby notified
>>> that any review, dissemination, distribution or duplication of this
>>> communication is strictly prohibited. If you are not the intended recipient,
>>> please contact the sender by reply email and destroy all copies of the
>>> original message.
>>> ________________________________
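The concurrent-tasks question in the thread reduces to simple arithmetic, sketched here under the assumptions stated in the replies: every node runs the same flow, Concurrent Tasks is a per-node limit, and a processor scheduled on the Primary Node only runs on a single node. The function name and its parameters are hypothetical; this models the thread counts described, not NiFi's scheduler.

```python
def max_threads(node_count: int, concurrent_tasks: int,
                primary_only: bool = False) -> int:
    """Upper bound on threads one processor can use across the cluster.

    Assumption from the thread: Concurrent Tasks applies per node, and a
    Primary Node Only processor is triggered on exactly one node.
    """
    if node_count < 1 or concurrent_tasks < 1:
        raise ValueError("node_count and concurrent_tasks must be >= 1")
    active_nodes = 1 if primary_only else node_count
    return active_nodes * concurrent_tasks


# 5 nodes with Concurrent Tasks = 5: up to 5 threads on each node.
print(max_threads(5, 5))                     # 25
# Same processor scheduled on the Primary Node only.
print(max_threads(5, 5, primary_only=True))  # 5
```

The bound is on threads, not on data movement: each of the 10 files is still processed on whichever node pulled it in.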