Thanks, the flow is executing as expected now.

On Mon, Jul 2, 2018 at 10:09 AM, Matt Burgess <mattyb...@apache.org> wrote:

> Joe,
>
> Only the first (source) processor needs to be set to Primary Node
> Only. Once that happens, the flow files will only proceed down the
> flow on the primary node, so step 5 will also only run on the primary
> node. In order to redistribute the flow files among the cluster,
> you'll want a Remote Process Group to point back to an Input Port on
> your cluster, between steps 4 & 5. From that point on, the flow files
> will be distributed among the nodes and the downstream flow (steps
> 5-7) will run on all the nodes.
>
> Regards,
> Matt
>
> On Mon, Jul 2, 2018 at 10:05 AM Joe Trite <joetr...@gmail.com> wrote:
> >
> > I have a question and need confirmation about cluster execution. I have a
> > 3-node NiFi 1.6 cluster. My use case is extracting data from Hive and
> > depositing it into an RDBMS. Here is my flow.
> >
> > 1. SelectHiveQL - executes a "show partitions" command.
> > 2. SplitText - splits the returned partitions (7) into individual
> > flowFiles.
> > 3. ExtractText - populates a 'partition_info' attribute.
> > 4. UpdateAttribute - reformats the 'partition_info' into SQL syntax.
> > 5. SelectHiveQL - executes the "SELECT" against Hive with the provided
> > 'partition_info' as the WHERE clause.
> > 6. SplitAvro - chunks the data into bite-size pieces.
> > 7. PutDatabaseRecord - INSERTs into the db.
> >
> > Processors 1-4 are set to 'Primary Node' only. 5-7 are set to 'All
> > Nodes'. All processors are set to 1 concurrent task.
> >
> > The question is about what happens in step 5. I see the 7
> > 'partition_info' flowFiles in the queue after step 4 completes, and they
> > seem to get executed one at a time in step 5, at least judging by how the
> > queue drains. I would expect step 5 to execute on each of the 3 nodes,
> > so that the queue drains in 3's. Is this assumption correct,
> > or do I have something misconfigured?
> >
> > I do see in the provenance data that all 3 nodes did process a flowFile;
> > I am just expecting it to happen in parallel.
> >
> > I did see this article about distribution but don't think it is required
> > for this use case to work:
> > https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html
> >
> > Thanks
> > Joe
> >
> >
>
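
To illustrate the point in Matt's reply, here is a rough Python sketch (not NiFi code) of where the 7 'partition_info' flowFiles end up with and without a redistribution hop between steps 4 and 5. The node names, the choice of primary node, and the round-robin assignment are all assumptions made for illustration; the real Remote Process Group balances load in its own way and will not necessarily produce this exact split.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node names for a 3-node cluster; node1 is assumed to be primary.
NODES = ["node1", "node2", "node3"]
PRIMARY = "node1"


def run_without_redistribution(flow_files):
    """Flow files created on the primary node stay there for the rest of the flow."""
    return {ff: PRIMARY for ff in flow_files}


def run_with_redistribution(flow_files, nodes=NODES):
    """Simplified stand-in for an RPG -> Input Port hop: round-robin assignment."""
    node_cycle = cycle(nodes)
    return {ff: next(node_cycle) for ff in flow_files}


def per_node_counts(assignment):
    """Count how many flow files each node would process in step 5."""
    return Counter(assignment.values())


# The 7 partitions produced by steps 1-4 on the primary node.
partitions = [f"partition_{i}" for i in range(1, 8)]

before = per_node_counts(run_without_redistribution(partitions))
after = per_node_counts(run_with_redistribution(partitions))

print("without redistribution:", dict(before))
print("with redistribution:   ", dict(after))
```

In the first case every flowFile is handled by the primary node, which matches the one-at-a-time queue drain described in the original question. In the second case step 5 has work on all three nodes at once.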
