Hi Lee,

The List+Fetch model in a cluster is one of the trickier configurations to
set up.

This article has a good description, with a diagram under the "pulling"
section that shows ListHDFS+FetchHDFS; the same approach should apply to
ListFile+FetchFile:

https://community.hortonworks.com/articles/16120/how-do-i-distribute-data-across-a-nifi-cluster.html

The short answer is that you connect ListFile to a Remote Process Group
that points back to the same cluster, and then connect an Input Port to
FetchFile. It is the Remote Process Group that distributes the flow files
across the cluster.
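
To make the division of labor concrete, here is a toy Python model of that
flow. It is not NiFi code: the function names, node names, and file paths
are all made up for illustration, and the round-robin split is only an
assumption standing in for whatever balancing the Remote Process Group
actually does.

```python
from itertools import cycle


def list_files(paths):
    """Stand-in for ListFile on the primary node: one flow file per path, no content."""
    return [{"path": p} for p in paths]


def distribute(flow_files, nodes):
    """Stand-in for the Remote Process Group sending listings to the Input Port.

    Round-robin is an assumption for illustration only.
    """
    assignments = {node: [] for node in nodes}
    for node, flow_file in zip(cycle(nodes), flow_files):
        assignments[node].append(flow_file)
    return assignments


def fetch_file(flow_file):
    """Stand-in for FetchFile: each node pulls content only for its own listings."""
    return {**flow_file, "content": f"<bytes of {flow_file['path']}>"}


# Hypothetical 3-node cluster and input directory.
nodes = ["node1", "node2", "node3"]
listing = list_files([f"/data/in/file{i}.txt" for i in range(6)])
work = distribute(listing, nodes)

for node, items in work.items():
    print(node, [fetch_file(ff)["path"] for ff in items])
```

The point of the model is that listing happens once, while fetching happens
on whichever node receives each listing.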

Hopefully this helps.

-Bryan


On Thu, Mar 24, 2016 at 4:55 PM, Lee Laim <[email protected]> wrote:

> I'm using the ListFile/FetchFile combination in cluster mode.
>
> When *ListFile is set to run on primary node* and *FetchFile is set
> to default*, the generated flow files only run on the primary node;
> other nodes sit out.
>
> When *ListFile and FetchFile are set to run on default* (timer driven),
> they generate flow files which are then consumed by all downstream nodes.
>
> Is this expected behavior? Or is something off with my deployment?
>
> What I am seeing appears to be contrary to the usage description: ListFile
> (primary) generates one list of flow files to organize and distribute work
> to the rest of the cluster.
>
> I'm running 0.5.1 on 3 nodes.
>
> Thanks,
> Lee
>
