Igor,

There is no automatic failover of the node that is considered
primary.  For the upcoming 1.x release, though, this has been addressed:
https://issues.apache.org/jira/browse/NIFI-483

Thanks
Joe

On Sun, May 1, 2016 at 2:36 PM, Igor Kravzov <igork.ine...@gmail.com> wrote:
> Thanks Aldrin for the response.
> What I didn't fully understand from the documentation: is automatic fail-over
> implemented? I would rather configure the entire workflow to run "On primary
> node".
>
>
> On Sun, May 1, 2016 at 1:31 PM, Aldrin Piri <aldrinp...@gmail.com> wrote:
>>
>> Igor,
>>
>> Your thoughts are correct: without any additional configuration, the
>> GetTwitter processor would run on both nodes.  The way to avoid this is to
>> select the "On primary node" scheduling strategy, which has the processor
>> run only on whichever node is currently the primary.
>>
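>> For example, in the processor's Configure dialog (a sketch of the relevant
>> Scheduling tab setting, assuming the standard UI):
>>
>>   Scheduling Strategy: On primary node
>>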
>> PutHDFS has similar semantics, but here they are likely what you want.
>> Consider the case where data is partitioned across the nodes: PutHDFS would
>> then need to run on each node to ensure all of the data is delivered to
>> HDFS.  The property you list specifies where the data should land on the
>> configured HDFS instance.  Often this is done via Expression Language (EL)
>> to get the familiar time slicing of resources when persisted, such as
>> ${now():format('yyyy/MM/dd/HH')}.  You could additionally have a directory
>> structure that mirrors the data, making use of attributes the files may have
>> gained as they made their way through your flow, or use an UpdateAttribute
>> to set a property, such as "hadoop.dest.dir", that the final PutHDFS
>> Directory property references to give a dynamic location on a per-FlowFile
>> basis.
>>
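>> As a concrete sketch (the attribute name "hadoop.dest.dir" and the file
>> paths below are illustrative assumptions, not NiFi defaults):
>>
>>   UpdateAttribute (dynamic property):
>>     hadoop.dest.dir = /twitter/${now():format('yyyy/MM/dd/HH')}
>>
>>   PutHDFS:
>>     Hadoop Configuration Resources: /etc/hadoop/conf/core-site.xml,
>>                                     /etc/hadoop/conf/hdfs-site.xml
>>     Directory: ${hadoop.dest.dir}
>>
>> Each FlowFile then lands in an hourly directory, no matter which node
>> happened to process it.
>>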
>> Let us know if you have additional questions or if things are unclear.
>>
>> --aldrin
>>
>>
>> On Sun, May 1, 2016 at 1:20 PM, Igor Kravzov <igork.ine...@gmail.com>
>> wrote:
>>>
>>> If I understand correctly, in cluster mode the same dataflow runs on all
>>> the nodes.
>>> So let's say I have a simple dataflow with GetTwitter and PutHDFS
>>> processors, and one NCM + 2 nodes.
>>> Does that actually mean GetTwitter will be called independently, and
>>> potentially simultaneously, on each node, and that there may be duplicate
>>> results?
>>> How about the PutHDFS processor?  Where should its "hadoop configuration
>>> resources" and parent HDFS directory properties point on each node?
>>
>>
>
