In that case, maybe you can use PutDistributedMapCache and
FetchDistributedMapCache?

Although, you might need to modify PutDistributedMapCache so that it
allows caching attributes; currently it only caches the flow file
content.
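
To make the idea concrete, here is a rough sketch in plain Python of the
store-then-fetch pattern. It is only a simulation and does not use NiFi's
API: AttributeCache, save_before_request, enrich_response, and the
"req-1" correlation key are all made-up names for illustration, and the
dict stands in for whatever the DistributedMapCacheServer would hold.

```python
class AttributeCache:
    """In-memory stand-in for a shared map cache (hypothetical, not a NiFi class)."""

    def __init__(self):
        self._entries = {}

    def put(self, key, attributes):
        # Store a copy so later changes to the caller's dict do not leak in.
        self._entries[key] = dict(attributes)

    def fetch(self, key):
        # Return a copy of the saved attributes, or None if nothing was cached.
        saved = self._entries.get(key)
        return dict(saved) if saved is not None else None


def save_before_request(cache, request_id, attributes, keys_to_keep):
    """Cache only the selected attributes before the small request goes out."""
    selected = {k: attributes[k] for k in keys_to_keep if k in attributes}
    cache.put(request_id, selected)


def enrich_response(cache, request_id, response_attributes):
    """Merge the cached attributes into the attributes of the response."""
    merged = cache.fetch(request_id) or {}
    merged.update(response_attributes)  # response values win on conflict
    return merged


cache = AttributeCache()
save_before_request(
    cache,
    "req-1",
    {"filename": "a.csv", "source": "feed-7", "tmp": "x"},
    ["filename", "source"],
)
result = enrich_response(cache, "req-1", {"status": "200"})
```

The part that matters is that both sides agree on the key (here the
request id), so the response has to carry some identifier back from the
remote service.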


On Thu, Jan 26, 2017 at 3:48 PM, Benjamin Janssen <bjanss...@gmail.com>
wrote:

> What if the bulk of the data is what is coming back from the remote
> processing?  So I really want a pattern like this:
>
> Store a couple of attributes.
> Send small request off for additional processing.
> Receive response from remote processing.
> Append to that response the originally saved off attributes.
>
> Could I accomplish the same thing by reversing the ordering of the
> Wait/Notify processors?  So do something like this:
>
> Get a file for processing.
> Send it to Notify.
> Send it to remote processing.
> Remote processing data comes back.
> Send it to Wait (where it doesn't actually wait at all).
> Data leaves Wait with appended attributes from the Notify.
>
>
> On Thu, Jan 26, 2017 at 3:35 PM Bryan Bende <bbe...@gmail.com> wrote:
>
>> Ben,
>>
>> To elaborate on NIFI-190... this ticket introduced two new processors
>> (Wait and Notify) that use the DistributedMapCacheServer to communicate.
>> They aren't released yet, but are in the master branch.
>>
>> One example of using these processors is something like the following:
>>
>> - Let's say we have a flow file where the content is a CSV file, and each
>> line is a URL to do a look-up somewhere
>> - The flow file can be sent to a SplitText processor to get each line
>> into its own flow file
>> - The "original" relationship from SplitText can go to a Wait processor
>> which will keep checking the cache for N signals (in this case N = the
>> number of splits)
>> - The "splits" relationship would go down a separate path where each
>> split would be processed and eventually hit a Notify processor, which would
>> increment the number of signals in the cache and optionally add attributes
>> - When Wait sees N signals (or when an expiration is reached) it releases
>> the original flow file and can optionally copy over attributes that the
>> signals put in the cache
>>
>> So you get to continue processing the original flow file that is the CSV,
>> while still being able to process the splits individually and get the
>> attributes from them that might be the "results" in your case.
>>
>> Hope that helps.
>>
>> -Bryan
>>
>>
>> On Thu, Jan 26, 2017 at 3:16 PM, Joe Witt <joe.w...@gmail.com> wrote:
>>
>> Ben,
>>
>> One way to approach this is using the sort of capabilities this opens
>> up: https://issues.apache.org/jira/browse/NIFI-190
>>
>> Certainly is a good case/idea to work through.  Doable and
>> increasingly seems to be an ask.
>>
>> Thanks
>> Joe
>>
>> On Thu, Jan 26, 2017 at 3:10 PM, Benjamin Janssen <bjanss...@gmail.com>
>> wrote:
>> > Hello all,
>> >
>> > I've got a use case where I get some data, I want to fork a portion of
>> > that data off to an external service for asynchronous processing, and
>> > when that external service has finished processing the data, I want to
>> > take its output, marry it up with the original data, and pass the whole
>> > thing on for further processing.
>> >
>> > So essentially two data flows:
>> >
>> > Receive Data -> Store Some State -> Send Data To External Service
>> >
>> > Do More Processing On Original Data + Results <-  Retrieve Previously
>> > Stored State  <-  Receive Results From External Service
>> >
>> > Is there a way to do this while taking advantage of NiFi's State
>> > Management capabilities?  I wasn't finding any obvious processors for
>> > persisting and retrieving shared state from State Management.  The
>> > closest my googling was able to get me was this:
>> > https://issues.apache.org/jira/browse/NIFI-1582
>> > but if I'm understanding the State Management documentation properly,
>> > that won't actually help me because I'd need the same processor to do
>> > all storing and retrieving of state?
>> >
>> > Does something exist to use State Management like this?  Or is what I'm
>> > proposing to do a bad idea?
>> >
>> > Or maybe I should just be using the DistributedMapCacheServer for this?
>> >
>> > Any help/advice would be appreciated.
>>
>>
>>
