Vincent

Sorry for the late reply, but here it is

Based on what you have described, it appears you have a mix of two problems: 
Work Flow Orchestration and Data Flow.
The main issue is that on the surface it’s not always easy to tell the 
difference, but I’ll try.

Work Flow Orchestration allows one to orchestrate a single process by breaking 
it down into a set of individual components (primarily for simplicity and 
modularization) and then composing those components into one cohesive process.
Data Flow manages individual processes, their lifecycle, execution, input and 
output from a central command/control facility.
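
To make the "compose components into one process" idea concrete, here is a 
minimal sketch in plain Java. The step names (STAGE, RUN_JOB, SUMMARIZE) and 
the strings they produce are hypothetical and only stand in for real work; 
nothing here is NiFi or Hadoop API.

```java
import java.util.function.Function;

public class WorkflowSketch {
    // Hypothetical components: each is small and testable on its own.
    public static final Function<String, String> STAGE =
            path -> "hdfs:" + path;
    public static final Function<String, String> RUN_JOB =
            input -> input + "#job-output";
    public static final Function<String, String> SUMMARIZE =
            output -> "done(" + output + ")";

    // Orchestration: the components composed into one cohesive process.
    public static final Function<String, String> PROCESS =
            STAGE.andThen(RUN_JOB).andThen(SUMMARIZE);

    public static void main(String[] args) {
        System.out.println(PROCESS.apply("/data/in.csv"));
    }
}
```

A framework such as Spring Integration or Apache Camel plays the role of 
andThen here, adding routing, error handling and so on.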

So with the above in mind, I'd say you have a mixed problem: a Data Flow 
consisting of simple and complex processors. It's the complex processors, 
the ones that need to invoke an MR job and then act on its result, that fall 
into the category of Work Flow Orchestration, where the individual components 
within such a process must work with awareness of the overall process they 
represent. For example:

GetFile (NiFi)
PutHDFS (NiFi)
Process (NiFi Custom Processor) - where you execute the MR job and react to its 
completion (success or failure) and possibly put something on the output queue 
in NiFi so the next element of the Data Flow can kick in.
. . .
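
The third element can be sketched roughly as follows. This is a plain-Java 
model, not actual NiFi code: the class JobStepSketch, its onTrigger signature 
and the two queues are assumptions made for illustration. In a real custom 
processor the same logic would sit in onTrigger and route via 
session.transfer(...) to success/failure relationships, and the Callable 
would wrap the actual MR job submission.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Callable;

public class JobStepSketch {
    // Stand-ins for the processor's outbound queues (relationships in NiFi).
    public final Queue<String> success = new ArrayDeque<>();
    public final Queue<String> failure = new ArrayDeque<>();

    // Runs the (hypothetical) job for one input and routes the input by outcome.
    public void onTrigger(String input, Callable<Boolean> job) {
        boolean ok;
        try {
            ok = job.call();   // blocks until the job reports completion
        } catch (Exception e) {
            ok = false;        // an exception is treated as job failure
        }
        (ok ? success : failure).add(input);
    }

    public static void main(String[] args) {
        JobStepSketch step = new JobStepSketch();
        step.onTrigger("/data/a", () -> true);
        step.onTrigger("/data/b", () -> {
            throw new IllegalStateException("job failed");
        });
        System.out.println("success=" + step.success + " failure=" + step.failure);
    }
}
```

Whatever lands on the success queue is what the next element of the Data 
Flow would pick up.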

So, the 3 elements of Data Flow above are the individual NiFi Processors, yet 
the 3rd one internally represents a complex and orchestrated process. Now, 
orchestration is just a term, and without relying on outside frameworks that 
specifically address it, it would be just a lot of custom code. 
Thankfully, NiFi provides support for a Spring Application Context container 
that allows you to implement your NiFi processor using work flow orchestration 
frameworks such as Spring Integration and/or Apache Camel.

I’d be more than willing to help you further with that if you’re interested, 
but wanted to see how you feel about the above architecture. I am also working 
on a blog post to describe exactly that and how Data Flow and Work Flow can 
complement one another.

Let me know
Cheers
Oleg

On Mar 29, 2016, at 9:26 AM, Oleg Zhurakousky 
<ozhurakou...@hortonworks.com> wrote:

Vincent

I do have a suggestion for you but need a bit more time to craft my response. 
Give me till tonight EST.

Cheers
Oleg
On Mar 29, 2016, at 8:55 AM, Vincent Russell 
<vincent.russ...@gmail.com> wrote:

Thanks Oleg and Joe,

I am also not yet convinced that NiFi is the solution, but it is a 
nice way for us to manage actions based on the result of a MapReduce job.

Our use case is to have follow-on processors that perform actions based on the 
results of the MapReduce jobs.  One processor kicks off the M/R process and 
then the results are sent down the flow.

The problem with our current scenario is that we have two separate flows that 
use the same location for the output of their M/R jobs.

One simple way might be to use mongo itself as a locking mechanism.
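
A rough sketch of that idea, under assumptions: one common way to do it is a 
dedicated lock collection where acquiring the lock means inserting a document 
with a well-known _id, which MongoDB rejects if a document with that _id 
already exists. To keep the example runnable without a database, the 
collection is modeled by an in-memory map; LockCollectionSketch, the lock id 
and the owner names are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LockCollectionSketch {
    // In-memory stand-in for a lock collection keyed by a unique _id.
    private final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();

    // Acquire: succeeds only if no lock document with this id exists yet.
    public boolean tryAcquire(String lockId, String owner) {
        return locks.putIfAbsent(lockId, owner) == null;
    }

    // Release: removes the lock document only if this owner holds it.
    public boolean release(String lockId, String owner) {
        return locks.remove(lockId, owner);
    }

    public static void main(String[] args) {
        LockCollectionSketch mongo = new LockCollectionSketch();
        System.out.println("flowA acquired: " + mongo.tryAcquire("mr-output", "flowA"));
        System.out.println("flowB acquired: " + mongo.tryAcquire("mr-output", "flowB"));
        mongo.release("mr-output", "flowA");
        System.out.println("flowB acquired: " + mongo.tryAcquire("mr-output", "flowB"));
    }
}
```

A real version would also need some form of expiry, otherwise a flow that 
dies while holding the lock blocks the other one indefinitely.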

On Mon, Mar 28, 2016 at 7:07 PM, Oleg Zhurakousky 
<ozhurakou...@hortonworks.com> wrote:
Vincent

This sounds more like an architectural question. Even outside of NiFi, 
achieving that, especially in a distributed environment, would require 
some kind of coordination component. And while we can think of a variety of 
ways to accomplish that, I am not entirely convinced that this is the right 
direction. Would you mind sharing a bit more about your use case, so that 
perhaps we can jointly come up with a better and hopefully simpler solution?

Cheers
Oleg

On Mar 28, 2016, at 6:45 PM, Vincent Russell 
<vincent.russ...@gmail.com> wrote:


I have two processors (that aren't part of the same flow) that write to the 
same resource (a mongo collection) via a map reduce job.

I don't want both to run at the same time.
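
On a single, non-clustered instance this can be approximated with a lock 
shared by both processors. The sketch below is plain Java with hypothetical 
names (SharedResourceGuard, runExclusively). It only excludes callers inside 
one JVM, so it does not help on a cluster.

```java
import java.util.concurrent.locks.ReentrantLock;

public class SharedResourceGuard {
    // One lock per JVM, shared by every caller that writes to the collection.
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Runs the task only if no other thread holds the lock; returns whether it ran.
    public static boolean runExclusively(Runnable task) {
        if (!LOCK.tryLock()) {
            return false;   // busy: the caller can retry on a later trigger
        }
        try {
            task.run();
            return true;
        } finally {
            LOCK.unlock();
        }
    }

    public static void main(String[] args) {
        boolean first = runExclusively(() -> {
            // While the first caller holds the lock, a second thread tries too.
            boolean[] second = new boolean[1];
            Thread t = new Thread(() -> second[0] = runExclusively(() -> {}));
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("second caller ran: " + second[0]);
        });
        System.out.println("first caller ran: " + first);
    }
}
```

Using tryLock rather than a synchronized block means the second processor 
returns immediately instead of tying up a thread while the job runs.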

On Mar 28, 2016 6:28 PM, "Joe Witt" 
<joe.w...@gmail.com> wrote:
Vincent,

Not really and that would largely be by design.  Can you describe the
use case more so we can suggest alternatives or perhaps understand the
motivation better?

Thanks
Joe

On Mon, Mar 28, 2016 at 4:00 PM, Vincent Russell
<vincent.russ...@gmail.com> wrote:
>
> Is it possible to have one processor block while another specified processor
> is running (within the onTrigger method)?
>
> I can do this on a non-clustered NiFi with a synchronized block I guess, but
> I wanted to know if there was a more idiomatic way of doing this.
>
> Thanks,
> Vincent



