Hi all,

In our application, we faced the same problem. To solve it, we wrote a Django app that sat at the center of the interaction between NiFi and several other systems (including Spark and another internal application) and used it to dispatch tasks as needed. In that architecture, NiFi was not itself the orchestrator; it interacted with a separate application that played that role. We found that this worked well for us because it divided responsibilities cleanly between what NiFi was good at doing (moving files from place to place) and what was better done in Python code (many of the tasks described above).

If you don't want to go so far as to write your own orchestrator, you might want to check out crossbar.io, which could handle the communication between the different services.
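To make the division of labor concrete, here is a minimal sketch of the dispatching idea in plain Python. It is an illustration only, not the code from the application described above: the `Dispatcher` class, the `file_landed` event type, and the `submit_spark_job` handler are all hypothetical names, and the handler is a stub that does not actually talk to Spark.

```python
from typing import Any, Callable, Dict

# Hypothetical handler signature: takes an event payload, returns a result dict.
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


class Dispatcher:
    """Routing table from event type to the handler that should process it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, event_type: str, handler: Handler) -> None:
        # Associate one handler with one event type (later calls overwrite).
        self._handlers[event_type] = handler

    def dispatch(self, event: Dict[str, Any]) -> Dict[str, Any]:
        # Look up the handler by the event's "type" field and run it.
        event_type = event.get("type")
        handler = self._handlers.get(event_type)
        if handler is None:
            return {"status": "rejected",
                    "reason": f"unknown event type: {event_type!r}"}
        return {"status": "accepted", "result": handler(event)}


def submit_spark_job(event: Dict[str, Any]) -> Dict[str, Any]:
    # Stub (assumption): a real handler would submit a Spark job here.
    return {"job": "spark", "input": event["path"]}


dispatcher = Dispatcher()
dispatcher.register("file_landed", submit_spark_job)

# Example event, as NiFi might report after moving a file into place.
outcome = dispatcher.dispatch({"type": "file_landed", "path": "/data/in/a.csv"})
```

In a setup like the one described, the `dispatch` call would presumably sit behind an HTTP endpoint in the Django app that NiFi calls when a file arrives (for instance from an InvokeHTTP processor), so NiFi only moves data and reports events while the Python side decides what to run next.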
Jerry

On Tue, Jan 22, 2019 at 11:05 AM Otto Fowler <ottobackwa...@gmail.com> wrote:

> How would nifi look or have to look to support batch cases I wonder
>
> On January 22, 2019 at 10:24:10, Boris Tyukin (bo...@boristyukin.com)
> wrote:
>
> We've looked at both...Airflow might be a way better tool for
> coordination/scheduling. Why do not you take one of your pipelines and try
> to implement it in both tools?
>
> We really liked Airflow but unfortunately, Airflow was not a good fit for
> real-time processes - that's why we decided to go with NiFi. But if you use
> it strictly for job coordination and typical ETL-like dependencies, you
> will have hard time. Things, which are easy and obvious with Airflow or ETL
> tools like Informatica or SSIS, are quite difficult with NiFi. Just check
> some examples on Wait/Notify or merge patterns and you will see why.
>
> IMHO since NiFi was designed from the ground up to support real-time use
> cases not batch cases, the design and approach are quite different from
> batch oriented tools like Airflow.
>
> Boris
>
> On Fri, Jan 11, 2019 at 12:02 PM Jonathan Meran <jonathan.me...@sonos.com>
> wrote:
>
>> Hello,
>>
>> I am looking into the possibility of using NiFi as a Data Pipeline
>> Orchestration Tool. I’m evaluating NiFi along with some other tools such as
>> Airflow and AWS Step Functions/Lambdas.
>>
>> Has anyone used NiFi as an orchestration/scheduling tool for tasks such
>> as submitting spark jobs to an EMR cluster? These are some of the
>> requirements we are considering while evaluating such a tool:
>>
>> 1. SSH capabilities to execute remote commands
>> 2. Rich scheduling (CRON)
>> 3. Ability to write custom routines and import custom libraries
>> 4. Event-based triggering of a pipeline
>>
>> Any insight would be helpful. We have used NiFi for about a year now for
>> data movement and are familiar with its capabilities. My biggest worry is
>> the ability to coordinate with other machines using SSH.
>>
>> Thanks,
>>
>> Jon
>>
>

--
http://www.google.com/profiles/grapesmoker