I am looking into the possibility of using NiFi as a data pipeline 
orchestration tool, and I’m evaluating it alongside other tools such as 
Airflow and AWS Step Functions/Lambdas.

Has anyone used NiFi as an orchestration/scheduling tool for tasks such as 
submitting Spark jobs to an EMR cluster? These are some of the requirements we 
are considering in evaluating such a tool:

  1.  SSH capabilities to execute remote commands
  2.  Rich scheduling (e.g., cron expressions)
  3.  Ability to write custom routines and import custom libraries
  4.  Event-based triggering of a pipeline

Any insight would be helpful. We have used NiFi for data movement for about a 
year now and are familiar with its capabilities. My biggest concern is whether 
NiFi can coordinate with other machines over SSH.
