We use Luigi for this purpose.  (Our pipelines typically run on AWS (no
EMR), are backed by S3, and use combinations of Python jobs, non-Spark
Java/Scala, and Spark.  We run Spark jobs by connecting drivers/clients to
the master, and those drivers are what Luigi invokes.)
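
For a concrete feel, here is a minimal sketch of that wiring.  It is a
sketch only: the master URL, the job_a.py/job_b.py scripts, and the
failure hook are placeholders standing in for our actual setup.

    # Minimal Luigi sketch: two Spark jobs run in order via spark-submit,
    # with a failure hook.  Master URL and script names are hypothetical.
    import subprocess

    import luigi

    SPARK_MASTER = "spark://master-host:7077"  # placeholder master URL


    def submit_spark_job(script):
        # spark-submit connects the driver/client to the master; this
        # subprocess call is what Luigi actually invokes.
        subprocess.check_call(
            ["spark-submit", "--master", SPARK_MASTER, script])


    class SparkJobA(luigi.Task):
        def output(self):
            # Marker recording completion; for S3-backed pipelines an S3
            # target would be more typical than a local file.
            return luigi.LocalTarget("job_a.done")

        def run(self):
            submit_spark_job("job_a.py")  # hypothetical PySpark app
            with self.output().open("w") as f:
                f.write("ok")


    class SparkJobB(luigi.Task):
        def requires(self):
            # B runs only after A has produced its output.
            return SparkJobA()

        def output(self):
            return luigi.LocalTarget("job_b.done")

        def run(self):
            submit_spark_job("job_b.py")
            with self.output().open("w") as f:
                f.write("ok")


    @SparkJobA.event_handler(luigi.Event.FAILURE)
    def notify_on_failure(task, exception):
        # Hook for the "if Job A fails, send email F" case; Luigi can
        # also send failure mail itself via its [email] config section.
        print("Job A failed: %s" % exception)  # swap in a real notifier


    if __name__ == "__main__":
        # Scheduling B pulls in A first via requires().
        luigi.build([SparkJobB()], local_scheduler=True)

One nice property: because a task counts as complete once its output()
target exists, re-running the pipeline resumes from the last successful
step rather than starting over.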

--
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/


On Thu, Jul 10, 2014 at 10:20 AM, k.tham <kevins...@gmail.com> wrote:

> I'm just wondering what's the general recommendation for data pipeline
> automation.
>
> Say, I want to run Spark Job A, then B, then invoke script C, then do D,
> and if D fails, do E, and if Job A fails, send email F, etc.
>
> It looks like Oozie might be the best choice. But I'd like some
> advice/suggestions.
>
> Thanks!
>
