We use Luigi for this purpose. (Our pipelines typically run on AWS (no EMR), are backed by S3, and use a mix of Python jobs, non-Spark Java/Scala, and Spark. We run Spark jobs by connecting drivers/clients to the master, and those are what Luigi invokes.)
— p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Thu, Jul 10, 2014 at 10:20 AM, k.tham <kevins...@gmail.com> wrote:
> I'm just wondering what's the general recommendation for data pipeline
> automation.
>
> Say, I want to run Spark Job A, then B, then invoke script C, then do D,
> and if D fails, do E, and if Job A fails, send email F, etc...
>
> It looks like Oozie might be the best choice. But I'd like some
> advice/suggestions.
>
> Thanks!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Recommended-pipeline-automation-tool-Oozie-tp9319.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
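For the branching the question describes (A then B then C then D; if D fails, do E; if A fails, send email F), the control flow can be sketched independently of any particular scheduler. The job names and the `run_pipeline` helper below are hypothetical; in Luigi you would express the same thing with `requires()` dependencies plus an `on_failure` hook or event handler.

```python
# Hypothetical sketch of the asker's branching pipeline, with plain
# callables standing in for jobs.  Returns a log of the steps taken.

def run_pipeline(jobs):
    """jobs maps step names ("A", "B", ..., "email_F") to zero-arg callables."""
    log = []

    def attempt(name):
        try:
            jobs[name]()
            log.append(name)
            return True
        except Exception:
            log.append(name + ":failed")
            return False

    if not attempt("A"):
        attempt("email_F")   # if Job A fails, send email F and stop
        return log
    attempt("B")
    attempt("C")
    if not attempt("D"):
        attempt("E")         # if D fails, do E instead
    return log
```

A scheduler like Luigi or Oozie adds the parts this sketch omits: persistence of completed steps, retries, and scheduling, but the dependency/fallback logic is the same.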