We are running our Spark jobs on Amazon AWS and use AWS Data Pipeline to
orchestrate the different Spark jobs. AWS Data Pipeline provides automatic
EMR cluster provisioning, retry on failure, SNS notifications, etc. out of
the box, and it works well for us.
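
For reference, here is a trimmed sketch of what such a pipeline definition
looks like (the IDs, ARN, bucket, and instance types below are
placeholders, not our actual setup):

{
  "objects": [
    {
      "id": "EmrClusterForSpark",
      "type": "EmrCluster",
      "amiVersion": "3.9.0",
      "masterInstanceType": "m3.xlarge",
      "coreInstanceType": "m3.xlarge",
      "coreInstanceCount": "2",
      "terminateAfter": "4 Hours"
    },
    {
      "id": "SparkStep",
      "type": "EmrActivity",
      "runsOn": { "ref": "EmrClusterForSpark" },
      "step": "s3://elasticmapreduce/libs/script-runner/script-runner.jar,s3://mybucket/scripts/run-spark-job.sh",
      "maximumRetries": "2",
      "onFail": { "ref": "FailureAlert" }
    },
    {
      "id": "FailureAlert",
      "type": "SnsAlarm",
      "topicArn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
      "subject": "Spark step failed",
      "message": "Activity #{node.name} failed."
    }
  ]
}

The cluster object gives you the provisioning, maximumRetries gives you the
retry-on-failure, and the SnsAlarm gives you the notification.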

On Sun, Mar 1, 2015 at 7:02 PM, Felix C <felixcheun...@hotmail.com> wrote:

>  We use Oozie as well, and it has worked well.
> The catch is that each Oozie action is separate: you cannot retain a
> SparkContext or RDD, or leverage caching or a temp table, going into
> another Oozie action. You can either save output to a file or put all
> Spark processing into one Oozie action.
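>
> To illustrate the save-to-file hand-off, a minimal Scala sketch (the
> paths, object names, and app names are made up for illustration):
>
> import org.apache.spark.{SparkConf, SparkContext}
>
> // Oozie action 1: a self-contained application that persists its
> // result to HDFS before exiting, since nothing in-memory survives.
> object StepOne {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf().setAppName("step-one"))
>     val cleaned = sc.textFile("hdfs:///data/raw").filter(_.nonEmpty)
>     cleaned.saveAsTextFile("hdfs:///data/intermediate") // hand-off point
>     sc.stop()
>   }
> }
>
> // Oozie action 2: a separate application (new JVM, new SparkContext)
> // that picks up where action 1 left off by re-reading HDFS.
> object StepTwo {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext(new SparkConf().setAppName("step-two"))
>     val counts = sc.textFile("hdfs:///data/intermediate")
>       .map(line => (line.split(",")(0), 1L))
>       .reduceByKey(_ + _)
>     counts.saveAsTextFile("hdfs:///data/final")
>     sc.stop()
>   }
> }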
>
> --- Original Message ---
>
> From: "Mayur Rustagi" <mayur.rust...@gmail.com>
> Sent: February 28, 2015 7:07 PM
> To: "Qiang Cao" <caoqiang...@gmail.com>
> Cc: "Ted Yu" <yuzhih...@gmail.com>, "Ashish Nigam" <ashnigamt...@gmail.com>,
> "user" <user@spark.apache.org>
> Subject: Re: Tools to manage workflows on Spark
>
>  Sorry, not really. Spork is a way to migrate your existing Pig scripts
> to Spark, or to write new Pig jobs that can then execute on Spark.
> For orchestration you are better off using Oozie, especially if you are
> using other execution engines/systems besides Spark.
>
>
>     Regards,
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
> @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>
> On Sat, Feb 28, 2015 at 6:59 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>
> Thanks Mayur! I'm looking for something that would allow me to easily
> describe and manage a workflow on Spark. A workflow in my context is a
> composition of Spark applications that may depend on one another based on
> HDFS inputs/outputs. Is Spork a good fit? The orchestration I want is at
> the app level.
>
>
>
> On Sat, Feb 28, 2015 at 9:38 PM, Mayur Rustagi <mayur.rust...@gmail.com>
> wrote:
>
> We do maintain it, but in the Apache repo itself. However, Pig cannot do
> orchestration for you. I am not sure what you are looking for from Pig in
> this context.
>
>     Regards,
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>  @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>
> On Sat, Feb 28, 2015 at 6:36 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> Here is the latest modification in the Spork repo:
> Mon Dec 1 10:08:19 2014
>
>  Not sure if it is being actively maintained.
>
> On Sat, Feb 28, 2015 at 6:26 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>
> Thanks for the pointer, Ashish! I was also looking at Spork (
> https://github.com/sigmoidanalytics/spork, Pig-on-Spark), but wasn't sure
> if that's the right direction.
>
> On Sat, Feb 28, 2015 at 6:36 PM, Ashish Nigam <ashnigamt...@gmail.com>
> wrote:
>
> You have to call spark-submit from Oozie.
> I used this link to get the idea for my implementation:
> http://mail-archives.apache.org/mod_mbox/oozie-user/201404.mbox/%3CCAHCsPn-0Grq1rSXrAZu35yy_i4T=fvovdox2ugpcuhkwmjp...@mail.gmail.com%3E
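>
> In case it helps, a rough sketch of the shell-action approach (the
> workflow name, script name, and paths below are made up):
>
> <workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.4">
>   <start to="run-spark"/>
>   <action name="run-spark">
>     <shell xmlns="uri:oozie:shell-action:0.2">
>       <job-tracker>${jobTracker}</job-tracker>
>       <name-node>${nameNode}</name-node>
>       <exec>submit.sh</exec>
>       <file>scripts/submit.sh#submit.sh</file>
>     </shell>
>     <ok to="end"/>
>     <error to="fail"/>
>   </action>
>   <kill name="fail"><message>Spark job failed</message></kill>
>   <end name="end"/>
> </workflow-app>
>
> where submit.sh wraps the usual spark-submit invocation (--class,
> --master yarn-cluster, the application jar, and its arguments).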
>
>
>
>  On Feb 28, 2015, at 3:25 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>
>  Thanks, Ashish! Is Oozie integrated with Spark? I know it can
> accommodate some Hadoop jobs.
>
>
> On Sat, Feb 28, 2015 at 6:07 PM, Ashish Nigam <ashnigamt...@gmail.com>
> wrote:
>
> Qiang,
> Did you look at Oozie?
> We use Oozie to run Spark jobs in production.
>
>
>  On Feb 28, 2015, at 2:45 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>
>  Hi Everyone,
>
>  We need to deal with workflows on Spark. In our scenario, each workflow
> consists of multiple processing steps, and there can be dependencies
> among the steps. I'm wondering if there are tools available that can
> help us schedule and manage workflows on Spark. I'm looking for something
> like Pig on Hadoop, but one that fully functions on Spark.
>
>  Any suggestion?
>
>  Thanks in advance!
>
>  Qiang
>


-- 
Thanks & Regards
Himanish
