Hi Nick,

Quick question about running spark-submit from Azkaban with the command job type. When I press kill in the Azkaban portal on a spark-submit job, it doesn't actually kill the application on the Spark master; the application keeps running even though Azkaban thinks it has been killed. How do you get around this? Is there a way to kill spark-submit jobs from the Azkaban portal?
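For reference, here is one workaround that might apply, as an untested sketch rather than a confirmed fix. It assumes that Azkaban's kill delivers SIGTERM to the command it launched (not confirmed anywhere in this thread) and that spark-submit runs in client deploy mode, so the driver lives inside the spark-submit process tree. In cluster deploy mode the driver runs on the cluster and this would not reach it. The names `kill_wrapper.py` and `run_in_own_group` are made up for illustration. The wrapper starts the command in its own process group and forwards the kill signal to the whole group, so child processes spawned by a shell script are signalled too.

```python
# kill_wrapper.py (hypothetical name): run a command in its own process
# group and forward termination signals to every process in that group.
# POSIX only. Assumes the scheduler signals this wrapper with SIGTERM.
import os
import signal
import subprocess
import sys


def run_in_own_group(cmd):
    """Run cmd in a new session/process group; return its exit code."""
    # start_new_session=True makes the child a group leader, so its PID
    # is also the process group ID of everything it spawns.
    child = subprocess.Popen(cmd, start_new_session=True)

    def forward(signum, _frame):
        # Signal the whole group, not just the direct child.
        try:
            os.killpg(child.pid, signum)
        except ProcessLookupError:
            pass  # the group has already exited

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)
    # wait() resumes after the handler runs and returns once the child exits.
    return child.wait()


if __name__ == "__main__":
    sys.exit(run_in_own_group(sys.argv[1:]))
```

In a command job this would be used by prefixing the existing command, e.g. `command=python kill_wrapper.py spark-submit ...` with the usual spark-submit arguments in place of the dots.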
On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Hi Vikram,
>
> We use Azkaban (2.5.0) in our production workflow scheduling. We just use
> local mode deployment and it is fairly easy to set up. It is pretty easy to
> use and has a nice scheduling and logging interface, as well as SLAs (like
> kill job and notify if it doesn't complete in 3 hours or whatever).
>
> However, Spark support is not present directly - we run everything with
> shell scripts and spark-submit. There is a plugin interface where one could
> create a Spark plugin, but I found it very cumbersome when I did
> investigate and didn't have the time to work through it to develop that.
>
> It has some quirks, and while there is actually a REST API for adding jobs
> and dynamically scheduling jobs, it is not documented anywhere so you kinda
> have to figure it out for yourself. But in terms of ease of use I found it
> way better than Oozie. I haven't tried Chronos, and it seemed quite
> involved to set up. Haven't tried Luigi either.
>
> Spark job server is good but as you say lacks some stuff like scheduling
> and DAG type workflows (independent of Spark-defined job flows).
>
>
> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
>> Check also Falcon in combination with Oozie
>>
>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid>
>> wrote:
>>
>>> Looks like Oozie can satisfy most of your requirements.
>>>
>>>
>>>
>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I'm looking for open source workflow tools/engines that allow us to
>>>> schedule Spark jobs on a DataStax Cassandra cluster. Since there are tonnes
>>>> of alternatives out there like Oozie, Azkaban, Luigi, Chronos etc., I
>>>> wanted to check with people here to see what they are using today.
>>>>
>>>> Some of the requirements of the workflow engine that I'm looking for are
>>>>
>>>> 1. First class support for submitting Spark jobs on Cassandra. Not some
>>>> wrapper Java code to submit tasks.
>>>> 2. Active open source community support and well tested at production
>>>> scale.
>>>> 3. Should be dead easy to write job dependencies using XML or web
>>>> interface. E.g. job A depends on job B and job C, so run job A after B and
>>>> C are finished. Don't need to write full blown Java applications to specify
>>>> job parameters and dependencies. Should be very simple to use.
>>>> 4. Time based recurrent scheduling. Run the Spark jobs at a given time
>>>> every hour or day or week or month.
>>>> 5. Job monitoring, alerting on failures and email notifications on
>>>> daily basis.
>>>>
>>>> I have looked at Ooyala's spark job server, which seems to be geared
>>>> towards making Spark jobs run faster by sharing contexts between the jobs
>>>> but isn't a full blown workflow engine per se. A combination of spark job
>>>> server and workflow engine would be ideal.
>>>>
>>>> Thanks for the inputs
>>>>
>>>
>>>
>