The Spark job type was added recently; see this pull request: https://github.com/azkaban/azkaban-plugins/pull/195. You can leverage the SLA feature to kill a job if it runs longer than expected.
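(As an aside, the kill-after-a-time-limit behavior that Azkaban's SLA feature provides can also be approximated at the shell level with coreutils `timeout`, e.g. inside the wrapper script a job runs. A minimal sketch; the 3-second `sleep` stands in for a real spark-submit invocation and the 1-second limit stands in for the SLA:)

```shell
#!/usr/bin/env bash
# Approximate an SLA-style kill outside Azkaban: run the command under
# coreutils `timeout`, which kills it and exits with status 124 if the
# time limit is exceeded.
timeout 1 sleep 3
status=$?
if [ "$status" -eq 124 ]; then
    echo "job exceeded its time limit and was killed"
fi
```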
BTW, we just solved the scalability issue by supporting multiple executors. Within a week or two, the code for that should be merged into the main trunk.

Hien

On Tue, Oct 6, 2015 at 9:40 PM, Vikram Kone <vikramk...@gmail.com> wrote:

> Does Azkaban support scheduling long-running jobs like Spark streaming
> jobs? Will Azkaban kill a job if it's been running for a long time?
>
> On Friday, August 7, 2015, Vikram Kone <vikramk...@gmail.com> wrote:
>
>> Hien,
>> Is Azkaban being phased out at LinkedIn, as rumored? If so, what's
>> LinkedIn going to use for workflow scheduling? Is there something else
>> that's going to replace Azkaban?
>>
>> On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> In my opinion, choosing some particular project among its peers should
>>> leave enough room for future growth (which may come faster than you
>>> initially think).
>>>
>>> Cheers
>>>
>>> On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu <h...@linkedin.com> wrote:
>>>
>>>> Scalability is a known issue due to the current architecture. However,
>>>> this only becomes applicable if you run more than 20K jobs per day.
>>>>
>>>> On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> From what I heard (from an ex-coworker who is an Oozie committer),
>>>>> Azkaban is being phased out at LinkedIn because of scalability issues
>>>>> (though UI-wise, Azkaban seems better).
>>>>>
>>>>> Vikram:
>>>>> I suggest you do more research in related projects (maybe using their
>>>>> mailing lists).
>>>>>
>>>>> Disclaimer: I don't work for LinkedIn.
>>>>>
>>>>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath
>>>>> <nick.pentre...@gmail.com> wrote:
>>>>>
>>>>>> Hi Vikram,
>>>>>>
>>>>>> We use Azkaban (2.5.0) in our production workflow scheduling. We just
>>>>>> use local mode deployment and it is fairly easy to set up. It is
>>>>>> pretty easy to use and has a nice scheduling and logging interface,
>>>>>> as well as SLAs (like kill job and notify if it doesn't complete in
>>>>>> 3 hours or whatever).
>>>>>>
>>>>>> However, Spark support is not present directly - we run everything
>>>>>> with shell scripts and spark-submit. There is a plugin interface
>>>>>> where one could create a Spark plugin, but I found it very cumbersome
>>>>>> when I did investigate and didn't have the time to work through it to
>>>>>> develop that.
>>>>>>
>>>>>> It has some quirks, and while there is actually a REST API for adding
>>>>>> jobs and dynamically scheduling jobs, it is not documented anywhere,
>>>>>> so you kinda have to figure it out for yourself. But in terms of ease
>>>>>> of use I found it way better than Oozie. I haven't tried Chronos, and
>>>>>> it seemed quite involved to set up. Haven't tried Luigi either.
>>>>>>
>>>>>> Spark job server is good but, as you say, lacks some stuff like
>>>>>> scheduling and DAG-type workflows (independent of Spark-defined job
>>>>>> flows).
>>>>>>
>>>>>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Check also Falcon in combination with Oozie.
>>>>>>>
>>>>>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Looks like Oozie can satisfy most of your requirements.
>>>>>>>>
>>>>>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I'm looking for open source workflow tools/engines that allow us
>>>>>>>>> to schedule Spark jobs on a DataStax Cassandra cluster. Since
>>>>>>>>> there are tons of alternatives out there like Oozie, Azkaban,
>>>>>>>>> Luigi, Chronos, etc., I wanted to check with people here to see
>>>>>>>>> what they are using today.
>>>>>>>>>
>>>>>>>>> Some of the requirements of the workflow engine that I'm looking
>>>>>>>>> for are:
>>>>>>>>>
>>>>>>>>> 1. First-class support for submitting Spark jobs on Cassandra, not
>>>>>>>>> some wrapper Java code to submit tasks.
>>>>>>>>> 2. Active open source community support, and well tested at
>>>>>>>>> production scale.
>>>>>>>>> 3. Should be dead easy to write job dependencies using XML or a
>>>>>>>>> web interface. Ex: job A depends on job B and job C, so run job A
>>>>>>>>> after B and C are finished. Shouldn't need to write full-blown
>>>>>>>>> Java applications to specify job parameters and dependencies.
>>>>>>>>> Should be very simple to use.
>>>>>>>>> 4. Time-based recurrent scheduling. Run the Spark jobs at a given
>>>>>>>>> time every hour, day, week, or month.
>>>>>>>>> 5. Job monitoring, alerting on failures, and email notifications
>>>>>>>>> on a daily basis.
>>>>>>>>>
>>>>>>>>> I have looked at Ooyala's Spark job server, which seems to be
>>>>>>>>> geared towards making Spark jobs run faster by sharing contexts
>>>>>>>>> between the jobs, but isn't a full-blown workflow engine per se. A
>>>>>>>>> combination of Spark job server and a workflow engine would be
>>>>>>>>> ideal.
>>>>>>>>>
>>>>>>>>> Thanks for the inputs
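(To make the "job A depends on job B and job C" requirement from the thread concrete: in Azkaban 2.x this is expressed with plain `.job` property files, and, per Nick's note above, Spark work is usually wrapped in a command-type job that shells out to spark-submit. A minimal sketch; the job names, JAR path, and main classes are hypothetical:)

```properties
# jobB.job -- a command-type job that shells out to spark-submit
type=command
command=spark-submit --class com.example.JobB /path/to/app.jar

# jobC.job
type=command
command=spark-submit --class com.example.JobC /path/to/app.jar

# jobA.job -- runs only after jobB and jobC both succeed
type=command
dependencies=jobB,jobC
command=spark-submit --class com.example.JobA /path/to/app.jar
```

Zipping the three files and uploading the archive through the web UI creates a flow ending at jobA; the recurring schedule and any SLA rules are then attached to that flow in the UI.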