The spark job type was added recently - see this pull request
https://github.com/azkaban/azkaban-plugins/pull/195.  You can leverage the
SLA feature to kill a job if it runs longer than expected.
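
For reference, a job file using the new spark job type looks roughly like
the sketch below. Treat the property names as assumptions on my part -
double-check them against the pull request and the jobtype docs for your
plugin version; the jar and class names are just placeholders:

  # myjob.job - sketch only; verify property names for your plugin version
  type=spark
  master=yarn-cluster
  execution-jar=lib/my-spark-app.jar
  class=com.example.MySparkJob
  params=arg1 arg2

The SLA rule itself (kill and/or email after N minutes) is attached when
you schedule the flow through the web UI.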

BTW, we just solved the scalability issue by supporting multiple
executors.  Within a week or two, the code for that should be merged into
the main trunk.

Hien

On Tue, Oct 6, 2015 at 9:40 PM, Vikram Kone <vikramk...@gmail.com> wrote:

> Does Azkaban support scheduling long-running jobs like Spark Streaming
> jobs? Will Azkaban kill a job if it runs for a long time?
>
>
> On Friday, August 7, 2015, Vikram Kone <vikramk...@gmail.com> wrote:
>
>> Hien,
>> Is Azkaban being phased out at LinkedIn as rumored? If so, what's
>> LinkedIn going to use for workflow scheduling? Is there something else
>> that's going to replace Azkaban?
>>
>> On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> In my opinion, choosing a particular project among its peers should
>>> leave enough room for future growth (which may come faster than you
>>> initially think).
>>>
>>> Cheers
>>>
>>> On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu <h...@linkedin.com> wrote:
>>>
>>>> Scalability is a known issue due to the current architecture.  However,
>>>> this only becomes relevant if you run more than 20K jobs per day.
>>>>
>>>> On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> From what I heard (from an ex-coworker who is an Oozie committer),
>>>>> Azkaban is being phased out at LinkedIn because of scalability issues
>>>>> (though UI-wise, Azkaban seems better).
>>>>>
>>>>> Vikram:
>>>>> I suggest you do more research on related projects (maybe via their
>>>>> mailing lists).
>>>>>
>>>>> Disclaimer: I don't work for LinkedIn.
>>>>>
>>>>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <nick.pentre...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Vikram,
>>>>>>
>>>>>> We use Azkaban (2.5.0) for our production workflow scheduling. We just
>>>>>> use the local mode deployment, and it is fairly easy to set up. It is
>>>>>> pretty easy to use and has a nice scheduling and logging interface, as
>>>>>> well as SLAs (like kill the job and notify if it doesn't complete within
>>>>>> 3 hours, or whatever).
>>>>>>
>>>>>> However, Spark support is not present directly - we run everything
>>>>>> with shell scripts and spark-submit. There is a plugin interface where
>>>>>> one could create a Spark plugin, but I found it quite cumbersome when I
>>>>>> investigated, and I didn't have the time to work through it and develop
>>>>>> one.
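>>>>>>
>>>>>> To give an idea, each of our jobs is just a command-type job file that
>>>>>> shells out to spark-submit, roughly like the sketch below (the class,
>>>>>> jar, and job names are made up):
>>>>>>
>>>>>>   # etl.job - the command type just runs a shell command
>>>>>>   type=command
>>>>>>   command=spark-submit --class com.example.EtlJob /opt/jobs/etl.jar
>>>>>>   # run only after the "ingest" job in the same flow succeeds
>>>>>>   dependencies=ingest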
>>>>>>
>>>>>> It has some quirks, and while there is actually a REST API for adding
>>>>>> jobs and scheduling them dynamically, it is not documented anywhere, so
>>>>>> you kinda have to figure it out for yourself. But in terms of ease of
>>>>>> use I found it way better than Oozie. I haven't tried Chronos, which
>>>>>> seemed quite involved to set up. Haven't tried Luigi either.
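>>>>>>
>>>>>> For what it's worth, what I pieced together looks roughly like this
>>>>>> (host, port, and credentials are placeholders, and the endpoint names
>>>>>> may vary between versions, so treat it as a sketch):
>>>>>>
>>>>>>   # log in and grab session.id from the JSON response
>>>>>>   curl -k -X POST https://localhost:8443 \
>>>>>>     --data "action=login&username=azkaban&password=azkaban"
>>>>>>
>>>>>>   # trigger a flow execution with that session id
>>>>>>   curl -k --get https://localhost:8443/executor \
>>>>>>     --data "session.id=<id-from-above>" --data "ajax=executeFlow" \
>>>>>>     --data "project=myproject" --data "flow=myflow"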
>>>>>>
>>>>>> Spark job server is good, but as you say it lacks some stuff like
>>>>>> scheduling and DAG-type workflows (independent of Spark-defined job
>>>>>> flows).
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Check also Falcon in combination with Oozie.
>>>>>>>
>>>>>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Looks like Oozie can satisfy most of your requirements.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> I'm looking for open source workflow tools/engines that allow us to
>>>>>>>>> schedule Spark jobs on a DataStax Cassandra cluster. Since there are
>>>>>>>>> tonnes of alternatives out there like Oozie, Azkaban, Luigi, Chronos,
>>>>>>>>> etc., I wanted to check with people here to see what they are using
>>>>>>>>> today.
>>>>>>>>>
>>>>>>>>> Some of the requirements for the workflow engine that I'm looking
>>>>>>>>> for are:
>>>>>>>>>
>>>>>>>>> 1. First-class support for submitting Spark jobs on Cassandra, not
>>>>>>>>> some wrapper Java code to submit tasks.
>>>>>>>>> 2. Active open source community support and well tested at
>>>>>>>>> production scale.
>>>>>>>>> 3. Should be dead easy to write job dependencies using XML or a web
>>>>>>>>> interface, e.g., Job A depends on Job B and Job C, so run Job A after
>>>>>>>>> B and C are finished (see the small sketch after this list). Shouldn't
>>>>>>>>> need to write full-blown Java applications to specify job parameters
>>>>>>>>> and dependencies. Should be very simple to use.
>>>>>>>>> 4. Time-based recurrent scheduling. Run the Spark jobs at a given
>>>>>>>>> time every hour, day, week, or month.
>>>>>>>>> 5. Job monitoring, alerting on failures, and email notifications on
>>>>>>>>> a daily basis.
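>>>>>>>>>
>>>>>>>>> To make requirement 3 concrete, something along these lines is what
>>>>>>>>> I have in mind (I believe Azkaban expresses this with a dependencies
>>>>>>>>> property in its job files; the job names here are made up):
>>>>>>>>>
>>>>>>>>>   # jobA.job - run Job A only after Job B and Job C finish
>>>>>>>>>   type=command
>>>>>>>>>   command=./run-job-a.sh
>>>>>>>>>   dependencies=jobB,jobC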
>>>>>>>>>
>>>>>>>>> I have looked at Ooyala's spark job server, which seems to be geared
>>>>>>>>> towards making Spark jobs run faster by sharing contexts between the
>>>>>>>>> jobs, but isn't a full-blown workflow engine per se. A combination of
>>>>>>>>> spark job server and a workflow engine would be ideal.
>>>>>>>>>
>>>>>>>>> Thanks for the inputs
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
