On Wed, Apr 12, 2017 at 4:11 PM, Sam Elamin <hussam.ela...@gmail.com> wrote:

>
> When it comes to scheduling Spark jobs, you can either submit to an
> already-running cluster using things like Oozie or bash scripts, or have a
> workflow manager like Airflow or Data Pipeline create new clusters for
> you. We went down the second route to stay with the whole immutable
> infrastructure / "treat your servers as cattle, not pets" approach.
>

A great overview. I just want to point out that Airflow can also submit jobs
to an existing cluster if you prefer a shared one (which may be ideal if you
have a bunch of smaller jobs to run). Do keep in mind that if you use the EMR
operator, which relies on the EMR add-step API, the steps will be submitted
to YARN one at a time.
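For anyone who wants to try the shared-cluster route, here is a rough sketch
of what that looks like with the EMR add-steps operator. The import paths,
cluster id, and S3 job location below are placeholders (the operators live
under airflow.contrib in older releases and in the Amazon provider package in
newer ones), so treat this as a starting point rather than a drop-in DAG:

from datetime import datetime

from airflow import DAG
# On older Airflow releases these imports were under airflow.contrib.operators
# and airflow.contrib.sensors instead of the Amazon provider package.
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

# One Spark step, run via EMR's command-runner; the S3 path is a placeholder.
SPARK_STEP = [
    {
        "Name": "example-spark-job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "s3://my-bucket/jobs/example_job.py",
            ],
        },
    }
]

EXISTING_CLUSTER_ID = "j-XXXXXXXXXXXXX"  # id of the already-running cluster

with DAG(
    dag_id="emr_add_step_example",
    start_date=datetime(2017, 4, 1),
    schedule=None,  # use schedule_interval=None on Airflow < 2.4
    catchup=False,
) as dag:
    # Adds the step to the existing cluster; steps added this way are queued
    # by EMR and handed to YARN one at a time.
    add_step = EmrAddStepsOperator(
        task_id="add_spark_step",
        job_flow_id=EXISTING_CLUSTER_ID,
        steps=SPARK_STEP,
    )

    # Waits for the submitted step to finish; the operator pushes the new
    # step ids to XCom, so we pull the first one back out here.
    watch_step = EmrStepSensor(
        task_id="watch_spark_step",
        job_flow_id=EXISTING_CLUSTER_ID,
        step_id="{{ task_instance.xcom_pull(task_ids='add_spark_step', key='return_value')[0] }}",
    )

    add_step >> watch_step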
