Thanks Liang, Vadim and everyone for your inputs!!

With this clarity, I've tried client mode for both the main and the sub
Spark jobs. Every main Spark job and its corresponding threaded Spark jobs
show up in the YARN applications list, and the jobs are getting executed
properly. I now need to test with cluster mode at both levels, and to set up
spark-submit and a few configurations properly on all data nodes in the
cluster. I will share updates as I execute and analyze further.
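
For reference, here is a minimal sketch of how each sub-job could be launched
from the main driver with SparkLauncher, setting the deploy mode explicitly.
The jar path, main class and queue name are placeholders, not our actual job:

import org.apache.spark.launcher.SparkLauncher

// Launch one sub-job from the main driver; deploy mode can be "client" or
// "cluster". The jar path, main class and queue below are placeholders.
val handle = new SparkLauncher()
  .setAppResource("/path/to/sub-job.jar")
  .setMainClass("com.example.SubJob")
  .setMaster("yarn")
  .setDeployMode("cluster")
  .setConf("spark.yarn.queue", "default")
  .startApplication()

// Block until YARN reports a final state for this sub-job.
while (!handle.getState.isFinal) Thread.sleep(5000)
println(s"Sub-job finished with state: ${handle.getState}")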

My concern now is how to throttle the launching of multiple jobs based on
the YARN cluster's availability. This exercise will be similar to performing
a break-point analysis of the cluster. The problem is that we will not know
the file sizes until we read them into memory, and since Spark's memory
mechanics are subtle and fragile, we need to be absolutely sure we avoid
OOM (out-of-memory) issues. I am not sure whether there is any existing
process that can poll the ResourceManager's information and tell whether
further jobs can be submitted to YARN.
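
As a rough sketch of the kind of polling I have in mind (assuming the
ResourceManager's REST endpoint /ws/v1/cluster/metrics is reachable; the
host, port and memory threshold below are placeholders), something like this
could gate each submission on the cluster's currently available memory:

import scala.io.Source

// Ask the YARN ResourceManager REST API whether the cluster currently has
// enough free memory to accept another job. The RM host/port and the
// threshold are placeholders.
def canSubmitMoreJobs(rmHostPort: String = "resourcemanager:8088",
                      minFreeMb: Long = 8192L): Boolean = {
  val src = Source.fromURL(s"http://$rmHostPort/ws/v1/cluster/metrics")
  val json = try src.mkString finally src.close()
  // Crude extraction of the "availableMB" field; a JSON library would be
  // more robust.
  val availableMb = "\"availableMB\"\\s*:\\s*(\\d+)".r
    .findFirstMatchIn(json)
    .map(_.group(1).toLong)
    .getOrElse(0L)
  availableMb >= minFreeMb
}

// Example: wait for head-room before launching the next sub-job.
while (!canSubmitMoreJobs()) Thread.sleep(30000)

This only looks at free memory at the moment of submission, so some safety
margin would still be needed for jobs that are ramping up; available vcores
could be checked from the same response in the same way.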


On Thu, Dec 22, 2016 at 7:26 AM, Liang-Chi Hsieh <vii...@gmail.com> wrote:

>
> If you run the main driver and other Spark jobs in client mode, you can
> make
> sure they (I meant all the drivers) are running at the same node. Of course
> all drivers now consume the resources at the same node.
>
> If you run the main driver in client mode, but run other Spark jobs in
> cluster mode, the drivers of those Spark jobs will be launched on other
> nodes in the cluster. It should work too. It is the same as running one
> Spark app in client mode and others in cluster mode.
>
> If you run your main driver in cluster mode, and run other Spark jobs in
> cluster mode too, you may need Spark properly installed on all nodes in the
> cluster, because those Spark jobs will be launched at the node which the
> main driver is running on.
>
>
>
>
>
> -----
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/
>
