Re: Parallel Execution of Spark Jobs

Jeff Zhang Tue, 24 Jul 2018 17:41:49 -0700

Regarding 1: ZEPPELIN-3563 should be helpful. See
https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently
for more details.
https://issues.apache.org/jira/browse/ZEPPELIN-3563


Regarding 2: If you use ParallelScheduler for SparkInterpreter, you may hit
weird issues when your paragraphs depend on each other, e.g. paragraph p2
uses variable v1, which is defined in paragraph p1. The order of paragraph
execution matters here, and ParallelScheduler cannot guarantee that order.
That's why we use FIFOScheduler for SparkInterpreter.
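To make the ordering problem concrete, here is a minimal plain-Java sketch. It is not Zeppelin code. It assumes a single-threaded executor as a stand-in for FIFOScheduler and a two-thread pool as a stand-in for ParallelScheduler. The map standing in for the interpreter's variable scope is hypothetical, as are the class and method names. Paragraph p1 defines v1 and p2 reads it:

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SchedulerOrderDemo {

    // Submits p1 (defines v1) and then p2 (reads v1) to the executor and
    // returns the value p2 observed. The map is a hypothetical stand-in for
    // the interpreter's shared variable scope, not Zeppelin's API.
    static String runParagraphs(ExecutorService executor) {
        Map<String, String> scope = new ConcurrentHashMap<>();
        Runnable p1 = () -> {
            sleepQuietly(50); // make p1 slow so any reordering becomes visible
            scope.put("v1", "hello");
        };
        Callable<String> p2 = () -> scope.getOrDefault("v1", "<undefined>");
        try {
            executor.submit(p1);
            Future<String> seen = executor.submit(p2);
            return seen.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown(); // already-submitted tasks still run to completion
        }
    }

    // FIFO stand-in: one worker thread runs tasks strictly in submission order.
    static String runFifo() {
        return runParagraphs(Executors.newSingleThreadExecutor());
    }

    // Parallel stand-in: two workers, so p2 may start before p1 has finished.
    static String runParallel() {
        return runParagraphs(Executors.newFixedThreadPool(2));
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("FIFO:     p2 saw " + runFifo());     // always "hello"
        System.out.println("Parallel: p2 saw " + runParallel()); // usually "<undefined>"
    }
}
```

The parallel result is timing-dependent. That is the point: nothing in the parallel case forces p1 to finish before p2 reads v1.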

In your scenario, where multiple users share the same SparkContext, I would
suggest using scoped per-user mode. Each user still shares the same
SparkContext, so you save resources, but each user gets their own
FIFOScheduler, isolated from the other users' schedulers.
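The scheduling behavior of scoped per-user mode can be modeled roughly like this. This is a plain-Java sketch under assumptions, not Zeppelin's implementation. One shared object (hypothetical `SharedContext`) stands in for the single SparkContext. Each user gets their own single-threaded queue, standing in for a per-user FIFOScheduler. All class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class PerUserScopedDemo {

    // Hypothetical stand-in for the single SparkContext all users share.
    static final class SharedContext {
        final List<String> log = Collections.synchronizedList(new ArrayList<>());

        void runJob(String user, String paragraph) {
            log.add(user + ":" + paragraph);
        }
    }

    private final SharedContext context = new SharedContext();
    // One single-threaded (FIFO) queue per user: order is preserved within a
    // user, while different users' queues run independently of each other.
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    void submit(String user, String paragraph) {
        queues.computeIfAbsent(user, u -> Executors.newSingleThreadExecutor())
              .submit(() -> context.runJob(user, paragraph));
    }

    // Stops all queues, waits for queued work, and returns a snapshot of the log.
    List<String> shutdownAndGetLog() {
        queues.values().forEach(ExecutorService::shutdown);
        for (ExecutorService queue : queues.values()) {
            try {
                queue.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        synchronized (context.log) {
            return new ArrayList<>(context.log);
        }
    }

    // The entries for one user, in the order they actually ran.
    static List<String> forUser(List<String> log, String user) {
        return log.stream()
                  .filter(entry -> entry.startsWith(user + ":"))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PerUserScopedDemo demo = new PerUserScopedDemo();
        for (int i = 1; i <= 3; i++) {
            demo.submit("alice", "p" + i);
            demo.submit("bob", "p" + i);
        }
        List<String> log = demo.shutdownAndGetLog();
        System.out.println(forUser(log, "alice")); // [alice:p1, alice:p2, alice:p3]
        System.out.println(forUser(log, "bob"));   // [bob:p1, bob:p2, bob:p3]
    }
}
```

How alice's and bob's entries interleave in the full log varies from run to run. Within each user, the order always matches submission order. This is the trade-off described above: one shared context, with ordering guarantees per user rather than globally.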

On Wed, Jul 25, 2018 at 8:14 AM, Ankit Jain <[email protected]> wrote:

> Forgot to mention: this is for shared scoped mode, so the same Spark
> application and context are used for all users on a single Zeppelin instance.
>
> Thanks
> Ankit
>
> On Jul 24, 2018, at 4:12 PM, Ankit Jain <[email protected]> wrote:
>
> Hi,
> I am playing around with the execution policy of Spark jobs (and of all
> Zeppelin paragraphs, actually).
>
> It looks like there are a couple of control points:
> 1) Spark scheduling: FIFO vs. FAIR, as documented in
> https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools
>
> Since we are still on version 0.7 and don't have
> https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcing
> sc.setLocalProperty("spark.scheduler.pool", "fair");
> in both SparkInterpreter.java and SparkSqlInterpreter.java.
>
> Also, because we are exposing Zeppelin to multiple users, we don't want
> any single user to hog the cluster, so we always use FAIR.
>
> This may complicate our merge to 0.8, though.
>
> 2) On top of Spark scheduling, each Zeppelin interpreter itself seems to
> have a scheduler queue. Each task is submitted to a FIFOScheduler, except in
> SparkSqlInterpreter, which creates a ParallelScheduler if the concurrentSQL
> flag is turned on.
>
> I changed SparkInterpreter.java to use ParallelScheduler too, and that
> seems to do the trick.
>
> Now multiple notebooks are able to run in parallel.
>
> My question is whether other people have tested SparkInterpreter with
> ParallelScheduler.
> Also, ideally this should be configurable: users should be able to specify
> fifo or parallel.
>
> Executing all paragraphs does add more complication, and maybe
> https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep the
> execution order sane.
>
>
> Thoughts?
>
> --
> Thanks & Regards,
> Ankit.
>
>
