Hi,
I am experimenting with the execution policy of Spark jobs (and of all
Zeppelin paragraphs, actually).

It looks like there are a couple of control points:
1) Spark scheduling - FIFO vs. FAIR, as documented at
https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools.

Since we are still on version 0.7 and don't have
https://issues.apache.org/jira/browse/ZEPPELIN-3563, I am forcing the pool by
calling
sc.setLocalProperty("spark.scheduler.pool", "fair");
in both SparkInterpreter.java and SparkSqlInterpreter.java.
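For reference, Spark reads pool definitions from an allocation file pointed to
by spark.scheduler.allocation.file (and spark.scheduler.mode must be FAIR).
A minimal sketch declaring the "fair" pool used above - the weight and
minShare values here are placeholders, not a recommendation:

```xml
<!-- fairscheduler.xml, referenced via spark.scheduler.allocation.file -->
<allocations>
  <pool name="fair">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```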

Also, because we expose Zeppelin to multiple users, we don't want any single
user to hog the cluster, so we always use FAIR.

This may complicate our upgrade to 0.8, though.

2) On top of Spark scheduling, each Zeppelin interpreter itself seems to
have a scheduler queue. Each task is submitted to a FIFOScheduler, except for
SparkSqlInterpreter, which creates a ParallelScheduler if the concurrentSQL
flag is turned on.

I am changing SparkInterpreter.java to use a ParallelScheduler too, and that
seems to do the trick.

Now multiple notebooks are able to run in parallel.
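To illustrate the difference between the two interpreter schedulers: a
FIFOScheduler behaves like a single-threaded queue (one paragraph at a time
per interpreter), while a ParallelScheduler behaves like a thread pool. A
standalone sketch of that difference using plain java.util.concurrent - this
is an analogy, not Zeppelin's actual Scheduler API:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SchedulerDemo {
    // Submit `jobs` tasks of `jobMillis` each and return total wall time (ms).
    // A single-threaded executor (FIFO-like) runs them back to back;
    // a fixed pool (parallel-like) overlaps them.
    static long runJobs(ExecutorService pool, int jobs, long jobMillis)
            throws InterruptedException {
        long start = System.nanoTime();
        CountDownLatch done = new CountDownLatch(jobs);
        for (int i = 0; i < jobs; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(jobMillis); // stand-in for a paragraph run
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long fifo = runJobs(Executors.newSingleThreadExecutor(), 4, 100);
        long parallel = runJobs(Executors.newFixedThreadPool(4), 4, 100);
        // fifo is roughly 4x parallel: 4 sequential sleeps vs 4 overlapped.
        System.out.println("FIFO-like: " + fifo + " ms, parallel-like: "
                + parallel + " ms");
    }
}
```

With a ParallelScheduler on SparkInterpreter, paragraphs from different
notebooks land in the pool concurrently instead of queuing behind each other.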

My question is whether other people have tested SparkInterpreter with a
ParallelScheduler.
Also, ideally this should be configurable: users should be able to specify
FIFO or parallel.
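One way to make it configurable could be a property that selects the queue
type. A hypothetical sketch - the property name
"zeppelin.interpreter.scheduler" is made up, and plain executors again stand
in for Zeppelin's schedulers:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SchedulerChoice {
    // Hypothetical key; Zeppelin 0.7 does not define this property.
    static final String KEY = "zeppelin.interpreter.scheduler";

    // "parallel" -> pool of workers; anything else -> FIFO-like single thread.
    static ExecutorService fromProperty(String value) {
        return "parallel".equalsIgnoreCase(value)
                ? Executors.newFixedThreadPool(10)
                : Executors.newSingleThreadExecutor();
    }

    public static void main(String[] args) {
        ExecutorService s = fromProperty(System.getProperty(KEY, "fifo"));
        System.out.println("scheduler = " + s);
        s.shutdown();
    }
}
```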

Executing all paragraphs does add more complications, and maybe
https://issues.apache.org/jira/browse/ZEPPELIN-2368 will help us keep the
execution order sane.


Thoughts?

-- 
Thanks & Regards,
Ankit.
