Re: Parallel Execution of Spark Jobs

2018-07-26 Thread Ankit Jain
Thanks for the further clarification Jeff. > On Jul 26, 2018, at 8:11 PM, Jeff Zhang wrote: > Let me rephrase it. In scoped mode, there are multiple Interpreter Groups (personally I prefer to call them multiple sessions) in one JVM (for the Spark interpreter, there are multiple SparkInterpreter

Re: Parallel Execution of Spark Jobs

2018-07-26 Thread Jeff Zhang
Let me rephrase it. In scoped mode, there are multiple Interpreter Groups (personally I prefer to call them multiple sessions) in one JVM (for the Spark interpreter, there are multiple SparkInterpreter instances). And there is one SparkContext in this JVM which is shared by all the SparkInterpreter
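
A minimal sketch (not from this thread) of what sharing one SparkContext implies: with spark.scheduler.mode set to FAIR, each session can tag its jobs with a scheduler pool via sc.setLocalProperty, so work from different sessions shares the cluster instead of queueing behind one another. The pool names "noteA"/"noteB" below are illustrative assumptions, not Zeppelin internals.

import org.apache.spark.sql.SparkSession

object SharedContextPools {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shared-context-demo")
      .master("local[*]")
      .config("spark.scheduler.mode", "FAIR") // FAIR instead of the default FIFO
      .getOrCreate()
    val sc = spark.sparkContext

    // Each session/notebook tags its jobs with a pool; jobs in different pools
    // share executors fairly instead of queueing behind each other.
    sc.setLocalProperty("spark.scheduler.pool", "noteA")
    val a = sc.parallelize(1 to 1000000).count()

    sc.setLocalProperty("spark.scheduler.pool", "noteB")
    val b = sc.parallelize(1 to 1000000).count()

    println(s"noteA count=$a, noteB count=$b")
    spark.stop()
  }
}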

Re: Parallel Execution of Spark Jobs

2018-07-25 Thread Ankit Jain
Jeff, what you said seems to be in conflict with what is detailed here - https://medium.com/@leemoonsoo/apache-zeppelin-interpreter-mode-explained-bae0525d0555 "In *Scoped* mode, Zeppelin still runs single interpreter JVM process but multiple *Interpreter Group* serve each Note." In practice as

Re: Parallel Execution of Spark Jobs

2018-07-25 Thread Ankit Jain
Aah, that makes sense - so only jobs from the same user will block on each other in the FIFOScheduler. By moving to the ParallelScheduler, the only gain is that jobs from the same user can also run in parallel, but they may have dependency resolution issues. Just to confirm I have it right - if "Run all" notebook is not a

Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Jeff Zhang
1. ZEPPELIN-3563 forces FAIR scheduling and just allows you to specify the pool. 2. The scheduler cannot figure out the dependencies between paragraphs. That's why SparkInterpreter uses a FIFOScheduler. If you use per-user scoped mode, the SparkContext is shared between users but the SparkInterpreter is not shared.
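
A small sketch of the dependency problem described above (my own illustration, not Zeppelin's scheduler code): two "paragraphs" are submitted to a parallel executor, and the second reads a value the first defines. FIFO ordering makes this safe; parallel execution does not.

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParagraphOrdering {
  @volatile var shared: Option[Seq[Int]] = None // stands in for a val defined in paragraph 1

  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(2) // plays the role of a ParallelScheduler
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    // "Paragraph 1" produces a value, "paragraph 2" consumes it.
    val paragraph1 = Future { Thread.sleep(100); shared = Some(1 to 10) }
    val paragraph2 = Future { shared.map(_.sum).getOrElse(sys.error("paragraph 1 has not run yet")) }

    // Nothing orders paragraph 2 after paragraph 1, so it usually fails here;
    // a FIFOScheduler would have run them strictly in order.
    println(Await.ready(paragraph2, 5.seconds).value)
    Await.ready(paragraph1, 5.seconds)
    pool.shutdown()
  }
}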

Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Thanks for the quick feedback Jeff. Re: 1 - I did see ZEPPELIN-3563 but we are not on 0.8 yet, and also we may want to force FAIR execution instead of letting the user control it. Re: 2 - Is there an architecture issue here, or do we just need better thread safety? Ideally the scheduler should be able to figure

Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Jeff Zhang
Regarding 1. ZEPPELIN-3563 should be helpful. See https://github.com/apache/zeppelin/blob/master/docs/interpreter/spark.md#running-spark-sql-concurrently for more details. https://issues.apache.org/jira/browse/ZEPPELIN-3563 Regarding 2. If you use ParallelScheduler for SparkInterpreter, you may
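
For anyone following the link: as I read that doc (Zeppelin 0.8+), concurrent SQL needs zeppelin.spark.concurrentSQL set to true plus spark.scheduler.mode=FAIR (and a fairscheduler.xml pointed to by spark.scheduler.allocation.file) in the Spark interpreter settings; each paragraph then picks a pool through a paragraph-local property, roughly like the sketch below. The pool and table names are illustrative.

%spark.sql(pool=pool1)
select count(*) from table_a

%spark.sql(pool=pool2)
select count(*) from table_b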

Re: Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Forgot to mention this is for shared scoped mode, so the same Spark application and context is used for all users on a single Zeppelin instance. Thanks Ankit > On Jul 24, 2018, at 4:12 PM, Ankit Jain wrote: > Hi, I am playing around with the execution policy of Spark jobs (and all Zeppelin paragraphs, actually). Looks like there are a couple of control points: 1) Spark scheduling - FIFO vs FAIR, as documented in

Parallel Execution of Spark Jobs

2018-07-24 Thread Ankit Jain
Hi, I am playing around with the execution policy of Spark jobs (and all Zeppelin paragraphs, actually). Looks like there are a couple of control points: 1) Spark scheduling - FIFO vs FAIR, as documented in https://spark.apache.org/docs/2.1.1/job-scheduling.html#fair-scheduler-pools . Since we are still
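
For reference, the fair-scheduler-pools page linked above defines pools in a fairscheduler.xml roughly like the sketch below (pool names, weights, and shares here are illustrative). Spark picks the file up when spark.scheduler.mode=FAIR and spark.scheduler.allocation.file point at it, and a job chooses a pool with sc.setLocalProperty("spark.scheduler.pool", "notebookPool").

<?xml version="1.0"?>
<!-- fairscheduler.xml sketch; pool names, weights, and minShares are illustrative -->
<allocations>
  <pool name="notebookPool">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>1</minShare>
  </pool>
</allocations>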