Looked at the code. The only place Zeppelin handles spark.scheduler.pool is here:
https://github.com/apache/zeppelin/blob/d762b5288536201d8a2964891c556efaa1bae867/spark/interpreter/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L103

I don't think it matches the Spark documentation's description, which would
allow multiple concurrent users to submit jobs independently (each user's
*thread* has to have a different value for *spark.scheduler.pool*).

Filed https://issues.apache.org/jira/browse/ZEPPELIN-3334 to set
*spark.scheduler.pool* to the authenticated user name.

Other ideas?

--
Ruslan Dautkhanov

On Wed, Mar 14, 2018 at 4:57 PM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:

> Let's say we have a Spark interpreter set up as
> "The interpreter will be instantiated *Globally* in *shared* process".
>
> When one user is using the Spark interpreter, other users trying to use
> the same interpreter get PENDING until the first user's code completes.
>
> Per the Spark documentation, https://spark.apache.org/docs/latest/job-scheduling.html:
>
>> "*within* each Spark application, multiple “jobs” (Spark actions) may be
>> running concurrently if they were submitted by different threads
>> ... /skip/
>> By “job”, in this section, we mean a Spark action (e.g. save,
>> collect) and any tasks that need to run to evaluate that action. Spark’s
>> scheduler is fully thread-safe and supports this use case to enable
>> applications that serve multiple requests (e.g. queries for multiple users).
>> ... /skip/
>> Without any intervention, newly submitted jobs go into a *default pool*,
>> but jobs’ pools can be set by adding the *spark.scheduler.pool* “local
>> property” to the SparkContext in the thread that’s submitting them."
>
> So Spark allows multiple users to use the same shared Spark context.
>
> Two quick questions:
> 1. Why are concurrent users getting PENDING in Zeppelin?
> 2. Does Zeppelin set *spark.scheduler.pool* per thread as described above?
>
> PS.
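For what it's worth, the per-thread "local property" behavior the Spark docs
describe can be sketched with nothing but Python's threading.local. This is an
analogy, not Spark code (in PySpark the actual call would be
sc.setLocalProperty("spark.scheduler.pool", name) from each submitting thread;
user names "alice"/"bob" are made up for illustration):

```python
import threading

# Thread-local storage: each thread sees only its own "local properties",
# analogous to how Spark keeps spark.scheduler.pool per submitting thread.
_props = threading.local()

def set_local_property(key, value):
    """Analogue of SparkContext.setLocalProperty for the current thread."""
    if not hasattr(_props, "data"):
        _props.data = {}
    _props.data[key] = value

def get_local_property(key):
    """Read back a property; only this thread's value is visible."""
    return getattr(_props, "data", {}).get(key)

results = {}

def submit_as_user(user):
    # Each user's thread sets its own pool before "submitting jobs".
    set_local_property("spark.scheduler.pool", user)
    results[user] = get_local_property("spark.scheduler.pool")

threads = [threading.Thread(target=submit_as_user, args=(u,))
           for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each thread saw only the value it set itself:
# results == {"alice": "alice", "bob": "bob"}
```

The point being: if all users' paragraphs are submitted from one thread (or
with one shared pool value), the FAIR scheduler has nothing to separate.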
> We have set the following Spark interpreter settings:
> - zeppelin.spark.concurrentSQL = true
> - spark.scheduler.mode = FAIR
>
> Thank you,
> Ruslan Dautkhanov
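For reference, with spark.scheduler.mode = FAIR, pools beyond the default one
are declared in conf/fairscheduler.xml. A minimal sketch based on the Spark
job-scheduling docs (the pool name and values here are illustrative only):

```xml
<!-- conf/fairscheduler.xml: jobs whose thread set
     spark.scheduler.pool=user_alice land in this pool -->
<allocations>
  <pool name="user_alice">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```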