Thanks Jeff. Yep, that was helpful.
Btw, the (i) icon has a broken link (see highlighted part below) - it leads to
https://zeppelin.apache.org/docs//usage/interpreter/interpreter_binding_mode.html

What do you think about https://issues.apache.org/jira/browse/ZEPPELIN-3334
"Set spark.scheduler.pool to authenticated user name"? I still think it makes
sense.

--
Ruslan Dautkhanov

On Wed, Mar 14, 2018 at 6:32 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> Globally shared mode means all the users share the SparkContext and also
> the same Spark interpreter. That's why in this mode code is executed
> sequentially; concurrency is not allowed here, as there may be
> dependencies between paragraphs, and concurrency cannot guarantee the
> execution order.
>
> For your scenario, I think you can use scoped per-user mode, where all
> the users share the same SparkContext but use different Spark
> interpreters.
>
> ankit jain <ankitjain....@gmail.com> wrote on Thu, Mar 15, 2018 at 7:25 AM:
>
>> We are seeing the same PENDING behavior despite running the Spark
>> interpreter in "Isolated per User" - we expected one SparkContext to be
>> created per user, and we did indeed see multiple SparkSubmit processes
>> spun up on the Zeppelin pod.
>>
>> But why go to PENDING if there are multiple contexts that can run in
>> parallel? Is the assumption of multiple SparkSubmit = multiple
>> SparkContext correct?
>>
>> Thanks
>> Ankit
>>
>> On Wed, Mar 14, 2018 at 4:12 PM, Ruslan Dautkhanov
>> <dautkha...@gmail.com> wrote:
>>
>>> Looked at the code... the only place Zeppelin handles
>>> spark.scheduler.pool is here:
>>>
>>> https://github.com/apache/zeppelin/blob/d762b5288536201d8a2964891c556efaa1bae867/spark/interpreter/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L103
>>>
>>> I don't think that matches the Spark documentation's description of how
>>> multiple concurrent users can submit jobs independently (each user's
>>> *thread* has to set a different value for *spark.scheduler.pool*).
>>>
>>> Filed https://issues.apache.org/jira/browse/ZEPPELIN-3334 to set
>>> *spark.scheduler.pool* to the authenticated user name.
>>>
>>> Other ideas?
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Wed, Mar 14, 2018 at 4:57 PM, Ruslan Dautkhanov
>>> <dautkha...@gmail.com> wrote:
>>>
>>>> Let's say we have a Spark interpreter set up as
>>>> "The interpreter will be instantiated *Globally* in *shared* process".
>>>>
>>>> When one user is using the Spark interpreter, other users that try to
>>>> use the same interpreter get PENDING until the first user's code
>>>> completes.
>>>>
>>>> Per the Spark documentation,
>>>> https://spark.apache.org/docs/latest/job-scheduling.html :
>>>>
>>>>> "*Within* each Spark application, multiple “jobs” (Spark actions) may
>>>>> be running concurrently if they were submitted by different threads.
>>>>> ... /skip/
>>>>> By “job”, in this section, we mean a Spark action (e.g. save,
>>>>> collect) and any tasks that need to run to evaluate that action.
>>>>> Spark’s scheduler is fully thread-safe and supports this use case to
>>>>> enable applications that serve multiple requests (e.g. queries for
>>>>> multiple users).
>>>>> ... /skip/
>>>>> Without any intervention, newly submitted jobs go into a *default
>>>>> pool*, but jobs’ pools can be set by adding the *spark.scheduler.pool*
>>>>> “local property” to the SparkContext in the thread that’s submitting
>>>>> them."
>>>>
>>>> So Spark allows multiple users to use the same shared SparkContext.
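>>>>
>>>> To make that mechanism concrete, here is a minimal standalone sketch
>>>> (the local master URL, the user names and the "user_" pool-name prefix
>>>> are just made up for the example):
>>>>
>>>>   import org.apache.spark.sql.SparkSession;
>>>>
>>>>   public class FairPoolsDemo {
>>>>     public static void main(String[] args) {
>>>>       SparkSession spark = SparkSession.builder()
>>>>           .appName("fair-pools-demo")
>>>>           .master("local[*]")                      // local demo only
>>>>           .config("spark.scheduler.mode", "FAIR")  // enable FAIR scheduling
>>>>           .getOrCreate();
>>>>
>>>>       // Each "user" submits from its own thread. setLocalProperty is
>>>>       // per-thread, so only jobs submitted from that thread land in
>>>>       // that user's pool; pools are created on demand in FAIR mode.
>>>>       for (String user : new String[]{"alice", "bob"}) {
>>>>         new Thread(() -> {
>>>>           spark.sparkContext()
>>>>                .setLocalProperty("spark.scheduler.pool", "user_" + user);
>>>>           spark.range(1000000L).count();  // runs in pool "user_<name>"
>>>>         }).start();
>>>>       }
>>>>     }
>>>>   }
>>>>
>>>> With FAIR mode plus per-thread pools, both count() jobs above can run
>>>> concurrently inside one SparkContext instead of queueing up.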
>>>>
>>>> Two quick questions:
>>>> 1. Why are concurrent users getting PENDING in Zeppelin?
>>>> 2. Does Zeppelin set *spark.scheduler.pool* accordingly, as described
>>>> above?
>>>>
>>>> PS. We have set the following Spark interpreter settings:
>>>> - zeppelin.spark.concurrentSQL = true
>>>> - spark.scheduler.mode = FAIR
>>>>
>>>> Thank you,
>>>> Ruslan Dautkhanov
>>>>
>>>
>>
>> --
>> Thanks & Regards,
>> Ankit.
>
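PS: to make the ZEPPELIN-3334 proposal concrete, this is roughly the change
I have in mind in SparkSqlInterpreter (an untested sketch - I'm writing the
Zeppelin API calls from memory, so names may be off):

  // Today, when zeppelin.spark.concurrentSQL is on, the pool name is
  // effectively hardcoded. The idea: derive the pool name from the
  // authenticated user instead, so each user's paragraphs are scheduled
  // in their own fair-scheduler pool. "sc" is the shared SparkContext.
  String pool = null;
  if (concurrentSQL()) {
    pool = "user_" + context.getAuthenticationInfo().getUser();
  }
  sc.setLocalProperty("spark.scheduler.pool", pool);

Since Spark creates pools on demand (with default settings) when
spark.scheduler.pool names a pool that isn't defined in fairscheduler.xml,
this shouldn't require any extra configuration.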