Looked at the code; the only place Zeppelin handles spark.scheduler.pool
is here:

https://github.com/apache/zeppelin/blob/d762b5288536201d8a2964891c556efaa1bae867/spark/interpreter/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L103

I don't think it matches the Spark documentation's description of how to
allow multiple concurrent users to submit jobs independently
(each user's *thread* has to set a different value for
*spark.scheduler.pool*).

Filed https://issues.apache.org/jira/browse/ZEPPELIN-3334 to set
*spark.scheduler.pool* to an authenticated user name.
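For illustration, the per-thread pattern Spark expects looks roughly like the sketch below. It's plain Python with no real SparkContext; the thread-local dictionary stands in for sc.setLocalProperty("spark.scheduler.pool", ...), and the user names are made up:

```python
import threading

# Stand-in for SparkContext's per-thread "local properties".
# In real Spark, sc.setLocalProperty("spark.scheduler.pool", name)
# is scoped to the calling thread, so each user thread can target
# its own fair-scheduler pool.
_local = threading.local()

def set_local_property(key, value):
    # Mimics sc.setLocalProperty(): the value is visible only
    # to the thread that set it.
    if not hasattr(_local, "props"):
        _local.props = {}
    _local.props[key] = value

def submit_job(job_name, submitted):
    # A real Spark action (collect/save) would be scheduled into
    # whatever pool the submitting thread set; "default" otherwise.
    pool = getattr(_local, "props", {}).get("spark.scheduler.pool", "default")
    submitted.append((pool, job_name))

def user_session(user, submitted):
    # The ZEPPELIN-3334 idea: pool name = authenticated user name.
    set_local_property("spark.scheduler.pool", user)
    submit_job(user + "-query", submitted)

submitted = []
threads = [threading.Thread(target=user_session, args=(u, submitted))
           for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# each user's job lands in that user's own pool
```

With a real SparkContext the same pattern is just calling
sc.setLocalProperty("spark.scheduler.pool", userName) in the thread
that runs that user's paragraph, before any action is triggered.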

Other ideas?
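One more note: per-user pools created on the fly just get default
settings; if we wanted weights or minimum shares per pool, Spark reads
them from an allocation file pointed to by
spark.scheduler.allocation.file. A hypothetical fairscheduler.xml
(pool name made up) would look like:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- hypothetical per-user pool; any pool not listed here is
       created on the fly with default settings -->
  <pool name="some_user">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```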




-- 
Ruslan Dautkhanov

On Wed, Mar 14, 2018 at 4:57 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> Let's say we have a Spark interpreter set up as
> " The interpreter will be instantiated *Globally* in *shared* process"
>
> When one user is using the Spark interpreter,
> other users trying to use the same interpreter
> are stuck in PENDING until the first user's code completes.
>
> Per Spark documentation, https://spark.apache.org/docs/latest/job-scheduling.html
>
> " *within* each Spark application, multiple “jobs” (Spark actions) may be
>> running concurrently if they were submitted by different threads
>> ... /skip/
>> threads. By “job”, in this section, we mean a Spark action (e.g. save,
>> collect) and any tasks that need to run to evaluate that action. Spark’s
>> scheduler is fully thread-safe and supports this use case to enable
>> applications that serve multiple requests (e.g. queries for multiple users).
>> ... /skip/
>> Without any intervention, newly submitted jobs go into a *default pool*,
>> but jobs’ pools can be set by adding the *spark.scheduler.pool* “local
>> property” to the SparkContext in the thread that’s submitting them.    "
>
>
> So Spark allows multiple users to use the same shared Spark context.
>
> Two quick questions:
> 1. Why are concurrent users getting PENDING in Zeppelin?
> 2. Does Zeppelin set *spark.scheduler.pool* per thread as described
> above?
>
> PS.
> We have set the following Spark interpreter settings:
> - zeppelin.spark.concurrentSQL= true
> - spark.scheduler.mode = FAIR
>
>
> Thank you,
> Ruslan Dautkhanov
>
>
