Globally shared mode means all users share the same SparkContext and the same Spark interpreter. That's why code in this mode is executed sequentially: concurrency is not allowed, because there may be dependencies between paragraphs and concurrent execution cannot guarantee the execution order.
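For reference, the per-thread scheduling that the Spark docs quoted below describe would look roughly like the sketch that follows. This is only a hand-written illustration of Spark's own API (SparkContext.setLocalProperty with spark.scheduler.pool); the pool names and the job are made up, and Zeppelin's globally shared mode does not do this per user, which is exactly why paragraphs queue up behind each other.

import org.apache.spark.{SparkConf, SparkContext}

object PerThreadPoolSketch {
  def main(args: Array[String]): Unit = {
    // One SparkContext shared by everyone, like Zeppelin's globally shared mode.
    val sc = new SparkContext(
      new SparkConf().setAppName("per-thread-pool-sketch").setMaster("local[4]"))

    // Hypothetical "users": each submits jobs from its own thread and tags that
    // thread with a pool name, as the Spark job-scheduling docs describe.
    val threads = Seq("pool_userA", "pool_userB").map { pool =>
      new Thread(new Runnable {
        override def run(): Unit = {
          // spark.scheduler.pool is a thread-local property; jobs submitted from
          // this thread go into the named pool instead of the default pool.
          sc.setLocalProperty("spark.scheduler.pool", pool)
          sc.parallelize(1 to 1000000).map(_ * 2).count()
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    sc.stop()
  }
}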
For your scenario, I think you can use scoped per-user mode, where all users share the same SparkContext but each user gets their own Spark interpreter.

On Thu, Mar 15, 2018 at 7:25 AM, ankit jain <ankitjain....@gmail.com> wrote:

> We are seeing the same PENDING behavior despite running the Spark Interpreter
> in "Isolated per User" - we expected one SparkContext to be created per
> user, and indeed did see multiple SparkSubmit processes spun up on the
> Zeppelin pod.
>
> But why go to PENDING if there are multiple contexts that can run in
> parallel? Is the assumption of multiple SparkSubmit = multiple SparkContext
> correct?
>
> Thanks
> Ankit
>
> On Wed, Mar 14, 2018 at 4:12 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
> wrote:
>
>> Looked at the code.. the only place Zeppelin handles spark.scheduler.pool
>> is here -
>>
>> https://github.com/apache/zeppelin/blob/d762b5288536201d8a2964891c556efaa1bae867/spark/interpreter/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L103
>>
>> I don't think it matches the Spark documentation's description of what would
>> allow multiple concurrent users to submit jobs independently
>> (each user's *thread* has to have a different value for *spark.scheduler.pool*).
>>
>> Filed https://issues.apache.org/jira/browse/ZEPPELIN-3334 to set
>> *spark.scheduler.pool* to an authenticated user name.
>>
>> Other ideas?
>>
>> --
>> Ruslan Dautkhanov
>>
>> On Wed, Mar 14, 2018 at 4:57 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>
>>> Let's say we have a Spark interpreter set up as
>>> "The interpreter will be instantiated *Globally* in *shared* process".
>>>
>>> When one user is using the Spark interpreter,
>>> other users trying to use the same interpreter
>>> get PENDING until the first user's code completes.
>>>
>>> Per the Spark documentation,
>>> https://spark.apache.org/docs/latest/job-scheduling.html
>>>
>>>> "*within* each Spark application, multiple "jobs" (Spark actions) may
>>>> be running concurrently if they were submitted by different threads
>>>> ... /skip/
>>>> By "job", in this section, we mean a Spark action (e.g. save,
>>>> collect) and any tasks that need to run to evaluate that action.
>>>> Spark's scheduler is fully thread-safe and supports this use case to enable
>>>> applications that serve multiple requests (e.g. queries for multiple
>>>> users).
>>>> ... /skip/
>>>> Without any intervention, newly submitted jobs go into a *default pool*,
>>>> but jobs' pools can be set by adding the *spark.scheduler.pool* "local
>>>> property" to the SparkContext in the thread that's submitting them."
>>>
>>> So Spark allows multiple users to use the same shared Spark context..
>>>
>>> Two quick questions:
>>> 1. Why are concurrent users getting PENDING in Zeppelin?
>>> 2. Does Zeppelin set *spark.scheduler.pool* accordingly, as described
>>> above?
>>>
>>> PS.
>>> We have set the following Spark interpreter settings:
>>> - zeppelin.spark.concurrentSQL = true
>>> - spark.scheduler.mode = FAIR
>>>
>>> Thank you,
>>> Ruslan Dautkhanov
>>>
>>
>
> --
> Thanks & Regards,
> Ankit.
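PS. Regarding the zeppelin.spark.concurrentSQL / spark.scheduler.mode = FAIR settings mentioned at the end of the quoted mail: FAIR mode only takes effect on the Spark side, and the pools themselves have to come from an allocation file. A rough sketch of that Spark-side configuration follows; the file path and pool names are placeholders I made up, not anything Zeppelin ships with or sets for you.

import org.apache.spark.{SparkConf, SparkContext}

object FairSchedulerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fair-scheduler-sketch")
      .setMaster("local[4]")
      // FAIR scheduling between pools, same as the interpreter setting above.
      .set("spark.scheduler.mode", "FAIR")
      // Placeholder path: the pools must be defined in an allocation file that
      // actually exists; Spark will fail to start if this path is missing.
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")

    val sc = new SparkContext(conf)

    // /path/to/fairscheduler.xml would contain pool definitions along the lines of:
    //   <allocations>
    //     <pool name="userA"><schedulingMode>FAIR</schedulingMode><weight>1</weight></pool>
    //     <pool name="userB"><schedulingMode>FAIR</schedulingMode><weight>1</weight></pool>
    //   </allocations>

    sc.stop()
  }
}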