We are seeing the same PENDING behavior despite running the Spark interpreter
in "Isolated per User" mode - we expected one SparkContext to be created per
user, and we did indeed see multiple SparkSubmit processes spun up on the
Zeppelin pod.

But why go to PENDING if there are multiple contexts that can run in
parallel? Is the assumption that multiple SparkSubmit processes mean multiple
SparkContexts correct?
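
For reference, a rough way to sanity-check that assumption from the notebook
itself (just a sketch; it assumes each user runs this in their own note under
the isolated-per-user interpreter):

    // run in a %spark paragraph as each user; if the interpreter really is
    // isolated per user, each user should see a different application id
    println(sc.applicationId)   // the Spark application backing this context
    println(sc.sparkUser)       // the user the context was started as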

Thanks
Ankit

On Wed, Mar 14, 2018 at 4:12 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> Looked at the code.. the only place Zeppelin handles spark.scheduler.pool
> is here -
>
> https://github.com/apache/zeppelin/blob/d762b5288536201d8a2964891c556efaa1bae867/spark/interpreter/src/main/java/org/apache/zeppelin/spark/SparkSqlInterpreter.java#L103
>
> I don't think this matches the Spark documentation's description of what
> would allow multiple concurrent users to submit jobs independently
> (each user's *thread* has to set a different value for *spark.scheduler.pool*).
>
> Filed https://issues.apache.org/jira/browse/ZEPPELIN-3334 to set
> *spark.scheduler.pool* to the authenticated user name.
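>
> Concretely, the per-user fix would have to look something like this on the
> thread that submits each user's jobs (a sketch only; authenticatedUser is a
> hypothetical variable - the real change belongs inside the interpreter,
> which is what the JIRA proposes):
>
>     // set the pool as a thread-local property before any action runs
>     sc.setLocalProperty("spark.scheduler.pool", authenticatedUser) // hypothetical name
>     sc.parallelize(1 to 100).count() // this job lands in that user's pool, not "default"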
>
> Other ideas?
>
>
>
>
> --
> Ruslan Dautkhanov
>
> On Wed, Mar 14, 2018 at 4:57 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
> wrote:
>
>> Let's say we have a Spark interpreter set up as
>> " The interpreter will be instantiated *Globally *in *shared *process"
>>
>> When one user is running code through the Spark interpreter,
>> other users who try to use the same interpreter
>> get PENDING until the first user's code completes.
>>
>> Per the Spark documentation, https://spark.apache.org/docs/latest/job-scheduling.html:
>>
>> " *within* each Spark application, multiple “jobs” (Spark actions) may
>>> be running concurrently if they were submitted by different threads
>>> ... /skip/
>>> threads. By “job”, in this section, we mean a Spark action (e.g. save,
>>> collect) and any tasks that need to run to evaluate that action.
>>> Spark’s scheduler is fully thread-safe and supports this use case to enable
>>> applications that serve multiple requests (e.g. queries for multiple users).
>>> ... /skip/
>>> Without any intervention, newly submitted jobs go into a *default pool*,
>>> but jobs’ pools can be set by adding the *spark.scheduler.pool* “local
>>> property” to the SparkContext in the thread that’s submitting them.    "
>>
>>
>> So Spark allows multiple users to share the same SparkContext.
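>>
>> For example, against one shared SparkContext, two jobs submitted from two
>> threads into two pools should run side by side (a minimal sketch assuming
>> FAIR mode is already on; the pool names are made up):
>>
>>     // thread 1: another "user"
>>     val t = new Thread {
>>       override def run(): Unit = {
>>         sc.setLocalProperty("spark.scheduler.pool", "user_b") // local property is per thread
>>         sc.parallelize(1 to 1000000).count()
>>       }
>>     }
>>     t.start()
>>
>>     // main thread: this "user"
>>     sc.setLocalProperty("spark.scheduler.pool", "user_a")
>>     sc.parallelize(1 to 1000000).count() // runs concurrently with the other thread's job
>>     t.join()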
>>
>> Two quick questions:
>> 1. Why do concurrent users get PENDING in Zeppelin?
>> 2. Does Zeppelin set *spark.scheduler.pool* per submitting thread, as
>> described above?
>>
>> PS.
>> We have set the following Spark interpreter settings:
>> - zeppelin.spark.concurrentSQL = true
>> - spark.scheduler.mode = FAIR
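>>
>> (As a quick sanity check from a paragraph - just a sketch - the scheduler
>> mode and the thread's current pool can be printed:)
>>
>>     println(sc.getConf.get("spark.scheduler.mode", "FIFO")) // expect "FAIR"
>>     println(sc.getLocalProperty("spark.scheduler.pool"))    // null unless a pool was set on this thread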
>>
>>
>> Thank you,
>> Ruslan Dautkhanov
>>
>>
>


-- 
Thanks & Regards,
Ankit.
