Hi Moon,
How about tracking a dedicated SparkContext per notebook in Spark's
remote interpreter? This would allow multiple users to run their Spark
paragraphs in parallel, while within a single notebook only one
paragraph is executed at a time.
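Roughly what I have in mind, as a hypothetical sketch (the noteId
plumbing and the exact SchedulerFactory method names are my assumptions,
not the current API):

import org.apache.zeppelin.scheduler.Scheduler;
import org.apache.zeppelin.scheduler.SchedulerFactory;

// Hypothetical sketch: key the FIFO scheduler on the note id instead of
// the interpreter instance. Paragraphs within one note still run one by
// one, but different notes get separate queues and can run in parallel.
class PerNoteScheduling {
  Scheduler getSchedulerForNote(String noteId) { // noteId plumbing assumed
    return SchedulerFactory.singleton()
        .createOrGetFIFOScheduler("RemoteSparkInterpreter-" + noteId);
  }
}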
Regards,
-Pranav.
On 15/07/15 7:15 pm, moon soo Lee wrote:
Hi,
Thanks for asking the question.
The reason is simply that the interpreter runs code statements, and the
statements can have order and dependency. Imagine I have two paragraphs:
%spark
val a = 1

%spark
print(a)
If they do not run one by one, they may run in an arbitrary order, and
the output will not be deterministic: either '1' or an error like
'not found: value a'.
This is the reason why. But if there is a nice idea to handle this
problem, I agree that using a parallel scheduler would help a lot.
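For reference, the choice linxi mentioned is along these lines inside
SparkSqlInterpreter (a simplified sketch from memory; the property
handling and the max concurrency value are approximations, not the
exact code):

public Scheduler getScheduler() {
  if (Boolean.parseBoolean(getProperty("zeppelin.spark.concurrentSQL"))) {
    // Parallel scheduler: SQL paragraphs may run concurrently, since a
    // SQL statement usually has no val-style dependency on earlier ones.
    return SchedulerFactory.singleton().createOrGetParallelScheduler(
        SparkSqlInterpreter.class.getName() + this.hashCode(), 10);
  }
  // FIFO scheduler keeps the ordering guarantee described above. (The
  // real code may instead reuse SparkInterpreter's scheduler here.)
  return SchedulerFactory.singleton().createOrGetFIFOScheduler(
      SparkSqlInterpreter.class.getName() + this.hashCode());
}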
Thanks,
moon
On Tue, Jul 14, 2015 at 7:59 PM linxi zeng
<linxizeng0...@gmail.com> wrote:
Does anyone have the same question as me? Or is this not a question?
2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0...@gmail.com>:
Hi, Moon:
I notice that the getScheduler function in SparkInterpreter.java
returns a FIFOScheduler, which makes the Spark interpreter run
Spark jobs one by one. It's not a good experience when a couple of
users work on Zeppelin at the same time, because they have to wait
for each other. Meanwhile, SparkSqlInterpreter can choose which
scheduler to use via "zeppelin.spark.concurrentSQL".
My question is: what considerations is this decision based on?
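For context, the method I am referring to looks roughly like this
(quoted from memory, so the details may differ):

public Scheduler getScheduler() {
  // Every Spark paragraph, from every user, lands on this single FIFO
  // queue, which is why users end up waiting for each other.
  return SchedulerFactory.singleton().createOrGetFIFOScheduler(
      SparkInterpreter.class.getName() + this.hashCode());
}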