Re: why zeppelin SparkInterpreter use FIFOScheduler

Rohit Agarwal Sat, 15 Aug 2015 14:00:07 -0700

If the problem is that multiple users have to wait for each other while
using Zeppelin, the solution already exists: they can create a new
interpreter on the interpreter page and attach it to their
notebook. Then they don't have to wait for others to submit their jobs.


But I agree, having paragraphs from one note wait for paragraphs from other
notes is a confusing default. We can get around that in two ways:

   1. Create a new interpreter for each note and attach that interpreter to
   that note. This approach would require the least amount of code changes but
   is resource heavy and doesn't let you share Spark Context between different
   notes.
   2. If we want to share the Spark Context between different notes, we can
   submit jobs from different notes into different fair scheduler pools
   (https://spark.apache.org/docs/1.4.0/job-scheduling.html#scheduling-within-an-application).
   This can be done by submitting jobs from different notes in different
   threads. This will make sure that jobs from one note are run sequentially
   but jobs from different notes will be able to run in parallel.

Neither of these options requires any change to the Spark code.
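
To make option 2 a bit more concrete, here is a rough sketch in plain Java
of the threading part: one single-threaded queue per note, so paragraphs of
one note stay in order while different notes run side by side. This is only
an illustration under assumptions, not Zeppelin code. PerNoteScheduler and
poolSelector are names I made up. With a real SparkContext the selector
would presumably call sc.setLocalProperty("spark.scheduler.pool", noteId) on
the worker thread, and spark.scheduler.mode would need to be FAIR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Zeppelin): one single-threaded queue per note. */
public class PerNoteScheduler {
    // noteId -> executor with exactly one worker thread, so paragraphs of a
    // note run in submission order while different notes run in parallel.
    private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

    // Invoked on the worker thread before each paragraph. Assumption: with a
    // real SparkContext this would be something like
    //   pool -> sc.setLocalProperty("spark.scheduler.pool", pool)
    private final Consumer<String> poolSelector;

    public PerNoteScheduler(Consumer<String> poolSelector) {
        this.poolSelector = poolSelector;
    }

    /** Queues a paragraph behind earlier paragraphs of the same note. */
    public Future<?> submit(String noteId, Runnable paragraph) {
        ExecutorService queue =
            queues.computeIfAbsent(noteId, id -> Executors.newSingleThreadExecutor());
        return queue.submit(() -> {
            poolSelector.accept(noteId); // one pool per note (an assumption)
            paragraph.run();
        });
    }

    /** Lets already queued paragraphs finish, then stops the worker threads. */
    public void shutdown() {
        for (ExecutorService queue : queues.values()) {
            queue.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // No Spark here, so the pool selector is a no-op.
        PerNoteScheduler scheduler = new PerNoteScheduler(pool -> { });
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        List<Future<?>> pending = new ArrayList<>();
        for (String note : Arrays.asList("noteA", "noteB")) {
            for (int i = 1; i <= 3; i++) {
                String label = note + "-p" + i;
                pending.add(scheduler.submit(note, () -> log.add(label)));
            }
        }
        for (Future<?> f : pending) {
            f.get(); // wait for every paragraph to finish
        }
        scheduler.shutdown();
        // Paragraphs of one note appear in order; notes may interleave.
        System.out.println(log);
    }
}
```

The single worker thread per note is what preserves the ordering that moon
was worried about. The pool setting only controls how Spark shares
resources between the notes.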

--
Thanks & Regards
Rohit Agarwal
https://www.linkedin.com/in/rohitagarwal003

On Sat, Aug 15, 2015 at 12:01 PM, Pranav Kumar Agarwal <praag...@gmail.com>
wrote:

>> If someone can share an idea for sharing a single SparkContext across
>> multiple SparkILoop instances safely, it'll be really helpful.
>>
> Here is a proposal:
> 1. In Spark code, change SparkIMain.scala to allow setting the virtual
> directory. When creating new instances of SparkIMain per notebook from the
> Zeppelin Spark interpreter, set all the instances of SparkIMain to the same
> virtual directory.
> 2. Start an HTTP server on that virtual directory and set this HTTP server in
> the Spark Context using the classserverUri method.
> 3. Scala generated code has a notion of packages. The default package name
> is "line$<linenumber>". The package name can be controlled using the System
> Property scala.repl.name.line. Setting this property to "notebook id"
> ensures that code generated by individual instances of SparkIMain is
> isolated from other instances of SparkIMain.
> 4. Build a queue inside the interpreter to allow only one paragraph execution
> at a time per notebook.
>
> I have tested 1, 2, and 3, and together they seem to provide isolation
> across class names. I'll work towards submitting a formal patch soon. Is
> there already a Jira for this that I can pick up? Also, I need to understand:
> 1. How does Zeppelin pick up Spark fixes? Or do I need to first work
> towards getting the Spark changes merged into Apache Spark on GitHub?
>
> Any suggestions or comments on the proposal are highly welcome.
>
> Regards,
> -Pranav.
>
> On 10/08/15 11:36 pm, moon soo Lee wrote:
>
>> Hi piyush,
>>
>> A separate instance of SparkILoop and SparkIMain for each notebook while
>> sharing the SparkContext sounds great.
>>
>> Actually, I tried to do it and found a problem: multiple SparkILoop
>> instances can generate the same class name, and the Spark executors confuse
>> the class names since they're reading classes from a single SparkContext.
>>
>> If someone can share an idea for sharing a single SparkContext across
>> multiple SparkILoop instances safely, it'll be really helpful.
>>
>> Thanks,
>> moon
>>
>>
>> On Mon, Aug 10, 2015 at 1:21 AM Piyush Mukati (Data Platform)
>> <piyush.muk...@flipkart.com> wrote:
>>
>>     Hi Moon,
>>     Any suggestions on this? We have to wait a lot when multiple people
>>     are working with Spark.
>>     Can we create a separate instance of SparkILoop, SparkIMain, and
>>     print streams for each notebook while sharing the SparkContext,
>>     ZeppelinContext, SQLContext, and DependencyResolver, and then use the
>>     parallel scheduler?
>>     Thanks,
>>
>>     -piyush
>>
>>     Hi Moon,
>>
>>     How about tracking a dedicated SparkContext for each notebook in Spark's
>>     remote interpreter? This would allow multiple users to run their Spark
>>     paragraphs in parallel. Also, within a notebook only one paragraph is
>>     executed at a time.
>>
>>     Regards,
>>     -Pranav.
>>
>>
>>     On 15/07/15 7:15 pm, moon soo Lee wrote:
>>     > Hi,
>>     >
>>     > Thanks for asking the question.
>>     >
>>     > The reason is simply that it is running code statements. The
>>     > statements can have an order and dependencies. Imagine I have two
>>     > paragraphs:
>>     >
>>     > %spark
>>     > val a = 1
>>     >
>>     > %spark
>>     > print(a)
>>     >
>>     > If they don't run one by one, they may run in
>>     > random order and the output will not be consistent: either '1' or
>>     > 'val a can not be found'.
>>     >
>>     > This is the reason why. But if there is a nice idea to handle this
>>     > problem, I agree that using a parallel scheduler would help a lot.
>>     >
>>     > Thanks,
>>     > moon
>>     > On Tue, Jul 14, 2015 at 7:59 PM, linxi zeng
>>     > <linxizeng0...@gmail.com> wrote:
>>     >
>>     >     Does anyone else have the same question? Or is this not a
>>     >     question?
>>     >
>>     >     2015-07-14 11:47 GMT+08:00 linxi zeng <linxizeng0...@gmail.com>:
>>     >
>>     >         Hi, Moon:
>>     >            I noticed that the getScheduler function in
>>     >         SparkInterpreter.java returns a FIFOScheduler, which makes the
>>     >         Spark interpreter run Spark jobs one by one. It's not a good
>>     >         experience when a couple of users work in Zeppelin at
>>     >         the same time, because they have to wait for each other.
>>     >         At the same time, SparkSqlInterpreter can choose which
>>     >         scheduler to use via "zeppelin.spark.concurrentSQL".
>>     >         My question is: what considerations led you to
>>     >         make this decision?
>>     >
>>     >
>>
>
