Would this allow us to have multiple instances of spark or jdbc defined in
the interpreters? I think one of the big things I am looking for is to have:

%drill -> %jdbc with settings for my drill cluster
%oracledw -> %jdbc with settings for my oracle data warehouse
%sparkprod -> %spark with settings for my spark cluster in production
    %sparkprod.sql
    %sparkprod.pyspark
%sparkdev -> %spark with settings for my spark cluster in dev
    %sparkdev.sql
    %sparkdev.pyspark

Would this be possible? It's intuitive for the user, and provides a level of
granularity for the administrator.
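For illustration, I could imagine the aliases living in interpreter.json
roughly like this (the layout, property names, and URLs below are made up,
just to sketch the idea, not the real schema):

{
  "_comment": "hypothetical sketch only; field names and URLs are invented",
  "drill":     { "group": "jdbc",  "properties": { "default.url": "jdbc:drill:zk=drill-zk:2181" } },
  "oracledw":  { "group": "jdbc",  "properties": { "default.url": "jdbc:oracle:thin:@oracledw:1521/dw" } },
  "sparkprod": { "group": "spark", "properties": { "master": "spark://prod-master:7077" } },
  "sparkdev":  { "group": "spark", "properties": { "master": "spark://dev-master:7077" } }
}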
John

On Fri, Apr 29, 2016 at 5:23 PM, moon soo Lee <m...@apache.org> wrote:

> Hi,
>
> Thanks John and DuyHai for sharing the idea.
> I can clearly see demand for using an alias instead of a static
> interpreter name.
>
> How about saving the static interpreter name (e.g. spark.sql) in each
> paragraph of note.json and allowing an alias?
>
> For example, if I have 2 interpreter settings in interpreter.json,
> 'spark-dev' and 'spark-cluster', I can select the interpreter in my
> paragraph such as
>
> '%spark-dev'
> '%spark-cluster'
> '%spark-dev.sql'
> '%spark-cluster.sql'
>
> Once the user runs a paragraph, Zeppelin inserts the static interpreter
> name into the paragraph. For example:
>
> paragraphs : [
>   {
>     text : "%spark-dev ....",
>     interpreter : "spark.spark",
>     ...
>   },
>   {
>     text : "%spark-cluster.sql ...",
>     interpreter : "spark.sql",
>     ...
>   }
> ]
>
> When Zeppelin imports a notebook, it can use the static interpreter
> name from paragraphs.interpreter to suggest a matching interpreter
> setting from interpreter.json.
>
> I expect every Zeppelin installation to have different properties for
> any given interpreter, since their systems and cluster configurations
> will differ. So instead of embedding the full interpreter settings in
> note.json, embedding only the static interpreter name in note.json
> would be simpler and more practical.
>
> What do you think?
>
> Thanks,
> moon
>
>
> On Fri, Apr 29, 2016 at 11:15 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> I would agree with John Omernik's point about the portability of JSON
>> notes, because of the strong dependency on configured interpreters.
>>
>> Which gives me an idea: what about "exporting" the interpreters'
>> config into the note.json file?
>>
>> Let's say your note has 10 paragraphs but they use just 3 different
>> interpreters. Upon export, we would fetch the current config for those
>> 3 interpreters and save it in the note.json.
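>> (To make this concrete, the exported note might carry something
>> roughly like the block below; the section name and fields are purely
>> hypothetical, just to illustrate the idea:)
>>
>> {
>>   "_comment" : "hypothetical sketch of interpreter config embedded in note.json",
>>   "paragraphs" : [ ... ],
>>   "interpreterSettings" : {
>>     "spark" : { "properties" : { "master" : "yarn-client" } },
>>     "jdbc"  : { "properties" : { "default.url" : "jdbc:postgresql://host:5432/db" } }
>>   }
>> }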
>> On import of the note.json, there is more work to do:
>>
>> - if there are already 3 interpreters matching the ones saved in the
>> note.json, check the current config
>>   - if the config matches, import the note
>>   - else, ask the user with a dialog whether they want to 1) use the
>> current interpreter conf, 2) override the current interpreter conf
>> with the ones in the note.json, or 3) try to merge the configurations
>>
>> - if for any of the 3 interpreters in the note.json there is no
>> matching interpreter instance, propose to create one for the user from
>> the config saved in the note
>>
>> And for backward compatibility with the old note.json format: on
>> import, if we don't find any info related to interpreters, we just
>> skip the whole config-checking step above.
>>
>> What do you think? It's a little bit complex, but I guess it will help
>> portability greatly. I'm not saying it's necessarily easy (indeed it
>> requires a lot of code change), but I'm just throwing out some ideas
>> to feed the discussion.
>>
>>
>> On Fri, Apr 29, 2016 at 5:41 PM, John Omernik <j...@omernik.com> wrote:
>>
>>> Moon -
>>>
>>> I would be curious about your thoughts on my email from April 12th.
>>>
>>> John
>>>
>>>
>>> On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> I would actually argue that if the user doesn't have access to the
>>>> same or a similar interpreter.json file, then notebook file
>>>> portability is a moot point. For example, if I set up %spark or %jdbc
>>>> in my environment and create a notebook, that notebook is no more or
>>>> less portable than if I had %myspark or %drill (a jdbc interpreter).
>>>> Mainly, because if someone tries to open that notebook and they don't
>>>> have my setup of %spark or of %jdbc, they can't run the notebook. If
>>>> we could allow the user to create an alias for an instance of an
>>>> interpreter, and that alias information was stored in
>>>> interpreter.json, then the portability of the notebook would be
>>>> essentially the same.
>>>>
>>>> Said another way:
>>>>
>>>> Static interpreter invocation (%jdbc, %pyspark, %psql):
>>>> - This notebook is 100% dependent on interpreter.json in order to
>>>> run. %jdbc may point to Drill, %pyspark may point to an authenticated
>>>> YARN instance (specific to the user/org), %psql may point to an
>>>> authenticated Postgres instance unique to the org/user. Without
>>>> interpreter.json, this notebook is not portable.
>>>>
>>>> Aliased interpreter invocation stored in interpreter.json (%drill ->
>>>> jdbc with settings, %datasciencespark -> pyspark for the data science
>>>> group, %entdw -> postgres server, enterprise data warehouse):
>>>> - This notebook is still 100% dependent on the interpreter.json file
>>>> in order to run. There is no more or less dependence on
>>>> interpreter.json (if these aliases are stored there) than there is
>>>> with static interpreter invocation. Thus portability is not a benefit
>>>> of the static method, and the aliased method can provide a good deal
>>>> of analyst agility/definition in a multi data set/source environment.
>>>>
>>>> My thought is we should allow people to create new interpreters of
>>>> known types, and on creation of these interpreters allow the
>>>> invocation to be stored in interpreter.json. Also, if a new
>>>> interpreter is registered, it would follow the same interpreter-group
>>>> methodology. Thus if I set up a new %spark as %entspark, then the
>>>> sub-interpreters (pyspark, sparksql, etc.) would be there and have
>>>> access to the master entspark, and could also be renamed. So a
>>>> sub-interpreter can be renamed, and the access it has to the
>>>> interpreter group is based on the parent-child relationship, not just
>>>> on the name...
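>>>> (As a purely hypothetical sketch of what such an aliased interpreter
>>>> group could look like in interpreter.json; the structure, the
>>>> "subInterpreters" field, and all names are invented just to
>>>> illustrate the parent-child idea:)
>>>>
>>>> "entspark" : {
>>>>   "_comment" : "hypothetical sketch only",
>>>>   "group" : "spark",
>>>>   "properties" : { "master" : "yarn-client", "spark.yarn.queue" : "enterprise" },
>>>>   "subInterpreters" : [ "entspark.pyspark", "entspark.sql" ]
>>>> }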
>>>> Thoughts?
>>>>
>>>>
>>>> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wangzhong....@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks moon - it is good to know the ideas behind the design. It
>>>>> makes a lot more sense to use system-defined identifiers in order to
>>>>> keep the notebook portable.
>>>>>
>>>>> Currently, I can name the interpreter in the web UI, but the name
>>>>> doesn't actually help to distinguish between my spark interpreters,
>>>>> which is quite confusing to me. I am not sure whether this would be
>>>>> a better way:
>>>>>
>>>>> 1. the UI generates the default identifier for the first spark
>>>>> interpreter, which is %spark
>>>>> 2. when the user creates another spark interpreter, the UI asks the
>>>>> user to provide a user-defined identifier
>>>>>
>>>>> Zhong
>>>>>
>>>>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <m...@apache.org> wrote:
>>>>>
>>>>>> In the initial stage of development, there was discussion about
>>>>>> %xxx: whether xxx should be a user-defined interpreter identifier
>>>>>> or a static interpreter identifier.
>>>>>>
>>>>>> We decided to go with the latter, because we wanted to keep the
>>>>>> notebook file portable, i.e. let a note.json file imported from
>>>>>> another Zeppelin instance run without (or with minimal)
>>>>>> modification.
>>>>>>
>>>>>> If we used a user-defined identifier, running an imported notebook
>>>>>> would not be very simple. This is why %xxx does not use a
>>>>>> user-defined interpreter identifier at the moment.
>>>>>>
>>>>>> If you have any other thoughts or ideas, please feel free to share.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wangzhong....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Moon! I got it working. The reason it didn't work is that
>>>>>>> I tried to use both of the spark interpreters inside one notebook.
>>>>>>> I think I can create different notebooks for each interpreter, but
>>>>>>> it would be great if we could use "%xxx", where xxx is a
>>>>>>> user-defined interpreter identifier, to select different
>>>>>>> interpreters for different paragraphs.
>>>>>>>
>>>>>>> Besides, because currently both of the interpreters use "spark" as
>>>>>>> the identifier, they share the same log file. I am not sure
>>>>>>> whether there are other cases where they interfere with each
>>>>>>> other.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Zhong
>>>>>>>
>>>>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <m...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Once you create another spark interpreter in the Interpreter menu
>>>>>>>> of the GUI, each notebook should be able to select and use it
>>>>>>>> (settings icon in the top right corner of each notebook).
>>>>>>>>
>>>>>>>> If it does not work, could you find the error message in the log
>>>>>>>> file?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> moon
>>>>>>>>
>>>>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang
>>>>>>>> <wangzhong....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi zeppelin pilots,
>>>>>>>>>
>>>>>>>>> I am trying to run multiple spark interpreters in the same
>>>>>>>>> Zeppelin instance. This is very helpful if the data comes from
>>>>>>>>> multiple spark clusters.
>>>>>>>>>
>>>>>>>>> Another useful use case is to run one instance in cluster mode
>>>>>>>>> and another in local mode. This would significantly boost the
>>>>>>>>> performance of small-data analysis.
>>>>>>>>>
>>>>>>>>> Is there any way to run multiple spark interpreters? I tried to
>>>>>>>>> create another spark interpreter with a different identifier,
>>>>>>>>> which is allowed in the UI, but it doesn't work (shall I file a
>>>>>>>>> ticket?).
>>>>>>>>>
>>>>>>>>> For now I am trying to run multiple SparkContexts in the same
>>>>>>>>> spark interpreter.
>>>>>>>>>
>>>>>>>>> Zhong