Would this allow us to have multiple instances of spark or jdbc defined in
the interpreters? I think one of the big things I am looking for is to have:

%drill -> %jdbc with settings for my drill cluster
%oracledw -> %jdbc with settings for my oracle data warehouse
%sparkprod -> %spark with settings for my spark cluster in production
    %sparkprod.sql
    %sparkprod.pyspark
%sparkdev -> %spark with settings for my spark cluster in dev
    %sparkdev.sql
    %sparkdev.pyspark

Would this be possible? It's intuitive for the user, and provides a level of
granularity for the administrator.
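For illustration, I could imagine the aliases living in interpreter.json
roughly like this (the layout, property names, and URLs below are made up,
just to sketch the idea, not the real schema):

{
  "_comment": "hypothetical sketch only; field names and URLs are invented",
  "drill":     { "group": "jdbc",  "properties": { "default.url": "jdbc:drill:zk=drill-zk:2181" } },
  "oracledw":  { "group": "jdbc",  "properties": { "default.url": "jdbc:oracle:thin:@oracledw:1521/dw" } },
  "sparkprod": { "group": "spark", "properties": { "master": "spark://prod-master:7077" } },
  "sparkdev":  { "group": "spark", "properties": { "master": "spark://dev-master:7077" } }
}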
John

On Fri, Apr 29, 2016 at 5:23 PM, moon soo Lee <m...@apache.org> wrote:

> Hi,
>
> Thanks John and DuyHai for sharing the idea.
> I can clearly see demand for using an alias instead of a static
> interpreter name.
>
> How about saving the static interpreter name (e.g. spark.sql) in each
> paragraph of note.json and allowing an alias?
>
> For example, if I have 2 interpreter settings in interpreter.json,
> 'spark-dev' and 'spark-cluster', I can select the interpreter in my
> paragraph such as
>
> '%spark-dev'
> '%spark-cluster'
> '%spark-dev.sql'
> '%spark-cluster.sql'
>
> Once the user runs a paragraph, Zeppelin inserts the static interpreter
> name into the paragraph. For example:
>
> paragraphs : [
>   {
>     text : "%spark-dev ....",
>     interpreter : "spark.spark",
>     ...
>   },
>   {
>     text : "%spark-cluster.sql ...",
>     interpreter : "spark.sql",
>     ...
>   }
> ]
>
> When Zeppelin imports a notebook, it can use the static interpreter
> name from paragraphs.interpreter to suggest a matching interpreter
> setting from interpreter.json.
>
> I expect every Zeppelin installation to have different properties for
> any given interpreter, since their systems and cluster configurations
> will differ. So instead of embedding the full interpreter settings in
> note.json, embedding only the static interpreter name in note.json
> would be simpler and more practical.
>
> What do you think?
>
> Thanks,
> moon
>
>
> On Fri, Apr 29, 2016 at 11:15 AM DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> I would agree with John Omernik's point about the portability of JSON
>> notes, because of the strong dependency on configured interpreters.
>>
>> Which gives me an idea: what about "exporting" the interpreters'
>> config into the note.json file?
>>
>> Let's say your note has 10 paragraphs but they use just 3 different
>> interpreters. Upon export, we would fetch the current config for those
>> 3 interpreters and save it in the note.json.
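>> (To make this concrete, the exported note might carry something
>> roughly like the block below; the section name and fields are purely
>> hypothetical, just to illustrate the idea:)
>>
>> {
>>   "_comment" : "hypothetical sketch of interpreter config embedded in note.json",
>>   "paragraphs" : [ ... ],
>>   "interpreterSettings" : {
>>     "spark" : { "properties" : { "master" : "yarn-client" } },
>>     "jdbc"  : { "properties" : { "default.url" : "jdbc:postgresql://host:5432/db" } }
>>   }
>> }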
>> On import of the note.json, there is more work to do:
>>
>> - if there are already 3 interpreters matching the ones saved in the
>> note.json, check the current config
>>   - if the config matches, import the note
>>   - else, ask the user with a dialog whether they want to 1) use the
>> current interpreter conf, 2) override the current interpreter conf
>> with the ones in the note.json, or 3) try to merge the configurations
>>
>> - if for any of the 3 interpreters in the note.json there is no
>> matching interpreter instance, propose to create one for the user from
>> the config saved in the note
>>
>> And for backward compatibility with the old note.json format: on
>> import, if we don't find any info related to interpreters, we just
>> skip the whole config-checking step above.
>>
>> What do you think? It's a little bit complex, but I guess it will help
>> portability greatly. I'm not saying it's necessarily easy (indeed it
>> requires a lot of code change), but I'm just throwing out some ideas
>> to feed the discussion.
>>
>>
>> On Fri, Apr 29, 2016 at 5:41 PM, John Omernik <j...@omernik.com> wrote:
>>
>>> Moon -
>>>
>>> I would be curious about your thoughts on my email from April 12th.
>>>
>>> John
>>>
>>>
>>> On Tue, Apr 12, 2016 at 7:11 AM, John Omernik <j...@omernik.com> wrote:
>>>
>>>> I would actually argue that if the user doesn't have access to the
>>>> same or a similar interpreter.json file, then notebook file
>>>> portability is a moot point. For example, if I set up %spark or %jdbc
>>>> in my environment and create a notebook, that notebook is no more or
>>>> less portable than if I had %myspark or %drill (a jdbc interpreter).
>>>> Mainly, because if someone tries to open that notebook and they don't
>>>> have my setup of %spark or of %jdbc, they can't run the notebook. If
>>>> we could allow the user to create an alias for an instance of an
>>>> interpreter, and that alias information was stored in
>>>> interpreter.json, then the portability of the notebook would be
>>>> essentially the same.
>>>>
>>>> Said another way:
>>>>
>>>> Static interpreter invocation (%jdbc, %pyspark, %psql):
>>>> - This notebook is 100% dependent on interpreter.json in order to
>>>> run. %jdbc may point to Drill, %pyspark may point to an authenticated
>>>> YARN instance (specific to the user/org), %psql may point to an
>>>> authenticated Postgres instance unique to the org/user. Without
>>>> interpreter.json, this notebook is not portable.
>>>>
>>>> Aliased interpreter invocation stored in interpreter.json (%drill ->
>>>> jdbc with settings, %datasciencespark -> pyspark for the data science
>>>> group, %entdw -> postgres server, enterprise data warehouse):
>>>> - This notebook is still 100% dependent on the interpreter.json file
>>>> in order to run. There is no more or less dependence on
>>>> interpreter.json (if these aliases are stored there) than there is
>>>> with static interpreter invocation. Thus portability is not a benefit
>>>> of the static method, and the aliased method can provide a good deal
>>>> of analyst agility/definition in a multi data set/source environment.
>>>>
>>>> My thought is we should allow people to create new interpreters of
>>>> known types, and on creation of these interpreters allow the
>>>> invocation to be stored in interpreter.json. Also, if a new
>>>> interpreter is registered, it would follow the same interpreter-group
>>>> methodology. Thus if I set up a new %spark as %entspark, then the
>>>> sub-interpreters (pyspark, sparksql, etc.) would be there and have
>>>> access to the master entspark, and could also be renamed. So a
>>>> sub-interpreter can be renamed, and the access it has to the
>>>> interpreter group is based on the parent-child relationship, not just
>>>> on the name...
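>>>> (As a purely hypothetical sketch of what such an aliased interpreter
>>>> group could look like in interpreter.json; the structure, the
>>>> "subInterpreters" field, and all names are invented just to
>>>> illustrate the parent-child idea:)
>>>>
>>>> "entspark" : {
>>>>   "_comment" : "hypothetical sketch only",
>>>>   "group" : "spark",
>>>>   "properties" : { "master" : "yarn-client", "spark.yarn.queue" : "enterprise" },
>>>>   "subInterpreters" : [ "entspark.pyspark", "entspark.sql" ]
>>>> }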
>>>> Thoughts?
>>>>
>>>>
>>>> On Fri, Feb 5, 2016 at 2:15 PM, Zhong Wang <wangzhong....@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks moon - it is good to know the ideas behind the design. It
>>>>> makes a lot more sense to use system-defined identifiers in order to
>>>>> keep the notebook portable.
>>>>>
>>>>> Currently, I can name the interpreter in the web UI, but the name
>>>>> doesn't actually help to distinguish between my spark interpreters,
>>>>> which is quite confusing to me. I am not sure whether this would be
>>>>> a better way:
>>>>>
>>>>> 1. the UI generates the default identifier for the first spark
>>>>> interpreter, which is %spark
>>>>> 2. when the user creates another spark interpreter, the UI asks the
>>>>> user to provide a user-defined identifier
>>>>>
>>>>> Zhong
>>>>>
>>>>> On Fri, Feb 5, 2016 at 12:02 AM, moon soo Lee <m...@apache.org> wrote:
>>>>>
>>>>>> In the initial stage of development, there was discussion about
>>>>>> %xxx: whether xxx should be a user-defined interpreter identifier
>>>>>> or a static interpreter identifier.
>>>>>>
>>>>>> We decided to go with the latter, because we wanted to keep the
>>>>>> notebook file portable, i.e. let a note.json file imported from
>>>>>> another Zeppelin instance run without (or with minimal)
>>>>>> modification.
>>>>>>
>>>>>> If we used a user-defined identifier, running an imported notebook
>>>>>> would not be very simple. This is why %xxx does not use a
>>>>>> user-defined interpreter identifier at the moment.
>>>>>>
>>>>>> If you have any other thoughts or ideas, please feel free to share.
>>>>>>
>>>>>> Thanks,
>>>>>> moon
>>>>>>
>>>>>> On Fri, Feb 5, 2016 at 3:58 PM Zhong Wang <wangzhong....@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Moon! I got it working. The reason it didn't work is that
>>>>>>> I tried to use both of the spark interpreters inside one notebook.
>>>>>>> I think I can create different notebooks for each interpreter, but
>>>>>>> it would be great if we could use "%xxx", where xxx is a
>>>>>>> user-defined interpreter identifier, to select different
>>>>>>> interpreters for different paragraphs.
>>>>>>>
>>>>>>> Besides, because currently both of the interpreters use "spark" as
>>>>>>> the identifier, they share the same log file. I am not sure
>>>>>>> whether there are other cases where they interfere with each
>>>>>>> other.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Zhong
>>>>>>>
>>>>>>> On Thu, Feb 4, 2016 at 9:04 PM, moon soo Lee <m...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Once you create another spark interpreter in the Interpreter menu
>>>>>>>> of the GUI, each notebook should be able to select and use it
>>>>>>>> (settings icon in the top right corner of each notebook).
>>>>>>>>
>>>>>>>> If it does not work, could you find the error message in the log
>>>>>>>> file?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> moon
>>>>>>>>
>>>>>>>> On Fri, Feb 5, 2016 at 11:54 AM Zhong Wang
>>>>>>>> <wangzhong....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi zeppelin pilots,
>>>>>>>>>
>>>>>>>>> I am trying to run multiple spark interpreters in the same
>>>>>>>>> Zeppelin instance. This is very helpful if the data comes from
>>>>>>>>> multiple spark clusters.
>>>>>>>>>
>>>>>>>>> Another useful use case is to run one instance in cluster mode
>>>>>>>>> and another in local mode. This would significantly boost the
>>>>>>>>> performance of small-data analysis.
>>>>>>>>>
>>>>>>>>> Is there any way to run multiple spark interpreters? I tried to
>>>>>>>>> create another spark interpreter with a different identifier,
>>>>>>>>> which is allowed in the UI, but it doesn't work (shall I file a
>>>>>>>>> ticket?).
>>>>>>>>>
>>>>>>>>> For now I am trying to run multiple SparkContexts in the same
>>>>>>>>> spark interpreter.
>>>>>>>>>
>>>>>>>>> Zhong