'p1' and 'p2' stand for the paragraph IDs. As for readability, we could let
users set a paragraph name, but that is a separate matter and could be a later
improvement.
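
To make the deps proposal quoted below a bit more concrete, here is a rough
Python sketch of how a scheduler might turn the indicators into a DAG and then
into "waves" of paragraphs that can run in parallel. This is not Zeppelin code:
the `deps=` syntax is only the proposal from this thread, the comma-separated
form for several dependencies is my own assumption, and `parse_deps`,
`build_graph` and `run_waves` are hypothetical names used for illustration.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical indicator syntax from this thread: %interpreter(deps=p1,p2)
# (the comma-separated list of several ids is an assumption).
DEPS = re.compile(r"^%[\w.]+\(\s*deps=([\w,\s]+)\)")


def parse_deps(indicator):
    """Return the declared dependency ids, or None if none are declared."""
    match = DEPS.match(indicator.strip())
    if match is None:
        return None
    return [d.strip() for d in match.group(1).split(",") if d.strip()]


def build_graph(paragraphs):
    """Map each paragraph id to the set of ids it depends on.

    `paragraphs` is an ordered list of (paragraph_id, indicator_line).
    A paragraph without explicit deps depends on the paragraph ahead of it.
    """
    graph = {}
    previous = None
    for pid, indicator in paragraphs:
        deps = parse_deps(indicator)
        if deps is None:
            deps = [previous] if previous is not None else []
        graph[pid] = set(deps)
        previous = pid
    return graph


def run_waves(graph):
    """Group paragraphs into waves; each wave could be run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the deps form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# The Spark tutorial example from the quoted mail.
note = [
    ("p1", "%spark"),
    ("p2", "%spark.sql(deps=p1)"),
    ("p3", "%spark.sql(deps=p1)"),
]
waves = run_waves(build_graph(note))
print(waves)  # [['p1'], ['p2', 'p3']]
```

With no `deps` given anywhere, every paragraph depends on the one ahead of it,
so each wave holds a single paragraph and the note simply runs sequentially.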



On Fri, Sep 29, 2017 at 7:30 PM, Partridge, Lucas (GE Aviation)
<lucas.partri...@ge.com> wrote:

> Interesting idea.  But by ‘p1’, ‘p2’, etc did you literally mean that; or
> were you using that as shorthand for the id of the paragraph?
>
> If the former then what happens if someone inserts, deletes or reorders
> paragraphs? But if the latter then the paragraph ids wouldn’t be very easy
> for someone to read and follow the dependency relationships…
>
>
>
> *From:* Jeff Zhang [mailto:zjf...@gmail.com]
> *Sent:* 29 September 2017 11:58
> *To:* users@zeppelin.apache.org
> *Subject:* EXT: Re: Implementing run all paragraphs sequentially
>
>
>
>
>
> I don't think a two-valued note setting (parallel/sequential) is sufficient
> for paragraph scheduling. Take the Spark tutorial note as an example: we
> should run the paragraph that loads the bank data first, and could then run
> all the SQL paragraphs in parallel. So the key is how we define the
> dependency relationships between paragraphs. The paragraphs of a note can
> form a DAG (directed acyclic graph). Sequential running is just one special
> kind of DAG (a linked list).
>
>
>
> I believe we discussed this before in the community. My proposal is to add
> an attribute to the interpreter indicator of each paragraph, so that the
> user can specify the paragraph's dependencies (if the user doesn't specify
> any, the default dependency is the paragraph ahead of it). Take the Spark
> tutorial note as an example again. We have 3 paragraphs: the first one
> loads the bank data, and the second and third query the data. So
> paragraphs 2 and 3 can run in parallel but must run after paragraph 1, and
> we need to specify their dependency in the interpreter indicator. Of
> course, users don't need to specify dependencies if they want to run all
> the paragraphs sequentially, because the default dependency is the
> paragraph ahead of it.
>
>
>
> Paragraph 1.
>
>
>
> %spark
>
> // code to load bank data
>
>
>
> Paragraph 2.
>
>
>
> %spark.sql(deps=p1)
>
> // query the bank data
>
>
>
> Paragraph 3.
>
> %spark.sql(deps=p1)
>
> // query the bank data
>
>
> On Fri, Sep 29, 2017 at 5:35 PM, afancy <grou...@gmail.com> wrote:
>
> +1
>
> I think this is one of the most important features. I don't know why this
> requirement has been skipped.
>
>
>
> /afancy
>
>
>
> On Thu, Sep 28, 2017 at 5:28 PM, Belousov Maksim Eduardovich <
> m.belou...@tinkoff.ru> wrote:
>
> Hello, users!
>
> At the moment our analysts often mix interpreters in their notes.
>
> For example, they prepare data using %jdbc and then use it in %pyspark.
> Besides, they often use scheduling for regular reporting, so they have to
> do something like `time.sleep()` to wait for the data from %jdbc. That
> doesn't guarantee the result and doesn't look cool.
>
>
>
> You can find early attempts to implement sequential running of all
> paragraphs in [1].
>
> We are really interested in implementing issue [2] and are ready to work
> on it.
>
> It seems a good idea to discuss the requirements.
>
> My idea is to introduce a note setting that defines the type of run to
> use (parallel or sequential) and leave "Run all" as the only button
> that runs all the cells in the note. This makes sequential or parallel
> running a `note option` rather than a `run option`.
>
> The option would be controlled by a nearby button, as shown:
>
> [image:
> https://lh6.googleusercontent.com/jwnb7xfb0fPbFg1CWPoMSqovu7ecSMv4pJfuP4zdKVZbyAUDwzAT2GJ5EiemXVYrqMW73yklemTpjXNyLRJABpTCoHi6us2ZI_AxWKHwZpBEA7MjpMP0-7Nk8saaJQfIF4yBMPfS]
>
>
>
>
>
> For new notes the default state would be "Run sequential all"; for old
> notes, "Run parallel for interpreters".
>
> We would be glad to hear any thoughts.
>
> Thank you.
>
>
>
> [1] https://issues.apache.org/jira/browse/ZEPPELIN-1165
>
> [2] https://issues.apache.org/jira/browse/ZEPPELIN-2368
>
>
>
>
>
>
> *Maksim Belousov*
>
>
>
>
>
>
