I think this PR addresses what I need. Case 2 seems to describe the issue
I'm having, if I'm reading it correctly.

The proposed solution, however, is not that clear to me.

Is it that you define workflows, where a workflow is a sequence of
(notebook, paragraph) pairs that are to be run in a specific order?
If that's the case, then this definitely solves my problem, but it's really
cumbersome from a usability point of view. I think a better solution for my
use case would be to just have an option to run all paragraphs in the order
they appear in the notebook, regardless of which interpreter they use.

On Wed, Jul 13, 2016 at 12:31 PM, Hyung Sung Shim <hss...@nflabs.com> wrote:

> Hi.
> Maybe https://github.com/apache/zeppelin/pull/1176 is related to what you
> want.
> Please check this PR.
>
> On Wednesday, July 13, 2016, xiufeng liu <toxiuf...@gmail.com> wrote:
>
>> You have to change the source code to add dependencies between running
>> paragraphs. I think it is a really interesting feature; for example, it
>> could be used as an ETL tool. But, unfortunately, there is no
>> configuration option for it right now.
>>
>> /afancy
>>
>> On Wed, Jul 13, 2016 at 12:27 PM, Ahmed Sobhi <ahmed.so...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have been working on a large Spark Scala notebook. I recently had the
>>> requirement to produce graphs/plots from that data. Python and PySpark
>>> seemed like a natural fit, but since I've already invested a lot of time
>>> and effort into the Scala version, I want to restrict my usage of Python
>>> to just plotting.
>>>
>>> I found a good workflow where, in the Scala paragraphs, I can use
>>> *registerTempTable*, and in Python I can just use *sqlContext.table* to
>>> retrieve that table.
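>>>
>>> For illustration, a minimal sketch of that pattern as two Zeppelin
>>> paragraphs (the DataFrame *df* and the table name "plot_data" are
>>> placeholder names):
>>>
>>> %spark
>>> // Scala paragraph: publish the DataFrame as a temp table
>>> df.registerTempTable("plot_data")
>>>
>>> %pyspark
>>> # Python paragraph: pull the same table back and plot it
>>> pdf = sqlContext.table("plot_data").toPandas()
>>> pdf.plot(kind="bar")  # assumes matplotlib is available for rendering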
>>>
>>> The problem now is that if I try to run all paragraphs to get the
>>> notebook updated, the Python paragraphs fail because they run before the
>>> Scala ones, even though they are placed after them.
>>>
>>> It seems the behavior in Zeppelin is that it attempts to run paragraphs
>>> concurrently if they run on different interpreters, which might seem fine
>>> on the surface. But now that I want to introduce a dependency between
>>> Spark/PySpark paragraphs, is there any way to do that?
>>>
>>> --
>>> Cheers,
>>> Ahmed
>>>
>>
>>


-- 
Cheers,
Ahmed
http://bit.ly/ahmed_abtme <http://about.me/humanzz>
