It is easy to change the code. I did it myself and use Zeppelin as an ETL
tool. It is very powerful.

/afancy

On Wednesday, July 13, 2016, Ahmed Sobhi <ahmed.so...@gmail.com> wrote:

> I think this PR addresses what I need. Case 2 seems to describe the issue
> I'm having, if I'm reading it correctly.
>
> The proposed solution, however, is not that clear to me.
>
> Is it that you define workflows, where a workflow is a sequence of
> (notebook, paragraph) pairs to be run in a specific order?
> If that's the case, then this definitely solves my problem, but it's
> really cumbersome from a usability standpoint. I think a better solution
> for my use case would be an option to run all paragraphs in the order
> they appear in the notebook, regardless of which interpreter they use.
>
> On Wed, Jul 13, 2016 at 12:31 PM, Hyung Sung Shim <hss...@nflabs.com> wrote:
>
>> Hi.
>> Maybe https://github.com/apache/zeppelin/pull/1176 is related to what
>> you want.
>> Please check this PR.
>>
>> On Wednesday, July 13, 2016, xiufeng liu <toxiuf...@gmail.com> wrote:
>>
>>> You have to change the source code to add dependencies between running
>>> paragraphs. I think it is a really interesting feature; for example, it
>>> could be used as an ETL tool. But, unfortunately, there is no
>>> configuration option for it right now.
>>>
>>> /afancy
>>>
>>> On Wed, Jul 13, 2016 at 12:27 PM, Ahmed Sobhi <ahmed.so...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have been working on a large Spark Scala notebook. I recently had the
>>>> requirement to produce graphs/plots out of these data. Python and PySpark
>>>> seemed like a natural fit but since I've already invested a lot of time and
>>>> effort into the Scala version, I want to restrict my usage of python to
>>>> just plotting.
>>>>
>>>> I found a good workflow where in the Scala paragraphs I can use
>>>> *registerTempTable* and in Python I can just use *sqlContext.table* to
>>>> retrieve that table.
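>>>>
>>>> Concretely, the two paragraphs look roughly like this (a minimal
>>>> sketch; the table name and DataFrame variable are placeholders):
>>>>
>>>> %spark
>>>> // Scala paragraph: register a DataFrame as a temp table
>>>> df.registerTempTable("my_table")
>>>>
>>>> %pyspark
>>>> # Python paragraph: read the same table back for plotting
>>>> pdf = sqlContext.table("my_table").toPandas()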
>>>>
>>>> The problem now is that if I run all paragraphs to refresh the
>>>> notebook, the Python paragraphs fail because they run before the Scala
>>>> ones, even though they are placed after them.
>>>>
>>>> It seems that Zeppelin attempts to run paragraphs concurrently when
>>>> they use different interpreters, which might seem fine on the surface.
>>>> But now that I want to introduce a dependency between Spark and PySpark
>>>> paragraphs, is there any way to do that?
>>>>
>>>> --
>>>> Cheers,
>>>> Ahmed
>>>>
>>>
>>>
>
>
> --
> Cheers,
> Ahmed
> http://bit.ly/ahmed_abtme <http://about.me/humanzz>
>
