Thanks for sharing this, Ruslan - I will take a look.

I agree that paragraphs can form tasks within a DAG.  My point was that
ideally a DAG could encompass multiple notes.  That is, the completion of
one note triggers another, and so on, until an entire chain of dependent
tasks is complete.

For example team A has a note that generates data set A*.  Teams B & C each
have notes that depend on A* to generate B* & C* for their specific
purposes.  It doesn't make sense for all of that to have to live in one
note, but they are all part of a single workflow.
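
To make that concrete, here is a rough sketch in plain Python (3.9+ for
graphlib).  It is not Airflow or Zeppelin code; the note names and the
run_note helper are made up for illustration.  It only shows the ordering I
have in mind: A's note runs first, then B's and C's, each of which depends
only on A*.

```python
from graphlib import TopologicalSorter

# Hypothetical note names. Each key is a note; each value is the set of
# notes whose output it depends on.
NOTE_DEPENDENCIES = {
    "note_A": set(),        # team A: produces data set A*
    "note_B": {"note_A"},   # team B: reads A*, produces B*
    "note_C": {"note_A"},   # team C: reads A*, produces C*
}


def run_note(note_name):
    """Placeholder for whatever actually executes a note (hypothetical)."""
    return f"ran {note_name}"


def run_workflow(dependencies, runner=run_note):
    """Run each note only after every note it depends on has run."""
    order = list(TopologicalSorter(dependencies).static_order())
    return [(name, runner(name)) for name in order]


if __name__ == "__main__":
    for name, result in run_workflow(NOTE_DEPENDENCIES):
        print(name, "->", result)
```

In Airflow terms I would expect each note to become one task in a shared
DAG, with B and C downstream of A, rather than everything living in one note.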

Best,
--Ben

On Fri, May 19, 2017 at 9:02 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
wrote:

> Thanks for sharing this Ben.
>
> I agree Zeppelin is a better fit with tighter integration with Spark and
> built-in visualizations.
>
> We have pretty much standardized on pySpark, so here's one of the scripts
> we use internally
> to extract %pyspark, %sql and %md paragraphs into a standalone script
> (that can be scheduled in Airflow for example)
> https://github.com/Tagar/stuff/blob/master/znote.py (patches are welcome
> :-)
>
> Hope this helps.
>
> ps. In my opinion adding dependencies between paragraphs wouldn't be that
> hard for simple cases,
> and can be first step to define a DAG in Zeppelin directly. It would be
> really awesome if we see this type of
> integration in the future.
>
> Otherwise I don't see much value if a whole note/whole workflow would run
> as a single task in Airflow.
> In my opinion, each paragraph has to be a task... then it'll be very
> useful.
>
>
> Thanks,
> Ruslan
>
>
> On Fri, May 19, 2017 at 4:55 PM, Ben Vogan <b...@shopkick.com> wrote:
>
>> I do not expect the relationship between DAGs to be described in Zeppelin
>> - that would be done in Airflow.  It just seems that Zeppelin is such a
>> great tool for a data scientist's workflow that it would be nice if once
>> they are done with the work the note could be productionized directly.  I
>> could envision a couple of scenarios:
>>
>> 1. Using a zeppelin instance to run the note via the REST API.  The
>> instance could be containerized and spun up specifically for a DAG or it
>> could be a permanently available one.
>> 2. A note could be pulled from git and some part of the Zeppelin engine
>> could execute the note without the web UI at all.
>>
>> I would expect on the airflow side there to be some special operators for
>> executing these.
>>
>> If the scheduler is pluggable then it should be possible to create a
>> plug-in that talks to the Airflow REST API.
>>
>> I happen to prefer Zeppelin to Jupyter - although I get your point about
>> both being python.  I don't really view that as a problem - most of the big
>> data platforms I'm talking to are implemented on the JVM after all.  The
>> python part of Airflow is really just describing what gets run and it isn't
>> hard to run something that isn't written in python.
>>
>> On Fri, May 19, 2017 at 2:52 PM, Ruslan Dautkhanov <dautkha...@gmail.com>
>> wrote:
>>
>>> We also use both Zeppelin and Airflow.
>>>
>>> I'm interested in hearing what others are doing here too.
>>>
>>> Although honestly there might be some challenges
>>> - Airflow expects a DAG structure, while a notebook has pretty linear
>>> structure;
>>> - Airflow is Python-based; Zeppelin is all Java (REST API might be of
>>> help?).
>>> Jupyter+Airflow might be a more natural fit to integrate?
>>>
>>> On top of that, the way we use Zeppelin is a lot of ad-hoc queries,
>>> while Airflow is for more finalized workflows I guess?
>>>
>>> Thanks for bringing this up.
>>>
>>>
>>>
>>> --
>>> Ruslan Dautkhanov
>>>
>>> On Fri, May 19, 2017 at 2:20 PM, Ben Vogan <b...@shopkick.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are really enjoying the workflow of interacting with our data via
>>>> Zeppelin, but are not sold on using the built in cron scheduling
>>>> capability.  We would like to be able to create more complex DAGs that are
>>>> better suited for something like Airflow.  I was curious as to whether
>>>> anyone has done an integration of Zeppelin with Airflow.
>>>>
>>>> Either directly from within Zeppelin, or from the Airflow side.
>>>>
>>>> Thanks,
>>>> --
>>>> *BENJAMIN VOGAN* | Data Platform Team Lead
>>>>
>>>> <http://www.shopkick.com/>
>>>> <https://www.facebook.com/shopkick>
>>>> <https://www.instagram.com/shopkick/>
>>>> <https://www.pinterest.com/shopkick/> <https://twitter.com/shopkickbiz>
>>>> <https://www.linkedin.com/company-beta/831240/?pathWildcard=831240>
>>>>
>>>
>>>
>>
>>
>>
>
>


