bdbr...@gmail.com> wrote:
>>
>> Hi All,
>> I have made a first implementation that allows tracking of lineage in
>> Airflow and integration with Apache Atlas.
>
>
> Snip
>
>>
>>
>> I’m looking forward to your comments!
>>
>
On Sat, 5 May 2018 at 23.49, Bolke de Bruin <bdbr...@gmail.com> wrote:
> Hi All,
> I have made a first implementation that allows tracking of lineage in
> Airflow and integration with Apache Atlas.
Snip
>
>
> I’m looking forward to your comments!
>
> https://githu
On Sun, May 6, 2018 at 7:05 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> Apache Atlas is agnostic and can receive lineage info by REST API (used in
> my implementation) and Kafka topic. It does also come with a lot of
> connectors out of the box that tie into the hadoop
ould look like:
>
> s3_file = File("s3a://bucket/key")
> inlets = {"datasets": [s3_file,]}
>
> Obviously if you do something with the s3 file outside of Airflow you need
> to track lineage yourself somehow.
>
> B.
>
> Sent from my iPhone
>
Forgot to answer your question: for S3 it could look like:
s3_file = File("s3a://bucket/key")
inlets = {"datasets": [s3_file,]}
Obviously if you do something with the s3 file outside of Airflow you need to
track lineage yourself somehow.
B.
Sent from my iPhone
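To make the point about lineage outside of Airflow concrete, here is a minimal, self-contained sketch. It does not use Airflow's or Atlas's own classes: `File`, `Task`, `producers` and `untracked_inputs` are hypothetical stand-ins that only mimic the `inlets`/`outlets` dictionaries shown above, and the second bucket key is made up for illustration. It reports the inputs that no known task produced, i.e. the datasets whose lineage would have to be tracked some other way.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class File:
    """Hypothetical stand-in for a lineage dataset; not Airflow's own class."""
    url: str


@dataclass
class Task:
    """Hypothetical task that declares what it reads and what it writes."""
    task_id: str
    inlets: Dict[str, List[File]] = field(default_factory=dict)
    outlets: Dict[str, List[File]] = field(default_factory=dict)


def producers(tasks: List[Task]) -> Dict[str, str]:
    """Map each dataset URL to the task that declared it as an outlet."""
    return {
        f.url: t.task_id
        for t in tasks
        for f in t.outlets.get("datasets", [])
    }


def untracked_inputs(tasks: List[Task]) -> List[str]:
    """Return inlet URLs that no known task produced (lineage unknown)."""
    known = producers(tasks)
    missing: List[str] = []
    for t in tasks:
        for f in t.inlets.get("datasets", []):
            if f.url not in known and f.url not in missing:
                missing.append(f.url)
    return missing


raw = File("s3a://bucket/key")      # written by something outside the DAG
clean = File("s3a://bucket/clean")  # made-up key, written by clean_up

tasks = [
    Task("clean_up", inlets={"datasets": [raw]}, outlets={"datasets": [clean]}),
    Task("report", inlets={"datasets": [clean]}),
]

print(untracked_inputs(tasks))  # ['s3a://bucket/key']
```

Only the raw S3 object is reported, because nothing inside the DAG declared it as an outlet; the cleaned file has a known producer.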
>
Hi Gerardo,
Any lineage tracking system depends on how much data you can give it. So
if you do transfers outside of the 'view' such a system has, the lineage
information is gone. Airflow can help in this area by tracking its internal
lineage and providing that to those lineage systems
Hi Bolke,
Data lineage support sounds very interesting.
I'm not very familiar with Atlas, but at first sight it seems like a tool specific
to the Hadoop ecosystem. What would this look like if the files (inlets or
outlets) were stored on S3?
An example of a service that manages a similar use case
...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I have made a first implementation that allows tracking of lineage in
>> Airflow and integration with Apache Atlas. It was inspired by Jeremiah’s
>> work in the past on Data Flow pipelines, but I think I kept
>
> Best
> Marcin
>
> On Sat, May 5, 2018, 22:49 Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> > Hi All,
> >
> > I have made a first implementation that allows tracking of lineage in
> > Airflow and integration with Apache Atlas. It was inspi
ve made a first implementation that allows tracking of lineage in
> Airflow and integration with Apache Atlas. It was inspired by Jeremiah’s
> work in the past on Data Flow pipelines, but I think I kept it a little bit
> simpler.
>
> Operators now have two new parameters called “inlets” and “o
Hi All,
I have made a first implementation that allows tracking of lineage in Airflow
and integration with Apache Atlas. It was inspired by Jeremiah’s work in the
past on Data Flow pipelines, but I think I kept it a little bit simpler.
Operators now have two new parameters called “inlets
:48 PM, "Kate-Laurel Agnew" <kag...@signal.co>
> wrote:
> > >> >>
> > >> >>+1
> > >> >>
> > >> >>On Wed, Nov 29, 2017 at 12:09 AM, Koen Mevissen <
> > >> kmevis...@tra
> >> >>
> >> >>On Wed, Nov 29, 2017 at 12:09 AM, Koen Mevissen <
> >> kmevis...@travix.com>
> >> >>wrote:
> >> >>
> >> >>> +1
> >> >>>
> >> >>> I'm interested as well!
> >> >>>
>>> On Tue, 28 Nov 2017 at 14:04, Marc Bollinger <
>> >> m...@lumoslabs.com> wrote:
>> >>>
>> >>>> +1
>> >>>>
>> >>>> On Mon, Nov 27, 2017 at 6:18 PM, Ruslan Dautkhanov <
>> >> dautkha...@gmail
> >>>>
> >>>> On Mon, Nov 27, 2017 at 6:18 PM, Ruslan Dautkhanov <
> >> dautkha...@gmail.com
> >>>>
> >>>> wrote:
> >>>>
> >>>>> ‘’’
> >>>>> I'm
> >>>>> now
>
>>>>> ‘’’
>>>>> I'm
>>>>> now working on sql scanners, extractors and other tools that
>> allow me
>>> to
>>>>> populate the database
>>>>> ‘’’
>>>>>
>>>>> Very cool. Clo
t
> allow me
> > to
> > > > populate the database
> > > > ‘’’
> > > >
> > > > > Very cool. Cloudera Navigator (not an open source product) does
> this
> > too
> > >
m
> > > now working on sql scanners, extractors and other tools that allow me
> to
> > > populate the database
> > > ‘’’
> > >
> > > > Very cool. Cloudera Navigator (not an open source product) does this
> too
> > >
17 at 6:18 PM, Ruslan Dautkhanov <dautkha...@gmail.com
> >
> > wrote:
> >
> > > ‘’’
> > > I'm
> > > now working on sql scanners, extractors and other tools that allow me
> to
> > > populate the database
> > > ‘’’
> > >
> > >
ers, extractors and other tools that allow me to
> > populate the database
> > ‘’’
> >
> > Very cool. Cloudera Navigator (not an open source product) does this too
> > to some extent - collect metadata and create data lineage automatically
> > (stored as a Solr c
I am very interested in hearing more about the data portal as well.
On Tue, Nov 28, 2017 at 1:15 PM, Radek Tomšej wrote:
> Hi, I am interested too. I have tried to make a POC with Elasticsearch +
> Kibana, so it would be nice to share some experience.
>
>
> On 2017-11-28 00:45,
s this too
> to some extent - collect metadata and create data lineage automatically
> (stored as a Solr collection) by parsing SQL queries.
>
> https://www.cloudera.com/documentation/enterprise/5-12-
> x/topics/datamgmt_extraction_indexing.html
>
>
>
> On Mon, Nov 27,
‘’’
I'm
now working on sql scanners, extractors and other tools that allow me to
populate the database
‘’’
Very cool. Cloudera Navigator (not an open source product) does this too
to some extent - collect metadata and create data lineage automatically
(stored as a Solr collection) by parsing
+1, I miss the data portal!
Max
On Mon, Nov 27, 2017 at 5:33 PM, Ruslan Dautkhanov
wrote:
> +1
>
> Thank you
>
>
> On Mon, Nov 27, 2017 at 12:38 PM Gerard Toonstra
> wrote:
>
> > Hi all,
> >
> > So something that really drew my attention recently is
+1
Thank you
On Mon, Nov 27, 2017 at 12:38 PM Gerard Toonstra
wrote:
> Hi all,
>
> So something that really drew my attention recently is a "data portal" as
> described by a team from Airbnb sometime in May. The idea is basically a
> "facebook of data":
>
>
>
>
If there are particular questions about the Data Portal, I would be happy
to get a list of these and work on looping in the Data Portal folks from
Airbnb.
Cheers,
Gurer
On Mon, Nov 27, 2017 at 2:41 PM, Megan Kearl wrote:
> I'm interested too
>
> On Nov 27, 2017 3:26 PM,
I'm interested too
On Nov 27, 2017 3:26 PM, "Bolke de Bruin" wrote:
> Natuurlijk :-)
>
> Absolutely!
>
> Sent from my iPhone
>
> > On 27 Nov 2017, at 21:23, Chris Riccomini wrote:
> >
> > Interested
> >
> >> On Mon, Nov 27, 2017 at 12:07 PM, Kerr
Natuurlijk :-)
Absolutely!
Sent from my iPhone
> On 27 Nov 2017, at 21:23, Chris Riccomini wrote:
>
> Interested
>
>> On Mon, Nov 27, 2017 at 12:07 PM, Kerr Shireman wrote:
>>
>> I am interested. I remember being pretty excited when I read that
Interested
On Mon, Nov 27, 2017 at 12:07 PM, Kerr Shireman wrote:
> I am interested. I remember being pretty excited when I read that blog
> post.
> On Mon, Nov 27, 2017 at 2:00 PM Arthur Wiedmer
> wrote:
>
> > Likewise!
> >
> > Best,
> > Arthur
>
Likewise!
Best,
Arthur
On Mon, Nov 27, 2017 at 11:57 AM, Alison Stanton <
astan...@bankofknowledge.net> wrote:
> I'd like to be kept informed.
>
> Alison Stanton
>
> On Mon, Nov 27, 2017 at 1:53 PM, Laura Lorenz
> wrote:
>
> > We're definitely looking for something
I'd like to be kept informed.
Alison Stanton
On Mon, Nov 27, 2017 at 1:53 PM, Laura Lorenz
wrote:
> We're definitely looking for something like that here, so I would like to
> jump in on this discussion.
>
> Laura
>
> On Mon, Nov 27, 2017 at 2:38 PM, Gerard Toonstra
We're definitely looking for something like that here, so I would like to
jump in on this discussion.
Laura
On Mon, Nov 27, 2017 at 2:38 PM, Gerard Toonstra
wrote:
> Hi all,
>
> So something that really drew my attention recently is a "data portal" as
> described by a
Hi all,
So something that really drew my attention recently is a "data portal" as
described by a team from Airbnb sometime in May. The idea is basically a
"facebook of data":
https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
Unfortunately it looks like it's not