Re: Lineage

2018-05-14 Thread Bolke de Bruin
bdbr...@gmail.com> wrote: >> >> Hi All, >> I have made a first implementation that allows tracking of lineage in >> Airflow and integration with Apache Atlas. > > > Snip > >> >> >> I’m looking forward to your comments! >> >

Re: Lineage

2018-05-14 Thread Jørn A Hansen
On Sat, 5 May 2018 at 23.49, Bolke de Bruin <bdbr...@gmail.com> wrote: > Hi All, > I have made a first implementation that allows tracking of lineage in > Airflow and integration with Apache Atlas. Snip > > > I’m looking forward to your comments! > > https://githu

Re: Lineage

2018-05-07 Thread Gerardo Curiel
On Sun, May 6, 2018 at 7:05 PM, Bolke de Bruin <bdbr...@gmail.com> wrote: > > Apache Atlas is agnostic and can receive lineage info by REST API (used in > my implementation) and Kafka topic. It does also come with a lot of > connectors out of the box that tie into the Hadoop

Re: Lineage

2018-05-06 Thread Marcin Szymański
ould look like: > > s3_file = File("s3a://bucket/key") > inlets = {"datasets": [s3_file,]} > > Obviously if you do something with the s3 file outside of Airflow you need > to track lineage yourself somehow. > > B. > > Sent from my iPhone > > >

Re: Lineage

2018-05-06 Thread Bolke de Bruin
Forgot to answer your question: for S3 it could look like:

    s3_file = File("s3a://bucket/key")
    inlets = {"datasets": [s3_file,]}

Obviously if you do something with the s3 file outside of Airflow you need to track lineage yourself somehow. B. Sent from my iPhone >
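The inlets idea in the message above can be sketched without Airflow installed. This is only an illustration under stated assumptions: the `File` and `Task` classes and the `lineage_edges` helper below are hypothetical stand-ins written for this example, not the classes from the implementation discussed in the thread, and the `key_copy` target path is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class File:
    """Hypothetical stand-in for a lineage dataset; not Airflow's own class."""
    url: str


@dataclass
class Task:
    """Hypothetical stand-in for an operator that declares inlets and outlets."""
    task_id: str
    inlets: Dict[str, List[File]]
    outlets: Dict[str, List[File]]


def lineage_edges(task: Task) -> List[Tuple[str, str, str]]:
    """Return (source_url, task_id, target_url) for every inlet/outlet pair."""
    return [
        (src.url, task.task_id, dst.url)
        for src in task.inlets.get("datasets", [])
        for dst in task.outlets.get("datasets", [])
    ]


# Same shape as the S3 example in the message: a dict keyed by "datasets".
s3_file = File("s3a://bucket/key")
copy_task = Task(
    task_id="copy_key",
    inlets={"datasets": [s3_file]},
    outlets={"datasets": [File("s3a://bucket/key_copy")]},
)
edges = lineage_edges(copy_task)
```

Each edge records which task turned which input into which output. That is the kind of information a lineage backend would need to receive, and it covers only what the task itself declares.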

Re: Lineage

2018-05-06 Thread Bolke de Bruin
Hi Gerardo, Any lineage tracking system depends on how much data you can give it. So if you do transfers outside of the 'view' such a system has, then that lineage information is gone. Airflow can help in this area by tracking its internal lineage and providing that to those lineage systems

Re: Lineage

2018-05-06 Thread Gerardo Curiel
Hi Bolke, Data lineage support sounds very interesting. I'm not very familiar with Atlas, but at first sight it seems like a tool specific to the Hadoop ecosystem. What would this look like if the files (inlets or outlets) were stored on S3? An example of a service that manages a similar use case

Re: Lineage

2018-05-06 Thread Bolke de Bruin
...@gmail.com> wrote: >> >> Hi All, >> >> I have made a first implementation that allows tracking of lineage in >> Airflow and integration with Apache Atlas. It was inspired by Jeremiah’s >> work in the past on Data Flow pipelines, but I think I kept

Re: Lineage

2018-05-06 Thread Alex Tronchin-James
> > Best > Marcin > > On Sat, May 5, 2018, 22:49 Bolke de Bruin <bdbr...@gmail.com> wrote: > > > Hi All, > > > > I have made a first implementation that allows tracking of lineage in > > Airflow and integration with Apache Atlas. It was inspi

Re: Lineage

2018-05-05 Thread Marcin Szymański
ve made a first implementation that allows tracking of lineage in > Airflow and integration with Apache Atlas. It was inspired by Jeremiah’s > work in the past on Data Flow pipelines, but I think I kept it a little bit > simpler. > > Operators now have two new parameters called “inlets” and “o

Lineage

2018-05-05 Thread Bolke de Bruin
Hi All, I have made a first implementation that allows tracking of lineage in Airflow and integration with Apache Atlas. It was inspired by Jeremiah’s work in the past on Data Flow pipelines, but I think I kept it a little bit simpler. Operators now have two new parameters called “inlets

Re: Data lineage and data portal

2017-12-06 Thread Gerard Toonstra
:48 PM, "Kate-Laurel Agnew" <kag...@signal.co> > wrote: > > >> >> > > >> >>+1 > > >> >> > > >> >>On Wed, Nov 29, 2017 at 12:09 AM, Koen Mevissen < > > >> kmevis...@tra

Re: Data lineage and data portal

2017-12-03 Thread Sam Elamin
> >> >> > >> >>On Wed, Nov 29, 2017 at 12:09 AM, Koen Mevissen < > >> kmevis...@travix.com> > >> >>wrote: > >> >> > >> >>> +1 > >> >>> > >> >>> I'm interested as well! > >> >>> >

Re: Data lineage and data portal

2017-12-02 Thread Gerard Toonstra
>>> On Tue, 28 Nov 2017 at 14:04, Marc Bollinger < >> >> m...@lumoslabs.com> wrote: >> >>> >> >>>> +1 >> >>>> >> >>>> On Mon, Nov 27, 2017 at 6:18 PM, Ruslan Dautkhanov < >> >> dautkha...@gmail

Re: Data lineage and data portal

2017-11-30 Thread Gerard Toonstra
> >>>> > >>>> On Mon, Nov 27, 2017 at 6:18 PM, Ruslan Dautkhanov < > >> dautkha...@gmail.com > >>>> > >>>> wrote: > >>>> > >>>>> ‘’’ > >>>>> I'm > >>>>> now

Re: Data lineage and data portal

2017-11-29 Thread Nathan McIntyre
> >>>>> ‘’’ >>>>> I'm >>>>> now working on sql scanners, extractors and other tools that >> allow me >>> to >>>>> populate the database >>>>> ‘’’ >>>>> >>>>> Very cool. Clo

Re: Data lineage and data portal

2017-11-29 Thread Alek Storm
t > allow me > > to > > > > populate the database > > > > ‘’’ > > > > > > > > Very cool. Cloudera Navigator ( not an open source product) does > this > > too > > >

Re: Data lineage and data portal

2017-11-29 Thread Igors Vaitkus
m > > > now working on sql scanners, extractors and other tools that allow me > to > > > populate the database > > > ‘’’ > > > > > > Very cool. Cloudera Navigator ( not an open source product) does this > too > > >

Re: Data lineage and data portal

2017-11-29 Thread Kate-Laurel Agnew
17 at 6:18 PM, Ruslan Dautkhanov <dautkha...@gmail.com > > > > wrote: > > > > > ‘’’ > > > I'm > > > now working on sql scanners, extractors and other tools that allow me > to > > > populate the database > > > ‘’’ > > > > > >

Re: Data lineage and data portal

2017-11-28 Thread Koen Mevissen
ers, extractors and other tools that allow me to > > populate the database > > ‘’’ > > > > Very cool. Cloudera Navigator ( not an open source product) does this too > > to some extent - collect metadata and create data lineage automatically ( > > stored as a Solr c

Re: Data lineage and data portal

2017-11-28 Thread Mark Grover
I am very interested in hearing more about the data portal as well. On Tue, Nov 28, 2017 at 1:15 PM, Radek Tomšej wrote: > Hi I am interested too. I have tried to make a POC with Elasticsearch + > Kibana so would be nice to share some experience. > > > On 2017-11-28 00:45,

Re: Data lineage and data portal

2017-11-28 Thread Marc Bollinger
s this too > to some extent - collect metadata and create data lineage automatically ( > stored as a Solr collection) by parsing sql queries. > > https://www.cloudera.com/documentation/enterprise/5-12-x/topics/datamgmt_extraction_indexing.html > > > > On Mon, Nov 27,

Re: Data lineage and data portal

2017-11-27 Thread Ruslan Dautkhanov
‘’’ I'm now working on SQL scanners, extractors and other tools that allow me to populate the database ‘’’ Very cool. Cloudera Navigator (not an open source product) does this too to some extent - collect metadata and create data lineage automatically (stored as a Solr collection) by parsing
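As a rough illustration of deriving lineage from query text, here is a toy sketch. It is not how Cloudera Navigator or the scanners mentioned above work. The regular expressions, the `extract_lineage` helper and the table names are all assumptions made up for this example, and they only cope with a plain `INSERT INTO ... SELECT ... FROM ... JOIN ...` statement.

```python
import re
from typing import Dict, List, Optional, Union

# Toy patterns: they ignore subqueries, CTEs, comments and quoted identifiers.
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Return the written table and the sorted, de-duplicated tables read."""
    target = TARGET_RE.search(sql)
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return {"target": target.group(1) if target else None, "sources": sources}


# Hypothetical query; the schema and table names are invented.
query = """
INSERT INTO reports.daily_sales
SELECT o.day, SUM(o.amount)
FROM staging.orders o
JOIN staging.customers c ON c.id = o.customer_id
GROUP BY o.day
"""
lineage = extract_lineage(query)
```

A real extractor would need a proper SQL parser. The sketch only shows what it produces: a target table and the source tables it was derived from.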

Re: Data lineage and data portal

2017-11-27 Thread Maxime Beauchemin
+1, I miss the data portal! Max On Mon, Nov 27, 2017 at 5:33 PM, Ruslan Dautkhanov wrote: > +1 > > Thank you > > > On Mon, Nov 27, 2017 at 12:38 PM Gerard Toonstra > wrote: > > > Hi all, > > > > So something that really drew my attention recently is

Re: Data lineage and data portal

2017-11-27 Thread Ruslan Dautkhanov
+1 Thank you On Mon, Nov 27, 2017 at 12:38 PM Gerard Toonstra wrote: > Hi all, > > So something that really drew my attention recently is a "data portal" as > described by a team from airbnb somewhere in May. The idea is basically a > "facebook of data": > > > >

Re: Data lineage and data portal

2017-11-27 Thread Gurer Kiratli
If there are particular questions about the Data Portal, I would be happy to get a list of these and work on looping in the Data Portal folks from Airbnb. Cheers, Gurer On Mon, Nov 27, 2017 at 2:41 PM, Megan Kearl wrote: > I'm interested too > > On Nov 27, 2017 3:26 PM,

Re: Data lineage and data portal

2017-11-27 Thread Megan Kearl
I'm interested too On Nov 27, 2017 3:26 PM, "Bolke de Bruin" wrote: > Natuurlijk :-) > > Absolutely! > > Sent from my iPhone > > > On 27 Nov 2017, at 21:23, Chris Riccomini wrote: > > > > Interested > > > >> On Mon, Nov 27, 2017 at 12:07 PM, Kerr

Re: Data lineage and data portal

2017-11-27 Thread Bolke de Bruin
Natuurlijk :-) Absolutely! Sent from my iPhone > On 27 Nov 2017, at 21:23, Chris Riccomini wrote: > > Interested > >> On Mon, Nov 27, 2017 at 12:07 PM, Kerr Shireman wrote: >> >> I am interested. I remember being pretty excited when I read that

Re: Data lineage and data portal

2017-11-27 Thread Chris Riccomini
Interested On Mon, Nov 27, 2017 at 12:07 PM, Kerr Shireman wrote: > I am interested. I remember being pretty excited when I read that blog > post. > On Mon, Nov 27, 2017 at 2:00 PM Arthur Wiedmer > wrote: > > > Likewise! > > > > Best, > > Arthur >

Re: Data lineage and data portal

2017-11-27 Thread Arthur Wiedmer
Likewise! Best, Arthur On Mon, Nov 27, 2017 at 11:57 AM, Alison Stanton < astan...@bankofknowledge.net> wrote: > I'd like to be kept informed. > > Alison Stanton > > On Mon, Nov 27, 2017 at 1:53 PM, Laura Lorenz > wrote: > > > We're definitely looking for something

Re: Data lineage and data portal

2017-11-27 Thread Alison Stanton
I'd like to be kept informed. Alison Stanton On Mon, Nov 27, 2017 at 1:53 PM, Laura Lorenz wrote: > We're definitely looking for something like that here, so I would like to > jump in on this discussion. > > Laura > > On Mon, Nov 27, 2017 at 2:38 PM, Gerard Toonstra

Re: Data lineage and data portal

2017-11-27 Thread Laura Lorenz
We're definitely looking for something like that here, so I would like to jump in on this discussion. Laura On Mon, Nov 27, 2017 at 2:38 PM, Gerard Toonstra wrote: > Hi all, > > So something that really drew my attention recently is a "data portal" as > described by a

Data lineage and data portal

2017-11-27 Thread Gerard Toonstra
Hi all, So something that really drew my attention recently is a "data portal" as described by a team from Airbnb sometime in May. The idea is basically a "Facebook of data": https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770 Unfortunately it looks like it's not