Re: [DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-26 Thread Kaxil Naik
I quite like the PIPE option. I feel it is more readable and easier to understand, similar to how we currently set task dependencies. The Wrapper Function is also fine as long as users don't have to change the way they define tasks, i.e. they can still use their old DAGs. The main reason I lik

Re: [DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-26 Thread Jarek Potiuk
> > File(url="http://www.google.com") | task1 | task2 > File(url="file:///tmp/out")
I love the UNIXY pipeline approach - a lot more than my operator overloading proposal :). And I think combining it with the builder pattern for more complex cases works very nicely (those complex cases will be
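
For readers following the thread, the pipe syntax quoted above can be expressed in plain Python through operator overloading. The following is only a minimal sketch of the idea, not Airflow's implementation: the `File` and `Task` classes, their attributes, and the choice of `__or__` and `__gt__` are assumptions made for illustration.

```python
# Hypothetical sketch: these classes are illustrative, not Airflow's API.

class File:
    """A dataset reference identified by a URL."""

    def __init__(self, url):
        self.url = url

    def __or__(self, task):
        # `File | task`: the file becomes an inlet of the task.
        task.inlets.append(self)
        return task


class Task:
    """A stand-in for an operator that records its lineage."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.inlets = []
        self.outlets = []
        self.downstream = []

    def __or__(self, other):
        # `task1 | task2`: task2 runs downstream of task1.
        self.downstream.append(other)
        return other

    def __gt__(self, target):
        # `task > File`: the file becomes an outlet of the task.
        self.outlets.append(target)
        return self


source = File(url="http://www.google.com")
sink = File(url="file:///tmp/out")
task1 = Task("task1")
task2 = Task("task2")

# `|` binds tighter than `>`, so this parses as (source | task1 | task2) > sink.
source | task1 | task2 > sink

print(task1.inlets[0].url)
print([t.task_id for t in task1.downstream])
print(task2.outlets[0].url)
```

Because Python gives `|` higher precedence than `>`, the whole pipe chain is evaluated first and the redirect applies to the last task, which mirrors the shell reading of the expression.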

Re: [DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-26 Thread Bolke de Bruin
Hi All, Thanks for all the responses! A couple of additional thoughts. The intention is to have all Operators lineage aware. That way lineage is built in, so that a developer does not have to do anything. In its most simple form it will then just record the inlets and outlets to that particula

Re: [DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-24 Thread Jarek Potiuk
After discussing with Bolke (we indeed had a very good and constructive discussion at the Polidea office this week), I am a great supporter of adding more lineage support to Airflow, and I think we should all, as a community, think about how to make it as easy as possible to use and maintain. If it is not yet

Re: [DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-23 Thread Gerard Casas Saez
Hi everyone! I think the whole data lineage proposal is great, and I would like to contribute my own thoughts on how to extend the Operators API for better lineage support. Lately, I’ve been experimenting a bit with extending the Operator API to make it more `functional` to specify Da

Re: [DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-22 Thread Tao Feng
Thanks Bolke. For those who are not aware, my team is working with Bolke's team on Amundsen, which is a data discovery and metadata project (https://github.com/lyft/amundsen). I think although it ships with an Atlas client (or it used to), the new API, per my understanding, is generic enough that doe

Re: [DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-22 Thread Dan Davydov
Just want to preface my reply with the fact that I haven't thought about data lineage very much. This is an awesome idea :)! I like something like 1) personally, e.g. operators could optionally define a .outlet() and .inlet() interface which would return the inlets and outlets of a given task, and
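
The optional `.inlet()` / `.outlet()` interface suggested above could look roughly like the sketch below. Everything here is an assumption for illustration: `CopyFileOperator`, `LegacyOperator`, and `collect_lineage` are hypothetical names and are not part of Airflow.

```python
# Hypothetical sketch of an optional inlet/outlet interface; not Airflow's API.

class CopyFileOperator:
    """An operator that opts in by declaring what it reads and writes."""

    def __init__(self, task_id, src, dst):
        self.task_id = task_id
        self.src = src
        self.dst = dst

    def inlet(self):
        return [self.src]

    def outlet(self):
        return [self.dst]


class LegacyOperator:
    """An operator that does not define the optional interface."""

    def __init__(self, task_id):
        self.task_id = task_id


def collect_lineage(tasks):
    """Build a per-task lineage record, tolerating operators that opt out."""
    lineage = {}
    for task in tasks:
        # Fall back to an empty list when a method is not defined.
        inlet = getattr(task, "inlet", lambda: [])
        outlet = getattr(task, "outlet", lambda: [])
        lineage[task.task_id] = {"inlets": inlet(), "outlets": outlet()}
    return lineage


lineage = collect_lineage([
    CopyFileOperator("copy", src="file:///tmp/in", dst="file:///tmp/out"),
    LegacyOperator("legacy"),
])
print(lineage)
```

Keeping the interface optional means existing operators keep working unchanged and simply report empty lineage until they define the two methods.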

[DISCUSS] Lineage improvements and standardization of operator signatures

2020-01-22 Thread Bolke de Bruin
Dear All, Over the last few weeks I made serious improvements to the lineage support that Airflow has. Whilst not complete, it’s starting to shape up, and I think it is good to share some thoughts and directions. Much has been discussed with several organisations like Polidea, Daily Motion and Lyft. Som