Re: Generic Transfer Operator

2020-09-01 Thread Tomasz Urbaszek
Austin, you are right, Beam covers all (and more) important IOs. However, using Apache Beam to design a generic transfer operator requires Airflow users to have additional resources that will be used as a runner (Spark, Flink, etc.). Unless you suggest using DirectRunner? Can you please tell us mo

Re: [AIP-34] Rewrite SubDagOperator

2020-09-01 Thread Yu Qian
Okay. On one hand, we want to automatically prefix task_id so that users don't have to parametrize task_id themselves inside TaskGroup to maintain task_id uniqueness. On the other hand, we don't want people to be surprised when they introduce TaskGroup to an existing DAG and all of a sudden task_id
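To make the trade-off above concrete, here is a minimal pure-Python sketch of the prefixing idea. It does not use Airflow; the helper names, the `prefix_group_id` flag, and the "." separator are assumptions made for illustration only, not the behaviour agreed in this thread.

```python
def child_task_id(group_id, task_id, prefix_group_id=True):
    """Return the id a task would get inside a group (hypothetical helper).

    The "." separator and the prefix_group_id flag are assumptions.
    """
    if prefix_group_id and group_id:
        return f"{group_id}.{task_id}"
    return task_id


def find_duplicates(task_ids):
    """Return the sorted list of ids that occur more than once."""
    seen, dupes = set(), set()
    for tid in task_ids:
        if tid in seen:
            dupes.add(tid)
        seen.add(tid)
    return sorted(dupes)


# Two groups that both contain a task called "load".
groups = {"section_1": ["extract", "load"], "section_2": ["load"]}

# With prefixing, ids stay unique without the user parametrizing them.
prefixed = [child_task_id(g, t) for g, tasks in groups.items() for t in tasks]

# Without prefixing, existing ids are left untouched but may collide.
unprefixed = [
    child_task_id(g, t, prefix_group_id=False)
    for g, tasks in groups.items()
    for t in tasks
]
```

With prefixing on, the user gets uniqueness for free, but every task_id in an existing DAG changes once a group is introduced. With prefixing off, the ids stay the same and keeping them unique is the user's job.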

Re: Generic Transfer Operator

2020-09-01 Thread Austin Bennett
Are there IOs that would be desired for a generic transfer operator that don't exist in: https://beam.apache.org/documentation/io/built-in/ <- there is pretty solid coverage? Beam is getting to the point where even python beam can leverage the java IOs, which increases the range of IOs (and perfo

Re: Generic Transfer Operator

2020-09-01 Thread Jarek Potiuk
But I believe those two ideas are separate ones as Tomek explained :) On Wed, Sep 2, 2020 at 12:03 AM Jarek Potiuk wrote: > I love the idea of connecting the projects more closely! > > I've been helping recently as a consultant in improving the Apache Beam > build infrastructure (in many parts b

Re: Generic Transfer Operator

2020-09-01 Thread Jarek Potiuk
I love the idea of connecting the projects more closely! I've been helping recently as a consultant in improving the Apache Beam build infrastructure (in many parts based on my Airflow experience and GitHub Actions - even recently they adopted the "cancel" action I developed for Apache Airflow). h

Re: Generic Transfer Operator

2020-09-01 Thread Gerard Casas Saez
Agree on keeping those separate, just intervened as I believe it's a great idea. But let's keep @beam and @spark to a separate thread. Gerard Casas Saez Twitter | Cortex | @casassaez On Tue, Sep 1, 2020 at 2:14 PM Tomasz Urbaszek wrote: > Daniel is right we have f

Re: Generic Transfer Operator

2020-09-01 Thread Tomasz Urbaszek
Daniel is right, we have a few Apache Beam committers at Polidea, so we will ask for advice. However, I would be highly in favor of having it as a @beam decorator, as Gerard suggested. This is something we should put into another AIP together with the mentioned @spark decorator. Our proposition of transf

Re: Generic Transfer Operator

2020-09-01 Thread Kaxil Naik
Nice. Just a note here, we will need to make sure that those "Source" and "Destination" are serializable. On Tue, Sep 1, 2020, 20:00 Daniel Imberman wrote: > Interesting! Beam also could potentially allow transfers within Dask/any > other system with a java/python SDK? I think @jarek and

Re: Generic Transfer Operator

2020-09-01 Thread Daniel Imberman
Interesting! Beam also could potentially allow transfers within Dask/any other system with a java/python SDK? I think @jarek and Polidea do a lot of work with Beam as well so I’d love their thoughts if this is a good use-case.

Re: Generic Transfer Operator

2020-09-01 Thread Gerard Casas Saez
I would be highly in favour of having a generic Beam operator. Similar to @spark_task decorator. Something where you can easily define and wrap a beam pipeline and convert it to an Airflow operator. Gerard Casas Saez Twitter | Cortex | @casassaez On Tue, Sep 1, 202

Re: Generic Transfer Operator

2020-09-01 Thread Austin Bennett
Are you guys familiar with Beam? Especially if not doing transforms, it might be rather straightforward to rely on the ecosystem of connectors in that Apache project as the foundation for a generic transfer operator. On Tue, Sep 1, 2020 at 11:05 AM Jarek Potiuk wrote: >

Re: Generic Transfer Operator

2020-09-01 Thread Jarek Potiuk
+1 On Tue, Sep 1, 2020 at 1:35 PM Kamil Olszewski wrote: > Hello all, > since there have been no new comments shared in the POC doc > <https://docs.google.com/document/d/1o7Ph7RRNqLWkTbe7xkWjb100eFaK1Apjv27LaqHgNkE/edit> > for a couple of days, then I will proceed with creating an AIP for

Re: [AIP-34] Rewrite SubDagOperator

2020-09-01 Thread Gerard Casas Saez
As I mentioned in the issue, I believe prefixing group_id is a nice thing as it makes TaskGroup an equivalent for SubDagOperator. Internally we have a similar concept to TaskGroup called FlattenedSubDagOperator that appends the group_id to the task_id. One of the main usages internally for this ope

Re: Generic Transfer Operator

2020-09-01 Thread Kamil Olszewski
Hello all, since there have been no new comments shared in the POC doc for a couple of days, I will proceed with creating an AIP for this feature, if that is OK with everybody. Best regards, Kamil On Thu, Au

Re: [AIP-34] Rewrite SubDagOperator

2020-09-01 Thread Yu Qian
The vote for this AIP-34 passed. However, there's an interesting discussion going on here reg