Re: [AIP-34] Rewrite SubDagOperator

2020-06-17 Thread Xinbin Huang
The TaskGroup will not take a schedule interval as a parameter itself; it depends on the DAG it is attached to. In my opinion, the TaskGroup will only contain a group of tasks with interdependencies, and the TaskGroup behaves like a task. It doesn't contain any execution/scheduling logic
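
A minimal sketch of the idea described in this message, written in plain Python rather than against any Airflow API. The names `Dag`, `TaskGroup`, `add_task`, and the `group_id.task_id` naming scheme are all hypothetical and chosen only for illustration; the point is that the group stores tasks and their dependencies, while the schedule interval lives only on the DAG it is attached to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Dag:
    """Hypothetical stand-in for a DAG: the only object that owns a schedule."""
    dag_id: str
    schedule_interval: str
    task_ids: List[str] = field(default_factory=list)


@dataclass
class TaskGroup:
    """Hypothetical container of tasks and their dependencies.

    It takes no schedule interval of its own (an assumption matching the
    message above); it only reads the one from the DAG it is attached to.
    """
    group_id: str
    dag: Dag
    # Maps each qualified task id to the qualified ids of its upstream tasks.
    dependencies: Dict[str, List[str]] = field(default_factory=dict)

    def add_task(self, task_id: str, upstream: Sequence[str] = ()) -> str:
        # Prefix with the group id so ids stay unique inside the parent DAG.
        qualified = f"{self.group_id}.{task_id}"
        self.dependencies[qualified] = [f"{self.group_id}.{u}" for u in upstream]
        # Tasks are registered directly on the parent DAG, not on a nested DAG.
        self.dag.task_ids.append(qualified)
        return qualified

    @property
    def schedule_interval(self) -> str:
        # Read-only view of the parent DAG's schedule.
        return self.dag.schedule_interval


dag = Dag(dag_id="example", schedule_interval="@hourly")
group = TaskGroup(group_id="etl", dag=dag)
group.add_task("extract")
group.add_task("load", upstream=["extract"])
```

In this sketch there is no way to give the group a schedule that differs from its DAG, which is the behavior the message argues for.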

Re: [AIP-34] Rewrite SubDagOperator

2020-06-17 Thread 蒋晓峰
Hi Bin, When using TaskGroup, is the schedule interval of the TaskGroup the same as that of the parent DAG? My main concern is whether the schedule interval of the TaskGroup could be different from that of the DAG. For example, consider the scenario where the schedule interval of the DAG is 1 hour and the schedule

Re: [AIP-34] Rewrite SubDagOperator

2020-06-17 Thread Xinbin Huang
Hi Nicholas, I am not sure about the old behavior of SubDagOperator; maybe it will throw an error? But in the original proposal, the subdag's schedule_interval will be ignored. Or, if we decide to use TaskGroup to replace SubDag, there will be no subdag schedule_interval.
Bin
On Wed, Jun 17,

Re: [AIP-34] Rewrite SubDagOperator

2020-06-17 Thread 蒋晓峰
Hi Bin, Thanks for your good proposal. I was unsure whether the schedule interval of a SubDAG can differ from that of the parent DAG. I have discussed the schedule interval of SubDAG with Jiajie Zhong. If the SubDagOperator has a different schedule interval, what will happen for the

Re: [AIP-34] Rewrite SubDagOperator

2020-06-17 Thread Xinbin Huang
Thank you Max, Kaxil, and everyone for the feedback! I have rethought the concept of subdag and task groups. I think the better way to approach this is to remove subdag entirely and introduce the concept of TaskGroup, which is a container of tasks along with their dependencies *without

Re: [AIP-35] Add Signal Based Scheduling To Airflow

2020-06-17 Thread Becket Qin
Hi Kevin and Ash, Thanks for the feedback / comments / suggestions. It is great to know that Airbnb has already been running stream jobs using Airflow and there might be a working solution. First of all, I'd like to say that we are still quite new to Airflow; our proposal likely has

Re: [AIP-34] Rewrite SubDagOperator

2020-06-17 Thread Maxime Beauchemin
+1, proposal looks good. The original intention was really to have task groups and a zoom-in/out in the UI. The original reasoning was to reuse the DAG object since it is a group of tasks, but as highlighted here it does create underlying confusion since a DAG is much more than just a group of

Re: [AIP-35] Add Signal Based Scheduling To Airflow

2020-06-17 Thread Ash Berlin-Taylor
How is the UI going to represent this? How do you navigate between them? There is already the concept of try_number -- can that be used instead of changing task_id to task_ids?
On Jun 17 2020, at 11:24 am, 蒋晓峰 wrote:
> Hi Gerard,
>
> Regarding the question mentioned above, it's a good

Re: [AIP-35] Add Signal Based Scheduling To Airflow

2020-06-17 Thread 蒋晓峰
Hi Gerard, Regarding the question mentioned above, it's a good point. The Operator currently contains a task_id attribute, which is the same as the operator id. Therefore, when a task instance runs multiple times, the task_id would be the same. So we do need to change something more than what is in

Re: [AIP-35] Add Signal Based Scheduling To Airflow

2020-06-17 Thread Ash Berlin-Taylor
I also agree with Kevin - I'd love to see better streaming support in Airflow, but I'm not sure this is the way to go about it. Something about it feels not quite right. And I'm also not a fan of the name -- perhaps just my background and dealing with the scheduler and executor code -- but to me

Re: [AIP-35] Add Signal Based Scheduling To Airflow

2020-06-17 Thread 蒋晓峰
Hi Gerard, If users follow the definition of SignalOperator correctly, the idea for the streaming-triggered-batch case is to restart the execution for evaluations of the online trained model. In other words, once the evaluation operator receives the signal from the online learning operator, the

Re: Preparing RC4 of backport packages

2020-06-17 Thread Jarek Potiuk
I merged the "Transfer" package yesterday. There are a few more checks, and I am working on releasing the (hopefully final) RC of the backport packages. I aim to have it released tomorrow at the latest, with a target date of Monday next week.
On Wed, Jun 10, 2020 at 8:59 PM Jarek Potiuk wrote:
> Agree. I

Re: [VOTE] Moving of the transfer operators to new packages

2020-06-17 Thread Jarek Potiuk
The "transfers" package change was merged last night.
On Sat, Jun 6, 2020 at 9:49 AM Kamil Breguła wrote:
> The vote has now passed
>
> We have 4 '+1' binding votes:
>
> Kamil Breguła
> Tomasz Urbaszek
> Felix Uellendall
> Jarek Potiuk
>
> Issue created

Re: [UPDATE] AIP-31 .output update

2020-06-17 Thread Tomasz Urbaszek
+1 (binding)
On Wed, Jun 17, 2020 at 3:39 AM 蒋晓峰 wrote:
> +1 (not binding)
>
> On Wed, Jun 17, 2020 at 3:03 AM Gerard Casas Saez wrote:
> > Hi everyone,
> >
> > Sending an email here to consolidate an update to the AIP-31 that has
> > happened while we have been implementing this.

Re: [AIP-35] Add Signal Based Scheduling To Airflow

2020-06-17 Thread Kevin Yang
I'm generally supportive of this idea of supporting streaming jobs. We at Airbnb have run stream jobs on Airflow for years, with some hacks of course. Yes, the stream jobs might not be idempotent or otherwise fit in the Airflow paradigm. But I personally would love to see Airflow be