Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-02-25 Thread Ash Berlin-Taylor
I've just published the draft AIP for discussion - please comment on that thread, or on the AIP directly

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-02-23 Thread Ash Berlin-Taylor
That is exactly the approach I am working on an AIP for -- I hope to have it out (at least in draft/discussion form) this week. -ash On Tue, 23 Feb, 2021 at 00:39, Dmitri Khokhlov wrote: ok, how about this: 1) implement "function" approach first. solve all plumbing & ui. 2) then implement "m

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-02-23 Thread Dmitri Khokhlov
ok, how about this: 1) implement "function" approach first. solve all plumbing & ui. 2) then implement "multiple crons" approach using 1). that way users potentially can use 2) as reference implementation to create their own custom extensions based on 1). -- Dmitri On 2021/02/17 15:47:01, Phil Yar

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-02-17 Thread Phil Yardley
Is it possible to offer both (maybe in two releases)? That would allow the user to select the most appropriate for their scenario. My scenario, for example, is easy with multiple crons: Monday - Thursday: run job A at 9pm; Friday: run job A at 8pm. This is easier in cron than writing a python ex
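For comparison, the scenario above could be written either as two standard cron expressions (`0 21 * * 1-4` and `0 20 * * 5`) or as a small Python function. The following is only a sketch of the function form: `RUN_TIMES` and `next_run` are hypothetical names, not part of Airflow, and time zones are ignored.

```python
from datetime import datetime, time, timedelta

# Hypothetical rule table (not an Airflow API): weekday (Monday=0) -> run time.
RUN_TIMES = {0: time(21), 1: time(21), 2: time(21), 3: time(21), 4: time(20)}


def next_run(after: datetime) -> datetime:
    """Return the first scheduled run strictly after `after` (naive datetimes)."""
    day = after.date()
    # Eight consecutive days always contain at least one configured weekday.
    for _ in range(8):
        run_time = RUN_TIMES.get(day.weekday())
        if run_time is not None:
            candidate = datetime.combine(day, run_time)
            if candidate > after:
                return candidate
        day += timedelta(days=1)
    raise RuntimeError("no run time configured")
```

Calling `next_run(datetime(2021, 2, 18, 21, 0))` (a Thursday, exactly at run time) moves on to Friday at 20:00, because the comparison is strict.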

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-24 Thread Elad Kalif
Multiple crons also don't solve all use cases. I encountered a question on Stack Overflow about scheduling a daily DAG except on holidays (like Christmas, Labor Day, Independence Day etc.) On Sun, Jan 24, 2021 at 9:04 AM Jarek Potiuk wrote: > Yep. I agree with Daniel - adding multiple crons is very

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-23 Thread Jarek Potiuk
Yep. I agree with Daniel - adding multiple crons is very difficult to reason about. You can create an arbitrarily complex declarative way of defining a complex schedule that you will have a hard time understanding. We are already entering the realm of programming the schedule, which IMHO is better to do in

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-23 Thread Daniel Imberman
I worry that multiple crons would become difficult to read for stranger use-cases (for example "run on the first trading day after the 15th of the month"). If we create a python function or class we can easily create a "CronTimeTable" that does exactly what Dmitri is suggesting while still leaving
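To illustrate why a function can be easier to read than a set of cron patterns here, the "first trading day after the 15th" rule might look like the sketch below. It assumes that a trading day is any weekday not in a caller-supplied holiday set; the function name and the holiday data are hypothetical, and nothing here is an Airflow API.

```python
from datetime import date, timedelta


def first_trading_day_after_15th(year: int, month: int, holidays=frozenset()) -> date:
    """Return the first weekday strictly after the 15th that is not a holiday."""
    day = date(year, month, 16)
    # weekday() >= 5 means Saturday or Sunday.
    while day.weekday() >= 5 or day in holidays:
        day += timedelta(days=1)
    return day


# Illustrative holiday set (an assumption for this example only).
market_holidays = frozenset({date(2021, 1, 18)})
print(first_trading_day_after_15th(2021, 1, market_holidays))
```

The same rule expressed declaratively would need a calendar of market closures anyway, which is the part cron syntax has no way to carry.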

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-23 Thread Kaxil Naik
I think whatever approach we decide on we should display *next_execution_date* in the webserver for each DAG. This would help most of the users. Regards, Kaxil On Sat, Jan 23, 2021 at 10:25 PM Dmitri Khokhlov wrote: > Root problem: > - existing Airflow schedule syntax defines only one interval

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-23 Thread Dmitri Khokhlov
Root problem: - existing Airflow schedule syntax defines only one interval pattern per DAG - there are use-cases that need multiple interval patterns per DAG (during a day etc) I vote for "crontab list" solution from Deng Xiaodong. Example: *schedule_interval = ["* 0,22,23 * * *", "30 1-21 * * *

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-21 Thread Daniel Imberman
My only concern with tying this to the dag_parsing process is that that process might miss SLAs because it takes too long to loop around. I could imagine that a separate thread or component that can read either TimeTable objects or SmartSensor objects and run them might make sense. Ultimately I don’t

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-21 Thread Vikram Koka
Great discussion. I generally agree with the "Custom scheduling class" / subclass approach which would run as part of the "scheduler" set of processes, rather than an internal DAG approach. I do think it would be good to have boundaries on what information this class would operate on and at what

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-21 Thread Ash Berlin-Taylor
It shouldn't need something that complex (or to my mind hacky) as an internal DAG. The way the scheduler works now it just looks at two columns on the dag (model) table called, I think, "next_dagrun_after" (which is the earliest date that the dag run can be created, and "next execution date" (whi

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-21 Thread Daniel Imberman
I think James's idea sounds pretty good. What would you all think of us doing something similar to how we handle smart sensors for how we implement this? Have an internal DAG that reads all custom timetables and triggers a DAG if the function returns True? Seems like a pretty simple/cu

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread James Timmins
Django provides a really good model for allowing users to customize the behavior of Class Based Views. It's in line w/ what Daniel/Kaxil and co are saying about a consistent backend class. It uses a standard base class as well as a default concrete implementation. Customization then only requires s
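The pattern described above (a standard base class, a default concrete implementation, and customization by overriding one method) might be sketched for schedules as follows. All class names here are hypothetical and chosen for illustration; this is not Django's API or any Airflow interface.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta


class BaseSchedule(ABC):
    """Hypothetical base class: the single hook a custom schedule overrides."""

    @abstractmethod
    def next_run(self, last_run: datetime) -> datetime:
        ...


class IntervalSchedule(BaseSchedule):
    """Default concrete implementation: a fixed interval after the last run."""

    def __init__(self, interval: timedelta):
        self.interval = interval

    def next_run(self, last_run: datetime) -> datetime:
        return last_run + self.interval


class WeekdayOnlySchedule(IntervalSchedule):
    """Customization by subclassing: keep stepping until a weekday is reached."""

    def next_run(self, last_run: datetime) -> datetime:
        candidate = super().next_run(last_run)
        while candidate.weekday() >= 5:  # Saturday or Sunday
            candidate += self.interval
        return candidate
```

With a one-day interval, `WeekdayOnlySchedule(timedelta(days=1)).next_run(datetime(2021, 1, 22, 9))` skips the weekend after that Friday and lands on Monday at 09:00.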

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread James Coder
Kaxil you beat me to it. I actually have a dag where I achieve an irregular schedule by overriding DAG.next_dagrun_info(). If that method were swapped out for an object it may be a semi-easy way to make the schedule “pluggable”. James Coder > On Jan 20, 2021, at 6:37 PM, Kaxil Naik wrote: > >

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Kaxil Naik
"CronBackend" / "ScheduleIntervalBackend" :D similar to XCom and Secrets Backend. It would definitely be good to have custom schedule intervals using functions/classes that are serializable too. On Wed, Jan 20, 2021 at 11:02 PM QP Hou wrote: > On Wed, Jan 20, 2021 at 10:22 AM Daniel Imberman > wrot

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread QP Hou
On Wed, Jan 20, 2021 at 10:22 AM Daniel Imberman wrote: > > I love the idea of allowing users to create their own scheduling > objects/scheduling python functions. They could either live in the scheduler > or as a separate process that trips some value in the DB when it is “true”. > Would be gr

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Ash Berlin-Taylor
This is all just to say: I think this one specific use case is already achievable. On 20 January 2021 19:36:54 GMT, Ash Berlin-Taylor wrote: >I'll have to test, but my gut says that isn't how it works, instead next >schedule = previous schedule + interval. > >On 20 January 2021 19:25:50 GMT, Elad

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Ash Berlin-Taylor
I'll have to test, but my gut says that isn't how it works, instead next schedule = previous schedule + interval. On 20 January 2021 19:25:50 GMT, Elad Kalif wrote: >Sorry ignore my explanation. It's wrong. > > https://github.com/apache/airflow/issues/8649 explains why >`schedule_interval=timed

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Elad Kalif
Sorry, ignore my explanation. It's wrong. https://github.com/apache/airflow/issues/8649 explains why `schedule_interval=timedelta(days=14), start_date=...` is not exactly the same as bi-weekly for an exact specified time. It also provides an example for daily. On Wed, Jan 20, 2021 at 9:19 PM

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Elad Kalif
> `schedule_interval=timedelta(days=14), start_date=...`. Ash, from what I've seen it doesn't give exactly the same functionality. This is what I observed: cron = schedule exactly on regardless of what happened with previous run (mostly); timedelta() = schedule after delta passed. However it's from end

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Kaxil Naik
> `schedule_interval=timedelta(days=14), start_date=...`. That won't support every second Thursday for example On Wed, Jan 20, 2021 at 6:54 PM Ash Berlin-Taylor wrote: > Running exactly every two weeks can be done by setting > `schedule_interval=timedelta(days=14), start_date=...`. > > Does th

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Ash Berlin-Taylor
Running exactly every two weeks can be done by setting `schedule_interval=timedelta(days=14), start_date=...`. Does this do what you need, Elad? On 20 January 2021 18:12:36 GMT, Elad Kalif wrote: >>> In the example of a twice-a-month dag (not sure if you have this use >case too?) what do you
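As a plain-Python illustration of the arithmetic behind a fixed 14-day interval (next run = previous run + interval), the sketch below generates successive run times from a start date. It is not Airflow's scheduler code, the helper name is hypothetical, and it uses naive datetimes, so daylight-saving shifts are ignored.

```python
from datetime import datetime, timedelta
from itertools import islice


def fixed_interval_runs(start: datetime, interval: timedelta):
    """Yield start, start + interval, start + 2 * interval, ... indefinitely."""
    current = start
    while True:
        yield current
        current += interval


# Illustrative start date: Thursday 21 January 2021 at 09:00.
runs = list(islice(fixed_interval_runs(datetime(2021, 1, 21, 9, 0), timedelta(days=14)), 3))
for run in runs:
    print(run.isoformat(), run.strftime("%A"))
```

Because 14 days is a whole number of weeks, every generated run falls on the same weekday and clock time as the start date. A rule such as "the second Thursday of each month" is a different schedule and cannot be produced by a constant interval.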

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Jarek Potiuk
On Wed, Jan 20, 2021 at 7:28 PM Daniel Imberman wrote: > @Jarek the problem with just cron is I don’t think cron can handle “every > third Thursday” or “the next open market day after the 15th.” I think we > need something more flexible than just cron (though agree that cron syntax > can get a fa

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Daniel Imberman
@Jarek the problem with just cron is I don’t think cron can handle “every third Thursday” or “the next open market day after the 15th.” I think we need something more flexible than just cron (though agree that cron syntax can get a fair bit of mileage) On Wed, Jan 20, 2021 at 10:24 AM, Jarek Po

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Jarek Potiuk
My thoughts (no final solution in mind, just wild thoughts): 1) I think we should add support for regular CRON behaviour. Simply "cron" schedule for dag execution, without the "data interval" rhetoric. There are a number of good cases where Airflow can be used as just a scheduler to run the jobs.

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Daniel Imberman
I love the idea of allowing users to create their own scheduling objects/scheduling python functions. They could either live in the scheduler or as a separate process that trips some value in the DB when it is “true”. Would be great from a “marketplace” standpoint as well as users could post the

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Elad Kalif
>> In the example of a twice-a-month dag (not sure if you have this use case too?) what do you expect the "data interval" (i.e. execution_date) to be? Yes, we have this use case too. The execution date does matter because I want it to be bi-weekly starting from a specific day and time, so with the cu

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Deng Xiaodong
A quick thought (*maybe not making sense*): if *schedule_interval* accepts a list of values, we may support much higher complexity. For example, I may want to schedule my jobs at 04:05 AND 02:31 every day, which cannot be expressed by a single cron pattern. Then I may want to have *schedule_inter
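One way to picture a list-valued schedule is that each entry proposes its own next run and the earliest proposal wins. The sketch below does this for the 04:05 and 02:31 example using plain times of day rather than cron strings, to avoid depending on a cron parser. `DAILY_TIMES` and `next_run` are hypothetical names, and this is not how Airflow evaluates `schedule_interval`.

```python
from datetime import datetime, time, timedelta

# Hypothetical list-valued schedule: run every day at each of these times.
DAILY_TIMES = [time(4, 5), time(2, 31)]


def next_run(after: datetime, times=DAILY_TIMES) -> datetime:
    """Return the earliest run strictly after `after` across all daily times."""
    candidates = []
    for t in times:
        candidate = datetime.combine(after.date(), t)
        if candidate <= after:
            # Today's slot has already passed, so use tomorrow's.
            candidate += timedelta(days=1)
        candidates.append(candidate)
    return min(candidates)
```

For example, `next_run(datetime(2021, 1, 20, 3, 0))` picks 04:05 on the same day, while calling it at exactly 04:05 rolls over to 02:31 on the next day.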

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Ash Berlin-Taylor
Yes, we quite possibly could do this -- I'm trying to work out what the needs are here. In the example of a twice-a-month dag (not sure if you have this use case too?) what do you expect the "data interval" (i.e. execution_date) to be? Or for this case does it not matter? -ash On Wed,

Re: Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Elad Kalif
Another case that is mentioned in one of the issues is the ability to schedule a bi-weekly job (equivalent of bi-weekly meeting that you can set in a calendar) which is very much needed. Maybe this is unrealistic but I think the game changer is if it would be possible to let the users define their

Scoping out a new feature for 2.1: improving schedule_interval

2021-01-20 Thread Ash Berlin-Taylor
Hi everyone, I'd like to (re)start the discussion about a new feature I'd like to add for Airflow 2.1, that I am loosely calling "improving schedule_interval" (catchy name I know!) I have two main high-level goals in mind here: 1. To reduce the confusion around execution_date (specifically t