Here is another use case for dynamic scheduling,
from the original link: https://github.com/apache/airflow/issues/10449
Description
There are several questions (issues) on Stack Overflow asking for a
dynamic schedule interval. This means the ability to change
the schedule interval
I think there is a reason this is an often discussed topic, because it is
something that needs to be addressed. Yes, there are ways to achieve the
end goal such as triggering one dag with another as Qian suggests, or
adding short circuit/branching operators and custom macros to many of your
dags
My understanding here is that some users just want to trigger a DAG at the
time they prefer rather than always at the end of schedule_interval? I saw
two use cases in the thread:
1. The use case from Daniel wants to trigger at the start of schedule_interval
2. The use case from Shaw has some holidays
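To make the two trigger points concrete, here is a minimal sketch in plain Python with no Airflow imports. The function names `run_time_at_interval_end` and `run_time_at_interval_start` are hypothetical and exist only for illustration. The sketch assumes a fixed `timedelta` interval and the behaviour described in this thread, where a run is triggered at the end of its schedule interval.

```python
from datetime import datetime, timedelta


def run_time_at_interval_end(execution_date: datetime, interval: timedelta) -> datetime:
    """Behaviour described in the thread: the run stamped with execution_date
    is triggered only once the interval starting at that date is over."""
    return execution_date + interval


def run_time_at_interval_start(execution_date: datetime, interval: timedelta) -> datetime:
    """Requested alternative: trigger as soon as the interval begins."""
    return execution_date


daily = timedelta(days=1)
stamp = datetime(2020, 5, 6)

print(run_time_at_interval_end(stamp, daily))    # 2020-05-07 00:00:00
print(run_time_at_interval_start(stamp, daily))  # 2020-05-06 00:00:00
```

With a daily interval, the same execution date maps to wall-clock trigger times one full day apart, which is the gap the two use cases are reacting to.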
My 2c: adding an option to schedule at the start of an interval is yet another
option to take in, adding more complexity. Therefore I’m not in favour of it.
The scheduling at start/end has often been discussed; IMO it's a fact which
one has to know when learning Airflow. But not something
I strongly agree with Ash, I also think we should strive to decrease the
complexity of core Airflow components and not offer
customization/extensibility especially in the form of plugins where it is
not needed to make Airflow more robust and easier to reason about (less
testing configuration). I
> Ash, you had mentioned something about some plans that were in conflict
> with the above hack could you maybe share a thought or two about what
> you were thinking?
>
The main thing here is around scheduler HA, and I want the scheduler to
be able to make all scheduling decisions without
I like it too. Perhaps throw in a start of schedule interval Schedule object in
the package too.
James Coder
> On May 10, 2020, at 4:24 PM, Kaxil Naik wrote:
>
> I like that idea Daniel of having a Schedule abstraction.
>
>> On Wed, May 6, 2020 at 7:22 PM Daniel Standish wrote:
>>
>>
I like that idea Daniel of having a Schedule abstraction.
On Wed, May 6, 2020 at 7:22 PM Daniel Standish wrote:
> Inspired by James, I tried this out...
>
> For others interested, here is sample dag to test it out:
>
> class MyDAG(DAG):
>     def following_schedule(self, dttm):
>         pen_dt
Inspired by James, I tried this out...
For others interested, here is sample dag to test it out:
class MyDAG(DAG):
    def following_schedule(self, dttm):
        pen_dt = pendulum.instance(dttm).replace(second=0, microsecond=0)
        minutes = pen_dt.minute
        minutes_mod = minutes % 10
I just wanted to share this with everyone related to this topic. I have a
case where I need a scheduled dag run with following schedule:
Sun: 10 PM
Monday-Thursday: 8 AM, 11 AM, 3:40 PM, 4:00 PM, 7:30 PM, 10 PM
Friday: 8 AM, 11 AM, 3:40 PM, 4:00 PM, 7:30 PM
There are most certainly a few ways to
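One way to picture the schedule above is as a lookup table plus a "next run" function. The following is only a sketch in plain Python, not Airflow code: the names `RUN_TIMES` and `next_run` are made up for illustration, the datetimes are naive (no timezone handling), and Saturday is assumed to have no runs because none are listed.

```python
from datetime import datetime, time, timedelta

# Times shared by Monday through Friday in the schedule above.
WEEKDAY_TIMES = [time(8, 0), time(11, 0), time(15, 40), time(16, 0), time(19, 30)]

# Keys follow datetime.weekday(): Monday is 0, Sunday is 6.
RUN_TIMES = {
    0: WEEKDAY_TIMES + [time(22, 0)],
    1: WEEKDAY_TIMES + [time(22, 0)],
    2: WEEKDAY_TIMES + [time(22, 0)],
    3: WEEKDAY_TIMES + [time(22, 0)],
    4: WEEKDAY_TIMES,       # Friday: no 10 PM run
    5: [],                  # Saturday: assumed to have no runs
    6: [time(22, 0)],       # Sunday: 10 PM only
}


def next_run(after: datetime) -> datetime:
    """Return the first scheduled run strictly after `after`."""
    for offset in range(8):  # today plus the following seven days
        day = (after + timedelta(days=offset)).date()
        for run_time in RUN_TIMES[day.weekday()]:
            candidate = datetime.combine(day, run_time)
            if candidate > after:
                return candidate
    raise RuntimeError("schedule table has no runs")


# Friday 2020-05-08 after the last run: the next one is Sunday at 10 PM.
print(next_run(datetime(2020, 5, 8, 19, 30)))  # 2020-05-10 22:00:00
```

A table like this is easy to read and test, but it says nothing about how the scheduler would consume it; that part is what the rest of the thread debates.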
I think that will lead to a very large number of questions about why it worked
before and now it doesn’t when doing a clean install.
And additionally, if developing in a new install and deploying to an old
install, you would get different behavior, adding to the confusion.
James Coder
> On
I definitely agree. If we don't update it in 2.0 it is going to be hard to
change that in any 2.x versions
On Thu, Sep 26, 2019 at 10:51 AM James Meickle
wrote:
> I am *strongly* in favor of using the 2.0 update to break compat here,
> because this is a very confusing feature to most new users
I am *strongly* in favor of using the 2.0 update to break compat here,
because this is a very confusing feature to most new users of Airflow, but
also will break a _lot_ of DAGs. I feel like if we don't change this in 2.0
we probably won't for any 2.x either, which would be a shame.
On Wed, Sep
I agree with Dan to change the default execution at start of the interval.
How about adding this for 2.0?
Don't want to keep delaying this if we have a consensus already.
Regards,
Kaxil
On Fri, Aug 23, 2019, 15:39 Dan Davydov
wrote:
> What are people's feelings on changing the default
Oh clever! But...
I'd like us to not make this the "official" way just yet as I'm considering
changing how the scheduler works when it comes to executing the code (part of
the larger DAG serialisation effort, where I'd like to stop the scheduler
executing python code on every loop.) - and
For my problem, and the one mentioned earlier for those of us in the financial
world dealing with holidays this could be a solid solution.
For my example below you could derive DAG and add a max_interval property that
is a timedelta and if the delta between dttm and the value coming out of
Just had a thought and looked a tiny bit at the source code to assess
feasibility, but it seems like you could just derive the DAG class and
override `previous_schedule` and `following_schedule` methods. The
signature of both is you get a `datetime.datetime` and have to return
another. It's pretty
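As a rough illustration of the "datetime in, datetime out" shape described above, here is a sketch of a pair of functions for a daily schedule that skips weekends and holidays. They are written as standalone functions so the example runs without Airflow; in the approach described in this thread they would be methods on a class derived from DAG. The `HOLIDAYS` set and the `_is_business_day` helper are hypothetical, and the dates in it are placeholders rather than a real calendar.

```python
from datetime import date, datetime, timedelta

# Hypothetical holiday calendar, for illustration only.
HOLIDAYS = {date(2019, 12, 25), date(2020, 1, 1)}


def _is_business_day(day: date) -> bool:
    """Monday to Friday, excluding the dates listed in HOLIDAYS."""
    return day.weekday() < 5 and day not in HOLIDAYS


def following_schedule(dttm: datetime) -> datetime:
    """Same time of day on the next business day after dttm."""
    candidate = dttm + timedelta(days=1)
    while not _is_business_day(candidate.date()):
        candidate += timedelta(days=1)
    return candidate


def previous_schedule(dttm: datetime) -> datetime:
    """Same time of day on the closest business day before dttm."""
    candidate = dttm - timedelta(days=1)
    while not _is_business_day(candidate.date()):
        candidate -= timedelta(days=1)
    return candidate


# Tuesday 2019-12-24: the 25th is in HOLIDAYS, so the next run is the 26th.
print(following_schedule(datetime(2019, 12, 24, 8)))  # 2019-12-26 08:00:00
```

The two functions are kept symmetric so that stepping forward and then backward from a business day returns to the starting point.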
Re:
> For example, if I need to run a DAG every 20 minutes between 8 AM and 4
> PM...
This makes a lot of sense! Thank you for providing this example. My
initial thought of course is "well can't you just set it to run */20
between 7:40am and 3:40pm," but I don't think that is possible in
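For reference, the quoted example (every 20 minutes between 8 AM and 4 PM) is straightforward to express as a plain function, even if it is awkward as a single schedule expression. This is a sketch only: `next_run_in_window`, `STEP`, `WINDOW_START` and `WINDOW_END` are illustrative names, the datetimes are naive, and the function returns wall-clock run times without applying the 20-minute shift to 7:40 am that the end-of-interval convention would call for.

```python
from datetime import datetime, time, timedelta

STEP = timedelta(minutes=20)
WINDOW_START = time(8, 0)
WINDOW_END = time(16, 0)  # last run of the day, inclusive


def next_run_in_window(dttm: datetime) -> datetime:
    """First 20-minute slot strictly after dttm that falls inside the window."""
    day_start = datetime.combine(dttm.date(), WINDOW_START)
    day_end = datetime.combine(dttm.date(), WINDOW_END)
    if dttm < day_start:
        return day_start
    # Whole steps elapsed since the window opened, plus one to move forward.
    steps = (dttm - day_start) // STEP + 1
    candidate = day_start + steps * STEP
    if candidate <= day_end:
        return candidate
    # Past the last slot: roll over to the next day's first slot.
    return day_start + timedelta(days=1)


print(next_run_in_window(datetime(2019, 8, 23, 15, 40)))  # 2019-08-23 16:00:00
print(next_run_in_window(datetime(2019, 8, 23, 16, 0)))   # 2019-08-24 08:00:00
```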
I can’t see how adding a property to Dagrun that is essentially
identical to next_execution_date would add any benefit. The way I see
it the issue at hand here is not the availability of dates. There are
plenty of options in the template context for dates before and after
execution date. My view
What if we merely add a property "run_date" to DagRun? At present
this would be essentially the same as "next_execution_date".
Then no change to scheduler would be required, and no new dag parameter or
config. Perhaps you could add a toggle to the DAGs UI view that lets you
choose whether to
Totally agree with Daniel here. I think that if we implement this feature
as proposed, it will actively discourage us from implementing a better
data-aware feature that would remain invisible to most users while neatly
addressing a lot of edge cases that currently require really ugly hacks. I
How about an alternative approach that would introduce 2 new keyword
arguments that are clear (something like, but maybe better than
`period_start_dttm`, `period_end_dttm`) and leave `execution_date`
unchanged, but plan its deprecation. As a first step `execution_date`
would be inferred from the
Execution date is execution date for a dag run no matter what. There is no end
interval or start interval for a dag run. The only time this is relevant is
when we calculate the next or previous dagrun.
So I don't think Daniel's rationale makes sense (?)
Sent from my iPhone
> On 27 Aug 2019, at
I agree with Daniel's rationale but I am also worried about backwards
compatibility as this would perhaps be the most disruptive breaking change
possible. I think maybe we should write down the different options
available to us (AIP?) and call for a vote. What does everyone think?
On Tue, Aug 27,
Can't execution date already mean different things depending on if the
dag run was initiated via the scheduler or manually via command line/API?
I agree that making it consistent might make it easier to explain to new
users, but should we exchange that for breaking pretty much every existing
>
> To Daniel’s concerns, I would argue this is not a change to what a dag run
> is, it is rather a change to WHEN that dag run will be scheduled.
Execution date is part of the definition of a dag_run; it is uniquely
identified by an execution_date and dag_id.
When someone asks what is a
Re
> What are people's feelings on changing the default execution to schedule
> interval start
and
> I'm in favor of doing that, but then exposing new variables of
> "interval_start" and "interval_end", etc. so that people write
> clearer-looking at-a-glance DAGs
While I am def on board with
I'm in favor of doing that, but then exposing new variables of
"interval_start" and "interval_end", etc. so that people write
clearer-looking at-a-glance DAGs
On Fri, Aug 23, 2019 at 10:39 AM Dan Davydov
wrote:
> What are people's feelings on changing the default execution to schedule
>
What are people's feelings on changing the default execution to schedule
interval start and communicating this to existing users in the Updating
notes so that they can preserve the old behavior? Could potentially cause
headaches for users who don't read the notes but I think it might make
sense to
I am for this change, since I feel like in general the start of the
interval is more intuitive (I have been working on Airflow for 3 years and
this still trips me up). That being said I'm not sure how I feel about
allowing customization at DAG level instead of cluster level as it makes it
harder
DST: I recall problems with DST especially when the hour goes back and the
daily schedule time technically occurs twice the same day or does not occur
at all. We have some code that arbitrarily chooses the first occurrence in the
latter case (there was a problem that it worked differently in Python 3.6
This is a change to one of Airflow's core concepts, and it would require a
lot of work for existing DAGs to cut over to it. Given that, my personal
preference would be to allow arbitrary customization rather than just a bit
toggle. Such as allowing passing in a mapping function: given an
Changing mid-flight is always a massive edge case already for many parts of the
scheduler. Can we easily test this sort of behaviour in unit tests?
I don't think DST needs extra tests as it uses the existing functions that are
already well tested, no?
-a
> On 23 Aug 2019, at 13:24, Jarek
Happy for it as well. There are a number of cases where scheduling at start
makes more sense and as we see Airflow is used now in multiple cases where
there is no need to process data from an interval and wait until that data
is ready.
But indeed some more tests would be great - especially for
Happy for this feature to be merged
On Fri, Aug 23, 2019, 11:49 Ash Berlin-Taylor wrote:
> This has come up a few times before, someone has now opened a PR that
> makes this a global+per-dag setting:
> https://github.com/apache/airflow/pull/5787 and it also includes docs
> that I think do a good
Looks good. Two things I'm a bit concerned about are:
1. What happens if you changed the choice for the dag mid air?
2. Tests seem a bit light
Cheers
Bolke
Sent from my iPad
> On 23 Aug 2019, at 12:49, Ash Berlin-Taylor wrote:
>
> This has come up a few times
This has come up a few times before, someone has now opened a PR that makes
this a global+per-dag setting: https://github.com/apache/airflow/pull/5787 and
it also includes docs that I think do a good job of illustrating the two
modes.
Does anyone object to this being merged? If no one says