On Wed, 18 May 2022 at 17:18, Ash Berlin-Taylor wrote:
> Start date also makes sense for a cron-based dag with catch-up too though...
True.
So,
1. A timedelta without a `start_date` is not wrong, but it'll use
midnight as the reference time (I think this is better than "date
first added"
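To make the "midnight as the reference time" idea concrete, here is a minimal sketch of how the next run of a timedelta schedule could be computed when no `start_date` is given. The helper `next_run_after` is hypothetical and is not Airflow's scheduler or timetable code. It assumes the reference point falls back to midnight of the current day, in the timezone of the `now` value passed in.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_run_after(
    now: datetime,
    interval: timedelta,
    start_date: Optional[datetime] = None,
) -> datetime:
    """Return the first scheduled time strictly after `now`.

    Hypothetical helper for illustration only. Runs are laid out as
    anchor + k * interval, where the anchor is `start_date` if given,
    otherwise midnight of the day `now` falls on (an assumption).
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")

    # Assumed fallback: midnight of the current day as the reference time.
    anchor = start_date
    if anchor is None:
        anchor = now.replace(hour=0, minute=0, second=0, microsecond=0)

    # Nothing is due before the anchor itself.
    if now < anchor:
        return anchor

    # Whole intervals already elapsed since the anchor, plus one.
    steps = (now - anchor) // interval + 1
    return anchor + steps * interval


now = datetime(2022, 5, 18, 17, 18, tzinfo=timezone.utc)
print(next_run_after(now, timedelta(hours=6)))
print(next_run_after(now, timedelta(days=1)))
```

With this anchor the result depends only on the interval and the wall clock, not on when the DAG was first deployed, which is the determinism argument made in this thread.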
Start date also makes sense for a cron-based dag with catch-up too though...
On 18 May 2022 16:58:54 BST, Malthe wrote:
>On Sat, 14 May 2022 at 11:21, Bas Harenslak wrote:
>> I think we have the following options when no start_date is given:
>>
>> 1. schedule_interval is alias e.g. “@daily”
On Sat, 14 May 2022 at 11:21, Bas Harenslak wrote:
> I think we have the following options when no start_date is given:
>
> 1. schedule_interval is alias e.g. “@daily” —> is a cron expression
> internally (0 0 * * *), so run at 00:00
> 2. schedule_interval is cron e.g. “0 0 * * *” —> cron
Not in favour of a special marker because that’s essentially what start_date is
for. Say somebody has a schedule_interval=timedelta(days=1) and wants their DAG
to run at 00:00 without having to think of a specific start date, then they’d
have to set start_date="random date and time 00:00" and
"starts whenever you first deploy it", this makes dags nondeterministic. It
is true that currently it is very hard to achieve this. Maybe we could use
a special start_date marker to indicate this behavior so that users can be
very aware of what they are doing.
There is also another case where
I disagree, start_date is None and catchup=True still describes a useful
behavior that’s currently difficult to achieve in Airflow: a DAG that
starts whenever you first deploy it and then catches up missed runs if you
pause and unpause it or have downtime.
On Thu, May 12, 2022 at 5:49 AM Jarek
Yeah. Maybe simply start_date should only be required when catchup=True
then? Sounds like it might correctly reflect the intention of
catchup=True, while bringing a very solid semantic for explicit start_date.
J.
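The rule proposed above could be sketched as a simple argument check. The function `check_start_date` is a hypothetical name used only for illustration. It is not an existing Airflow API, and it encodes only this one proposal, not the alternatives raised elsewhere in the thread.

```python
from datetime import datetime
from typing import Optional


def check_start_date(start_date: Optional[datetime], catchup: bool) -> None:
    """Raise if catchup is requested without an explicit start_date.

    Hypothetical validation for illustration: start_date stays optional
    unless catchup=True, where it bounds how far back runs are created.
    """
    if catchup and start_date is None:
        raise ValueError("start_date is required when catchup=True")


# Allowed: no catch-up, so no start_date is needed.
check_start_date(None, catchup=False)

# Allowed: catch-up with an explicit lower bound.
check_start_date(datetime(2022, 5, 1), catchup=True)
```

Under this sketch, calling `check_start_date(None, catchup=True)` raises `ValueError`.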
On Tue, May 10, 2022 at 11:14 PM Ping Zhang wrote:
> I agree that for the
I agree that for the crontab interval with `catchup=False`, the start_date
does not make sense. However, the start_date is still very useful when
having catchup=True, whose default value is `True`,
https://github.com/apache/airflow/blob/main/airflow/config_templates/default_airflow.cfg#L989.
If
Coincidentally, this discussion, just started in GitHub Discussions, has
a clear use case where omitting start_date makes perfect sense:
https://github.com/apache/airflow/discussions/23594
On Mon, May 9, 2022 at 4:01 PM Bas Harenslak
wrote:
> I never understood the requirement for start_date
I never understood the requirement for start_date — 99% of the use cases simply
want to start from the time the DAG is first added and do not explicitly need
to start on a certain date. There is certainly a use case for start_date, but
defaulting to None would make more sense IMO, and we could
I think the only real need for start_date is the "catchup=True".
I think start_date is really part of the metadata of the DAG - that is
really useful in order to determine range of backfill for example. So it's
more an intention of the DAG author to describe when we actually want the
DAG lifecycle
There's been some prior discussion on removing the requirement for a
DAG without a schedule:
- https://issues.apache.org/jira/browse/AIRFLOW-3739
- https://github.com/apache/airflow/pull/5423
But why actually have the requirement at all?
The documentation isn't particularly clear on why we need