Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-04-24 Thread Constance Martineau
I agree with Elad. If I pause a DAG, I expect whatever is running to
continue and the scheduler to stop scheduling new dag runs and new tasks. I
don't think we should change the behaviour to fix a chart. That feels
backwards.

Separately: in my view, the choice between preventing and running new tasks
from the same dag run after a dag is paused could have gone either way (if
you consider the job to be the entire dag run, then letting the job finish
feels the most ideal), but I can see why the choice was made and at this
point think it's what people expect.
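
To make the two options above concrete, here is a minimal sketch in plain Python. It is a toy model and not Airflow's scheduler code; `PausePolicy`, `tasks_to_start` and `may_create_new_run` are hypothetical names introduced only for illustration.

```python
from enum import Enum


class PausePolicy(Enum):
    # Assumed labels for the two options discussed above (not Airflow names).
    STOP_NEW_TASKS = "stop_new_tasks"        # no new tasks once the dag is paused
    FINISH_ACTIVE_RUN = "finish_active_run"  # let the already-started run finish


def tasks_to_start(ready_tasks, dag_paused, policy):
    """Return the ready tasks of an already-running dag run that may be started."""
    if dag_paused and policy is PausePolicy.STOP_NEW_TASKS:
        return []
    return list(ready_tasks)


def may_create_new_run(dag_paused):
    """Under either policy, a paused dag gets no new dag runs."""
    return not dag_paused
```

In this model, tasks that are already running are never touched by a pause; only the decision to start further work changes.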


Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-04-24 Thread Elad Kalif
I am -1 on changing this, especially just to fix the duration calculation.
The current behavior is key for the drain use case, which is very useful.

I don't think I will change my -1 before
https://github.com/apache/airflow/issues/22006 is resolved.



Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-04-24 Thread Brent Bovenzi
Yeah, if we do a similar endpoint we should filter it to only include
unpaused Dags. We do check if the dag is paused during auto refresh in a
lot of places.


Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-04-18 Thread Pedro Nunes Leal

Hello,

Any update on the use of duration in the 3.0 UI?

Maybe this bug isn't really an issue if cluster activity was removed in
the newer version, and it's just something to keep in mind in case
something similar to cluster activity is implemented in the 3.0 UI.

From what I understand, the current behavior of staying in the running
state with the duration increasing is what is expected from the pause
functionality.


-
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org



Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-04-03 Thread Brent Bovenzi
The issue is that duration is based off of start and end dates. If there is
no end date we usually default to now. But that is misleading when a dag
run is running but the dag is paused.
Let me take a look at where we use duration in the 3.0 UI and see if we can
reduce that confusion. We don't have the "5 longest dag runs" in our new
dashboard page, which replaces cluster activity. If we wanted that feature
again, we should be mindful of this and filter out paused dags in the API
request.
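
As a rough illustration of the behaviour described above, here is a minimal sketch in plain Python. It is not the Airflow UI or API code; `run_duration` and its parameters are hypothetical names used only to show the "default to now" effect.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def run_duration(start: datetime, end: Optional[datetime], now: datetime) -> timedelta:
    """Duration of a dag run; a run without an end date is measured up to `now`."""
    effective_end = end if end is not None else now
    return effective_end - start


start = datetime(2025, 4, 1, 12, 0, tzinfo=timezone.utc)

# A finished run has a fixed duration, no matter when it is displayed.
finished = run_duration(start, start + timedelta(minutes=30), now=start + timedelta(days=2))

# A run still in "running" while its dag is paused has no end date, so the
# displayed duration depends only on when the page is rendered.
after_one_hour = run_duration(start, None, now=start + timedelta(hours=1))
after_two_days = run_duration(start, None, now=start + timedelta(days=2))
```

The last two values differ only because `now` moved, which is why the number looks misleading for a run whose dag is paused.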




Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-04-03 Thread Pedro Nunes Leal

That could be a better approach.

However, if I'm not mistaken, the code related to the cluster activity
page doesn't exist in Airflow 3 (the version where I'm trying to make the
changes).

So what should I do in this case?
Is there any other way to solve this problem that does not involve
cluster activity?

Changing the state to queued instead of failed was my original proposal,
and it really pauses the DAG.
This is the type of solution I was thinking of because, as I said before in
the pull request, I feel that the cluster activity behavior is just a
symptom of a bigger problem (the DAGs don't really pause, they just
keep running).





Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-03-31 Thread Jens Scheffler

Hi,

thanks for working on the bug and raising a PR to fix it.

As other committers have also commented, from a product point of view I'd
expect a different resolution. We use "Pause DAG" in most cases for
administrative or infrastructure problems, to prevent further failures
and/or to drain infra in order to switch some backend.

I assume that when we pause a long-running DAG that is in between task
executions, we really want to "pause" scheduling; we don't want to set it
to failed. That would also not be correct, because once we un-pause, the
running DAGs should continue to work. I see no reason for marking this as
failed and then manually chasing after it to reset the state later.

My view on this, as also proposed in the discussion of the bug, is that
we should rather filter paused DAGs out of cluster activity reporting,
so that paused DAGs are not reported with excessive runtime. Also, if the
DAG is later un-paused, it would be "right" that the overall DAG runtime
was longer than normal (I would not expect the paused time to be deducted
from the runtime of the DAG).
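
A minimal sketch of that kind of filtering, in plain Python over in-memory rows. It is not the actual cluster activity query; `RunRow`, `longest_runs` and the sample dag ids are hypothetical and only illustrate excluding paused DAGs before ranking by duration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RunRow:
    dag_id: str
    duration: timedelta
    dag_is_paused: bool


def longest_runs(rows, limit=5, include_paused=False):
    """Return up to `limit` rows by descending duration, skipping paused dags by default."""
    candidates = [r for r in rows if include_paused or not r.dag_is_paused]
    return sorted(candidates, key=lambda r: r.duration, reverse=True)[:limit]


rows = [
    RunRow("etl_daily", timedelta(minutes=42), dag_is_paused=False),
    RunRow("drained_dag", timedelta(days=3), dag_is_paused=True),
    RunRow("reporting", timedelta(minutes=5), dag_is_paused=False),
]

# The paused dag's ever-growing run no longer dominates the report.
top = [r.dag_id for r in longest_runs(rows)]
```

The run itself is left untouched, so it keeps its state and can continue once the DAG is un-paused.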

If I (as operator/admin) really want to terminate existing running
instances, I'd rather go through Browse -> DAG Runs, filter for running
runs with the paused DAG id, and mark them as failed explicitly.

Jens

On 31.03.25 20:50, Pedro Nunes Leal wrote:

> Hello everyone,
>
> Currently, I'm trying to fix this bug:
> https://github.com/apache/airflow/issues/3
>
> Basically, the issue is that the DAGs would be stuck on running even
> though they were paused.
> Consequently, the duration of the dag run will keep on increasing even
> though the DAG is paused.
>
> My proposal to solve this problem is changing the DAGs state from
> running to failed, when paused, to avoid the increment of their duration.
>
> Since this can be an impactful change, I would like to hear what
> others think about it.
>
> Link for the Pull Request: https://github.com/apache/airflow/pull/47557





Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.

2025-03-31 Thread Daniel Standish
I concur with Jens. Pause should just be pause.


