Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
I agree with Elad. If I pause a DAG, I expect whatever is running to continue and the scheduler to stop scheduling dag runs and new tasks. I don't think we should change the behaviour to fix a chart. That feels backwards.

Separately: in my view, choosing whether to prevent or run new tasks from the same dag run after a dag is paused could have gone either way (if you consider the job to be the entire dag run, then letting the job finish feels the most ideal), but I can see why the choice was made, and at this point I think it's what people expect.

On Thu, Apr 24, 2025 at 2:09 PM Elad Kalif wrote:

> I am -1 on changing this, especially just to solve duration calculation.
> The current behavior is key for the drain use case, which is very useful.
>
> I don't think I will change my -1 before
> https://github.com/apache/airflow/issues/22006 is resolved.
Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
I am -1 on changing this, especially just to solve duration calculation. The current behavior is key for the drain use case, which is very useful.

I don't think I will change my -1 before https://github.com/apache/airflow/issues/22006 is resolved.

On Thu, Apr 24, 2025 at 7:56 PM Brent Bovenzi wrote:

> Yeah, if we do a similar endpoint we should filter it to only include
> unpaused Dags. We do check if the dag is paused during auto refresh in a
> lot of places.
Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
Yeah, if we do a similar endpoint we should filter it to only include unpaused Dags. We do check if the dag is paused during auto refresh in a lot of places.

On Fri, Apr 18, 2025 at 3:44 PM Pedro Nunes Leal wrote:

> Hello,
>
> Any update on the use of duration in the 3.0 UI?
>
> Maybe this bug isn't really an issue if cluster activity was removed in
> the newer version, and it's just something to keep in mind in case
> something similar to cluster activity is implemented in the 3.0 UI.
>
> From what I understand, the current behavior, where the run stays in
> running and the duration keeps increasing, is what is expected from the
> pause functionality.
Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
On 2025-04-03 19:28, Brent Bovenzi wrote:

> The issue is that duration is based on start and end dates. If there is
> no end date, we usually default to now. But that is misleading when a dag
> run is running but the dag is paused.
>
> Let me take a look at where we use duration in the 3.0 UI and see if we
> can reduce that confusion. We don't have the "5 longest dag runs" in our
> new dashboard page, which replaces cluster activity. If we wanted that
> feature again, we should be mindful of this and filter out paused dags in
> the API request.

Hello,

Any update on the use of duration in the 3.0 UI?

Maybe this bug isn't really an issue if cluster activity was removed in the newer version, and it's just something to keep in mind in case something similar to cluster activity is implemented in the 3.0 UI.

From what I understand, the current behavior, where the run stays in running and the duration keeps increasing, is what is expected from the pause functionality.
Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
The issue is that duration is based on start and end dates. If there is no end date, we usually default to now. But that is misleading when a dag run is running but the dag is paused.

Let me take a look at where we use duration in the 3.0 UI and see if we can reduce that confusion. We don't have the "5 longest dag runs" in our new dashboard page, which replaces cluster activity. If we wanted that feature again, we should be mindful of this and filter out paused dags in the API request.

On Thu, Apr 3, 2025, 1:27 PM Pedro Nunes Leal wrote:

> That could be a better approach.
>
> However, if I'm not mistaken, the code related to the cluster activity
> page doesn't exist in Airflow 3 (the version where I'm trying to make the
> changes).
>
> So what should I do in this case? Is there any other way to solve this
> problem that doesn't involve cluster activity?
>
> Changing to the queued state instead of failed was my proposal at the
> beginning, and it really pauses the DAG. This is the type of solution I
> was thinking of because, as I said before in the pull request, I feel
> that the cluster activity behavior is just a symptom of a bigger problem
> (the DAGs don't really pause, they just keep running).
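To make the duration point above concrete, here is a minimal sketch. It assumes a simplified in-memory record: `DagRun`, `dag_is_paused`, `duration`, and `longest_runs` are hypothetical names used only for illustration, not Airflow's actual models or API. It shows how falling back to "now" makes a running run of a paused dag look ever longer, and how filtering out paused dags before ranking avoids that.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class DagRun:
    # Hypothetical, simplified record; not Airflow's real DagRun model.
    dag_id: str
    start_date: datetime
    end_date: Optional[datetime]  # None while the run is still running
    dag_is_paused: bool


def duration(run: DagRun, now: datetime) -> timedelta:
    # With no end date we fall back to "now", so the value keeps growing
    # for as long as the run stays in the running state.
    return (run.end_date or now) - run.start_date


def longest_runs(
    runs: List[DagRun],
    now: datetime,
    limit: int = 5,
    include_paused: bool = False,
) -> List[DagRun]:
    # Drop runs whose dag is paused (unless asked not to), then rank the
    # remaining runs by duration, longest first.
    candidates = [r for r in runs if include_paused or not r.dag_is_paused]
    return sorted(candidates, key=lambda r: duration(r, now), reverse=True)[:limit]
```

With `include_paused=False` (the assumed default here), a run that has been sitting in running under a paused dag for days no longer crowds out the runs that are actually executing, and no dag run state has to change.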
Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
On 2025-03-31 22:26, Jens Scheffler wrote:

> My view on this is that, as also proposed in the discussion of the bug,
> we should rather filter the paused DAG from cluster activity reporting
> so that paused DAGs are not reported with excessive runtime.

That could be a better approach.

However, if I'm not mistaken, the code related to the cluster activity page doesn't exist in Airflow 3 (the version where I'm trying to make the changes).

So what should I do in this case? Is there any other way to solve this problem that doesn't involve cluster activity?

Changing to the queued state instead of failed was my proposal at the beginning, and it really pauses the DAG. This is the type of solution I was thinking of because, as I said before in the pull request, I feel that the cluster activity behavior is just a symptom of a bigger problem (the DAGs don't really pause, they just keep running).
Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
Hi,

thanks for working on the bug and raising a PR to fix it.

As other committers also commented, I think from a product view I'd expect a different resolution. We use "Pause DAG" in most cases for administrative or infrastructure problems, to prevent further failures and/or to drain infra to switch some backend.

I assume that when we pause a long-running DAG that is in between execution of tasks, we really want to "pause" scheduling; we don't want to set it to failed. That would also not be correct, because once we un-pause, the running DAGs should continue to work. I see no reason for marking this as failed and then manually chasing after it to reset the state later.

My view on this is that, as also proposed in the discussion of the bug, we should rather filter the paused DAG from cluster activity reporting so that paused DAGs are not reported with excessive runtime. Also, if the DAG is later un-paused, it would be "right" that the overall DAG runtime was longer than normal (I would not expect the paused time to be deducted from the runtime of the DAG).

If I (as operator/admin) really want to terminate existing running instances, I'd rather walk through Browse -> DAG Runs -> filter for running with the paused DAG id and mark them as failed explicitly.

Jens

On 31.03.25 20:50, Pedro Nunes Leal wrote:

> Hello everyone,
>
> Currently, I'm trying to fix this bug:
> https://github.com/apache/airflow/issues/3
>
> Basically, the issue is that the DAGs would be stuck on running even
> though they were paused.
> Consequently, the duration of the dag run will keep on increasing even
> though the DAG is paused.
>
> My proposal to solve this problem is changing the DAG's state from
> running to failed, when paused, to avoid the increment of their
> duration.
>
> Since this can be an impactful change, I would like to hear what
> others think about it.
>
> Link for the Pull Request:
> https://github.com/apache/airflow/pull/47557

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org
Re: [DISCUSS] When a DAG is paused, change the dag run state from running to failed.
I concur with Jens. Pause should just be pause.

On Mon, Mar 31, 2025 at 2:26 PM Jens Scheffler wrote:

> I assume that when we pause a long-running DAG that is in between
> execution of tasks, we really want to "pause" scheduling; we don't want
> to set it to failed. That would also not be correct, because once we
> un-pause, the running DAGs should continue to work.