Thank you for the input Collin - noted.
> On Sep 20, 2023, at 4:53 PM, Collin McNulty
> wrote:
>
> I want to concur with Daniel and Damian, that the key behavior for SLA
> should be based on when a DAG or task _should_ have run, not when it
> actually started. I think
I want to concur with Daniel and Damian, that the key behavior for SLA
should be based on when a DAG or task _should_ have run, not when it
actually started. I think that’s important from a semantic standpoint, but
also I think it’s just an important behavior to have in Airflow, because
that’s how
would therefore agree that to many it would be
> unintuitive if the behavior is 2 but it is called SLA.
>
> Damian
>
> -Original Message-
> From: Daniel Standish
> Sent: Wednesday, September 20, 2023 11:29 AM
> To: dev@airflow.apache.org
> Subject: Re: [DISCUSS] Mech
agree that to many it would be unintuitive
if the behavior is 2 but it is called SLA.
Damian
-Original Message-
From: Daniel Standish
Sent: Wednesday, September 20, 2023 11:29 AM
To: dev@airflow.apache.org
Subject: Re: [DISCUSS] Mechanism of SLA
I don't think of it as really a question about accurate record keeping but
more a question of what an SLA is, i.e. when do you want the warning, or
what do you want the warning based on. I think that the idea has been that
it really means, "if task not done by X time each day then warn". And the
On Sep 19, 2023 10:19 AM, "Daniel Standish"
wrote:
> I was able to chat with a couple folks about this. Small sample, but the
> sentiment was, "this is just a timeout". In other words, if we're going to
> call this SLA, we really ought to evaluate against the
Hi Daniel,
Thank you for following up with the assessment. That’s an incredibly valuable
data point.
I know we may have some opportunity to talk about this topic more at the summit
this week, but just for the sake of offering a reference of the other
perspective, I would like to share this
I was able to chat with a couple folks about this. Small sample, but the
sentiment was, "this is just a timeout". In other words, if we're going to
call this SLA, we really ought to evaluate against the "this thing should
have run by" time and not the actual start time. And, ideally, we should
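To make the distinction concrete, here is a minimal sketch of the two ways a deadline could be anchored. It is not Airflow code; the function names and the example times are made up for illustration, and it only assumes that a run has a scheduled start, an optional actual start, and a time limit.

```python
from datetime import datetime, timedelta, timezone


def missed_sla_from_schedule(scheduled_start, sla, now):
    """SLA semantics: the deadline is anchored to when the run should have started."""
    return now > scheduled_start + sla


def missed_timeout_from_start(actual_start, limit, now):
    """Timeout semantics: the deadline is anchored to when the task actually started.

    A task that has not started yet can never miss this kind of deadline.
    """
    return actual_start is not None and now > actual_start + limit


# Hypothetical run: scheduled for 06:00, but queued for three hours before starting.
scheduled = datetime(2023, 9, 20, 6, 0, tzinfo=timezone.utc)
started = scheduled + timedelta(hours=3)
limit = timedelta(hours=1)
now = scheduled + timedelta(hours=3, minutes=30)

sla_missed = missed_sla_from_schedule(scheduled, limit, now)
timeout_missed = missed_timeout_from_start(started, limit, now)
```

With these made-up times the schedule-based check reports a miss while the start-based check does not, which is the gap the "this is just a timeout" comment points at.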
No problem! I very much appreciate your questions and critical thought
process as well. It's been pretty difficult for me to fully understand how
the SLA feature worked, given how overloaded and complicated the logic is
in its current state. So it really helps to have another invested party
OK so one difference here is, you're adding a new DAG SLA concept. Which
is useful. One subtle difference from what I think is the existing
"concept" of SLA is that you are evaluating it against when it started, as
opposed to when it should have started, and evaluating it only in the
course of
First of all, thanks for being so charitable in engaging in this dialogue,
I appreciate it.
Yeah I think that the notion that maybe Airflow is making really
impractical promises with SLA, well that could be true.
One question for you, as I continue to let this percolate.
Can you help me
Hi Daniel,
These are all really great points, and I'm going to attempt to answer
all of them in no particular order:
On Expectations / SLAs / Naming:
I think you hit the nail on the head here, and I agree with you that the
naming choice of SLA is very misleading. To my understanding,
>
> [1] Yes, that's correct. I believe that containing the SLA evaluation
> within the lifetime of a task as a duration-based sla will still have a
> purpose. It's technically implemented like an execution_timeout, but the
> goal of the SLA check is to execute a callback without killing the task:
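As a rough illustration of "execute a callback without killing the task", the sketch below uses a plain `threading.Timer`. This is an assumption made for the example, not the mechanism proposed in the AIP or used by Airflow; `run_with_soft_sla`, `on_miss` and the timings are hypothetical names and values.

```python
import threading
import time


def run_with_soft_sla(task, sla_seconds, on_miss):
    """Run task(); if it is still running after sla_seconds, call on_miss()
    from a timer thread. The task itself is never interrupted."""
    timer = threading.Timer(sla_seconds, on_miss)
    timer.daemon = True
    timer.start()
    try:
        return task()
    finally:
        timer.cancel()  # no-op if the callback has already fired


def slow_task():
    # Stand-in for real work that outlives its SLA.
    time.sleep(0.3)
    return "done"


missed = []
result = run_with_soft_sla(slow_task, sla_seconds=0.05,
                           on_miss=lambda: missed.append("sla missed"))
```

The task still returns its normal result after the callback fires, which is the difference from a hard timeout such as execution_timeout that ends the task.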
Hi Daniel,
Thank you for the review! I'm happy to keep having the discussion to make
sure we can introduce the right way of implementing these solutions into
Airflow.
My general impression from the community in the discussions so far led me
to believe that deprecating the problematic feature
Some questions for you Sung.
I tried looking to understand why we needed to remove behavior 3 discussed
in AIP:
*[remove]* Task-level SLA measured from DAG-run scheduled start time
I'm just a little concerned that removing this would be a mistake because,
in my mind, part of the essence of
I really like the proposal as it is now. I think it is generally ready to
be put up to vote (and implement).
I think it has a chance to finally get our SLA feature straightened out.
J.
On Sat, Jul 8, 2023 at 12:00 AM Sung Yun wrote:
> Thank you for the clarification Jarek :)
>
> I’ve updated
Thank you for the clarification Jarek :)
I’ve updated the AIP on the Confluence page with your suggestion - please let
me know what you folks think!
In summary, I think it will serve as a great way to maintain some capacity to
measure a soft-timeout within a task. Obvious pros of this approach
> Which forking strategy are we exactly proposing?
The important part is that you have a separate process that will run a
separate Python interpreter so that if the task runs a "C" code without a
loop, the "timer" thread will be able to stop it regardless (for timeout)
and one that can run
Hi Jarek, I've been mulling over the implementation of (3) task:
time_limit_sla, and I have some follow up questions about the
implementation.
Which forking strategy are we exactly proposing? Currently, we invoke
task.execute_callable within the taskinstance, which we can effectively
think of as
This task_sla is more and more making me think of a ‘task’ on its own. It
would need to be run in parallel, non blocking, not overlap between each
other, etc…
How hard would it be to spawn them as a normal workload on the worker when
a task runs with an SLA configured?
Maybe on a dedicated queue /
Thank you all for your continued engagement and input! It looks like
Iaroslav's layout of 3 different labels of SLAs is helping us group the
implementation into different categories, so I will organize my own
responses in those logical groupings as well.
1. dag_sla
2. task_sla
3. task:
>
> This can be IMHO implemented on the task level. We currently have timeout
> implemented this way - whenever we start the task, we can have a signal
> handler registered with "real" time registered that will cancel the task.
> But I can imagine similar approach with signal and propagate the
>
I want to say that Airflow is a very popular project, and the ways of
calculating SLA differ because business cases differ. If possible, we
should support most of them out of the box.
On Sun, Jun 18, 2023 at 13:30, Iaroslav Poskriakov <
yaroslavposkrya...@gmail.com> wrote:
> So, I
So, I totally agree about DAG-level SLAs. They are very important to have,
and according to Sung Yun's proposal they should not be implemented at the
scheduler job level.
Regarding the second way of determining SLA: -->
--> .
It's very helpful when we want to achieve a non-technical SLA
I am also for DAG level SLA only (but maybe there are some twists).
And I hope (since Sung Yun has not given up on that) - maybe that is the
right time that others here will chime in and maybe it will let the vote go
on? I think it would be great to get the SLA feature sorted out so that we
have
Hello!
Thank you very much for the feedback on the proposal. I’ve been hoping to get
some more traction on this proposal, so it’s great to hear from another user of
the feature.
I understand that there’s a lot of support for keeping a native task level SLA
feature, and I definitely agree with
Mechanism of SLA
Hi, I read the previous conversation regarding SLA and I think removing the
ability to set an SLA at the task level would be a big mistake.
So, the proposed implementation of the task level SLA will not be working
correctly.
That's why I guess we have to think about the