It really depends on the scenario.

I believe it is impossible to get a straight answer from anyone,
simply because there isn't a "general" answer to that. You really have
to test it and see what works for you.

For example, this blog post describes how it behaved in some
scenarios: https://www.astronomer.io/blog/airflow-2-scheduler
And this doc covers fine-tuning the performance:
https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html#fine-tuning-your-scheduler-performance

As you will see in the docs, there are so many variables (starting
with the filesystem, database choice, database performance and
optimisation, and latency, and ending at the structure of your DAGs:
how many DAG files vs. DAGs you have, but most of all the way they are
written and optimised) that the answer is pretty much always "it
depends".

You need to look at your own case: verify, experiment, and see what
works best, identify the bottlenecks you have, then look at the "fine
tuning" doc, and knowing what your bottleneck is, you can fine-tune
your performance (but it will always be "your deployment's"
characteristics, which can be different from others'). The process of
fine-tuning is explained in the doc I linked.
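
Just to give a feel for the kind of knobs the fine-tuning doc talks
about, these are some of the [scheduler] settings in airflow.cfg you
would typically experiment with (the values below are purely
illustrative, not recommendations - the right ones depend entirely on
your deployment):

```ini
[scheduler]
# how many processes parse DAG files in parallel
parsing_processes = 2

# minimum seconds between re-parsing the same DAG file
min_file_process_interval = 30

# how often (seconds) to scan the DAGs folder for new files
dag_dir_list_interval = 300

# row-level DB locking is what lets multiple schedulers coexist;
# it also adds database overhead
use_row_level_locking = True
```

Measure, change one thing at a time, and measure again.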

J..

On Fri, Nov 19, 2021 at 5:27 PM Nicolas Paris <[email protected]> wrote:
>
> hi
>
> the HA RFC[1] had two objectives:
>
> 1. HA
> 2. performance scalability
>
>
> Can anyone confirm that adding multiple schedulers can improve
> performance for people with a HUGE number of DAGs (> 500)?
>
>
> Our finding is that increasing the resources of a standalone scheduler
> performs better than scaling schedulers horizontally (partly due to
> the database impact of locks)
>
> Thanks
>
>
> [1]: 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
